Multi-Camera Rig Initialization
A multi-camera rig is a set of cameras rigidly mounted on a common frame. The extrinsics of the rig describe the relative pose of each camera with respect to a reference camera (or a rig frame). This chapter covers the linear initialization of rig extrinsics from per-camera pose estimates.
Problem Statement
Given: Per-camera, per-view pose estimates (camera , view , target to camera).
Find: Camera-to-rig transforms and rig-to-target poses .
Assumptions:
- All cameras observe the same calibration target simultaneously
- Per-camera poses have been estimated (e.g., via PnP or homography decomposition)
- The rig is rigid (camera-to-rig transforms are constant across views)
Algorithm
Reference Camera
One camera is designated as the reference camera (default: camera 0). Its camera-to-rig transform is identity:
This defines the rig frame to coincide with the reference camera frame.
Rig-to-Target from Reference
For each view , the rig-to-target pose is the reference camera's pose:
Camera-to-Rig via Averaging
For each non-reference camera , collect the camera-to-rig estimate from each view where both cameras have observations:
Then average across views:
- Rotation: Quaternion averaging with hemisphere correction. Before averaging, flip quaternions to the same hemisphere (since and represent the same rotation). Then compute the mean quaternion and normalize.
- Translation: Arithmetic mean of translation vectors.
Quaternion Hemisphere Correction
Quaternions have a double-cover of SO(3): both and represent the same rotation. Before averaging, all quaternions are flipped to the same hemisphere as the first:
This prevents averaging artifacts where opposite quaternions cancel out.
Accuracy
The initialization accuracy depends on:
- Per-camera pose accuracy: Noisy single-camera poses (from homography decomposition or PnP) propagate to the rig extrinsics
- Number of views: More views improve the averaging
- View diversity: Diverse viewpoints reduce systematic errors
Typical accuracy: 1-5° rotation, 5-15% translation. This is refined in the rig bundle adjustment (see Multi-Camera Rig Extrinsics).
API
#![allow(unused)] fn main() { let extrinsic_poses = estimate_extrinsics_from_cam_target_poses( &cam_se3_target, // Vec<Vec<Option<Iso3>>>: per-view, per-camera poses ref_cam_idx, // usize: reference camera index )?; // extrinsic_poses.cam_to_rig: Vec<Iso3> (T_{R,C} for each camera) // extrinsic_poses.rig_from_target: Vec<Iso3> (T_{R,T} for each view) }
Usage in Calibration Pipeline
In the rig extrinsics calibration pipeline:
- Per-camera intrinsics are estimated and optimized independently
- Per-camera poses are recovered from the calibration board
- Rig extrinsics are initialized via averaging (this chapter)
- Rig bundle adjustment jointly optimizes all parameters