Multi-Camera Rig Initialization

A multi-camera rig is a set of cameras rigidly mounted on a common frame. The extrinsics of the rig describe the relative pose of each camera with respect to a reference camera (or a rig frame). This chapter covers the linear initialization of rig extrinsics from per-camera pose estimates.

Problem Statement

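Given: Per-camera, per-view pose estimates $T_{C_{k}, T}^{(v)}$ (camera $k$, view $v$), each mapping target coordinates into camera $k$'s frame. Throughout, $T_{A, B}$ denotes the transform that maps points expressed in frame $B$ into frame $A$.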

Find: Camera-to-rig transforms $\{T_{R, C_{k}}\}_{k}$ and rig-to-target poses $\{T_{R, T}^{(v)}\}_{v}$.

Assumptions:

All cameras observe the same calibration target simultaneously; a camera may miss the target in some views
Per-camera poses have been estimated (e.g., via PnP or homography decomposition)
The rig is rigid (camera-to-rig transforms are constant across views)

Algorithm

Reference Camera

One camera is designated as the reference camera (default: camera 0). Its camera-to-rig transform is the identity:

$T_{R, C_{0}} = I$

This defines the rig frame to coincide with the reference camera frame.

Rig-to-Target from Reference

For each view $v$, the rig-to-target pose is the reference camera's pose:

$T_{R, T}^{(v)} = T_{C_{0}, T}^{(v)}$

Camera-to-Rig via Averaging

For each non-reference camera $k$, compute a camera-to-rig estimate from every view in which both camera $k$ and the reference camera observe the target:

$\hat{T}_{R, C_{k}}^{(v)} = T_{R, T}^{(v)} \cdot (T_{C_{k}, T}^{(v)})^{- 1} = T_{C_{0}, T}^{(v)} \cdot (T_{C_{k}, T}^{(v)})^{- 1}$
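The estimate recovers the camera-to-rig transform exactly when the poses are noise-free. Rigidity lets camera $k$'s view of the target factor through the rig frame, $T_{C_k, T}^{(v)} = T_{C_k, R}\, T_{R, T}^{(v)}$. Substituting this factorization gives:

$$
\begin{aligned}
T_{C_k, T}^{(v)} &= (T_{R, C_k})^{-1}\, T_{R, T}^{(v)} \\
\Rightarrow\quad T_{R, T}^{(v)}\, (T_{C_k, T}^{(v)})^{-1}
  &= T_{R, T}^{(v)}\, (T_{R, T}^{(v)})^{-1}\, T_{R, C_k} = T_{R, C_k}
\end{aligned}
$$

With noisy per-camera poses, each view yields a slightly different $\hat{T}_{R, C_k}^{(v)}$, which is why the per-view estimates are averaged.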

Then average across views:

Rotation: Quaternion averaging with hemisphere correction. Flip all quaternions into the same hemisphere (since $q$ and $-q$ represent the same rotation), then take the component-wise mean and normalize it to unit length.
Translation: Arithmetic mean of translation vectors.

$T_{R, C_{k}} = \operatorname{average}\left(\{\hat{T}_{R, C_{k}}^{(v)}\}_{v}\right)$
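The whole initialization can be sketched in plain Rust with no external crates. It takes the reference pose as the rig pose, computes the per-view estimates, and averages them. This is a minimal sketch under stated assumptions, not the library's implementation. `Pose` (a unit quaternion plus a translation), `average`, and `init_rig` are hypothetical names that stand in for `Iso3` and `estimate_extrinsics_from_cam_target_poses`. The input layout mirrors the API's per-view, per-camera `Vec<Vec<Option<…>>>`.

```rust
// Minimal sketch; `Pose`, `average` and `init_rig` are hypothetical names, not the
// vision-calibration API. A Pose for T_{A,B} maps points in frame B into frame A.
#[derive(Clone, Copy, Debug)]
pub struct Pose {
    pub q: [f64; 4], // unit quaternion (w, x, y, z)
    pub t: [f64; 3], // translation
}

/// Hamilton product a * b.
fn quat_mul(a: [f64; 4], b: [f64; 4]) -> [f64; 4] {
    let [aw, ax, ay, az] = a;
    let [bw, bx, by, bz] = b;
    [
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    ]
}

fn conj(q: [f64; 4]) -> [f64; 4] {
    [q[0], -q[1], -q[2], -q[3]]
}

/// Rotate v by unit quaternion q: q * (0, v) * conj(q).
fn rotate(q: [f64; 4], v: [f64; 3]) -> [f64; 3] {
    let r = quat_mul(quat_mul(q, [0.0, v[0], v[1], v[2]]), conj(q));
    [r[1], r[2], r[3]]
}

impl Pose {
    /// self * other: apply `other` first, then `self`.
    fn compose(&self, other: &Pose) -> Pose {
        let r = rotate(self.q, other.t);
        Pose { q: quat_mul(self.q, other.q), t: [r[0] + self.t[0], r[1] + self.t[1], r[2] + self.t[2]] }
    }
    /// Inverse transform: (R^T, -R^T t).
    fn inverse(&self) -> Pose {
        let qi = conj(self.q);
        let r = rotate(qi, self.t);
        Pose { q: qi, t: [-r[0], -r[1], -r[2]] }
    }
}

/// Hemisphere-corrected, normalized quaternion mean plus arithmetic translation mean.
fn average(poses: &[Pose]) -> Option<Pose> {
    let first = poses.first()?;
    let (mut qs, mut ts) = ([0.0f64; 4], [0.0f64; 3]);
    for p in poses {
        let dot: f64 = (0..4).map(|i| p.q[i] * first.q[i]).sum();
        let s = if dot < 0.0 { -1.0 } else { 1.0 }; // q and -q are the same rotation
        for i in 0..4 { qs[i] += s * p.q[i]; }
        for i in 0..3 { ts[i] += p.t[i]; }
    }
    let norm = qs.iter().map(|x| x * x).sum::<f64>().sqrt(); // > 0 after alignment
    let n = poses.len() as f64;
    Some(Pose { q: qs.map(|x| x / norm), t: ts.map(|x| x / n) })
}

/// cam_se3_target[v][k] = T_{C_k,T}^{(v)}, or None if camera k missed view v.
/// Returns (T_{R,C_k} per camera, T_{R,T}^{(v)} per view).
fn init_rig(cam_se3_target: &[Vec<Option<Pose>>], ref_cam: usize) -> (Vec<Option<Pose>>, Vec<Option<Pose>>) {
    let num_cams = cam_se3_target.iter().map(|view| view.len()).max().unwrap_or(0);
    // T_{R,T}^{(v)} = T_{C_ref,T}^{(v)}; None where the reference camera missed the view.
    let rig_from_target: Vec<Option<Pose>> =
        cam_se3_target.iter().map(|view| view.get(ref_cam).copied().flatten()).collect();
    let cam_to_rig: Vec<Option<Pose>> = (0..num_cams)
        .map(|k| {
            if k == ref_cam {
                return Some(Pose { q: [1.0, 0.0, 0.0, 0.0], t: [0.0; 3] }); // T_{R,C_ref} = I
            }
            // Per-view estimates T_{R,T} * (T_{C_k,T})^{-1}, only where both cameras saw the target.
            let estimates: Vec<Pose> = cam_se3_target
                .iter()
                .zip(&rig_from_target)
                .filter_map(|(view, rt)| Some(rt.as_ref()?.compose(&view.get(k).copied().flatten()?.inverse())))
                .collect();
            average(&estimates) // None if camera k never co-observed with the reference
        })
        .collect();
    (cam_to_rig, rig_from_target)
}
```

Returning `Option` per camera is a design choice of this sketch. A camera that never shares a view with the reference camera gets `None` rather than an arbitrary transform.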

Quaternion Hemisphere Correction

Unit quaternions double-cover SO(3): both $q$ and $-q$ represent the same rotation. Before averaging, every quaternion is flipped into the same hemisphere as the first one, $q_{1}$:

$q_{i} \leftarrow \begin{cases} q_{i} & \text{if } q_{i} \cdot q_{1} \geq 0 \\ -q_{i} & \text{if } q_{i} \cdot q_{1} < 0 \end{cases}$

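Without this correction, two nearly identical rotations stored with opposite signs would cancel in the sum. The result would be a near-zero mean whose normalized direction is meaningless.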
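The effect of the flip rule is easy to check numerically. The sketch below is hypothetical: `align_hemisphere` and `mean_quaternion` are names introduced here, and quaternions are stored as `[w, x, y, z]` arrays. It applies the flip rule above, then takes the normalized component-wise mean.

```rust
// Hypothetical helpers for illustration; quaternions are [w, x, y, z] arrays.

/// Four-dimensional dot product of two quaternions.
fn dot(a: &[f64; 4], b: &[f64; 4]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Component-wise sum (the unnormalized mean).
fn quat_sum(qs: &[[f64; 4]]) -> [f64; 4] {
    qs.iter().fold([0.0; 4], |mut acc, q| {
        for i in 0..4 {
            acc[i] += q[i];
        }
        acc
    })
}

/// Flip every quaternion onto the hemisphere of the first: q_i <- -q_i if q_i . q_1 < 0.
fn align_hemisphere(qs: &[[f64; 4]]) -> Vec<[f64; 4]> {
    let first = match qs.first() {
        Some(q) => *q,
        None => return Vec::new(),
    };
    qs.iter()
        .map(|q| if dot(q, &first) < 0.0 { [-q[0], -q[1], -q[2], -q[3]] } else { *q })
        .collect()
}

/// Hemisphere-corrected, normalized component-wise mean; None for an empty input.
fn mean_quaternion(qs: &[[f64; 4]]) -> Option<[f64; 4]> {
    let s = quat_sum(&align_hemisphere(qs));
    let n = dot(&s, &s).sqrt();
    if n == 0.0 { None } else { Some(s.map(|x| x / n)) }
}
```

Summing $q$ and $-q$ without the flip gives the zero vector, which cannot be normalized. With the flip, `mean_quaternion` returns $q$. The normalized component-wise mean is a good approximation when the per-view rotations are close together. For widely spread rotations, the eigenvector-based quaternion average is the more principled choice.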

Accuracy

The initialization accuracy depends on:

Per-camera pose accuracy: Noise in the single-camera poses (from homography decomposition or PnP) propagates directly into the rig extrinsics
Number of views: Averaging over more views reduces the effect of per-view noise
View diversity: Diverse viewpoints reduce systematic errors

Typical accuracy: 1-5° in rotation and 5-15% in translation. The rig bundle adjustment then refines these estimates (see Multi-Camera Rig Extrinsics).
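One common way to express such errors uses two measures. Rotation error is the angle of the relative rotation between estimate and reference. Translation error is the distance between the translations, relative to the length of the reference translation. The minimal sketch below assumes a reference estimate is available, for example simulated ground truth or the bundle-adjusted result. `rotation_error_deg` and `translation_error_pct` are hypothetical helpers, with quaternions as `[w, x, y, z]`.

```rust
// Hypothetical error metrics for comparing an initialization against a reference.

/// Angle (degrees) of the relative rotation between two unit quaternions.
/// Taking |dot| makes the result independent of the q / -q sign ambiguity.
fn rotation_error_deg(q_est: [f64; 4], q_ref: [f64; 4]) -> f64 {
    let d: f64 = q_est.iter().zip(&q_ref).map(|(a, b)| a * b).sum();
    2.0 * d.abs().min(1.0).acos().to_degrees()
}

/// Translation error relative to the reference translation length, in percent.
fn translation_error_pct(t_est: [f64; 3], t_ref: [f64; 3]) -> f64 {
    let norm = |v: [f64; 3]| v.iter().map(|x| x * x).sum::<f64>().sqrt();
    let diff = [t_est[0] - t_ref[0], t_est[1] - t_ref[1], t_est[2] - t_ref[2]];
    100.0 * norm(diff) / norm(t_ref)
}
```

For example, an estimated baseline of 0.105 against a reference of 0.1 along the same axis is a 5% translation error.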

API

```rust
// Call from inside a function returning Result, since `?` propagates the error.
// cam_se3_target[v][k] holds T_{C_k,T}^{(v)}, or None if camera k missed view v.
let extrinsic_poses = estimate_extrinsics_from_cam_target_poses(
    &cam_se3_target, // Vec<Vec<Option<Iso3>>>: per-view, per-camera poses
    ref_cam_idx,     // usize: reference camera index
)?;

// extrinsic_poses.cam_to_rig: Vec<Iso3>       (T_{R,C_k} for each camera)
// extrinsic_poses.rig_from_target: Vec<Iso3>  (T_{R,T}^{(v)} for each view)
```

Usage in Calibration Pipeline

In the rig extrinsics calibration pipeline:

Per-camera intrinsics are estimated and optimized independently
Per-camera poses are recovered from the calibration board
Rig extrinsics are initialized via averaging (this chapter)
Rig bundle adjustment jointly optimizes all parameters

vision-calibration Book