Introduction
calibration-rs is a Rust library for camera calibration — the process of estimating the internal parameters (intrinsics, lens distortion) and external parameters (pose, rig geometry, hand-eye transforms) of camera systems from observed correspondences between known 3D points and their 2D projections.
Who This Book Is For
This book targets engineers and researchers working in machine vision, robotics, and 3D reconstruction who want to:
- Calibrate single cameras, stereo rigs, or multi-camera systems
- Perform hand-eye calibration for cameras mounted on robot arms
- Calibrate laser triangulation devices (camera + laser plane)
- Understand the mathematical foundations behind calibration algorithms
We assume familiarity with linear algebra (SVD, eigendecomposition, least squares) and basic projective geometry (homogeneous coordinates, projection matrices). Lie group theory (SO(3), SE(3)) is introduced when needed.
What calibration-rs Provides
The library covers the full calibration pipeline:
- Linear initialization — closed-form solvers (Zhang's method, DLT, PnP, Tsai-Lenz hand-eye, epipolar geometry) that produce approximate parameter estimates
- Non-linear refinement — Levenberg-Marquardt bundle adjustment that minimizes reprojection error to sub-pixel accuracy
- Session framework — a high-level API with step functions, configuration, and JSON checkpointing
Workspace Structure
calibration-rs is organized as a 5-crate Rust workspace with a layered architecture:
```
vision-calibration (facade)
  → vision-calibration-pipeline (sessions, workflows)
    → vision-calibration-optim (non-linear refinement)
    → vision-calibration-linear (initialization)
      → vision-calibration-core (primitives, camera models, RANSAC)
```
The key dependency rule: the linear and optimization crates are peers — they both depend on vision-calibration-core but not on each other. This separation keeps initialization algorithms independent of the optimization backend.
Relation to OpenCV
Readers familiar with OpenCV's calibration module will find analogous functionality throughout:
| OpenCV | calibration-rs |
|---|---|
| `cv::calibrateCamera` | Planar intrinsics pipeline (Zhang init + bundle adjustment) |
| `cv::solvePnP` | `PnpSolver::p3p()`, `PnpSolver::dlt()` |
| `cv::findHomography` | `HomographySolver::dlt()`, `dlt_homography_ransac()` |
| `cv::findFundamentalMat` | `EpipolarSolver::fundamental_8point()` |
| `cv::findEssentialMat` | `EpipolarSolver::essential_5point()` |
| `cv::stereoCalibrate` | Rig extrinsics pipeline |
calibration-rs differs from OpenCV in several ways: it is written in pure Rust, uses a composable camera model with generic type parameters, provides a backend-agnostic optimization IR, and offers a session framework with JSON checkpointing for production workflows.
Book Organization
The book is structured in seven parts:
- Part I: Camera Model — the composable projection pipeline (pinhole, distortion, sensor tilt, intrinsics)
- Part II: Geometric Primitives — rigid transforms, RANSAC, robust loss functions
- Part III: Linear Initialization — all closed-form solvers with full mathematical derivations
- Part IV: Non-Linear Optimization — Levenberg-Marquardt, manifold constraints, autodiff, the IR architecture
- Part V: Calibration Workflows — end-to-end pipelines for each problem type, with both synthetic and real data examples
- Part VI: Session Framework — the high-level `CalibrationSession` API
- Part VII: Extending the Library — adding new problems, backends, and pipeline types
Each algorithm chapter includes a formal problem statement, objective function, assumptions, and a full derivation leading to the implementation.
Quickstart
This chapter walks through a minimal camera calibration using synthetic data. By the end you will have estimated camera intrinsics and lens distortion from simulated observations of a planar calibration board.
Add the Dependency
```toml
[dependencies]
vision-calibration = "0.2"
```
Minimal Example
The following program generates synthetic calibration data, runs the two-step calibration pipeline (linear initialization followed by non-linear refinement), and prints the results:
```rust
use anyhow::Result;
use vision_calibration::planar_intrinsics::{step_init, step_optimize};
use vision_calibration::prelude::*;
use vision_calibration::synthetic::planar;

fn main() -> Result<()> {
    // 1. Define ground truth camera
    let k_gt = FxFyCxCySkew {
        fx: 800.0,
        fy: 780.0,
        cx: 640.0,
        cy: 360.0,
        skew: 0.0,
    };
    let dist_gt = BrownConrady5 {
        k1: 0.05,
        k2: -0.02,
        k3: 0.0,
        p1: 0.001,
        p2: -0.001,
        iters: 8,
    };
    let camera = make_pinhole_camera(k_gt, dist_gt);

    // 2. Generate synthetic observations
    let board = planar::grid_points(8, 6, 0.04); // 8×6 grid, 40 mm squares
    let poses = planar::poses_yaw_y_z(6, -0.2, 0.08, 0.5, 0.05);
    let views = planar::project_views_all(&camera, &board, &poses)?;

    // 3. Create session and set input
    let dataset = PlanarDataset::new(
        views.into_iter().map(View::without_meta).collect()
    )?;
    let mut session = CalibrationSession::<PlanarIntrinsicsProblem>::new();
    session.set_input(dataset)?;

    // 4. Run calibration
    step_init(&mut session, None)?;     // Linear initialization
    step_optimize(&mut session, None)?; // Non-linear refinement

    // 5. Inspect results
    let export = session.export()?;
    let k = export.params.intrinsics();
    println!("fx={:.1}, fy={:.1}, cx={:.1}, cy={:.1}", k.fx, k.fy, k.cx, k.cy);
    println!("Mean reprojection error: {:.4} px", export.mean_reproj_error);
    Ok(())
}
```
What Happens in Each Step
Step 1: step_init — Linear Initialization
1. Computes a homography from each view's 2D-3D correspondences using DLT (direct linear transform)
2. Estimates intrinsics from the homographies using Zhang's method
3. Estimates lens distortion coefficients from homography residuals
4. Iterates steps 2-3 to refine the joint $K$ + distortion estimate
5. Recovers each view's camera pose from its homography and the estimated intrinsics
After this step, intrinsics are typically within 10-40% of the true values — good enough to initialize non-linear optimization.
Step 2: step_optimize — Bundle Adjustment
Runs Levenberg-Marquardt optimization minimizing the total reprojection error:

$$\min_{K,\, d,\, \{T_i\}} \; \sum_{i} \sum_{j} \left\| \pi\!\left(K, d, T_i X_j\right) - x_{ij} \right\|^2$$

where $\pi$ is the camera projection function, $T_i$ is the pose of view $i$, $d$ are the distortion coefficients, $X_j$ are the known 3D board points, and $x_{ij}$ are the observed pixel coordinates.
After optimization, expect <2% error on intrinsics and <1 px mean reprojection error.
Running the Built-in Example
The library ships with this example (and several others):
```sh
cargo run -p vision-calibration --example planar_synthetic
```
Alternative: One-Line Pipeline
If you don't need to inspect intermediate results:
```rust
use vision_calibration::planar_intrinsics::run_calibration;

run_calibration(&mut session)?;
```
Next Steps
- Architecture Overview — understand the crate structure
- Composable Camera Pipeline — the projection model in detail
- Planar Intrinsics Calibration — the full mathematical treatment
Architecture Overview
calibration-rs is organized as a layered workspace of Rust crates plus Python bindings. This chapter explains the dependency structure, data flow, and key design patterns.
Crate Dependency Graph
```
vision-calibration (facade: re-exports everything)
│
└── vision-calibration-pipeline (sessions, workflows, step functions)
    │
    ├── vision-calibration-optim (non-linear optimization)
    │   │
    │   └── vision-calibration-core
    │
    └── vision-calibration-linear (closed-form initialization)
        │
        └── vision-calibration-core
```
Key rule: vision-calibration-linear and vision-calibration-optim are peers. They both depend on vision-calibration-core but never on each other. This keeps initialization algorithms free of optimization dependencies and vice versa.
Crate Responsibilities
vision-calibration-core
The foundation layer providing:
- Math types — nalgebra-based aliases: `Pt2`, `Pt3`, `Vec3`, `Mat3`, `Iso3`, `Real` (= `f64`)
- Camera models — composable trait-based pipeline: `ProjectionModel`, `DistortionModel`, `SensorModel`, `IntrinsicsModel`
- Observation types — `CorrespondenceView` (2D-3D point pairs), `View<Meta>`, `PlanarDataset`, `RigDataset`
- RANSAC engine — generic `Estimator` trait with configurable options
- Synthetic data utilities — grid generation, pose sampling, projection
- Reprojection error computation — single-camera and multi-camera rig
vision-calibration-linear
Closed-form initialization solvers. Each produces an approximate estimate suitable for seeding non-linear optimization:
| Solver | Input | Output |
|---|---|---|
| Zhang's method | Homographies | Intrinsics $K$ |
| Distortion fit | $K$ + homographies | Brown-Conrady coefficients |
| Iterative intrinsics | Observations | Joint $K$ + distortion |
| Homography DLT | 2D-2D correspondences | $3 \times 3$ homography |
| Planar pose | $K$ + homography | SE(3) pose |
| P3P / DLT PnP | 3D-2D correspondences + $K$ | SE(3) pose |
| 5-point essential | Normalized correspondences | Essential matrix |
| 8-/7-point fundamental | Pixel correspondences | Fundamental matrix |
| Tsai-Lenz hand-eye | Robot + camera motions | Hand-eye SE(3) |
| Rig extrinsics | Per-camera poses | Camera-to-rig SE(3) |
| Laser plane | Laser pixels + target poses | Plane (normal + distance) |
vision-calibration-optim
Non-linear refinement with a backend-agnostic architecture:
- IR (Intermediate Representation) — `ProblemIR` with `ParamBlock` and `ResidualBlock` types that describe optimization problems independently of any solver
- Factors — generic residual functions parameterized over `RealField` for automatic differentiation
- Backends — currently `TinySolverBackend` (Levenberg-Marquardt with sparse linear solvers)
- Problem builders — domain-specific functions that construct IR from calibration data
vision-calibration-pipeline
The session framework providing production-ready workflows:
- `CalibrationSession<P: ProblemType>` — generic state container with config, input, state, output, exports
- Step functions — free functions operating on `&mut CalibrationSession<P>` (e.g., `step_init`, `step_optimize`)
- Pipeline functions — convenience wrappers chaining all steps
- JSON checkpointing — full serialization for session persistence
- Six problem types: `PlanarIntrinsicsProblem`, `ScheimpflugIntrinsicsProblem`, `SingleCamHandeyeProblem`, `RigExtrinsicsProblem`, `RigHandeyeProblem`, `LaserlineDeviceProblem`
vision-calibration
Unified facade crate that re-exports everything through a clean module hierarchy:
```rust
use vision_calibration::prelude::*;           // Minimal planar hello-world imports
use vision_calibration::planar_intrinsics::*; // Planar workflow
use vision_calibration::core::*;              // Math types
use vision_calibration::linear::*;            // Linear solvers
use vision_calibration::optim::*;             // Optimization
```
vision-calibration-py
Python bindings crate (maturin/PyO3) exposing high-level session workflows.
Data Flow
Every calibration workflow follows the same pattern:
```
Observations (2D-3D correspondences)
        │
        ▼
Linear Initialization (vision-calibration-linear)
        │  Closed-form solvers: ~5-40% accuracy
        ▼
Non-Linear Refinement (vision-calibration-optim)
        │  Levenberg-Marquardt: <2% accuracy, <1 px reprojection
        ▼
Calibrated Parameters (K, distortion, poses, ...)
```
The session framework wraps this flow with configuration, state tracking, and checkpointing:
```
CalibrationSession::new()
        │
        ▼
session.set_input(dataset)
        │
        ▼
step_init(&mut session)        ← linear initialization
        │
        ▼
step_optimize(&mut session)    ← non-linear refinement
        │
        ▼
session.export()               ← calibrated parameters
```
Design Patterns
Composable Camera Model
The camera projection pipeline is built from four independent traits composed via generics:
pixel = K(sensor(distortion(projection(direction))))
Each stage can be mixed and matched. For example, a standard camera uses Pinhole + BrownConrady5 + IdentitySensor + FxFyCxCySkew, while a laser profiler might use Pinhole + BrownConrady5 + ScheimpflugParams + FxFyCxCySkew.
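To make the compile-time composition concrete, here is a self-contained miniature of the idea using simplified stand-in traits. All names here (`Project`, `Warp`, `Identity`) are illustrative, not the library's actual definitions, and distortion/sensor stages are collapsed into a single 2D-to-2D `Warp` trait:

```rust
/// Simplified stage traits: one 3D->2D projection, then 2D->2D warps.
pub trait Project {
    fn project(&self, p: [f64; 3]) -> Option<[f64; 2]>;
}
pub trait Warp {
    fn warp(&self, n: [f64; 2]) -> [f64; 2];
}

/// Pinhole projection: perspective division, defined only for z > 0.
pub struct Pinhole;
impl Project for Pinhole {
    fn project(&self, p: [f64; 3]) -> Option<[f64; 2]> {
        (p[2] > 0.0).then(|| [p[0] / p[2], p[1] / p[2]])
    }
}

/// Identity stage (stands in for NoDistortion / IdentitySensor).
pub struct Identity;
impl Warp for Identity {
    fn warp(&self, n: [f64; 2]) -> [f64; 2] { n }
}

/// Affine intrinsics: pixel = (fx*x + cx, fy*y + cy).
pub struct Intrinsics { pub fx: f64, pub fy: f64, pub cx: f64, pub cy: f64 }
impl Warp for Intrinsics {
    fn warp(&self, n: [f64; 2]) -> [f64; 2] {
        [self.fx * n[0] + self.cx, self.fy * n[1] + self.cy]
    }
}

/// Camera composed from independent stages via generics — each stage
/// can be swapped without touching the others.
pub struct Camera<P: Project, D: Warp, K: Warp> {
    pub proj: P,
    pub dist: D,
    pub k: K,
}
impl<P: Project, D: Warp, K: Warp> Camera<P, D, K> {
    pub fn project_point(&self, p: [f64; 3]) -> Option<[f64; 2]> {
        // pixel = K(dist(proj(p)))
        self.proj.project(p).map(|n| self.k.warp(self.dist.warp(n)))
    }
}
```

Because each stage is a separate type parameter, the compiler monomorphizes the whole chain; swapping `Identity` for a distortion type changes the pipeline without any dynamic dispatch.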
Backend-Agnostic Optimization
Problems are defined as an intermediate representation (IR) that is independent of any specific solver. The IR is then compiled to a solver-specific form:
```
Problem Builder → ProblemIR → Backend.compile() → Backend.solve()
                  (generic)    (solver-specific)
```
This allows swapping the optimization backend without changing problem definitions.
Step Functions
Calibration workflows are decomposed into discrete steps implemented as free functions. This allows:
- Inspection of intermediate state between steps
- Resumption from any point (via JSON checkpointing)
- Customization of per-step options
- Composition of steps from different problem types
Composable Camera Pipeline
A camera model maps 3D points in the camera frame to 2D pixel coordinates. In calibration-rs, this mapping is decomposed into four composable stages, each implemented as a trait:
$$\text{pixel} = K\big(s\big(d\big(\pi(\mathbf{p}_c)\big)\big)\big)$$

where:
- $\pi$ — projection: maps a 3D direction to normalized coordinates on the image plane
- $d$ — distortion: warps normalized coordinates to model lens imperfections
- $s$ — sensor: applies a homography for tilted sensor planes (identity for standard cameras)
- $K$ — intrinsics: scales and translates to pixel coordinates
The Camera Struct
The camera is a generic struct parameterized over the four model traits:
```rust
pub struct Camera<S, P, D, Sm, K>
where
    S: RealField,
    P: ProjectionModel<S>,
    D: DistortionModel<S>,
    Sm: SensorModel<S>,
    K: IntrinsicsModel<S>,
{
    pub proj: P,
    pub dist: D,
    pub sensor: Sm,
    pub k: K,
}
```
The scalar type S is generic over RealField, enabling the same camera to work with f64 for evaluation and with dual numbers for automatic differentiation.
Forward Projection
Given a 3D point in the camera frame, projection to pixel coordinates proceeds through the four stages:
```rust
pub fn project_point_c(&self, p_c: &Vector3<S>) -> Option<Point2<S>>
```
1. Projection: Compute normalized coordinates $n = \pi(\mathbf{p}_c)$. For pinhole: $n = (x/z,\; y/z)$. Returns `None` if the point is behind the camera ($z \le 0$).
2. Distortion: Apply lens distortion $n_d = d(n)$. Warps normalized coordinates according to the distortion model (e.g., Brown-Conrady radial + tangential).
3. Sensor transform: Apply sensor model $n_s = s(n_d)$. For standard cameras this is identity; for tilted sensors it applies a homography.
4. Intrinsics: Map to pixels $\text{pixel} = K(n_s)$. Applies focal lengths, principal point, and optional skew.
Inverse (Back-Projection)
The inverse maps a pixel to a ray in the camera frame:
```rust
pub fn backproject_pixel(&self, px: &Point2<S>) -> Ray<S>
```
- $K^{-1}$ — pixel to sensor coordinates
- $s^{-1}$ — sensor to distorted normalized
- $d^{-1}$ — undistort (iterative)
- $\pi^{-1}$ — normalized to 3D direction
The result is a point on the $z = 1$ plane in the camera frame, defining a ray from the camera center.
Common Type Aliases
For the most common configuration (pinhole + Brown-Conrady + no tilt):
```rust
type PinholeCamera = Camera<f64, Pinhole, BrownConrady5<f64>, IdentitySensor, FxFyCxCySkew<f64>>;
```
The make_pinhole_camera(k, dist) convenience function constructs this type.
Projecting World Points
To project a world point $\mathbf{p}_w$ through a posed camera, first transform from world to camera frame using the extrinsic pose $T_{\text{cam\_world}}$:

$$\mathbf{p}_c = T_{\text{cam\_world}}\; \mathbf{p}_w$$

Then apply the camera projection:

$$\text{pixel} = K\big(s\big(d\big(\pi(\mathbf{p}_c)\big)\big)\big)$$
Trait Composition
Each stage is an independent trait. This design allows mixing and matching:
| Projection | Distortion | Sensor | Intrinsics |
|---|---|---|---|
| `Pinhole` | `NoDistortion` | `IdentitySensor` | `FxFyCxCySkew` |
| `Pinhole` | `BrownConrady5` | `IdentitySensor` | `FxFyCxCySkew` |
| `Pinhole` | `BrownConrady5` | `HomographySensor` | `FxFyCxCySkew` |
Note: ScheimpflugParams stores the tilt angles and has a compile() method that produces a HomographySensor suitable for the Camera struct. See the Sensor Models chapter.
The subsequent sections detail each stage.
Pinhole Projection
The pinhole model is the simplest and most widely used camera projection. It models a camera as an ideal point through which all light rays pass — no lens, no aperture, just a geometric center of projection.
Mathematical Definition
Given a 3D point $\mathbf{p}_c = (x, y, z)$ in the camera frame, the pinhole projection maps it to normalized image coordinates:

$$\pi(\mathbf{p}_c) = \left( \frac{x}{z},\; \frac{y}{z} \right)$$

This is perspective division: the point is projected onto the $z = 1$ plane. The projection is defined only for points in front of the camera ($z > 0$).
The inverse (back-projection) maps normalized coordinates $n = (n_x, n_y)$ to a 3D direction:

$$\pi^{-1}(n) = (n_x,\; n_y,\; 1)$$

This defines a ray from the camera center through the image point.
Camera Frame Convention
calibration-rs uses a right-handed camera frame:
- $+x$: right
- $+y$: down
- $+z$: forward (optical axis)
This matches the convention used by OpenCV and most computer vision libraries. A point with positive $z$ is in front of the camera; negative $z$ is behind it.
The ProjectionModel Trait
```rust
pub trait ProjectionModel<S: RealField> {
    fn project_dir(&self, dir_c: &Vector3<S>) -> Option<Point2<S>>;
    fn unproject_dir(&self, n: &Point2<S>) -> Vector3<S>;
}
```
The Pinhole struct implements this trait:
```rust
pub struct Pinhole;

impl<S: RealField> ProjectionModel<S> for Pinhole {
    fn project_dir(&self, dir_c: &Vector3<S>) -> Option<Point2<S>> {
        if dir_c.z > S::zero() {
            Some(Point2::new(
                dir_c.x.clone() / dir_c.z.clone(),
                dir_c.y.clone() / dir_c.z.clone(),
            ))
        } else {
            None
        }
    }

    fn unproject_dir(&self, n: &Point2<S>) -> Vector3<S> {
        Vector3::new(n.x.clone(), n.y.clone(), S::one())
    }
}
```
Note the use of .clone() — this is required for compatibility with dual numbers used in automatic differentiation (see Autodiff and Generic Residual Functions).
Limitations
The pinhole model assumes:
- No lens distortion — addressed by the distortion stage
- No sensor tilt — addressed by the sensor stage
- Central projection — all rays pass through a single point
For cameras with significant distortion (wide-angle, fisheye), the distortion model corrects for lens effects after pinhole projection to normalized coordinates.
Brown-Conrady Distortion
Real lenses introduce geometric distortion: straight lines in the world project as curves in the image. The Brown-Conrady model captures the two dominant distortion effects — radial and tangential — as polynomial functions of the distance from the optical axis.
Distortion Model
Given undistorted normalized coordinates $(x, y)$ with $r^2 = x^2 + y^2$, the distorted coordinates $(x_d, y_d)$ are computed as follows.

Radial distortion (barrel/pincushion):

$$x_r = x\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6), \qquad y_r = y\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)$$

Tangential distortion (decentering):

$$\Delta x = 2 p_1 x y + p_2 (r^2 + 2x^2), \qquad \Delta y = p_1 (r^2 + 2y^2) + 2 p_2 x y$$

Combined:

$$x_d = x_r + \Delta x, \qquad y_d = y_r + \Delta y$$

The model has five parameters: radial coefficients $k_1, k_2, k_3$ and tangential coefficients $p_1, p_2$.
OpenCV equivalence: This is identical to OpenCV's `distortPoints` with the 5-parameter model `(k1, k2, p1, p2, k3)`. Note OpenCV's parameter ordering differs from the mathematical ordering.
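The forward model above can be written as a small standalone function over plain `f64` values. The function name and parameter layout here are illustrative, not the library's API:

```rust
/// Brown-Conrady forward distortion on normalized coordinates.
/// k = [k1, k2, k3] radial, p = [p1, p2] tangential, n = (x, y) undistorted.
pub fn distort(k: [f64; 3], p: [f64; 2], n: [f64; 2]) -> [f64; 2] {
    let (x, y) = (n[0], n[1]);
    let r2 = x * x + y * y;
    // radial factor: 1 + k1 r^2 + k2 r^4 + k3 r^6
    let radial = 1.0 + k[0] * r2 + k[1] * r2 * r2 + k[2] * r2 * r2 * r2;
    // tangential (decentering) terms
    let dx = 2.0 * p[0] * x * y + p[1] * (r2 + 2.0 * x * x);
    let dy = p[0] * (r2 + 2.0 * y * y) + 2.0 * p[1] * x * y;
    [x * radial + dx, y * radial + dy]
}
```

With all coefficients zero the function is the identity, which is the `NoDistortion` case.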
Physical Interpretation
- $k_1$: dominant radial term — its sign determines barrel distortion (lines bow outward from center) versus pincushion distortion (lines bow inward)
- $k_2$, $k_3$: higher-order radial corrections
- $p_1$, $p_2$: tangential distortion from imperfect lens-sensor alignment (lens elements not perfectly centered)
For typical industrial cameras, all coefficients are small in magnitude.
When to Fix `k3`
The sixth-order radial term $k_3 r^6$ is only significant for wide-angle lenses where $r$ reaches large values. For typical industrial cameras with moderate field of view:
- $k_3$ is poorly constrained by the data and often absorbs noise
- Estimating $k_3$ can cause overfitting, leading to worse extrapolation outside the calibration region
- The library defaults to `fix_k3: true` in most problem configurations
Recommendation: Only estimate $k_3$ with high-quality data covering the full image area, or for lenses with field of view > 90°.
Undistortion (Inverse)
Given distorted coordinates $(x_d, y_d)$, recovering undistorted coordinates $(x, y)$ requires inverting the distortion model. Since the forward model is a polynomial without a closed-form inverse, calibration-rs uses iterative fixed-point refinement:

$$x^{(i+1)} = \frac{x_d - \Delta x^{(i)}}{1 + k_1 r_i^2 + k_2 r_i^4 + k_3 r_i^6}, \qquad y^{(i+1)} = \frac{y_d - \Delta y^{(i)}}{1 + k_1 r_i^2 + k_2 r_i^4 + k_3 r_i^6}$$

where $r_i^2 = (x^{(i)})^2 + (y^{(i)})^2$ and $\Delta x^{(i)}, \Delta y^{(i)}$ are the tangential terms evaluated at the current iterate, starting from $(x^{(0)}, y^{(0)}) = (x_d, y_d)$. This rearranges the distortion equation to isolate $(x, y)$ and iterates to convergence. The default is 8 iterations, which provides accuracy well below sensor noise for typical distortion magnitudes.
Convergence
The fixed-point iteration converges when the Jacobian of the distortion residual has spectral radius less than 1, which holds for physically realistic distortion values (distortion terms small relative to 1). For extreme distortion, more iterations may be needed via the `iters` parameter.
OpenCV equivalence: `cv::undistortPoints` performs the same iterative inversion.
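The fixed-point scheme can be sketched as a standalone function (illustrative names; the radial and tangential terms match the forward model described earlier):

```rust
/// Iterative fixed-point undistortion: recover undistorted normalized
/// coordinates from distorted ones. d = (x_d, y_d), starting guess = d.
pub fn undistort(k: [f64; 3], p: [f64; 2], d: [f64; 2], iters: u32) -> [f64; 2] {
    let (mut x, mut y) = (d[0], d[1]); // initial guess: distorted coords
    for _ in 0..iters {
        let r2 = x * x + y * y;
        let radial = 1.0 + k[0] * r2 + k[1] * r2 * r2 + k[2] * r2 * r2 * r2;
        // tangential terms evaluated at the current iterate
        let dx = 2.0 * p[0] * x * y + p[1] * (r2 + 2.0 * x * x);
        let dy = p[0] * (r2 + 2.0 * y * y) + 2.0 * p[1] * x * y;
        // rearranged model: undistorted = (distorted - tangential) / radial
        x = (d[0] - dx) / radial;
        y = (d[1] - dy) / radial;
    }
    [x, y]
}
```

Round-tripping a point through the forward model and this inverse recovers it to well below typical sensor noise within a handful of iterations.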
The DistortionModel Trait
```rust
pub trait DistortionModel<S: RealField> {
    fn distort(&self, n_undist: &Point2<S>) -> Point2<S>;
    fn undistort(&self, n_dist: &Point2<S>) -> Point2<S>;
}
```
The BrownConrady5<S> struct:
```rust
pub struct BrownConrady5<S: RealField> {
    pub k1: S,
    pub k2: S,
    pub k3: S,
    pub p1: S,
    pub p2: S,
    pub iters: u32, // undistortion iterations (default: 8)
}
```
NoDistortion is the identity implementation for distortion-free cameras.
Distortion in the Projection Pipeline
Distortion operates in normalized image coordinates (after pinhole projection, before intrinsics):

$$\pi(\mathbf{p}_c) \;\xrightarrow{\;d\;}\; n_d \;\xrightarrow{\;s,\,K\;}\; \text{pixel}$$

This is the standard convention: distortion is defined on the $z = 1$ plane, not in pixel space. The advantage is that distortion parameters are independent of image resolution and focal length.
Intrinsics Matrix
The intrinsics matrix maps from the sensor coordinate system (normalized or sensor-transformed coordinates) to pixel coordinates. It encodes the camera's internal geometric properties: focal length, principal point, and optional skew.
Definition
$$K = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$

where:
- $f_x, f_y$ — focal lengths in pixels (horizontal and vertical). These combine the physical focal length with the pixel pitch: $f_x = f / s_x$, where $f$ is the focal length in mm and $s_x$ is the pixel width in mm.
- $c_x, c_y$ — principal point in pixels. The projection of the optical axis onto the image plane. Typically near the image center but not exactly at it.
- $s$ — skew coefficient. Non-zero only if pixel axes are not perpendicular. Effectively zero for all modern sensors; calibration-rs defaults to `zero_skew: true` in most configurations.
Forward and Inverse Transform
Sensor to pixel (forward):

$$u = f_x x_s + s\, y_s + c_x, \qquad v = f_y y_s + c_y$$

where $(x_s, y_s)$ are sensor coordinates (output of the distortion + sensor stages).
Pixel to sensor (inverse):

$$y_s = \frac{v - c_y}{f_y}, \qquad x_s = \frac{u - c_x - s\, y_s}{f_x}$$
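The forward and inverse mappings translate directly into two small standalone functions (illustrative names, plain `f64`, not the library's trait methods):

```rust
/// Sensor -> pixel: u = fx*x + s*y + cx, v = fy*y + cy.
pub fn sensor_to_pixel(fx: f64, fy: f64, cx: f64, cy: f64, s: f64, n: [f64; 2]) -> [f64; 2] {
    [fx * n[0] + s * n[1] + cx, fy * n[1] + cy]
}

/// Pixel -> sensor: invert v first (it does not involve skew), then u.
pub fn pixel_to_sensor(fx: f64, fy: f64, cx: f64, cy: f64, s: f64, px: [f64; 2]) -> [f64; 2] {
    let y = (px[1] - cy) / fy;
    [(px[0] - cx - s * y) / fx, y]
}
```

The inverse is exact (no iteration needed) because $K$ is an upper-triangular affine map.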
The IntrinsicsModel Trait
```rust
pub trait IntrinsicsModel<S: RealField> {
    fn sensor_to_pixel(&self, sensor: &Point2<S>) -> Point2<S>;
    fn pixel_to_sensor(&self, pixel: &Point2<S>) -> Point2<S>;
}
```
The FxFyCxCySkew<S> struct implements this trait:
```rust
pub struct FxFyCxCySkew<S: RealField> {
    pub fx: S,
    pub fy: S,
    pub cx: S,
    pub cy: S,
    pub skew: S,
}
```
Coordinate Utilities
The coordinate_utils module provides convenience functions that combine intrinsics with distortion:
- `pixel_to_normalized(pixel, K)` — apply $K^{-1}$ to get normalized coordinates on the $z = 1$ plane
- `normalized_to_pixel(normalized, K)` — apply $K$ to get pixel coordinates
- `undistort_pixel(pixel, K, distortion)` — pixel → normalized → undistort → return normalized
- `distort_to_pixel(normalized, K, distortion)` — distort → apply $K$ → return pixel
These functions are used extensively in the linear initialization algorithms, which need to convert between pixel and normalized coordinate spaces.
Aspect Ratio
For most cameras, $f_x \neq f_y$ because pixels are not perfectly square. The ratio $f_y / f_x$ reflects the pixel aspect ratio. Typical industrial cameras have aspect ratios very close to 1.0 (within 1-3%).
OpenCV equivalence: `cv::initCameraMatrix2D` provides initial intrinsics estimates. The matrix format is identical to OpenCV's `cameraMatrix`.
Sensor Models and Scheimpflug Tilt
In standard cameras, the sensor plane is perpendicular to the optical axis. Some industrial cameras — particularly laser triangulation profilers — deliberately tilt the sensor to achieve a larger depth of field along the laser plane (the Scheimpflug condition). The sensor model stage accounts for this tilt.
The SensorModel Trait
```rust
pub trait SensorModel<S: RealField> {
    fn normalized_to_sensor(&self, n: &Point2<S>) -> Point2<S>;
    fn sensor_to_normalized(&self, s: &Point2<S>) -> Point2<S>;
}
```
The sensor model sits between distortion and intrinsics in the pipeline:

$$n_d \;\xrightarrow{\;s\;}\; n_s \;\xrightarrow{\;K\;}\; \text{pixel}$$
IdentitySensor
For standard cameras with untilted sensors, the sensor stage is the identity:

$$s(n) = n$$

This is the default and most common case.
Scheimpflug Tilt Model
The Scheimpflug condition states that when the lens plane, image plane, and object plane intersect along a common line, the entire object plane is in sharp focus. Laser profilers exploit this by tilting the sensor to keep the laser line in focus.
Tilt Projection Matrix
The tilt is parameterized by two rotation angles:
- — tilt around the horizontal (X) axis
- — tilt around the vertical (Y) axis
The tilt homography is constructed as:
The projection onto the tilted sensor plane normalizes by the third row of applied to the point in homogeneous coordinates:
where normalizes by the -component (perspective division through the tilted plane). In practice, the first two rows of divided by the third row give a rational transform in the image plane.
OpenCV equivalence: This matches OpenCV's `computeTiltProjectionMatrix` used in the 14-parameter rational model (`CALIB_TILTED_MODEL`).
ScheimpflugParams
```rust
pub struct ScheimpflugParams {
    pub tilt_x: f64, // tau_x in radians
    pub tilt_y: f64, // tau_y in radians
}
```
The compile() method generates a HomographySensor containing the homography and its precomputed inverse. ScheimpflugParams::default() returns zero tilt (identity sensor).
HomographySensor
The general homography sensor applies an arbitrary projective transform to the homogeneous normalized point $\tilde{n} = (n_x, n_y, 1)$:

$$s(n) = \mathrm{proj}(H \tilde{n})$$

This is used as the runtime representation compiled from `ScheimpflugParams`.
When to Use Scheimpflug
- Laser triangulation profilers: The laser plane is at an angle to the camera axis. Tilting the sensor brings the entire laser line into focus.
- Machine vision with tilted object planes: When the depth of field is insufficient to keep the entire object in focus.
For standard cameras (no tilt), use IdentitySensor (the default).
Optimization of Tilt Parameters
In the non-linear optimization stage, Scheimpflug parameters are treated as additional optimization variables. The tilt_projection_matrix_generic<T>() function in the factor module computes the tilt homography using dual numbers for automatic differentiation, enabling joint optimization of tilt with intrinsics, distortion, and poses.
Serialization and Runtime-Dynamic Types
The generic Camera<S, P, D, Sm, K> type provides compile-time composition of camera stages. However, for JSON serialization (session checkpointing, data exchange) and runtime-dynamic camera construction, calibration-rs provides an enum-based parameter system.
CameraParams
The CameraParams struct holds camera parameters as serializable enums:
```rust
pub struct CameraParams {
    pub projection: ProjectionParams,
    pub distortion: DistortionParams,
    pub sensor: SensorParams,
    pub intrinsics: IntrinsicsParams,
}
```
Each component is an enum of supported variants:
```rust
pub enum ProjectionParams { Pinhole }

pub enum DistortionParams {
    None,
    BrownConrady5 { k1, k2, k3, p1, p2 },
}

pub enum SensorParams {
    Identity,
    Scheimpflug { tilt_x, tilt_y },
}

pub enum IntrinsicsParams {
    FxFyCxCySkew { fx, fy, cx, cy, skew },
}
```
Building a Camera from Parameters
The build() method constructs a concrete Camera from serialized parameters:
```rust
let params = CameraParams { /* ... */ };
let camera = params.build();
```
This is used internally by the session framework to reconstruct cameras from JSON-checkpointed state.
Convenience Accessors
CameraParams provides typed accessors:
```rust
params.intrinsics() // → &FxFyCxCySkew<f64>
params.distortion() // → &BrownConrady5<f64> or zero distortion
```
Use in Session Framework
CameraParams appears throughout the pipeline:
- Export records store calibrated parameters as `CameraParams`
- Session JSON serializes intermediate and final camera parameters
- Config types reference `CameraParams` for initial guesses
This provides a stable, version-safe serialization format decoupled from the generic type system.
Rigid Transformations and SE(3)
Camera calibration is fundamentally about estimating rigid transformations — rotations and translations that relate different coordinate frames: camera, world, robot gripper, calibration target, rig, etc. This chapter covers the rotation representations and transformation algebra used throughout calibration-rs.
Rotation Representations
Rotation Matrix
A $3 \times 3$ orthogonal matrix $R$ with determinant $+1$:

$$R^\top R = I, \qquad \det R = +1$$

The set of all such matrices forms the special orthogonal group $SO(3)$. It has 3 degrees of freedom (9 entries minus 6 constraints).
Unit Quaternion
A 4-element vector $q = (w, x, y, z)$ with unit norm $\|q\| = 1$. The rotation of a vector $\mathbf{v}$ is:

$$\mathbf{v}' = q\, \mathbf{v}\, q^{*}$$

where the product is quaternion multiplication and $q^{*}$ is the conjugate.
Quaternions are preferred for optimization because they avoid gimbal lock and have simpler interpolation properties than Euler angles. calibration-rs (via nalgebra) stores quaternions as `[i, j, k, w]` internally, i.e., with the scalar part $w$ last.
Axis-Angle
A rotation of angle $\theta$ around unit axis $\mathbf{a}$ is represented as:

$$\boldsymbol{\theta} = \theta\, \mathbf{a}$$

with $\|\mathbf{a}\| = 1$ and $\theta = \|\boldsymbol{\theta}\|$.
The Rodrigues formula converts axis-angle to rotation matrix:

$$R = I + \sin\theta\, [\mathbf{a}]_\times + (1 - \cos\theta)\, [\mathbf{a}]_\times^2$$

where $[\mathbf{a}]_\times$ is the skew-symmetric matrix of $\mathbf{a}$:

$$[\mathbf{a}]_\times = \begin{bmatrix} 0 & -a_z & a_y \\ a_z & 0 & -a_x \\ -a_y & a_x & 0 \end{bmatrix}$$
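The Rodrigues formula translates directly into code. Here is a standalone sketch over plain 3×3 arrays (the library itself works with nalgebra types; the function name is illustrative):

```rust
/// Rodrigues formula: unit axis + angle -> 3x3 rotation matrix.
/// R = I + sin(theta) [a]_x + (1 - cos(theta)) [a]_x^2
pub fn rodrigues(axis: [f64; 3], theta: f64) -> [[f64; 3]; 3] {
    let (ax, ay, az) = (axis[0], axis[1], axis[2]);
    // skew-symmetric matrix [a]_x
    let k = [[0.0, -az, ay], [az, 0.0, -ax], [-ay, ax, 0.0]];
    let (s, c) = (theta.sin(), theta.cos());
    let mut r = [[0.0; 3]; 3];
    for i in 0..3 {
        for j in 0..3 {
            // entry (i, j) of [a]_x squared
            let k2: f64 = (0..3).map(|m| k[i][m] * k[m][j]).sum();
            let id = if i == j { 1.0 } else { 0.0 };
            r[i][j] = id + s * k[i][j] + (1.0 - c) * k2;
        }
    }
    r
}
```

For example, a 90° rotation about the z-axis produces the familiar matrix that maps $+x$ to $+y$.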
Rigid Transformations: SE(3)
A rigid body transformation combines a rotation $R \in SO(3)$ and translation $\mathbf{t} \in \mathbb{R}^3$:

$$T = \begin{bmatrix} R & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix}$$

The special Euclidean group $SE(3)$ has 6 degrees of freedom (3 rotation + 3 translation).
Applying a Transform
A point $\mathbf{p}$ transformed by $T$:

$$\mathbf{p}' = R\,\mathbf{p} + \mathbf{t}$$

or in homogeneous coordinates:

$$\begin{bmatrix} \mathbf{p}' \\ 1 \end{bmatrix} = T \begin{bmatrix} \mathbf{p} \\ 1 \end{bmatrix}$$
Composition
Transforms compose by matrix multiplication:

$$T_1 T_2 = \begin{bmatrix} R_1 R_2 & R_1 \mathbf{t}_2 + \mathbf{t}_1 \\ \mathbf{0}^\top & 1 \end{bmatrix}$$

Inverse

$$T^{-1} = \begin{bmatrix} R^\top & -R^\top \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix}$$
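These three operations — apply, compose, invert — can be sketched as a standalone $(R, \mathbf{t})$ pair over plain arrays. This is illustrative only; the library stores the rotation as a unit quaternion inside `Iso3`:

```rust
/// Minimal SE(3) element as (rotation matrix, translation vector).
#[derive(Clone, Copy)]
pub struct Se3 {
    pub r: [[f64; 3]; 3],
    pub t: [f64; 3],
}

impl Se3 {
    /// p' = R p + t
    pub fn apply(&self, p: [f64; 3]) -> [f64; 3] {
        let mut out = self.t;
        for i in 0..3 {
            for j in 0..3 {
                out[i] += self.r[i][j] * p[j];
            }
        }
        out
    }

    /// (R1, t1) ∘ (R2, t2) = (R1 R2, R1 t2 + t1)
    pub fn compose(&self, o: &Se3) -> Se3 {
        let mut r = [[0.0; 3]; 3];
        for i in 0..3 {
            for j in 0..3 {
                for m in 0..3 {
                    r[i][j] += self.r[i][m] * o.r[m][j];
                }
            }
        }
        Se3 { r, t: self.apply(o.t) }
    }

    /// T^-1 = (R^T, -R^T t)
    pub fn inverse(&self) -> Se3 {
        let mut rt = [[0.0; 3]; 3];
        let mut t = [0.0; 3];
        for i in 0..3 {
            for j in 0..3 {
                rt[i][j] = self.r[j][i];
            }
        }
        for i in 0..3 {
            for j in 0..3 {
                t[i] -= rt[i][j] * self.t[j];
            }
        }
        Se3 { r: rt, t }
    }
}
```

A quick sanity check is that `t.inverse().compose(&t)` acts as the identity on any point.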
Naming Convention
calibration-rs uses the notation $T_{\text{b\_a}}$ to mean "the transform from frame $a$ to frame $b$":

$$\mathbf{p}_b = T_{\text{b\_a}}\; \mathbf{p}_a$$

Common transforms:
| Symbol | Meaning |
|---|---|
| $T_{\text{cam\_world}}$ | World to camera (camera extrinsics) |
| $T_{\text{cam\_board}}$ | Calibration board to camera |
| $T_{\text{rig\_cam}}$ | Camera to rig (rig extrinsics) |
| $T_{\text{gripper\_cam}}$ | Camera to gripper (hand-eye) |
| $T_{\text{base\_target}}$ | Target to base (hand-eye target pose) |
nalgebra Type: Iso3
calibration-rs uses nalgebra's Isometry3<f64> (aliased as Iso3) for SE(3) transforms:
```rust
pub type Iso3 = nalgebra::Isometry3<f64>;
```
This stores a UnitQuaternion (rotation) and a Translation3 (translation) separately, avoiding the redundancy of a full matrix.
SE(3) Storage for Optimization
In the optimization IR, SE(3) parameters are stored as a 7-element vector — a unit quaternion followed by a translation:

$$[\,q_x,\; q_y,\; q_z,\; q_w,\; t_x,\; t_y,\; t_z\,]$$

The quaternion is unit-constrained. The optimizer uses the SE(3) manifold (see Manifold Optimization) to maintain this constraint during parameter updates.
The Exponential Map
The Lie algebra $\mathfrak{se}(3)$ provides a 6-dimensional tangent space at the identity. A tangent vector $\boldsymbol{\xi} = (\boldsymbol{\rho}, \boldsymbol{\phi}) \in \mathbb{R}^6$ maps to an SE(3) element via the exponential map:

$$\exp(\boldsymbol{\xi}) = \begin{bmatrix} \exp([\boldsymbol{\phi}]_\times) & J(\boldsymbol{\phi})\, \boldsymbol{\rho} \\ \mathbf{0}^\top & 1 \end{bmatrix}$$

where $J(\boldsymbol{\phi})$ is the left Jacobian of SO(3):

$$J(\boldsymbol{\phi}) = I + \frac{1 - \cos\theta}{\theta^2}\, [\boldsymbol{\phi}]_\times + \frac{\theta - \sin\theta}{\theta^3}\, [\boldsymbol{\phi}]_\times^2, \qquad \theta = \|\boldsymbol{\phi}\|$$

The exponential map is central to manifold optimization: parameter updates are computed in the tangent space and then retracted to the manifold (see Manifold Optimization).
RANSAC
RANSAC (Random Sample Consensus) is a robust estimation method that fits a model to data containing outliers. Unlike least-squares methods that use all data points and can be corrupted by even a single outlier, RANSAC repeatedly samples minimal subsets, fits a model to each, and selects the model with the most support (inliers).
Algorithm
Given $N$ data points and a model requiring $m$ points to fit:
1. Repeat for up to $k$ iterations:
   - Sample $m$ random data points
   - Fit a model to the $m$-point sample (skip if degenerate)
   - Score: count inliers — points with residual below the threshold $\tau$
   - If this model has more inliers than the current best, update the best
   - Optionally refit the model on all inliers for a tighter fit
   - Update the iteration bound $k$ dynamically based on the current inlier ratio
2. Return the best model and its inlier set
Dynamic Iteration Bound
After finding a model with inlier ratio $w$, the number of iterations needed to find an all-inlier sample with probability $p$ is:

$$k = \frac{\log(1 - p)}{\log(1 - w^m)}$$

where $m$ is the minimum sample size. As more inliers are found, $w$ increases and $k$ decreases, allowing early termination. The iteration count is clamped to the configured `max_iters` — it can only decrease.
Example: For $m = 4$ (homography), 50% inliers, and 99% confidence: $k = \log(0.01) / \log(1 - 0.5^4) \approx 72$ iterations.
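The bound translates directly into a small standalone function (illustrative names), with guards for the degenerate cases where the formula blows up:

```rust
/// Dynamic RANSAC iteration bound: k = log(1 - p) / log(1 - w^m),
/// where p = confidence, w = inlier ratio, m = minimal sample size.
pub fn ransac_iters(confidence: f64, inlier_ratio: f64, min_samples: u32) -> usize {
    let good = inlier_ratio.powi(min_samples as i32); // P(all-inlier sample)
    if good <= f64::EPSILON {
        return usize::MAX; // no inliers found yet: no useful bound
    }
    if good >= 1.0 {
        return 1; // every sample is all-inlier
    }
    ((1.0 - confidence).ln() / (1.0 - good).ln()).ceil() as usize
}
```

With `confidence = 0.99`, `inlier_ratio = 0.5`, and `min_samples = 4`, this reproduces the homography example above.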
The Estimator Trait
calibration-rs provides a generic RANSAC engine that works with any model implementing the Estimator trait:
```rust
pub trait Estimator {
    type Datum;
    type Model;

    const MIN_SAMPLES: usize;

    fn fit(data: &[Self::Datum], sample_indices: &[usize]) -> Option<Self::Model>;
    fn residual(model: &Self::Model, datum: &Self::Datum) -> f64;

    fn is_degenerate(data: &[Self::Datum], sample_indices: &[usize]) -> bool {
        false
    }

    fn refit(data: &[Self::Datum], inliers: &[usize]) -> Option<Self::Model> {
        Self::fit(data, inliers)
    }
}
```
| Method | Purpose |
|---|---|
| `MIN_SAMPLES` | Minimum points for a model (e.g., 4 for homography) |
| `fit` | Fit model from selected points |
| `residual` | Distance from a point to the model |
| `is_degenerate` | Reject degenerate samples (e.g., collinear points for homography) |
| `refit` | Optional: refit model on full inlier set (may differ from `fit`) |
Configuration
```rust
pub struct RansacOptions {
    pub max_iters: usize,       // Maximum iterations
    pub thresh: f64,            // Inlier distance threshold
    pub min_inliers: usize,     // Minimum consensus set size
    pub confidence: f64,        // Desired probability (0-1) for the dynamic bound
    pub seed: u64,              // RNG seed for reproducibility
    pub refit_on_inliers: bool, // Refit model on full inlier set
}
```
Choosing the threshold $\tau$: The threshold should reflect the expected noise level. For pixel-space residuals, a threshold of a few pixels is typical. For normalized-coordinate residuals, scale the threshold accordingly by the focal length.
Result
```rust
pub struct RansacResult<M> {
    pub success: bool,
    pub model: Option<M>,
    pub inliers: Vec<usize>,
    pub inlier_rms: f64,
    pub iters: usize,
}
```
Instantiated Models
RANSAC is used with several models in calibration-rs:
| Model | MIN_SAMPLES | Residual | Usage |
|---|---|---|---|
| Homography | 4 | Reprojection error (pixels) | dlt_homography_ransac |
| Fundamental matrix | 8 | Epipolar distance | fundamental_8point_ransac |
| PnP (DLT) | 6 | Reprojection error (pixels) | dlt_ransac |
Deterministic Seeding
All RANSAC calls in calibration-rs use a deterministic seed for the random number generator. This ensures:
- Reproducible results across runs
- Deterministic tests that do not flake
- Debuggable behavior — the same input always produces the same output
The seed can be configured via RansacOptions::seed.
Best-Model Selection
When multiple models achieve the same inlier count, RANSAC selects the model with the lowest inlier RMS (root mean square of inlier residuals). This breaks ties in favor of more accurate models.
Robust Loss Functions
Standard least squares minimizes the sum of squared residuals $\sum_i r_i^2$. This objective is highly sensitive to outliers: a single point with a large residual can dominate the entire cost function and corrupt the solution. Robust loss functions (M-estimators) reduce the influence of large residuals, making optimization tolerant to outliers in the data.
Problem Setup
In non-linear least squares, we minimize:

$$E(\theta) = \sum_i \rho\big(r_i(\theta)\big)$$

where $\rho$ is the loss function applied to each residual $r_i$. The standard (non-robust) case uses $\rho(r) = \tfrac{1}{2}r^2$.
Available Loss Functions
calibration-rs provides three robust loss functions, each parameterized by a scale $c$ that controls the transition from quadratic (inlier) to robust (outlier) behavior.
Huber Loss

$$\rho_{\text{Huber}}(r) = \begin{cases} \tfrac{1}{2}r^2 & |r| \le c \\ c\left(|r| - \tfrac{c}{2}\right) & |r| > c \end{cases}$$

Properties:
- Quadratic for small residuals, linear for large residuals
- Continuous first derivative
- Influence function $\psi(r) = \rho'(r)$: bounded at $c$ — outliers contribute a constant gradient, not a growing one
When to use: The default robust loss. Good general-purpose choice when you expect a moderate number of outliers.
Cauchy Loss

$$\rho_{\text{Cauchy}}(r) = \frac{c^2}{2}\log\!\left(1 + \frac{r^2}{c^2}\right)$$

Properties:
- Grows logarithmically for large residuals (slower than linear)
- Smooth everywhere
- Influence function $\psi(r) = \dfrac{r}{1 + r^2/c^2}$ — decreases to zero for large $|r|$, effectively down-weighting far outliers
When to use: When outliers are far from the bulk of the data and should have near-zero influence.
Arctan Loss
Properties:
- Bounded: $\rho(r)$ approaches a constant as $|r| \to \infty$
- Influence function approaches zero for large residuals (redescending)
When to use: When very strong outlier rejection is needed. More aggressive than Cauchy but can make convergence harder.
Comparison
| Loss | Large-$r$ growth | Outlier influence | Convergence |
|---|---|---|---|
| Quadratic ($\tfrac{1}{2}r^2$) | Quadratic | Unbounded | Best |
| Huber | Linear | Bounded (constant) | Good |
| Cauchy | Logarithmic | Decreasing | Moderate |
| Arctan | Bounded | Approaching zero | Can be tricky |
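The comparison can be made concrete with small sketches of the three losses. These mirror the formulas above but are illustrative only, not the calibration-rs implementation, and the arctan form in particular varies between libraries:

```rust
/// Huber loss: quadratic inside the scale c, linear outside.
fn huber(r: f64, c: f64) -> f64 {
    let a = r.abs();
    if a <= c { 0.5 * r * r } else { c * (a - 0.5 * c) }
}

/// Cauchy loss: logarithmic growth for large residuals.
fn cauchy(r: f64, c: f64) -> f64 {
    0.5 * c * c * (1.0 + (r / c).powi(2)).ln()
}

/// One common arctan-style loss (bounded by c^2 * pi / 2); the exact
/// form used by a given library may differ.
fn arctan_loss(r: f64, c: f64) -> f64 {
    c * c * ((r / c).powi(2)).atan()
}

fn main() {
    // Inside the scale, Huber is exactly quadratic; outside, it grows linearly.
    println!("{}", huber(0.5, 1.0)); // 0.125
    println!("{}", huber(3.0, 1.0)); // 2.5
    // Cauchy grows much slower than Huber for the same residual.
    println!("{:.3}", cauchy(3.0, 1.0));
    // The arctan-style loss saturates: far outliers all cost about the same.
    println!("{:.3}", arctan_loss(100.0, 1.0));
}
```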
Choosing the Scale Parameter
The scale $c$ sets the boundary between "inlier" and "outlier" behavior:
- Too small: Treats good data as outliers, reducing effective sample size
- Too large: Outliers still dominate (approaches standard least squares)
- Rule of thumb: Set $c$ to the expected residual magnitude for good data points. For reprojection residuals, a scale of a few pixels is typical.
Usage in calibration-rs
Robust losses are specified per-residual block in the optimization IR:
```rust
pub enum RobustLoss {
    None,
    Huber { scale: f64 },
    Cauchy { scale: f64 },
    Arctan { scale: f64 },
}
```
Each problem type exposes the loss function as a configuration option:
```rust
session.update_config(|c| {
    c.robust_loss = RobustLoss::Huber { scale: 2.0 };
})?;
```
The backend applies the loss function during residual evaluation, modifying both the cost and the Jacobian.
Iteratively Reweighted Least Squares (IRLS)
Under the hood, robust loss functions are typically implemented via IRLS: each residual is weighted by $w_i = \psi(r_i)/r_i$, and the weighted least-squares problem is solved iteratively. The Levenberg-Marquardt backend handles this automatically.
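For the Huber loss, the IRLS weight has a particularly simple closed form. The helper below is an illustrative sketch (the LM backend computes this internally; the function name is hypothetical):

```rust
/// IRLS weight for the Huber loss: w(r) = psi(r) / r, where psi = rho'.
/// Inliers keep full weight; outliers are down-weighted by c / |r|.
fn huber_weight(r: f64, c: f64) -> f64 {
    let a = r.abs();
    if a <= c { 1.0 } else { c / a }
}

fn main() {
    println!("{}", huber_weight(0.5, 1.0)); // 1.0  — inlier, full weight
    println!("{}", huber_weight(4.0, 1.0)); // 0.25 — outlier, down-weighted
}
```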
Interaction with RANSAC
RANSAC and robust losses address outliers at different stages:
- RANSAC (linear initialization): Binary inlier/outlier classification. Used during model fitting to reject gross outliers before any optimization.
- Robust losses (non-linear refinement): Soft down-weighting. Used during optimization to reduce the influence of moderate outliers that passed RANSAC.
The two approaches are complementary: RANSAC handles gross outliers during initialization, while robust losses handle smaller outliers during refinement.
Why Linear Initialization Matters
Non-linear optimization is the workhorse of camera calibration: it achieves sub-pixel reprojection error by jointly optimizing all parameters. But non-linear optimizers — Levenberg-Marquardt, Gauss-Newton, and their variants — are local methods. They converge to the nearest local minimum, which is only the global minimum if the starting point is sufficiently close.
The role of linear initialization is to provide that starting point.
The Init-Then-Refine Paradigm
Every calibration workflow in calibration-rs follows a two-phase pattern:

1. Linear initialization: Closed-form solvers estimate approximate parameters. These are fast, deterministic, and require no initial guess — but they achieve only ~5-40% accuracy on camera parameters.
2. Non-linear refinement: Levenberg-Marquardt bundle adjustment minimizes reprojection error starting from the linear estimate. This converges to <2% parameter accuracy and <1 px reprojection error.

The linear estimate does not need to be highly accurate. It needs to be in the basin of convergence of the non-linear optimizer — close enough that gradient-based iteration converges to the correct solution rather than a local minimum.
Why Not Just Optimize Directly?
Without initialization, you would need to guess intrinsics ($f_x, f_y, c_x, c_y$), distortion ($k_1, k_2, p_1, p_2, k_3$), and all camera poses (6 DoF per view). For a typical 20-view calibration:
- 4 intrinsics + 5 distortion + 20 × 6 pose = 129 parameters
- The cost landscape has many local minima
- A random starting point will almost certainly diverge or converge to a wrong solution
Linear initialization eliminates this problem by computing a reasonable estimate from the data structure alone.
Linear Solvers in calibration-rs
| Solver | Estimates | Method | Chapter |
|---|---|---|---|
| Homography DLT | $H$ | SVD nullspace | Homography |
| Zhang's method | Intrinsics $K$ | IAC constraints | Zhang |
| Distortion fit | $k_1, k_2, p_1, p_2, k_3$ | Linear LS on residuals | Distortion Fit |
| Iterative intrinsics | $K$ + distortion | Alternating refinement | Iterative |
| Planar pose | $R, \mathbf{t}$ from $H$ | Decomposition + SVD | Planar Pose |
| P3P (Kneip) | $R, \mathbf{t}$ from 3 points | Quartic polynomial | PnP |
| DLT PnP | $R, \mathbf{t}$ from $n \ge 6$ points | SVD + SO(3) projection | PnP |
| 8-point fundamental | $F$ matrix | SVD + rank constraint | Epipolar |
| 7-point fundamental | $F$ matrix | Cubic polynomial | Epipolar |
| 5-point essential | $E$ matrix | Action matrix eigenvalues | Epipolar |
| Camera matrix DLT | $P = K[R \mid \mathbf{t}]$ | SVD + RQ decomposition | Camera Matrix |
| Triangulation | 3D point | SVD nullspace | Triangulation |
| Tsai-Lenz | Hand-eye transform | Quaternion + linear LS | Hand-Eye |
| Rig extrinsics | $T$ per camera | SE(3) averaging | Rig Init |
| Laser plane | Plane normal + distance | SVD on covariance | Laser Init |
Common Mathematical Tools
Most linear solvers share a common toolkit:
- SVD (Singular Value Decomposition): Solves homogeneous systems via the right singular vector corresponding to the smallest singular value. This is the core of DLT methods.
- Hartley normalization: Conditions the DLT system for numerical stability by centering and scaling the input points.
- Polynomial root finding: Minimal solvers (P3P, 7-point F, 5-point E) reduce to polynomial equations.
The following chapters derive each algorithm in detail.
Hartley Normalization
DLT algorithms build large linear systems from point coordinates and solve them via SVD. When the coordinates span vastly different scales (e.g., pixel values in the hundreds vs. homogeneous scale factor of 1), the resulting system is ill-conditioned: small perturbations in the data cause large changes in the solution. Hartley normalization is a simple preprocessing step that dramatically improves numerical stability.
The Problem
Consider the homography DLT: we build a matrix $A$ from pixel coordinates and solve $A\mathbf{h} = 0$. If pixel coordinates are on the order of $10^3$, the entries of $A$ (which include products of coordinates) range from $1$ to about $10^6$. The condition number of $A$ becomes large, and the SVD solution is sensitive to floating-point errors.
The Solution
Normalize the input points so they are centered at the origin with a controlled scale, solve the system, then denormalize the result.
2D Normalization
Given points $\{(x_i, y_i)\}_{i=1}^{N}$:

1. Centroid: $\bar{x} = \frac{1}{N}\sum_i x_i$, $\bar{y} = \frac{1}{N}\sum_i y_i$
2. Mean distance from centroid: $\bar{d} = \frac{1}{N}\sum_i \sqrt{(x_i - \bar{x})^2 + (y_i - \bar{y})^2}$
3. Scale factor: $s = \sqrt{2}/\bar{d}$
4. Normalization matrix:
$$T = \begin{pmatrix} s & 0 & -s\bar{x} \\ 0 & s & -s\bar{y} \\ 0 & 0 & 1 \end{pmatrix}$$
5. Normalized points: $\tilde{\mathbf{x}}_i = T\,\mathbf{x}_i$

After normalization, the points have centroid at the origin and mean distance $\sqrt{2}$ from the origin.
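The steps above can be sketched directly; this is an illustrative helper (hypothetical name, plain arrays instead of the library's matrix types):

```rust
/// Hartley 2D normalization: returns the 3x3 matrix T (row-major) that
/// centers the points at the origin and scales mean distance to sqrt(2).
fn hartley_normalization(pts: &[(f64, f64)]) -> [[f64; 3]; 3] {
    let n = pts.len() as f64;
    // Centroid.
    let (sx, sy) = pts.iter().fold((0.0, 0.0), |(ax, ay), p| (ax + p.0, ay + p.1));
    let (cx, cy) = (sx / n, sy / n);
    // Mean distance from the centroid.
    let mean_d = pts
        .iter()
        .map(|p| ((p.0 - cx).powi(2) + (p.1 - cy).powi(2)).sqrt())
        .sum::<f64>()
        / n;
    let s = 2.0_f64.sqrt() / mean_d;
    [[s, 0.0, -s * cx], [0.0, s, -s * cy], [0.0, 0.0, 1.0]]
}

fn main() {
    let pts = [(100.0, 200.0), (300.0, 250.0), (150.0, 400.0), (320.0, 380.0)];
    let t = hartley_normalization(&pts);
    // Applying T (x' = s*x - s*cx, y' = s*y - s*cy) yields mean distance sqrt(2).
    let mean_d: f64 = pts
        .iter()
        .map(|p| {
            let (x, y) = (t[0][0] * p.0 + t[0][2], t[1][1] * p.1 + t[1][2]);
            (x * x + y * y).sqrt()
        })
        .sum::<f64>()
        / pts.len() as f64;
    println!("{:.6}", mean_d); // ~1.414214
}
```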
3D Normalization
For 3D points $\{(x_i, y_i, z_i)\}$, the analogous normalization uses:
- Mean distance target: $\sqrt{3}$ (instead of $\sqrt{2}$)
- A $4 \times 4$ normalization matrix
Denormalization
After solving the normalized system, the result must be denormalized. For a homography estimated from normalized points:
$$H = T_{\text{img}}^{-1}\,\tilde{H}\,T_{\text{world}}$$
For a fundamental matrix:
$$F = T_2^\top\,\tilde{F}\,T_1$$
The denormalization formula depends on the specific problem. Each chapter states the appropriate denormalization.
When It Is Used
Hartley normalization is applied in every DLT-based solver in calibration-rs:
| Solver | Normalized spaces |
|---|---|
| Homography DLT | World 2D + image 2D |
| Fundamental 8-point | Image 1 2D + image 2 2D |
| Essential 5-point | Image 1 2D + image 2 2D |
| Camera matrix DLT | World 3D + image 2D |
| PnP DLT | World 3D (image points use normalized coordinates via $K^{-1}$) |
Reference
Hartley, R.I. (1997). "In Defense of the Eight-Point Algorithm." IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(6), 580-593.
The normalization technique is described in Algorithm 4.2 of Hartley & Zisserman (2004).
Homography Estimation (DLT)
A homography is a projective transformation between two planes. In calibration, the most common case is the mapping from a planar calibration board to the image plane. Homography estimation is the first step in Zhang's calibration method.
Problem Statement
Given: $n$ point correspondences $(\mathbf{p}_i, \mathbf{q}_i)$ where $\mathbf{p}_i$ are points on the world plane and $\mathbf{q}_i$ are the corresponding image points.
Find: $3 \times 3$ matrix $H$ such that $\mathbf{q}_i \sim H\mathbf{p}_i$ (equality up to scale) for all $i$.
Assumptions:
- The world points are coplanar (the calibration board is flat)
- Correspondences are correct (or RANSAC is used to handle outliers)
- At least 4 non-collinear point correspondences
Derivation
From Projective Relation to Linear System
The relation $\mathbf{q} \sim H\mathbf{p}$ in homogeneous coordinates means:
$$\lambda\begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} \mathbf{h}_1^\top \\ \mathbf{h}_2^\top \\ \mathbf{h}_3^\top \end{pmatrix}\mathbf{p}$$
where $\mathbf{h}_k^\top$ is the $k$-th row of $H$. Expanding and eliminating the scale factor via the cross-product constraint ($\mathbf{q} \times H\mathbf{p} = \mathbf{0}$), we get two independent equations per correspondence:
$$\mathbf{h}_1^\top\mathbf{p} - x\,\mathbf{h}_3^\top\mathbf{p} = 0, \qquad \mathbf{h}_2^\top\mathbf{p} - y\,\mathbf{h}_3^\top\mathbf{p} = 0$$
The DLT System
Stacking all correspondences into a $2n \times 9$ matrix $A$ gives the homogeneous system:
$$A\mathbf{h} = \mathbf{0}$$
where $\mathbf{h} = (h_{11}, h_{12}, h_{13}, h_{21}, h_{22}, h_{23}, h_{31}, h_{32}, h_{33})^\top$ stacks the rows of $H$. Each correspondence $(X, Y) \mapsto (x, y)$ contributes the two rows:
$$\begin{pmatrix} -X & -Y & -1 & 0 & 0 & 0 & xX & xY & x \\ 0 & 0 & 0 & -X & -Y & -1 & yX & yY & y \end{pmatrix}$$
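The per-correspondence rows can be sketched as a small helper (illustrative, not the library's internal representation):

```rust
/// The two rows of the homography DLT design matrix contributed by one
/// correspondence (X, Y) -> (x, y), for h = (h11, h12, ..., h33).
fn homography_dlt_rows(world: (f64, f64), image: (f64, f64)) -> [[f64; 9]; 2] {
    let (xw, yw) = world;
    let (x, y) = image;
    [
        [-xw, -yw, -1.0, 0.0, 0.0, 0.0, x * xw, x * yw, x],
        [0.0, 0.0, 0.0, -xw, -yw, -1.0, y * xw, y * yw, y],
    ]
}

fn main() {
    // For the identity homography, (X, Y) maps to itself, so both rows
    // must be orthogonal to h = vec(I) = (1,0,0, 0,1,0, 0,0,1).
    let rows = homography_dlt_rows((1.0, 2.0), (1.0, 2.0));
    let h = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0];
    for row in &rows {
        let dot: f64 = row.iter().zip(&h).map(|(a, b)| a * b).sum();
        println!("{}", dot); // 0
    }
}
```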
Solving via SVD
We seek $\mathbf{h}$ minimizing $\|A\mathbf{h}\|$ subject to $\|\mathbf{h}\| = 1$ (to avoid the trivial solution $\mathbf{h} = \mathbf{0}$). This is the standard homogeneous least squares problem, solved by the SVD of $A = U\Sigma V^\top$:
The solution is the last column of $V$ (corresponding to the smallest singular value). Reshaping this 9-vector into a $3 \times 3$ matrix gives $H$.
Normalization and Denormalization
The complete algorithm with Hartley normalization:
- Normalize world and image points: $\tilde{\mathbf{p}}_i = T_w\mathbf{p}_i$ and $\tilde{\mathbf{q}}_i = T_q\mathbf{q}_i$
- Build the matrix $A$ from the normalized points
- Solve via SVD to get $\tilde{H}$
- Denormalize: $H = T_q^{-1}\tilde{H}\,T_w$
- Fix the scale (e.g., normalize so that $h_{33} = 1$)
Degrees of Freedom
$H$ is a $3 \times 3$ matrix with 9 entries, but it is defined up to scale, giving 8 degrees of freedom. Each correspondence provides 2 equations, so the minimum is $n = 4$ correspondences (giving 8 equations for 8 unknowns).
With $n > 4$, the system is overdetermined and the SVD solution minimizes the algebraic error in a least-squares sense.
RANSAC Wrapper
For real data with potential mismatched correspondences, dlt_homography_ransac wraps the DLT solver in RANSAC:
```rust
let (H, inliers) = dlt_homography_ransac(&world_pts, &image_pts, &opts)?;
```
- `MIN_SAMPLES = 4`
- Residual: Euclidean reprojection error in pixels, $\|\mathbf{q}_i - \pi(H\mathbf{p}_i)\|$, where $\pi$ dehomogenizes
- Degeneracy check: Tests if the first 3 world points are collinear (a degenerate configuration for homography estimation)
OpenCV equivalence:
cv::findHomographywithRANSACmethod.
API
```rust
// Direct DLT (all points assumed inliers)
let H = dlt_homography(&world_2d, &image_2d)?;

// With RANSAC
let opts = RansacOptions {
    max_iters: 1000,
    thresh: 3.0, // pixels
    confidence: 0.99,
    ..Default::default()
};
let (H, inliers) = dlt_homography_ransac(&world_2d, &image_2d, &opts)?;
```
Geometric Interpretation
For a planar calibration board at $Z = 0$ in world coordinates, the camera projection matrix reduces to a homography:
$$H = K\,[\mathbf{r}_1\ \ \mathbf{r}_2\ \ \mathbf{t}]$$
where $\mathbf{r}_1, \mathbf{r}_2$ are the first two columns of the rotation matrix $R$. This relationship is the foundation of Zhang's calibration method (next chapter).
Zhang's Intrinsics from Homographies
Zhang's method is the most widely used technique for estimating camera intrinsics from a planar calibration board. Given homographies from multiple views of the board, it recovers the intrinsics matrix $K$ using constraints from the image of the absolute conic (IAC).
Problem Statement
Given: $M$ homographies $\{H_i\}$ from a planar calibration board to the image, computed from different viewpoints.
Find: Camera intrinsics $K$.
Assumptions:
- The calibration board is planar
- At least 3 homographies from distinct viewpoints
- The camera intrinsics are constant across all views
- Distortion is negligible (or has been accounted for — see Iterative Intrinsics)
Derivation
Homography and Intrinsics
From the previous chapter, the homography from a board plane is:
$$H = \lambda K\,[\mathbf{r}_1\ \ \mathbf{r}_2\ \ \mathbf{t}]$$
where $\lambda$ is an arbitrary scale and $\mathbf{r}_1, \mathbf{r}_2$ are columns of the rotation matrix. Let $H = [\mathbf{h}_1\ \ \mathbf{h}_2\ \ \mathbf{h}_3]$. Then:
$$\mathbf{r}_1 = \tfrac{1}{\lambda}K^{-1}\mathbf{h}_1, \qquad \mathbf{r}_2 = \tfrac{1}{\lambda}K^{-1}\mathbf{h}_2$$
Orthogonality Constraints
Since $\mathbf{r}_1$ and $\mathbf{r}_2$ are columns of a rotation matrix, they satisfy:
$$\mathbf{r}_1^\top\mathbf{r}_2 = 0, \qquad \mathbf{r}_1^\top\mathbf{r}_1 = \mathbf{r}_2^\top\mathbf{r}_2$$
Substituting $\mathbf{r}_k = \tfrac{1}{\lambda}K^{-1}\mathbf{h}_k$:
$$\mathbf{h}_1^\top K^{-\top}K^{-1}\mathbf{h}_2 = 0, \qquad \mathbf{h}_1^\top K^{-\top}K^{-1}\mathbf{h}_1 = \mathbf{h}_2^\top K^{-\top}K^{-1}\mathbf{h}_2$$
The Image of the Absolute Conic (IAC)
Define the symmetric matrix:
$$B = K^{-\top}K^{-1}$$
Since $B$ is symmetric, it has 6 independent entries. The constraints become:
$$\mathbf{h}_1^\top B\,\mathbf{h}_2 = 0, \qquad \mathbf{h}_1^\top B\,\mathbf{h}_1 - \mathbf{h}_2^\top B\,\mathbf{h}_2 = 0$$
Each homography gives 2 linear equations in the 6 unknowns of .
The $\mathbf{v}_{ij}$ Vectors
To write the constraints as a linear system, define the 6-vector $\mathbf{b} = (B_{11}, B_{12}, B_{22}, B_{13}, B_{23}, B_{33})^\top$ and:
$$\mathbf{v}_{ij} = \big(h_{1i}h_{1j},\ h_{1i}h_{2j} + h_{2i}h_{1j},\ h_{2i}h_{2j},\ h_{3i}h_{1j} + h_{1i}h_{3j},\ h_{3i}h_{2j} + h_{2i}h_{3j},\ h_{3i}h_{3j}\big)^\top$$
where $h_{ki}$ is the $k$-th element of column $i$ of $H$ (1-indexed, column $i$, row $k$). Then:
$$\mathbf{h}_i^\top B\,\mathbf{h}_j = \mathbf{v}_{ij}^\top\mathbf{b}$$
The Linear System
For each homography $H_i$, the two constraints become:
$$\mathbf{v}_{12}^\top\mathbf{b} = 0, \qquad (\mathbf{v}_{11} - \mathbf{v}_{22})^\top\mathbf{b} = 0$$
Stacking $M$ homographies gives a $2M \times 6$ system:
$$V\mathbf{b} = \mathbf{0}$$
With $M \ge 3$ (giving $\ge 6$ equations), this is solved via SVD: $\mathbf{b}$ is the right singular vector of $V$ corresponding to the smallest singular value.
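The constraint rows are easy to build once the homography columns are in hand. This sketch (hypothetical helper, plain row-major arrays) constructs one $\mathbf{v}_{ij}$ vector:

```rust
/// Zhang's v_ij vector from a homography H (row-major), where column i of H
/// is (h[0][i], h[1][i], h[2][i]). Indices i, j are 0-based here.
fn zhang_v(h: &[[f64; 3]; 3], i: usize, j: usize) -> [f64; 6] {
    let (h1i, h2i, h3i) = (h[0][i], h[1][i], h[2][i]);
    let (h1j, h2j, h3j) = (h[0][j], h[1][j], h[2][j]);
    [
        h1i * h1j,
        h1i * h2j + h2i * h1j,
        h2i * h2j,
        h3i * h1j + h1i * h3j,
        h3i * h2j + h2i * h3j,
        h3i * h3j,
    ]
}

fn main() {
    let h = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]];
    // For H = I, v_12 = (0, 1, 0, 0, 0, 0): the constraint v_12^T b = 0
    // simply forces B_12 = 0 in this degenerate "view".
    println!("{:?}", zhang_v(&h, 0, 1));
}
```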
Extracting Intrinsics from
Given $\mathbf{b}$, the intrinsics are recovered by a Cholesky-like factorization of $B$.
The closed-form extraction (Zhang 2000, Appendix B):
$$v_0 = \frac{B_{12}B_{13} - B_{11}B_{23}}{B_{11}B_{22} - B_{12}^2}$$
$$\lambda = B_{33} - \frac{B_{13}^2 + v_0(B_{12}B_{13} - B_{11}B_{23})}{B_{11}}$$
$$\alpha = \sqrt{\lambda/B_{11}}, \qquad \beta = \sqrt{\frac{\lambda B_{11}}{B_{11}B_{22} - B_{12}^2}}$$
$$\gamma = -\frac{B_{12}\,\alpha^2\beta}{\lambda}, \qquad u_0 = \frac{\gamma v_0}{\beta} - \frac{B_{13}\,\alpha^2}{\lambda}$$
where $(\alpha, \beta)$ are the focal lengths, $\gamma$ the skew, and $(u_0, v_0)$ the principal point.
Validity Checks
The extraction can fail when:
- $B_{11}B_{22} - B_{12}^2 \approx 0$ (degenerate: insufficient view diversity)
- $\lambda$ and $B_{11}$ have different signs (negative focal length squared)
These cases indicate that the input homographies do not sufficiently constrain the intrinsics, typically because the views are too similar (e.g., all near-frontal).
Minimum Number of Views
- $M = 3$ homographies: 6 equations for 6 unknowns — the minimum for a unique solution (with skew)
- $M = 2$: Only 4 equations; requires the additional assumption of zero skew ($\gamma = 0$) to reduce the unknowns to 5
- $M > 3$: Overdetermined system; SVD gives the least-squares solution
Practical Considerations
View diversity: The views should include significant rotation around both axes. Pure translations or rotations around a single axis lead to degenerate configurations where some intrinsic parameters are undetermined.
Distortion: Zhang's method assumes distortion-free observations. When applied to distorted pixels, the estimated is biased. The Iterative Intrinsics chapter addresses this with an alternating refinement scheme.
OpenCV equivalence: Zhang's method is the internal initialization step of
cv::calibrateCamera. OpenCV does not expose it as a separate API.
API
```rust
let K = estimate_intrinsics_from_homographies(&homographies)?;
```
Returns `FxFyCxCySkew` with the estimated intrinsics. Typically followed by distortion estimation and non-linear refinement.
Reference
Zhang, Z. (2000). "A Flexible New Technique for Camera Calibration." IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11), 1330-1334.
Distortion Estimation from Homography Residuals
After estimating intrinsics via Zhang's method (which assumes distortion-free observations), the residuals between observed and predicted pixel positions encode the lens distortion. This chapter derives a linear method to estimate Brown-Conrady distortion coefficients from these residuals.
Problem Statement
Given:
- Camera intrinsics $K$ (from Zhang's method)
- $M$ homographies from the calibration board to the image
- Point correspondences: board points and observed pixels
Find: Distortion coefficients $(k_1, k_2, p_1, p_2, k_3)$.
Assumptions:
- $K$ is a reasonable estimate (possibly biased by distortion)
- Distortion is moderate (the linear approximation holds)
- The homographies were computed from distorted pixel observations
Derivation
Ideal vs. Observed Coordinates
For each correspondence in view $i$, the ideal (undistorted) pixel position predicted by the homography is:
$$\hat{\mathbf{q}}_{ij} = \pi(H_i\,\mathbf{p}_{ij})$$
Convert both observed and ideal pixels to normalized coordinates by applying $K^{-1}$:
$$\mathbf{x}_{\text{obs}} = \pi(K^{-1}\mathbf{q}_{ij}), \qquad \mathbf{x}_{\text{ideal}} = \pi(K^{-1}\hat{\mathbf{q}}_{ij})$$
The residual in normalized coordinates encodes the distortion effect:
$$\Delta = \mathbf{x}_{\text{obs}} - \mathbf{x}_{\text{ideal}}$$
Linearized Distortion Model
The Brown-Conrady distortion applied to a point $(x, y)$ with $r^2 = x^2 + y^2$ produces a displacement:
$$\Delta x = x(k_1 r^2 + k_2 r^4 + k_3 r^6) + 2p_1 xy + p_2(r^2 + 2x^2)$$
$$\Delta y = y(k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1(r^2 + 2y^2) + 2p_2 xy$$
This is linear in the distortion coefficients $(k_1, k_2, p_1, p_2, k_3)$.
Building the Linear System
For each correspondence, write the $x$ and $y$ components of the linearized model:
$$\begin{pmatrix} x r^2 & x r^4 & 2xy & r^2 + 2x^2 & x r^6 \\ y r^2 & y r^4 & r^2 + 2y^2 & 2xy & y r^6 \end{pmatrix}\begin{pmatrix} k_1 \\ k_2 \\ p_1 \\ p_2 \\ k_3 \end{pmatrix} = \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix}$$
where $(x, y)$ are the ideal normalized coordinates and $(\Delta x, \Delta y)$ are the observed residuals.
Stacking all correspondences from all views gives an overdetermined system:
$$A\boldsymbol{\theta} = \mathbf{d}$$
where $A$ is $2NM \times 5$ (or smaller if some coefficients are fixed), and $\boldsymbol{\theta} = (k_1, k_2, p_1, p_2, k_3)^\top$.
Solving
The least-squares solution is:
$$\boldsymbol{\theta} = (A^\top A)^{-1}A^\top\mathbf{d}$$
In practice, this is solved via the SVD of $A$ for numerical stability.
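The two design-matrix rows per point follow directly from the linearized model. A sketch (hypothetical helper; column order $(k_1, k_2, p_1, p_2, k_3)$ as above):

```rust
/// The two rows of the distortion design matrix for one point (x, y) in
/// ideal normalized coordinates, ordered (k1, k2, p1, p2, k3).
fn distortion_rows(x: f64, y: f64) -> [[f64; 5]; 2] {
    let r2 = x * x + y * y;
    let (r4, r6) = (r2 * r2, r2 * r2 * r2);
    [
        [x * r2, x * r4, 2.0 * x * y, r2 + 2.0 * x * x, x * r6],
        [y * r2, y * r4, r2 + 2.0 * y * y, 2.0 * x * y, y * r6],
    ]
}

fn main() {
    // At (1, 0): r^2 = 1, so the x-row is (1, 1, 0, 3, 1) while the
    // y-row picks up only the p1 term (r^2 + 2y^2 = 1).
    println!("{:?}", distortion_rows(1.0, 0.0));
}
```

Fixing `k3` or the tangential terms corresponds to dropping the matching columns before solving.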
Options
```rust
pub struct DistortionFitOptions {
    pub fix_tangential: bool, // Fix p1 = p2 = 0
    pub fix_k3: bool,         // Fix k3 = 0 (default: true)
    pub iters: u32,           // Undistortion iterations
}
```
- `fix_k3` (default `true`): Removes $k_3$ from the system, reducing to a 4-parameter (or, with `fix_tangential`, 2-parameter) model. Recommended unless the lens has strong higher-order radial distortion.
- `fix_tangential` (default `false`): Removes $p_1, p_2$, estimating only radial distortion. Useful when tangential distortion is known to be negligible.

When parameters are fixed, the corresponding columns are removed from $A$.
Accuracy
This linear estimate is typically within 10-50% of the true distortion values. The accuracy depends on:
- Quality of : If intrinsics are biased (from distorted observations), the estimated distortion absorbs some of the bias
- Number of points: More correspondences improve the overdetermined system
- Point distribution: Points across the full image area constrain distortion better than points clustered near the center (where distortion is small)
This accuracy is sufficient for initializing non-linear refinement.
API
```rust
let dist = estimate_distortion_from_homographies(&k_matrix, &views, opts)?;
```
Returns `BrownConrady5` with the estimated coefficients.
OpenCV equivalence: OpenCV estimates distortion jointly with intrinsics inside
cv::calibrateCamera. The separate distortion estimation step is specific to calibration-rs's initialization approach.
Iterative Intrinsics + Distortion
Zhang's method assumes distortion-free observations. When applied to images from a camera with significant distortion, the estimated intrinsics are biased because the distorted pixels violate the linear homography model. The iterative intrinsics solver addresses this by alternating between intrinsics estimation and distortion correction.
Problem Statement
Given: views of a planar calibration board, with observed (distorted) pixel coordinates.
Find: Camera intrinsics and distortion coefficients jointly.
Assumptions:
- The observations are distorted (raw detector output)
- Distortion is moderate enough that Zhang's method on distorted pixels gives a usable initial $K$
- 1-3 iterations of alternating refinement suffice
Why Alternation Works
The joint estimation of $K$ and distortion is a non-convex problem. However, it decomposes naturally:
- Given $K$: Distortion estimation is a linear problem (see Distortion Fit)
- Given distortion: Undistorting the pixels and re-estimating $K$ is a linear problem (Zhang's method)
Each step solves a convex subproblem. The alternation converges because:
- The initial Zhang estimate (ignoring distortion) is typically within the basin where the alternation contracts
- Each step reduces the residual between the model and observations
- The coupling between $K$ and distortion is relatively weak for moderate distortion
Algorithm
Input: Views with distorted observations
Output: Intrinsics $K$, distortion coefficients

1. Compute homographies $\{H_i\}$ from distorted pixels via DLT
2. Estimate initial $K$ from $\{H_i\}$ via Zhang's method
3. For each iteration:
   a. Estimate distortion from homography residuals using the current $K$
   b. Undistort all observed pixels
   c. Recompute homographies $\{H_i'\}$ from the undistorted pixels
   d. Re-estimate $K$ from $\{H_i'\}$ via Zhang's method
   e. (Optional) Enforce zero skew: $\gamma \leftarrow 0$
4. Return $K$ and the distortion coefficients
Convergence
Typically 1-2 iterations suffice:
- Iteration 0 (Zhang on distorted pixels): 10-40% intrinsics error
- Iteration 1: Distortion estimate corrects the dominant radial effect; intrinsics error drops to 5-20%
- Iteration 2: Further refinement; diminishing returns
More iterations are safe but rarely improve the estimate significantly. The default is 2 iterations.
Configuration
```rust
pub struct IterativeIntrinsicsOptions {
    pub iterations: usize,                     // 1-3 typical (default: 2)
    pub distortion_opts: DistortionFitOptions, // Controls fix_k3, fix_tangential, iters
    pub zero_skew: bool,                       // Force skew = 0 (default: true)
}
```
The Undistortion Step
Step 3b converts distorted pixels back to "ideal" pixels by inverting the distortion model:
- Convert the distorted pixel to normalized coordinates: $\mathbf{x}_d = \pi(K^{-1}\mathbf{q}_d)$
- Undistort using the current distortion estimate: $\mathbf{x}_u = \mathrm{undistort}(\mathbf{x}_d)$ (iterative fixed-point, see Brown-Conrady Distortion)
- Convert back to a pixel: $\mathbf{q}_u = K\,\mathbf{x}_u$
Note that this uses the same $K$ both for normalization and for converting back — the undistorted pixels are in the same coordinate system as the original distorted pixels, just without the distortion effect.
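The fixed-point undistortion step can be sketched in normalized coordinates. This is an illustrative implementation under stated assumptions ($k_3$ omitted for brevity; function names are hypothetical):

```rust
/// Brown-Conrady forward distortion in normalized coordinates (k3 omitted).
fn distort(x: f64, y: f64, k1: f64, k2: f64, p1: f64, p2: f64) -> (f64, f64) {
    let r2 = x * x + y * y;
    let radial = 1.0 + k1 * r2 + k2 * r2 * r2;
    (
        x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x),
        y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y,
    )
}

/// Invert `distort` by fixed-point iteration: start at the distorted point
/// and repeatedly divide out the distortion evaluated at the current guess.
fn undistort(xd: f64, yd: f64, k1: f64, k2: f64, p1: f64, p2: f64, iters: u32) -> (f64, f64) {
    let (mut x, mut y) = (xd, yd);
    for _ in 0..iters {
        let r2 = x * x + y * y;
        let radial = 1.0 + k1 * r2 + k2 * r2 * r2;
        let dx = 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x);
        let dy = p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y;
        x = (xd - dx) / radial;
        y = (yd - dy) / radial;
    }
    (x, y)
}

fn main() {
    let (k1, k2, p1, p2) = (-0.2, 0.05, 0.001, -0.001);
    let (xd, yd) = distort(0.3, 0.2, k1, k2, p1, p2);
    // The round trip recovers the ideal point to high accuracy.
    let (xu, yu) = undistort(xd, yd, k1, k2, p1, p2, 8);
    println!("{:.6} {:.6}", xu, yu);
}
```

For moderate distortion the iteration is a strong contraction, which is why a small fixed `iters` (like the default 8) suffices.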
Accuracy Expectations
| Stage | Typical intrinsics error |
|---|---|
| Zhang on distorted pixels | 10-40% |
| After 1 iteration | 5-20% |
| After 2 iterations | 5-15% |
| After non-linear refinement | <2% |
The iterative linear estimate is not meant to be highly accurate. Its purpose is to provide a starting point good enough for the non-linear optimizer to converge.
API
```rust
use vision_calibration::linear::iterative_intrinsics::*;
use vision_calibration::linear::DistortionFitOptions;

let opts = IterativeIntrinsicsOptions {
    iterations: 2,
    distortion_opts: DistortionFitOptions {
        fix_k3: true,
        fix_tangential: false,
        iters: 8,
    },
    zero_skew: true,
};
let camera = estimate_intrinsics_iterative(&dataset, opts)?;
// camera is PinholeCamera = Camera<f64, Pinhole, BrownConrady5, IdentitySensor, FxFyCxCySkew>
// camera.k:    FxFyCxCySkew  (intrinsics)
// camera.dist: BrownConrady5 (distortion)
```
Pose from Homography
Given camera intrinsics $K$ and a homography $H$ from a planar calibration board to the image, we can decompose $H$ to recover the camera pose (rotation $R$ and translation $\mathbf{t}$) relative to the board.
Problem Statement
Given: Intrinsics $K$ and homography $H$ such that $\mathbf{q} \sim H\mathbf{p}$ for board points at $Z = 0$.
Find: Rigid transform $(R, \mathbf{t})$ (board to camera).
Assumptions:
- The board lies at $Z = 0$ in world coordinates
- $K$ is known (or has been estimated)
- The homography was computed from correct correspondences
Derivation
Extracting Rotation and Translation
Recall from the Homography chapter:
$$H = \lambda K\,[\mathbf{r}_1\ \ \mathbf{r}_2\ \ \mathbf{t}]$$
where $\mathbf{r}_1, \mathbf{r}_2$ are the first two columns of the rotation matrix and $\mathbf{t}$ is the translation.
Removing the intrinsics:
$$[\mathbf{a}_1\ \ \mathbf{a}_2\ \ \mathbf{a}_3] = K^{-1}H$$
The scale factor $\lambda$ is recovered from the constraint that $\mathbf{r}_1$ and $\mathbf{r}_2$ have unit norm:
$$\lambda = \tfrac{1}{2}\big(\|\mathbf{a}_1\| + \|\mathbf{a}_2\|\big)$$
Then:
$$\mathbf{r}_1 = \mathbf{a}_1/\lambda, \qquad \mathbf{r}_2 = \mathbf{a}_2/\lambda, \qquad \mathbf{t} = \mathbf{a}_3/\lambda$$
The third rotation column is:
$$\mathbf{r}_3 = \mathbf{r}_1 \times \mathbf{r}_2$$
Projecting onto SO(3)
Due to noise, the matrix $\hat{R} = [\mathbf{r}_1\ \ \mathbf{r}_2\ \ \mathbf{r}_3]$ is not exactly orthonormal. We project it onto SO(3) using SVD:
$$\hat{R} = U\Sigma V^\top, \qquad R = UV^\top$$
If $\det(UV^\top) < 0$, flip the sign of the third column of $U$ and recompute.
Ensuring Forward-Facing
If $t_z < 0$, the board is behind the camera. In this case, flip the sign of both $\mathbf{t}$ and the first two rotation columns, then recompute $\mathbf{r}_3$:
This resolves the sign ambiguity inherent in the scale factor $\lambda$.
Accuracy
The pose from homography is an approximate estimate because:
- The homography itself is subject to noise (DLT algebraic error minimization, not geometric)
- The SVD projection onto SO(3) corrects non-orthogonality but introduces additional error
- Distortion (if not corrected) biases the homography
Typical rotation error: 1-5°. Typical translation direction error: 5-15%. These estimates are refined in non-linear optimization.
OpenCV equivalence:
cv::decomposeHomographyMatprovides a similar decomposition, returning up to 4 pose candidates. calibration-rs returns a single pose by resolving ambiguities via the forward-facing constraint.
API
```rust
let pose = estimate_planar_pose_from_h(&K, &H)?;
// pose: Iso3 (T_C_B: board to camera transform)
```
Usage in Calibration Pipeline
In the planar intrinsics calibration pipeline, pose estimation is applied to every view after $K$ has been estimated:
- Compute a homography $H_i$ per view
- Estimate $K$ from all homographies (Zhang's method)
- Decompose each $H_i$ to get the pose of view $i$
- Use the poses as initial values for non-linear optimization
Perspective-n-Point Solvers
The Perspective-n-Point (PnP) problem estimates the camera pose from known 3D-2D point correspondences. Unlike homography-based pose estimation, PnP does not require coplanar points.
Problem Statement
Given: $n$ 3D world points $\mathbf{P}_i$ and their corresponding 2D image points $\mathbf{q}_i$, plus camera intrinsics $K$.
Find: Camera pose $(R, \mathbf{t})$ such that $\mathbf{q}_i \sim K(R\mathbf{P}_i + \mathbf{t})$.
Assumptions:
- Camera intrinsics are known
- Correspondences are correct (or RANSAC is used)
- Points are not degenerate (e.g., not all collinear)
P3P: Kneip's Minimal Solver
The P3P solver uses exactly 3 correspondences — the minimum for a finite number of solutions. It returns up to 4 candidate poses.
Algorithm
Input: 3 world points $\mathbf{P}_1, \mathbf{P}_2, \mathbf{P}_3$, 3 pixel points, intrinsics $K$.

1. Bearing vectors: Convert pixels to unit bearing vectors in the camera frame: $\mathbf{f}_i = K^{-1}\tilde{\mathbf{q}}_i \,/\, \|K^{-1}\tilde{\mathbf{q}}_i\|$
2. Inter-point distances in the world frame: $d_{ij} = \|\mathbf{P}_i - \mathbf{P}_j\|$
3. Bearing vector cosines: $\cos\alpha_{ij} = \mathbf{f}_i^\top\mathbf{f}_j$
4. Quartic polynomial: From the distance ratios and bearing cosines, Kneip derives a quartic polynomial in a distance ratio; its coefficients are functions of the $d_{ij}$ and $\cos\alpha_{ij}$
5. Solve the quartic for up to 4 real roots
6. For each root:
- Compute the second distance ratio
- Compute the three camera-frame distances $s_1, s_2, s_3$ (depths of the three points)
- Back-project to 3D points in the camera frame: $\mathbf{P}_i^{\text{cam}} = s_i\,\mathbf{f}_i$
- Recover pose from the 3D-3D correspondence using SVD-based rigid alignment
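Step 1 above is a common building block worth making concrete. A sketch assuming zero skew (hypothetical helper, not the calibration-rs API):

```rust
/// Convert a pixel (u, v) to a unit bearing vector in the camera frame:
/// f = normalize(K^-1 [u, v, 1]), assuming zero skew.
fn bearing_vector(u: f64, v: f64, fx: f64, fy: f64, cx: f64, cy: f64) -> [f64; 3] {
    let x = (u - cx) / fx;
    let y = (v - cy) / fy;
    let norm = (x * x + y * y + 1.0).sqrt();
    [x / norm, y / norm, 1.0 / norm]
}

fn main() {
    let f = bearing_vector(700.0, 500.0, 800.0, 800.0, 640.0, 480.0);
    // Bearing vectors are unit length by construction.
    let len = (f[0] * f[0] + f[1] * f[1] + f[2] * f[2]).sqrt();
    println!("{:.6}", len); // 1.000000
}
```

At the principal point the bearing vector is the optical axis $[0, 0, 1]$, which is a handy sanity check.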
Disambiguation
P3P returns up to 4 poses. To select the correct one, use a fourth point (or more points with RANSAC) and pick the pose with the smallest reprojection error.
DLT PnP: Linear Solver for $n \ge 6$
The DLT (Direct Linear Transform) PnP uses an overdetermined system for $n \ge 6$ points.
Derivation
The projection equation in normalized coordinates is:
$$\lambda\begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = [R\ |\ \mathbf{t}]\begin{pmatrix} \mathbf{P} \\ 1 \end{pmatrix}$$
where $(x, y)$ are normalized coordinates (after applying $K^{-1}$ to the pixels). Cross-multiplying gives two equations per point:
$$\mathbf{m}_1^\top\tilde{\mathbf{P}} - x\,\mathbf{m}_3^\top\tilde{\mathbf{P}} = 0, \qquad \mathbf{m}_2^\top\tilde{\mathbf{P}} - y\,\mathbf{m}_3^\top\tilde{\mathbf{P}} = 0$$
where $\mathbf{m}_k^\top$ is the $k$-th row of $M = [R\ |\ \mathbf{t}]$ and $\tilde{\mathbf{P}} = (\mathbf{P}, 1)$. The 12 unknowns are the entries of the $3 \times 4$ matrix $M$.
The Linear System
For each point $i$, the two equations above contribute two rows. Stacking gives a $2n \times 12$ matrix $A$; solve $A\mathbf{m} = 0$ via SVD.
Post-Processing
- Reshape the 12-vector into a $3 \times 4$ matrix
- Normalize the scale using the row norms of the rotation block
- Project the rotation block onto SO(3) via SVD (same as in Pose from Homography)
- Extract the translation from the fourth column
Hartley Normalization
The 3D world points are normalized before building the system (centered at the origin, mean distance scaled to $\sqrt{3}$). The image points are normalized via $K^{-1}$. The result is denormalized after solving.
RANSAC Wrappers
Both solvers have RANSAC variants for handling outliers:
```rust
// DLT PnP + RANSAC
let (pose, inliers) = dlt_ransac(
    &world_pts,
    &image_pts,
    &K,
    &RansacOptions { thresh: 5.0, ..Default::default() },
)?;
```
The DLT PnP solver uses MIN_SAMPLES = 6 with RANSAC. P3P (p3p()) returns up to 4 candidates and is intended for manual disambiguation (e.g., using a 4th point), not as a RANSAC estimator.
Comparison
| Solver | Min. points | Solutions | Strengths |
|---|---|---|---|
| P3P | 3 | Up to 4 | Best for RANSAC (minimal sample) |
| DLT PnP | 6 | 1 | Simple, no polynomial solving |
OpenCV equivalence:
cv::solvePnPwithSOLVEPNP_P3PorSOLVEPNP_DLS;cv::solvePnPRansacfor robust estimation.
References
- Kneip, L., Scaramuzza, D., & Siegwart, R. (2011). "A Novel Parametrization of the Perspective-Three-Point Problem for a Direct Computation of Absolute Camera Position and Orientation." CVPR.
Epipolar Geometry
Epipolar geometry describes the geometric relationship between two views of the same scene. It is encoded in the fundamental matrix (for pixel coordinates) or the essential matrix (for normalized coordinates). These matrices are central to stereo vision, visual odometry, and structure from motion.
Fundamental Matrix
Problem Statement
Given: $n$ point correspondences $(\mathbf{q}_i, \mathbf{q}_i')$ in pixel coordinates from two views.
Find: Fundamental matrix $F$ satisfying $\mathbf{q}_i'^\top F\,\mathbf{q}_i = 0$ for all correspondences.
Properties of $F$:
- Rank 2 (by the epipolar constraint)
- 7 degrees of freedom (9 entries, minus scale ambiguity, minus the rank constraint)
- Maps a point in one image to its epipolar line in the other: $\boldsymbol{\ell}' = F\mathbf{q}$
8-Point Algorithm
The simplest method, requiring $n \ge 8$ correspondences.
Derivation: The constraint $\mathbf{q}'^\top F\,\mathbf{q} = 0$ expands to:
$$x'x\,f_{11} + x'y\,f_{12} + x'f_{13} + y'x\,f_{21} + y'y\,f_{22} + y'f_{23} + x\,f_{31} + y\,f_{32} + f_{33} = 0$$
Each correspondence gives one linear equation in the 9 entries of $F$.
Algorithm:
- Normalize both point sets using Hartley normalization
- Build the $n \times 9$ design matrix $A$ where each row is: $(x'x,\ x'y,\ x',\ y'x,\ y'y,\ y',\ x,\ y,\ 1)$
- Solve $A\mathbf{f} = 0$ via SVD; $\mathbf{f}$ is the last column of $V$
- Enforce rank 2: Decompose $F = U\,\mathrm{diag}(\sigma_1, \sigma_2, \sigma_3)\,V^\top$ and set $\sigma_3 = 0$: $F \leftarrow U\,\mathrm{diag}(\sigma_1, \sigma_2, 0)\,V^\top$
- Denormalize: $F = T_2^\top\tilde{F}T_1$
The rank-2 enforcement is critical: without it, the epipolar lines do not intersect at a common epipole.
7-Point Algorithm
A minimal solver using exactly 7 correspondences, producing 1 or 3 candidate matrices.
Derivation: With 7 equations for 9 unknowns (modulo scale), the null space of $A$ is 2-dimensional. Let the null space basis be $F_1$ and $F_2$. The fundamental matrix is:
$$F = F_1 + \alpha F_2$$
for some scalar $\alpha$. The rank-2 constraint gives a cubic equation in $\alpha$:
$$\det(F_1 + \alpha F_2) = 0$$
This cubic has 1 or 3 real roots, each giving a candidate $F$.
Essential Matrix
Problem Statement
Given: $n$ point correspondences in normalized coordinates $\hat{\mathbf{x}} = K^{-1}\mathbf{q}$.
Find: Essential matrix $E$ satisfying $\hat{\mathbf{x}}'^\top E\,\hat{\mathbf{x}} = 0$.
Properties of $E$:
- $E = [\mathbf{t}]_\times R$ where $R$ is the relative rotation and $\mathbf{t}$ is the translation direction
- Has exactly two equal non-zero singular values: $\sigma_1 = \sigma_2 > 0$, $\sigma_3 = 0$
- 5 degrees of freedom (3 rotation + 2 translation direction)
5-Point Algorithm (Nister)
The minimal solver, producing up to 10 candidate matrices.
Algorithm:
- Normalize both point sets using Hartley normalization
- Build the $n \times 9$ matrix $A$ (same row form as the 8-point algorithm)
- Null space: SVD of $A$ gives a 4-dimensional null space with basis $E_1, E_2, E_3, E_4$. The essential matrix is: $E = xE_1 + yE_2 + zE_3 + wE_4$ for unknowns $x, y, z, w$ (defined up to scale)
- Polynomial constraints: The essential matrix constraint $2EE^\top E - \mathrm{tr}(EE^\top)\,E = 0$ (9 equations) and $\det E = 0$ (1 equation) generate polynomial equations in $x, y, z, w$
- Action matrix: These constraints are assembled into a polynomial system solved via the eigenvalues of a $10 \times 10$ action matrix
- Extract real solutions: Each real eigenvalue determines $(x, y, z, w)$ and thus $E$
Up to 10 candidate essential matrices may be returned.
Decomposing $E$ into $(R, \mathbf{t})$
Given $E = U\,\mathrm{diag}(1, 1, 0)\,V^\top$, there are 4 possible decompositions:
$$R = UWV^\top \ \text{or}\ UW^\top V^\top, \qquad \mathbf{t} = \pm\mathbf{u}_3, \qquad W = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
where $\mathbf{u}_3$ is the third column of $U$.
Chirality check: Of the 4 candidates, only one places all triangulated points in front of both cameras. This is verified by triangulating a test point and checking that its depth $z > 0$ in both views.
Coordinate Conventions
| Matrix | Input coordinates | Usage |
|---|---|---|
| Fundamental | Pixel coordinates | When intrinsics are unknown |
| Essential | Normalized coordinates ($\hat{x} = K^{-1} x$) | When intrinsics are known |
The relationship is: $E = K_2^T F K_1$.
RANSAC
The 8-point fundamental solver has a RANSAC wrapper for outlier rejection:

```rust
let (F, inliers) = fundamental_8point_ransac(&pts1, &pts2, &opts)?;
```

The 5-point essential matrix solver (`essential_5point`) returns multiple candidates and is used directly; it currently has no RANSAC wrapper.
OpenCV equivalence:
`cv::findFundamentalMat` (8-point and 7-point), `cv::findEssentialMat` (5-point with RANSAC).
References
- Nister, D. (2004). "An Efficient Solution to the Five-Point Relative Pose Problem." IEEE TPAMI, 26(6), 756-770.
- Hartley, R.I. (1997). "In Defense of the Eight-Point Algorithm." IEEE TPAMI, 19(6), 580-593.
Camera Matrix and RQ Decomposition
The camera matrix $P$ (or projection matrix) is a $3 \times 4$ matrix that directly maps 3D world points to 2D pixel coordinates. Estimating $P$ via DLT and decomposing it into intrinsics and extrinsics via RQ decomposition provides an alternative initialization path.
Camera Matrix DLT
Problem Statement
Given: $n \geq 6$ correspondences between 3D world points $X_i$ and 2D pixel points $x_i$.
Find: The $3 \times 4$ projection matrix $P$ such that $x_i \sim P X_i$.
Derivation
The projection $x_i \sim P X_i$ (with $X_i$ in homogeneous coordinates) gives, after cross-multiplication, two equations per correspondence:
$u_i\, (p_3^T X_i) - p_1^T X_i = 0, \quad v_i\, (p_3^T X_i) - p_2^T X_i = 0$
where $p_j^T$ is the $j$-th row of $P$.
This gives a $2n \times 12$ system $A p = 0$, solved via SVD with Hartley normalization of the 3D and 2D points.
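The two rows one correspondence contributes can be checked against a known projection matrix. A sketch (plain arrays and a hypothetical helper name, not the library's API):

```rust
/// The two DLT rows for one 3D-2D correspondence, against p = vec(P)
/// stacked row-wise (12 entries): u*(p3.X) - p1.X = 0, v*(p3.X) - p2.X = 0.
fn dlt_rows(xw: [f64; 4], u: f64, v: f64) -> [[f64; 12]; 2] {
    let mut r1 = [0.0; 12];
    let mut r2 = [0.0; 12];
    for j in 0..4 {
        r1[j] = -xw[j];        // -p1 . X
        r1[8 + j] = u * xw[j]; // +u (p3 . X)
        r2[4 + j] = -xw[j];    // -p2 . X
        r2[8 + j] = v * xw[j]; // +v (p3 . X)
    }
    [r1, r2]
}

fn main() {
    // A known camera matrix P = K [I | t], fx = fy = 100, cx = cy = 50.
    let p = [
        [100.0, 0.0, 50.0, 250.0],
        [0.0, 100.0, 50.0, 250.0],
        [0.0, 0.0, 1.0, 5.0],
    ];
    let xw = [1.0, 2.0, 3.0, 1.0]; // homogeneous world point

    // Project: x ~ P X, then dehomogenize.
    let hx: Vec<f64> = (0..3)
        .map(|i| (0..4).map(|j| p[i][j] * xw[j]).sum::<f64>())
        .collect();
    let (u, v) = (hx[0] / hx[2], hx[1] / hx[2]);

    // Both rows annihilate vec(P), so a noise-free P solves A p = 0 exactly.
    let pvec: Vec<f64> = p.iter().flatten().copied().collect();
    for row in dlt_rows(xw, u, v) {
        let r: f64 = row.iter().zip(&pvec).map(|(a, b)| a * b).sum();
        assert!(r.abs() < 1e-9);
    }
}
```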
Post-Processing
After denormalization, $P$ is a valid $3 \times 4$ matrix but is not guaranteed to decompose cleanly into $K [R \mid t]$ due to noise. The RQ decomposition extracts the components.
RQ Decomposition
Problem Statement
Given: The left $3 \times 3$ submatrix $M$ of $P$ (where $M = K R$).
Find: Upper-triangular $K$ and orthogonal $R$ such that $M = K R$.
Algorithm
RQ decomposition is computed by transposing a QR decomposition. Let $\Pi$ be the permutation matrix that reverses row/column order ($\Pi^2 = I$):
- Compute the QR decomposition of the flipped, transposed matrix: $(\Pi M)^T = Q\, L^T$
- Then $M = \Pi L Q^T$, where $L$ is lower-triangular and $Q^T$ is orthogonal
- Apply the permutation matrix $\Pi$ to flip the matrix to upper-triangular form:
  - $K = \Pi L \Pi$ (upper-triangular)
  - $R = \Pi Q^T$ (orthogonal)
Sign Conventions
After decomposition, ensure:
- $K$ has positive diagonal entries: if $K_{ii} < 0$, negate column $i$ of $K$ and row $i$ of $R$ (the product $K R$ is unchanged)
- $\det R = +1$: if $\det R = -1$, negate $R$ (and the translation $t$); the overall sign of $P$ is projectively arbitrary
Translation Extraction
With $K$ known, the translation follows from the fourth column $p_4$ of $P$: $t = K^{-1} p_4$.
Full Decomposition
The CameraMatrixDecomposition struct:
```rust
pub struct CameraMatrixDecomposition {
    pub k: Mat3, // Upper-triangular intrinsics
    pub r: Mat3, // Rotation matrix (orthonormal, det = +1)
    pub t: Vec3, // Translation vector
}
```
API
```rust
// Estimate the full 3×4 camera matrix
let P = dlt_camera_matrix(&world_pts, &image_pts)?;

// Decompose into K, R, t
let decomp = decompose_camera_matrix(&P)?;
println!("Intrinsics: {:?}", decomp.k);
println!("Rotation: {:?}", decomp.r);
println!("Translation: {:?}", decomp.t);

// Or just RQ decompose any 3×3 matrix
let (K, R) = rq_decompose(&M);
```
When to Use
Camera matrix DLT is useful when:
- You have non-coplanar 3D-2D correspondences and want to estimate both intrinsics and pose simultaneously
- You need a quick estimate of $K$ from a single view (without multiple homographies)
For calibration with a planar board, Zhang's method is preferred because it uses the planar constraint to get more constraints per view.
Linear Triangulation
Triangulation reconstructs a 3D point from its projections in two or more calibrated views. It is the inverse of the projection operation: given pixel observations and camera poses, recover the world point.
Problem Statement
Given: $n \geq 2$ camera projection matrices $P_i$ (each $3 \times 4$) and corresponding pixel observations $x_i = (u_i, v_i)$.
Find: The 3D point $X$ such that $x_i \sim P_i X$ for all $i$.
Assumptions:
- Camera poses and intrinsics are known
- Correspondences are correct
- The point is visible from all views (not occluded)
Derivation (DLT)
For each view $i$, the projection constraint $x_i \sim P_i X$ gives (by cross-multiplication):
$u_i\, (p_i^{3T} X) - p_i^{1T} X = 0, \quad v_i\, (p_i^{3T} X) - p_i^{2T} X = 0$
where $p_i^{jT}$ is the $j$-th row of $P_i$ and $x_i = (u_i, v_i)$.
Stacking all views gives a $2n \times 4$ system $A X = 0$.
Solve via SVD: $X$ is the right singular vector corresponding to the smallest singular value. Dehomogenize: $X = (X_1/X_4,\; X_2/X_4,\; X_3/X_4)$.
Limitations
- No uncertainty modeling: The DLT minimizes algebraic error, not geometric (reprojection) error
- Baseline sensitivity: For small baselines (nearly parallel views), the triangulation is poorly conditioned — small pixel errors lead to large depth errors
- Outlier sensitivity: A single incorrect correspondence corrupts the result; no robustness mechanism
For high-accuracy 3D reconstruction, triangulation should be followed by bundle adjustment that jointly optimizes points and cameras to minimize reprojection error.
API
```rust
let X = triangulate_point_linear(&projection_matrices, &pixel_points)?;
```
Where each projection matrix is precomputed from the camera intrinsics and pose.
Hand-Eye Calibration (Tsai-Lenz)
Hand-eye calibration estimates the rigid transform between a camera and a robot gripper (or between a camera and a robot base). It is essential for any application where a camera is mounted on a robot arm and the robot needs to localize objects in its own coordinate frame.
The AX = XB Problem
Setup
Consider a camera rigidly mounted on a robot gripper (eye-in-hand configuration). The system involves four coordinate frames:
- Base (B): The robot's fixed base frame
- Gripper (G): The robot's end-effector frame (known from robot kinematics)
- Camera (C): The camera frame (observations come from here)
- Target (T): The calibration board frame (fixed in the world)
The unknown is $X = T_{GC}$ (the gripper-to-camera transform).
The Equation
Given two observations $i$ and $j$, we can compute:
- Robot motion: $A = T_{BG_j}^{-1}\, T_{BG_i}$ (relative gripper motion, known from robot kinematics)
- Camera motion: $B = T_{CT_j}\, T_{CT_i}^{-1}$ (relative camera motion, from calibration board observations)
Since the camera is rigidly attached to the gripper:
$A X = X B$
This is the classic $AX = XB$ equation. We need to find the $X$ that satisfies this for all motion pairs.
Eye-to-Hand Variant
When the camera is fixed and the target is on the gripper (eye-to-hand), the equation becomes:
where $A$ is now the relative gripper motion, $B$ is the relative target-in-camera motion, and $X$ is the camera-to-base transform.
The Tsai-Lenz Method
Overview
Tsai and Lenz (1989) decomposed $AX = XB$ into separate rotation and translation subproblems:
- Rotation: Solve $R_A R_X = R_X R_B$ for $R_X$
- Translation: Given $R_X$, solve $(R_A - I)\, t_X = R_X t_B - t_A$ for $t_X$
Step 1: All-Pairs Motion Computation
From $n$ observations, construct all motion pairs. For each pair $(i, j)$:
$A_{ij} = T_{BG_j}^{-1}\, T_{BG_i}, \quad B_{ij} = T_{CT_j}\, T_{CT_i}^{-1}$
Step 2: Filtering
Discard pairs with insufficient rotation (degenerate for the rotation subproblem):
- Reject pairs where the rotation angle of $A_{ij}$ is below a threshold $\theta_{\min}$ (default: 10°)
- Optionally reject pairs where rotation axes of and are near-parallel (ill-conditioned)
Rotation diversity is critical: If all robot motions are rotations around the same axis, the hand-eye rotation around that axis is undetermined. Use poses with diverse rotation axes (roll, pitch, and yaw).
Step 3: Rotation Estimation
The rotation constraint is solved using quaternion algebra.
Convert $R_A$ and $R_B$ to quaternions $q_A$ and $q_B$. The constraint becomes:
$q_A \otimes q_X = q_X \otimes q_B$
Using the left and right quaternion multiplication matrices $L(q)$ and $R(q)$:
$\left( L(q_A) - R(q_B) \right) q_X = 0$
Stacking all motion pairs gives a $4n \times 4$ system:
$M q_X = 0$
Solve via SVD: $q_X$ is the right singular vector of $M$ corresponding to the smallest singular value. Normalize to a unit quaternion.
Step 4: Translation Estimation
Given $R_X$, the translation constraint for each motion pair is:
$(R_A - I)\, t_X = R_X t_B - t_A$
This is an overdetermined $3n \times 3$ linear system $C\, t_X = d$, solved via least squares:
$t_X = (C^T C)^{-1} C^T d$
with optional ridge regularization for numerical stability.
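For a single motion pair, the matrix $R_A - I$ in this constraint is always singular (every rotation fixes its axis), which is exactly why pairs with distinct rotation axes are required. A self-contained sketch (plain arrays and Cramer's rule, not the library's implementation) that stacks two pairs with different axes and solves the normal equations:

```rust
/// Solve the 3x3 system m * x = b via Cramer's rule.
fn solve3(m: [[f64; 3]; 3], b: [f64; 3]) -> [f64; 3] {
    let det = |a: [[f64; 3]; 3]| -> f64 {
        a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0])
    };
    let d = det(m);
    let mut x = [0.0; 3];
    for k in 0..3 {
        let mut mk = m;
        for i in 0..3 {
            mk[i][k] = b[i];
        }
        x[k] = det(mk) / d;
    }
    x
}

fn main() {
    // Ground-truth hand-eye translation to recover.
    let t_x = [0.1, -0.2, 0.3];

    // Two robot rotations about different axes: 90° about Z, 90° about X.
    let r_z = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]];
    let r_x = [[1.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]];

    // Per pair: d = (R_A - I) t_X, here synthesized as the right-hand side.
    let rhs = |r: [[f64; 3]; 3]| -> [f64; 3] {
        let mut d = [0.0; 3];
        for i in 0..3 {
            for j in 0..3 {
                d[i] += (r[i][j] - if i == j { 1.0 } else { 0.0 }) * t_x[j];
            }
        }
        d
    };
    let (d1, d2) = (rhs(r_z), rhs(r_x));

    // Normal equations C^T C t = C^T d over the stacked 6x3 system.
    let mut ctc = [[0.0; 3]; 3];
    let mut ctd = [0.0; 3];
    for (r, d) in [(r_z, d1), (r_x, d2)] {
        for i in 0..3 {
            for j in 0..3 {
                let a_ij = r[i][j] - if i == j { 1.0 } else { 0.0 };
                ctd[j] += a_ij * d[i];
                for k in 0..3 {
                    let a_ik = r[i][k] - if i == k { 1.0 } else { 0.0 };
                    ctc[j][k] += a_ij * a_ik;
                }
            }
        }
    }
    let t = solve3(ctc, ctd);
    for k in 0..3 {
        assert!((t[k] - t_x[k]).abs() < 1e-9);
    }
}
```

With only the Z-axis pair, `ctc` would be singular; the second axis makes the stacked system full rank.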
Target-in-Base Estimation
After finding $X$, the target pose in the base frame can be estimated. For each observation $i$:
$T_{BT}^{(i)} = T_{BG_i}\, X\, T_{CT_i}$
The estimates from different views are averaged (quaternion averaging for rotation, arithmetic mean for translation).
Alternatively, $T_{BT}$ is included in the non-linear optimization.
Practical Requirements
- Minimum 3 views with diverse rotations (in practice, 5-10 views recommended)
- Rotation diversity: Motions should span multiple rotation axes. Pure translations provide no rotation constraint. Pure Z-axis rotations leave the Z-component of the hand-eye rotation undetermined.
- Accuracy: The linear Tsai-Lenz method typically gives 5-20% translation accuracy and 2-10° rotation accuracy. This initializes the non-linear joint optimization.
API
```rust
let X = estimate_handeye_dlt(
    &base_se3_gripper,   // &[Iso3]: robot poses (base to gripper)
    &target_se3_camera,  // &[Iso3]: camera-to-target poses (inverted)
    min_angle_deg,       // f64: minimum rotation angle filter
)?;
```

Returns $X$ as an Iso3 transform.
OpenCV equivalence:
`cv::calibrateHandEye` with method `CALIB_HAND_EYE_TSAI`.
References
- Tsai, R.Y. & Lenz, R.K. (1989). "A New Technique for Fully Autonomous and Efficient 3D Robotics Hand/Eye Calibration." IEEE Transactions on Robotics and Automation, 5(3), 345-358.
Multi-Camera Rig Initialization
A multi-camera rig is a set of cameras rigidly mounted on a common frame. The extrinsics of the rig describe the relative pose of each camera with respect to a reference camera (or a rig frame). This chapter covers the linear initialization of rig extrinsics from per-camera pose estimates.
Problem Statement
Given: Per-camera, per-view pose estimates $T_{C_i T_v}$ (camera $i$, view $v$, target to camera).
Find: Camera-to-rig transforms $T_{R C_i}$ and rig-to-target poses $T_{R T_v}$.
Assumptions:
- All cameras observe the same calibration target simultaneously
- Per-camera poses have been estimated (e.g., via PnP or homography decomposition)
- The rig is rigid (camera-to-rig transforms are constant across views)
Algorithm
Reference Camera
One camera is designated as the reference camera (default: camera 0). Its camera-to-rig transform is the identity:
$T_{R C_0} = I$
This defines the rig frame to coincide with the reference camera frame.
Rig-to-Target from Reference
For each view $v$, the rig-to-target pose is the reference camera's pose: $T_{R T_v} = T_{C_0 T_v}$
Camera-to-Rig via Averaging
For each non-reference camera $i$, collect a camera-to-rig estimate from each view $v$ where both cameras have observations:
$T_{R C_i}^{(v)} = T_{C_0 T_v}\, T_{C_i T_v}^{-1}$
Then average across views:
- Rotation: Quaternion averaging with hemisphere correction. Before averaging, flip quaternions to the same hemisphere (since $q$ and $-q$ represent the same rotation). Then compute the mean quaternion and normalize.
- Translation: Arithmetic mean of translation vectors.
Quaternion Hemisphere Correction
Unit quaternions double-cover SO(3): both $q$ and $-q$ represent the same rotation. Before averaging, all quaternions are flipped to the same hemisphere as the first: if $\langle q_i, q_1 \rangle < 0$, set $q_i \leftarrow -q_i$.
This prevents averaging artifacts where opposite quaternions cancel out.
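A minimal sketch of the correction and averaging (quaternions as plain `[x, y, z, w]` arrays and a hypothetical helper name, not the library's types):

```rust
/// Flip each quaternion to the hemisphere of the first, then average and
/// normalize. Quaternions are [x, y, z, w] arrays.
fn average_quaternions(quats: &[[f64; 4]]) -> [f64; 4] {
    let reference = quats[0];
    let mut sum = [0.0; 4];
    for q in quats {
        // Hemisphere correction: q and -q encode the same rotation.
        let dot: f64 = (0..4).map(|i| q[i] * reference[i]).sum();
        let sign = if dot < 0.0 { -1.0 } else { 1.0 };
        for i in 0..4 {
            sum[i] += sign * q[i];
        }
    }
    let norm: f64 = sum.iter().map(|c| c * c).sum::<f64>().sqrt();
    [sum[0] / norm, sum[1] / norm, sum[2] / norm, sum[3] / norm]
}

fn main() {
    // Two copies of the same rotation, one on the opposite hemisphere.
    let s = std::f64::consts::FRAC_1_SQRT_2;
    let q = [0.0, 0.0, s, s];
    let neg_q = [-q[0], -q[1], -q[2], -q[3]];

    // Naive averaging would cancel to zero; hemisphere correction recovers q.
    let avg = average_quaternions(&[q, neg_q]);
    for i in 0..4 {
        assert!((avg[i] - q[i]).abs() < 1e-12);
    }
}
```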
Accuracy
The initialization accuracy depends on:
- Per-camera pose accuracy: Noisy single-camera poses (from homography decomposition or PnP) propagate to the rig extrinsics
- Number of views: More views improve the averaging
- View diversity: Diverse viewpoints reduce systematic errors
Typical accuracy: 1-5° rotation, 5-15% translation. This is refined in the rig bundle adjustment (see Multi-Camera Rig Extrinsics).
API
```rust
let extrinsic_poses = estimate_extrinsics_from_cam_target_poses(
    &cam_se3_target, // Vec<Vec<Option<Iso3>>>: per-view, per-camera poses
    ref_cam_idx,     // usize: reference camera index
)?;
// extrinsic_poses.cam_to_rig: Vec<Iso3> (T_{R,C} for each camera)
// extrinsic_poses.rig_from_target: Vec<Iso3> (T_{R,T} for each view)
```
Usage in Calibration Pipeline
In the rig extrinsics calibration pipeline:
- Per-camera intrinsics are estimated and optimized independently
- Per-camera poses are recovered from the calibration board
- Rig extrinsics are initialized via averaging (this chapter)
- Rig bundle adjustment jointly optimizes all parameters
Laser Plane Initialization
In laser triangulation, a laser line is projected onto the scene and a camera observes the line. The laser plane — defined by a normal vector and distance from the camera origin — must be calibrated. This chapter describes the linear initialization of the laser plane from observations of the laser line on a calibration target.
Problem Statement
Given:
- Camera intrinsics and distortion
- Per-view camera poses (target to camera)
- Per-view laser pixel observations (pixels where the laser line intersects the target)
Find: Laser plane $(n, d)$ in the camera frame, where $n$ is the unit normal and $d$ is the signed distance from the camera origin:
$n^T p = d$
for all 3D points $p$ on the laser plane (in camera coordinates).
Algorithm
Step 1: Back-Project Laser Pixels to 3D
For each laser pixel in each view:
- Undistort the pixel to get normalized coordinates $(x_n, y_n)$
- Form a ray in the camera frame: $r(\lambda) = \lambda\, (x_n, y_n, 1)^T$
- Intersect with the target plane: The target plane $Z = 0$ in target coordinates, transformed to the camera frame by $T_{CT}$, defines a plane. Solve for the $\lambda$ that places the ray on this plane.
- Compute the 3D point: $p = \lambda\, (x_n, y_n, 1)^T$ in camera coordinates
This gives a set of 3D points on the laser plane.
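A sketch of the ray-plane intersection for one pixel (plain arrays, a simplified pose as rotation matrix plus translation, distortion omitted; not the library's API). The target plane $Z = 0$, expressed in the camera frame, has normal equal to the target's Z axis and passes through the target origin:

```rust
/// Intersect the camera ray through normalized coords (xn, yn) with the
/// target plane Z=0, where the target pose in the camera frame is (r, t).
/// Returns the 3D intersection point in camera coordinates.
fn backproject_to_target_plane(
    xn: f64,
    yn: f64,
    r: [[f64; 3]; 3], // rotation, target -> camera
    t: [f64; 3],      // target origin in the camera frame
) -> [f64; 3] {
    // Plane normal in camera frame: third column of R (target Z axis).
    let n = [r[0][2], r[1][2], r[2][2]];
    let dir = [xn, yn, 1.0];

    // Solve n . (lambda * dir) = n . t for lambda.
    let n_dot_t = n[0] * t[0] + n[1] * t[1] + n[2] * t[2];
    let n_dot_d = n[0] * dir[0] + n[1] * dir[1] + n[2] * dir[2];
    let lambda = n_dot_t / n_dot_d;
    [lambda * dir[0], lambda * dir[1], lambda * dir[2]]
}

fn main() {
    // Target parallel to the image plane, 2 units in front of the camera.
    let r = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]];
    let t = [0.0, 0.0, 2.0];

    let p = backproject_to_target_plane(0.5, 0.25, r, t);
    assert!((p[0] - 1.0).abs() < 1e-12); // 0.5  * 2
    assert!((p[1] - 0.5).abs() < 1e-12); // 0.25 * 2
    assert!((p[2] - 2.0).abs() < 1e-12);
}
```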
Step 2: Fit a Plane via SVD
Given $N$ 3D points $p_i$ on the laser plane:
- Centroid: $c = \frac{1}{N} \sum_i p_i$
- Center the points: $q_i = p_i - c$
- Covariance matrix: $C = \sum_i q_i q_i^T$
- SVD: $C = U \Sigma V^T$
- Normal: $n$ is the eigenvector corresponding to the smallest singular value (last column of $V$)
- Distance: $d = n^T c$
Degeneracy Detection
If all 3D points are nearly collinear (e.g., because all views have the same target pose), the plane is underdetermined. The algorithm checks the ratio of the two smallest singular values, $\sigma_3 / \sigma_2$: if the points are too close to collinear (the ratio is near 1), the plane fit fails with an error.
Plane Representation for Optimization
In the non-linear optimization, the plane is parameterized as:
- Normal: A unit vector $n$ (2 degrees of freedom on the unit sphere)
- Distance: A scalar $d$ (1 degree of freedom)
The $S^2$ manifold is used for the normal to maintain the unit-norm constraint during optimization (see Manifold Optimization).
Accuracy
The linear plane initialization accuracy depends on:
- View diversity: Different target orientations provide points at different locations on the laser plane, improving the fit
- Camera calibration accuracy: Errors in intrinsics, distortion, or poses propagate to the 3D point estimates
- Number of laser pixels: More pixels (from more views or more detected pixels per view) improve the SVD fit
Typical accuracy: 1-5° normal direction error, 5-20% distance error. Refined in the Laserline Device Calibration optimization.
API
```rust
use vision_calibration::linear::laserline::{LaserlinePlaneSolver, LaserlineView};

// Each view carries laser pixels and the camera-to-target pose
let views: Vec<LaserlineView> = poses.iter().zip(laser_pixels_per_view.iter())
    .map(|(pose, pixels)| LaserlineView {
        camera_se3_target: *pose,
        laser_pixels: pixels.clone(),
    })
    .collect();

let plane = LaserlinePlaneSolver::from_views(&views, &camera)?;
// plane.normal: UnitVector3<f64> (unit normal in camera frame)
// plane.distance: f64 (signed distance from camera origin)
// plane.rmse: f64 (fit residual)
```
Polynomial Solvers
Several minimal geometry solvers in calibration-rs reduce to polynomial equations. This chapter documents the polynomial root-finding utilities used by P3P, 7-point fundamental, and 5-point essential solvers.
Quadratic
Solved via the standard discriminant formula: $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$. Returns 0, 1, or 2 real roots depending on the sign of the discriminant $b^2 - 4ac$.
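A sketch of the discriminant-based solver (not the library's exact implementation, which may use the numerically stable variant that avoids cancellation):

```rust
/// Real roots of a*x^2 + b*x + c = 0, in ascending order.
fn solve_quadratic(a: f64, b: f64, c: f64) -> Vec<f64> {
    let disc = b * b - 4.0 * a * c;
    if disc < 0.0 {
        vec![] // no real roots
    } else if disc == 0.0 {
        vec![-b / (2.0 * a)] // one repeated root
    } else {
        let s = disc.sqrt();
        let mut roots = vec![(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)];
        roots.sort_by(|x, y| x.partial_cmp(y).unwrap());
        roots
    }
}

fn main() {
    // x^2 - 5x + 6 = (x - 2)(x - 3)
    let roots = solve_quadratic(1.0, -5.0, 6.0);
    assert_eq!(roots.len(), 2);
    assert!((roots[0] - 2.0).abs() < 1e-12);
    assert!((roots[1] - 3.0).abs() < 1e-12);

    // x^2 + 1 = 0: no real roots
    assert!(solve_quadratic(1.0, 0.0, 1.0).is_empty());
}
```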
Cubic
Solved using Cardano's formula:
- Convert to a depressed cubic $t^3 + pt + q = 0$ via the substitution $x = t - \frac{b}{3a}$
- Compute the discriminant $\Delta = -4p^3 - 27q^2$
- For $\Delta > 0$ (three real roots): use the trigonometric form
- For $\Delta < 0$ (one real root): use Cardano's formula with cube roots
Returns 1 or 3 real roots.
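The steps above can be sketched as follows (a compact monic version; a production implementation would also clamp the `acos` argument and handle the double-root case $\Delta = 0$ explicitly):

```rust
/// Real roots of x^3 + b*x^2 + c*x + d = 0 (monic), unsorted.
fn solve_cubic_monic(b: f64, c: f64, d: f64) -> Vec<f64> {
    // Depressed cubic t^3 + p*t + q via x = t - b/3.
    let p = c - b * b / 3.0;
    let q = 2.0 * b * b * b / 27.0 - b * c / 3.0 + d;
    let shift = -b / 3.0;

    let disc = -4.0 * p * p * p - 27.0 * q * q;
    if disc > 0.0 {
        // Three real roots: trigonometric form (p < 0 in this branch).
        let m = 2.0 * (-p / 3.0).sqrt();
        let theta = (3.0 * q / (p * m)).acos() / 3.0;
        (0..3)
            .map(|k| {
                m * (theta - 2.0 * std::f64::consts::PI * k as f64 / 3.0).cos() + shift
            })
            .collect()
    } else {
        // One real root: Cardano's formula with cube roots.
        let s = (q * q / 4.0 + p * p * p / 27.0).sqrt();
        let t = (-q / 2.0 + s).cbrt() + (-q / 2.0 - s).cbrt();
        vec![t + shift]
    }
}

fn main() {
    // x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)
    let mut roots = solve_cubic_monic(-6.0, 11.0, -6.0);
    roots.sort_by(|a, b| a.partial_cmp(b).unwrap());
    for (r, expected) in roots.iter().zip([1.0, 2.0, 3.0]) {
        assert!((r - expected).abs() < 1e-9);
    }
}
```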
Quartic
Solved via the companion matrix approach:
- Normalize to monic form: divide by the leading coefficient $a_4$
- Construct the $4 \times 4$ companion matrix
- Compute eigenvalues using Schur decomposition (via nalgebra)
- Extract real eigenvalues: those with $|\operatorname{Im}(\lambda)| < \epsilon$
Returns 0 to 4 real roots.
Usage in Minimal Solvers
| Solver | Polynomial | Degree | Max roots |
|---|---|---|---|
| P3P (Kneip) | Distance ratio | Quartic | 4 |
| 7-point fundamental | $\det(F_1 + \lambda F_2) = 0$ | Cubic | 3 |
| 5-point essential | Action matrix eigenvalues | Degree 10 | 10 |
The 5-point solver uses eigendecomposition of a $10 \times 10$ matrix rather than a polynomial solver directly, but the underlying mathematics involves degree-10 polynomial constraints.
Numerical Considerations
- All roots are deduplicated (roots closer than a small tolerance are merged)
- Real root extraction uses a threshold on the imaginary part ($|\operatorname{Im}(\lambda)| < \epsilon$)
- The companion matrix approach is more numerically stable than analytical quartic formulas for extreme coefficient ratios
Non-Linear Least Squares Overview
Non-linear least squares (NLLS) is the mathematical framework underlying all refinement in calibration-rs. After linear initialization provides approximate parameter estimates, NLLS minimizes the reprojection error to achieve sub-pixel accuracy.
Problem Formulation
Objective: Minimize the sum of squared residuals:
$C(\theta) = \frac{1}{2} \sum_i \| r_i(\theta) \|^2$
where $\theta$ is the parameter vector and $r(\theta)$ is the stacked residual vector.
In camera calibration, a typical residual is the reprojection error: the difference between an observed pixel $u_i$ and the predicted projection:
$r_i(\theta) = \pi(K, d, T;\, X_i) - u_i$
where $\pi$ is the camera projection function, and the parameters include intrinsics $K$, distortion $d$, and poses $T$.
Gauss-Newton Method
The Gauss-Newton method exploits the least-squares structure. At the current estimate $\theta_0$, linearize the residuals:
$r(\theta_0 + \delta) \approx r(\theta_0) + J \delta$
where $J = \partial r / \partial \theta$ is the Jacobian.
Substituting into the objective and minimizing with respect to $\delta$ gives the normal equations:
$J^T J\, \delta = -J^T r$
The matrix $J^T J$ is the Gauss-Newton approximation to the Hessian (exact when residuals are small).
Update: $\theta \leftarrow \theta_0 + \delta$
Levenberg-Marquardt Method
Gauss-Newton can diverge when the linearization is poor (far from the minimum) or when $J^T J$ is singular. Levenberg-Marquardt (LM) adds a damping term:
$(J^T J + \lambda I)\, \delta = -J^T r$
The damping parameter $\lambda$ interpolates between:
- $\lambda \to 0$: Pure Gauss-Newton (fast convergence near the minimum)
- $\lambda \to \infty$: Gradient descent with a small step (safe far from the minimum)
Trust Region Interpretation
LM is a trust region method: $\lambda$ controls the size of the region where the linear approximation is trusted.
- If the update reduces the cost: accept the step, decrease (expand trust region)
- If the update increases the cost: reject the step, increase (shrink trust region)
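The accept/reject logic can be illustrated on a one-dimensional toy problem. The sketch below (not the library's solver) finds $\sqrt{2}$ by minimizing $r(\theta) = \theta^2 - 2$ with a scalar damped update:

```rust
/// Minimize 0.5 * r(theta)^2 for r(theta) = theta^2 - 2 with a scalar
/// Levenberg-Marquardt iteration. Returns the optimized theta.
fn lm_sqrt2(mut theta: f64) -> f64 {
    let residual = |t: f64| t * t - 2.0;
    let jacobian = |t: f64| 2.0 * t;

    let mut lambda = 1e-3;
    for _ in 0..100 {
        let r = residual(theta);
        let j = jacobian(theta);

        // Damped normal equations: (J^T J + lambda) * delta = -J^T r
        let delta = -j * r / (j * j + lambda);
        let candidate = theta + delta;

        if residual(candidate).abs() < r.abs() {
            theta = candidate; // accept the step
            lambda *= 0.5;     // expand the trust region
        } else {
            lambda *= 10.0;    // reject the step, shrink the trust region
        }
        if delta.abs() < 1e-14 {
            break;
        }
    }
    theta
}

fn main() {
    let theta = lm_sqrt2(1.0);
    assert!((theta - 2.0_f64.sqrt()).abs() < 1e-10);
}
```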
Convergence Criteria
LM terminates when any of:
- Cost threshold: $C(\theta) < \epsilon_{\text{cost}}$
- Absolute decrease: $C_k - C_{k+1} < \epsilon_{\text{abs}}$
- Relative decrease: $(C_k - C_{k+1}) / C_k < \epsilon_{\text{rel}}$
- Maximum iterations: $k \geq k_{\max}$
- Parameter change: $\| \delta \| < \epsilon_{\theta}$
Sparsity
In bundle adjustment problems, the Jacobian is sparse: each residual depends on only a few parameter blocks (one camera intrinsics, one distortion, one pose). The matrix $J^T J$ has a block-arrow structure that can be exploited by sparse linear solvers:
- Sparse Cholesky: Efficient for well-structured problems
- Sparse QR: More robust when the normal equations are ill-conditioned
calibration-rs uses sparse linear solvers through the tiny-solver backend.
Cost Function vs. Reprojection Error
The optimizer minimizes the cost $C = \frac{1}{2} \sum_i \| r_i \|^2$. The commonly reported mean reprojection error is:
$\bar{e} = \frac{1}{N} \sum_i \| r_i \|$
These are related but not identical: the cost weights large errors quadratically, while the mean error weights them linearly. calibration-rs reports both: the cost from the solver and the mean reprojection error computed post-optimization.
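The distinction in code, assuming 2D residuals stored as `(dx, dy)` pairs (hypothetical helper names, not the library's reporting API):

```rust
/// Solver cost: 0.5 * sum of squared residual norms.
fn cost(residuals: &[[f64; 2]]) -> f64 {
    0.5 * residuals.iter().map(|r| r[0] * r[0] + r[1] * r[1]).sum::<f64>()
}

/// Mean reprojection error: average residual norm in pixels.
fn mean_reproj_error(residuals: &[[f64; 2]]) -> f64 {
    let sum: f64 = residuals
        .iter()
        .map(|r| (r[0] * r[0] + r[1] * r[1]).sqrt())
        .sum();
    sum / residuals.len() as f64
}

fn main() {
    // One 3-4-5 outlier dominates the cost far more than the mean error.
    let residuals = [[0.1, 0.0], [0.0, 0.1], [3.0, 4.0]];
    let c = cost(&residuals);
    let m = mean_reproj_error(&residuals);
    assert!((c - 0.5 * (0.01 + 0.01 + 25.0)).abs() < 1e-12);
    assert!((m - (0.1 + 0.1 + 5.0) / 3.0).abs() < 1e-12);
}
```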
With Robust Losses
When a robust loss $\rho$ is used, the objective becomes:
$C(\theta) = \frac{1}{2} \sum_i \rho\!\left( \| r_i(\theta) \|^2 \right)$
The normal equations are modified to incorporate the loss function's weight:
$J^T W J\, \delta = -J^T W r$
where $W$ is the iteratively reweighted diagonal matrix. See Robust Loss Functions for details.
Manifold Optimization
Camera poses are elements of the Lie group $SE(3)$, which is a 6-dimensional smooth manifold — not a Euclidean space. Laser plane normals live on the unit sphere $S^2$. Optimizing these parameters with standard Euclidean updates (adding $\delta$ to the parameter vector) would violate the manifold constraints: rotations would become non-orthogonal, quaternions would lose unit norm, and normals would lose unit length.
Manifold optimization solves this by computing updates in the tangent space and then retracting back to the manifold.
The Retract-Update Pattern
For a parameter $x$ on a manifold $\mathcal{M}$:
- Compute the update $\delta$ in the tangent space (a Euclidean space of dimension equal to the manifold's degrees of freedom)
- Retract: $x \leftarrow \operatorname{retract}(x, \delta)$, where the retraction maps from the tangent space back to the manifold
This is also called local parameterization: the optimizer sees a Euclidean space of the correct dimension, and the manifold structure handles the constraint.
SE(3): Rigid Transforms
Ambient dimension: 7 (quaternion $q$ + translation $t$)
Tangent dimension: 6 ($\xi = (\omega, v)$)
Plus operation ($\boxplus$): Given the current pose $T$ and tangent vector $\xi$:
$T \boxplus \xi = \exp(\xi) \cdot T$
where $\exp$ is the SE(3) exponential map (see Rigid Transformations).
Minus operation ($\boxminus$): The inverse — computing the tangent vector between two poses — uses the SE(3) logarithm.
Implementation in calibration-rs
The `se3_exp` function in the factor module computes the exponential map generically over `RealField` for autodiff compatibility:

```rust
fn se3_exp<T: RealField>(xi: &[T; 6]) -> [[T; 4]; 4]
```

This uses Rodrigues' formula with special handling for the small-angle case ($\|\omega\| \to 0$) to avoid division by zero.
SO(3): Rotations
Ambient dimension: 4 (unit quaternion)
Tangent dimension: 3 (axis-angle $\omega$)
Plus operation: $q \boxplus \omega = \exp(\omega) \otimes q$
SO(3) is used for rotation-only parameters. In calibration-rs, most poses use the full SE(3), but SO(3) is available for specialized problems.
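A sketch of the SO(3) plus operation (quaternions as plain `[x, y, z, w]` arrays and hypothetical helper names; the left-composition order shown is one common convention):

```rust
/// Hamilton product of quaternions stored as [x, y, z, w].
fn quat_mul(a: [f64; 4], b: [f64; 4]) -> [f64; 4] {
    let [ax, ay, az, aw] = a;
    let [bx, by, bz, bw] = b;
    [
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
        aw * bw - ax * bx - ay * by - az * bz,
    ]
}

/// Exponential map: axis-angle omega -> unit quaternion.
fn so3_exp(omega: [f64; 3]) -> [f64; 4] {
    let angle =
        (omega[0] * omega[0] + omega[1] * omega[1] + omega[2] * omega[2]).sqrt();
    if angle < 1e-12 {
        return [0.0, 0.0, 0.0, 1.0]; // identity for tiny angles
    }
    let s = (angle / 2.0).sin() / angle;
    let c = (angle / 2.0).cos();
    [omega[0] * s, omega[1] * s, omega[2] * s, c]
}

/// Plus operation: apply the tangent update omega to the rotation q.
fn so3_plus(q: [f64; 4], omega: [f64; 3]) -> [f64; 4] {
    quat_mul(so3_exp(omega), q)
}

fn main() {
    use std::f64::consts::FRAC_PI_2;
    // Identity plus a 90-degree rotation about Z.
    let q = so3_plus([0.0, 0.0, 0.0, 1.0], [0.0, 0.0, FRAC_PI_2]);
    assert!((q[2] - (FRAC_PI_2 / 2.0).sin()).abs() < 1e-12);
    assert!((q[3] - (FRAC_PI_2 / 2.0).cos()).abs() < 1e-12);
    // The result stays on the unit-quaternion manifold.
    let norm: f64 = q.iter().map(|c| c * c).sum();
    assert!((norm - 1.0).abs() < 1e-12);
}
```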
$S^2$: Unit Sphere
Ambient dimension: 3 (unit vector $n$, $\|n\| = 1$)
Tangent dimension: 2 ($\delta \in \mathbb{R}^2$)
The $S^2$ manifold is used for laser plane normals — 3D unit vectors with 2 degrees of freedom.
Plus operation: Given the current normal $n$ and tangent vector $\delta = (\delta_1, \delta_2)$:
- Compute an orthonormal basis $(b_1, b_2)$ of the tangent plane at $n$
- Tangent vector in 3D: $v = \delta_1 b_1 + \delta_2 b_2$
- Retract: $n \leftarrow (n + v) / \| n + v \|$
For small $\|\delta\|$, this is equivalent to the exponential map on $S^2$.
Basis Construction
The tangent plane basis is constructed deterministically:
- Choose a reference vector $a$ not parallel to $n$ (e.g., $e_x$ or $e_y$)
- $b_1 = n \times a$, normalized; $b_2 = n \times b_1$
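A sketch of the basis construction and retraction together (plain arrays and hypothetical helper names, not the library's manifold implementation):

```rust
fn cross(a: [f64; 3], b: [f64; 3]) -> [f64; 3] {
    [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]
}

fn normalize(v: [f64; 3]) -> [f64; 3] {
    let n = (v[0] * v[0] + v[1] * v[1] + v[2] * v[2]).sqrt();
    [v[0] / n, v[1] / n, v[2] / n]
}

/// Deterministic orthonormal basis (b1, b2) of the tangent plane at unit n.
fn tangent_basis(n: [f64; 3]) -> ([f64; 3], [f64; 3]) {
    // Reference vector not parallel to n.
    let a = if n[0].abs() < 0.9 { [1.0, 0.0, 0.0] } else { [0.0, 1.0, 0.0] };
    let b1 = normalize(cross(n, a));
    let b2 = cross(n, b1);
    (b1, b2)
}

/// S^2 plus operation: lift the 2D tangent update, add, renormalize.
fn s2_plus(n: [f64; 3], delta: [f64; 2]) -> [f64; 3] {
    let (b1, b2) = tangent_basis(n);
    let v = [
        n[0] + delta[0] * b1[0] + delta[1] * b2[0],
        n[1] + delta[0] * b1[1] + delta[1] * b2[1],
        n[2] + delta[0] * b1[2] + delta[1] * b2[2],
    ];
    normalize(v)
}

fn main() {
    let n = [0.0, 0.0, 1.0];
    let updated = s2_plus(n, [0.1, -0.05]);
    // The updated normal stays on the unit sphere...
    let norm: f64 = updated.iter().map(|c| c * c).sum();
    assert!((norm - 1.0).abs() < 1e-12);
    // ...and a zero update is the identity.
    assert_eq!(s2_plus(n, [0.0, 0.0]), n);
}
```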
Euclidean Parameters
For intrinsics ($f_x, f_y, c_x, c_y$), distortion ($k_1, k_2, k_3, p_1, p_2$), and other unconstrained parameters, the manifold is simply $\mathbb{R}^n$ with $x \boxplus \delta = x + \delta$.
No special treatment is needed.
ManifoldKind in the IR
The optimization IR specifies the manifold for each parameter block:
```rust
pub enum ManifoldKind {
    Euclidean, // R^n, standard addition
    SE3,       // 7D ambient, 6D tangent
    SO3,       // 4D ambient, 3D tangent
    S2,        // 3D ambient, 2D tangent
}
```
The backend maps these to solver-specific manifold implementations.
Fixed Parameters on Manifolds
For Euclidean parameters, individual indices can be fixed (e.g., fix $c_x$ but optimize $f_x$, $f_y$, $c_y$). For manifold parameters (SE3, SO3, S2), fixing is all-or-nothing: either the entire parameter block is optimized or the entire block is fixed. This is because the tangent space decomposition does not naturally support partial fixing on non-Euclidean manifolds.
Backend-Agnostic IR Architecture
calibration-rs separates the definition of optimization problems from their execution by a specific solver. Problems are expressed as an Intermediate Representation (IR) that is compiled to a solver-specific form. This design allows swapping backends without changing problem definitions and makes it straightforward to add new problem types.
Three-Stage Pipeline
```text
Problem Builder  →  ProblemIR  →  Backend.compile()  →  Backend.solve()
 (domain code)      (generic)     (solver-specific)      (optimization)
```

- Problem Builder: Domain code (e.g., `build_planar_intrinsics_ir()`) constructs a `ProblemIR` from calibration data and initial parameter estimates
- Backend Compilation: The backend (e.g., `TinySolverBackend`) translates the IR into solver-specific data structures
- Solving: The backend runs the optimizer and returns a `BackendSolution`
ProblemIR
The central type that describes a complete optimization problem:
```rust
pub struct ProblemIR {
    pub params: Vec<ParamBlock>,
    pub residuals: Vec<ResidualBlock>,
}
```
It is validated on construction to ensure all parameter references are valid, dimensions match, and manifold constraints are respected.
ParamBlock
Each parameter block represents a group of optimization variables:
```rust
pub struct ParamBlock {
    pub id: ParamId,
    pub name: String,
    pub dim: usize,
    pub manifold: ManifoldKind,
    pub fixed: FixedMask,
    pub bounds: Option<Vec<Bound>>, // per-index box constraints
}
```
| Field | Description |
|---|---|
| `id` | Unique identifier within the problem |
| `name` | Human-readable name (e.g., "cam", "pose/0", "plane") |
| `dim` | Ambient dimension (e.g., 4 for intrinsics, 7 for SE3) |
| `manifold` | Geometry of the parameter space |
| `fixed` | Which indices (or the whole block) are held constant |
| `bounds` | Optional box constraints on parameter values |
Naming Conventions
Problem builders use consistent naming:
"intrinsics"or"cam"— intrinsics (), dimension 4"distortion"or"dist"— distortion (), dimension 5"sensor"— Scheimpflug tilt (), dimension 2"pose/0","pose/1", ... — per-view SE(3) poses (planar, rig), dimension 7"pose_0","pose_1", ... — per-view SE(3) poses (laserline), dimension 7"plane_normal"— laser plane unit normal, dimension 3, manifold"plane_distance"— laser plane distance, dimension 1"handeye"— hand-eye SE(3), dimension 7"extrinsics/0", ... — per-camera SE(3) extrinsics, dimension 7
FixedMask
Controls which parameters are held constant during optimization:
```rust
// Construct via factory methods:
FixedMask::all_free()        // nothing fixed
FixedMask::all_fixed(dim)    // everything fixed
FixedMask::fix_indices(&[2]) // fix specific indices
```
- Euclidean parameters: Individual indices can be fixed. For example, `FixedMask::fix_indices(&[2])` on intrinsics fixes $c_x$ while allowing $f_x$, $f_y$, $c_y$ to vary.
- Manifold parameters: All-or-nothing. If any index is fixed, the entire block is fixed. (The tangent space does not support partial fixing.)
ResidualBlock
Each residual block connects parameter blocks through a factor:
```rust
pub struct ResidualBlock {
    pub params: Vec<ParamId>,
    pub loss: RobustLoss,
    pub factor: FactorKind,
    pub residual_dim: usize,
}
```
The `params` vector references `ParamBlock`s by their `ParamId`. The ordering must match what the factor expects.
FactorKind
An enum of all supported residual computations:
```rust
pub enum FactorKind {
    // Reprojection factors — all carry per-observation data
    ReprojPointPinhole4 { pw: [f64; 3], uv: [f64; 2], w: f64 },
    ReprojPointPinhole4Dist5 { pw: [f64; 3], uv: [f64; 2], w: f64 },
    ReprojPointPinhole4Dist5Scheimpflug2 { pw: [f64; 3], uv: [f64; 2], w: f64 },
    ReprojPointPinhole4Dist5TwoSE3 { pw: [f64; 3], uv: [f64; 2], w: f64 },
    ReprojPointPinhole4Dist5HandEye {
        pw: [f64; 3],
        uv: [f64; 2],
        w: f64,
        base_to_gripper_se3: [f64; 7],
        mode: HandEyeMode,
    },
    ReprojPointPinhole4Dist5HandEyeRobotDelta {
        pw: [f64; 3],
        uv: [f64; 2],
        w: f64,
        base_to_gripper_se3: [f64; 7],
        mode: HandEyeMode,
    },
    // Laser factors
    LaserPlanePixel { laser_pixel: [f64; 2], w: f64 },
    LaserLineDist2D { laser_pixel: [f64; 2], w: f64 },
    // Prior factors
    Se3TangentPrior { sqrt_info: [f64; 6] },
}
```
Each variant carries the per-residual data (3D point coordinates, observed pixel, weight, etc.) that is not part of the optimizable parameters. The factor also specifies which parameter blocks it expects and validates their dimensions.
Reprojection Factors
All reprojection factors compute the pixel residual: the weighted difference between the predicted projection of the 3D point `pw` and the observed pixel `uv`.
The variants differ in which parameters are involved:
| Factor | Parameters | Residual dim |
|---|---|---|
| `ReprojPointPinhole4` | [intrinsics, pose] | 2 |
| `ReprojPointPinhole4Dist5` | [intrinsics, distortion, pose] | 2 |
| `ReprojPointPinhole4Dist5Scheimpflug2` | [intrinsics, distortion, sensor, pose] | 2 |
| `ReprojPointPinhole4Dist5TwoSE3` | [intrinsics, distortion, extrinsics, rig_pose] | 2 |
| `ReprojPointPinhole4Dist5HandEye` | [intrinsics, distortion, extrinsics, handeye, target_pose] | 2 |
| `ReprojPointPinhole4Dist5HandEyeRobotDelta` | [intrinsics, distortion, extrinsics, handeye, target_pose, robot_delta] | 2 |
Laser Factors
| Factor | Parameters | Residual dim | Description |
|---|---|---|---|
| `LaserPlanePixel` | [intrinsics, distortion, pose, plane] | 1 | Point-to-plane 3D distance |
| `LaserLineDist2D` | [intrinsics, distortion, pose, plane] | 1 | Line distance in normalized plane |
Prior Factors
| Factor | Parameters | Residual dim | Description |
|---|---|---|---|
| `Se3TangentPrior` | [se3_param] | 6 | Zero-mean Gaussian prior on SE(3) tangent |
Validation
`ProblemIR::new()` validates:
- All `ParamId` references in residual blocks resolve to existing parameter blocks
- Parameter dimensions match what each factor expects
- Manifold kinds are compatible with the factor requirements
- No duplicate parameter IDs
Initial Values
Parameter blocks carry only their structural definition (dimension, manifold, fixing). The initial values are passed separately as a `HashMap<String, DVector<f64>>` (keyed by parameter block name) to the backend's `solve()` method.
Example: Planar Intrinsics IR
For a 10-view planar intrinsics problem with 48 points per view:
Parameters:
- "cam": dim=4, Euclidean (fx, fy, cx, cy)
- "dist": dim=5, Euclidean (k1, k2, k3, p1, p2)
- "pose/0"..."pose/9": dim=7, SE3
Residuals: 480 blocks
- Each: ReprojPointPinhole4Dist5 { pw, uv, w=1.0 }
- Params: [cam_id, dist_id, pose_k_id]
- Residual dim: 2
Total residuals: 960 scalar values
Total parameters: 4 + 5 + 10*7 = 79
Autodiff and Generic Residual Functions
The Levenberg-Marquardt algorithm requires the Jacobian of the residual vector. Computing this analytically for complex camera models (distortion, Scheimpflug tilt, SE(3) transforms) is error-prone and tedious. calibration-rs uses automatic differentiation (autodiff) via dual numbers to compute exact Jacobians from the same code that evaluates residuals.
Dual Numbers
A dual number extends a real number with an infinitesimal part: $x + \epsilon x'$, where $\epsilon^2 = 0$.
Arithmetic preserves the chain rule automatically, e.g. $(x + \epsilon x')(y + \epsilon y') = xy + \epsilon (x y' + x' y)$.
By setting $x' = 1$ for the parameter of interest and $x' = 0$ for the others, evaluating $f(x + \epsilon)$ simultaneously computes both $f(x)$ and $f'(x)$.
For multi-variable functions, hyper-dual numbers or multi-dual numbers generalize this to compute full Jacobians. The num-dual crate provides these types.
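A minimal dual-number type makes the mechanism concrete (a toy sketch, not the num-dual implementation):

```rust
use std::ops::{Add, Mul};

/// A dual number a + b*eps with eps^2 = 0: `re` carries the value,
/// `eps` carries the derivative.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Dual {
    re: f64,
    eps: f64,
}

impl Add for Dual {
    type Output = Dual;
    fn add(self, rhs: Dual) -> Dual {
        Dual { re: self.re + rhs.re, eps: self.eps + rhs.eps }
    }
}

impl Mul for Dual {
    type Output = Dual;
    fn mul(self, rhs: Dual) -> Dual {
        // (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps: the product rule.
        Dual { re: self.re * rhs.re, eps: self.re * rhs.eps + self.eps * rhs.re }
    }
}

fn constant(c: f64) -> Dual {
    Dual { re: c, eps: 0.0 }
}

/// f(x) = x^2 + 3x, written once, evaluated for value and derivative.
fn f(x: Dual) -> Dual {
    x * x + constant(3.0) * x
}

fn main() {
    // Seed eps = 1 to differentiate with respect to x.
    let x = Dual { re: 2.0, eps: 1.0 };
    let y = f(x);
    assert_eq!(y.re, 10.0); // f(2)  = 4 + 6
    assert_eq!(y.eps, 7.0); // f'(2) = 2*2 + 3
}
```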
The RealField Pattern
All residual functions in calibration-rs are written generic over the scalar type:
```rust
fn reproj_residual_pinhole4_dist5_se3_generic<T: RealField>(
    intr: DVectorView<'_, T>, // [fx, fy, cx, cy]
    dist: DVectorView<'_, T>, // [k1, k2, k3, p1, p2]
    pose: DVectorView<'_, T>, // [qx, qy, qz, qw, tx, ty, tz]
    pw: [f64; 3],             // 3D world point (constant)
    uv: [f64; 2],             // observed pixel (constant)
    w: f64,                   // weight (constant)
) -> SVector<T, 2>
```
When called with `T = f64`, this computes the residual value. When called with a dual number type for `T`, it simultaneously computes the residual and its derivatives with respect to all parameters.
Design Rules for Autodiff-Compatible Code
1. Use `.clone()` liberally: Dual numbers are small structs; cloning is cheap and avoids borrow checker issues with operator overloading.
2. Avoid in-place operations: Operations like `x += y` or `v[i] = expr` can cause issues with dual number tracking. Prefer functional style.
3. Convert constants: Use `T::from_f64(1.0).unwrap()` to convert floating-point literals. Do not use `1.0` directly where a `T` is expected.
4. Keep constant data as `f64`: World points, observed pixels, and weights are constant data — not optimized. Keep them as `f64` and convert to `T` inside the function.
Walkthrough: Reprojection Residual
The core reprojection residual computes the weighted difference between the predicted projection of the world point and the observed pixel: $r = w\, (\hat{u} - uv)$, where $\hat{u} = \pi(K, d, T \cdot p_w)$.
Step by step:
1. Unpack Parameters
```rust
let fx = intr[0].clone();
let fy = intr[1].clone();
let cx = intr[2].clone();
let cy = intr[3].clone();

let k1 = dist[0].clone();
let k2 = dist[1].clone();
// ...
```
2. SE(3) Transform: World to Camera
Apply the pose to the 3D world point using the exponential map or direct quaternion rotation:
$p_c = R(q)\, p_w + t$
The implementation unpacks the quaternion and applies the rotation formula.
3. Pinhole Projection
$x = p_c^{(x)} / p_c^{(z)}, \quad y = p_c^{(y)} / p_c^{(z)}$
4. Apply Distortion
$r^2 = x^2 + y^2, \quad x_d = x\, (1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + \text{tangential terms}$ (and similarly for $y_d$)
5. Apply Intrinsics
$u = f_x x_d + c_x, \quad v = f_y y_d + c_y$
6. Compute Residual
$r = w\, \big( (u, v) - uv \big)$
Every arithmetic operation on T values propagates derivatives automatically through the dual number mechanism.
Backend Integration
The backend wraps each generic residual function in a solver-specific factor struct that implements the solver's cost function interface:
```text
FactorKind::ReprojPointPinhole4Dist5 { pw, uv, w }
    ↓ (compile step)
TinySolverFactor {
    evaluate: |params| {
        reproj_residual_pinhole4_dist5_se3_generic(
            params["cam"], params["dist"], params["pose/k"],
            pw, uv, w,
        )
    }
}
```
The tiny-solver backend then calls evaluate with dual numbers to compute both residuals and Jacobians in a single pass.
SE(3) Exponential Map
A critical autodiff-compatible function is se3_exp<T>(), which maps a 6D tangent vector to an SE(3) transform:
```rust
fn se3_exp<T: RealField>(xi: &[T; 6]) -> [[T; 4]; 4]
```
This implements Rodrigues' formula with a special case for small rotation angles (`theta -> 0`) using Taylor expansions to avoid numerical issues with the dual number's infinitesimal part.
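For the scalar case, the guard looks like the following sketch; the threshold and expansion order are illustrative, not the library's exact choices:

```rust
// Small-angle-safe evaluation of sin(theta)/theta, the coefficient that
// appears in Rodrigues' formula. Below a threshold, a Taylor expansion
// avoids 0/0 and keeps derivatives finite at theta = 0.
fn sinc(theta: f64) -> f64 {
    if theta.abs() < 1e-4 {
        // sin(t)/t ~= 1 - t^2/6 + t^4/120
        1.0 - theta * theta / 6.0 + theta.powi(4) / 120.0
    } else {
        theta.sin() / theta
    }
}

fn main() {
    // Exact at zero, where sin(t)/t would be 0/0.
    assert_eq!(sinc(0.0), 1.0);
    // Matches the direct formula away from zero.
    let t = 1.0_f64;
    assert!((sinc(t) - t.sin() / t).abs() < 1e-12);
    // The two branches agree near the threshold.
    assert!((sinc(1.1e-4) - sinc(0.9e-4)).abs() < 1e-8);
}
```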
Performance Considerations
Autodiff with dual numbers is typically 2-10x slower than hand-coded Jacobians, but:
- Correctness: Eliminates the risk of Jacobian implementation bugs
- Maintainability: Adding a new factor only requires writing the forward evaluation
- Flexibility: The same code works for any parameter configuration
For calibration problems (hundreds to thousands of residuals, tens to hundreds of parameters), the autodiff overhead is negligible compared to the linear solve in each LM iteration.
Levenberg-Marquardt Backend
The TinySolverBackend is the current optimization backend in calibration-rs. It wraps the tiny-solver crate, providing Levenberg-Marquardt optimization with sparse linear solvers, manifold support, and robust loss functions.
Backend Trait
All backends implement the OptimBackend trait:
```rust
pub trait OptimBackend {
    fn solve(
        &self,
        ir: &ProblemIR,
        initial_params: &HashMap<String, DVector<f64>>,
        opts: &BackendSolveOptions,
    ) -> Result<BackendSolution>;
}
```
The backend receives the problem IR, initial parameter values, and solver options, and returns the optimized parameters with a solve report.
Compilation: IR to Solver
The compile() step translates the abstract IR into tiny-solver's concrete types:
Parameters
For each ParamBlock in the IR:
- Create a parameter with the correct dimension
- Set the manifold based on `ManifoldKind`:
  - `Euclidean` → no manifold (standard addition)
  - `SE3` → `SE3Manifold` (7D ambient, 6D tangent)
  - `SO3` → `QuaternionManifold` (4D ambient, 3D tangent)
  - `S2` → `UnitVector3Manifold` (3D ambient, 2D tangent)
- Fix parameters according to `FixedMask`:
  - Euclidean: fix individual indices
  - Manifolds: fix the entire block (all-or-nothing)
- Set bounds if present (box constraints on parameter values)
Residuals
For each ResidualBlock:
- Compile the factor: Create a closure that calls the appropriate generic residual function
- Apply robust loss: Wrap in Huber/Cauchy/Arctan if specified
- Connect parameters: Reference the correct parameter blocks by their compiled IDs
Factor Compilation
Each FactorKind maps to a specific generic function call:
```text
FactorKind::ReprojPointPinhole4Dist5 { pw, uv, w }
    → reproj_residual_pinhole4_dist5_se3_generic(intr, dist, pose, pw, uv, w)

FactorKind::LaserLineDist2D { laser_pixel, w }
    → laser_line_dist_normalized_generic(intr, dist, pose, plane_normal, plane_distance, laser_pixel, w)

FactorKind::Se3TangentPrior { sqrt_info }
    → se3_tangent_prior_generic(pose, sqrt_info)
```
Solver Options
```rust
pub struct BackendSolveOptions {
    pub max_iters: usize,                // Maximum LM iterations (default: 100)
    pub verbosity: u32,                  // 0 = silent, 1 = summary, 2 = per-iteration
    pub linear_solver: LinearSolverType, // SparseCholesky or SparseQR
    pub min_abs_decrease: f64,           // Absolute cost decrease threshold
    pub min_rel_decrease: f64,           // Relative cost decrease threshold
    pub min_error: f64,                  // Minimum cost to stop early
    pub initial_lambda: Option<f64>,     // Initial damping (None = auto)
}

pub enum LinearSolverType {
    SparseCholesky, // Default: fast for well-conditioned problems
    SparseQR,       // More robust for ill-conditioned problems
}
```
Choosing the Linear Solver
- SparseCholesky (default): Factors the normal equations `J^T J dx = -J^T r` directly. Fast but can fail if `J^T J` is poorly conditioned.
- SparseQR: Factors `J` directly (QR decomposition). More robust but slower. Use when Cholesky fails or when the problem has near-singular directions.
Solution
```rust
pub struct BackendSolution {
    pub params: HashMap<String, DVector<f64>>, // Optimized values by name
    pub report: SolveReport,
}

pub struct SolveReport {
    pub initial_cost: f64,
    pub final_cost: f64,
    pub iterations: usize,
    pub termination: TerminationReason,
}
```
The cost is `0.5 * Σ ||r_i||^2` (half the sum of squared residuals). Problem-specific code extracts domain types (cameras, poses, planes) from the raw parameter vectors.
Typical Convergence
For a well-initialized planar intrinsics problem:
- Initial cost: determined by the quality of the linear initialization
- Final cost: corresponds to sub-pixel residuals
- Iterations: 10-50 (depends on problem size and initial quality)
- Termination: Usually relative decrease below threshold
Error Handling
The backend propagates errors for:
- Missing initial values for a parameter block
- Manifold dimension mismatch
- Linear solver failure (singular system)
- NaN/Inf in residual evaluation
Factor Catalog Reference
This chapter is a complete reference for all factor types (residual computations) supported by the optimization IR. Each factor defines a residual function connecting specific parameter blocks.
Reprojection Factors
All reprojection factors compute the weighted pixel residual `r = w * (project(p_w) - uv)`, where `project` is the factor-specific projection pipeline.
They differ in the transform chain from world point to pixel.
ReprojPointPinhole4
Parameters: [intrinsics(4), pose(7)]
Transform chain: `p_c = pose * p_w`, then pinhole projection with intrinsics only (no distortion).
Use case: Distortion-free cameras or when distortion is handled externally.
ReprojPointPinhole4Dist5
Parameters: [intrinsics(4), distortion(5), pose(7)]
Transform chain: `p_c = pose * p_w`, then pinhole + Brown-Conrady distortion + intrinsics.
Use case: Standard single-camera calibration (planar intrinsics).
ReprojPointPinhole4Dist5Scheimpflug2
Parameters: [intrinsics(4), distortion(5), sensor(2), pose(7)]
Transform chain: `p_c = pose * p_w`, then pinhole + distortion + Scheimpflug tilt + intrinsics.
Use case: Laser profilers with tilted sensors.
ReprojPointPinhole4Dist5TwoSE3
Parameters: [intrinsics(4), distortion(5), extrinsics(7), rig_pose(7)]
Transform chain: `p_c = extrinsics * rig_pose * p_w`
The camera pose is composed from two SE(3) transforms: the camera-to-rig extrinsics and the rig-to-target pose.
Use case: Multi-camera rig calibration.
ReprojPointPinhole4Dist5HandEye
Parameters: [intrinsics(4), distortion(5), extrinsics(7), handeye(7), target_pose(7)]
Per-residual data: `robot_pose` (`base_se3_gripper`, known from robot kinematics)
Transform chain (eye-in-hand): the camera-to-target pose is composed from the camera extrinsics, the inverted hand-eye transform, the inverted (fixed) robot pose, and the target pose.
Then project `p_c = cam_se3_target * p_w` through the pinhole + distortion model.
For single-camera hand-eye, the extrinsics block is fixed at identity.
Use case: Hand-eye calibration (camera on robot arm).
ReprojPointPinhole4Dist5HandEyeRobotDelta
Parameters: [intrinsics(4), distortion(5), extrinsics(7), handeye(7), target_pose(7), robot_delta(7)]
Per-residual data: robot_pose
Transform chain: Same as HandEye, but with a per-view SE(3) correction `robot_delta` composed with the robot pose.
The correction is regularized by an Se3TangentPrior factor.
Use case: Hand-eye calibration with imprecise robot kinematics (accounts for robot pose uncertainty).
Laser Factors
Both laser factors have residual dimension 1 and connect [intrinsics(4), distortion(5), pose(7), plane(3+1)].
LaserPlanePixel
Computation:
- Undistort laser pixel to normalized coordinates
- Back-project to a ray in camera frame
- Intersect ray with target plane (known from pose)
- Compute 3D point in camera frame
- Measure signed distance from 3D point to laser plane
Residual: the signed point-to-plane distance, in meters.
LaserLineDist2D
Computation:
- Compute 3D intersection line of laser plane and target plane (in camera frame)
- Project this line onto the normalized camera plane
- Undistort laser pixel to normalized coordinates
- Measure perpendicular distance from pixel to projected line (2D geometry)
- Scale by the focal length for pixel-comparable units
Residual: the scaled perpendicular distance, in effective pixels.
Advantages over LaserPlanePixel: Undistortion done once per pixel (not per-iteration), residuals in pixel units (directly comparable to reprojection error), simpler 2D geometry.
Default: This is the recommended laser residual type.
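The 2D geometry at the heart of step 4 is an ordinary point-to-line distance. A sketch with a hypothetical helper (the library's internal function names may differ):

```rust
// Perpendicular distance from a 2D point (x, y) to the line a*x + b*y + c = 0,
// as used conceptually by LaserLineDist2D on the normalized image plane.
// Multiplying the result by a focal-length factor yields effective pixels.
fn point_line_distance(a: f64, b: f64, c: f64, x: f64, y: f64) -> f64 {
    (a * x + b * y + c).abs() / (a * a + b * b).sqrt()
}

fn main() {
    // Distance from (3, 4) to the x-axis (0*x + 1*y + 0 = 0) is 4.
    let d = point_line_distance(0.0, 1.0, 0.0, 3.0, 4.0);
    assert!((d - 4.0).abs() < 1e-12);
    println!("distance = {d}");
}
```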
Prior Factors
Se3TangentPrior
Parameters: [se3_param(7)]
Computation: Maps the SE(3) parameter to its tangent vector via the logarithm map and divides component-wise by the prior standard deviations: `r_i = xi_i / sigma_i`,
where `xi = log(T)` is the 6D tangent vector.
Use case: Zero-mean Gaussian prior on SE(3) parameters. Applied to robot pose corrections (robot_delta) to penalize deviations from the nominal robot kinematics.
Effect: Adds a regularization term to the cost function.
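A sketch of the whitening step, assuming the tangent vector `xi` has already been computed by the log map (`prior_residual` is a hypothetical helper, not the library's API):

```rust
// Whitened prior residual on the 6D SE(3) tangent vector:
// r_i = xi_i / sigma_i, so a deviation of one sigma costs one unit.
fn prior_residual(xi: [f64; 6], sigma: [f64; 6]) -> [f64; 6] {
    let mut r = [0.0; 6];
    for i in 0..6 {
        r[i] = xi[i] / sigma[i];
    }
    r
}

fn main() {
    // Small rotation (first 3) and translation (last 3) deviations,
    // with roughly 1-degree and 5-mm prior sigmas (illustrative values).
    let xi = [0.01, 0.0, 0.0, 0.002, 0.0, 0.0];
    let sigma = [0.017, 0.017, 0.017, 0.005, 0.005, 0.005];
    let r = prior_residual(xi, sigma);
    assert!((r[0] - 0.01 / 0.017).abs() < 1e-12);
    assert!((r[3] - 0.4).abs() < 1e-12);
}
```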
Summary Table
| Factor | Params | Res. dim | Units | Domain |
|---|---|---|---|---|
| ReprojPointPinhole4 | 2 blocks | 2 | pixels | Simple pinhole |
| ReprojPointPinhole4Dist5 | 3 blocks | 2 | pixels | Standard calibration |
| ReprojPointPinhole4Dist5Scheimpflug2 | 4 blocks | 2 | pixels | Tilted sensor |
| ReprojPointPinhole4Dist5TwoSE3 | 4 blocks | 2 | pixels | Multi-camera rig |
| ReprojPointPinhole4Dist5HandEye | 5 blocks | 2 | pixels | Hand-eye |
| ReprojPointPinhole4Dist5HandEyeRobotDelta | 6 blocks | 2 | pixels | Hand-eye + robot refinement |
| LaserPlanePixel | 4 blocks | 1 | meters | Laser triangulation |
| LaserLineDist2D | 4 blocks | 1 | pixels | Laser triangulation |
| Se3TangentPrior | 1 block | 6 | normalized | Regularization |
Planar Intrinsics Calibration
This is the most common calibration workflow: estimate camera intrinsics and lens distortion from multiple views of a planar calibration board. It combines Zhang's linear initialization with Levenberg-Marquardt bundle adjustment.
Problem Formulation
Parameters:
- Intrinsics: `fx, fy, cx, cy` (4 scalars)
- Distortion: `k1, k2, k3, p1, p2` (5 scalars)
- Per-view poses: `T_1, ..., T_M`, SE(3) transforms (6 DOF each)
Total: `9 + 6M` parameters.
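As a worked check of the parameter count (4 intrinsics + 5 distortion + 6 DOF per view):

```rust
// Total optimized parameters for planar intrinsics calibration.
fn param_count(num_views: usize) -> usize {
    4 + 5 + 6 * num_views
}

fn main() {
    // A single view contributes 6 pose DOF on top of the 9 camera parameters.
    assert_eq!(param_count(1), 15);
    // A typical 20-view dataset: 9 + 6 * 20 = 129 parameters.
    assert_eq!(param_count(20), 129);
}
```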
Observations: For each view `v` and board point `j`:
- Known 3D position `p_j` (on the board, at `z = 0`)
- Observed pixel `u_vj`
Objective: Minimize total reprojection error:
`E = Σ_v Σ_j || π(K, d, T_v, p_j) - u_vj ||^2`
where `π` is the full camera projection pipeline: SE(3) transform → pinhole → distortion → intrinsics.
With robust loss `ρ`, each squared residual norm is replaced by `ρ(||r||^2)`.
Two-Step Pipeline
Step 1: Linear Initialization (step_init)
- Homographies: Compute `H_v` for each view via DLT (from board points at `z = 0` to observed pixels)
- Intrinsics: Estimate `fx, fy, cx, cy` from homographies using Zhang's method, iteratively refined with distortion estimation (see Iterative Intrinsics)
- Distortion: Estimate distortion from homography residuals (see Distortion Fit)
- Poses: Decompose each homography to recover the per-view pose `T_v` (see Pose from Homography)
After initialization, intrinsics are typically within 10-40% of the true values.
Step 2: Non-Linear Refinement (step_optimize)
Constructs the optimization problem as IR:
- Parameter blocks: `"cam"` (4D, Euclidean), `"dist"` (5D, Euclidean), `"pose/0"` ... `"pose/M-1"` (7D, SE3)
- Residual blocks: One `ReprojPointPinhole4Dist5` per observation (2D residual)
- Backend: Levenberg-Marquardt via `TinySolverBackend`
After optimization, expect <2% intrinsics error and <1 px mean reprojection error.
Configuration
```rust
pub struct PlanarIntrinsicsConfig {
    // Initialization
    pub init_iterations: usize,       // Iterative intrinsics iterations (default: 2)
    pub fix_k3_in_init: bool,         // Fix k3 during init (default: true)
    pub fix_tangential_in_init: bool, // Fix p1, p2 during init (default: false)
    pub zero_skew: bool,              // Enforce zero skew (default: true)

    // Optimization
    pub max_iters: usize,                  // LM iterations (default: 100)
    pub verbosity: u32,                    // Solver output level
    pub robust_loss: RobustLoss,           // Robust loss function (default: None)
    pub fix_intrinsics: IntrinsicsFixMask, // Fix specific intrinsics
    pub fix_distortion: DistortionFixMask, // Fix specific distortion params
    pub fix_poses: Vec<usize>,             // Fix specific view poses
}
```
Fix Masks
Fine-grained control over which parameters are optimized:
```rust
// Fix cx, cy but optimize fx, fy
session.update_config(|c| {
    c.fix_intrinsics = IntrinsicsFixMask {
        fx: false,
        fy: false,
        cx: true,
        cy: true,
    };
})?;

// Fix k3 and tangential distortion
session.update_config(|c| {
    c.fix_distortion = DistortionFixMask {
        k1: false,
        k2: false,
        k3: true,
        p1: true,
        p2: true,
    };
})?;
```
Complete Example
```rust
use vision_calibration::prelude::*;
use vision_calibration::planar_intrinsics::{step_init, step_optimize};

let mut session = CalibrationSession::<PlanarIntrinsicsProblem>::new();
session.set_input(dataset)?;

// Optional: customize configuration
session.update_config(|c| {
    c.max_iters = 50;
    c.robust_loss = RobustLoss::Huber { scale: 2.0 };
})?;

// Run pipeline
step_init(&mut session, None)?;

// Inspect initialization
let init_k = session.state.initial_intrinsics.as_ref().unwrap();
println!("Init fx={:.1}, fy={:.1}", init_k.fx, init_k.fy);

step_optimize(&mut session, None)?;

// Export results
let export = session.export()?;
let k = export.params.intrinsics();
println!("Final fx={:.1}, fy={:.1}", k.fx, k.fy);
println!("Reprojection error: {:.4} px", export.mean_reproj_error);
```
Filtering (Optional Step)
After optimization, views or individual observations with high reprojection error can be filtered out:
```rust
use vision_calibration::planar_intrinsics::{step_filter, FilterOptions};

let filter_opts = FilterOptions {
    max_reproj_error: 2.0,     // Remove observations > 2 px
    min_points_per_view: 10,   // Minimum points to keep a view
    remove_sparse_views: true, // Drop views below threshold
};
step_filter(&mut session, filter_opts)?;

// Re-optimize with cleaned data
step_optimize(&mut session, None)?;
```
OpenCV equivalence: `cv::calibrateCamera` performs both initialization and optimization internally. calibration-rs separates these steps for inspection and customization.
Accuracy Expectations
| Stage | Intrinsics error | Reprojection error |
|---|---|---|
| After `step_init` | 10-40% | Not computed |
| After `step_optimize` | <2% | <1 px mean |
| After filtering + re-optimize | <1% | <0.5 px mean |
Input Requirements
- Minimum 3 views (for Zhang's method with skew)
- Minimum 4 points per view (for homography estimation)
- View diversity: Views should include rotation around both axes and vary in distance
- Board coverage: Points should span the full image area for good distortion estimation
Planar Intrinsics with Real Data
[COLLAB] This chapter requires user collaboration for dataset-specific details, data collection advice, and practical tips.
This chapter walks through the planar_real example, which calibrates a camera from real chessboard images.
Dataset
The example uses images from `data/stereo/imgs/leftcamera/`:
- Pattern: 7×11 chessboard, 30 mm square size
- Images: Multiple PNG files (`Im_L_*.png`)
- Camera: Standard industrial camera (specific model TBD)
Workflow
1. Corner Detection
The example uses the chess-corners crate for chessboard detection:
```rust
use chess_corners::{ChessboardParams, ChessConfig, detect_chessboard};

let board_params = ChessboardParams {
    rows: 7,
    cols: 11,
    square_size: 0.03, // 30 mm
};
```
Each image is processed independently. Failed detections are skipped — the workflow continues with the views that succeed.
2. Dataset Construction
Detected corners are assembled into CorrespondenceView structures with matched 3D board points:
```rust
let views: Vec<View<NoMeta>> = images
    .iter()
    .filter_map(|img| {
        let corners = detect_chessboard(img, &config)?;
        let obs = CorrespondenceView::new(board_3d.clone(), corners);
        Some(View::without_meta(obs))
    })
    .collect();

let dataset = PlanarDataset::new(views)?;
```
3. Calibration
The pipeline is identical to the synthetic case:
```rust
let mut session = CalibrationSession::<PlanarIntrinsicsProblem>::new();
session.set_input(dataset)?;

step_init(&mut session, None)?;
step_optimize(&mut session, None)?;

let export = session.export()?;
```
Interpreting Results
Key outputs to examine:
- Focal lengths (`fx`, `fy`): Should match the expected value based on sensor size and lens focal length (`f_px = f_mm / pixel pitch`; e.g., a 1/2" sensor with an 8 mm lens).
- Principal point (`cx`, `cy`): Should be near the image center. Large offsets may indicate a decentered lens.
- Distortion (`k1`, `k2`): Negative `k1` indicates barrel distortion (common). A large magnitude suggests a wide-angle lens.
- Reprojection error: <1 px is good. >2 px suggests problems with corner detection or insufficient view diversity.
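The focal-length sanity check can be made concrete with `f_px = f_mm / pixel pitch`. The sensor width and resolution below are illustrative assumptions, not the dataset's actual camera:

```rust
// Expected focal length in pixels from lens focal length and pixel pitch.
fn focal_px(focal_mm: f64, sensor_width_mm: f64, image_width_px: f64) -> f64 {
    let pixel_pitch_mm = sensor_width_mm / image_width_px;
    focal_mm / pixel_pitch_mm
}

fn main() {
    // Assumptions: a 1/2" sensor is roughly 6.4 mm wide; suppose 1280 px
    // across and an 8 mm lens. Then the pixel pitch is 5 um and
    // f_px = 8 mm / 5 um = 1600 px.
    let f = focal_px(8.0, 6.4, 1280.0);
    assert!((f - 1600.0).abs() < 1e-6);
    println!("expected fx ~ {f:.0} px");
}
```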
Comparison with Synthetic Data
| Aspect | Synthetic | Real |
|---|---|---|
| Corner accuracy | Exact (no detection noise) | ~0.1-0.5 px (detector dependent) |
| Distortion | Known ground truth | Unknown, estimated |
| View coverage | Controlled | May have gaps |
| Typical reprojection error | <0.01 px | 0.1-1.0 px |
Practical Tips
- Use 15-30 images with diverse viewpoints
- Cover the full image area — points only near the center poorly constrain distortion
- Include tilted views — not just frontal. Rotation around both axes constrains all intrinsic parameters
- Check corner ordering — mismatched 2D-3D correspondences cause calibration failure
- Start with `fix_k3: true`; only estimate `k3` if reprojection error is high and you suspect higher-order radial distortion
Running the Example
```sh
cargo run -p vision-calibration --example planar_real
```
The example prints per-step results including initialization accuracy, optimization convergence, and final calibrated parameters.
Single-Camera Hand-Eye Calibration
Hand-eye calibration estimates the rigid transform between a camera and a robot's gripper (or base). This workflow combines intrinsics calibration with hand-eye estimation in a 4-step pipeline.
Problem Formulation
Transformation Chain (Eye-in-Hand)
For a camera mounted on the robot gripper:
`cam_se3_target = cam_se3_gripper * gripper_se3_base * base_se3_target`
where:
- `cam_se3_gripper = gripper_se3_camera⁻¹`: camera-to-gripper (the calibrated hand-eye transform, inverted)
- `gripper_se3_base = base_se3_gripper⁻¹`: gripper-to-base (known from robot kinematics, inverted)
- `base_se3_target`: base-to-target (the target's pose in the robot base frame, also calibrated)
Equivalently: `base_se3_target = base_se3_gripper * gripper_se3_camera * cam_se3_target`.
Eye-to-Hand Variant
For a fixed camera observing a target on the gripper:
`cam_se3_target = cam_se3_base * base_se3_gripper * gripper_se3_target`
where `cam_se3_base` is the camera-to-base transform and `gripper_se3_target` is the target-to-gripper transform.
Optimization Objective
Minimize reprojection error across all views, jointly over all parameters:
`E = Σ_v Σ_j || π(K, d, cam_se3_target_v, p_j) - u_vj ||^2`
where `cam_se3_target_v` depends on the known robot pose for view `v`, the hand-eye transform, and the target pose.
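The eye-in-hand composition order can be sanity-checked with a minimal homogeneous-matrix sketch. All values are assumptions for illustration (translations only, identity rotations); the library uses proper SE(3) types instead:

```rust
// 4x4 homogeneous transforms; with identity rotations, composition
// simply adds the translations, which makes the chain easy to verify.
type Mat4 = [[f64; 4]; 4];

fn translation(t: [f64; 3]) -> Mat4 {
    let mut m = [[0.0; 4]; 4];
    for i in 0..4 {
        m[i][i] = 1.0;
    }
    for i in 0..3 {
        m[i][3] = t[i];
    }
    m
}

fn mul(a: &Mat4, b: &Mat4) -> Mat4 {
    let mut c = [[0.0; 4]; 4];
    for i in 0..4 {
        for j in 0..4 {
            for k in 0..4 {
                c[i][j] += a[i][k] * b[k][j];
            }
        }
    }
    c
}

fn main() {
    // Hypothetical setup: camera 10 cm above the gripper, robot 0.5 m
    // along -Y from base (inverted pose), target 1 m in front of the base.
    let cam_se3_gripper = translation([0.0, 0.0, 0.1]);
    let gripper_se3_base = translation([0.0, -0.5, 0.0]);
    let base_se3_target = translation([1.0, 0.0, 0.0]);

    // cam_se3_target = cam_se3_gripper * gripper_se3_base * base_se3_target
    let cam_se3_target = mul(&mul(&cam_se3_gripper, &gripper_se3_base), &base_se3_target);

    assert_eq!(cam_se3_target[0][3], 1.0);
    assert_eq!(cam_se3_target[1][3], -0.5);
    assert_eq!(cam_se3_target[2][3], 0.1);
}
```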
4-Step Pipeline
Step 1: Intrinsics Initialization (step_intrinsics_init)
Runs the Iterative Intrinsics solver to estimate the intrinsics and distortion from the calibration board observations, ignoring robot poses entirely.
Step 2: Intrinsics Optimization (step_intrinsics_optimize)
Runs planar intrinsics bundle adjustment to refine the intrinsics, distortion, and per-view camera-to-target poses. This gives accurate per-view poses needed for hand-eye initialization.
Step 3: Hand-Eye Initialization (step_handeye_init)
Uses the Tsai-Lenz linear method (see Hand-Eye Linear):
- Compute relative motions from robot poses and camera poses
- Filter pairs by rotation magnitude (reject small rotations)
- Estimate rotation via quaternion SVD
- Estimate translation via linear least squares
- Estimate the target pose `base_se3_target` from the hand-eye and camera poses
Step 4: Hand-Eye Optimization (step_handeye_optimize)
Joint non-linear optimization of all parameters:
Parameter blocks:
"cam"(4D, Euclidean): intrinsics"dist"(5D, Euclidean): distortion"handeye"(7D, SE3): gripper-to-camera transform"target"(7D, SE3): base-to-target transform
Residual blocks: ReprojPointPinhole4Dist5HandEye per observation.
The robot poses are not optimized — they are known constants from the robot kinematics (passed as per-residual data).
Optional: Robot Pose Refinement
If robot pose accuracy is suspect, the pipeline supports per-view SE(3) corrections:
Additional parameter blocks: "robot_delta/v" (7D, SE3) per view
Additional residuals: one `Se3TangentPrior` per view, with configurable rotation and translation sigmas (`robot_rot_sigma`, `robot_trans_sigma`).
This penalizes deviations from the nominal robot poses, allowing small corrections while preventing the optimizer from absorbing all error into robot pose changes.
Configuration
```rust
pub struct SingleCamHandeyeConfig {
    // Intrinsics init
    pub intrinsics_init_iterations: usize,
    pub fix_k3: bool,         // Fix k3 (default: true)
    pub fix_tangential: bool, // Fix p1, p2 (default: false)
    pub zero_skew: bool,      // Enforce zero skew (default: true)

    // Hand-eye
    pub handeye_mode: HandEyeMode, // EyeInHand or EyeToHand
    pub min_motion_angle_deg: f64, // Filter small rotations in Tsai-Lenz

    // Optimization
    pub max_iters: usize, // LM iterations (default: 100)
    pub verbosity: usize,
    pub robust_loss: RobustLoss,

    // Robot pose refinement
    pub refine_robot_poses: bool, // Enable per-view corrections
    pub robot_rot_sigma: f64,     // Rotation prior sigma (radians)
    pub robot_trans_sigma: f64,   // Translation prior sigma (meters)
}
```
Input Format
Each view is a View<HandeyeMeta> (aliased as SingleCamHandeyeView):
```rust
pub struct HandeyeMeta {
    pub base_se3_gripper: Iso3, // Robot pose (from kinematics)
}

// Type alias
pub type SingleCamHandeyeView = View<HandeyeMeta>;
// Where View<Meta> has fields: obs: CorrespondenceView, meta: Meta
```
Complete Example
```rust
use vision_calibration::prelude::*;
use vision_calibration::single_cam_handeye::*;

let views: Vec<SingleCamHandeyeView> = /* load data */;
let input = SingleCamHandeyeInput::new(views)?;

let mut session = CalibrationSession::<SingleCamHandeyeProblem>::new();
session.set_input(input)?;

step_intrinsics_init(&mut session, None)?;
step_intrinsics_optimize(&mut session, None)?;
step_handeye_init(&mut session, None)?;
step_handeye_optimize(&mut session, None)?;

let export = session.export()?;
println!("Hand-eye: {:?}", export.gripper_se3_camera);
println!("Target in base: {:?}", export.base_se3_target);
println!("Reprojection error: {:.4} px", export.mean_reproj_error);
```
Practical Considerations
- Rotation diversity is critical for Tsai-Lenz initialization. Include rotations around at least 2 axes.
- 5-10 views minimum, with 10-20 recommended for robustness.
- Robot accuracy: If the robot's reported poses have errors > 1mm or > 0.1°, enable robot pose refinement.
- Initialization failure: If Tsai-Lenz produces a poor estimate (large reprojection error after optimization), the most likely cause is insufficient rotation diversity.
OpenCV equivalence: `cv::calibrateHandEye` provides the linear initialization step. calibration-rs adds joint non-linear refinement and optional robot pose correction.
Hand-Eye with KUKA Robot
[COLLAB] This chapter requires user collaboration for KUKA dataset details, data collection protocol, and practical tips.
This chapter walks through the handeye_session example, which performs hand-eye calibration using a real KUKA robot dataset.
Dataset
Located at `data/kuka_1/`:
- Images: 30 PNG files (`01.png` through `30.png`)
- Robot poses: `RobotPosesVec.txt` (30 poses in matrix format)
- Calibration board: 17×28 chessboard, 20 mm square size
Data Loading
The example loads images and robot poses, detects chessboard corners, and constructs the input:
```rust
// Load robot poses from 4×4 matrix format
let robot_poses: Vec<Iso3> = load_robot_poses("data/kuka_1/RobotPosesVec.txt")?;

// Detect corners per image
let views: Vec<SingleCamHandeyeView> = images
    .iter()
    .zip(&robot_poses)
    .filter_map(|(img, pose)| {
        let corners = detect_chessboard(img, &config)?;
        Some(SingleCamHandeyeView {
            obs: CorrespondenceView::new(board_3d.clone(), corners),
            meta: HandeyeMeta { base_se3_gripper: *pose },
        })
    })
    .collect();
```
Typically 20-25 of the 30 images yield successful corner detections.
Running the Example
```sh
cargo run -p vision-calibration --example handeye_session
```
The example reports:
- Dataset summary (total images, used views, skipped views)
- Per-step results (initialization, optimization)
- Final calibrated parameters (intrinsics, distortion, hand-eye transform, target pose)
- Robot pose refinement deltas (if enabled)
Interpreting Results
Key outputs:
- Hand-eye transform (`gripper_se3_camera`): Translation magnitude should match the physical camera-to-gripper distance. Rotation should reflect the mounting orientation.
- Target-in-base (`base_se3_target`): The calibration board's position in the robot base frame. Verify against the known physical setup.
- Reprojection error: <1 px indicates a good calibration. >3 px suggests problems.
- Robot pose deltas: If enabled, large deltas (>1° rotation, >5mm translation) suggest robot kinematic inaccuracies.
Common Failure Modes
- Insufficient rotation diversity: All robot poses rotate around the same axis (e.g., only wrist rotation). The Tsai-Lenz initialization will fail or produce a poor estimate.
- Incorrect pose convention: Robot poses must be `base_se3_gripper` (base-to-gripper). If the convention is inverted, the calibration will diverge.
- Mismatched corner ordering: If the chessboard detector assigns corners in a different order than expected, the 2D-3D correspondences are wrong.
- Robot pose timestamps: If images and robot poses are not synchronized, the calibration will fail.
Data Collection Recommendations
- Rotation diversity: Include poses with significant roll, pitch, and yaw. Avoid pure translations or rotations around a single axis.
- Target visibility: Ensure the calibration board is fully visible in all images. Partial visibility causes corner detection failure.
- Stable poses: Take images when the robot is stationary. Motion blur degrades corner detection.
- Coverage: Vary the distance to the board and the position within the image.
Multi-Camera Rig Extrinsics
A multi-camera rig is a set of cameras rigidly attached to a common frame. Rig extrinsics calibration estimates the relative pose of each camera within the rig, in addition to per-camera intrinsics and distortion.
Problem Formulation
Parameters
- Per-camera intrinsics: `K_k` (one per camera)
- Per-camera distortion: `d_k` (one per camera)
- Per-camera extrinsics: `cam_se3_rig_k`, camera-to-rig transforms (one per camera, reference camera = identity)
- Per-view rig poses: `rig_se3_target_v`, rig-to-target transforms (one per view)
Transform Chain
For camera `k` in view `v`: `p_c = cam_se3_target * p_w`,
where `cam_se3_target = cam_se3_rig_k * rig_se3_target_v`.
Objective
Minimize the total reprojection error over all cameras, views, and points: `E = Σ_k Σ_v Σ_j || π_k(cam_se3_rig_k * rig_se3_target_v * p_j) - u_kvj ||^2`. This jointly optimizes all cameras' parameters and all rig geometry. Each observation depends on two SE(3) transforms composed together (the `ReprojPointPinhole4Dist5TwoSE3` factor).
4-Step Pipeline
Step 1: Per-Camera Intrinsics Initialization (step_intrinsics_init_all)
For each camera independently:
- Compute homographies from the views where that camera has observations
- Run iterative intrinsics estimation (Zhang + distortion fit)
- Recover per-camera, per-view poses
Step 2: Per-Camera Intrinsics Optimization (step_intrinsics_optimize_all)
For each camera independently:
- Run planar intrinsics bundle adjustment
- Refine the intrinsics, distortion, and per-view poses
Step 3: Rig Extrinsics Initialization (step_rig_init)
Uses the linear initialization from Multi-Camera Rig Initialization:
- Define the rig frame by the reference camera (its extrinsics are identity)
- For each non-reference camera, compute `cam_se3_rig_k` by averaging its per-view pose relative to the reference camera across views
- Compute rig-to-target poses from the reference camera's poses
Step 4: Rig Bundle Adjustment (step_rig_optimize)
Joint optimization of all parameters:
Parameter blocks:
"cam/k"(4D): per-camera intrinsics"dist/k"(5D): per-camera distortion"extrinsics/k"(7D, SE3): per-camera camera-to-rig transforms"rig_pose/v"(7D, SE3): per-view rig-to-target poses
Residual blocks: ReprojPointPinhole4Dist5TwoSE3 per observation.
Fixed parameters: The reference camera's extrinsics are fixed at identity (gauge freedom).
Configuration
```rust
pub struct RigExtrinsicsConfig {
    // Per-camera intrinsics options
    pub intrinsics_init_iterations: usize,
    pub fix_k3: bool,
    pub fix_tangential: bool,
    pub zero_skew: bool,

    // Rig settings
    pub reference_camera_idx: usize,

    // Optimization
    pub max_iters: usize,
    pub verbosity: usize,
    pub robust_loss: RobustLoss,

    // Rig BA options
    pub refine_intrinsics_in_rig_ba: bool,
    pub fix_first_rig_pose: bool,
}
```
Input Format
```rust
pub struct RigDataset<Meta> {
    pub num_cameras: usize,
    pub views: Vec<RigView<Meta>>,
}

pub struct RigView<Meta> {
    pub obs: RigViewObs, // Per-camera observations (Option per camera)
    pub meta: Meta,
}
```
Each view has an Option<CorrespondenceView> per camera — cameras that don't see the target in a given view have None.
Per-Camera Reprojection Error
After optimization, per-camera reprojection errors are computed independently:
```rust
let export = session.export()?;
for (cam_idx, error) in export.per_cam_reproj_errors.iter().enumerate() {
    println!("Camera {}: {:.4} px", cam_idx, error);
}
println!("Overall: {:.4} px", export.mean_reproj_error);
```
Stereo Baseline
For a 2-camera rig, the baseline (distance between camera centers) is the translation norm of camera 1's extrinsics: `baseline = || t(cam_se3_rig_1) ||`,
since `cam_se3_rig_0 = identity` (camera 0 is the reference).
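A minimal sketch of the baseline computation; the translation values are hypothetical, not from any dataset:

```rust
// Baseline = Euclidean norm of camera 1's camera-to-rig translation
// (camera 0 is the reference, so its extrinsics are identity).
fn baseline(t: [f64; 3]) -> f64 {
    (t[0] * t[0] + t[1] * t[1] + t[2] * t[2]).sqrt()
}

fn main() {
    // Hypothetical 12 cm horizontal baseline.
    let b = baseline([0.12, 0.0, 0.0]);
    assert!((b - 0.12).abs() < 1e-12);
    println!("baseline = {:.1} mm", b * 1000.0);
}
```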
Complete Example
```rust
use vision_calibration::prelude::*;
use vision_calibration::rig_extrinsics::*;

let mut session = CalibrationSession::<RigExtrinsicsProblem>::new();
session.set_input(rig_dataset)?;

step_intrinsics_init_all(&mut session, None)?;
step_intrinsics_optimize_all(&mut session, None)?;
step_rig_init(&mut session)?;
step_rig_optimize(&mut session, None)?;

let export = session.export()?;
let baseline = export.cam_se3_rig[1].translation.vector.norm();
println!("Stereo baseline: {:.1} mm", baseline * 1000.0);
```
OpenCV equivalence: `cv::stereoCalibrate` for 2-camera rigs. calibration-rs generalizes to N cameras.
Stereo Rig with Real Data
[COLLAB] This chapter requires user collaboration for dataset-specific details and verification strategies.
This chapter walks through the stereo_session example, which calibrates a stereo camera rig from synchronized left/right image pairs.
Dataset
Located at `data/stereo/imgs/`:
- Left camera: `leftcamera/Im_L_*.png`
- Right camera: `rightcamera/Im_R_*.png`
- Pattern: 7×11 chessboard, 30 mm square size
- Synchronized: Left and right images are captured simultaneously
Workflow
1. Load Stereo Pairs
The example loads images from both camera directories and pairs them by filename:
```rust
let board = ChessboardParams { rows: 7, cols: 11, square_size: 0.03 };

let views: Vec<RigView<NoMeta>> = image_pairs
    .iter()
    .filter_map(|(left_img, right_img)| {
        let left_corners = detect_chessboard(left_img, &config)?;
        let right_corners = detect_chessboard(right_img, &config)?;
        // Both cameras must detect corners for a valid view
        Some(RigView::new(vec![
            Some(CorrespondenceView::new(board_3d.clone(), left_corners)),
            Some(CorrespondenceView::new(board_3d.clone(), right_corners)),
        ]))
    })
    .collect();
```
2. Run 4-Step Pipeline
```rust
let mut session = CalibrationSession::<RigExtrinsicsProblem>::new();
session.set_input(RigDataset::new(views, 2)?)?;

step_intrinsics_init_all(&mut session, None)?;
step_intrinsics_optimize_all(&mut session, None)?;
step_rig_init(&mut session)?;
step_rig_optimize(&mut session, None)?;
```
Interpreting Results
Per-Camera Intrinsics
Both cameras should have similar focal lengths (if they use the same lens). Differences in principal point are normal.
Stereo Baseline
The extrinsics give the relative pose between cameras. For a horizontal stereo pair, expect:
- Rotation: primarily around the Y axis (converging or parallel configuration)
- Translation: primarily along X (horizontal baseline)
- Baseline magnitude: should match the physical distance between cameras
Verification
- Epipolar constraint: For correctly calibrated stereo, corresponding points should lie on epipolar lines. Measure the average distance from points to their epipolar lines.
- Rectification: The stereo pair should rectify cleanly — horizontal scanlines in the rectified images should correspond.
- Triangulation: Triangulate known board corners and verify the 3D distances match the board geometry.
Running the Example
```sh
cargo run -p vision-calibration --example stereo_session
cargo run -p vision-calibration --example stereo_session -- --max-views 15
```
The --max-views flag limits the number of image pairs processed, useful for faster iteration during development.
Data Collection Tips
- Both cameras must see the board in each view. Discard pairs where only one camera detects corners.
- Vary board orientation and distance — same advice as single-camera calibration.
- Ensure synchronization — if images are not captured simultaneously, the board may have moved between left and right captures.
- Overlap region — the board should be in the overlapping field of view of both cameras.
Stereo Rig with Real ChArUco Data
This chapter walks through the stereo_charuco_session example, which calibrates a stereo camera rig from synchronized ChArUco board images.
Dataset
The example uses data/stereo_charuco/:
- `cam1/Cam1_*.png` for camera 0
- `cam2/Cam2_*.png` for camera 1
- Pairing by shared filename suffix (deterministic sorted intersection)
The loader ignores non-PNG files (for example Thumbs.db).
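The pairing rule can be sketched as a sorted set intersection. The helper below is a hypothetical stand-in (the real loader's prefix handling may differ): it pairs names by the shared part after the first underscore, and the `BTreeSet` intersection guarantees a deterministic, sorted result.

```rust
use std::collections::BTreeSet;

// Illustrative sketch of the pairing rule (hypothetical helper, not the
// library's loader): match files from the two camera folders by the shared
// filename part after the "CamN_" prefix, using a sorted-set intersection
// so the pairing is deterministic.
fn pair_by_suffix(cam1: &[&str], cam2: &[&str]) -> Vec<(String, String)> {
    let suffix = |name: &str| name.splitn(2, '_').nth(1).map(str::to_string);
    let s1: BTreeSet<String> = cam1.iter().copied().filter_map(suffix).collect();
    let s2: BTreeSet<String> = cam2.iter().copied().filter_map(suffix).collect();
    // BTreeSet intersection iterates in sorted order
    s1.intersection(&s2)
        .map(|s| (format!("Cam1_{s}"), format!("Cam2_{s}")))
        .collect()
}
```

Files without a match in the other folder simply drop out of the intersection, which mirrors the advice to discard views seen by only one camera.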
ChArUco Target Parameters
- Board squares: 22 x 22
- Cell size: 0.45 mm (0.00045 m)
- Marker size scale: 0.75
- Dictionary: `DICT_4X4_1000`
- Layout: OpenCV ChArUco (`OpenCvCharuco`)
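As a quick sanity check on these numbers (illustrative arithmetic, not a library call), the physical board extent and the ArUco marker side follow directly from the cell size and marker scale:

```rust
// Illustrative sanity arithmetic for the target parameters above
// (not a library call). Units are meters.
fn board_dims(squares: u32, cell_m: f64, marker_scale: f64) -> (f64, f64) {
    let board_side = f64::from(squares) * cell_m; // full board extent
    let marker_side = marker_scale * cell_m;      // ArUco marker side
    (board_side, marker_side)
}
```

For the values above this gives a 9.9 mm board side and a 0.3375 mm marker side, which should match the physical target.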
Detector Tuning
For this 2048x1536 dataset, the example increases ChArUco graph spacing:
```rust
params.graph.max_spacing_pix = 120.0;
```
Without this, the default graph spacing may be too strict for this board scale in image space.
Workflow
The example follows the standard rig extrinsics 4-step session pipeline:
1. `step_intrinsics_init_all`
2. `step_intrinsics_optimize_all`
3. `step_rig_init`
4. `step_rig_optimize`
It also runs run_calibration as a convenience-path comparison.
Running the Example
```bash
cargo run -p vision-calibration --example stereo_charuco_session
cargo run -p vision-calibration --example stereo_charuco_session -- --max-views 8
cargo run -p vision-calibration --example stereo_charuco_session -- --max-views=15
```
- Default: uses all detected stereo pairs
- Optional: `--max-views` caps the number of pairs used (deterministic prefix of the sorted pairs)
Output Interpretation
The example prints:
- Dataset summary (`total_pairs`, `used_views`, `skipped_views`, `usable_left`, `usable_right`)
- Per-camera intrinsics and per-camera reprojection errors
- Rig BA mean reprojection error and per-camera BA errors
- Baseline magnitude in meters and millimeters
As with any real-data calibration, verify that:
- Both cameras have enough usable views (at least 3 each)
- Reprojection errors are within your application tolerance
- Estimated baseline is physically plausible for the hardware setup
Multi-Camera Rig Hand-Eye
This is the most complex calibration workflow: a multi-camera rig mounted on a robot arm. It combines per-camera intrinsics calibration, rig extrinsics estimation, and hand-eye calibration in a 6-step pipeline.
Problem Formulation
Transformation Chain
For camera `k` in view `i` with robot pose `base_se3_gripper_i`, the pose of the target in the camera frame composes as (using the library's `a_se3_b` naming, where `a_se3_b` takes points from frame `b` to frame `a`):

```
cam_k_se3_target_i = cam_k_se3_rig
                   · rig_se3_gripper
                   · gripper_se3_base_i
                   · base_se3_target
```

where:
- `cam_k_se3_rig`: camera to rig (rig extrinsics)
- `rig_se3_gripper`: rig to gripper (hand-eye transform)
- `gripper_se3_base_i`: base to gripper (known robot pose for view `i`)
- `base_se3_target`: base to target (calibrated)
Parameters
- Per-camera intrinsics: `fx, fy, cx, cy` (4 scalar parameters per camera)
- Per-camera distortion: `k1, k2, p1, p2, k3` (5 scalar parameters per camera)
- Per-camera extrinsics: camera-to-rig transforms (6 DOF each, reference camera = identity)
- Hand-eye: `gripper_se3_rig` (6 DOF)
- Target pose: `base_se3_target` (6 DOF)
- Optionally: per-view robot corrections (6 DOF per view, regularized)
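To make the parameter count concrete, here is a back-of-envelope helper (illustrative only, not a library function) that totals the degrees of freedom from the block sizes listed above:

```rust
// Back-of-envelope degrees-of-freedom count for a rig hand-eye problem,
// following the block sizes listed above (4 intrinsics + 5 distortion per
// camera, 6 DOF per non-reference camera, 6 hand-eye, 6 target pose).
// Illustrative helper, not part of the library API.
fn rig_handeye_dof(num_cams: usize, num_views: usize, robot_corrections: bool) -> usize {
    let per_cam = 4 + 5;                 // intrinsics + distortion
    let extrinsics = 6 * (num_cams - 1); // reference camera fixed at identity
    let mut dof = per_cam * num_cams + extrinsics + 6 + 6; // + hand-eye + target
    if robot_corrections {
        dof += 6 * num_views; // optional per-view robot pose corrections
    }
    dof
}
```

For a 2-camera rig with 10 views this gives 36 free parameters, or 96 with per-view robot corrections enabled.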
6-Step Pipeline
Steps 1-2: Per-Camera Intrinsics
Same as Rig Extrinsics: initialize and optimize each camera's intrinsics independently.
Steps 3-4: Rig Extrinsics
Same as Rig Extrinsics: initialize camera-to-rig transforms via SE(3) averaging, then jointly optimize the rig geometry.
Step 5: Hand-Eye Initialization (step_handeye_init)
Uses Tsai-Lenz with the rig's reference camera poses and robot poses:
1. Extract relative camera motions from the rig-to-target poses
2. Extract relative robot motions from the base-to-gripper poses
3. Solve the resulting `AX = XB` system for `gripper_se3_rig` (the hand-eye transform)
4. Estimate the target pose in the base frame, `base_se3_target`
Step 6: Hand-Eye Optimization (step_handeye_optimize)
Joint optimization of all parameters:
Parameter blocks:
"cam/k"(4D): per-camera intrinsics"dist/k"(5D): per-camera distortion"extrinsics/k"(7D, SE3): per-camera camera-to-rig"handeye"(7D, SE3): rig-to-gripper"target"(7D, SE3): base-to-target
Factor: ReprojPointPinhole4Dist5HandEye per observation, which composes the full transform chain.
Complete Example
```rust
use vision_calibration::prelude::*;
use vision_calibration::rig_handeye::*;

let mut session = CalibrationSession::<RigHandeyeProblem>::new();
session.set_input(rig_dataset_with_robot_poses)?;

// Per-camera calibration
step_intrinsics_init_all(&mut session, None)?;
step_intrinsics_optimize_all(&mut session, None)?;

// Rig geometry
step_rig_init(&mut session)?;
step_rig_optimize(&mut session, None)?;

// Hand-eye
step_handeye_init(&mut session, None)?;
step_handeye_optimize(&mut session, None)?;

let export = session.export()?;
println!("Mode: {:?}", export.handeye_mode);
if let Some(gripper_se3_rig) = export.gripper_se3_rig {
    println!("gripper_se3_rig: {:?}", gripper_se3_rig);
}
println!(
    "Baseline: {:.1} mm",
    export.cam_se3_rig[1].translation.vector.norm() * 1000.0
);
println!("Per-camera errors: {:?}", export.per_cam_reproj_errors);
```
Gauge Freedom
The system has a gauge freedom: the rig frame origin and the hand-eye transform are coupled. Fixing the reference camera's extrinsics at identity resolves this by defining the rig frame to coincide with camera 0.
Practical Considerations
All the advice from Single-Camera Hand-Eye applies, plus:
- All cameras must observe the target in at least some views for the rig extrinsics to be well-constrained
- Views where only some cameras see the target are handled (missing observations are skipped)
- The hand-eye transform describes gripper-to-rig, not gripper-to-individual-camera. The per-camera offset comes from the rig extrinsics.
Laserline Device Calibration
Laserline calibration jointly estimates camera parameters and a laser plane from observations of both a calibration board and laser line projections on the board. This is used in laser triangulation systems where a camera and laser are rigidly mounted together.
Problem Formulation
Parameters
- Camera intrinsics: `fx, fy, cx, cy`
- Distortion: `k1, k2, p1, p2, k3`
- Sensor tilt: Scheimpflug sensor parameters (optional, for Scheimpflug cameras)
- Per-view poses: camera-to-target SE(3) transforms
- Laser plane: normal `n` and distance `d`, with plane points `p` satisfying `n · p = d`
Observations
Each view provides two types of observations:
- Calibration points: 2D-3D correspondences from the chessboard (same as planar intrinsics)
- Laser pixels: 2D pixel positions where the laser line appears on the target
Objective
The cost function has two terms, one per observation type:

```
C = w_c · Σ ρ_c(r_calib) + w_l · Σ ρ_l(r_laser)
```

where `w_c`, `w_l` are the calibration and laser weights, `ρ_c`, `ρ_l` are (possibly different) robust loss functions, `r_calib` is the standard reprojection residual, and `r_laser` is the laser residual.
Laser Residual Types
Two approaches are implemented for the laser residual, selectable via configuration.
PointToPlane (3D Distance)
Algorithm:
1. Undistort the laser pixel to normalized coordinates
2. Back-project to a ray in the camera frame
3. Intersect the ray with the target plane (the board plane, transformed into the camera frame by the view pose) to get a 3D point `P`
4. Compute the signed distance from `P` to the laser plane: `r = n · P - d`
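These steps can be sketched numerically. The function below is a stand-alone illustration with plain arrays (the library's types and API differ, and distortion handling is omitted); planes are written as `n · p = d` in the camera frame and `ray` is the back-projected direction `(x, y, 1)` of an undistorted laser pixel.

```rust
// Illustrative stand-alone sketch of the PointToPlane residual (plain
// arrays instead of the library's types; distortion handling omitted).
fn dot(a: [f64; 3], b: [f64; 3]) -> f64 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

fn point_to_plane_residual(
    ray: [f64; 3],
    (n_t, d_t): ([f64; 3], f64), // target (board) plane in camera frame
    (n_l, d_l): ([f64; 3], f64), // laser plane in camera frame
) -> f64 {
    // Ray-plane intersection: choose t so that n_t . (t * ray) = d_t
    let t = d_t / dot(n_t, ray);
    let p = [t * ray[0], t * ray[1], t * ray[2]];
    // Signed distance (meters) from the intersection point to the laser plane
    dot(n_l, p) - d_l
}
```

Note that the intersection depends on the per-view pose through the target plane, which is why this variant re-runs undistortion and intersection inside the optimizer loop.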
Residual dimension: 1 (meters)
LineDistNormalized (2D Line Distance) — Default
Algorithm:
1. Compute the 3D intersection line of the laser plane and the target plane (in the camera frame)
2. Project this 3D line onto the normalized camera plane
3. Undistort the laser pixel to normalized coordinates (done once, not per iteration)
4. Measure the perpendicular distance from the undistorted pixel to the projected line
5. Scale by the focal length `f` for pixel-comparable units: `r = f · d_perp`
Residual dimension: 1 (effective pixels)
Comparison
| Property | PointToPlane | LineDistNormalized |
|---|---|---|
| Residual units | meters | pixels |
| Undistortion | Per-iteration | Once per pixel |
| Geometry | Ray-plane intersection | 2D line distance |
| Speed | Slower | Faster |
| Recommended | Alternative | Default |
Both approaches yield similar accuracy in practice (<6% intrinsics error, <5° plane normal error in synthetic tests).
Derivation: Line-Distance Residual
Plane-Plane Intersection Line
The laser plane `(n_l, d_l)` and the target plane `(n_t, d_t)` (normal and camera-frame distance, derived from the view pose) intersect in a 3D line with direction

```
v = n_l × n_t
```

and a support point `p0` lying on both planes, which can be solved as a linear combination `p0 = a·n_l + b·n_t` from the two constraints `n_l · p0 = d_l` and `n_t · p0 = d_t`.
Projection onto Normalized Plane
Project the 3D line onto the `z = 1` normalized plane. The viewing rays through the camera center and the 3D line span a plane with normal `l = p0 × v`, so the projected 2D line is the set of normalized points `(x, y)` satisfying:

```
l_x · x + l_y · y + l_z = 0
```
Perpendicular Distance
For an undistorted pixel `(u, v)` in normalized coordinates, the perpendicular distance to this 2D line is:

```
d_perp = |l_x · u + l_y · v + l_z| / sqrt(l_x^2 + l_y^2)
```
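The whole derivation can be condensed into one stand-alone numeric sketch (plain arrays for illustration; the library's types and the focal-length scaling are omitted):

```rust
// Illustrative stand-alone sketch of the LineDistNormalized chain (plain
// arrays instead of the library's types). Planes are n . p = d in the
// camera frame; (u, v) is the undistorted pixel in normalized coordinates.
fn cross(a: [f64; 3], b: [f64; 3]) -> [f64; 3] {
    [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]
}

fn dot(a: [f64; 3], b: [f64; 3]) -> f64 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

fn line_dist_normalized(
    (n_l, d_l): ([f64; 3], f64), // laser plane
    (n_t, d_t): ([f64; 3], f64), // target plane
    (u, v): (f64, f64),          // undistorted normalized pixel
) -> f64 {
    // Direction of the 3D plane-plane intersection line
    let dir = cross(n_l, n_t);
    // Support point p0 on both planes, written as p0 = a*n_l + b*n_t
    let (nll, ntt, nlt) = (dot(n_l, n_l), dot(n_t, n_t), dot(n_l, n_t));
    let det = nll * ntt - nlt * nlt;
    let (a, b) = ((d_l * ntt - d_t * nlt) / det, (d_t * nll - d_l * nlt) / det);
    let p0 = [
        a * n_l[0] + b * n_t[0],
        a * n_l[1] + b * n_t[1],
        a * n_l[2] + b * n_t[2],
    ];
    // Image of the line on the z = 1 plane: l . (u, v, 1) = 0 with l = p0 x dir
    let l = cross(p0, dir);
    (l[0] * u + l[1] * v + l[2]).abs() / (l[0] * l[0] + l[1] * l[1]).sqrt()
}
```

Since `dir`, `p0`, and `l` depend only on the two planes (not on the pixel), they can be computed once per view inside the optimizer, which is where this variant's speed advantage comes from.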
Configuration
```rust
pub struct LaserlineDeviceConfig {
    // Initialization
    pub init_iterations: usize,          // Iterative intrinsics iterations (default: 2)
    pub fix_k3_in_init: bool,            // Fix k3 during init (default: true)
    pub fix_tangential_in_init: bool,    // Fix p1, p2 during init (default: false)
    pub zero_skew: bool,                 // Enforce zero skew (default: true)
    pub sensor_init: ScheimpflugParams,  // Initial sensor tilt (default: identity)

    // Optimization
    pub max_iters: usize,                // LM iterations (default: 50)
    pub verbosity: usize,
    pub calib_loss: RobustLoss,          // Default: Huber { scale: 1.0 }
    pub laser_loss: RobustLoss,          // Default: Huber { scale: 0.01 }
    pub calib_weight: f64,               // Weight for calibration residuals (default: 1.0)
    pub laser_weight: f64,               // Weight for laser residuals (default: 1.0)
    pub fix_intrinsics: bool,
    pub fix_distortion: bool,
    pub fix_k3: bool,                    // Default: true
    pub fix_sensor: bool,                // Default: true
    pub fix_poses: Vec<usize>,           // Default: vec![0]
    pub fix_plane: bool,
    pub laser_residual_type: LaserlineResidualType, // Default: LineDistNormalized
}
```
Weight Balancing
Since calibration residuals (in pixels) and laser residuals may have different scales, the weights allow balancing their relative influence. A common starting point is calib_weight = 1.0 and laser_weight = 1.0, adjusting if one term dominates.
Complete Example
```rust
use vision_calibration::prelude::*;
use vision_calibration::laserline_device::*;

let mut session = CalibrationSession::<LaserlineDeviceProblem>::new();
session.set_input(laserline_input)?;

run_calibration(&mut session, None)?;

let export = session.export()?;
println!("Plane normal: {:?}", export.estimate.params.plane.normal);
println!("Plane distance: {:.4}", export.estimate.params.plane.distance);
println!("Reprojection error: {:.4} px", export.mean_reproj_error);
println!("Laser error: {:.4}", export.stats.mean_laser_error);
```
Scheimpflug Support
For laser profilers with tilted sensors, the sensor model parameters are jointly optimized. The ReprojPointPinhole4Dist5Scheimpflug2 factor handles the extended camera model.
Laserline with Industrial Data
[COLLAB] This chapter requires user collaboration for industrial application context, data collection protocols, and accuracy requirements.
This chapter discusses practical considerations for laser triangulation calibration in industrial settings.
Industrial Context
Laser triangulation is widely used in industrial inspection:
- Weld seam inspection: Verifying weld bead geometry (width, height, penetration)
- 3D surface profiling: Measuring surface topology for quality control
- Gap and flush measurement: Checking assembly tolerances in automotive manufacturing
- Tire and rubber inspection: Measuring tread depth and sidewall profiles
In each case, a laser line is projected onto the workpiece and a camera observes the deformed line. The calibrated laser plane, combined with camera parameters, allows converting pixel positions to metric 3D coordinates.
Accuracy Requirements
| Application | Typical accuracy needed |
|---|---|
| Weld inspection | 0.1-0.5 mm |
| Surface profiling | 0.01-0.1 mm |
| Gap/flush | 0.05-0.2 mm |
| Coarse 3D scanning | 0.5-2 mm |
Data Collection for Laser Calibration
Calibration Board Setup
The calibration board serves a dual purpose:
- Provides chessboard corners for camera intrinsics calibration
- Provides a known plane to constrain the laser line observations
Capturing Views
For each view:
- Position the calibration board at a known angle to the laser plane
- Capture an image with both the chessboard pattern and the laser line visible
- Record chessboard corners and laser line pixel positions separately
View Diversity
- Board angles: Vary the board tilt so the laser-board intersection line covers different positions in the image
- Board distances: Vary the distance to cover the working range
- Minimum views: 5-10 views with diverse orientations
The laserline_device_session Example
```bash
cargo run -p vision-calibration --example laserline_device_session
```
This example uses synthetic data to demonstrate the workflow. For real data, the laser pixel extraction must be performed by an external algorithm (typically a peak detector applied to each image column).
Interpreting Results
Key outputs:
- Laser plane normal: The direction the laser projects in. Should align with the physical mounting.
- Laser plane distance: Distance from camera center to the laser plane. Should match the physical geometry.
- Mean laser error: The average residual of the laser observations, in pixel units for `LineDistNormalized` and in meters for `PointToPlane`.
DS8 Dataset
The data/DS8/ directory contains a real industrial laser triangulation dataset:
- `ExperimentDetails.txt`: Experiment metadata
- `calibration_object.txt`: Calibration pattern specification
- `images/`: Captured images
- `robot_cali.txt`: Robot calibration data (if applicable)
Troubleshooting
- Poor laser plane estimate: Usually caused by insufficient view diversity or laser pixels too close to collinear (all from similar board angles)
- High laser residuals: May indicate laser pixel detection errors, or that the laser is not well-described by a single plane (diverging laser, curved projection)
- Scheimpflug convergence: If the sensor tilt parameters diverge, try running with `fix_sensor: true` for the first few iterations, then release it
CalibrationSession
CalibrationSession<P: ProblemType> is the central state container for calibration workflows. It holds all data for a calibration run — configuration, input observations, intermediate state, final output, and an audit log — with full JSON serialization for checkpointing.
Structure
```rust
pub struct CalibrationSession<P: ProblemType> {
    pub metadata: SessionMetadata,
    pub config: P::Config,
    pub state: P::State,
    pub exports: Vec<ExportRecord<P::Export>>,
    pub log: Vec<LogEntry>,
    // input and output are private; access via methods
}
```
| Field | Access | Purpose |
|---|---|---|
| `metadata` | `pub` | Problem type name, schema version, timestamps |
| `config` | `pub` | Algorithm parameters (iterations, fix masks, loss functions) |
| `input` | `input()`, `set_input()` | Observation data (set once per calibration run) |
| `state` | `pub` | Mutable intermediate results (modified by step functions) |
| `output` | `output()`, `set_output()` | Final calibration result (set by the last step) |
| `exports` | `pub` | Timestamped export records |
| `log` | `pub` | Audit trail of operations |
Lifecycle
1. Create
```rust
let mut session = CalibrationSession::<PlanarIntrinsicsProblem>::new();

// Or with a description:
let mut session = CalibrationSession::<PlanarIntrinsicsProblem>::with_description(
    "Lab camera calibration 2024-01-15",
);
```
2. Set Input
```rust
session.set_input(dataset)?;
```
Input is validated via ProblemType::validate_input(). Setting input clears computed state (per the invalidation policy).
3. Configure
```rust
session.update_config(|c| {
    c.max_iters = 50;
    c.robust_loss = RobustLoss::Huber { scale: 2.0 };
})?;
```
Configuration is validated via ProblemType::validate_config().
4. Run Steps
```rust
step_init(&mut session, None)?;
step_optimize(&mut session, None)?;
```
Step functions are free functions operating on &mut CalibrationSession<P>. Each step reads input/state, performs computation, and updates state (or output).
5. Export
```rust
let export = session.export()?;
```
Creates an ExportRecord with the current timestamp and output. Multiple exports can be created (e.g., after re-optimization with different settings).
JSON Checkpointing
The entire session can be serialized and restored:
```rust
// Save
let json = session.to_json()?;
std::fs::write("calibration.json", &json)?;

// Restore
let json = std::fs::read_to_string("calibration.json")?;
let mut restored = CalibrationSession::<PlanarIntrinsicsProblem>::from_json(&json)?;

// Resume from where we left off
step_optimize(&mut restored, None)?;
```
This enables:
- Interruption recovery: Save progress and resume later
- Reproducibility: Share exact calibration state
- Debugging: Inspect the session at any point
Invalidation Policies
When input or configuration changes, computed state may need to be cleared:
| Event | Default policy |
|---|---|
| `set_input()` | Clear state and output (`CLEAR_COMPUTED`) |
| `update_config()` | Keep everything (`KEEP_ALL`) |
| `clear_input()` | Clear everything (`CLEAR_ALL`) |
These defaults can be overridden per problem type.
Audit Log
Every step function appends to the session log:
```rust
pub struct LogEntry {
    pub timestamp: u64,
    pub operation: String,
    pub success: bool,
    pub notes: Option<String>,
}
```
The log records what was done and when, useful for tracking calibration history.
Accessing State
```rust
// Input (required before steps)
let input = session.require_input()?; // Returns an error if no input is set

// Intermediate state (available after init)
if let Some(k) = &session.state.initial_intrinsics {
    println!("Init fx={:.1}", k.fx);
}

// Output (available after optimize)
let output = session.require_output()?;
```
ProblemType Trait
The ProblemType trait defines the interface for a calibration problem. It specifies the types for configuration, input, state, output, and export, along with validation and lifecycle hooks. The trait is intentionally minimal — behavior is implemented in external step functions, not trait methods.
Definition
```rust
pub trait ProblemType: Sized + 'static {
    type Config: Clone + Default + Serialize + DeserializeOwned + Debug;
    type Input: Clone + Serialize + DeserializeOwned + Debug;
    type State: Clone + Default + Serialize + DeserializeOwned + Debug;
    type Output: Clone + Serialize + DeserializeOwned + Debug;
    type Export: Clone + Serialize + DeserializeOwned + Debug;

    fn name() -> &'static str;
    fn schema_version() -> u32 { 1 }

    fn validate_input(input: &Self::Input) -> Result<()> { Ok(()) }
    fn validate_config(config: &Self::Config) -> Result<()> { Ok(()) }
    fn validate_input_config(
        input: &Self::Input,
        config: &Self::Config,
    ) -> Result<()> {
        Ok(())
    }

    fn on_input_change() -> InvalidationPolicy {
        InvalidationPolicy::CLEAR_COMPUTED
    }
    fn on_config_change() -> InvalidationPolicy {
        InvalidationPolicy::KEEP_ALL
    }

    fn export(
        output: &Self::Output,
        config: &Self::Config,
    ) -> Result<Self::Export>;
}
```
Associated Types
| Type | Purpose | Requirements |
|---|---|---|
| `Config` | Algorithm parameters | `Default` (sensible defaults) |
| `Input` | Observation data | Validated on set |
| `State` | Intermediate results | `Default` (empty initial state) |
| `Output` | Final calibration result | Set by last step |
| `Export` | User-facing result | Created from `Output` + `Config` |
The Serialize + DeserializeOwned bounds enable JSON checkpointing. Clone enables snapshot operations.
Required Method: name()
Returns a stable string identifier for the problem type:
```rust
fn name() -> &'static str { "planar_intrinsics_v2" }
```
This is stored in session metadata and used for deserialization. Changing the name breaks compatibility with existing checkpoints.
Required Method: export()
Converts the internal output to a user-facing export format:
```rust
fn export(output: &Self::Output, config: &Self::Config) -> Result<Self::Export>;
```
The export may transform, filter, or enrich the output. For example, PlanarIntrinsicsProblem::export() computes reprojection error metrics from the raw optimization output.
Validation Hooks
Three optional validation hooks run at specific points:
| Hook | When it runs |
|---|---|
| `validate_input(input)` | On `session.set_input(input)` |
| `validate_config(config)` | On `session.set_config(config)` or `update_config()` |
| `validate_input_config(input, config)` | When both input and config are present |
Example: PlanarIntrinsicsProblem validates that input has ≥ 3 views with ≥ 4 points each.
Invalidation Policies
Control what happens when input or config changes:
```rust
pub struct InvalidationPolicy {
    pub clear_state: bool,
    pub clear_output: bool,
    pub clear_exports: bool,
}

impl InvalidationPolicy {
    pub const KEEP_ALL: Self = ...;       // Nothing cleared
    pub const CLEAR_COMPUTED: Self = ...; // State + output cleared
    pub const CLEAR_ALL: Self = ...;      // Everything cleared
}
```
Defaults:
- Input change → `CLEAR_COMPUTED` (re-running steps is needed)
- Config change → `KEEP_ALL` (output is still valid, but may not reflect the new config)
Existing Problem Types
| Type | Name | Steps |
|---|---|---|
| `PlanarIntrinsicsProblem` | `"planar_intrinsics_v2"` | init → optimize |
| `SingleCamHandeyeProblem` | `"single_cam_handeye"` | intrinsics_init → intrinsics_optimize → handeye_init → handeye_optimize |
| `RigExtrinsicsProblem` | `"rig_extrinsics"` | intrinsics_init_all → intrinsics_optimize_all → rig_init → rig_optimize |
| `RigHandeyeProblem` | `"rig_handeye"` | 6 steps (intrinsics + rig + handeye) |
| `LaserlineDeviceProblem` | `"laserline_device"` | init → optimize |
Design Philosophy
The trait is minimal by design:
- No step methods: Steps are free functions, not trait methods. This allows flexible composition and avoids trait object limitations.
- No algorithm logic: The trait defines data types and validation, not computation.
- Serialization-first: All types are serializable, enabling checkpointing from day one.
Step Functions vs Pipeline Functions
calibration-rs offers two ways to run calibration workflows: step functions for granular control and pipeline functions for convenience.
Step Functions
Step functions are free functions that operate on a mutable session reference:
```rust
pub fn step_init(
    session: &mut CalibrationSession<PlanarIntrinsicsProblem>,
    opts: Option<IntrinsicsInitOptions>,
) -> Result<()>
```
Each step:
- Reads input and/or state from the session
- Performs one phase of the calibration (e.g., initialization or optimization)
- Updates the session state (and possibly output)
- Logs the operation
Advantages
Intermediate inspection: Examine the state between steps.
```rust
step_init(&mut session, None)?;

// Inspect initialization quality before committing to optimization
let init_k = session.state.initial_intrinsics.as_ref().unwrap();
if (init_k.fx - expected_fx).abs() / expected_fx > 0.5 {
    eprintln!(
        "Warning: init fx={:.0} is far from expected {:.0}",
        init_k.fx, expected_fx
    );
}

step_optimize(&mut session, None)?;
```
Per-step configuration: Override options for individual steps.
```rust
// Use more iterations for optimization
let opts = IntrinsicsOptimizeOptions {
    max_iters: Some(200),
    ..Default::default()
};
step_optimize(&mut session, Some(opts))?;
```
Selective execution: Skip steps or re-run specific steps.
```rust
// Re-run optimization with different settings (without re-initializing)
session.update_config(|c| c.robust_loss = RobustLoss::Cauchy { scale: 3.0 })?;
step_optimize(&mut session, None)?;
```
Checkpointing: Save and restore between steps.
```rust
step_init(&mut session, None)?;

let checkpoint = session.to_json()?;
std::fs::write("after_init.json", &checkpoint)?;

step_optimize(&mut session, None)?;
```
Pipeline Functions
Pipeline functions chain all steps into a single call:
```rust
pub fn run_calibration(
    session: &mut CalibrationSession<PlanarIntrinsicsProblem>,
) -> Result<()> {
    step_init(session, None)?;
    step_optimize(session, None)?;
    Ok(())
}
```
When to Use
- Quick prototyping: Get results with minimal code
- Default settings: When the defaults work and you don't need inspection
- Scripts and automation: When human inspection is not needed
```rust
let mut session = CalibrationSession::<PlanarIntrinsicsProblem>::new();
session.set_input(dataset)?;
run_calibration(&mut session)?;
let export = session.export()?;
```
Available Pipeline Functions
| Problem | Pipeline function | Steps |
|---|---|---|
| `PlanarIntrinsicsProblem` | `run_calibration()` | init → optimize |
| `SingleCamHandeyeProblem` | `single_cam_handeye::run_calibration()` | 4 steps |
| `RigExtrinsicsProblem` | `rig_extrinsics::run_calibration()` | 4 steps |
| `RigHandeyeProblem` | `rig_handeye::run_calibration()` | 6 steps |
| `LaserlineDeviceProblem` | `run_calibration(session, config)` | init → optimize |
| `ScheimpflugIntrinsicsProblem` | `scheimpflug_intrinsics::run_calibration(session, config)` | init → optimize |
Recommendation
Use step functions for production calibration where you need to:
- Verify initialization quality
- Adjust parameters between steps
- Handle failures gracefully
- Log and audit the process
Use pipeline functions for:
- Examples and tutorials
- Batch processing with known-good settings
- Quick experiments
Adding a New Optimization Problem
This chapter walks through adding a new optimization problem to vision-calibration-optim, following the pattern established by the planar intrinsics implementation.
Overview
Adding a new problem requires these steps:
1. Define the generic residual function in `factors/`
2. Add a `FactorKind` variant in `ir/types.rs`
3. Create a problem builder in `problems/`
4. Integrate with the backend in `backend/tiny_solver_backend.rs`
5. Write tests with synthetic ground truth
Step 1: Generic Residual Function
Create a new file or add to an existing file in crates/vision-calibration-optim/src/factors/:
```rust
pub fn my_residual_generic<T: RealField>(
    param_a: DVectorView<'_, T>, // optimizable parameters
    param_b: DVectorView<'_, T>,
    constant_data: [f64; 3],     // per-residual constants (f64, not T)
    w: f64,                      // weight
) -> SVector<T, 2> {             // residual dimension
    // Convert constants to T
    let cx = T::from_f64(constant_data[0]).unwrap();

    // Compute the residual using param_a, param_b, cx.
    // Use .clone() liberally, avoid in-place ops.
    let r_x = /* ... */;
    let r_y = /* ... */;

    let wt = T::from_f64(w).unwrap();
    SVector::<T, 2>::new(r_x * wt.clone(), r_y * wt)
}
```
Key rules:
- Generic over `T: RealField` for autodiff
- Optimizable parameters as `DVectorView<'_, T>`
- Constants as `f64` (converted inside)
- Use `.clone()`, no in-place mutation
- Return `SVector<T, N>` for a fixed residual dimension
Step 2: FactorKind Variant
Add a new variant to FactorKind in crates/vision-calibration-optim/src/ir/types.rs:
```rust
pub enum FactorKind {
    // ... existing variants ...
    MyNewFactor {
        constant_data: [f64; 3],
        w: f64,
    },
}
```
Add validation in the validate method:
```rust
FactorKind::MyNewFactor { .. } => {
    ensure!(params.len() == 2, "MyNewFactor requires 2 param blocks");
    ensure!(params[0].dim == 4, "param_a must be dim 4");
    ensure!(params[1].dim == 7, "param_b must be dim 7 (SE3)");
}
```
Update the residual_dim() method:
```rust
FactorKind::MyNewFactor { .. } => 2,
```
Step 3: Problem Builder
Create crates/vision-calibration-optim/src/problems/my_problem.rs:
```rust
pub struct MyProblemParams {
    pub param_a: DVector<f64>,
    pub param_b: Iso3,
}

pub struct MySolveOptions {
    pub fix_param_a: FixedMask,
    pub robust_loss: RobustLoss,
}

pub fn build_my_problem_ir(
    data: &MyDataset,
    initial: &MyProblemParams,
    opts: &MySolveOptions,
) -> Result<(ProblemIR, HashMap<String, DVector<f64>>)> {
    let mut ir = ProblemIR::new();
    let mut initial_values = HashMap::new();

    // Add parameter blocks via the builder API
    let param_a_id = ir.add_param_block(
        "param_a",
        4,
        ManifoldKind::Euclidean,
        opts.fix_param_a.clone(),
        None,
    );
    initial_values.insert("param_a".to_string(), initial.param_a.clone());

    let param_b_id = ir.add_param_block(
        "param_b",
        7,
        ManifoldKind::SE3,
        FixedMask::all_free(),
        None,
    );
    initial_values.insert(
        "param_b".to_string(),
        iso3_to_se3_dvec(&initial.param_b),
    );

    // Add residual blocks
    for obs in &data.observations {
        ir.add_residual_block(ResidualBlock {
            params: vec![param_a_id, param_b_id],
            loss: opts.robust_loss,
            factor: FactorKind::MyNewFactor {
                constant_data: obs.constant_data,
                w: obs.weight,
            },
            residual_dim: 2,
        });
    }

    Ok((ir, initial_values))
}
```
Step 4: Backend Integration
In crates/vision-calibration-optim/src/backend/tiny_solver_backend.rs, add a match arm in compile_factor():
```rust
FactorKind::MyNewFactor { constant_data, w } => {
    let constant_data = *constant_data;
    let w = *w;
    Box::new(move |params: &[DVectorView<'_, T>]| {
        my_residual_generic(params[0], params[1], constant_data, w)
            .as_slice()
            .to_vec()
    })
}
```
Step 5: Tests
Write a test with synthetic ground truth in crates/vision-calibration-optim/tests/:
```rust
#[test]
fn my_problem_converges() {
    // 1. Define ground truth parameters
    let gt_param_a = /* ... */;
    let gt_param_b = /* ... */;

    // 2. Generate synthetic observations
    let data = generate_synthetic_data(&gt_param_a, &gt_param_b);

    // 3. Create perturbed initial values
    let initial = perturb(&gt_param_a, &gt_param_b);

    // 4. Build and solve
    let (ir, init_vals) = build_my_problem_ir(&data, &initial, &opts)?;
    let solution = solve_with_backend(
        BackendKind::TinySolver,
        &ir,
        &init_vals,
        &backend_opts,
    )?;

    // 5. Verify convergence
    let solved_a = &solution.params["param_a"];
    assert!((solved_a[0] - gt_param_a[0]).abs() < tolerance);
}
```
Checklist
- Generic residual function with `T: RealField`
- `FactorKind` variant with validation
- Problem builder producing `ProblemIR` + initial values
- Backend compilation for the new factor
- Synthetic ground truth test verifying convergence
- (Optional) Pipeline integration with the session framework
Adding a New Pipeline Problem Type
This chapter describes how to create a new session-based calibration workflow in vision-calibration-pipeline, using the laserline device module as a template.
Module Structure
Create a new folder under crates/vision-calibration-pipeline/src/:
```text
my_problem/
├── mod.rs      # Module re-exports
├── problem.rs  # ProblemType implementation + Config
├── state.rs    # Intermediate state type
└── steps.rs    # Step functions + pipeline function
```
Step 1: Define the Problem Type (problem.rs)
```rust
use crate::session::{ProblemType, InvalidationPolicy};

pub struct MyProblem;

#[derive(Clone, Default, Debug, Serialize, Deserialize)]
pub struct MyConfig {
    pub max_iters: usize,
    pub fix_k3: bool,
    // ... other configuration
}

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct MyInput {
    pub views: Vec<MyView>,
}

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct MyOutput {
    pub calibrated_params: CameraParams,
    pub mean_reproj_error: f64,
}

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct MyExport {
    pub params: CameraParams,
    pub mean_reproj_error: f64,
}

impl ProblemType for MyProblem {
    type Config = MyConfig;
    type Input = MyInput;
    type State = MyState; // defined in state.rs
    type Output = MyOutput;
    type Export = MyExport;

    fn name() -> &'static str { "my_problem_v1" }

    fn validate_input(input: &MyInput) -> Result<()> {
        ensure!(input.views.len() >= 3, "Need at least 3 views");
        Ok(())
    }

    fn on_input_change() -> InvalidationPolicy {
        InvalidationPolicy::CLEAR_COMPUTED
    }
    fn on_config_change() -> InvalidationPolicy {
        InvalidationPolicy::KEEP_ALL
    }

    fn export(output: &MyOutput, _config: &MyConfig) -> Result<MyExport> {
        Ok(MyExport {
            params: output.calibrated_params.clone(),
            mean_reproj_error: output.mean_reproj_error,
        })
    }
}
```
Step 2: Define the State (state.rs)
```rust
#[derive(Clone, Default, Debug, Serialize, Deserialize)]
pub struct MyState {
    // Initialization results
    pub initial_intrinsics: Option<FxFyCxCySkew<f64>>,
    pub initial_distortion: Option<BrownConrady5<f64>>,
    pub initial_poses: Option<Vec<Iso3>>,

    // Optimization results
    pub final_cost: Option<f64>,
    pub mean_reproj_error: Option<f64>,
}
```
The state must implement Default (empty state) and Clone + Serialize + Deserialize (for checkpointing).
Step 3: Implement Step Functions (steps.rs)
```rust
use crate::session::CalibrationSession;
use super::problem::MyProblem;

pub fn step_init(
    session: &mut CalibrationSession<MyProblem>,
    _opts: Option<&()>,
) -> Result<()> {
    let input = session.require_input()?;
    let config = &session.config;

    // Run linear initialization
    let intrinsics = /* ... */;
    let poses = /* ... */;

    // Update state
    session.state.initial_intrinsics = Some(intrinsics);
    session.state.initial_poses = Some(poses);

    // Log
    session.log_success("step_init", Some("Initialization complete"));
    Ok(())
}

pub fn step_optimize(
    session: &mut CalibrationSession<MyProblem>,
    _opts: Option<&()>,
) -> Result<()> {
    let input = session.require_input()?;
    let config = &session.config;

    // Require initialization
    let init_k = session.state.initial_intrinsics.as_ref()
        .context("Run step_init first")?;

    // Build and solve the optimization problem
    let result = /* ... */;

    // Update state and output
    session.state.final_cost = Some(result.cost);
    session.state.mean_reproj_error = Some(result.reproj_error);
    session.set_output(MyOutput {
        calibrated_params: result.params,
        mean_reproj_error: result.reproj_error,
    })?;

    session.log_success("step_optimize", Some("Optimization complete"));
    Ok(())
}

/// Convenience pipeline function
pub fn run_calibration(
    session: &mut CalibrationSession<MyProblem>,
) -> Result<()> {
    step_init(session, None)?;
    step_optimize(session, None)?;
    Ok(())
}
```
Step 4: Module Re-exports (mod.rs)
```rust
mod problem;
mod state;
mod steps;

pub use problem::{MyProblem, MyConfig, MyInput, MyOutput, MyExport};
pub use state::MyState;
pub use steps::{step_init, step_optimize, run_calibration};
```
Step 5: Register in the Pipeline Crate
In crates/vision-calibration-pipeline/src/lib.rs:
```rust
pub mod my_problem;
```
Step 6: Wire into the Facade Crate
In crates/vision-calibration/src/lib.rs:
```rust
pub mod my_problem {
    pub use vision_calibration_pipeline::my_problem::*;
}
```
Do not add new workflows to the prelude by default; keep the prelude minimal for planar hello-world usage.
Testing
Write an integration test in crates/vision-calibration-pipeline/tests/:
```rust
#[test]
fn my_problem_session_workflow() -> Result<()> {
    let input = generate_synthetic_input();
    let mut session = CalibrationSession::<MyProblem>::new();
    session.set_input(input)?;

    step_init(&mut session, None)?;
    assert!(session.state.initial_intrinsics.is_some());

    step_optimize(&mut session, None)?;
    assert!(session.output().is_some());

    // Test JSON round-trip
    let json = session.to_json()?;
    let restored = CalibrationSession::<MyProblem>::from_json(&json)?;
    assert!(restored.output().is_some());
    Ok(())
}
```
Adding a New Solver Backend
The backend-agnostic IR design allows adding new optimization backends without modifying problem definitions. This chapter describes what a backend must implement and how to integrate it.
The OptimBackend Trait
```rust
pub trait OptimBackend {
    fn solve(
        &self,
        ir: &ProblemIR,
        initial_params: &HashMap<String, DVector<f64>>,
        opts: &BackendSolveOptions,
    ) -> Result<BackendSolution>;
}
```
A backend receives:
- `ir`: the problem structure (parameter blocks, residual blocks, factor kinds)
- `initial_params`: initial values for all parameter blocks (keyed by name)
- `opts`: solver options (max iterations, tolerances, verbosity)
And returns:
- `BackendSolution`: optimized parameter values (keyed by parameter name) and a solve report
What a Backend Must Handle
1. Parameter Blocks
For each ParamBlock, the backend must:
- Allocate storage for a parameter vector of the given dimension
- Initialize from the provided initial values
- Apply the manifold (if not Euclidean)
- Respect the fixed mask (hold specified indices constant)
- Apply box constraints (if bounds are specified)
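One simple way to honor the fixed mask is to zero the corresponding entries of each tangent-space update before applying it (equivalently, a backend can zero the matching Jacobian columns). A minimal sketch, with the hypothetical helper `apply_fixed_mask`:

```rust
/// Zero out update entries for parameters held constant.
/// `fixed` is a per-index mask over the tangent-space update `delta`.
fn apply_fixed_mask(delta: &mut [f64], fixed: &[bool]) {
    assert_eq!(delta.len(), fixed.len());
    for (d, &is_fixed) in delta.iter_mut().zip(fixed) {
        if is_fixed {
            *d = 0.0; // fixed index: no change this iteration
        }
    }
}

fn main() {
    // Intrinsics update [dfx, dfy, dcx, dcy] with cx and cy held fixed.
    let mut delta = vec![0.5, -0.3, 0.2, 0.1];
    apply_fixed_mask(&mut delta, &[false, false, true, true]);
    assert_eq!(delta, vec![0.5, -0.3, 0.0, 0.0]);
    println!("{:?}", delta);
}
```

Zeroing Jacobian columns is usually preferable in practice, since it also removes the fixed parameters from the normal equations.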
2. Manifold Constraints
The backend must implement the plus (⊕) and minus (⊖) operations for each `ManifoldKind`:
| Manifold | Ambient dim | Tangent dim | Plus operation |
|---|---|---|---|
| Euclidean | n | n | x ⊕ δ = x + δ (vector addition) |
| SE3 | 7 | 6 | T ⊕ δ = T · Exp(δ) (quaternion + translation storage) |
| SO3 | 4 | 3 | q ⊕ δ = q · Exp(δ) (unit quaternion) |
| S2 | 3 | 2 | Retract via tangent plane basis |
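As a concrete illustration of the plus operation on SO(3), here is a minimal self-contained sketch using Rodrigues' formula. It uses a plain 3×3 rotation-matrix representation for clarity, rather than the library's quaternion storage; `identity`, `so3_exp`, and `so3_plus` are illustrative names, not part of the calibration-rs API.

```rust
// 3x3 rotation matrices as row-major arrays (illustration only; the
// library stores SO(3) as unit quaternions).
type Mat3 = [[f64; 3]; 3];

fn identity() -> Mat3 {
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
}

fn mat_mul(a: &Mat3, b: &Mat3) -> Mat3 {
    let mut m = [[0.0; 3]; 3];
    for i in 0..3 {
        for j in 0..3 {
            for k in 0..3 {
                m[i][j] += a[i][k] * b[k][j];
            }
        }
    }
    m
}

/// Exp: map an axis-angle tangent vector to a rotation (Rodrigues' formula).
fn so3_exp(w: [f64; 3]) -> Mat3 {
    let theta = (w[0] * w[0] + w[1] * w[1] + w[2] * w[2]).sqrt();
    let mut r = identity();
    if theta < 1e-12 {
        return r; // Exp(0) = I
    }
    let k = [w[0] / theta, w[1] / theta, w[2] / theta]; // unit axis
    // Cross-product (skew-symmetric) matrix [k]_x
    let kx: Mat3 = [
        [0.0, -k[2], k[1]],
        [k[2], 0.0, -k[0]],
        [-k[1], k[0], 0.0],
    ];
    let kx2 = mat_mul(&kx, &kx);
    let (s, c) = theta.sin_cos();
    for i in 0..3 {
        for j in 0..3 {
            r[i][j] += s * kx[i][j] + (1.0 - c) * kx2[i][j];
        }
    }
    r
}

/// Plus on SO(3): compose the current rotation with Exp(delta).
fn so3_plus(r: &Mat3, delta: [f64; 3]) -> Mat3 {
    mat_mul(r, &so3_exp(delta))
}

fn main() {
    use std::f64::consts::FRAC_PI_2;
    // Plus from the identity by a 90-degree yaw: the x-axis maps to the y-axis.
    let r = so3_plus(&identity(), [0.0, 0.0, FRAC_PI_2]);
    assert!(r[0][0].abs() < 1e-12 && (r[1][0] - 1.0).abs() < 1e-12);
    // Two plus steps about the same axis compose additively.
    let a = so3_plus(&so3_exp([0.1, 0.0, 0.0]), [0.2, 0.0, 0.0]);
    let b = so3_exp([0.3, 0.0, 0.0]);
    for i in 0..3 {
        for j in 0..3 {
            assert!((a[i][j] - b[i][j]).abs() < 1e-9);
        }
    }
    println!("SO(3) plus checks passed");
}
```

The minus operation is the inverse: recover the tangent vector between two rotations via the matrix logarithm of R1⁻¹·R2.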
3. Residual Evaluation
For each ResidualBlock, the backend must:
- Call the appropriate residual function based on `FactorKind`
- Pass the correct parameter block values (referenced by `ParamId`)
- Include per-residual constant data (3D points, observed pixels, weights)
- Compute Jacobians (via autodiff or finite differences)
4. Robust Loss Functions
The backend must apply the RobustLoss to each residual:
- `None` → standard squared loss
- `Huber { scale }` → Huber loss with the given scale
- `Cauchy { scale }` → Cauchy loss
- `Arctan { scale }` → Arctan loss
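To make the Huber case concrete, here is a minimal sketch of one common way a backend applies it: evaluate the loss on the residual norm, and derive an IRLS (iteratively reweighted least squares) weight that down-weights outliers. The function names are illustrative, not calibration-rs API.

```rust
/// Huber loss on a residual norm r: quadratic near zero, linear in the tails.
fn huber_rho(r: f64, scale: f64) -> f64 {
    let a = r.abs();
    if a <= scale {
        0.5 * r * r
    } else {
        scale * (a - 0.5 * scale)
    }
}

/// IRLS weight derived from the Huber loss: 1 for inliers, scale/|r| beyond.
fn huber_weight(r: f64, scale: f64) -> f64 {
    let a = r.abs();
    if a <= scale { 1.0 } else { scale / a }
}

fn main() {
    let scale = 1.0;
    // Inliers keep full weight; a 4-sigma outlier is down-weighted to 0.25.
    assert!((huber_weight(0.5, scale) - 1.0).abs() < 1e-12);
    assert!((huber_weight(4.0, scale) - 0.25).abs() < 1e-12);
    // The loss is continuous at the quadratic/linear transition.
    let (lo, hi) = (huber_rho(1.0 - 1e-9, scale), huber_rho(1.0 + 1e-9, scale));
    assert!((lo - hi).abs() < 1e-6);
    println!("Huber checks passed");
}
```

Cauchy and Arctan follow the same pattern with different ρ functions; exact conventions (e.g. whether ρ acts on r or r²) vary by solver, so match the reference backend's tests.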
5. Solution Extraction
Return optimized values as a HashMap<String, DVector<f64>> keyed by parameter block name (not ID).
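A sketch of the extraction step, assuming the solver tracks blocks by an internal numeric id: translate ids back to the IR's block names when building the result map. This uses plain `Vec<f64>` in place of `DVector<f64>` to stay self-contained; `extract_solution` is a hypothetical helper.

```rust
use std::collections::HashMap;

/// Map solver-internal ids back to the IR's parameter block names.
fn extract_solution(
    names_by_id: &HashMap<u32, String>,
    values_by_id: &HashMap<u32, Vec<f64>>,
) -> HashMap<String, Vec<f64>> {
    values_by_id
        .iter()
        .map(|(id, v)| (names_by_id[id].clone(), v.clone()))
        .collect()
}

fn main() {
    let names = HashMap::from([(0u32, "intrinsics".to_string())]);
    let vals = HashMap::from([(0u32, vec![800.0, 800.0, 320.0, 240.0])]);
    let sol = extract_solution(&names, &vals);
    // Callers look up results by block name, not solver id.
    assert_eq!(sol["intrinsics"], vec![800.0, 800.0, 320.0, 240.0]);
    println!("{:?}", sol.keys().collect::<Vec<_>>());
}
```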
Implementation Pattern
A typical backend has two phases:
Compile Phase
Translate the IR into solver-specific data structures:
```rust
fn compile(&self, ir: &ProblemIR) -> SolverProblem {
    for param in &ir.params {
        // Create solver parameter with manifold and fixing
    }
    for residual in &ir.residuals {
        // Create solver cost function from FactorKind
    }
    // ...
}
```
Solve Phase
Run the optimizer and extract results:
```rust
fn solve(&self, problem: SolverProblem, opts: &BackendSolveOptions) -> BackendSolution {
    // Set convergence criteria from opts
    // Run optimization loop
    // Extract final parameter values
    // Build SolveReport
}
```
Potential Backends
| Backend | Description | Advantages |
|---|---|---|
| tiny-solver | Current. Rust-native LM. | Pure Rust, no external deps |
| Ceres-RS | Rust bindings to Google Ceres | Battle-tested, many features |
| Custom GN | Hand-written Gauss-Newton | Full control, educational |
| L-BFGS | Quasi-Newton for large problems | Memory-efficient |
Testing
A new backend should pass the same convergence tests as the existing backend:
```rust
#[test]
fn new_backend_planar_converges() -> Result<()> {
    // Same synthetic data and initial values as tiny-solver tests
    let (ir, init) = build_planar_test_problem();
    let solution = MyNewBackend.solve(&ir, &init, &opts)?;

    // Verify same convergence quality
    assert!(solution.report.final_cost < 1e-4);
    Ok(())
}
```
Run the full test suite with both backends to ensure equivalent results.
Synthetic Data Generation
[COLLAB] This appendix benefits from user collaboration on recommended noise levels and realistic test scenarios.
calibration-rs provides utilities for generating synthetic calibration data, used in examples and tests. Synthetic data allows testing with known ground truth, verifying convergence, and debugging algorithm issues.
Board Point Generation
```rust
use vision_calibration::synthetic::planar;

// 8×6 grid of 3D points at Z=0, spaced 40mm apart
let board_points = planar::grid_points(8, 6, 0.04);
// Returns Vec<Pt3>: [(0,0,0), (0.04,0,0), ..., (0.28,0.20,0)]
```
Points are generated in the plane with the specified column count, row count, and spacing (in meters).
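The generation order implied by the example above (X varying fastest, then Y, all at Z = 0) can be sketched in plain Rust; this uses `[f64; 3]` as a stand-in for `Pt3` and is an illustration, not the library's implementation.

```rust
/// Sketch of planar grid generation: `cols x rows` points at Z = 0,
/// row-major with X varying fastest, spaced `spacing` meters apart.
fn grid_points(cols: usize, rows: usize, spacing: f64) -> Vec<[f64; 3]> {
    let mut pts = Vec::with_capacity(cols * rows);
    for r in 0..rows {
        for c in 0..cols {
            pts.push([c as f64 * spacing, r as f64 * spacing, 0.0]);
        }
    }
    pts
}

fn main() {
    let pts = grid_points(8, 6, 0.04);
    assert_eq!(pts.len(), 48);
    assert_eq!(pts[0], [0.0, 0.0, 0.0]);
    // Last point: (7 * 0.04, 5 * 0.04, 0) = (0.28, 0.20, 0)
    let last = pts[47];
    assert!((last[0] - 0.28).abs() < 1e-12 && (last[1] - 0.20).abs() < 1e-12);
    println!("grid of {} points", pts.len());
}
```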
Pose Generation
```rust
// 6 camera poses with varying yaw and distance
let poses = planar::poses_yaw_y_z(
    6,     // number of views
    -0.2,  // start yaw (radians)
    0.08,  // yaw step
    0.5,   // start Z distance (meters)
    0.05,  // Z step
);
// Returns Vec<Iso3>: camera-to-board transforms
```
Poses are generated looking at the board center, with rotation around the Y axis (yaw) and varying distance along Z.
Projection
```rust
// Project board points through all poses
let views = planar::project_views_all(&camera, &board_points, &poses)?;
// Returns Vec<CorrespondenceView>: 2D-3D point pairs per view
```
This applies the full camera model (projection + distortion + intrinsics) to generate pixel observations.
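The three stages of that pipeline can be written out explicitly. The sketch below uses plain `f64` tuples instead of the library's camera types and assumes the OpenCV-style Brown-Conrady convention (skew omitted); `project` is an illustrative name, not the calibration-rs API.

```rust
/// Sketch of the projection pipeline: pinhole division, Brown-Conrady
/// distortion, then the intrinsics mapping to pixels.
fn project(
    p: [f64; 3],                           // 3D point in the camera frame
    (fx, fy, cx, cy): (f64, f64, f64, f64), // intrinsics
    (k1, k2, k3, p1, p2): (f64, f64, f64, f64, f64), // distortion
) -> [f64; 2] {
    // 1. Pinhole projection to normalized image coordinates
    let (x, y) = (p[0] / p[2], p[1] / p[2]);
    // 2. Brown-Conrady distortion: radial (k1..k3) + tangential (p1, p2)
    let r2 = x * x + y * y;
    let radial = 1.0 + k1 * r2 + k2 * r2 * r2 + k3 * r2 * r2 * r2;
    let xd = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x);
    let yd = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y;
    // 3. Intrinsics map normalized coordinates to pixels
    [fx * xd + cx, fy * yd + cy]
}

fn main() {
    let k = (800.0, 800.0, 320.0, 240.0);
    let d = (0.0, 0.0, 0.0, 0.0, 0.0);
    // With zero distortion, a point on the optical axis lands at the
    // principal point.
    let px = project([0.0, 0.0, 1.0], k, d);
    assert_eq!(px, [320.0, 240.0]);
    println!("projected to ({}, {})", px[0], px[1]);
}
```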
Adding Noise
For realistic testing, add Gaussian noise to the projected pixels:
```rust
use rand_distr::{Distribution, Normal};

let mut rng = rand::thread_rng();
let noise_sigma = 0.5; // pixels
// Zero-mean Gaussian noise via rand_distr (Normal::new fails only for an
// invalid sigma). Note: uniform noise from rng.gen() would bias the pixels.
let noise = Normal::new(0.0, noise_sigma).unwrap();

for view in &mut views {
    for pixel in view.points_2d.iter_mut() {
        pixel.x += noise.sample(&mut rng);
        pixel.y += noise.sample(&mut rng);
    }
}
```
Typical noise levels:
| Scenario | Noise sigma (px) |
|---|---|
| Ideal (testing convergence) | 0.0 |
| High-quality detector | 0.1 - 0.3 |
| Standard detector | 0.3 - 1.0 |
| Noisy conditions | 1.0 - 3.0 |
Deterministic Seeds
All examples and tests use fixed random seeds for reproducibility:
```rust
use rand::SeedableRng;

let rng = rand::rngs::StdRng::seed_from_u64(42);
```
Common Test Scenarios
Minimal (3 views, no noise)
For verifying algorithm correctness. Should converge to machine precision.
Moderate (6-10 views, 0.5 px noise)
For testing realistic convergence. Should achieve <1% intrinsics error after optimization.
Challenging (20 views, 1.0 px noise, outliers)
For testing robustness. Should achieve <2% intrinsics error with robust loss functions.
Distortion stress test
Use large distortion coefficients to verify the iterative intrinsics solver handles strong distortion.
Data Types Quick Reference
Core Math Types
All based on nalgebra with f64 precision:
| Type | Definition | Description |
|---|---|---|
Real | f64 | Scalar type |
Pt2 | Point2<f64> | 2D point |
Pt3 | Point3<f64> | 3D point |
Vec2 | Vector2<f64> | 2D vector |
Vec3 | Vector3<f64> | 3D vector |
Mat3 | Matrix3<f64> | 3×3 matrix |
Mat4 | Matrix4<f64> | 4×4 matrix |
Iso3 | Isometry3<f64> | SE(3) rigid transform |
Camera Model Types
| Type | Parameters | Description |
|---|---|---|
FxFyCxCySkew<S> | fx, fy, cx, cy, skew | Intrinsics matrix |
BrownConrady5<S> | k1, k2, k3, p1, p2, iters | Distortion model |
Pinhole | (none) | Projection model |
IdentitySensor | (none) | No sensor tilt |
ScheimpflugParams | tilt_x, tilt_y | Scheimpflug tilt |
Camera<S,P,D,Sm,K> | proj, dist, sensor, k | Composable camera |
Observation Types
| Type | Fields | Description |
|---|---|---|
CorrespondenceView | points_3d, points_2d, weights | 2D-3D correspondences |
View<Meta> | obs, meta | Observation + metadata |
PlanarDataset | views: Vec<View<NoMeta>> | Planar calibration input |
RigView<Meta> | obs: RigViewObs, meta | Multi-camera view |
RigDataset<Meta> | num_cameras, views | Multi-camera input |
ReprojectionStats | mean, rms, max, count | Error statistics |
Fix Masks
| Type | Fields | Description |
|---|---|---|
IntrinsicsFixMask | fx, fy, cx, cy (bool each) | Fix individual intrinsics |
DistortionFixMask | k1, k2, k3, p1, p2 (bool each) | Fix individual distortion params |
CameraFixMask | intrinsics, distortion | Combined camera fix mask |
FixedMask | opaque (all_free(), all_fixed(dim), fix_indices(&[usize])) | IR-level parameter fixing |
Optimization IR Types
| Type | Description |
|---|---|
ProblemIR | Complete optimization problem |
ParamBlock | Parameter group (id, name, dim, manifold, fixed) |
ResidualBlock | Residual definition (params, loss, factor) |
ParamId | Unique parameter identifier |
FactorKind | Residual computation type (enum) |
ManifoldKind | Parameter geometry (Euclidean, SE3, SO3, S2) |
RobustLoss | Loss function (None, Huber, Cauchy, Arctan) |
Session Types
| Type | Description |
|---|---|
CalibrationSession<P> | Generic session container |
SessionMetadata | Problem name, version, timestamps |
LogEntry | Audit log entry (timestamp, operation, success) |
ExportRecord<E> | Timestamped export |
InvalidationPolicy | What to clear on input/config change |
Backend Types
| Type | Description |
|---|---|
BackendSolveOptions | Solver settings (max_iters, tolerances, etc.) |
BackendSolution | Optimized params + solve report |
SolveReport | Initial/final cost, iterations, termination |
LinearSolverType | SparseCholesky or SparseQR |
Hand-Eye Types
| Type | Description |
|---|---|
HandEyeMode | EyeInHand or EyeToHand |
HandeyeMeta | base_se3_gripper: Iso3 |
RobotPoseMeta | base_se3_gripper: Iso3 |
References and Further Reading
[COLLAB] Please help curate and prioritize this reference list, and add any additional references.
Foundational Textbooks
- Hartley, R.I. & Zisserman, A. (2004). Multiple View Geometry in Computer Vision. 2nd edition. Cambridge University Press. — The definitive reference for projective geometry, fundamental/essential matrices, homography, triangulation, and bundle adjustment.
- Ma, Y., Soatto, S., Kosecka, J., & Sastry, S.S. (2004). An Invitation to 3-D Vision. Springer. — Rigorous treatment of Lie groups and geometric vision with emphasis on SE(3) and optimization on manifolds.
Camera Calibration
- Zhang, Z. (2000). "A Flexible New Technique for Camera Calibration." IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11), 1330-1334. — Zhang's planar calibration method: intrinsics from homographies via the image of the absolute conic.
- Brown, D.C. (1966). "Decentering Distortion of Lenses." Photogrammetric Engineering, 32(3), 444-462. — The original Brown-Conrady distortion model with radial and tangential components.
Minimal Solvers
- Nistér, D. (2004). "An Efficient Solution to the Five-Point Relative Pose Problem." IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(6), 756-770. — The 5-point essential matrix solver used in stereo and visual odometry.
- Kneip, L., Scaramuzza, D., & Siegwart, R. (2011). "A Novel Parametrization of the Perspective-Three-Point Problem for a Direct Computation of Absolute Camera Position and Orientation." IEEE Conference on Computer Vision and Pattern Recognition (CVPR). — The P3P solver used for camera pose estimation.
Hand-Eye Calibration
- Tsai, R.Y. & Lenz, R.K. (1989). "A New Technique for Fully Autonomous and Efficient 3D Robotics Hand/Eye Calibration." IEEE Transactions on Robotics and Automation, 5(3), 345-358. — The Tsai-Lenz method for solving AX=XB.
Robust Estimation
- Fischler, M.A. & Bolles, R.C. (1981). "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography." Communications of the ACM, 24(6), 381-395. — The original RANSAC paper.
- Hartley, R.I. (1997). "In Defense of the Eight-Point Algorithm." IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(6), 580-593. — Demonstrates that data normalization makes the 8-point algorithm competitive with iterative methods.
Non-Linear Optimization
- Levenberg, K. (1944). "A Method for the Solution of Certain Non-Linear Problems in Least Squares." Quarterly of Applied Mathematics, 2(2), 164-168.
- Marquardt, D.W. (1963). "An Algorithm for Least-Squares Estimation of Nonlinear Parameters." Journal of the Society for Industrial and Applied Mathematics, 11(2), 431-441.
- Triggs, B., McLauchlan, P.F., Hartley, R.I., & Fitzgibbon, A.W. (2000). "Bundle Adjustment — A Modern Synthesis." International Workshop on Vision Algorithms. Springer. — Comprehensive survey of bundle adjustment techniques.
Lie Groups in Vision
- Solà, J., Deray, J., & Atchuthan, D. (2018). "A Micro Lie Theory for State Estimation in Robotics." arXiv:1812.01537. — Accessible introduction to Lie groups for robotics and vision, covering SO(3), SE(3), and their tangent spaces.