Introduction

calibration-rs is a Rust library for camera calibration — the process of estimating the internal parameters (intrinsics, lens distortion) and external parameters (pose, rig geometry, hand-eye transforms) of camera systems from observed correspondences between known 3D points and their 2D projections.


Who This Book Is For

This book targets engineers and researchers working in machine vision, robotics, and 3D reconstruction who want to:

  • Calibrate single cameras, stereo rigs, or multi-camera systems
  • Perform hand-eye calibration for cameras mounted on robot arms
  • Calibrate laser triangulation devices (camera + laser plane)
  • Understand the mathematical foundations behind calibration algorithms

We assume familiarity with linear algebra (SVD, eigendecomposition, least squares) and basic projective geometry (homogeneous coordinates, projection matrices). Lie group theory (SO(3), SE(3)) is introduced when needed.

What calibration-rs Provides

The library covers the full calibration pipeline:

  1. Linear initialization — closed-form solvers (Zhang's method, DLT, PnP, Tsai-Lenz hand-eye, epipolar geometry) that produce approximate parameter estimates
  2. Non-linear refinement — Levenberg-Marquardt bundle adjustment that minimizes reprojection error to sub-pixel accuracy
  3. Session framework — a high-level API with step functions, configuration, and JSON checkpointing

Workspace Structure

calibration-rs is organized as a 5-crate Rust workspace with a layered architecture:

vision-calibration (facade)
    → vision-calibration-pipeline (sessions, workflows)
        → vision-calibration-optim (non-linear refinement)
        → vision-calibration-linear (initialization)
            → vision-calibration-core (primitives, camera models, RANSAC)

The key dependency rule: the linear and optimization crates are peers — they both depend on vision-calibration-core but not on each other. This separation keeps initialization algorithms independent of the optimization backend.

Relation to OpenCV

Readers familiar with OpenCV's calibration module will find analogous functionality throughout:

OpenCV                   calibration-rs
cv::calibrateCamera      Planar intrinsics pipeline (Zhang init + bundle adjustment)
cv::solvePnP             PnpSolver::p3p(), PnpSolver::dlt()
cv::findHomography       HomographySolver::dlt(), dlt_homography_ransac()
cv::findFundamentalMat   EpipolarSolver::fundamental_8point()
cv::findEssentialMat     EpipolarSolver::essential_5point()
cv::stereoCalibrate      Rig extrinsics pipeline

calibration-rs differs from OpenCV in several ways: it is written in pure Rust, uses a composable camera model with generic type parameters, provides a backend-agnostic optimization IR, and offers a session framework with JSON checkpointing for production workflows.

Book Organization

The book is structured in seven parts:

  • Part I: Camera Model — the composable projection pipeline (pinhole, distortion, sensor tilt, intrinsics)
  • Part II: Geometric Primitives — rigid transforms, RANSAC, robust loss functions
  • Part III: Linear Initialization — all closed-form solvers with full mathematical derivations
  • Part IV: Non-Linear Optimization — Levenberg-Marquardt, manifold constraints, autodiff, the IR architecture
  • Part V: Calibration Workflows — end-to-end pipelines for each problem type, with both synthetic and real data examples
  • Part VI: Session Framework — the high-level CalibrationSession API
  • Part VII: Extending the Library — adding new problems, backends, and pipeline types

Each algorithm chapter includes a formal problem statement, objective function, assumptions, and a full derivation leading to the implementation.

Quickstart

This chapter walks through a minimal camera calibration using synthetic data. By the end you will have estimated camera intrinsics and lens distortion from simulated observations of a planar calibration board.

Add the Dependency

[dependencies]
vision-calibration = "0.2"

Minimal Example

The following program generates synthetic calibration data, runs the two-step calibration pipeline (linear initialization followed by non-linear refinement), and prints the results:

use anyhow::Result;
use vision_calibration::planar_intrinsics::{step_init, step_optimize};
use vision_calibration::prelude::*;
use vision_calibration::synthetic::planar;

fn main() -> Result<()> {
    // 1. Define ground truth camera
    let k_gt = FxFyCxCySkew {
        fx: 800.0, fy: 780.0, cx: 640.0, cy: 360.0, skew: 0.0,
    };
    let dist_gt = BrownConrady5 {
        k1: 0.05, k2: -0.02, k3: 0.0,
        p1: 0.001, p2: -0.001, iters: 8,
    };
    let camera = make_pinhole_camera(k_gt, dist_gt);

    // 2. Generate synthetic observations
    let board = planar::grid_points(8, 6, 0.04); // 8×6 grid, 40 mm squares
    let poses = planar::poses_yaw_y_z(6, -0.2, 0.08, 0.5, 0.05);
    let views = planar::project_views_all(&camera, &board, &poses)?;

    // 3. Create session and set input
    let dataset = PlanarDataset::new(
        views.into_iter().map(View::without_meta).collect()
    )?;
    let mut session = CalibrationSession::<PlanarIntrinsicsProblem>::new();
    session.set_input(dataset)?;

    // 4. Run calibration
    step_init(&mut session, None)?;      // Linear initialization
    step_optimize(&mut session, None)?;   // Non-linear refinement

    // 5. Inspect results
    let export = session.export()?;
    let k = export.params.intrinsics();
    println!("fx={:.1}, fy={:.1}, cx={:.1}, cy={:.1}",
             k.fx, k.fy, k.cx, k.cy);
    println!("Mean reprojection error: {:.4} px",
             export.mean_reproj_error);

    Ok(())
}

What Happens in Each Step

Step 1: step_init — Linear Initialization

  1. Computes a homography from each view's 2D-3D correspondences using DLT (direct linear transform)
  2. Estimates intrinsics from the homographies using Zhang's method
  3. Estimates lens distortion coefficients from homography residuals
  4. Iterates steps 2-3 to refine the joint intrinsics + distortion estimate
  5. Recovers each view's camera pose from its homography and the estimated intrinsics

After this step, intrinsics are typically within 10-40% of the true values — good enough to initialize non-linear optimization.

Step 2: step_optimize — Bundle Adjustment

Runs Levenberg-Marquardt optimization minimizing the total reprojection error:

min over K, d, {T_i} of  Σ_i Σ_j ‖ π(K, d, T_i, X_j) − x_ij ‖²

where π is the camera projection function, T_i is the pose of view i, d are the distortion coefficients, X_j are the known 3D board points, and x_ij are the observed pixel coordinates.

After optimization, expect <2% error on intrinsics and <1 px mean reprojection error.
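The objective is straightforward to evaluate by hand. The following sketch in plain Rust (the `Intrinsics` struct and helper functions are illustrative stand-ins, not the library API) projects points through an ideal pinhole camera and averages the pixel residuals:

```rust
// Hypothetical helper, not the library API: evaluate mean reprojection error
// for a distortion-free pinhole camera.

struct Intrinsics { fx: f64, fy: f64, cx: f64, cy: f64 }

fn project(k: &Intrinsics, p: [f64; 3]) -> Option<[f64; 2]> {
    if p[2] <= 0.0 { return None; }          // behind the camera
    let (x, y) = (p[0] / p[2], p[1] / p[2]); // normalized coordinates
    Some([k.fx * x + k.cx, k.fy * y + k.cy])
}

fn mean_reproj_error(k: &Intrinsics, pts: &[[f64; 3]], obs: &[[f64; 2]]) -> f64 {
    let sum: f64 = pts.iter().zip(obs).filter_map(|(p, o)| {
        let uv = project(k, *p)?;
        Some(((uv[0] - o[0]).powi(2) + (uv[1] - o[1]).powi(2)).sqrt())
    }).sum();
    sum / pts.len() as f64
}

fn main() {
    let k = Intrinsics { fx: 800.0, fy: 780.0, cx: 640.0, cy: 360.0 };
    let pts = [[0.1, -0.05, 1.0], [0.0, 0.0, 2.0]];
    // Observations generated exactly by the model => zero error.
    let obs: Vec<[f64; 2]> = pts.iter().map(|p| project(&k, *p).unwrap()).collect();
    println!("mean error = {}", mean_reproj_error(&k, &pts, &obs));
}
```

The bundle adjustment stage minimizes exactly this quantity, but over all intrinsics, distortion, and pose parameters simultaneously.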

Running the Built-in Example

The library ships with this example (and several others):

cargo run -p vision-calibration --example planar_synthetic

Alternative: One-Line Pipeline

If you don't need to inspect intermediate results:

#![allow(unused)]
fn main() {
use vision_calibration::planar_intrinsics::run_calibration;

run_calibration(&mut session)?;
}

Next Steps

Architecture Overview

calibration-rs is organized as a layered workspace of Rust crates plus Python bindings. This chapter explains the dependency structure, data flow, and key design patterns.

Crate Dependency Graph

vision-calibration          (facade: re-exports everything)
    │
    └── vision-calibration-pipeline    (sessions, workflows, step functions)
            │
            ├── vision-calibration-optim   (non-linear optimization)
            │       │
            │       └── vision-calibration-core
            │
            └── vision-calibration-linear  (closed-form initialization)
                    │
                    └── vision-calibration-core

Key rule: vision-calibration-linear and vision-calibration-optim are peers. They both depend on vision-calibration-core but never on each other. This keeps initialization algorithms free of optimization dependencies and vice versa.

Crate Responsibilities

vision-calibration-core

The foundation layer providing:

  • Math types — nalgebra-based aliases: Pt2, Pt3, Vec3, Mat3, Iso3, Real (= f64)
  • Camera models — composable trait-based pipeline: ProjectionModel, DistortionModel, SensorModel, IntrinsicsModel
  • Observation typesCorrespondenceView (2D-3D point pairs), View<Meta>, PlanarDataset, RigDataset
  • RANSAC engine — generic Estimator trait with configurable options
  • Synthetic data utilities — grid generation, pose sampling, projection
  • Reprojection error computation — single-camera and multi-camera rig

vision-calibration-linear

Closed-form initialization solvers. Each produces an approximate estimate suitable for seeding non-linear optimization:

Solver                  Input                                Output
Zhang's method          Homographies                         Intrinsics
Distortion fit          Intrinsics + homographies            Brown-Conrady coefficients
Iterative intrinsics    Observations                         Joint intrinsics + distortion
Homography DLT          2D-2D correspondences                Homography
Planar pose             Intrinsics + homography              SE(3) pose
P3P / DLT PnP           3D-2D correspondences + intrinsics   SE(3) pose
5-point essential       Normalized correspondences           Essential matrix
8-/7-point fundamental  Pixel correspondences                Fundamental matrix
Tsai-Lenz hand-eye      Robot + camera motions               Hand-eye SE(3)
Rig extrinsics          Per-camera poses                     Camera-to-rig SE(3)
Laser plane             Laser pixels + target poses          Plane (normal + distance)

vision-calibration-optim

Non-linear refinement with a backend-agnostic architecture:

  1. IR (Intermediate Representation)ProblemIR with ParamBlock and ResidualBlock types that describe optimization problems independently of any solver
  2. Factors — generic residual functions parameterized over RealField for automatic differentiation
  3. Backends — currently TinySolverBackend (Levenberg-Marquardt with sparse linear solvers)
  4. Problem builders — domain-specific functions that construct IR from calibration data

vision-calibration-pipeline

The session framework providing production-ready workflows:

  • CalibrationSession<P: ProblemType> — generic state container with config, input, state, output, exports
  • Step functions — free functions operating on &mut CalibrationSession<P> (e.g., step_init, step_optimize)
  • Pipeline functions — convenience wrappers chaining all steps
  • JSON checkpointing — full serialization for session persistence
  • Six problem types: PlanarIntrinsicsProblem, ScheimpflugIntrinsicsProblem, SingleCamHandeyeProblem, RigExtrinsicsProblem, RigHandeyeProblem, LaserlineDeviceProblem

vision-calibration

Unified facade crate that re-exports everything through a clean module hierarchy:

#![allow(unused)]
fn main() {
use vision_calibration::prelude::*;            // Minimal planar hello-world imports
use vision_calibration::planar_intrinsics::*;  // Planar workflow
use vision_calibration::core::*;               // Math types
use vision_calibration::linear::*;             // Linear solvers
use vision_calibration::optim::*;              // Optimization
}

vision-calibration-py

Python bindings crate (maturin/PyO3) exposing high-level session workflows.

Data Flow

Every calibration workflow follows the same pattern:

Observations (2D-3D correspondences)
    │
    ▼
Linear Initialization (vision-calibration-linear)
    │  Closed-form solvers: ~5-40% accuracy
    ▼
Non-Linear Refinement (vision-calibration-optim)
    │  Levenberg-Marquardt: <2% accuracy, <1 px reprojection
    ▼
Calibrated Parameters (K, distortion, poses, ...)

The session framework wraps this flow with configuration, state tracking, and checkpointing:

CalibrationSession::new()
    │
    ▼
session.set_input(dataset)
    │
    ▼
step_init(&mut session)      ← linear initialization
    │
    ▼
step_optimize(&mut session)  ← non-linear refinement
    │
    ▼
session.export()             ← calibrated parameters

Design Patterns

Composable Camera Model

The camera projection pipeline is built from four independent traits composed via generics:

pixel = K(sensor(distortion(projection(direction))))

Each stage can be mixed and matched. For example, a standard camera uses Pinhole + BrownConrady5 + IdentitySensor + FxFyCxCySkew, while a laser profiler might use Pinhole + BrownConrady5 + ScheimpflugParams + FxFyCxCySkew.
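This compile-time composition can be sketched in a few lines. The traits below are simplified stand-ins for the library's ProjectionModel/DistortionModel/IntrinsicsModel (the sensor stage is omitted for brevity); they show the pattern, not the real API surface:

```rust
// Simplified stand-ins for the library's stage traits (illustrative only).
trait Projection { fn project(&self, p: [f64; 3]) -> Option<[f64; 2]>; }
trait Distortion { fn distort(&self, n: [f64; 2]) -> [f64; 2]; }
trait Intrinsics { fn to_pixel(&self, s: [f64; 2]) -> [f64; 2]; }

struct Pinhole;
impl Projection for Pinhole {
    fn project(&self, p: [f64; 3]) -> Option<[f64; 2]> {
        (p[2] > 0.0).then(|| [p[0] / p[2], p[1] / p[2]])
    }
}
struct NoDistortion;
impl Distortion for NoDistortion { fn distort(&self, n: [f64; 2]) -> [f64; 2] { n } }
struct K { fx: f64, fy: f64, cx: f64, cy: f64 }
impl Intrinsics for K {
    fn to_pixel(&self, s: [f64; 2]) -> [f64; 2] {
        [self.fx * s[0] + self.cx, self.fy * s[1] + self.cy]
    }
}

// The camera is generic over its stages, so combinations are zero-cost.
struct Camera<P: Projection, D: Distortion, I: Intrinsics> { proj: P, dist: D, k: I }
impl<P: Projection, D: Distortion, I: Intrinsics> Camera<P, D, I> {
    fn project_point(&self, p: [f64; 3]) -> Option<[f64; 2]> {
        Some(self.k.to_pixel(self.dist.distort(self.proj.project(p)?)))
    }
}

fn main() {
    let cam = Camera { proj: Pinhole, dist: NoDistortion,
                       k: K { fx: 800.0, fy: 800.0, cx: 640.0, cy: 360.0 } };
    println!("{:?}", cam.project_point([0.1, 0.0, 1.0])); // Some([720.0, 360.0])
}
```

Because the stages are resolved at compile time, swapping NoDistortion for a Brown-Conrady model changes no call sites and incurs no dynamic dispatch.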

Backend-Agnostic Optimization

Problems are defined as an intermediate representation (IR) that is independent of any specific solver. The IR is then compiled to a solver-specific form:

Problem Builder  →  ProblemIR  →  Backend.compile()  →  Backend.solve()
                  (generic)       (solver-specific)

This allows swapping the optimization backend without changing problem definitions.

Step Functions

Calibration workflows are decomposed into discrete steps implemented as free functions. This allows:

  • Inspection of intermediate state between steps
  • Resumption from any point (via JSON checkpointing)
  • Customization of per-step options
  • Composition of steps from different problem types

Composable Camera Pipeline

A camera model maps 3D points in the camera frame to 2D pixel coordinates. In calibration-rs, this mapping is decomposed into four composable stages, each implemented as a trait:

pixel = intrinsics(sensor(distortion(projection(p_c))))

where:

  • projection: maps a 3D direction to normalized coordinates on the image plane
  • distortion: warps normalized coordinates to model lens imperfections
  • sensor: applies a homography for tilted sensor planes (identity for standard cameras)
  • intrinsics: scales and translates to pixel coordinates

The Camera Struct

The camera is a generic struct parameterized over the four model traits:

#![allow(unused)]
fn main() {
pub struct Camera<S, P, D, Sm, K>
where
    S: RealField,
    P: ProjectionModel<S>,
    D: DistortionModel<S>,
    Sm: SensorModel<S>,
    K: IntrinsicsModel<S>,
{
    pub proj: P,
    pub dist: D,
    pub sensor: Sm,
    pub k: K,
}
}

The scalar type S is generic over RealField, enabling the same camera to work with f64 for evaluation and with dual numbers for automatic differentiation.

Forward Projection

Given a 3D point in the camera frame, projection to pixel coordinates proceeds through the four stages:

#![allow(unused)]
fn main() {
pub fn project_point_c(&self, p_c: &Vector3<S>) -> Option<Point2<S>>
}
  1. Projection: Compute normalized coordinates n = proj(p_c). For pinhole: n = (x/z, y/z). Returns None if the point is behind the camera (z ≤ 0).

  2. Distortion: Apply lens distortion n_d = dist(n). Warps normalized coordinates according to the distortion model (e.g., Brown-Conrady radial + tangential).

  3. Sensor transform: Apply sensor model s = sensor(n_d). For standard cameras this is identity; for tilted sensors it applies a homography.

  4. Intrinsics: Map to pixels px = K(s). Applies focal lengths, principal point, and optional skew.

Inverse (Back-Projection)

The inverse maps a pixel to a ray in the camera frame:

#![allow(unused)]
fn main() {
pub fn backproject_pixel(&self, px: &Point2<S>) -> Ray<S>
}
  1. s = K⁻¹(px) — pixel to sensor coordinates
  2. n_d = sensor⁻¹(s) — sensor to distorted normalized coordinates
  3. n = dist⁻¹(n_d) — undistort (iterative)
  4. dir = (n_x, n_y, 1)ᵀ — normalized coordinates to 3D direction

The result is a point on the z = 1 plane in the camera frame, defining a ray from the camera center.

Common Type Aliases

For the most common configuration (pinhole + Brown-Conrady + no tilt):

#![allow(unused)]
fn main() {
type PinholeCamera = Camera<f64, Pinhole, BrownConrady5<f64>,
                            IdentitySensor, FxFyCxCySkew<f64>>;
}

The make_pinhole_camera(k, dist) convenience function constructs this type.

Projecting World Points

To project a world point p_w through a posed camera, first transform from world to camera frame using the extrinsic pose T_cam_world:

p_c = T_cam_world · p_w = R p_w + t

Then apply the camera projection:

px = camera.project_point_c(p_c)

Trait Composition

Each stage is an independent trait. This design allows mixing and matching:

Projection   Distortion      Sensor             Intrinsics
Pinhole      NoDistortion    IdentitySensor     FxFyCxCySkew
Pinhole      BrownConrady5   IdentitySensor     FxFyCxCySkew
Pinhole      BrownConrady5   HomographySensor   FxFyCxCySkew

Note: ScheimpflugParams stores the tilt angles and has a compile() method that produces a HomographySensor suitable for the Camera struct. See the Sensor Models chapter.

The subsequent sections detail each stage.

Pinhole Projection

The pinhole model is the simplest and most widely used camera projection. It models a camera as an ideal point through which all light rays pass — no lens, no aperture, just a geometric center of projection.

Mathematical Definition

Given a 3D point p_c = (x, y, z) in the camera frame, the pinhole projection maps it to normalized image coordinates:

n = (x / z,  y / z)

This is perspective division: the point is projected onto the z = 1 plane. The projection is defined only for points in front of the camera (z > 0).

The inverse (back-projection) maps normalized coordinates to a 3D direction:

dir(n) = (n_x, n_y, 1)ᵀ

This defines a ray from the camera center through the image point.

Camera Frame Convention

calibration-rs uses a right-handed camera frame:

  • +X: right
  • +Y: down
  • +Z: forward (optical axis)

This matches the convention used by OpenCV and most computer vision libraries. A point with positive z is in front of the camera; negative z is behind it.

The ProjectionModel Trait

#![allow(unused)]
fn main() {
pub trait ProjectionModel<S: RealField> {
    fn project_dir(&self, dir_c: &Vector3<S>) -> Option<Point2<S>>;
    fn unproject_dir(&self, n: &Point2<S>) -> Vector3<S>;
}
}

The Pinhole struct implements this trait:

#![allow(unused)]
fn main() {
pub struct Pinhole;

impl<S: RealField> ProjectionModel<S> for Pinhole {
    fn project_dir(&self, dir_c: &Vector3<S>) -> Option<Point2<S>> {
        if dir_c.z > S::zero() {
            Some(Point2::new(
                dir_c.x.clone() / dir_c.z.clone(),
                dir_c.y.clone() / dir_c.z.clone(),
            ))
        } else {
            None
        }
    }

    fn unproject_dir(&self, n: &Point2<S>) -> Vector3<S> {
        Vector3::new(n.x.clone(), n.y.clone(), S::one())
    }
}
}

Note the use of .clone() — this is required for compatibility with dual numbers used in automatic differentiation (see Autodiff and Generic Residual Functions).

Limitations

The pinhole model assumes:

  • No lens distortion — addressed by the distortion stage
  • No sensor tilt — addressed by the sensor stage
  • Central projection — all rays pass through a single point

For cameras with significant distortion (wide-angle, fisheye), the distortion model corrects for lens effects after pinhole projection to normalized coordinates.

Brown-Conrady Distortion

Real lenses introduce geometric distortion: straight lines in the world project as curves in the image. The Brown-Conrady model captures the two dominant distortion effects — radial and tangential — as polynomial functions of the distance from the optical axis.

Distortion Model

Given undistorted normalized coordinates (x, y) with r² = x² + y², the distorted coordinates (x_d, y_d) are:

Radial distortion (barrel/pincushion):

x_r = x (1 + k1 r² + k2 r⁴ + k3 r⁶)
y_r = y (1 + k1 r² + k2 r⁴ + k3 r⁶)

Tangential distortion (decentering):

Δx = 2 p1 x y + p2 (r² + 2 x²)
Δy = p1 (r² + 2 y²) + 2 p2 x y

Combined:

x_d = x (1 + k1 r² + k2 r⁴ + k3 r⁶) + 2 p1 x y + p2 (r² + 2 x²)
y_d = y (1 + k1 r² + k2 r⁴ + k3 r⁶) + p1 (r² + 2 y²) + 2 p2 x y

The model has five parameters: radial coefficients k1, k2, k3 and tangential coefficients p1, p2.
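Written out as code, the combined model is only a few lines. This sketch mirrors the equations above and reuses the BrownConrady5 field names, but is a standalone illustration rather than the library's implementation:

```rust
// Brown-Conrady forward model (standalone sketch; field names mirror the
// library's BrownConrady5 struct).

struct BrownConrady5 { k1: f64, k2: f64, k3: f64, p1: f64, p2: f64 }

fn distort(d: &BrownConrady5, x: f64, y: f64) -> (f64, f64) {
    let r2 = x * x + y * y;
    let radial = 1.0 + d.k1 * r2 + d.k2 * r2 * r2 + d.k3 * r2 * r2 * r2;
    let xd = x * radial + 2.0 * d.p1 * x * y + d.p2 * (r2 + 2.0 * x * x);
    let yd = y * radial + d.p1 * (r2 + 2.0 * y * y) + 2.0 * d.p2 * x * y;
    (xd, yd)
}

fn main() {
    let d = BrownConrady5 { k1: 0.05, k2: -0.02, k3: 0.0, p1: 0.001, p2: -0.001 };
    // At the optical axis (r = 0) there is no distortion.
    assert_eq!(distort(&d, 0.0, 0.0), (0.0, 0.0));
    println!("{:?}", distort(&d, 0.1, 0.1));
}
```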

OpenCV equivalence: This is identical to OpenCV's distortPoints with the 5-parameter model (k1, k2, p1, p2, k3). Note OpenCV's parameter ordering differs from the mathematical ordering.

Physical Interpretation

  • k1 < 0: barrel distortion (lines bow outward from center)
  • k1 > 0: pincushion distortion (lines bow inward)
  • k2, k3: higher-order radial corrections
  • p1, p2: tangential distortion from imperfect lens-sensor alignment (lens elements not perfectly centered)

Typical magnitudes for industrial cameras: |k1| on the order of 10⁻¹ or less, |k2| roughly an order of magnitude smaller, and |p1|, |p2| on the order of 10⁻³.

When to Fix k3

The sixth-order radial term k3 r⁶ is only significant for wide-angle lenses where r reaches large values. For typical industrial cameras with moderate field of view:

  • k3 is poorly constrained by the data and often absorbs noise
  • Estimating k3 can cause overfitting, leading to worse extrapolation outside the calibration region
  • The library defaults to fix_k3: true in most problem configurations

Recommendation: Only estimate k3 with high-quality data covering the full image area, or for lenses with field of view > 90°.

Undistortion (Inverse)

Given distorted coordinates (x_d, y_d), recovering undistorted coordinates requires inverting the distortion model. Since the forward model is a polynomial without a closed-form inverse, calibration-rs uses iterative fixed-point refinement, starting from (x₀, y₀) = (x_d, y_d):

x_{n+1} = (x_d − Δx(x_n, y_n)) / (1 + k1 r_n² + k2 r_n⁴ + k3 r_n⁶)
y_{n+1} = (y_d − Δy(x_n, y_n)) / (1 + k1 r_n² + k2 r_n⁴ + k3 r_n⁶)

This rearranges the distortion equation to isolate the undistorted point and iterates to convergence. The default is 8 iterations, which provides accuracy well below sensor noise for typical distortion magnitudes.
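Both directions can be sketched together to check the round trip (standalone functions, not the library's BrownConrady5 methods; the fixed-point scheme follows the update rule above):

```rust
// Forward Brown-Conrady model and its fixed-point inverse (standalone sketch).

fn distort(k: (f64, f64, f64), p: (f64, f64), x: f64, y: f64) -> (f64, f64) {
    let (k1, k2, k3) = k; let (p1, p2) = p;
    let r2 = x * x + y * y;
    let radial = 1.0 + k1 * r2 + k2 * r2 * r2 + k3 * r2 * r2 * r2;
    (x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x),
     y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y)
}

fn undistort(k: (f64, f64, f64), p: (f64, f64), xd: f64, yd: f64, iters: u32) -> (f64, f64) {
    let (k1, k2, k3) = k; let (p1, p2) = p;
    let (mut x, mut y) = (xd, yd); // initial guess: the distorted point itself
    for _ in 0..iters {
        let r2 = x * x + y * y;
        let radial = 1.0 + k1 * r2 + k2 * r2 * r2 + k3 * r2 * r2 * r2;
        let dx = 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x);
        let dy = p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y;
        x = (xd - dx) / radial; // divide out radial, subtract tangential
        y = (yd - dy) / radial;
    }
    (x, y)
}

fn main() {
    let (k, p) = ((0.05, -0.02, 0.0), (0.001, -0.001));
    let (xd, yd) = distort(k, p, 0.2, -0.15);
    let (x, y) = undistort(k, p, xd, yd, 8);
    // The round trip recovers the undistorted point to far below sensor noise.
    println!("residual = {:e}", ((x - 0.2f64).powi(2) + (y + 0.15f64).powi(2)).sqrt());
}
```

For moderate distortion the contraction factor per iteration is on the order of k1·r², so 8 iterations are more than enough.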

Convergence

The fixed-point iteration converges when the Jacobian of the distortion residual has spectral radius less than 1, which holds for physically realistic distortion values (k1, k2, k3, p1, p2 small relative to 1 over the working range of r). For extreme distortion, more iterations may be needed via the iters parameter.

OpenCV equivalence: cv::undistortPoints performs the same iterative inversion.

The DistortionModel Trait

#![allow(unused)]
fn main() {
pub trait DistortionModel<S: RealField> {
    fn distort(&self, n_undist: &Point2<S>) -> Point2<S>;
    fn undistort(&self, n_dist: &Point2<S>) -> Point2<S>;
}
}

The BrownConrady5<S> struct:

#![allow(unused)]
fn main() {
pub struct BrownConrady5<S: RealField> {
    pub k1: S, pub k2: S, pub k3: S,
    pub p1: S, pub p2: S,
    pub iters: u32,   // undistortion iterations (default: 8)
}
}

NoDistortion is the identity implementation for distortion-free cameras.

Distortion in the Projection Pipeline

Distortion operates in normalized image coordinates (after pinhole projection, before intrinsics):

n = projection(p_c)  →  n_d = distortion(n)  →  px = K(sensor(n_d))

This is the standard convention: distortion is defined on the z = 1 plane, not in pixel space. The advantage is that distortion parameters are independent of image resolution and focal length.

Intrinsics Matrix

The intrinsics matrix maps from the sensor coordinate system (normalized or sensor-transformed coordinates) to pixel coordinates. It encodes the camera's internal geometric properties: focal length, principal point, and optional skew.

Definition

K = [ fx   s   cx ]
    [  0  fy   cy ]
    [  0   0    1 ]

where:

  • fx, fy — focal lengths in pixels (horizontal and vertical). These combine the physical focal length with the pixel pitch: fx = f / s_x, where f is the focal length in mm and s_x is the pixel width in mm.
  • cx, cy — principal point in pixels. The projection of the optical axis onto the image plane. Typically near the image center but not exactly at it.
  • s — skew coefficient. Non-zero only if pixel axes are not perpendicular. Effectively zero for all modern sensors; calibration-rs defaults to zero_skew: true in most configurations.

Forward and Inverse Transform

Sensor to pixel (forward):

u = fx · x_s + s · y_s + cx
v = fy · y_s + cy

where (x_s, y_s) are sensor coordinates (output of the distortion + sensor stages).

Pixel to sensor (inverse):

y_s = (v − cy) / fy
x_s = (u − cx − s · y_s) / fx
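Both directions can be written out directly (a standalone sketch mirroring FxFyCxCySkew; note the inverse must recover y_s before x_s because of the skew term):

```rust
// Intrinsics forward/inverse mapping (standalone sketch of FxFyCxCySkew).

struct K { fx: f64, fy: f64, cx: f64, cy: f64, skew: f64 }

fn sensor_to_pixel(k: &K, s: [f64; 2]) -> [f64; 2] {
    [k.fx * s[0] + k.skew * s[1] + k.cx, k.fy * s[1] + k.cy]
}

fn pixel_to_sensor(k: &K, px: [f64; 2]) -> [f64; 2] {
    let ys = (px[1] - k.cy) / k.fy;              // solve for y_s first
    let xs = (px[0] - k.cx - k.skew * ys) / k.fx; // then back out x_s
    [xs, ys]
}

fn main() {
    let k = K { fx: 800.0, fy: 780.0, cx: 640.0, cy: 360.0, skew: 0.5 };
    let s = [0.1, -0.2];
    let back = pixel_to_sensor(&k, sensor_to_pixel(&k, s));
    println!("{:?}", back); // recovers [0.1, -0.2] up to rounding
}
```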

The IntrinsicsModel Trait

#![allow(unused)]
fn main() {
pub trait IntrinsicsModel<S: RealField> {
    fn sensor_to_pixel(&self, sensor: &Point2<S>) -> Point2<S>;
    fn pixel_to_sensor(&self, pixel: &Point2<S>) -> Point2<S>;
}
}

The FxFyCxCySkew<S> struct implements this trait:

#![allow(unused)]
fn main() {
pub struct FxFyCxCySkew<S: RealField> {
    pub fx: S, pub fy: S,
    pub cx: S, pub cy: S,
    pub skew: S,
}
}

Coordinate Utilities

The coordinate_utils module provides convenience functions that combine intrinsics with distortion:

  • pixel_to_normalized(pixel, K) — apply K⁻¹ to get normalized coordinates on the z = 1 plane
  • normalized_to_pixel(normalized, K) — apply K to get pixel coordinates
  • undistort_pixel(pixel, K, distortion) — pixel → normalized → undistort → return normalized
  • distort_to_pixel(normalized, K, distortion) — distort → apply K → return pixel

These functions are used extensively in the linear initialization algorithms, which need to convert between pixel and normalized coordinate spaces.

Aspect Ratio

For most cameras, fx ≠ fy because pixels are not perfectly square. The ratio fy / fx reflects the pixel aspect ratio. Typical industrial cameras have aspect ratios very close to 1.0 (within 1-3%).

OpenCV equivalence: cv::initCameraMatrix2D provides initial intrinsics estimates. The matrix format is identical to OpenCV's cameraMatrix.

Sensor Models and Scheimpflug Tilt

In standard cameras, the sensor plane is perpendicular to the optical axis. Some industrial cameras — particularly laser triangulation profilers — deliberately tilt the sensor to achieve a larger depth of field along the laser plane (the Scheimpflug condition). The sensor model stage accounts for this tilt.

The SensorModel Trait

#![allow(unused)]
fn main() {
pub trait SensorModel<S: RealField> {
    fn normalized_to_sensor(&self, n: &Point2<S>) -> Point2<S>;
    fn sensor_to_normalized(&self, s: &Point2<S>) -> Point2<S>;
}
}

The sensor model sits between distortion and intrinsics in the pipeline:

projection → distortion → sensor → intrinsics

IdentitySensor

For standard cameras with untilted sensors, the sensor stage is the identity:

s = n

This is the default and most common case.

Scheimpflug Tilt Model

The Scheimpflug condition states that when the lens plane, image plane, and object plane intersect along a common line, the entire object plane is in sharp focus. Laser profilers exploit this by tilting the sensor to keep the laser line in focus.

Tilt Projection Matrix

The tilt is parameterized by two rotation angles:

  • τ_x — tilt around the horizontal (X) axis
  • τ_y — tilt around the vertical (Y) axis

With R = R_Y(τ_y) · R_X(τ_x), the tilt homography is constructed as:

H = [ R₃₃   0   −R₁₃ ]
    [  0   R₃₃  −R₂₃ ] · R
    [  0    0     1  ]

The projection onto the tilted sensor plane normalizes by the third row of H applied to the point in homogeneous coordinates:

s = π_z( H · (n_x, n_y, 1)ᵀ )

where π_z normalizes by the z-component (perspective division through the tilted plane). In practice, the first two rows of H divided by the third row give a rational transform in the image plane.
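The construction is mechanical; here is a standalone sketch following the OpenCV-style formula (the exact sign conventions are an assumption here, not taken from the library). Zero tilt compiles to the identity, matching IdentitySensor:

```rust
// Tilt homography construction (sketch; signs follow the OpenCV-style
// formula and are an assumption, not the library's definition).

type M3 = [[f64; 3]; 3];

fn matmul(a: &M3, b: &M3) -> M3 {
    let mut c = [[0.0; 3]; 3];
    for i in 0..3 { for j in 0..3 { for k in 0..3 { c[i][j] += a[i][k] * b[k][j]; } } }
    c
}

fn tilt_homography(tau_x: f64, tau_y: f64) -> M3 {
    let (sx, cx) = tau_x.sin_cos();
    let (sy, cy) = tau_y.sin_cos();
    let rot_x: M3 = [[1.0, 0.0, 0.0], [0.0, cx, sx], [0.0, -sx, cx]];
    let rot_y: M3 = [[cy, 0.0, -sy], [0.0, 1.0, 0.0], [sy, 0.0, cy]];
    let r = matmul(&rot_y, &rot_x);
    // Re-project onto the z = 1 plane of the tilted frame.
    let proj_z: M3 = [[r[2][2], 0.0, -r[0][2]],
                      [0.0, r[2][2], -r[1][2]],
                      [0.0, 0.0, 1.0]];
    matmul(&proj_z, &r)
}

fn main() {
    // Zero tilt yields the identity homography (IdentitySensor behaviour).
    println!("{:?}", tilt_homography(0.0, 0.0));
}
```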

OpenCV equivalence: This matches OpenCV's computeTiltProjectionMatrix used in the 14-parameter rational model (CALIB_TILTED_MODEL).

ScheimpflugParams

#![allow(unused)]
fn main() {
pub struct ScheimpflugParams {
    pub tilt_x: f64,   // tau_x in radians
    pub tilt_y: f64,   // tau_y in radians
}
}

The compile() method generates a HomographySensor containing the homography and its precomputed inverse. ScheimpflugParams::default() returns zero tilt (identity sensor).

HomographySensor

The general homography sensor applies an arbitrary projective transform:

s = π_z( H · (n_x, n_y, 1)ᵀ )

This is used as the runtime representation compiled from ScheimpflugParams.

When to Use Scheimpflug

  • Laser triangulation profilers: The laser plane is at an angle to the camera axis. Tilting the sensor brings the entire laser line into focus.
  • Machine vision with tilted object planes: When the depth of field is insufficient to keep the entire object in focus.

For standard cameras (no tilt), use IdentitySensor (the default).

Optimization of Tilt Parameters

In the non-linear optimization stage, Scheimpflug parameters are treated as additional optimization variables. The tilt_projection_matrix_generic<T>() function in the factor module computes the tilt homography using dual numbers for automatic differentiation, enabling joint optimization of tilt with intrinsics, distortion, and poses.

Serialization and Runtime-Dynamic Types

The generic Camera<S, P, D, Sm, K> type provides compile-time composition of camera stages. However, for JSON serialization (session checkpointing, data exchange) and runtime-dynamic camera construction, calibration-rs provides an enum-based parameter system.

CameraParams

The CameraParams struct holds camera parameters as serializable enums:

#![allow(unused)]
fn main() {
pub struct CameraParams {
    pub projection: ProjectionParams,
    pub distortion: DistortionParams,
    pub sensor: SensorParams,
    pub intrinsics: IntrinsicsParams,
}
}

Each component is an enum of supported variants:

#![allow(unused)]
fn main() {
pub enum ProjectionParams { Pinhole }

pub enum DistortionParams {
    None,
    BrownConrady5 { k1, k2, k3, p1, p2 },
}

pub enum SensorParams {
    Identity,
    Scheimpflug { tilt_x, tilt_y },
}

pub enum IntrinsicsParams {
    FxFyCxCySkew { fx, fy, cx, cy, skew },
}
}

Building a Camera from Parameters

The build() method constructs a concrete Camera from serialized parameters:

#![allow(unused)]
fn main() {
let params = CameraParams { /* ... */ };
let camera = params.build();
}

This is used internally by the session framework to reconstruct cameras from JSON-checkpointed state.

Convenience Accessors

CameraParams provides typed accessors:

#![allow(unused)]
fn main() {
params.intrinsics()   // → &FxFyCxCySkew<f64>
params.distortion()   // → &BrownConrady5<f64> or zero distortion
}

Use in Session Framework

CameraParams appears throughout the pipeline:

  • Export records store calibrated parameters as CameraParams
  • Session JSON serializes intermediate and final camera parameters
  • Config types reference CameraParams for initial guesses

This provides a stable, version-safe serialization format decoupled from the generic type system.

Rigid Transformations and SE(3)

Camera calibration is fundamentally about estimating rigid transformations — rotations and translations that relate different coordinate frames: camera, world, robot gripper, calibration target, rig, etc. This chapter covers the rotation representations and transformation algebra used throughout calibration-rs.

Rotation Representations

Rotation Matrix

A 3×3 orthogonal matrix R with determinant +1:

RᵀR = I,   det R = +1

The set of all such matrices forms the special orthogonal group SO(3). It has 3 degrees of freedom (9 entries minus 6 orthonormality constraints).

Unit Quaternion

A 4-element vector q with unit norm ‖q‖ = 1. The rotation of a vector v is:

v' = q v q*

where juxtaposition denotes quaternion multiplication (with v embedded as a pure quaternion) and q* is the conjugate.

Quaternions are preferred for optimization because they avoid gimbal lock and have simpler interpolation properties than Euler angles. calibration-rs (via nalgebra) stores quaternions as [i, j, k, w] internally, which maps to q = x i + y j + z k + w.
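The sandwich product v' = q v q* can be verified with a few lines (standalone sketch; the quaternion is written (w, x, y, z) here for readability, which differs from nalgebra's internal [i, j, k, w] layout):

```rust
// Quaternion rotation via v' = q v q* (standalone sketch).

#[derive(Clone, Copy)]
struct Quat { w: f64, x: f64, y: f64, z: f64 }

fn qmul(a: Quat, b: Quat) -> Quat {
    Quat {
        w: a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
        x: a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
        y: a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
        z: a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w,
    }
}

fn rotate(q: Quat, v: [f64; 3]) -> [f64; 3] {
    let conj = Quat { w: q.w, x: -q.x, y: -q.y, z: -q.z };
    let p = Quat { w: 0.0, x: v[0], y: v[1], z: v[2] }; // pure quaternion
    let r = qmul(qmul(q, p), conj);
    [r.x, r.y, r.z]
}

fn main() {
    // 90° about +Z: w = cos(45°), z = sin(45°); maps +X to +Y.
    let s = std::f64::consts::FRAC_1_SQRT_2;
    let q = Quat { w: s, x: 0.0, y: 0.0, z: s };
    println!("{:?}", rotate(q, [1.0, 0.0, 0.0]));
}
```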

Axis-Angle

A rotation of angle θ around unit axis a is represented as:

ω = θ a

with θ = ‖ω‖ and a = ω / θ.

The Rodrigues formula converts axis-angle to rotation matrix:

R = I + sin θ · [a]× + (1 − cos θ) · [a]×²

where [a]× is the skew-symmetric matrix of a:

[a]× = [   0   −a_z   a_y ]
       [  a_z    0   −a_x ]
       [ −a_y   a_x    0  ]
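The Rodrigues formula translates directly into code (standalone sketch, not the library's nalgebra-based implementation):

```rust
// Rodrigues formula: axis-angle to rotation matrix (standalone sketch).

type M3 = [[f64; 3]; 3];

fn rodrigues(axis: [f64; 3], theta: f64) -> M3 {
    let (ax, ay, az) = (axis[0], axis[1], axis[2]); // unit axis assumed
    let k: M3 = [[0.0, -az, ay], [az, 0.0, -ax], [-ay, ax, 0.0]]; // [a]_x
    let (s, c) = theta.sin_cos();
    let mut r = [[0.0; 3]; 3];
    for i in 0..3 {
        for j in 0..3 {
            // (K^2)[i][j]
            let k2: f64 = (0..3).map(|m| k[i][m] * k[m][j]).sum();
            r[i][j] = if i == j { 1.0 } else { 0.0 } + s * k[i][j] + (1.0 - c) * k2;
        }
    }
    r
}

fn main() {
    // 90° about +Z maps +X to +Y.
    let r = rodrigues([0.0, 0.0, 1.0], std::f64::consts::FRAC_PI_2);
    println!("{:?}", [r[0][0], r[1][0], r[2][0]]); // first column ≈ [0, 1, 0]
}
```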

Rigid Transformations: SE(3)

A rigid body transformation combines a rotation R ∈ SO(3) and a translation t ∈ ℝ³:

T = [ R  t ]
    [ 0  1 ]

The special Euclidean group SE(3) has 6 degrees of freedom (3 rotation + 3 translation).

Applying a Transform

A point p transformed by T:

p' = R p + t

or in homogeneous coordinates:

[ p' ]   [ R  t ] [ p ]
[ 1  ] = [ 0  1 ] [ 1 ]

Composition

Transforms compose by matrix multiplication:

T_c_a = T_c_b · T_b_a

Inverse

T⁻¹ = [ Rᵀ  −Rᵀ t ]
      [ 0      1  ]
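Composition and inverse can be sketched with an explicit (R, t) pair (the library uses nalgebra's Isometry3 instead; this standalone version only demonstrates the algebra):

```rust
// SE(3) as an explicit (R, t) pair (standalone sketch of the algebra).

type M3 = [[f64; 3]; 3];
type V3 = [f64; 3];

fn rot_apply(r: &M3, v: &V3) -> V3 {
    let mut out = [0.0; 3];
    for i in 0..3 { for j in 0..3 { out[i] += r[i][j] * v[j]; } }
    out
}

struct Iso { r: M3, t: V3 }

impl Iso {
    fn apply(&self, p: &V3) -> V3 { // p' = R p + t
        let rp = rot_apply(&self.r, p);
        [rp[0] + self.t[0], rp[1] + self.t[1], rp[2] + self.t[2]]
    }
    fn compose(&self, other: &Iso) -> Iso { // T_c_a = T_c_b ∘ T_b_a
        let mut r = [[0.0; 3]; 3];
        for i in 0..3 { for j in 0..3 { for k in 0..3 {
            r[i][j] += self.r[i][k] * other.r[k][j];
        } } }
        Iso { r, t: self.apply(&other.t) }
    }
    fn inverse(&self) -> Iso { // T⁻¹ = (Rᵀ, −Rᵀ t)
        let mut rt = [[0.0; 3]; 3];
        for i in 0..3 { for j in 0..3 { rt[i][j] = self.r[j][i]; } }
        let nt = rot_apply(&rt, &self.t);
        Iso { r: rt, t: [-nt[0], -nt[1], -nt[2]] }
    }
}

fn main() {
    // 90° rotation about +Z plus a translation.
    let t = Iso { r: [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
                  t: [1.0, 2.0, 3.0] };
    let p = [0.5, -0.5, 2.0];
    println!("{:?}", t.inverse().apply(&t.apply(&p))); // recovers p
    println!("{:?}", t.compose(&t.inverse()).t);       // identity translation
}
```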

Naming Convention

calibration-rs uses the notation T_b_a to mean "the transform from frame a to frame b":

p_b = T_b_a · p_a

Common transforms:

Symbol          Meaning
T_cam_world     World to camera (camera extrinsics)
T_cam_board     Calibration board to camera
T_rig_cam       Camera to rig (rig extrinsics)
T_gripper_cam   Camera to gripper (hand-eye)
T_base_target   Target to base (hand-eye target pose)

nalgebra Type: Iso3

calibration-rs uses nalgebra's Isometry3<f64> (aliased as Iso3) for SE(3) transforms:

#![allow(unused)]
fn main() {
pub type Iso3 = nalgebra::Isometry3<f64>;
}

This stores a UnitQuaternion (rotation) and a Translation3 (translation) separately, avoiding the redundancy of a full matrix.

SE(3) Storage for Optimization

In the optimization IR, SE(3) parameters are stored as a 7-element vector — the four quaternion components followed by the translation:

[ q_x, q_y, q_z, q_w, t_x, t_y, t_z ]

The quaternion is unit-constrained. The optimizer uses the SE(3) manifold (see Manifold Optimization) to maintain this constraint during parameter updates.

The Exponential Map

The Lie algebra se(3) provides a 6-dimensional tangent space at the identity. A tangent vector ξ = (ω, v) ∈ ℝ⁶ maps to an SE(3) element via the exponential map:

exp(ξ) = ( exp([ω]×),  J_l(ω) v )

where J_l is the left Jacobian of SO(3) (with θ = ‖ω‖):

J_l(ω) = I + ((1 − cos θ) / θ²) [ω]× + ((θ − sin θ) / θ³) [ω]×²

The exponential map is central to manifold optimization: parameter updates are computed in the tangent space and then retracted to the manifold (see Manifold Optimization).
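The map is small enough to sketch in full (standalone; the θ → 0 case falls back to the identity limits of R and J_l):

```rust
// SE(3) exponential map: Rodrigues for the rotation, left Jacobian for the
// translation (standalone sketch).

type M3 = [[f64; 3]; 3];

fn hat(w: [f64; 3]) -> M3 {
    [[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]]
}

fn add_scaled(a: &mut M3, b: &M3, s: f64) {
    for i in 0..3 { for j in 0..3 { a[i][j] += s * b[i][j]; } }
}

fn matmul(a: &M3, b: &M3) -> M3 {
    let mut c = [[0.0; 3]; 3];
    for i in 0..3 { for j in 0..3 { for k in 0..3 { c[i][j] += a[i][k] * b[k][j]; } } }
    c
}

fn se3_exp(w: [f64; 3], v: [f64; 3]) -> (M3, [f64; 3]) {
    let theta = (w[0] * w[0] + w[1] * w[1] + w[2] * w[2]).sqrt();
    let k = hat(w);
    let k2 = matmul(&k, &k);
    let eye: M3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]];
    let (mut r, mut jl) = (eye, eye);
    if theta > 1e-12 {
        // R = I + (sin θ / θ) K + ((1 − cos θ) / θ²) K²,  K = [ω]_x
        add_scaled(&mut r, &k, theta.sin() / theta);
        add_scaled(&mut r, &k2, (1.0 - theta.cos()) / (theta * theta));
        // J_l = I + ((1 − cos θ) / θ²) K + ((θ − sin θ) / θ³) K²
        add_scaled(&mut jl, &k, (1.0 - theta.cos()) / (theta * theta));
        add_scaled(&mut jl, &k2, (theta - theta.sin()) / theta.powi(3));
    }
    let mut t = [0.0; 3];
    for i in 0..3 { for j in 0..3 { t[i] += jl[i][j] * v[j]; } }
    (r, t)
}

fn main() {
    // ω = 0: the exponential is a pure translation.
    let (r, t) = se3_exp([0.0, 0.0, 0.0], [1.0, 2.0, 3.0]);
    println!("{:?} {:?}", r[0], t); // identity rotation, t = [1, 2, 3]
}
```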

RANSAC

RANSAC (Random Sample Consensus) is a robust estimation method that fits a model to data containing outliers. Unlike least-squares methods that use all data points and can be corrupted by even a single outlier, RANSAC repeatedly samples minimal subsets, fits a model to each, and selects the model with the most support (inliers).

Algorithm

Given N data points and a model requiring m points to fit:

  1. Repeat for up to k iterations:

    1. Sample m random data points
    2. Fit a model to the m-point sample (skip if degenerate)
    3. Score: count inliers — points with residual < τ
    4. If this model has more inliers than the current best, update the best
    5. Optionally refit the model on all inliers for a tighter fit
    6. Update the iteration bound k dynamically based on the current inlier ratio
  2. Return the best model and its inlier set

Dynamic Iteration Bound

After finding a model with inlier ratio w, the number of iterations k needed to find at least one all-inlier sample with probability p is:

\[ k = \frac{\log(1 - p)}{\log(1 - w^s)} \]

where s is the minimum sample size. As more inliers are found, w increases and k decreases, allowing early termination. The iteration count is clamped to max_iters — it can only decrease.

Example: For s = 4 (homography), 50% inliers, and 99% confidence: \(k = \log(0.01)/\log(1 - 0.5^4) \approx 72\) iterations.
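The bound is simple enough to sketch directly (illustrative code, not the library's internals; `ransac_iterations` is a made-up name):

```rust
/// Dynamic RANSAC iteration bound: k = ln(1 - p) / ln(1 - w^s).
fn ransac_iterations(inlier_ratio: f64, sample_size: u32, confidence: f64) -> usize {
    let w_s = inlier_ratio.powi(sample_size as i32);
    ((1.0 - confidence).ln() / (1.0 - w_s).ln()).ceil() as usize
}

fn main() {
    // Homography (s = 4), 50% inliers, 99% confidence.
    let k = ransac_iterations(0.5, 4, 0.99);
    assert_eq!(k, 72);
    // A higher inlier ratio lets RANSAC terminate much earlier.
    assert!(ransac_iterations(0.8, 4, 0.99) < 10);
}
```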

The Estimator Trait

calibration-rs provides a generic RANSAC engine that works with any model implementing the Estimator trait:

pub trait Estimator {
    type Datum;
    type Model;

    const MIN_SAMPLES: usize;

    fn fit(data: &[Self::Datum], sample_indices: &[usize])
        -> Option<Self::Model>;

    fn residual(model: &Self::Model, datum: &Self::Datum) -> f64;

    fn is_degenerate(data: &[Self::Datum], sample_indices: &[usize])
        -> bool { false }

    fn refit(data: &[Self::Datum], inliers: &[usize])
        -> Option<Self::Model> {
        Self::fit(data, inliers)
    }
}
Method | Purpose
MIN_SAMPLES | Minimum points for a model (e.g., 4 for homography)
fit | Fit model from selected points
residual | Distance from a point to the model
is_degenerate | Reject degenerate samples (e.g., collinear points for homography)
refit | Optional: refit model on full inlier set (may differ from fit)
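To make the trait concrete, here is a toy implementation against a trimmed copy of the trait (only the methods used): a 1D location model with MIN_SAMPLES = 1, where the model is a single value and the residual is the absolute difference. `Location1d` and `count_inliers` are illustrative names, not library APIs.

```rust
// Trimmed copy of the Estimator trait for a self-contained example.
pub trait Estimator {
    type Datum;
    type Model;
    const MIN_SAMPLES: usize;
    fn fit(data: &[Self::Datum], sample_indices: &[usize]) -> Option<Self::Model>;
    fn residual(model: &Self::Model, datum: &Self::Datum) -> f64;
}

struct Location1d;

impl Estimator for Location1d {
    type Datum = f64;
    type Model = f64;
    const MIN_SAMPLES: usize = 1;

    fn fit(data: &[f64], sample_indices: &[usize]) -> Option<f64> {
        // Average of the sampled points (a single point for the minimal sample).
        if sample_indices.is_empty() {
            return None;
        }
        Some(sample_indices.iter().map(|&i| data[i]).sum::<f64>()
            / sample_indices.len() as f64)
    }

    fn residual(model: &f64, datum: &f64) -> f64 {
        (datum - model).abs()
    }
}

// Scoring step of the RANSAC loop: count points within the threshold.
fn count_inliers(data: &[f64], model: f64, thresh: f64) -> usize {
    data.iter()
        .filter(|&&d| Location1d::residual(&model, &d) <= thresh)
        .count()
}

fn main() {
    // Five inliers near 10.0 plus one gross outlier.
    let data = [9.8, 10.1, 10.0, 9.9, 10.2, 55.0];
    let model = Location1d::fit(&data, &[2]).unwrap(); // fit from sample {10.0}
    assert_eq!(count_inliers(&data, model, 0.5), 5);
}
```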

Configuration

pub struct RansacOptions {
    pub max_iters: usize,       // Maximum iterations
    pub thresh: f64,            // Inlier distance threshold
    pub min_inliers: usize,     // Minimum consensus set size
    pub confidence: f64,        // Desired probability (0-1) for dynamic bound
    pub seed: u64,              // RNG seed for reproducibility
    pub refit_on_inliers: bool, // Refit model on full inlier set
}

Choosing the threshold t: The threshold should reflect the expected noise level of inlier residuals. For pixel-space residuals, a threshold of a few pixels is typical. For normalized-coordinate residuals, divide the pixel threshold by the focal length.

Result

pub struct RansacResult<M> {
    pub success: bool,
    pub model: Option<M>,
    pub inliers: Vec<usize>,
    pub inlier_rms: f64,
    pub iters: usize,
}

Instantiated Models

RANSAC is used with several models in calibration-rs:

Model | MIN_SAMPLES | Residual | Usage
Homography | 4 | Reprojection error (pixels) | dlt_homography_ransac
Fundamental matrix | 8 | Epipolar distance | fundamental_8point_ransac
PnP (DLT) | 6 | Reprojection error (pixels) | dlt_ransac

Deterministic Seeding

All RANSAC calls in calibration-rs use a deterministic seed for the random number generator. This ensures:

  • Reproducible results across runs
  • Deterministic tests that do not flake
  • Debuggable behavior — the same input always produces the same output

The seed can be configured via RansacOptions::seed.

Best-Model Selection

When multiple models achieve the same inlier count, RANSAC selects the model with the lowest inlier RMS (root mean square of inlier residuals). This breaks ties in favor of more accurate models.

Robust Loss Functions

Standard least squares minimizes the sum of squared residuals \(\sum_i r_i^2\). This objective is highly sensitive to outliers: a single point with a large residual can dominate the entire cost function and corrupt the solution. Robust loss functions (M-estimators) reduce the influence of large residuals, making optimization tolerant to outliers in the data.

Problem Setup

In non-linear least squares, we minimize:

\[ E = \sum_i \rho(r_i) \]

where \(\rho\) is the loss function applied to each residual \(r_i\). The standard (non-robust) case uses \(\rho(r) = r^2\).

Available Loss Functions

calibration-rs provides three robust loss functions, each parameterized by a scale \(s\) that controls the transition from quadratic (inlier) to robust (outlier) behavior.

Huber Loss

\[ \rho(r) = \begin{cases} r^2 & |r| \le s \\ s\,(2|r| - s) & |r| > s \end{cases} \]

Properties:

  • Quadratic for small residuals, linear for large residuals
  • Continuous first derivative
  • Influence function: bounded — for \(|r| > s\), outliers contribute a constant-magnitude gradient, not a growing one

When to use: The default robust loss. Good general-purpose choice when you expect a moderate number of outliers.

Cauchy Loss

\[ \rho(r) = s^2 \log\!\left(1 + \frac{r^2}{s^2}\right) \]

Properties:

  • Grows logarithmically for large residuals (slower than linear)
  • Smooth everywhere
  • Influence function: \(\rho'(r) = \dfrac{2r}{1 + r^2/s^2}\) — decreases toward zero for large \(|r|\), effectively down-weighting far outliers

When to use: When outliers are far from the bulk of the data and should have near-zero influence.

Arctan Loss

Properties:

  • Bounded: \(\rho(r)\) approaches a finite constant as \(|r| \to \infty\)
  • Influence function approaches zero for large residuals (redescending)

When to use: When very strong outlier rejection is needed. More aggressive than Cauchy but can make convergence harder.

Comparison

Loss | Large-residual growth | Outlier influence | Convergence
Quadratic (\(\rho(r) = r^2\)) | Quadratic | Unbounded | Best
Huber | Linear | Bounded (constant) | Good
Cauchy | Logarithmic | Decreasing | Moderate
Arctan | Bounded | Approaching zero | Can be tricky
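As a sketch, here are the Huber and Cauchy losses in one common convention (baseline \(\rho(r) = r^2\)); the library's exact formulas may differ by constant factors:

```rust
// Huber: quadratic up to the scale s, linear beyond.
fn huber(r: f64, s: f64) -> f64 {
    if r.abs() <= s { r * r } else { s * (2.0 * r.abs() - s) }
}

// Cauchy: logarithmic growth for large residuals.
fn cauchy(r: f64, s: f64) -> f64 {
    s * s * (1.0 + (r * r) / (s * s)).ln()
}

fn main() {
    // In the quadratic region both behave like r^2.
    assert!((huber(0.5, 1.0) - 0.25).abs() < 1e-12);
    // For large residuals Huber grows linearly (2*10 - 1 = 19)...
    assert_eq!(huber(10.0, 1.0), 19.0);
    // ...while Cauchy grows only logarithmically, so it stays far below.
    assert!(cauchy(10.0, 1.0) < huber(10.0, 1.0));
}
```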

Choosing the Scale Parameter

The scale sets the boundary between "inlier" and "outlier" behavior:

  • Too small: Treats good data as outliers, reducing effective sample size
  • Too large: Outliers still dominate (approaches standard least squares)
  • Rule of thumb: Set \(s\) to the expected residual magnitude for good data points. For reprojection residuals, a scale of one to a few pixels is typical.

Usage in calibration-rs

Robust losses are specified per-residual block in the optimization IR:

pub enum RobustLoss {
    None,
    Huber { scale: f64 },
    Cauchy { scale: f64 },
    Arctan { scale: f64 },
}

Each problem type exposes the loss function as a configuration option:

session.update_config(|c| {
    c.robust_loss = RobustLoss::Huber { scale: 2.0 };
})?;

The backend applies the loss function during residual evaluation, modifying both the cost and the Jacobian.

Iteratively Reweighted Least Squares (IRLS)

Under the hood, robust loss functions are typically implemented via IRLS: each residual is weighted by \(w_i = \rho'(r_i) / (2 r_i)\) (so \(w_i = 1\) in the quadratic case), and the weighted least-squares problem is solved iteratively. The Levenberg-Marquardt backend handles this automatically.
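For a Huber loss that is quadratic up to the scale s and linear beyond, the IRLS weight \(w = \rho'(r)/(2r)\) works out to 1 for inliers and \(s/|r|\) beyond; a minimal illustration (not library code):

```rust
// IRLS weight for the Huber loss with baseline rho(r) = r^2:
// inliers keep full weight, outliers are down-weighted by s/|r|.
fn huber_weight(r: f64, s: f64) -> f64 {
    if r.abs() <= s { 1.0 } else { s / r.abs() }
}

fn main() {
    // Inliers keep weight 1; a residual 10x the scale is down-weighted to 0.1.
    assert_eq!(huber_weight(0.3, 1.0), 1.0);
    assert_eq!(huber_weight(10.0, 1.0), 0.1);
}
```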

Interaction with RANSAC

RANSAC and robust losses address outliers at different stages:

  • RANSAC (linear initialization): Binary inlier/outlier classification. Used during model fitting to reject gross outliers before any optimization.
  • Robust losses (non-linear refinement): Soft down-weighting. Used during optimization to reduce the influence of moderate outliers that passed RANSAC.

The two approaches are complementary: RANSAC handles gross outliers during initialization, while robust losses handle smaller outliers during refinement.

Why Linear Initialization Matters

Non-linear optimization is the workhorse of camera calibration: it achieves sub-pixel reprojection error by jointly optimizing all parameters. But non-linear optimizers — Levenberg-Marquardt, Gauss-Newton, and their variants — are local methods. They converge to the nearest local minimum, which is only the global minimum if the starting point is sufficiently close.

The role of linear initialization is to provide that starting point.

The Init-Then-Refine Paradigm

Every calibration workflow in calibration-rs follows a two-phase pattern:

  1. Linear initialization: Closed-form solvers estimate approximate parameters. These are fast, deterministic, and require no initial guess — but their parameter errors are typically in the 5-40% range.

  2. Non-linear refinement: Levenberg-Marquardt bundle adjustment minimizes reprojection error starting from the linear estimate. This converges to <2% parameter accuracy and <1 px reprojection error.

The linear estimate does not need to be highly accurate. It needs to be in the basin of convergence of the non-linear optimizer — close enough that gradient-based iteration converges to the correct solution rather than a local minimum.

Why Not Just Optimize Directly?

Without initialization, you would need to guess the intrinsics (4 parameters), distortion (5 parameters), and all camera poses (6 parameters per view). For a typical 20-view calibration:

  • 4 intrinsics + 5 distortion + 20 × 6 pose = 129 parameters
  • The cost landscape has many local minima
  • A random starting point will almost certainly diverge or converge to a wrong solution

Linear initialization eliminates this problem by computing a reasonable estimate from the data structure alone.

Linear Solvers in calibration-rs

Solver | Estimates | Method | Chapter
Homography DLT | Homography | SVD nullspace | Homography
Zhang's method | Intrinsics | IAC constraints | Zhang
Distortion fit | Distortion coefficients | Linear LS on residuals | Distortion Fit
Iterative | Intrinsics + distortion | Alternating refinement | Iterative
Planar pose | Pose from homography | Decomposition + SVD | Planar Pose
P3P (Kneip) | Pose from 3 points | Quartic polynomial | PnP
DLT PnP | Pose from ≥6 points | SVD + SO(3) projection | PnP
8-point | Fundamental matrix | SVD + rank constraint | Epipolar
7-point | Fundamental matrix | Cubic polynomial | Epipolar
5-point | Essential matrix | Action matrix eigenvalues | Epipolar
Camera matrix DLT | Projection matrix | SVD + RQ decomposition | Camera Matrix
Triangulation | 3D point | SVD nullspace | Triangulation
Tsai-Lenz | Hand-eye transform | Quaternion + linear LS | Hand-Eye
Rig extrinsics | Per-camera rig poses | SE(3) averaging | Rig Init
Laser plane | Plane normal + distance | SVD on covariance | Laser Init

Common Mathematical Tools

Most linear solvers share a common toolkit:

  • SVD (Singular Value Decomposition): Solves homogeneous systems via the right singular vector corresponding to the smallest singular value. This is the core of DLT methods.
  • Hartley normalization: Conditions the DLT system for numerical stability by centering and scaling the input points.
  • Polynomial root finding: Minimal solvers (P3P, 7-point F, 5-point E) reduce to polynomial equations.

The following chapters derive each algorithm in detail.

Hartley Normalization

DLT algorithms build large linear systems from point coordinates and solve them via SVD. When the coordinates span vastly different scales (e.g., pixel values in the hundreds vs. homogeneous scale factor of 1), the resulting system is ill-conditioned: small perturbations in the data cause large changes in the solution. Hartley normalization is a simple preprocessing step that dramatically improves numerical stability.

The Problem

Consider the homography DLT: we build a \(2n \times 9\) matrix \(A\) from pixel coordinates and solve \(Ah = 0\). If pixel coordinates are on the order of \(10^3\), the entries of \(A\) (which include products of coordinates) range from 1 to roughly \(10^6\). The condition number of \(A\) becomes large, and the SVD solution is sensitive to floating-point errors.

The Solution

Normalize the input points so they are centered at the origin with a controlled scale, solve the system, then denormalize the result.

2D Normalization

Given points \((x_i, y_i)\), \(i = 1, \dots, n\):

  1. Centroid: \(\bar{x} = \frac{1}{n}\sum_i x_i\), \(\bar{y} = \frac{1}{n}\sum_i y_i\)

  2. Mean distance from centroid: \(\bar{d} = \frac{1}{n}\sum_i \sqrt{(x_i - \bar{x})^2 + (y_i - \bar{y})^2}\)

  3. Scale factor: \(s = \sqrt{2} / \bar{d}\)

  4. Normalization matrix:

\[ T = \begin{bmatrix} s & 0 & -s\bar{x} \\ 0 & s & -s\bar{y} \\ 0 & 0 & 1 \end{bmatrix} \]

  5. Normalized points: \(\tilde{p}_i = T p_i\) (in homogeneous coordinates)

After normalization, the points have centroid at the origin and mean distance \(\sqrt{2}\) from the origin.
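The 2D normalization can be sketched in a few lines (illustrative, std-only; `hartley_normalize` is not a library function):

```rust
/// 2D Hartley normalization: center the points, scale so the mean distance
/// from the origin is sqrt(2), and return the row-major 3x3 matrix T
/// together with the normalized points.
fn hartley_normalize(pts: &[(f64, f64)]) -> ([[f64; 3]; 3], Vec<(f64, f64)>) {
    let n = pts.len() as f64;
    let (sx, sy) = pts.iter().fold((0.0, 0.0), |(ax, ay), &(x, y)| (ax + x, ay + y));
    let (cx, cy) = (sx / n, sy / n);
    let mean_d = pts.iter()
        .map(|&(x, y)| ((x - cx).powi(2) + (y - cy).powi(2)).sqrt())
        .sum::<f64>() / n;
    let s = 2.0_f64.sqrt() / mean_d;
    let t = [[s, 0.0, -s * cx], [0.0, s, -s * cy], [0.0, 0.0, 1.0]];
    let normed = pts.iter().map(|&(x, y)| (s * (x - cx), s * (y - cy))).collect();
    (t, normed)
}

fn main() {
    // Pixel-scale points spanning very different magnitudes.
    let pts = [(100.0, 200.0), (640.0, 480.0), (1200.0, 50.0), (300.0, 900.0)];
    let (_t, normed) = hartley_normalize(&pts);
    // After normalization the mean distance from the origin is sqrt(2).
    let n = normed.len() as f64;
    let mean_d = normed.iter().map(|&(x, y)| (x * x + y * y).sqrt()).sum::<f64>() / n;
    assert!((mean_d - 2.0_f64.sqrt()).abs() < 1e-12);
}
```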

3D Normalization

For 3D points \((x_i, y_i, z_i)\), the analogous normalization uses:

  • Mean distance target: \(\sqrt{3}\) (instead of \(\sqrt{2}\))
  • A \(4 \times 4\) normalization matrix

Denormalization

After solving the normalized system, the result must be denormalized. For a homography estimated from normalized points:

\[ H = T_q^{-1}\, \tilde{H}\, T_w \]

For a fundamental matrix:

\[ F = T_2^{\top}\, \tilde{F}\, T_1 \]

The denormalization formula depends on the specific problem. Each chapter states the appropriate denormalization.

When It Is Used

Hartley normalization is applied in every DLT-based solver in calibration-rs:

Solver | Normalized spaces
Homography DLT | World 2D + image 2D
Fundamental 8-point | Image 1 2D + image 2 2D
Essential 5-point | Image 1 2D + image 2 2D
Camera matrix DLT | World 3D + image 2D
PnP DLT | World 3D (image points use normalized coordinates via \(K^{-1}\))

Reference

Hartley, R.I. (1997). "In Defense of the Eight-Point Algorithm." IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(6), 580-593.

The normalization technique is described in Algorithm 4.2 of Hartley & Zisserman (2004).

Homography Estimation (DLT)

A homography is a projective transformation between two planes. In calibration, the most common case is the mapping from a planar calibration board to the image plane. Homography estimation is the first step in Zhang's calibration method.

Problem Statement

Given: n point correspondences \((p_i, q_i)\) where \(p_i\) are points on the world plane and \(q_i\) are the corresponding image points.

Find: \(H\) such that \(q_i \simeq H p_i\) (equality up to scale) for all \(i\).

Assumptions:

  • The world points are coplanar (the calibration board is flat)
  • Correspondences are correct (or RANSAC is used to handle outliers)
  • At least 4 non-collinear point correspondences

Derivation

From Projective Relation to Linear System

The relation \(q \simeq H p\) in homogeneous coordinates means:

\[ \lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \]

Expanding and eliminating the scale factor \(\lambda\) using the cross-product (the constraint \(q \times H p = 0\)), we get two independent equations per correspondence:

\[ \begin{aligned} x h_{11} + y h_{12} + h_{13} - u\,(x h_{31} + y h_{32} + h_{33}) &= 0 \\ x h_{21} + y h_{22} + h_{23} - v\,(x h_{31} + y h_{32} + h_{33}) &= 0 \end{aligned} \]

The DLT System

Stacking all correspondences into a matrix :

where .

Solving via SVD

We seek \(h\) minimizing \(\|A h\|\) subject to \(\|h\| = 1\) (to avoid the trivial solution \(h = 0\)). This is the standard homogeneous least squares problem, solved by the SVD of \(A\):

\[ A = U \Sigma V^{\top} \]

The solution is the last column of \(V\) (corresponding to the smallest singular value). Reshaping this 9-vector into a \(3 \times 3\) matrix gives \(H\).

Normalization and Denormalization

The complete algorithm with Hartley normalization:

  1. Normalize world and image points: \(\tilde{p}_i = T_w p_i\) and \(\tilde{q}_i = T_q q_i\)
  2. Build the \(2n \times 9\) matrix \(A\) from normalized points
  3. Solve via SVD to get \(\tilde{H}\)
  4. Denormalize: \(H = T_q^{-1} \tilde{H} T_w\)
  5. Scale: fix the arbitrary overall scale of \(H\)

Degrees of Freedom

\(H\) is a \(3 \times 3\) matrix with 9 entries, but it is defined only up to scale, giving 8 degrees of freedom. Each correspondence provides 2 equations, so the minimum is \(n = 4\) correspondences (giving 8 equations for 8 unknowns).

With \(n > 4\), the system is overdetermined and the SVD solution minimizes the algebraic error \(\|A h\|\) in a least-squares sense.

RANSAC Wrapper

For real data with potential mismatched correspondences, dlt_homography_ransac wraps the DLT solver in RANSAC:

let (H, inliers) = dlt_homography_ransac(&world_pts, &image_pts, &opts)?;
  • MIN_SAMPLES = 4
  • Residual: Euclidean reprojection error in pixels: \(\|q_i - \pi(H p_i)\|\), where \(\pi\) denotes the homogeneous-to-Euclidean division
  • Degeneracy check: Tests if the first 3 world points are collinear (a degenerate configuration for homography estimation)

OpenCV equivalence: cv::findHomography with RANSAC method.

API

// Direct DLT (all points assumed inliers)
let H = dlt_homography(&world_2d, &image_2d)?;

// With RANSAC
let opts = RansacOptions {
    max_iters: 1000,
    thresh: 3.0,   // pixels
    confidence: 0.99,
    ..Default::default()
};
let (H, inliers) = dlt_homography_ransac(&world_2d, &image_2d, &opts)?;

Geometric Interpretation

For a planar calibration board at \(Z = 0\) in world coordinates, the camera projection matrix reduces to a homography:

\[ H \simeq K\, [\, r_1 \;\; r_2 \;\; t \,] \]

where \(r_1, r_2\) are the first two columns of \(R\). This relationship is the foundation of Zhang's calibration method (next chapter).
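This reduction can be checked numerically: projecting a board point \((X, Y, 0)\) with the full model \(K(RP + t)\) matches projecting \((X, Y, 1)\) through \(H = K\,[\,r_1\;r_2\;t\,]\). A self-contained sketch with made-up values:

```rust
// Row-major 3x3 helpers; illustrative only, not library types.
fn matvec3(m: &[[f64; 3]; 3], v: [f64; 3]) -> [f64; 3] {
    let mut out = [0.0; 3];
    for i in 0..3 {
        out[i] = m[i][0] * v[0] + m[i][1] * v[1] + m[i][2] * v[2];
    }
    out
}

fn matmul3(a: &[[f64; 3]; 3], b: &[[f64; 3]; 3]) -> [[f64; 3]; 3] {
    let mut c = [[0.0; 3]; 3];
    for i in 0..3 { for j in 0..3 { for k in 0..3 { c[i][j] += a[i][k] * b[k][j]; } } }
    c
}

fn main() {
    let k = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]];
    // Rotation about the z-axis by 0.3 rad plus a translation.
    let (c, s) = (0.3_f64.cos(), 0.3_f64.sin());
    let r = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]];
    let t = [0.1, -0.2, 2.0];

    // H = K [r1 r2 t]: replace the third rotation column with t.
    let rt = [[r[0][0], r[0][1], t[0]], [r[1][0], r[1][1], t[1]], [r[2][0], r[2][1], t[2]]];
    let h = matmul3(&k, &rt);

    // Board point (X, Y, 0); full projection K (R P + t).
    let p = [0.4, -0.3, 0.0];
    let mut pc = [0.0; 3];
    for i in 0..3 {
        pc[i] = r[i][0] * p[0] + r[i][1] * p[1] + r[i][2] * p[2] + t[i];
    }
    let full = matvec3(&k, pc);
    let via_h = matvec3(&h, [p[0], p[1], 1.0]);

    // Compare after perspective division: identical pixels.
    let (u1, v1) = (full[0] / full[2], full[1] / full[2]);
    let (u2, v2) = (via_h[0] / via_h[2], via_h[1] / via_h[2]);
    assert!((u1 - u2).abs() < 1e-9 && (v1 - v2).abs() < 1e-9);
}
```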

Zhang's Intrinsics from Homographies

Zhang's method is the most widely used technique for estimating camera intrinsics from a planar calibration board. Given homographies from multiple views of the board, it recovers the intrinsics matrix using constraints from the image of the absolute conic (IAC).

Problem Statement

Given: m homographies \(H_1, \dots, H_m\) from a planar calibration board to the image, computed from different viewpoints.

Find: Camera intrinsics \(K\).

Assumptions:

  • The calibration board is planar
  • At least 3 homographies from distinct viewpoints
  • The camera intrinsics are constant across all views
  • Distortion is negligible (or has been accounted for — see Iterative Intrinsics)

Derivation

Homography and Intrinsics

From the previous chapter, the homography from a board plane is:

\[ H = \lambda\, K\, [\, r_1 \;\; r_2 \;\; t \,] \]

where \(\lambda\) is an arbitrary scale and \(r_1, r_2\) are columns of the rotation matrix. Let \(H = [\, h_1 \;\; h_2 \;\; h_3 \,]\). Then:

\[ r_1 = \tfrac{1}{\lambda} K^{-1} h_1, \qquad r_2 = \tfrac{1}{\lambda} K^{-1} h_2 \]

Orthogonality Constraints

Since \(r_1\) and \(r_2\) are columns of a rotation matrix, they satisfy:

\[ r_1^{\top} r_2 = 0, \qquad \|r_1\| = \|r_2\| = 1 \]

Substituting \(r_i = \tfrac{1}{\lambda} K^{-1} h_i\) (the common factor \(1/\lambda^2\) cancels):

\[ h_1^{\top} K^{-\top} K^{-1} h_2 = 0, \qquad h_1^{\top} K^{-\top} K^{-1} h_1 = h_2^{\top} K^{-\top} K^{-1} h_2 \]

The Image of the Absolute Conic (IAC)

Define the symmetric matrix:

\[ B = K^{-\top} K^{-1} \]

Since \(B\) is symmetric, it has 6 independent entries. The constraints become:

\[ h_1^{\top} B\, h_2 = 0, \qquad h_1^{\top} B\, h_1 - h_2^{\top} B\, h_2 = 0 \]

Each homography gives 2 linear equations in the 6 unknowns of \(B\).

The \(v_{ij}\) Vectors

To write the constraints as a linear system, collect the 6 entries of \(B\) into \(b = (B_{11}, B_{12}, B_{22}, B_{13}, B_{23}, B_{33})^{\top}\) and define the 6-vector:

\[ v_{ij} = \big(\, h_{1i} h_{1j},\;\; h_{1i} h_{2j} + h_{2i} h_{1j},\;\; h_{2i} h_{2j},\;\; h_{3i} h_{1j} + h_{1i} h_{3j},\;\; h_{3i} h_{2j} + h_{2i} h_{3j},\;\; h_{3i} h_{3j} \,\big)^{\top} \]

where \(h_{ki}\) is the \(k\)-th element of \(h_i\) (1-indexed, column \(i\), row \(k\)). Then:

\[ h_i^{\top} B\, h_j = v_{ij}^{\top}\, b \]

The Linear System

For each homography \(H_k\), the two constraints become:

\[ v_{12}^{\top} b = 0, \qquad (v_{11} - v_{22})^{\top} b = 0 \]

Stacking m homographies gives a \(2m \times 6\) system:

\[ V b = 0 \]

With \(m \ge 3\) (giving \(2m \ge 6\) equations), this is solved via SVD: \(b\) is the right singular vector of \(V\) corresponding to the smallest singular value.
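The identity \(h_i^{\top} B h_j = v_{ij}^{\top} b\) can be verified numerically. The sketch below assumes the entry ordering b = (B11, B12, B22, B13, B23, B33); other orderings work equally well with a matching v layout:

```rust
// v_ij for b = (B11, B12, B22, B13, B23, B33); illustrative, not library code.
fn v_ij(hi: [f64; 3], hj: [f64; 3]) -> [f64; 6] {
    [
        hi[0] * hj[0],
        hi[0] * hj[1] + hi[1] * hj[0],
        hi[1] * hj[1],
        hi[2] * hj[0] + hi[0] * hj[2],
        hi[2] * hj[1] + hi[1] * hj[2],
        hi[2] * hj[2],
    ]
}

fn main() {
    // An arbitrary symmetric B and two arbitrary homography columns.
    let (b11, b12, b22, b13, b23, b33) = (2.0, 0.3, 1.5, -0.4, 0.7, 1.0);
    let bmat = [[b11, b12, b13], [b12, b22, b23], [b13, b23, b33]];
    let h1 = [0.9, -0.2, 0.4];
    let h2 = [0.1, 1.1, -0.6];

    // Left side: h1^T B h2.
    let mut bh2 = [0.0; 3];
    for i in 0..3 {
        bh2[i] = bmat[i][0] * h2[0] + bmat[i][1] * h2[1] + bmat[i][2] * h2[2];
    }
    let lhs: f64 = (0..3).map(|i| h1[i] * bh2[i]).sum();

    // Right side: v_12^T b.
    let v = v_ij(h1, h2);
    let bvec = [b11, b12, b22, b13, b23, b33];
    let rhs: f64 = (0..6).map(|i| v[i] * bvec[i]).sum();

    assert!((lhs - rhs).abs() < 1e-12);
}
```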

Extracting Intrinsics from B

Given \(b\), the intrinsics are recovered from \(B = K^{-\top} K^{-1}\) (known only up to scale) by a Cholesky-like factorization; closed-form expressions for the focal lengths, principal point, skew, and the scale in terms of the entries of \(B\) are given in Zhang's paper (Appendix B).

Validity Checks

The extraction can fail when:

  • The estimated \(B\) is degenerate, i.e. not positive definite up to scale (insufficient view diversity)
  • The expression for a squared focal length comes out negative

These cases indicate that the input homographies do not sufficiently constrain the intrinsics, typically because the views are too similar (e.g., all near-frontal).

Minimum Number of Views

  • m = 3 homographies: 6 equations for 6 unknowns — the minimum for a unique solution (with skew)
  • m = 2: Only 4 equations; requires the additional assumption of zero skew to reduce the unknowns to 5
  • m > 3: Overdetermined system; SVD gives the least-squares solution

Practical Considerations

View diversity: The views should include significant rotation around both axes. Pure translations or rotations around a single axis lead to degenerate configurations where some intrinsic parameters are undetermined.

Distortion: Zhang's method assumes distortion-free observations. When applied to distorted pixels, the estimated is biased. The Iterative Intrinsics chapter addresses this with an alternating refinement scheme.

OpenCV equivalence: Zhang's method is the internal initialization step of cv::calibrateCamera. OpenCV does not expose it as a separate API.

API

let K = estimate_intrinsics_from_homographies(&homographies)?;

Returns FxFyCxCySkew with the estimated intrinsics. Typically followed by distortion estimation and non-linear refinement.

Reference

Zhang, Z. (2000). "A Flexible New Technique for Camera Calibration." IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11), 1330-1334.

Distortion Estimation from Homography Residuals

After estimating intrinsics via Zhang's method (which assumes distortion-free observations), the residuals between observed and predicted pixel positions encode the lens distortion. This chapter derives a linear method to estimate Brown-Conrady distortion coefficients from these residuals.

Problem Statement

Given:

  • Camera intrinsics \(K\) (from Zhang's method)
  • m homographies from the calibration board to the image
  • Point correspondences: board points and observed pixels

Find: Distortion coefficients \((k_1, k_2, k_3, p_1, p_2)\).

Assumptions:

  • \(K\) is a reasonable estimate (possibly biased by distortion)
  • Distortion is moderate (the linear approximation holds)
  • The homographies were computed from distorted pixel observations

Derivation

Ideal vs. Observed Coordinates

For each correspondence in view \(j\), the ideal (undistorted) pixel position predicted by the homography is:

\[ \hat{q}_{ij} = \pi(H_j\, P_i) \]

Convert both observed and ideal pixels to normalized coordinates by applying \(K^{-1}\):

\[ \hat{x}_{ij} = \pi(K^{-1} \hat{q}_{ij}), \qquad x_{ij} = \pi(K^{-1} q_{ij}) \]

The residual in normalized coordinates encodes the distortion effect:

\[ \Delta_{ij} = x_{ij} - \hat{x}_{ij} \]

Linearized Distortion Model

The Brown-Conrady distortion applied to an ideal normalized point \((x, y)\) with \(r^2 = x^2 + y^2\) produces a displacement:

\[ \begin{aligned} \Delta x &= x\,(k_1 r^2 + k_2 r^4 + k_3 r^6) + 2 p_1 x y + p_2 (r^2 + 2 x^2) \\ \Delta y &= y\,(k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1 (r^2 + 2 y^2) + 2 p_2 x y \end{aligned} \]

This is linear in the distortion coefficients \((k_1, k_2, k_3, p_1, p_2)\).
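A sketch of the standard Brown-Conrady displacement, used to confirm the linearity claim (doubling the coefficients doubles the displacement); the coefficient values are made up:

```rust
/// Brown-Conrady displacement at a normalized point (x, y).
/// Coefficient order: (k1, k2, k3, p1, p2).
fn displacement(x: f64, y: f64, c: [f64; 5]) -> (f64, f64) {
    let [k1, k2, k3, p1, p2] = c;
    let r2 = x * x + y * y;
    let radial = k1 * r2 + k2 * r2 * r2 + k3 * r2 * r2 * r2;
    let dx = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x);
    let dy = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y;
    (dx, dy)
}

fn main() {
    let c = [-0.2, 0.05, 0.0, 1e-3, -5e-4];
    let c2 = c.map(|v| 2.0 * v);
    let (dx, dy) = displacement(0.3, -0.2, c);
    let (dx2, dy2) = displacement(0.3, -0.2, c2);
    // No displacement at the optical center, and linear in the coefficients.
    assert_eq!(displacement(0.0, 0.0, c), (0.0, 0.0));
    assert!((dx2 - 2.0 * dx).abs() < 1e-15 && (dy2 - 2.0 * dy).abs() < 1e-15);
}
```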

Building the Linear System

For each correspondence, write the \(x\) and \(y\) components of the linearized model:

\[ \begin{bmatrix} x r^2 & x r^4 & x r^6 & 2xy & r^2 + 2x^2 \\ y r^2 & y r^4 & y r^6 & r^2 + 2y^2 & 2xy \end{bmatrix} \begin{pmatrix} k_1 \\ k_2 \\ k_3 \\ p_1 \\ p_2 \end{pmatrix} = \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix} \]

where \((x, y)\) are the ideal normalized coordinates and \((\Delta x, \Delta y)\) are the observed residuals.

Stacking all M correspondences from all views gives an overdetermined system:

\[ A d = b \]

where \(A\) is \(2M \times 5\) (or smaller if some coefficients are fixed), and \(b\) stacks the residuals.

Solving

The least-squares solution is:

\[ d = (A^{\top} A)^{-1} A^{\top} b \]

In practice, this is solved via the SVD of \(A\) for numerical stability.

Options

pub struct DistortionFitOptions {
    pub fix_tangential: bool,  // Fix p1 = p2 = 0
    pub fix_k3: bool,          // Fix k3 = 0 (default: true)
    pub iters: u32,            // Undistortion iterations
}
  • fix_k3 (default true): Removes \(k_3\) from the system, reducing to a 4-parameter model (or a 2-parameter model when combined with fix_tangential). Recommended unless the lens has strong higher-order radial distortion.
  • fix_tangential (default false): Removes \(p_1, p_2\), estimating only radial distortion. Useful when tangential distortion is known to be negligible.

When parameters are fixed, the corresponding columns are removed from \(A\).

Accuracy

This linear estimate is typically within 10-50% of the true distortion values. The accuracy depends on:

  • Quality of \(K\): If intrinsics are biased (from distorted observations), the estimated distortion absorbs some of the bias
  • Number of points: More correspondences improve the overdetermined system
  • Point distribution: Points across the full image area constrain distortion better than points clustered near the center (where distortion is small)

This accuracy is sufficient for initializing non-linear refinement.

API

let dist = estimate_distortion_from_homographies(
    &k_matrix, &views, opts
)?;

Returns BrownConrady5 with the estimated coefficients.

OpenCV equivalence: OpenCV estimates distortion jointly with intrinsics inside cv::calibrateCamera. The separate distortion estimation step is specific to calibration-rs's initialization approach.

Iterative Intrinsics + Distortion

Zhang's method assumes distortion-free observations. When applied to images from a camera with significant distortion, the estimated intrinsics are biased because the distorted pixels violate the linear homography model. The iterative intrinsics solver addresses this by alternating between intrinsics estimation and distortion correction.

Problem Statement

Given: views of a planar calibration board, with observed (distorted) pixel coordinates.

Find: Camera intrinsics \(K\) and distortion coefficients jointly.

Assumptions:

  • The observations are distorted (raw detector output)
  • Distortion is moderate enough that Zhang's method on distorted pixels gives a usable initial \(K\)
  • 1-3 iterations of alternating refinement suffice

Why Alternation Works

The joint estimation of and distortion is a non-convex problem. However, it decomposes naturally:

  • Given \(K\): Distortion estimation is a linear problem (see Distortion Fit)
  • Given distortion: Undistorting the pixels and re-estimating \(K\) is a linear problem (Zhang's method)

Each step solves a convex subproblem. The alternation converges because:

  1. The initial Zhang estimate (ignoring distortion) is typically within the basin where the alternation contracts
  2. Each step reduces the residual between the model and observations
  3. The coupling between \(K\) and distortion is relatively weak for moderate distortion

Algorithm

Input: Views with distorted observations

Output: Intrinsics \(K\), distortion coefficients

  1. Compute homographies from the distorted pixels via DLT

  2. Estimate an initial \(K\) from the homographies via Zhang's method

  3. For each iteration:

    a. Estimate distortion from the homography residuals using the current \(K\)

    b. Undistort all observed pixels with the current distortion estimate

    c. Recompute homographies from the undistorted pixels

    d. Re-estimate \(K\) from the homographies via Zhang's method

    e. (Optional) Enforce zero skew: set the skew entry of \(K\) to 0

  4. Return \(K\) and the distortion estimate

Convergence

Typically 1-2 iterations suffice:

  • Iteration 0 (Zhang on distorted pixels): 10-40% intrinsics error
  • Iteration 1: Distortion estimate corrects the dominant radial effect; intrinsics error drops to 5-20%
  • Iteration 2: Further refinement; diminishing returns

More iterations are safe but rarely improve the estimate significantly. The default is 2 iterations.

Configuration

pub struct IterativeIntrinsicsOptions {
    pub iterations: usize,                     // 1-3 typical (default: 2)
    pub distortion_opts: DistortionFitOptions, // Controls fix_k3, fix_tangential, iters
    pub zero_skew: bool,                       // Force skew = 0 (default: true)
}

The Undistortion Step

Step 3b converts distorted pixels back to "ideal" pixels by inverting the distortion model:

  1. Convert the distorted pixel to normalized coordinates via \(K^{-1}\)
  2. Undistort using the current distortion estimate (iterative fixed-point, see Brown-Conrady Distortion)
  3. Convert back to a pixel via \(K\)

Note that this uses \(K\) both for normalization and for converting back — the undistorted pixels are in the same coordinate system as the original distorted pixels, just without the distortion effect.
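The fixed-point undistortion referenced in step 2 can be sketched for a radial-only model \(x_d = x_u (1 + k_1 r^2)\), a simplification of the full Brown-Conrady model:

```rust
// Forward model: distort a normalized point with a single radial term.
fn distort(x: f64, y: f64, k1: f64) -> (f64, f64) {
    let f = 1.0 + k1 * (x * x + y * y);
    (x * f, y * f)
}

// Fixed-point inversion: repeatedly divide the distorted point by the
// radial factor evaluated at the current undistorted estimate.
fn undistort(xd: f64, yd: f64, k1: f64, iters: u32) -> (f64, f64) {
    let (mut x, mut y) = (xd, yd);
    for _ in 0..iters {
        let f = 1.0 + k1 * (x * x + y * y);
        x = xd / f;
        y = yd / f;
    }
    (x, y)
}

fn main() {
    // Round trip: distort then undistort recovers the original point.
    let (xd, yd) = distort(0.3, -0.25, -0.15);
    let (x, y) = undistort(xd, yd, -0.15, 8);
    assert!((x - 0.3).abs() < 1e-9 && (y + 0.25).abs() < 1e-9);
}
```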

Accuracy Expectations

Stage | Typical intrinsics error
Zhang on distorted pixels | 10-40%
After 1 iteration | 5-20%
After 2 iterations | 5-15%
After non-linear refinement | <2%

The iterative linear estimate is not meant to be highly accurate. Its purpose is to provide a starting point good enough for the non-linear optimizer to converge.

API

use vision_calibration::linear::iterative_intrinsics::*;
use vision_calibration::linear::DistortionFitOptions;

let opts = IterativeIntrinsicsOptions {
    iterations: 2,
    distortion_opts: DistortionFitOptions {
        fix_k3: true,
        fix_tangential: false,
        iters: 8,
    },
    zero_skew: true,
};

let camera = estimate_intrinsics_iterative(&dataset, opts)?;
// camera is PinholeCamera = Camera<f64, Pinhole, BrownConrady5, IdentitySensor, FxFyCxCySkew>
// camera.k: FxFyCxCySkew (intrinsics)
// camera.dist: BrownConrady5 (distortion)

Pose from Homography

Given camera intrinsics \(K\) and a homography \(H\) from a planar calibration board to the image, we can decompose \(H\) to recover the camera pose (rotation \(R\) and translation \(t\)) relative to the board.

Problem Statement

Given: Intrinsics \(K\) and homography \(H\) such that \(q \simeq H p\) for board points at \(Z = 0\).

Find: Rigid transform \((R, t)\) (board to camera).

Assumptions:

  • The board lies at \(Z = 0\) in world coordinates
  • \(K\) is known (or has been estimated)
  • The homography was computed from correct correspondences

Derivation

Extracting Rotation and Translation

Recall from the Homography chapter:

\[ H = \lambda\, K\, [\, r_1 \;\; r_2 \;\; t \,] \]

where \(r_1, r_2\) are the first two columns of the rotation matrix and \(t\) is the translation.

Removing the intrinsics:

\[ M = K^{-1} H = \lambda\, [\, r_1 \;\; r_2 \;\; t \,] \]

Let \(M = [\, m_1 \;\; m_2 \;\; m_3 \,]\). The scale factor is recovered from the constraint that \(r_1\) and \(r_2\) have unit norm:

\[ \lambda \approx \tfrac{1}{2}\big(\|m_1\| + \|m_2\|\big) \]

Then:

\[ r_1 = m_1 / \lambda, \qquad r_2 = m_2 / \lambda, \qquad t = m_3 / \lambda \]

The third rotation column is:

\[ r_3 = r_1 \times r_2 \]

Projecting onto SO(3)

Due to noise, the matrix \([\, r_1 \;\; r_2 \;\; r_3 \,]\) is not exactly orthonormal. We project it onto \(SO(3)\) using SVD:

\[ [\, r_1 \;\; r_2 \;\; r_3 \,] = U \Sigma V^{\top}, \qquad R = U V^{\top} \]

If \(\det(R) < 0\), flip the sign of the third column of \(U\) and recompute.

Ensuring Forward-Facing

If \(t_z < 0\), the board is behind the camera. In this case, flip the sign of both \(r_1, r_2\) and \(t\) (equivalently, negate \(\lambda\)); the cross product \(r_3 = r_1 \times r_2\) is unchanged.

This resolves the sign ambiguity inherent in the scale factor \(\lambda\).

Accuracy

The pose from homography is an approximate estimate because:

  1. The homography itself is subject to noise (DLT algebraic error minimization, not geometric)
  2. The SVD projection onto SO(3) corrects non-orthogonality but introduces additional error
  3. Distortion (if not corrected) biases the homography

Typical rotation error: 1-5°. Typical translation direction error: 5-15%. These estimates are refined in non-linear optimization.

OpenCV equivalence: cv::decomposeHomographyMat provides a similar decomposition, returning up to 4 pose candidates. calibration-rs returns a single pose by resolving ambiguities via the forward-facing constraint.

API

let pose = estimate_planar_pose_from_h(&K, &H)?;
// pose: Iso3 (T_C_B: board to camera transform)

Usage in Calibration Pipeline

In the planar intrinsics calibration pipeline, pose estimation is applied to every view after \(K\) has been estimated:

  1. Compute a homography per view
  2. Estimate \(K\) from all homographies (Zhang's method)
  3. Decompose each homography to get the per-view pose
  4. Use the poses as initial values for non-linear optimization

Perspective-n-Point Solvers

The Perspective-n-Point (PnP) problem estimates the camera pose from known 3D-2D point correspondences. Unlike homography-based pose estimation, PnP does not require coplanar points.

Problem Statement

Given: n 3D world points \(P_i\) and their corresponding 2D image points \(q_i\), plus camera intrinsics \(K\).

Find: Camera pose \((R, t)\) such that \(q_i \simeq \pi\big(K (R P_i + t)\big)\).

Assumptions:

  • Camera intrinsics are known
  • Correspondences are correct (or RANSAC is used)
  • Points are not degenerate (e.g., not all collinear)

P3P: Kneip's Minimal Solver

The P3P solver uses exactly 3 correspondences — the minimum for a finite number of solutions. It returns up to 4 candidate poses.

Algorithm

Input: 3 world points \(P_1, P_2, P_3\), 3 pixel points \(q_1, q_2, q_3\), intrinsics \(K\).

  1. Bearing vectors: Convert pixels to unit bearing vectors in the camera frame:

\[ f_i = \frac{K^{-1} \tilde{q}_i}{\|K^{-1} \tilde{q}_i\|} \]

  2. Inter-point distances in the world frame:

\[ d_{12} = \|P_1 - P_2\|, \qquad d_{13} = \|P_1 - P_3\|, \qquad d_{23} = \|P_2 - P_3\| \]

  3. Bearing vector cosines:

\[ \cos\theta_{ij} = f_i^{\top} f_j \]

  4. Quartic polynomial: From ratios of the squared distances, Kneip derives a quartic polynomial in a distance ratio. The coefficients are functions of the distances and cosines above.

  5. Solve the quartic for up to 4 real roots.

  6. For each root:

    • Compute the second distance ratio
    • Compute the three camera-frame distances (depths of the three points)
    • Back-project to 3D points in the camera frame: \(P_i^{(C)} = \lambda_i f_i\)
    • Recover the pose from the 3D-3D correspondence using SVD-based rigid alignment
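Steps 1-3 are straightforward to sketch (illustrative code with made-up intrinsics; `bearing` is not a library function):

```rust
/// Unit bearing vector from a pixel, assuming K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]:
/// K^{-1} [u, v, 1]^T = [(u - cx)/fx, (v - cy)/fy, 1]^T, then normalize.
fn bearing(q: (f64, f64), fx: f64, fy: f64, cx: f64, cy: f64) -> [f64; 3] {
    let v = [(q.0 - cx) / fx, (q.1 - cy) / fy, 1.0];
    let n = (v[0] * v[0] + v[1] * v[1] + v[2] * v[2]).sqrt();
    [v[0] / n, v[1] / n, v[2] / n]
}

fn main() {
    let (fx, fy, cx, cy) = (800.0, 800.0, 320.0, 240.0);
    let f1 = bearing((320.0, 240.0), fx, fy, cx, cy); // principal point: optical axis
    let f2 = bearing((420.0, 240.0), fx, fy, cx, cy); // 100 px off-center

    // Bearing vectors are unit length.
    let n1: f64 = f1.iter().map(|a| a * a).sum();
    assert!((n1 - 1.0).abs() < 1e-12);

    // cos(theta) between the rays: the off-center ray makes angle atan(100/800)
    // with the optical axis, so the dot product is cos(atan(0.125)).
    let cos_t: f64 = f1.iter().zip(&f2).map(|(a, b)| a * b).sum();
    assert!((cos_t - (0.125_f64).atan().cos()).abs() < 1e-12);
}
```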

Disambiguation

P3P returns up to 4 poses. To select the correct one, use a fourth point (or more points with RANSAC) and pick the pose with the smallest reprojection error.

DLT PnP: Linear Solver for n ≥ 6 Points

The DLT (Direct Linear Transform) PnP uses an overdetermined system for \(n \ge 6\) points.

Derivation

The projection equation in normalized coordinates is:

\[ \lambda \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = [\, R \mid t \,] \begin{bmatrix} P \\ 1 \end{bmatrix} \]

where \((x, y)\) are normalized coordinates (after applying \(K^{-1}\) to pixels) and \(m_1^{\top}, m_2^{\top}, m_3^{\top}\) are the rows of \([\, R \mid t \,]\).

Cross-multiplying gives two equations per point (\(\tilde{P} = (X, Y, Z, 1)^{\top}\)):

\[ x\,(m_3^{\top} \tilde{P}) - m_1^{\top} \tilde{P} = 0, \qquad y\,(m_3^{\top} \tilde{P}) - m_2^{\top} \tilde{P} = 0 \]

The 12 unknowns are the entries of the \(3 \times 4\) matrix \([\, R \mid t \,]\).

The Linear System

For each point \(i\):

\[ \begin{bmatrix} \tilde{P}_i^{\top} & 0^{\top} & -x_i \tilde{P}_i^{\top} \\ 0^{\top} & \tilde{P}_i^{\top} & -y_i \tilde{P}_i^{\top} \end{bmatrix} m = 0 \]

where \(m\) is the vectorized \([\, R \mid t \,]\). Stacking gives a \(2n \times 12\) matrix \(A\). Solve \(A m = 0\) via SVD.

Post-Processing

  1. Reshape the 12-vector into a \(3 \times 4\) matrix \([\, \tilde{R} \mid \tilde{t} \,]\)
  2. Normalize the scale using the row norms of the \(\tilde{R}\) block
  3. Project the rotation block onto \(SO(3)\) via SVD (same as in Pose from Homography)
  4. Extract the translation from the fourth column

Hartley Normalization

The 3D world points are normalized before building the system (centered at the origin, scaled so the mean distance is \(\sqrt{3}\)). The image points are normalized via \(K^{-1}\). The result is denormalized after solving.

RANSAC Wrappers

Both solvers have RANSAC variants for handling outliers:

// DLT PnP + RANSAC
let (pose, inliers) = dlt_ransac(
    &world_pts, &image_pts, &K,
    &RansacOptions { thresh: 5.0, ..Default::default() }
)?;

The DLT PnP solver uses MIN_SAMPLES = 6 with RANSAC. P3P (p3p()) returns up to 4 candidates and is intended for manual disambiguation (e.g., using a 4th point), not as a RANSAC estimator.

Comparison

Solver | Min. points | Solutions | Strengths
P3P | 3 | Up to 4 | Best for RANSAC (minimal sample)
DLT PnP | 6 | 1 | Simple, no polynomial solving

OpenCV equivalence: cv::solvePnP with SOLVEPNP_P3P or SOLVEPNP_DLS; cv::solvePnPRansac for robust estimation.

References

  • Kneip, L., Scaramuzza, D., & Siegwart, R. (2011). "A Novel Parametrization of the Perspective-Three-Point Problem for a Direct Computation of Absolute Camera Position and Orientation." CVPR.

Epipolar Geometry

Epipolar geometry describes the geometric relationship between two views of the same scene. It is encoded in the fundamental matrix (for pixel coordinates) or the essential matrix (for normalized coordinates). These matrices are central to stereo vision, visual odometry, and structure from motion.

Fundamental Matrix

Problem Statement

Given: \(n \ge 8\) point correspondences \((x_i, x_i')\) in pixel coordinates from two views.

Find: Fundamental matrix \(F\) satisfying \(x_i'^{\top} F x_i = 0\) for all correspondences.

Properties of \(F\):

  • Rank 2 (its left and right null vectors are the epipoles)
  • 7 degrees of freedom (9 entries, minus the scale ambiguity, minus the rank constraint)
  • Maps a point in one image to its epipolar line in the other: \(\ell' = F x\)

8-Point Algorithm

The simplest method, requiring \(n \ge 8\) correspondences.

Derivation: The constraint \(x'^{\top} F x = 0\) expands to:

\[ x' x f_{11} + x' y f_{12} + x' f_{13} + y' x f_{21} + y' y f_{22} + y' f_{23} + x f_{31} + y f_{32} + f_{33} = 0 \]

Each correspondence gives one linear equation in the 9 entries of \(F\).

Algorithm:

  1. Normalize both point sets using Hartley normalization
  2. Build the \(n \times 9\) design matrix \(A\) where each row is: \((x'x,\; x'y,\; x',\; y'x,\; y'y,\; y',\; x,\; y,\; 1)\)
  3. Solve \(A f = 0\) via SVD; \(f\) is the last column of \(V\)
  4. Enforce rank 2: Decompose \(F = U\,\mathrm{diag}(\sigma_1, \sigma_2, \sigma_3)\,V^{\top}\) and set \(\sigma_3 = 0\): \(F \leftarrow U\,\mathrm{diag}(\sigma_1, \sigma_2, 0)\,V^{\top}\)
  5. Denormalize: \(F = T_2^{\top}\, \tilde{F}\, T_1\)

The rank-2 enforcement is critical: without it, the epipolar lines do not intersect at a common epipole.

7-Point Algorithm

A minimal solver using exactly 7 correspondences, producing 1 or 3 candidate matrices.

Derivation: With 7 equations for 9 unknowns (modulo scale), the null space of \(A\) is 2-dimensional. Let the null space basis be \(F_1\) and \(F_2\). The fundamental matrix is:

\[ F(\alpha) = \alpha F_1 + (1 - \alpha) F_2 \]

for some scalar \(\alpha\). The rank-2 constraint \(\det F = 0\) gives a cubic equation in \(\alpha\):

\[ \det\!\big( \alpha F_1 + (1 - \alpha) F_2 \big) = 0 \]

This cubic has 1 or 3 real roots, each giving a candidate \(F\).

Essential Matrix

Problem Statement

Given: N ≥ 5 point correspondences in normalized coordinates x = K⁻¹ u, where K is the intrinsic matrix.

Find: Essential matrix E satisfying x'ᵀ E x = 0.

Properties of E:

  • E = [t]× R, where R is the relative rotation and t is the translation direction
  • Has exactly two equal non-zero singular values: σ₁ = σ₂ > 0, σ₃ = 0
  • 5 degrees of freedom (3 rotation + 2 translation direction)

5-Point Algorithm (Nister)

The minimal solver, producing up to 10 candidate matrices.

Algorithm:

  1. Normalize both point sets using Hartley normalization
  2. Build the N×9 matrix A (same form as 8-point)
  3. Null space: SVD of A gives a 4-dimensional null space. The essential matrix is E = x E₁ + y E₂ + z E₃ + w E₄ for unknowns x, y, z, w (fixed to w = 1 up to scale), where E₁, ..., E₄ are the null space basis vectors
  4. Polynomial constraints: The essential matrix constraint 2 E Eᵀ E − tr(E Eᵀ) E = 0 (9 equations) and det(E) = 0 (1 equation) generate polynomial equations in x, y, z
  5. Action matrix: These constraints are assembled into a polynomial system solved via the eigenvalues of a 10×10 action matrix
  6. Extract real solutions: Each real eigenvalue determines (x, y, z) and thus E

Up to 10 candidate essential matrices may be returned.

Decomposing E into R, t

Given the SVD E = U diag(1, 1, 0) Vᵀ, there are 4 possible decompositions:

R = U W Vᵀ or R = U Wᵀ Vᵀ,   t = +u₃ or t = −u₃

where W = [[0, −1, 0], [1, 0, 0], [0, 0, 1]] and u₃ is the third column of U.

Chirality check: Of the 4 candidates (R, t), only one places all triangulated points in front of both cameras. This is verified by triangulating a test point and checking that its depth z > 0 in both views.

Coordinate Conventions

Matrix          Input coordinates           Usage
Fundamental F   Pixel coordinates           When intrinsics are unknown
Essential E     Normalized (x = K⁻¹ u)      When intrinsics are known

The relationship is E = K'ᵀ F K, where K and K' are the two cameras' intrinsic matrices.

RANSAC

Both solvers have RANSAC wrappers for outlier rejection:

#![allow(unused)]
fn main() {
let (F, inliers) = fundamental_8point_ransac(&pts1, &pts2, &opts)?;
}

Currently, only the 8-point fundamental matrix solver has a RANSAC wrapper. The 5-point essential matrix solver (essential_5point) returns multiple candidates and is used directly.

OpenCV equivalence: cv::findFundamentalMat (8-point and 7-point), cv::findEssentialMat (5-point with RANSAC).

References

  • Nister, D. (2004). "An Efficient Solution to the Five-Point Relative Pose Problem." IEEE TPAMI, 26(6), 756-770.
  • Hartley, R.I. (1997). "In Defense of the Eight-Point Algorithm." IEEE TPAMI, 19(6), 580-593.

Camera Matrix and RQ Decomposition

The camera matrix P (or projection matrix) is a 3×4 matrix that directly maps 3D world points to 2D pixel coordinates. Estimating P via DLT and decomposing it into intrinsics and extrinsics via RQ decomposition provides an alternative initialization path.

Camera Matrix DLT

Problem Statement

Given: N ≥ 6 correspondences between 3D world points Xᵢ and 2D pixel points uᵢ.

Find: 3×4 projection matrix P such that uᵢ ≃ P Xᵢ (equality up to scale).

Derivation

The projection u ≃ P X (with X in homogeneous coordinates) gives, after cross-multiplication, two equations per correspondence:

p₁ᵀ X − u p₃ᵀ X = 0
p₂ᵀ X − v p₃ᵀ X = 0

where pᵢᵀ is the i-th row of P.

This gives a 2N×12 system A p = 0, solved via SVD with Hartley normalization of the 3D and 2D points.

Post-Processing

After denormalization, P is 3×4 but not guaranteed to decompose cleanly into K [R | t] due to noise. The RQ decomposition extracts the components.

RQ Decomposition

Problem Statement

Given: The left 3×3 submatrix M of P (where P = [M | p₄]).

Find: Upper-triangular K and orthogonal R such that M = K R.

Algorithm

RQ decomposition is computed by reusing QR decomposition on a row-flipped matrix. Let P_f be the permutation that reverses row order (P_f² = I):

  1. Compute the QR decomposition of (P_f M)ᵀ: (P_f M)ᵀ = Q̃ R̃
  2. Then M = P_f R̃ᵀ Q̃ᵀ, where R̃ᵀ is lower-triangular and Q̃ᵀ is orthogonal
  3. Apply P_f to flip back to upper-triangular form:
    • K = P_f R̃ᵀ P_f (upper-triangular)
    • R = P_f Q̃ᵀ (orthogonal)
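The flipped-QR trick above can be sketched end to end in plain Rust. This is illustrative (classical Gram-Schmidt QR for brevity; the library's implementation uses its own linear algebra and additionally applies the sign conventions described next):

```rust
type M3 = [[f64; 3]; 3];

fn matmul(a: &M3, b: &M3) -> M3 {
    let mut c = [[0.0; 3]; 3];
    for i in 0..3 {
        for j in 0..3 {
            for k in 0..3 {
                c[i][j] += a[i][k] * b[k][j];
            }
        }
    }
    c
}

// Classical Gram-Schmidt QR on columns: a = q * r, with r upper-triangular
// and a non-negative diagonal.
fn qr(a: &M3) -> (M3, M3) {
    let mut q = [[0.0; 3]; 3];
    let mut r = [[0.0; 3]; 3];
    for j in 0..3 {
        let mut v = [a[0][j], a[1][j], a[2][j]];
        for i in 0..j {
            r[i][j] = (0..3).map(|k| q[k][i] * a[k][j]).sum();
            for k in 0..3 {
                v[k] -= r[i][j] * q[k][i];
            }
        }
        r[j][j] = v.iter().map(|x| x * x).sum::<f64>().sqrt();
        for k in 0..3 {
            q[k][j] = v[k] / r[j][j];
        }
    }
    (q, r)
}

// RQ via row flipping: with P the row-reversal permutation,
// (P m)^T = Q R  =>  m = (P R^T P)(P Q^T) = K * Rot.
fn rq(m: &M3) -> (M3, M3) {
    let mut a = [[0.0; 3]; 3];
    for i in 0..3 {
        for j in 0..3 {
            a[i][j] = m[2 - j][i]; // ((P m)^T)[i][j]
        }
    }
    let (q, r) = qr(&a);
    let mut k = [[0.0; 3]; 3];
    let mut rot = [[0.0; 3]; 3];
    for i in 0..3 {
        for j in 0..3 {
            k[i][j] = r[2 - j][2 - i]; // K = P R^T P (upper-triangular)
            rot[i][j] = q[j][2 - i];   // Rot = P Q^T (orthogonal)
        }
    }
    (k, rot)
}

fn main() {
    // K with a positive diagonal times a rotation about Z: RQ recovers both.
    let k0: M3 = [[800.0, 2.0, 320.0], [0.0, 810.0, 240.0], [0.0, 0.0, 1.0]];
    let (s, c) = (30.0f64.to_radians()).sin_cos();
    let rot0: M3 = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]];
    let m = matmul(&k0, &rot0);
    let (k, rot) = rq(&m);
    for i in 0..3 {
        for j in 0..3 {
            assert!((k[i][j] - k0[i][j]).abs() < 1e-6);
            assert!((rot[i][j] - rot0[i][j]).abs() < 1e-6);
        }
    }
}
```

Because the RQ factorization with a positive-diagonal triangular factor is unique for a nonsingular M, the test recovers the original K and R exactly (up to floating-point error).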

Sign Conventions

After decomposition, ensure:

  • K has positive diagonal entries: if kᵢᵢ < 0, negate column i of K and row i of R (their product is unchanged)
  • det(R) = +1: if det(R) = −1, negate R; the resulting sign change of M is absorbed by the projective scale of P

Translation Extraction

With K known, the translation is recovered from the fourth column of P:

t = K⁻¹ p₄

Full Decomposition

The CameraMatrixDecomposition struct:

#![allow(unused)]
fn main() {
pub struct CameraMatrixDecomposition {
    pub k: Mat3,   // Upper-triangular intrinsics
    pub r: Mat3,   // Rotation matrix (orthonormal, det = +1)
    pub t: Vec3,   // Translation vector
}
}

API

#![allow(unused)]
fn main() {
// Estimate the full 3×4 camera matrix
let P = dlt_camera_matrix(&world_pts, &image_pts)?;

// Decompose into K, R, t
let decomp = decompose_camera_matrix(&P)?;
println!("Intrinsics: {:?}", decomp.k);
println!("Rotation: {:?}", decomp.r);
println!("Translation: {:?}", decomp.t);

// Or just RQ decompose any 3×3 matrix
let (K, R) = rq_decompose(&M);
}

When to Use

Camera matrix DLT is useful when:

  • You have non-coplanar 3D-2D correspondences and want to estimate both intrinsics and pose simultaneously
  • You need a quick estimate of K from a single view (without multiple homographies)

For calibration with a planar board, Zhang's method is preferred because it uses the planar constraint to get more constraints per view.

Linear Triangulation

Triangulation reconstructs a 3D point from its projections in two or more calibrated views. It is the inverse of the projection operation: given pixel observations and camera poses, recover the world point.

Problem Statement

Given: N ≥ 2 camera projection matrices Pᵢ (each 3×4) and corresponding pixel observations uᵢ = (uᵢ, vᵢ).

Find: 3D point X such that uᵢ ≃ Pᵢ X for all views.

Assumptions:

  • Camera poses and intrinsics are known
  • Correspondences are correct
  • The point is visible from all views (not occluded)

Derivation (DLT)

For each view i, the projection constraint uᵢ ≃ Pᵢ X gives (by cross-multiplication):

(uᵢ p_{i,3}ᵀ − p_{i,1}ᵀ) X = 0
(vᵢ p_{i,3}ᵀ − p_{i,2}ᵀ) X = 0

where p_{i,k}ᵀ is the k-th row of Pᵢ.

Stacking all views gives a 2N×4 system A X = 0.

Solve via SVD: X is the right singular vector corresponding to the smallest singular value. Dehomogenize: X = (x₁, x₂, x₃) / x₄.
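The same two equations per view can be rearranged into an inhomogeneous system in the three unknowns (x, y, z), which avoids needing an SVD for a small illustration. The following sketch solves that variant via the 3×3 normal equations (illustrative only; the library uses the homogeneous SVD form described above):

```rust
type P34 = [[f64; 4]; 3]; // 3x4 projection matrix

fn det3(m: &[[f64; 3]; 3]) -> f64 {
    m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
}

// Cramer's rule on a 3x3 system a x = b
fn solve3(a: &[[f64; 3]; 3], b: &[f64; 3]) -> [f64; 3] {
    let d = det3(a);
    let mut x = [0.0; 3];
    for col in 0..3 {
        let mut ak = *a;
        for i in 0..3 {
            ak[i][col] = b[i];
        }
        x[col] = det3(&ak) / d;
    }
    x
}

fn triangulate(ps: &[P34], obs: &[(f64, f64)]) -> [f64; 3] {
    let mut ata = [[0.0; 3]; 3];
    let mut atb = [0.0; 3];
    for (p, &(u, v)) in ps.iter().zip(obs) {
        for (coord, row) in [(u, 0usize), (v, 1)] {
            // (p_row - coord * p_3) . (x, y, z) = coord * p_3[3] - p_row[3]
            let a = [
                p[row][0] - coord * p[2][0],
                p[row][1] - coord * p[2][1],
                p[row][2] - coord * p[2][2],
            ];
            let b = coord * p[2][3] - p[row][3];
            for i in 0..3 {
                for j in 0..3 {
                    ata[i][j] += a[i] * a[j];
                }
                atb[i] += a[i] * b;
            }
        }
    }
    solve3(&ata, &atb)
}

fn main() {
    // Two normalized cameras with a baseline of 1 along x
    let p1: P34 = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]];
    let p2: P34 = [[1.0, 0.0, 0.0, -1.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]];
    // Exact projections of (0.5, 0.2, 4.0) in both views
    let obs = [(0.125, 0.05), (-0.125, 0.05)];
    let x = triangulate(&[p1, p2], &obs);
    assert!((x[0] - 0.5).abs() < 1e-9);
    assert!((x[1] - 0.2).abs() < 1e-9);
    assert!((x[2] - 4.0).abs() < 1e-9);
}
```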

Limitations

  • No uncertainty modeling: The DLT minimizes algebraic error, not geometric (reprojection) error
  • Baseline sensitivity: For small baselines (nearly parallel views), the triangulation is poorly conditioned — small pixel errors lead to large depth errors
  • Outlier sensitivity: A single incorrect correspondence corrupts the result; no robustness mechanism

For high-accuracy 3D reconstruction, triangulation should be followed by bundle adjustment that jointly optimizes points and cameras to minimize reprojection error.

API

#![allow(unused)]
fn main() {
let X = triangulate_point_linear(&projection_matrices, &pixel_points)?;
}

Where each projection matrix is precomputed from the camera intrinsics and pose.

Hand-Eye Calibration (Tsai-Lenz)

Hand-eye calibration estimates the rigid transform between a camera and a robot gripper (or between a camera and a robot base). It is essential for any application where a camera is mounted on a robot arm and the robot needs to localize objects in its own coordinate frame.

The AX = XB Problem

Setup

Consider a camera rigidly mounted on a robot gripper (eye-in-hand configuration). The system involves four coordinate frames:

  • Base (B): The robot's fixed base frame
  • Gripper (G): The robot's end-effector frame (known from robot kinematics)
  • Camera (C): The camera frame (observations come from here)
  • Target (T): The calibration board frame (fixed in the world)

The unknown is X (the gripper-to-camera transform).

The Equation

Given two observations i, j, we can compute:

  • Robot motion: A = G_j⁻¹ G_i, where G_i is the base-to-gripper pose at observation i (relative gripper motion, known from robot kinematics)
  • Camera motion: B = W_j W_i⁻¹, where W_i is the target pose in the camera frame at observation i (relative camera motion, from calibration board observations)

Since the camera is rigidly attached to the gripper, the target pose in the base frame G_i X W_i is constant across observations, which yields:

A X = X B

This is the classic AX = XB equation. We need to find X that satisfies this for all motion pairs.

Eye-to-Hand Variant

When the camera is fixed and the target is on the gripper (eye-to-hand), the same equation A X = X B holds with relabeled quantities: A is now the relative gripper motion, B is the relative target-in-camera motion, and X is the camera-to-base transform.

The Tsai-Lenz Method

Overview

Tsai and Lenz (1989) decomposed AX = XB into separate rotation and translation subproblems:

  1. Rotation: Solve R_A R_X = R_X R_B for R_X
  2. Translation: Given R_X, solve (R_A − I) t_X = R_X t_B − t_A for t_X

Step 1: All-Pairs Motion Computation

From observations, construct all motion pairs. For each pair :

Step 2: Filtering

Discard pairs with insufficient rotation (degenerate for the rotation subproblem):

  • Reject pairs where the rotation angle of A_{ij} is below a threshold θ_min (default: 10°)
  • Optionally reject pairs whose rotation axes are near-parallel to those of other pairs (ill-conditioned)

Rotation diversity is critical: If all robot motions are rotations around the same axis, the hand-eye rotation around that axis is undetermined. Use poses with diverse rotation axes (roll, pitch, and yaw).
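The angle filter is easy to state in terms of quaternions: for a unit quaternion q = (w, x, y, z), the rotation angle is θ = 2·atan2(‖(x, y, z)‖, |w|). A minimal sketch (hypothetical helper, not the library's API):

```rust
// Rotation angle in degrees of a unit quaternion stored as (w, x, y, z).
fn rotation_angle_deg(q: [f64; 4]) -> f64 {
    let (w, x, y, z) = (q[0], q[1], q[2], q[3]);
    let vec_norm = (x * x + y * y + z * z).sqrt();
    2.0 * vec_norm.atan2(w.abs()).to_degrees()
}

fn main() {
    // 90 degree rotation about Z: q = (cos 45, 0, 0, sin 45)
    let a = std::f64::consts::FRAC_PI_4;
    let q90 = [a.cos(), 0.0, 0.0, a.sin()];
    // 5 degree rotation about X
    let h = (5.0f64).to_radians() / 2.0;
    let q5 = [h.cos(), h.sin(), 0.0, 0.0];

    let min_angle_deg = 10.0;
    let kept: Vec<[f64; 4]> = [q90, q5]
        .into_iter()
        .filter(|&q| rotation_angle_deg(q) >= min_angle_deg)
        .collect();
    // Only the 90 degree motion survives the default 10 degree filter
    assert_eq!(kept.len(), 1);
    assert!((rotation_angle_deg(q90) - 90.0).abs() < 1e-9);
}
```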

Step 3: Rotation Estimation

The rotation constraint is solved using quaternion algebra.

Convert R_A and R_B to quaternions q_A and q_B. The constraint R_A R_X = R_X R_B becomes:

q_A ⊗ q_X = q_X ⊗ q_B

Using the left and right quaternion multiplication matrices L(q) and R(q), defined so that q ⊗ p = L(q) p = R(p) q:

(L(q_A) − R(q_B)) q_X = 0

Stacking all M motion pairs gives a 4M×4 system S q_X = 0.

Solve via SVD: q_X is the right singular vector of S corresponding to the smallest singular value. Normalize to a unit quaternion.
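The identity q ⊗ p = L(q) p = R(p) q is easy to verify numerically. A sketch with quaternions written as (w, x, y, z) (illustrative layout, not the library's internal storage):

```rust
// Hamilton product of two quaternions (w, x, y, z).
fn hamilton(q: [f64; 4], p: [f64; 4]) -> [f64; 4] {
    let [qw, qx, qy, qz] = q;
    let [pw, px, py, pz] = p;
    [
        qw * pw - qx * px - qy * py - qz * pz,
        qw * px + qx * pw + qy * pz - qz * py,
        qw * py - qx * pz + qy * pw + qz * px,
        qw * pz + qx * py - qy * px + qz * pw,
    ]
}

// L(q) such that q ⊗ p = L(q) p
fn left(q: [f64; 4]) -> [[f64; 4]; 4] {
    let [w, x, y, z] = q;
    [[w, -x, -y, -z], [x, w, -z, y], [y, z, w, -x], [z, -y, x, w]]
}

// R(p) such that q ⊗ p = R(p) q
fn right(p: [f64; 4]) -> [[f64; 4]; 4] {
    let [w, x, y, z] = p;
    [[w, -x, -y, -z], [x, w, z, -y], [y, -z, w, x], [z, y, -x, w]]
}

fn matvec(m: [[f64; 4]; 4], v: [f64; 4]) -> [f64; 4] {
    let mut out = [0.0; 4];
    for i in 0..4 {
        for j in 0..4 {
            out[i] += m[i][j] * v[j];
        }
    }
    out
}

fn main() {
    let q = [0.9, 0.1, -0.3, 0.2];
    let p = [0.5, -0.4, 0.7, 0.1];
    let direct = hamilton(q, p);
    let via_left = matvec(left(q), p);
    let via_right = matvec(right(p), q);
    for i in 0..4 {
        assert!((direct[i] - via_left[i]).abs() < 1e-12);
        assert!((direct[i] - via_right[i]).abs() < 1e-12);
    }
}
```

It is exactly this rewriting that turns the quaternion constraint into the linear system S q_X = 0 above.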

Step 4: Translation Estimation

Given R_X, the translation constraint for each motion pair is:

(R_A − I) t_X = R_X t_B − t_A

Stacking all M pairs gives a 3M×3 overdetermined linear system C t_X = d, solved via least squares:

t_X = (Cᵀ C)⁻¹ Cᵀ d

with optional ridge regularization for numerical stability.

Target-in-Base Estimation

After finding X, the target pose in the base frame can be estimated. For each observation i:

T_{B,T}^{(i)} = G_i X W_i

The estimates from different views are averaged (quaternion averaging for rotation, arithmetic mean for translation).

Alternatively, the target-in-base pose is included in the non-linear optimization.

Practical Requirements

  • Minimum 3 views with diverse rotations (in practice, 5-10 views recommended)
  • Rotation diversity: Motions should span multiple rotation axes. Pure translations provide no rotation constraint. Pure Z-axis rotations leave the Z-component of the hand-eye rotation undetermined.
  • Accuracy: The linear Tsai-Lenz method typically gives 5-20% translation accuracy and 2-10° rotation accuracy. This initializes the non-linear joint optimization.

API

#![allow(unused)]
fn main() {
let X = estimate_handeye_dlt(
    &base_se3_gripper,      // &[Iso3]: robot poses (base to gripper)
    &target_se3_camera,     // &[Iso3]: camera-to-target poses (inverted)
    min_angle_deg,          // f64: minimum rotation angle filter
)?;
}

Returns as an Iso3 transform.

OpenCV equivalence: cv::calibrateHandEye with method CALIB_HAND_EYE_TSAI.

References

  • Tsai, R.Y. & Lenz, R.K. (1989). "A New Technique for Fully Autonomous and Efficient 3D Robotics Hand/Eye Calibration." IEEE Transactions on Robotics and Automation, 5(3), 345-358.

Multi-Camera Rig Initialization

A multi-camera rig is a set of cameras rigidly mounted on a common frame. The extrinsics of the rig describe the relative pose of each camera with respect to a reference camera (or a rig frame). This chapter covers the linear initialization of rig extrinsics from per-camera pose estimates.

Problem Statement

Given: Per-camera, per-view pose estimates T_{C_c,T_v} (camera c, view v, target to camera).

Find: Camera-to-rig transforms T_{R,C_c} and rig-to-target poses T_{R,T_v}.

Assumptions:

  • All cameras observe the same calibration target simultaneously
  • Per-camera poses have been estimated (e.g., via PnP or homography decomposition)
  • The rig is rigid (camera-to-rig transforms are constant across views)

Algorithm

Reference Camera

One camera is designated as the reference camera (default: camera 0). Its camera-to-rig transform is identity:

T_{R,C_ref} = I
This defines the rig frame to coincide with the reference camera frame.

Rig-to-Target from Reference

For each view v, the rig-to-target pose is the reference camera's pose:

T_{R,T_v} = T_{C_ref,T_v}

Camera-to-Rig via Averaging

For each non-reference camera c, collect the camera-to-rig estimate from each view v where both cameras have observations:

T_{R,C_c}^{(v)} = T_{R,T_v} · T_{C_c,T_v}⁻¹

Then average across views:

  • Rotation: Quaternion averaging with hemisphere correction. Before averaging, flip quaternions to the same hemisphere (since q and −q represent the same rotation). Then compute the mean quaternion and normalize.
  • Translation: Arithmetic mean of translation vectors.

Quaternion Hemisphere Correction

Quaternions double-cover SO(3): both q and −q represent the same rotation. Before averaging, all quaternions are flipped to the same hemisphere as the first:

if ⟨qᵢ, q₀⟩ < 0:  qᵢ ← −qᵢ
This prevents averaging artifacts where opposite quaternions cancel out.
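A minimal sketch of hemisphere-corrected quaternion averaging (illustrative; quaternions as (w, x, y, z)):

```rust
// Average a set of unit quaternions after flipping each to the same
// hemisphere as the first, then renormalize.
fn average_quaternions(quats: &[[f64; 4]]) -> [f64; 4] {
    let first = quats[0];
    let mut sum = [0.0; 4];
    for q in quats {
        let dot: f64 = (0..4).map(|i| q[i] * first[i]).sum();
        let s = if dot < 0.0 { -1.0 } else { 1.0 }; // hemisphere flip
        for i in 0..4 {
            sum[i] += s * q[i];
        }
    }
    let norm = sum.iter().map(|v| v * v).sum::<f64>().sqrt();
    sum.map(|v| v / norm)
}

fn main() {
    // The identity rotation represented with both signs: a naive average
    // would cancel to zero; hemisphere correction recovers the identity.
    let quats = [[1.0, 0.0, 0.0, 0.0], [-1.0, 0.0, 0.0, 0.0]];
    let avg = average_quaternions(&quats);
    assert!((avg[0].abs() - 1.0).abs() < 1e-12);
}
```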

Accuracy

The initialization accuracy depends on:

  • Per-camera pose accuracy: Noisy single-camera poses (from homography decomposition or PnP) propagate to the rig extrinsics
  • Number of views: More views improve the averaging
  • View diversity: Diverse viewpoints reduce systematic errors

Typical accuracy: 1-5° rotation, 5-15% translation. This is refined in the rig bundle adjustment (see Multi-Camera Rig Extrinsics).

API

#![allow(unused)]
fn main() {
let extrinsic_poses = estimate_extrinsics_from_cam_target_poses(
    &cam_se3_target,    // Vec<Vec<Option<Iso3>>>: per-view, per-camera poses
    ref_cam_idx,        // usize: reference camera index
)?;

// extrinsic_poses.cam_to_rig: Vec<Iso3>  (T_{R,C} for each camera)
// extrinsic_poses.rig_from_target: Vec<Iso3>  (T_{R,T} for each view)
}

Usage in Calibration Pipeline

In the rig extrinsics calibration pipeline:

  1. Per-camera intrinsics are estimated and optimized independently
  2. Per-camera poses are recovered from the calibration board
  3. Rig extrinsics are initialized via averaging (this chapter)
  4. Rig bundle adjustment jointly optimizes all parameters

Laser Plane Initialization

In laser triangulation, a laser line is projected onto the scene and a camera observes the line. The laser plane — defined by a normal vector and distance from the camera origin — must be calibrated. This chapter describes the linear initialization of the laser plane from observations of the laser line on a calibration target.

Problem Statement

Given:

  • Camera intrinsics and distortion
  • Per-view camera poses (target to camera)
  • Per-view laser pixel observations (pixels where the laser line intersects the target)

Find: Laser plane (n, d) in the camera frame, where n is the unit normal and d is the signed distance from the camera origin:

nᵀ p = d

for all 3D points p on the laser plane (in camera coordinates).

Algorithm

Step 1: Back-Project Laser Pixels to 3D

For each laser pixel in each view:

  1. Undistort the pixel to get normalized coordinates (x_n, y_n)
  2. Form a ray in the camera frame: r(λ) = λ (x_n, y_n, 1)ᵀ
  3. Intersect with the target plane: The target at z = 0 in target coordinates, transformed to the camera frame by T_{C,T}, defines a plane. Solve for the λ that places the ray on this plane.
  4. Compute the 3D point: p = λ (x_n, y_n, 1)ᵀ in camera coordinates

This gives a set of 3D points on the laser plane.
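The ray-plane intersection in step 3 has a closed form. If the target plane has normal n (the third column of the rotation R of T_{C,T}) and passes through the translation t, then a ray p(λ) = λ d hits it at λ = (n·t)/(n·d). A sketch (illustrative helper, not the library's API):

```rust
fn dot(a: [f64; 3], b: [f64; 3]) -> f64 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

// Intersect the camera ray lambda * dir with the target plane that has
// normal `n` and passes through the point `t` (both in the camera frame).
fn back_project(dir: [f64; 3], n: [f64; 3], t: [f64; 3]) -> [f64; 3] {
    let lambda = dot(n, t) / dot(n, dir);
    [lambda * dir[0], lambda * dir[1], lambda * dir[2]]
}

fn main() {
    // Target plane parallel to the image plane at z = 2 (R = I, t = (0,0,2)).
    let n = [0.0, 0.0, 1.0];
    let t = [0.0, 0.0, 2.0];
    let dir = [0.1, -0.05, 1.0]; // ray direction (x_n, y_n, 1)
    let p = back_project(dir, n, t);
    assert!((p[2] - 2.0).abs() < 1e-12); // lies on the plane z = 2
    assert!((p[0] - 0.2).abs() < 1e-12);
}
```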

Step 2: Fit a Plane via SVD

Given N 3D points pᵢ on the laser plane:

  1. Centroid: c = (1/N) Σᵢ pᵢ
  2. Center the points: p̃ᵢ = pᵢ − c
  3. Covariance matrix: C = Σᵢ p̃ᵢ p̃ᵢᵀ
  4. SVD: C = U Σ Uᵀ
  5. Normal: n is the eigenvector corresponding to the smallest singular value (last column of U)
  6. Distance: d = nᵀ c

Degeneracy Detection

If all 3D points are nearly collinear (e.g., because all views have the same target pose), the plane is underdetermined. The algorithm checks the ratio σ₃/σ₂ of the two smallest singular values: for a well-determined plane, σ₂ ≫ σ₃. If the points are too close to collinear (σ₂ ≈ σ₃ ≈ 0), the plane fit fails with an error.

Plane Representation for Optimization

In the non-linear optimization, the plane is parameterized as:

  • Normal: A unit vector n (2 degrees of freedom on the unit sphere S²)
  • Distance: A scalar d (1 degree of freedom)

The S² manifold is used for the normal to maintain the unit-norm constraint during optimization (see Manifold Optimization).

Accuracy

The linear plane initialization accuracy depends on:

  • View diversity: Different target orientations provide points at different locations on the laser plane, improving the fit
  • Camera calibration accuracy: Errors in intrinsics, distortion, or poses propagate to the 3D point estimates
  • Number of laser pixels: More pixels (from more views or more detected pixels per view) improve the SVD fit

Typical accuracy: 1-5° normal direction error, 5-20% distance error. Refined in the Laserline Device Calibration optimization.

API

#![allow(unused)]
fn main() {
use vision_calibration::linear::laserline::{LaserlinePlaneSolver, LaserlineView};

// Each view carries laser pixels and the camera-to-target pose
let views: Vec<LaserlineView> = poses.iter().zip(laser_pixels_per_view.iter())
    .map(|(pose, pixels)| LaserlineView {
        camera_se3_target: *pose,
        laser_pixels: pixels.clone(),
    })
    .collect();

let plane = LaserlinePlaneSolver::from_views(&views, &camera)?;
// plane.normal: UnitVector3<f64> (unit normal in camera frame)
// plane.distance: f64 (signed distance from camera origin)
// plane.rmse: f64 (fit residual)
}

Polynomial Solvers

Several minimal geometry solvers in calibration-rs reduce to polynomial equations. This chapter documents the polynomial root-finding utilities used by P3P, 7-point fundamental, and 5-point essential solvers.

Quadratic

Solved via the standard discriminant formula: x = (−b ± √(b² − 4ac)) / (2a). Returns 0, 1, or 2 real roots depending on the discriminant sign.

Cubic

Solved using Cardano's formula:

  1. Convert to the depressed cubic t³ + pt + q via the substitution x = t − b/(3a)
  2. Compute the discriminant Δ = (q/2)² + (p/3)³
  3. For Δ < 0 (three real roots): use the trigonometric form
  4. For Δ > 0 (one real root): use Cardano's formula with cube roots

Returns 1 or 3 real roots.
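The two branches can be sketched compactly (illustrative, not the library's solver; the Δ ≈ 0 boundary case is left out for brevity):

```rust
// Real roots of a*x^3 + b*x^2 + c*x + d = 0 via the depressed cubic.
fn solve_cubic(a: f64, b: f64, c: f64, d: f64) -> Vec<f64> {
    let (b, c, d) = (b / a, c / a, d / a);
    // Depressed cubic t^3 + p t + q with x = t - b/3
    let p = c - b * b / 3.0;
    let q = 2.0 * b * b * b / 27.0 - b * c / 3.0 + d;
    let shift = -b / 3.0;
    let delta = (q / 2.0).powi(2) + (p / 3.0).powi(3);
    if delta > 0.0 {
        // One real root (Cardano)
        let s = delta.sqrt();
        let t = (-q / 2.0 + s).cbrt() + (-q / 2.0 - s).cbrt();
        vec![t + shift]
    } else {
        // Three real roots (trigonometric form)
        let m = 2.0 * (-p / 3.0).sqrt();
        let theta = (3.0 * q / (2.0 * p) * (-3.0 / p).sqrt()).acos() / 3.0;
        (0..3)
            .map(|k| m * (theta - 2.0 * std::f64::consts::PI * k as f64 / 3.0).cos() + shift)
            .collect()
    }
}

fn main() {
    // (x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6
    let mut roots = solve_cubic(1.0, -6.0, 11.0, -6.0);
    roots.sort_by(|a, b| a.partial_cmp(b).unwrap());
    assert!((roots[0] - 1.0).abs() < 1e-9);
    assert!((roots[1] - 2.0).abs() < 1e-9);
    assert!((roots[2] - 3.0).abs() < 1e-9);
    // x^3 + 1 = 0 has the single real root -1
    let r = solve_cubic(1.0, 0.0, 0.0, 1.0);
    assert!((r[0] + 1.0).abs() < 1e-9);
}
```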

Quartic

Solved via the companion matrix approach:

  1. Normalize to monic form: divide by the leading coefficient a₄
  2. Construct the 4×4 companion matrix
  3. Compute eigenvalues using Schur decomposition (via nalgebra)
  4. Extract real eigenvalues: those with |Im(λ)| below a small tolerance

Returns 0 to 4 real roots.

Usage in Minimal Solvers

Solver                Polynomial                   Degree   Max roots
P3P (Kneip)           Distance-ratio quartic       4        4
7-point fundamental   det(F₁ + λF₂) = 0            3        3
5-point essential     Action matrix eigenvalues    10       10

The 5-point solver uses eigendecomposition of a 10×10 matrix rather than a polynomial solver directly, but the underlying mathematics involves degree-10 polynomial constraints.

Numerical Considerations

  • All roots are deduplicated (roots closer than a small tolerance are merged)
  • Real root extraction uses a threshold on the imaginary part (|Im(λ)| < ε)
  • The companion matrix approach is more numerically stable than analytical quartic formulas for extreme coefficient ratios

Non-Linear Least Squares Overview

Non-linear least squares (NLLS) is the mathematical framework underlying all refinement in calibration-rs. After linear initialization provides approximate parameter estimates, NLLS minimizes the reprojection error to achieve sub-pixel accuracy.

Problem Formulation

Objective: Minimize the sum of squared residuals:

C(θ) = ½ Σᵢ ‖rᵢ(θ)‖² = ½ ‖r(θ)‖²

where θ is the parameter vector and r(θ) is the stacked residual vector.

In camera calibration, a typical residual is the reprojection error: the difference between an observed pixel and the predicted projection:

rᵢ(θ) = uᵢ − π(θ; Pᵢ)

where π is the camera projection function, and the parameters θ include intrinsics, distortion, and poses.

Gauss-Newton Method

The Gauss-Newton method exploits the least-squares structure. At the current estimate θ₀, linearize the residuals:

r(θ₀ + Δ) ≈ r(θ₀) + J Δ

where J = ∂r/∂θ is the Jacobian.

Substituting into the objective and minimizing with respect to Δ:

min_Δ ½ ‖r + J Δ‖²

gives the normal equations:

Jᵀ J Δ = −Jᵀ r

The matrix Jᵀ J is the Gauss-Newton approximation to the Hessian (H ≈ Jᵀ J when residuals are small).

Update: θ ← θ₀ + Δ

Levenberg-Marquardt Method

Gauss-Newton can diverge when the linearization is poor (far from the minimum) or when Jᵀ J is singular. Levenberg-Marquardt (LM) adds a damping term:

(Jᵀ J + λ I) Δ = −Jᵀ r

The damping parameter λ interpolates between:

  • λ → 0: Pure Gauss-Newton (fast convergence near minimum)
  • λ → ∞: Gradient descent with small step (safe far from minimum)

Trust Region Interpretation

LM is a trust region method: λ controls the size of the region in which the linear approximation is trusted.

  • If the update reduces the cost: accept the step, decrease (expand trust region)
  • If the update increases the cost: reject the step, increase (shrink trust region)
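The accept/reject loop with its damping schedule fits in a few dozen lines for a one-parameter problem, where Jᵀ J and the update are scalars. A sketch (illustrative, not the library's solver), fitting y = exp(a·x) to synthetic data:

```rust
// Returns (cost, JtJ, Jtr) for residuals r_i = y_i - exp(a * x_i).
fn cost_and_grad(a: f64, data: &[(f64, f64)]) -> (f64, f64, f64) {
    let (mut cost, mut jtj, mut jtr) = (0.0, 0.0, 0.0);
    for &(x, y) in data {
        let r = y - (a * x).exp();
        let j = -x * (a * x).exp(); // dr/da
        cost += 0.5 * r * r;
        jtj += j * j;
        jtr += j * r;
    }
    (cost, jtj, jtr)
}

fn lm_fit(data: &[(f64, f64)]) -> f64 {
    let mut a = 0.0;
    let mut lambda = 1e-3;
    let (mut cost, mut jtj, mut jtr) = cost_and_grad(a, data);
    for _ in 0..100 {
        // Scalar form of (JtJ + lambda I) delta = -Jt r
        let step = -jtr / (jtj + lambda);
        let a_new = a + step;
        let (cost_new, jtj_new, jtr_new) = cost_and_grad(a_new, data);
        if cost_new < cost {
            // Accept the step: expand the trust region
            a = a_new;
            cost = cost_new;
            jtj = jtj_new;
            jtr = jtr_new;
            lambda /= 10.0;
        } else {
            // Reject the step: shrink the trust region
            lambda *= 10.0;
        }
        if step.abs() < 1e-12 {
            break;
        }
    }
    a
}

fn main() {
    let truth = 0.7;
    let data: Vec<(f64, f64)> = (0..10)
        .map(|i| {
            let x = i as f64 / 10.0;
            (x, (truth * x).exp())
        })
        .collect();
    let a = lm_fit(&data);
    assert!((a - truth).abs() < 1e-6);
}
```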

Convergence Criteria

LM terminates when any of:

  • Cost threshold: C(θ) < ε_cost
  • Absolute decrease: C(θₖ) − C(θₖ₊₁) < ε_abs
  • Relative decrease: (C(θₖ) − C(θₖ₊₁)) / C(θₖ) < ε_rel
  • Maximum iterations: k ≥ k_max
  • Parameter change: ‖Δ‖ < ε_step

Sparsity

In bundle adjustment problems, the Jacobian is sparse: each residual depends on only a few parameter blocks (one camera intrinsics, one distortion, one pose). The matrix has a block-arrow structure that can be exploited by sparse linear solvers:

  • Sparse Cholesky: Efficient for well-structured problems
  • Sparse QR: More robust when the normal equations are ill-conditioned

calibration-rs uses sparse linear solvers through the tiny-solver backend.

Cost Function vs. Reprojection Error

The optimizer minimizes the cost C(θ) = ½ Σᵢ ‖rᵢ‖². The commonly reported mean reprojection error is:

ē = (1/N) Σᵢ ‖rᵢ‖

These are related but not identical: the cost weights large errors quadratically, while the mean error weights them linearly. calibration-rs reports both: the cost from the solver and the mean reprojection error computed post-optimization.

With Robust Losses

When a robust loss ρ is used, the objective becomes:

C(θ) = Σᵢ ρ(‖rᵢ(θ)‖²)

The normal equations are modified to incorporate the loss function's weight:

Jᵀ W J Δ = −Jᵀ W r

where W is the iteratively reweighted diagonal matrix. See Robust Loss Functions for details.

Manifold Optimization

Camera poses are elements of the Lie group SE(3), which is a 6-dimensional smooth manifold — not a Euclidean space. Laser plane normals live on the unit sphere S². Optimizing these parameters with standard Euclidean updates (adding the update Δ directly to the parameter vector) would violate the manifold constraints: rotations would become non-orthogonal, quaternions would lose unit norm, and normals would lose unit length.

Manifold optimization solves this by computing updates in the tangent space and then retracting back to the manifold.

The Retract-Update Pattern

For a parameter on a manifold :

  1. Compute the update δ in the tangent space (a Euclidean space of dimension equal to the manifold's degrees of freedom)
  2. Retract: x ← retract(x, δ), which maps from the tangent space back onto the manifold

This is also called local parameterization: the optimizer sees a Euclidean space of the correct dimension, and the manifold structure handles the constraint.

SE(3): Rigid Transforms

Ambient dimension: 7 (quaternion q + translation t)

Tangent dimension: 6 (ξ = (ρ, φ): translational and rotational parts)

Plus operation (⊞): Given the current pose T and tangent vector ξ, compose with the exponential (written here with the left-multiplication convention):

T ⊞ ξ = Exp(ξ) · T

where Exp is the SE(3) exponential map (see Rigid Transformations).

Minus operation (⊟): The inverse — compute the tangent vector between two poses — uses the SE(3) logarithm.

Implementation in calibration-rs

The se3_exp function in the factor module computes the exponential map generically over RealField for autodiff compatibility:

#![allow(unused)]
fn main() {
fn se3_exp<T: RealField>(xi: &[T; 6]) -> [[T; 4]; 4]
}

This uses Rodrigues' formula with special handling for the small-angle case (‖ω‖ → 0) to avoid division by zero.
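The small-angle guard can be illustrated on the rotational part alone. A sketch of Rodrigues' rotation of a vector by an axis-angle ω, with a first-order fallback when ‖ω‖ is tiny (illustrative, not the library's `se3_exp`):

```rust
fn cross(a: [f64; 3], b: [f64; 3]) -> [f64; 3] {
    [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]
}

// Rotate v by the axis-angle vector omega.
fn rotate(omega: [f64; 3], v: [f64; 3]) -> [f64; 3] {
    let theta = (omega[0].powi(2) + omega[1].powi(2) + omega[2].powi(2)).sqrt();
    if theta < 1e-10 {
        // Small-angle case: R v ~ v + omega x v (no division by theta)
        let c = cross(omega, v);
        return [v[0] + c[0], v[1] + c[1], v[2] + c[2]];
    }
    let k = [omega[0] / theta, omega[1] / theta, omega[2] / theta]; // unit axis
    let (s, c) = theta.sin_cos();
    let kxv = cross(k, v);
    let kdv = k[0] * v[0] + k[1] * v[1] + k[2] * v[2];
    // Rodrigues: v cos(theta) + (k x v) sin(theta) + k (k . v)(1 - cos(theta))
    let mut out = [0.0; 3];
    for i in 0..3 {
        out[i] = v[i] * c + kxv[i] * s + k[i] * kdv * (1.0 - c);
    }
    out
}

fn main() {
    // A 90 degree rotation about Z sends e_x to e_y
    let half_pi = std::f64::consts::FRAC_PI_2;
    let r = rotate([0.0, 0.0, half_pi], [1.0, 0.0, 0.0]);
    assert!(r[0].abs() < 1e-12 && (r[1] - 1.0).abs() < 1e-12);
    // A tiny rotation takes the guarded branch and stays finite
    let r2 = rotate([0.0, 0.0, 1e-14], [1.0, 0.0, 0.0]);
    assert!((r2[0] - 1.0).abs() < 1e-12);
}
```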

SO(3): Rotations

Ambient dimension: 4 (unit quaternion)

Tangent dimension: 3 (axis-angle φ)

Plus operation: compose the current quaternion with the exponential of the axis-angle update, q ⊞ φ = Exp(φ) ⊗ q.

SO(3) is used for rotation-only parameters. In calibration-rs, most poses use the full SE(3), but SO(3) is available for specialized problems.

S²: Unit Sphere

Ambient dimension: 3 (unit vector n, ‖n‖ = 1)

Tangent dimension: 2 (δ = (δ₁, δ₂))

The S² manifold is used for laser plane normals — 3D unit vectors with 2 degrees of freedom.

Plus operation: Given current normal n and tangent vector δ:

  1. Compute an orthonormal basis (b₁, b₂) of the tangent plane at n
  2. Tangent vector in 3D: v = δ₁ b₁ + δ₂ b₂
  3. Retract: n ← (n + v) / ‖n + v‖

For small ‖δ‖, this is equivalent to the exponential map on S².

Basis Construction

The tangent plane basis is constructed deterministically:

  1. Choose a reference vector a not parallel to n (e.g., e_x, or e_y when n is close to e_x)
  2. b₁ = (n × a) / ‖n × a‖, b₂ = n × b₁
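The basis construction and retraction fit in a few lines. A sketch (helper names are illustrative, not the library's API):

```rust
fn cross(a: [f64; 3], b: [f64; 3]) -> [f64; 3] {
    [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]
}

fn normalize(v: [f64; 3]) -> [f64; 3] {
    let n = (v[0] * v[0] + v[1] * v[1] + v[2] * v[2]).sqrt();
    [v[0] / n, v[1] / n, v[2] / n]
}

// Deterministic orthonormal basis of the tangent plane at unit normal n.
fn tangent_basis(n: [f64; 3]) -> ([f64; 3], [f64; 3]) {
    // Reference axis not parallel to n
    let a = if n[0].abs() < 0.9 { [1.0, 0.0, 0.0] } else { [0.0, 1.0, 0.0] };
    let b1 = normalize(cross(n, a));
    let b2 = cross(n, b1); // unit length, since n and b1 are orthonormal
    (b1, b2)
}

// Retract a 2D tangent update onto the unit sphere.
fn retract(n: [f64; 3], delta: [f64; 2]) -> [f64; 3] {
    let (b1, b2) = tangent_basis(n);
    normalize([
        n[0] + delta[0] * b1[0] + delta[1] * b2[0],
        n[1] + delta[0] * b1[1] + delta[1] * b2[1],
        n[2] + delta[0] * b1[2] + delta[1] * b2[2],
    ])
}

fn main() {
    let n = normalize([0.2, -0.3, 0.9]);
    let m = retract(n, [0.01, -0.02]);
    let len = (m[0] * m[0] + m[1] * m[1] + m[2] * m[2]).sqrt();
    assert!((len - 1.0).abs() < 1e-12); // the update stays on the sphere
}
```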

Euclidean Parameters

For intrinsics (fx, fy, cx, cy), distortion (k1, k2, k3, p1, p2), and other unconstrained parameters, the manifold is simply ℝⁿ with standard addition: x ⊞ δ = x + δ.

No special treatment needed.

ManifoldKind in the IR

The optimization IR specifies the manifold for each parameter block:

#![allow(unused)]
fn main() {
pub enum ManifoldKind {
    Euclidean,  // R^n, standard addition
    SE3,        // 7D ambient, 6D tangent
    SO3,        // 4D ambient, 3D tangent
    S2,         // 3D ambient, 2D tangent
}
}

The backend maps these to solver-specific manifold implementations.

Fixed Parameters on Manifolds

For Euclidean parameters, individual indices can be fixed (e.g., fix but optimize ). For manifold parameters (SE3, SO3, S2), fixing is all-or-nothing: either the entire parameter block is optimized or the entire block is fixed. This is because the tangent space decomposition does not naturally support partial fixing on non-Euclidean manifolds.

Backend-Agnostic IR Architecture

calibration-rs separates the definition of optimization problems from their execution by a specific solver. Problems are expressed as an Intermediate Representation (IR) that is compiled to a solver-specific form. This design allows swapping backends without changing problem definitions and makes it straightforward to add new problem types.

Three-Stage Pipeline

Problem Builder  →  ProblemIR  →  Backend.compile()  →  Backend.solve()
 (domain code)      (generic)     (solver-specific)     (optimization)
  1. Problem Builder: Domain code (e.g., build_planar_intrinsics_ir()) constructs a ProblemIR from calibration data and initial parameter estimates
  2. Backend Compilation: The backend (e.g., TinySolverBackend) translates the IR into solver-specific data structures
  3. Solving: The backend runs the optimizer and returns a BackendSolution

ProblemIR

The central type that describes a complete optimization problem:

#![allow(unused)]
fn main() {
pub struct ProblemIR {
    pub params: Vec<ParamBlock>,
    pub residuals: Vec<ResidualBlock>,
}
}

It is validated on construction to ensure all parameter references are valid, dimensions match, and manifold constraints are respected.

ParamBlock

Each parameter block represents a group of optimization variables:

#![allow(unused)]
fn main() {
pub struct ParamBlock {
    pub id: ParamId,
    pub name: String,
    pub dim: usize,
    pub manifold: ManifoldKind,
    pub fixed: FixedMask,
    pub bounds: Option<Vec<Bound>>,  // per-index box constraints
}
}

Field      Description
id         Unique identifier within the problem
name       Human-readable name (e.g., "cam", "pose/0", "plane")
dim        Ambient dimension (e.g., 4 for intrinsics, 7 for SE3)
manifold   Geometry of the parameter space
fixed      Which indices (or the whole block) are held constant
bounds     Optional box constraints on parameter values

Naming Conventions

Problem builders use consistent naming:

  • "intrinsics" or "cam" — intrinsics (), dimension 4
  • "distortion" or "dist" — distortion (), dimension 5
  • "sensor" — Scheimpflug tilt (), dimension 2
  • "pose/0", "pose/1", ... — per-view SE(3) poses (planar, rig), dimension 7
  • "pose_0", "pose_1", ... — per-view SE(3) poses (laserline), dimension 7
  • "plane_normal" — laser plane unit normal, dimension 3, manifold
  • "plane_distance" — laser plane distance, dimension 1
  • "handeye" — hand-eye SE(3), dimension 7
  • "extrinsics/0", ... — per-camera SE(3) extrinsics, dimension 7

FixedMask

Controls which parameters are held constant during optimization:

#![allow(unused)]
fn main() {
// Construct via factory methods:
FixedMask::all_free()              // nothing fixed
FixedMask::all_fixed(dim)          // everything fixed
FixedMask::fix_indices(&[2])       // fix specific indices
}
  • Euclidean parameters: Individual indices can be fixed. For example, FixedMask::fix_indices(&[2]) on intrinsics fixes cx while allowing fx, fy, cy to vary.
  • Manifold parameters: All-or-nothing. If any index is fixed, the entire block is fixed. (The tangent space does not support partial fixing.)

ResidualBlock

Each residual block connects parameter blocks through a factor:

#![allow(unused)]
fn main() {
pub struct ResidualBlock {
    pub params: Vec<ParamId>,
    pub loss: RobustLoss,
    pub factor: FactorKind,
    pub residual_dim: usize,
}
}

The params vector references ParamBlocks by their ParamId. The ordering must match what the factor expects.

FactorKind

An enum of all supported residual computations:

#![allow(unused)]
fn main() {
pub enum FactorKind {
    // Reprojection factors — all carry per-observation data
    ReprojPointPinhole4 { pw: [f64; 3], uv: [f64; 2], w: f64 },
    ReprojPointPinhole4Dist5 { pw: [f64; 3], uv: [f64; 2], w: f64 },
    ReprojPointPinhole4Dist5Scheimpflug2 { pw: [f64; 3], uv: [f64; 2], w: f64 },
    ReprojPointPinhole4Dist5TwoSE3 { pw: [f64; 3], uv: [f64; 2], w: f64 },
    ReprojPointPinhole4Dist5HandEye {
        pw: [f64; 3], uv: [f64; 2], w: f64,
        base_to_gripper_se3: [f64; 7], mode: HandEyeMode,
    },
    ReprojPointPinhole4Dist5HandEyeRobotDelta {
        pw: [f64; 3], uv: [f64; 2], w: f64,
        base_to_gripper_se3: [f64; 7], mode: HandEyeMode,
    },

    // Laser factors
    LaserPlanePixel { laser_pixel: [f64; 2], w: f64 },
    LaserLineDist2D { laser_pixel: [f64; 2], w: f64 },

    // Prior factors
    Se3TangentPrior { sqrt_info: [f64; 6] },
}
}

Each variant carries the per-residual data (3D point coordinates, observed pixel, weight, etc.) that is not part of the optimizable parameters. The factor also specifies which parameter blocks it expects and validates their dimensions.

Reprojection Factors

All reprojection factors compute the pixel residual:

r = w · (u_obs − π(params; p_w))

The variants differ in which parameters are involved:

Factor                                     Parameters                                                               Residual dim
ReprojPointPinhole4                        [intrinsics, pose]                                                       2
ReprojPointPinhole4Dist5                   [intrinsics, distortion, pose]                                           2
ReprojPointPinhole4Dist5Scheimpflug2       [intrinsics, distortion, sensor, pose]                                   2
ReprojPointPinhole4Dist5TwoSE3             [intrinsics, distortion, extrinsics, rig_pose]                           2
ReprojPointPinhole4Dist5HandEye            [intrinsics, distortion, extrinsics, handeye, target_pose]               2
ReprojPointPinhole4Dist5HandEyeRobotDelta  [intrinsics, distortion, extrinsics, handeye, target_pose, robot_delta]  2

Laser Factors

Factor           Parameters                              Residual dim   Description
LaserPlanePixel  [intrinsics, distortion, pose, plane]   1              Point-to-plane 3D distance
LaserLineDist2D  [intrinsics, distortion, pose, plane]   1              Line distance in the normalized image plane

Prior Factors

Factor           Parameters    Residual dim   Description
Se3TangentPrior  [se3_param]   6              Zero-mean Gaussian prior on the SE(3) tangent

Validation

ProblemIR::new() validates:

  • All ParamId references in residual blocks resolve to existing parameter blocks
  • Parameter dimensions match what each factor expects
  • Manifold kinds are compatible with the factor requirements
  • No duplicate parameter IDs

Initial Values

Parameter blocks carry only their structural definition (dimension, manifold, fixing). The initial values are passed separately as a HashMap<String, DVector<f64>> (keyed by parameter block name) to the backend's solve() method.

Example: Planar Intrinsics IR

For a 10-view planar intrinsics problem with 48 points per view:

Parameters:
  - "cam": dim=4, Euclidean (fx, fy, cx, cy)
  - "dist": dim=5, Euclidean (k1, k2, k3, p1, p2)
  - "pose/0"..."pose/9": dim=7, SE3

Residuals: 480 blocks
  - Each: ReprojPointPinhole4Dist5 { pw, uv, w=1.0 }
  - Params: [cam_id, dist_id, pose_k_id]
  - Residual dim: 2

Total residuals: 960 scalar values
Total parameters: 4 + 5 + 10*7 = 79

Autodiff and Generic Residual Functions

The Levenberg-Marquardt algorithm requires the Jacobian of the residual vector. Computing this analytically for complex camera models (distortion, Scheimpflug tilt, SE(3) transforms) is error-prone and tedious. calibration-rs uses automatic differentiation (autodiff) via dual numbers to compute exact Jacobians from the same code that evaluates residuals.

Dual Numbers

A dual number extends a real number with an infinitesimal part:

x = a + b ε,  with ε² = 0

Arithmetic preserves the chain rule automatically:

(a + b ε) + (c + d ε) = (a + c) + (b + d) ε
(a + b ε) · (c + d ε) = a c + (a d + b c) ε
f(a + b ε) = f(a) + f′(a) b ε

By setting b = 1 for the parameter of interest and b = 0 for others, evaluating f simultaneously computes both f(a) and f′(a).

For multi-variable functions, hyper-dual numbers or multi-dual numbers generalize this to compute full Jacobians. The num-dual crate provides these types.
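The mechanism can be demonstrated with a hand-rolled dual number (illustrative only; the library uses the num-dual crate):

```rust
use std::ops::{Add, Mul};

#[derive(Clone, Copy, Debug)]
struct Dual {
    re: f64,  // value
    eps: f64, // derivative part
}

impl Add for Dual {
    type Output = Dual;
    fn add(self, o: Dual) -> Dual {
        Dual { re: self.re + o.re, eps: self.eps + o.eps }
    }
}

impl Mul for Dual {
    type Output = Dual;
    fn mul(self, o: Dual) -> Dual {
        // (a + b eps)(c + d eps) = ac + (ad + bc) eps, since eps^2 = 0
        Dual { re: self.re * o.re, eps: self.re * o.eps + self.eps * o.re }
    }
}

// f(x) = x^2 + 3x, so f'(x) = 2x + 3
fn f(x: Dual) -> Dual {
    let three = Dual { re: 3.0, eps: 0.0 };
    x * x + three * x
}

fn main() {
    // Seed eps = 1 to differentiate with respect to x
    let y = f(Dual { re: 2.0, eps: 1.0 });
    assert_eq!(y.re, 10.0); // f(2) = 4 + 6
    assert_eq!(y.eps, 7.0); // f'(2) = 2*2 + 3
}
```

Note how the derivative falls out of ordinary evaluation: no separate differentiation code is written, which is exactly why the residual functions below are generic over the scalar type.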

The RealField Pattern

All residual functions in calibration-rs are written generic over the scalar type:

#![allow(unused)]
fn main() {
fn reproj_residual_pinhole4_dist5_se3_generic<T: RealField>(
    intr: DVectorView<'_, T>,  // [fx, fy, cx, cy]
    dist: DVectorView<'_, T>,  // [k1, k2, k3, p1, p2]
    pose: DVectorView<'_, T>,  // [qx, qy, qz, qw, tx, ty, tz]
    pw: [f64; 3],              // 3D world point (constant)
    uv: [f64; 2],              // observed pixel (constant)
    w: f64,                    // weight (constant)
) -> SVector<T, 2>
}

When called with T = f64, this computes the residual value. When called with T = DualNumber, it simultaneously computes the residual and its derivatives with respect to all parameters.

Design Rules for Autodiff-Compatible Code

  1. Use .clone() liberally: Dual numbers are small structs; cloning is cheap and avoids borrow checker issues with operator overloading.

  2. Avoid in-place operations: Operations like x += y or v[i] = expr can cause issues with dual number tracking. Prefer functional style.

  3. Convert constants: Use T::from_f64(1.0).unwrap() to convert floating-point literals. Do not use 1.0 directly where a T is expected.

  4. Constant data as f64: World points, observed pixels, and weights are constant data — not optimized. Keep them as f64 and convert to T inside the function.
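
A tiny generic function showing these rules in practice. The `Scalar` trait here is an illustrative stand-in for the real RealField bound (which comes from nalgebra/simba):

```rust
use std::ops::{Add, Mul};

/// Illustrative stand-in for RealField; the library uses nalgebra's trait.
trait Scalar: Clone + Add<Output = Self> + Mul<Output = Self> {
    fn from_f64(x: f64) -> Self;
}

impl Scalar for f64 {
    fn from_f64(x: f64) -> f64 { x }
}

/// Radial distortion factor 1 + k1·r² + k2·r⁴, written by the rules:
/// the literal 1.0 goes through from_f64 (rule 3), clones avoid borrow
/// issues with operator overloading (rule 1), and the expression is
/// functional rather than in-place (rule 2).
fn radial_factor<T: Scalar>(r2: T, k1: T, k2: T) -> T {
    T::from_f64(1.0) + k1 * r2.clone() + k2 * r2.clone() * r2
}
```

Called with `T = f64` this is plain arithmetic; called with a dual-number type, every operation also carries derivatives.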

Walkthrough: Reprojection Residual

The core reprojection residual computes the weighted difference between the projected world point and the observed pixel:

r = w · (project(intr, dist, pose · pw) − uv)

Step by step:

1. Unpack Parameters

#![allow(unused)]
fn main() {
let fx = intr[0].clone();
let fy = intr[1].clone();
let cx = intr[2].clone();
let cy = intr[3].clone();

let k1 = dist[0].clone();
let k2 = dist[1].clone();
// ...
}

2. SE(3) Transform: World to Camera

Apply the pose to the 3D world point using direct quaternion rotation:

p_c = R(q) · p_w + t

The implementation unpacks the quaternion (qx, qy, qz, qw) and applies the rotation formula.

3. Pinhole Projection

x = X_c / Z_c,   y = Y_c / Z_c

4. Apply Distortion

With r² = x² + y²:

x_d = x·(1 + k1·r² + k2·r⁴ + k3·r⁶) + 2·p1·x·y + p2·(r² + 2·x²)
y_d = y·(1 + k1·r² + k2·r⁴ + k3·r⁶) + p1·(r² + 2·y²) + 2·p2·x·y

5. Apply Intrinsics

u = fx·x_d + cx,   v = fy·y_d + cy

6. Compute Residual

r = w · (u − u_obs, v − v_obs)

Every arithmetic operation on T values propagates derivatives automatically through the dual number mechanism.
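
Putting the six steps together, concretely in f64 (the library version is generic over T and unpacks the rotation from a quaternion; the function name and signature here are illustrative, not the library API):

```rust
/// Steps 2-6 of the walkthrough, written out for f64.
/// `rot`/`t` stand in for the SE(3) pose, quaternion already converted.
fn reproj_residual(
    intr: [f64; 4],     // fx, fy, cx, cy
    dist: [f64; 5],     // k1, k2, k3, p1, p2
    rot: [[f64; 3]; 3], // rotation matrix
    t: [f64; 3],        // translation
    pw: [f64; 3],       // 3D world point (constant data)
    uv: [f64; 2],       // observed pixel (constant data)
    w: f64,             // weight (constant data)
) -> [f64; 2] {
    // 2. world -> camera
    let pc: Vec<f64> = (0..3)
        .map(|i| rot[i][0] * pw[0] + rot[i][1] * pw[1] + rot[i][2] * pw[2] + t[i])
        .collect();
    // 3. pinhole projection onto the normalized plane
    let (x, y) = (pc[0] / pc[2], pc[1] / pc[2]);
    // 4. Brown-Conrady distortion
    let [k1, k2, k3, p1, p2] = dist;
    let r2 = x * x + y * y;
    let radial = 1.0 + k1 * r2 + k2 * r2 * r2 + k3 * r2 * r2 * r2;
    let xd = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x);
    let yd = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y;
    // 5. intrinsics
    let [fx, fy, cx, cy] = intr;
    let (u, v) = (fx * xd + cx, fy * yd + cy);
    // 6. weighted residual
    [w * (u - uv[0]), w * (v - uv[1])]
}
```

Replacing f64 with a dual-number scalar in this pipeline is what turns it into a Jacobian computation.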

Backend Integration

The backend wraps each generic residual function in a solver-specific factor struct that implements the solver's cost function interface:

FactorKind::ReprojPointPinhole4Dist5 { pw, uv, w }
    ↓ (compile step)
TinySolverFactor {
    evaluate: |params| {
        reproj_residual_pinhole4_dist5_se3_generic(
            params["cam"], params["dist"], params["pose/k"],
            pw, uv, w
        )
    }
}

The tiny-solver backend then calls evaluate with dual numbers to compute both residuals and Jacobians in a single pass.

SE(3) Exponential Map

A critical autodiff-compatible function is se3_exp<T>(), which maps a 6D tangent vector to an SE(3) transform:

#![allow(unused)]
fn main() {
fn se3_exp<T: RealField>(xi: &[T; 6]) -> [[T; 4]; 4]
}

This implements Rodrigues' formula with a special case for small angles (θ → 0) using Taylor expansions to avoid numerical issues with the dual number's infinitesimal part.
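
The small-angle guard can be illustrated on the rotation part alone. This is a sketch, not the library's se3_exp — the full version additionally builds the translation via the SE(3) V matrix and is generic over T:

```rust
/// Rodrigues-style exponential map for the rotation part of the tangent.
fn so3_exp(w: [f64; 3]) -> [[f64; 3]; 3] {
    let theta2 = w[0] * w[0] + w[1] * w[1] + w[2] * w[2];
    // Coefficients A = sin θ / θ and B = (1 − cos θ) / θ² are 0/0 near
    // θ = 0; Taylor expansions keep them (and, in the generic version,
    // their dual-number derivatives) finite.
    let (a, b) = if theta2 < 1e-8 {
        (1.0 - theta2 / 6.0, 0.5 - theta2 / 24.0)
    } else {
        let theta = theta2.sqrt();
        (theta.sin() / theta, (1.0 - theta.cos()) / theta2)
    };
    // R = I + A·[w]× + B·[w]ײ
    let wx = [
        [0.0, -w[2], w[1]],
        [w[2], 0.0, -w[0]],
        [-w[1], w[0], 0.0],
    ];
    let mut r = [[0.0_f64; 3]; 3];
    for i in 0..3 {
        for j in 0..3 {
            let wx2: f64 = (0..3).map(|k| wx[i][k] * wx[k][j]).sum();
            r[i][j] = if i == j { 1.0 } else { 0.0 } + a * wx[i][j] + b * wx2;
        }
    }
    r
}
```

Branching on θ² rather than θ avoids a square root of a value that may be exactly zero during autodiff.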

Performance Considerations

Autodiff with dual numbers is typically 2-10x slower than hand-coded Jacobians, but:

  • Correctness: Eliminates the risk of Jacobian implementation bugs
  • Maintainability: Adding a new factor only requires writing the forward evaluation
  • Flexibility: The same code works for any parameter configuration

For calibration problems (hundreds to thousands of residuals, tens to hundreds of parameters), the autodiff overhead is negligible compared to the linear solve in each LM iteration.

Levenberg-Marquardt Backend

The TinySolverBackend is the current optimization backend in calibration-rs. It wraps the tiny-solver crate, providing Levenberg-Marquardt optimization with sparse linear solvers, manifold support, and robust loss functions.

Backend Trait

All backends implement the OptimBackend trait:

#![allow(unused)]
fn main() {
pub trait OptimBackend {
    fn solve(
        &self,
        ir: &ProblemIR,
        initial_params: &HashMap<String, DVector<f64>>,
        opts: &BackendSolveOptions,
    ) -> Result<BackendSolution>;
}
}

The backend receives the problem IR, initial parameter values, and solver options, and returns the optimized parameters with a solve report.

Compilation: IR to Solver

The compile() step translates the abstract IR into tiny-solver's concrete types:

Parameters

For each ParamBlock in the IR:

  1. Create parameter with the correct dimension
  2. Set manifold based on ManifoldKind:
    • Euclidean → no manifold (standard addition)
    • SE3 → SE3Manifold (7D ambient, 6D tangent)
    • SO3 → QuaternionManifold (4D ambient, 3D tangent)
    • S2 → UnitVector3Manifold (3D ambient, 2D tangent)
  3. Fix parameters according to FixedMask:
    • Euclidean: fix individual indices
    • Manifolds: fix entire block (all-or-nothing)
  4. Set bounds if present (box constraints on parameter values)

Residuals

For each ResidualBlock:

  1. Compile the factor: Create a closure that calls the appropriate generic residual function
  2. Apply robust loss: Wrap in Huber/Cauchy/Arctan if specified
  3. Connect parameters: Reference the correct parameter blocks by their compiled IDs

Factor Compilation

Each FactorKind maps to a specific generic function call:

FactorKind::ReprojPointPinhole4Dist5 { pw, uv, w }
    → reproj_residual_pinhole4_dist5_se3_generic(intr, dist, pose, pw, uv, w)

FactorKind::LaserLineDist2D { laser_pixel, w }
    → laser_line_dist_normalized_generic(intr, dist, pose, plane_normal, plane_distance, laser_pixel, w)

FactorKind::Se3TangentPrior { sqrt_info }
    → se3_tangent_prior_generic(pose, sqrt_info)

Solver Options

#![allow(unused)]
fn main() {
pub struct BackendSolveOptions {
    pub max_iters: usize,          // Maximum LM iterations (default: 100)
    pub verbosity: u32,            // 0=silent, 1=summary, 2=per-iteration
    pub linear_solver: LinearSolverType,  // SparseCholesky or SparseQR
    pub min_abs_decrease: f64,     // Absolute cost decrease threshold
    pub min_rel_decrease: f64,     // Relative cost decrease threshold
    pub min_error: f64,            // Minimum cost to stop early
    pub initial_lambda: Option<f64>, // Initial damping (None = auto)
}

pub enum LinearSolverType {
    SparseCholesky,  // Default: fast for well-conditioned problems
    SparseQR,        // More robust for ill-conditioned problems
}
}

Choosing the Linear Solver

  • SparseCholesky (default): Solves the normal equations JᵀJ·Δ = −Jᵀr by factoring JᵀJ. Fast but can fail if JᵀJ is poorly conditioned.
  • SparseQR: Factors J directly (QR decomposition). More robust but slower. Use when Cholesky fails or when the problem has near-singular directions.
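
The robustness gap has a simple source: forming the normal equations squares the condition number of the Jacobian,

```latex
\kappa(J^\top J) = \kappa(J)^2
```

so QR, which factors J itself, tolerates far worse conditioning before losing precision.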

Solution

#![allow(unused)]
fn main() {
pub struct BackendSolution {
    pub params: HashMap<String, DVector<f64>>,  // Optimized values by name
    pub report: SolveReport,
}

pub struct SolveReport {
    pub initial_cost: f64,
    pub final_cost: f64,
    pub iterations: usize,
    pub termination: TerminationReason,
}
}

The cost is ½·Σᵢ rᵢ² (half the sum of squared residuals). Problem-specific code extracts domain types (cameras, poses, planes) from the raw parameter vectors.

Typical Convergence

For a well-initialized planar intrinsics problem:

  • Initial cost: set by the quality of the linear initialization (typically far above the final cost)
  • Final cost: corresponds to sub-pixel residuals
  • Iterations: 10-50 (depends on problem size and initial quality)
  • Termination: Usually relative decrease below threshold

Error Handling

The backend propagates errors for:

  • Missing initial values for a parameter block
  • Manifold dimension mismatch
  • Linear solver failure (singular system)
  • NaN/Inf in residual evaluation

Factor Catalog Reference

This chapter is a complete reference for all factor types (residual computations) supported by the optimization IR. Each factor defines a residual function connecting specific parameter blocks.

Reprojection Factors

All reprojection factors compute the weighted pixel residual:

r = w · (π(p_c) − uv)

where p_c is the point in the camera frame. They differ in the transform chain from world point to pixel.

ReprojPointPinhole4

Parameters: [intrinsics(4), pose(7)]

Transform chain: p_c = pose · p_w, then pinhole projection with intrinsics only (no distortion).

Use case: Distortion-free cameras or when distortion is handled externally.


ReprojPointPinhole4Dist5

Parameters: [intrinsics(4), distortion(5), pose(7)]

Transform chain: p_c = pose · p_w, then pinhole + Brown-Conrady distortion + intrinsics.

Use case: Standard single-camera calibration (planar intrinsics).


ReprojPointPinhole4Dist5Scheimpflug2

Parameters: [intrinsics(4), distortion(5), sensor(2), pose(7)]

Transform chain: p_c = pose · p_w, then pinhole + distortion + Scheimpflug tilt + intrinsics.

Use case: Laser profilers with tilted sensors.


ReprojPointPinhole4Dist5TwoSE3

Parameters: [intrinsics(4), distortion(5), extrinsics(7), rig_pose(7)]

Transform chain:

p_c = cam_se3_rig · rig_se3_target · p_w

The camera pose is composed from two SE(3) transforms: the camera-to-rig extrinsics and the rig-to-target pose.

Use case: Multi-camera rig calibration.


ReprojPointPinhole4Dist5HandEye

Parameters: [intrinsics(4), distortion(5), extrinsics(7), handeye(7), target_pose(7)]

Per-residual data: robot_pose (base_se3_gripper, known from robot kinematics)

Transform chain (eye-in-hand):

cam_se3_target = extrinsics · handeye⁻¹ · robot_pose⁻¹ · target_pose

Then project p_c = cam_se3_target · p_w.

For single-camera hand-eye, extrinsics = identity.

Use case: Hand-eye calibration (camera on robot arm).


ReprojPointPinhole4Dist5HandEyeRobotDelta

Parameters: [intrinsics(4), distortion(5), extrinsics(7), handeye(7), target_pose(7), robot_delta(7)]

Per-residual data: robot_pose

Transform chain: Same as HandEye, but with a per-view SE(3) correction robot_delta composed with the nominal robot pose.

The correction is regularized by an Se3TangentPrior factor.

Use case: Hand-eye calibration with imprecise robot kinematics (accounts for robot pose uncertainty).


Laser Factors

Both laser factors have residual dimension 1 and connect [intrinsics(4), distortion(5), pose(7), plane(3+1)].

LaserPlanePixel

Computation:

  1. Undistort laser pixel to normalized coordinates
  2. Back-project to a ray in camera frame
  3. Intersect ray with target plane (known from pose)
  4. Compute 3D point in camera frame
  5. Measure signed distance from 3D point to laser plane

Residual: the signed point-to-plane distance — in meters.


LaserLineDist2D

Computation:

  1. Compute 3D intersection line of laser plane and target plane (in camera frame)
  2. Project this line onto the normalized camera plane
  3. Undistort laser pixel to normalized coordinates
  4. Measure perpendicular distance from pixel to projected line (2D geometry)
  5. Scale by the focal length for pixel-comparable units

Residual: the scaled perpendicular distance — in effective pixels.

Advantages over LaserPlanePixel: Undistortion done once per pixel (not per-iteration), residuals in pixel units (directly comparable to reprojection error), simpler 2D geometry.

Default: This is the recommended laser residual type.
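
Step 4 of the computation above is ordinary 2D geometry. A sketch, with the projected intersection line in implicit form a·x + b·y + c = 0 (illustrative names, not the library API):

```rust
/// Signed perpendicular distance from a point (x, y) on the normalized
/// plane to the line a·x + b·y + c = 0.
fn point_line_distance(a: f64, b: f64, c: f64, x: f64, y: f64) -> f64 {
    (a * x + b * y + c) / (a * a + b * b).sqrt()
}
```

The residual is this distance evaluated at the undistorted laser pixel, scaled by the focal length.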


Prior Factors

Se3TangentPrior

Parameters: [se3_param(7)]

Computation: Maps the SE(3) parameter to its tangent vector (via logarithm map) and divides by the prior standard deviations:

r = sqrt_info · ξ

where ξ = log(T) is the 6D tangent vector.

Use case: Zero-mean Gaussian prior on SE(3) parameters. Applied to robot pose corrections (robot_delta) to penalize deviations from the nominal robot kinematics.

Effect: Adds a regularization term to the cost function.


Summary Table

| Factor | Params | Res. dim | Units | Domain |
|---|---|---|---|---|
| ReprojPointPinhole4 | 2 blocks | 2 | pixels | Simple pinhole |
| ReprojPointPinhole4Dist5 | 3 blocks | 2 | pixels | Standard calibration |
| ReprojPointPinhole4Dist5Scheimpflug2 | 4 blocks | 2 | pixels | Tilted sensor |
| ReprojPointPinhole4Dist5TwoSE3 | 4 blocks | 2 | pixels | Multi-camera rig |
| ReprojPointPinhole4Dist5HandEye | 5 blocks | 2 | pixels | Hand-eye |
| ReprojPointPinhole4Dist5HandEyeRobotDelta | 6 blocks | 2 | pixels | Hand-eye + robot refinement |
| LaserPlanePixel | 4 blocks | 1 | meters | Laser triangulation |
| LaserLineDist2D | 4 blocks | 1 | pixels | Laser triangulation |
| Se3TangentPrior | 1 block | 6 | normalized | Regularization |

Planar Intrinsics Calibration

This is the most common calibration workflow: estimate camera intrinsics and lens distortion from multiple views of a planar calibration board. It combines Zhang's linear initialization with Levenberg-Marquardt bundle adjustment.

Problem Formulation

Parameters:

  • Intrinsics: fx, fy, cx, cy — 4 scalars
  • Distortion: k1, k2, k3, p1, p2 — 5 scalars
  • Per-view poses: M SE(3) transforms (6 DOF each)

Total: 9 + 6M parameters.

Observations: For each view i and board point j:

  • Known 3D position p_j (on the board, at Z = 0)
  • Observed pixel u_ij

Objective: Minimize total reprojection error:

E = Σ_i Σ_j ‖π(K, dist, T_i, p_j) − u_ij‖²

where π is the full camera projection pipeline: SE(3) transform → pinhole → distortion → intrinsics.

With robust loss ρ:

E = Σ_i Σ_j ρ(‖π(K, dist, T_i, p_j) − u_ij‖²)

Two-Step Pipeline

Step 1: Linear Initialization (step_init)

  1. Homographies: Compute a homography H_i for each view via DLT (from board points at Z = 0 to observed pixels)
  2. Intrinsics: Estimate K from the homographies using Zhang's method, iteratively refined with distortion estimation (see Iterative Intrinsics)
  3. Distortion: Estimate k1, k2 from homography residuals (see Distortion Fit)
  4. Poses: Decompose each homography to recover (R_i, t_i) (see Pose from Homography)

After initialization, intrinsics are typically within 10-40% of the true values.

Step 2: Non-Linear Refinement (step_optimize)

Constructs the optimization problem as IR:

  • Parameter blocks: "cam" (4D, Euclidean), "dist" (5D, Euclidean), "pose/0"..."pose/M-1" (7D, SE3)
  • Residual blocks: One ReprojPointPinhole4Dist5 per observation (2D residual)
  • Backend: Levenberg-Marquardt via TinySolverBackend

After optimization, expect <2% intrinsics error and <1 px mean reprojection error.

Configuration

#![allow(unused)]
fn main() {
pub struct PlanarIntrinsicsConfig {
    // Initialization
    pub init_iterations: usize,        // Iterative intrinsics iterations (default: 2)
    pub fix_k3_in_init: bool,          // Fix k3 during init (default: true)
    pub fix_tangential_in_init: bool,  // Fix p1, p2 during init (default: false)
    pub zero_skew: bool,               // Enforce zero skew (default: true)

    // Optimization
    pub max_iters: usize,              // LM iterations (default: 100)
    pub verbosity: u32,                // Solver output level
    pub robust_loss: RobustLoss,          // Robust loss function (default: None)
    pub fix_intrinsics: IntrinsicsFixMask,  // Fix specific intrinsics
    pub fix_distortion: DistortionFixMask,  // Fix specific distortion params
    pub fix_poses: Vec<usize>,         // Fix specific view poses
}
}

Fix Masks

Fine-grained control over which parameters are optimized:

#![allow(unused)]
fn main() {
// Fix cx, cy but optimize fx, fy
session.update_config(|c| {
    c.fix_intrinsics = IntrinsicsFixMask {
        fx: false, fy: false, cx: true, cy: true,
    };
})?;

// Fix k3 and tangential distortion
session.update_config(|c| {
    c.fix_distortion = DistortionFixMask {
        k1: false, k2: false, k3: true, p1: true, p2: true,
    };
})?;
}

Complete Example

#![allow(unused)]
fn main() {
use vision_calibration::prelude::*;
use vision_calibration::planar_intrinsics::{step_init, step_optimize};

let mut session = CalibrationSession::<PlanarIntrinsicsProblem>::new();
session.set_input(dataset)?;

// Optional: customize configuration
session.update_config(|c| {
    c.max_iters = 50;
    c.robust_loss = RobustLoss::Huber { scale: 2.0 };
})?;

// Run pipeline
step_init(&mut session, None)?;

// Inspect initialization
let init_k = session.state.initial_intrinsics.as_ref().unwrap();
println!("Init fx={:.1}, fy={:.1}", init_k.fx, init_k.fy);

step_optimize(&mut session, None)?;

// Export results
let export = session.export()?;
let k = export.params.intrinsics();
println!("Final fx={:.1}, fy={:.1}", k.fx, k.fy);
println!("Reprojection error: {:.4} px", export.mean_reproj_error);
}

Filtering (Optional Step)

After optimization, views or individual observations with high reprojection error can be filtered out:

#![allow(unused)]
fn main() {
use vision_calibration::planar_intrinsics::{step_filter, FilterOptions};

let filter_opts = FilterOptions {
    max_reproj_error: 2.0,      // Remove observations > 2 px
    min_points_per_view: 10,     // Minimum points to keep a view
    remove_sparse_views: true,   // Drop views below threshold
};
step_filter(&mut session, filter_opts)?;

// Re-optimize with cleaned data
step_optimize(&mut session, None)?;
}

OpenCV equivalence: cv::calibrateCamera performs both initialization and optimization internally. calibration-rs separates these steps for inspection and customization.

Accuracy Expectations

| Stage | Intrinsics error | Reprojection error |
|---|---|---|
| After step_init | 10-40% | Not computed |
| After step_optimize | <2% | <1 px mean |
| After filtering + re-optimize | <1% | <0.5 px mean |

Input Requirements

  • Minimum 3 views (for Zhang's method with skew)
  • Minimum 4 points per view (for homography estimation)
  • View diversity: Views should include rotation around both axes and vary in distance
  • Board coverage: Points should span the full image area for good distortion estimation
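
A hypothetical pre-flight check mirroring these requirements (not part of the calibration-rs API; types simplified for illustration):

```rust
/// Per-view summary; in real code this would come from the dataset.
struct ViewSummary {
    num_points: usize,
}

/// Reject datasets that cannot satisfy Zhang's method / homography DLT.
fn check_dataset(views: &[ViewSummary]) -> Result<(), String> {
    if views.len() < 3 {
        return Err(format!("need at least 3 views, got {}", views.len()));
    }
    for (i, v) in views.iter().enumerate() {
        if v.num_points < 4 {
            return Err(format!(
                "view {} has only {} points; homography estimation needs >= 4",
                i, v.num_points
            ));
        }
    }
    Ok(())
}
```

View diversity and board coverage cannot be checked this mechanically — they show up later as a poorly conditioned problem or a bad distortion fit.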

Planar Intrinsics with Real Data

[COLLAB] This chapter requires user collaboration for dataset-specific details, data collection advice, and practical tips.

This chapter walks through the planar_real example, which calibrates a camera from real chessboard images.

Dataset

The example uses images from data/stereo/imgs/leftcamera/:

  • Pattern: 7×11 chessboard, 30 mm square size
  • Images: Multiple PNG files (Im_L_*.png)
  • Camera: Standard industrial camera (specific model TBD)

Workflow

1. Corner Detection

The example uses the chess-corners crate for chessboard detection:

#![allow(unused)]
fn main() {
use chess_corners::{ChessboardParams, ChessConfig, detect_chessboard};

let board_params = ChessboardParams {
    rows: 7,
    cols: 11,
    square_size: 0.03, // 30 mm
};
}

Each image is processed independently. Failed detections are skipped — the workflow continues with the views that succeed.

2. Dataset Construction

Detected corners are assembled into CorrespondenceView structures with matched 3D board points:

#![allow(unused)]
fn main() {
let views: Vec<View<NoMeta>> = images
    .iter()
    .filter_map(|img| {
        let corners = detect_chessboard(img, &config)?;
        let obs = CorrespondenceView::new(board_3d.clone(), corners);
        Some(View::without_meta(obs))
    })
    .collect();
let dataset = PlanarDataset::new(views)?;
}

3. Calibration

The pipeline is identical to the synthetic case:

#![allow(unused)]
fn main() {
let mut session = CalibrationSession::<PlanarIntrinsicsProblem>::new();
session.set_input(dataset)?;
step_init(&mut session, None)?;
step_optimize(&mut session, None)?;
let export = session.export()?;
}

Interpreting Results

Key outputs to examine:

  • Focal lengths (fx, fy): Should match the expected value fx ≈ f / pixel_pitch from the lens focal length and the sensor's pixel size (e.g., for a 1/2" sensor with an 8 mm lens).
  • Principal point (cx, cy): Should be near the image center. Large offsets may indicate a decentered lens.
  • Distortion (k1, k2): Negative k1 indicates barrel distortion (common). Large-magnitude k1 suggests a wide-angle lens.
  • Reprojection error: <1 px is good. >2 px suggests problems with corner detection or insufficient view diversity.

Comparison with Synthetic Data

| Aspect | Synthetic | Real |
|---|---|---|
| Corner accuracy | Exact (no detection noise) | ~0.1-0.5 px (detector dependent) |
| Distortion | Known ground truth | Unknown, estimated |
| View coverage | Controlled | May have gaps |
| Typical reprojection error | <0.01 px | 0.1-1.0 px |

Practical Tips

  • Use 15-30 images with diverse viewpoints
  • Cover the full image area — points only near the center poorly constrain distortion
  • Include tilted views — not just frontal. Rotation around both axes constrains all intrinsic parameters
  • Check corner ordering — mismatched 2D-3D correspondences cause calibration failure
  • Start with fix_k3: true — only estimate if reprojection error is high and you suspect higher-order radial distortion

Running the Example

cargo run -p vision-calibration --example planar_real

The example prints per-step results including initialization accuracy, optimization convergence, and final calibrated parameters.

Single-Camera Hand-Eye Calibration

Hand-eye calibration estimates the rigid transform between a camera and a robot's gripper (or base). This workflow combines intrinsics calibration with hand-eye estimation in a 4-step pipeline.

Problem Formulation

Transformation Chain (Eye-in-Hand)

For a camera mounted on the robot gripper:

cam_se3_target = (gripper_se3_camera)⁻¹ · (base_se3_gripper)⁻¹ · base_se3_target

where:

  • (gripper_se3_camera)⁻¹: camera-to-gripper (the calibrated hand-eye transform, inverted)
  • (base_se3_gripper)⁻¹: gripper-to-base (known from robot kinematics, inverted)
  • base_se3_target: base-to-target (the target's pose in the robot base frame, also calibrated)

Equivalently:

base_se3_target = base_se3_gripper · gripper_se3_camera · cam_se3_target

Eye-to-Hand Variant

For a fixed camera observing a target on the gripper:

cam_se3_target = cam_se3_base · base_se3_gripper · gripper_se3_target

where cam_se3_base is the camera-to-base transform and gripper_se3_target is the gripper-to-target transform (the target is mounted on the gripper).

Optimization Objective

Minimize reprojection error across all views, jointly over all parameters:

E = Σ_v Σ_j ρ(‖π(K, dist, T_v, p_j) − u_vj‖²)

where T_v, the camera-to-target transform for view v, depends on the known robot pose base_se3_gripper for that view, the hand-eye transform, and the target pose.

4-Step Pipeline

Step 1: Intrinsics Initialization (step_intrinsics_init)

Runs the Iterative Intrinsics solver to estimate the intrinsics and distortion from the calibration board observations, ignoring robot poses entirely.

Step 2: Intrinsics Optimization (step_intrinsics_optimize)

Runs planar intrinsics bundle adjustment to refine the intrinsics, distortion, and per-view camera-to-target poses. This gives accurate per-view poses needed for hand-eye initialization.

Step 3: Hand-Eye Initialization (step_handeye_init)

Uses the Tsai-Lenz linear method (see Hand-Eye Linear):

  1. Compute relative motions from robot poses and camera poses
  2. Filter pairs by rotation magnitude (reject small rotations)
  3. Estimate rotation via quaternion SVD
  4. Estimate translation via linear least squares
  5. Estimate the target pose base_se3_target from the hand-eye transform and the camera poses
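
The rotation-magnitude filter in step 2 is simple to sketch with unit quaternions (w, x, y, z). Names are illustrative, not the library API:

```rust
/// Rotation angle of a unit quaternion, in degrees: θ = 2·acos(|w|).
fn rotation_angle_deg(q: [f64; 4]) -> f64 {
    2.0 * q[0].abs().min(1.0).acos().to_degrees()
}

/// Keep only (robot motion, camera motion) pairs whose robot rotation
/// exceeds the threshold; near-pure translations give almost no
/// constraint on the hand-eye rotation and amplify noise.
fn filter_motion_pairs(
    motions: Vec<([f64; 4], [f64; 4])>,
    min_angle_deg: f64,
) -> Vec<([f64; 4], [f64; 4])> {
    motions
        .into_iter()
        .filter(|(robot, _cam)| rotation_angle_deg(*robot) >= min_angle_deg)
        .collect()
}
```

The threshold corresponds to the min_motion_angle_deg configuration field.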

Step 4: Hand-Eye Optimization (step_handeye_optimize)

Joint non-linear optimization of all parameters:

Parameter blocks:

  • "cam" (4D, Euclidean): intrinsics
  • "dist" (5D, Euclidean): distortion
  • "handeye" (7D, SE3): gripper-to-camera transform
  • "target" (7D, SE3): base-to-target transform

Residual blocks: ReprojPointPinhole4Dist5HandEye per observation.

The robot poses are not optimized — they are known constants from the robot kinematics (passed as per-residual data).

Optional: Robot Pose Refinement

If robot pose accuracy is suspect, the pipeline supports per-view SE(3) corrections:

Additional parameter blocks: "robot_delta/v" (7D, SE3) per view

Additional residuals: Se3TangentPrior per view, with configurable rotation and translation sigmas:

r = ξ / σ,  with ξ = log(robot_delta) and σ taken per component (rotation: robot_rot_sigma, translation: robot_trans_sigma)
This penalizes deviations from the nominal robot poses, allowing small corrections while preventing the optimizer from absorbing all error into robot pose changes.
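
A sketch of the resulting prior residual. The component order of the tangent vector is an assumption here; the real factor builds a sqrt_info matrix from the two sigmas:

```rust
/// Divide each tangent component of the correction by its prior sigma:
/// rotation part (radians) assumed first, translation part (meters) second.
fn se3_tangent_prior(xi: [f64; 6], rot_sigma: f64, trans_sigma: f64) -> [f64; 6] {
    let mut r = [0.0; 6];
    for i in 0..3 {
        r[i] = xi[i] / rot_sigma;
        r[i + 3] = xi[i + 3] / trans_sigma;
    }
    r
}
```

Small sigmas pin the corrections near zero; large sigmas let the optimizer absorb more of the error into the robot poses.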

Configuration

#![allow(unused)]
fn main() {
pub struct SingleCamHandeyeConfig {
    // Intrinsics init
    pub intrinsics_init_iterations: usize,
    pub fix_k3: bool,              // Fix k3 (default: true)
    pub fix_tangential: bool,      // Fix p1, p2 (default: false)
    pub zero_skew: bool,           // Enforce zero skew (default: true)

    // Hand-eye
    pub handeye_mode: HandEyeMode, // EyeInHand or EyeToHand
    pub min_motion_angle_deg: f64, // Filter small rotations in Tsai-Lenz

    // Optimization
    pub max_iters: usize,          // LM iterations (default: 100)
    pub verbosity: usize,
    pub robust_loss: RobustLoss,

    // Robot pose refinement
    pub refine_robot_poses: bool,  // Enable per-view corrections
    pub robot_rot_sigma: f64,      // Rotation prior sigma (radians)
    pub robot_trans_sigma: f64,    // Translation prior sigma (meters)
}
}

Input Format

Each view is a View<HandeyeMeta> (aliased as SingleCamHandeyeView):

#![allow(unused)]
fn main() {
pub struct HandeyeMeta {
    pub base_se3_gripper: Iso3,  // Robot pose (from kinematics)
}

// Type alias
pub type SingleCamHandeyeView = View<HandeyeMeta>;
// Where View<Meta> has fields: obs: CorrespondenceView, meta: Meta
}

Complete Example

#![allow(unused)]
fn main() {
use vision_calibration::prelude::*;
use vision_calibration::single_cam_handeye::*;

let views: Vec<SingleCamHandeyeView> = /* load data */;
let input = SingleCamHandeyeInput::new(views)?;

let mut session = CalibrationSession::<SingleCamHandeyeProblem>::new();
session.set_input(input)?;

step_intrinsics_init(&mut session, None)?;
step_intrinsics_optimize(&mut session, None)?;
step_handeye_init(&mut session, None)?;
step_handeye_optimize(&mut session, None)?;

let export = session.export()?;
println!("Hand-eye: {:?}", export.gripper_se3_camera);
println!("Target in base: {:?}", export.base_se3_target);
println!("Reprojection error: {:.4} px", export.mean_reproj_error);
}

Practical Considerations

  • Rotation diversity is critical for Tsai-Lenz initialization. Include rotations around at least 2 axes.
  • 5-10 views minimum, with 10-20 recommended for robustness.
  • Robot accuracy: If the robot's reported poses have errors > 1mm or > 0.1°, enable robot pose refinement.
  • Initialization failure: If Tsai-Lenz produces a poor estimate (large reprojection error after optimization), the most likely cause is insufficient rotation diversity.

OpenCV equivalence: cv::calibrateHandEye provides the linear initialization step. calibration-rs adds joint non-linear refinement and optional robot pose correction.

Hand-Eye with KUKA Robot

[COLLAB] This chapter requires user collaboration for KUKA dataset details, data collection protocol, and practical tips.

This chapter walks through the handeye_session example, which performs hand-eye calibration using a real KUKA robot dataset.

Dataset

Located at data/kuka_1/:

  • Images: 30 PNG files (01.png through 30.png)
  • Robot poses: RobotPosesVec.txt — 30 poses in matrix format
  • Calibration board: 17×28 chessboard, 20 mm square size

Data Loading

The example loads images and robot poses, detects chessboard corners, and constructs the input:

#![allow(unused)]
fn main() {
// Load robot poses from 4×4 matrix format
let robot_poses: Vec<Iso3> = load_robot_poses("data/kuka_1/RobotPosesVec.txt")?;

// Detect corners per image
let views: Vec<SingleCamHandeyeView> = images
    .iter()
    .zip(&robot_poses)
    .filter_map(|(img, pose)| {
        let corners = detect_chessboard(img, &config)?;
        Some(SingleCamHandeyeView {
            obs: CorrespondenceView::new(board_3d.clone(), corners),
            meta: HandeyeMeta { base_se3_gripper: *pose },
        })
    })
    .collect();
}

Typically 20-25 of the 30 images yield successful corner detections.

Running the Example

cargo run -p vision-calibration --example handeye_session

The example reports:

  1. Dataset summary (total images, used views, skipped views)
  2. Per-step results (initialization, optimization)
  3. Final calibrated parameters (intrinsics, distortion, hand-eye transform, target pose)
  4. Robot pose refinement deltas (if enabled)

Interpreting Results

Key outputs:

  • Hand-eye transform (gripper_se3_camera): Translation magnitude should match the physical camera-to-gripper distance. Rotation should reflect the mounting orientation.
  • Target-in-base (base_se3_target): The calibration board's position in the robot base frame. Verify against the known physical setup.
  • Reprojection error: <1 px indicates a good calibration. >3 px suggests problems.
  • Robot pose deltas: If enabled, large deltas (>1° rotation, >5 mm translation) suggest robot kinematic inaccuracies.

Common Failure Modes

  1. Insufficient rotation diversity: All robot poses rotate around the same axis (e.g., only wrist rotation). The Tsai-Lenz initialization will fail or produce a poor estimate.

  2. Incorrect pose convention: Robot poses must be base_se3_gripper (base-to-gripper). If the convention is inverted, the calibration will diverge.

  3. Mismatched corner ordering: If the chessboard detector assigns corners in a different order than expected, the 2D-3D correspondences are wrong.

  4. Robot pose timestamps: If images and robot poses are not synchronized, the calibration will fail.

Data Collection Recommendations

  • Rotation diversity: Include poses with significant roll, pitch, and yaw. Avoid pure translations or rotations around a single axis.
  • Target visibility: Ensure the calibration board is fully visible in all images. Partial visibility causes corner detection failure.
  • Stable poses: Take images when the robot is stationary. Motion blur degrades corner detection.
  • Coverage: Vary the distance to the board and the position within the image.

Multi-Camera Rig Extrinsics

A multi-camera rig is a set of cameras rigidly attached to a common frame. Rig extrinsics calibration estimates the relative pose of each camera within the rig, in addition to per-camera intrinsics and distortion.

Problem Formulation

Parameters

  • Per-camera intrinsics: fx, fy, cx, cy — one set per camera
  • Per-camera distortion: k1, k2, k3, p1, p2 — one set per camera
  • Per-camera extrinsics: cam_se3_rig — camera-to-rig transforms (one per camera, reference camera = identity)
  • Per-view rig poses: rig_se3_target — rig-to-target transforms (one per view)

Transform Chain

For camera c in view v:

p_c = cam_se3_rig[c] · rig_se3_target[v] · p_w

where cam_se3_rig[c] is camera c's extrinsics and rig_se3_target[v] is the rig pose for view v.

Objective

E = Σ_v Σ_c Σ_j ‖π_c(cam_se3_rig[c] · rig_se3_target[v] · p_j) − u_vcj‖²

This jointly optimizes all cameras' parameters and all rig geometry. Each observation depends on two SE(3) transforms composed together (the ReprojPointPinhole4Dist5TwoSE3 factor).

4-Step Pipeline

Step 1: Per-Camera Intrinsics Initialization (step_intrinsics_init_all)

For each camera independently:

  1. Compute homographies from the views where that camera has observations
  2. Run iterative intrinsics estimation (Zhang + distortion fit)
  3. Recover per-camera, per-view poses

Step 2: Per-Camera Intrinsics Optimization (step_intrinsics_optimize_all)

For each camera independently:

  • Run planar intrinsics bundle adjustment
  • Refine , , and per-view poses

Step 3: Rig Extrinsics Initialization (step_rig_init)

Uses the linear initialization from Multi-Camera Rig Initialization:

  1. Define the rig frame by the reference camera (cam_se3_rig[ref] = identity)
  2. For each non-reference camera k, compute cam_se3_rig[k] by averaging cam_se3_target[k, v] · (cam_se3_target[ref, v])⁻¹ across views v
  3. Compute rig-to-target poses rig_se3_target[v] from the reference camera's poses
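
The averaging in step 2 can be sketched for the translation part (rotation averaging requires quaternion averaging and is omitted; this is illustrative, not the library's implementation):

```rust
/// Mean of the per-view translation estimates of one camera's
/// camera-to-rig transform.
fn average_translations(estimates: &[[f64; 3]]) -> [f64; 3] {
    let n = estimates.len() as f64;
    let mut mean = [0.0; 3];
    for e in estimates {
        for i in 0..3 {
            mean[i] += e[i] / n;
        }
    }
    mean
}
```

Averaging across views suppresses per-view pose noise before the rig bundle adjustment refines the result.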

Step 4: Rig Bundle Adjustment (step_rig_optimize)

Joint optimization of all parameters:

Parameter blocks:

  • "cam/k" (4D): per-camera intrinsics
  • "dist/k" (5D): per-camera distortion
  • "extrinsics/k" (7D, SE3): per-camera camera-to-rig transforms
  • "rig_pose/v" (7D, SE3): per-view rig-to-target poses

Residual blocks: ReprojPointPinhole4Dist5TwoSE3 per observation.

Fixed parameters: The reference camera's extrinsics are fixed at identity (gauge freedom).

Configuration

#![allow(unused)]
fn main() {
pub struct RigExtrinsicsConfig {
    // Per-camera intrinsics options
    pub intrinsics_init_iterations: usize,
    pub fix_k3: bool,
    pub fix_tangential: bool,
    pub zero_skew: bool,

    // Rig settings
    pub reference_camera_idx: usize,

    // Optimization
    pub max_iters: usize,
    pub verbosity: usize,
    pub robust_loss: RobustLoss,

    // Rig BA options
    pub refine_intrinsics_in_rig_ba: bool,
    pub fix_first_rig_pose: bool,
}
}

Input Format

#![allow(unused)]
fn main() {
pub struct RigDataset<Meta> {
    pub num_cameras: usize,
    pub views: Vec<RigView<Meta>>,
}

pub struct RigView<Meta> {
    pub obs: RigViewObs,  // Per-camera observations (Option per camera)
    pub meta: Meta,
}
}

Each view has an Option<CorrespondenceView> per camera — cameras that don't see the target in a given view have None.

Per-Camera Reprojection Error

After optimization, per-camera reprojection errors are computed independently:

#![allow(unused)]
fn main() {
let export = session.export()?;
for (cam_idx, error) in export.per_cam_reproj_errors.iter().enumerate() {
    println!("Camera {}: {:.4} px", cam_idx, error);
}
println!("Overall: {:.4} px", export.mean_reproj_error);
}

Stereo Baseline

For a 2-camera rig, the baseline (distance between camera centers) is:

baseline = ‖ translation of cam_se3_rig[1] ‖

since cam_se3_rig[0] = identity (camera 0 is the reference).

Complete Example

#![allow(unused)]
fn main() {
use vision_calibration::prelude::*;
use vision_calibration::rig_extrinsics::*;

let mut session = CalibrationSession::<RigExtrinsicsProblem>::new();
session.set_input(rig_dataset)?;

step_intrinsics_init_all(&mut session, None)?;
step_intrinsics_optimize_all(&mut session, None)?;
step_rig_init(&mut session)?;
step_rig_optimize(&mut session, None)?;

let export = session.export()?;
let baseline = export.cam_se3_rig[1].translation.vector.norm();
println!("Stereo baseline: {:.1} mm", baseline * 1000.0);
}

OpenCV equivalence: cv::stereoCalibrate for 2-camera rigs. calibration-rs generalizes the same workflow to N cameras.

Stereo Rig with Real Data

This chapter walks through the stereo_session example, which calibrates a stereo camera rig from synchronized left/right image pairs.

Dataset

Located at data/stereo/imgs/:

  • Left camera: leftcamera/Im_L_*.png
  • Right camera: rightcamera/Im_R_*.png
  • Pattern: 7×11 chessboard, 30 mm square size
  • Synchronized: Left and right images are captured simultaneously

Workflow

1. Load Stereo Pairs

The example loads images from both camera directories and pairs them by filename:

#![allow(unused)]
fn main() {
let board = ChessboardParams { rows: 7, cols: 11, square_size: 0.03 };
let views: Vec<RigView<NoMeta>> = image_pairs
    .iter()
    .filter_map(|(left_img, right_img)| {
        let left_corners = detect_chessboard(left_img, &config)?;
        let right_corners = detect_chessboard(right_img, &config)?;
        // Both cameras must detect corners for a valid view
        Some(RigView::new(vec![
            Some(CorrespondenceView::new(board_3d.clone(), left_corners)),
            Some(CorrespondenceView::new(board_3d.clone(), right_corners)),
        ]))
    })
    .collect();
}

2. Run 4-Step Pipeline

#![allow(unused)]
fn main() {
let mut session = CalibrationSession::<RigExtrinsicsProblem>::new();
session.set_input(RigDataset::new(views, 2)?)?;

step_intrinsics_init_all(&mut session, None)?;
step_intrinsics_optimize_all(&mut session, None)?;
step_rig_init(&mut session)?;
step_rig_optimize(&mut session, None)?;
}

Interpreting Results

Per-Camera Intrinsics

Both cameras should have similar focal lengths (if they use the same lens). Differences in principal point are normal.

Stereo Baseline

The extrinsics give the relative pose between cameras. For a horizontal stereo pair, expect:

  • Rotation: primarily around the Y axis (converging or parallel configuration)
  • Translation: primarily along X (horizontal baseline)
  • Baseline magnitude: should match the physical distance between cameras

Verification

  • Epipolar constraint: For correctly calibrated stereo, corresponding points should lie on epipolar lines. Measure the average distance from points to their epipolar lines.
  • Rectification: The stereo pair should rectify cleanly — horizontal scanlines in the rectified images should correspond.
  • Triangulation: Triangulate known board corners and verify the 3D distances match the board geometry.
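For a quick plausibility check on a rectified pair, the standard disparity-to-depth relation is enough; this sketch uses the plain pinhole equations with entirely hypothetical numbers:

```rust
// Hypothetical sanity check for a rectified stereo pair: recover a
// corner's position from disparity (all numbers illustrative).
fn stereo_point(fx: f64, cx: f64, baseline: f64, xl: f64, xr: f64) -> (f64, f64) {
    let disparity = xl - xr;           // pixels
    let z = fx * baseline / disparity; // depth along the optical axis (m)
    let x = (xl - cx) * z / fx;        // lateral offset (m)
    (x, z)
}

fn main() {
    let (x, z) = stereo_point(800.0, 640.0, 0.12, 700.0, 620.0);
    println!("corner at X = {x:.3} m, Z = {z:.3} m");
}
```

If triangulated corner spacings deviate noticeably from the known square size, revisit the calibration before trusting downstream measurements.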

Running the Example

cargo run -p vision-calibration --example stereo_session
cargo run -p vision-calibration --example stereo_session -- --max-views 15

The --max-views flag limits the number of image pairs processed, useful for faster iteration during development.

Data Collection Tips

  • Both cameras must see the board in each view. Discard pairs where only one camera detects corners.
  • Vary board orientation and distance — same advice as single-camera calibration.
  • Ensure synchronization — if images are not captured simultaneously, the board may have moved between left and right captures.
  • Overlap region — the board should be in the overlapping field of view of both cameras.

Stereo Rig with Real ChArUco Data

This chapter walks through the stereo_charuco_session example, which calibrates a stereo camera rig from synchronized ChArUco board images.

Dataset

The example uses data/stereo_charuco/:

  • cam1/Cam1_*.png for camera 0
  • cam2/Cam2_*.png for camera 1
  • Pairing by shared filename suffix (deterministic sorted intersection)

The loader ignores non-PNG files (for example Thumbs.db).

ChArUco Target Parameters

  • Board squares: 22 x 22
  • Cell size: 0.45 mm (0.00045 m)
  • Marker size scale: 0.75
  • Dictionary: DICT_4X4_1000
  • Layout: OpenCV ChArUco (OpenCvCharuco)

Detector Tuning

For this 2048x1536 dataset, the example increases ChArUco graph spacing:

  • params.graph.max_spacing_pix = 120.0

Without this, the default graph spacing may be too strict for this board scale in image space.

Workflow

The example follows the standard rig extrinsics 4-step session pipeline:

  1. step_intrinsics_init_all
  2. step_intrinsics_optimize_all
  3. step_rig_init
  4. step_rig_optimize

It also runs run_calibration as a convenience-path comparison.

Running the Example

cargo run -p vision-calibration --example stereo_charuco_session
cargo run -p vision-calibration --example stereo_charuco_session -- --max-views 8
cargo run -p vision-calibration --example stereo_charuco_session -- --max-views=15
  • Default: uses all detected stereo pairs
  • Optional --max-views caps the number of pairs used (deterministic prefix of sorted pairs)

Output Interpretation

The example prints:

  • Dataset summary (total_pairs, used_views, skipped_views, usable_left, usable_right)
  • Per-camera intrinsics and per-camera reprojection errors
  • Rig BA mean reprojection error and per-camera BA errors
  • Baseline magnitude in meters and millimeters

As with any real-data calibration, verify that:

  • Both cameras have enough usable views (at least 3 each)
  • Reprojection errors are within your application tolerance
  • Estimated baseline is physically plausible for the hardware setup

Multi-Camera Rig Hand-Eye

This is the most complex calibration workflow: a multi-camera rig mounted on a robot arm. It combines per-camera intrinsics calibration, rig extrinsics estimation, and hand-eye calibration in a 6-step pipeline.

Problem Formulation

Transformation Chain

For camera k in view i with robot pose base_se3_gripper_i, the target-to-camera transform composes as

    cam_se3_target_ik = cam_se3_rig_k · (gripper_se3_rig)⁻¹ · (base_se3_gripper_i)⁻¹ · base_se3_target

where:

  • cam_se3_rig_k: camera-to-rig transform for camera k (rig extrinsics)
  • gripper_se3_rig: rig-to-gripper transform (hand-eye)
  • base_se3_gripper_i: base-to-gripper transform (known robot pose for view i)
  • base_se3_target: base-to-target transform (calibrated)

Parameters

  • Per-camera intrinsics: fx, fy, cx, cy (4N scalar parameters for N cameras)
  • Per-camera distortion: k1, k2, p1, p2, k3 (5N scalar parameters)
  • Per-camera extrinsics: cam_se3_rig (6(N - 1) DOF, reference camera = identity)
  • Hand-eye: gripper_se3_rig (6 DOF)
  • Target pose: base_se3_target (6 DOF)
  • Optionally: per-view robot corrections (6 DOF per view, regularized)

6-Step Pipeline

Steps 1-2: Per-Camera Intrinsics

Same as Rig Extrinsics: initialize and optimize each camera's intrinsics independently.

Steps 3-4: Rig Extrinsics

Same as Rig Extrinsics: initialize camera-to-rig transforms via SE(3) averaging, then jointly optimize the rig geometry.

Step 5: Hand-Eye Initialization (step_handeye_init)

Uses Tsai-Lenz with the rig's reference camera poses and robot poses:

  1. Extract relative camera motions from rig-to-target poses
  2. Extract relative robot motions from base-to-gripper poses
  3. Solve the AX = XB system for the gripper-to-rig hand-eye transform
  4. Estimate target-in-base

Step 6: Hand-Eye Optimization (step_handeye_optimize)

Joint optimization of all parameters, minimizing robust reprojection error over every observation (view i, camera k, target point j):

    min Σ ρ( || π(K_k, dist_k, cam_se3_target_ik · P_j) − p_ikj ||² )

where cam_se3_target_ik is the transformation chain above.

Parameter blocks:

  • "cam/k" (4D): per-camera intrinsics
  • "dist/k" (5D): per-camera distortion
  • "extrinsics/k" (7D, SE3): per-camera camera-to-rig
  • "handeye" (7D, SE3): rig-to-gripper
  • "target" (7D, SE3): base-to-target

Factor: ReprojPointPinhole4Dist5HandEye per observation, which composes the full transform chain.
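The chain composition itself is ordinary SE(3) multiplication. The following self-contained sketch uses 4×4 homogeneous matrices with pure-translation poses so the result is checkable by hand; every value is hypothetical, and the library composes nalgebra isometries instead:

```rust
// Self-contained sketch of the hand-eye transform chain as 4x4
// homogeneous matrix products (pure translations for readability).
type M4 = [[f64; 4]; 4];

fn matmul(a: &M4, b: &M4) -> M4 {
    let mut c = [[0.0; 4]; 4];
    for i in 0..4 {
        for j in 0..4 {
            for k in 0..4 {
                c[i][j] += a[i][k] * b[k][j];
            }
        }
    }
    c
}

fn translation(t: [f64; 3]) -> M4 {
    let mut m = [[0.0; 4]; 4];
    for i in 0..4 {
        m[i][i] = 1.0; // identity rotation
    }
    for i in 0..3 {
        m[i][3] = t[i];
    }
    m
}

fn main() {
    // hypothetical pure-translation transforms for illustration
    let cam_se3_rig = translation([0.12, 0.0, 0.0]);
    let rig_se3_gripper = translation([0.0, 0.0, 0.05]);
    let gripper_se3_base = translation([0.0, -0.3, 0.8]);
    let base_se3_target = translation([0.5, 0.3, -0.6]);

    let cam_se3_target = matmul(
        &matmul(&cam_se3_rig, &rig_se3_gripper),
        &matmul(&gripper_se3_base, &base_se3_target),
    );
    println!(
        "target origin in camera frame: [{:.2}, {:.2}, {:.2}]",
        cam_se3_target[0][3], cam_se3_target[1][3], cam_se3_target[2][3]
    );
}
```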

Complete Example

#![allow(unused)]
fn main() {
use vision_calibration::prelude::*;
use vision_calibration::rig_handeye::*;

let mut session = CalibrationSession::<RigHandeyeProblem>::new();
session.set_input(rig_dataset_with_robot_poses)?;

// Per-camera calibration
step_intrinsics_init_all(&mut session, None)?;
step_intrinsics_optimize_all(&mut session, None)?;

// Rig geometry
step_rig_init(&mut session)?;
step_rig_optimize(&mut session, None)?;

// Hand-eye
step_handeye_init(&mut session, None)?;
step_handeye_optimize(&mut session, None)?;

let export = session.export()?;
println!("Mode: {:?}", export.handeye_mode);
if let Some(gripper_se3_rig) = export.gripper_se3_rig {
    println!("gripper_se3_rig: {:?}", gripper_se3_rig);
}
println!("Baseline: {:.1} mm",
    export.cam_se3_rig[1].translation.vector.norm() * 1000.0);
println!("Per-camera errors: {:?}", export.per_cam_reproj_errors);
}

Gauge Freedom

The system has a gauge freedom: the rig frame origin and the hand-eye transform are coupled. Fixing the reference camera's extrinsics at identity resolves this by defining the rig frame to coincide with camera 0.

Practical Considerations

All the advice from Single-Camera Hand-Eye applies, plus:

  • All cameras must observe the target in at least some views for the rig extrinsics to be well-constrained
  • Views where only some cameras see the target are handled (missing observations are skipped)
  • The hand-eye transform describes gripper-to-rig, not gripper-to-individual-camera. The per-camera offset comes from the rig extrinsics.

Laserline Device Calibration

Laserline calibration jointly estimates camera parameters and a laser plane from observations of both a calibration board and laser line projections on the board. This is used in laser triangulation systems where a camera and laser are rigidly mounted together.

Problem Formulation

Parameters

  • Camera intrinsics: fx, fy, cx, cy
  • Distortion: k1, k2, p1, p2, k3
  • Sensor tilt: two Scheimpflug parameters (optional, for Scheimpflug cameras)
  • Per-view poses: T_i (camera-to-target SE(3))
  • Laser plane: normal n, distance d

Observations

Each view provides two types of observations:

  1. Calibration points: 2D-3D correspondences from the chessboard (same as planar intrinsics)
  2. Laser pixels: 2D pixel positions where the laser line appears on the target

Objective

The cost function has two terms:

    C = w_calib Σ ρ_calib( ||r_reproj||² ) + w_laser Σ ρ_laser( r_laser² )

where w_calib and w_laser are the weights, ρ_calib and ρ_laser are (possibly different) robust loss functions, and r_laser is the laser residual described below.

Laser Residual Types

Two approaches are implemented for the laser residual, selectable via configuration.

PointToPlane (3D Distance)

Algorithm:

  1. Undistort laser pixel to normalized coordinates
  2. Back-project to a ray in camera frame
  3. Intersect the ray with the target plane (the plane z = 0 in the target frame, mapped into the camera frame by pose T_i) to get a 3D point P
  4. Compute the signed distance from P to the laser plane (unit normal n, distance d): r = n · P − d

Residual dimension: 1 (meters)
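A plain-f64 sketch of this residual (the library's implementation is generic over T: RealField for autodiff; the function names here are illustrative):

```rust
// Plain-f64 sketch of the PointToPlane laser residual.
type V3 = [f64; 3];

fn dot(a: V3, b: V3) -> f64 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

/// Back-project normalized coords (x, y) to a ray, intersect it with the
/// target plane n_t·P = d_t, and return the signed distance of the hit
/// point to the unit-normal laser plane n_l·P = d_l.
fn point_to_plane_residual(xy: [f64; 2], n_t: V3, d_t: f64, n_l: V3, d_l: f64) -> f64 {
    let ray: V3 = [xy[0], xy[1], 1.0];
    let t = d_t / dot(n_t, ray); // ray parameter at the target plane
    let p: V3 = [ray[0] * t, ray[1] * t, ray[2] * t];
    dot(n_l, p) - d_l
}

fn main() {
    // target plane z = 1, laser plane x = 0: the hit point (0.1, 0, 1)
    // lies 0.1 m from the laser plane
    let r = point_to_plane_residual([0.1, 0.0], [0.0, 0.0, 1.0], 1.0, [1.0, 0.0, 0.0], 0.0);
    println!("residual = {r} m");
}
```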

LineDistNormalized (2D Line Distance) — Default

Algorithm:

  1. Compute the 3D intersection line of the laser plane and the target plane (in camera frame)
  2. Project this 3D line onto the normalized camera plane
  3. Undistort the laser pixel to normalized coordinates (done once, not per-iteration)
  4. Measure the perpendicular distance from the undistorted pixel to the projected line
  5. Scale by the focal length for pixel-comparable units: r ≈ fx · d_perp

Residual dimension: 1 (effective pixels)

Comparison

Property        PointToPlane            LineDistNormalized
Residual units  meters                  pixels
Undistortion    Per-iteration           Once per pixel
Geometry        Ray-plane intersection  2D line distance
Speed           Slower                  Faster
Recommended     Alternative             Default

Both approaches yield similar accuracy in practice (<6% intrinsics error, <5° plane normal error in synthetic tests).

Derivation: Line-Distance Residual

Plane-Plane Intersection Line

The laser plane (n_l, d_l) and the target plane (normal n_t, distance d_t from the camera, derived from pose T_i) intersect in a 3D line with direction

    v = n_l × n_t

and a point on the line (three-plane closed form, using the auxiliary plane v · x = 0):

    p_0 = ( d_l (n_t × v) + d_t (v × n_l) ) / ||v||²

Projection onto Normalized Plane

The plane through the camera center that contains this line has normal m = p_0 × v. Intersecting that plane with z = 1 projects the 3D line to the 2D line

    m_1 x + m_2 y + m_3 = 0

Perpendicular Distance

For an undistorted pixel (x_u, y_u) in normalized coordinates, the perpendicular distance to the 2D line is

    d_perp = | m_1 x_u + m_2 y_u + m_3 | / sqrt(m_1² + m_2²)
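The derivation steps can be combined into one plain-f64 function (an illustrative sketch, not the library's code, which is generic for autodiff):

```rust
// Illustrative implementation of the line-distance derivation.
type V3 = [f64; 3];

fn cross(a: V3, b: V3) -> V3 {
    [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]
}

fn dot(a: V3, b: V3) -> f64 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

/// Distance from an undistorted normalized pixel to the projection of the
/// laser/target intersection line onto the z = 1 plane.
fn line_dist_normalized(px: [f64; 2], n_l: V3, d_l: f64, n_t: V3, d_t: f64) -> f64 {
    let v = cross(n_l, n_t); // direction of the 3D intersection line
    let s = dot(v, v);
    let a = cross(n_t, v);
    let b = cross(v, n_l);
    // point on the line: three-plane closed form with the plane v·x = 0
    let p0: V3 = [
        (d_l * a[0] + d_t * b[0]) / s,
        (d_l * a[1] + d_t * b[1]) / s,
        (d_l * a[2] + d_t * b[2]) / s,
    ];
    // plane through the origin containing the line -> projected 2D line
    let m = cross(p0, v);
    (m[0] * px[0] + m[1] * px[1] + m[2]).abs() / (m[0] * m[0] + m[1] * m[1]).sqrt()
}

fn main() {
    // laser plane x = 0 and target plane z = 1 intersect in {x = 0, z = 1},
    // which projects to the line x = 0 on the normalized plane
    let d = line_dist_normalized([0.05, 0.3], [1.0, 0.0, 0.0], 0.0, [0.0, 0.0, 1.0], 1.0);
    println!("distance = {d}");
}
```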

Configuration

#![allow(unused)]
fn main() {
pub struct LaserlineDeviceConfig {
    // Initialization
    pub init_iterations: usize,        // Iterative intrinsics iterations (default: 2)
    pub fix_k3_in_init: bool,          // Fix k3 during init (default: true)
    pub fix_tangential_in_init: bool,  // Fix p1, p2 during init (default: false)
    pub zero_skew: bool,               // Enforce zero skew (default: true)
    pub sensor_init: ScheimpflugParams, // Initial sensor tilt (default: identity)

    // Optimization
    pub max_iters: usize,              // LM iterations (default: 50)
    pub verbosity: usize,
    pub calib_loss: RobustLoss,        // Default: Huber { scale: 1.0 }
    pub laser_loss: RobustLoss,        // Default: Huber { scale: 0.01 }
    pub calib_weight: f64,             // Weight for calibration residuals (default: 1.0)
    pub laser_weight: f64,             // Weight for laser residuals (default: 1.0)
    pub fix_intrinsics: bool,
    pub fix_distortion: bool,
    pub fix_k3: bool,                  // Default: true
    pub fix_sensor: bool,              // Default: true
    pub fix_poses: Vec<usize>,         // Default: vec![0]
    pub fix_plane: bool,
    pub laser_residual_type: LaserlineResidualType, // Default: LineDistNormalized
}
}

Weight Balancing

Since calibration residuals (in pixels) and laser residuals may have different scales, the weights allow balancing their relative influence. A common starting point is calib_weight = 1.0 and laser_weight = 1.0, adjusting if one term dominates.

Complete Example

#![allow(unused)]
fn main() {
use vision_calibration::prelude::*;
use vision_calibration::laserline_device::*;

let mut session = CalibrationSession::<LaserlineDeviceProblem>::new();
session.set_input(laserline_input)?;

run_calibration(&mut session, None)?;

let export = session.export()?;
println!("Plane normal: {:?}", export.estimate.params.plane.normal);
println!("Plane distance: {:.4}", export.estimate.params.plane.distance);
println!("Reprojection error: {:.4} px", export.mean_reproj_error);
println!("Laser error: {:.4}", export.stats.mean_laser_error);
}

Scheimpflug Support

For laser profilers with tilted sensors, the sensor model parameters are jointly optimized. The ReprojPointPinhole4Dist5Scheimpflug2 factor handles the extended camera model.

Laserline with Industrial Data

This chapter discusses practical considerations for laser triangulation calibration in industrial settings.

Industrial Context

Laser triangulation is widely used in industrial inspection:

  • Weld seam inspection: Verifying weld bead geometry (width, height, penetration)
  • 3D surface profiling: Measuring surface topology for quality control
  • Gap and flush measurement: Checking assembly tolerances in automotive manufacturing
  • Tire and rubber inspection: Measuring tread depth and sidewall profiles

In each case, a laser line is projected onto the workpiece and a camera observes the deformed line. The calibrated laser plane, combined with camera parameters, allows converting pixel positions to metric 3D coordinates.

Accuracy Requirements

Application         Typical accuracy needed
Weld inspection     0.1-0.5 mm
Surface profiling   0.01-0.1 mm
Gap/flush           0.05-0.2 mm
Coarse 3D scanning  0.5-2 mm

Data Collection for Laser Calibration

Calibration Board Setup

The calibration board serves dual purpose:

  1. Provides chessboard corners for camera intrinsics calibration
  2. Provides a known plane to constrain the laser line observations

Capturing Views

For each view:

  1. Position the calibration board at a known angle to the laser plane
  2. Capture an image with both the chessboard pattern and the laser line visible
  3. Record chessboard corners and laser line pixel positions separately

View Diversity

  • Board angles: Vary the board tilt so the laser-board intersection line covers different positions in the image
  • Board distances: Vary the distance to cover the working range
  • Minimum views: 5-10 views with diverse orientations

The laserline_device_session Example

cargo run -p vision-calibration --example laserline_device_session

This example uses synthetic data to demonstrate the workflow. For real data, the laser pixel extraction must be performed by an external algorithm (typically a peak detector applied to each image column).
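One common extraction scheme is a per-column intensity peak with a centroid refinement. The sketch below is hypothetical (not part of the library): it scans each column of a tiny row-major grayscale image and reports sub-pixel laser positions:

```rust
// Hypothetical laser-line extractor: per-column intensity peak with a
// 3-tap centroid refinement, on a row-major grayscale image.
fn extract_laser_pixels(img: &[Vec<u8>]) -> Vec<(f64, f64)> {
    let rows = img.len();
    let cols = img[0].len();
    let mut out = Vec::new();
    for c in 0..cols {
        // brightest row in this column
        let (mut best_r, mut best_v) = (0usize, 0u16);
        for r in 0..rows {
            if u16::from(img[r][c]) > best_v {
                best_v = u16::from(img[r][c]);
                best_r = r;
            }
        }
        if best_v < 50 {
            continue; // hypothetical intensity threshold: no laser here
        }
        // centroid of the peak and its two neighbours -> sub-pixel row
        let lo = best_r.saturating_sub(1);
        let hi = (best_r + 1).min(rows - 1);
        let (mut num, mut den) = (0.0, 0.0);
        for r in lo..=hi {
            num += r as f64 * f64::from(img[r][c]);
            den += f64::from(img[r][c]);
        }
        out.push((c as f64, num / den));
    }
    out
}

fn main() {
    // vertical intensity profile peaking at row 2 in both columns
    let img = vec![
        vec![0, 0],
        vec![100, 100],
        vec![200, 200],
        vec![100, 100],
        vec![0, 0],
    ];
    for (x, y) in extract_laser_pixels(&img) {
        println!("laser pixel at ({x}, {y:.2})");
    }
}
```

Production extractors usually add Gaussian fitting or a center-of-gravity window tuned to the laser width, but the per-column structure is the same.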

Interpreting Results

Key outputs:

  • Laser plane normal: The direction the laser projects in. Should align with the physical mounting.
  • Laser plane distance: Distance from camera center to the laser plane. Should match the physical geometry.
  • Mean laser error: The average residual of laser observations. In pixel units for LineDistNormalized, in meters for PointToPlane.

DS8 Dataset

The data/DS8/ directory contains a real industrial laser triangulation dataset:

  • ExperimentDetails.txt: Experiment metadata
  • calibration_object.txt: Calibration pattern specification
  • images/: Captured images
  • robot_cali.txt: Robot calibration data (if applicable)

Troubleshooting

  • Poor laser plane estimate: Usually caused by insufficient view diversity or laser pixels too close to collinear (all from similar board angles)
  • High laser residuals: May indicate laser pixel detection errors, or that the laser is not well-described by a single plane (diverging laser, curved projection)
  • Scheimpflug convergence: If sensor tilt parameters diverge, try initializing with fix_sensor: true for the first few iterations, then release

CalibrationSession

CalibrationSession<P: ProblemType> is the central state container for calibration workflows. It holds all data for a calibration run — configuration, input observations, intermediate state, final output, and an audit log — with full JSON serialization for checkpointing.

Structure

#![allow(unused)]
fn main() {
pub struct CalibrationSession<P: ProblemType> {
    pub metadata: SessionMetadata,
    pub config: P::Config,
    pub state: P::State,
    pub exports: Vec<ExportRecord<P::Export>>,
    pub log: Vec<LogEntry>,
    // input and output are private — access via methods
}
}

Field     Access                  Purpose
metadata  pub                     Problem type name, schema version, timestamps
config    pub                     Algorithm parameters (iterations, fix masks, loss functions)
input     input(), set_input()    Observation data (set once per calibration run)
state     pub                     Mutable intermediate results (modified by step functions)
output    output(), set_output()  Final calibration result (set by the last step)
exports   pub                     Timestamped export records
log       pub                     Audit trail of operations

Lifecycle

1. Create

#![allow(unused)]
fn main() {
let mut session = CalibrationSession::<PlanarIntrinsicsProblem>::new();
// Or with a description:
let mut session = CalibrationSession::<PlanarIntrinsicsProblem>::with_description(
    "Lab camera calibration 2024-01-15"
);
}

2. Set Input

#![allow(unused)]
fn main() {
session.set_input(dataset)?;
}

Input is validated via ProblemType::validate_input(). Setting input clears computed state (per the invalidation policy).

3. Configure

#![allow(unused)]
fn main() {
session.update_config(|c| {
    c.max_iters = 50;
    c.robust_loss = RobustLoss::Huber { scale: 2.0 };
})?;
}

Configuration is validated via ProblemType::validate_config().

4. Run Steps

#![allow(unused)]
fn main() {
step_init(&mut session, None)?;
step_optimize(&mut session, None)?;
}

Step functions are free functions operating on &mut CalibrationSession<P>. Each step reads input/state, performs computation, and updates state (or output).

5. Export

#![allow(unused)]
fn main() {
let export = session.export()?;
}

Creates an ExportRecord with the current timestamp and output. Multiple exports can be created (e.g., after re-optimization with different settings).

JSON Checkpointing

The entire session can be serialized and restored:

#![allow(unused)]
fn main() {
// Save
let json = session.to_json()?;
std::fs::write("calibration.json", &json)?;

// Restore
let json = std::fs::read_to_string("calibration.json")?;
let restored = CalibrationSession::<PlanarIntrinsicsProblem>::from_json(&json)?;

// Resume from where we left off
step_optimize(&mut restored, None)?;
}

This enables:

  • Interruption recovery: Save progress and resume later
  • Reproducibility: Share exact calibration state
  • Debugging: Inspect the session at any point

Invalidation Policies

When input or configuration changes, computed state may need to be cleared:

Event            Default policy
set_input()      Clear state and output (CLEAR_COMPUTED)
update_config()  Keep everything (KEEP_ALL)
clear_input()    Clear everything (CLEAR_ALL)

These defaults can be overridden per problem type.

Audit Log

Every step function appends to the session log:

#![allow(unused)]
fn main() {
pub struct LogEntry {
    pub timestamp: u64,
    pub operation: String,
    pub success: bool,
    pub notes: Option<String>,
}
}

The log records what was done and when, useful for tracking calibration history.
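A sketch of how a step function might append an entry, using the field names above (`log_step` is a hypothetical helper, not the library's API):

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// LogEntry mirrors the struct shown above.
pub struct LogEntry {
    pub timestamp: u64,
    pub operation: String,
    pub success: bool,
    pub notes: Option<String>,
}

/// Append one audit entry with the current Unix timestamp.
fn log_step(log: &mut Vec<LogEntry>, operation: &str, success: bool) {
    let timestamp = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0);
    log.push(LogEntry {
        timestamp,
        operation: operation.to_string(),
        success,
        notes: None,
    });
}

fn main() {
    let mut log = Vec::new();
    log_step(&mut log, "step_init", true);
    println!("{} entries, last: {}", log.len(), log[0].operation);
}
```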

Accessing State

#![allow(unused)]
fn main() {
// Input (required before steps)
let input = session.require_input()?;  // Returns error if no input

// Intermediate state (available after init)
if let Some(k) = &session.state.initial_intrinsics {
    println!("Init fx={:.1}", k.fx);
}

// Output (available after optimize)
let output = session.require_output()?;
}

ProblemType Trait

The ProblemType trait defines the interface for a calibration problem. It specifies the types for configuration, input, state, output, and export, along with validation and lifecycle hooks. The trait is intentionally minimal — behavior is implemented in external step functions, not trait methods.

Definition

#![allow(unused)]
fn main() {
pub trait ProblemType: Sized + 'static {
    type Config: Clone + Default + Serialize + DeserializeOwned + Debug;
    type Input: Clone + Serialize + DeserializeOwned + Debug;
    type State: Clone + Default + Serialize + DeserializeOwned + Debug;
    type Output: Clone + Serialize + DeserializeOwned + Debug;
    type Export: Clone + Serialize + DeserializeOwned + Debug;

    fn name() -> &'static str;
    fn schema_version() -> u32 { 1 }

    fn validate_input(input: &Self::Input) -> Result<()> { Ok(()) }
    fn validate_config(config: &Self::Config) -> Result<()> { Ok(()) }
    fn validate_input_config(
        input: &Self::Input, config: &Self::Config
    ) -> Result<()> { Ok(()) }

    fn on_input_change() -> InvalidationPolicy {
        InvalidationPolicy::CLEAR_COMPUTED
    }
    fn on_config_change() -> InvalidationPolicy {
        InvalidationPolicy::KEEP_ALL
    }

    fn export(
        output: &Self::Output, config: &Self::Config
    ) -> Result<Self::Export>;
}
}

Associated Types

Type    Purpose                   Requirements
Config  Algorithm parameters      Default (sensible defaults)
Input   Observation data          Validated on set
State   Intermediate results      Default (empty initial state)
Output  Final calibration result  Set by last step
Export  User-facing result        Created from Output + Config

The Serialize + DeserializeOwned bounds enable JSON checkpointing. Clone enables snapshot operations.

Required Method: name()

Returns a stable string identifier for the problem type:

#![allow(unused)]
fn main() {
fn name() -> &'static str { "planar_intrinsics_v2" }
}

This is stored in session metadata and used for deserialization. Changing the name breaks compatibility with existing checkpoints.

Required Method: export()

Converts the internal output to a user-facing export format:

#![allow(unused)]
fn main() {
fn export(output: &Self::Output, config: &Self::Config) -> Result<Self::Export>;
}

The export may transform, filter, or enrich the output. For example, PlanarIntrinsicsProblem::export() computes reprojection error metrics from the raw optimization output.

Validation Hooks

Three optional validation hooks run at specific points:

Hook                                  When it runs
validate_input(input)                 On session.set_input(input)
validate_config(config)               On session.set_config(config) or update_config()
validate_input_config(input, config)  When both input and config are present

Example: PlanarIntrinsicsProblem validates that input has ≥ 3 views with ≥ 4 points each.

Invalidation Policies

Control what happens when input or config changes:

#![allow(unused)]
fn main() {
pub struct InvalidationPolicy {
    pub clear_state: bool,
    pub clear_output: bool,
    pub clear_exports: bool,
}

impl InvalidationPolicy {
    // Nothing cleared
    pub const KEEP_ALL: Self =
        Self { clear_state: false, clear_output: false, clear_exports: false };
    // State + output cleared
    pub const CLEAR_COMPUTED: Self =
        Self { clear_state: true, clear_output: true, clear_exports: false };
    // Everything cleared
    pub const CLEAR_ALL: Self =
        Self { clear_state: true, clear_output: true, clear_exports: true };
}
}

Defaults:

  • Input change → CLEAR_COMPUTED (re-running steps is needed)
  • Config change → KEEP_ALL (output is still valid, but may not reflect new config)

Existing Problem Types

Type                     Name                    Steps
PlanarIntrinsicsProblem  "planar_intrinsics_v2"  init → optimize
SingleCamHandeyeProblem  "single_cam_handeye"    intrinsics_init → intrinsics_optimize → handeye_init → handeye_optimize
RigExtrinsicsProblem     "rig_extrinsics"        intrinsics_init_all → intrinsics_optimize_all → rig_init → rig_optimize
RigHandeyeProblem        "rig_handeye"           6 steps (intrinsics + rig + handeye)
LaserlineDeviceProblem   "laserline_device"      init → optimize

Design Philosophy

The trait is minimal by design:

  • No step methods: Steps are free functions, not trait methods. This allows flexible composition and avoids trait object limitations.
  • No algorithm logic: The trait defines data types and validation, not computation.
  • Serialization-first: All types are serializable, enabling checkpointing from day one.

Step Functions vs Pipeline Functions

calibration-rs offers two ways to run calibration workflows: step functions for granular control and pipeline functions for convenience.

Step Functions

Step functions are free functions that operate on a mutable session reference:

#![allow(unused)]
fn main() {
pub fn step_init(
    session: &mut CalibrationSession<PlanarIntrinsicsProblem>,
    opts: Option<IntrinsicsInitOptions>,
) -> Result<()>
}

Each step:

  1. Reads input and/or state from the session
  2. Performs one phase of the calibration (e.g., initialization or optimization)
  3. Updates the session state (and possibly output)
  4. Logs the operation

Advantages

Intermediate inspection: Examine the state between steps.

#![allow(unused)]
fn main() {
step_init(&mut session, None)?;

// Inspect initialization quality before committing to optimization
let init_k = session.state.initial_intrinsics.as_ref().unwrap();
if (init_k.fx - expected_fx).abs() / expected_fx > 0.5 {
    eprintln!("Warning: init fx={:.0} is far from expected {:.0}",
              init_k.fx, expected_fx);
}

step_optimize(&mut session, None)?;
}

Per-step configuration: Override options for individual steps.

#![allow(unused)]
fn main() {
// Use more iterations for optimization
let opts = IntrinsicsOptimizeOptions {
    max_iters: Some(200),
    ..Default::default()
};
step_optimize(&mut session, Some(opts))?;
}

Selective execution: Skip steps or re-run specific steps.

#![allow(unused)]
fn main() {
// Re-run optimization with different settings (without re-initializing)
session.update_config(|c| c.robust_loss = RobustLoss::Cauchy { scale: 3.0 })?;
step_optimize(&mut session, None)?;
}

Checkpointing: Save and restore between steps.

#![allow(unused)]
fn main() {
step_init(&mut session, None)?;
let checkpoint = session.to_json()?;
std::fs::write("after_init.json", &checkpoint)?;

step_optimize(&mut session, None)?;
}

Pipeline Functions

Pipeline functions chain all steps into a single call:

#![allow(unused)]
fn main() {
pub fn run_calibration(
    session: &mut CalibrationSession<PlanarIntrinsicsProblem>,
) -> Result<()> {
    step_init(session, None)?;
    step_optimize(session, None)?;
    Ok(())
}
}

When to Use

  • Quick prototyping: Get results with minimal code
  • Default settings: When the defaults work and you don't need inspection
  • Scripts and automation: When human inspection is not needed

#![allow(unused)]
fn main() {
let mut session = CalibrationSession::<PlanarIntrinsicsProblem>::new();
session.set_input(dataset)?;
run_calibration(&mut session)?;
let export = session.export()?;
}

Available Pipeline Functions

Problem                       Pipeline function                                          Steps
PlanarIntrinsicsProblem       run_calibration()                                          init → optimize
SingleCamHandeyeProblem       single_cam_handeye::run_calibration()                      4 steps
RigExtrinsicsProblem          rig_extrinsics::run_calibration()                          4 steps
RigHandeyeProblem             rig_handeye::run_calibration()                             6 steps
LaserlineDeviceProblem        run_calibration(session, config)                           init → optimize
ScheimpflugIntrinsicsProblem  scheimpflug_intrinsics::run_calibration(session, config)   init → optimize

Recommendation

Use step functions for production calibration where you need to:

  • Verify initialization quality
  • Adjust parameters between steps
  • Handle failures gracefully
  • Log and audit the process

Use pipeline functions for:

  • Examples and tutorials
  • Batch processing with known-good settings
  • Quick experiments

Adding a New Optimization Problem

This chapter walks through adding a new optimization problem to vision-calibration-optim, following the pattern established by the planar intrinsics implementation.

Overview

Adding a new problem requires these steps:

  1. Define the generic residual function in factors/
  2. Add a FactorKind variant in ir/types.rs
  3. Create a problem builder in problems/
  4. Integrate with the backend in backend/tiny_solver_backend.rs
  5. Write tests with synthetic ground truth

Step 1: Generic Residual Function

Create a new file or add to an existing file in crates/vision-calibration-optim/src/factors/:

#![allow(unused)]
fn main() {
pub fn my_residual_generic<T: RealField>(
    param_a: DVectorView<'_, T>,  // optimizable parameters
    param_b: DVectorView<'_, T>,
    constant_data: [f64; 3],      // per-residual constants (f64, not T)
    w: f64,                        // weight
) -> SVector<T, 2> {              // residual dimension
    // Convert constants to T
    let cx = T::from_f64(constant_data[0]).unwrap();

    // Compute residual using param_a, param_b, cx
    // Use .clone() liberally, avoid in-place ops
    let r_x = /* ... */;
    let r_y = /* ... */;

    let wt = T::from_f64(w).unwrap();
    SVector::<T, 2>::new(r_x * wt.clone(), r_y * wt)
}
}

Key rules:

  • Generic over T: RealField for autodiff
  • Optimizable parameters as DVectorView<'_, T>
  • Constants as f64 (converted inside)
  • Use .clone(), no in-place mutation
  • Return SVector<T, N> for fixed residual dimension

Step 2: FactorKind Variant

Add a new variant to FactorKind in crates/vision-calibration-optim/src/ir/types.rs:

#![allow(unused)]
fn main() {
pub enum FactorKind {
    // ... existing variants ...
    MyNewFactor {
        constant_data: [f64; 3],
        w: f64,
    },
}
}

Add validation in the validate method:

#![allow(unused)]
fn main() {
FactorKind::MyNewFactor { .. } => {
    ensure!(params.len() == 2, "MyNewFactor requires 2 param blocks");
    ensure!(params[0].dim == 4, "param_a must be dim 4");
    ensure!(params[1].dim == 7, "param_b must be dim 7 (SE3)");
}
}

Update the residual_dim() method:

#![allow(unused)]
fn main() {
FactorKind::MyNewFactor { .. } => 2,
}
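
The whole pattern can be seen in a self-contained toy: the variant carries per-residual constants, and `residual_dim`/`validate` mirror the methods described above (types simplified from the real IR types; the error type here is reduced to String):

```rust
// Toy, self-contained version of the FactorKind pattern.
#[derive(Debug)]
enum FactorKind {
    MyNewFactor { constant_data: [f64; 3], w: f64 },
}

impl FactorKind {
    fn residual_dim(&self) -> usize {
        match self {
            FactorKind::MyNewFactor { .. } => 2,
        }
    }

    /// Check the dimensions of the parameter blocks this factor touches.
    fn validate(&self, param_dims: &[usize]) -> Result<(), String> {
        match self {
            FactorKind::MyNewFactor { .. } => {
                if param_dims.len() != 2 || param_dims[0] != 4 || param_dims[1] != 7 {
                    return Err("MyNewFactor expects param blocks of dim 4 and 7 (SE3)".into());
                }
                Ok(())
            }
        }
    }
}

fn main() {
    let f = FactorKind::MyNewFactor { constant_data: [1.0, 2.0, 3.0], w: 0.5 };
    println!("dim = {}, valid = {:?}", f.residual_dim(), f.validate(&[4, 7]).is_ok());
}
```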

Step 3: Problem Builder

Create crates/vision-calibration-optim/src/problems/my_problem.rs:

#![allow(unused)]
fn main() {
pub struct MyProblemParams {
    pub param_a: DVector<f64>,
    pub param_b: Iso3,
}

pub struct MySolveOptions {
    pub fix_param_a: FixedMask,
    pub robust_loss: RobustLoss,
}

pub fn build_my_problem_ir(
    data: &MyDataset,
    initial: &MyProblemParams,
    opts: &MySolveOptions,
) -> Result<(ProblemIR, HashMap<String, DVector<f64>>)> {
    let mut ir = ProblemIR::new();
    let mut initial_values = HashMap::new();

    // Add parameter blocks via the builder API
    let param_a_id = ir.add_param_block(
        "param_a", 4, ManifoldKind::Euclidean,
        opts.fix_param_a.clone(), None,
    );
    initial_values.insert("param_a".to_string(), initial.param_a.clone());

    let param_b_id = ir.add_param_block(
        "param_b", 7, ManifoldKind::SE3,
        FixedMask::all_free(), None,
    );
    initial_values.insert(
        "param_b".to_string(), iso3_to_se3_dvec(&initial.param_b),
    );

    // Add residual blocks
    for obs in &data.observations {
        ir.add_residual_block(ResidualBlock {
            params: vec![param_a_id, param_b_id],
            loss: opts.robust_loss,
            factor: FactorKind::MyNewFactor {
                constant_data: obs.constant_data,
                w: obs.weight,
            },
            residual_dim: 2,
        });
    }

    Ok((ir, initial_values))
}
}

Step 4: Backend Integration

In crates/vision-calibration-optim/src/backend/tiny_solver_backend.rs, add a match arm in compile_factor():

#![allow(unused)]
fn main() {
FactorKind::MyNewFactor { constant_data, w } => {
    let constant_data = *constant_data;
    let w = *w;
    Box::new(move |params: &[DVectorView<'_, T>]| {
        my_residual_generic(
            params[0], params[1],
            constant_data, w,
        ).as_slice().to_vec()
    })
}
}
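The closure above forwards to a residual function that must be generic over the scalar type so the same body serves both f64 evaluation and dual-number autodiff. Stripped of nalgebra, the required shape looks like the following sketch (illustrative only; the real signature takes `DVectorView` arguments and bounds `T: RealField`):

```rust
use std::ops::{Mul, Sub};

// The same body must compile for f64 and for an autodiff dual type,
// so only operations available on the generic scalar may be used.
fn my_residual_generic<T>(pred: [T; 2], obs: [T; 2], w: T) -> [T; 2]
where
    T: Copy + Sub<Output = T> + Mul<Output = T>,
{
    // Weighted 2D residual: w * (predicted - observed)
    [w * (pred[0] - obs[0]), w * (pred[1] - obs[1])]
}
```

Any branch-free arithmetic written this way differentiates cleanly; avoid comparisons against concrete f64 constants inside the residual body.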

Step 5: Tests

Write a test with synthetic ground truth in crates/vision-calibration-optim/tests/:

#![allow(unused)]
fn main() {
#[test]
fn my_problem_converges() -> Result<()> {
    // 1. Define ground truth parameters
    let gt_param_a = /* ... */;
    let gt_param_b = /* ... */;

    // 2. Generate synthetic observations
    let data = generate_synthetic_data(&gt_param_a, &gt_param_b);

    // 3. Create perturbed initial values
    let initial = perturb(&gt_param_a, &gt_param_b);

    // 4. Build and solve
    let (ir, init_vals) = build_my_problem_ir(&data, &initial, &opts)?;
    let solution = solve_with_backend(
        BackendKind::TinySolver, &ir, &init_vals, &backend_opts,
    )?;

    // 5. Verify convergence
    let solved_a = &solution.params["param_a"];
    let tolerance = 1e-6;
    assert!((solved_a[0] - gt_param_a[0]).abs() < tolerance);
    Ok(())
}
}

Checklist

  • Generic residual function with T: RealField
  • FactorKind variant with validation
  • Problem builder producing ProblemIR + initial values
  • Backend compilation for the new factor
  • Synthetic ground truth test verifying convergence
  • (Optional) Pipeline integration with session framework

Adding a New Pipeline Problem Type

This chapter describes how to create a new session-based calibration workflow in vision-calibration-pipeline, using the laserline device module as a template.

Module Structure

Create a new folder under crates/vision-calibration-pipeline/src/:

my_problem/
├── mod.rs         # Module re-exports
├── problem.rs     # ProblemType implementation + Config
├── state.rs       # Intermediate state type
└── steps.rs       # Step functions + pipeline function

Step 1: Define the Problem Type (problem.rs)

#![allow(unused)]
fn main() {
use crate::session::{ProblemType, InvalidationPolicy};

pub struct MyProblem;

#[derive(Clone, Default, Debug, Serialize, Deserialize)]
pub struct MyConfig {
    pub max_iters: usize,
    pub fix_k3: bool,
    // ... other configuration
}

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct MyInput {
    pub views: Vec<MyView>,
}

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct MyOutput {
    pub calibrated_params: CameraParams,
    pub mean_reproj_error: f64,
}

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct MyExport {
    pub params: CameraParams,
    pub mean_reproj_error: f64,
}

impl ProblemType for MyProblem {
    type Config = MyConfig;
    type Input = MyInput;
    type State = MyState;  // defined in state.rs
    type Output = MyOutput;
    type Export = MyExport;

    fn name() -> &'static str { "my_problem_v1" }

    fn validate_input(input: &MyInput) -> Result<()> {
        ensure!(input.views.len() >= 3, "Need at least 3 views");
        Ok(())
    }

    fn on_input_change() -> InvalidationPolicy { InvalidationPolicy::CLEAR_COMPUTED }
    fn on_config_change() -> InvalidationPolicy { InvalidationPolicy::KEEP_ALL }

    fn export(output: &MyOutput, _config: &MyConfig) -> Result<MyExport> {
        Ok(MyExport {
            params: output.calibrated_params.clone(),
            mean_reproj_error: output.mean_reproj_error,
        })
    }
}
}

Step 2: Define the State (state.rs)

#![allow(unused)]
fn main() {
#[derive(Clone, Default, Debug, Serialize, Deserialize)]
pub struct MyState {
    // Initialization results
    pub initial_intrinsics: Option<FxFyCxCySkew<f64>>,
    pub initial_distortion: Option<BrownConrady5<f64>>,
    pub initial_poses: Option<Vec<Iso3>>,

    // Optimization results
    pub final_cost: Option<f64>,
    pub mean_reproj_error: Option<f64>,
}
}

The state must implement Default (empty state) and Clone + Serialize + Deserialize (for checkpointing).

Step 3: Implement Step Functions (steps.rs)

#![allow(unused)]
fn main() {
use crate::session::CalibrationSession;
use super::problem::MyProblem;

pub fn step_init(
    session: &mut CalibrationSession<MyProblem>,
    _opts: Option<&()>,
) -> Result<()> {
    let input = session.require_input()?;
    let config = &session.config;

    // Run linear initialization
    let intrinsics = /* ... */;
    let poses = /* ... */;

    // Update state
    session.state.initial_intrinsics = Some(intrinsics);
    session.state.initial_poses = Some(poses);

    // Log
    session.log_success("step_init", Some("Initialization complete"));
    Ok(())
}

pub fn step_optimize(
    session: &mut CalibrationSession<MyProblem>,
    _opts: Option<&()>,
) -> Result<()> {
    let input = session.require_input()?;
    let config = &session.config;

    // Require initialization
    let init_k = session.state.initial_intrinsics.as_ref()
        .context("Run step_init first")?;

    // Build and solve optimization problem
    let result = /* ... */;

    // Update state and output
    session.state.final_cost = Some(result.cost);
    session.state.mean_reproj_error = Some(result.reproj_error);
    session.set_output(MyOutput {
        calibrated_params: result.params,
        mean_reproj_error: result.reproj_error,
    })?;

    session.log_success("step_optimize", Some("Optimization complete"));
    Ok(())
}

/// Convenience pipeline function
pub fn run_calibration(
    session: &mut CalibrationSession<MyProblem>,
) -> Result<()> {
    step_init(session, None)?;
    step_optimize(session, None)?;
    Ok(())
}
}

Step 4: Module Re-exports (mod.rs)

#![allow(unused)]
fn main() {
mod problem;
mod state;
mod steps;

pub use problem::{MyProblem, MyConfig, MyInput, MyOutput, MyExport};
pub use state::MyState;
pub use steps::{step_init, step_optimize, run_calibration};
}

Step 5: Register in the Pipeline Crate

In crates/vision-calibration-pipeline/src/lib.rs:

#![allow(unused)]
fn main() {
pub mod my_problem;
}

Step 6: Wire into the Facade Crate

In crates/vision-calibration/src/lib.rs:

#![allow(unused)]
fn main() {
pub mod my_problem {
    pub use vision_calibration_pipeline::my_problem::*;
}
}

Do not add new workflows to the prelude by default; keep the prelude minimal for planar hello-world usage.

Testing

Write an integration test in crates/vision-calibration-pipeline/tests/:

#![allow(unused)]
fn main() {
#[test]
fn my_problem_session_workflow() -> Result<()> {
    let input = generate_synthetic_input();

    let mut session = CalibrationSession::<MyProblem>::new();
    session.set_input(input)?;

    step_init(&mut session, None)?;
    assert!(session.state.initial_intrinsics.is_some());

    step_optimize(&mut session, None)?;
    assert!(session.output().is_some());

    // Test JSON round-trip
    let json = session.to_json()?;
    let restored = CalibrationSession::<MyProblem>::from_json(&json)?;
    assert!(restored.output().is_some());

    Ok(())
}
}

Adding a New Solver Backend

The backend-agnostic IR design allows adding new optimization backends without modifying problem definitions. This chapter describes what a backend must implement and how to integrate it.

The OptimBackend Trait

#![allow(unused)]
fn main() {
pub trait OptimBackend {
    fn solve(
        &self,
        ir: &ProblemIR,
        initial_params: &HashMap<String, DVector<f64>>,
        opts: &BackendSolveOptions,
    ) -> Result<BackendSolution>;
}
}

A backend receives:

  • ir: The problem structure (parameter blocks, residual blocks, factor kinds)
  • initial_params: Initial values for all parameter blocks (keyed by name)
  • opts: Solver options (max iterations, tolerances, verbosity)

And returns:

  • BackendSolution: Optimized parameter values (keyed by parameter name) and a solve report

What a Backend Must Handle

1. Parameter Blocks

For each ParamBlock, the backend must:

  • Allocate storage for a parameter vector of the given dimension
  • Initialize from the provided initial values
  • Apply the manifold (if not Euclidean)
  • Respect the fixed mask (hold specified indices constant)
  • Apply box constraints (if bounds are specified)
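For the Euclidean case, the fixed mask and box constraints amount to a guarded update. A minimal sketch, with names and shapes chosen for illustration rather than taken from the backend API:

```rust
// Illustrative only: apply an LM/GN step to a Euclidean parameter block,
// holding masked indices constant and clamping to optional box bounds.
fn apply_step(
    values: &mut [f64],
    step: &[f64],
    fixed: &[bool],                // true => hold this index constant
    bounds: Option<&[(f64, f64)]>, // optional (lo, hi) per index
) {
    for i in 0..values.len() {
        if fixed[i] {
            continue; // fixed mask: leave the value untouched
        }
        let mut v = values[i] + step[i];
        if let Some(b) = bounds {
            v = v.clamp(b[i].0, b[i].1); // box constraint
        }
        values[i] = v;
    }
}
```

In practice most solvers instead eliminate fixed indices from the Jacobian columns, but the observable effect on the parameter vector is the same.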

2. Manifold Constraints

The backend must implement the plus (⊞) and minus (⊟) operations for each ManifoldKind:

| Manifold  | Ambient dim | Tangent dim | Plus operation |
|-----------|-------------|-------------|----------------|
| Euclidean | n           | n           | x ⊞ δ = x + δ  |
| SE3       | 7           | 6           | Exp-map retraction (quaternion + translation) |
| SO3       | 4           | 3           | Quaternion exp-map retraction |
| S2        | 3           | 2           | Retract via tangent plane basis |
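As a concrete illustration, here is a standalone sketch of the SO(3) plus operation under the unit-quaternion parameterization (ambient dim 4, tangent dim 3). This is the standard exp-map retraction, not necessarily the library's exact implementation:

```rust
// Hamilton product of two quaternions stored as [w, x, y, z].
fn quat_mul(a: [f64; 4], b: [f64; 4]) -> [f64; 4] {
    [
        a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3],
        a[0] * b[1] + a[1] * b[0] + a[2] * b[3] - a[3] * b[2],
        a[0] * b[2] - a[1] * b[3] + a[2] * b[0] + a[3] * b[1],
        a[0] * b[3] + a[1] * b[2] - a[2] * b[1] + a[3] * b[0],
    ]
}

// Exponential map: rotation vector (tangent) -> unit quaternion.
fn so3_exp(delta: [f64; 3]) -> [f64; 4] {
    let theta =
        (delta[0] * delta[0] + delta[1] * delta[1] + delta[2] * delta[2]).sqrt();
    if theta < 1e-12 {
        // First-order approximation near the identity.
        return [1.0, 0.5 * delta[0], 0.5 * delta[1], 0.5 * delta[2]];
    }
    let (s, c) = ((theta / 2.0).sin(), (theta / 2.0).cos());
    let k = s / theta;
    [c, k * delta[0], k * delta[1], k * delta[2]]
}

// Plus: q ⊞ δ = q * exp(δ). The result stays on the unit sphere.
fn so3_plus(q: [f64; 4], delta: [f64; 3]) -> [f64; 4] {
    quat_mul(q, so3_exp(delta))
}
```

Minus is the inverse: δ = log(q1⁻¹ · q2), mapping a pair of quaternions back to a tangent vector.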

3. Residual Evaluation

For each ResidualBlock, the backend must:

  • Call the appropriate residual function based on FactorKind
  • Pass the correct parameter block values (referenced by ParamId)
  • Include per-residual constant data (3D points, observed pixels, weights)
  • Compute Jacobians (via autodiff or finite differences)

4. Robust Loss Functions

The backend must apply the RobustLoss to each residual:

  • None → standard squared loss
  • Huber { scale } → Huber loss with the given scale
  • Cauchy { scale } → Cauchy loss
  • Arctan { scale } → Arctan loss
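The scalar form of these losses can be sketched as follows, where ρ maps the squared residual norm s = ‖r‖² to the robustified cost. The scale handling here follows the common Ceres-style convention, which may differ in detail from the library's definitions:

```rust
// rho(s): robustified cost as a function of the squared residual norm s.
// All losses are quadratic near zero and grow sub-quadratically for
// large residuals, which damps the influence of outliers.
fn rho_none(s: f64) -> f64 {
    s
}

fn rho_huber(s: f64, scale: f64) -> f64 {
    let c2 = scale * scale;
    if s <= c2 { s } else { 2.0 * scale * s.sqrt() - c2 }
}

fn rho_cauchy(s: f64, scale: f64) -> f64 {
    let c2 = scale * scale;
    c2 * (1.0 + s / c2).ln()
}

fn rho_arctan(s: f64, scale: f64) -> f64 {
    scale * (s / scale).atan()
}
```

Inside an iteratively reweighted least-squares loop the solver uses ρ′(s) to rescale each residual and its Jacobian rather than evaluating ρ directly.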

5. Solution Extraction

Return optimized values as a HashMap<String, DVector<f64>> keyed by parameter block name (not ID).

Implementation Pattern

A typical backend has two phases:

Compile Phase

Translate the IR into solver-specific data structures:

#![allow(unused)]
fn main() {
fn compile(&self, ir: &ProblemIR) -> SolverProblem {
    for param in &ir.params {
        // Create solver parameter with manifold and fixing
    }
    for residual in &ir.residuals {
        // Create solver cost function from FactorKind
    }
}
}

Solve Phase

Run the optimizer and extract results:

#![allow(unused)]
fn main() {
fn solve(&self, problem: SolverProblem, opts: &BackendSolveOptions)
    -> BackendSolution
{
    // Set convergence criteria from opts
    // Run optimization loop
    // Extract final parameter values
    // Build SolveReport
}
}

Potential Backends

| Backend     | Description                     | Advantages                  |
|-------------|---------------------------------|-----------------------------|
| tiny-solver | Current. Rust-native LM.        | Pure Rust, no external deps |
| Ceres-RS    | Rust bindings to Google Ceres   | Battle-tested, many features |
| Custom GN   | Hand-written Gauss-Newton       | Full control, educational   |
| L-BFGS      | Quasi-Newton for large problems | Memory-efficient            |

Testing

A new backend should pass the same convergence tests as the existing backend:

#![allow(unused)]
fn main() {
#[test]
fn new_backend_planar_converges() {
    // Same synthetic data and initial values as tiny-solver tests
    let (ir, init) = build_planar_test_problem();

    let solution = MyNewBackend.solve(&ir, &init, &opts)?;

    // Verify same convergence quality
    assert!(solution.report.final_cost < 1e-4);
}
}

Run the full test suite with both backends to ensure equivalent results.
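One convenient way to check "equivalent results" is a tolerance-based comparison of the solved parameter maps. A small helper along these lines, with types simplified to std containers for illustration (the real maps hold DVector<f64> values):

```rust
use std::collections::HashMap;

// Returns true if both solutions contain the same parameter blocks and
// every entry agrees to within `tol` (absolute difference).
fn solutions_close(
    a: &HashMap<String, Vec<f64>>,
    b: &HashMap<String, Vec<f64>>,
    tol: f64,
) -> bool {
    a.len() == b.len()
        && a.iter().all(|(name, va)| {
            b.get(name).map_or(false, |vb| {
                va.len() == vb.len()
                    && va.iter().zip(vb).all(|(x, y)| (x - y).abs() <= tol)
            })
        })
}
```

Absolute tolerances are fine for intrinsics in pixels; for mixed units (rotations vs. translations) consider a per-block tolerance instead.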

Synthetic Data Generation

[COLLAB] This appendix benefits from user collaboration on recommended noise levels and realistic test scenarios.

calibration-rs provides utilities for generating synthetic calibration data, used in examples and tests. Synthetic data allows testing with known ground truth, verifying convergence, and debugging algorithm issues.

Board Point Generation

#![allow(unused)]
fn main() {
use vision_calibration::synthetic::planar;

// 8×6 grid of 3D points at Z=0, spaced 40mm apart
let board_points = planar::grid_points(8, 6, 0.04);
// Returns Vec<Pt3>: [(0,0,0), (0.04,0,0), ..., (0.28,0.20,0)]
}

Points are generated row by row in the Z = 0 plane, with the specified column count, row count, and spacing (in meters).
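Assuming row-major ordering (consistent with the example output above), the generation is equivalent to the following sketch, with the library's Pt3 type replaced by a plain tuple:

```rust
// Equivalent generation, assuming row-major order in the Z = 0 plane.
fn grid_points(cols: usize, rows: usize, spacing: f64) -> Vec<(f64, f64, f64)> {
    let mut pts = Vec::with_capacity(cols * rows);
    for r in 0..rows {
        for c in 0..cols {
            // Column index drives X, row index drives Y; the board is flat.
            pts.push((c as f64 * spacing, r as f64 * spacing, 0.0));
        }
    }
    pts
}
```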

Pose Generation

#![allow(unused)]
fn main() {
// 6 camera poses with varying yaw and distance
let poses = planar::poses_yaw_y_z(
    6,      // number of views
    -0.2,   // start yaw (radians)
    0.08,   // yaw step
    0.5,    // start Z distance (meters)
    0.05,   // Z step
);
// Returns Vec<Iso3>: camera-to-board transforms
}

Poses are generated looking at the board center, with rotation around the Y axis (yaw) and varying distance along Z.

Projection

#![allow(unused)]
fn main() {
// Project board points through all poses
let views = planar::project_views_all(&camera, &board_points, &poses)?;
// Returns Vec<CorrespondenceView>: 2D-3D point pairs per view
}

This applies the full camera model (projection + distortion + intrinsics) to generate pixel observations.
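For intuition, the pinhole part of that chain, ignoring distortion and sensor tilt, reduces to a divide-and-scale. This sketch assumes a point already expressed in the camera frame, with intrinsics named as in FxFyCxCySkew (skew omitted for brevity):

```rust
// Pinhole projection of a camera-frame point to pixel coordinates.
// Illustrative sketch only; the library composes this with distortion
// and the sensor model.
fn project_pinhole(
    p: (f64, f64, f64),
    fx: f64, fy: f64, cx: f64, cy: f64,
) -> (f64, f64) {
    let (x, y, z) = p;
    // Perspective divide, then scale and shift by the intrinsics.
    (fx * x / z + cx, fy * y / z + cy)
}
```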

Adding Noise

For realistic testing, add Gaussian noise to the projected pixels:

#![allow(unused)]
fn main() {
use rand::thread_rng;
use rand_distr::{Distribution, Normal};

let mut rng = thread_rng();
let noise = Normal::new(0.0, 0.5).unwrap(); // sigma = 0.5 pixels

for view in &mut views {
    for pixel in view.points_2d.iter_mut() {
        pixel.x += noise.sample(&mut rng);
        pixel.y += noise.sample(&mut rng);
    }
}
}

Typical noise levels:

| Scenario                    | Noise sigma (px) |
|-----------------------------|------------------|
| Ideal (testing convergence) | 0.0              |
| High-quality detector       | 0.1 - 0.3        |
| Standard detector           | 0.3 - 1.0        |
| Noisy conditions            | 1.0 - 3.0        |

Deterministic Seeds

All examples and tests use fixed random seeds for reproducibility:

#![allow(unused)]
fn main() {
use rand::SeedableRng;
let mut rng = rand::rngs::StdRng::seed_from_u64(42);
}
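If you want the same idea with no external crates, say in a quick scratch test, a seeded generator plus the Box-Muller transform does the job. This is an illustrative alternative, not the library's approach:

```rust
// Tiny deterministic PRNG (LCG) + Box-Muller: reproducible zero-mean
// Gaussian samples without external crates. Sketch only.
struct Lcg(u64);

impl Lcg {
    // Uniform sample in [0, 1), advancing the state.
    fn next_f64(&mut self) -> f64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }

    // One Gaussian sample with standard deviation `sigma`.
    fn next_gaussian(&mut self, sigma: f64) -> f64 {
        let u1 = self.next_f64().max(1e-300); // avoid ln(0)
        let u2 = self.next_f64();
        sigma * (-2.0 * u1.ln()).sqrt() * (std::f64::consts::TAU * u2).cos()
    }
}
```

The same seed always reproduces the same noise sequence, so a failing test can be replayed exactly.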

Common Test Scenarios

Minimal (3 views, no noise)

For verifying algorithm correctness. Should converge to machine precision.

Moderate (6-10 views, 0.5 px noise)

For testing realistic convergence. Should achieve <1% intrinsics error after optimization.

Challenging (20 views, 1.0 px noise, outliers)

For testing robustness. Should achieve <2% intrinsics error with robust loss functions.

Distortion stress test

Use large distortion coefficients (e.g., large k1 and k2) to verify that the iterative intrinsics solver handles strong distortion.

Data Types Quick Reference

Core Math Types

All based on nalgebra with f64 precision:

| Type | Definition      | Description           |
|------|-----------------|-----------------------|
| Real | f64             | Scalar type           |
| Pt2  | Point2<f64>     | 2D point              |
| Pt3  | Point3<f64>     | 3D point              |
| Vec2 | Vector2<f64>    | 2D vector             |
| Vec3 | Vector3<f64>    | 3D vector             |
| Mat3 | Matrix3<f64>    | 3×3 matrix            |
| Mat4 | Matrix4<f64>    | 4×4 matrix            |
| Iso3 | Isometry3<f64>  | SE(3) rigid transform |

Camera Model Types

| Type              | Parameters                 | Description        |
|-------------------|----------------------------|--------------------|
| FxFyCxCySkew<S>   | fx, fy, cx, cy, skew       | Intrinsics matrix  |
| BrownConrady5<S>  | k1, k2, k3, p1, p2, iters  | Distortion model   |
| Pinhole           | (none)                     | Projection model   |
| IdentitySensor    | (none)                     | No sensor tilt     |
| ScheimpflugParams | tilt_x, tilt_y             | Scheimpflug tilt   |
| Camera<S,P,D,Sm,K> | proj, dist, sensor, k     | Composable camera  |

Observation Types

| Type               | Fields                         | Description             |
|--------------------|--------------------------------|-------------------------|
| CorrespondenceView | points_3d, points_2d, weights  | 2D-3D correspondences   |
| View<Meta>         | obs, meta                      | Observation + metadata  |
| PlanarDataset      | views: Vec<View<NoMeta>>       | Planar calibration input |
| RigView<Meta>      | obs: RigViewObs, meta          | Multi-camera view       |
| RigDataset<Meta>   | num_cameras, views             | Multi-camera input      |
| ReprojectionStats  | mean, rms, max, count          | Error statistics        |

Fix Masks

| Type              | Fields                                                   | Description                      |
|-------------------|----------------------------------------------------------|----------------------------------|
| IntrinsicsFixMask | fx, fy, cx, cy (bool each)                               | Fix individual intrinsics        |
| DistortionFixMask | k1, k2, k3, p1, p2 (bool each)                           | Fix individual distortion params |
| CameraFixMask     | intrinsics, distortion                                   | Combined camera fix mask         |
| FixedMask         | opaque (all_free(), all_fixed(dim), fix_indices(&[usize])) | IR-level parameter fixing      |

Optimization IR Types

| Type          | Description                                      |
|---------------|--------------------------------------------------|
| ProblemIR     | Complete optimization problem                    |
| ParamBlock    | Parameter group (id, name, dim, manifold, fixed) |
| ResidualBlock | Residual definition (params, loss, factor)       |
| ParamId       | Unique parameter identifier                      |
| FactorKind    | Residual computation type (enum)                 |
| ManifoldKind  | Parameter geometry (Euclidean, SE3, SO3, S2)     |
| RobustLoss    | Loss function (None, Huber, Cauchy, Arctan)      |

Session Types

| Type                   | Description                                   |
|------------------------|-----------------------------------------------|
| CalibrationSession<P>  | Generic session container                     |
| SessionMetadata        | Problem name, version, timestamps             |
| LogEntry               | Audit log entry (timestamp, operation, success) |
| ExportRecord<E>        | Timestamped export                            |
| InvalidationPolicy     | What to clear on input/config change          |

Backend Types

| Type                | Description                                   |
|---------------------|-----------------------------------------------|
| BackendSolveOptions | Solver settings (max_iters, tolerances, etc.) |
| BackendSolution     | Optimized params + solve report               |
| SolveReport         | Initial/final cost, iterations, termination   |
| LinearSolverType    | SparseCholesky or SparseQR                    |

Hand-Eye Types

| Type          | Description              |
|---------------|--------------------------|
| HandEyeMode   | EyeInHand or EyeToHand   |
| HandeyeMeta   | base_se3_gripper: Iso3   |
| RobotPoseMeta | base_se3_gripper: Iso3   |

References and Further Reading

[COLLAB] Please help curate and prioritize this reference list, and add any additional references.

Foundational Textbooks

  • Hartley, R.I. & Zisserman, A. (2004). Multiple View Geometry in Computer Vision. 2nd edition. Cambridge University Press. — The definitive reference for projective geometry, fundamental/essential matrices, homography, triangulation, and bundle adjustment.

  • Ma, Y., Soatto, S., Košecká, J., & Sastry, S.S. (2004). An Invitation to 3-D Vision. Springer. — Rigorous treatment of Lie groups and geometric vision with emphasis on SE(3) and optimization on manifolds.

Camera Calibration

  • Zhang, Z. (2000). "A Flexible New Technique for Camera Calibration." IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11), 1330-1334. — Zhang's planar calibration method: intrinsics from homographies via the image of the absolute conic.

  • Brown, D.C. (1966). "Decentering Distortion of Lenses." Photogrammetric Engineering, 32(3), 444-462. — The original Brown-Conrady distortion model with radial and tangential components.

Minimal Solvers

  • Nistér, D. (2004). "An Efficient Solution to the Five-Point Relative Pose Problem." IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(6), 756-770. — The 5-point essential matrix solver used in stereo and visual odometry.

  • Kneip, L., Scaramuzza, D., & Siegwart, R. (2011). "A Novel Parametrization of the Perspective-Three-Point Problem for a Direct Computation of Absolute Camera Position and Orientation." IEEE Conference on Computer Vision and Pattern Recognition (CVPR). — The P3P solver used for camera pose estimation.

Hand-Eye Calibration

  • Tsai, R.Y. & Lenz, R.K. (1989). "A New Technique for Fully Autonomous and Efficient 3D Robotics Hand/Eye Calibration." IEEE Transactions on Robotics and Automation, 5(3), 345-358. — The Tsai-Lenz method for solving AX=XB.

Robust Estimation

  • Fischler, M.A. & Bolles, R.C. (1981). "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography." Communications of the ACM, 24(6), 381-395. — The original RANSAC paper.

  • Hartley, R.I. (1997). "In Defense of the Eight-Point Algorithm." IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(6), 580-593. — Demonstrates that data normalization makes the 8-point algorithm competitive with iterative methods.

Non-Linear Optimization

  • Levenberg, K. (1944). "A Method for the Solution of Certain Non-Linear Problems in Least Squares." Quarterly of Applied Mathematics, 2(2), 164-168.

  • Marquardt, D.W. (1963). "An Algorithm for Least-Squares Estimation of Nonlinear Parameters." Journal of the Society for Industrial and Applied Mathematics, 11(2), 431-441.

  • Triggs, B., McLauchlan, P.F., Hartley, R.I., & Fitzgibbon, A.W. (2000). "Bundle Adjustment — A Modern Synthesis." International Workshop on Vision Algorithms. Springer. — Comprehensive survey of bundle adjustment techniques.

Lie Groups in Vision

  • Solà, J., Deray, J., & Atchuthan, D. (2018). "A Micro Lie Theory for State Estimation in Robotics." arXiv:1812.01537. — Accessible introduction to Lie groups for robotics and vision, covering SO(3), SE(3), and their tangent spaces.