Introduction
calibration-rs is a Rust library for camera calibration — the process of estimating the internal parameters (intrinsics, lens distortion) and external parameters (pose, rig geometry, hand-eye transforms) of camera systems from observed correspondences between known 3D points and their 2D projections.
Who This Book Is For
This book targets engineers and researchers working in machine vision, robotics, and 3D reconstruction who want to:
- Calibrate single cameras, stereo rigs, or multi-camera systems
- Perform hand-eye calibration for cameras mounted on robot arms
- Calibrate laser triangulation devices (camera + laser plane)
- Understand the mathematical foundations behind calibration algorithms
We assume familiarity with linear algebra (SVD, eigendecomposition, least squares) and basic projective geometry (homogeneous coordinates, projection matrices). Lie group theory (SO(3), SE(3)) is introduced when needed.
What calibration-rs Provides
The library covers the full calibration pipeline:
- Linear initialization — closed-form solvers (Zhang's method, DLT, PnP, Tsai-Lenz hand-eye, epipolar geometry) that produce approximate parameter estimates
- Non-linear refinement — Levenberg-Marquardt bundle adjustment that minimizes reprojection error to sub-pixel accuracy
- Session framework — a high-level API with step functions, configuration, and JSON checkpointing
Workspace Structure
calibration-rs is organized as a 5-crate Rust workspace with a layered architecture:
```
vision-calibration (facade)
  → vision-calibration-pipeline (sessions, workflows)
    → vision-calibration-optim (non-linear refinement)
    → vision-calibration-linear (initialization)
      → vision-calibration-core (primitives, camera models, RANSAC)
```
The key dependency rule: the linear and optimization crates are peers — they both depend on vision-calibration-core but not on each other. This separation keeps initialization algorithms independent of the optimization backend.
Relation to OpenCV
Readers familiar with OpenCV's calibration module will find analogous functionality throughout:
| OpenCV | calibration-rs |
|---|---|
| `cv::calibrateCamera` | Planar intrinsics pipeline (Zhang init + bundle adjustment) |
| `cv::solvePnP` | `PnpSolver::p3p()`, `PnpSolver::dlt()` |
| `cv::findHomography` | `HomographySolver::dlt()`, `dlt_homography_ransac()` |
| `cv::findFundamentalMat` | `EpipolarSolver::fundamental_8point()` |
| `cv::findEssentialMat` | `EpipolarSolver::essential_5point()` |
| `cv::stereoCalibrate` | Rig extrinsics pipeline |
calibration-rs differs from OpenCV in several ways: it is written in pure Rust, uses a composable camera model with generic type parameters, provides a backend-agnostic optimization IR, and offers a session framework with JSON checkpointing for production workflows.
Book Organization
The book is structured in seven parts:
- Part I: Camera Model — the composable projection pipeline (pinhole, distortion, sensor tilt, intrinsics)
- Part II: Geometric Primitives — rigid transforms, RANSAC, robust loss functions
- Part III: Linear Initialization — all closed-form solvers with full mathematical derivations
- Part IV: Non-Linear Optimization — Levenberg-Marquardt, manifold constraints, autodiff, the IR architecture
- Part V: Calibration Workflows — end-to-end pipelines for each problem type, with both synthetic and real data examples
- Part VI: Session Framework — the high-level `CalibrationSession` API
- Part VII: Extending the Library — adding new problems, backends, and pipeline types
Each algorithm chapter includes a formal problem statement, objective function, assumptions, and a full derivation leading to the implementation.
Quickstart
This chapter walks through a minimal camera calibration using synthetic data. By the end you will have estimated camera intrinsics and lens distortion from simulated observations of a planar calibration board.
Add the Dependency
```toml
[dependencies]
vision-calibration = "0.2"
```
Minimal Example
The following program generates synthetic calibration data, runs the two-step calibration pipeline (linear initialization followed by non-linear refinement), and prints the results:
```rust
use anyhow::Result;
use vision_calibration::planar_intrinsics::{step_init, step_optimize};
use vision_calibration::prelude::*;
use vision_calibration::synthetic::planar;

fn main() -> Result<()> {
    // 1. Define ground truth camera
    let k_gt = FxFyCxCySkew {
        fx: 800.0,
        fy: 780.0,
        cx: 640.0,
        cy: 360.0,
        skew: 0.0,
    };
    let dist_gt = BrownConrady5 {
        k1: 0.05,
        k2: -0.02,
        k3: 0.0,
        p1: 0.001,
        p2: -0.001,
        iters: 8,
    };
    let camera = make_pinhole_camera(k_gt, dist_gt);

    // 2. Generate synthetic observations
    let board = planar::grid_points(8, 6, 0.04); // 8×6 grid, 40 mm squares
    let poses = planar::poses_yaw_y_z(6, -0.2, 0.08, 0.5, 0.05);
    let views = planar::project_views_all(&camera, &board, &poses)?;

    // 3. Create session and set input
    let dataset = PlanarDataset::new(
        views.into_iter().map(View::without_meta).collect()
    )?;
    let mut session = CalibrationSession::<PlanarIntrinsicsProblem>::new();
    session.set_input(dataset)?;

    // 4. Run calibration
    step_init(&mut session, None)?;     // Linear initialization
    step_optimize(&mut session, None)?; // Non-linear refinement

    // 5. Inspect results
    let export = session.export()?;
    let k = export.params.intrinsics();
    println!("fx={:.1}, fy={:.1}, cx={:.1}, cy={:.1}", k.fx, k.fy, k.cx, k.cy);
    println!("Mean reprojection error: {:.4} px", export.mean_reproj_error);
    Ok(())
}
```
What Happens in Each Step
Step 1: step_init — Linear Initialization
1. Computes a homography from each view's 2D-3D correspondences using DLT (direct linear transform)
2. Estimates intrinsics from the homographies using Zhang's method
3. Estimates lens distortion coefficients from homography residuals
4. Iterates steps 2-3 to refine the joint $K$ + distortion estimate
5. Recovers each view's camera pose from its homography and the estimated intrinsics
After this step, intrinsics are typically within 10-40% of the true values — good enough to initialize non-linear optimization.
Step 2: step_optimize — Bundle Adjustment
Runs Levenberg-Marquardt optimization minimizing the total reprojection error:

$$\min_{K,\, d,\, \{T_i\}} \; \sum_{i} \sum_{j} \left\| \pi\!\left(K, d, T_i X_j\right) - x_{ij} \right\|^2$$

where $\pi$ is the camera projection function, $T_i$ is the pose of view $i$, $d$ are the distortion coefficients, $X_j$ are the known 3D board points, and $x_{ij}$ are the observed pixel coordinates.
After optimization, expect <2% error on intrinsics and <1 px mean reprojection error.
Running the Built-in Example
The library ships with this example (and several others):
```sh
cargo run -p vision-calibration --example planar_synthetic
```
Alternative: One-Line Pipeline
If you don't need to inspect intermediate results:
```rust
use vision_calibration::planar_intrinsics::run_calibration;

run_calibration(&mut session)?;
```
Next Steps
- Architecture Overview — understand the crate structure
- Composable Camera Pipeline — the projection model in detail
- Planar Intrinsics Calibration — the full mathematical treatment
Architecture Overview
calibration-rs is organized as a layered workspace of Rust crates plus Python bindings. This chapter explains the dependency structure, data flow, and key design patterns.
Crate Dependency Graph
```
vision-calibration (facade: re-exports everything)
│
└── vision-calibration-pipeline (sessions, workflows, step functions)
    │
    ├── vision-calibration-optim (non-linear optimization)
    │   │
    │   └── vision-calibration-core
    │
    └── vision-calibration-linear (closed-form initialization)
        │
        └── vision-calibration-core
```
Key rule: vision-calibration-linear and vision-calibration-optim are peers. They both depend on vision-calibration-core but never on each other. This keeps initialization algorithms free of optimization dependencies and vice versa.
Crate Responsibilities
vision-calibration-core
The foundation layer providing:
- Math types — nalgebra-based aliases: `Pt2`, `Pt3`, `Vec3`, `Mat3`, `Iso3`, `Real` (= `f64`)
- Camera models — composable trait-based pipeline: `ProjectionModel`, `DistortionModel`, `SensorModel`, `IntrinsicsModel`
- Observation types — `CorrespondenceView` (2D-3D point pairs), `View<Meta>`, `PlanarDataset`, `RigDataset`
- RANSAC engine — generic `Estimator` trait with configurable options
- Synthetic data utilities — grid generation, pose sampling, projection
- Reprojection error computation — single-camera and multi-camera rig
vision-calibration-linear
Closed-form initialization solvers. Each produces an approximate estimate suitable for seeding non-linear optimization:
| Solver | Input | Output |
|---|---|---|
| Zhang's method | Homographies | Intrinsics $K$ |
| Distortion fit | $K$ + homographies | Brown-Conrady coefficients |
| Iterative intrinsics | Observations | Joint $K$ + distortion |
| Homography DLT | 2D-2D correspondences | $3 \times 3$ homography |
| Planar pose | $K$ + homography | SE(3) pose |
| P3P / DLT PnP | 3D-2D correspondences + $K$ | SE(3) pose |
| 5-point essential | Normalized correspondences | Essential matrix |
| 8-/7-point fundamental | Pixel correspondences | Fundamental matrix |
| Tsai-Lenz hand-eye | Robot + camera motions | Hand-eye SE(3) |
| Rig extrinsics | Per-camera poses | Camera-to-rig SE(3) |
| Laser plane | Laser pixels + target poses | Plane (normal + distance) |
vision-calibration-optim
Non-linear refinement with a backend-agnostic architecture:
- IR (Intermediate Representation) — `ProblemIR` with `ParamBlock` and `ResidualBlock` types that describe optimization problems independently of any solver
- Factors — generic residual functions parameterized over `RealField` for automatic differentiation
- Backends — currently `TinySolverBackend` (Levenberg-Marquardt with sparse linear solvers)
- Problem builders — domain-specific functions that construct IR from calibration data
vision-calibration-pipeline
The session framework providing production-ready workflows:
- `CalibrationSession<P: ProblemType>` — generic state container with config, input, state, output, exports
- Step functions — free functions operating on `&mut CalibrationSession<P>` (e.g., `step_init`, `step_optimize`)
- Pipeline functions — convenience wrappers chaining all steps
- JSON checkpointing — full serialization for session persistence
- Six problem types: `PlanarIntrinsicsProblem`, `ScheimpflugIntrinsicsProblem`, `SingleCamHandeyeProblem`, `RigExtrinsicsProblem`, `RigHandeyeProblem`, `LaserlineDeviceProblem`
vision-calibration
Unified facade crate that re-exports everything through a clean module hierarchy:
```rust
use vision_calibration::prelude::*;           // Minimal planar hello-world imports
use vision_calibration::planar_intrinsics::*; // Planar workflow
use vision_calibration::core::*;              // Math types
use vision_calibration::linear::*;            // Linear solvers
use vision_calibration::optim::*;             // Optimization
```
vision-calibration-py
Python bindings crate (maturin/PyO3) exposing high-level session workflows.
Data Flow
Every calibration workflow follows the same pattern:
```
Observations (2D-3D correspondences)
        │
        ▼
Linear Initialization (vision-calibration-linear)
        │  Closed-form solvers: ~5-40% accuracy
        ▼
Non-Linear Refinement (vision-calibration-optim)
        │  Levenberg-Marquardt: <2% accuracy, <1 px reprojection
        ▼
Calibrated Parameters (K, distortion, poses, ...)
```
The session framework wraps this flow with configuration, state tracking, and checkpointing:
```
CalibrationSession::new()
        │
        ▼
session.set_input(dataset)
        │
        ▼
step_init(&mut session)        ← linear initialization
        │
        ▼
step_optimize(&mut session)    ← non-linear refinement
        │
        ▼
session.export()               ← calibrated parameters
```
Design Patterns
Composable Camera Model
The camera projection pipeline is built from four independent traits composed via generics:
pixel = K(sensor(distortion(projection(direction))))
Each stage can be mixed and matched. For example, a standard camera uses Pinhole + BrownConrady5 + IdentitySensor + FxFyCxCySkew, while a laser profiler might use Pinhole + BrownConrady5 + ScheimpflugParams + FxFyCxCySkew.
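To make the compile-time composition concrete, here is a self-contained miniature of the idea using simplified stand-in traits. All names here (`Project`, `Warp`, `Identity`) are illustrative, not the library's actual definitions, and distortion/sensor stages are collapsed into a single 2D-to-2D `Warp` trait:

```rust
/// Simplified stage traits: one 3D->2D projection, then 2D->2D warps.
pub trait Project {
    fn project(&self, p: [f64; 3]) -> Option<[f64; 2]>;
}
pub trait Warp {
    fn warp(&self, n: [f64; 2]) -> [f64; 2];
}

/// Pinhole projection: perspective division, defined only for z > 0.
pub struct Pinhole;
impl Project for Pinhole {
    fn project(&self, p: [f64; 3]) -> Option<[f64; 2]> {
        (p[2] > 0.0).then(|| [p[0] / p[2], p[1] / p[2]])
    }
}

/// Identity stage (stands in for NoDistortion / IdentitySensor).
pub struct Identity;
impl Warp for Identity {
    fn warp(&self, n: [f64; 2]) -> [f64; 2] { n }
}

/// Affine intrinsics: pixel = (fx*x + cx, fy*y + cy).
pub struct Intrinsics { pub fx: f64, pub fy: f64, pub cx: f64, pub cy: f64 }
impl Warp for Intrinsics {
    fn warp(&self, n: [f64; 2]) -> [f64; 2] {
        [self.fx * n[0] + self.cx, self.fy * n[1] + self.cy]
    }
}

/// Camera composed from independent stages via generics — each stage
/// can be swapped without touching the others.
pub struct Camera<P: Project, D: Warp, K: Warp> {
    pub proj: P,
    pub dist: D,
    pub k: K,
}
impl<P: Project, D: Warp, K: Warp> Camera<P, D, K> {
    pub fn project_point(&self, p: [f64; 3]) -> Option<[f64; 2]> {
        // pixel = K(dist(proj(p)))
        self.proj.project(p).map(|n| self.k.warp(self.dist.warp(n)))
    }
}
```

Because each stage is a separate type parameter, the compiler monomorphizes the whole chain; swapping `Identity` for a distortion type changes the pipeline without any dynamic dispatch.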
Backend-Agnostic Optimization
Problems are defined as an intermediate representation (IR) that is independent of any specific solver. The IR is then compiled to a solver-specific form:
```
Problem Builder → ProblemIR → Backend.compile() → Backend.solve()
                  (generic)    (solver-specific)
```
This allows swapping the optimization backend without changing problem definitions.
Step Functions
Calibration workflows are decomposed into discrete steps implemented as free functions. This allows:
- Inspection of intermediate state between steps
- Resumption from any point (via JSON checkpointing)
- Customization of per-step options
- Composition of steps from different problem types
Composable Camera Pipeline
A camera model maps 3D points in the camera frame to 2D pixel coordinates. In calibration-rs, this mapping is decomposed into four composable stages, each implemented as a trait:
$$\text{pixel} = K\big(s\big(d\big(\pi(\mathbf{p}_c)\big)\big)\big)$$

where:
- $\pi$ — projection: maps a 3D direction to normalized coordinates on the image plane
- $d$ — distortion: warps normalized coordinates to model lens imperfections
- $s$ — sensor: applies a homography for tilted sensor planes (identity for standard cameras)
- $K$ — intrinsics: scales and translates to pixel coordinates
The Camera Struct
The camera is a generic struct parameterized over the four model traits:
```rust
pub struct Camera<S, P, D, Sm, K>
where
    S: RealField,
    P: ProjectionModel<S>,
    D: DistortionModel<S>,
    Sm: SensorModel<S>,
    K: IntrinsicsModel<S>,
{
    pub proj: P,
    pub dist: D,
    pub sensor: Sm,
    pub k: K,
}
```
The scalar type S is generic over RealField, enabling the same camera to work with f64 for evaluation and with dual numbers for automatic differentiation.
Forward Projection
Given a 3D point in the camera frame, projection to pixel coordinates proceeds through the four stages:
```rust
pub fn project_point_c(&self, p_c: &Vector3<S>) -> Option<Point2<S>>
```
1. Projection: Compute normalized coordinates $n = \pi(\mathbf{p}_c)$. For pinhole: $n = (x/z,\; y/z)$. Returns `None` if the point is behind the camera ($z \le 0$).
2. Distortion: Apply lens distortion $n_d = d(n)$. Warps normalized coordinates according to the distortion model (e.g., Brown-Conrady radial + tangential).
3. Sensor transform: Apply sensor model $n_s = s(n_d)$. For standard cameras this is identity; for tilted sensors it applies a homography.
4. Intrinsics: Map to pixels $\text{pixel} = K(n_s)$. Applies focal lengths, principal point, and optional skew.
Inverse (Back-Projection)
The inverse maps a pixel to a ray in the camera frame:
```rust
pub fn backproject_pixel(&self, px: &Point2<S>) -> Ray<S>
```
- $K^{-1}$ — pixel to sensor coordinates
- $s^{-1}$ — sensor to distorted normalized
- $d^{-1}$ — undistort (iterative)
- $\pi^{-1}$ — normalized to 3D direction
The result is a point on the $z = 1$ plane in the camera frame, defining a ray from the camera center.
Common Type Aliases
For the most common configuration (pinhole + Brown-Conrady + no tilt):
```rust
type PinholeCamera = Camera<f64, Pinhole, BrownConrady5<f64>, IdentitySensor, FxFyCxCySkew<f64>>;
```
The make_pinhole_camera(k, dist) convenience function constructs this type.
Projecting World Points
To project a world point $\mathbf{p}_w$ through a posed camera, first transform from world to camera frame using the extrinsic pose $T_{\text{cam\_world}}$:

$$\mathbf{p}_c = T_{\text{cam\_world}}\; \mathbf{p}_w$$

Then apply the camera projection:

$$\text{pixel} = K\big(s\big(d\big(\pi(\mathbf{p}_c)\big)\big)\big)$$
Trait Composition
Each stage is an independent trait. This design allows mixing and matching:
| Projection | Distortion | Sensor | Intrinsics |
|---|---|---|---|
| `Pinhole` | `NoDistortion` | `IdentitySensor` | `FxFyCxCySkew` |
| `Pinhole` | `BrownConrady5` | `IdentitySensor` | `FxFyCxCySkew` |
| `Pinhole` | `BrownConrady5` | `HomographySensor` | `FxFyCxCySkew` |
Note: ScheimpflugParams stores the tilt angles and has a compile() method that produces a HomographySensor suitable for the Camera struct. See the Sensor Models chapter.
The subsequent sections detail each stage.
Pinhole Projection
The pinhole model is the simplest and most widely used camera projection. It models a camera as an ideal point through which all light rays pass — no lens, no aperture, just a geometric center of projection.
Mathematical Definition
Given a 3D point $\mathbf{p}_c = (x, y, z)$ in the camera frame, the pinhole projection maps it to normalized image coordinates:

$$\pi(\mathbf{p}_c) = \left( \frac{x}{z},\; \frac{y}{z} \right)$$

This is perspective division: the point is projected onto the $z = 1$ plane. The projection is defined only for points in front of the camera ($z > 0$).
The inverse (back-projection) maps normalized coordinates $n = (n_x, n_y)$ to a 3D direction:

$$\pi^{-1}(n) = (n_x,\; n_y,\; 1)$$

This defines a ray from the camera center through the image point.
Camera Frame Convention
calibration-rs uses a right-handed camera frame:
- $+x$: right
- $+y$: down
- $+z$: forward (optical axis)
This matches the convention used by OpenCV and most computer vision libraries. A point with positive $z$ is in front of the camera; negative $z$ is behind it.
The ProjectionModel Trait
```rust
pub trait ProjectionModel<S: RealField> {
    fn project_dir(&self, dir_c: &Vector3<S>) -> Option<Point2<S>>;
    fn unproject_dir(&self, n: &Point2<S>) -> Vector3<S>;
}
```
The Pinhole struct implements this trait:
```rust
pub struct Pinhole;

impl<S: RealField> ProjectionModel<S> for Pinhole {
    fn project_dir(&self, dir_c: &Vector3<S>) -> Option<Point2<S>> {
        if dir_c.z > S::zero() {
            Some(Point2::new(
                dir_c.x.clone() / dir_c.z.clone(),
                dir_c.y.clone() / dir_c.z.clone(),
            ))
        } else {
            None
        }
    }

    fn unproject_dir(&self, n: &Point2<S>) -> Vector3<S> {
        Vector3::new(n.x.clone(), n.y.clone(), S::one())
    }
}
```
Note the use of .clone() — this is required for compatibility with dual numbers used in automatic differentiation (see Autodiff and Generic Residual Functions).
Limitations
The pinhole model assumes:
- No lens distortion — addressed by the distortion stage
- No sensor tilt — addressed by the sensor stage
- Central projection — all rays pass through a single point
For cameras with significant distortion (wide-angle, fisheye), the distortion model corrects for lens effects after pinhole projection to normalized coordinates.
Brown-Conrady Distortion
Real lenses introduce geometric distortion: straight lines in the world project as curves in the image. The Brown-Conrady model captures the two dominant distortion effects — radial and tangential — as polynomial functions of the distance from the optical axis.
Distortion Model
Given undistorted normalized coordinates $(x, y)$ with $r^2 = x^2 + y^2$, the distorted coordinates $(x_d, y_d)$ are computed as follows.

Radial distortion (barrel/pincushion):

$$x_r = x\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6), \qquad y_r = y\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)$$

Tangential distortion (decentering):

$$\Delta x = 2 p_1 x y + p_2 (r^2 + 2x^2), \qquad \Delta y = p_1 (r^2 + 2y^2) + 2 p_2 x y$$

Combined:

$$x_d = x_r + \Delta x, \qquad y_d = y_r + \Delta y$$

The model has five parameters: radial coefficients $k_1, k_2, k_3$ and tangential coefficients $p_1, p_2$.
OpenCV equivalence: This is identical to OpenCV's `distortPoints` with the 5-parameter model `(k1, k2, p1, p2, k3)`. Note OpenCV's parameter ordering differs from the mathematical ordering.
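The forward model above can be written as a small standalone function over plain `f64` values. The function name and parameter layout here are illustrative, not the library's API:

```rust
/// Brown-Conrady forward distortion on normalized coordinates.
/// k = [k1, k2, k3] radial, p = [p1, p2] tangential, n = (x, y) undistorted.
pub fn distort(k: [f64; 3], p: [f64; 2], n: [f64; 2]) -> [f64; 2] {
    let (x, y) = (n[0], n[1]);
    let r2 = x * x + y * y;
    // radial factor: 1 + k1 r^2 + k2 r^4 + k3 r^6
    let radial = 1.0 + k[0] * r2 + k[1] * r2 * r2 + k[2] * r2 * r2 * r2;
    // tangential (decentering) terms
    let dx = 2.0 * p[0] * x * y + p[1] * (r2 + 2.0 * x * x);
    let dy = p[0] * (r2 + 2.0 * y * y) + 2.0 * p[1] * x * y;
    [x * radial + dx, y * radial + dy]
}
```

With all coefficients zero the function is the identity, which is the `NoDistortion` case.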
Physical Interpretation
- $k_1$: dominant radial term — its sign determines barrel distortion (lines bow outward from center) versus pincushion distortion (lines bow inward)
- $k_2$, $k_3$: higher-order radial corrections
- $p_1$, $p_2$: tangential distortion from imperfect lens-sensor alignment (lens elements not perfectly centered)
For typical industrial cameras, all coefficients are small in magnitude.
When to Fix `k3`
The sixth-order radial term $k_3 r^6$ is only significant for wide-angle lenses where $r$ reaches large values. For typical industrial cameras with moderate field of view:
- $k_3$ is poorly constrained by the data and often absorbs noise
- Estimating $k_3$ can cause overfitting, leading to worse extrapolation outside the calibration region
- The library defaults to `fix_k3: true` in most problem configurations
Recommendation: Only estimate $k_3$ with high-quality data covering the full image area, or for lenses with field of view > 90°.
Undistortion (Inverse)
Given distorted coordinates $(x_d, y_d)$, recovering undistorted coordinates $(x, y)$ requires inverting the distortion model. Since the forward model is a polynomial without a closed-form inverse, calibration-rs uses iterative fixed-point refinement:

$$x^{(i+1)} = \frac{x_d - \Delta x^{(i)}}{1 + k_1 r_i^2 + k_2 r_i^4 + k_3 r_i^6}, \qquad y^{(i+1)} = \frac{y_d - \Delta y^{(i)}}{1 + k_1 r_i^2 + k_2 r_i^4 + k_3 r_i^6}$$

where $r_i^2 = (x^{(i)})^2 + (y^{(i)})^2$ and $\Delta x^{(i)}, \Delta y^{(i)}$ are the tangential terms evaluated at the current iterate, starting from $(x^{(0)}, y^{(0)}) = (x_d, y_d)$. This rearranges the distortion equation to isolate $(x, y)$ and iterates to convergence. The default is 8 iterations, which provides accuracy well below sensor noise for typical distortion magnitudes.
Convergence
The fixed-point iteration converges when the Jacobian of the distortion residual has spectral radius less than 1, which holds for physically realistic distortion values (distortion terms small relative to 1). For extreme distortion, more iterations may be needed via the `iters` parameter.
OpenCV equivalence: `cv::undistortPoints` performs the same iterative inversion.
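The fixed-point scheme can be sketched as a standalone function (illustrative names; the radial and tangential terms match the forward model described earlier):

```rust
/// Iterative fixed-point undistortion: recover undistorted normalized
/// coordinates from distorted ones. d = (x_d, y_d), starting guess = d.
pub fn undistort(k: [f64; 3], p: [f64; 2], d: [f64; 2], iters: u32) -> [f64; 2] {
    let (mut x, mut y) = (d[0], d[1]); // initial guess: distorted coords
    for _ in 0..iters {
        let r2 = x * x + y * y;
        let radial = 1.0 + k[0] * r2 + k[1] * r2 * r2 + k[2] * r2 * r2 * r2;
        // tangential terms evaluated at the current iterate
        let dx = 2.0 * p[0] * x * y + p[1] * (r2 + 2.0 * x * x);
        let dy = p[0] * (r2 + 2.0 * y * y) + 2.0 * p[1] * x * y;
        // rearranged model: undistorted = (distorted - tangential) / radial
        x = (d[0] - dx) / radial;
        y = (d[1] - dy) / radial;
    }
    [x, y]
}
```

Round-tripping a point through the forward model and this inverse recovers it to well below typical sensor noise within a handful of iterations.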
The DistortionModel Trait
```rust
pub trait DistortionModel<S: RealField> {
    fn distort(&self, n_undist: &Point2<S>) -> Point2<S>;
    fn undistort(&self, n_dist: &Point2<S>) -> Point2<S>;
}
```
The BrownConrady5<S> struct:
```rust
pub struct BrownConrady5<S: RealField> {
    pub k1: S,
    pub k2: S,
    pub k3: S,
    pub p1: S,
    pub p2: S,
    pub iters: u32, // undistortion iterations (default: 8)
}
```
NoDistortion is the identity implementation for distortion-free cameras.
Distortion in the Projection Pipeline
Distortion operates in normalized image coordinates (after pinhole projection, before intrinsics):

$$\pi(\mathbf{p}_c) \;\xrightarrow{\;d\;}\; n_d \;\xrightarrow{\;s,\,K\;}\; \text{pixel}$$

This is the standard convention: distortion is defined on the $z = 1$ plane, not in pixel space. The advantage is that distortion parameters are independent of image resolution and focal length.
Intrinsics Matrix
The intrinsics matrix maps from the sensor coordinate system (normalized or sensor-transformed coordinates) to pixel coordinates. It encodes the camera's internal geometric properties: focal length, principal point, and optional skew.
Definition
$$K = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$

where:
- $f_x, f_y$ — focal lengths in pixels (horizontal and vertical). These combine the physical focal length with the pixel pitch: $f_x = f / s_x$, where $f$ is the focal length in mm and $s_x$ is the pixel width in mm.
- $c_x, c_y$ — principal point in pixels. The projection of the optical axis onto the image plane. Typically near the image center but not exactly at it.
- $s$ — skew coefficient. Non-zero only if pixel axes are not perpendicular. Effectively zero for all modern sensors; calibration-rs defaults to `zero_skew: true` in most configurations.
Forward and Inverse Transform
Sensor to pixel (forward):

$$u = f_x x_s + s\, y_s + c_x, \qquad v = f_y y_s + c_y$$

where $(x_s, y_s)$ are sensor coordinates (output of the distortion + sensor stages).
Pixel to sensor (inverse):

$$y_s = \frac{v - c_y}{f_y}, \qquad x_s = \frac{u - c_x - s\, y_s}{f_x}$$
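The forward and inverse mappings translate directly into two small standalone functions (illustrative names, plain `f64`, not the library's trait methods):

```rust
/// Sensor -> pixel: u = fx*x + s*y + cx, v = fy*y + cy.
pub fn sensor_to_pixel(fx: f64, fy: f64, cx: f64, cy: f64, s: f64, n: [f64; 2]) -> [f64; 2] {
    [fx * n[0] + s * n[1] + cx, fy * n[1] + cy]
}

/// Pixel -> sensor: invert v first (it does not involve skew), then u.
pub fn pixel_to_sensor(fx: f64, fy: f64, cx: f64, cy: f64, s: f64, px: [f64; 2]) -> [f64; 2] {
    let y = (px[1] - cy) / fy;
    [(px[0] - cx - s * y) / fx, y]
}
```

The inverse is exact (no iteration needed) because $K$ is an upper-triangular affine map.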
The IntrinsicsModel Trait
```rust
pub trait IntrinsicsModel<S: RealField> {
    fn sensor_to_pixel(&self, sensor: &Point2<S>) -> Point2<S>;
    fn pixel_to_sensor(&self, pixel: &Point2<S>) -> Point2<S>;
}
```
The FxFyCxCySkew<S> struct implements this trait:
```rust
pub struct FxFyCxCySkew<S: RealField> {
    pub fx: S,
    pub fy: S,
    pub cx: S,
    pub cy: S,
    pub skew: S,
}
```
Coordinate Utilities
The coordinate_utils module provides convenience functions that combine intrinsics with distortion:
- `pixel_to_normalized(pixel, K)` — apply $K^{-1}$ to get normalized coordinates on the $z = 1$ plane
- `normalized_to_pixel(normalized, K)` — apply $K$ to get pixel coordinates
- `undistort_pixel(pixel, K, distortion)` — pixel → normalized → undistort → return normalized
- `distort_to_pixel(normalized, K, distortion)` — distort → apply $K$ → return pixel
These functions are used extensively in the linear initialization algorithms, which need to convert between pixel and normalized coordinate spaces.
Aspect Ratio
For most cameras, $f_x \neq f_y$ because pixels are not perfectly square. The ratio $f_y / f_x$ reflects the pixel aspect ratio. Typical industrial cameras have aspect ratios very close to 1.0 (within 1-3%).
OpenCV equivalence: `cv::initCameraMatrix2D` provides initial intrinsics estimates. The matrix format is identical to OpenCV's `cameraMatrix`.
Sensor Models and Scheimpflug Tilt
In standard cameras, the sensor plane is perpendicular to the optical axis. Some industrial cameras — particularly laser triangulation profilers — deliberately tilt the sensor to achieve a larger depth of field along the laser plane (the Scheimpflug condition). The sensor model stage accounts for this tilt.
The SensorModel Trait
```rust
pub trait SensorModel<S: RealField> {
    fn normalized_to_sensor(&self, n: &Point2<S>) -> Point2<S>;
    fn sensor_to_normalized(&self, s: &Point2<S>) -> Point2<S>;
}
```
The sensor model sits between distortion and intrinsics in the pipeline:

$$n_d \;\xrightarrow{\;s\;}\; n_s \;\xrightarrow{\;K\;}\; \text{pixel}$$
IdentitySensor
For standard cameras with untilted sensors, the sensor stage is the identity:

$$s(n) = n$$

This is the default and most common case.
Scheimpflug Tilt Model
The Scheimpflug condition states that when the lens plane, image plane, and object plane intersect along a common line, the entire object plane is in sharp focus. Laser profilers exploit this by tilting the sensor to keep the laser line in focus.
Tilt Projection Matrix
The tilt is parameterized by two rotation angles:
- — tilt around the horizontal (X) axis
- — tilt around the vertical (Y) axis
The tilt homography is constructed as:
The projection onto the tilted sensor plane normalizes by the third row of applied to the point in homogeneous coordinates:
where normalizes by the -component (perspective division through the tilted plane). In practice, the first two rows of divided by the third row give a rational transform in the image plane.
OpenCV equivalence: This matches OpenCV's `computeTiltProjectionMatrix` used in the 14-parameter rational model (`CALIB_TILTED_MODEL`).
ScheimpflugParams
```rust
pub struct ScheimpflugParams {
    pub tilt_x: f64, // tau_x in radians
    pub tilt_y: f64, // tau_y in radians
}
```
The compile() method generates a HomographySensor containing the homography and its precomputed inverse. ScheimpflugParams::default() returns zero tilt (identity sensor).
HomographySensor
The general homography sensor applies an arbitrary projective transform to the homogeneous normalized point $\tilde{n} = (n_x, n_y, 1)$:

$$s(n) = \mathrm{proj}(H \tilde{n})$$

This is used as the runtime representation compiled from `ScheimpflugParams`.
When to Use Scheimpflug
- Laser triangulation profilers: The laser plane is at an angle to the camera axis. Tilting the sensor brings the entire laser line into focus.
- Machine vision with tilted object planes: When the depth of field is insufficient to keep the entire object in focus.
For standard cameras (no tilt), use IdentitySensor (the default).
Optimization of Tilt Parameters
In the non-linear optimization stage, Scheimpflug parameters are treated as additional optimization variables. The tilt_projection_matrix_generic<T>() function in the factor module computes the tilt homography using dual numbers for automatic differentiation, enabling joint optimization of tilt with intrinsics, distortion, and poses.
Serialization and Runtime-Dynamic Types
The generic Camera<S, P, D, Sm, K> type provides compile-time composition of camera stages. However, for JSON serialization (session checkpointing, data exchange) and runtime-dynamic camera construction, calibration-rs provides an enum-based parameter system.
CameraParams
The CameraParams struct holds camera parameters as serializable enums:
```rust
pub struct CameraParams {
    pub projection: ProjectionParams,
    pub distortion: DistortionParams,
    pub sensor: SensorParams,
    pub intrinsics: IntrinsicsParams,
}
```
Each component is an enum of supported variants:
```rust
pub enum ProjectionParams { Pinhole }

pub enum DistortionParams {
    None,
    BrownConrady5 { k1, k2, k3, p1, p2 },
}

pub enum SensorParams {
    Identity,
    Scheimpflug { tilt_x, tilt_y },
}

pub enum IntrinsicsParams {
    FxFyCxCySkew { fx, fy, cx, cy, skew },
}
```
Building a Camera from Parameters
The build() method constructs a concrete Camera from serialized parameters:
```rust
let params = CameraParams { /* ... */ };
let camera = params.build();
```
This is used internally by the session framework to reconstruct cameras from JSON-checkpointed state.
Convenience Accessors
CameraParams provides typed accessors:
```rust
params.intrinsics() // → &FxFyCxCySkew<f64>
params.distortion() // → &BrownConrady5<f64> or zero distortion
```
Use in Session Framework
CameraParams appears throughout the pipeline:
- Export records store calibrated parameters as `CameraParams`
- Session JSON serializes intermediate and final camera parameters
- Config types reference `CameraParams` for initial guesses
This provides a stable, version-safe serialization format decoupled from the generic type system.
Rigid Transformations and SE(3)
Camera calibration is fundamentally about estimating rigid transformations — rotations and translations that relate different coordinate frames: camera, world, robot gripper, calibration target, rig, etc. This chapter covers the rotation representations and transformation algebra used throughout calibration-rs.
Rotation Representations
Rotation Matrix
A $3 \times 3$ orthogonal matrix $R$ with determinant $+1$:

$$R^\top R = I, \qquad \det R = +1$$

The set of all such matrices forms the special orthogonal group $SO(3)$. It has 3 degrees of freedom (9 entries minus 6 constraints).
Unit Quaternion
A 4-element vector $q = (w, x, y, z)$ with unit norm $\|q\| = 1$. The rotation of a vector $\mathbf{v}$ is:

$$\mathbf{v}' = q\, \mathbf{v}\, q^{*}$$

where the product is quaternion multiplication and $q^{*}$ is the conjugate.
Quaternions are preferred for optimization because they avoid gimbal lock and have simpler interpolation properties than Euler angles. calibration-rs (via nalgebra) stores quaternions as `[i, j, k, w]` internally, i.e., with the scalar part $w$ last.
Axis-Angle
A rotation of angle $\theta$ around unit axis $\mathbf{a}$ is represented as:

$$\boldsymbol{\theta} = \theta\, \mathbf{a}$$

with $\|\mathbf{a}\| = 1$ and $\theta = \|\boldsymbol{\theta}\|$.
The Rodrigues formula converts axis-angle to rotation matrix:

$$R = I + \sin\theta\, [\mathbf{a}]_\times + (1 - \cos\theta)\, [\mathbf{a}]_\times^2$$

where $[\mathbf{a}]_\times$ is the skew-symmetric matrix of $\mathbf{a}$:

$$[\mathbf{a}]_\times = \begin{bmatrix} 0 & -a_z & a_y \\ a_z & 0 & -a_x \\ -a_y & a_x & 0 \end{bmatrix}$$
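The Rodrigues formula translates directly into code. Here is a standalone sketch over plain 3×3 arrays (the library itself works with nalgebra types; the function name is illustrative):

```rust
/// Rodrigues formula: unit axis + angle -> 3x3 rotation matrix.
/// R = I + sin(theta) [a]_x + (1 - cos(theta)) [a]_x^2
pub fn rodrigues(axis: [f64; 3], theta: f64) -> [[f64; 3]; 3] {
    let (ax, ay, az) = (axis[0], axis[1], axis[2]);
    // skew-symmetric matrix [a]_x
    let k = [[0.0, -az, ay], [az, 0.0, -ax], [-ay, ax, 0.0]];
    let (s, c) = (theta.sin(), theta.cos());
    let mut r = [[0.0; 3]; 3];
    for i in 0..3 {
        for j in 0..3 {
            // entry (i, j) of [a]_x squared
            let k2: f64 = (0..3).map(|m| k[i][m] * k[m][j]).sum();
            let id = if i == j { 1.0 } else { 0.0 };
            r[i][j] = id + s * k[i][j] + (1.0 - c) * k2;
        }
    }
    r
}
```

For example, a 90° rotation about the z-axis produces the familiar matrix that maps $+x$ to $+y$.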
Rigid Transformations: SE(3)
A rigid body transformation combines a rotation $R \in SO(3)$ and translation $\mathbf{t} \in \mathbb{R}^3$:

$$T = \begin{bmatrix} R & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix}$$

The special Euclidean group $SE(3)$ has 6 degrees of freedom (3 rotation + 3 translation).
Applying a Transform
A point $\mathbf{p}$ transformed by $T$:

$$\mathbf{p}' = R\,\mathbf{p} + \mathbf{t}$$

or in homogeneous coordinates:

$$\begin{bmatrix} \mathbf{p}' \\ 1 \end{bmatrix} = T \begin{bmatrix} \mathbf{p} \\ 1 \end{bmatrix}$$
Composition
Transforms compose by matrix multiplication:

$$T_1 T_2 = \begin{bmatrix} R_1 R_2 & R_1 \mathbf{t}_2 + \mathbf{t}_1 \\ \mathbf{0}^\top & 1 \end{bmatrix}$$

Inverse

$$T^{-1} = \begin{bmatrix} R^\top & -R^\top \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix}$$
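These three operations — apply, compose, invert — can be sketched as a standalone $(R, \mathbf{t})$ pair over plain arrays. This is illustrative only; the library stores the rotation as a unit quaternion inside `Iso3`:

```rust
/// Minimal SE(3) element as (rotation matrix, translation vector).
#[derive(Clone, Copy)]
pub struct Se3 {
    pub r: [[f64; 3]; 3],
    pub t: [f64; 3],
}

impl Se3 {
    /// p' = R p + t
    pub fn apply(&self, p: [f64; 3]) -> [f64; 3] {
        let mut out = self.t;
        for i in 0..3 {
            for j in 0..3 {
                out[i] += self.r[i][j] * p[j];
            }
        }
        out
    }

    /// (R1, t1) ∘ (R2, t2) = (R1 R2, R1 t2 + t1)
    pub fn compose(&self, o: &Se3) -> Se3 {
        let mut r = [[0.0; 3]; 3];
        for i in 0..3 {
            for j in 0..3 {
                for m in 0..3 {
                    r[i][j] += self.r[i][m] * o.r[m][j];
                }
            }
        }
        Se3 { r, t: self.apply(o.t) }
    }

    /// T^-1 = (R^T, -R^T t)
    pub fn inverse(&self) -> Se3 {
        let mut rt = [[0.0; 3]; 3];
        let mut t = [0.0; 3];
        for i in 0..3 {
            for j in 0..3 {
                rt[i][j] = self.r[j][i];
            }
        }
        for i in 0..3 {
            for j in 0..3 {
                t[i] -= rt[i][j] * self.t[j];
            }
        }
        Se3 { r: rt, t }
    }
}
```

A quick sanity check is that `t.inverse().compose(&t)` acts as the identity on any point.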
Naming Convention
calibration-rs uses the notation $T_{\text{b\_a}}$ to mean "the transform from frame $a$ to frame $b$":

$$\mathbf{p}_b = T_{\text{b\_a}}\; \mathbf{p}_a$$

Common transforms:
| Symbol | Meaning |
|---|---|
| $T_{\text{cam\_world}}$ | World to camera (camera extrinsics) |
| $T_{\text{cam\_board}}$ | Calibration board to camera |
| $T_{\text{rig\_cam}}$ | Camera to rig (rig extrinsics) |
| $T_{\text{gripper\_cam}}$ | Camera to gripper (hand-eye) |
| $T_{\text{base\_target}}$ | Target to base (hand-eye target pose) |
nalgebra Type: Iso3
calibration-rs uses nalgebra's Isometry3<f64> (aliased as Iso3) for SE(3) transforms:
```rust
pub type Iso3 = nalgebra::Isometry3<f64>;
```
This stores a UnitQuaternion (rotation) and a Translation3 (translation) separately, avoiding the redundancy of a full matrix.
SE(3) Storage for Optimization
In the optimization IR, SE(3) parameters are stored as a 7-element vector — a unit quaternion followed by a translation:

$$[\,q_x,\; q_y,\; q_z,\; q_w,\; t_x,\; t_y,\; t_z\,]$$

The quaternion is unit-constrained. The optimizer uses the SE(3) manifold (see Manifold Optimization) to maintain this constraint during parameter updates.
The Exponential Map
The Lie algebra $\mathfrak{se}(3)$ provides a 6-dimensional tangent space at the identity. A tangent vector $\boldsymbol{\xi} = (\boldsymbol{\rho}, \boldsymbol{\phi}) \in \mathbb{R}^6$ maps to an SE(3) element via the exponential map:

$$\exp(\boldsymbol{\xi}) = \begin{bmatrix} \exp([\boldsymbol{\phi}]_\times) & J(\boldsymbol{\phi})\, \boldsymbol{\rho} \\ \mathbf{0}^\top & 1 \end{bmatrix}$$

where $J(\boldsymbol{\phi})$ is the left Jacobian of SO(3):

$$J(\boldsymbol{\phi}) = I + \frac{1 - \cos\theta}{\theta^2}\, [\boldsymbol{\phi}]_\times + \frac{\theta - \sin\theta}{\theta^3}\, [\boldsymbol{\phi}]_\times^2, \qquad \theta = \|\boldsymbol{\phi}\|$$

The exponential map is central to manifold optimization: parameter updates are computed in the tangent space and then retracted to the manifold (see Manifold Optimization).
RANSAC
RANSAC (Random Sample Consensus) is a robust estimation method that fits a model to data containing outliers. Unlike least-squares methods that use all data points and can be corrupted by even a single outlier, RANSAC repeatedly samples minimal subsets, fits a model to each, and selects the model with the most support (inliers).
Algorithm
Given $N$ data points and a model requiring $m$ points to fit:
1. Repeat for up to $k$ iterations:
   - Sample $m$ random data points
   - Fit a model to the $m$-point sample (skip if degenerate)
   - Score: count inliers — points with residual below the threshold $\tau$
   - If this model has more inliers than the current best, update the best
   - Optionally refit the model on all inliers for a tighter fit
   - Update the iteration bound $k$ dynamically based on the current inlier ratio
2. Return the best model and its inlier set
Dynamic Iteration Bound
After finding a model with inlier ratio $w$, the number of iterations needed to find an all-inlier sample with probability $p$ is:

$$k = \frac{\log(1 - p)}{\log(1 - w^m)}$$

where $m$ is the minimum sample size. As more inliers are found, $w$ increases and $k$ decreases, allowing early termination. The iteration count is clamped to the configured `max_iters` — it can only decrease.
Example: For $m = 4$ (homography), 50% inliers, and 99% confidence: $k = \log(0.01) / \log(1 - 0.5^4) \approx 72$ iterations.
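The bound translates directly into a small standalone function (illustrative names), with guards for the degenerate cases where the formula blows up:

```rust
/// Dynamic RANSAC iteration bound: k = log(1 - p) / log(1 - w^m),
/// where p = confidence, w = inlier ratio, m = minimal sample size.
pub fn ransac_iters(confidence: f64, inlier_ratio: f64, min_samples: u32) -> usize {
    let good = inlier_ratio.powi(min_samples as i32); // P(all-inlier sample)
    if good <= f64::EPSILON {
        return usize::MAX; // no inliers found yet: no useful bound
    }
    if good >= 1.0 {
        return 1; // every sample is all-inlier
    }
    ((1.0 - confidence).ln() / (1.0 - good).ln()).ceil() as usize
}
```

With `confidence = 0.99`, `inlier_ratio = 0.5`, and `min_samples = 4`, this reproduces the homography example above.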
The Estimator Trait
calibration-rs provides a generic RANSAC engine that works with any model implementing the Estimator trait:
```rust
pub trait Estimator {
    type Datum;
    type Model;

    const MIN_SAMPLES: usize;

    fn fit(data: &[Self::Datum], sample_indices: &[usize]) -> Option<Self::Model>;
    fn residual(model: &Self::Model, datum: &Self::Datum) -> f64;

    fn is_degenerate(data: &[Self::Datum], sample_indices: &[usize]) -> bool {
        false
    }

    fn refit(data: &[Self::Datum], inliers: &[usize]) -> Option<Self::Model> {
        Self::fit(data, inliers)
    }
}
```
| Method | Purpose |
|---|---|
| `MIN_SAMPLES` | Minimum points for a model (e.g., 4 for homography) |
| `fit` | Fit model from selected points |
| `residual` | Distance from a point to the model |
| `is_degenerate` | Reject degenerate samples (e.g., collinear points for homography) |
| `refit` | Optional: refit model on full inlier set (may differ from `fit`) |
Configuration
```rust
pub struct RansacOptions {
    pub max_iters: usize,       // Maximum iterations
    pub thresh: f64,            // Inlier distance threshold
    pub min_inliers: usize,     // Minimum consensus set size
    pub confidence: f64,        // Desired probability (0-1) for the dynamic bound
    pub seed: u64,              // RNG seed for reproducibility
    pub refit_on_inliers: bool, // Refit model on full inlier set
}
```
Choosing the threshold $\tau$: The threshold should reflect the expected noise level. For pixel-space residuals, a threshold of a few pixels is typical. For normalized-coordinate residuals, scale the threshold accordingly by the focal length.
Result
```rust
pub struct RansacResult<M> {
    pub success: bool,
    pub model: Option<M>,
    pub inliers: Vec<usize>,
    pub inlier_rms: f64,
    pub iters: usize,
}
```
Instantiated Models
RANSAC is used with several models in calibration-rs:
| Model | MIN_SAMPLES | Residual | Usage |
|---|---|---|---|
| Homography | 4 | Reprojection error (pixels) | dlt_homography_ransac |
| Fundamental matrix | 8 | Epipolar distance | fundamental_8point_ransac |
| PnP (DLT) | 6 | Reprojection error (pixels) | dlt_ransac |
Deterministic Seeding
All RANSAC calls in calibration-rs use a deterministic seed for the random number generator. This ensures:
- Reproducible results across runs
- Deterministic tests that do not flake
- Debuggable behavior — the same input always produces the same output
The seed can be configured via RansacOptions::seed.
Best-Model Selection
When multiple models achieve the same inlier count, RANSAC selects the model with the lowest inlier RMS (root mean square of inlier residuals). This breaks ties in favor of more accurate models.
Robust Loss Functions
Standard least squares minimizes the sum of squared residuals $\sum_i r_i^2$. This objective is highly sensitive to outliers: a single point with a large residual can dominate the entire cost function and corrupt the solution. Robust loss functions (M-estimators) reduce the influence of large residuals, making optimization tolerant to outliers in the data.
Problem Setup
In non-linear least squares, we minimize:

$$E(\theta) = \sum_i \rho\big(r_i(\theta)\big)$$

where $\rho$ is the loss function applied to each residual $r_i$. The standard (non-robust) case uses $\rho(r) = \tfrac{1}{2}r^2$.
Available Loss Functions
calibration-rs provides three robust loss functions, each parameterized by a scale $c$ that controls the transition from quadratic (inlier) to robust (outlier) behavior.
Huber Loss

$$\rho_{\text{Huber}}(r) = \begin{cases} \tfrac{1}{2}r^2 & |r| \le c \\ c\left(|r| - \tfrac{c}{2}\right) & |r| > c \end{cases}$$

Properties:
- Quadratic for small residuals, linear for large residuals
- Continuous first derivative
- Influence function $\psi(r) = \rho'(r)$: bounded at $c$ — outliers contribute a constant gradient, not a growing one
When to use: The default robust loss. Good general-purpose choice when you expect a moderate number of outliers.
Cauchy Loss

$$\rho_{\text{Cauchy}}(r) = \frac{c^2}{2}\log\!\left(1 + \frac{r^2}{c^2}\right)$$

Properties:
- Grows logarithmically for large residuals (slower than linear)
- Smooth everywhere
- Influence function $\psi(r) = \dfrac{r}{1 + r^2/c^2}$ — decreases to zero for large $|r|$, effectively down-weighting far outliers
When to use: When outliers are far from the bulk of the data and should have near-zero influence.
Arctan Loss
Properties:
- Bounded: $\rho(r)$ approaches a constant as $|r| \to \infty$
- Influence function approaches zero for large residuals (redescending)
When to use: When very strong outlier rejection is needed. More aggressive than Cauchy but can make convergence harder.
Comparison
| Loss | Large-$r$ growth | Outlier influence | Convergence |
|---|---|---|---|
| Quadratic ($\tfrac{1}{2}r^2$) | Quadratic | Unbounded | Best |
| Huber | Linear | Bounded (constant) | Good |
| Cauchy | Logarithmic | Decreasing | Moderate |
| Arctan | Bounded | Approaching zero | Can be tricky |
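The comparison can be made concrete with small sketches of the three losses. These mirror the formulas above but are illustrative only, not the calibration-rs implementation, and the arctan form in particular varies between libraries:

```rust
/// Huber loss: quadratic inside the scale c, linear outside.
fn huber(r: f64, c: f64) -> f64 {
    let a = r.abs();
    if a <= c { 0.5 * r * r } else { c * (a - 0.5 * c) }
}

/// Cauchy loss: logarithmic growth for large residuals.
fn cauchy(r: f64, c: f64) -> f64 {
    0.5 * c * c * (1.0 + (r / c).powi(2)).ln()
}

/// One common arctan-style loss (bounded by c^2 * pi / 2); the exact
/// form used by a given library may differ.
fn arctan_loss(r: f64, c: f64) -> f64 {
    c * c * ((r / c).powi(2)).atan()
}

fn main() {
    // Inside the scale, Huber is exactly quadratic; outside, it grows linearly.
    println!("{}", huber(0.5, 1.0)); // 0.125
    println!("{}", huber(3.0, 1.0)); // 2.5
    // Cauchy grows much slower than Huber for the same residual.
    println!("{:.3}", cauchy(3.0, 1.0));
    // The arctan-style loss saturates: far outliers all cost about the same.
    println!("{:.3}", arctan_loss(100.0, 1.0));
}
```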
Choosing the Scale Parameter
The scale $c$ sets the boundary between "inlier" and "outlier" behavior:
- Too small: Treats good data as outliers, reducing effective sample size
- Too large: Outliers still dominate (approaches standard least squares)
- Rule of thumb: Set $c$ to the expected residual magnitude for good data points. For reprojection residuals, a scale of a few pixels is typical.
Usage in calibration-rs
Robust losses are specified per-residual block in the optimization IR:
```rust
pub enum RobustLoss {
    None,
    Huber { scale: f64 },
    Cauchy { scale: f64 },
    Arctan { scale: f64 },
}
```
Each problem type exposes the loss function as a configuration option:
```rust
session.update_config(|c| {
    c.robust_loss = RobustLoss::Huber { scale: 2.0 };
})?;
```
The backend applies the loss function during residual evaluation, modifying both the cost and the Jacobian.
Iteratively Reweighted Least Squares (IRLS)
Under the hood, robust loss functions are typically implemented via IRLS: each residual is weighted by $w_i = \psi(r_i)/r_i$, and the weighted least-squares problem is solved iteratively. The Levenberg-Marquardt backend handles this automatically.
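For the Huber loss, the IRLS weight has a particularly simple closed form. The helper below is an illustrative sketch (the LM backend computes this internally; the function name is hypothetical):

```rust
/// IRLS weight for the Huber loss: w(r) = psi(r) / r, where psi = rho'.
/// Inliers keep full weight; outliers are down-weighted by c / |r|.
fn huber_weight(r: f64, c: f64) -> f64 {
    let a = r.abs();
    if a <= c { 1.0 } else { c / a }
}

fn main() {
    println!("{}", huber_weight(0.5, 1.0)); // 1.0  — inlier, full weight
    println!("{}", huber_weight(4.0, 1.0)); // 0.25 — outlier, down-weighted
}
```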
Interaction with RANSAC
RANSAC and robust losses address outliers at different stages:
- RANSAC (linear initialization): Binary inlier/outlier classification. Used during model fitting to reject gross outliers before any optimization.
- Robust losses (non-linear refinement): Soft down-weighting. Used during optimization to reduce the influence of moderate outliers that passed RANSAC.
The two approaches are complementary: RANSAC handles gross outliers during initialization, while robust losses handle smaller outliers during refinement.
Why Linear Initialization Matters
Non-linear optimization is the workhorse of camera calibration: it achieves sub-pixel reprojection error by jointly optimizing all parameters. But non-linear optimizers — Levenberg-Marquardt, Gauss-Newton, and their variants — are local methods. They converge to the nearest local minimum, which is only the global minimum if the starting point is sufficiently close.
The role of linear initialization is to provide that starting point.
The Init-Then-Refine Paradigm
Every calibration workflow in calibration-rs follows a two-phase pattern:

1. Linear initialization: Closed-form solvers estimate approximate parameters. These are fast, deterministic, and require no initial guess — but they achieve only ~5-40% accuracy on camera parameters.
2. Non-linear refinement: Levenberg-Marquardt bundle adjustment minimizes reprojection error starting from the linear estimate. This converges to <2% parameter accuracy and <1 px reprojection error.

The linear estimate does not need to be highly accurate. It needs to be in the basin of convergence of the non-linear optimizer — close enough that gradient-based iteration converges to the correct solution rather than a local minimum.
Why Not Just Optimize Directly?
Without initialization, you would need to guess intrinsics ($f_x, f_y, c_x, c_y$), distortion ($k_1, k_2, p_1, p_2, k_3$), and all camera poses (6 DoF per view). For a typical 20-view calibration:
- 4 intrinsics + 5 distortion + 20 × 6 pose = 129 parameters
- The cost landscape has many local minima
- A random starting point will almost certainly diverge or converge to a wrong solution
Linear initialization eliminates this problem by computing a reasonable estimate from the data structure alone.
Linear Solvers in calibration-rs
| Solver | Estimates | Method | Chapter |
|---|---|---|---|
| Homography DLT | $H$ | SVD nullspace | Homography |
| Zhang's method | Intrinsics $K$ | IAC constraints | Zhang |
| Distortion fit | $k_1, k_2, p_1, p_2, k_3$ | Linear LS on residuals | Distortion Fit |
| Iterative intrinsics | $K$ + distortion | Alternating refinement | Iterative |
| Planar pose | $R, \mathbf{t}$ from $H$ | Decomposition + SVD | Planar Pose |
| P3P (Kneip) | $R, \mathbf{t}$ from 3 points | Quartic polynomial | PnP |
| DLT PnP | $R, \mathbf{t}$ from $n \ge 6$ points | SVD + SO(3) projection | PnP |
| 8-point fundamental | $F$ matrix | SVD + rank constraint | Epipolar |
| 7-point fundamental | $F$ matrix | Cubic polynomial | Epipolar |
| 5-point essential | $E$ matrix | Action matrix eigenvalues | Epipolar |
| Camera matrix DLT | $P = K[R \mid \mathbf{t}]$ | SVD + RQ decomposition | Camera Matrix |
| Triangulation | 3D point | SVD nullspace | Triangulation |
| Tsai-Lenz | Hand-eye transform | Quaternion + linear LS | Hand-Eye |
| Rig extrinsics | $T$ per camera | SE(3) averaging | Rig Init |
| Laser plane | Plane normal + distance | SVD on covariance | Laser Init |
Common Mathematical Tools
Most linear solvers share a common toolkit:
- SVD (Singular Value Decomposition): Solves homogeneous systems via the right singular vector corresponding to the smallest singular value. This is the core of DLT methods.
- Hartley normalization: Conditions the DLT system for numerical stability by centering and scaling the input points.
- Polynomial root finding: Minimal solvers (P3P, 7-point F, 5-point E) reduce to polynomial equations.
The following chapters derive each algorithm in detail.
Hartley Normalization
DLT algorithms build large linear systems from point coordinates and solve them via SVD. When the coordinates span vastly different scales (e.g., pixel values in the hundreds vs. homogeneous scale factor of 1), the resulting system is ill-conditioned: small perturbations in the data cause large changes in the solution. Hartley normalization is a simple preprocessing step that dramatically improves numerical stability.
The Problem
Consider the homography DLT: we build a matrix $A$ from pixel coordinates and solve $A\mathbf{h} = 0$. If pixel coordinates are on the order of $10^3$, the entries of $A$ (which include products of coordinates) range from $1$ to about $10^6$. The condition number of $A$ becomes large, and the SVD solution is sensitive to floating-point errors.
The Solution
Normalize the input points so they are centered at the origin with a controlled scale, solve the system, then denormalize the result.
2D Normalization
Given points $\{(x_i, y_i)\}_{i=1}^{N}$:

1. Centroid: $\bar{x} = \frac{1}{N}\sum_i x_i$, $\bar{y} = \frac{1}{N}\sum_i y_i$
2. Mean distance from centroid: $\bar{d} = \frac{1}{N}\sum_i \sqrt{(x_i - \bar{x})^2 + (y_i - \bar{y})^2}$
3. Scale factor: $s = \sqrt{2}/\bar{d}$
4. Normalization matrix:
$$T = \begin{pmatrix} s & 0 & -s\bar{x} \\ 0 & s & -s\bar{y} \\ 0 & 0 & 1 \end{pmatrix}$$
5. Normalized points: $\tilde{\mathbf{x}}_i = T\,\mathbf{x}_i$

After normalization, the points have centroid at the origin and mean distance $\sqrt{2}$ from the origin.
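The steps above can be sketched directly; this is an illustrative helper (hypothetical name, plain arrays instead of the library's matrix types):

```rust
/// Hartley 2D normalization: returns the 3x3 matrix T (row-major) that
/// centers the points at the origin and scales mean distance to sqrt(2).
fn hartley_normalization(pts: &[(f64, f64)]) -> [[f64; 3]; 3] {
    let n = pts.len() as f64;
    // Centroid.
    let (sx, sy) = pts.iter().fold((0.0, 0.0), |(ax, ay), p| (ax + p.0, ay + p.1));
    let (cx, cy) = (sx / n, sy / n);
    // Mean distance from the centroid.
    let mean_d = pts
        .iter()
        .map(|p| ((p.0 - cx).powi(2) + (p.1 - cy).powi(2)).sqrt())
        .sum::<f64>()
        / n;
    let s = 2.0_f64.sqrt() / mean_d;
    [[s, 0.0, -s * cx], [0.0, s, -s * cy], [0.0, 0.0, 1.0]]
}

fn main() {
    let pts = [(100.0, 200.0), (300.0, 250.0), (150.0, 400.0), (320.0, 380.0)];
    let t = hartley_normalization(&pts);
    // Applying T (x' = s*x - s*cx, y' = s*y - s*cy) yields mean distance sqrt(2).
    let mean_d: f64 = pts
        .iter()
        .map(|p| {
            let (x, y) = (t[0][0] * p.0 + t[0][2], t[1][1] * p.1 + t[1][2]);
            (x * x + y * y).sqrt()
        })
        .sum::<f64>()
        / pts.len() as f64;
    println!("{:.6}", mean_d); // ~1.414214
}
```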
3D Normalization
For 3D points $\{(x_i, y_i, z_i)\}$, the analogous normalization uses:
- Mean distance target: $\sqrt{3}$ (instead of $\sqrt{2}$)
- A $4 \times 4$ normalization matrix
Denormalization
After solving the normalized system, the result must be denormalized. For a homography estimated from normalized points:
$$H = T_{\text{img}}^{-1}\,\tilde{H}\,T_{\text{world}}$$
For a fundamental matrix:
$$F = T_2^\top\,\tilde{F}\,T_1$$
The denormalization formula depends on the specific problem. Each chapter states the appropriate denormalization.
When It Is Used
Hartley normalization is applied in every DLT-based solver in calibration-rs:
| Solver | Normalized spaces |
|---|---|
| Homography DLT | World 2D + image 2D |
| Fundamental 8-point | Image 1 2D + image 2 2D |
| Essential 5-point | Image 1 2D + image 2 2D |
| Camera matrix DLT | World 3D + image 2D |
| PnP DLT | World 3D (image points use normalized coordinates via $K^{-1}$) |
Reference
Hartley, R.I. (1997). "In Defense of the Eight-Point Algorithm." IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(6), 580-593.
The normalization technique is described in Algorithm 4.2 of Hartley & Zisserman (2004).
Homography Estimation (DLT)
A homography is a projective transformation between two planes. In calibration, the most common case is the mapping from a planar calibration board to the image plane. Homography estimation is the first step in Zhang's calibration method.
Problem Statement
Given: $n$ point correspondences $(\mathbf{p}_i, \mathbf{q}_i)$ where $\mathbf{p}_i$ are points on the world plane and $\mathbf{q}_i$ are the corresponding image points.
Find: $3 \times 3$ matrix $H$ such that $\mathbf{q}_i \sim H\mathbf{p}_i$ (equality up to scale) for all $i$.
Assumptions:
- The world points are coplanar (the calibration board is flat)
- Correspondences are correct (or RANSAC is used to handle outliers)
- At least 4 non-collinear point correspondences
Derivation
From Projective Relation to Linear System
The relation $\mathbf{q} \sim H\mathbf{p}$ in homogeneous coordinates means:
$$\lambda\begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} \mathbf{h}_1^\top \\ \mathbf{h}_2^\top \\ \mathbf{h}_3^\top \end{pmatrix}\mathbf{p}$$
where $\mathbf{h}_k^\top$ is the $k$-th row of $H$. Expanding and eliminating the scale factor via the cross-product constraint ($\mathbf{q} \times H\mathbf{p} = \mathbf{0}$), we get two independent equations per correspondence:
$$\mathbf{h}_1^\top\mathbf{p} - x\,\mathbf{h}_3^\top\mathbf{p} = 0, \qquad \mathbf{h}_2^\top\mathbf{p} - y\,\mathbf{h}_3^\top\mathbf{p} = 0$$
The DLT System
Stacking all correspondences into a $2n \times 9$ matrix $A$ gives the homogeneous system:
$$A\mathbf{h} = \mathbf{0}$$
where $\mathbf{h} = (h_{11}, h_{12}, h_{13}, h_{21}, h_{22}, h_{23}, h_{31}, h_{32}, h_{33})^\top$ stacks the rows of $H$. Each correspondence $(X, Y) \mapsto (x, y)$ contributes the two rows:
$$\begin{pmatrix} -X & -Y & -1 & 0 & 0 & 0 & xX & xY & x \\ 0 & 0 & 0 & -X & -Y & -1 & yX & yY & y \end{pmatrix}$$
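The per-correspondence rows can be sketched as a small helper (illustrative, not the library's internal representation):

```rust
/// The two rows of the homography DLT design matrix contributed by one
/// correspondence (X, Y) -> (x, y), for h = (h11, h12, ..., h33).
fn homography_dlt_rows(world: (f64, f64), image: (f64, f64)) -> [[f64; 9]; 2] {
    let (xw, yw) = world;
    let (x, y) = image;
    [
        [-xw, -yw, -1.0, 0.0, 0.0, 0.0, x * xw, x * yw, x],
        [0.0, 0.0, 0.0, -xw, -yw, -1.0, y * xw, y * yw, y],
    ]
}

fn main() {
    // For the identity homography, (X, Y) maps to itself, so both rows
    // must be orthogonal to h = vec(I) = (1,0,0, 0,1,0, 0,0,1).
    let rows = homography_dlt_rows((1.0, 2.0), (1.0, 2.0));
    let h = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0];
    for row in &rows {
        let dot: f64 = row.iter().zip(&h).map(|(a, b)| a * b).sum();
        println!("{}", dot); // 0
    }
}
```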
Solving via SVD
We seek $\mathbf{h}$ minimizing $\|A\mathbf{h}\|$ subject to $\|\mathbf{h}\| = 1$ (to avoid the trivial solution $\mathbf{h} = \mathbf{0}$). This is the standard homogeneous least squares problem, solved by the SVD of $A = U\Sigma V^\top$:
The solution is the last column of $V$ (corresponding to the smallest singular value). Reshaping this 9-vector into a $3 \times 3$ matrix gives $H$.
Normalization and Denormalization
The complete algorithm with Hartley normalization:
- Normalize world and image points: $\tilde{\mathbf{p}}_i = T_w\mathbf{p}_i$ and $\tilde{\mathbf{q}}_i = T_q\mathbf{q}_i$
- Build the matrix $A$ from the normalized points
- Solve via SVD to get $\tilde{H}$
- Denormalize: $H = T_q^{-1}\tilde{H}\,T_w$
- Fix the scale (e.g., normalize so that $h_{33} = 1$)
Degrees of Freedom
$H$ is a $3 \times 3$ matrix with 9 entries, but it is defined up to scale, giving 8 degrees of freedom. Each correspondence provides 2 equations, so the minimum is $n = 4$ correspondences (giving 8 equations for 8 unknowns).
With $n > 4$, the system is overdetermined and the SVD solution minimizes the algebraic error in a least-squares sense.
RANSAC Wrapper
For real data with potential mismatched correspondences, dlt_homography_ransac wraps the DLT solver in RANSAC:
```rust
let (H, inliers) = dlt_homography_ransac(&world_pts, &image_pts, &opts)?;
```
- `MIN_SAMPLES = 4`
- Residual: Euclidean reprojection error in pixels, $\|\mathbf{q}_i - \pi(H\mathbf{p}_i)\|$, where $\pi$ dehomogenizes
- Degeneracy check: Tests if the first 3 world points are collinear (a degenerate configuration for homography estimation)
OpenCV equivalence:
cv::findHomographywithRANSACmethod.
API
```rust
// Direct DLT (all points assumed inliers)
let H = dlt_homography(&world_2d, &image_2d)?;

// With RANSAC
let opts = RansacOptions {
    max_iters: 1000,
    thresh: 3.0, // pixels
    confidence: 0.99,
    ..Default::default()
};
let (H, inliers) = dlt_homography_ransac(&world_2d, &image_2d, &opts)?;
```
Geometric Interpretation
For a planar calibration board at $Z = 0$ in world coordinates, the camera projection matrix reduces to a homography:
$$H = K\,[\mathbf{r}_1\ \ \mathbf{r}_2\ \ \mathbf{t}]$$
where $\mathbf{r}_1, \mathbf{r}_2$ are the first two columns of the rotation matrix $R$. This relationship is the foundation of Zhang's calibration method (next chapter).
Zhang's Intrinsics from Homographies
Zhang's method is the most widely used technique for estimating camera intrinsics from a planar calibration board. Given homographies from multiple views of the board, it recovers the intrinsics matrix $K$ using constraints from the image of the absolute conic (IAC).
Problem Statement
Given: $M$ homographies $\{H_i\}$ from a planar calibration board to the image, computed from different viewpoints.
Find: Camera intrinsics $K$.
Assumptions:
- The calibration board is planar
- At least 3 homographies from distinct viewpoints
- The camera intrinsics are constant across all views
- Distortion is negligible (or has been accounted for — see Iterative Intrinsics)
Derivation
Homography and Intrinsics
From the previous chapter, the homography from a board plane is:
$$H = \lambda K\,[\mathbf{r}_1\ \ \mathbf{r}_2\ \ \mathbf{t}]$$
where $\lambda$ is an arbitrary scale and $\mathbf{r}_1, \mathbf{r}_2$ are columns of the rotation matrix. Let $H = [\mathbf{h}_1\ \ \mathbf{h}_2\ \ \mathbf{h}_3]$. Then:
$$\mathbf{r}_1 = \tfrac{1}{\lambda}K^{-1}\mathbf{h}_1, \qquad \mathbf{r}_2 = \tfrac{1}{\lambda}K^{-1}\mathbf{h}_2$$
Orthogonality Constraints
Since $\mathbf{r}_1$ and $\mathbf{r}_2$ are columns of a rotation matrix, they satisfy:
$$\mathbf{r}_1^\top\mathbf{r}_2 = 0, \qquad \mathbf{r}_1^\top\mathbf{r}_1 = \mathbf{r}_2^\top\mathbf{r}_2$$
Substituting $\mathbf{r}_k = \tfrac{1}{\lambda}K^{-1}\mathbf{h}_k$:
$$\mathbf{h}_1^\top K^{-\top}K^{-1}\mathbf{h}_2 = 0, \qquad \mathbf{h}_1^\top K^{-\top}K^{-1}\mathbf{h}_1 = \mathbf{h}_2^\top K^{-\top}K^{-1}\mathbf{h}_2$$
The Image of the Absolute Conic (IAC)
Define the symmetric matrix:
$$B = K^{-\top}K^{-1}$$
Since $B$ is symmetric, it has 6 independent entries. The constraints become:
$$\mathbf{h}_1^\top B\,\mathbf{h}_2 = 0, \qquad \mathbf{h}_1^\top B\,\mathbf{h}_1 - \mathbf{h}_2^\top B\,\mathbf{h}_2 = 0$$
Each homography gives 2 linear equations in the 6 unknowns of .
The $\mathbf{v}_{ij}$ Vectors
To write the constraints as a linear system, define the 6-vector $\mathbf{b} = (B_{11}, B_{12}, B_{22}, B_{13}, B_{23}, B_{33})^\top$ and:
$$\mathbf{v}_{ij} = \big(h_{1i}h_{1j},\ h_{1i}h_{2j} + h_{2i}h_{1j},\ h_{2i}h_{2j},\ h_{3i}h_{1j} + h_{1i}h_{3j},\ h_{3i}h_{2j} + h_{2i}h_{3j},\ h_{3i}h_{3j}\big)^\top$$
where $h_{ki}$ is the $k$-th element of column $i$ of $H$ (1-indexed, column $i$, row $k$). Then:
$$\mathbf{h}_i^\top B\,\mathbf{h}_j = \mathbf{v}_{ij}^\top\mathbf{b}$$
The Linear System
For each homography $H_i$, the two constraints become:
$$\mathbf{v}_{12}^\top\mathbf{b} = 0, \qquad (\mathbf{v}_{11} - \mathbf{v}_{22})^\top\mathbf{b} = 0$$
Stacking $M$ homographies gives a $2M \times 6$ system:
$$V\mathbf{b} = \mathbf{0}$$
With $M \ge 3$ (giving $\ge 6$ equations), this is solved via SVD: $\mathbf{b}$ is the right singular vector of $V$ corresponding to the smallest singular value.
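The constraint rows are easy to build once the homography columns are in hand. This sketch (hypothetical helper, plain row-major arrays) constructs one $\mathbf{v}_{ij}$ vector:

```rust
/// Zhang's v_ij vector from a homography H (row-major), where column i of H
/// is (h[0][i], h[1][i], h[2][i]). Indices i, j are 0-based here.
fn zhang_v(h: &[[f64; 3]; 3], i: usize, j: usize) -> [f64; 6] {
    let (h1i, h2i, h3i) = (h[0][i], h[1][i], h[2][i]);
    let (h1j, h2j, h3j) = (h[0][j], h[1][j], h[2][j]);
    [
        h1i * h1j,
        h1i * h2j + h2i * h1j,
        h2i * h2j,
        h3i * h1j + h1i * h3j,
        h3i * h2j + h2i * h3j,
        h3i * h3j,
    ]
}

fn main() {
    let h = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]];
    // For H = I, v_12 = (0, 1, 0, 0, 0, 0): the constraint v_12^T b = 0
    // simply forces B_12 = 0 in this degenerate "view".
    println!("{:?}", zhang_v(&h, 0, 1));
}
```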
Extracting Intrinsics from
Given $\mathbf{b}$, the intrinsics are recovered by a Cholesky-like factorization of $B$.
The closed-form extraction (Zhang 2000, Appendix B):
$$v_0 = \frac{B_{12}B_{13} - B_{11}B_{23}}{B_{11}B_{22} - B_{12}^2}$$
$$\lambda = B_{33} - \frac{B_{13}^2 + v_0(B_{12}B_{13} - B_{11}B_{23})}{B_{11}}$$
$$\alpha = \sqrt{\lambda/B_{11}}, \qquad \beta = \sqrt{\frac{\lambda B_{11}}{B_{11}B_{22} - B_{12}^2}}$$
$$\gamma = -\frac{B_{12}\,\alpha^2\beta}{\lambda}, \qquad u_0 = \frac{\gamma v_0}{\beta} - \frac{B_{13}\,\alpha^2}{\lambda}$$
where $(\alpha, \beta)$ are the focal lengths, $\gamma$ the skew, and $(u_0, v_0)$ the principal point.
Validity Checks
The extraction can fail when:
- $B_{11}B_{22} - B_{12}^2 \approx 0$ (degenerate: insufficient view diversity)
- $\lambda$ and $B_{11}$ have different signs (negative focal length squared)
These cases indicate that the input homographies do not sufficiently constrain the intrinsics, typically because the views are too similar (e.g., all near-frontal).
Minimum Number of Views
- $M = 3$ homographies: 6 equations for 6 unknowns — the minimum for a unique solution (with skew)
- $M = 2$: Only 4 equations; requires the additional assumption of zero skew ($\gamma = 0$) to reduce the unknowns to 5
- $M > 3$: Overdetermined system; SVD gives the least-squares solution
Practical Considerations
View diversity: The views should include significant rotation around both axes. Pure translations or rotations around a single axis lead to degenerate configurations where some intrinsic parameters are undetermined.
Distortion: Zhang's method assumes distortion-free observations. When applied to distorted pixels, the estimated is biased. The Iterative Intrinsics chapter addresses this with an alternating refinement scheme.
OpenCV equivalence: Zhang's method is the internal initialization step of
cv::calibrateCamera. OpenCV does not expose it as a separate API.
API
```rust
let K = estimate_intrinsics_from_homographies(&homographies)?;
```
Returns `FxFyCxCySkew` with the estimated intrinsics. Typically followed by distortion estimation and non-linear refinement.
Reference
Zhang, Z. (2000). "A Flexible New Technique for Camera Calibration." IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11), 1330-1334.
Distortion Estimation from Homography Residuals
After estimating intrinsics via Zhang's method (which assumes distortion-free observations), the residuals between observed and predicted pixel positions encode the lens distortion. This chapter derives a linear method to estimate Brown-Conrady distortion coefficients from these residuals.
Problem Statement
Given:
- Camera intrinsics $K$ (from Zhang's method)
- $M$ homographies from the calibration board to the image
- Point correspondences: board points and observed pixels
Find: Distortion coefficients $(k_1, k_2, p_1, p_2, k_3)$.
Assumptions:
- $K$ is a reasonable estimate (possibly biased by distortion)
- Distortion is moderate (the linear approximation holds)
- The homographies were computed from distorted pixel observations
Derivation
Ideal vs. Observed Coordinates
For each correspondence in view $i$, the ideal (undistorted) pixel position predicted by the homography is:
$$\hat{\mathbf{q}}_{ij} = \pi(H_i\,\mathbf{p}_{ij})$$
Convert both observed and ideal pixels to normalized coordinates by applying $K^{-1}$:
$$\mathbf{x}_{\text{obs}} = \pi(K^{-1}\mathbf{q}_{ij}), \qquad \mathbf{x}_{\text{ideal}} = \pi(K^{-1}\hat{\mathbf{q}}_{ij})$$
The residual in normalized coordinates encodes the distortion effect:
$$\Delta = \mathbf{x}_{\text{obs}} - \mathbf{x}_{\text{ideal}}$$
Linearized Distortion Model
The Brown-Conrady distortion applied to a point $(x, y)$ with $r^2 = x^2 + y^2$ produces a displacement:
$$\Delta x = x(k_1 r^2 + k_2 r^4 + k_3 r^6) + 2p_1 xy + p_2(r^2 + 2x^2)$$
$$\Delta y = y(k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1(r^2 + 2y^2) + 2p_2 xy$$
This is linear in the distortion coefficients $(k_1, k_2, p_1, p_2, k_3)$.
Building the Linear System
For each correspondence, write the $x$ and $y$ components of the linearized model:
$$\begin{pmatrix} x r^2 & x r^4 & 2xy & r^2 + 2x^2 & x r^6 \\ y r^2 & y r^4 & r^2 + 2y^2 & 2xy & y r^6 \end{pmatrix}\begin{pmatrix} k_1 \\ k_2 \\ p_1 \\ p_2 \\ k_3 \end{pmatrix} = \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix}$$
where $(x, y)$ are the ideal normalized coordinates and $(\Delta x, \Delta y)$ are the observed residuals.
Stacking all correspondences from all views gives an overdetermined system:
$$A\boldsymbol{\theta} = \mathbf{d}$$
where $A$ is $2NM \times 5$ (or smaller if some coefficients are fixed), and $\boldsymbol{\theta} = (k_1, k_2, p_1, p_2, k_3)^\top$.
Solving
The least-squares solution is:
$$\boldsymbol{\theta} = (A^\top A)^{-1}A^\top\mathbf{d}$$
In practice, this is solved via the SVD of $A$ for numerical stability.
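The two design-matrix rows per point follow directly from the linearized model. A sketch (hypothetical helper; column order $(k_1, k_2, p_1, p_2, k_3)$ as above):

```rust
/// The two rows of the distortion design matrix for one point (x, y) in
/// ideal normalized coordinates, ordered (k1, k2, p1, p2, k3).
fn distortion_rows(x: f64, y: f64) -> [[f64; 5]; 2] {
    let r2 = x * x + y * y;
    let (r4, r6) = (r2 * r2, r2 * r2 * r2);
    [
        [x * r2, x * r4, 2.0 * x * y, r2 + 2.0 * x * x, x * r6],
        [y * r2, y * r4, r2 + 2.0 * y * y, 2.0 * x * y, y * r6],
    ]
}

fn main() {
    // At (1, 0): r^2 = 1, so the x-row is (1, 1, 0, 3, 1) while the
    // y-row picks up only the p1 term (r^2 + 2y^2 = 1).
    println!("{:?}", distortion_rows(1.0, 0.0));
}
```

Fixing `k3` or the tangential terms corresponds to dropping the matching columns before solving.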
Options
```rust
pub struct DistortionFitOptions {
    pub fix_tangential: bool, // Fix p1 = p2 = 0
    pub fix_k3: bool,         // Fix k3 = 0 (default: true)
    pub iters: u32,           // Undistortion iterations
}
```
- `fix_k3` (default `true`): Removes $k_3$ from the system, reducing to a 4-parameter (or, with `fix_tangential`, 2-parameter) model. Recommended unless the lens has strong higher-order radial distortion.
- `fix_tangential` (default `false`): Removes $p_1, p_2$, estimating only radial distortion. Useful when tangential distortion is known to be negligible.

When parameters are fixed, the corresponding columns are removed from $A$.
Accuracy
This linear estimate is typically within 10-50% of the true distortion values. The accuracy depends on:
- Quality of : If intrinsics are biased (from distorted observations), the estimated distortion absorbs some of the bias
- Number of points: More correspondences improve the overdetermined system
- Point distribution: Points across the full image area constrain distortion better than points clustered near the center (where distortion is small)
This accuracy is sufficient for initializing non-linear refinement.
API
```rust
let dist = estimate_distortion_from_homographies(&k_matrix, &views, opts)?;
```
Returns `BrownConrady5` with the estimated coefficients.
OpenCV equivalence: OpenCV estimates distortion jointly with intrinsics inside
cv::calibrateCamera. The separate distortion estimation step is specific to calibration-rs's initialization approach.
Iterative Intrinsics + Distortion
Zhang's method assumes distortion-free observations. When applied to images from a camera with significant distortion, the estimated intrinsics are biased because the distorted pixels violate the linear homography model. The iterative intrinsics solver addresses this by alternating between intrinsics estimation and distortion correction.
Problem Statement
Given: views of a planar calibration board, with observed (distorted) pixel coordinates.
Find: Camera intrinsics and distortion coefficients jointly.
Assumptions:
- The observations are distorted (raw detector output)
- Distortion is moderate enough that Zhang's method on distorted pixels gives a usable initial $K$
- 1-3 iterations of alternating refinement suffice
Why Alternation Works
The joint estimation of $K$ and distortion is a non-convex problem. However, it decomposes naturally:
- Given $K$: Distortion estimation is a linear problem (see Distortion Fit)
- Given distortion: Undistorting the pixels and re-estimating $K$ is a linear problem (Zhang's method)
Each step solves a convex subproblem. The alternation converges because:
- The initial Zhang estimate (ignoring distortion) is typically within the basin where the alternation contracts
- Each step reduces the residual between the model and observations
- The coupling between $K$ and distortion is relatively weak for moderate distortion
Algorithm
Input: Views with distorted observations
Output: Intrinsics $K$, distortion coefficients

1. Compute homographies $\{H_i\}$ from distorted pixels via DLT
2. Estimate initial $K$ from $\{H_i\}$ via Zhang's method
3. For each iteration:
   a. Estimate distortion from homography residuals using the current $K$
   b. Undistort all observed pixels
   c. Recompute homographies $\{H_i'\}$ from the undistorted pixels
   d. Re-estimate $K$ from $\{H_i'\}$ via Zhang's method
   e. (Optional) Enforce zero skew: $\gamma \leftarrow 0$
4. Return $K$ and the distortion coefficients
Convergence
Typically 1-2 iterations suffice:
- Iteration 0 (Zhang on distorted pixels): 10-40% intrinsics error
- Iteration 1: Distortion estimate corrects the dominant radial effect; intrinsics error drops to 5-20%
- Iteration 2: Further refinement; diminishing returns
More iterations are safe but rarely improve the estimate significantly. The default is 2 iterations.
Configuration
```rust
pub struct IterativeIntrinsicsOptions {
    pub iterations: usize,                     // 1-3 typical (default: 2)
    pub distortion_opts: DistortionFitOptions, // Controls fix_k3, fix_tangential, iters
    pub zero_skew: bool,                       // Force skew = 0 (default: true)
}
```
The Undistortion Step
Step 3b converts distorted pixels back to "ideal" pixels by inverting the distortion model:
- Convert the distorted pixel to normalized coordinates: $\mathbf{x}_d = \pi(K^{-1}\mathbf{q}_d)$
- Undistort using the current distortion estimate: $\mathbf{x}_u = \mathrm{undistort}(\mathbf{x}_d)$ (iterative fixed-point, see Brown-Conrady Distortion)
- Convert back to a pixel: $\mathbf{q}_u = K\,\mathbf{x}_u$
Note that this uses the same $K$ both for normalization and for converting back — the undistorted pixels are in the same coordinate system as the original distorted pixels, just without the distortion effect.
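The fixed-point undistortion step can be sketched in normalized coordinates. This is an illustrative implementation under stated assumptions ($k_3$ omitted for brevity; function names are hypothetical):

```rust
/// Brown-Conrady forward distortion in normalized coordinates (k3 omitted).
fn distort(x: f64, y: f64, k1: f64, k2: f64, p1: f64, p2: f64) -> (f64, f64) {
    let r2 = x * x + y * y;
    let radial = 1.0 + k1 * r2 + k2 * r2 * r2;
    (
        x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x),
        y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y,
    )
}

/// Invert `distort` by fixed-point iteration: start at the distorted point
/// and repeatedly divide out the distortion evaluated at the current guess.
fn undistort(xd: f64, yd: f64, k1: f64, k2: f64, p1: f64, p2: f64, iters: u32) -> (f64, f64) {
    let (mut x, mut y) = (xd, yd);
    for _ in 0..iters {
        let r2 = x * x + y * y;
        let radial = 1.0 + k1 * r2 + k2 * r2 * r2;
        let dx = 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x);
        let dy = p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y;
        x = (xd - dx) / radial;
        y = (yd - dy) / radial;
    }
    (x, y)
}

fn main() {
    let (k1, k2, p1, p2) = (-0.2, 0.05, 0.001, -0.001);
    let (xd, yd) = distort(0.3, 0.2, k1, k2, p1, p2);
    // The round trip recovers the ideal point to high accuracy.
    let (xu, yu) = undistort(xd, yd, k1, k2, p1, p2, 8);
    println!("{:.6} {:.6}", xu, yu);
}
```

For moderate distortion the iteration is a strong contraction, which is why a small fixed `iters` (like the default 8) suffices.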
Accuracy Expectations
| Stage | Typical intrinsics error |
|---|---|
| Zhang on distorted pixels | 10-40% |
| After 1 iteration | 5-20% |
| After 2 iterations | 5-15% |
| After non-linear refinement | <2% |
The iterative linear estimate is not meant to be highly accurate. Its purpose is to provide a starting point good enough for the non-linear optimizer to converge.
API
```rust
use vision_calibration::linear::iterative_intrinsics::*;
use vision_calibration::linear::DistortionFitOptions;

let opts = IterativeIntrinsicsOptions {
    iterations: 2,
    distortion_opts: DistortionFitOptions {
        fix_k3: true,
        fix_tangential: false,
        iters: 8,
    },
    zero_skew: true,
};
let camera = estimate_intrinsics_iterative(&dataset, opts)?;
// camera is PinholeCamera = Camera<f64, Pinhole, BrownConrady5, IdentitySensor, FxFyCxCySkew>
// camera.k:    FxFyCxCySkew  (intrinsics)
// camera.dist: BrownConrady5 (distortion)
```
Pose from Homography
Given camera intrinsics $K$ and a homography $H$ from a planar calibration board to the image, we can decompose $H$ to recover the camera pose (rotation $R$ and translation $\mathbf{t}$) relative to the board.
Problem Statement
Given: Intrinsics $K$ and homography $H$ such that $\mathbf{q} \sim H\mathbf{p}$ for board points at $Z = 0$.
Find: Rigid transform $(R, \mathbf{t})$ (board to camera).
Assumptions:
- The board lies at $Z = 0$ in world coordinates
- $K$ is known (or has been estimated)
- The homography was computed from correct correspondences
Derivation
Extracting Rotation and Translation
Recall from the Homography chapter:
$$H = \lambda K\,[\mathbf{r}_1\ \ \mathbf{r}_2\ \ \mathbf{t}]$$
where $\mathbf{r}_1, \mathbf{r}_2$ are the first two columns of the rotation matrix and $\mathbf{t}$ is the translation.
Removing the intrinsics:
$$[\mathbf{a}_1\ \ \mathbf{a}_2\ \ \mathbf{a}_3] = K^{-1}H$$
The scale factor $\lambda$ is recovered from the constraint that $\mathbf{r}_1$ and $\mathbf{r}_2$ have unit norm:
$$\lambda = \tfrac{1}{2}\big(\|\mathbf{a}_1\| + \|\mathbf{a}_2\|\big)$$
Then:
$$\mathbf{r}_1 = \mathbf{a}_1/\lambda, \qquad \mathbf{r}_2 = \mathbf{a}_2/\lambda, \qquad \mathbf{t} = \mathbf{a}_3/\lambda$$
The third rotation column is:
$$\mathbf{r}_3 = \mathbf{r}_1 \times \mathbf{r}_2$$
Projecting onto SO(3)
Due to noise, the matrix $\hat{R} = [\mathbf{r}_1\ \ \mathbf{r}_2\ \ \mathbf{r}_3]$ is not exactly orthonormal. We project it onto SO(3) using SVD:
$$\hat{R} = U\Sigma V^\top, \qquad R = UV^\top$$
If $\det(UV^\top) < 0$, flip the sign of the third column of $U$ and recompute.
Ensuring Forward-Facing
If $t_z < 0$, the board is behind the camera. In this case, flip the sign of both $\mathbf{t}$ and the first two rotation columns, then recompute $\mathbf{r}_3$:
This resolves the sign ambiguity inherent in the scale factor $\lambda$.
Accuracy
The pose from homography is an approximate estimate because:
- The homography itself is subject to noise (DLT algebraic error minimization, not geometric)
- The SVD projection onto SO(3) corrects non-orthogonality but introduces additional error
- Distortion (if not corrected) biases the homography
Typical rotation error: 1-5°. Typical translation direction error: 5-15%. These estimates are refined in non-linear optimization.
OpenCV equivalence:
cv::decomposeHomographyMatprovides a similar decomposition, returning up to 4 pose candidates. calibration-rs returns a single pose by resolving ambiguities via the forward-facing constraint.
API
```rust
let pose = estimate_planar_pose_from_h(&K, &H)?;
// pose: Iso3 (T_C_B: board to camera transform)
```
Usage in Calibration Pipeline
In the planar intrinsics calibration pipeline, pose estimation is applied to every view after $K$ has been estimated:
- Compute a homography $H_i$ per view
- Estimate $K$ from all homographies (Zhang's method)
- Decompose each $H_i$ to get the pose of view $i$
- Use the poses as initial values for non-linear optimization
Perspective-n-Point Solvers
The Perspective-n-Point (PnP) problem estimates the camera pose from known 3D-2D point correspondences. Unlike homography-based pose estimation, PnP does not require coplanar points.
Problem Statement
Given: $n$ 3D world points $\mathbf{P}_i$ and their corresponding 2D image points $\mathbf{q}_i$, plus camera intrinsics $K$.
Find: Camera pose $(R, \mathbf{t})$ such that $\mathbf{q}_i \sim K(R\mathbf{P}_i + \mathbf{t})$.
Assumptions:
- Camera intrinsics are known
- Correspondences are correct (or RANSAC is used)
- Points are not degenerate (e.g., not all collinear)
P3P: Kneip's Minimal Solver
The P3P solver uses exactly 3 correspondences — the minimum for a finite number of solutions. It returns up to 4 candidate poses.
Algorithm
Input: 3 world points $\mathbf{P}_1, \mathbf{P}_2, \mathbf{P}_3$, 3 pixel points, intrinsics $K$.

1. Bearing vectors: Convert pixels to unit bearing vectors in the camera frame: $\mathbf{f}_i = K^{-1}\tilde{\mathbf{q}}_i \,/\, \|K^{-1}\tilde{\mathbf{q}}_i\|$
2. Inter-point distances in the world frame: $d_{ij} = \|\mathbf{P}_i - \mathbf{P}_j\|$
3. Bearing vector cosines: $\cos\alpha_{ij} = \mathbf{f}_i^\top\mathbf{f}_j$
4. Quartic polynomial: From the distance ratios and bearing cosines, Kneip derives a quartic polynomial in a distance ratio; its coefficients are functions of the $d_{ij}$ and $\cos\alpha_{ij}$
5. Solve the quartic for up to 4 real roots
6. For each root:
- Compute the second distance ratio
- Compute the three camera-frame distances $s_1, s_2, s_3$ (depths of the three points)
- Back-project to 3D points in the camera frame: $\mathbf{P}_i^{\text{cam}} = s_i\,\mathbf{f}_i$
- Recover pose from the 3D-3D correspondence using SVD-based rigid alignment
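Step 1 above is a common building block worth making concrete. A sketch assuming zero skew (hypothetical helper, not the calibration-rs API):

```rust
/// Convert a pixel (u, v) to a unit bearing vector in the camera frame:
/// f = normalize(K^-1 [u, v, 1]), assuming zero skew.
fn bearing_vector(u: f64, v: f64, fx: f64, fy: f64, cx: f64, cy: f64) -> [f64; 3] {
    let x = (u - cx) / fx;
    let y = (v - cy) / fy;
    let norm = (x * x + y * y + 1.0).sqrt();
    [x / norm, y / norm, 1.0 / norm]
}

fn main() {
    let f = bearing_vector(700.0, 500.0, 800.0, 800.0, 640.0, 480.0);
    // Bearing vectors are unit length by construction.
    let len = (f[0] * f[0] + f[1] * f[1] + f[2] * f[2]).sqrt();
    println!("{:.6}", len); // 1.000000
}
```

At the principal point the bearing vector is the optical axis $[0, 0, 1]$, which is a handy sanity check.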
Disambiguation
P3P returns up to 4 poses. To select the correct one, use a fourth point (or more points with RANSAC) and pick the pose with the smallest reprojection error.
DLT PnP: Linear Solver for $n \ge 6$
The DLT (Direct Linear Transform) PnP uses an overdetermined system for $n \ge 6$ points.
Derivation
The projection equation in normalized coordinates is:
$$\lambda\begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = [R\ |\ \mathbf{t}]\begin{pmatrix} \mathbf{P} \\ 1 \end{pmatrix}$$
where $(x, y)$ are normalized coordinates (after applying $K^{-1}$ to the pixels). Cross-multiplying gives two equations per point:
$$\mathbf{m}_1^\top\tilde{\mathbf{P}} - x\,\mathbf{m}_3^\top\tilde{\mathbf{P}} = 0, \qquad \mathbf{m}_2^\top\tilde{\mathbf{P}} - y\,\mathbf{m}_3^\top\tilde{\mathbf{P}} = 0$$
where $\mathbf{m}_k^\top$ is the $k$-th row of $M = [R\ |\ \mathbf{t}]$ and $\tilde{\mathbf{P}} = (\mathbf{P}, 1)$. The 12 unknowns are the entries of the $3 \times 4$ matrix $M$.
The Linear System
For each point $i$, the two equations above contribute two rows. Stacking gives a $2n \times 12$ matrix $A$; solve $A\mathbf{m} = 0$ via SVD.
Post-Processing
- Reshape the 12-vector into a $3 \times 4$ matrix
- Normalize the scale using the row norms of the rotation block
- Project the rotation block onto SO(3) via SVD (same as in Pose from Homography)
- Extract the translation from the fourth column
Hartley Normalization
The 3D world points are normalized before building the system (centered at the origin, mean distance scaled to $\sqrt{3}$). The image points are normalized via $K^{-1}$. The result is denormalized after solving.
RANSAC Wrappers
Both solvers have RANSAC variants for handling outliers:
```rust
// DLT PnP + RANSAC
let (pose, inliers) = dlt_ransac(
    &world_pts,
    &image_pts,
    &K,
    &RansacOptions { thresh: 5.0, ..Default::default() },
)?;
```
The DLT PnP solver uses MIN_SAMPLES = 6 with RANSAC. P3P (p3p()) returns up to 4 candidates and is intended for manual disambiguation (e.g., using a 4th point), not as a RANSAC estimator.
Comparison
| Solver | Min. points | Solutions | Strengths |
|---|---|---|---|
| P3P | 3 | Up to 4 | Best for RANSAC (minimal sample) |
| DLT PnP | 6 | 1 | Simple, no polynomial solving |
OpenCV equivalence:
cv::solvePnPwithSOLVEPNP_P3PorSOLVEPNP_DLS;cv::solvePnPRansacfor robust estimation.
References
- Kneip, L., Scaramuzza, D., & Siegwart, R. (2011). "A Novel Parametrization of the Perspective-Three-Point Problem for a Direct Computation of Absolute Camera Position and Orientation." CVPR.
Epipolar Geometry
Epipolar geometry describes the geometric relationship between two views of the same scene. It is encoded in the fundamental matrix (for pixel coordinates) or the essential matrix (for normalized coordinates). These matrices are central to stereo vision, visual odometry, and structure from motion.
Fundamental Matrix
Problem Statement
Given: $n$ point correspondences $(\mathbf{q}_i, \mathbf{q}_i')$ in pixel coordinates from two views.
Find: Fundamental matrix $F$ satisfying $\mathbf{q}_i'^\top F\,\mathbf{q}_i = 0$ for all correspondences.
Properties of $F$:
- Rank 2 (by the epipolar constraint)
- 7 degrees of freedom (9 entries, minus scale ambiguity, minus the rank constraint)
- Maps a point in one image to its epipolar line in the other: $\boldsymbol{\ell}' = F\mathbf{q}$
8-Point Algorithm
The simplest method, requiring $n \ge 8$ correspondences.
Derivation: The constraint $\mathbf{q}'^\top F\,\mathbf{q} = 0$ expands to:
$$x'x\,f_{11} + x'y\,f_{12} + x'f_{13} + y'x\,f_{21} + y'y\,f_{22} + y'f_{23} + x\,f_{31} + y\,f_{32} + f_{33} = 0$$
Each correspondence gives one linear equation in the 9 entries of $F$.
Algorithm:
- Normalize both point sets using Hartley normalization
- Build the $n \times 9$ design matrix $A$ where each row is: $(x'x,\ x'y,\ x',\ y'x,\ y'y,\ y',\ x,\ y,\ 1)$
- Solve $A\mathbf{f} = 0$ via SVD; $\mathbf{f}$ is the last column of $V$
- Enforce rank 2: Decompose $F = U\,\mathrm{diag}(\sigma_1, \sigma_2, \sigma_3)\,V^\top$ and set $\sigma_3 = 0$: $F \leftarrow U\,\mathrm{diag}(\sigma_1, \sigma_2, 0)\,V^\top$
- Denormalize: $F = T_2^\top\tilde{F}T_1$
The rank-2 enforcement is critical: without it, the epipolar lines do not intersect at a common epipole.
7-Point Algorithm
A minimal solver using exactly 7 correspondences, producing 1 or 3 candidate matrices.
Derivation: With 7 equations for 9 unknowns (modulo scale), the null space of $A$ is 2-dimensional. Let the null space basis be $F_1$ and $F_2$. The fundamental matrix is:
$$F = F_1 + \alpha F_2$$
for some scalar $\alpha$. The rank-2 constraint gives a cubic equation in $\alpha$:
$$\det(F_1 + \alpha F_2) = 0$$
This cubic has 1 or 3 real roots, each giving a candidate $F$.
Essential Matrix
Problem Statement
Given: $n$ point correspondences in normalized coordinates $\hat{\mathbf{x}} = K^{-1}\mathbf{q}$.
Find: Essential matrix $E$ satisfying $\hat{\mathbf{x}}'^\top E\,\hat{\mathbf{x}} = 0$.
Properties of $E$:
- $E = [\mathbf{t}]_\times R$ where $R$ is the relative rotation and $\mathbf{t}$ is the translation direction
- Has exactly two equal non-zero singular values: $\sigma_1 = \sigma_2 > 0$, $\sigma_3 = 0$
- 5 degrees of freedom (3 rotation + 2 translation direction)
5-Point Algorithm (Nister)
The minimal solver, producing up to 10 candidate matrices.
Algorithm:
- Normalize both point sets using Hartley normalization
- Build the $n \times 9$ matrix $A$ (same row form as the 8-point algorithm)
- Null space: SVD of $A$ gives a 4-dimensional null space with basis $E_1, E_2, E_3, E_4$. The essential matrix is: $E = xE_1 + yE_2 + zE_3 + wE_4$ for unknowns $x, y, z, w$ (defined up to scale)
- Polynomial constraints: The essential matrix constraint $2EE^\top E - \mathrm{tr}(EE^\top)\,E = 0$ (9 equations) and $\det E = 0$ (1 equation) generate polynomial equations in $x, y, z, w$
- Action matrix: These constraints are assembled into a polynomial system solved via the eigenvalues of a $10 \times 10$ action matrix
- Extract real solutions: Each real eigenvalue determines $(x, y, z, w)$ and thus $E$
Up to 10 candidate essential matrices may be returned.
Decomposing $E$ into $(R, \mathbf{t})$
Given $E = U\,\mathrm{diag}(1, 1, 0)\,V^\top$, there are 4 possible decompositions:
$$R = UWV^\top \ \text{or}\ UW^\top V^\top, \qquad \mathbf{t} = \pm\mathbf{u}_3, \qquad W = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
where $\mathbf{u}_3$ is the third column of $U$.
Chirality check: Of the 4 candidates, only one places all triangulated points in front of both cameras. This is verified by triangulating a test point and checking that its depth $z > 0$ in both views.
Coordinate Conventions
| Matrix | Input coordinates | Usage |
|---|---|---|
| Fundamental | Pixel coordinates | When intrinsics are unknown |
| Essential | Normalized coordinates ($\hat{x} = K^{-1} x$) | When intrinsics are known |
The relationship is: $E = K_2^T F K_1$.
RANSAC
The 8-point fundamental solver has a RANSAC wrapper for outlier rejection:

```rust
let (F, inliers) = fundamental_8point_ransac(&pts1, &pts2, &opts)?;
```

The 5-point essential matrix solver (`essential_5point`) returns multiple candidates and is used directly; it currently has no RANSAC wrapper.
OpenCV equivalence:
`cv::findFundamentalMat` (8-point and 7-point), `cv::findEssentialMat` (5-point with RANSAC).
References
- Nister, D. (2004). "An Efficient Solution to the Five-Point Relative Pose Problem." IEEE TPAMI, 26(6), 756-770.
- Hartley, R.I. (1997). "In Defense of the Eight-Point Algorithm." IEEE TPAMI, 19(6), 580-593.
Camera Matrix and RQ Decomposition
The camera matrix $P$ (or projection matrix) is a $3 \times 4$ matrix that directly maps 3D world points to 2D pixel coordinates. Estimating $P$ via DLT and decomposing it into intrinsics and extrinsics via RQ decomposition provides an alternative initialization path.
Camera Matrix DLT
Problem Statement
Given: $n \geq 6$ correspondences between 3D world points $X_i$ and 2D pixel points $x_i$.
Find: The $3 \times 4$ projection matrix $P$ such that $x_i \sim P X_i$.
Derivation
The projection $x_i \sim P X_i$ (with $X_i$ in homogeneous coordinates) gives, after cross-multiplication, two equations per correspondence:
$u_i\, (p_3^T X_i) - p_1^T X_i = 0, \quad v_i\, (p_3^T X_i) - p_2^T X_i = 0$
where $p_j^T$ is the $j$-th row of $P$.
This gives a $2n \times 12$ system $A p = 0$, solved via SVD with Hartley normalization of the 3D and 2D points.
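The two rows one correspondence contributes can be checked against a known projection matrix. A sketch (plain arrays and a hypothetical helper name, not the library's API):

```rust
/// The two DLT rows for one 3D-2D correspondence, against p = vec(P)
/// stacked row-wise (12 entries): u*(p3.X) - p1.X = 0, v*(p3.X) - p2.X = 0.
fn dlt_rows(xw: [f64; 4], u: f64, v: f64) -> [[f64; 12]; 2] {
    let mut r1 = [0.0; 12];
    let mut r2 = [0.0; 12];
    for j in 0..4 {
        r1[j] = -xw[j];        // -p1 . X
        r1[8 + j] = u * xw[j]; // +u (p3 . X)
        r2[4 + j] = -xw[j];    // -p2 . X
        r2[8 + j] = v * xw[j]; // +v (p3 . X)
    }
    [r1, r2]
}

fn main() {
    // A known camera matrix P = K [I | t], fx = fy = 100, cx = cy = 50.
    let p = [
        [100.0, 0.0, 50.0, 250.0],
        [0.0, 100.0, 50.0, 250.0],
        [0.0, 0.0, 1.0, 5.0],
    ];
    let xw = [1.0, 2.0, 3.0, 1.0]; // homogeneous world point

    // Project: x ~ P X, then dehomogenize.
    let hx: Vec<f64> = (0..3)
        .map(|i| (0..4).map(|j| p[i][j] * xw[j]).sum::<f64>())
        .collect();
    let (u, v) = (hx[0] / hx[2], hx[1] / hx[2]);

    // Both rows annihilate vec(P), so a noise-free P solves A p = 0 exactly.
    let pvec: Vec<f64> = p.iter().flatten().copied().collect();
    for row in dlt_rows(xw, u, v) {
        let r: f64 = row.iter().zip(&pvec).map(|(a, b)| a * b).sum();
        assert!(r.abs() < 1e-9);
    }
}
```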
Post-Processing
After denormalization, $P$ is a valid $3 \times 4$ matrix but is not guaranteed to decompose cleanly into $K [R \mid t]$ due to noise. The RQ decomposition extracts the components.
RQ Decomposition
Problem Statement
Given: The left $3 \times 3$ submatrix $M$ of $P$ (where $M = K R$).
Find: Upper-triangular $K$ and orthogonal $R$ such that $M = K R$.
Algorithm
RQ decomposition is computed by transposing a QR decomposition. Let $\Pi$ be the permutation matrix that reverses row/column order ($\Pi^2 = I$):
- Compute the QR decomposition of the flipped, transposed matrix: $(\Pi M)^T = Q\, L^T$
- Then $M = \Pi L Q^T$, where $L$ is lower-triangular and $Q^T$ is orthogonal
- Apply the permutation matrix $\Pi$ to flip the matrix to upper-triangular form:
  - $K = \Pi L \Pi$ (upper-triangular)
  - $R = \Pi Q^T$ (orthogonal)
Sign Conventions
After decomposition, ensure:
- $K$ has positive diagonal entries: if $K_{ii} < 0$, negate column $i$ of $K$ and row $i$ of $R$ (the product $K R$ is unchanged)
- $\det R = +1$: if $\det R = -1$, negate $R$ (and the translation $t$); the overall sign of $P$ is projectively arbitrary
Translation Extraction
With $K$ known, the translation follows from the fourth column $p_4$ of $P$: $t = K^{-1} p_4$.
Full Decomposition
The CameraMatrixDecomposition struct:
```rust
pub struct CameraMatrixDecomposition {
    pub k: Mat3, // Upper-triangular intrinsics
    pub r: Mat3, // Rotation matrix (orthonormal, det = +1)
    pub t: Vec3, // Translation vector
}
```
API
```rust
// Estimate the full 3×4 camera matrix
let P = dlt_camera_matrix(&world_pts, &image_pts)?;

// Decompose into K, R, t
let decomp = decompose_camera_matrix(&P)?;
println!("Intrinsics: {:?}", decomp.k);
println!("Rotation: {:?}", decomp.r);
println!("Translation: {:?}", decomp.t);

// Or just RQ decompose any 3×3 matrix
let (K, R) = rq_decompose(&M);
```
When to Use
Camera matrix DLT is useful when:
- You have non-coplanar 3D-2D correspondences and want to estimate both intrinsics and pose simultaneously
- You need a quick estimate of $K$ from a single view (without multiple homographies)
For calibration with a planar board, Zhang's method is preferred because it uses the planar constraint to get more constraints per view.
Linear Triangulation
Triangulation reconstructs a 3D point from its projections in two or more calibrated views. It is the inverse of the projection operation: given pixel observations and camera poses, recover the world point.
Problem Statement
Given: $n \geq 2$ camera projection matrices $P_i$ (each $3 \times 4$) and corresponding pixel observations $x_i = (u_i, v_i)$.
Find: The 3D point $X$ such that $x_i \sim P_i X$ for all $i$.
Assumptions:
- Camera poses and intrinsics are known
- Correspondences are correct
- The point is visible from all views (not occluded)
Derivation (DLT)
For each view $i$, the projection constraint $x_i \sim P_i X$ gives (by cross-multiplication):
$u_i\, (p_i^{3T} X) - p_i^{1T} X = 0, \quad v_i\, (p_i^{3T} X) - p_i^{2T} X = 0$
where $p_i^{jT}$ is the $j$-th row of $P_i$ and $x_i = (u_i, v_i)$.
Stacking all views gives a $2n \times 4$ system $A X = 0$.
Solve via SVD: $X$ is the right singular vector corresponding to the smallest singular value. Dehomogenize: $X = (X_1/X_4,\; X_2/X_4,\; X_3/X_4)$.
Limitations
- No uncertainty modeling: The DLT minimizes algebraic error, not geometric (reprojection) error
- Baseline sensitivity: For small baselines (nearly parallel views), the triangulation is poorly conditioned — small pixel errors lead to large depth errors
- Outlier sensitivity: A single incorrect correspondence corrupts the result; no robustness mechanism
For high-accuracy 3D reconstruction, triangulation should be followed by bundle adjustment that jointly optimizes points and cameras to minimize reprojection error.
API
```rust
let X = triangulate_point_linear(&projection_matrices, &pixel_points)?;
```
Where each projection matrix is precomputed from the camera intrinsics and pose.
Hand-Eye Calibration (Tsai-Lenz)
Hand-eye calibration estimates the rigid transform between a camera and a robot gripper (or between a camera and a robot base). It is essential for any application where a camera is mounted on a robot arm and the robot needs to localize objects in its own coordinate frame.
The AX = XB Problem
Setup
Consider a camera rigidly mounted on a robot gripper (eye-in-hand configuration). The system involves four coordinate frames:
- Base (B): The robot's fixed base frame
- Gripper (G): The robot's end-effector frame (known from robot kinematics)
- Camera (C): The camera frame (observations come from here)
- Target (T): The calibration board frame (fixed in the world)
The unknown is $X = T_{GC}$ (the gripper-to-camera transform).
The Equation
Given two observations $i$ and $j$, we can compute:
- Robot motion: $A = T_{BG_j}^{-1}\, T_{BG_i}$ (relative gripper motion, known from robot kinematics)
- Camera motion: $B = T_{CT_j}\, T_{CT_i}^{-1}$ (relative camera motion, from calibration board observations)
Since the camera is rigidly attached to the gripper:
$A X = X B$
This is the classic $AX = XB$ equation. We need to find the $X$ that satisfies this for all motion pairs.
Eye-to-Hand Variant
When the camera is fixed and the target is on the gripper (eye-to-hand), the equation becomes:
where $A$ is now the relative gripper motion, $B$ is the relative target-in-camera motion, and $X$ is the camera-to-base transform.
The Tsai-Lenz Method
Overview
Tsai and Lenz (1989) decomposed $AX = XB$ into separate rotation and translation subproblems:
- Rotation: Solve $R_A R_X = R_X R_B$ for $R_X$
- Translation: Given $R_X$, solve $(R_A - I)\, t_X = R_X t_B - t_A$ for $t_X$
Step 1: All-Pairs Motion Computation
From $n$ observations, construct all motion pairs. For each pair $(i, j)$:
$A_{ij} = T_{BG_j}^{-1}\, T_{BG_i}, \quad B_{ij} = T_{CT_j}\, T_{CT_i}^{-1}$
Step 2: Filtering
Discard pairs with insufficient rotation (degenerate for the rotation subproblem):
- Reject pairs where the rotation angle of $A_{ij}$ is below a threshold $\theta_{\min}$ (default: 10°)
- Optionally reject pairs where rotation axes of and are near-parallel (ill-conditioned)
Rotation diversity is critical: If all robot motions are rotations around the same axis, the hand-eye rotation around that axis is undetermined. Use poses with diverse rotation axes (roll, pitch, and yaw).
Step 3: Rotation Estimation
The rotation constraint is solved using quaternion algebra.
Convert $R_A$ and $R_B$ to quaternions $q_A$ and $q_B$. The constraint becomes:
$q_A \otimes q_X = q_X \otimes q_B$
Using the left and right quaternion multiplication matrices $L(q)$ and $R(q)$:
$\left( L(q_A) - R(q_B) \right) q_X = 0$
Stacking all motion pairs gives a $4n \times 4$ system:
$M q_X = 0$
Solve via SVD: $q_X$ is the right singular vector of $M$ corresponding to the smallest singular value. Normalize to a unit quaternion.
Step 4: Translation Estimation
Given $R_X$, the translation constraint for each motion pair is:
$(R_A - I)\, t_X = R_X t_B - t_A$
This is an overdetermined $3n \times 3$ linear system $C\, t_X = d$, solved via least squares:
$t_X = (C^T C)^{-1} C^T d$
with optional ridge regularization for numerical stability.
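For a single motion pair, the matrix $R_A - I$ in this constraint is always singular (every rotation fixes its axis), which is exactly why pairs with distinct rotation axes are required. A self-contained sketch (plain arrays and Cramer's rule, not the library's implementation) that stacks two pairs with different axes and solves the normal equations:

```rust
/// Solve the 3x3 system m * x = b via Cramer's rule.
fn solve3(m: [[f64; 3]; 3], b: [f64; 3]) -> [f64; 3] {
    let det = |a: [[f64; 3]; 3]| -> f64 {
        a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0])
    };
    let d = det(m);
    let mut x = [0.0; 3];
    for k in 0..3 {
        let mut mk = m;
        for i in 0..3 {
            mk[i][k] = b[i];
        }
        x[k] = det(mk) / d;
    }
    x
}

fn main() {
    // Ground-truth hand-eye translation to recover.
    let t_x = [0.1, -0.2, 0.3];

    // Two robot rotations about different axes: 90° about Z, 90° about X.
    let r_z = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]];
    let r_x = [[1.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]];

    // Per pair: d = (R_A - I) t_X, here synthesized as the right-hand side.
    let rhs = |r: [[f64; 3]; 3]| -> [f64; 3] {
        let mut d = [0.0; 3];
        for i in 0..3 {
            for j in 0..3 {
                d[i] += (r[i][j] - if i == j { 1.0 } else { 0.0 }) * t_x[j];
            }
        }
        d
    };
    let (d1, d2) = (rhs(r_z), rhs(r_x));

    // Normal equations C^T C t = C^T d over the stacked 6x3 system.
    let mut ctc = [[0.0; 3]; 3];
    let mut ctd = [0.0; 3];
    for (r, d) in [(r_z, d1), (r_x, d2)] {
        for i in 0..3 {
            for j in 0..3 {
                let a_ij = r[i][j] - if i == j { 1.0 } else { 0.0 };
                ctd[j] += a_ij * d[i];
                for k in 0..3 {
                    let a_ik = r[i][k] - if i == k { 1.0 } else { 0.0 };
                    ctc[j][k] += a_ij * a_ik;
                }
            }
        }
    }
    let t = solve3(ctc, ctd);
    for k in 0..3 {
        assert!((t[k] - t_x[k]).abs() < 1e-9);
    }
}
```

With only the Z-axis pair, `ctc` would be singular; the second axis makes the stacked system full rank.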
Target-in-Base Estimation
After finding $X$, the target pose in the base frame can be estimated. For each observation $i$:
$T_{BT}^{(i)} = T_{BG_i}\, X\, T_{CT_i}$
The estimates from different views are averaged (quaternion averaging for rotation, arithmetic mean for translation).
Alternatively, $T_{BT}$ is included in the non-linear optimization.
Practical Requirements
- Minimum 3 views with diverse rotations (in practice, 5-10 views recommended)
- Rotation diversity: Motions should span multiple rotation axes. Pure translations provide no rotation constraint. Pure Z-axis rotations leave the Z-component of the hand-eye rotation undetermined.
- Accuracy: The linear Tsai-Lenz method typically gives 5-20% translation accuracy and 2-10° rotation accuracy. This initializes the non-linear joint optimization.
API
```rust
let X = estimate_handeye_dlt(
    &base_se3_gripper,   // &[Iso3]: robot poses (base to gripper)
    &target_se3_camera,  // &[Iso3]: camera-to-target poses (inverted)
    min_angle_deg,       // f64: minimum rotation angle filter
)?;
```

Returns $X$ as an Iso3 transform.
OpenCV equivalence:
`cv::calibrateHandEye` with method `CALIB_HAND_EYE_TSAI`.
References
- Tsai, R.Y. & Lenz, R.K. (1989). "A New Technique for Fully Autonomous and Efficient 3D Robotics Hand/Eye Calibration." IEEE Transactions on Robotics and Automation, 5(3), 345-358.
Multi-Camera Rig Initialization
A multi-camera rig is a set of cameras rigidly mounted on a common frame. The extrinsics of the rig describe the relative pose of each camera with respect to a reference camera (or a rig frame). This chapter covers the linear initialization of rig extrinsics from per-camera pose estimates.
Problem Statement
Given: Per-camera, per-view pose estimates $T_{C_i T_v}$ (camera $i$, view $v$, target to camera).
Find: Camera-to-rig transforms $T_{R C_i}$ and rig-to-target poses $T_{R T_v}$.
Assumptions:
- All cameras observe the same calibration target simultaneously
- Per-camera poses have been estimated (e.g., via PnP or homography decomposition)
- The rig is rigid (camera-to-rig transforms are constant across views)
Algorithm
Reference Camera
One camera is designated as the reference camera (default: camera 0). Its camera-to-rig transform is the identity:
$T_{R C_0} = I$
This defines the rig frame to coincide with the reference camera frame.
Rig-to-Target from Reference
For each view $v$, the rig-to-target pose is the reference camera's pose: $T_{R T_v} = T_{C_0 T_v}$
Camera-to-Rig via Averaging
For each non-reference camera $i$, collect a camera-to-rig estimate from each view $v$ where both cameras have observations:
$T_{R C_i}^{(v)} = T_{C_0 T_v}\, T_{C_i T_v}^{-1}$
Then average across views:
- Rotation: Quaternion averaging with hemisphere correction. Before averaging, flip quaternions to the same hemisphere (since $q$ and $-q$ represent the same rotation). Then compute the mean quaternion and normalize.
- Translation: Arithmetic mean of translation vectors.
Quaternion Hemisphere Correction
Unit quaternions double-cover SO(3): both $q$ and $-q$ represent the same rotation. Before averaging, all quaternions are flipped to the same hemisphere as the first: if $\langle q_i, q_1 \rangle < 0$, set $q_i \leftarrow -q_i$.
This prevents averaging artifacts where opposite quaternions cancel out.
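A minimal sketch of the correction and averaging (quaternions as plain `[x, y, z, w]` arrays and a hypothetical helper name, not the library's types):

```rust
/// Flip each quaternion to the hemisphere of the first, then average and
/// normalize. Quaternions are [x, y, z, w] arrays.
fn average_quaternions(quats: &[[f64; 4]]) -> [f64; 4] {
    let reference = quats[0];
    let mut sum = [0.0; 4];
    for q in quats {
        // Hemisphere correction: q and -q encode the same rotation.
        let dot: f64 = (0..4).map(|i| q[i] * reference[i]).sum();
        let sign = if dot < 0.0 { -1.0 } else { 1.0 };
        for i in 0..4 {
            sum[i] += sign * q[i];
        }
    }
    let norm: f64 = sum.iter().map(|c| c * c).sum::<f64>().sqrt();
    [sum[0] / norm, sum[1] / norm, sum[2] / norm, sum[3] / norm]
}

fn main() {
    // Two copies of the same rotation, one on the opposite hemisphere.
    let s = std::f64::consts::FRAC_1_SQRT_2;
    let q = [0.0, 0.0, s, s];
    let neg_q = [-q[0], -q[1], -q[2], -q[3]];

    // Naive averaging would cancel to zero; hemisphere correction recovers q.
    let avg = average_quaternions(&[q, neg_q]);
    for i in 0..4 {
        assert!((avg[i] - q[i]).abs() < 1e-12);
    }
}
```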
Accuracy
The initialization accuracy depends on:
- Per-camera pose accuracy: Noisy single-camera poses (from homography decomposition or PnP) propagate to the rig extrinsics
- Number of views: More views improve the averaging
- View diversity: Diverse viewpoints reduce systematic errors
Typical accuracy: 1-5° rotation, 5-15% translation. This is refined in the rig bundle adjustment (see Multi-Camera Rig Extrinsics).
API
```rust
let extrinsic_poses = estimate_extrinsics_from_cam_target_poses(
    &cam_se3_target, // Vec<Vec<Option<Iso3>>>: per-view, per-camera poses
    ref_cam_idx,     // usize: reference camera index
)?;
// extrinsic_poses.cam_to_rig: Vec<Iso3> (T_{R,C} for each camera)
// extrinsic_poses.rig_from_target: Vec<Iso3> (T_{R,T} for each view)
```
Usage in Calibration Pipeline
In the rig extrinsics calibration pipeline:
- Per-camera intrinsics are estimated and optimized independently
- Per-camera poses are recovered from the calibration board
- Rig extrinsics are initialized via averaging (this chapter)
- Rig bundle adjustment jointly optimizes all parameters
Laser Plane Initialization
In laser triangulation, a laser line is projected onto the scene and a camera observes the line. The laser plane — defined by a normal vector and distance from the camera origin — must be calibrated. This chapter describes the linear initialization of the laser plane from observations of the laser line on a calibration target.
Problem Statement
Given:
- Camera intrinsics and distortion
- Per-view camera poses (target to camera)
- Per-view laser pixel observations (pixels where the laser line intersects the target)
Find: Laser plane $(n, d)$ in the camera frame, where $n$ is the unit normal and $d$ is the signed distance from the camera origin:
$n^T p = d$
for all 3D points $p$ on the laser plane (in camera coordinates).
Algorithm
Step 1: Back-Project Laser Pixels to 3D
For each laser pixel in each view:
- Undistort the pixel to get normalized coordinates $(x_n, y_n)$
- Form a ray in the camera frame: $r(\lambda) = \lambda\, (x_n, y_n, 1)^T$
- Intersect with the target plane: The target plane $Z = 0$ in target coordinates, transformed to the camera frame by $T_{CT}$, defines a plane. Solve for the $\lambda$ that places the ray on this plane.
- Compute the 3D point: $p = \lambda\, (x_n, y_n, 1)^T$ in camera coordinates
This gives a set of 3D points on the laser plane.
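A sketch of the ray-plane intersection for one pixel (plain arrays, a simplified pose as rotation matrix plus translation, distortion omitted; not the library's API). The target plane $Z = 0$, expressed in the camera frame, has normal equal to the target's Z axis and passes through the target origin:

```rust
/// Intersect the camera ray through normalized coords (xn, yn) with the
/// target plane Z=0, where the target pose in the camera frame is (r, t).
/// Returns the 3D intersection point in camera coordinates.
fn backproject_to_target_plane(
    xn: f64,
    yn: f64,
    r: [[f64; 3]; 3], // rotation, target -> camera
    t: [f64; 3],      // target origin in the camera frame
) -> [f64; 3] {
    // Plane normal in camera frame: third column of R (target Z axis).
    let n = [r[0][2], r[1][2], r[2][2]];
    let dir = [xn, yn, 1.0];

    // Solve n . (lambda * dir) = n . t for lambda.
    let n_dot_t = n[0] * t[0] + n[1] * t[1] + n[2] * t[2];
    let n_dot_d = n[0] * dir[0] + n[1] * dir[1] + n[2] * dir[2];
    let lambda = n_dot_t / n_dot_d;
    [lambda * dir[0], lambda * dir[1], lambda * dir[2]]
}

fn main() {
    // Target parallel to the image plane, 2 units in front of the camera.
    let r = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]];
    let t = [0.0, 0.0, 2.0];

    let p = backproject_to_target_plane(0.5, 0.25, r, t);
    assert!((p[0] - 1.0).abs() < 1e-12); // 0.5  * 2
    assert!((p[1] - 0.5).abs() < 1e-12); // 0.25 * 2
    assert!((p[2] - 2.0).abs() < 1e-12);
}
```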
Step 2: Fit a Plane via SVD
Given $N$ 3D points $p_i$ on the laser plane:
- Centroid: $c = \frac{1}{N} \sum_i p_i$
- Center the points: $q_i = p_i - c$
- Covariance matrix: $C = \sum_i q_i q_i^T$
- SVD: $C = U \Sigma V^T$
- Normal: $n$ is the eigenvector corresponding to the smallest singular value (last column of $V$)
- Distance: $d = n^T c$
Degeneracy Detection
If all 3D points are nearly collinear (e.g., because all views have the same target pose), the plane is underdetermined. The algorithm checks the ratio of the two smallest singular values, $\sigma_3 / \sigma_2$: if the points are too close to collinear (the ratio is near 1), the plane fit fails with an error.
Plane Representation for Optimization
In the non-linear optimization, the plane is parameterized as:
- Normal: A unit vector $n$ (2 degrees of freedom on the unit sphere)
- Distance: A scalar $d$ (1 degree of freedom)
The $S^2$ manifold is used for the normal to maintain the unit-norm constraint during optimization (see Manifold Optimization).
Accuracy
The linear plane initialization accuracy depends on:
- View diversity: Different target orientations provide points at different locations on the laser plane, improving the fit
- Camera calibration accuracy: Errors in intrinsics, distortion, or poses propagate to the 3D point estimates
- Number of laser pixels: More pixels (from more views or more detected pixels per view) improve the SVD fit
Typical accuracy: 1-5° normal direction error, 5-20% distance error. Refined in the Laserline Device Calibration optimization.
API
```rust
use vision_calibration::linear::laserline::{LaserlinePlaneSolver, LaserlineView};

// Each view carries laser pixels and the camera-to-target pose
let views: Vec<LaserlineView> = poses.iter().zip(laser_pixels_per_view.iter())
    .map(|(pose, pixels)| LaserlineView {
        camera_se3_target: *pose,
        laser_pixels: pixels.clone(),
    })
    .collect();

let plane = LaserlinePlaneSolver::from_views(&views, &camera)?;
// plane.normal: UnitVector3<f64> (unit normal in camera frame)
// plane.distance: f64 (signed distance from camera origin)
// plane.rmse: f64 (fit residual)
```
Polynomial Solvers
Several minimal geometry solvers in calibration-rs reduce to polynomial equations. This chapter documents the polynomial root-finding utilities used by P3P, 7-point fundamental, and 5-point essential solvers.
Quadratic
Solved via the standard discriminant formula: $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$. Returns 0, 1, or 2 real roots depending on the sign of the discriminant $b^2 - 4ac$.
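A sketch of the discriminant-based solver (not the library's exact implementation, which may use the numerically stable variant that avoids cancellation):

```rust
/// Real roots of a*x^2 + b*x + c = 0, in ascending order.
fn solve_quadratic(a: f64, b: f64, c: f64) -> Vec<f64> {
    let disc = b * b - 4.0 * a * c;
    if disc < 0.0 {
        vec![] // no real roots
    } else if disc == 0.0 {
        vec![-b / (2.0 * a)] // one repeated root
    } else {
        let s = disc.sqrt();
        let mut roots = vec![(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)];
        roots.sort_by(|x, y| x.partial_cmp(y).unwrap());
        roots
    }
}

fn main() {
    // x^2 - 5x + 6 = (x - 2)(x - 3)
    let roots = solve_quadratic(1.0, -5.0, 6.0);
    assert_eq!(roots.len(), 2);
    assert!((roots[0] - 2.0).abs() < 1e-12);
    assert!((roots[1] - 3.0).abs() < 1e-12);

    // x^2 + 1 = 0: no real roots
    assert!(solve_quadratic(1.0, 0.0, 1.0).is_empty());
}
```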
Cubic
Solved using Cardano's formula:
- Convert to a depressed cubic $t^3 + pt + q = 0$ via the substitution $x = t - \frac{b}{3a}$
- Compute the discriminant $\Delta = -4p^3 - 27q^2$
- For $\Delta > 0$ (three real roots): use the trigonometric form
- For $\Delta < 0$ (one real root): use Cardano's formula with cube roots
Returns 1 or 3 real roots.
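The steps above can be sketched as follows (a compact monic version; a production implementation would also clamp the `acos` argument and handle the double-root case $\Delta = 0$ explicitly):

```rust
/// Real roots of x^3 + b*x^2 + c*x + d = 0 (monic), unsorted.
fn solve_cubic_monic(b: f64, c: f64, d: f64) -> Vec<f64> {
    // Depressed cubic t^3 + p*t + q via x = t - b/3.
    let p = c - b * b / 3.0;
    let q = 2.0 * b * b * b / 27.0 - b * c / 3.0 + d;
    let shift = -b / 3.0;

    let disc = -4.0 * p * p * p - 27.0 * q * q;
    if disc > 0.0 {
        // Three real roots: trigonometric form (p < 0 in this branch).
        let m = 2.0 * (-p / 3.0).sqrt();
        let theta = (3.0 * q / (p * m)).acos() / 3.0;
        (0..3)
            .map(|k| {
                m * (theta - 2.0 * std::f64::consts::PI * k as f64 / 3.0).cos() + shift
            })
            .collect()
    } else {
        // One real root: Cardano's formula with cube roots.
        let s = (q * q / 4.0 + p * p * p / 27.0).sqrt();
        let t = (-q / 2.0 + s).cbrt() + (-q / 2.0 - s).cbrt();
        vec![t + shift]
    }
}

fn main() {
    // x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)
    let mut roots = solve_cubic_monic(-6.0, 11.0, -6.0);
    roots.sort_by(|a, b| a.partial_cmp(b).unwrap());
    for (r, expected) in roots.iter().zip([1.0, 2.0, 3.0]) {
        assert!((r - expected).abs() < 1e-9);
    }
}
```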
Quartic
Solved via the companion matrix approach:
- Normalize to monic form: divide by the leading coefficient $a_4$
- Construct the $4 \times 4$ companion matrix
- Compute eigenvalues using Schur decomposition (via nalgebra)
- Extract real eigenvalues: those with $|\operatorname{Im}(\lambda)| < \epsilon$
Returns 0 to 4 real roots.
Usage in Minimal Solvers
| Solver | Polynomial | Degree | Max roots |
|---|---|---|---|
| P3P (Kneip) | Distance ratio | Quartic | 4 |
| 7-point fundamental | $\det(F_1 + \lambda F_2) = 0$ | Cubic | 3 |
| 5-point essential | Action matrix eigenvalues | Degree 10 | 10 |
The 5-point solver uses eigendecomposition of a $10 \times 10$ matrix rather than a polynomial solver directly, but the underlying mathematics involves degree-10 polynomial constraints.
Numerical Considerations
- All roots are deduplicated (roots closer than a small tolerance are merged)
- Real root extraction uses a threshold on the imaginary part ($|\operatorname{Im}(\lambda)| < \epsilon$)
- The companion matrix approach is more numerically stable than analytical quartic formulas for extreme coefficient ratios
Non-Linear Least Squares Overview
Non-linear least squares (NLLS) is the mathematical framework underlying all refinement in calibration-rs. After linear initialization provides approximate parameter estimates, NLLS minimizes the reprojection error to achieve sub-pixel accuracy.
Problem Formulation
Objective: Minimize the sum of squared residuals:
$C(\theta) = \frac{1}{2} \sum_i \| r_i(\theta) \|^2$
where $\theta$ is the parameter vector and $r(\theta)$ is the stacked residual vector.
In camera calibration, a typical residual is the reprojection error: the difference between an observed pixel $u_i$ and the predicted projection:
$r_i(\theta) = \pi(K, d, T;\, X_i) - u_i$
where $\pi$ is the camera projection function, and the parameters include intrinsics $K$, distortion $d$, and poses $T$.
Gauss-Newton Method
The Gauss-Newton method exploits the least-squares structure. At the current estimate $\theta_0$, linearize the residuals:
$r(\theta_0 + \delta) \approx r(\theta_0) + J \delta$
where $J = \partial r / \partial \theta$ is the Jacobian.
Substituting into the objective and minimizing with respect to $\delta$ gives the normal equations:
$J^T J\, \delta = -J^T r$
The matrix $J^T J$ is the Gauss-Newton approximation to the Hessian (exact when residuals are small).
Update: $\theta \leftarrow \theta_0 + \delta$
Levenberg-Marquardt Method
Gauss-Newton can diverge when the linearization is poor (far from the minimum) or when $J^T J$ is singular. Levenberg-Marquardt (LM) adds a damping term:
$(J^T J + \lambda I)\, \delta = -J^T r$
The damping parameter $\lambda$ interpolates between:
- $\lambda \to 0$: Pure Gauss-Newton (fast convergence near the minimum)
- $\lambda \to \infty$: Gradient descent with a small step (safe far from the minimum)
Trust Region Interpretation
LM is a trust region method: $\lambda$ controls the size of the region where the linear approximation is trusted.
- If the update reduces the cost: accept the step, decrease (expand trust region)
- If the update increases the cost: reject the step, increase (shrink trust region)
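The accept/reject logic can be illustrated on a one-dimensional toy problem. The sketch below (not the library's solver) finds $\sqrt{2}$ by minimizing $r(\theta) = \theta^2 - 2$ with a scalar damped update:

```rust
/// Minimize 0.5 * r(theta)^2 for r(theta) = theta^2 - 2 with a scalar
/// Levenberg-Marquardt iteration. Returns the optimized theta.
fn lm_sqrt2(mut theta: f64) -> f64 {
    let residual = |t: f64| t * t - 2.0;
    let jacobian = |t: f64| 2.0 * t;

    let mut lambda = 1e-3;
    for _ in 0..100 {
        let r = residual(theta);
        let j = jacobian(theta);

        // Damped normal equations: (J^T J + lambda) * delta = -J^T r
        let delta = -j * r / (j * j + lambda);
        let candidate = theta + delta;

        if residual(candidate).abs() < r.abs() {
            theta = candidate; // accept the step
            lambda *= 0.5;     // expand the trust region
        } else {
            lambda *= 10.0;    // reject the step, shrink the trust region
        }
        if delta.abs() < 1e-14 {
            break;
        }
    }
    theta
}

fn main() {
    let theta = lm_sqrt2(1.0);
    assert!((theta - 2.0_f64.sqrt()).abs() < 1e-10);
}
```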
Convergence Criteria
LM terminates when any of:
- Cost threshold: $C(\theta) < \epsilon_{\text{cost}}$
- Absolute decrease: $C_k - C_{k+1} < \epsilon_{\text{abs}}$
- Relative decrease: $(C_k - C_{k+1}) / C_k < \epsilon_{\text{rel}}$
- Maximum iterations: $k \geq k_{\max}$
- Parameter change: $\| \delta \| < \epsilon_{\theta}$
Sparsity
In bundle adjustment problems, the Jacobian is sparse: each residual depends on only a few parameter blocks (one camera intrinsics, one distortion, one pose). The matrix $J^T J$ has a block-arrow structure that can be exploited by sparse linear solvers:
- Sparse Cholesky: Efficient for well-structured problems
- Sparse QR: More robust when the normal equations are ill-conditioned
calibration-rs uses sparse linear solvers through the tiny-solver backend.
Cost Function vs. Reprojection Error
The optimizer minimizes the cost $C = \frac{1}{2} \sum_i \| r_i \|^2$. The commonly reported mean reprojection error is:
$\bar{e} = \frac{1}{N} \sum_i \| r_i \|$
These are related but not identical: the cost weights large errors quadratically, while the mean error weights them linearly. calibration-rs reports both: the cost from the solver and the mean reprojection error computed post-optimization.
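The distinction in code, assuming 2D residuals stored as `(dx, dy)` pairs (hypothetical helper names, not the library's reporting API):

```rust
/// Solver cost: 0.5 * sum of squared residual norms.
fn cost(residuals: &[[f64; 2]]) -> f64 {
    0.5 * residuals.iter().map(|r| r[0] * r[0] + r[1] * r[1]).sum::<f64>()
}

/// Mean reprojection error: average residual norm in pixels.
fn mean_reproj_error(residuals: &[[f64; 2]]) -> f64 {
    let sum: f64 = residuals
        .iter()
        .map(|r| (r[0] * r[0] + r[1] * r[1]).sqrt())
        .sum();
    sum / residuals.len() as f64
}

fn main() {
    // One 3-4-5 outlier dominates the cost far more than the mean error.
    let residuals = [[0.1, 0.0], [0.0, 0.1], [3.0, 4.0]];
    let c = cost(&residuals);
    let m = mean_reproj_error(&residuals);
    assert!((c - 0.5 * (0.01 + 0.01 + 25.0)).abs() < 1e-12);
    assert!((m - (0.1 + 0.1 + 5.0) / 3.0).abs() < 1e-12);
}
```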
With Robust Losses
When a robust loss $\rho$ is used, the objective becomes:
$C(\theta) = \frac{1}{2} \sum_i \rho\!\left( \| r_i(\theta) \|^2 \right)$
The normal equations are modified to incorporate the loss function's weight:
$J^T W J\, \delta = -J^T W r$
where $W$ is the iteratively reweighted diagonal matrix. See Robust Loss Functions for details.
Manifold Optimization
Camera poses are elements of the Lie group $SE(3)$, which is a 6-dimensional smooth manifold — not a Euclidean space. Laser plane normals live on the unit sphere $S^2$. Optimizing these parameters with standard Euclidean updates (adding $\delta$ to the parameter vector) would violate the manifold constraints: rotations would become non-orthogonal, quaternions would lose unit norm, and normals would lose unit length.
Manifold optimization solves this by computing updates in the tangent space and then retracting back to the manifold.
The Retract-Update Pattern
For a parameter $x$ on a manifold $\mathcal{M}$:
- Compute the update $\delta$ in the tangent space (a Euclidean space of dimension equal to the manifold's degrees of freedom)
- Retract: $x \leftarrow \operatorname{retract}(x, \delta)$, where the retraction maps from the tangent space back to the manifold
This is also called local parameterization: the optimizer sees a Euclidean space of the correct dimension, and the manifold structure handles the constraint.
SE(3): Rigid Transforms
Ambient dimension: 7 (quaternion $q$ + translation $t$)
Tangent dimension: 6 ($\xi = (\omega, v)$)
Plus operation ($\boxplus$): Given the current pose $T$ and tangent vector $\xi$:
$T \boxplus \xi = \exp(\xi) \cdot T$
where $\exp$ is the SE(3) exponential map (see Rigid Transformations).
Minus operation ($\boxminus$): The inverse — computing the tangent vector between two poses — uses the SE(3) logarithm.
Implementation in calibration-rs
The `se3_exp` function in the factor module computes the exponential map generically over `RealField` for autodiff compatibility:

```rust
fn se3_exp<T: RealField>(xi: &[T; 6]) -> [[T; 4]; 4]
```

This uses Rodrigues' formula with special handling for the small-angle case ($\|\omega\| \to 0$) to avoid division by zero.
SO(3): Rotations
Ambient dimension: 4 (unit quaternion)
Tangent dimension: 3 (axis-angle $\omega$)
Plus operation: $q \boxplus \omega = \exp(\omega) \otimes q$
SO(3) is used for rotation-only parameters. In calibration-rs, most poses use the full SE(3), but SO(3) is available for specialized problems.
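A sketch of the SO(3) plus operation (quaternions as plain `[x, y, z, w]` arrays and hypothetical helper names; the left-composition order shown is one common convention):

```rust
/// Hamilton product of quaternions stored as [x, y, z, w].
fn quat_mul(a: [f64; 4], b: [f64; 4]) -> [f64; 4] {
    let [ax, ay, az, aw] = a;
    let [bx, by, bz, bw] = b;
    [
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
        aw * bw - ax * bx - ay * by - az * bz,
    ]
}

/// Exponential map: axis-angle omega -> unit quaternion.
fn so3_exp(omega: [f64; 3]) -> [f64; 4] {
    let angle =
        (omega[0] * omega[0] + omega[1] * omega[1] + omega[2] * omega[2]).sqrt();
    if angle < 1e-12 {
        return [0.0, 0.0, 0.0, 1.0]; // identity for tiny angles
    }
    let s = (angle / 2.0).sin() / angle;
    let c = (angle / 2.0).cos();
    [omega[0] * s, omega[1] * s, omega[2] * s, c]
}

/// Plus operation: apply the tangent update omega to the rotation q.
fn so3_plus(q: [f64; 4], omega: [f64; 3]) -> [f64; 4] {
    quat_mul(so3_exp(omega), q)
}

fn main() {
    use std::f64::consts::FRAC_PI_2;
    // Identity plus a 90-degree rotation about Z.
    let q = so3_plus([0.0, 0.0, 0.0, 1.0], [0.0, 0.0, FRAC_PI_2]);
    assert!((q[2] - (FRAC_PI_2 / 2.0).sin()).abs() < 1e-12);
    assert!((q[3] - (FRAC_PI_2 / 2.0).cos()).abs() < 1e-12);
    // The result stays on the unit-quaternion manifold.
    let norm: f64 = q.iter().map(|c| c * c).sum();
    assert!((norm - 1.0).abs() < 1e-12);
}
```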
$S^2$: Unit Sphere
Ambient dimension: 3 (unit vector $n$, $\|n\| = 1$)
Tangent dimension: 2 ($\delta \in \mathbb{R}^2$)
The $S^2$ manifold is used for laser plane normals — 3D unit vectors with 2 degrees of freedom.
Plus operation: Given the current normal $n$ and tangent vector $\delta = (\delta_1, \delta_2)$:
- Compute an orthonormal basis $(b_1, b_2)$ of the tangent plane at $n$
- Tangent vector in 3D: $v = \delta_1 b_1 + \delta_2 b_2$
- Retract: $n \leftarrow (n + v) / \| n + v \|$
For small $\|\delta\|$, this is equivalent to the exponential map on $S^2$.
Basis Construction
The tangent plane basis is constructed deterministically:
- Choose a reference vector $a$ not parallel to $n$ (e.g., $e_x$ or $e_y$)
- $b_1 = n \times a$, normalized; $b_2 = n \times b_1$
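A sketch of the basis construction and retraction together (plain arrays and hypothetical helper names, not the library's manifold implementation):

```rust
fn cross(a: [f64; 3], b: [f64; 3]) -> [f64; 3] {
    [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]
}

fn normalize(v: [f64; 3]) -> [f64; 3] {
    let n = (v[0] * v[0] + v[1] * v[1] + v[2] * v[2]).sqrt();
    [v[0] / n, v[1] / n, v[2] / n]
}

/// Deterministic orthonormal basis (b1, b2) of the tangent plane at unit n.
fn tangent_basis(n: [f64; 3]) -> ([f64; 3], [f64; 3]) {
    // Reference vector not parallel to n.
    let a = if n[0].abs() < 0.9 { [1.0, 0.0, 0.0] } else { [0.0, 1.0, 0.0] };
    let b1 = normalize(cross(n, a));
    let b2 = cross(n, b1);
    (b1, b2)
}

/// S^2 plus operation: lift the 2D tangent update, add, renormalize.
fn s2_plus(n: [f64; 3], delta: [f64; 2]) -> [f64; 3] {
    let (b1, b2) = tangent_basis(n);
    let v = [
        n[0] + delta[0] * b1[0] + delta[1] * b2[0],
        n[1] + delta[0] * b1[1] + delta[1] * b2[1],
        n[2] + delta[0] * b1[2] + delta[1] * b2[2],
    ];
    normalize(v)
}

fn main() {
    let n = [0.0, 0.0, 1.0];
    let updated = s2_plus(n, [0.1, -0.05]);
    // The updated normal stays on the unit sphere...
    let norm: f64 = updated.iter().map(|c| c * c).sum();
    assert!((norm - 1.0).abs() < 1e-12);
    // ...and a zero update is the identity.
    assert_eq!(s2_plus(n, [0.0, 0.0]), n);
}
```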
Euclidean Parameters
For intrinsics ($f_x, f_y, c_x, c_y$), distortion ($k_1, k_2, k_3, p_1, p_2$), and other unconstrained parameters, the manifold is simply $\mathbb{R}^n$ with $x \boxplus \delta = x + \delta$.
No special treatment is needed.
ManifoldKind in the IR
The optimization IR specifies the manifold for each parameter block:
```rust
pub enum ManifoldKind {
    Euclidean, // R^n, standard addition
    SE3,       // 7D ambient, 6D tangent
    SO3,       // 4D ambient, 3D tangent
    S2,        // 3D ambient, 2D tangent
}
```
The backend maps these to solver-specific manifold implementations.
Fixed Parameters on Manifolds
For Euclidean parameters, individual indices can be fixed (e.g., fix $c_x$ but optimize $f_x$, $f_y$, $c_y$). For manifold parameters (SE3, SO3, S2), fixing is all-or-nothing: either the entire parameter block is optimized or the entire block is fixed. This is because the tangent space decomposition does not naturally support partial fixing on non-Euclidean manifolds.
Backend-Agnostic IR Architecture
calibration-rs separates the definition of optimization problems from their execution by a specific solver. Problems are expressed as an Intermediate Representation (IR) that is compiled to a solver-specific form. This design allows swapping backends without changing problem definitions and makes it straightforward to add new problem types.
Three-Stage Pipeline
```text
Problem Builder  →  ProblemIR  →  Backend.compile()  →  Backend.solve()
 (domain code)      (generic)     (solver-specific)      (optimization)
```

- Problem Builder: Domain code (e.g., `build_planar_intrinsics_ir()`) constructs a `ProblemIR` from calibration data and initial parameter estimates
- Backend Compilation: The backend (e.g., `TinySolverBackend`) translates the IR into solver-specific data structures
- Solving: The backend runs the optimizer and returns a `BackendSolution`
ProblemIR
The central type that describes a complete optimization problem:
```rust
pub struct ProblemIR {
    pub params: Vec<ParamBlock>,
    pub residuals: Vec<ResidualBlock>,
}
```
It is validated on construction to ensure all parameter references are valid, dimensions match, and manifold constraints are respected.
ParamBlock
Each parameter block represents a group of optimization variables:
```rust
pub struct ParamBlock {
    pub id: ParamId,
    pub name: String,
    pub dim: usize,
    pub manifold: ManifoldKind,
    pub fixed: FixedMask,
    pub bounds: Option<Vec<Bound>>, // per-index box constraints
}
```
| Field | Description |
|---|---|
| `id` | Unique identifier within the problem |
| `name` | Human-readable name (e.g., "cam", "pose/0", "plane") |
| `dim` | Ambient dimension (e.g., 4 for intrinsics, 7 for SE3) |
| `manifold` | Geometry of the parameter space |
| `fixed` | Which indices (or the whole block) are held constant |
| `bounds` | Optional box constraints on parameter values |
Naming Conventions
Problem builders use consistent naming:
"intrinsics"or"cam"— intrinsics (), dimension 4"distortion"or"dist"— distortion (), dimension 5"sensor"— Scheimpflug tilt (), dimension 2"pose/0","pose/1", ... — per-view SE(3) poses (planar, rig), dimension 7"pose_0","pose_1", ... — per-view SE(3) poses (laserline), dimension 7"plane_normal"— laser plane unit normal, dimension 3, manifold"plane_distance"— laser plane distance, dimension 1"handeye"— hand-eye SE(3), dimension 7"extrinsics/0", ... — per-camera SE(3) extrinsics, dimension 7
FixedMask
Controls which parameters are held constant during optimization:
```rust
// Construct via factory methods:
FixedMask::all_free()        // nothing fixed
FixedMask::all_fixed(dim)    // everything fixed
FixedMask::fix_indices(&[2]) // fix specific indices
```
- Euclidean parameters: Individual indices can be fixed. For example, `FixedMask::fix_indices(&[2])` on intrinsics fixes $c_x$ while allowing $f_x$, $f_y$, $c_y$ to vary.
- Manifold parameters: All-or-nothing. If any index is fixed, the entire block is fixed. (The tangent space does not support partial fixing.)
ResidualBlock
Each residual block connects parameter blocks through a factor:
```rust
pub struct ResidualBlock {
    pub params: Vec<ParamId>,
    pub loss: RobustLoss,
    pub factor: FactorKind,
    pub residual_dim: usize,
}
```
The `params` vector references `ParamBlock`s by their `ParamId`. The ordering must match what the factor expects.
FactorKind
An enum of all supported residual computations:
```rust
pub enum FactorKind {
    // Reprojection factors — all carry per-observation data
    ReprojPointPinhole4 { pw: [f64; 3], uv: [f64; 2], w: f64 },
    ReprojPointPinhole4Dist5 { pw: [f64; 3], uv: [f64; 2], w: f64 },
    ReprojPointPinhole4Dist5Scheimpflug2 { pw: [f64; 3], uv: [f64; 2], w: f64 },
    ReprojPointPinhole4Dist5TwoSE3 { pw: [f64; 3], uv: [f64; 2], w: f64 },
    ReprojPointPinhole4Dist5HandEye {
        pw: [f64; 3],
        uv: [f64; 2],
        w: f64,
        base_to_gripper_se3: [f64; 7],
        mode: HandEyeMode,
    },
    ReprojPointPinhole4Dist5HandEyeRobotDelta {
        pw: [f64; 3],
        uv: [f64; 2],
        w: f64,
        base_to_gripper_se3: [f64; 7],
        mode: HandEyeMode,
    },
    // Laser factors
    LaserPlanePixel { laser_pixel: [f64; 2], w: f64 },
    LaserLineDist2D { laser_pixel: [f64; 2], w: f64 },
    // Prior factors
    Se3TangentPrior { sqrt_info: [f64; 6] },
}
```
Each variant carries the per-residual data (3D point coordinates, observed pixel, weight, etc.) that is not part of the optimizable parameters. The factor also specifies which parameter blocks it expects and validates their dimensions.
Reprojection Factors
All reprojection factors compute the pixel residual: the weighted difference between the predicted projection of the 3D point `pw` and the observed pixel `uv`.
The variants differ in which parameters are involved:
| Factor | Parameters | Residual dim |
|---|---|---|
| `ReprojPointPinhole4` | [intrinsics, pose] | 2 |
| `ReprojPointPinhole4Dist5` | [intrinsics, distortion, pose] | 2 |
| `ReprojPointPinhole4Dist5Scheimpflug2` | [intrinsics, distortion, sensor, pose] | 2 |
| `ReprojPointPinhole4Dist5TwoSE3` | [intrinsics, distortion, extrinsics, rig_pose] | 2 |
| `ReprojPointPinhole4Dist5HandEye` | [intrinsics, distortion, extrinsics, handeye, target_pose] | 2 |
| `ReprojPointPinhole4Dist5HandEyeRobotDelta` | [intrinsics, distortion, extrinsics, handeye, target_pose, robot_delta] | 2 |
Laser Factors
| Factor | Parameters | Residual dim | Description |
|---|---|---|---|
| `LaserPlanePixel` | [intrinsics, distortion, pose, plane] | 1 | Point-to-plane 3D distance |
| `LaserLineDist2D` | [intrinsics, distortion, pose, plane] | 1 | Line distance in normalized plane |
Prior Factors
| Factor | Parameters | Residual dim | Description |
|---|---|---|---|
| `Se3TangentPrior` | [se3_param] | 6 | Zero-mean Gaussian prior on SE(3) tangent |
Validation
`ProblemIR::new()` validates:
- All `ParamId` references in residual blocks resolve to existing parameter blocks
- Parameter dimensions match what each factor expects
- Manifold kinds are compatible with the factor requirements
- No duplicate parameter IDs
Initial Values
Parameter blocks carry only their structural definition (dimension, manifold, fixing). The initial values are passed separately as a `HashMap<String, DVector<f64>>` (keyed by parameter block name) to the backend's `solve()` method.
Example: Planar Intrinsics IR
For a 10-view planar intrinsics problem with 48 points per view:
Parameters:
- "cam": dim=4, Euclidean (fx, fy, cx, cy)
- "dist": dim=5, Euclidean (k1, k2, k3, p1, p2)
- "pose/0"..."pose/9": dim=7, SE3
Residuals: 480 blocks
- Each: ReprojPointPinhole4Dist5 { pw, uv, w=1.0 }
- Params: [cam_id, dist_id, pose_k_id]
- Residual dim: 2
Total residuals: 960 scalar values
Total parameters: 4 + 5 + 10*7 = 79
Autodiff and Generic Residual Functions
The Levenberg-Marquardt algorithm requires the Jacobian of the residual vector. Computing this analytically for complex camera models (distortion, Scheimpflug tilt, SE(3) transforms) is error-prone and tedious. calibration-rs uses automatic differentiation (autodiff) via dual numbers to compute exact Jacobians from the same code that evaluates residuals.
Dual Numbers
A dual number extends a real number with an infinitesimal part: $x + \epsilon x'$, where $\epsilon^2 = 0$.
Arithmetic preserves the chain rule automatically, e.g. $(x + \epsilon x')(y + \epsilon y') = xy + \epsilon (x y' + x' y)$.
By setting $x' = 1$ for the parameter of interest and $x' = 0$ for the others, evaluating $f(x + \epsilon)$ simultaneously computes both $f(x)$ and $f'(x)$.
For multi-variable functions, hyper-dual numbers or multi-dual numbers generalize this to compute full Jacobians. The num-dual crate provides these types.
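A minimal dual-number type makes the mechanism concrete (a toy sketch, not the num-dual implementation):

```rust
use std::ops::{Add, Mul};

/// A dual number a + b*eps with eps^2 = 0: `re` carries the value,
/// `eps` carries the derivative.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Dual {
    re: f64,
    eps: f64,
}

impl Add for Dual {
    type Output = Dual;
    fn add(self, rhs: Dual) -> Dual {
        Dual { re: self.re + rhs.re, eps: self.eps + rhs.eps }
    }
}

impl Mul for Dual {
    type Output = Dual;
    fn mul(self, rhs: Dual) -> Dual {
        // (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps: the product rule.
        Dual { re: self.re * rhs.re, eps: self.re * rhs.eps + self.eps * rhs.re }
    }
}

fn constant(c: f64) -> Dual {
    Dual { re: c, eps: 0.0 }
}

/// f(x) = x^2 + 3x, written once, evaluated for value and derivative.
fn f(x: Dual) -> Dual {
    x * x + constant(3.0) * x
}

fn main() {
    // Seed eps = 1 to differentiate with respect to x.
    let x = Dual { re: 2.0, eps: 1.0 };
    let y = f(x);
    assert_eq!(y.re, 10.0); // f(2)  = 4 + 6
    assert_eq!(y.eps, 7.0); // f'(2) = 2*2 + 3
}
```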
The RealField Pattern
All residual functions in calibration-rs are written generic over the scalar type:
```rust
fn reproj_residual_pinhole4_dist5_se3_generic<T: RealField>(
    intr: DVectorView<'_, T>, // [fx, fy, cx, cy]
    dist: DVectorView<'_, T>, // [k1, k2, k3, p1, p2]
    pose: DVectorView<'_, T>, // [qx, qy, qz, qw, tx, ty, tz]
    pw: [f64; 3],             // 3D world point (constant)
    uv: [f64; 2],             // observed pixel (constant)
    w: f64,                   // weight (constant)
) -> SVector<T, 2>
```
When called with `T = f64`, this computes the residual value. When called with a dual number type for `T`, it simultaneously computes the residual and its derivatives with respect to all parameters.
Design Rules for Autodiff-Compatible Code
1. Use `.clone()` liberally: Dual numbers are small structs; cloning is cheap and avoids borrow checker issues with operator overloading.
2. Avoid in-place operations: Operations like `x += y` or `v[i] = expr` can cause issues with dual number tracking. Prefer functional style.
3. Convert constants: Use `T::from_f64(1.0).unwrap()` to convert floating-point literals. Do not use `1.0` directly where a `T` is expected.
4. Keep constant data as `f64`: World points, observed pixels, and weights are constant data — not optimized. Keep them as `f64` and convert to `T` inside the function.
Walkthrough: Reprojection Residual
The core reprojection residual computes the weighted difference between the predicted projection of the world point and the observed pixel: $r = w\, (\hat{u} - uv)$, where $\hat{u} = \pi(K, d, T \cdot p_w)$.
Step by step:
1. Unpack Parameters
```rust
let fx = intr[0].clone();
let fy = intr[1].clone();
let cx = intr[2].clone();
let cy = intr[3].clone();

let k1 = dist[0].clone();
let k2 = dist[1].clone();
// ...
```
2. SE(3) Transform: World to Camera
Apply the pose to the 3D world point using the exponential map or direct quaternion rotation:
$p_c = R(q)\, p_w + t$
The implementation unpacks the quaternion and applies the rotation formula.
3. Pinhole Projection
$x = p_c^{(x)} / p_c^{(z)}, \quad y = p_c^{(y)} / p_c^{(z)}$
4. Apply Distortion
$r^2 = x^2 + y^2, \quad x_d = x\, (1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + \text{tangential terms}$ (and similarly for $y_d$)
5. Apply Intrinsics
$u = f_x x_d + c_x, \quad v = f_y y_d + c_y$
6. Compute Residual
$r = w\, \big( (u, v) - uv \big)$
Every arithmetic operation on T values propagates derivatives automatically through the dual number mechanism.
Backend Integration
The backend wraps each generic residual function in a solver-specific factor struct that implements the solver's cost function interface:
```text
FactorKind::ReprojPointPinhole4Dist5 { pw, uv, w }
    ↓ (compile step)
TinySolverFactor {
    evaluate: |params| {
        reproj_residual_pinhole4_dist5_se3_generic(
            params["cam"], params["dist"], params["pose/k"],
            pw, uv, w,
        )
    }
}
```
The tiny-solver backend then calls evaluate with dual numbers to compute both residuals and Jacobians in a single pass.
SE(3) Exponential Map
A critical autodiff-compatible function is se3_exp<T>(), which maps a 6D tangent vector to an SE(3) transform:
```rust
fn se3_exp<T: RealField>(xi: &[T; 6]) -> [[T; 4]; 4]
```
This implements Rodrigues' formula with a special case for small rotation angles (`theta -> 0`) using Taylor expansions to avoid numerical issues with the dual number's infinitesimal part.
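For the scalar case, the guard looks like the following sketch; the threshold and expansion order are illustrative, not the library's exact choices:

```rust
// Small-angle-safe evaluation of sin(theta)/theta, the coefficient that
// appears in Rodrigues' formula. Below a threshold, a Taylor expansion
// avoids 0/0 and keeps derivatives finite at theta = 0.
fn sinc(theta: f64) -> f64 {
    if theta.abs() < 1e-4 {
        // sin(t)/t ~= 1 - t^2/6 + t^4/120
        1.0 - theta * theta / 6.0 + theta.powi(4) / 120.0
    } else {
        theta.sin() / theta
    }
}

fn main() {
    // Exact at zero, where sin(t)/t would be 0/0.
    assert_eq!(sinc(0.0), 1.0);
    // Matches the direct formula away from zero.
    let t = 1.0_f64;
    assert!((sinc(t) - t.sin() / t).abs() < 1e-12);
    // The two branches agree near the threshold.
    assert!((sinc(1.1e-4) - sinc(0.9e-4)).abs() < 1e-8);
}
```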
Performance Considerations
Autodiff with dual numbers is typically 2-10x slower than hand-coded Jacobians, but:
- Correctness: Eliminates the risk of Jacobian implementation bugs
- Maintainability: Adding a new factor only requires writing the forward evaluation
- Flexibility: The same code works for any parameter configuration
For calibration problems (hundreds to thousands of residuals, tens to hundreds of parameters), the autodiff overhead is negligible compared to the linear solve in each LM iteration.
Levenberg-Marquardt Backend
The TinySolverBackend is the current optimization backend in calibration-rs. It wraps the tiny-solver crate, providing Levenberg-Marquardt optimization with sparse linear solvers, manifold support, and robust loss functions.
Backend Trait
All backends implement the OptimBackend trait:
```rust
pub trait OptimBackend {
    fn solve(
        &self,
        ir: &ProblemIR,
        initial_params: &HashMap<String, DVector<f64>>,
        opts: &BackendSolveOptions,
    ) -> Result<BackendSolution>;
}
```
The backend receives the problem IR, initial parameter values, and solver options, and returns the optimized parameters with a solve report.
Compilation: IR to Solver
The compile() step translates the abstract IR into tiny-solver's concrete types:
Parameters
For each ParamBlock in the IR:
- Create a parameter with the correct dimension
- Set the manifold based on `ManifoldKind`:
  - `Euclidean` → no manifold (standard addition)
  - `SE3` → `SE3Manifold` (7D ambient, 6D tangent)
  - `SO3` → `QuaternionManifold` (4D ambient, 3D tangent)
  - `S2` → `UnitVector3Manifold` (3D ambient, 2D tangent)
- Fix parameters according to `FixedMask`:
  - Euclidean: fix individual indices
  - Manifolds: fix the entire block (all-or-nothing)
- Set bounds if present (box constraints on parameter values)
Residuals
For each ResidualBlock:
- Compile the factor: Create a closure that calls the appropriate generic residual function
- Apply robust loss: Wrap in Huber/Cauchy/Arctan if specified
- Connect parameters: Reference the correct parameter blocks by their compiled IDs
Factor Compilation
Each FactorKind maps to a specific generic function call:
```text
FactorKind::ReprojPointPinhole4Dist5 { pw, uv, w }
    → reproj_residual_pinhole4_dist5_se3_generic(intr, dist, pose, pw, uv, w)

FactorKind::LaserLineDist2D { laser_pixel, w }
    → laser_line_dist_normalized_generic(intr, dist, pose, plane_normal, plane_distance, laser_pixel, w)

FactorKind::Se3TangentPrior { sqrt_info }
    → se3_tangent_prior_generic(pose, sqrt_info)
```
Solver Options
```rust
pub struct BackendSolveOptions {
    pub max_iters: usize,                // Maximum LM iterations (default: 100)
    pub verbosity: u32,                  // 0 = silent, 1 = summary, 2 = per-iteration
    pub linear_solver: LinearSolverType, // SparseCholesky or SparseQR
    pub min_abs_decrease: f64,           // Absolute cost decrease threshold
    pub min_rel_decrease: f64,           // Relative cost decrease threshold
    pub min_error: f64,                  // Minimum cost to stop early
    pub initial_lambda: Option<f64>,     // Initial damping (None = auto)
}

pub enum LinearSolverType {
    SparseCholesky, // Default: fast for well-conditioned problems
    SparseQR,       // More robust for ill-conditioned problems
}
```
Choosing the Linear Solver
- SparseCholesky (default): Factors the normal equations `J^T J dx = -J^T r` directly. Fast but can fail if `J^T J` is poorly conditioned.
- SparseQR: Factors `J` directly (QR decomposition). More robust but slower. Use when Cholesky fails or when the problem has near-singular directions.
Solution
```rust
pub struct BackendSolution {
    pub params: HashMap<String, DVector<f64>>, // Optimized values by name
    pub report: SolveReport,
}

pub struct SolveReport {
    pub initial_cost: f64,
    pub final_cost: f64,
    pub iterations: usize,
    pub termination: TerminationReason,
}
```
The cost is `0.5 * Σ ||r_i||^2` (half the sum of squared residuals). Problem-specific code extracts domain types (cameras, poses, planes) from the raw parameter vectors.
Typical Convergence
For a well-initialized planar intrinsics problem:
- Initial cost: determined by the quality of the linear initialization
- Final cost: corresponds to sub-pixel residuals
- Iterations: 10-50 (depends on problem size and initial quality)
- Termination: Usually relative decrease below threshold
Error Handling
The backend propagates errors for:
- Missing initial values for a parameter block
- Manifold dimension mismatch
- Linear solver failure (singular system)
- NaN/Inf in residual evaluation
Factor Catalog Reference
This chapter is a complete reference for all factor types (residual computations) supported by the optimization IR. Each factor defines a residual function connecting specific parameter blocks.
Reprojection Factors
All reprojection factors compute the weighted pixel residual `r = w * (project(p_w) - uv)`, where `project` is the factor-specific projection pipeline.
They differ in the transform chain from world point to pixel.
ReprojPointPinhole4
Parameters: [intrinsics(4), pose(7)]
Transform chain: `p_c = pose * p_w`, then pinhole projection with intrinsics only (no distortion).
Use case: Distortion-free cameras or when distortion is handled externally.
ReprojPointPinhole4Dist5
Parameters: [intrinsics(4), distortion(5), pose(7)]
Transform chain: `p_c = pose * p_w`, then pinhole + Brown-Conrady distortion + intrinsics.
Use case: Standard single-camera calibration (planar intrinsics).
ReprojPointPinhole4Dist5Scheimpflug2
Parameters: [intrinsics(4), distortion(5), sensor(2), pose(7)]
Transform chain: `p_c = pose * p_w`, then pinhole + distortion + Scheimpflug tilt + intrinsics.
Use case: Laser profilers with tilted sensors.
ReprojPointPinhole4Dist5TwoSE3
Parameters: [intrinsics(4), distortion(5), extrinsics(7), rig_pose(7)]
Transform chain: `p_c = extrinsics * rig_pose * p_w`
The camera pose is composed from two SE(3) transforms: the camera-to-rig extrinsics and the rig-to-target pose.
Use case: Multi-camera rig calibration.
ReprojPointPinhole4Dist5HandEye
Parameters: [intrinsics(4), distortion(5), extrinsics(7), handeye(7), target_pose(7)]
Per-residual data: `robot_pose` (`base_se3_gripper`, known from robot kinematics)
Transform chain (eye-in-hand): the camera-to-target pose is composed from the camera extrinsics, the inverted hand-eye transform, the inverted (fixed) robot pose, and the target pose.
Then project `p_c = cam_se3_target * p_w` through the pinhole + distortion model.
For single-camera hand-eye, the extrinsics block is fixed at identity.
Use case: Hand-eye calibration (camera on robot arm).
ReprojPointPinhole4Dist5HandEyeRobotDelta
Parameters: [intrinsics(4), distortion(5), extrinsics(7), handeye(7), target_pose(7), robot_delta(7)]
Per-residual data: robot_pose
Transform chain: Same as HandEye, but with a per-view SE(3) correction `robot_delta` composed with the robot pose.
The correction is regularized by an Se3TangentPrior factor.
Use case: Hand-eye calibration with imprecise robot kinematics (accounts for robot pose uncertainty).
Laser Factors
Both laser factors have residual dimension 1 and connect [intrinsics(4), distortion(5), pose(7), plane(3+1)].
LaserPlanePixel
Computation:
- Undistort laser pixel to normalized coordinates
- Back-project to a ray in camera frame
- Intersect ray with target plane (known from pose)
- Compute 3D point in camera frame
- Measure signed distance from 3D point to laser plane
Residual: the signed point-to-plane distance, in meters.
LaserLineDist2D
Computation:
- Compute 3D intersection line of laser plane and target plane (in camera frame)
- Project this line onto the normalized camera plane
- Undistort laser pixel to normalized coordinates
- Measure perpendicular distance from pixel to projected line (2D geometry)
- Scale by the focal length for pixel-comparable units
Residual: the scaled perpendicular distance, in effective pixels.
Advantages over LaserPlanePixel: Undistortion done once per pixel (not per-iteration), residuals in pixel units (directly comparable to reprojection error), simpler 2D geometry.
Default: This is the recommended laser residual type.
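The 2D geometry at the heart of step 4 is an ordinary point-to-line distance. A sketch with a hypothetical helper (the library's internal function names may differ):

```rust
// Perpendicular distance from a 2D point (x, y) to the line a*x + b*y + c = 0,
// as used conceptually by LaserLineDist2D on the normalized image plane.
// Multiplying the result by a focal-length factor yields effective pixels.
fn point_line_distance(a: f64, b: f64, c: f64, x: f64, y: f64) -> f64 {
    (a * x + b * y + c).abs() / (a * a + b * b).sqrt()
}

fn main() {
    // Distance from (3, 4) to the x-axis (0*x + 1*y + 0 = 0) is 4.
    let d = point_line_distance(0.0, 1.0, 0.0, 3.0, 4.0);
    assert!((d - 4.0).abs() < 1e-12);
    println!("distance = {d}");
}
```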
Prior Factors
Se3TangentPrior
Parameters: [se3_param(7)]
Computation: Maps the SE(3) parameter to its tangent vector via the logarithm map and divides component-wise by the prior standard deviations: `r_i = xi_i / sigma_i`,
where `xi = log(T)` is the 6D tangent vector.
Use case: Zero-mean Gaussian prior on SE(3) parameters. Applied to robot pose corrections (robot_delta) to penalize deviations from the nominal robot kinematics.
Effect: Adds a regularization term to the cost function.
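A sketch of the whitening step, assuming the tangent vector `xi` has already been computed by the log map (`prior_residual` is a hypothetical helper, not the library's API):

```rust
// Whitened prior residual on the 6D SE(3) tangent vector:
// r_i = xi_i / sigma_i, so a deviation of one sigma costs one unit.
fn prior_residual(xi: [f64; 6], sigma: [f64; 6]) -> [f64; 6] {
    let mut r = [0.0; 6];
    for i in 0..6 {
        r[i] = xi[i] / sigma[i];
    }
    r
}

fn main() {
    // Small rotation (first 3) and translation (last 3) deviations,
    // with roughly 1-degree and 5-mm prior sigmas (illustrative values).
    let xi = [0.01, 0.0, 0.0, 0.002, 0.0, 0.0];
    let sigma = [0.017, 0.017, 0.017, 0.005, 0.005, 0.005];
    let r = prior_residual(xi, sigma);
    assert!((r[0] - 0.01 / 0.017).abs() < 1e-12);
    assert!((r[3] - 0.4).abs() < 1e-12);
}
```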
Summary Table
| Factor | Params | Res. dim | Units | Domain |
|---|---|---|---|---|
| ReprojPointPinhole4 | 2 blocks | 2 | pixels | Simple pinhole |
| ReprojPointPinhole4Dist5 | 3 blocks | 2 | pixels | Standard calibration |
| ReprojPointPinhole4Dist5Scheimpflug2 | 4 blocks | 2 | pixels | Tilted sensor |
| ReprojPointPinhole4Dist5TwoSE3 | 4 blocks | 2 | pixels | Multi-camera rig |
| ReprojPointPinhole4Dist5HandEye | 5 blocks | 2 | pixels | Hand-eye |
| ReprojPointPinhole4Dist5HandEyeRobotDelta | 6 blocks | 2 | pixels | Hand-eye + robot refinement |
| LaserPlanePixel | 4 blocks | 1 | meters | Laser triangulation |
| LaserLineDist2D | 4 blocks | 1 | pixels | Laser triangulation |
| Se3TangentPrior | 1 block | 6 | normalized | Regularization |
Planar Intrinsics Calibration
This is the most common calibration workflow: estimate camera intrinsics and lens distortion from multiple views of a planar calibration board. It combines Zhang's linear initialization with Levenberg-Marquardt bundle adjustment.
Problem Formulation
Parameters:
- Intrinsics: `fx, fy, cx, cy` (4 scalars)
- Distortion: `k1, k2, k3, p1, p2` (5 scalars)
- Per-view poses: `T_1, ..., T_M`, SE(3) transforms (6 DOF each)
Total: `9 + 6M` parameters.
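As a worked check of the parameter count (4 intrinsics + 5 distortion + 6 DOF per view):

```rust
// Total optimized parameters for planar intrinsics calibration.
fn param_count(num_views: usize) -> usize {
    4 + 5 + 6 * num_views
}

fn main() {
    // A single view contributes 6 pose DOF on top of the 9 camera parameters.
    assert_eq!(param_count(1), 15);
    // A typical 20-view dataset: 9 + 6 * 20 = 129 parameters.
    assert_eq!(param_count(20), 129);
}
```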
Observations: For each view `v` and board point `j`:
- Known 3D position `p_j` (on the board, at `z = 0`)
- Observed pixel `u_vj`
Objective: Minimize total reprojection error:
`E = Σ_v Σ_j || π(K, d, T_v, p_j) - u_vj ||^2`
where `π` is the full camera projection pipeline: SE(3) transform → pinhole → distortion → intrinsics.
With robust loss `ρ`, each squared residual norm is replaced by `ρ(||r||^2)`.
Two-Step Pipeline
Step 1: Linear Initialization (step_init)
- Homographies: Compute `H_v` for each view via DLT (from board points at `z = 0` to observed pixels)
- Intrinsics: Estimate `fx, fy, cx, cy` from homographies using Zhang's method, iteratively refined with distortion estimation (see Iterative Intrinsics)
- Distortion: Estimate distortion from homography residuals (see Distortion Fit)
- Poses: Decompose each homography to recover the per-view pose `T_v` (see Pose from Homography)
After initialization, intrinsics are typically within 10-40% of the true values.
Step 2: Non-Linear Refinement (step_optimize)
Constructs the optimization problem as IR:
- Parameter blocks: `"cam"` (4D, Euclidean), `"dist"` (5D, Euclidean), `"pose/0"` ... `"pose/M-1"` (7D, SE3)
- Residual blocks: One `ReprojPointPinhole4Dist5` per observation (2D residual)
- Backend: Levenberg-Marquardt via `TinySolverBackend`
After optimization, expect <2% intrinsics error and <1 px mean reprojection error.
Configuration
```rust
pub struct PlanarIntrinsicsConfig {
    // Initialization
    pub init_iterations: usize,       // Iterative intrinsics iterations (default: 2)
    pub fix_k3_in_init: bool,         // Fix k3 during init (default: true)
    pub fix_tangential_in_init: bool, // Fix p1, p2 during init (default: false)
    pub zero_skew: bool,              // Enforce zero skew (default: true)

    // Optimization
    pub max_iters: usize,                  // LM iterations (default: 100)
    pub verbosity: u32,                    // Solver output level
    pub robust_loss: RobustLoss,           // Robust loss function (default: None)
    pub fix_intrinsics: IntrinsicsFixMask, // Fix specific intrinsics
    pub fix_distortion: DistortionFixMask, // Fix specific distortion params
    pub fix_poses: Vec<usize>,             // Fix specific view poses
}
```
Fix Masks
Fine-grained control over which parameters are optimized:
```rust
// Fix cx, cy but optimize fx, fy
session.update_config(|c| {
    c.fix_intrinsics = IntrinsicsFixMask {
        fx: false,
        fy: false,
        cx: true,
        cy: true,
    };
})?;

// Fix k3 and tangential distortion
session.update_config(|c| {
    c.fix_distortion = DistortionFixMask {
        k1: false,
        k2: false,
        k3: true,
        p1: true,
        p2: true,
    };
})?;
```
Complete Example
```rust
use vision_calibration::prelude::*;
use vision_calibration::planar_intrinsics::{step_init, step_optimize};

let mut session = CalibrationSession::<PlanarIntrinsicsProblem>::new();
session.set_input(dataset)?;

// Optional: customize configuration
session.update_config(|c| {
    c.max_iters = 50;
    c.robust_loss = RobustLoss::Huber { scale: 2.0 };
})?;

// Run pipeline
step_init(&mut session, None)?;

// Inspect initialization
let init_k = session.state.initial_intrinsics.as_ref().unwrap();
println!("Init fx={:.1}, fy={:.1}", init_k.fx, init_k.fy);

step_optimize(&mut session, None)?;

// Export results
let export = session.export()?;
let k = export.params.intrinsics();
println!("Final fx={:.1}, fy={:.1}", k.fx, k.fy);
println!("Reprojection error: {:.4} px", export.mean_reproj_error);
```
Filtering (Optional Step)
After optimization, views or individual observations with high reprojection error can be filtered out:
```rust
use vision_calibration::planar_intrinsics::{step_filter, FilterOptions};

let filter_opts = FilterOptions {
    max_reproj_error: 2.0,     // Remove observations > 2 px
    min_points_per_view: 10,   // Minimum points to keep a view
    remove_sparse_views: true, // Drop views below threshold
};
step_filter(&mut session, filter_opts)?;

// Re-optimize with cleaned data
step_optimize(&mut session, None)?;
```
OpenCV equivalence: `cv::calibrateCamera` performs both initialization and optimization internally. calibration-rs separates these steps for inspection and customization.
Accuracy Expectations
| Stage | Intrinsics error | Reprojection error |
|---|---|---|
| After `step_init` | 10-40% | Not computed |
| After `step_optimize` | <2% | <1 px mean |
| After filtering + re-optimize | <1% | <0.5 px mean |
Input Requirements
- Minimum 3 views (for Zhang's method with skew)
- Minimum 4 points per view (for homography estimation)
- View diversity: Views should include rotation around both axes and vary in distance
- Board coverage: Points should span the full image area for good distortion estimation
Planar Intrinsics with Real Data
[COLLAB] This chapter requires user collaboration for dataset-specific details, data collection advice, and practical tips.
This chapter walks through the planar_real example, which calibrates a camera from real chessboard images.
Dataset
The example uses images from `data/stereo/imgs/leftcamera/`:
- Pattern: 7×11 chessboard, 30 mm square size
- Images: Multiple PNG files (`Im_L_*.png`)
- Camera: Standard industrial camera (specific model TBD)
Workflow
1. Corner Detection
The example uses the chess-corners crate for chessboard detection:
```rust
use chess_corners::{ChessboardParams, ChessConfig, detect_chessboard};

let board_params = ChessboardParams {
    rows: 7,
    cols: 11,
    square_size: 0.03, // 30 mm
};
```
Each image is processed independently. Failed detections are skipped — the workflow continues with the views that succeed.
2. Dataset Construction
Detected corners are assembled into CorrespondenceView structures with matched 3D board points:
```rust
let views: Vec<View<NoMeta>> = images
    .iter()
    .filter_map(|img| {
        let corners = detect_chessboard(img, &config)?;
        let obs = CorrespondenceView::new(board_3d.clone(), corners);
        Some(View::without_meta(obs))
    })
    .collect();

let dataset = PlanarDataset::new(views)?;
```
3. Calibration
The pipeline is identical to the synthetic case:
```rust
let mut session = CalibrationSession::<PlanarIntrinsicsProblem>::new();
session.set_input(dataset)?;

step_init(&mut session, None)?;
step_optimize(&mut session, None)?;

let export = session.export()?;
```
Interpreting Results
Key outputs to examine:
- Focal lengths (`fx`, `fy`): Should match the expected value based on sensor size and lens focal length (`f_px = f_mm / pixel pitch`; e.g., a 1/2" sensor with an 8 mm lens).
- Principal point (`cx`, `cy`): Should be near the image center. Large offsets may indicate a decentered lens.
- Distortion (`k1`, `k2`): Negative `k1` indicates barrel distortion (common). A large magnitude suggests a wide-angle lens.
- Reprojection error: <1 px is good. >2 px suggests problems with corner detection or insufficient view diversity.
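The focal-length sanity check can be made concrete with `f_px = f_mm / pixel pitch`. The sensor width and resolution below are illustrative assumptions, not the dataset's actual camera:

```rust
// Expected focal length in pixels from lens focal length and pixel pitch.
fn focal_px(focal_mm: f64, sensor_width_mm: f64, image_width_px: f64) -> f64 {
    let pixel_pitch_mm = sensor_width_mm / image_width_px;
    focal_mm / pixel_pitch_mm
}

fn main() {
    // Assumptions: a 1/2" sensor is roughly 6.4 mm wide; suppose 1280 px
    // across and an 8 mm lens. Then the pixel pitch is 5 um and
    // f_px = 8 mm / 5 um = 1600 px.
    let f = focal_px(8.0, 6.4, 1280.0);
    assert!((f - 1600.0).abs() < 1e-6);
    println!("expected fx ~ {f:.0} px");
}
```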
Comparison with Synthetic Data
| Aspect | Synthetic | Real |
|---|---|---|
| Corner accuracy | Exact (no detection noise) | ~0.1-0.5 px (detector dependent) |
| Distortion | Known ground truth | Unknown, estimated |
| View coverage | Controlled | May have gaps |
| Typical reprojection error | <0.01 px | 0.1-1.0 px |
Practical Tips
- Use 15-30 images with diverse viewpoints
- Cover the full image area — points only near the center poorly constrain distortion
- Include tilted views — not just frontal. Rotation around both axes constrains all intrinsic parameters
- Check corner ordering — mismatched 2D-3D correspondences cause calibration failure
- Start with `fix_k3: true`; only estimate `k3` if reprojection error is high and you suspect higher-order radial distortion
Running the Example
```sh
cargo run -p vision-calibration --example planar_real
```
The example prints per-step results including initialization accuracy, optimization convergence, and final calibrated parameters.
Single-Camera Hand-Eye Calibration
Hand-eye calibration estimates the rigid transform between a camera and a robot's gripper (or base). This workflow combines intrinsics calibration with hand-eye estimation in a 4-step pipeline.
Problem Formulation
Transformation Chain (Eye-in-Hand)
For a camera mounted on the robot gripper:
`cam_se3_target = cam_se3_gripper * gripper_se3_base * base_se3_target`
where:
- `cam_se3_gripper = gripper_se3_camera⁻¹`: camera-to-gripper (the calibrated hand-eye transform, inverted)
- `gripper_se3_base = base_se3_gripper⁻¹`: gripper-to-base (known from robot kinematics, inverted)
- `base_se3_target`: base-to-target (the target's pose in the robot base frame, also calibrated)
Equivalently: `base_se3_target = base_se3_gripper * gripper_se3_camera * cam_se3_target`.
Eye-to-Hand Variant
For a fixed camera observing a target on the gripper:
`cam_se3_target = cam_se3_base * base_se3_gripper * gripper_se3_target`
where `cam_se3_base` is the camera-to-base transform and `gripper_se3_target` is the target-to-gripper transform.
Optimization Objective
Minimize reprojection error across all views, jointly over all parameters:
`E = Σ_v Σ_j || π(K, d, cam_se3_target_v, p_j) - u_vj ||^2`
where `cam_se3_target_v` depends on the known robot pose for view `v`, the hand-eye transform, and the target pose.
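The eye-in-hand composition order can be sanity-checked with a minimal homogeneous-matrix sketch. All values are assumptions for illustration (translations only, identity rotations); the library uses proper SE(3) types instead:

```rust
// 4x4 homogeneous transforms; with identity rotations, composition
// simply adds the translations, which makes the chain easy to verify.
type Mat4 = [[f64; 4]; 4];

fn translation(t: [f64; 3]) -> Mat4 {
    let mut m = [[0.0; 4]; 4];
    for i in 0..4 {
        m[i][i] = 1.0;
    }
    for i in 0..3 {
        m[i][3] = t[i];
    }
    m
}

fn mul(a: &Mat4, b: &Mat4) -> Mat4 {
    let mut c = [[0.0; 4]; 4];
    for i in 0..4 {
        for j in 0..4 {
            for k in 0..4 {
                c[i][j] += a[i][k] * b[k][j];
            }
        }
    }
    c
}

fn main() {
    // Hypothetical setup: camera 10 cm above the gripper, robot 0.5 m
    // along -Y from base (inverted pose), target 1 m in front of the base.
    let cam_se3_gripper = translation([0.0, 0.0, 0.1]);
    let gripper_se3_base = translation([0.0, -0.5, 0.0]);
    let base_se3_target = translation([1.0, 0.0, 0.0]);

    // cam_se3_target = cam_se3_gripper * gripper_se3_base * base_se3_target
    let cam_se3_target = mul(&mul(&cam_se3_gripper, &gripper_se3_base), &base_se3_target);

    assert_eq!(cam_se3_target[0][3], 1.0);
    assert_eq!(cam_se3_target[1][3], -0.5);
    assert_eq!(cam_se3_target[2][3], 0.1);
}
```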
4-Step Pipeline
Step 1: Intrinsics Initialization (step_intrinsics_init)
Runs the Iterative Intrinsics solver to estimate the intrinsics and distortion from the calibration board observations, ignoring robot poses entirely.
Step 2: Intrinsics Optimization (step_intrinsics_optimize)
Runs planar intrinsics bundle adjustment to refine the intrinsics, distortion, and per-view camera-to-target poses. This gives accurate per-view poses needed for hand-eye initialization.
Step 3: Hand-Eye Initialization (step_handeye_init)
Uses the Tsai-Lenz linear method (see Hand-Eye Linear):
- Compute relative motions from robot poses and camera poses
- Filter pairs by rotation magnitude (reject small rotations)
- Estimate rotation via quaternion SVD
- Estimate translation via linear least squares
- Estimate the target pose `base_se3_target` from the hand-eye and camera poses
Step 4: Hand-Eye Optimization (step_handeye_optimize)
Joint non-linear optimization of all parameters:
Parameter blocks:
"cam"(4D, Euclidean): intrinsics"dist"(5D, Euclidean): distortion"handeye"(7D, SE3): gripper-to-camera transform"target"(7D, SE3): base-to-target transform
Residual blocks: ReprojPointPinhole4Dist5HandEye per observation.
The robot poses are not optimized — they are known constants from the robot kinematics (passed as per-residual data).
Optional: Robot Pose Refinement
If robot pose accuracy is suspect, the pipeline supports per-view SE(3) corrections:
Additional parameter blocks: "robot_delta/v" (7D, SE3) per view
Additional residuals: one `Se3TangentPrior` per view, with configurable rotation and translation sigmas (`robot_rot_sigma`, `robot_trans_sigma`).
This penalizes deviations from the nominal robot poses, allowing small corrections while preventing the optimizer from absorbing all error into robot pose changes.
Configuration
```rust
pub struct SingleCamHandeyeConfig {
    // Intrinsics init
    pub intrinsics_init_iterations: usize,
    pub fix_k3: bool,         // Fix k3 (default: true)
    pub fix_tangential: bool, // Fix p1, p2 (default: false)
    pub zero_skew: bool,      // Enforce zero skew (default: true)

    // Hand-eye
    pub handeye_mode: HandEyeMode, // EyeInHand or EyeToHand
    pub min_motion_angle_deg: f64, // Filter small rotations in Tsai-Lenz

    // Optimization
    pub max_iters: usize, // LM iterations (default: 100)
    pub verbosity: usize,
    pub robust_loss: RobustLoss,

    // Robot pose refinement
    pub refine_robot_poses: bool, // Enable per-view corrections
    pub robot_rot_sigma: f64,     // Rotation prior sigma (radians)
    pub robot_trans_sigma: f64,   // Translation prior sigma (meters)
}
```
Input Format
Each view is a View<HandeyeMeta> (aliased as SingleCamHandeyeView):
```rust
pub struct HandeyeMeta {
    pub base_se3_gripper: Iso3, // Robot pose (from kinematics)
}

// Type alias
pub type SingleCamHandeyeView = View<HandeyeMeta>;
// Where View<Meta> has fields: obs: CorrespondenceView, meta: Meta
```
Complete Example
```rust
use vision_calibration::prelude::*;
use vision_calibration::single_cam_handeye::*;

let views: Vec<SingleCamHandeyeView> = /* load data */;
let input = SingleCamHandeyeInput::new(views)?;

let mut session = CalibrationSession::<SingleCamHandeyeProblem>::new();
session.set_input(input)?;

step_intrinsics_init(&mut session, None)?;
step_intrinsics_optimize(&mut session, None)?;
step_handeye_init(&mut session, None)?;
step_handeye_optimize(&mut session, None)?;

let export = session.export()?;
println!("Hand-eye: {:?}", export.gripper_se3_camera);
println!("Target in base: {:?}", export.base_se3_target);
println!("Reprojection error: {:.4} px", export.mean_reproj_error);
```
Practical Considerations
- Rotation diversity is critical for Tsai-Lenz initialization. Include rotations around at least 2 axes.
- 5-10 views minimum, with 10-20 recommended for robustness.
- Robot accuracy: If the robot's reported poses have errors > 1mm or > 0.1°, enable robot pose refinement.
- Initialization failure: If Tsai-Lenz produces a poor estimate (large reprojection error after optimization), the most likely cause is insufficient rotation diversity.
OpenCV equivalence: `cv::calibrateHandEye` provides the linear initialization step. calibration-rs adds joint non-linear refinement and optional robot pose correction.
Hand-Eye with KUKA Robot
[COLLAB] This chapter requires user collaboration for KUKA dataset details, data collection protocol, and practical tips.
This chapter walks through the handeye_session example, which performs hand-eye calibration using a real KUKA robot dataset.
Dataset
Located at `data/kuka_1/`:
- Images: 30 PNG files (`01.png` through `30.png`)
- Robot poses: `RobotPosesVec.txt` (30 poses in matrix format)
- Calibration board: 17×28 chessboard, 20 mm square size
Data Loading
The example loads images and robot poses, detects chessboard corners, and constructs the input:
```rust
// Load robot poses from 4×4 matrix format
let robot_poses: Vec<Iso3> = load_robot_poses("data/kuka_1/RobotPosesVec.txt")?;

// Detect corners per image
let views: Vec<SingleCamHandeyeView> = images
    .iter()
    .zip(&robot_poses)
    .filter_map(|(img, pose)| {
        let corners = detect_chessboard(img, &config)?;
        Some(SingleCamHandeyeView {
            obs: CorrespondenceView::new(board_3d.clone(), corners),
            meta: HandeyeMeta { base_se3_gripper: *pose },
        })
    })
    .collect();
```
Typically 20-25 of the 30 images yield successful corner detections.
Running the Example
```sh
cargo run -p vision-calibration --example handeye_session
```
The example reports:
- Dataset summary (total images, used views, skipped views)
- Per-step results (initialization, optimization)
- Final calibrated parameters (intrinsics, distortion, hand-eye transform, target pose)
- Robot pose refinement deltas (if enabled)
Interpreting Results
Key outputs:
- Hand-eye transform (`gripper_se3_camera`): Translation magnitude should match the physical camera-to-gripper distance. Rotation should reflect the mounting orientation.
- Target-in-base (`base_se3_target`): The calibration board's position in the robot base frame. Verify against the known physical setup.
- Reprojection error: <1 px indicates a good calibration. >3 px suggests problems.
- Robot pose deltas: If enabled, large deltas (>1° rotation, >5mm translation) suggest robot kinematic inaccuracies.
Common Failure Modes
- Insufficient rotation diversity: All robot poses rotate around the same axis (e.g., only wrist rotation). The Tsai-Lenz initialization will fail or produce a poor estimate.
- Incorrect pose convention: Robot poses must be `base_se3_gripper` (base-to-gripper). If the convention is inverted, the calibration will diverge.
- Mismatched corner ordering: If the chessboard detector assigns corners in a different order than expected, the 2D-3D correspondences are wrong.
- Robot pose timestamps: If images and robot poses are not synchronized, the calibration will fail.
Data Collection Recommendations
- Rotation diversity: Include poses with significant roll, pitch, and yaw. Avoid pure translations or rotations around a single axis.
- Target visibility: Ensure the calibration board is fully visible in all images. Partial visibility causes corner detection failure.
- Stable poses: Take images when the robot is stationary. Motion blur degrades corner detection.
- Coverage: Vary the distance to the board and the position within the image.
Multi-Camera Rig Extrinsics
A multi-camera rig is a set of cameras rigidly attached to a common frame. Rig extrinsics calibration estimates the relative pose of each camera within the rig, in addition to per-camera intrinsics and distortion.
Problem Formulation
Parameters
- Per-camera intrinsics: `K_k` (one per camera)
- Per-camera distortion: `d_k` (one per camera)
- Per-camera extrinsics: `cam_se3_rig_k`, camera-to-rig transforms (one per camera, reference camera = identity)
- Per-view rig poses: `rig_se3_target_v`, rig-to-target transforms (one per view)
Transform Chain
For camera `k` in view `v`: `p_c = cam_se3_target * p_w`,
where `cam_se3_target = cam_se3_rig_k * rig_se3_target_v`.
Objective
Minimize the total reprojection error over all cameras, views, and points: `E = Σ_k Σ_v Σ_j || π_k(cam_se3_rig_k * rig_se3_target_v * p_j) - u_kvj ||^2`. This jointly optimizes all cameras' parameters and all rig geometry. Each observation depends on two SE(3) transforms composed together (the `ReprojPointPinhole4Dist5TwoSE3` factor).
4-Step Pipeline
Step 1: Per-Camera Intrinsics Initialization (step_intrinsics_init_all)
For each camera independently:
- Compute homographies from the views where that camera has observations
- Run iterative intrinsics estimation (Zhang + distortion fit)
- Recover per-camera, per-view poses
Step 2: Per-Camera Intrinsics Optimization (step_intrinsics_optimize_all)
For each camera independently:
- Run planar intrinsics bundle adjustment
- Refine the intrinsics, distortion, and per-view poses
Step 3: Rig Extrinsics Initialization (step_rig_init)
Uses the linear initialization from Multi-Camera Rig Initialization:
- Define the rig frame by the reference camera (its extrinsics are identity)
- For each non-reference camera, compute `cam_se3_rig_k` by averaging its per-view pose relative to the reference camera across views
- Compute rig-to-target poses from the reference camera's poses
Step 4: Rig Bundle Adjustment (step_rig_optimize)
Joint optimization of all parameters:
Parameter blocks:
"cam/k"(4D): per-camera intrinsics"dist/k"(5D): per-camera distortion"extrinsics/k"(7D, SE3): per-camera camera-to-rig transforms"rig_pose/v"(7D, SE3): per-view rig-to-target poses
Residual blocks: ReprojPointPinhole4Dist5TwoSE3 per observation.
Fixed parameters: The reference camera's extrinsics are fixed at identity (gauge freedom).
Configuration
```rust
pub struct RigExtrinsicsConfig {
    // Per-camera intrinsics options
    pub intrinsics_init_iterations: usize,
    pub fix_k3: bool,
    pub fix_tangential: bool,
    pub zero_skew: bool,

    // Rig settings
    pub reference_camera_idx: usize,

    // Optimization
    pub max_iters: usize,
    pub verbosity: usize,
    pub robust_loss: RobustLoss,

    // Rig BA options
    pub refine_intrinsics_in_rig_ba: bool,
    pub fix_first_rig_pose: bool,
}
```
Input Format
```rust
pub struct RigDataset<Meta> {
    pub num_cameras: usize,
    pub views: Vec<RigView<Meta>>,
}

pub struct RigView<Meta> {
    pub obs: RigViewObs, // Per-camera observations (Option per camera)
    pub meta: Meta,
}
```
Each view has an Option<CorrespondenceView> per camera — cameras that don't see the target in a given view have None.
Per-Camera Reprojection Error
After optimization, per-camera reprojection errors are computed independently:
```rust
let export = session.export()?;
for (cam_idx, error) in export.per_cam_reproj_errors.iter().enumerate() {
    println!("Camera {}: {:.4} px", cam_idx, error);
}
println!("Overall: {:.4} px", export.mean_reproj_error);
```
Stereo Baseline
For a 2-camera rig, the baseline (distance between camera centers) is the translation norm of camera 1's extrinsics: `baseline = || t(cam_se3_rig_1) ||`,
since `cam_se3_rig_0 = identity` (camera 0 is the reference).
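A minimal sketch of the baseline computation; the translation values are hypothetical, not from any dataset:

```rust
// Baseline = Euclidean norm of camera 1's camera-to-rig translation
// (camera 0 is the reference, so its extrinsics are identity).
fn baseline(t: [f64; 3]) -> f64 {
    (t[0] * t[0] + t[1] * t[1] + t[2] * t[2]).sqrt()
}

fn main() {
    // Hypothetical 12 cm horizontal baseline.
    let b = baseline([0.12, 0.0, 0.0]);
    assert!((b - 0.12).abs() < 1e-12);
    println!("baseline = {:.1} mm", b * 1000.0);
}
```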
Complete Example
```rust
use vision_calibration::prelude::*;
use vision_calibration::rig_extrinsics::*;

let mut session = CalibrationSession::<RigExtrinsicsProblem>::new();
session.set_input(rig_dataset)?;

step_intrinsics_init_all(&mut session, None)?;
step_intrinsics_optimize_all(&mut session, None)?;
step_rig_init(&mut session)?;
step_rig_optimize(&mut session, None)?;

let export = session.export()?;
let baseline = export.cam_se3_rig[1].translation.vector.norm();
println!("Stereo baseline: {:.1} mm", baseline * 1000.0);
```
OpenCV equivalence: `cv::stereoCalibrate` for 2-camera rigs. calibration-rs generalizes to N cameras.
Stereo Rig with Real Data
[COLLAB] This chapter requires user collaboration for dataset-specific details and verification strategies.
This chapter walks through the stereo_session example, which calibrates a stereo camera rig from synchronized left/right image pairs.
Dataset
Located at `data/stereo/imgs/`:
- Left camera: `leftcamera/Im_L_*.png`
- Right camera: `rightcamera/Im_R_*.png`
- Pattern: 7×11 chessboard, 30 mm square size
- Synchronized: Left and right images are captured simultaneously
Workflow
1. Load Stereo Pairs
The example loads images from both camera directories and pairs them by filename:
```rust
let board = ChessboardParams { rows: 7, cols: 11, square_size: 0.03 };

let views: Vec<RigView<NoMeta>> = image_pairs
    .iter()
    .filter_map(|(left_img, right_img)| {
        let left_corners = detect_chessboard(left_img, &config)?;
        let right_corners = detect_chessboard(right_img, &config)?;
        // Both cameras must detect corners for a valid view
        Some(RigView::new(vec![
            Some(CorrespondenceView::new(board_3d.clone(), left_corners)),
            Some(CorrespondenceView::new(board_3d.clone(), right_corners)),
        ]))
    })
    .collect();
```
2. Run 4-Step Pipeline
```rust
let mut session = CalibrationSession::<RigExtrinsicsProblem>::new();
session.set_input(RigDataset::new(views, 2)?)?;

step_intrinsics_init_all(&mut session, None)?;
step_intrinsics_optimize_all(&mut session, None)?;
step_rig_init(&mut session)?;
step_rig_optimize(&mut session, None)?;
```
Interpreting Results
Per-Camera Intrinsics
Both cameras should have similar focal lengths (if they use the same lens). Differences in principal point are normal.
Stereo Baseline
The extrinsics give the relative pose between cameras. For a horizontal stereo pair, expect:
- Rotation: primarily around the Y axis (converging or parallel configuration)
- Translation: primarily along X (horizontal baseline)
- Baseline magnitude: should match the physical distance between cameras
Verification
- Epipolar constraint: For correctly calibrated stereo, corresponding points should lie on epipolar lines. Measure the average distance from points to their epipolar lines.
- Rectification: The stereo pair should rectify cleanly — horizontal scanlines in the rectified images should correspond.
- Triangulation: Triangulate known board corners and verify the 3D distances match the board geometry.
Running the Example
```sh
cargo run -p vision-calibration --example stereo_session
cargo run -p vision-calibration --example stereo_session -- --max-views 15
```
The --max-views flag limits the number of image pairs processed, useful for faster iteration during development.
Data Collection Tips
- Both cameras must see the board in each view. Discard pairs where only one camera detects corners.
- Vary board orientation and distance — same advice as single-camera calibration.
- Ensure synchronization — if images are not captured simultaneously, the board may have moved between left and right captures.
- Overlap region — the board should be in the overlapping field of view of both cameras.
Stereo Rig with Real ChArUco Data
This chapter walks through the stereo_charuco_session example, which calibrates a stereo camera rig from synchronized ChArUco board images.
Dataset
The example uses data/stereo_charuco/:
- `cam1/Cam1_*.png` for camera 0
- `cam2/Cam2_*.png` for camera 1
- Pairing by shared filename suffix (deterministic sorted intersection)
The loader ignores non-PNG files (for example Thumbs.db).
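The pairing rule can be sketched as a sorted set intersection. The helper below is a hypothetical stand-in (the real loader's prefix handling may differ): it pairs names by the shared part after the first underscore, and the `BTreeSet` intersection guarantees a deterministic, sorted result.

```rust
use std::collections::BTreeSet;

// Illustrative sketch of the pairing rule (hypothetical helper, not the
// library's loader): match files from the two camera folders by the shared
// filename part after the "CamN_" prefix, using a sorted-set intersection
// so the pairing is deterministic.
fn pair_by_suffix(cam1: &[&str], cam2: &[&str]) -> Vec<(String, String)> {
    let suffix = |name: &str| name.splitn(2, '_').nth(1).map(str::to_string);
    let s1: BTreeSet<String> = cam1.iter().copied().filter_map(suffix).collect();
    let s2: BTreeSet<String> = cam2.iter().copied().filter_map(suffix).collect();
    // BTreeSet intersection iterates in sorted order
    s1.intersection(&s2)
        .map(|s| (format!("Cam1_{s}"), format!("Cam2_{s}")))
        .collect()
}
```

Files without a match in the other folder simply drop out of the intersection, which mirrors the advice to discard views seen by only one camera.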
ChArUco Target Parameters
- Board squares: 22 x 22
- Cell size: 0.45 mm (0.00045 m)
- Marker size scale: 0.75
- Dictionary: `DICT_4X4_1000`
- Layout: OpenCV ChArUco (`OpenCvCharuco`)
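As a quick sanity check on these numbers (illustrative arithmetic, not a library call), the physical board extent and the ArUco marker side follow directly from the cell size and marker scale:

```rust
// Illustrative sanity arithmetic for the target parameters above
// (not a library call). Units are meters.
fn board_dims(squares: u32, cell_m: f64, marker_scale: f64) -> (f64, f64) {
    let board_side = f64::from(squares) * cell_m; // full board extent
    let marker_side = marker_scale * cell_m;      // ArUco marker side
    (board_side, marker_side)
}
```

For the values above this gives a 9.9 mm board side and a 0.3375 mm marker side, which should match the physical target.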
Detector Tuning
For this 2048x1536 dataset, the example increases ChArUco graph spacing:
```rust
params.graph.max_spacing_pix = 120.0;
```
Without this, the default graph spacing may be too strict for this board scale in image space.
Workflow
The example follows the standard rig extrinsics 4-step session pipeline:
1. `step_intrinsics_init_all`
2. `step_intrinsics_optimize_all`
3. `step_rig_init`
4. `step_rig_optimize`
It also runs run_calibration as a convenience-path comparison.
Running the Example
```bash
cargo run -p vision-calibration --example stereo_charuco_session
cargo run -p vision-calibration --example stereo_charuco_session -- --max-views 8
cargo run -p vision-calibration --example stereo_charuco_session -- --max-views=15
```
- Default: uses all detected stereo pairs
- Optional: `--max-views` caps the number of pairs used (deterministic prefix of the sorted pairs)
Output Interpretation
The example prints:
- Dataset summary (`total_pairs`, `used_views`, `skipped_views`, `usable_left`, `usable_right`)
- Per-camera intrinsics and per-camera reprojection errors
- Rig BA mean reprojection error and per-camera BA errors
- Baseline magnitude in meters and millimeters
As with any real-data calibration, verify that:
- Both cameras have enough usable views (at least 3 each)
- Reprojection errors are within your application tolerance
- Estimated baseline is physically plausible for the hardware setup
Multi-Camera Rig Hand-Eye
This is the most complex calibration workflow: a multi-camera rig mounted on a robot arm. It combines per-camera intrinsics calibration, rig extrinsics estimation, and hand-eye calibration in a 6-step pipeline.
Problem Formulation
Transformation Chain
For camera `k` in view `i` with robot pose `base_se3_gripper_i`, the pose of the target in the camera frame composes as (using the library's `a_se3_b` naming, where `a_se3_b` takes points from frame `b` to frame `a`):

```
cam_k_se3_target_i = cam_k_se3_rig
                   · rig_se3_gripper
                   · gripper_se3_base_i
                   · base_se3_target
```

where:
- `cam_k_se3_rig`: camera to rig (rig extrinsics)
- `rig_se3_gripper`: rig to gripper (hand-eye transform)
- `gripper_se3_base_i`: base to gripper (known robot pose for view `i`)
- `base_se3_target`: base to target (calibrated)
Parameters
- Per-camera intrinsics: `fx, fy, cx, cy` (4 scalar parameters per camera)
- Per-camera distortion: `k1, k2, p1, p2, k3` (5 scalar parameters per camera)
- Per-camera extrinsics: camera-to-rig transforms (6 DOF each, reference camera = identity)
- Hand-eye: `gripper_se3_rig` (6 DOF)
- Target pose: `base_se3_target` (6 DOF)
- Optionally: per-view robot corrections (6 DOF per view, regularized)
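To make the parameter count concrete, here is a back-of-envelope helper (illustrative only, not a library function) that totals the degrees of freedom from the block sizes listed above:

```rust
// Back-of-envelope degrees-of-freedom count for a rig hand-eye problem,
// following the block sizes listed above (4 intrinsics + 5 distortion per
// camera, 6 DOF per non-reference camera, 6 hand-eye, 6 target pose).
// Illustrative helper, not part of the library API.
fn rig_handeye_dof(num_cams: usize, num_views: usize, robot_corrections: bool) -> usize {
    let per_cam = 4 + 5;                 // intrinsics + distortion
    let extrinsics = 6 * (num_cams - 1); // reference camera fixed at identity
    let mut dof = per_cam * num_cams + extrinsics + 6 + 6; // + hand-eye + target
    if robot_corrections {
        dof += 6 * num_views; // optional per-view robot pose corrections
    }
    dof
}
```

For a 2-camera rig with 10 views this gives 36 free parameters, or 96 with per-view robot corrections enabled.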
6-Step Pipeline
Steps 1-2: Per-Camera Intrinsics
Same as Rig Extrinsics: initialize and optimize each camera's intrinsics independently.
Steps 3-4: Rig Extrinsics
Same as Rig Extrinsics: initialize camera-to-rig transforms via SE(3) averaging, then jointly optimize the rig geometry.
Step 5: Hand-Eye Initialization (step_handeye_init)
Uses Tsai-Lenz with the rig's reference camera poses and robot poses:
1. Extract relative camera motions from the rig-to-target poses
2. Extract relative robot motions from the base-to-gripper poses
3. Solve the resulting `AX = XB` system for `gripper_se3_rig` (the hand-eye transform)
4. Estimate the target pose in the base frame, `base_se3_target`
Step 6: Hand-Eye Optimization (step_handeye_optimize)
Joint optimization of all parameters:
Parameter blocks:
"cam/k"(4D): per-camera intrinsics"dist/k"(5D): per-camera distortion"extrinsics/k"(7D, SE3): per-camera camera-to-rig"handeye"(7D, SE3): rig-to-gripper"target"(7D, SE3): base-to-target
Factor: ReprojPointPinhole4Dist5HandEye per observation, which composes the full transform chain.
Complete Example
```rust
use vision_calibration::prelude::*;
use vision_calibration::rig_handeye::*;

let mut session = CalibrationSession::<RigHandeyeProblem>::new();
session.set_input(rig_dataset_with_robot_poses)?;

// Per-camera calibration
step_intrinsics_init_all(&mut session, None)?;
step_intrinsics_optimize_all(&mut session, None)?;

// Rig geometry
step_rig_init(&mut session)?;
step_rig_optimize(&mut session, None)?;

// Hand-eye
step_handeye_init(&mut session, None)?;
step_handeye_optimize(&mut session, None)?;

let export = session.export()?;
println!("Mode: {:?}", export.handeye_mode);
if let Some(gripper_se3_rig) = export.gripper_se3_rig {
    println!("gripper_se3_rig: {:?}", gripper_se3_rig);
}
println!(
    "Baseline: {:.1} mm",
    export.cam_se3_rig[1].translation.vector.norm() * 1000.0
);
println!("Per-camera errors: {:?}", export.per_cam_reproj_errors);
```
Gauge Freedom
The system has a gauge freedom: the rig frame origin and the hand-eye transform are coupled. Fixing the reference camera's extrinsics at identity resolves this by defining the rig frame to coincide with camera 0.
Practical Considerations
All the advice from Single-Camera Hand-Eye applies, plus:
- All cameras must observe the target in at least some views for the rig extrinsics to be well-constrained
- Views where only some cameras see the target are handled (missing observations are skipped)
- The hand-eye transform describes gripper-to-rig, not gripper-to-individual-camera. The per-camera offset comes from the rig extrinsics.
Laserline Device Calibration
Laserline calibration jointly estimates camera parameters and a laser plane from observations of both a calibration board and laser line projections on the board. This is used in laser triangulation systems where a camera and laser are rigidly mounted together.
Problem Formulation
Parameters
- Camera intrinsics: `fx, fy, cx, cy`
- Distortion: `k1, k2, p1, p2, k3`
- Sensor tilt: Scheimpflug sensor parameters (optional, for Scheimpflug cameras)
- Per-view poses: camera-to-target SE(3) transforms
- Laser plane: normal `n` and distance `d`, with plane points `p` satisfying `n · p = d`
Observations
Each view provides two types of observations:
- Calibration points: 2D-3D correspondences from the chessboard (same as planar intrinsics)
- Laser pixels: 2D pixel positions where the laser line appears on the target
Objective
The cost function has two terms, one per observation type:

```
C = w_c · Σ ρ_c(r_calib) + w_l · Σ ρ_l(r_laser)
```

where `w_c`, `w_l` are the calibration and laser weights, `ρ_c`, `ρ_l` are (possibly different) robust loss functions, `r_calib` is the standard reprojection residual, and `r_laser` is the laser residual.
Laser Residual Types
Two approaches are implemented for the laser residual, selectable via configuration.
PointToPlane (3D Distance)
Algorithm:
1. Undistort the laser pixel to normalized coordinates
2. Back-project to a ray in the camera frame
3. Intersect the ray with the target plane (the board plane, transformed into the camera frame by the view pose) to get a 3D point `P`
4. Compute the signed distance from `P` to the laser plane: `r = n · P - d`
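These steps can be sketched numerically. The function below is a stand-alone illustration with plain arrays (the library's types and API differ, and distortion handling is omitted); planes are written as `n · p = d` in the camera frame and `ray` is the back-projected direction `(x, y, 1)` of an undistorted laser pixel.

```rust
// Illustrative stand-alone sketch of the PointToPlane residual (plain
// arrays instead of the library's types; distortion handling omitted).
fn dot(a: [f64; 3], b: [f64; 3]) -> f64 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

fn point_to_plane_residual(
    ray: [f64; 3],
    (n_t, d_t): ([f64; 3], f64), // target (board) plane in camera frame
    (n_l, d_l): ([f64; 3], f64), // laser plane in camera frame
) -> f64 {
    // Ray-plane intersection: choose t so that n_t . (t * ray) = d_t
    let t = d_t / dot(n_t, ray);
    let p = [t * ray[0], t * ray[1], t * ray[2]];
    // Signed distance (meters) from the intersection point to the laser plane
    dot(n_l, p) - d_l
}
```

Note that the intersection depends on the per-view pose through the target plane, which is why this variant re-runs undistortion and intersection inside the optimizer loop.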
Residual dimension: 1 (meters)
LineDistNormalized (2D Line Distance) — Default
Algorithm:
1. Compute the 3D intersection line of the laser plane and the target plane (in the camera frame)
2. Project this 3D line onto the normalized camera plane
3. Undistort the laser pixel to normalized coordinates (done once, not per iteration)
4. Measure the perpendicular distance from the undistorted pixel to the projected line
5. Scale by the focal length `f` for pixel-comparable units: `r = f · d_perp`
Residual dimension: 1 (effective pixels)
Comparison
| Property | PointToPlane | LineDistNormalized |
|---|---|---|
| Residual units | meters | pixels |
| Undistortion | Per-iteration | Once per pixel |
| Geometry | Ray-plane intersection | 2D line distance |
| Speed | Slower | Faster |
| Recommended | Alternative | Default |
Both approaches yield similar accuracy in practice (<6% intrinsics error, <5° plane normal error in synthetic tests).
Derivation: Line-Distance Residual
Plane-Plane Intersection Line
The laser plane `(n_l, d_l)` and the target plane `(n_t, d_t)` (normal and camera-frame distance, derived from the view pose) intersect in a 3D line with direction

```
v = n_l × n_t
```

and a support point `p0` lying on both planes, which can be solved as a linear combination `p0 = a·n_l + b·n_t` from the two constraints `n_l · p0 = d_l` and `n_t · p0 = d_t`.
Projection onto Normalized Plane
Project the 3D line onto the `z = 1` normalized plane. The viewing rays through the camera center and the 3D line span a plane with normal `l = p0 × v`, so the projected 2D line is the set of normalized points `(x, y)` satisfying:

```
l_x · x + l_y · y + l_z = 0
```
Perpendicular Distance
For an undistorted pixel `(u, v)` in normalized coordinates, the perpendicular distance to this 2D line is:

```
d_perp = |l_x · u + l_y · v + l_z| / sqrt(l_x^2 + l_y^2)
```
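The whole derivation can be condensed into one stand-alone numeric sketch (plain arrays for illustration; the library's types and the focal-length scaling are omitted):

```rust
// Illustrative stand-alone sketch of the LineDistNormalized chain (plain
// arrays instead of the library's types). Planes are n . p = d in the
// camera frame; (u, v) is the undistorted pixel in normalized coordinates.
fn cross(a: [f64; 3], b: [f64; 3]) -> [f64; 3] {
    [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]
}

fn dot(a: [f64; 3], b: [f64; 3]) -> f64 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

fn line_dist_normalized(
    (n_l, d_l): ([f64; 3], f64), // laser plane
    (n_t, d_t): ([f64; 3], f64), // target plane
    (u, v): (f64, f64),          // undistorted normalized pixel
) -> f64 {
    // Direction of the 3D plane-plane intersection line
    let dir = cross(n_l, n_t);
    // Support point p0 on both planes, written as p0 = a*n_l + b*n_t
    let (nll, ntt, nlt) = (dot(n_l, n_l), dot(n_t, n_t), dot(n_l, n_t));
    let det = nll * ntt - nlt * nlt;
    let (a, b) = ((d_l * ntt - d_t * nlt) / det, (d_t * nll - d_l * nlt) / det);
    let p0 = [
        a * n_l[0] + b * n_t[0],
        a * n_l[1] + b * n_t[1],
        a * n_l[2] + b * n_t[2],
    ];
    // Image of the line on the z = 1 plane: l . (u, v, 1) = 0 with l = p0 x dir
    let l = cross(p0, dir);
    (l[0] * u + l[1] * v + l[2]).abs() / (l[0] * l[0] + l[1] * l[1]).sqrt()
}
```

Since `dir`, `p0`, and `l` depend only on the two planes (not on the pixel), they can be computed once per view inside the optimizer, which is where this variant's speed advantage comes from.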
Configuration
```rust
pub struct LaserlineDeviceConfig {
    // Initialization
    pub init_iterations: usize,          // Iterative intrinsics iterations (default: 2)
    pub fix_k3_in_init: bool,            // Fix k3 during init (default: true)
    pub fix_tangential_in_init: bool,    // Fix p1, p2 during init (default: false)
    pub zero_skew: bool,                 // Enforce zero skew (default: true)
    pub sensor_init: ScheimpflugParams,  // Initial sensor tilt (default: identity)

    // Optimization
    pub max_iters: usize,                // LM iterations (default: 50)
    pub verbosity: usize,
    pub calib_loss: RobustLoss,          // Default: Huber { scale: 1.0 }
    pub laser_loss: RobustLoss,          // Default: Huber { scale: 0.01 }
    pub calib_weight: f64,               // Weight for calibration residuals (default: 1.0)
    pub laser_weight: f64,               // Weight for laser residuals (default: 1.0)
    pub fix_intrinsics: bool,
    pub fix_distortion: bool,
    pub fix_k3: bool,                    // Default: true
    pub fix_sensor: bool,                // Default: true
    pub fix_poses: Vec<usize>,           // Default: vec![0]
    pub fix_plane: bool,
    pub laser_residual_type: LaserlineResidualType, // Default: LineDistNormalized
}
```
Weight Balancing
Since calibration residuals (in pixels) and laser residuals may have different scales, the weights allow balancing their relative influence. A common starting point is calib_weight = 1.0 and laser_weight = 1.0, adjusting if one term dominates.
Complete Example
```rust
use vision_calibration::prelude::*;
use vision_calibration::laserline_device::*;

let mut session = CalibrationSession::<LaserlineDeviceProblem>::new();
session.set_input(laserline_input)?;

run_calibration(&mut session, None)?;

let export = session.export()?;
println!("Plane normal: {:?}", export.estimate.params.plane.normal);
println!("Plane distance: {:.4}", export.estimate.params.plane.distance);
println!("Reprojection error: {:.4} px", export.mean_reproj_error);
println!("Laser error: {:.4}", export.stats.mean_laser_error);
```
Scheimpflug Support
For laser profilers with tilted sensors, the sensor model parameters are jointly optimized. The ReprojPointPinhole4Dist5Scheimpflug2 factor handles the extended camera model.
Laserline with Industrial Data
[COLLAB] This chapter requires user collaboration for industrial application context, data collection protocols, and accuracy requirements.
This chapter discusses practical considerations for laser triangulation calibration in industrial settings.
Industrial Context
Laser triangulation is widely used in industrial inspection:
- Weld seam inspection: Verifying weld bead geometry (width, height, penetration)
- 3D surface profiling: Measuring surface topology for quality control
- Gap and flush measurement: Checking assembly tolerances in automotive manufacturing
- Tire and rubber inspection: Measuring tread depth and sidewall profiles
In each case, a laser line is projected onto the workpiece and a camera observes the deformed line. The calibrated laser plane, combined with camera parameters, allows converting pixel positions to metric 3D coordinates.
Accuracy Requirements
| Application | Typical accuracy needed |
|---|---|
| Weld inspection | 0.1-0.5 mm |
| Surface profiling | 0.01-0.1 mm |
| Gap/flush | 0.05-0.2 mm |
| Coarse 3D scanning | 0.5-2 mm |
Data Collection for Laser Calibration
Calibration Board Setup
The calibration board serves a dual purpose:
- Provides chessboard corners for camera intrinsics calibration
- Provides a known plane to constrain the laser line observations
Capturing Views
For each view:
- Position the calibration board at a known angle to the laser plane
- Capture an image with both the chessboard pattern and the laser line visible
- Record chessboard corners and laser line pixel positions separately
View Diversity
- Board angles: Vary the board tilt so the laser-board intersection line covers different positions in the image
- Board distances: Vary the distance to cover the working range
- Minimum views: 5-10 views with diverse orientations
The laserline_device_session Example
```bash
cargo run -p vision-calibration --example laserline_device_session
```
This example uses synthetic data to demonstrate the workflow. For real data, the laser pixel extraction must be performed by an external algorithm (typically a peak detector applied to each image column).
Interpreting Results
Key outputs:
- Laser plane normal: The direction the laser projects in. Should align with the physical mounting.
- Laser plane distance: Distance from camera center to the laser plane. Should match the physical geometry.
- Mean laser error: The average residual of the laser observations, in pixel units for `LineDistNormalized` and in meters for `PointToPlane`.
DS8 Dataset
The data/DS8/ directory contains a real industrial laser triangulation dataset:
- `ExperimentDetails.txt`: Experiment metadata
- `calibration_object.txt`: Calibration pattern specification
- `images/`: Captured images
- `robot_cali.txt`: Robot calibration data (if applicable)
Troubleshooting
- Poor laser plane estimate: Usually caused by insufficient view diversity or laser pixels too close to collinear (all from similar board angles)
- High laser residuals: May indicate laser pixel detection errors, or that the laser is not well-described by a single plane (diverging laser, curved projection)
- Scheimpflug convergence: If the sensor tilt parameters diverge, try running with `fix_sensor: true` for the first few iterations, then release it
CalibrationSession
CalibrationSession<P: ProblemType> is the central state container for calibration workflows. It holds all data for a calibration run — configuration, input observations, intermediate state, final output, and an audit log — with full JSON serialization for checkpointing.
Structure
```rust
pub struct CalibrationSession<P: ProblemType> {
    pub metadata: SessionMetadata,
    pub config: P::Config,
    pub state: P::State,
    pub exports: Vec<ExportRecord<P::Export>>,
    pub log: Vec<LogEntry>,
    // input and output are private; access via methods
}
```
| Field | Access | Purpose |
|---|---|---|
| `metadata` | `pub` | Problem type name, schema version, timestamps |
| `config` | `pub` | Algorithm parameters (iterations, fix masks, loss functions) |
| `input` | `input()`, `set_input()` | Observation data (set once per calibration run) |
| `state` | `pub` | Mutable intermediate results (modified by step functions) |
| `output` | `output()`, `set_output()` | Final calibration result (set by the last step) |
| `exports` | `pub` | Timestamped export records |
| `log` | `pub` | Audit trail of operations |
Lifecycle
1. Create
```rust
let mut session = CalibrationSession::<PlanarIntrinsicsProblem>::new();

// Or with a description:
let mut session = CalibrationSession::<PlanarIntrinsicsProblem>::with_description(
    "Lab camera calibration 2024-01-15",
);
```
2. Set Input
```rust
session.set_input(dataset)?;
```
Input is validated via ProblemType::validate_input(). Setting input clears computed state (per the invalidation policy).
3. Configure
```rust
session.update_config(|c| {
    c.max_iters = 50;
    c.robust_loss = RobustLoss::Huber { scale: 2.0 };
})?;
```
Configuration is validated via ProblemType::validate_config().
4. Run Steps
```rust
step_init(&mut session, None)?;
step_optimize(&mut session, None)?;
```
Step functions are free functions operating on &mut CalibrationSession<P>. Each step reads input/state, performs computation, and updates state (or output).
5. Export
```rust
let export = session.export()?;
```
Creates an ExportRecord with the current timestamp and output. Multiple exports can be created (e.g., after re-optimization with different settings).
JSON Checkpointing
The entire session can be serialized and restored:
```rust
// Save
let json = session.to_json()?;
std::fs::write("calibration.json", &json)?;

// Restore
let json = std::fs::read_to_string("calibration.json")?;
let mut restored = CalibrationSession::<PlanarIntrinsicsProblem>::from_json(&json)?;

// Resume from where we left off
step_optimize(&mut restored, None)?;
```
This enables:
- Interruption recovery: Save progress and resume later
- Reproducibility: Share exact calibration state
- Debugging: Inspect the session at any point
Invalidation Policies
When input or configuration changes, computed state may need to be cleared:
| Event | Default policy |
|---|---|
| `set_input()` | Clear state and output (`CLEAR_COMPUTED`) |
| `update_config()` | Keep everything (`KEEP_ALL`) |
| `clear_input()` | Clear everything (`CLEAR_ALL`) |
These defaults can be overridden per problem type.
Audit Log
Every step function appends to the session log:
```rust
pub struct LogEntry {
    pub timestamp: u64,
    pub operation: String,
    pub success: bool,
    pub notes: Option<String>,
}
```
The log records what was done and when, useful for tracking calibration history.
Accessing State
```rust
// Input (required before steps)
let input = session.require_input()?; // Returns an error if no input is set

// Intermediate state (available after init)
if let Some(k) = &session.state.initial_intrinsics {
    println!("Init fx={:.1}", k.fx);
}

// Output (available after optimize)
let output = session.require_output()?;
```
ProblemType Trait
The ProblemType trait defines the interface for a calibration problem. It specifies the types for configuration, input, state, output, and export, along with validation and lifecycle hooks. The trait is intentionally minimal — behavior is implemented in external step functions, not trait methods.
Definition
```rust
pub trait ProblemType: Sized + 'static {
    type Config: Clone + Default + Serialize + DeserializeOwned + Debug;
    type Input: Clone + Serialize + DeserializeOwned + Debug;
    type State: Clone + Default + Serialize + DeserializeOwned + Debug;
    type Output: Clone + Serialize + DeserializeOwned + Debug;
    type Export: Clone + Serialize + DeserializeOwned + Debug;

    fn name() -> &'static str;
    fn schema_version() -> u32 { 1 }

    fn validate_input(input: &Self::Input) -> Result<()> { Ok(()) }
    fn validate_config(config: &Self::Config) -> Result<()> { Ok(()) }
    fn validate_input_config(
        input: &Self::Input,
        config: &Self::Config,
    ) -> Result<()> {
        Ok(())
    }

    fn on_input_change() -> InvalidationPolicy {
        InvalidationPolicy::CLEAR_COMPUTED
    }
    fn on_config_change() -> InvalidationPolicy {
        InvalidationPolicy::KEEP_ALL
    }

    fn export(
        output: &Self::Output,
        config: &Self::Config,
    ) -> Result<Self::Export>;
}
```
Associated Types
| Type | Purpose | Requirements |
|---|---|---|
| `Config` | Algorithm parameters | `Default` (sensible defaults) |
| `Input` | Observation data | Validated on set |
| `State` | Intermediate results | `Default` (empty initial state) |
| `Output` | Final calibration result | Set by last step |
| `Export` | User-facing result | Created from `Output` + `Config` |
The Serialize + DeserializeOwned bounds enable JSON checkpointing. Clone enables snapshot operations.
Required Method: name()
Returns a stable string identifier for the problem type:
```rust
fn name() -> &'static str { "planar_intrinsics_v2" }
```
This is stored in session metadata and used for deserialization. Changing the name breaks compatibility with existing checkpoints.
Required Method: export()
Converts the internal output to a user-facing export format:
```rust
fn export(output: &Self::Output, config: &Self::Config) -> Result<Self::Export>;
```
The export may transform, filter, or enrich the output. For example, PlanarIntrinsicsProblem::export() computes reprojection error metrics from the raw optimization output.
Validation Hooks
Three optional validation hooks run at specific points:
| Hook | When it runs |
|---|---|
| `validate_input(input)` | On `session.set_input(input)` |
| `validate_config(config)` | On `session.set_config(config)` or `update_config()` |
| `validate_input_config(input, config)` | When both input and config are present |
Example: PlanarIntrinsicsProblem validates that input has ≥ 3 views with ≥ 4 points each.
Invalidation Policies
Control what happens when input or config changes:
```rust
pub struct InvalidationPolicy {
    pub clear_state: bool,
    pub clear_output: bool,
    pub clear_exports: bool,
}

impl InvalidationPolicy {
    pub const KEEP_ALL: Self = ...;       // Nothing cleared
    pub const CLEAR_COMPUTED: Self = ...; // State + output cleared
    pub const CLEAR_ALL: Self = ...;      // Everything cleared
}
```
Defaults:
- Input change → `CLEAR_COMPUTED` (re-running steps is needed)
- Config change → `KEEP_ALL` (output is still valid, but may not reflect the new config)
Existing Problem Types
| Type | Name | Steps |
|---|---|---|
| `PlanarIntrinsicsProblem` | `"planar_intrinsics_v2"` | init → optimize |
| `SingleCamHandeyeProblem` | `"single_cam_handeye"` | intrinsics_init → intrinsics_optimize → handeye_init → handeye_optimize |
| `RigExtrinsicsProblem` | `"rig_extrinsics"` | intrinsics_init_all → intrinsics_optimize_all → rig_init → rig_optimize |
| `RigHandeyeProblem` | `"rig_handeye"` | 6 steps (intrinsics + rig + handeye) |
| `LaserlineDeviceProblem` | `"laserline_device"` | init → optimize |
Design Philosophy
The trait is minimal by design:
- No step methods: Steps are free functions, not trait methods. This allows flexible composition and avoids trait object limitations.
- No algorithm logic: The trait defines data types and validation, not computation.
- Serialization-first: All types are serializable, enabling checkpointing from day one.
Step Functions vs Pipeline Functions
calibration-rs offers two ways to run calibration workflows: step functions for granular control and pipeline functions for convenience.
Step Functions
Step functions are free functions that operate on a mutable session reference:
```rust
pub fn step_init(
    session: &mut CalibrationSession<PlanarIntrinsicsProblem>,
    opts: Option<IntrinsicsInitOptions>,
) -> Result<()>
```
Each step:
- Reads input and/or state from the session
- Performs one phase of the calibration (e.g., initialization or optimization)
- Updates the session state (and possibly output)
- Logs the operation
Advantages
Intermediate inspection: Examine the state between steps.
```rust
step_init(&mut session, None)?;

// Inspect initialization quality before committing to optimization
let init_k = session.state.initial_intrinsics.as_ref().unwrap();
if (init_k.fx - expected_fx).abs() / expected_fx > 0.5 {
    eprintln!(
        "Warning: init fx={:.0} is far from expected {:.0}",
        init_k.fx, expected_fx
    );
}

step_optimize(&mut session, None)?;
```
Per-step configuration: Override options for individual steps.
```rust
// Use more iterations for optimization
let opts = IntrinsicsOptimizeOptions {
    max_iters: Some(200),
    ..Default::default()
};
step_optimize(&mut session, Some(opts))?;
```
Selective execution: Skip steps or re-run specific steps.
```rust
// Re-run optimization with different settings (without re-initializing)
session.update_config(|c| c.robust_loss = RobustLoss::Cauchy { scale: 3.0 })?;
step_optimize(&mut session, None)?;
```
Checkpointing: Save and restore between steps.
```rust
step_init(&mut session, None)?;

let checkpoint = session.to_json()?;
std::fs::write("after_init.json", &checkpoint)?;

step_optimize(&mut session, None)?;
```
Pipeline Functions
Pipeline functions chain all steps into a single call:
```rust
pub fn run_calibration(
    session: &mut CalibrationSession<PlanarIntrinsicsProblem>,
) -> Result<()> {
    step_init(session, None)?;
    step_optimize(session, None)?;
    Ok(())
}
```
When to Use
- Quick prototyping: Get results with minimal code
- Default settings: When the defaults work and you don't need inspection
- Scripts and automation: When human inspection is not needed
```rust
let mut session = CalibrationSession::<PlanarIntrinsicsProblem>::new();
session.set_input(dataset)?;
run_calibration(&mut session)?;
let export = session.export()?;
```
Available Pipeline Functions
| Problem | Pipeline function | Steps |
|---|---|---|
| `PlanarIntrinsicsProblem` | `run_calibration()` | init → optimize |
| `SingleCamHandeyeProblem` | `single_cam_handeye::run_calibration()` | 4 steps |
| `RigExtrinsicsProblem` | `rig_extrinsics::run_calibration()` | 4 steps |
| `RigHandeyeProblem` | `rig_handeye::run_calibration()` | 6 steps |
| `LaserlineDeviceProblem` | `run_calibration(session, config)` | init → optimize |
| `ScheimpflugIntrinsicsProblem` | `scheimpflug_intrinsics::run_calibration(session, config)` | init → optimize |
Recommendation
Use step functions for production calibration where you need to:
- Verify initialization quality
- Adjust parameters between steps
- Handle failures gracefully
- Log and audit the process
Use pipeline functions for:
- Examples and tutorials
- Batch processing with known-good settings
- Quick experiments
Adding a New Optimization Problem
This chapter walks through adding a new optimization problem to vision-calibration-optim, following the pattern established by the planar intrinsics implementation.
Overview
Adding a new problem requires these steps:
1. Define the generic residual function in `factors/`
2. Add a `FactorKind` variant in `ir/types.rs`
3. Create a problem builder in `problems/`
4. Integrate with the backend in `backend/tiny_solver_backend.rs`
5. Write tests with synthetic ground truth
Step 1: Generic Residual Function
Create a new file or add to an existing file in crates/vision-calibration-optim/src/factors/:
```rust
pub fn my_residual_generic<T: RealField>(
    param_a: DVectorView<'_, T>, // optimizable parameters
    param_b: DVectorView<'_, T>,
    constant_data: [f64; 3],     // per-residual constants (f64, not T)
    w: f64,                      // weight
) -> SVector<T, 2> {             // residual dimension
    // Convert constants to T
    let cx = T::from_f64(constant_data[0]).unwrap();

    // Compute the residual using param_a, param_b, cx.
    // Use .clone() liberally, avoid in-place ops.
    let r_x = /* ... */;
    let r_y = /* ... */;

    let wt = T::from_f64(w).unwrap();
    SVector::<T, 2>::new(r_x * wt.clone(), r_y * wt)
}
```
Key rules:
- Generic over `T: RealField` for autodiff
- Optimizable parameters as `DVectorView<'_, T>`
- Constants as `f64` (converted inside)
- Use `.clone()`, no in-place mutation
- Return `SVector<T, N>` for a fixed residual dimension
Step 2: FactorKind Variant
Add a new variant to FactorKind in crates/vision-calibration-optim/src/ir/types.rs:
```rust
pub enum FactorKind {
    // ... existing variants ...
    MyNewFactor {
        constant_data: [f64; 3],
        w: f64,
    },
}
```
Add validation in the validate method:
```rust
FactorKind::MyNewFactor { .. } => {
    ensure!(params.len() == 2, "MyNewFactor requires 2 param blocks");
    ensure!(params[0].dim == 4, "param_a must be dim 4");
    ensure!(params[1].dim == 7, "param_b must be dim 7 (SE3)");
}
```
Update the residual_dim() method:
```rust
FactorKind::MyNewFactor { .. } => 2,
```
Step 3: Problem Builder
Create crates/vision-calibration-optim/src/problems/my_problem.rs:
```rust
pub struct MyProblemParams {
    pub param_a: DVector<f64>,
    pub param_b: Iso3,
}

pub struct MySolveOptions {
    pub fix_param_a: FixedMask,
    pub robust_loss: RobustLoss,
}

pub fn build_my_problem_ir(
    data: &MyDataset,
    initial: &MyProblemParams,
    opts: &MySolveOptions,
) -> Result<(ProblemIR, HashMap<String, DVector<f64>>)> {
    let mut ir = ProblemIR::new();
    let mut initial_values = HashMap::new();

    // Add parameter blocks via the builder API
    let param_a_id = ir.add_param_block(
        "param_a",
        4,
        ManifoldKind::Euclidean,
        opts.fix_param_a.clone(),
        None,
    );
    initial_values.insert("param_a".to_string(), initial.param_a.clone());

    let param_b_id = ir.add_param_block(
        "param_b",
        7,
        ManifoldKind::SE3,
        FixedMask::all_free(),
        None,
    );
    initial_values.insert(
        "param_b".to_string(),
        iso3_to_se3_dvec(&initial.param_b),
    );

    // Add residual blocks
    for obs in &data.observations {
        ir.add_residual_block(ResidualBlock {
            params: vec![param_a_id, param_b_id],
            loss: opts.robust_loss,
            factor: FactorKind::MyNewFactor {
                constant_data: obs.constant_data,
                w: obs.weight,
            },
            residual_dim: 2,
        });
    }

    Ok((ir, initial_values))
}
```
Step 4: Backend Integration
In crates/vision-calibration-optim/src/backend/tiny_solver_backend.rs, add a match arm in compile_factor():
```rust
FactorKind::MyNewFactor { constant_data, w } => {
    let constant_data = *constant_data;
    let w = *w;
    Box::new(move |params: &[DVectorView<'_, T>]| {
        my_residual_generic(params[0], params[1], constant_data, w)
            .as_slice()
            .to_vec()
    })
}
```
Step 5: Tests
Write a test with synthetic ground truth in crates/vision-calibration-optim/tests/:
```rust
#[test]
fn my_problem_converges() {
    // 1. Define ground truth parameters
    let gt_param_a = /* ... */;
    let gt_param_b = /* ... */;

    // 2. Generate synthetic observations
    let data = generate_synthetic_data(&gt_param_a, &gt_param_b);

    // 3. Create perturbed initial values
    let initial = perturb(&gt_param_a, &gt_param_b);

    // 4. Build and solve
    let (ir, init_vals) = build_my_problem_ir(&data, &initial, &opts)?;
    let solution = solve_with_backend(
        BackendKind::TinySolver,
        &ir,
        &init_vals,
        &backend_opts,
    )?;

    // 5. Verify convergence
    let solved_a = &solution.params["param_a"];
    assert!((solved_a[0] - gt_param_a[0]).abs() < tolerance);
}
```
Checklist
- Generic residual function with `T: RealField`
- `FactorKind` variant with validation
- Problem builder producing `ProblemIR` + initial values
- Backend compilation for the new factor
- Synthetic ground truth test verifying convergence
- (Optional) Pipeline integration with the session framework
Adding a New Pipeline Problem Type
This chapter describes how to create a new session-based calibration workflow in vision-calibration-pipeline, using the laserline device module as a template.
Module Structure
Create a new folder under crates/vision-calibration-pipeline/src/:
```text
my_problem/
├── mod.rs      # Module re-exports
├── problem.rs  # ProblemType implementation + Config
├── state.rs    # Intermediate state type
└── steps.rs    # Step functions + pipeline function
```
Step 1: Define the Problem Type (problem.rs)
```rust
use crate::session::{ProblemType, InvalidationPolicy};

pub struct MyProblem;

#[derive(Clone, Default, Debug, Serialize, Deserialize)]
pub struct MyConfig {
    pub max_iters: usize,
    pub fix_k3: bool,
    // ... other configuration
}

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct MyInput {
    pub views: Vec<MyView>,
}

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct MyOutput {
    pub calibrated_params: CameraParams,
    pub mean_reproj_error: f64,
}

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct MyExport {
    pub params: CameraParams,
    pub mean_reproj_error: f64,
}

impl ProblemType for MyProblem {
    type Config = MyConfig;
    type Input = MyInput;
    type State = MyState; // defined in state.rs
    type Output = MyOutput;
    type Export = MyExport;

    fn name() -> &'static str { "my_problem_v1" }

    fn validate_input(input: &MyInput) -> Result<()> {
        ensure!(input.views.len() >= 3, "Need at least 3 views");
        Ok(())
    }

    fn on_input_change() -> InvalidationPolicy {
        InvalidationPolicy::CLEAR_COMPUTED
    }
    fn on_config_change() -> InvalidationPolicy {
        InvalidationPolicy::KEEP_ALL
    }

    fn export(output: &MyOutput, _config: &MyConfig) -> Result<MyExport> {
        Ok(MyExport {
            params: output.calibrated_params.clone(),
            mean_reproj_error: output.mean_reproj_error,
        })
    }
}
```
Step 2: Define the State (state.rs)
```rust
#[derive(Clone, Default, Debug, Serialize, Deserialize)]
pub struct MyState {
    // Initialization results
    pub initial_intrinsics: Option<FxFyCxCySkew<f64>>,
    pub initial_distortion: Option<BrownConrady5<f64>>,
    pub initial_poses: Option<Vec<Iso3>>,

    // Optimization results
    pub final_cost: Option<f64>,
    pub mean_reproj_error: Option<f64>,
}
```
The state must implement Default (empty state) and Clone + Serialize + Deserialize (for checkpointing).
Step 3: Implement Step Functions (steps.rs)
```rust
use crate::session::CalibrationSession;
use super::problem::MyProblem;

pub fn step_init(
    session: &mut CalibrationSession<MyProblem>,
    _opts: Option<&()>,
) -> Result<()> {
    let input = session.require_input()?;
    let config = &session.config;

    // Run linear initialization
    let intrinsics = /* ... */;
    let poses = /* ... */;

    // Update state
    session.state.initial_intrinsics = Some(intrinsics);
    session.state.initial_poses = Some(poses);

    // Log
    session.log_success("step_init", Some("Initialization complete"));
    Ok(())
}

pub fn step_optimize(
    session: &mut CalibrationSession<MyProblem>,
    _opts: Option<&()>,
) -> Result<()> {
    let input = session.require_input()?;
    let config = &session.config;

    // Require initialization
    let init_k = session.state.initial_intrinsics.as_ref()
        .context("Run step_init first")?;

    // Build and solve the optimization problem
    let result = /* ... */;

    // Update state and output
    session.state.final_cost = Some(result.cost);
    session.state.mean_reproj_error = Some(result.reproj_error);
    session.set_output(MyOutput {
        calibrated_params: result.params,
        mean_reproj_error: result.reproj_error,
    })?;

    session.log_success("step_optimize", Some("Optimization complete"));
    Ok(())
}

/// Convenience pipeline function
pub fn run_calibration(
    session: &mut CalibrationSession<MyProblem>,
) -> Result<()> {
    step_init(session, None)?;
    step_optimize(session, None)?;
    Ok(())
}
```
Step 4: Module Re-exports (mod.rs)
```rust
mod problem;
mod state;
mod steps;

pub use problem::{MyProblem, MyConfig, MyInput, MyOutput, MyExport};
pub use state::MyState;
pub use steps::{step_init, step_optimize, run_calibration};
```
Step 5: Register in the Pipeline Crate
In crates/vision-calibration-pipeline/src/lib.rs:
```rust
pub mod my_problem;
```
Step 6: Wire into the Facade Crate
In crates/vision-calibration/src/lib.rs:
```rust
pub mod my_problem {
    pub use vision_calibration_pipeline::my_problem::*;
}
```
Do not add new workflows to the prelude by default; keep the prelude minimal for planar hello-world usage.
Testing
Write an integration test in crates/vision-calibration-pipeline/tests/:
```rust
#[test]
fn my_problem_session_workflow() -> Result<()> {
    let input = generate_synthetic_input();
    let mut session = CalibrationSession::<MyProblem>::new();
    session.set_input(input)?;

    step_init(&mut session, None)?;
    assert!(session.state.initial_intrinsics.is_some());

    step_optimize(&mut session, None)?;
    assert!(session.output().is_some());

    // Test JSON round-trip
    let json = session.to_json()?;
    let restored = CalibrationSession::<MyProblem>::from_json(&json)?;
    assert!(restored.output().is_some());
    Ok(())
}
```
Adding a New Solver Backend
The backend-agnostic IR design allows adding new optimization backends without modifying problem definitions. This chapter describes what a backend must implement and how to integrate it.
The OptimBackend Trait
```rust
pub trait OptimBackend {
    fn solve(
        &self,
        ir: &ProblemIR,
        initial_params: &HashMap<String, DVector<f64>>,
        opts: &BackendSolveOptions,
    ) -> Result<BackendSolution>;
}
```
A backend receives:
- `ir`: the problem structure (parameter blocks, residual blocks, factor kinds)
- `initial_params`: initial values for all parameter blocks (keyed by name)
- `opts`: solver options (max iterations, tolerances, verbosity)
And returns:
- `BackendSolution`: optimized parameter values (keyed by parameter name) and a solve report
What a Backend Must Handle
1. Parameter Blocks
For each ParamBlock, the backend must:
- Allocate storage for a parameter vector of the given dimension
- Initialize from the provided initial values
- Apply the manifold (if not Euclidean)
- Respect the fixed mask (hold specified indices constant)
- Apply box constraints (if bounds are specified)
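One simple way to honor the fixed mask is to zero the corresponding entries of each tangent-space update before applying it (equivalently, a backend can zero the matching Jacobian columns). A minimal sketch, with the hypothetical helper `apply_fixed_mask`:

```rust
/// Zero out update entries for parameters held constant.
/// `fixed` is a per-index mask over the tangent-space update `delta`.
fn apply_fixed_mask(delta: &mut [f64], fixed: &[bool]) {
    assert_eq!(delta.len(), fixed.len());
    for (d, &is_fixed) in delta.iter_mut().zip(fixed) {
        if is_fixed {
            *d = 0.0; // fixed index: no change this iteration
        }
    }
}

fn main() {
    // Intrinsics update [dfx, dfy, dcx, dcy] with cx and cy held fixed.
    let mut delta = vec![0.5, -0.3, 0.2, 0.1];
    apply_fixed_mask(&mut delta, &[false, false, true, true]);
    assert_eq!(delta, vec![0.5, -0.3, 0.0, 0.0]);
    println!("{:?}", delta);
}
```

Zeroing Jacobian columns is usually preferable in practice, since it also removes the fixed parameters from the normal equations.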
2. Manifold Constraints
The backend must implement the plus (⊕) and minus (⊖) operations for each `ManifoldKind`:
| Manifold | Ambient dim | Tangent dim | Plus operation |
|---|---|---|---|
| Euclidean | n | n | x ⊕ δ = x + δ (vector addition) |
| SE3 | 7 | 6 | T ⊕ δ = T · Exp(δ) (quaternion + translation storage) |
| SO3 | 4 | 3 | q ⊕ δ = q · Exp(δ) (unit quaternion) |
| S2 | 3 | 2 | Retract via tangent plane basis |
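As a concrete illustration of the plus operation on SO(3), here is a minimal self-contained sketch using Rodrigues' formula. It uses a plain 3×3 rotation-matrix representation for clarity, rather than the library's quaternion storage; `identity`, `so3_exp`, and `so3_plus` are illustrative names, not part of the calibration-rs API.

```rust
// 3x3 rotation matrices as row-major arrays (illustration only; the
// library stores SO(3) as unit quaternions).
type Mat3 = [[f64; 3]; 3];

fn identity() -> Mat3 {
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
}

fn mat_mul(a: &Mat3, b: &Mat3) -> Mat3 {
    let mut m = [[0.0; 3]; 3];
    for i in 0..3 {
        for j in 0..3 {
            for k in 0..3 {
                m[i][j] += a[i][k] * b[k][j];
            }
        }
    }
    m
}

/// Exp: map an axis-angle tangent vector to a rotation (Rodrigues' formula).
fn so3_exp(w: [f64; 3]) -> Mat3 {
    let theta = (w[0] * w[0] + w[1] * w[1] + w[2] * w[2]).sqrt();
    let mut r = identity();
    if theta < 1e-12 {
        return r; // Exp(0) = I
    }
    let k = [w[0] / theta, w[1] / theta, w[2] / theta]; // unit axis
    // Cross-product (skew-symmetric) matrix [k]_x
    let kx: Mat3 = [
        [0.0, -k[2], k[1]],
        [k[2], 0.0, -k[0]],
        [-k[1], k[0], 0.0],
    ];
    let kx2 = mat_mul(&kx, &kx);
    let (s, c) = theta.sin_cos();
    for i in 0..3 {
        for j in 0..3 {
            r[i][j] += s * kx[i][j] + (1.0 - c) * kx2[i][j];
        }
    }
    r
}

/// Plus on SO(3): compose the current rotation with Exp(delta).
fn so3_plus(r: &Mat3, delta: [f64; 3]) -> Mat3 {
    mat_mul(r, &so3_exp(delta))
}

fn main() {
    use std::f64::consts::FRAC_PI_2;
    // Plus from the identity by a 90-degree yaw: the x-axis maps to the y-axis.
    let r = so3_plus(&identity(), [0.0, 0.0, FRAC_PI_2]);
    assert!(r[0][0].abs() < 1e-12 && (r[1][0] - 1.0).abs() < 1e-12);
    // Two plus steps about the same axis compose additively.
    let a = so3_plus(&so3_exp([0.1, 0.0, 0.0]), [0.2, 0.0, 0.0]);
    let b = so3_exp([0.3, 0.0, 0.0]);
    for i in 0..3 {
        for j in 0..3 {
            assert!((a[i][j] - b[i][j]).abs() < 1e-9);
        }
    }
    println!("SO(3) plus checks passed");
}
```

The minus operation is the inverse: recover the tangent vector between two rotations via the matrix logarithm of R1⁻¹·R2.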
3. Residual Evaluation
For each ResidualBlock, the backend must:
- Call the appropriate residual function based on `FactorKind`
- Pass the correct parameter block values (referenced by `ParamId`)
- Include per-residual constant data (3D points, observed pixels, weights)
- Compute Jacobians (via autodiff or finite differences)
4. Robust Loss Functions
The backend must apply the RobustLoss to each residual:
- `None` → standard squared loss
- `Huber { scale }` → Huber loss with the given scale
- `Cauchy { scale }` → Cauchy loss
- `Arctan { scale }` → Arctan loss
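To make the Huber case concrete, here is a minimal sketch of one common way a backend applies it: evaluate the loss on the residual norm, and derive an IRLS (iteratively reweighted least squares) weight that down-weights outliers. The function names are illustrative, not calibration-rs API.

```rust
/// Huber loss on a residual norm r: quadratic near zero, linear in the tails.
fn huber_rho(r: f64, scale: f64) -> f64 {
    let a = r.abs();
    if a <= scale {
        0.5 * r * r
    } else {
        scale * (a - 0.5 * scale)
    }
}

/// IRLS weight derived from the Huber loss: 1 for inliers, scale/|r| beyond.
fn huber_weight(r: f64, scale: f64) -> f64 {
    let a = r.abs();
    if a <= scale { 1.0 } else { scale / a }
}

fn main() {
    let scale = 1.0;
    // Inliers keep full weight; a 4-sigma outlier is down-weighted to 0.25.
    assert!((huber_weight(0.5, scale) - 1.0).abs() < 1e-12);
    assert!((huber_weight(4.0, scale) - 0.25).abs() < 1e-12);
    // The loss is continuous at the quadratic/linear transition.
    let (lo, hi) = (huber_rho(1.0 - 1e-9, scale), huber_rho(1.0 + 1e-9, scale));
    assert!((lo - hi).abs() < 1e-6);
    println!("Huber checks passed");
}
```

Cauchy and Arctan follow the same pattern with different ρ functions; exact conventions (e.g. whether ρ acts on r or r²) vary by solver, so match the reference backend's tests.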
5. Solution Extraction
Return optimized values as a HashMap<String, DVector<f64>> keyed by parameter block name (not ID).
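A sketch of the extraction step, assuming the solver tracks blocks by an internal numeric id: translate ids back to the IR's block names when building the result map. This uses plain `Vec<f64>` in place of `DVector<f64>` to stay self-contained; `extract_solution` is a hypothetical helper.

```rust
use std::collections::HashMap;

/// Map solver-internal ids back to the IR's parameter block names.
fn extract_solution(
    names_by_id: &HashMap<u32, String>,
    values_by_id: &HashMap<u32, Vec<f64>>,
) -> HashMap<String, Vec<f64>> {
    values_by_id
        .iter()
        .map(|(id, v)| (names_by_id[id].clone(), v.clone()))
        .collect()
}

fn main() {
    let names = HashMap::from([(0u32, "intrinsics".to_string())]);
    let vals = HashMap::from([(0u32, vec![800.0, 800.0, 320.0, 240.0])]);
    let sol = extract_solution(&names, &vals);
    // Callers look up results by block name, not solver id.
    assert_eq!(sol["intrinsics"], vec![800.0, 800.0, 320.0, 240.0]);
    println!("{:?}", sol.keys().collect::<Vec<_>>());
}
```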
Implementation Pattern
A typical backend has two phases:
Compile Phase
Translate the IR into solver-specific data structures:
```rust
fn compile(&self, ir: &ProblemIR) -> SolverProblem {
    for param in &ir.params {
        // Create solver parameter with manifold and fixing
    }
    for residual in &ir.residuals {
        // Create solver cost function from FactorKind
    }
    // ...
}
```
Solve Phase
Run the optimizer and extract results:
```rust
fn solve(&self, problem: SolverProblem, opts: &BackendSolveOptions) -> BackendSolution {
    // Set convergence criteria from opts
    // Run optimization loop
    // Extract final parameter values
    // Build SolveReport
}
```
Potential Backends
| Backend | Description | Advantages |
|---|---|---|
| tiny-solver | Current. Rust-native LM. | Pure Rust, no external deps |
| Ceres-RS | Rust bindings to Google Ceres | Battle-tested, many features |
| Custom GN | Hand-written Gauss-Newton | Full control, educational |
| L-BFGS | Quasi-Newton for large problems | Memory-efficient |
Testing
A new backend should pass the same convergence tests as the existing backend:
```rust
#[test]
fn new_backend_planar_converges() -> Result<()> {
    // Same synthetic data and initial values as tiny-solver tests
    let (ir, init) = build_planar_test_problem();
    let solution = MyNewBackend.solve(&ir, &init, &opts)?;

    // Verify same convergence quality
    assert!(solution.report.final_cost < 1e-4);
    Ok(())
}
```
Run the full test suite with both backends to ensure equivalent results.
Synthetic Data Generation
[COLLAB] This appendix benefits from user collaboration on recommended noise levels and realistic test scenarios.
calibration-rs provides utilities for generating synthetic calibration data, used in examples and tests. Synthetic data allows testing with known ground truth, verifying convergence, and debugging algorithm issues.
Board Point Generation
```rust
use vision_calibration::synthetic::planar;

// 8×6 grid of 3D points at Z=0, spaced 40mm apart
let board_points = planar::grid_points(8, 6, 0.04);
// Returns Vec<Pt3>: [(0,0,0), (0.04,0,0), ..., (0.28,0.20,0)]
```
Points are generated in the plane with the specified column count, row count, and spacing (in meters).
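The generation order implied by the example above (X varying fastest, then Y, all at Z = 0) can be sketched in plain Rust; this uses `[f64; 3]` as a stand-in for `Pt3` and is an illustration, not the library's implementation.

```rust
/// Sketch of planar grid generation: `cols x rows` points at Z = 0,
/// row-major with X varying fastest, spaced `spacing` meters apart.
fn grid_points(cols: usize, rows: usize, spacing: f64) -> Vec<[f64; 3]> {
    let mut pts = Vec::with_capacity(cols * rows);
    for r in 0..rows {
        for c in 0..cols {
            pts.push([c as f64 * spacing, r as f64 * spacing, 0.0]);
        }
    }
    pts
}

fn main() {
    let pts = grid_points(8, 6, 0.04);
    assert_eq!(pts.len(), 48);
    assert_eq!(pts[0], [0.0, 0.0, 0.0]);
    // Last point: (7 * 0.04, 5 * 0.04, 0) = (0.28, 0.20, 0)
    let last = pts[47];
    assert!((last[0] - 0.28).abs() < 1e-12 && (last[1] - 0.20).abs() < 1e-12);
    println!("grid of {} points", pts.len());
}
```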
Pose Generation
```rust
// 6 camera poses with varying yaw and distance
let poses = planar::poses_yaw_y_z(
    6,     // number of views
    -0.2,  // start yaw (radians)
    0.08,  // yaw step
    0.5,   // start Z distance (meters)
    0.05,  // Z step
);
// Returns Vec<Iso3>: camera-to-board transforms
```
Poses are generated looking at the board center, with rotation around the Y axis (yaw) and varying distance along Z.
Projection
```rust
// Project board points through all poses
let views = planar::project_views_all(&camera, &board_points, &poses)?;
// Returns Vec<CorrespondenceView>: 2D-3D point pairs per view
```
This applies the full camera model (projection + distortion + intrinsics) to generate pixel observations.
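The three stages of that pipeline can be written out explicitly. The sketch below uses plain `f64` tuples instead of the library's camera types and assumes the OpenCV-style Brown-Conrady convention (skew omitted); `project` is an illustrative name, not the calibration-rs API.

```rust
/// Sketch of the projection pipeline: pinhole division, Brown-Conrady
/// distortion, then the intrinsics mapping to pixels.
fn project(
    p: [f64; 3],                           // 3D point in the camera frame
    (fx, fy, cx, cy): (f64, f64, f64, f64), // intrinsics
    (k1, k2, k3, p1, p2): (f64, f64, f64, f64, f64), // distortion
) -> [f64; 2] {
    // 1. Pinhole projection to normalized image coordinates
    let (x, y) = (p[0] / p[2], p[1] / p[2]);
    // 2. Brown-Conrady distortion: radial (k1..k3) + tangential (p1, p2)
    let r2 = x * x + y * y;
    let radial = 1.0 + k1 * r2 + k2 * r2 * r2 + k3 * r2 * r2 * r2;
    let xd = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x);
    let yd = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y;
    // 3. Intrinsics map normalized coordinates to pixels
    [fx * xd + cx, fy * yd + cy]
}

fn main() {
    let k = (800.0, 800.0, 320.0, 240.0);
    let d = (0.0, 0.0, 0.0, 0.0, 0.0);
    // With zero distortion, a point on the optical axis lands at the
    // principal point.
    let px = project([0.0, 0.0, 1.0], k, d);
    assert_eq!(px, [320.0, 240.0]);
    println!("projected to ({}, {})", px[0], px[1]);
}
```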
Adding Noise
For realistic testing, add Gaussian noise to the projected pixels:
```rust
use rand_distr::{Distribution, Normal};

let mut rng = rand::thread_rng();
let noise_sigma = 0.5; // pixels
// Zero-mean Gaussian noise via rand_distr (Normal::new fails only for an
// invalid sigma). Note: uniform noise from rng.gen() would bias the pixels.
let noise = Normal::new(0.0, noise_sigma).unwrap();

for view in &mut views {
    for pixel in view.points_2d.iter_mut() {
        pixel.x += noise.sample(&mut rng);
        pixel.y += noise.sample(&mut rng);
    }
}
```
Typical noise levels:
| Scenario | Noise sigma (px) |
|---|---|
| Ideal (testing convergence) | 0.0 |
| High-quality detector | 0.1 - 0.3 |
| Standard detector | 0.3 - 1.0 |
| Noisy conditions | 1.0 - 3.0 |
Deterministic Seeds
All examples and tests use fixed random seeds for reproducibility:
```rust
use rand::SeedableRng;

let rng = rand::rngs::StdRng::seed_from_u64(42);
```
Common Test Scenarios
Minimal (3 views, no noise)
For verifying algorithm correctness. Should converge to machine precision.
Moderate (6-10 views, 0.5 px noise)
For testing realistic convergence. Should achieve <1% intrinsics error after optimization.
Challenging (20 views, 1.0 px noise, outliers)
For testing robustness. Should achieve <2% intrinsics error with robust loss functions.
Distortion stress test
Use large distortion coefficients to verify the iterative intrinsics solver handles strong distortion.
Data Types Quick Reference
Core Math Types
All based on nalgebra with f64 precision:
| Type | Definition | Description |
|---|---|---|
Real | f64 | Scalar type |
Pt2 | Point2<f64> | 2D point |
Pt3 | Point3<f64> | 3D point |
Vec2 | Vector2<f64> | 2D vector |
Vec3 | Vector3<f64> | 3D vector |
Mat3 | Matrix3<f64> | 3×3 matrix |
Mat4 | Matrix4<f64> | 4×4 matrix |
Iso3 | Isometry3<f64> | SE(3) rigid transform |
Camera Model Types
| Type | Parameters | Description |
|---|---|---|
FxFyCxCySkew<S> | fx, fy, cx, cy, skew | Intrinsics matrix |
BrownConrady5<S> | k1, k2, k3, p1, p2, iters | Distortion model |
Pinhole | (none) | Projection model |
IdentitySensor | (none) | No sensor tilt |
ScheimpflugParams | tilt_x, tilt_y | Scheimpflug tilt |
Camera<S,P,D,Sm,K> | proj, dist, sensor, k | Composable camera |
Observation Types
| Type | Fields | Description |
|---|---|---|
CorrespondenceView | points_3d, points_2d, weights | 2D-3D correspondences |
View<Meta> | obs, meta | Observation + metadata |
PlanarDataset | views: Vec<View<NoMeta>> | Planar calibration input |
RigView<Meta> | obs: RigViewObs, meta | Multi-camera view |
RigDataset<Meta> | num_cameras, views | Multi-camera input |
ReprojectionStats | mean, rms, max, count | Error statistics |
Fix Masks
| Type | Fields | Description |
|---|---|---|
IntrinsicsFixMask | fx, fy, cx, cy (bool each) | Fix individual intrinsics |
DistortionFixMask | k1, k2, k3, p1, p2 (bool each) | Fix individual distortion params |
CameraFixMask | intrinsics, distortion | Combined camera fix mask |
FixedMask | opaque (all_free(), all_fixed(dim), fix_indices(&[usize])) | IR-level parameter fixing |
Optimization IR Types
| Type | Description |
|---|---|
ProblemIR | Complete optimization problem |
ParamBlock | Parameter group (id, name, dim, manifold, fixed) |
ResidualBlock | Residual definition (params, loss, factor) |
ParamId | Unique parameter identifier |
FactorKind | Residual computation type (enum) |
ManifoldKind | Parameter geometry (Euclidean, SE3, SO3, S2) |
RobustLoss | Loss function (None, Huber, Cauchy, Arctan) |
Session Types
| Type | Description |
|---|---|
CalibrationSession<P> | Generic session container |
SessionMetadata | Problem name, version, timestamps |
LogEntry | Audit log entry (timestamp, operation, success) |
ExportRecord<E> | Timestamped export |
InvalidationPolicy | What to clear on input/config change |
Backend Types
| Type | Description |
|---|---|
BackendSolveOptions | Solver settings (max_iters, tolerances, etc.) |
BackendSolution | Optimized params + solve report |
SolveReport | Initial/final cost, iterations, termination |
LinearSolverType | SparseCholesky or SparseQR |
Hand-Eye Types
| Type | Description |
|---|---|
HandEyeMode | EyeInHand or EyeToHand |
HandeyeMeta | base_se3_gripper: Iso3 |
RobotPoseMeta | base_se3_gripper: Iso3 |
References and Further Reading
[COLLAB] Please help curate and prioritize this reference list, and add any additional references.
Foundational Textbooks
- Hartley, R.I. & Zisserman, A. (2004). Multiple View Geometry in Computer Vision. 2nd edition. Cambridge University Press. — The definitive reference for projective geometry, fundamental/essential matrices, homography, triangulation, and bundle adjustment.
- Ma, Y., Soatto, S., Kosecka, J., & Sastry, S.S. (2004). An Invitation to 3-D Vision. Springer. — Rigorous treatment of Lie groups and geometric vision with emphasis on SE(3) and optimization on manifolds.
Camera Calibration
- Zhang, Z. (2000). "A Flexible New Technique for Camera Calibration." IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11), 1330-1334. — Zhang's planar calibration method: intrinsics from homographies via the image of the absolute conic.
- Brown, D.C. (1966). "Decentering Distortion of Lenses." Photogrammetric Engineering, 32(3), 444-462. — The original Brown-Conrady distortion model with radial and tangential components.
Minimal Solvers
- Nistér, D. (2004). "An Efficient Solution to the Five-Point Relative Pose Problem." IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(6), 756-770. — The 5-point essential matrix solver used in stereo and visual odometry.
- Kneip, L., Scaramuzza, D., & Siegwart, R. (2011). "A Novel Parametrization of the Perspective-Three-Point Problem for a Direct Computation of Absolute Camera Position and Orientation." IEEE Conference on Computer Vision and Pattern Recognition (CVPR). — The P3P solver used for camera pose estimation.
Hand-Eye Calibration
- Tsai, R.Y. & Lenz, R.K. (1989). "A New Technique for Fully Autonomous and Efficient 3D Robotics Hand/Eye Calibration." IEEE Transactions on Robotics and Automation, 5(3), 345-358. — The Tsai-Lenz method for solving AX=XB.
Robust Estimation
- Fischler, M.A. & Bolles, R.C. (1981). "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography." Communications of the ACM, 24(6), 381-395. — The original RANSAC paper.
- Hartley, R.I. (1997). "In Defense of the Eight-Point Algorithm." IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(6), 580-593. — Demonstrates that data normalization makes the 8-point algorithm competitive with iterative methods.
Non-Linear Optimization
- Levenberg, K. (1944). "A Method for the Solution of Certain Non-Linear Problems in Least Squares." Quarterly of Applied Mathematics, 2(2), 164-168.
- Marquardt, D.W. (1963). "An Algorithm for Least-Squares Estimation of Nonlinear Parameters." Journal of the Society for Industrial and Applied Mathematics, 11(2), 431-441.
- Triggs, B., McLauchlan, P.F., Hartley, R.I., & Fitzgibbon, A.W. (2000). "Bundle Adjustment — A Modern Synthesis." International Workshop on Vision Algorithms. Springer. — Comprehensive survey of bundle adjustment techniques.
Lie Groups in Vision
- Solà, J., Deray, J., & Atchuthan, D. (2018). "A Micro Lie Theory for State Estimation in Robotics." arXiv:1812.01537. — Accessible introduction to Lie groups for robotics and vision, covering SO(3), SE(3), and their tangent spaces.