Non-Linear Least Squares Overview

Non-linear least squares (NLLS) is the mathematical framework underlying all refinement in calibration-rs. After linear initialization provides approximate parameter estimates, NLLS minimizes the reprojection error to achieve sub-pixel accuracy.

Problem Formulation

Objective: Minimize the sum of squared residuals:

$$ C(x) = \tfrac{1}{2}\,\|r(x)\|^2 = \tfrac{1}{2}\sum_i r_i(x)^2 $$

where $x$ is the parameter vector and $r(x)$ is the stacked residual vector.

In camera calibration, a typical residual is the reprojection error: the difference between an observed pixel and the predicted projection:

$$ r_{ij} = u_{ij} - \pi(K, d, T_j, X_i) $$

where $\pi$ is the camera projection function, and the parameters include intrinsics $K$, distortion coefficients $d$, and poses $T_j$.
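
As a concrete sketch, the residual for an ideal pinhole camera (no distortion, identity pose) can be evaluated as below. The function names and the $(f_x, f_y, c_x, c_y)$ intrinsics layout are illustrative, not the calibration-rs API:

```rust
/// Project a camera-frame 3D point with pinhole intrinsics (fx, fy, cx, cy).
fn project(fx: f64, fy: f64, cx: f64, cy: f64, p: [f64; 3]) -> [f64; 2] {
    let (x, y, z) = (p[0], p[1], p[2]);
    [fx * x / z + cx, fy * y / z + cy]
}

/// Reprojection residual: observed pixel minus predicted projection.
fn residual(observed: [f64; 2], predicted: [f64; 2]) -> [f64; 2] {
    [observed[0] - predicted[0], observed[1] - predicted[1]]
}

fn main() {
    let predicted = project(800.0, 800.0, 320.0, 240.0, [0.1, -0.05, 2.0]);
    let r = residual([360.5, 220.2], predicted);
    println!("residual = ({:.2}, {:.2}) px", r[0], r[1]); // (0.50, 0.20)
}
```

A real projection function also applies the distortion model and the world-to-camera pose before this pinhole step.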

Gauss-Newton Method

The Gauss-Newton method exploits the least-squares structure. At the current estimate $x_k$, linearize the residuals:

$$ r(x_k + \delta) \approx r(x_k) + J\,\delta $$

where $J = \partial r / \partial x \,\big|_{x_k}$ is the Jacobian.

Substituting into the objective and minimizing with respect to $\delta$:

$$ \min_\delta \; \tfrac{1}{2}\,\| r(x_k) + J\,\delta \|^2 $$

gives the normal equations:

$$ J^\top J\,\delta = -J^\top r(x_k) $$

The matrix $J^\top J$ is the Gauss-Newton approximation to the Hessian ($\nabla^2 C \approx J^\top J$ when residuals are small).

Update:

$$ x_{k+1} = x_k + \delta $$
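
The full Gauss-Newton iteration can be sketched on a one-parameter model $y = e^{a t}$, where $J^\top J$ and $J^\top r$ reduce to scalars. This is an illustrative toy, not the library's solver:

```rust
/// Fit `a` in the model y = exp(a*t) by Gauss-Newton iteration.
fn gauss_newton_fit(ts: &[f64], ys: &[f64], mut a: f64, iters: usize) -> f64 {
    for _ in 0..iters {
        let (mut jtj, mut jtr) = (0.0_f64, 0.0_f64);
        for (&t, &y) in ts.iter().zip(ys) {
            let pred = (a * t).exp();
            let r = pred - y;  // residual r_i(a)
            let j = t * pred;  // Jacobian entry dr_i/da
            jtj += j * j;
            jtr += j * r;
        }
        // Solve the (scalar) normal equations J^T J * delta = -J^T r
        a -= jtr / jtj;
    }
    a
}

fn main() {
    // Synthetic noise-free data generated with a = 0.5.
    let ts: Vec<f64> = (0..10).map(|i| i as f64 * 0.3).collect();
    let ys: Vec<f64> = ts.iter().map(|&t| (0.5 * t).exp()).collect();
    let a = gauss_newton_fit(&ts, &ys, 0.0, 20);
    println!("estimated a = {a:.6}");
}
```

On this zero-residual problem the iteration converges to the ground truth in a handful of steps; on harder problems it can diverge, which motivates the damping discussed next.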

Levenberg-Marquardt Method

Gauss-Newton can diverge when the linearization is poor (far from the minimum) or when $J^\top J$ is singular. Levenberg-Marquardt (LM) adds a damping term:

$$ (J^\top J + \lambda I)\,\delta = -J^\top r $$

The damping parameter $\lambda$ interpolates between:

  • $\lambda \to 0$: Pure Gauss-Newton (fast convergence near the minimum)
  • $\lambda \to \infty$: Gradient descent with small step, $\delta \approx -\tfrac{1}{\lambda} J^\top r$ (safe far from the minimum)

Trust Region Interpretation

LM is a trust region method: $\lambda$ controls the size of the region where the linear approximation is trusted.

  • If the update reduces the cost: accept the step, decrease $\lambda$ (expand the trust region)
  • If the update increases the cost: reject the step, increase $\lambda$ (shrink the trust region)
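
The damped step plus the accept/reject rule can be sketched on the same scalar model $y = e^{a t}$. The initial $\lambda$ and the factor-of-10 update schedule are common illustrative choices, not the library's actual defaults:

```rust
/// Half the sum of squared residuals for the model y = exp(a*t).
fn cost(ts: &[f64], ys: &[f64], a: f64) -> f64 {
    ts.iter().zip(ys).map(|(&t, &y)| {
        let r = (a * t).exp() - y;
        0.5 * r * r
    }).sum()
}

/// Fit `a` with a minimal Levenberg-Marquardt loop.
fn lm_fit(ts: &[f64], ys: &[f64], mut a: f64, iters: usize) -> f64 {
    let mut lambda = 1e-3;
    for _ in 0..iters {
        let (mut jtj, mut jtr) = (0.0_f64, 0.0_f64);
        for (&t, &y) in ts.iter().zip(ys) {
            let pred = (a * t).exp();
            let j = t * pred;
            jtj += j * j;
            jtr += j * (pred - y);
        }
        // Damped (scalar) normal equations: (J^T J + lambda) delta = -J^T r
        let delta = -jtr / (jtj + lambda);
        if cost(ts, ys, a + delta) < cost(ts, ys, a) {
            a += delta;     // cost decreased: accept, expand the trust region
            lambda /= 10.0;
        } else {
            lambda *= 10.0; // cost increased: reject, shrink the trust region
        }
    }
    a
}

fn main() {
    let ts: Vec<f64> = (0..10).map(|i| i as f64 * 0.3).collect();
    let ys: Vec<f64> = ts.iter().map(|&t| (0.5 * t).exp()).collect();
    println!("estimated a = {:.6}", lm_fit(&ts, &ys, 0.0, 50));
}
```

From this starting point the undamped Gauss-Newton step overshoots and is rejected several times; once $\lambda$ has grown enough, steps start being accepted and the iteration converges.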

Convergence Criteria

LM terminates when any of:

  • Cost threshold: $C(x_k) < \varepsilon_{\text{cost}}$
  • Absolute decrease: $C(x_{k-1}) - C(x_k) < \varepsilon_{\text{abs}}$
  • Relative decrease: $\left(C(x_{k-1}) - C(x_k)\right) / C(x_{k-1}) < \varepsilon_{\text{rel}}$
  • Maximum iterations: $k \ge k_{\max}$
  • Parameter change: $\|\delta\| < \varepsilon_x$
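
A termination check combining these criteria might look as follows; the struct and threshold names are hypothetical, not calibration-rs configuration options:

```rust
/// Hypothetical tolerance bundle for the stopping criteria.
struct Tols { cost: f64, abs: f64, rel: f64, max_iters: usize, step: f64 }

/// Returns the first satisfied stopping criterion, if any.
fn should_stop(prev_cost: f64, cost: f64, step_norm: f64,
               iter: usize, t: &Tols) -> Option<&'static str> {
    if cost < t.cost { return Some("cost threshold"); }
    if prev_cost - cost < t.abs { return Some("absolute decrease"); }
    if (prev_cost - cost) / prev_cost < t.rel { return Some("relative decrease"); }
    if iter >= t.max_iters { return Some("max iterations"); }
    if step_norm < t.step { return Some("parameter change"); }
    None
}

fn main() {
    let tols = Tols { cost: 1e-12, abs: 1e-10, rel: 1e-8, max_iters: 100, step: 1e-10 };
    // A large cost drop with a sizable step: keep iterating.
    println!("{:?}", should_stop(1.0, 0.5, 0.1, 3, &tols));
}
```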

Sparsity

In bundle adjustment problems, the Jacobian is sparse: each residual depends on only a few parameter blocks (one camera's intrinsics, one distortion model, one pose). The matrix $J^\top J$ has a block-arrow structure that can be exploited by sparse linear solvers:

  • Sparse Cholesky: Efficient for well-structured problems
  • Sparse QR: More robust when the normal equations are ill-conditioned

calibration-rs uses sparse linear solvers through the tiny-solver backend.
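
The block-arrow structure can be visualized with a toy occupancy pattern: every residual couples the shared intrinsics block to exactly one pose block, so pose-pose off-diagonal blocks of $J^\top J$ stay empty. The bookkeeping below is purely illustrative:

```rust
/// Build the block occupancy pattern of H = J^T J for `n_poses` images.
/// Block 0 = shared intrinsics; blocks 1..=n_poses = per-image poses.
fn h_block_pattern(n_poses: usize) -> Vec<Vec<bool>> {
    let n = n_poses + 1;
    let mut occ = vec![vec![false; n]; n];
    for pose in 1..n {
        // A residual observed in image `pose` couples blocks {0, pose}.
        for &a in &[0, pose] {
            for &b in &[0, pose] {
                occ[a][b] = true;
            }
        }
    }
    occ
}

fn main() {
    // Prints the arrow: a dense first row/column plus a dense diagonal.
    for row in h_block_pattern(4) {
        let line: String = row.iter().map(|&x| if x { 'X' } else { '.' }).collect();
        println!("{line}");
    }
}
```

A sparse Cholesky or QR factorization can exploit the empty pose-pose blocks, which is what makes large bundle adjustment tractable.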

Cost Function vs. Reprojection Error

The optimizer minimizes the cost $C = \tfrac{1}{2}\sum_i \|r_i\|^2$. The commonly reported mean reprojection error is:

$$ \bar{e} = \frac{1}{N} \sum_{i=1}^{N} \|r_i\| $$

These are related but not identical: the cost weights large errors quadratically, while the mean error weights them linearly. calibration-rs reports both: the cost from the solver and the mean reprojection error computed post-optimization.
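
Computing both metrics from the same per-observation 2D residuals shows how differently a single outlier affects them:

```rust
/// Solver cost: half the sum of squared residual norms.
fn cost(residuals: &[[f64; 2]]) -> f64 {
    0.5 * residuals.iter().map(|r| r[0] * r[0] + r[1] * r[1]).sum::<f64>()
}

/// Mean reprojection error: average Euclidean norm of the residuals.
fn mean_reprojection_error(residuals: &[[f64; 2]]) -> f64 {
    let sum: f64 = residuals.iter().map(|r| (r[0] * r[0] + r[1] * r[1]).sqrt()).sum();
    sum / residuals.len() as f64
}

fn main() {
    // Two small errors plus one 5 px outlier: the outlier contributes
    // 12.5 of the 12.525 cost, but only 5/3 of the mean error.
    let rs = [[0.1, 0.0], [0.0, 0.2], [3.0, 4.0]];
    println!("cost = {:.3}", cost(&rs));                              // 12.525
    println!("mean error = {:.3} px", mean_reprojection_error(&rs));  // 1.767
}
```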

With Robust Losses

When a robust loss $\rho$ is used, the objective becomes:

$$ C(x) = \tfrac{1}{2} \sum_i \rho\!\left(\|r_i\|^2\right) $$

The normal equations are modified to incorporate the loss function's weights:

$$ J^\top W J\,\delta = -J^\top W r $$

where $W = \mathrm{diag}(w_1, \ldots, w_N)$ with $w_i = \rho'\!\left(\|r_i\|^2\right)$ is the iteratively reweighted diagonal matrix. See Robust Loss Functions for details.
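
As a sketch, here is the reweighting term $w = \rho'(s)$ for a Huber loss applied to the squared residual norm $s = \|r\|^2$ (the Ceres-style convention; calibration-rs's exact loss definitions may differ). Inliers keep weight 1, while outliers are down-weighted by $\delta / \|r\|$:

```rust
/// IRLS weight rho'(s) for the Huber loss on s = ||r||^2
/// with transition point `delta` (in pixels).
fn huber_weight(s: f64, delta: f64) -> f64 {
    if s <= delta * delta {
        1.0             // inlier region: quadratic loss, full weight
    } else {
        delta / s.sqrt() // outlier region: linear loss, reduced weight
    }
}

fn main() {
    let delta = 1.0; // pixels
    println!("inlier  (||r|| = 0.5): w = {}", huber_weight(0.25, delta)); // 1.0
    println!("outlier (||r|| = 5.0): w = {}", huber_weight(25.0, delta)); // 0.2
}
```

Each LM iteration recomputes these weights from the current residuals, which is why the scheme is called iteratively reweighted least squares.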