Camera Matrix and RQ Decomposition
The camera matrix (or projection matrix) is a $3 \times 4$ matrix $P = K[R \mid t]$ that directly maps 3D world points to 2D pixel coordinates. Estimating $P$ via DLT and decomposing it into intrinsics and extrinsics via RQ decomposition provides an alternative initialization path.
Camera Matrix DLT
Problem Statement
Given: $N \geq 6$ correspondences between 3D world points $X_i$ and 2D pixel points $x_i = (u_i, v_i)$.
Find: the $3 \times 4$ projection matrix $P$ such that $x_i \sim P X_i$.
Derivation
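The projection $x_i \sim P \bar{X}_i$ (with $\bar{X}_i = (X_i^T, 1)^T$ in homogeneous coordinates and $x_i = (u_i, v_i)$) gives, after cross-multiplication:
$$u_i \, (p_3^T \bar{X}_i) - p_1^T \bar{X}_i = 0, \qquad v_i \, (p_3^T \bar{X}_i) - p_2^T \bar{X}_i = 0$$
where $p_k^T$ is the $k$-th row of $P$.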
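Each correspondence contributes two rows to the design matrix. The following is a minimal sketch of how those rows could be assembled, assuming the 12 unknowns are stacked row by row as $(p_1, p_2, p_3)$; the names `dlt_rows` and `project` are hypothetical and are not part of the library API.

```rust
/// Two rows of the DLT design matrix A for one 3D-2D correspondence.
/// The unknown vector is assumed to be [p1 (4 entries), p2 (4), p3 (4)].
pub fn dlt_rows(world: [f64; 3], pixel: [f64; 2]) -> [[f64; 12]; 2] {
    let xh = [world[0], world[1], world[2], 1.0]; // homogeneous 3D point
    let (u, v) = (pixel[0], pixel[1]);
    let mut rows = [[0.0; 12]; 2];
    for j in 0..4 {
        // Row 0 encodes u * (p3 . X) - (p1 . X) = 0
        rows[0][j] = -xh[j];
        rows[0][8 + j] = u * xh[j];
        // Row 1 encodes v * (p3 . X) - (p2 . X) = 0
        rows[1][4 + j] = -xh[j];
        rows[1][8 + j] = v * xh[j];
    }
    rows
}

/// Project a 3D point with a row-major 3x4 matrix (helper for sanity checks).
pub fn project(p: &[f64; 12], world: [f64; 3]) -> [f64; 2] {
    let xh = [world[0], world[1], world[2], 1.0];
    let dot = |r: usize| (0..4).map(|j| p[4 * r + j] * xh[j]).sum::<f64>();
    [dot(0) / dot(2), dot(1) / dot(2)]
}
```

For noise-free data, both rows are orthogonal to the true parameter vector, which is a convenient check when wiring up the solver.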
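This gives a $2N \times 12$ system $A p = 0$, where $p$ stacks the rows of $P$. It is solved via SVD (taking the right singular vector of the smallest singular value) with Hartley normalization of the 3D and 2D points.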
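Hartley normalization translates the points so their centroid is at the origin and scales them so the mean distance from the origin is $\sqrt{2}$ in 2D ($\sqrt{3}$ in 3D). The following sketch covers the 2D case only; `normalize_2d` is a hypothetical name, and the library may organize this step differently.

```rust
/// Hartley normalization for 2D points. Returns the normalized points and the
/// 3x3 similarity transform T (row-major) with x_norm = T * [u, v, 1]^T.
/// Assumes at least two distinct points, so the mean distance is non-zero.
pub fn normalize_2d(pts: &[[f64; 2]]) -> (Vec<[f64; 2]>, [[f64; 3]; 3]) {
    let n = pts.len() as f64;
    let cx = pts.iter().map(|p| p[0]).sum::<f64>() / n;
    let cy = pts.iter().map(|p| p[1]).sum::<f64>() / n;
    let mean_dist = pts
        .iter()
        .map(|p| ((p[0] - cx).powi(2) + (p[1] - cy).powi(2)).sqrt())
        .sum::<f64>()
        / n;
    // Scale so that the mean distance from the centroid becomes sqrt(2).
    let s = 2.0_f64.sqrt() / mean_dist;
    let normalized: Vec<[f64; 2]> = pts
        .iter()
        .map(|p| [s * (p[0] - cx), s * (p[1] - cy)])
        .collect();
    let t = [[s, 0.0, -s * cx], [0.0, s, -s * cy], [0.0, 0.0, 1.0]];
    (normalized, t)
}
```

The returned transform is what gets undone in the denormalization step, once the system has been solved in normalized coordinates.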
Post-Processing
After denormalization, $P$ is $3 \times 4$ but not guaranteed to decompose cleanly into $K[R \mid t]$ due to noise. The RQ decomposition extracts the components.
RQ Decomposition
Problem Statement
Given: The left $3 \times 3$ submatrix $M$ of $P$ (where $P = [M \mid p_4]$).
Find: Upper-triangular $K$ and orthogonal $R$ such that $M = KR$.
Algorithm
RQ decomposition is computed by transposing QR decomposition:
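- Compute QR decomposition of $(JM)^T$, where $J$ is the row-reversing permutation matrix (ones on the anti-diagonal, $J^2 = I$): $(JM)^T = \tilde{Q} \tilde{R}$
- Then $JM = \tilde{R}^T \tilde{Q}^T$, where $\tilde{R}^T$ is lower-triangular and $\tilde{Q}^T$ is orthogonal
- Apply the permutation matrix $J$ to flip the matrix to upper-triangular form:
- $K = J \tilde{R}^T J$ (upper-triangular)
- $R = J \tilde{Q}^T$ (orthogonal), so that $KR = J \tilde{R}^T J J \tilde{Q}^T = J (JM) = M$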
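For a $3 \times 3$ matrix, the same factorization can also be obtained without a QR routine by running Gram-Schmidt on the rows of the input from the bottom up. The sketch below uses that swapped-in technique rather than the QR-and-permute procedure; `rq_gram_schmidt` and the `Mat3x3` alias are hypothetical names, and this is not the library's `rq_decompose`. It assumes a non-singular, reasonably well-conditioned input.

```rust
type Mat3x3 = [[f64; 3]; 3];

fn dot(a: &[f64; 3], b: &[f64; 3]) -> f64 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

/// RQ decomposition M = K * R of a non-singular 3x3 matrix, with K
/// upper-triangular (positive diagonal) and R having orthonormal rows.
pub fn rq_gram_schmidt(m: &Mat3x3) -> (Mat3x3, Mat3x3) {
    let mut k = [[0.0; 3]; 3];
    let mut r = [[0.0; 3]; 3];
    // Row i of M is a combination of rows i..3 of R, so work bottom-up.
    for i in (0..3).rev() {
        let mut v = m[i];
        for j in (i + 1)..3 {
            // Component of row i along the already-computed row j of R.
            k[i][j] = dot(&m[i], &r[j]);
            for c in 0..3 {
                v[c] -= k[i][j] * r[j][c];
            }
        }
        // The remainder defines the new orthonormal row and the diagonal of K.
        k[i][i] = dot(&v, &v).sqrt();
        for c in 0..3 {
            r[i][c] = v[c] / k[i][i];
        }
    }
    (k, r)
}
```

This variant yields a positive diagonal for the triangular factor by construction, but the determinant of the orthogonal factor still has to be checked separately.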
Sign Conventions
After decomposition, ensure:
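- $K$ has positive diagonal entries: if $K_{ii} < 0$, negate column $i$ of $K$ and row $i$ of $R$ (the product $KR$ is unchanged)
- $\det(R) = +1$: if $\det(R) = -1$, negate all of $R$ (and $t$); this amounts to replacing $P$ by $-P$, which describes the same camera because $P$ is defined only up to scale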
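One way to enforce both conventions is sketched below. It assumes the determinant is fixed by negating the whole rotation factor, which corresponds to flipping the overall sign of the projection matrix; the function reports that flip so the caller can negate the translation as well. `fix_signs` and `Mat3x3` are hypothetical names, not part of the library API.

```rust
type Mat3x3 = [[f64; 3]; 3];

fn det3(a: &Mat3x3) -> f64 {
    a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
        - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
        + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0])
}

/// Enforce K_ii > 0 and det(R) = +1 in place.
/// Returns +1.0 if K * R is unchanged, or -1.0 if R was negated
/// (i.e. the projection matrix was implicitly replaced by its negative).
pub fn fix_signs(k: &mut Mat3x3, r: &mut Mat3x3) -> f64 {
    for i in 0..3 {
        if k[i][i] < 0.0 {
            for j in 0..3 {
                k[j][i] = -k[j][i]; // negate column i of K
                r[i][j] = -r[i][j]; // negate row i of R
            }
        }
    }
    if det3(&*r) < 0.0 {
        // For a 3x3 matrix, negating every entry flips the determinant sign.
        for row in r.iter_mut() {
            for x in row.iter_mut() {
                *x = -*x;
            }
        }
        return -1.0;
    }
    1.0
}
```

A caller that extracts the translation from the fourth column would multiply that column by the returned sign first.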
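Translation Extraction
With $P = [M \mid p_4] = K[R \mid t]$, the fourth column satisfies $p_4 = K t$, so the translation is $t = K^{-1} p_4$.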
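Because $K$ is upper-triangular, solving $K t = p_4$ (with $p_4$ the fourth column of $P$) needs only back-substitution rather than a general matrix inverse. A minimal sketch, assuming a non-zero diagonal; `extract_translation` is a hypothetical name.

```rust
/// Solve K * t = p4 for t by back-substitution.
/// Assumes K is upper-triangular with a non-zero diagonal.
pub fn extract_translation(k: &[[f64; 3]; 3], p4: &[f64; 3]) -> [f64; 3] {
    let mut t = [0.0; 3];
    for i in (0..3).rev() {
        // Subtract the contribution of the already-solved components.
        let mut acc = p4[i];
        for j in (i + 1)..3 {
            acc -= k[i][j] * t[j];
        }
        t[i] = acc / k[i][i];
    }
    t
}
```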
Full Decomposition
The CameraMatrixDecomposition struct:
```rust
pub struct CameraMatrixDecomposition {
    pub k: Mat3, // Upper-triangular intrinsics
    pub r: Mat3, // Rotation matrix (orthonormal, det = +1)
    pub t: Vec3, // Translation vector
}
```
API
```rust
// Estimate the full 3×4 camera matrix
let P = dlt_camera_matrix(&world_pts, &image_pts)?;

// Decompose into K, R, t
let decomp = decompose_camera_matrix(&P)?;
println!("Intrinsics: {:?}", decomp.k);
println!("Rotation: {:?}", decomp.r);
println!("Translation: {:?}", decomp.t);

// Or just RQ decompose any 3×3 matrix
let (K, R) = rq_decompose(&M);
```
When to Use
Camera matrix DLT is useful when:
- You have non-coplanar 3D-2D correspondences and want to estimate both intrinsics and pose simultaneously
- You need a quick estimate of $K$ from a single view (without multiple homographies)
For calibration with a planar board, Zhang's method is preferred because it exploits the planar constraint; camera matrix DLT is degenerate when all world points are coplanar.