Perspective-n-Point Solvers

The Perspective-n-Point (PnP) problem is to estimate the camera pose from $n$ known 3D-2D point correspondences. Unlike homography-based pose estimation, PnP does not require the 3D points to be coplanar.

Problem Statement

Given: $n$ 3D world points $\{P_{i}\}$ and their corresponding 2D image points $\{p_{i}\}$, plus camera intrinsics $K$.

Find: Camera pose $T_{C, W} = [R \mid t] \in SE(3)$ such that $p_{i} \sim K (R P_{i} + t)$.

Assumptions:

Camera intrinsics $K$ are known
Correspondences are correct (or RANSAC is used)
Points are not degenerate (e.g., not all collinear)
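As a small illustration of the projection model $p_{i} \sim K (R P_{i} + t)$, the following sketch projects one world point to pixel coordinates. The type aliases and the `project` function are hypothetical helpers written for this example, not part of the library API, and the code assumes the last row of $K$ is $[0, 0, 1]$.

```rust
// Hypothetical helper types: plain arrays keep the sketch dependency-free.
type Vec3 = [f64; 3];
type Mat3 = [[f64; 3]; 3];

/// Multiply a 3x3 matrix by a 3-vector.
fn mat_vec(m: &Mat3, v: &Vec3) -> Vec3 {
    [
        m[0][0] * v[0] + m[0][1] * v[1] + m[0][2] * v[2],
        m[1][0] * v[0] + m[1][1] * v[1] + m[1][2] * v[2],
        m[2][0] * v[0] + m[2][1] * v[1] + m[2][2] * v[2],
    ]
}

/// Project a world point to pixels using p ~ K (R P + t).
/// Returns None when the point is not in front of the camera.
fn project(k: &Mat3, r: &Mat3, t: &Vec3, p_world: &Vec3) -> Option<[f64; 2]> {
    // Transform into the camera frame.
    let rp = mat_vec(r, p_world);
    let p_cam = [rp[0] + t[0], rp[1] + t[1], rp[2] + t[2]];
    if p_cam[2] <= 0.0 {
        return None;
    }
    // Apply the intrinsics and dehomogenize.
    let h = mat_vec(k, &p_cam);
    Some([h[0] / h[2], h[1] / h[2]])
}
```

A PnP solver searches for the $R$ and $t$ that make `project` reproduce the observed pixels for all correspondences.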

P3P: Kneip's Minimal Solver

The P3P solver uses exactly 3 correspondences — the minimum for a finite number of solutions. It returns up to 4 candidate poses.

Algorithm

Input: 3 world points $\{P_{0}, P_{1}, P_{2}\}$, 3 pixel points $\{p_{0}, p_{1}, p_{2}\}$, intrinsics $K$.

Bearing vectors: Convert pixels to unit bearing vectors in the camera frame:

$\hat{b}_{i} = \frac{K ^{- 1} [ p _{i} , 1 ] ^{T}}{∥ K ^{- 1} [ p _{i} , 1 ] ^{T} ∥}$

Inter-point distances in world frame:

$a = ∥ P_{1} - P_{2} ∥, b = ∥ P_{0} - P_{2} ∥, c = ∥ P_{0} - P_{1} ∥$

Bearing vector cosines:

$\cos α = \hat{b}_{1} \cdot \hat{b}_{2}, \quad \cos β = \hat{b}_{0} \cdot \hat{b}_{2}, \quad \cos γ = \hat{b}_{0} \cdot \hat{b}_{1}$
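The three preparation steps so far (bearing vectors, inter-point distances, bearing cosines) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and `bearing` assumes a zero-skew pinhole $K$ described by `fx`, `fy`, `cx`, `cy`, so that $K^{-1} [p, 1]^{T}$ has a closed form.

```rust
type Vec3 = [f64; 3];

fn sub(a: &Vec3, b: &Vec3) -> Vec3 {
    [a[0] - b[0], a[1] - b[1], a[2] - b[2]]
}

fn dot(a: &Vec3, b: &Vec3) -> f64 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

fn norm(a: &Vec3) -> f64 {
    dot(a, a).sqrt()
}

/// Unit bearing vector for pixel (px, py).
/// Assumes K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]] (no skew).
fn bearing(fx: f64, fy: f64, cx: f64, cy: f64, px: f64, py: f64) -> Vec3 {
    // K^{-1} [px, py, 1]^T for a zero-skew K.
    let d = [(px - cx) / fx, (py - cy) / fy, 1.0];
    let n = norm(&d);
    [d[0] / n, d[1] / n, d[2] / n]
}

/// Returns ([a, b, c], [cos_alpha, cos_beta, cos_gamma]) following the
/// index convention of the formulas above.
fn p3p_inputs(world: &[Vec3; 3], bearings: &[Vec3; 3]) -> ([f64; 3], [f64; 3]) {
    let a = norm(&sub(&world[1], &world[2]));
    let b = norm(&sub(&world[0], &world[2]));
    let c = norm(&sub(&world[0], &world[1]));
    let cos_alpha = dot(&bearings[1], &bearings[2]);
    let cos_beta = dot(&bearings[0], &bearings[2]);
    let cos_gamma = dot(&bearings[0], &bearings[1]);
    ([a, b, c], [cos_alpha, cos_beta, cos_gamma])
}
```

Note the pairing: $a$ and $\cos α$ both involve points 1 and 2, $b$ and $\cos β$ points 0 and 2, $c$ and $\cos γ$ points 0 and 1.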

Quartic polynomial: Using the ratios $d = (b^{2} - a^{2}) / c^{2}$ and $e = b^{2} / c^{2}$, Kneip derives a quartic polynomial in a distance ratio $u$. The coefficients are functions of $d$, $e$, $\cos α$, $\cos β$, $\cos γ$.
Solve the quartic for up to 4 real roots $u_{k}$.
For each root:
- Compute the second distance ratio $v$
- Compute the three camera-frame distances $x, y, z$ (distances from the camera center to $P_{0}, P_{1}, P_{2}$ along their bearing vectors)
- Back-project to 3D points in the camera frame: $Q_{0} = x \hat{b}_{0}$, $Q_{1} = y \hat{b}_{1}$, $Q_{2} = z \hat{b}_{2}$
- Recover the pose from the 3D-3D correspondence $\{P_{i}\} \leftrightarrow \{Q_{i}\}$ using SVD-based rigid alignment
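For orientation, the geometric constraint behind these steps is the law of cosines applied to each pair of rays. Assuming $x, y, z$ denote the distances from the camera center to $P_{0}, P_{1}, P_{2}$ respectively (an assignment made here for illustration), the three unknown distances satisfy:

```latex
\begin{aligned}
y^{2} + z^{2} - 2 y z \cos\alpha &= a^{2} \\
x^{2} + z^{2} - 2 x z \cos\beta  &= b^{2} \\
x^{2} + y^{2} - 2 x y \cos\gamma &= c^{2}
\end{aligned}
```

Each equation describes the triangle formed by the camera center and two of the world points. Eliminating variables from this system is what leads to a quartic with up to 4 real roots.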

Disambiguation

P3P returns up to 4 poses. To select the correct one, use a fourth point (or more points with RANSAC) and pick the pose with the smallest reprojection error.
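A minimal sketch of this selection step, assuming a single extra correspondence and a pixel-space reprojection error. The `Pose` struct and both functions are hypothetical names used only for this example, and $K$ is assumed to have last row $[0, 0, 1]$.

```rust
type Vec3 = [f64; 3];
type Mat3 = [[f64; 3]; 3];

/// Hypothetical pose container: maps world points into the camera frame.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Pose {
    r: Mat3,
    t: Vec3,
}

fn mat_vec(m: &Mat3, v: &Vec3) -> Vec3 {
    [
        m[0][0] * v[0] + m[0][1] * v[1] + m[0][2] * v[2],
        m[1][0] * v[0] + m[1][1] * v[1] + m[1][2] * v[2],
        m[2][0] * v[0] + m[2][1] * v[1] + m[2][2] * v[2],
    ]
}

/// Pixel reprojection error of one correspondence under a pose.
/// Points behind the camera get an infinite error.
fn reproj_error(k: &Mat3, pose: &Pose, p_world: &Vec3, pixel: &[f64; 2]) -> f64 {
    let rp = mat_vec(&pose.r, p_world);
    let pc = [rp[0] + pose.t[0], rp[1] + pose.t[1], rp[2] + pose.t[2]];
    if pc[2] <= 0.0 {
        return f64::INFINITY;
    }
    let h = mat_vec(k, &pc);
    let du = h[0] / h[2] - pixel[0];
    let dv = h[1] / h[2] - pixel[1];
    (du * du + dv * dv).sqrt()
}

/// Index of the candidate with the smallest finite error on the check point,
/// or None if no candidate places the point in front of the camera.
fn select_pose(k: &Mat3, candidates: &[Pose], p_world: &Vec3, pixel: &[f64; 2]) -> Option<usize> {
    let mut best: Option<(usize, f64)> = None;
    for (i, pose) in candidates.iter().enumerate() {
        let err = reproj_error(k, pose, p_world, pixel);
        if err.is_finite() && best.map_or(true, |(_, e)| err < e) {
            best = Some((i, err));
        }
    }
    best.map(|(i, _)| i)
}
```

With several extra points, the same idea applies with the errors summed or with an inlier count per candidate.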

DLT PnP: Linear Solver for $n \geq 6$

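The DLT (Direct Linear Transform) PnP solves an overdetermined linear system built from $n \geq 6$ points. Each correspondence contributes two linear equations in the 12 entries of $[R ∣ t]$, which are determined only up to scale (11 degrees of freedom), so at least 6 points are needed. The points must not all be coplanar, because the linear system then loses rank.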

Derivation

The projection equation in normalized coordinates is:

$\begin{bmatrix} u \\ v \end{bmatrix} = \frac{1}{[r_{3}^{T} \;\; t_{z}] [P, 1]^{T}} \begin{bmatrix} [r_{1}^{T} \;\; t_{x}] [P, 1]^{T} \\ [r_{2}^{T} \;\; t_{y}] [P, 1]^{T} \end{bmatrix}$

where $[u, v]$ are normalized coordinates (after applying $K^{- 1}$ to pixels) and $r_{i}^{T}$ are rows of $R$ .

Cross-multiplying gives two equations per point:

$u (r_{3}^{T} P + t_{z}) - (r_{1}^{T} P + t_{x}) = 0$

$v (r_{3}^{T} P + t_{z}) - (r_{2}^{T} P + t_{y}) = 0$

The 12 unknowns are the entries of the $3 \times 4$ matrix $[R ∣ t]$ .

The Linear System

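For each point $(P_{i}, u_{i}, v_{i})$ with $P_{i} = (X, Y, Z)$, and with the unknowns ordered row by row as $p = [r_{1}^{T}, t_{x}, r_{2}^{T}, t_{y}, r_{3}^{T}, t_{z}]^{T}$, the two equations contribute the rows:

$\begin{bmatrix} X & Y & Z & 1 & 0 & 0 & 0 & 0 & -uX & -uY & -uZ & -u \\ 0 & 0 & 0 & 0 & X & Y & Z & 1 & -vX & -vY & -vZ & -v \end{bmatrix}$

These rows are the two equations above multiplied by $-1$, which does not change the null space. Stacking all $n$ points gives a $2 n \times 12$ matrix $A$. Solve $A p = 0$ via SVD: $p$ is the right singular vector associated with the smallest singular value, defined up to scale.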
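The construction of $A$ can be sketched as follows, assuming the 12 unknowns are ordered row by row as $[r_{1}^{T}, t_{x}, r_{2}^{T}, t_{y}, r_{3}^{T}, t_{z}]$. The function names are hypothetical, and the SVD step is left to a linear algebra library.

```rust
/// Two rows of the 2n x 12 DLT matrix for one correspondence.
/// `p` is the (normalized) world point, (u, v) the normalized image point.
/// Assumed unknown ordering: [r1, tx, r2, ty, r3, tz].
fn dlt_rows(p: &[f64; 3], u: f64, v: f64) -> [[f64; 12]; 2] {
    let [x, y, z] = *p;
    [
        // (r1.P + tx) - u (r3.P + tz) = 0
        [x, y, z, 1.0, 0.0, 0.0, 0.0, 0.0, -u * x, -u * y, -u * z, -u],
        // (r2.P + ty) - v (r3.P + tz) = 0
        [0.0, 0.0, 0.0, 0.0, x, y, z, 1.0, -v * x, -v * y, -v * z, -v],
    ]
}

/// Stack the rows of all correspondences into the matrix A (row-major).
fn build_system(points: &[[f64; 3]], uv: &[[f64; 2]]) -> Vec<[f64; 12]> {
    assert_eq!(points.len(), uv.len(), "one image point per world point");
    let mut a = Vec::with_capacity(2 * points.len());
    for (p, m) in points.iter().zip(uv) {
        let rows = dlt_rows(p, m[0], m[1]);
        a.push(rows[0]);
        a.push(rows[1]);
    }
    a
}
```

For noise-free data, every row of `build_system` has a zero dot product with the true pose vector, which is a convenient sanity check before running the SVD.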

Post-Processing

Reshape the 12-vector into a $3 \times 4$ matrix
Normalize the scale using the row norms of the $3 \times 3$ block
Project the rotation block onto SO(3) via SVD (same as in Pose from Homography)
Extract translation from the fourth column
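For reference, the projection onto SO(3) is commonly written as below. This is the standard nearest-rotation result in the Frobenius norm, stated here as background rather than as a description of the library's exact code. With $M$ the scaled $3 \times 3$ block:

```latex
M = U \Sigma V^{T}, \qquad
R = U \,\mathrm{diag}\!\left(1,\; 1,\; \det(U V^{T})\right) V^{T}
```

The $\det(U V^{T})$ factor guards against a reflection, so that $\det R = +1$.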

The 3D world points are normalized before building the system (centered at the origin and scaled so that the mean distance from the origin is $\sqrt{3}$). The image points are normalized via $K^{- 1}$. The result is denormalized after solving.
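A minimal sketch of the world-point normalization, with the target mean distance passed in as a parameter rather than hard-coded. The function name and return layout are hypothetical; the centroid and scale are returned because they are needed to undo the normalization after solving.

```rust
type Vec3 = [f64; 3];

/// Center the points at the origin and scale them so that their mean
/// distance from the origin equals `target_mean_dist`.
/// Returns (normalized points, centroid, scale).
fn normalize_points(points: &[Vec3], target_mean_dist: f64) -> (Vec<Vec3>, Vec3, f64) {
    assert!(!points.is_empty(), "need at least one point");
    let n = points.len() as f64;

    // Centroid of the input points.
    let mut c = [0.0; 3];
    for p in points {
        for k in 0..3 {
            c[k] += p[k] / n;
        }
    }

    // Mean distance of the points from the centroid.
    let mean_dist = points
        .iter()
        .map(|p| {
            let d = [p[0] - c[0], p[1] - c[1], p[2] - c[2]];
            (d[0] * d[0] + d[1] * d[1] + d[2] * d[2]).sqrt()
        })
        .sum::<f64>()
        / n;

    // If all points coincide there is nothing to scale.
    let s = if mean_dist > 0.0 { target_mean_dist / mean_dist } else { 1.0 };

    let normalized = points
        .iter()
        .map(|p| [(p[0] - c[0]) * s, (p[1] - c[1]) * s, (p[2] - c[2]) * s])
        .collect();
    (normalized, c, s)
}
```

Normalizing this way keeps the entries of the linear system at comparable magnitudes, which improves the conditioning of the SVD.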

RANSAC Wrappers

The DLT solver has a RANSAC variant for handling outliers:

// DLT PnP + RANSAC
// The `?` propagates solver errors, so this must run inside a function
// that returns a compatible Result.
let (pose, inliers) = dlt_ransac(
    &world_pts, &image_pts, &K,
    &RansacOptions { thresh: 5.0, ..Default::default() },
)?;

The DLT PnP solver uses MIN_SAMPLES = 6 with RANSAC. P3P (p3p()) returns up to 4 candidates and is intended for manual disambiguation (e.g., using a 4th point), not as a RANSAC estimator.
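To illustrate how a threshold such as `thresh` typically separates inliers from outliers, here is a sketch of the consensus step for one pose hypothesis. It assumes the threshold is a reprojection error in pixels, which the text above does not state, and `find_inliers` is a hypothetical helper rather than a library function.

```rust
type Vec3 = [f64; 3];
type Mat3 = [[f64; 3]; 3];

fn mat_vec(m: &Mat3, v: &Vec3) -> Vec3 {
    [
        m[0][0] * v[0] + m[0][1] * v[1] + m[0][2] * v[2],
        m[1][0] * v[0] + m[1][1] * v[1] + m[1][2] * v[2],
        m[2][0] * v[0] + m[2][1] * v[1] + m[2][2] * v[2],
    ]
}

/// Indices of correspondences whose reprojection error under (r, t)
/// is below `thresh` (assumed to be in pixels).
fn find_inliers(
    k: &Mat3,
    r: &Mat3,
    t: &Vec3,
    world: &[Vec3],
    pixels: &[[f64; 2]],
    thresh: f64,
) -> Vec<usize> {
    let mut inliers = Vec::new();
    for (i, (p, px)) in world.iter().zip(pixels).enumerate() {
        let rp = mat_vec(r, p);
        let pc = [rp[0] + t[0], rp[1] + t[1], rp[2] + t[2]];
        // Points behind the camera can never be inliers.
        if pc[2] <= 0.0 {
            continue;
        }
        let h = mat_vec(k, &pc);
        let du = h[0] / h[2] - px[0];
        let dv = h[1] / h[2] - px[1];
        if (du * du + dv * dv).sqrt() < thresh {
            inliers.push(i);
        }
    }
    inliers
}
```

A RANSAC loop would call such a function once per hypothesis and keep the pose with the largest inlier set.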

Comparison

Solver	Min. points	Solutions	Strengths
P3P	3	Up to 4	Minimal sample size (well suited to RANSAC in principle)
DLT PnP	6	1	Simple, no polynomial solving

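OpenCV equivalence: cv::solvePnP with SOLVEPNP_P3P for the minimal solver. OpenCV has no standalone DLT flag; SOLVEPNP_ITERATIVE uses a DLT initialization for non-planar points (at least 6), followed by Levenberg-Marquardt refinement. SOLVEPNP_DLS refers to Direct Least-Squares, not DLT. Use cv::solvePnPRansac for robust estimation.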

References

Kneip, L., Scaramuzza, D., & Siegwart, R. (2011). "A Novel Parametrization of the Perspective-Three-Point Problem for a Direct Computation of Absolute Camera Position and Orientation." CVPR.
