Perspective-n-Point Solvers
The Perspective-n-Point (PnP) problem estimates the camera pose from known 3D-2D point correspondences. Unlike homography-based pose estimation, PnP does not require coplanar points.
Problem Statement
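Given: 3D world points $\mathbf{P}_i$ and their corresponding 2D image points $\mathbf{p}_i$, plus camera intrinsics $K$.
Find: Camera pose $(R, \mathbf{t})$ such that $\tilde{\mathbf{p}}_i \sim K (R \mathbf{P}_i + \mathbf{t})$, where $\tilde{\mathbf{p}}_i$ is $\mathbf{p}_i$ in homogeneous coordinates and $\sim$ denotes equality up to scale.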
Assumptions:
- Camera intrinsics are known
- Correspondences are correct (or RANSAC is used)
- Points are not degenerate (e.g., not all collinear)
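The projection relation in the problem statement can be sketched in plain Rust as follows. This is a minimal illustration, not the library's API: the function name `project_point` and the row-major array layout are assumptions made for this example.

```rust
/// Project a world point into the image with pose (R, t) and intrinsics K.
/// Matrices are row-major 3x3 arrays. Returns `None` when the point is not
/// in front of the camera (non-positive depth).
fn project_point(
    k: &[[f64; 3]; 3],
    r: &[[f64; 3]; 3],
    t: &[f64; 3],
    p_world: &[f64; 3],
) -> Option<[f64; 2]> {
    // Camera-frame point: P_c = R * P_w + t
    let mut pc = [0.0; 3];
    for i in 0..3 {
        pc[i] = r[i][0] * p_world[0] + r[i][1] * p_world[1] + r[i][2] * p_world[2] + t[i];
    }
    if pc[2] <= 0.0 {
        return None;
    }
    // Homogeneous pixel K * P_c, then divide by the third component.
    let u = k[0][0] * pc[0] + k[0][1] * pc[1] + k[0][2] * pc[2];
    let v = k[1][0] * pc[0] + k[1][1] * pc[1] + k[1][2] * pc[2];
    let w = k[2][0] * pc[0] + k[2][1] * pc[1] + k[2][2] * pc[2];
    Some([u / w, v / w])
}
```

With focal length 800, principal point (320, 240), identity rotation and translation (0, 0, 2), the world point (0.1, -0.05, 0) lands at pixel (360, 220). A PnP solver inverts this map: it searches for the $(R, \mathbf{t})$ that makes the projected points agree with the observed ones.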
P3P: Kneip's Minimal Solver
The P3P solver uses exactly 3 correspondences — the minimum for a finite number of solutions. It returns up to 4 candidate poses.
Algorithm
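Input: 3 world points $\mathbf{P}_1, \mathbf{P}_2, \mathbf{P}_3$, 3 pixel points $\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3$, intrinsics $K$.
- Bearing vectors: Convert pixels to unit bearing vectors in the camera frame: $\mathbf{f}_i = K^{-1}\tilde{\mathbf{p}}_i / \|K^{-1}\tilde{\mathbf{p}}_i\|$, where $\tilde{\mathbf{p}}_i$ is the homogeneous pixel.
- Inter-point distances in world frame: $d_{ij} = \|\mathbf{P}_i - \mathbf{P}_j\|$ for the pairs $(1,2)$, $(1,3)$, $(2,3)$.
- Bearing vector cosines: $c_{ij} = \mathbf{f}_i^\top \mathbf{f}_j$ for the same pairs.
- Quartic polynomial: Using two ratios of the inter-point distances, Kneip derives a quartic polynomial in a distance ratio $x$. The coefficients are functions of the two ratios and the three cosines $c_{12}$, $c_{13}$, $c_{23}$.
- Solve quartic for up to 4 real roots $x$.
- For each root:
  - Compute the second distance ratio $y$
  - Compute the three camera-frame distances $s_1, s_2, s_3$ (depths of the three points)
  - Back-project to 3D points in camera frame: $\mathbf{Q}_i = s_i \mathbf{f}_i$
  - Recover pose from the 3D-3D correspondence $\{\mathbf{P}_i \leftrightarrow \mathbf{Q}_i\}$ using SVD-based rigid alignment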
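The first three steps (bearing vectors, inter-point distances, bearing cosines) can be sketched as below. This is an illustration under stated assumptions rather than the library's implementation: the function names are hypothetical, and the intrinsics are assumed to be zero-skew so that $K^{-1}$ has a closed form. The quartic itself is not reproduced here.

```rust
/// Dot product of two 3-vectors.
fn dot3(a: &[f64; 3], b: &[f64; 3]) -> f64 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

/// Unit bearing vector for a pixel. Assumes zero-skew intrinsics
/// K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
fn bearing_vector(k: &[[f64; 3]; 3], px: &[f64; 2]) -> [f64; 3] {
    let x = (px[0] - k[0][2]) / k[0][0];
    let y = (px[1] - k[1][2]) / k[1][1];
    let n = (x * x + y * y + 1.0).sqrt();
    [x / n, y / n, 1.0 / n]
}

/// World-frame distances and bearing cosines for the point pairs
/// (1,2), (1,3), (2,3), returned in that order.
fn p3p_invariants(
    k: &[[f64; 3]; 3],
    world: &[[f64; 3]; 3],
    pixels: &[[f64; 2]; 3],
) -> ([f64; 3], [f64; 3]) {
    let f = [
        bearing_vector(k, &pixels[0]),
        bearing_vector(k, &pixels[1]),
        bearing_vector(k, &pixels[2]),
    ];
    let pairs = [(0usize, 1usize), (0, 2), (1, 2)];
    let mut dists = [0.0; 3];
    let mut cosines = [0.0; 3];
    for (n, &(i, j)) in pairs.iter().enumerate() {
        let diff = [
            world[i][0] - world[j][0],
            world[i][1] - world[j][1],
            world[i][2] - world[j][2],
        ];
        dists[n] = dot3(&diff, &diff).sqrt();
        cosines[n] = dot3(&f[i], &f[j]);
    }
    (dists, cosines)
}
```

These quantities are tied to the unknown camera-frame distances by the law of cosines, $s_i^2 + s_j^2 - 2 s_i s_j \cos\theta_{ij} = d_{ij}^2$ for each pair, where $d_{ij}$ is the world-frame distance and $\cos\theta_{ij}$ the bearing cosine. The quartic comes from eliminating the distances from these three equations.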
Disambiguation
P3P returns up to 4 poses. To select the correct one, use a fourth point (or more points with RANSAC) and pick the pose with the smallest reprojection error.
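A minimal sketch of this selection step is shown below. The `CandidatePose` struct and both function names are hypothetical and exist only for this example; zero-skew intrinsics are assumed. Candidates that place a check point behind the camera are rejected outright.

```rust
/// Hypothetical pose container for this sketch (not the library's type).
#[derive(Clone, Copy, Debug)]
struct CandidatePose {
    r: [[f64; 3]; 3],
    t: [f64; 3],
}

/// Pixel-space reprojection error of one point under one candidate pose.
/// Returns infinity if the point falls behind the camera.
/// Assumes zero-skew intrinsics K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
fn reprojection_error(
    k: &[[f64; 3]; 3],
    pose: &CandidatePose,
    p_world: &[f64; 3],
    px: &[f64; 2],
) -> f64 {
    let mut pc = [0.0; 3];
    for i in 0..3 {
        pc[i] = pose.r[i][0] * p_world[0]
            + pose.r[i][1] * p_world[1]
            + pose.r[i][2] * p_world[2]
            + pose.t[i];
    }
    if pc[2] <= 0.0 {
        return f64::INFINITY;
    }
    let u = k[0][0] * pc[0] / pc[2] + k[0][2];
    let v = k[1][1] * pc[1] / pc[2] + k[1][2];
    ((u - px[0]).powi(2) + (v - px[1]).powi(2)).sqrt()
}

/// Index of the candidate with the smallest summed reprojection error over
/// the check points, or `None` if no candidate has a finite error.
fn select_pose(
    k: &[[f64; 3]; 3],
    candidates: &[CandidatePose],
    checks: &[([f64; 3], [f64; 2])],
) -> Option<usize> {
    let mut best: Option<(usize, f64)> = None;
    for (idx, pose) in candidates.iter().enumerate() {
        let err: f64 = checks
            .iter()
            .map(|(pw, px)| reprojection_error(k, pose, pw, px))
            .sum();
        if err.is_finite() && best.map_or(true, |(_, e)| err < e) {
            best = Some((idx, err));
        }
    }
    best.map(|(idx, _)| idx)
}
```

Passing a single `(world, pixel)` pair in `checks` corresponds to the fourth-point test; passing more pairs makes the choice less sensitive to noise in any one observation.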
DLT PnP: Linear Solver for $n \geq 6$
The DLT (Direct Linear Transform) PnP uses an overdetermined system for $n \geq 6$ points.
Derivation
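The projection equation in normalized coordinates is:
$$\lambda \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} \mathbf{m}_1^\top \\ \mathbf{m}_2^\top \\ \mathbf{m}_3^\top \end{bmatrix} \tilde{\mathbf{P}}$$
where $(x, y)$ are normalized coordinates (after applying $K^{-1}$ to pixels), $\tilde{\mathbf{P}} = (X, Y, Z, 1)^\top$ is the homogeneous world point, $\lambda$ is the projective depth, and $\mathbf{m}_1^\top, \mathbf{m}_2^\top, \mathbf{m}_3^\top$ are rows of $M = [R \mid \mathbf{t}]$.
Cross-multiplying gives two equations per point:
$$\mathbf{m}_1^\top \tilde{\mathbf{P}} - x\,\mathbf{m}_3^\top \tilde{\mathbf{P}} = 0, \qquad \mathbf{m}_2^\top \tilde{\mathbf{P}} - y\,\mathbf{m}_3^\top \tilde{\mathbf{P}} = 0$$
The 12 unknowns are the entries of the $3 \times 4$ matrix $M$.
The Linear System
For each point $i$:
$$\begin{bmatrix} \tilde{\mathbf{P}}_i^\top & \mathbf{0}^\top & -x_i \tilde{\mathbf{P}}_i^\top \\ \mathbf{0}^\top & \tilde{\mathbf{P}}_i^\top & -y_i \tilde{\mathbf{P}}_i^\top \end{bmatrix} \begin{bmatrix} \mathbf{m}_1 \\ \mathbf{m}_2 \\ \mathbf{m}_3 \end{bmatrix} = \mathbf{0}$$
Stacking gives the $2n \times 12$ matrix $A$. Solve $A\mathbf{m} = \mathbf{0}$ via SVD: $\mathbf{m}$ is the right singular vector of the smallest singular value.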
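Assembling the stacked system can be sketched as follows. The function name `build_dlt_matrix` and the unknown ordering (the three rows of the $3 \times 4$ pose matrix laid out one after another) are assumptions of this example. Each correspondence contributes the two rows obtained by cross-multiplying the projection equation. The SVD step is left out because it needs a linear algebra crate.

```rust
/// Build the 2n x 12 DLT matrix from world points and normalized image
/// coordinates (pixels with the intrinsics already removed).
/// Unknown ordering: [m1 (4 entries), m2 (4 entries), m3 (4 entries)],
/// i.e. the rows of the 3x4 pose matrix in sequence.
fn build_dlt_matrix(world: &[[f64; 3]], normalized: &[[f64; 2]]) -> Vec<[f64; 12]> {
    assert_eq!(world.len(), normalized.len());
    let mut rows = Vec::with_capacity(2 * world.len());
    for (p, xy) in world.iter().zip(normalized.iter()) {
        let ph = [p[0], p[1], p[2], 1.0]; // homogeneous world point
        let (x, y) = (xy[0], xy[1]);
        let mut row_x = [0.0; 12];
        let mut row_y = [0.0; 12];
        for j in 0..4 {
            // m1 . P - x * (m3 . P) = 0
            row_x[j] = ph[j];
            row_x[8 + j] = -x * ph[j];
            // m2 . P - y * (m3 . P) = 0
            row_y[4 + j] = ph[j];
            row_y[8 + j] = -y * ph[j];
        }
        rows.push(row_x);
        rows.push(row_y);
    }
    rows
}
```

For noise-free data, the true pose flattened in the same order lies in the null space of this matrix, which is a convenient sanity check before handing the matrix to an SVD routine.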
Post-Processing
- Reshape the 12-vector into a $3 \times 4$ matrix
- Normalize the scale using the row norms of the left $3 \times 3$ block
- Project the rotation block onto SO(3) via SVD (same as in Pose from Homography)
- Extract translation from the fourth column
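One way to write these four steps compactly is sketched below. The symbols $\hat{M}$, $\hat{R}$, $\hat{\mathbf{t}}$ and $s$ are introduced here for illustration, and taking the mean of the row norms as the scale is an assumption; the text above only states that the row norms are used. The sign flip accounts for the null vector of the linear system being defined only up to sign.

$$
\begin{aligned}
\hat{M} &= [\hat{R} \mid \hat{\mathbf{t}}], \qquad \hat{M} \leftarrow -\hat{M} \ \text{ if } \ \det \hat{R} < 0, \\
s &= \tfrac{1}{3} \sum_{k=1}^{3} \|\hat{\mathbf{r}}_k\|, \qquad \hat{\mathbf{r}}_k^\top \text{ the rows of } \hat{R}, \\
\tfrac{1}{s} \hat{R} &= U \Sigma V^\top, \qquad R = U \, \operatorname{diag}\big(1, 1, \det(U V^\top)\big) \, V^\top, \\
\mathbf{t} &= \tfrac{1}{s} \, \hat{\mathbf{t}}
\end{aligned}
$$

For a noise-free solution every row of $\hat{R}$ has the same norm, so any reasonable combination of the row norms gives the same $s$; with noise the choice affects only the scale of $\mathbf{t}$.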
Hartley Normalization
The 3D world points are normalized before building the system (center at origin, scale mean distance to $\sqrt{3}$). The image points are normalized via $K^{-1}$. The result is denormalized after solving.
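The 3D normalization can be sketched as below. The function name and return shape are assumptions of this example, not the library's API; the centroid and scale are returned so the caller can undo the transform after solving.

```rust
/// Hartley-style normalization of 3D points: translate the centroid to the
/// origin and scale so that the mean distance from the origin is sqrt(3).
/// Returns (normalized points, centroid, scale).
fn normalize_points_3d(points: &[[f64; 3]]) -> (Vec<[f64; 3]>, [f64; 3], f64) {
    let n = points.len() as f64;
    let mut c = [0.0; 3];
    for p in points {
        for k in 0..3 {
            c[k] += p[k] / n;
        }
    }
    let mean_dist = points
        .iter()
        .map(|p| {
            let d = [p[0] - c[0], p[1] - c[1], p[2] - c[2]];
            (d[0] * d[0] + d[1] * d[1] + d[2] * d[2]).sqrt()
        })
        .sum::<f64>()
        / n;
    // Fall back to unit scale for empty or fully coincident input.
    let scale = if mean_dist > 0.0 {
        3.0_f64.sqrt() / mean_dist
    } else {
        1.0
    };
    let normalized = points
        .iter()
        .map(|p| [(p[0] - c[0]) * scale, (p[1] - c[1]) * scale, (p[2] - c[2]) * scale])
        .collect();
    (normalized, c, scale)
}
```

If the solver returns a $3 \times 4$ matrix $M_n$ for the normalized points $\mathbf{P}_n = s(\mathbf{P} - \mathbf{c})$, denormalization amounts to $M = M_n T$ with $T = \begin{bmatrix} sI & -s\mathbf{c} \\ \mathbf{0}^\top & 1 \end{bmatrix}$, applied before the scale and rotation clean-up.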
RANSAC Wrappers
The DLT solver has a RANSAC variant for handling outliers:
```rust
// DLT PnP + RANSAC
let (pose, inliers) = dlt_ransac(
    &world_pts,
    &image_pts,
    &K,
    &RansacOptions {
        thresh: 5.0,
        ..Default::default()
    },
)?;
```
The DLT PnP solver uses `MIN_SAMPLES = 6` with RANSAC. P3P (`p3p()`) returns up to 4 candidates and is intended for manual disambiguation (e.g., using a 4th point), not as a RANSAC estimator.
Comparison
| Solver | Min. points | Solutions | Strengths |
|---|---|---|---|
| P3P | 3 | Up to 4 | Minimal sample, so well suited to RANSAC in general |
| DLT PnP | 6 | 1 | Simple, no polynomial solving |
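Why the sample size matters for RANSAC can be illustrated with the standard iteration estimate $N = \lceil \log(1 - p) / \log(1 - w^s) \rceil$, where $p$ is the desired confidence, $w$ the inlier ratio and $s$ the sample size. The sketch below is generic and not tied to this library; the function name is hypothetical.

```rust
/// Number of RANSAC iterations needed to draw at least one all-inlier sample
/// with probability `confidence`, given the inlier ratio and sample size.
/// Expects `confidence` strictly between 0 and 1.
fn ransac_iterations(confidence: f64, inlier_ratio: f64, sample_size: u32) -> u64 {
    // Probability that one random sample consists of inliers only.
    let p_good_sample = inlier_ratio.powi(sample_size as i32);
    if p_good_sample <= 0.0 {
        return u64::MAX; // no inliers: no finite number of draws suffices
    }
    if p_good_sample >= 1.0 {
        return 1; // all inliers: the first sample is already clean
    }
    ((1.0 - confidence).ln() / (1.0 - p_good_sample).ln()).ceil() as u64
}
```

With 50% inliers and 99% confidence this gives 35 iterations for a 3-point sample and 293 for a 6-point sample, which is the usual argument for minimal solvers in hypothesis generation.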
OpenCV equivalence: `cv::solvePnP` with `SOLVEPNP_P3P` or `SOLVEPNP_DLS`; `cv::solvePnPRansac` for robust estimation.
References
- Kneip, L., Scaramuzza, D., & Siegwart, R. (2011). "A Novel Parametrization of the Perspective-Three-Point Problem for a Direct Computation of Absolute Camera Position and Orientation." CVPR.