Perspective-n-Point Solvers

The Perspective-n-Point (PnP) problem estimates the camera pose from known 3D-2D point correspondences. Unlike homography-based pose estimation, PnP does not require coplanar points.

Problem Statement

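Given: $n$ 3D world points $\mathbf{P}_i$ and their corresponding 2D image points $\mathbf{p}_i = (u_i, v_i)$, plus camera intrinsics $K$.

Find: Camera pose $(R, \mathbf{t})$ such that $\tilde{\mathbf{p}}_i \sim K (R \mathbf{P}_i + \mathbf{t})$ for every $i$, where $\tilde{\mathbf{p}}_i = [u_i, v_i, 1]^T$ and $\sim$ denotes equality up to scale.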

Assumptions:

  • Camera intrinsics are known
  • Correspondences are correct (or RANSAC is used)
  • Points are not degenerate (e.g., not all collinear)
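As a minimal sketch of the projection model in the problem statement, the following plain-Rust snippet projects a world point through a pose and intrinsics and measures the pixel reprojection error. The `Vec3` and `Mat3` aliases and the function names are hypothetical helpers chosen for illustration; they are not taken from this library's API.

```rust
// Hypothetical helper types for this sketch: a 3-vector and a row-major 3x3 matrix.
type Vec3 = [f64; 3];
type Mat3 = [[f64; 3]; 3];

/// Multiply a 3x3 matrix by a 3-vector.
fn mat_vec(m: &Mat3, v: &Vec3) -> Vec3 {
    [
        m[0][0] * v[0] + m[0][1] * v[1] + m[0][2] * v[2],
        m[1][0] * v[0] + m[1][1] * v[1] + m[1][2] * v[2],
        m[2][0] * v[0] + m[2][1] * v[1] + m[2][2] * v[2],
    ]
}

/// Project a world point to pixel coordinates via p ~ K (R P + t).
/// Returns `None` if the point does not lie in front of the camera.
fn project(k: &Mat3, r: &Mat3, t: &Vec3, p_world: &Vec3) -> Option<[f64; 2]> {
    let rp = mat_vec(r, p_world);
    let p_cam = [rp[0] + t[0], rp[1] + t[1], rp[2] + t[2]];
    if p_cam[2] <= 0.0 {
        return None;
    }
    // Homogeneous pixel coordinates, then divide by the third component.
    let h = mat_vec(k, &p_cam);
    Some([h[0] / h[2], h[1] / h[2]])
}

/// Pixel distance between the projection of `p_world` and its observation.
fn reprojection_error(
    k: &Mat3,
    r: &Mat3,
    t: &Vec3,
    p_world: &Vec3,
    observed: &[f64; 2],
) -> Option<f64> {
    let uv = project(k, r, t, p_world)?;
    Some(((uv[0] - observed[0]).powi(2) + (uv[1] - observed[1]).powi(2)).sqrt())
}
```

A pose that satisfies the problem statement drives this error toward zero for every correct correspondence, which is why it serves as the usual inlier test.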

P3P: Kneip's Minimal Solver

The P3P solver uses exactly 3 correspondences — the minimum for a finite number of solutions. It returns up to 4 candidate poses.

Algorithm

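Input: 3 world points $\mathbf{P}_1, \mathbf{P}_2, \mathbf{P}_3$, 3 pixel points $\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3$, intrinsics $K$.

  1. Bearing vectors: Convert pixels to unit bearing vectors in the camera frame:

     $$\mathbf{b}_i = \frac{K^{-1} [u_i, v_i, 1]^T}{\lVert K^{-1} [u_i, v_i, 1]^T \rVert}$$

  2. Inter-point distances in world frame:

     $$a = \lVert \mathbf{P}_2 - \mathbf{P}_3 \rVert, \quad b = \lVert \mathbf{P}_1 - \mathbf{P}_3 \rVert, \quad c = \lVert \mathbf{P}_1 - \mathbf{P}_2 \rVert$$

  3. Bearing vector cosines:

     $$\cos\alpha = \mathbf{b}_2 \cdot \mathbf{b}_3, \quad \cos\beta = \mathbf{b}_1 \cdot \mathbf{b}_3, \quad \cos\gamma = \mathbf{b}_1 \cdot \mathbf{b}_2$$

  4. Quartic polynomial: Using the ratios $a^2/b^2$ and $c^2/b^2$, Kneip derives a quartic polynomial in a distance ratio $v$. The coefficients are functions of $a^2/b^2$, $c^2/b^2$, $\cos\alpha$, $\cos\beta$, $\cos\gamma$.

  5. Solve quartic for up to 4 real roots $v$.

  6. For each root:

    • Compute the second distance ratio $u$
    • Compute the three camera-frame distances $d_1, d_2, d_3$ (depths of the three points)
    • Back-project to 3D points in camera frame: $\mathbf{P}_i^c = d_i \mathbf{b}_i$
    • Recover pose $(R, \mathbf{t})$ from the 3D-3D correspondence $\mathbf{P}_i \leftrightarrow \mathbf{P}_i^c$ using SVD-based rigid alignment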
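The first three steps are plain geometry and can be sketched without any linear-algebra crate. The snippet below computes unit bearing vectors, the world-frame side lengths, and the pairwise bearing cosines that feed the quartic. It assumes a zero-skew pinhole model, and the `Intrinsics` struct, the function names, and the naming convention (side `a` opposite point 1, cosine `alpha` between bearings 2 and 3, and so on) are illustrative assumptions rather than this library's types.

```rust
type Vec3 = [f64; 3];

/// Pinhole intrinsics with zero skew (an assumption of this sketch).
struct Intrinsics {
    fx: f64,
    fy: f64,
    cx: f64,
    cy: f64,
}

fn dot(a: &Vec3, b: &Vec3) -> f64 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

fn distance(a: &Vec3, b: &Vec3) -> f64 {
    let d = [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
    dot(&d, &d).sqrt()
}

/// Step 1: apply K^{-1} to the pixel and normalize to unit length.
fn bearing(k: &Intrinsics, pixel: &[f64; 2]) -> Vec3 {
    let d = [(pixel[0] - k.cx) / k.fx, (pixel[1] - k.cy) / k.fy, 1.0];
    let n = dot(&d, &d).sqrt();
    [d[0] / n, d[1] / n, d[2] / n]
}

/// Steps 2 and 3: side lengths [a, b, c] of the world triangle and
/// cosines [cos_alpha, cos_beta, cos_gamma] between the bearing vectors.
fn p3p_inputs(
    k: &Intrinsics,
    world: &[Vec3; 3],
    pixels: &[[f64; 2]; 3],
) -> ([f64; 3], [f64; 3]) {
    let b = [
        bearing(k, &pixels[0]),
        bearing(k, &pixels[1]),
        bearing(k, &pixels[2]),
    ];
    let sides = [
        distance(&world[1], &world[2]), // a: between points 2 and 3
        distance(&world[0], &world[2]), // b: between points 1 and 3
        distance(&world[0], &world[1]), // c: between points 1 and 2
    ];
    let cosines = [
        dot(&b[1], &b[2]), // cos(alpha)
        dot(&b[0], &b[2]), // cos(beta)
        dot(&b[0], &b[1]), // cos(gamma)
    ];
    (sides, cosines)
}
```

Once the depths are known, back-projection is a scalar multiple of each bearing vector, so the only numerically delicate parts that remain are the quartic solve and the rigid alignment.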

Disambiguation

P3P returns up to 4 poses. To select the correct one, use a fourth point (or more points with RANSAC) and pick the pose with the smallest reprojection error.
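A minimal sketch of this selection step, assuming the candidates are already available as rotation and translation pairs: each candidate is scored by the summed squared reprojection error over one or more extra check points, and the index of the lowest score is returned. The `Pose` struct and `select_pose` function are hypothetical names for illustration, not the library's API.

```rust
type Vec3 = [f64; 3];
type Mat3 = [[f64; 3]; 3];

/// Candidate pose: rotation (row-major) and translation. Hypothetical type.
struct Pose {
    r: Mat3,
    t: Vec3,
}

fn mat_vec(m: &Mat3, v: &Vec3) -> Vec3 {
    [
        m[0][0] * v[0] + m[0][1] * v[1] + m[0][2] * v[2],
        m[1][0] * v[0] + m[1][1] * v[1] + m[1][2] * v[2],
        m[2][0] * v[0] + m[2][1] * v[1] + m[2][2] * v[2],
    ]
}

/// Project a world point to pixels; `None` if it is not in front of the camera.
fn project(k: &Mat3, pose: &Pose, p_world: &Vec3) -> Option<[f64; 2]> {
    let rp = mat_vec(&pose.r, p_world);
    let pc = [rp[0] + pose.t[0], rp[1] + pose.t[1], rp[2] + pose.t[2]];
    if pc[2] <= 0.0 {
        return None;
    }
    let h = mat_vec(k, &pc);
    Some([h[0] / h[2], h[1] / h[2]])
}

/// Index of the candidate with the smallest summed squared reprojection
/// error over the check points. Returns `None` only for an empty candidate list.
fn select_pose(
    candidates: &[Pose],
    k: &Mat3,
    world: &[Vec3],
    image: &[[f64; 2]],
) -> Option<usize> {
    let mut best: Option<(usize, f64)> = None;
    for (idx, pose) in candidates.iter().enumerate() {
        let mut cost = 0.0;
        for (pw, obs) in world.iter().zip(image.iter()) {
            cost += match project(k, pose, pw) {
                Some(uv) => (uv[0] - obs[0]).powi(2) + (uv[1] - obs[1]).powi(2),
                // A point behind the camera disqualifies the candidate.
                None => f64::INFINITY,
            };
        }
        if best.map_or(true, |(_, c)| cost < c) {
            best = Some((idx, cost));
        }
    }
    best.map(|(idx, _)| idx)
}
```

With a single noisy check point two candidates can score similarly, so passing several extra points is the safer choice when they are available.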

DLT PnP: Linear Solver for $n \geq 6$

The DLT (Direct Linear Transform) PnP solver builds an overdetermined linear system from $n \geq 6$ points.

Derivation

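The projection equation in normalized coordinates is:

$$
x = \frac{\mathbf{m}_1^T \tilde{\mathbf{P}}}{\mathbf{m}_3^T \tilde{\mathbf{P}}}, \qquad
y = \frac{\mathbf{m}_2^T \tilde{\mathbf{P}}}{\mathbf{m}_3^T \tilde{\mathbf{P}}}
$$

where $(x, y)$ are normalized coordinates (after applying $K^{-1}$ to pixels), $\tilde{\mathbf{P}} = [X, Y, Z, 1]^T$ is the homogeneous world point, and $\mathbf{m}_1^T, \mathbf{m}_2^T, \mathbf{m}_3^T$ are rows of $M = [R \mid \mathbf{t}]$.

Cross-multiplying gives two equations per point:

$$
\mathbf{m}_1^T \tilde{\mathbf{P}} - x \, \mathbf{m}_3^T \tilde{\mathbf{P}} = 0, \qquad
\mathbf{m}_2^T \tilde{\mathbf{P}} - y \, \mathbf{m}_3^T \tilde{\mathbf{P}} = 0
$$

The 12 unknowns are the entries of the $3 \times 4$ matrix $M$.

The Linear System

For each point $i$:

$$
\begin{bmatrix}
\tilde{\mathbf{P}}_i^T & \mathbf{0}^T & -x_i \tilde{\mathbf{P}}_i^T \\
\mathbf{0}^T & \tilde{\mathbf{P}}_i^T & -y_i \tilde{\mathbf{P}}_i^T
\end{bmatrix}
\begin{bmatrix} \mathbf{m}_1 \\ \mathbf{m}_2 \\ \mathbf{m}_3 \end{bmatrix}
= \mathbf{0}
$$

Stacking gives the $2n \times 12$ matrix $A$. Solve $A \mathbf{m} = \mathbf{0}$ via SVD: $\mathbf{m}$ is the right singular vector associated with the smallest singular value.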
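To make the row layout concrete, here is a small sketch that builds the two rows contributed by one correspondence and stacks them for all points. The unknown vector is assumed to hold the three rows of the $3 \times 4$ pose matrix one after another. The function names `dlt_rows` and `build_system` are hypothetical; the SVD solve itself is left out because it needs a linear-algebra crate.

```rust
/// Two DLT rows for one correspondence: world point (X, Y, Z) and
/// normalized image coordinates (x, y).
fn dlt_rows(p: &[f64; 3], xy: &[f64; 2]) -> [[f64; 12]; 2] {
    // Homogeneous world point [X, Y, Z, 1].
    let ph = [p[0], p[1], p[2], 1.0];
    let mut rows = [[0.0; 12]; 2];
    for j in 0..4 {
        // Row for x: [ P~^T, 0^T, -x P~^T ]
        rows[0][j] = ph[j];
        rows[0][8 + j] = -xy[0] * ph[j];
        // Row for y: [ 0^T, P~^T, -y P~^T ]
        rows[1][4 + j] = ph[j];
        rows[1][8 + j] = -xy[1] * ph[j];
    }
    rows
}

/// Stack the rows of all correspondences into a (2n x 12) system.
fn build_system(world: &[[f64; 3]], normalized: &[[f64; 2]]) -> Vec<[f64; 12]> {
    let mut a = Vec::with_capacity(2 * world.len());
    for (p, xy) in world.iter().zip(normalized.iter()) {
        let rows = dlt_rows(p, xy);
        a.push(rows[0]);
        a.push(rows[1]);
    }
    a
}
```

A quick sanity check is that the true pose matrix, flattened row by row, lies in the null space of the stacked system when the correspondences are noise-free.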

Post-Processing

  1. Reshape the 12-vector into a $3 \times 4$ matrix
  2. Normalize the scale using the row norms of the $3 \times 3$ rotation block
  3. Project the rotation block onto SO(3) via SVD (same as in Pose from Homography)
  4. Extract translation from the fourth column

Hartley Normalization

The 3D world points are normalized before building the system (center at origin, scale mean distance to $\sqrt{3}$). The image points are normalized via $K^{-1}$. The result is denormalized after solving.
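The world-point normalization can be sketched as follows. The target mean distance of $\sqrt{3}$ is the conventional choice for 3D points in Hartley-style normalization and is an assumption here, as are the `Normalization` struct and the `normalize_points` function name. The returned centroid and scale are what a caller would need to undo the transform after solving.

```rust
type Vec3 = [f64; 3];

/// Similarity transform applied by the normalization:
/// q = scale * (p - centroid). Hypothetical type for this sketch.
struct Normalization {
    centroid: Vec3,
    scale: f64,
}

/// Center the points at the origin and scale them so that their mean
/// distance from the origin is sqrt(3). Returns `None` for empty input
/// or when all points coincide (the scale would be undefined).
fn normalize_points(points: &[Vec3]) -> Option<(Vec<Vec3>, Normalization)> {
    if points.is_empty() {
        return None;
    }
    let n = points.len() as f64;

    let mut centroid: Vec3 = [0.0; 3];
    for p in points {
        for k in 0..3 {
            centroid[k] += p[k] / n;
        }
    }

    let mean_dist = points
        .iter()
        .map(|p| {
            ((p[0] - centroid[0]).powi(2)
                + (p[1] - centroid[1]).powi(2)
                + (p[2] - centroid[2]).powi(2))
            .sqrt()
        })
        .sum::<f64>()
        / n;
    if mean_dist <= f64::EPSILON {
        return None;
    }

    let scale = 3.0_f64.sqrt() / mean_dist;
    let normalized = points
        .iter()
        .map(|p| {
            [
                (p[0] - centroid[0]) * scale,
                (p[1] - centroid[1]) * scale,
                (p[2] - centroid[2]) * scale,
            ]
        })
        .collect();
    Some((normalized, Normalization { centroid, scale }))
}
```

Normalizing keeps the entries of the linear system at comparable magnitudes, which generally improves the conditioning of the SVD solve.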

RANSAC Wrappers

Both solvers have RANSAC variants for handling outliers:

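// DLT PnP + RANSAC
// `world_pts`, `image_pts`, and `K` are assumed to be defined by the caller,
// and the enclosing function must return a `Result` for `?` to compile.
let (pose, inliers) = dlt_ransac(
    &world_pts, &image_pts, &K,
    &RansacOptions { thresh: 5.0, ..Default::default() }
)?;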

The DLT PnP solver uses MIN_SAMPLES = 6 with RANSAC. P3P (p3p()) returns up to 4 candidates and is intended for manual disambiguation (e.g., using a 4th point), not as a RANSAC estimator.

Comparison

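| Solver  | Min. points | Solutions | Strengths                        |
|---------|-------------|-----------|----------------------------------|
| P3P     | 3           | Up to 4   | Best for RANSAC (minimal sample) |
| DLT PnP | 6           | 1         | Simple, no polynomial solving    |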

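OpenCV equivalence: cv::solvePnP with SOLVEPNP_P3P for the minimal solver. OpenCV has no dedicated DLT flag (SOLVEPNP_DLS is a different method, Direct Least Squares); the closest match is SOLVEPNP_ITERATIVE, which initializes from a DLT solution for non-planar points. Use cv::solvePnPRansac for robust estimation.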

References

  • Kneip, L., Scaramuzza, D., & Siegwart, R. (2011). "A Novel Parametrization of the Perspective-Three-Point Problem for a Direct Computation of Absolute Camera Position and Orientation." CVPR.