Hand-Eye Calibration (Tsai-Lenz)

Hand-eye calibration estimates the rigid transform between a camera and a robot gripper (or between a camera and a robot base). It is essential for any application where a camera is mounted on a robot arm and the robot needs to localize objects in its own coordinate frame.

The AX = XB Problem

Setup

Consider a camera rigidly mounted on a robot gripper (eye-in-hand configuration). The system involves four coordinate frames:

  • Base (B): The robot's fixed base frame
  • Gripper (G): The robot's end-effector frame (known from robot kinematics)
  • Camera (C): The camera frame (observations come from here)
  • Target (T): The calibration board frame (fixed in the world)

Writing ${}^A T_B$ for the pose of frame $B$ expressed in frame $A$, the unknown is $X = {}^G T_C$ (the gripper-to-camera transform).

The Equation

Given two observations $i$ and $j$, we can compute:

  • Robot motion: $A_{ij} = ({}^B T_{G_i})^{-1}\,{}^B T_{G_j}$ (relative gripper motion, known from robot kinematics)
  • Camera motion: $B_{ij} = {}^C T_{T_i}\,({}^C T_{T_j})^{-1}$ (relative camera motion, from calibration board observations)

Since the camera is rigidly attached to the gripper and the target is stationary in the base frame, ${}^B T_{G_i}\, X\, {}^C T_{T_i} = {}^B T_{G_j}\, X\, {}^C T_{T_j}$ for every pair of views, which rearranges to:

$$A_{ij}\, X = X\, B_{ij}$$

This is the classic $AX = XB$ equation. We need to find the $X$ that satisfies it for all motion pairs.
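The chain above can be checked numerically. The following is a minimal, self-contained sketch with made-up poses (plain 4x4 arrays stand in for the library's Iso3 type): for a synthetic $X$, gripper poses, and a fixed target, the simulated camera observations satisfy $AX = XB$ exactly.

```rust
type M4 = [[f64; 4]; 4];

fn mul(a: &M4, b: &M4) -> M4 {
    let mut c = [[0.0; 4]; 4];
    for i in 0..4 {
        for j in 0..4 {
            for k in 0..4 {
                c[i][j] += a[i][k] * b[k][j];
            }
        }
    }
    c
}

// Inverse of a rigid transform: [R | t]^-1 = [R^T | -R^T t].
fn inv(t: &M4) -> M4 {
    let mut r = [[0.0; 4]; 4];
    for i in 0..3 {
        for j in 0..3 {
            r[i][j] = t[j][i];
        }
        r[i][3] = -(t[0][i] * t[0][3] + t[1][i] * t[1][3] + t[2][i] * t[2][3]);
    }
    r[3][3] = 1.0;
    r
}

// Rigid transform: rotation by `ang` about z (axis = 2) or x (axis = 0), plus translation.
fn se3(axis: usize, ang: f64, t: [f64; 3]) -> M4 {
    let (s, c) = ang.sin_cos();
    let mut m = [[0.0; 4]; 4];
    m[3][3] = 1.0;
    for i in 0..3 {
        m[i][3] = t[i];
    }
    if axis == 2 {
        m[0][0] = c; m[0][1] = -s; m[1][0] = s; m[1][1] = c; m[2][2] = 1.0;
    } else {
        m[1][1] = c; m[1][2] = -s; m[2][1] = s; m[2][2] = c; m[0][0] = 1.0;
    }
    m
}

fn main() {
    let x = se3(0, 0.4, [0.05, 0.00, 0.10]);          // synthetic gripper-to-camera X
    let t_base_target = se3(2, 0.2, [1.0, 0.3, 0.0]); // fixed target in base
    let g1 = se3(2, 0.1, [0.5, 0.0, 0.4]);            // base-to-gripper, view 1
    let g2 = se3(0, 0.7, [0.6, -0.1, 0.5]);           // base-to-gripper, view 2

    // Each camera observation: C_i = X^-1 G_i^-1 T (target as seen by the camera).
    let c1 = mul(&inv(&x), &mul(&inv(&g1), &t_base_target));
    let c2 = mul(&inv(&x), &mul(&inv(&g2), &t_base_target));

    // Relative motions: A = G1^-1 G2 (robot), B = C1 C2^-1 (camera).
    let a = mul(&inv(&g1), &g2);
    let b = mul(&c1, &inv(&c2));

    // AX and XB must agree.
    let (ax, xb) = (mul(&a, &x), mul(&x, &b));
    let mut err: f64 = 0.0;
    for i in 0..4 {
        for j in 0..4 {
            err = err.max((ax[i][j] - xb[i][j]).abs());
        }
    }
    println!("max |AX - XB| = {err:.2e}");
    assert!(err < 1e-12);
}
```

The check works for any choice of poses because $B = X^{-1} A X$ by construction whenever the target is truly fixed and the mount is truly rigid; real data violates this only through noise.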

Eye-to-Hand Variant

When the camera is fixed and the target is on the gripper (eye-to-hand), the equation becomes:

$$A'\, X' = X'\, B'$$

where now $A' = {}^B T_{G_j}\,({}^B T_{G_i})^{-1}$ is the relative gripper motion, $B' = {}^C T_{T_j}\,({}^C T_{T_i})^{-1}$ is the relative target-in-camera motion, and $X' = {}^B T_C$ (camera-to-base transform).

The Tsai-Lenz Method

Overview

Tsai and Lenz (1989) decomposed $AX = XB$ into separate rotation and translation subproblems:

  1. Rotation: Solve $R_A R_X = R_X R_B$ for $R_X$
  2. Translation: Given $R_X$, solve $(R_A - I)\,t_X = R_X\, t_B - t_A$ for $t_X$

Step 1: All-Pairs Motion Computation

From $n$ observations, construct all motion pairs. For each pair $(i, j)$ with $i < j$:

$$A_{ij} = ({}^B T_{G_i})^{-1}\,{}^B T_{G_j}, \qquad B_{ij} = {}^C T_{T_i}\,({}^C T_{T_j})^{-1}$$

Step 2: Filtering

Discard pairs with insufficient rotation (degenerate for the rotation subproblem):

  • Reject pairs where the rotation angle of $A_{ij}$ is below a threshold $\theta_{\min}$ (default: 10°)
  • Optionally reject pairs whose rotation axis is near-parallel to that of already-accepted pairs (the stacked system becomes ill-conditioned)

Rotation diversity is critical: If all robot motions are rotations around the same axis, the hand-eye rotation around that axis is undetermined. Use poses with diverse rotation axes (roll, pitch, and yaw).
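The angle test above can be sketched as follows (hypothetical helper names; plain 3x3 arrays stand in for the library's rotation type). The rotation angle of a rotation matrix is recovered from its trace.

```rust
type M3 = [[f64; 3]; 3];

/// Rotation angle in degrees, recovered from the trace: theta = acos((tr R - 1) / 2).
fn rotation_angle_deg(r: &M3) -> f64 {
    let tr = r[0][0] + r[1][1] + r[2][2];
    ((tr - 1.0) / 2.0).clamp(-1.0, 1.0).acos().to_degrees()
}

/// Keep a motion pair only if the relative gripper rotation is large enough.
fn keep_pair(rel_gripper_rot: &M3, min_angle_deg: f64) -> bool {
    rotation_angle_deg(rel_gripper_rot) >= min_angle_deg
}

fn main() {
    let identity: M3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]];
    let rot90z: M3 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]];
    assert!(!keep_pair(&identity, 10.0)); // pure translation: no rotation, rejected
    assert!(keep_pair(&rot90z, 10.0));    // 90 degree rotation: kept
    println!("angle = {:.1} deg", rotation_angle_deg(&rot90z));
}
```

The `clamp` guards against floating-point trace values slightly outside $[-1, 3]$, which would otherwise make `acos` return NaN.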

Step 3: Rotation Estimation

The rotation constraint $R_A R_X = R_X R_B$ is solved using quaternion algebra.

Convert $R_A$ and $R_B$ to unit quaternions $q_A$ and $q_B$. The constraint becomes:

$$q_A \otimes q_X = q_X \otimes q_B$$

Using the left and right quaternion multiplication matrices $L(q)$ and $R(q)$, defined by $L(q)\,p = q \otimes p$ and $R(q)\,p = p \otimes q$:

$$\big(L(q_A) - R(q_B)\big)\, q_X = 0$$

Stacking all $n$ motion pairs gives a $4n \times 4$ homogeneous system:

$$M\, q_X = 0$$

Solve via SVD: $q_X$ is the right singular vector of $M$ corresponding to the smallest singular value. Normalize to a unit quaternion.
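A sketch of one $4 \times 4$ block of the stacked matrix $M$, using quaternions as plain `[w, x, y, z]` arrays (illustrative names, not the library's API; the SVD itself is omitted). For self-consistent synthetic data with $q_A = q_X \otimes q_B \otimes q_X^{-1}$, the true $q_X$ lies in the block's nullspace.

```rust
type Q = [f64; 4]; // (w, x, y, z)

fn qmul(a: &Q, b: &Q) -> Q {
    [
        a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3],
        a[0] * b[1] + a[1] * b[0] + a[2] * b[3] - a[3] * b[2],
        a[0] * b[2] - a[1] * b[3] + a[2] * b[0] + a[3] * b[1],
        a[0] * b[3] + a[1] * b[2] - a[2] * b[1] + a[3] * b[0],
    ]
}

/// Conjugate; equals the inverse for unit quaternions.
fn qconj(q: &Q) -> Q {
    [q[0], -q[1], -q[2], -q[3]]
}

/// One 4x4 block of M: L(qA) - R(qB), where L(q)p = q*p and R(q)p = p*q.
fn constraint_block(qa: &Q, qb: &Q) -> [[f64; 4]; 4] {
    let (w, x, y, z) = (qa[0], qa[1], qa[2], qa[3]);
    let l = [
        [w, -x, -y, -z],
        [x,  w, -z,  y],
        [y,  z,  w, -x],
        [z, -y,  x,  w],
    ];
    let (w, x, y, z) = (qb[0], qb[1], qb[2], qb[3]);
    let r = [
        [w, -x, -y, -z],
        [x,  w,  z, -y],
        [y, -z,  w,  x],
        [z,  y, -x,  w],
    ];
    let mut m = [[0.0; 4]; 4];
    for i in 0..4 {
        for j in 0..4 {
            m[i][j] = l[i][j] - r[i][j];
        }
    }
    m
}

fn main() {
    // Synthetic data: pick qX and qB, then qA = qX * qB * qX^-1 by construction.
    let qx: Q = [0.866_025_403_8, 0.5, 0.0, 0.0];             // 60 deg about x
    let qb: Q = [0.939_692_620_8, 0.0, 0.0, 0.342_020_143_3]; // 40 deg about z
    let qa = qmul(&qmul(&qx, &qb), &qconj(&qx));

    // qX must satisfy (L(qA) - R(qB)) qX = 0.
    let m = constraint_block(&qa, &qb);
    for row in &m {
        let resid: f64 = (0..4).map(|j| row[j] * qx[j]).sum();
        assert!(resid.abs() < 1e-9);
    }
    println!("qX lies in the nullspace of L(qA) - R(qB)");
}
```

In the full solver, one such block is appended to $M$ per accepted motion pair, and the SVD's smallest singular vector recovers $q_X$ in a least-squares sense when noise is present.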

Step 4: Translation Estimation

Given $R_X$, the translation constraint for each motion pair is:

$$(R_{A_{ij}} - I)\, t_X = R_X\, t_{B_{ij}} - t_{A_{ij}}$$

Stacking all pairs gives a $3n \times 3$ overdetermined linear system $C\, t_X = d$, solved via least squares:

$$t_X = (C^\top C)^{-1} C^\top d$$

with optional ridge regularization ($C^\top C + \lambda I$) for numerical stability.
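The normal-equation solve can be sketched as follows (hypothetical helper names; plain arrays instead of the library's types). Two synthetic motions with distinct rotation axes are made exactly consistent with a ground-truth $t_X$, which the solver then recovers.

```rust
type M3 = [[f64; 3]; 3];
type V3 = [f64; 3];

fn matvec(m: &M3, v: &V3) -> V3 {
    let mut r = [0.0; 3];
    for i in 0..3 {
        for j in 0..3 {
            r[i] += m[i][j] * v[j];
        }
    }
    r
}

fn rot(axis: usize, ang: f64) -> M3 {
    let (s, c) = ang.sin_cos();
    match axis {
        0 => [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]],
        _ => [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]],
    }
}

// 3x3 inverse via the adjugate.
fn inv3(m: &M3) -> M3 {
    let adj = [
        [m[1][1]*m[2][2]-m[1][2]*m[2][1], m[0][2]*m[2][1]-m[0][1]*m[2][2], m[0][1]*m[1][2]-m[0][2]*m[1][1]],
        [m[1][2]*m[2][0]-m[1][0]*m[2][2], m[0][0]*m[2][2]-m[0][2]*m[2][0], m[0][2]*m[1][0]-m[0][0]*m[1][2]],
        [m[1][0]*m[2][1]-m[1][1]*m[2][0], m[0][1]*m[2][0]-m[0][0]*m[2][1], m[0][0]*m[1][1]-m[0][1]*m[1][0]],
    ];
    let det = m[0][0]*adj[0][0] + m[0][1]*adj[1][0] + m[0][2]*adj[2][0];
    let mut r = [[0.0; 3]; 3];
    for i in 0..3 { for j in 0..3 { r[i][j] = adj[i][j] / det; } }
    r
}

/// Solve stacked (R_A - I) t_X = R_X t_B - t_A via (C^T C + lambda I)^-1 C^T d.
fn solve_tx(pairs: &[(M3, V3, V3)], rx: &M3, lambda: f64) -> V3 {
    let mut ctc = [[0.0; 3]; 3];
    let mut ctd = [0.0; 3];
    for (ra, ta, tb) in pairs {
        let mut c = *ra;
        for i in 0..3 { c[i][i] -= 1.0; }                      // C_i = R_A - I
        let u = matvec(rx, tb);
        let d = [u[0] - ta[0], u[1] - ta[1], u[2] - ta[2]];    // d_i = R_X t_B - t_A
        for i in 0..3 {
            for j in 0..3 {
                for k in 0..3 { ctc[i][j] += c[k][i] * c[k][j]; }
            }
            for k in 0..3 { ctd[i] += c[k][i] * d[k]; }
        }
    }
    for i in 0..3 { ctc[i][i] += lambda; }                     // optional ridge term
    matvec(&inv3(&ctc), &ctd)
}

fn main() {
    // Ground truth to recover, plus two motions with distinct rotation axes.
    let rx = rot(2, 0.3);
    let tx = [0.10, -0.05, 0.20];
    let motions = [(rot(0, 1.2), [0.3, 0.1, -0.2]), (rot(2, 0.9), [-0.1, 0.4, 0.2])];
    // Build consistent t_A = R_X t_B + t_X - R_A t_X for each (R_A, t_B).
    let pairs: Vec<(M3, V3, V3)> = motions.iter().map(|(ra, tb)| {
        let (u, v) = (matvec(&rx, tb), matvec(ra, &tx));
        (*ra, [u[0]+tx[0]-v[0], u[1]+tx[1]-v[1], u[2]+tx[2]-v[2]], *tb)
    }).collect();

    let est = solve_tx(&pairs, &rx, 1e-12);
    for i in 0..3 {
        assert!((est[i] - tx[i]).abs() < 1e-6);
    }
    println!("recovered t_X = {est:?}");
}
```

Note that a single motion pair leaves $C$ rank-deficient (the rotation axis direction of $R_A$ is in its nullspace), which is why the two synthetic motions use different axes.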

Target-in-Base Estimation

After finding $X$, the target pose in the base frame can be estimated. For each observation $i$:

$${}^B T_T^{(i)} = {}^B T_{G_i}\, X\, {}^C T_{T_i}$$

The estimates from different views are averaged (quaternion averaging for rotation, arithmetic mean for translation).

Alternatively, ${}^B T_T$ is included as an additional parameter in the non-linear optimization.
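A compact sketch of the per-view composition and averaging, on synthetic data (plain 4x4 arrays stand in for Iso3; quaternion averaging of the rotations is omitted and only the translations are averaged here):

```rust
type M4 = [[f64; 4]; 4];

fn mul(a: &M4, b: &M4) -> M4 {
    let mut c = [[0.0; 4]; 4];
    for i in 0..4 {
        for j in 0..4 {
            for k in 0..4 {
                c[i][j] += a[i][k] * b[k][j];
            }
        }
    }
    c
}

// Inverse of a rigid transform: [R | t]^-1 = [R^T | -R^T t].
fn inv(t: &M4) -> M4 {
    let mut r = [[0.0; 4]; 4];
    for i in 0..3 {
        for j in 0..3 {
            r[i][j] = t[j][i];
        }
        r[i][3] = -(t[0][i] * t[0][3] + t[1][i] * t[1][3] + t[2][i] * t[2][3]);
    }
    r[3][3] = 1.0;
    r
}

// Rigid transform: rotation by `ang` about z, plus translation.
fn se3z(ang: f64, t: [f64; 3]) -> M4 {
    let (s, c) = ang.sin_cos();
    [
        [c, -s, 0.0, t[0]],
        [s,  c, 0.0, t[1]],
        [0.0, 0.0, 1.0, t[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]
}

fn main() {
    let x = se3z(0.5, [0.05, 0.0, 0.1]);     // hand-eye result (given here)
    let t_true = se3z(0.2, [1.0, 0.3, 0.0]); // ground-truth target in base
    let grippers = [
        se3z(0.1, [0.5, 0.0, 0.4]),
        se3z(0.8, [0.2, 0.3, 0.5]),
        se3z(1.4, [0.0, 0.6, 0.3]),
    ];

    let mut mean = [0.0; 3];
    for g in &grippers {
        // Simulated board observation: C_i = X^-1 G_i^-1 T.
        let c = mul(&inv(&x), &mul(&inv(g), &t_true));
        // Per-view estimate: ^B T_T^(i) = G_i X C_i.
        let t_est = mul(g, &mul(&x, &c));
        for k in 0..3 {
            mean[k] += t_est[k][3] / grippers.len() as f64;
        }
    }
    for k in 0..3 {
        assert!((mean[k] - t_true[k][3]).abs() < 1e-9);
    }
    println!("averaged target translation = {mean:?}");
}
```

On noise-free synthetic data every view yields the same ${}^B T_T$; on real data the spread of the per-view estimates is a useful sanity check on calibration quality.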

Practical Requirements

  • Minimum 3 views with diverse rotations (in practice, 5-10 views recommended)
  • Rotation diversity: Motions should span multiple rotation axes. Pure translations provide no rotation constraint. Pure Z-axis rotations leave the Z-component of the hand-eye rotation undetermined.
  • Accuracy: The linear Tsai-Lenz method typically yields translation errors of 5-20% and rotation errors of 2-10°. Its result serves to initialize the subsequent non-linear joint optimization.

API

#![allow(unused)]
fn main() {
let X = estimate_handeye_dlt(
    &base_se3_gripper,      // &[Iso3]: robot poses (base to gripper)
    &target_se3_camera,     // &[Iso3]: camera-to-target poses (inverted)
    min_angle_deg,          // f64: minimum rotation angle filter
)?;
}

Returns $X = {}^G T_C$ as an Iso3 transform.

OpenCV equivalence: cv::calibrateHandEye with method CALIB_HAND_EYE_TSAI.

References

  • Tsai, R.Y. & Lenz, R.K. (1989). "A New Technique for Fully Autonomous and Efficient 3D Robotics Hand/Eye Calibration." IEEE Transactions on Robotics and Automation, 5(3), 345-358.