Part V: Subpixel refiners
A corner detector (ChESS or Radon) returns integer pixel positions
plus a response value. Most downstream consumers — camera calibration,
pose estimation, homography fitting — need positions to better than a
pixel. A refiner takes one seed (x₀, y₀) plus a local view into
either the image or the response map and returns a refined
(x, y) in subpixel coordinates.
The library ships five refiners with one trait and one configuration enum, so swapping between them is a one-line change. The ChESS detector always runs one of these refiners to reach subpixel accuracy. The Radon detector has its own peak fit built in (see Part IV §4.4), but its output can still be post-refined if you need a different statistical behavior.
5.1 The refiner trait
#![allow(unused)]
fn main() {
pub trait CornerRefiner {
/// Half-width of the patch the refiner reads around the seed.
fn radius(&self) -> i32;
/// Refine one seed. `ctx` exposes the image and/or response map.
fn refine(&mut self, seed_xy: [f32; 2], ctx: RefineContext<'_>) -> RefineResult;
}
}
A RefineResult carries the refined (x, y), a refiner-specific
score, and a RefineStatus:
Accepted— refined position is insideradius-sized support and below the refiner’s rejection thresholds.OutOfBounds— the patch would read past the image border.IllConditioned— the refiner’s local system was singular or too eccentric (edge rather than corner).Rejected— the refined offset exceeded the refiner’smax_offset, or a per-refiner score threshold fired.
The user-facing selector is per-detector: ChessRefiner (carrying
CenterOfMass, Forstner, SaddlePoint, and optionally Ml) lives
inside ChessConfig; RadonRefiner (carrying RadonPeak and
CenterOfMass) lives inside RadonConfig. Each enum variant carries
its tuning struct as a payload, so a refiner kind change cannot leave
a stale per-refiner config field behind. At runtime the facade stores
an owned Refiner enum (one allocated scratch buffer per concrete
refiner) so the same instance is reused across seeds — no per-corner
allocation.
5.2 CenterOfMass
Operates on the ChESS response map. Computes the response-weighted
centroid of a (2r + 1)² window around the seed:
x_r = Σ_{p in W} (x_p · R_p) / Σ_p R_p
y_r = Σ_{p in W} (y_p · R_p) / Σ_p R_p
R_p = max(R(x, y), 0) clips negative responses so one strong
negative pixel can’t push the centroid outside the window.
| Property | Value |
|---|---|
| Input | Response map |
Default radius | 2 (5×5 window) |
| Typical cost | ~20 ns per corner |
| Strengths | Cheapest option in the benchmark; closed-form |
| Weaknesses | Centroid bias when the response is asymmetric; |
fails when radius crosses neighboring corners |
Use when throughput matters more than sub-0.1 px accuracy, or when the ChESS response is the only signal you want the refinement to see.
5.3 Förstner
Gradient-based. Solves a weighted least-squares system for the point closest (in a Mahalanobis sense) to all image-gradient lines in a local window. The structure tensor
M = Σ_p w_p · [ gx² gx·gy ]
[ gx·gy gy² ]
is assembled with 3×3 central-difference gradients and radial
weights w_p = 1 / (1 + 0.5·‖p − seed‖²). The refined position is
u = M⁻¹ · Σ_p w_p · (p − seed) · [gx·gy]ᵀ
Rejections:
trace(M) < min_trace— low-gradient region.det(M) < min_det— singular structure tensor.λ_max / λ_min > max_condition_number— one dominant direction (an edge, not a corner).‖u‖ > max_offset— extrapolating beyond a trusted neighborhood.
| Property | Value |
|---|---|
| Input | Image |
Default radius | 2 (5×5 gradient window + 1 px halo) |
| Typical cost | ~60 ns per corner |
| Strengths | Principled on sharp, high-SNR images |
| Weaknesses | Relies on sharp gradients — blur flattens M |
Good pick for clean calibration frames where image edges are sharp. In the synthetic blur sweep in Part VIII, Gaussian blur flattens the gradient magnitudes and this refiner’s error increases with σ.
Reference: Förstner & Gülch, 1987, “A fast operator for detection and precise location of distinct points, corners and centres of circular features.”
5.4 SaddlePoint
Fits a 2D quadratic f(x, y) = a·x² + b·x·y + c·y² + d·x + e·y + g
to a (2r + 1)² image patch and returns the saddle point (the
critical point where ∇f = 0). The six coefficients come from a
6×6 normal-equation solve with partial pivoting.
The determinant of the Hessian is 4·a·c − b². A true X‑junction is a
saddle, so the Hessian should be indefinite (det < 0). The refiner
rejects:
|det| < min_abs_det— flat patch.det > -det_margin— the quadratic is a bowl or ridge, not a saddle.‖offset‖ > max_offset— refined point outside the patch.
| Property | Value |
|---|---|
| Input | Image |
Default radius | 2 (5×5 patch) |
| Typical cost | ~120 ns per corner |
| Strengths | No gradient required; low error in the blur sweep |
| Weaknesses | Parabolic model is approximate on sharp edges |
A reasonable default when you do not know in advance whether frames will be sharp or blurred. In Part VIII’s synthetic blur sweep it stays inside the same error band as the other non-Förstner geometric refiners.
5.5 RadonPeak
Per-candidate version of the Radon detector’s peak fit. Computes the
local Radon response on a (2·patch_radius + 1)² grid around the
seed (at working resolution set by image_upsample), applies the
same 3×3 box blur and 3-point Gaussian peak fit used by the full
detector, and returns the refined offset. See
Part IV §4.4 for
the underlying formulas — this refiner shares all of them.
| Property | Value |
|---|---|
| Input | Image |
| Default settings | ray_radius = 2, patch_radius = 3, image_upsample = 2 |
| Typical cost | ~17 µs per corner |
| Strengths | Lowest clean/blurred error in the benchmark |
| Weaknesses | 100–1000× slower than the structure-tensor refiners |
This is the lowest-error option on the clean and blurred synthetic rows reported in Part VIII. Choose it when that extra accuracy matters more than the added per-corner cost.
If you are already running the Radon detector (Part IV), its built-in peak fit gives you the same refinement implicitly, and this refiner is redundant.
5.6 ML (ONNX model)
A learned refiner. Feeds a 21×21 normalized grayscale patch into a
small CNN and takes [dx, dy, conf_logit] back out. The ChESS path
extracts the patch, stages a batch, runs ONNX inference via
tract-onnx, and adds [dx, dy] to each seed.
Available behind the ml-refiner feature. The default model
chess_refiner_v4.onnx is embedded in the
chess-corners-ml crate at
crates/chess-corners-ml/assets/ml/.
5.6.1 Architecture
The shipped model is CornerRefinerNet, a CoordConv CNN with a
flatten + MLP head. About 180 K parameters:
| Layer | Shape | Notes |
|---|---|---|
| Input | 1 × 21 × 21 | Grayscale patch, normalized u8/255. |
| CoordConv prepend | 3 × 21 × 21 | Two extra channels with per-pixel x, y in [-1, 1]. |
| Conv3×3, ReLU | 16 × 21 × 21 | |
| Conv3×3, ReLU | 16 × 21 × 21 | |
| Conv3×3 stride 2, ReLU | 32 × 11 × 11 | |
| Conv3×3, ReLU | 32 × 11 × 11 | |
| Conv3×3 stride 2, ReLU | 64 × 6 × 6 | |
| Flatten | 2304 | |
| Linear, ReLU | 64 | |
| Linear (no activation) | 3 | [dx, dy, conf_logit]. |
CoordConv (Liu et al., 2018) concatenates explicit x, y coordinate
channels to the input. Standard convolutions are translation-equivariant
and cannot reliably regress to an absolute pixel offset from a patch
center; CoordConv restores the center reference that pure convolutions
discard.
The head outputs three scalars: an offset (dx, dy) in patch-pixel
units (valid range about [-0.6, 0.6] px, matching the training
distribution) and a confidence logit. The current Rust consumer
applies the offset and ignores the confidence.
The PyTorch source in tools/ml_refiner/model.py also defines a
wider variant (CornerRefinerNetLarge, ~730 K params, GroupNorm
between convs) and a spatial-softargmax head
(CornerRefinerNetSoftArgmax). Both match the 1-channel in, 3-scalar
out contract so they are drop-in replacements for the inference
path. The shipped model is the small variant — the larger and
softargmax variants did not move the held-out error meaningfully in
our sweeps.
5.6.2 Training data and loss
The training pipeline lives in tools/ml_refiner/. The v4 dataset
(configs/synth_v6.yaml) renders 200 000 patches with a 50/50 mix of
two rendering modes:
- Hard cells. An anti-aliased periodic checkerboard, rendered at
a random cell size in
[4, 12]px, then blurred by a Gaussian PSF inσ ∈ [0.3, 2.0]px. This matches real camera output: ink/paper step edges, softened by the optical system’s PSF, sampled by the sensor. The benchmark fixture in Part VIII uses the same renderer. - Smooth saddle. The
tanh(x)·tanh(y)model from earlier dataset revisions. Included at 50 % weight so callers who still feed the model smooth synthetic patches — as older documentation suggested — keep working.
Augmentations per sample: additive Gaussian noise σ ∈ [0, 10] gray
levels, photometric jitter (contrast, brightness, gamma), optional
projective warp for perspective robustness, and 20 % negative
samples (flat, edge, stripe, blob, pure noise, near-corner) with
is_pos = 0.
The true subpixel offset is sampled from [-0.6, 0.6] px.
Loss: Huber on (dx, dy) for positives, binary cross-entropy on
confidence for all samples. The regression loss is weighted up on
positives only via is_pos (negatives have no valid target).
5.6.3 Why earlier versions failed
Historical context, for anyone wondering why v4 is the version that
ships. Versions v1–v2 trained exclusively on smooth tanh(x)·tanh(y)
corner patches. That patch model does not resemble what a camera
produces — real sensor images have hard step edges (ink/paper)
softened by a small optical PSF, whereas tanh has smooth transitions
the width of the entire patch. Models trained on tanh only generalized
poorly to the benchmark fixture (mean error ~0.5 px on hard cells).
v3 swapped to hard cells only and hit ~0.6 px on tanh inputs — the
opposite distribution failure.
v4 is the mixed dataset (50/50) plus retuned offsets and augmentations.
In the Part VIII synthetic benchmark it avoids the earlier tanh /
hard-cell mismatch and has the lowest mean error on the heaviest noise
row (σ = 10 gray levels). It does not beat RadonPeak on clean or
mildly blurred data. The current evidence is that the training setup did
not learn the hand-designed RadonPeak structure; neither a wider CNN nor
a softargmax head closed that gap in the sweeps.
5.6.4 ONNX export and inference
The export step (tools/ml_refiner/export_onnx.py) writes an ONNX
graph at opset 17 (falling back to 18 if a conversion is unsupported).
The graph contract is:
- Input
patches:float32 [N, 1, 21, 21],u8 / 255in[0, 1]. - Output
pred:float32 [N, 3]with[dx, dy, conf_logit].
Rust inference is chess_corners_ml::MlModel::infer_batch. It wraps
tract-onnx, sizes the input to the runtime batch, and returns
Vec<[f32; 3]>. Dynamic batch sizing is supported via a tract
SymbolScope, so a single loaded model can handle variable-batch
calls without re-optimization.
5.7 Picking a refiner
The measurement-driven comparison lives in Part VIII. In short:
- Budget matters more than anything else: the structure-tensor refiners are 100–1000× faster than RadonPeak and ML.
- RadonPeak gives the lowest error on the clean and blurred synthetic rows, at a much higher per-corner cost.
- ML has the lowest mean error on the heaviest synthetic noise row.
- SaddlePoint is a conservative default when you want image-patch refinement without the cost of RadonPeak or ML.
The refiner is selected through the strategy’s refiner field.
For ChESS: DetectorConfig::chess().with_chess(|c| c.refiner = ChessRefiner::forstner()).
For Radon: DetectorConfig::radon().with_radon(|r| r.refiner = RadonRefiner::radon_peak()).
Switching is a single-line change, and the comparison numbers
in Part VIII come from running all five on the same fixture at a
single build.
Next, Part VI covers the orientation
methods that produce the axes and sigma_theta* descriptor fields,
shared by both the ChESS and Radon pipelines.