Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Part V: Subpixel refiners

A corner detector (ChESS or Radon) returns integer pixel positions plus a response value. Most downstream consumers — camera calibration, pose estimation, homography fitting — need positions to better than a pixel. A refiner takes one seed (x₀, y₀) plus a local view into either the image or the response map and returns a refined (x, y) in subpixel coordinates.

The library ships five refiners with one trait and one configuration enum, so swapping between them is a one-line change. The ChESS detector always runs one of these refiners to reach subpixel accuracy. The Radon detector has its own peak fit built in (see Part IV §4.4), but its output can still be post-refined if you need a different statistical behavior.

5.1 The refiner trait

#![allow(unused)]
fn main() {
pub trait CornerRefiner {
    /// Half-width of the patch the refiner reads around the seed.
    fn radius(&self) -> i32;

    /// Refine one seed. `ctx` exposes the image and/or response map.
    fn refine(&mut self, seed_xy: [f32; 2], ctx: RefineContext<'_>) -> RefineResult;
}
}

A RefineResult carries the refined (x, y), a refiner-specific score, and a RefineStatus:

  • Accepted — refined position is inside radius-sized support and below the refiner’s rejection thresholds.
  • OutOfBounds — the patch would read past the image border.
  • IllConditioned — the refiner’s local system was singular or too eccentric (edge rather than corner).
  • Rejected — the refined offset exceeded the refiner’s max_offset, or a per-refiner score threshold fired.

The user-facing selector is per-detector: ChessRefiner (carrying CenterOfMass, Forstner, SaddlePoint, and optionally Ml) lives inside ChessConfig; RadonRefiner (carrying RadonPeak and CenterOfMass) lives inside RadonConfig. Each enum variant carries its tuning struct as a payload, so a refiner kind change cannot leave a stale per-refiner config field behind. At runtime the facade stores an owned Refiner enum (one allocated scratch buffer per concrete refiner) so the same instance is reused across seeds — no per-corner allocation.

5.2 CenterOfMass

Operates on the ChESS response map. Computes the response-weighted centroid of a (2r + 1)² window around the seed:

x_r = Σ_{p in W}  (x_p · R_p)  /  Σ_p R_p
y_r = Σ_{p in W}  (y_p · R_p)  /  Σ_p R_p

R_p = max(R(x, y), 0) clips negative responses so one strong negative pixel can’t push the centroid outside the window.

PropertyValue
InputResponse map
Default radius2 (5×5 window)
Typical cost~20 ns per corner
StrengthsCheapest option in the benchmark; closed-form
WeaknessesCentroid bias when the response is asymmetric;
fails when radius crosses neighboring corners

Use when throughput matters more than sub-0.1 px accuracy, or when the ChESS response is the only signal you want the refinement to see.

5.3 Förstner

Gradient-based. Solves a weighted least-squares system for the point closest (in a Mahalanobis sense) to all image-gradient lines in a local window. The structure tensor

M = Σ_p  w_p · [ gx²  gx·gy ]
               [ gx·gy  gy²  ]

is assembled with 3×3 central-difference gradients and radial weights w_p = 1 / (1 + 0.5·‖p − seed‖²). The refined position is

u = M⁻¹ · Σ_p  w_p · (p − seed) · [gx·gy]ᵀ

Rejections:

  • trace(M) < min_trace — low-gradient region.
  • det(M) < min_det — singular structure tensor.
  • λ_max / λ_min > max_condition_number — one dominant direction (an edge, not a corner).
  • ‖u‖ > max_offset — extrapolating beyond a trusted neighborhood.
PropertyValue
InputImage
Default radius2 (5×5 gradient window + 1 px halo)
Typical cost~60 ns per corner
StrengthsPrincipled on sharp, high-SNR images
WeaknessesRelies on sharp gradients — blur flattens M

Good pick for clean calibration frames where image edges are sharp. In the synthetic blur sweep in Part VIII, Gaussian blur flattens the gradient magnitudes and this refiner’s error increases with σ.

Reference: Förstner & Gülch, 1987, “A fast operator for detection and precise location of distinct points, corners and centres of circular features.”

5.4 SaddlePoint

Fits a 2D quadratic f(x, y) = a·x² + b·x·y + c·y² + d·x + e·y + g to a (2r + 1)² image patch and returns the saddle point (the critical point where ∇f = 0). The six coefficients come from a 6×6 normal-equation solve with partial pivoting.

The determinant of the Hessian is 4·a·c − b². A true X‑junction is a saddle, so the Hessian should be indefinite (det < 0). The refiner rejects:

  • |det| < min_abs_det — flat patch.
  • det > -det_margin — the quadratic is a bowl or ridge, not a saddle.
  • ‖offset‖ > max_offset — refined point outside the patch.
PropertyValue
InputImage
Default radius2 (5×5 patch)
Typical cost~120 ns per corner
StrengthsNo gradient required; low error in the blur sweep
WeaknessesParabolic model is approximate on sharp edges

A reasonable default when you do not know in advance whether frames will be sharp or blurred. In Part VIII’s synthetic blur sweep it stays inside the same error band as the other non-Förstner geometric refiners.

5.5 RadonPeak

Per-candidate version of the Radon detector’s peak fit. Computes the local Radon response on a (2·patch_radius + 1)² grid around the seed (at working resolution set by image_upsample), applies the same 3×3 box blur and 3-point Gaussian peak fit used by the full detector, and returns the refined offset. See Part IV §4.4 for the underlying formulas — this refiner shares all of them.

PropertyValue
InputImage
Default settingsray_radius = 2, patch_radius = 3, image_upsample = 2
Typical cost~17 µs per corner
StrengthsLowest clean/blurred error in the benchmark
Weaknesses100–1000× slower than the structure-tensor refiners

This is the lowest-error option on the clean and blurred synthetic rows reported in Part VIII. Choose it when that extra accuracy matters more than the added per-corner cost.

If you are already running the Radon detector (Part IV), its built-in peak fit gives you the same refinement implicitly, and this refiner is redundant.

5.6 ML (ONNX model)

A learned refiner. Feeds a 21×21 normalized grayscale patch into a small CNN and takes [dx, dy, conf_logit] back out. The ChESS path extracts the patch, stages a batch, runs ONNX inference via tract-onnx, and adds [dx, dy] to each seed.

Available behind the ml-refiner feature. The default model chess_refiner_v4.onnx is embedded in the chess-corners-ml crate at crates/chess-corners-ml/assets/ml/.

5.6.1 Architecture

The shipped model is CornerRefinerNet, a CoordConv CNN with a flatten + MLP head. About 180 K parameters:

LayerShapeNotes
Input1 × 21 × 21Grayscale patch, normalized u8/255.
CoordConv prepend3 × 21 × 21Two extra channels with per-pixel x, y in [-1, 1].
Conv3×3, ReLU16 × 21 × 21
Conv3×3, ReLU16 × 21 × 21
Conv3×3 stride 2, ReLU32 × 11 × 11
Conv3×3, ReLU32 × 11 × 11
Conv3×3 stride 2, ReLU64 × 6 × 6
Flatten2304
Linear, ReLU64
Linear (no activation)3[dx, dy, conf_logit].

CoordConv (Liu et al., 2018) concatenates explicit x, y coordinate channels to the input. Standard convolutions are translation-equivariant and cannot reliably regress to an absolute pixel offset from a patch center; CoordConv restores the center reference that pure convolutions discard.

The head outputs three scalars: an offset (dx, dy) in patch-pixel units (valid range about [-0.6, 0.6] px, matching the training distribution) and a confidence logit. The current Rust consumer applies the offset and ignores the confidence.

The PyTorch source in tools/ml_refiner/model.py also defines a wider variant (CornerRefinerNetLarge, ~730 K params, GroupNorm between convs) and a spatial-softargmax head (CornerRefinerNetSoftArgmax). Both match the 1-channel in, 3-scalar out contract so they are drop-in replacements for the inference path. The shipped model is the small variant — the larger and softargmax variants did not move the held-out error meaningfully in our sweeps.

5.6.2 Training data and loss

The training pipeline lives in tools/ml_refiner/. The v4 dataset (configs/synth_v6.yaml) renders 200 000 patches with a 50/50 mix of two rendering modes:

  • Hard cells. An anti-aliased periodic checkerboard, rendered at a random cell size in [4, 12] px, then blurred by a Gaussian PSF in σ ∈ [0.3, 2.0] px. This matches real camera output: ink/paper step edges, softened by the optical system’s PSF, sampled by the sensor. The benchmark fixture in Part VIII uses the same renderer.
  • Smooth saddle. The tanh(x)·tanh(y) model from earlier dataset revisions. Included at 50 % weight so callers who still feed the model smooth synthetic patches — as older documentation suggested — keep working.

Augmentations per sample: additive Gaussian noise σ ∈ [0, 10] gray levels, photometric jitter (contrast, brightness, gamma), optional projective warp for perspective robustness, and 20 % negative samples (flat, edge, stripe, blob, pure noise, near-corner) with is_pos = 0.

The true subpixel offset is sampled from [-0.6, 0.6] px.

Loss: Huber on (dx, dy) for positives, binary cross-entropy on confidence for all samples. The regression loss is weighted up on positives only via is_pos (negatives have no valid target).

5.6.3 Why earlier versions failed

Historical context, for anyone wondering why v4 is the version that ships. Versions v1–v2 trained exclusively on smooth tanh(x)·tanh(y) corner patches. That patch model does not resemble what a camera produces — real sensor images have hard step edges (ink/paper) softened by a small optical PSF, whereas tanh has smooth transitions the width of the entire patch. Models trained on tanh only generalized poorly to the benchmark fixture (mean error ~0.5 px on hard cells).

v3 swapped to hard cells only and hit ~0.6 px on tanh inputs — the opposite distribution failure.

v4 is the mixed dataset (50/50) plus retuned offsets and augmentations. In the Part VIII synthetic benchmark it avoids the earlier tanh / hard-cell mismatch and has the lowest mean error on the heaviest noise row (σ = 10 gray levels). It does not beat RadonPeak on clean or mildly blurred data. The current evidence is that the training setup did not learn the hand-designed RadonPeak structure; neither a wider CNN nor a softargmax head closed that gap in the sweeps.

5.6.4 ONNX export and inference

The export step (tools/ml_refiner/export_onnx.py) writes an ONNX graph at opset 17 (falling back to 18 if a conversion is unsupported). The graph contract is:

  • Input patches: float32 [N, 1, 21, 21], u8 / 255 in [0, 1].
  • Output pred: float32 [N, 3] with [dx, dy, conf_logit].

Rust inference is chess_corners_ml::MlModel::infer_batch. It wraps tract-onnx, sizes the input to the runtime batch, and returns Vec<[f32; 3]>. Dynamic batch sizing is supported via a tract SymbolScope, so a single loaded model can handle variable-batch calls without re-optimization.

5.7 Picking a refiner

The measurement-driven comparison lives in Part VIII. In short:

  • Budget matters more than anything else: the structure-tensor refiners are 100–1000× faster than RadonPeak and ML.
  • RadonPeak gives the lowest error on the clean and blurred synthetic rows, at a much higher per-corner cost.
  • ML has the lowest mean error on the heaviest synthetic noise row.
  • SaddlePoint is a conservative default when you want image-patch refinement without the cost of RadonPeak or ML.

The refiner is selected through the strategy’s refiner field. For ChESS: DetectorConfig::chess().with_chess(|c| c.refiner = ChessRefiner::forstner()). For Radon: DetectorConfig::radon().with_radon(|r| r.refiner = RadonRefiner::radon_peak()). Switching is a single-line change, and the comparison numbers in Part VIII come from running all five on the same fixture at a single build.


Next, Part VI covers the orientation methods that produce the axes and sigma_theta* descriptor fields, shared by both the ChESS and Radon pipelines.