MWPM vs Union-Find: Choosing Your Decoder

A practical comparison of Minimum Weight Perfect Matching and Union-Find for surface-code syndrome decoding. When does the latency advantage of Union-Find outweigh the accuracy advantage of MWPM?

MWPM and Union-Find decoder comparison chart

Two decoder algorithms dominate practical surface-code implementations: Minimum Weight Perfect Matching (MWPM) and Union-Find. Both take the same input — a syndrome array — and return the same type of output — a Pauli correction. But they make a different tradeoff between decode latency and logical error rate, and the right choice depends on your hardware's coherence time, your code distance, and the acceptable latency budget for your syndrome cycle.

This article works through both algorithms mechanically, states the performance tradeoff in concrete numbers, and gives practical guidance on when each makes sense.

MWPM: syndrome decoding as a graph matching problem

Minimum Weight Perfect Matching frames syndrome decoding as a combinatorial optimization problem. Start by constructing the syndrome graph: each violated stabilizer becomes a vertex, and pairs of vertices are connected by edges whose weights represent the log-likelihood of the corresponding error configuration. Lower weight = higher likelihood.

In the simplest case (uniform depolarizing noise), edge weights are proportional to the Manhattan distance between stabilizer positions — a two-qubit error chain is less likely than a one-qubit chain. With per-device noise calibration, edge weights are derived from measured gate fidelities and T1/T2 times, so qubit Q₃ with 2× higher relaxation rate than Q₇ will produce higher-weight edges in its neighborhood.

The decoder finds the minimum-weight perfect matching — the pairing of all violated stabilizers into pairs that minimizes total edge weight. This matching tells you which error configuration is most likely, and the correction operator is the Pauli chain implied by each matched pair.

The algorithm most commonly used for MWPM in QEC is Blossom V, a generalization of Edmond's algorithm for non-bipartite graphs. Worst-case complexity is O(n³) where n is the number of syndrome events (violated stabilizers). At code distance 5 with 24 syndrome positions, this is fast. At distance 11 with 120 syndrome positions and typical error rates producing 3–8 simultaneously violated stabilizers per round, the combinatorial cost begins to matter for real-time decoding.

Union-Find: near-linear decoding with a small accuracy cost

Union-Find, introduced by Delfosse and Nickerson (2021), achieves near-linear time decoding via a different strategy. Rather than finding the globally optimal matching, it grows clusters around each violated stabilizer and merges clusters when they meet. Decoding terminates when all clusters have been fully grown and paired.

The algorithm maintains a union-find data structure over syndrome events. Each violated stabilizer starts as its own cluster. In each growth step, each cluster expands outward by half an edge — when two cluster boundaries meet at an edge, the clusters merge. Growth continues until every cluster contains an even number of syndrome events. The clusters then define the correction regions.

Complexity is nearly linear in the number of syndrome events — empirically O(n α(n)) where α is the inverse Ackermann function, essentially constant for practical inputs. At code distance 7 on a single CPU core, Union-Find decodes 10 rounds of syndrome data in under 0.6 ms. MWPM at the same distance takes 1.7 ms median.

The accuracy cost is real but modest. Union-Find does not find the globally optimal matching — it finds a good-enough local matching. On uniform depolarizing noise at code distance 5, MWPM achieves logical error suppression ~0.5 percentage points higher than Union-Find. The gap widens slightly with more complex noise models (biased or correlated noise) where MWPM's weight-aware matching has more discriminating power.

The latency constraint: why it drives the choice

On a superconducting transmon device, a full syndrome measurement cycle — one round of stabilizer measurements including reset, gates, and ancilla readout — takes approximately 1–5 microseconds, depending on gate time and readout speed. If the decoder cannot return a correction before the next syndrome round begins, corrections begin to pile up and the decoder falls behind in real time.

At distance 5, MWPM's median decode time of 0.62 ms is comfortably within a 1 ms cycle budget. At distance 9, MWPM's 4.11 ms median exceeds many hardware cycle times. Union-Find's 1.04 ms at distance 9 stays within budget on most current control systems.

For trapped-ion systems, the picture is different. Gate times are 100–1000× slower than superconducting devices, but coherence times are correspondingly longer (often greater than 1 second). Syndrome cycles may take 10–100 ms, making the decode latency difference between MWPM and Union-Find irrelevant. MWPM's accuracy advantage dominates.

Per-device weight tuning matters more than algorithm choice

A point worth emphasizing: the accuracy gap between MWPM and Union-Find is smaller than the gap between a calibrated decoder (using per-device noise data to set edge weights) and an uncalibrated decoder (using uniform weights on non-uniform hardware). On devices with spatially non-uniform error rates — which is every real device — proper weight tuning improves logical error suppression by 15–35% over a uniform-weight baseline, regardless of which algorithm is used.

This is the core argument for per-device calibration in QECSync's architecture. You can pick the right algorithm for your latency budget, but you should always tune weights to your specific device's measured noise profile. An uncalibrated MWPM decoder on a non-uniform device performs worse than a calibrated Union-Find decoder on the same device.

Decision table: which decoder for your situation

Situation Recommended decoder Rationale
Superconducting, d=3–5, cycle >2 ms MWPM Latency budget comfortable, take the accuracy
Superconducting, d=7–9, fast cycle (<2 ms) Union-Find MWPM latency exceeds cycle budget at this distance
Trapped-ion, any distance MWPM Slow cycles, long coherence — accuracy wins
Highly correlated or biased noise MWPM MWPM weight tuning handles correlation better
Initial experiments, characterization runs Both (compare) Run both decoders on captured syndrome data to measure gap on your specific device

Using QECSync to compare on your data

QECSync's decoder interface supports switching algorithms at configuration time without changing the rest of your experiment stack. If you've captured syndrome data from a run, you can run both decoders against the same syndrome array offline and compare logical error counts directly. This is the cleanest way to measure the accuracy gap on your specific hardware and noise model — because synthetic benchmarks, including our own, cannot fully capture the noise structure of your device.

See the API reference for the DecoderEngine interface, which exposes both algorithms under a common decode_rounds() method with the algorithm as a constructor parameter.