
Core Algorithm

5-Pass Matching Pipeline

STET's reconciliation engine runs in 5 passes, processing matches in order of decreasing certainty to prevent the "greedy trap" where weak early matches steal better candidates from later, more rigorous passes.

Overview

Many reconciliation tools match greedily: they accept the first plausible match and move on. This produces cascading errors, because a mediocre early match permanently blocks a perfect later one.

STET solves this by running matching in strict rounds. Each pass applies specific criteria, and matched transactions are removed from the candidate pool before the next pass begins. Deterministic rules always run first; ML is used only as an enhancement for edge cases.
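The round-based idea can be illustrated with a minimal sketch. This is not STET's implementation: the `Txn` type, the `run_passes` function, and the two toy predicates are hypothetical names introduced only to show how removing matched transactions between passes lets a strict pass claim its candidates before a looser one sees them.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class Txn:
    """Hypothetical transaction record used only for this illustration."""
    id: str
    amount_cents: int
    date: str          # ISO date string
    description: str

Predicate = Callable[[Txn, Txn], bool]

def run_passes(bank: List[Txn], ledger: List[Txn],
               passes: List[Tuple[str, Predicate]]):
    """Run each pass in order; matched items leave the pool before the next pass."""
    bank_pool, ledger_pool = list(bank), list(ledger)
    matches = []
    for name, is_match in passes:
        for b in list(bank_pool):          # iterate over a copy while removing
            for l in ledger_pool:
                if is_match(b, l):
                    matches.append((name, b.id, l.id))
                    bank_pool.remove(b)
                    ledger_pool.remove(l)
                    break                  # stop scanning after the removal
    return matches, bank_pool, ledger_pool

def exact(b: Txn, l: Txn) -> bool:
    return (b.amount_cents, b.date, b.description) == (l.amount_cents, l.date, l.description)

def loose(b: Txn, l: Txn) -> bool:
    return (b.amount_cents, b.date) == (l.amount_cents, l.date)

bank = [Txn("b1", 10000, "2024-03-01", "AWS INC"), Txn("b2", 10000, "2024-03-01", "AWS")]
ledger = [Txn("l1", 10000, "2024-03-01", "AWS"), Txn("l2", 10000, "2024-03-01", "AWS INC.")]

# Staged: the strict pass pairs b2 with l1 first, leaving b1 and l2 for the loose pass.
matches, bank_left, ledger_left = run_passes(bank, ledger, [("anchor", exact), ("loose", loose)])

# Greedy: with only the loose rule, b1 takes l1 and blocks the exact b2/l1 pairing.
greedy, _, _ = run_passes(bank, ledger, [("loose", loose)])
```

In the staged run the exact pair `b2`/`l1` is matched first; in the greedy run `b1` claims `l1` and the exact pairing is lost, which is the trap described above.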

Important: STET is a software tool. Its outputs are technical records of a matching process, not audit opinions or professional certifications. All results require review by a qualified professional before any reliance.

Pass 1: Anchor

Pass 2: Fuzzy

Pass 2.5: Semantic

Pass 3: Float

Pass 4: Redline

Pass 1: Anchor

Criteria

Exact amount match (to the cent)
Exact date match
Exact description match (after normalization)

Confidence Range

100%

Perfect identity matches with zero ambiguity. These form the foundation of the reconciliation and are excluded from all subsequent passes.
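As a rough sketch of the Anchor criteria: the normalization rules below (case folding, replacing punctuation with spaces, collapsing whitespace) are assumptions chosen for illustration, since this page does not specify what STET's normalization does. The dictionary keys and function names are likewise hypothetical.

```python
import re

def normalize(description: str) -> str:
    """Assumed normalization: lowercase, drop punctuation, collapse whitespace."""
    text = description.casefold()
    text = re.sub(r"[^a-z0-9 ]+", " ", text)   # note: also drops non-ASCII letters
    return " ".join(text.split())

def is_anchor_match(a: dict, b: dict) -> bool:
    """All three Anchor criteria must hold: amount, date, normalized description."""
    return (
        a["amount_cents"] == b["amount_cents"]   # integer cents avoid float rounding
        and a["date"] == b["date"]
        and normalize(a["description"]) == normalize(b["description"])
    )

bank_row = {"amount_cents": 125000, "date": "2024-03-01", "description": "AWS,  Inc."}
ledger_row = {"amount_cents": 125000, "date": "2024-03-01", "description": "aws inc"}
anchor_hit = is_anchor_match(bank_row, ledger_row)
```

Storing amounts as integer cents is a design choice of this sketch; it makes "to the cent" equality an exact integer comparison.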

Pass 2: High-Confidence Fuzzy

Criteria

Exact amount match
Exact date match
Description similarity ≥ 85% (Levenshtein ratio)

Confidence Range

85–99%

Catches minor description variations like 'AWS' vs 'AWS Inc' or 'Wire Transfer' vs 'Wire Xfer'. Amount and date must still be exact.
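For reference, one common way to turn Levenshtein edit distance into a similarity score is sketched below. The exact ratio formula STET uses is not stated on this page; this sketch assumes `1 - distance / max(len(a), len(b))`, and the function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete, and substitute."""
    if len(a) < len(b):
        a, b = b, a                        # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def similarity(a: str, b: str) -> float:
    """Assumed ratio: 1.0 means identical, 0.0 means nothing in common."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest

THRESHOLD = 0.85  # Pass 2 threshold from the criteria above
passes_threshold = similarity("amazon web services", "amazon web service") >= THRESHOLD
```

Under this particular definition, very short descriptions lose a large share of their score to a single added token, so whether a given pair clears 85% depends heavily on the normalization applied beforehand (for instance, whether company suffixes are stripped).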

Pass 2.5: Semantic Match

Criteria

Exact amount match
Exact date match
Semantic embedding similarity ≥ 85%

Confidence Range

85–99%

Uses ML embeddings to match conceptually similar descriptions that character-level fuzzy matching misses. Example: 'AWS' ↔ 'Amazon Web Services'.

ML Note: Semantic matching uses the frozen all-MiniLM-L6-v2 model, run entirely in your browser. Your data never leaves your device during this step. The model produces similarity scores only; it cannot invent or fabricate transactions.
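Embedding similarity is usually measured as the cosine of the angle between two vectors. The sketch below assumes cosine similarity, which this page does not confirm, and uses tiny made-up vectors in place of real model output (all-MiniLM-L6-v2 produces 384-dimensional embeddings). The function name is illustrative.

```python
import math
from typing import Sequence

def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0                      # treat an all-zero vector as "no similarity"
    return dot / (norm_u * norm_v)

# Hypothetical 2-D stand-ins for two description embeddings.
embedding_a = [0.6, 0.8]
embedding_b = [0.8, 0.6]

score = cosine_similarity(embedding_a, embedding_b)
is_semantic_match = score >= 0.85       # Pass 2.5 threshold from the criteria above
```

Because the function only compares two vectors that already exist, it can score a pair of transactions but has no way to produce a new one.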

Pass 3: Float

Criteria

Exact amount match
Date within ±3 business days
Description similarity ≥ 70%

Confidence Range

70–99%

Handles timing differences caused by weekends, holidays, or processing delays. A Friday bank transaction might appear Monday in the ledger. The auditor should verify each float match.
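The ±3 business day window can be checked with a small date helper. This sketch assumes a Monday to Friday working week and takes holidays as an optional caller-supplied set; how STET sources its holiday calendar is not described here, and the function names are illustrative.

```python
from datetime import date, timedelta
from typing import AbstractSet

def business_days_between(start: date, end: date,
                          holidays: AbstractSet[date] = frozenset()) -> int:
    """Count working days stepped through when moving from start to end."""
    if start > end:
        start, end = end, start            # make the count direction-independent
    count = 0
    day = start
    while day < end:
        day += timedelta(days=1)
        if day.weekday() < 5 and day not in holidays:   # 0-4 are Mon-Fri
            count += 1
    return count

def within_float_window(bank_date: date, ledger_date: date, max_days: int = 3) -> bool:
    return business_days_between(bank_date, ledger_date) <= max_days

friday = date(2024, 3, 1)
monday = date(2024, 3, 4)
gap = business_days_between(friday, monday)   # the weekend is skipped, so the gap is 1
```

With this counting rule, the Friday-to-Monday case from the text is one business day apart and falls inside the window, while Friday to the following Thursday is four business days and does not.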

Pass 4: Redline

After all matching passes complete, Redline analyzes remaining unmatched transactions and classifies them into discrepancy types for auditor triage. Redline is a flagging tool — investigation and final determination are the auditor's responsibility.

Classification Errors: Same amount + date, but description similarity <40%
Amount Variance: Same date + description, but different amounts
Timing Mismatch: Same amount + description, but dates more than 3 business days apart
Missing Entries: Transactions present in one source with no candidate in the other
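The four rules above can be read as a small decision function. The sketch below is an interpretation, not STET's code: the `Candidate` type is hypothetical, the 0.70 cut-off for "same description" is an assumption borrowed from the Float pass, and treating a pair that fits no pattern as a missing entry is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Candidate:
    """Hypothetical comparison of one unmatched transaction with its closest counterpart."""
    same_amount: bool
    business_day_gap: int
    description_similarity: float      # 0.0 to 1.0

def classify(candidate: Optional[Candidate], same_description_at: float = 0.70) -> str:
    """Map a leftover transaction to one of the four discrepancy types."""
    if candidate is None:
        return "MISSING_ENTRY"
    same_date = candidate.business_day_gap == 0
    same_description = candidate.description_similarity >= same_description_at
    if candidate.same_amount and same_date and candidate.description_similarity < 0.40:
        return "CLASSIFICATION_ERROR"
    if same_date and same_description and not candidate.same_amount:
        return "AMOUNT_VARIANCE"
    if candidate.same_amount and same_description and candidate.business_day_gap > 3:
        return "TIMING_MISMATCH"
    return "MISSING_ENTRY"             # assumption: no recognizable counterpart

label = classify(Candidate(same_amount=True, business_day_gap=5, description_similarity=0.9))
```

The returned label is only a triage hint for the reviewer; it says which pattern fired, not why the discrepancy exists.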

Discrepancy Types

MISSING_ENTRY

Transaction exists in one source but has no match in the other. May indicate a recording error or fraud. Requires auditor investigation.

TIMING_MISMATCH

Same transaction, but dates differ by more than 3 business days. Often benign (processing delays) but must be reviewed by the auditor.

AMOUNT_VARIANCE

Same description and date, but amounts differ. Requires auditor investigation to determine root cause.

CLASSIFICATION_ERROR

Same amount and date, but descriptions are significantly different (similarity below 40%). Could be miscategorization or an unrelated transaction.

Auditor's Responsibility: Discrepancy flags are produced automatically by pattern matching. STET does not determine whether a discrepancy is material, intentional, or an error. That determination belongs solely to the qualified professional reviewing the output.

Confidence Scores

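Every match includes a confidence score from 0.0 to 1.0 based on the pass and criteria used; the percentage ranges listed for each pass are the same scale expressed as percentages (85% corresponds to 0.85). Confidence scores are inputs to the auditor's judgment, not final verdicts.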

0.95 – 1.00 (Very High)

Anchor or near-exact matches

0.85 – 0.94 (High)

Strong fuzzy or semantic matches

0.70 – 0.84 (Moderate)

Float matches with date variance; these warrant review

< 0.70 (Low)

Flagged for auditor triage
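The bands above translate directly into a lookup. In this sketch the function name is illustrative, and scores that fall between the printed two-decimal boundaries (for example 0.945) are assigned to the lower band, which is an assumption about how the gaps are handled.

```python
def confidence_band(score: float) -> str:
    """Return the band label for a confidence score between 0.0 and 1.0."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("confidence score must be between 0.0 and 1.0")
    if score >= 0.95:
        return "Very High"
    if score >= 0.85:
        return "High"
    if score >= 0.70:
        return "Moderate"
    return "Low"

bands = [confidence_band(s) for s in (1.0, 0.90, 0.72, 0.40)]
```

Rejecting out-of-range input rather than clamping it is a deliberate choice here, so that a malformed score surfaces as an error instead of being silently labeled.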