The Multi-Pass Assay

STET's matching algorithm runs in five passes, ordered from most to least certain. This ordering prevents the "greedy trap", where weak early matches steal better candidates.

Overview

Traditional reconciliation systems often use greedy matching: they accept the first plausible match and move on. This leads to cascading errors, where a mediocre early match consumes a transaction that would have been a perfect match later.

The Multi-Pass Assay solves this by processing matches in rounds. Each pass has strict criteria, and matched transactions are removed from the pool before the next pass begins.

Pass 1: Anchor
Pass 2: Fuzzy
Pass 2.5: Semantic
Pass 3: Float
Pass 4: Redline
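
The pool-removal idea can be sketched in a few lines. This is a minimal illustration, not STET's implementation: the `run_passes` function, the dictionary-based transaction records, and the two toy rules are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List, Tuple

Txn = Dict[str, object]             # hypothetical transaction record
Rule = Callable[[Txn, Txn], bool]   # returns True when a pass accepts the pair


def run_passes(bank: List[Txn], ledger: List[Txn],
               passes: List[Tuple[str, Rule]]):
    """Apply passes strictest-first; matched items leave the pool."""
    bank_pool, ledger_pool = list(bank), list(ledger)
    matches = []
    for name, rule in passes:
        for b in list(bank_pool):           # iterate over a copy while removing
            for led in ledger_pool:
                if rule(b, led):
                    matches.append((name, b, led))
                    bank_pool.remove(b)
                    ledger_pool.remove(led)
                    break                   # stop scanning once b is matched
    return matches, bank_pool, ledger_pool


# Toy rules (assumptions): a strict rule and a loose, amount-only rule.
exact = lambda b, led: b["amount"] == led["amount"] and b["desc"] == led["desc"]
loose = lambda b, led: b["amount"] == led["amount"]

bank = [{"amount": 100, "desc": "AWS"}, {"amount": 100, "desc": "AWS Inc"}]
ledger = [{"amount": 100, "desc": "AWS Inc"}, {"amount": 100, "desc": "AWS"}]

layered, _, _ = run_passes(bank, ledger, [("anchor", exact), ("loose", loose)])
greedy, _, _ = run_passes(bank, ledger, [("loose", loose)])
```

With the strict rule first, both transactions pair with their identical counterparts. With only the loose rule, the first bank entry ("AWS") takes the first ledger entry ("AWS Inc"), which is the greedy trap described above.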

Pass 1: The Anchor

Criteria
  • Exact amount match (to the cent)
  • Exact date match
  • Exact description match (after normalization)
Confidence Range

100%

Anchor matches are perfect identity matches with zero ambiguity. These form the foundation of the reconciliation.
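
As a rough sketch of the Anchor criteria, the check below compares amount, date, and normalized description. The normalization rules STET applies are not specified here, so `normalize` (lowercase, strip punctuation, collapse whitespace) is an assumption, as are the field names.

```python
import re
from datetime import date
from decimal import Decimal


def normalize(description: str) -> str:
    """Hypothetical normalization: lowercase, drop punctuation, collapse spaces."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", description.lower())
    return " ".join(cleaned.split())


def is_anchor_match(a: dict, b: dict) -> bool:
    """True only when amount, date, and normalized description are all identical."""
    return (a["amount"] == b["amount"]
            and a["date"] == b["date"]
            and normalize(a["description"]) == normalize(b["description"]))


bank_txn = {"amount": Decimal("19.99"), "date": date(2024, 3, 1),
            "description": "  AWS, Inc. "}
ledger_txn = {"amount": Decimal("19.99"), "date": date(2024, 3, 1),
              "description": "aws inc"}
anchor_ok = is_anchor_match(bank_txn, ledger_txn)
```

Using `Decimal` rather than `float` for amounts keeps the "to the cent" comparison exact.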

Pass 2: High Confidence Fuzzy

Criteria
  • Exact amount match
  • Exact date match
  • Description similarity ≥ 85% (Levenshtein ratio)
Confidence Range

85-99%

Catches minor description variations like 'AWS' vs 'AWS Inc' or 'Wire Transfer' vs 'Wire Xfer'.
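
For reference, one common way to turn Levenshtein distance into a similarity ratio is sketched below. Several definitions of "Levenshtein ratio" exist, and which one STET uses is not stated here; this version assumes `1 - distance / max(len)`, and the sample strings are illustrative. Scores for a given pair depend on both the formula and the normalization applied beforehand.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insert, delete, and substitute."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,         # deletion
                               current[j - 1] + 1,      # insertion
                               previous[j - 1] + cost)) # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed ratio: 1.0 for identical strings, 0.0 for entirely different ones."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


score = similarity("acme corp payment", "acme corp paymnt")
passes_fuzzy = score >= 0.85
```

Here a single dropped letter in a 17-character description gives a ratio of about 0.94, which clears the 85% threshold.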

Pass 2.5: Semantic Match

Criteria
  • Exact amount match
  • Exact date match
  • Semantic embedding similarity ≥ 85%
Confidence Range

85-99%

Uses ML embeddings (sentence-transformers) to match conceptually similar descriptions that fuzzy matching misses. Example: 'AWS' ↔ 'Amazon Web Services'.

ML Enhancement: Semantic matching uses the all-MiniLM-L6-v2 model to compute text embeddings and cosine similarity.
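
The sketch below shows how such a score might be computed. `cosine_similarity` is plain Python; `semantic_similarity` is a hypothetical wrapper that loads the model through the sentence-transformers package, which must be installed separately. How STET caches the model or batches descriptions is not covered here.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def semantic_similarity(a: str, b: str,
                        model_name: str = "all-MiniLM-L6-v2") -> float:
    """Embed both descriptions and compare them (hypothetical helper)."""
    # Imported lazily so the rest of the module works without the package.
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer(model_name)
    vec_a, vec_b = model.encode([a, b])   # one embedding row per input string
    return float(cosine_similarity(vec_a, vec_b))
```

A pair would qualify for this pass when `semantic_similarity(a, b) >= 0.85` and the amount and date also match exactly. In practice the model would be loaded once and reused rather than constructed on every call.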

Pass 3: The Float

Criteria
  • Exact amount match
  • Date within ±3 business days
  • Description similarity ≥ 70%
Confidence Range

70-99%

Handles timing differences caused by weekends, holidays, or processing delays. A Friday bank transaction might appear on Monday in the ledger.
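
A minimal sketch of the date window follows. It counts only Monday to Friday as business days and ignores public holidays, which is a simplifying assumption; the function names are hypothetical.

```python
from datetime import date, timedelta


def business_days_between(a: date, b: date) -> int:
    """Count weekdays stepped over when moving from the earlier date to the later."""
    start, end = sorted((a, b))
    days = 0
    current = start
    while current < end:
        current += timedelta(days=1)
        if current.weekday() < 5:   # 0-4 are Monday-Friday
            days += 1
    return days


def within_float_window(a: date, b: date, max_days: int = 3) -> bool:
    """True when the two dates are at most max_days business days apart."""
    return business_days_between(a, b) <= max_days


friday = date(2024, 3, 1)
monday = date(2024, 3, 4)
gap = business_days_between(friday, monday)
```

The Friday-to-Monday case from the text yields a gap of one business day, well inside the ±3 window, even though three calendar days have passed.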

Pass 4: Redline Analysis

After matching passes, Redline analyzes remaining transactions to flag discrepancies:

  • Classification Errors: Same amount + date, but description similarity <40%
  • Amount Variance: Same date + description, but different amounts
  • Timing Mismatch: Same amount + description, but dates more than 3 business days apart
  • Missing Entries: Transactions in one source with no candidate in the other
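
The rules above could be expressed as a small classifier over a candidate pair of unmatched transactions. This is a sketch under stated assumptions: the function name and inputs are hypothetical, and "same description" is taken to mean a similarity of at least 0.70, a threshold borrowed from the Float pass rather than one documented for Redline.

```python
from typing import Optional


def classify_pair(same_amount: bool,
                  business_day_gap: int,
                  description_similarity: float,
                  same_description_threshold: float = 0.70) -> Optional[str]:
    """Return a discrepancy label for a candidate pair, or None if no rule fits."""
    same_date = business_day_gap == 0
    # Assumption: "same description" means similarity at or above the threshold.
    same_description = description_similarity >= same_description_threshold

    if same_amount and same_date and description_similarity < 0.40:
        return "CLASSIFICATION_ERROR"
    if same_date and same_description and not same_amount:
        return "AMOUNT_VARIANCE"
    if same_amount and same_description and business_day_gap > 3:
        return "TIMING_MISMATCH"
    return None


labels = [
    classify_pair(True, 0, 0.20),
    classify_pair(False, 0, 0.90),
    classify_pair(True, 5, 0.90),
    classify_pair(False, 5, 0.10),
]
```

A transaction for which every candidate pair returns `None` would then be reported as a missing entry.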

Discrepancy Types

MISSING_ENTRY

Transaction exists in one source but has no match in the other. May indicate a recording error or fraud.

TIMING_MISMATCH

Same transaction, but dates differ by more than 3 business days. Often benign but worth reviewing.

AMOUNT_VARIANCE

Same transaction description and date, but amounts don't match. Requires investigation.

CLASSIFICATION_ERROR

Same amount and date, but descriptions are significantly different. Could be miscategorization.

Confidence Scores

Every match includes a confidence score from 0.0 to 1.0. The percentage ranges listed for each pass map onto this scale, so 85% corresponds to 0.85:

0.95 - 1.00 (Very High)

Anchor or near-exact matches

0.85 - 0.94 (High)

Strong fuzzy or semantic matches

0.70 - 0.84 (Moderate)

Float matches with date variance

< 0.70 (Low)

Flagged for review (usually discrepancies)
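
For downstream code, the bands can be mapped with a simple lookup. The function name is hypothetical, and treating each band as a half-open interval (so that 0.945 falls under High) is an assumption, since the listed ranges leave small gaps between bands.

```python
def confidence_band(score: float) -> str:
    """Map a 0.0-1.0 confidence score to its band label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0.0 and 1.0, got {score}")
    if score >= 0.95:
        return "Very High"
    if score >= 0.85:
        return "High"
    if score >= 0.70:
        return "Moderate"
    return "Low"


bands = [confidence_band(s) for s in (1.0, 0.90, 0.70, 0.42)]
```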