Going for Audit

nuMetrix Audit Readiness Assessment

Where we stand today, what's missing,
and how to get to 100% auditability.

February 2026

The Question

Can an external auditor trace any finding
back to its source data, understand why it was flagged,
and verify the rules haven't changed since?

What auditors need

  • Complete data lineage (source to finding)
  • Deterministic, reproducible results
  • Versioned rules with change history
  • Temporal tracking (when did this appear?)
  • Exception trail (who reviewed what?)

What nuMetrix provides

  • Source file + row number on every record
  • MD5 deterministic finding IDs
  • Probe version strings on findings
  • No temporal tracking yet
  • No exception management yet

Overall Readiness

3.4 / 5

Strong analytical foundations.
Weak operational audit trail.

7 of 12 capabilities scored 4 or 5 — the data pipeline is solid.
5 capabilities scored 1 or 2 — the operational layer needs work.

Maturity Matrix

Capability                 Score  Evidence
Source lineage               5    source_file + row_number on all 9 Bronze models
Validation trail             5    is_valid + invalid_reason per Silver row; 34 rules documented
Validation rule registry     5    lineage_validation_rules + lineage_validation_counts queryable
Finding determinism          5    MD5(probe_id | tenant_id | entity_id | time_bucket)
Evidence chain               4    Probes → Hypotheses → Diagnoses fully linked
Pipeline metrics             4    Row counts per layer; no timestamps
Audit reports (PDF)          4    4 report types × 3 languages, stored in DuckDB
Finding lifecycle            2    Rebuilt each run; no created_at or state tracking
Rule version history         2    Version string exists; no changelog
Column-level lineage         2    Implicit in macros; not queryable
Execution metadata           1    No run_id, no executed_at on findings
Exception management         1    No false-positive marking, no review trail

What's Already Solid

Data Pipeline

  • No silent filtering — invalid rows flagged, not dropped, until Gold
  • source_file + row_number on every Bronze row — full CSV traceability
  • Surrogate keys via dbt_utils — deterministic, reproducible
  • 34 validation rules documented in lineage_validation_rules
  • Pipeline metrics — row counts at every layer, loss percentages

Diagnostic System

  • Deterministic finding IDs — MD5 hash, idempotent per run
  • Evidence chain — Probes → Hypotheses → Diagnoses linked
  • Weighted verdicts — primary/supporting/context/counter roles
  • 4 contracts — Gold, Silver, Findings, Diagnosis (all versioned)
  • PDF audit reports — data quality, financial risk, entity health, analytics readiness
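The deterministic finding IDs above follow the pattern named in the maturity matrix: MD5 over probe_id | tenant_id | entity_id | time_bucket. A minimal sketch of that idea (field order and the "|" separator are taken from the matrix row; anything beyond that is an assumption):

```python
import hashlib

def finding_id(probe_id: str, tenant_id: str, entity_id: str, time_bucket: str) -> str:
    # Join the four key fields and hash. The field order and "|" separator
    # follow the maturity-matrix description; the real compiler may differ.
    key = "|".join([probe_id, tenant_id, entity_id, time_bucket])
    return hashlib.md5(key.encode("utf-8")).hexdigest()

# Re-running the same probe on the same entity and time bucket yields the
# same ID, which is what makes findings idempotent per run.
a = finding_id("probe_revenue_leakage", "t-001", "mat-0042", "2026-02")
b = finding_id("probe_revenue_leakage", "t-001", "mat-0042", "2026-02")
```

Because the ID is a pure function of its inputs, two runs over the same data produce identical finding IDs, and an auditor can recompute any ID from the record itself.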

The Audit Trail Today

Every layer is traceable to the one below it. The chain is complete — but only as a snapshot.

Diagnosis
↓ gates on confirmed hypothesis
Hypothesis Verdicts
↓ aggregates evidence from probes
Probe Findings
↓ queries Gold entities
Gold (valid rows only)
↓ filters is_valid = true
Silver (all rows + is_valid + invalid_reason)
↓ validates + surrogate keys
Bronze (source_file + row_number)
↓ reads CSV
Source CSVs (tenants/{id}/{system}/csv/)

Gap Analysis

Tier 1: Must-have for audit

  • Execution metadata — No run_id or timestamp on findings. Can't answer "when was this finding generated?"
  • Finding lifecycle — Findings rebuilt each run. Can't track when a finding first appeared or was resolved.
  • Score transparency — Hypothesis verdict shows 0.65 but not which probes contributed what weight.

Tier 2: Should-have for external audit

  • Rule change history — Probe YAML changes not tracked in queryable form.
  • Exception management — No way to mark findings as false-positive or reviewed.
  • Evidence schema — Evidence JSON unstructured; varies by probe type.

Tier 3: Nice-to-have

  • Column-level lineage — Source-to-target column mapping not queryable (implicit in macros).
  • Finding → source rows — Can't trace a finding back to specific Bronze rows.
  • Cross-entity impact — A bad material record affects billing, usage, and cases — no unified view.

Key insight: The analytical layer (what happened, why) is strong. The operational layer (when, by whom, status changes) is almost absent.

Phase 1: Execution Context

Low effort

Add run_id + timestamp to every finding

Modify the probe compiler (probecompile.py) to inject two columns into every generated SQL model:
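A minimal sketch of the injection, assuming the compiler can wrap the generated SELECT (the wrapper approach, and leaving type casts to the findings contract, are assumptions about how probecompile.py works):

```python
import uuid
from datetime import datetime, timezone

def inject_execution_context(model_sql: str, run_id: str, executed_at: str) -> str:
    # Wrap the compiled probe SELECT so every emitted finding row also
    # carries run_id and executed_at. Wrapping the query (rather than
    # editing its SELECT list in place) is an assumption for this sketch.
    return (
        "select f.*,\n"
        f"       '{run_id}' as run_id,\n"
        f"       '{executed_at}' as executed_at\n"
        f"from (\n{model_sql}\n) as f"
    )

run_id = str(uuid.uuid4())
executed_at = datetime.now(timezone.utc).isoformat()
sql = inject_execution_context("select * from gold_entities", run_id, executed_at)
```

One run_id per pipeline invocation, stamped on every generated model, is what lets a later query group all findings produced together.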

Same change to hypothesiscompile.py and diagnosiscompile.py.

What this unlocks: every finding can answer "when was this generated, and by which run?"

Files: scripts/probecompile.py, scripts/hypothesiscompile.py, scripts/diagnosiscompile.py, contracts/findings_contract.v1.json

Phase 2: Score Transparency

Low effort

Show the math behind every verdict

Add an evidence_breakdown JSON column to hypothesis_verdicts:

[
  {"probe_id": "probe_revenue_leakage",  "role": "primary",    "weight": 3, "findings": 42, "signal": 1.0,  "contribution": 0.43},
  {"probe_id": "probe_orphan_billing",   "role": "supporting", "weight": 2, "findings": 18, "signal": 0.89, "contribution": 0.25},
  {"probe_id": "probe_duplicate_billing", "role": "context",   "weight": 1, "findings": 0,  "signal": 0.0,  "contribution": 0.0}
]
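One way the contribution column could be derived, assuming contribution = weight × signal / total weight. The actual nuMetrix normalizer is not documented here (the example JSON above implies a different total), so treat this purely as a sketch of the shape of the computation:

```python
def evidence_breakdown(probes):
    # Assumed scoring: contribution = weight * signal / total_weight.
    # The real weighting scheme may normalize differently.
    total_weight = sum(p["weight"] for p in probes) or 1
    return [
        {**p, "contribution": round(p["weight"] * p["signal"] / total_weight, 2)}
        for p in probes
    ]

rows = evidence_breakdown([
    {"probe_id": "probe_revenue_leakage",   "role": "primary",    "weight": 3, "signal": 1.0},
    {"probe_id": "probe_orphan_billing",    "role": "supporting", "weight": 2, "signal": 0.89},
    {"probe_id": "probe_duplicate_billing", "role": "context",    "weight": 1, "signal": 0.0},
])
verdict_score = round(sum(r["contribution"] for r in rows), 2)
```

The point is that the verdict score becomes a sum of visible per-probe terms, so an auditor can recompute it from the breakdown alone.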

Same for diagnosis_verdicts: add confidence_breakdown showing base + each conditional boost.

Explorer hypothesis detail page renders the breakdown as a table.

Phase 3: Finding Lifecycle

Medium effort

Track when findings appear, persist, and resolve

New dbt incremental model: finding_snapshots
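The merge behavior the incremental model would need, sketched in Python rather than dbt SQL. The state names ("new" / "active" / "resolved") and the first_seen/last_seen columns are assumptions; the real model would express this as a merge on finding_id:

```python
from datetime import date

def merge_snapshots(snapshots, current_finding_ids, run_date):
    # Findings present in this run: bump last_seen, keep first_seen.
    for fid in current_finding_ids:
        if fid in snapshots:
            snapshots[fid]["last_seen"] = run_date
            snapshots[fid]["state"] = "active"
        else:
            snapshots[fid] = {"first_seen": run_date, "last_seen": run_date, "state": "new"}
    # Findings previously seen but absent from this run: mark resolved.
    for fid, row in snapshots.items():
        if fid not in current_finding_ids:
            row["state"] = "resolved"
    return snapshots

snaps = merge_snapshots({}, {"f1", "f2"}, date(2026, 2, 1))
snaps = merge_snapshots(snaps, {"f2"}, date(2026, 2, 2))
```

Because finding IDs are deterministic, the same underlying issue maps to the same row across runs, which is what makes this merge meaningful.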

What this unlocks: first_seen, last_seen, and state per finding, so a finding's appearance, persistence, and resolution become queryable.

Requires: dbt incremental materialization (merge strategy on finding_id). New pattern for the project.

Phase 4: Exception Management

Medium effort

Let humans annotate findings

New table: finding_exceptions
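An illustrative shape for the table. The column names are assumptions derived from the exception-trail requirement (who reviewed what, when, and why), and sqlite3 stands in for DuckDB so the sketch is self-contained; all values are invented examples:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table finding_exceptions (
        finding_id  text not null,
        status      text not null,   -- e.g. 'false_positive', 'reviewed'
        reviewed_by text not null,
        reviewed_at text not null,
        reason      text
    )
""")
conn.execute(
    "insert into finding_exceptions values (?, ?, ?, ?, ?)",
    ("4f2a9c...", "false_positive", "auditor@example.com",
     "2026-02-10T09:00:00Z", "Test tenant, not production billing"),
)
rows = conn.execute(
    "select finding_id, status, reviewed_by from finding_exceptions"
).fetchall()
```

Keeping exceptions in their own table leaves the findings themselves immutable, so the original analytical output and the human review trail stay separately auditable.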

Explorer changes: let reviewers mark a finding as false-positive or reviewed, recording who, when, and why.

This is the bridge from "analytics tool" to "audit workflow."

Roadmap to 5/5

Phase 1  Execution context      3.4 → 3.8
Phase 2  Score transparency     3.8 → 4.2
Phase 3  Finding lifecycle      4.2 → 4.6
Phase 4  Exception management   4.6 → 5.0

Phases 1+2

Compiler-only changes

No new dbt models. No Explorer changes.

Phases 3+4

New dbt models + Explorer write

Incremental materialization. API endpoints.

Summary

Today (3.4 / 5)

  • Full source-to-Gold lineage
  • No silent filtering anywhere
  • Deterministic finding IDs
  • Evidence chain: Probes → Hypotheses → Diagnoses
  • PDF audit reports in 3 languages
  • Calibration harness with recall metrics

After 4 phases (5.0 / 5)

  • Every finding timestamped with run_id
  • Full score transparency on verdicts
  • Finding lifecycle: first_seen, age, state
  • Exception trail: who reviewed, when, why
  • Auditor can reconstruct any finding's full history

The data tells the truth.
Now we make the trail visible.