The Three Compilers
How nuMetrix turns declarative YAML
into a complete diagnostic pipeline,
how it explains itself in three languages,
and what the forward compiler makes possible.
Everybody writes intent.
A probe is a business question: “show me where usage exceeds billing.”
A hypothesis is a concern: “I suspect we have revenue leakage.”
A diagnosis is an explanation: “the billing interface drops weekend transfers.”
Each is expressed as YAML. Each is compiled to SQL.
The compiler handles the plumbing. The author handles the meaning.
Every compiler follows the same architecture:
declarative rules → validate + cross-ref → generate SQL → execute per tenant
--check dry-run on every compiler. Tri-lingual text (DE/EN/FR) throughout.
Cross-reference validation: every ID must resolve to an existing file.
Deterministic MD5 keys. No hand-written SQL touches the pipeline.
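The deterministic keys could be produced like this. A minimal sketch: the actual key components (probe id, tenant, period) and function name are assumptions of this example, not the real implementation.

```python
import hashlib

def finding_key(*parts: str) -> str:
    """Deterministic surrogate key: identical inputs always hash to the
    same MD5 hex digest, so re-runs never create duplicate findings.
    Key components here are illustrative, not the actual schema."""
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()

key = finding_key("probe_usage_vs_billing", "tenant_042", "2024-06")
```

Because the key is a pure function of its inputs, a rebuilt pipeline emits the same keys and downstream joins stay stable.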
44 YAML definitions. 62 generated files. Zero hand-wired SQL.
13 probe types
4-status verdicts
6 root cause categories
Adding a new probe is 30 lines of YAML and a single command.
No SQL. No schema file. No platform wiring. The compiler does it all.
The Probe Compiler
27 YAML definitions → 56 generated files → 22 probes + 5 assessments
This is what a probe looks like
No SQL. No table names. Just: compare usage value against billing, flag deviations.
The compiler resolves CaseMaterialUsage to gold_case_material_usage via the contract.
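A sketch of what such a definition might look like. The field names below are illustrative, not the actual DSL; only the entity names, the `balance` type, and the 30% threshold are taken from this document.

```yaml
# Hypothetical probe definition — field names are illustrative, not the real DSL
id: probe_usage_vs_billing
type: balance                       # left aggregate vs right aggregate
entity_left: CaseMaterialUsage      # resolved via gold_contract.v1.json
entity_right: BillingEvent
measure: amount
tolerance_pct: 30                   # flag deviations beyond 30%
```

No table names, no SQL: the compiler resolves the entities through the contract and generates the comparison query.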
One DSL, thirteen compilation strategies
| Type | Pattern | Count |
|---|---|---|
| balance | Left aggregate vs right aggregate | 2 |
| mandatory_item | Entity must have required children | 2 |
| distribution_outlier | Z-score flagging | 1 |
| duplicate | GROUP BY + HAVING count > 1 | 2 |
| ratio | Numerator / denominator vs expected | 1 |
| trend | Rolling average drift detection | 1 |
| temporal_sequence | Event ordering violations | 1 |
| silver_audit | Validity & activity audits on Silver | 4 |
| entity_filter | WHERE clause on Gold dimension | 3 |
| enrichment | Fact aggregate joined to dimension | 2 |
| reconciliation | Two fact tables, full-outer-join | 1 |
| hand_written | Companion SQL, YAML for metadata | 3 |
| assessment | Aggregate probes → health score | 5 |
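As an illustration of one strategy, here is a minimal sketch of how the `duplicate` type could compile to its GROUP BY + HAVING pattern. The function name and signature are assumptions; the real compiler resolves the table name from the entity contract.

```python
def compile_duplicate(table: str, keys: list[str]) -> str:
    """Sketch of the `duplicate` strategy: group by the candidate keys,
    keep only groups that occur more than once."""
    cols = ", ".join(keys)
    return (
        f"SELECT {cols}, COUNT(*) AS n\n"
        f"FROM {table}\n"
        f"GROUP BY {cols}\n"
        f"HAVING COUNT(*) > 1"
    )

sql = compile_duplicate("gold_billing_events", ["invoice_id", "line_no"])
```

Each of the thirteen types is a small, well-bounded generator like this; the DSL stays declarative while the strategy owns the SQL shape.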
Probes don't know table names.
They know entity names.
entity: BillingEvent → (gold_contract.v1.json) → gold_billing_events → {{ ref('gold_billing_events') }}
Rename a dbt model? Update the contract. All probes follow automatically.
Add a new source system? The contract stays the same — only Bronze changes.
The contract is the seam between detection and data.
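A sketch of what that seam might contain. The JSON structure below is an assumption; only the file name and the entity-to-model mappings are taken from this document.

```json
{
  "version": "v1",
  "entities": {
    "BillingEvent": { "model": "gold_billing_events" },
    "CaseMaterialUsage": { "model": "gold_case_material_usage" }
  }
}
```

Probes reference the keys on the left; dbt owns the names on the right. Renaming a model touches exactly one line here.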
Every probe emits the same 10 columns
This is the interface contract. Assessments, hypotheses, and diagnoses
all consume this shape. Add a new probe and the entire pyramid sees it.
The Hypothesis Compiler
9 business questions → weighted evidence → 4-status verdicts
A business concern, formalised
The YAML says: these probes are the evidence for this concern.
The compiler builds the SQL that weighs the evidence and renders a verdict.
Four roles. Weighted scoring. One verdict.
Score = sum(weight × signal × direction) / sum(weight)
Then: confirmed ≥ 0.6, plausible ≥ 0.3, else not observed.
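The scoring and thresholding above can be sketched in a few lines of Python. The formula and thresholds come from this document; treating `signal` as 0/1 (probe fired or not) and `direction` as ±1 (supporting vs contradicting evidence) is an assumption of this sketch.

```python
def verdict(evidence: list[tuple[float, int, int]]) -> tuple[float, str]:
    """Each evidence item is (weight, signal, direction).
    Score = sum(weight * signal * direction) / sum(weight),
    then thresholded: confirmed >= 0.6, plausible >= 0.3, else not observed."""
    score = sum(w * s * d for w, s, d in evidence) / sum(w for w, _, _ in evidence)
    if score >= 0.6:
        status = "confirmed"
    elif score >= 0.3:
        status = "plausible"
    else:
        status = "not_observed"
    return score, status
```

For example, two firing supporters (weights 2.0 and 1.0) plus one silent probe (weight 1.0) score 3.0 / 4.0 = 0.75, which crosses the confirmed threshold.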
9 hypotheses fold into 2 SQL models
3 CTEs per hypothesis:
__evidence, __scored, __verdict. Final SELECT: UNION ALL of all verdict CTEs.
The verdict model: one row per hypothesis per tenant, with status, evidence score, finding count, and CHF at risk.
The registry: tri-lingual statements, interpretation templates, category, audience, probe linkage. Identical across tenants: the human-readable side of each hypothesis.
Unlike probes (one SQL per probe), hypotheses compile to a single model.
This keeps the dbt DAG simple and the evidence query atomic.
The Diagnosis Compiler
8 root causes → conditions + confidence → structured explanations
Why it happened — in YAML
Gate: only fires if the linked hypothesis is confirmed.
Conditions: pattern-match against evidence probes. All must pass.
Confidence: base + dynamic boosts, capped at 1.0.
Four CTEs per diagnosis
gate (hypothesis confirmed?) → conditions (all thresholds met?) → confidence (base + boosts) → emit or skip
If the hypothesis isn't confirmed: no output. Zero rows.
If any condition fails: no output. Zero rows.
Only when both gates pass does the diagnosis emit a root cause
with a computed confidence score and tri-lingual explanation.
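The gating and confidence rules above reduce to a small function. A minimal sketch: the signature and parameter names are assumptions; the gate order, the all-conditions requirement, and the base-plus-boosts cap at 1.0 come from this document.

```python
from typing import List, Optional

def diagnose(hypothesis_status: str, conditions: List[bool],
             base: float, boosts: List[float]) -> Optional[float]:
    """Emit a confidence score only when the linked hypothesis is confirmed
    AND every condition passes; otherwise emit nothing (zero rows).
    Confidence = base + boosts, capped at 1.0."""
    if hypothesis_status != "confirmed":
        return None          # gate fails: no output
    if not all(conditions):
        return None          # a condition fails: no output
    return min(1.0, base + sum(boosts))
```

Returning `None` mirrors the zero-rows behavior: a diagnosis that does not fire simply produces no record.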
Six categories. One shared vocabulary.
| Category | Meaning | Count today |
|---|---|---|
| process_failure | A workflow step was skipped, delayed, or incorrect | 4 |
| system_failure | Interface dropped data, mapping stale, sync failed | 1 |
| data_quality | Missing fields, orphan references, type mismatches | 2 |
| behavioral | Staff bypass, selective scanning, undocumented workarounds | 0 |
| structural | Organisational misalignment, taxonomy drift | 1 |
| external | Regulatory changes, supplier updates, seasonal shifts | 0 |
8 diagnoses across 4 categories today. Behavioral and external are waiting
for real-world deployment where floor knowledge informs the rules.
How the Compilers Stack
Each layer trusts the layer below. Each layer adds meaning.
The pipeline builds bottom-up
dbt build resolves the DAG automatically.
Add a new probe? It flows through hypotheses and diagnoses on the next build.
Compile-time safety at every layer
- Entity references checked. If a probe YAML says `entity: BillingEvent` and the contract doesn't have it — compile fails.
- Probe IDs cross-referenced. If a hypothesis references `probe_id: probe_foo` and no `probes/probe_foo.yaml` exists — compile fails.
- Hypothesis IDs cross-referenced. If a diagnosis references `hypothesis_id: hyp_foo` and no `hypotheses/hyp_foo.yaml` exists — compile fails.
- Confidence ranges checked. Base 0–1, boosts 0–0.3, total ≤ 1.0. Thresholds: plausible < confirmed ≤ 1.
- Tri-lingual text required. Every explanation, interpretation, and recommendation must have EN, DE, and FR.
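The cross-reference checks can be sketched as a file-existence walk. A minimal example under assumed names: the hypothesis dict shape and function name are illustrative, not the real `probecheck.py` API.

```python
from pathlib import Path

def check_hypothesis(hyp: dict, probes_dir: Path) -> list[str]:
    """Every probe_id a hypothesis cites must resolve to an existing
    probes/<id>.yaml file; anything else is a compile error."""
    errors = []
    for ref in hyp.get("evidence", []):
        probe_id = ref["probe_id"]
        if not (probes_dir / f"{probe_id}.yaml").exists():
            errors.append(f"{hyp['id']}: unknown probe_id '{probe_id}'")
    return errors
```

Because the check runs at compile time, a broken reference never reaches dbt, let alone a tenant database.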
The Reverse Compiler
YAML → human language. The machine explains itself in three languages.
A curated glossary powers
tri-lingual text generation
Entities, fields, and computed measures — all named in three languages.
proberegistry.py reads this glossary and generates text from YAML structure.
No LLM. No translation service. Pure code generation.
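The mechanism can be sketched as glossary lookup plus templating. Everything below is illustrative: the glossary schema, template wording, and translations are assumptions, not the real `proberegistry.py` internals.

```python
# Hypothetical glossary entry and templates — illustrative only.
GLOSSARY = {
    "CaseMaterialUsage": {"en": "case material usage",
                          "de": "Fallmaterialverbrauch",
                          "fr": "consommation de matériel par cas"},
}

TEMPLATES = {
    "en": "Compares {left} against billing and flags deviations.",
    "de": "Vergleicht {left} mit der Verrechnung und markiert Abweichungen.",
    "fr": "Compare {left} à la facturation et signale les écarts.",
}

def describe(entity: str, lang: str) -> str:
    """Pure code generation: glossary lookup plus a per-language template.
    No LLM, no translation service, fully deterministic."""
    return TEMPLATES[lang].format(left=GLOSSARY[entity][lang])
```

Deterministic generation means the three language variants can never drift out of sync with the YAML they describe.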
Every compiler outputs a registry table
| Registry | Generator | Text Source | Content |
|---|---|---|---|
| probe_registry | proberegistry.py | Auto-generated from YAML structure + glossary, overridable per probe | display_name, description, interpretation ×3 langs |
| hypothesis_registry | hypothesiscompile.py | Hand-authored in YAML, serialised | statement + 4 interpretation templates ×3 langs |
| diagnosis_registry | diagnosiscompile.py | Hand-authored in YAML, serialised | explanation + recommendation ×3 langs |
The probe registry is the most sophisticated — 12 type-specific generators
that synthesise text from the DSL structure. One per probe type.
Hypothesis and diagnosis registries serialise hand-authored text —
the compiler packages it, doesn't create it.
Templates become sentences
when findings arrive
interpretation.ts resolves {placeholders} from the finding's evidence JSON.
Works identically for probe, hypothesis, and diagnosis text.
One rendering engine. Three registries. Three languages.
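The resolution step is straightforward substitution. Here it is mirrored in Python as a sketch (the real engine is `interpretation.ts`); leaving unknown placeholders intact is an assumption of this example.

```python
import re

def render(template: str, evidence: dict) -> str:
    """Replace each {placeholder} with the matching value from the
    finding's evidence JSON. Placeholders with no value stay as-is
    (an assumption of this sketch)."""
    def sub(m: re.Match) -> str:
        return str(evidence.get(m.group(1), m.group(0)))
    return re.sub(r"\{(\w+)\}", sub, template)
```

The same substitution works regardless of whether the template came from the probe, hypothesis, or diagnosis registry, which is why one engine suffices.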
The Forward Compiler
From natural language to validated YAML. Closing the loop.
We already compile in both directions.
Almost.
YAML → human-readable text.
Glossary-driven auto-generation +
hand-authored overrides + runtime resolution.
The machine explains itself to humans.
“Flag materials where usage exceeds billing
by more than 30% in any month.”
→ generates `probe_*.yaml`. The human tells the machine what to look for.
Natural language in. Findings out.
No SQL in between.
“… without billing” → forward compiler (LLM + contracts) → validated probe → probe compiler (existing pipeline)
The forward compiler doesn't replace the probe compiler.
It feeds it. The existing validation pipeline catches every error.
The LLM proposes. The validator disposes.
Three reasons the forward compiler
is feasible — not a fantasy.
- The contract is the guardrail. The LLM doesn't need to know SQL. It needs to pick entities, fields, and comparisons from a finite, documented vocabulary: `gold_contract.v1.json`.
- The DSL is the target. 13 probe types, each with a well-defined structure. The LLM generates structured YAML, not freeform code. The search space is bounded.
- Validation is already built. `probecheck.py` catches every invalid entity, missing field, and broken reference. The human reviews only probes that pass validation.
Human in the loop. Always.
The system proposes. The admin disposes.
No probe reaches production without human approval.
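The propose-validate-approve loop described above can be sketched abstractly. This is speculative: the forward compiler does not exist yet, so every name here is a placeholder. `propose` stands in for the LLM, `validate` for the existing `probecheck.py`-style checks, and `approve` for the human admin.

```python
from typing import Callable, List, Optional

def forward_compile(question: str,
                    propose: Callable[[str], dict],
                    validate: Callable[[dict], List[str]],
                    approve: Callable[[dict], bool]) -> Optional[dict]:
    """Speculative sketch: nothing reaches the pipeline unless it passes
    both the automated validator and human review."""
    candidate = propose(question)   # the LLM proposes
    if validate(candidate):         # the validator disposes
        return None
    return candidate if approve(candidate) else None
```

The existing validator is the safety net: even a badly behaved proposer can only ever submit candidates, never ship them.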
Three compilers.
One pipeline.
Zero SQL.
The probe compiler detects.
The hypothesis compiler judges.
The diagnosis compiler explains.
The reverse compiler makes it all human-readable.
The forward compiler will let anyone ask the question.