The Three Compilers

How nuMetrix turns declarative YAML
into a complete diagnostic pipeline,
how it explains itself in three languages,
and what the forward compiler makes possible.

Nobody writes SQL.
Everybody writes intent.

A probe is a business question: “show me where usage exceeds billing.”
A hypothesis is a concern: “I suspect we have revenue leakage.”
A diagnosis is an explanation: “the billing interface drops weekend transfers.”

Each is expressed as YAML. Each is compiled to SQL.
The compiler handles the plumbing. The author handles the meaning.

Every compiler follows the same architecture

*.yaml (declarative rules)
  → check.py (validate + cross-ref)
  → compile.py (generate SQL)
  → dbt build (execute per tenant)

--check dry-run on every compiler. Tri-lingual text (DE/EN/FR) throughout.
Cross-reference validation: every ID must resolve to an existing file.
Deterministic MD5 keys. No hand-written SQL touches the pipeline.

44 YAML definitions. 62 generated files. Zero hand-wired SQL.

27 Probe YAMLs · 13 probe types
9 Hypothesis YAMLs · 4-status verdicts
8 Diagnosis YAMLs · 6 root cause categories
62 generated SQL + YML files in dbt

Adding a new probe is 30 lines of YAML and a single command.
No SQL. No schema file. No platform wiring. The compiler does it all.

The Probe Compiler

27 YAML definitions → 56 generated files → 22 probes + 5 assessments

This is what a probe looks like

probe_id: probe_revenue_leakage
type: balance
contract: "gold.v1"
left:
  entity: CaseMaterialUsage
  aggregate: "sum(quantity * standard_price)"
right:
  entity: BillingEvent
  aggregate: "sum(billed_amount)"
severity:
  high: "pct_deviation > 0.30"
  medium: "pct_deviation > 0.10"

No SQL. No table names. Just: compare usage value against billing, flag deviations.
The compiler resolves CaseMaterialUsage to gold_case_material_usage via the contract.

One DSL, thirteen compilation strategies

Type                 | Pattern                                  | Count
balance              | Left aggregate vs right aggregate        | 2
mandatory_item       | Entity must have required children       | 2
distribution_outlier | Z-score flagging                         | 1
duplicate            | GROUP BY + HAVING count > 1              | 2
ratio                | Numerator / denominator vs expected      | 1
trend                | Rolling average drift detection          | 1
temporal_sequence    | Event ordering violations                | 1
silver_audit         | Validity & activity audits on Silver     | 4
entity_filter        | WHERE clause on Gold dimension           | 3
enrichment           | Fact aggregate joined to dimension       | 2
reconciliation       | Two fact tables, full outer join         | 1
hand_written         | Companion SQL, YAML for metadata         | 3
assessment           | Aggregate probes → health score          | 5

Probes don't know table names.
They know entity names.

YAML:                  entity: BillingEvent
gold_contract.v1.json: BillingEvent → gold_billing_events
SQL:                   {{ ref('gold_billing_events') }}

Rename a dbt model? Update the contract. All probes follow automatically.
Add a new source system? The contract stays the same — only Bronze changes.
The contract is the seam between detection and data.
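The contract lookup can be sketched in a few lines of Python. The two mappings below come from the examples above; the helper name `resolve_entity` is illustrative, not the compiler's actual API:

```python
# Minimal sketch of contract-based entity resolution. The mapping mirrors
# gold_contract.v1.json; in the real compiler it would be loaded from that file.
CONTRACT = {
    "CaseMaterialUsage": "gold_case_material_usage",
    "BillingEvent": "gold_billing_events",
}

def resolve_entity(entity: str) -> str:
    """Resolve a DSL entity name to a dbt ref; unknown names fail at compile time."""
    try:
        model = CONTRACT[entity]
    except KeyError:
        raise ValueError(f"entity '{entity}' not found in gold_contract.v1.json")
    return f"{{{{ ref('{model}') }}}}"
```

Renaming a model then really is a one-line contract change: every generated `ref()` follows on the next compile.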

Every probe emits the same 10 columns

finding_id     -- MD5(probe_id | tenant_id | entity_id | time_bucket)
tenant_id      -- which hospital
probe_id       -- which detector
probe_version  -- reproducibility
severity       -- high | medium | low
entity_type    -- Case | Material | CostCenter
entity_id      -- the affected entity
time_bucket    -- month or quarter
money_at_risk  -- CHF impact
evidence       -- JSON: the proof

This is the interface contract. Assessments, hypotheses, and diagnoses
all consume this shape. Add a new probe and the entire pyramid sees it.
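The deterministic key described above is a plain MD5 over the pipe-joined identity fields. A minimal sketch (the function name is illustrative):

```python
import hashlib

def finding_id(probe_id: str, tenant_id: str, entity_id: str, time_bucket: str) -> str:
    """Deterministic MD5 key over the pipe-joined identity fields.

    Same probe, tenant, entity, and bucket always yield the same finding_id,
    so re-running the pipeline upserts rather than duplicates findings.
    """
    key = "|".join([probe_id, tenant_id, entity_id, time_bucket])
    return hashlib.md5(key.encode("utf-8")).hexdigest()
```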

The Hypothesis Compiler

9 business questions → weighted evidence → 4-status verdicts

A business concern, formalised

hypothesis_id: hyp_revenue_leakage_unbilled
statement:
  en: "The hospital is losing revenue because materials are used but never billed."
evidence:
  - probe_id: probe_revenue_leakage
    role: primary       # weight 3 — must fire for "confirmed"
    weight: 3
  - probe_id: probe_missing_mandatory_implants
    role: supporting    # weight 2 — strengthens the case
    weight: 2
verdict:
  thresholds: { confirmed: 0.6, plausible: 0.3 }

The YAML says: these probes are the evidence for this concern.
The compiler builds the SQL that weighs the evidence and renders a verdict.

Four roles. Weighted scoring. One verdict.

Primary: must fire for “confirmed”. The core evidence. Without it, the hypothesis cannot be proven.
Supporting: strengthens the case. More supporting probes firing = higher evidence score.
Context: background signal. Not decisive, but adds depth. Low weight.
Counter: evidence against the hypothesis. If counter probes fire, the score drops. Negative weight.

Score = sum(weight × signal × direction) / sum(weight)
Then: confirmed ≥ 0.6, plausible ≥ 0.3, else not observed.
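As a sketch of the scoring rule and the threshold-driven statuses (the tuple shape and function names are assumptions for illustration; the fourth status is not spelled out in this section):

```python
def evidence_score(evidence):
    """sum(weight * signal * direction) / sum(weight).

    evidence: list of (weight, signal, direction) tuples, where signal is
    1.0 if the probe fired and 0.0 otherwise, and direction is -1 for
    counter-evidence, +1 for all other roles.
    """
    total_weight = sum(w for w, _, _ in evidence)
    if total_weight == 0:
        return 0.0
    return sum(w * s * d for w, s, d in evidence) / total_weight

def verdict(score, primary_fired, confirmed=0.6, plausible=0.3):
    """Threshold-driven statuses; 'confirmed' also requires the primary probe."""
    if score >= confirmed and primary_fired:
        return "confirmed"
    if score >= plausible:
        return "plausible"
    return "not_observed"
```

With the example hypothesis above: both probes firing gives 5/5 = 1.0 (confirmed); only the supporting probe firing gives 2/5 = 0.4 (plausible).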

9 hypotheses fold into 2 SQL models

hypothesis_verdicts.sql
One monolithic SQL file.
Three CTEs per hypothesis: __evidence, __scored, __verdict.
Final SELECT: UNION ALL of all verdict CTEs.
One row per hypothesis per tenant:
status, evidence score, finding count, CHF at risk.

hypothesis_registry.sql
Metadata table, identical across tenants.
Tri-lingual statements, interpretation templates.
Category, audience, probe linkage.
The human-readable side of each hypothesis.

Unlike probes (one SQL per probe), hypotheses compile to a single model.
This keeps the dbt DAG simple and the evidence query atomic.

The Diagnosis Compiler

8 root causes → conditions + confidence → structured explanations

Why it happened — in YAML

diagnosis_id: diag_billing_workflow_gap
hypothesis_id: hyp_revenue_leakage_unbilled
root_cause_category: process_failure
conditions:              # ALL must pass
  - probe_id: probe_revenue_leakage
    field: finding_count
    above: 10            # not isolated
confidence:
  base: 0.70
  boost_if:              # dynamic refinement
    - probe_id: assessment_case_financial_integrity
      above: 20
      boost: 0.15        # systematic = higher confidence

Gate: only fires if the linked hypothesis is confirmed.
Conditions: pattern-match against evidence probes. All must pass.
Confidence: base + dynamic boosts, capped at 1.0.
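The confidence rule (base plus dynamic boosts, capped at 1.0) can be sketched as follows; the boost-tuple shape and function name are illustrative, not the compiler's actual API:

```python
def diagnosis_confidence(base, boosts, findings):
    """base + dynamic boosts, capped at 1.0.

    boosts:   list of (probe_id, above, boost) tuples, mirroring boost_if.
    findings: map of probe_id -> observed value (e.g. finding_count).
    """
    confidence = base
    for probe_id, above, boost in boosts:
        if findings.get(probe_id, 0) > above:  # boost condition met
            confidence += boost
    return min(confidence, 1.0)  # hard cap
```

For the example above: 25 findings on the assessment probe clears the `above: 20` threshold, so confidence is 0.70 + 0.15 = 0.85.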

Four CTEs per diagnosis

__hypothesis (gate: confirmed?)
  → __conditions (all thresholds met?)
  → __confidence (base + boosts)
  → __verdict (emit or skip)

If the hypothesis isn't confirmed: no output. Zero rows.
If any condition fails: no output. Zero rows.
Only when both gates pass does the diagnosis emit a root cause
with a computed confidence score and tri-lingual explanation.

Six categories. One shared vocabulary.

Category        | Meaning                                                   | Current
process_failure | A workflow step was skipped, delayed, or incorrect        | 4
system_failure  | Interface dropped data, mapping stale, sync failed        | 1
data_quality    | Missing fields, orphan references, type mismatches        | 2
behavioral      | Staff bypass, selective scanning, undocumented workarounds| 0
structural      | Organisational misalignment, taxonomy drift               | 1
external        | Regulatory changes, supplier updates, seasonal shifts     | 0

8 diagnoses across 4 categories today. Behavioral and external are waiting
for real-world deployment where floor knowledge informs the rules.

How the Compilers Stack

Each layer trusts the layer below. Each layer adds meaning.

The pipeline builds bottom-up

Diagnosis Verdicts
↑ queries
Hypothesis Verdicts
↑ queries
Probe Findings (27 models)
↑ queries
Gold Entities (7 models)
↑ validated from
Bronze → Silver (18 models)

dbt build resolves the DAG automatically.
Add a new probe? It flows through hypotheses and diagnoses on the next build.

Compile-time safety at every layer

  • Entity references checked. If a probe YAML says entity: BillingEvent and the contract doesn't have it — compile fails.
  • Probe IDs cross-referenced. If a hypothesis references probe_id: probe_foo and no probes/probe_foo.yaml exists — compile fails.
  • Hypothesis IDs cross-referenced. If a diagnosis references hypothesis_id: hyp_foo and no hypotheses/hyp_foo.yaml exists — compile fails.
  • Confidence ranges checked. Base 0–1, boosts 0–0.3, total ≤ 1.0. Thresholds: plausible < confirmed ≤ 1.
  • Tri-lingual text required. Every explanation, interpretation, and recommendation must have EN, DE, and FR.

The Reverse Compiler

YAML → human language. The machine explains itself in three languages.

A curated glossary powers
tri-lingual text generation

# registry_glossary.yaml
entities:
  Case:         { en: "case", de: "Fall", fr: "cas" }
  BillingEvent: { en: "billing event", de: "Abrechnungsereignis", fr: "événement de facturation" }
  Material:     { en: "material", de: "Material", fr: "matériau" }
fields:
  billed_amount:  { en: "billed amount", de: "Rechnungsbetrag", fr: "montant facturé" }
  standard_price: { en: "standard price", de: "Standardpreis", fr: "prix standard" }
derived_fields:
  money_at_risk:  { en: "money at risk", de: "Risikobetrag", fr: "montant à risque" }
  io_coefficient: { en: "I/O coefficient", de: "I/O-Koeffizient", fr: "coefficient I/O" }

Entities, fields, and computed measures — all named in three languages.
proberegistry.py reads this glossary and generates text from YAML structure.
No LLM. No translation service. Pure code generation.

Every compiler outputs a registry table

Registry            | Generator            | Text Source                                                  | Content
probe_registry      | proberegistry.py     | Auto-generated from YAML structure + glossary, overridable per probe | display_name, description, interpretation ×3 langs
hypothesis_registry | hypothesiscompile.py | Hand-authored in YAML, serialised                            | statement + 4 interpretation templates ×3 langs
diagnosis_registry  | diagnosiscompile.py  | Hand-authored in YAML, serialised                            | explanation + recommendation ×3 langs

The probe registry is the most sophisticated — 12 type-specific generators
that synthesise text from the DSL structure. One per probe type.
Hypothesis and diagnosis registries serialise hand-authored text —
the compiler packages it, doesn't create it.

Templates become sentences
when findings arrive

── Template (from probe_registry) ──
"Case {entity_id} shows a gap of {money_at_risk} between material usage and billing ({pct_diff}% of usage unbilled)."

── Finding row ──
entity_id: CASE-2847
money_at_risk: 1'250.00
evidence: { "pct_diff": "14.2" }

── Rendered in Explorer ──
"Case CASE-2847 shows a gap of CHF 1'250.00 between material usage and billing (14.2% of usage unbilled)."

interpretation.ts resolves {placeholders} from the finding's evidence JSON.
Works identically for probe, hypothesis, and diagnosis text.
One rendering engine. Three registries. Three languages.
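A Python sketch of the same placeholder resolution (the production engine is interpretation.ts; this mirrors its described behaviour for illustration):

```python
import re

def render(template: str, finding: dict) -> str:
    """Resolve {placeholders} from a finding row plus its evidence JSON.

    Evidence values override top-level columns on key collision; unknown
    placeholders are left intact rather than erased.
    """
    values = {**finding, **finding.get("evidence", {})}
    return re.sub(
        r"\{(\w+)\}",
        lambda m: str(values.get(m.group(1), m.group(0))),
        template,
    )
```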

The Forward Compiler

From natural language to validated YAML. Closing the loop.

We already compile in both directions.
Almost.

Reverse compiler (built)
3 registries · 42 entries · 3 languages

YAML → human-readable text.
Glossary-driven auto-generation +
hand-authored overrides + runtime resolution.

The machine explains itself to humans.

Forward compiler (ambition)
Natural language → YAML.

“Flag materials where usage exceeds billing
by more than 30% in any month.”

→ generates probe_*.yaml

The human tells the machine what to look for.

Natural language in. Findings out.
No SQL in between.

“Flag usage without billing”
  → Forward Compiler (LLM + contracts)
  → YAML (validated probe)
  → Probe Compiler (existing pipeline)
  → Findings

The forward compiler doesn't replace the probe compiler.
It feeds it. The existing validation pipeline catches every error.
The LLM proposes. The validator disposes.

Three reasons the forward compiler
is feasible — not a fantasy.

  • The contract is the guardrail. The LLM doesn't need to know SQL. It needs to pick entities, fields, and comparisons from a finite, documented vocabulary: gold_contract.v1.json.
  • The DSL is the target. 13 probe types. Each has a well-defined structure. The LLM generates structured YAML, not freeform code. The search space is bounded.
  • Validation is already built. probecheck.py catches every invalid entity, missing field, broken reference. The human reviews only probes that pass validation.

Human in the loop. Always.

Describe
Admin describes the concern in natural language. Any language.
Generate
LLM generates candidate YAML using the contract vocabulary.
Validate
probecheck.py validates structure, contracts, and cross-references.
Review
Human reads the YAML, adjusts thresholds, approves or rejects.
Compile
Approved YAML enters the standard pipeline. dbt builds. Findings appear.

The system proposes. The admin disposes.
No probe reaches production without human approval.

Three compilers.
One pipeline.
Zero SQL.

The probe compiler detects.
The hypothesis compiler judges.
The diagnosis compiler explains.
The reverse compiler makes it all human-readable.

The forward compiler will let anyone ask the question.

nuMetrix