The Three Compilers

How nuMetrix turns declarative YAML
into a complete diagnostic pipeline,
how it explains itself in three languages,
and what the forward compiler makes possible.

Nobody writes SQL.
Everybody writes intent.

A probe is a business question: “show me where usage exceeds billing.”
A hypothesis is a concern: “I suspect we have revenue leakage.”
A diagnosis is an explanation: “the billing interface drops weekend transfers.”

Each is expressed as YAML. Each is compiled to SQL.
The compiler handles the plumbing. The author handles the meaning.

Every compiler follows the same architecture

*.yaml (declarative rules)
  → check.py (validate + cross-ref)
  → compile.py (generate SQL)
  → dbt build (execute per tenant)

--check dry-run on every compiler. Tri-lingual text (DE/EN/FR) throughout.
Cross-reference validation: every ID must resolve to an existing file.
Deterministic MD5 keys. No hand-written SQL touches the pipeline.

44 YAML definitions. 62 generated files. Zero hand-wired SQL.

27 Probe YAMLs · 13 probe types
9 Hypothesis YAMLs · 4-status verdicts
8 Diagnosis YAMLs · 6 root cause categories
62 generated SQL + YML files in dbt

Adding a new probe is 30 lines of YAML and a single command.
No SQL. No schema file. No platform wiring. The compiler does it all.

The Probe Compiler

27 YAML definitions → 56 generated files → 22 probes + 5 assessments

This is what a probe looks like

probe_id: probe_revenue_leakage
type: balance
contract: "gold.v1"
left:
  entity: CaseMaterialUsage
  aggregate: "sum(quantity * standard_price)"
right:
  entity: BillingEvent
  aggregate: "sum(billed_amount)"
severity:
  high: "pct_deviation > 0.30"
  medium: "pct_deviation > 0.10"

No SQL. No table names. Just: compare usage value against billing, flag deviations.
The compiler resolves CaseMaterialUsage to gold_case_material_usage via the contract.

One DSL, thirteen compilation strategies

Type                 | Pattern                                  | Count
balance              | Left aggregate vs right aggregate        | 2
mandatory_item       | Entity must have required children       | 2
distribution_outlier | Z-score flagging                         | 1
duplicate            | GROUP BY + HAVING count > 1              | 2
ratio                | Numerator / denominator vs expected      | 1
trend                | Rolling average drift detection          | 1
temporal_sequence    | Event ordering violations                | 1
silver_audit         | Validity & activity audits on Silver     | 4
entity_filter        | WHERE clause on Gold dimension           | 3
enrichment           | Fact aggregate joined to dimension       | 2
reconciliation       | Two fact tables, full outer join         | 1
hand_written         | Companion SQL, YAML for metadata         | 3
assessment           | Aggregate probes → health score          | 5

Probes don't know table names.
They know entity names.

YAML:                  entity: BillingEvent
gold_contract.v1.json: BillingEvent → gold_billing_events
SQL:                   {{ ref('gold_billing_events') }}

Rename a dbt model? Update the contract. All probes follow automatically.
Add a new source system? The contract stays the same — only Bronze changes.
The contract is the seam between detection and data.
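The contract lookup can be sketched in a few lines of Python. The two mappings below come from the examples above; the helper name `resolve_entity` is illustrative, not the compiler's actual API:

```python
# Minimal sketch of contract-based entity resolution. The mapping mirrors
# gold_contract.v1.json; in the real compiler it would be loaded from that file.
CONTRACT = {
    "CaseMaterialUsage": "gold_case_material_usage",
    "BillingEvent": "gold_billing_events",
}

def resolve_entity(entity: str) -> str:
    """Resolve a DSL entity name to a dbt ref; unknown names fail at compile time."""
    try:
        model = CONTRACT[entity]
    except KeyError:
        raise ValueError(f"entity '{entity}' not found in gold_contract.v1.json")
    return f"{{{{ ref('{model}') }}}}"
```

Renaming a model then really is a one-line contract change: every generated `ref()` follows on the next compile.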

Every probe emits the same 10 columns

finding_id     -- MD5(probe_id | tenant_id | entity_id | time_bucket)
tenant_id      -- which hospital
probe_id       -- which detector
probe_version  -- reproducibility
severity       -- high | medium | low
entity_type    -- Case | Material | CostCenter
entity_id      -- the affected entity
time_bucket    -- month or quarter
money_at_risk  -- CHF impact
evidence       -- JSON: the proof

This is the interface contract. Assessments, hypotheses, and diagnoses
all consume this shape. Add a new probe and the entire pyramid sees it.
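The deterministic key described above is a plain MD5 over the pipe-joined identity fields. A minimal sketch (the function name is illustrative):

```python
import hashlib

def finding_id(probe_id: str, tenant_id: str, entity_id: str, time_bucket: str) -> str:
    """Deterministic MD5 key over the pipe-joined identity fields.

    Same probe, tenant, entity, and bucket always yield the same finding_id,
    so re-running the pipeline upserts rather than duplicates findings.
    """
    key = "|".join([probe_id, tenant_id, entity_id, time_bucket])
    return hashlib.md5(key.encode("utf-8")).hexdigest()
```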

The Hypothesis Compiler

9 business questions → weighted evidence → 4-status verdicts

A business concern, formalised

hypothesis_id: hyp_revenue_leakage_unbilled
statement:
  en: "The hospital is losing revenue because materials are used but never billed."
evidence:
  - probe_id: probe_revenue_leakage
    role: primary       # weight 3 — must fire for "confirmed"
    weight: 3
  - probe_id: probe_missing_mandatory_implants
    role: supporting    # weight 2 — strengthens the case
    weight: 2
verdict:
  thresholds: { confirmed: 0.6, plausible: 0.3 }

The YAML says: these probes are the evidence for this concern.
The compiler builds the SQL that weighs the evidence and renders a verdict.

Four roles. Weighted scoring. One verdict.

Primary: must fire for “confirmed”. The core evidence. Without it, the hypothesis cannot be proven.
Supporting: strengthens the case. More supporting probes firing = higher evidence score.
Context: background signal. Not decisive, but adds depth. Low weight.
Counter: evidence against the hypothesis. If counter probes fire, the score drops. Negative weight.

Score = sum(weight × signal × direction) / sum(weight)
Then: confirmed ≥ 0.6, plausible ≥ 0.3, else not observed.
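As a sketch of the scoring rule and the threshold-driven statuses (the tuple shape and function names are assumptions for illustration; the fourth status is not spelled out in this section):

```python
def evidence_score(evidence):
    """sum(weight * signal * direction) / sum(weight).

    evidence: list of (weight, signal, direction) tuples, where signal is
    1.0 if the probe fired and 0.0 otherwise, and direction is -1 for
    counter-evidence, +1 for all other roles.
    """
    total_weight = sum(w for w, _, _ in evidence)
    if total_weight == 0:
        return 0.0
    return sum(w * s * d for w, s, d in evidence) / total_weight

def verdict(score, primary_fired, confirmed=0.6, plausible=0.3):
    """Threshold-driven statuses; 'confirmed' also requires the primary probe."""
    if score >= confirmed and primary_fired:
        return "confirmed"
    if score >= plausible:
        return "plausible"
    return "not_observed"
```

With the example hypothesis above: both probes firing gives 5/5 = 1.0 (confirmed); only the supporting probe firing gives 2/5 = 0.4 (plausible).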

9 hypotheses fold into 2 SQL models

hypothesis_verdicts.sql
One monolithic SQL file.
Three CTEs per hypothesis: __evidence, __scored, __verdict.
Final SELECT: UNION ALL of all verdict CTEs.
One row per hypothesis per tenant:
status, evidence score, finding count, CHF at risk.

hypothesis_registry.sql
Metadata table, identical across tenants.
Tri-lingual statements, interpretation templates.
Category, audience, probe linkage.
The human-readable side of each hypothesis.

Unlike probes (one SQL per probe), hypotheses compile to a single model.
This keeps the dbt DAG simple and the evidence query atomic.

The Diagnosis Compiler

8 root causes → conditions + confidence → structured explanations

Why it happened — in YAML

diagnosis_id: diag_billing_workflow_gap
hypothesis_id: hyp_revenue_leakage_unbilled
root_cause_category: process_failure
conditions:              # ALL must pass
  - probe_id: probe_revenue_leakage
    field: finding_count
    above: 10            # not isolated
confidence:
  base: 0.70
  boost_if:              # dynamic refinement
    - probe_id: assessment_case_financial_integrity
      above: 20
      boost: 0.15        # systematic = higher confidence

Gate: only fires if the linked hypothesis is confirmed.
Conditions: pattern-match against evidence probes. All must pass.
Confidence: base + dynamic boosts, capped at 1.0.
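The confidence rule (base plus dynamic boosts, capped at 1.0) can be sketched as follows; the boost-tuple shape and function name are illustrative, not the compiler's actual API:

```python
def diagnosis_confidence(base, boosts, findings):
    """base + dynamic boosts, capped at 1.0.

    boosts:   list of (probe_id, above, boost) tuples, mirroring boost_if.
    findings: map of probe_id -> observed value (e.g. finding_count).
    """
    confidence = base
    for probe_id, above, boost in boosts:
        if findings.get(probe_id, 0) > above:  # boost condition met
            confidence += boost
    return min(confidence, 1.0)  # hard cap
```

For the example above: 25 findings on the assessment probe clears the `above: 20` threshold, so confidence is 0.70 + 0.15 = 0.85.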

Four CTEs per diagnosis

__hypothesis (gate: confirmed?)
  → __conditions (all thresholds met?)
  → __confidence (base + boosts)
  → __verdict (emit or skip)

If the hypothesis isn't confirmed: no output. Zero rows.
If any condition fails: no output. Zero rows.
Only when both gates pass does the diagnosis emit a root cause
with a computed confidence score and tri-lingual explanation.

Six categories. One shared vocabulary.

Category        | Meaning                                                   | Current
process_failure | A workflow step was skipped, delayed, or incorrect        | 4
system_failure  | Interface dropped data, mapping stale, sync failed        | 1
data_quality    | Missing fields, orphan references, type mismatches        | 2
behavioral      | Staff bypass, selective scanning, undocumented workarounds| 0
structural      | Organisational misalignment, taxonomy drift               | 1
external        | Regulatory changes, supplier updates, seasonal shifts     | 0

8 diagnoses across 4 categories today. Behavioral and external are waiting
for real-world deployment where floor knowledge informs the rules.

How the Compilers Stack

Each layer trusts the layer below. Each layer adds meaning.

The pipeline builds bottom-up

Diagnosis Verdicts
↑ queries
Hypothesis Verdicts
↑ queries
Probe Findings (27 models)
↑ queries
Gold Entities (7 models)
↑ validated from
Bronze → Silver (18 models)

dbt build resolves the DAG automatically.
Add a new probe? It flows through hypotheses and diagnoses on the next build.

Compile-time safety at every layer

  • Entity references checked. If a probe YAML says entity: BillingEvent and the contract doesn't have it — compile fails.
  • Probe IDs cross-referenced. If a hypothesis references probe_id: probe_foo and no probes/probe_foo.yaml exists — compile fails.
  • Hypothesis IDs cross-referenced. If a diagnosis references hypothesis_id: hyp_foo and no hypotheses/hyp_foo.yaml exists — compile fails.
  • Confidence ranges checked. Base 0–1, boosts 0–0.3, total ≤ 1.0. Thresholds: plausible < confirmed ≤ 1.
  • Tri-lingual text required. Every explanation, interpretation, and recommendation must have EN, DE, and FR.

The Reverse Compiler

YAML → human language. The machine explains itself in three languages.

A curated glossary powers
tri-lingual text generation

# registry_glossary.yaml
entities:
  Case:         { en: "case", de: "Fall", fr: "cas" }
  BillingEvent: { en: "billing event", de: "Abrechnungsereignis", fr: "événement de facturation" }
  Material:     { en: "material", de: "Material", fr: "matériau" }
fields:
  billed_amount:  { en: "billed amount", de: "Rechnungsbetrag", fr: "montant facturé" }
  standard_price: { en: "standard price", de: "Standardpreis", fr: "prix standard" }
derived_fields:
  money_at_risk:  { en: "money at risk", de: "Risikobetrag", fr: "montant à risque" }
  io_coefficient: { en: "I/O coefficient", de: "I/O-Koeffizient", fr: "coefficient I/O" }

Entities, fields, and computed measures — all named in three languages.
proberegistry.py reads this glossary and generates text from YAML structure.
No LLM. No translation service. Pure code generation.

Every compiler outputs a registry table

Registry            | Generator            | Text Source                                                  | Content
probe_registry      | proberegistry.py     | Auto-generated from YAML structure + glossary, overridable per probe | display_name, description, interpretation ×3 langs
hypothesis_registry | hypothesiscompile.py | Hand-authored in YAML, serialised                            | statement + 4 interpretation templates ×3 langs
diagnosis_registry  | diagnosiscompile.py  | Hand-authored in YAML, serialised                            | explanation + recommendation ×3 langs

The probe registry is the most sophisticated — 12 type-specific generators
that synthesise text from the DSL structure. One per probe type.
Hypothesis and diagnosis registries serialise hand-authored text —
the compiler packages it, doesn't create it.

Templates become sentences
when findings arrive

── Template (from probe_registry) ──
"Case {entity_id} shows a gap of {money_at_risk} between material usage and billing ({pct_diff}% of usage unbilled)."

── Finding row ──
entity_id: CASE-2847
money_at_risk: 1'250.00
evidence: { "pct_diff": "14.2" }

── Rendered in Explorer ──
"Case CASE-2847 shows a gap of CHF 1'250.00 between material usage and billing (14.2% of usage unbilled)."

interpretation.ts resolves {placeholders} from the finding's evidence JSON.
Works identically for probe, hypothesis, and diagnosis text.
One rendering engine. Three registries. Three languages.
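A Python sketch of the same placeholder resolution (the production engine is interpretation.ts; this mirrors its described behaviour for illustration):

```python
import re

def render(template: str, finding: dict) -> str:
    """Resolve {placeholders} from a finding row plus its evidence JSON.

    Evidence values override top-level columns on key collision; unknown
    placeholders are left intact rather than erased.
    """
    values = {**finding, **finding.get("evidence", {})}
    return re.sub(
        r"\{(\w+)\}",
        lambda m: str(values.get(m.group(1), m.group(0))),
        template,
    )
```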

The Forward Compiler

From natural language to validated YAML. Closing the loop.

We already compile in both directions.
Almost.

Reverse compiler (built)
3 registries · 42 entries · 3 languages

YAML → human-readable text.
Glossary-driven auto-generation +
hand-authored overrides + runtime resolution.

The machine explains itself to humans.

Forward compiler (ambition)
Natural language → YAML.

“Flag materials where usage exceeds billing
by more than 30% in any month.”

→ generates probe_*.yaml

The human tells the machine what to look for.

Natural language in. Findings out.
No SQL in between.

“Flag usage without billing”
  → Forward Compiler (LLM + contracts)
  → YAML (validated probe)
  → Probe Compiler (existing pipeline)
  → Findings

The forward compiler doesn't replace the probe compiler.
It feeds it. The existing validation pipeline catches every error.
The LLM proposes. The validator disposes.

Three reasons the forward compiler
is feasible — not a fantasy.

  • The contract is the guardrail. The LLM doesn't need to know SQL. It needs to pick entities, fields, and comparisons from a finite, documented vocabulary: gold_contract.v1.json.
  • The DSL is the target. 13 probe types. Each has a well-defined structure. The LLM generates structured YAML, not freeform code. The search space is bounded.
  • Validation is already built. probecheck.py catches every invalid entity, missing field, broken reference. The human reviews only probes that pass validation.

Human in the loop. Always.

Describe
Admin describes the concern in natural language. Any language.
Generate
LLM generates candidate YAML using the contract vocabulary.
Validate
probecheck.py validates structure, contracts, and cross-references.
Review
Human reads the YAML, adjusts thresholds, approves or rejects.
Compile
Approved YAML enters the standard pipeline. dbt builds. Findings appear.

The system proposes. The admin disposes.
No probe reaches production without human approval.

Three compilers.
One pipeline.
Zero SQL.

The probe compiler detects.
The hypothesis compiler judges.
The diagnosis compiler explains.
The reverse compiler makes it all human-readable.

The forward compiler will let anyone ask the question.

nuMetrix