Thesis YAML Reference
Field-level specification for thesis YAML definitions.
Overview
| Property | Value |
|---|---|
| File location | hypotheses/hyp_*.yaml |
| ID rule | hypothesis_id must match filename (without .yaml) |
| Validator | python3 scripts/hypothesischeck.py |
| Compiler | python3 scripts/hypothesiscompile.py |
| Compiler dry-run | python3 scripts/hypothesiscompile.py --check |
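The ID rule is easy to check mechanically. The following is a minimal sketch of that check only; it is not the logic of `scripts/hypothesischeck.py`, and the function name `id_matches_filename` is hypothetical.

```python
from pathlib import Path


def id_matches_filename(path: str, hypothesis_id: str) -> bool:
    """Return True if hypothesis_id equals the file name without its .yaml suffix."""
    p = Path(path)
    # Path.stem drops the final suffix:
    # "hyp_unbilled_services.yaml" -> "hyp_unbilled_services"
    return p.suffix == ".yaml" and p.stem == hypothesis_id


print(id_matches_filename("hypotheses/hyp_unbilled_services.yaml", "hyp_unbilled_services"))
print(id_matches_filename("hypotheses/hyp_unbilled_services.yaml", "hyp_unbilled"))
```

The first call prints `True` and the second prints `False`, because the ID in the second call differs from the file stem.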
Fields
| Field | Type | Required | Valid values |
|---|---|---|---|
| `hypothesis_id` | string | yes | Must match filename |
| `short_code` | string | yes | 3-4 uppercase letters; must have a vowel at position 2 or 3; unique across all theses |
| `version` | string | yes | Semantic version (e.g. `"1.0"`) |
| `statement` | dict | yes | Tri-lingual text with keys `en`, `de`, `fr` |
| `category` | string | yes | `financial_anomaly`, `data_quality`, `compliance`, or `operational` |
| `tags` | list | no | Freeform string tags |
| `audience` | list | no | Freeform audience identifiers |
| `evidence` | list | yes | Min 1 entry; see Evidence Block |
| `verdict` | dict | yes | See Verdict Block |
| `interpretation` | dict | yes | All 4 statuses required; see Interpretation Block |
| `created_at` | string | no | ISO date |
| `modified_at` | string | no | ISO date |
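As an illustration of the required fields and the category constraint, here is a minimal sketch that checks an already-parsed thesis (a plain `dict`). The function name `check_top_level` is hypothetical, and the sketch assumes that all three languages must be present in `statement`; the real validator may be stricter or differ in detail.

```python
REQUIRED_FIELDS = {
    "hypothesis_id", "short_code", "version", "statement",
    "category", "evidence", "verdict", "interpretation",
}
VALID_CATEGORIES = {"financial_anomaly", "data_quality", "compliance", "operational"}
LANGUAGES = {"en", "de", "fr"}  # assumption: all three keys are mandatory


def check_top_level(doc: dict) -> list[str]:
    """Return a list of human-readable errors; an empty list means no problems found."""
    errors = []
    missing = sorted(REQUIRED_FIELDS - set(doc))
    if missing:
        errors.append("missing required fields: " + ", ".join(missing))
    if "category" in doc and doc["category"] not in VALID_CATEGORIES:
        errors.append(f"invalid category: {doc['category']!r}")
    statement = doc.get("statement")
    if isinstance(statement, dict):
        absent = sorted(LANGUAGES - set(statement))
        if absent:
            errors.append("statement is missing languages: " + ", ".join(absent))
    return errors
```

Optional fields (`tags`, `audience`, `created_at`, `modified_at`) are deliberately ignored here, since the table places no constraint on them beyond their type.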
Valid Categories
| Category | Description |
|---|---|
| `financial_anomaly` | Revenue leakage, cost overruns, billing discrepancies |
| `data_quality` | Stale, missing, or inconsistent master data |
| `compliance` | Regulatory or contractual non-compliance |
| `operational` | Process inefficiencies, SLA breaches, capacity issues |
Evidence Block
Each entry in the `evidence` list has the following fields:
| Field | Type | Required | Valid values |
|---|---|---|---|
| `probe_id` | string | yes | Must reference an existing `probes/*.yaml` |
| `role` | string | yes | `primary`, `supporting`, `context`, or `counter` |
| `weight` | integer | yes | 1-5 |
Rules:
- At least one entry must have `role: primary`.
- Every referenced `probe_id` value must have a corresponding YAML file in `probes/`.
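The evidence rules can be sketched as a small check over the parsed list. This is an illustrative sketch, not the validator's implementation: `check_evidence` and `known_probe_ids` are hypothetical names, and the set of known probe IDs is passed in rather than read from `probes/`, because how probe IDs map to file names is not specified here.

```python
VALID_ROLES = {"primary", "supporting", "context", "counter"}


def check_evidence(evidence: list[dict], known_probe_ids: set[str]) -> list[str]:
    """Return a list of errors for the evidence block; empty means it passed."""
    errors = []
    if not evidence:
        errors.append("evidence must contain at least one entry")
    for i, entry in enumerate(evidence):
        probe_id = entry.get("probe_id")
        if probe_id not in known_probe_ids:
            errors.append(f"evidence[{i}]: unknown probe_id {probe_id!r}")
        if entry.get("role") not in VALID_ROLES:
            errors.append(f"evidence[{i}]: invalid role {entry.get('role')!r}")
        weight = entry.get("weight")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(weight, int) or isinstance(weight, bool) or not 1 <= weight <= 5:
            errors.append(f"evidence[{i}]: weight must be an integer from 1 to 5")
    if evidence and not any(e.get("role") == "primary" for e in evidence):
        errors.append("at least one evidence entry must have role: primary")
    return errors
```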
Role semantics:
| Role | Impact |
|---|---|
| `primary` | Must have findings for verdict to reach “confirmed” |
| `supporting` | Strengthens the evidence score |
| `context` | Background signal, not decisive |
| `counter` | Reduces score if findings exist (contradictory evidence) |
Verdict Block
| Field | Type | Required | Valid values |
|---|---|---|---|
| `thresholds.confirmed` | float | yes | 0.0-1.0 |
| `thresholds.plausible` | float | yes | 0.0-1.0 |
| `scaling` | string | no | `binary` or `graduated` |
| `saturation` | number | no | Must be > 0 (only meaningful with `graduated` scaling) |
Rules:
- `0 <= plausible < confirmed <= 1` (`plausible` must be strictly less than `confirmed`).
- `scaling` defaults to `binary` if omitted.
- `saturation` controls the point at which additional findings stop increasing the score (`graduated` mode only).
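The verdict rules translate directly into a few comparisons. The sketch below assumes the block has already been parsed into a `dict` with numeric thresholds; `check_verdict` is a hypothetical name and the error wording is illustrative only.

```python
def check_verdict(verdict: dict) -> list[str]:
    """Return a list of errors for the verdict block; empty means it passed."""
    errors = []
    thresholds = verdict.get("thresholds", {})
    confirmed = thresholds.get("confirmed")
    plausible = thresholds.get("plausible")
    if confirmed is None or plausible is None:
        errors.append("thresholds.confirmed and thresholds.plausible are required")
    elif not 0 <= plausible < confirmed <= 1:
        # Chained comparison: plausible must be strictly below confirmed.
        errors.append("thresholds must satisfy 0 <= plausible < confirmed <= 1")
    # scaling is optional and defaults to "binary".
    scaling = verdict.get("scaling", "binary")
    if scaling not in {"binary", "graduated"}:
        errors.append(f"invalid scaling: {scaling!r}")
    saturation = verdict.get("saturation")
    if saturation is not None and not saturation > 0:
        errors.append("saturation must be > 0")
    return errors
```

For example, thresholds of `confirmed: 0.7` and `plausible: 0.3` pass, while equal thresholds fail because the inequality is strict.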
Interpretation Block
Required keys: `confirmed`, `plausible`, `not_observed`, `insufficient`.
Each key maps to a tri-lingual dict with {en, de, fr}. These are human-readable narrative templates shown in Explorer for each verdict status.
| Status | When assigned |
|---|---|
| `confirmed` | Evidence score >= `thresholds.confirmed` |
| `plausible` | Evidence score >= `thresholds.plausible` and < `thresholds.confirmed` |
| `not_observed` | Evidence score < `thresholds.plausible` (signals returned data but findings are below threshold) |
| `insufficient` | Not enough signal data to evaluate (signals returned no rows) |
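The status table amounts to a short decision cascade. Here is a minimal sketch of it, assuming the evidence score has already been computed; `assign_status` and the `has_rows` flag are hypothetical, and the sketch does not model the additional requirement that `primary` evidence must have findings before a verdict can reach `confirmed`.

```python
def assign_status(score: float, confirmed: float, plausible: float, has_rows: bool) -> str:
    """Map an evidence score to a verdict status using the thesis thresholds."""
    if not has_rows:
        # Signals returned no rows, so there is nothing to evaluate.
        return "insufficient"
    if score >= confirmed:
        return "confirmed"
    if score >= plausible:
        return "plausible"
    return "not_observed"


for score in (0.8, 0.5, 0.1):
    print(score, assign_status(score, confirmed=0.7, plausible=0.3, has_rows=True))
```

With thresholds of 0.7 and 0.3, the loop prints `confirmed`, `plausible`, and `not_observed` for the three scores in turn.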
Verdict Output
Each thesis produces one row per tenant in `hypothesis_verdicts`:
| Column | Type | Description |
|---|---|---|
| `verdict_id` | string | Deterministic surrogate key |
| `tenant_id` | string | Tenant identifier |
| `hypothesis_id` | string | Thesis identifier |
| `status` | string | `confirmed`, `plausible`, `not_observed`, or `insufficient` |
| `evidence_score` | float | Weighted evidence score (0.0-1.0) |
| `finding_count` | integer | Total findings across all evidence signals |
| `money_at_risk` | numeric | Sum of `money_at_risk` from all evidence signals |
| `worst_severity` | string | Highest severity across evidence signals |
Minimal Example
```yaml
hypothesis_id: hyp_unbilled_services
short_code: UBS
version: "1.0"

statement:
  en: "Significant revenue is lost due to services delivered but never billed."
  de: "Erheblicher Umsatz geht durch erbrachte aber nicht verrechnete Leistungen verloren."
  fr: "Des revenus significatifs sont perdus en raison de prestations fournies mais jamais facturees."

category: financial_anomaly

evidence:
  - probe_id: probe_revenue_leakage
    role: primary
    weight: 3
  - probe_id: probe_billing_completeness
    role: supporting
    weight: 2

verdict:
  thresholds:
    confirmed: 0.7
    plausible: 0.3

interpretation:
  confirmed:
    en: "Revenue leakage is confirmed. Unbilled services represent a material financial gap."
    de: "Umsatzverlust bestaetigt. Nicht verrechnete Leistungen stellen eine wesentliche Luecke dar."
    fr: "La perte de revenus est confirmee. Les prestations non facturees representent un ecart financier significatif."
  plausible:
    en: "Evidence suggests possible unbilled services, but the pattern is not yet conclusive."
    de: "Hinweise auf moeglicherweise nicht verrechnete Leistungen, aber das Muster ist noch nicht schluessig."
    fr: "Les donnees suggerent des prestations possiblement non facturees, mais le schema n'est pas encore concluant."
  not_observed:
    en: "No evidence of unbilled services found in the current dataset."
    de: "Keine Hinweise auf nicht verrechnete Leistungen im aktuellen Datensatz."
    fr: "Aucune preuve de prestations non facturees dans le jeu de donnees actuel."
  insufficient:
    en: "Insufficient data to evaluate this thesis."
    de: "Unzureichende Daten zur Bewertung dieser Hypothese."
    fr: "Donnees insuffisantes pour evaluer cette hypothese."
```
v0.45.1 · built 2026-04-17 08:14 UTC