# Signal System — Full Lifecycle
Date: 2026-02-20 (last vocabulary pass: 2026-04-10)
This document describes the jinflow signal system end-to-end: from detection through assessment, interpretation, prioritisation, action, resolution, and verification.
The guiding metaphor (from the DDD glossary): the organization is the patient, signals are diagnostics, findings are symptoms, perspectives are differential verdicts, interpretations explain in plain language, treatments are interventions, and re-probing closes the loop.
## Overview

```
┌──────────┐    ┌──────────┐    ┌─────────────┐    ┌──────────────┐    ┌───────┐    ┌─────────┐    ┌────────┐
│ 1 Detect │───▶│ 2 Assess │───▶│ 3 Interpret │───▶│ 4 Prioritise │───▶│ 5 Act │───▶│6 Resolve│───▶│7 Verify│
└──────────┘    └──────────┘    └─────────────┘    └──────────────┘    └───────┘    └─────────┘    └────────┘
     ▲                                                                                                 │
     └─────────────────────────────────────────────────────────────────────────────────────────────────┘
                                              feedback loop
```

Each phase has a clear input, output, and owner. Phases 1-3 are automated. Phases 4-7 are progressively more human-in-the-loop today, but designed for automation as the system matures.
## Phase 1 — Detect

Goal: Produce standardised findings from data layer tables.
Input: Gold layer tables (the product contract) for financial_anomaly and compliance signals. Silver layer tables for data_quality signals that need to see invalid rows.
Output: signal_findings__<probe_id> tables, each row a finding.
### Input layer by category

| Category | Input layer | Why |
|---|---|---|
financial_anomaly | Gold | Operates on validated, deduplicated business entities |
data_quality | Silver or Gold | Silver when checking validation rates / invalid rows (Gold filters them out). Gold when checking master data completeness (e.g. supplier coverage) |
compliance | Gold | Regulatory checks on validated data |
### Finding output contract

Every signal emits a table with these columns:
| Column | Type | Description |
|---|---|---|
finding_id | varchar | Deterministic hash of (probe_id, tenant_id, entity_id, time_bucket) |
tenant_id | varchar | Tenant identifier |
probe_id | varchar | Stable signal identifier |
probe_version | varchar | Semver |
severity | varchar | high, medium, or low |
entity_type | varchar | Gold entity name (e.g. Material, Case) |
entity_id | varchar | Primary identifier of flagged entity |
time_bucket | varchar | YYYY-MM or date range |
money_at_risk | double | Estimated monetary impact in CHF |
evidence | varchar | JSON object with signal-specific details |
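For illustration, the deterministic `finding_id` can be sketched in Python (a sketch, not the production implementation — the signal SQL computes the same value with `md5` and a `'|'` separator, per the registration checklist below):

```python
import hashlib

def finding_id(probe_id: str, tenant_id: str, entity_id: str, time_bucket: str) -> str:
    """Deterministic hash of (probe_id, tenant_id, entity_id, time_bucket).

    The '|' separator prevents ambiguous concatenations of adjacent fields."""
    key = "|".join((probe_id, tenant_id, entity_id, time_bucket))
    return hashlib.md5(key.encode("utf-8")).hexdigest()

# The same inputs always yield the same id, so a finding can be tracked across builds.
fid = finding_id("io_coefficient_trend", "tenant_a", "MAT-001", "2026-01")
```

This determinism is what Phase 7 relies on to match findings across rebuilds.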
### Signal types

- DSL signals (declaratively defined): `balance`, `mandatory_item`, `distribution_outlier`. Compiled to SQL by `signalcompile.py`. For repeatable patterns.
- Hand-written signals (SQL): for complex joins or custom logic that doesn't fit the DSL. Same output contract, same `models/probes/` location.
### Self-describing signals

Each signal embeds metadata in its evidence JSON so downstream phases don't need to know signal internals:
```json
{
  "signal": "io_coefficient_trend",
  "earliest_rolling_avg": "0.82",
  "latest_rolling_avg": "0.53",
  "drift": "0.29",
  "months_active": "18",
  "trend_direction": "worsening"
}
```

### Dynamic discovery

The platform and Explorer discover signals at runtime by querying `information_schema.tables` for tables matching `signal_findings__%` in each tenant schema. No hardcoded signal lists.
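A minimal sketch of the discovery rule in Python (the table list here is hypothetical; in production the names come from `information_schema.tables`):

```python
def discover_signals(table_names):
    """Mimic the information_schema lookup: any table named
    signal_findings__<probe_id> in the tenant schema is a signal."""
    prefix = "signal_findings__"
    return sorted(name[len(prefix):] for name in table_names if name.startswith(prefix))

# Hypothetical tenant schema contents — only the signal tables are picked up.
probes = discover_signals([
    "gold_case",
    "signal_findings__revenue_leakage",
    "signal_findings__io_coefficient",
    "silver_billing_events",
])
```

Because discovery is purely name-based, adding a new signal table makes it visible everywhere with no registration step.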
### Signal categories

Each signal belongs to a category (`probe_category` in the registry):
| Category | Purpose | money_at_risk |
|---|---|---|
financial_anomaly | Revenue leakage, cost attribution errors, pricing issues | Direct CHF impact |
data_quality | Master data completeness, referential integrity, stale records | Indirect or null |
compliance | Regulatory, documentation, audit trail gaps | Risk-based estimate |
### Current signals

| Signal | Category | Type | Entity | What it detects |
|---|---|---|---|---|
revenue_leakage | financial_anomaly | DSL (balance) | Case | Usage value exceeds billing |
missing_mandatory_implants | financial_anomaly | DSL (mandatory_item) | Case | Implant procedure without implant material |
cost_center_billing_mismatch | financial_anomaly | DSL (balance) | Case | Billing on wrong cost center |
io_coefficient | financial_anomaly | Hand-written | Material | Aggregate I/O deviation > 20% |
io_coefficient_trend | financial_anomaly | Hand-written | Material | Worsening I/O coefficient over time |
missing_standard_price | data_quality | Hand-written | Material | Active materials with zero price but transactional activity |
### Silver is_valid semantics

Silver models use `is_valid` (boolean) and `invalid_reason` (varchar) to flag rows that fail validation. Gold filters to `is_valid = true`. Understanding what `is_valid` means is essential for data_quality signals.
Three categories of validation:
| Category | Examples | Semantics |
|---|---|---|
| Missing PK / required field | missing case_token, missing material_id, missing supplier_code | Row is structurally unusable — cannot participate in any join or aggregation. Always fatal. |
| Invalid value | invalid quantity (null/≤0), invalid case_type (not in enum), invalid movement_type, invalid billing_status | Domain logic cannot process this row. The value exists but is outside the allowed domain. Fatal. |
| Orphan FK | orphan case_token, orphan material_id, orphan parent_node_id | The referenced parent entity is missing. Ambiguous — could mean: (a) source data is incomplete, (b) parent was purged, (c) parent failed its own validation (cascading invalidity). |
The first two categories are clear-cut: the row is broken. The third is where data_quality signals add value — not by re-checking individual rows, but by analyzing the aggregate pattern: What percentage of billing events have orphan case_tokens? Is it increasing? What’s the monetary impact of excluded rows?
`is_valid` is a validation gate (per-row, binary, deterministic). Data quality signals are diagnostics (aggregate, severity-graded, trend-aware). They complement each other — signals don't replace `is_valid`, they analyze its output.
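As an illustration of the aggregate-pattern idea, a hypothetical orphan-rate diagnostic might look like this (the function name, row shape, and `orphan` reason prefix are assumptions of this sketch, not the actual signal SQL):

```python
from collections import defaultdict

def orphan_rate_by_bucket(rows):
    """rows: (time_bucket, is_valid, invalid_reason) tuples from a Silver model.
    Returns the share of rows per bucket whose invalid_reason is an orphan FK."""
    total, orphan = defaultdict(int), defaultdict(int)
    for bucket, is_valid, reason in rows:
        total[bucket] += 1
        if not is_valid and (reason or "").startswith("orphan"):
            orphan[bucket] += 1
    return {b: orphan[b] / total[b] for b in total}

# Hypothetical Silver rows: the orphan share doubles from January to February.
rates = orphan_rate_by_bucket([
    ("2026-01", True, None),
    ("2026-01", True, None),
    ("2026-01", True, None),
    ("2026-01", False, "orphan case_token"),
    ("2026-02", True, None),
    ("2026-02", False, "invalid quantity"),
    ("2026-02", False, "orphan case_token"),
    ("2026-02", False, "orphan material_id"),
])
```

The signal would then grade severity from the rate and its trend, rather than re-judging individual rows.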
## Phase 2 — Assess

Goal: Aggregate symptoms into entity health scores. Signals detect individual symptoms; perspectives tell you how healthy an entity is overall.
Input: Findings from multiple signals for a tenant.
Output: Per-entity perspective findings with severity, health_score, and aggregated risk.
### The problem perspectives solve

Individual findings tell you what is wrong. But the same entity can be flagged by many signals. Without perspectives, the user sees a flat list of symptoms and must mentally aggregate the overall picture per entity.
### Perspective architecture

Perspectives are a second-order construct: they consume findings, not raw data. They run after all signals complete and emit the same findings contract.

```
Gold / Silver
      │
      ▼
Signals (Phase 1) ──▶ Findings
                         │
                         ▼
              Perspectives (Phase 2) ──▶ Entity health scores
                         │
                         ▼
              Composed Perspectives ──▶ Cross-domain health
```

### Perspective output

Perspectives emit the standard findings contract with these additional computed fields available in the aggregated CTE:
| Field | Type | Description |
|---|---|---|
probe_count | int | Number of distinct signals flagging the entity |
finding_count | int | Total findings across all source signals |
total_risk | decimal | Sum of money_at_risk from all findings |
weighted_risk | decimal | Risk weighted by source signal weights |
weighted_probe_score | decimal | Sum of source signal weights |
health_score | decimal | 0.0 (all signals flag) to 1.0 (no flags) |
worst_severity | varchar | Highest severity across source signals |
probes_flagged | varchar | Comma-separated probe_ids |
### Severity rules

Perspective severity supports three rule forms:

Simple — single field threshold:

```yaml
severity_rules:
  - field: total_risk
    above: 100000
    level: high
```

Compound — AND-joined conditions:

```yaml
severity_rules:
  - conditions:
      - field: probe_count
        above: 2
      - field: total_risk
        above: 50000
    level: high
```

Default — fallback:

```yaml
severity_rules:
  - default: low
```

Valid fields: `probe_count`, `finding_count`, `total_risk`, `weighted_risk`, `weighted_probe_score`.
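A first-match-wins evaluator for these three rule forms can be sketched as follows (this assumes `above` means strictly greater-than and that rules are checked in declaration order — both are assumptions of this sketch):

```python
def evaluate_severity(rules, metrics):
    """rules: list of dicts mirroring the YAML rule forms; metrics: the
    computed aggregate fields (probe_count, total_risk, ...).
    The first matching rule wins."""
    for rule in rules:
        if "default" in rule:
            return rule["default"]
        # compound form carries a 'conditions' list; the simple form is its own condition
        conditions = rule.get("conditions", [rule])
        if all(metrics[c["field"]] > c["above"] for c in conditions):
            return rule["level"]
    return None  # no rule matched and no default given

rules = [
    {"conditions": [{"field": "probe_count", "above": 2},
                    {"field": "total_risk", "above": 50000}], "level": "high"},
    {"field": "total_risk", "above": 100000, "level": "high"},
    {"default": "low"},
]
```

Putting the default rule last matters: it matches unconditionally, so anything after it is unreachable.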
### Health score

```
health_score = 1.0 - (weighted_probe_score / max_possible_weighted_score)
```

Where `max_possible_weighted_score = sum(all source probe weights)`. Computed at compile time from the YAML. An entity with 0 findings → 1.0 (healthy). All signals flagging → 0.0 (critical). Enables continuous ranking within severity buckets.
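Worked in Python, under the assumption that a flagged signal contributes exactly its configured weight:

```python
def health_score(flagged_probe_weights, all_probe_weights):
    """health_score = 1.0 - weighted_probe_score / max_possible_weighted_score."""
    weighted_probe_score = sum(flagged_probe_weights)
    max_possible = sum(all_probe_weights)  # fixed at compile time from the YAML
    return 1.0 - weighted_probe_score / max_possible

# A perspective with three source signals weighted 3, 2 and 1:
weights = [3, 2, 1]
```

So an entity flagged only by the weight-3 signal scores 0.5, which ranks it below an entity flagged only by the weight-1 signal (score ≈ 0.83) within the same severity bucket.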
### Perspective composability

Perspectives can reference other perspectives as sources:

```yaml
source_probes:
  - probe_id: assessment_material_health      # another perspective
    weight: 3
  - probe_id: assessment_material_compliance
    weight: 2
```

This works because perspectives emit the same findings contract. The dbt DAG automatically builds source perspectives before composed ones.
### Current perspectives

| Perspective | Entity | Sources | Purpose |
|---|---|---|---|
assessment_material_health | Material | io_coefficient, io_trend, pricing, margin, shelf_life | Supply chain health |
assessment_material_compliance | Material | controlled_substance, MiGEL, stale_article, barcode | Regulatory compliance |
assessment_material_overall | Material | material_health + material_compliance | Composed — overall material health |
assessment_case_financial_integrity | Case | revenue_leakage, implants, CC mismatch, billing timing, procedures | Financial integrity |
assessment_billing_quality | Case | cross-site billing, CC producteur, invoice integrity | Billing data quality |
### Perspective vs. signal

|  | Signal | Perspective |
|---|---|---|
| Input | Data tables (Gold/Silver) | Findings from signals (or other perspectives) |
| Output | Individual findings (symptoms) | Entity health score + aggregated severity |
| Grain | One entity, one time period | All findings for one entity |
| Question | What is anomalous? | How healthy is this entity? |
## Phase 3 — Interpret

Goal: Translate a finding into a human-readable explanation in the user's language, so non-technical stakeholders understand what happened and why it matters.
Input: A finding row (including evidence JSON).
Output: A localised text (DE/FR/EN) explaining the finding.
### Interpretation templates

Each signal defines its own interpretation templates, embedded in the signal definition. The interpreter performs string substitution — no code changes needed for new signals.
Template structure per signal:

```yaml
interpretation:
  de: >
    Material {entity_id} zeigt einen sich verschlechternden I/O-Koeffizienten:
    der 3-Monats-Durchschnitt sank von {earliest_rolling_avg} auf
    {latest_rolling_avg} über {months_active} Monate. Dies deutet auf eine
    systematische Prozessverschlechterung hin.
  en: >
    Material {entity_id} shows a worsening I/O coefficient trend: the 3-month
    rolling average moved from {earliest_rolling_avg} to {latest_rolling_avg}
    over {months_active} months. This indicates systematic process degradation.
  fr: >
    Le matériau {entity_id} présente une tendance I/O en dégradation: la
    moyenne mobile 3 mois est passée de {earliest_rolling_avg} à
    {latest_rolling_avg} sur {months_active} mois. Cela indique une dégradation
    systématique du processus.
```

Placeholders are resolved from the finding's evidence JSON plus the standard finding columns (`entity_id`, `severity`, `money_at_risk`, `time_bucket`).
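The substitution step can be sketched as plain `str.format` over a merged context (the collision precedence — standard columns win over evidence keys — is an assumption of this sketch):

```python
import json

STANDARD_FIELDS = ("entity_id", "severity", "money_at_risk", "time_bucket")

def render_interpretation(template, finding):
    """Substitute {placeholders} from the finding's evidence JSON plus the
    standard finding columns. Standard columns win on a key collision
    (an assumption of this sketch)."""
    context = dict(json.loads(finding["evidence"]))
    context.update({k: finding[k] for k in STANDARD_FIELDS})
    return template.format(**context)

text = render_interpretation(
    "Material {entity_id}: rolling average moved from {earliest_rolling_avg} "
    "to {latest_rolling_avg} over {months_active} months.",
    {
        "entity_id": "MAT-001",
        "severity": "high",
        "money_at_risk": 1200.0,
        "time_bucket": "2026-01",
        "evidence": json.dumps({"earliest_rolling_avg": "0.82",
                                "latest_rolling_avg": "0.53",
                                "months_active": "18"}),
    },
)
```

Because evidence values are stored as strings, the template controls all formatting — the interpreter stays signal-agnostic.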
### Fallback

If a signal has no registered template, the interpreter renders the evidence JSON as a structured key-value list. Functional but not polished — an incentive to always provide templates.
## Phase 4 — Prioritise

Goal: Rank findings so users focus on what matters most.
Input: All findings for a tenant (or across tenants).
Output: A ranked, filterable list with context.
### Prioritisation signals

| Signal | Source | Weight |
|---|---|---|
| severity | Signal-assigned (high/medium/low) | Primary |
| money_at_risk | Signal-computed | Primary |
| trend_direction | Trend signals (worsening > stable > improving) | Secondary |
| Recurrence | Same entity flagged across multiple signals | Secondary |
| Age | How long the finding has persisted | Tertiary |
### Prioritisation today

The Explorer sorts by `money_at_risk` DESC within severity groups. This is
adequate for v1. Future: a composite priority score combining the signals above.
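The current ordering can be expressed as a two-part sort key (a sketch of the behaviour described above, not the Explorer's actual code):

```python
SEVERITY_RANK = {"high": 2, "medium": 1, "low": 0}

def v1_order(findings):
    """v1 Explorer ordering: severity groups first, money_at_risk DESC within."""
    return sorted(findings,
                  key=lambda f: (-SEVERITY_RANK[f["severity"]], -f["money_at_risk"]))

ranked = v1_order([
    {"finding_id": "a", "severity": "low", "money_at_risk": 90000.0},
    {"finding_id": "b", "severity": "high", "money_at_risk": 5000.0},
    {"finding_id": "c", "severity": "high", "money_at_risk": 42000.0},
])
```

Note how a low-severity finding with large monetary impact still ranks below every high-severity finding — the future composite score would soften exactly this cliff.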
### Cross-signal correlation

A material flagged by both `io_coefficient` (bad level) and `io_coefficient_trend` (getting worse) is a stronger signal than either alone. The prioritisation layer should surface these clusters.
## Phase 5 — Act

Goal: Recommend or trigger concrete next steps for a finding.
Input: A prioritised finding with interpretation.
Output: A recommended action or set of actions.
### Action types

| Action | Description | Automation level |
|---|---|---|
| Investigate | Gather more data — drill into the entity, check related findings | Manual today, guided navigation in Explorer |
| Escalate | Route to a responsible person or team | Manual today |
| Correct | Fix the root cause (e.g. adjust stock levels, rebill, update CC mapping) | Always manual — the Explorer is read-only |
| Accept | Acknowledge and accept the risk (with justification) | Manual with audit trail |
| Suppress | Mark as known/expected (e.g. seasonal pattern) with expiry | Manual today, rule-based later |
### Recommended actions per signal

Each signal can define default recommended actions:

```yaml
actions:
  - type: investigate
    description: "Review material movement history for this item"
    link_pattern: "/findings/{probe_id}/{finding_id}"
  - type: escalate
    description: "Notify ward manager if trend persists > 6 months"
```

This is metadata — jinflow does not execute actions. It surfaces recommendations.
## Phase 6 — Resolve

Goal: Track the disposition of each finding through its lifecycle.
Input: A finding + human decision.
Output: A resolution record.
### Resolution states

```
open ──▶ acknowledged ──▶ in_progress ──▶ resolved
  │                                          │
  ├──▶ accepted_risk ◀───────────────────────┘ (if re-opened)
  │
  └──▶ suppressed (with expiry)
```

### Resolution record
Section titled “Resolution record”| Field | Description |
|---|---|
finding_id | FK to the finding |
status | Current resolution state |
assigned_to | Person or role responsible |
resolution_note | Free text explanation |
resolved_at | Timestamp |
expires_at | For suppressed findings — auto-reopen after this date |
### Where resolution state lives

Findings are recomputed on every dbt build (immutable, deterministic). Resolution
state is mutable and lives outside dbt — in a separate resolutions table
managed by the Explorer app or a future API layer. Joined at query time.
A finding that disappears after a dbt rebuild (because the underlying data improved) is auto-resolved.
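The query-time join can be sketched as follows (`auto_resolved` as a status name is an assumption of this sketch):

```python
def effective_statuses(finding_ids_in_build, resolutions):
    """finding_ids_in_build: ids emitted by the latest dbt build.
    resolutions: {finding_id: status} from the mutable resolutions table.
    Findings without a resolution record default to 'open'; recorded findings
    that no longer appear in the build are auto-resolved."""
    statuses = {fid: resolutions.get(fid, "open") for fid in finding_ids_in_build}
    for fid in resolutions:
        if fid not in statuses:
            statuses[fid] = "auto_resolved"
    return statuses

statuses = effective_statuses(
    {"f1", "f2"},
    {"f2": "in_progress", "f3": "acknowledged"},  # f3 vanished after a rebuild
)
```

Keeping the join one-directional (resolutions never write into findings) preserves the immutability of the dbt layer.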
## Phase 7 — Verify

Goal: Confirm that actions taken actually fixed the problem.
Input: A resolved finding + the next signal run.
Output: Verification status (confirmed fixed, regressed, unchanged).
### Verification logic

After a finding is marked resolved:

- The next dbt build re-runs all signals
- If the `finding_id` no longer appears → confirmed fixed
- If it reappears with lower severity or lower `money_at_risk` → improving
- If it reappears unchanged or worse → regressed (auto-reopen)
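These rules can be sketched as a pure function over two builds (the status names are assumptions of this sketch):

```python
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2}

def verify(previous, current):
    """previous: the finding as it was when marked resolved.
    current: the same finding_id in the next build, or None if absent."""
    if current is None:
        return "confirmed_fixed"
    if (SEVERITY_RANK[current["severity"]] < SEVERITY_RANK[previous["severity"]]
            or current["money_at_risk"] < previous["money_at_risk"]):
        return "improving"
    return "regressed"  # unchanged or worse → auto-reopen

before = {"severity": "high", "money_at_risk": 80000.0}
```

The deterministic `finding_id` is what makes this comparison possible at all: both builds refer to the same logical finding.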
### Temporal tracking

Because `finding_id` is deterministic (hash of signal + tenant + entity + time_bucket), the same finding across builds can be tracked over time. A `finding_history` table records:
| Field | Description |
|---|---|
finding_id | The finding |
build_timestamp | When the dbt build ran |
severity | Severity at this point in time |
money_at_risk | Impact at this point in time |
This enables trend-over-time views: “this finding has been open for 3 builds, money_at_risk is increasing.”
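A sketch of how `finding_history` rows could feed such a view (the function and its return shape are hypothetical):

```python
def open_trend(history):
    """history: chronological money_at_risk values from finding_history for one
    finding_id. Returns (builds_open, direction) for a trend-over-time view."""
    builds_open = len(history)
    if builds_open < 2:
        return builds_open, "unknown"
    if history[-1] > history[0]:
        return builds_open, "increasing"
    if history[-1] < history[0]:
        return builds_open, "decreasing"
    return builds_open, "stable"
```

For example, three builds of rising impact would render as "open for 3 builds, money_at_risk is increasing".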
## How to register a new signal

Follow this checklist when adding a new signal:
### 1. Write the signal SQL

Create `models/probes/signal_findings__<probe_id>.sql`:
- Materialized as `table`, tagged `probe`: `{{ config(materialized='table', tags=['probe']) }}`
- Reference Gold entities: `{{ ref('gold_...') }}`. Data quality signals may also reference Silver: `{{ ref('silver_...') }}`
- Emit the standard findings columns: `finding_id`, `tenant_id`, `probe_id`, `probe_version`, `severity`, `entity_type`, `entity_id`, `time_bucket`, `money_at_risk`, `evidence`
- `finding_id` must be deterministic: `md5('probe_id' || '|' || tenant_id || '|' || entity_id || '|' || time_bucket)`
- `evidence` is a JSON string with signal-specific details
### 2. Register in signal_registry

The registry is auto-generated by `signalregistry.py`. Two options:
Option A (auto-generated): just run `python3 scripts/signalregistry.py`. The tool reads your signal YAML and generates `display_name`, `description`, and interpretation templates automatically from the DSL structure + glossary.

Option B (override): add a `registry:` block to your signal YAML with custom tri-lingual text, then run `signalregistry.py`:
```yaml
registry:
  probe_category: financial_anomaly
  display_name:
    de: "Kurzer Name DE"
    en: "Short Name EN"
    fr: "Nom court FR"
  description:
    de: "Beschreibung..."
    en: "Description..."
    fr: "Description..."
  interpretation:
    de: "Material {entity_id} zeigt..."
    en: "Material {entity_id} shows..."
    fr: "Le matériau {entity_id} montre..."
```

Placeholders (`{...}`) are resolved from evidence JSON keys and standard finding fields (`entity_id`, `entity_type`, `money_at_risk`, `severity`, `time_bucket`, `probe_id`). See `docs/architecture/signal_registry.md` for the full override mechanism.
### 3. Add to platform_probe_findings

Edit `models/platform/platform_probe_findings.sql`:

- Add a `-- depends_on: {{ ref('signal_findings__<probe_id>') }}` comment
- Append `'signal_findings__<probe_id>'` to the `probe_models` list

(This is a dbt limitation — `ref()` must be statically parseable.)
### 4. Document in YAML

Add an entry to `models/probes/probes.yml` with description and column docs.
### 5. Build and verify

```shell
# Build for a tenant with relevant data
dbt build --select signal_findings__<probe_id> \
  --vars '{"tenant_id": "my_tenant"}'

# Check findings
dbt show --inline "select severity, count(*) from my_tenant.signal_findings__<probe_id> group by 1" \
  --vars '{"tenant_id": "my_tenant"}'
```

### 6. What you do NOT need to change
- Explorer TypeScript — signal discovery is dynamic (`information_schema`)
- Interpretation rendering — templates come from `signal_registry`
- Platform union — already covered in step 3; no other platform files needed
## What exists today vs. what's planned

| Phase | Status | What exists |
|---|---|---|
| 1 Detect | Implemented | 20+ signals across 3 categories, DSL compiler with 10 signal types, hand-written SQL, platform union, dynamic discovery in Explorer |
| 2 Assess | Implemented | 5 perspectives (3 entity-level + 1 composed), multi-field/compound severity rules, health_score, composability |
| 3 Interpret | Implemented | Template-based (DE/EN/FR) via signal_registry table, generic fallback for unregistered signals |
| 4 Prioritise | Basic | Sort by money_at_risk within severity in Explorer |
| 5 Act | Not started | — |
| 6 Resolve | Not started | — |
| 7 Verify | Not started | — |
## Next steps

- Temporal persistence — track entity health_score over time via `finding_history` table
- Verdict layer — root cause analysis for confirmed theses (see `docs/design/analytics_pyramid_roadmap.md`)
- Recommended actions metadata per signal
- Resolution tracking (mutable state layer)
- Finding history / verification loop
- Composite priority scoring
## Design principles

- Signals are self-describing — carry their own metadata, interpretation templates, and recommended actions. No central registry that must be updated by hand.
- Detection is immutable — findings are recomputed fresh each build. Mutable state (resolution, assignment) lives in a separate layer.
- Input layer matches the question — financial_anomaly and compliance signals read Gold (validated business entities). Data quality signals may read Silver (to see what Gold filtered out). Bronze is never accessed.
- Perspectives consume findings, not data — perspectives are second-order: they correlate symptoms into verdicts. They never query Gold or Silver directly.
- Dynamic discovery — new signals and perspectives are picked up automatically. No hardcoded lists.
- Human-in-the-loop by default — jinflow verdicts, humans decide and act. Automation is additive, not mandatory.
- Feedback closes the loop — the system learns whether interventions worked by re-probing and comparing.
## Relationship to other docs

- Signal registry → `docs/architecture/signal_registry.md` (auto-generated tri-lingual metadata, glossary, override mechanism)
- Signal YAML schema → `docs/architecture/probe_yaml_json_schema_v1.md`
- Signal validation spec → `docs/architecture/production_grade_probe_validation_spec.md`
- Signal v2 composability → `docs/design/probe_yaml_v2_composability_dependencies.md`
- Gold contract → `contracts/gold_contract.v1.json`
- Terminology glossary → `docs/architecture/terminology.md`
- I/O coefficient analysis → `docs/investigations/io-coefficient/io_coefficient_spike.md`