Signal YAML Reference

Field-level specification for signal (probe) and perspective (assessment) YAML definitions.

Overview

| Property | Value |
| --- | --- |
| File location | `probes/probe_.yaml` (signals) or `probes/assessment_.yaml` (perspectives) |
| ID rule | `probe_id` must match filename (without `.yaml`) |
| Validator | `python3 scripts/signalcheck.py` |
| Compiler | `python3 scripts/signalcompile.py` |
| Compiler dry-run | `python3 scripts/signalcompile.py --check` |
| Registry compiler | `python3 scripts/signalregistry.py` |

Core Fields

All fields in this table are required on every signal.

| Field | Type | Valid values |
| --- | --- | --- |
| `probe_id` | string | Must match filename; prefix `probe_` or `assessment_` |
| `version` | string | Semantic version (e.g. `"1.0.0"`) |
| `contract` | string | `"gold.v1"` (signals), `"findings.v1"` (perspectives), `"silver.v1"` (`silver_audit`) |
| `type` | string | One of the 13 types listed in Signal Types |
| `severity` | string | `high` \| `medium` \| `low` |
| `description` | string | Free-text explanation of what the signal detects |

Optional top-level fields:

| Field | Type | Description |
| --- | --- | --- |
| `created_at` | string | ISO date (`YYYY-MM-DD`) |
| `modified_at` | string | ISO date (`YYYY-MM-DD`) |
| `evidence_fields` | list | Field names included in the findings `evidence` JSON column |
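
Once a file is parsed, the core-field rules can be checked mechanically. The following is a minimal sketch, not the logic of `scripts/signalcheck.py`. `check_core_fields` is a hypothetical helper that receives the YAML document as a dict together with its bare filename. The `MAJOR.MINOR.PATCH` pattern is a simplification that ignores pre-release tags.

```python
import re

# Required core fields and allowed values, taken from the Core Fields table.
REQUIRED_CORE = ("probe_id", "version", "contract", "type", "severity", "description")
SEVERITIES = {"high", "medium", "low"}
# Perspectives (assessment) and silver_audit use their own contracts; all other types use gold.v1.
CONTRACT_BY_TYPE = {"assessment": "findings.v1", "silver_audit": "silver.v1"}
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")  # simplified: no pre-release or build tags
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def check_core_fields(doc: dict, filename: str) -> list[str]:
    """Return error messages for the core and optional date fields of one signal."""
    errors = [f"missing field: {name}" for name in REQUIRED_CORE if name not in doc]
    probe_id = str(doc.get("probe_id", ""))
    if probe_id != filename.removesuffix(".yaml"):
        errors.append("probe_id does not match filename")
    if not probe_id.startswith(("probe_", "assessment_")):
        errors.append("probe_id must start with probe_ or assessment_")
    if "version" in doc and not SEMVER.match(str(doc["version"])):
        errors.append("version is not MAJOR.MINOR.PATCH")
    expected = CONTRACT_BY_TYPE.get(doc.get("type"), "gold.v1")
    if doc.get("contract") != expected:
        errors.append(f"contract should be {expected!r}")
    if doc.get("severity") not in SEVERITIES:
        errors.append("severity must be high, medium or low")
    for key in ("created_at", "modified_at"):
        # YAML may parse an unquoted date into a date object; str() gives YYYY-MM-DD either way.
        if key in doc and not ISO_DATE.match(str(doc[key])):
            errors.append(f"{key} must be YYYY-MM-DD")
    return errors
```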

Scope Block

Required on all signals.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `scope.entity_type` | string | yes | Entity label for findings output (e.g. `BillingEvent`, `Shipment`) |
| `scope.group_by` | list | yes | Fields that define one finding row (min 1) |
| `scope.time.entity` | string | yes | Entity containing the time field |
| `scope.time.field` | string | yes | Date/timestamp field name |
| `scope.time.bucket` | string | yes | `week` \| `month` \| `quarter` \| `raw` |
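
`scope.time.bucket` controls how the time field is grouped before it becomes the `time_bucket` column of a finding. This reference does not specify the label format. The sketch below therefore assumes ISO-style labels (`2024-W05`, `2024-03`, `2024-Q1`) and treats `raw` as keeping the value unbucketed. `format_time_bucket` is a hypothetical helper.

```python
from datetime import date


def format_time_bucket(value: date, bucket: str) -> str:
    """Return a time-bucket label for a date (datetime values also work).

    The label formats are assumptions, not the compiler's documented output.
    """
    if bucket == "week":
        iso_year, iso_week, _ = value.isocalendar()  # ISO 8601 week-numbering year
        return f"{iso_year}-W{iso_week:02d}"
    if bucket == "month":
        return f"{value.year}-{value.month:02d}"
    if bucket == "quarter":
        return f"{value.year}-Q{(value.month - 1) // 3 + 1}"
    if bucket == "raw":
        return value.isoformat()  # no bucketing: keep the original value
    raise ValueError(f"unknown bucket: {bucket!r}")


print(format_time_bucket(date(2024, 12, 30), "week"))  # 2025-W01 (ISO year, not calendar year)
```

ISO weeks are used because they never split a week across two buckets. As the example shows, this can assign late-December dates to the following year.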

Registry Block (optional)

Tri-lingual metadata for the signal registry. Auto-generated from the glossary if omitted.

| Field | Type | Valid values |
| --- | --- | --- |
| `registry.probe_category` | string | Freeform (e.g. `inventory_integrity`, `customs_compliance`) |
| `registry.risk_tier` | string | `direct_financial` \| `compliance_exposure` \| `operational_signal` |
| `registry.confidence_weight` | float | 0.0 to 1.0 |
| `registry.display_name` | dict | `{en, de, fr}` |
| `registry.description` | dict | `{en, de, fr}` |
| `registry.interpretation` | dict | `{en, de, fr}`; supports `{field}` placeholders from `evidence_fields` |
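
The `{field}` placeholders in `registry.interpretation` are filled from a finding's evidence. The sketch below assumes Python `str.format`-style braces and falls back to English when a language is missing. Unknown placeholders are left visible rather than raising an error. `render_interpretation` is a hypothetical helper, not the registry compiler's API.

```python
class _KeepMissing(dict):
    """Leave unknown placeholders visible instead of raising KeyError."""

    def __missing__(self, key: str) -> str:
        return "{" + key + "}"


def render_interpretation(registry: dict, evidence: dict, lang: str = "en") -> str:
    """Fill {field} placeholders in registry.interpretation from one finding's evidence."""
    templates = registry["interpretation"]
    template = templates.get(lang, templates["en"])  # assumption: fall back to English
    return template.format_map(_KeepMissing(evidence))
```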

Severity Rules

Standard form

Rules are evaluated top-down, and the first match wins. The list must end with a `default` entry.

```yaml
severity_rules:
  - above: 10.0
    level: high
  - above: 2.0
    level: medium
  - default: low
```

| Field | Type | Description |
| --- | --- | --- |
| `above` | number | Threshold (exclusive) |
| `level` | string | `high` \| `medium` \| `low` |
| `default` | string | Fallback severity |

Some types support a `field` key that targets a specific column:

```yaml
severity_rules:
  - field: finding_count
    above: 50
    level: high
```

Perspective compound form

A single rule can combine multiple conditions. The conditions are AND-joined, so all of them must hold:

```yaml
severity_rules:
  - conditions:
      - field: finding_count
        above: 50
      - field: total_risk
        above: 10000
    level: high
  - field: finding_count
    above: 10
    level: medium
  - default: low
```
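
The evaluation order described above can be sketched as a small interpreter. Rules are tried top-down, the first match wins, `above` is exclusive, and compound conditions are AND-joined. This is an illustration under assumptions. This reference does not state which column a rule without `field` compares, so the hypothetical `default_field` parameter names that column (for a balance signal it might be `balance_pct`).

```python
def _exceeds(row: dict, cond: dict, default_field: str) -> bool:
    # `above` is an exclusive threshold: the value must be strictly greater.
    return row[cond.get("field", default_field)] > cond["above"]


def evaluate_severity(rules: list[dict], row: dict, default_field: str) -> str:
    """Walk severity_rules top-down and return the level of the first matching rule."""
    for rule in rules:
        if "default" in rule:
            return rule["default"]
        # Compound rules list their conditions; a simple rule is its own single condition.
        conditions = rule.get("conditions", [rule])
        if all(_exceeds(row, cond, default_field) for cond in conditions):
            return rule["level"]
    raise ValueError("severity_rules must end with a default entry")
```

With the standard form above, a value of exactly `10.0` yields `medium`, not `high`, because the threshold is exclusive.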

Money at Risk

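A signal states its financial exposure in one of three forms:

| Form | Example | Typical use |
| --- | --- | --- |
| Expression | `money_at_risk: "abs(left_total - right_total)"` | balance, ratio |
| Field reference | `money_at_risk: total_risk` | perspective |
| Fixed amount | `money_at_risk_fixed: 500` | mandatory_item |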
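
One way to picture the three forms is as the SQL that ends up in the `money_at_risk` column. The sketch makes three assumptions:

- `money_at_risk_fixed` takes precedence when present.
- A bare identifier is a field reference.
- Anything else is an expression.

`money_at_risk_sql` is hypothetical and does not reproduce `signalcompile.py` output.

```python
import re

IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def money_at_risk_sql(doc: dict) -> str:
    """Render a signal's money at risk as a SQL select item (hypothetical compiler step)."""
    if "money_at_risk_fixed" in doc:
        amount = doc["money_at_risk_fixed"]
        if not isinstance(amount, (int, float)):
            raise TypeError("money_at_risk_fixed must be a number")
        return f"{amount} AS money_at_risk"  # same exposure for every finding
    value = str(doc["money_at_risk"]).strip()
    if IDENTIFIER.match(value):
        return f"{value} AS money_at_risk"  # field reference, e.g. total_risk
    return f"({value}) AS money_at_risk"  # expression, parenthesised before aliasing
```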

Signal Types

All 13 valid types and their required type-specific blocks:

| Type | Contract | Type-specific keys | Purpose |
| --- | --- | --- | --- |
| `balance` | gold.v1 | `left`, `right`, `join_key`, `tolerance_pct` | Compare two aggregates per entity |
| `assessment` | findings.v1 | `source_probes` | Aggregate findings across signals |
| `duplicate` | gold.v1 | `duplicate` | Find duplicates by field combination |
| `reconciliation` | gold.v1 | `left`, `right`, `join`, `derived`, `flag` | Cross-entity reconciliation |
| `temporal_sequence` | gold.v1 | `sequence` | Validate event ordering |
| `distribution_outlier` | gold.v1 | `metric`, `distribution` | Flag statistical outliers (z-score) |
| `ratio` | gold.v1 | `numerator`, `denominator`, `expected_ratio`, `tolerance_pct`, `direction` | Ratio against expected value |
| `mandatory_item` | gold.v1 | `qualifying`, `required` | Check required items exist |
| `trend` | gold.v1 | `metric`, `trend` | Detect worsening trends |
| `silver_audit` | silver.v1 | `source`, audit-specific fields | Audit Silver-layer data quality |
| `entity_filter` | gold.v1 | `filter` | Filter entities by condition |
| `enrichment` | gold.v1 | `dimension`, `fact`, `derived`, `flag` | Enrich entities with computed fields |
| `hand_written` | gold.v1 | (none; reads `probes/{probe_id}.sql`) | Raw SQL escape hatch |
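
The table translates directly into a lookup of contract and required keys per type. A minimal sketch follows. `TYPE_SPEC` and `check_type_block` are hypothetical names. The sketch treats `ratio.direction` as optional, as the brief type notes state. It checks only the keys named in this table and does not verify that a `hand_written` signal's `.sql` file exists. The per-type sections list further required fields.

```python
# (contract, required type-specific keys) per type, following the Signal Types table.
TYPE_SPEC = {
    "balance": ("gold.v1", {"left", "right", "join_key", "tolerance_pct"}),
    "assessment": ("findings.v1", {"source_probes"}),
    "duplicate": ("gold.v1", {"duplicate"}),
    "reconciliation": ("gold.v1", {"left", "right", "join", "derived", "flag"}),
    "temporal_sequence": ("gold.v1", {"sequence"}),
    "distribution_outlier": ("gold.v1", {"metric", "distribution"}),
    "ratio": ("gold.v1", {"numerator", "denominator", "expected_ratio", "tolerance_pct"}),
    "mandatory_item": ("gold.v1", {"qualifying", "required"}),
    "trend": ("gold.v1", {"metric", "trend"}),
    "silver_audit": ("silver.v1", {"source"}),
    "entity_filter": ("gold.v1", {"filter"}),
    "enrichment": ("gold.v1", {"dimension", "fact", "derived", "flag"}),
    "hand_written": ("gold.v1", set()),  # logic lives in probes/{probe_id}.sql
}


def check_type_block(doc: dict) -> list[str]:
    """Check that a signal's type is known, its keys are present and its contract matches."""
    spec = TYPE_SPEC.get(doc.get("type"))
    if spec is None:
        return [f"unknown type: {doc.get('type')!r}"]
    contract, required = spec
    errors = [f"missing key for {doc['type']}: {key}" for key in sorted(required - set(doc))]
    if doc.get("contract") != contract:
        errors.append(f"type {doc['type']} requires contract {contract!r}")
    return errors
```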

Type: balance

Compares two aggregated sides per entity group. The compiler derives `balance_pct` from the two sides, and `tolerance_pct` sets the flagging threshold.

Left/right side fields:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `left.entity` / `right.entity` | string | yes | Contract entity name |
| `left.expression` / `right.expression` | string | yes | SQL expression to aggregate |
| `left.aggregate` / `right.aggregate` | string | yes | `sum` \| `count` \| `avg` \| `min` \| `max` |
| `left.alias` / `right.alias` | string | yes | Column alias |
| `left.where` / `right.where` | dict | no | Filter conditions (`{}` for none) |

Join and tolerance:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `join_key` | string or list | yes | Field(s) to join left and right |
| `tolerance_pct` | float | yes | Percentage threshold for flagging |
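
This reference does not give the formula for `balance_pct`. The sketch below makes three assumptions:

- `balance_pct` is the absolute gap as a percentage of the left side.
- A key missing on one side counts as 0 there.
- A group is flagged when `balance_pct` is strictly greater than `tolerance_pct`.

`balance_findings` is a hypothetical helper that works on values already aggregated per join key.

```python
import math


def balance_pct(left: float, right: float) -> float:
    """Absolute gap between the sides as a percentage of the left side (assumed formula)."""
    if left == 0:
        return 0.0 if right == 0 else math.inf
    return abs(left - right) / abs(left) * 100.0


def balance_findings(left: dict, right: dict, tolerance_pct: float) -> dict:
    """Join per-key aggregates from both sides and keep keys whose gap exceeds the tolerance.

    `left` and `right` map a join-key tuple to that side's aggregated value.
    """
    flagged = {}
    for key in left.keys() | right.keys():  # full outer join over the keys
        pct = balance_pct(left.get(key, 0.0), right.get(key, 0.0))
        if pct > tolerance_pct:
            flagged[key] = pct
    return flagged
```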

Type: assessment (perspective)

Aggregates findings from multiple signals. Perspectives must use `contract: "findings.v1"`.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `source_probes` | list | yes | Signals to aggregate |
| `source_probes[].probe_id` | string | yes | Must reference an existing signal YAML |
| `source_probes[].weight` | integer | yes | Relative weight for scoring |

Aggregate columns available for severity rules: `probe_count`, `finding_count`, `total_risk`, `worst_severity`, `probes_flagged`.
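
The aggregate columns can be sketched for one entity as follows. The exact definitions are assumptions:

- `probe_count` is the number of configured source signals.
- `probes_flagged` counts the source signals with at least one finding.
- `worst_severity` ranks `high` above `medium` above `low`.

This reference does not describe how `weight` enters scoring, so the sketch leaves it out. `aggregate_perspective` is a hypothetical name.

```python
from typing import Optional

SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3}


def aggregate_perspective(findings: list[dict], source_probes: list[dict]) -> dict:
    """Summarise one entity's findings from the listed source signals."""
    wanted = {src["probe_id"] for src in source_probes}
    relevant = [f for f in findings if f["probe_id"] in wanted]  # ignore other signals
    worst: Optional[str] = max(
        (f["severity"] for f in relevant), key=SEVERITY_RANK.get, default=None
    )
    return {
        "probe_count": len(wanted),
        "finding_count": len(relevant),
        "total_risk": sum(f["money_at_risk"] for f in relevant),
        "worst_severity": worst,
        "probes_flagged": len({f["probe_id"] for f in relevant}),
    }
```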

Type: duplicate

Finds records where match fields are identical but a conflict field differs.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `duplicate.entity` | string | yes | Contract entity name |
| `duplicate.match_fields` | list | yes | Fields identifying “same” records |
| `duplicate.conflict_field` | string | yes | Field that should be consistent |
| `duplicate.min_distinct` | integer | no | Min distinct conflict values to flag (default: 2) |
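
The duplicate rule works in three steps: group by `match_fields`, collect the distinct `conflict_field` values, and flag groups with at least `min_distinct` of them. A minimal in-memory sketch follows. `find_duplicates` and the invoice field names in the test data are hypothetical, and null handling is not addressed.

```python
from collections import defaultdict


def find_duplicates(rows: list[dict], match_fields: list[str], conflict_field: str,
                    min_distinct: int = 2) -> dict:
    """Return {match-key tuple: distinct conflict values} for groups that disagree."""
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[field] for field in match_fields)  # records considered the "same"
        groups[key].add(row[conflict_field])
    return {key: values for key, values in groups.items() if len(values) >= min_distinct}
```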

Type: reconciliation

Cross-entity reconciliation with independent aggregation, join, and derived metrics.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `left.entity` / `right.entity` | string | yes | Contract entity name |
| `left.group_by` / `right.group_by` | list | yes | Group-by fields |
| `left.aggregates` / `right.aggregates` | list | yes | `[{aggregate, expression, alias}]` |
| `left.where` / `right.where` | dict | no | Filter conditions |
| `join.keys` | list | yes | Join key fields |
| `join.time_key` | string | yes | Time bucket column |
| `join.type` | string | yes | `full_outer` \| `left` \| `inner` |
| `derived` | list | yes | `[{expression, alias}]` computed columns |
| `flag` | string | yes | SQL WHERE for flagging |
| `entity_id_field` | string | yes | Field used as entity ID |
| `time_bucket_field` | string | yes | Time bucket field |
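
The reconciliation stages can be pictured as one SQL statement: aggregate each side, join on `join.keys` plus `join.time_key`, compute `derived`, and filter with `flag`. The Python sketch below builds such a statement, but its shape is an assumption, not the compiler's actual output. It further assumes:

- Each side's `group_by` contains the join keys and the time column.
- `where` entries are equality tests against string literals, with no escaping.

It stops before mapping rows to `entity_id_field` and `time_bucket_field`.

```python
JOIN_SQL = {"full_outer": "FULL OUTER JOIN", "left": "LEFT JOIN", "inner": "INNER JOIN"}


def _side_sql(side: dict) -> str:
    """Aggregate one side independently, grouped by its own group_by fields."""
    cols = ", ".join(side["group_by"])
    aggs = ", ".join(
        f"{a['aggregate'].upper()}({a['expression']}) AS {a['alias']}" for a in side["aggregates"]
    )
    sql = f"SELECT {cols}, {aggs} FROM {side['entity']}"
    # Assumption: each `where` entry is an equality test against a string literal (no escaping).
    conds = [f"{field} = '{value}'" for field, value in side.get("where", {}).items()]
    if conds:
        sql += " WHERE " + " AND ".join(conds)
    return sql + f" GROUP BY {cols}"


def reconciliation_sql(spec: dict) -> str:
    """Assemble aggregate -> join -> derive -> flag into one SQL statement (hypothetical shape)."""
    keys = spec["join"]["keys"] + [spec["join"]["time_key"]]
    # COALESCE keeps key values from whichever side exists after an outer join.
    key_cols = ", ".join(f"COALESCE(l.{k}, r.{k}) AS {k}" for k in keys)
    side_cols = ", ".join(
        [f"l.{a['alias']}" for a in spec["left"]["aggregates"]]
        + [f"r.{a['alias']}" for a in spec["right"]["aggregates"]]
    )
    on = " AND ".join(f"l.{k} = r.{k}" for k in keys)
    derived = ", ".join(f"{d['expression']} AS {d['alias']}" for d in spec["derived"])
    join = JOIN_SQL[spec["join"]["type"]]
    return (
        f"WITH l AS ({_side_sql(spec['left'])}),\n"
        f"r AS ({_side_sql(spec['right'])}),\n"
        f"j AS (SELECT {key_cols}, {side_cols} FROM l {join} r ON {on}),\n"
        f"d AS (SELECT j.*, {derived} FROM j)\n"
        f"SELECT * FROM d WHERE {spec['flag']}"
    )
```

Derived columns are computed in their own step (`d`) so that `flag` can refer to their aliases. Standard SQL does not allow a `WHERE` clause to reference aliases defined in the same `SELECT`.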

Type: temporal_sequence

Validates expected ordering or presence of events in a sequence.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `sequence.entity` | string | yes | Entity containing actual events |
| `sequence.order_field` | string | yes | Timestamp/ordering field |
| `sequence.group_by` | string | yes | Group key (e.g. `shipment_id`) |
| `sequence.expected_steps.source` | string | yes | Entity defining expected sequence |
| `sequence.expected_steps.match_field` | string | yes | Field matching expected to actual |
| `sequence.expected_steps.step_field` | string | yes | Step ordering field |
| `sequence.expected_steps.location_field` | string | no | Location field for spatial comparison |
| `sequence.actual_steps.field` | string | yes | Actual field to compare |
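
A simplified check for one group compares the observed steps, sorted by `order_field`, with the expected step order. `check_sequence` is hypothetical. It assumes the expected steps are already sorted by `step_field`. It ignores steps that are not in the expected list and does not handle repeated visits to the same step.

```python
def check_sequence(expected: list[str], actual: list[tuple[str, str]]) -> dict:
    """Compare one group's actual events with the expected step order.

    `expected` lists step values in expected order; `actual` holds (order_value, step) pairs.
    """
    observed = [step for _, step in sorted(actual)]  # sort events by the ordering field
    missing = [step for step in expected if step not in observed]
    rank = {step: i for i, step in enumerate(expected)}
    ranks = [rank[step] for step in observed if step in rank]  # unknown steps are ignored
    out_of_order = any(later < earlier for earlier, later in zip(ranks, ranks[1:]))
    return {"missing": missing, "out_of_order": out_of_order}
```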

Remaining Types (brief)

`distribution_outlier`: Flags statistical outliers. Requires `metric` (`entity`, `expression`, `alias`) and `distribution` (`method: zscore`, `threshold`, `baseline_group`).
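
A z-score check computes the mean and standard deviation per `baseline_group` and flags values whose absolute z-score exceeds `threshold`. The sketch assumes the population standard deviation and a single grouping field. `zscore_outliers` and the `carrier`/`amount` names in the test data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_outliers(rows: list[dict], value_field: str, baseline_group: str,
                    threshold: float) -> list[dict]:
    """Return rows whose |z-score| within their baseline group exceeds threshold."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[baseline_group]].append(row[value_field])
    # Population mean and standard deviation per baseline group (assumed estimator).
    stats = {group: (mean(values), pstdev(values)) for group, values in groups.items()}
    flagged = []
    for row in rows:
        mu, sigma = stats[row[baseline_group]]
        if sigma == 0:
            continue  # all values in the group are equal, so nothing stands out
        z = (row[value_field] - mu) / sigma
        if abs(z) > threshold:
            flagged.append({**row, "zscore": z})
    return flagged
```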

`ratio`: Compares numerator/denominator against an expected ratio. Requires `numerator` and `denominator` (each: `entity`, `expression`, `aggregate`, `alias`), `expected_ratio`, `tolerance_pct`, and an optional `direction` (`above`, `below`, or `both`).
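
One plausible reading of `tolerance_pct` is a relative deviation from `expected_ratio`, measured in percent. This reference does not fix the formula, so treat the sketch below as an assumption. `ratio_deviation` is hypothetical. It defaults `direction` to `both` when omitted and skips groups with a zero denominator.

```python
from typing import Optional


def ratio_deviation(numerator: float, denominator: float, expected_ratio: float,
                    tolerance_pct: float, direction: str = "both") -> Optional[float]:
    """Return the % deviation from expected_ratio if it breaches the tolerance, else None."""
    if denominator == 0:
        return None  # assumption: an undefined ratio is skipped rather than flagged
    deviation_pct = (numerator / denominator - expected_ratio) / expected_ratio * 100.0
    breached = {
        "above": deviation_pct > tolerance_pct,
        "below": deviation_pct < -tolerance_pct,
        "both": abs(deviation_pct) > tolerance_pct,
    }[direction]
    return deviation_pct if breached else None
```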

`mandatory_item`: Checks that qualifying entities have the required items. Requires `qualifying` (`entity`, `join_key`, `where`) and `required` (`entity`, `join_key`, `min_count`). Uses `money_at_risk_fixed`.
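
The mandatory-item check amounts to counting required items per qualifying entity. `missing_mandatory` is a hypothetical helper. It receives the qualifying entity IDs (already filtered by `qualifying.where`) and the rows of the required entity. In this reading, every entity it returns would carry the `money_at_risk_fixed` amount.

```python
from collections import Counter


def missing_mandatory(qualifying_ids: set, required_rows: list[dict],
                      join_key: str, min_count: int = 1) -> dict:
    """Return {entity_id: item_count} for qualifying entities with fewer than min_count items."""
    counts = Counter(row[join_key] for row in required_rows)  # absent ids count as 0
    return {eid: counts[eid] for eid in sorted(qualifying_ids) if counts[eid] < min_count}
```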

`trend`: Detects worsening trends via rolling metrics. Requires `metric` and `trend` blocks with a rolling-window configuration.

`silver_audit`: Audits Silver-layer data quality. Uses `contract: "silver.v1"`. Two variants: validity (filters by `is_valid`, groups by `invalid_reason`) and activity (joins to activity tables, filters on aggregates).
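
The validity variant reduces to a filter followed by a count. A minimal sketch, assuming the rows expose a boolean `is_valid` and a string `invalid_reason` column as named above. `invalid_reason_counts` is hypothetical.

```python
from collections import Counter


def invalid_reason_counts(rows: list[dict]) -> Counter:
    """Validity variant: keep rows where is_valid is false, then count them per invalid_reason."""
    return Counter(row["invalid_reason"] for row in rows if not row["is_valid"])
```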

`entity_filter`: Filters entities by a SQL `WHERE` condition. Requires a `source` block with `entity`, a `where` clause, and `entity_id_field`.

`enrichment`: Joins a fact to a filtered dimension and computes derived metrics. Requires `dimension`, `fact`, `derived`, `flag`.

`hand_written`: SQL escape hatch. No type-specific YAML keys. Reads `probes/{probe_id}.sql`, which must emit the standard findings columns.

Validation Rules

- `probe_id` must match the YAML filename (without `.yaml`).
- All entity/field references are validated against the contract’s entity definitions.
- `severity_rules` must end with a `default` entry.
- `scope.time.bucket` must be one of: `week`, `month`, `quarter`, `raw`.
- `registry.risk_tier` must be one of: `direct_financial`, `compliance_exposure`, `operational_signal`.
- Expression strings are validated for safe characters (alphanumeric, `_`, `*`, `+`, `-`, `/`, `(`, `)`, `.`, space).
- All `probe_id` references in perspectives must have a corresponding YAML file.
- Aggregate functions must be one of: `sum`, `count`, `avg`, `min`, `max`.
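
Several of these rules are simple membership or pattern checks. The sketch below covers a subset of them. `check_rules` is hypothetical and does not reproduce `scripts/signalcheck.py`. Read literally, the safe-character list excludes characters such as `=`, `'`, and `,`. This sketch would therefore reject the `CASE WHEN direction = 'inbound' THEN quantity ELSE -quantity END` expression in the minimal balance example. The real validator presumably accepts a wider set, so check its source before relying on this pattern.

```python
import re

# Characters as listed: ASCII letters and digits, _ * + - / ( ) . and space (ASCII is assumed).
SAFE_EXPRESSION = re.compile(r"^[A-Za-z0-9_*+\-/(). ]*$")
AGGREGATES = {"sum", "count", "avg", "min", "max"}
BUCKETS = {"week", "month", "quarter", "raw"}
RISK_TIERS = {"direct_financial", "compliance_exposure", "operational_signal"}


def check_rules(doc: dict) -> list[str]:
    """Apply a subset of the validation rules to one parsed signal document."""
    errors = []
    rules = doc.get("severity_rules", [])
    if not rules or "default" not in rules[-1]:
        errors.append("severity_rules must end with a default entry")
    if doc.get("scope", {}).get("time", {}).get("bucket") not in BUCKETS:
        errors.append("scope.time.bucket is invalid")
    tier = doc.get("registry", {}).get("risk_tier")
    if tier is not None and tier not in RISK_TIERS:  # registry block is optional
        errors.append("registry.risk_tier is invalid")
    for side in ("left", "right"):  # balance-style sides with a single expression
        block = doc.get(side)
        if isinstance(block, dict) and "expression" in block:
            if not SAFE_EXPRESSION.match(block["expression"]):
                errors.append(f"{side}.expression has unsafe characters")
            if block.get("aggregate") not in AGGREGATES:
                errors.append(f"{side}.aggregate is invalid")
    return errors
```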

Minimal Example (balance)

```yaml
probe_id: probe_warehouse_balance
version: "1.0.0"
contract: "gold.v1"
type: balance
severity: high
description: >
  Compares inventory snapshot quantities against cumulative inbound
  minus outbound movements.

scope:
  entity_type: InventorySnapshot
  group_by: [warehouse_id, item_id]
  time:
    entity: InventorySnapshot
    field: snapshot_date
    bucket: month

left:
  entity: InventorySnapshot
  expression: "quantity"
  aggregate: sum
  alias: snapshot_quantity
  where: {}

right:
  entity: Checkpoint
  expression: "CASE WHEN direction = 'inbound' THEN quantity ELSE -quantity END"
  aggregate: sum
  alias: movement_balance
  where:
    checkpoint_type: warehouse

join_key: [warehouse_id, item_id]
tolerance_pct: 2.0

severity_rules:
  - above: 10.0
    level: high
  - above: 2.0
    level: medium
  - default: low

money_at_risk: "abs(snapshot_quantity - movement_balance) * avg_item_value"

evidence_fields: [warehouse_id, item_id, snapshot_quantity, movement_balance, balance_pct]
```

Findings Output Contract

Every compiled signal emits rows with these columns:

| Column | Type | Description |
| --- | --- | --- |
| `finding_id` | string | Deterministic surrogate key |
| `tenant_id` | string | Tenant identifier |
| `probe_id` | string | Signal identifier |
| `probe_version` | string | Semantic version |
| `severity` | string | `high` \| `medium` \| `low` |
| `entity_type` | string | From `scope.entity_type` |
| `entity_id` | string | Entity identifier |
| `time_bucket` | string | Formatted time bucket |
| `money_at_risk` | numeric | Financial exposure |
| `evidence` | string | JSON object with signal-specific fields |
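
A finding row can be modelled as a small record type. The sketch assumes that `finding_id` is a truncated SHA-256 of the identifying columns: tenant, signal, entity type, entity, and time bucket. It leaves out `probe_version`, `severity`, and `money_at_risk` so the key stays stable when those change, but the compiler may derive the key differently. `make_evidence` builds the `evidence` JSON from a row and its `evidence_fields`. `Finding` and `make_evidence` are hypothetical names.

```python
import hashlib
import json
from dataclasses import dataclass


def make_evidence(row: dict, evidence_fields: list[str]) -> str:
    """Serialise the evidence_fields of one result row as a JSON object string."""
    return json.dumps({field: row[field] for field in evidence_fields}, sort_keys=True)


@dataclass(frozen=True)
class Finding:
    tenant_id: str
    probe_id: str
    probe_version: str
    severity: str
    entity_type: str
    entity_id: str
    time_bucket: str
    money_at_risk: float
    evidence: str  # JSON object serialised as text

    @property
    def finding_id(self) -> str:
        # Hash only identifying columns so severity/money changes keep the same key (assumed choice).
        raw = "|".join([self.tenant_id, self.probe_id, self.entity_type,
                        self.entity_id, self.time_bucket])
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:32]
```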

jinflow is a jazzisnow product

v0.45.1 · built 2026-04-17 08:14 UTC