Data Engineer Cheat Sheet

Build Pipeline

jinflow make                              # build default tenant
jinflow make millesime.domaine_zufferey   # specific tenant
jinflow make millesime                    # all tenants in pack
jinflow make --all                        # all tenants, all packs
jinflow make --clean                      # drop + rebuild from scratch
jinflow make --sync                       # copy CSVs from DLZ first
jinflow make --extract                    # XLSX → CSV → sync → build
jinflow make --snapshot post-audit        # freeze KLS after build
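
The positional argument above reads as either `pack.tenant` or a bare `pack`. A minimal sketch of that convention follows; `parse_target` is a hypothetical helper for illustration, not jinflow's actual parser.

```python
from typing import NamedTuple, Optional


class Target(NamedTuple):
    pack: Optional[str]
    tenant: Optional[str]


def parse_target(arg: Optional[str]) -> Target:
    """Split a make argument into (pack, tenant). Hypothetical helper."""
    if not arg:
        # No argument: the default tenant is built.
        return Target(None, None)
    pack, _, tenant = arg.partition(".")
    # A bare pack name (no dot) means "all tenants in pack".
    return Target(pack, tenant or None)
```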

Build Phases

Phase	What happens
0a	Extract XLSX → CSV (if `--extract`)
0b	Sync DLZ → raw/ (if `--sync`)
1	Validate + enrich CSVs
2	Compile instruments (probes, hypotheses, diagnoses, SMEbits, reports, entities, lineage)
3a	dbt build: Bronze → Silver → Gold
3b	dbt build: Probes → Assessments → Hypotheses → Diagnoses → SMEbits
3c	dbt build: Lineage
3d	dbt build: Reports
4	Pipeline graph, PDF reports, calibration
5	Stamp metadata, bake AFS archive, create SIS
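
Which phases run depends on the flags. The sketch below assumes `--extract` also triggers the sync phase, as the `XLSX → CSV → sync → build` comment suggests; `planned_phases` is a hypothetical name, not part of jinflow.

```python
def planned_phases(extract: bool = False, sync: bool = False) -> list:
    """Return the phase ids that would run, in order (illustrative only)."""
    phases = []
    if extract:
        phases.append("0a")          # XLSX → CSV
    if extract or sync:
        phases.append("0b")          # DLZ → raw/
    # Phases 1 to 5 always run.
    phases += ["1", "2", "3a", "3b", "3c", "3d", "4", "5"]
    return phases
```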

Medallion Layers

jinflow follows the medallion architecture popularized by Databricks.

Layer	Purpose	Materialization	Key principle
Bronze	Structural ingestion	TABLE	Source-system dispatch, adds `source_file` + `row_number`
Silver	Domain validation	TABLE	`is_valid` + `invalid_reason` on every row
Gold	Consumption contract	VIEW	Only `is_valid = true`, source-system agnostic
Platform	Cross-tenant	VIEW	UNION ALL with `tenant_id`
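
A toy illustration of the Bronze → Silver → Gold contract. This uses the standard-library `sqlite3` module as a stand-in for DuckDB and dbt; the `orders` table, its columns, and the validation rules are invented for the example. Only `source_file`, `row_number`, `is_valid`, and `invalid_reason` come from the table above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Bronze: raw rows plus provenance columns.
CREATE TABLE bronze_orders (source_file TEXT, row_number INTEGER,
                            order_id TEXT, quantity TEXT);
INSERT INTO bronze_orders VALUES
  ('orders.csv', 1, 'A1', '12'),
  ('orders.csv', 2, 'A2', '-3'),
  ('orders.csv', 3, NULL, '5');

-- Silver: every row is kept and flagged, never dropped.
CREATE TABLE silver_orders AS
SELECT source_file, row_number, order_id, quantity,
       invalid_reason IS NULL AS is_valid,
       invalid_reason
FROM (
  SELECT *,
         CASE WHEN order_id IS NULL THEN 'missing order_id'
              WHEN CAST(quantity AS INTEGER) < 0 THEN 'negative quantity'
         END AS invalid_reason
  FROM bronze_orders
);

-- Gold: a view exposing valid rows only (SQLite stores booleans as 0/1).
CREATE VIEW gold_orders AS
SELECT order_id, CAST(quantity AS INTEGER) AS quantity
FROM silver_orders
WHERE is_valid = 1;
""")

gold_rows = con.execute("SELECT order_id, quantity FROM gold_orders").fetchall()
rejected = con.execute(
    "SELECT row_number, invalid_reason FROM silver_orders "
    "WHERE is_valid = 0 ORDER BY row_number"
).fetchall()
```

Invalid rows stay queryable in Silver with their reason, so Gold consumers never need to re-validate.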

Instrument Compilation

# Validate (check YAML against contracts)
python3 scripts/probecheck.py
python3 scripts/hypothesischeck.py
python3 scripts/diagnosischeck.py
python3 scripts/smebitcheck.py
python3 scripts/reportcheck.py

# Compile (YAML → dbt SQL)
python3 scripts/probecompile.py          # --check for dry-run
python3 scripts/hypothesiscompile.py
python3 scripts/diagnosiscompile.py
python3 scripts/smebitcompile.py
python3 scripts/reportcompile.py

# Registry
python3 scripts/proberegistry.py

Drift Detection

python3 scripts/docsdrift.py             # full report
python3 scripts/docsdrift.py --quiet     # exit 1 if drift found

Checks: CLI commands vs cheatsheet, Explorer routes vs guide, probe types vs instruments guide, frontmatter stamps.
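
The `--quiet` exit code lends itself to a CI gate. A minimal sketch, assuming exit 0 means "no drift" (the cheat sheet only documents exit 1); `has_drift` is a hypothetical wrapper.

```python
import subprocess


def has_drift(cmd) -> bool:
    """Run a drift check command and map its exit code to a bool."""
    code = subprocess.run(cmd).returncode
    if code == 0:
        return False                 # assumed: clean run
    if code == 1:
        return True                  # documented: drift found
    # Anything else (missing script, crash) should not pass silently.
    raise RuntimeError(f"drift check did not run cleanly (exit {code})")


# Intended call, from the repository root:
# has_drift(["python3", "scripts/docsdrift.py", "--quiet"])
```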

Tenant Management

jinflow init --tenant my_analysis --source-system csv        # from scratch
jinflow init --pack millesime --tenant domaine_zufferey --source-system opale  # from pack
jinflow clone millesime.domaine_zufferey --name sandbox          # clone tenant
jinflow use millesime.domaine_zufferey                           # set default
jinflow ls                                                  # list all tenants
jinflow stat                                                # KLS health check

AFS Operations

jinflow afs update --do-it          # sync pack → tenant
jinflow afs reset --do-it           # hard reset to pack state
jinflow afs status                  # git status
jinflow afs log                     # commit history
jinflow afs pull                    # pull from remote
jinflow afs push -m "message"       # commit + push
jinflow afs remote <url>            # set git remote

Tenant Layout

{live_root}/{pack}/{tenant}/
  afs/       ← instruments, dbt, contracts (git-backed)
  raw/       ← immutable source CSVs
  build/     ← intermediaries (managed by make)
  store/     ← KLS + SIS + snapshots
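
The layout above maps directly onto `pathlib`. A small sketch; `tenant_dirs` is a hypothetical helper and the live root is whatever your installation uses.

```python
from pathlib import Path


def tenant_dirs(live_root: str, pack: str, tenant: str) -> dict:
    """Return the four tenant subdirectories keyed by name."""
    base = Path(live_root) / pack / tenant
    return {name: base / name for name in ("afs", "raw", "build", "store")}
```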

Key File Naming

File	Pattern	Example
KLS (working)	`{pack}_{tenant}_kls.duckdb`	`millesime_domaine_zufferey_kls.duckdb`
SIS	`{pack}_{tenant}_sis.duckdb`	`millesime_domaine_zufferey_sis.duckdb`
Snapshot (auto)	`YYYYMMDD-HHMM_kls.duckdb`	`20260322-1430_kls.duckdb`
Snapshot (named)	`{tag}_kls.duckdb`	`post-audit_kls.duckdb`
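
The naming patterns can be reproduced with plain string formatting. A sketch for scripts that need to locate these files; the function names are hypothetical, and the timezone of the auto-snapshot timestamp is not stated here, so the caller supplies the datetime.

```python
from datetime import datetime


def kls_name(pack: str, tenant: str) -> str:
    return f"{pack}_{tenant}_kls.duckdb"


def sis_name(pack: str, tenant: str) -> str:
    return f"{pack}_{tenant}_sis.duckdb"


def auto_snapshot_name(when: datetime) -> str:
    # YYYYMMDD-HHMM, matching the pattern in the table above.
    return f"{when:%Y%m%d-%H%M}_kls.duckdb"


def named_snapshot_name(tag: str) -> str:
    return f"{tag}_kls.duckdb"
```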

dbt Commands

# Always use .venv/bin/dbt (NOT system dbt)
.venv/bin/dbt build --vars '{"tenant_id": "my_tenant"}'
.venv/bin/dbt build --select probes --vars '{"tenant_id": "my_tenant"}'
.venv/bin/dbt test --vars '{"tenant_id": "my_tenant"}'
.venv/bin/dbt show --select model_name --vars '{"tenant_id": "my_tenant"}'
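
When calling dbt from a script, building `--vars` with `json.dumps` avoids shell-quoting mistakes. A minimal sketch; `dbt_argv` is a hypothetical helper that only assembles the argument list.

```python
import json
from typing import List, Optional


def dbt_argv(subcommand: str, tenant_id: str,
             select: Optional[str] = None) -> List[str]:
    """Assemble an argv list for the project-local dbt binary."""
    argv = [".venv/bin/dbt", subcommand]
    if select:
        argv += ["--select", select]
    # --vars takes a YAML/JSON string; json.dumps handles the quoting.
    argv += ["--vars", json.dumps({"tenant_id": tenant_id})]
    return argv
```

Pass the result to `subprocess.run(argv, check=True)` without `shell=True`, so the JSON reaches dbt as a single argument.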

Environment Variables

Variable	Purpose
`JINFLOW_LIVE`	Override live root
`JINFLOW_TENANT`	Override default tenant
`JINFLOW_DB_PATH`	Explicit KLS path (Explorer)
`JINFLOW_AFS_ROOT`	AFS root (Explorer)
`JINFLOW_SYSTEM_DB_PATH`	System DB path
`JINFLOW_PACKS_ROOT`	Pack repos directory
`ANTHROPIC_API_KEY`	Claude API key (evolve)
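
Override variables typically follow an "environment first, default second" lookup. The sketch below shows that pattern for `JINFLOW_LIVE`; the fallback path is a placeholder assumption, not jinflow's real default.

```python
import os
from pathlib import Path

# Assumption for illustration only; the real default is not documented here.
HYPOTHETICAL_DEFAULT_ROOT = "~/jinflow-live"


def resolve_live_root(env=None) -> Path:
    """Return JINFLOW_LIVE if set, else the placeholder default."""
    env = os.environ if env is None else env
    return Path(env.get("JINFLOW_LIVE", HYPOTHETICAL_DEFAULT_ROOT)).expanduser()
```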

jinflow is a jazzisnow product

v0.45.1 · built 2026-04-17 08:14 UTC