Data Engineer Cheat Sheet

```bash
jinflow make                                  # build default tenant
jinflow make millesime.domaine_zufferey       # specific tenant
jinflow make millesime                        # all tenants in pack
jinflow make --all                            # all tenants, all packs
jinflow make --clean                          # drop + rebuild from scratch
jinflow make --sync                           # copy CSVs from DLZ first
jinflow make --extract                        # XLSX → CSV → sync → build
jinflow make --snapshot post-audit            # freeze KLS after build
```

| Phase | What happens |
|-------|--------------|
| 0a | Extract XLSX → CSV (if `--extract`) |
| 0b | Sync DLZ → raw/ (if `--sync`) |
| 1 | Validate + enrich CSVs |
| 2 | Compile instruments (probes, hypotheses, diagnoses, SMEbits, reports, entities, lineage) |
| 3a | dbt build: Bronze → Silver → Gold |
| 3b | dbt build: Probes → Assessments → Hypotheses → Diagnoses → SMEbits |
| 3c | dbt build: Lineage |
| 3d | dbt build: Reports |
| 4 | Pipeline graph, PDF reports, calibration |
| 5 | Stamp metadata, bake AFS archive, create SIS |

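As a reading aid for the phase table, the following minimal sketch lists which phases would run for a given flag combination. The function name `planned_phases` is hypothetical and not part of jinflow. It assumes that `--extract` also triggers the sync phase, which is how the `--extract` comment (XLSX → CSV → sync → build) reads.

```python
from typing import List


def planned_phases(extract: bool = False, sync: bool = False) -> List[str]:
    """Return the phase IDs from the table that would run (illustrative only)."""
    phases: List[str] = []
    if extract:
        phases.append("0a")  # Extract XLSX -> CSV
    # Assumption: --extract implies the sync step, per its description
    # "XLSX -> CSV -> sync -> build".
    if sync or extract:
        phases.append("0b")  # Sync DLZ -> raw/
    # Phases 1 through 5 are listed without conditions in the table.
    phases += ["1", "2", "3a", "3b", "3c", "3d", "4", "5"]
    return phases


print(planned_phases())
print(planned_phases(extract=True))
```
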

jinflow follows the medallion architecture popularized by Databricks.

| Layer | Purpose | Materialization | Key principle |
|-------|---------|-----------------|---------------|
| Bronze | Structural ingestion | TABLE | Source-system dispatch, adds `source_file` + `row_number` |
| Silver | Domain validation | TABLE | `is_valid` + `invalid_reason` on every row |
| Gold | Consumption contract | VIEW | Only `is_valid = true`, source-system agnostic |
| Platform | Cross-tenant | VIEW | `UNION ALL` with `tenant_id` |

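The Bronze, Silver, and Gold principles can be illustrated with a small self-contained sketch. It uses Python's built-in `sqlite3` as a stand-in engine, and the table names (`bronze_items`, `silver_items`, `gold_items`), the `sku` and `qty` columns, and the validation rules are all hypothetical. Only `source_file`, `row_number`, `is_valid`, and `invalid_reason` come from the table above. It is not jinflow's actual dbt SQL.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Bronze: structural ingestion; every row keeps its provenance columns.
con.execute(
    "CREATE TABLE bronze_items (source_file TEXT, row_number INTEGER, sku TEXT, qty TEXT)"
)
con.executemany(
    "INSERT INTO bronze_items VALUES (?, ?, ?, ?)",
    [("items.csv", 1, "A-1", "12"), ("items.csv", 2, "", "3"), ("items.csv", 3, "B-7", "-4")],
)

# Silver: no row is dropped; each one is flagged with is_valid + invalid_reason.
con.execute(
    """
    CREATE TABLE silver_items AS
    SELECT source_file, row_number, sku, CAST(qty AS INTEGER) AS qty,
           CASE WHEN sku = '' OR CAST(qty AS INTEGER) < 0 THEN 0 ELSE 1 END AS is_valid,
           CASE WHEN sku = '' THEN 'missing sku'
                WHEN CAST(qty AS INTEGER) < 0 THEN 'negative qty'
           END AS invalid_reason
    FROM bronze_items
    """
)

# Gold: a view exposing only valid rows, without source-system columns.
con.execute("CREATE VIEW gold_items AS SELECT sku, qty FROM silver_items WHERE is_valid = 1")

silver_rows = con.execute(
    "SELECT row_number, is_valid, invalid_reason FROM silver_items ORDER BY row_number"
).fetchall()
gold_rows = con.execute("SELECT sku, qty FROM gold_items").fetchall()
print(silver_rows)
print(gold_rows)
```

Keeping invalid rows in Silver, rather than filtering them early, means the reason for each rejection stays queryable while Gold remains a clean contract for consumers.
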
```bash
# Validate (check YAML against contracts)
python3 scripts/probecheck.py
python3 scripts/hypothesischeck.py
python3 scripts/diagnosischeck.py
python3 scripts/smebitcheck.py
python3 scripts/reportcheck.py

# Compile (YAML → dbt SQL)
python3 scripts/probecompile.py        # --check for dry-run
python3 scripts/hypothesiscompile.py
python3 scripts/diagnosiscompile.py
python3 scripts/smebitcompile.py
python3 scripts/reportcompile.py

# Registry
python3 scripts/proberegistry.py
```

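If the validate and compile steps are run by hand often, a small wrapper can generate the commands in the order listed above. This is a sketch only: `instrument_commands` and `run_all` are hypothetical helpers, the `--check` flag is applied to `probecompile.py` alone because that is the only script it is documented for here, and stopping on the first failure assumes the scripts return a non-zero exit code on error.

```python
import subprocess
from typing import List

INSTRUMENTS = ["probe", "hypothesis", "diagnosis", "smebit", "report"]


def instrument_commands(probe_dry_run: bool = False) -> List[List[str]]:
    """Build the validate commands first, then the compile commands."""
    commands = [["python3", f"scripts/{name}check.py"] for name in INSTRUMENTS]
    for name in INSTRUMENTS:
        cmd = ["python3", f"scripts/{name}compile.py"]
        if probe_dry_run and name == "probe":
            cmd.append("--check")  # dry-run, documented for probecompile.py only
        commands.append(cmd)
    return commands


def run_all(probe_dry_run: bool = False) -> None:
    """Run every command in order; raises CalledProcessError on the first failure.

    Assumption: each script exits non-zero when it finds a problem.
    """
    for cmd in instrument_commands(probe_dry_run):
        subprocess.run(cmd, check=True)


for cmd in instrument_commands(probe_dry_run=True):
    print(" ".join(cmd))
```
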
```bash
python3 scripts/docsdrift.py           # full report
python3 scripts/docsdrift.py --quiet   # exit 1 if drift found
```

Checks: CLI commands vs cheatsheet, Explorer routes vs guide, probe types vs instruments guide, frontmatter stamps.

```bash
jinflow init --tenant my_analysis --source-system csv                               # from scratch
jinflow init --pack millesime --tenant domaine_zufferey --source-system opale      # from pack
jinflow clone millesime.domaine_zufferey --name sandbox                             # clone tenant
jinflow us millesime.domaine_zufferey                                               # set default
jinflow ls                                                                          # list all tenants
jinflow stat                                                                        # KLS health check
```

```bash
jinflow afs update --do-it       # sync pack → tenant
jinflow afs reset --do-it        # hard reset to pack state
jinflow afs status               # git status
jinflow afs log                  # commit history
jinflow afs pull                 # pull from remote
jinflow afs push -m "message"    # commit + push
jinflow afs remote <url>         # set git remote
```

```
{live_root}/{pack}/{tenant}/
  afs/     ← instruments, dbt, contracts (git-backed)
  raw/     ← immutable source CSVs
  build/   ← intermediaries (managed by make)
  store/   ← KLS + SIS + snapshots
```

| File | Pattern | Example |
|------|---------|---------|
| KLS (working) | `{pack}_{tenant}_kls.duckdb` | `millesime_domaine_zufferey_kls.duckdb` |
| SIS | `{pack}_{tenant}_sis.duckdb` | `millesime_domaine_zufferey_sis.duckdb` |
| Snapshot (auto) | `YYYYMMDD-HHMM_kls.duckdb` | `20260322-1430_kls.duckdb` |
| Snapshot (named) | `{tag}_kls.duckdb` | `post-audit_kls.duckdb` |

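The directory layout and naming patterns above can be expressed as a few path helpers. This is a minimal sketch: the helper names and the `/srv/live` root are hypothetical, and placing the files directly under `store/` (rather than in a subdirectory) is an assumption, since the layout only says that `store/` holds the KLS, SIS, and snapshots.

```python
from datetime import datetime
from pathlib import PurePosixPath
from typing import Optional


def tenant_root(live_root: str, pack: str, tenant: str) -> PurePosixPath:
    """{live_root}/{pack}/{tenant}/"""
    return PurePosixPath(live_root) / pack / tenant


def kls_path(live_root: str, pack: str, tenant: str) -> PurePosixPath:
    # Assumption: the working KLS sits directly under store/.
    return tenant_root(live_root, pack, tenant) / "store" / f"{pack}_{tenant}_kls.duckdb"


def sis_path(live_root: str, pack: str, tenant: str) -> PurePosixPath:
    # Assumption: the SIS sits next to the KLS under store/.
    return tenant_root(live_root, pack, tenant) / "store" / f"{pack}_{tenant}_sis.duckdb"


def snapshot_name(tag: Optional[str] = None, when: Optional[datetime] = None) -> str:
    """Named snapshots use the tag; auto snapshots use a YYYYMMDD-HHMM stamp."""
    if tag:
        return f"{tag}_kls.duckdb"
    stamp = (when or datetime.now()).strftime("%Y%m%d-%H%M")
    return f"{stamp}_kls.duckdb"


print(kls_path("/srv/live", "millesime", "domaine_zufferey"))
print(snapshot_name(tag="post-audit"))
print(snapshot_name(when=datetime(2026, 3, 22, 14, 30)))
```
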
```bash
# Always use .venv/bin/dbt (NOT system dbt)
.venv/bin/dbt build --vars '{"tenant_id": "my_tenant"}'
.venv/bin/dbt build --select probes --vars '{"tenant_id": "my_tenant"}'
.venv/bin/dbt test --vars '{"tenant_id": "my_tenant"}'
.venv/bin/dbt show --select model_name --vars '{"tenant_id": "my_tenant"}'
```

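When these dbt calls are scripted, building the `--vars` payload with `json.dumps` and passing an argument list avoids shell-quoting mistakes around the braces and quotes. The helper `dbt_command` below is a hypothetical sketch, not part of jinflow; it only assembles the same commands shown above.

```python
import json
from typing import List, Optional


def dbt_command(subcommand: str, tenant_id: str, select: Optional[str] = None) -> List[str]:
    """Assemble a project-local dbt invocation as an argv list."""
    cmd = [".venv/bin/dbt", subcommand]  # project venv, not system dbt
    if select:
        cmd += ["--select", select]
    # The --vars value is a YAML/JSON string; JSON output is valid for both.
    cmd += ["--vars", json.dumps({"tenant_id": tenant_id})]
    return cmd


print(dbt_command("build", "my_tenant", select="probes"))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` without going through a shell.
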
| Variable | Purpose |
|----------|---------|
| `JINFLOW_LIVE` | Override live root |
| `JINFLOW_TENANT` | Override default tenant |
| `JINFLOW_DB_PATH` | Explicit KLS path (Explorer) |
| `JINFLOW_AFS_ROOT` | AFS root (Explorer) |
| `JINFLOW_SYSTEM_DB_PATH` | System DB path |
| `JINFLOW_PACKS_ROOT` | Pack repos directory |
| `ANTHROPIC_API_KEY` | Claude API key (evolve) |

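"Override" in the table suggests that an environment variable, when set, takes precedence over the configured value. The sketch below shows that precedence rule in isolation. The function `resolve_setting` and the example values are hypothetical, and jinflow's real resolution logic may differ, for example in how it treats empty strings.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    name: str,
    configured: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the environment value if set and non-empty, else the configured one."""
    value = env.get(name)
    # Assumption: an empty variable is treated the same as an unset one.
    return value if value else configured


# Example with an explicit mapping instead of the real process environment.
fake_env = {"JINFLOW_TENANT": "millesime.domaine_zufferey"}
print(resolve_setting("JINFLOW_TENANT", "my_analysis", env=fake_env))
print(resolve_setting("JINFLOW_LIVE", "/srv/live", env=fake_env))
```
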
jinflow is a jazzisnow product
v0.45.1 · built 2026-04-17 08:14 UTC