
Design Principles

These are the principles of the jinflow.core analytical engine — the generic framework that domain packs instantiate for specific verticals (e.g. healthcare, winemaking, logistics). Four core values define the identity; eighteen principles implement them.


| Value | Motto | Supporting principles |
| --- | --- | --- |
| Declarative | You declare what to detect. The engine does the rest. | #9 YAML-to-SQL Compilation, #3 Contract-First, #8 Metadata-Driven, #16 Pragmatic Generalization |
| Transparent | Nothing is hidden. Everything is queryable and rebuildable. | #4 No Silent Filtering, #6 Rebuildable by Default, #17 Calibrated Not Just Tested, #1 Layered Architecture |
| Independent | Worlds, tenants, and layers don’t block each other. | #18 Parallel Evolution, #7 Processing and Exploration are Independent, #5 Tenant Isolation, #2 Source-System Agnosticism |
| Human | People decide. AI assists. Knowledge is captured. | #10 Human-in-the-Loop, #11 AI as SME, #12 AI as Colleague, #13 No AI in Data Path, #14 Privacy, #15 Tri-Lingual |

The 18 principles below are the implementation. These 4 values are the identity. If a design decision aligns with the principles but violates a value, the value wins.


1. Layered Architecture

Bronze is structure. Silver is domain truth. Gold is consumption.

Three medallion layers with strict responsibility boundaries. Bronze ingests and maps columns. Silver validates, casts, and flags. Gold filters to valid rows and presents the product contract. Layer responsibilities must not leak — Silver never drops rows, Gold never validates, Bronze never interprets.

No upward references.

Dependencies flow downward only: signals reference Gold, Gold references Silver, Silver references Bronze. No layer imports from or reaches into a layer above it.
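The layer boundaries can be sketched in a few lines. This is an illustrative Python sketch, not the real implementation (the actual models are dbt SQL); the column names and validation rules are hypothetical.

```python
# Hypothetical sketch of the three medallion layers and their boundaries.

def bronze(raw_rows, column_map):
    """Bronze: structural mapping only -- rename source columns to the
    canonical schema. No interpretation, no validation."""
    return [{canon: row.get(src) for src, canon in column_map.items()}
            for row in raw_rows]

def silver(rows):
    """Silver: validate, cast, flag. Every row gets is_valid and
    invalid_reason; no row is ever dropped."""
    out = []
    for row in rows:
        reason = None
        if row.get("amount") is None:
            reason = "missing_amount"
        elif row["amount"] < 0:
            reason = "negative_amount"
        out.append({**row, "is_valid": reason is None, "invalid_reason": reason})
    return out

def gold(rows):
    """Gold: consumption -- filter to valid rows. Never re-validates."""
    return [r for r in rows if r["is_valid"]]

raw = [{"amt": 100}, {"amt": -5}, {"amt": None}]
s = silver(bronze(raw, {"amt": "amount"}))
assert len(s) == 3         # Silver keeps every row, flagged
assert len(gold(s)) == 1   # Gold filters to valid rows only
```

Note that each function touches only its own responsibility: Gold calls no validation logic, and Silver's output is a superset of Gold's input, which is what makes invalid rows queryable.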


2. Source-System Agnosticism

Dispatch once, at Bronze. Then forget where the data came from.

Source-system-specific column mappings are resolved in Bronze via dispatch macros. From Silver onward, the system operates on a canonical schema. No model above Bronze knows or cares which ERP produced the data. Adding a new source system means writing dispatch macros — nothing else changes.
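The dispatch idea can be shown as a lookup table resolved exactly once. This is a hedged Python analogue of the dbt dispatch macros; the source-system names and column mappings are invented for illustration.

```python
# Hypothetical Bronze dispatch: one column mapping per source system.
# Everything above Bronze sees only the canonical names.

DISPATCH = {
    "erp_a": {"kunde_nr": "customer_id", "betrag": "amount"},
    "erp_b": {"cust": "customer_id", "amt": "amount"},
}

def to_canonical(source_system: str, row: dict) -> dict:
    """Resolve source-specific column names to the canonical schema."""
    mapping = DISPATCH[source_system]
    return {canon: row[src] for src, canon in mapping.items()}

# Onboarding a new ERP means adding one DISPATCH entry; no downstream change.
print(to_canonical("erp_b", {"cust": "C-7", "amt": 12.5}))
# → {'customer_id': 'C-7', 'amount': 12.5}
```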


3. Contract-First

Versioned JSON schemas define every boundary.

Gold entities, Silver entities, signal findings, verdict findings — each has a versioned contract (gold_contract.v1.json, findings_contract.v1.json, etc.). Compilers validate references at build time, not runtime. If a signal references a field that doesn’t exist in the contract, it fails to compile. The contract is the API between layers.
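Build-time reference checking can be sketched as follows. The contract content and signal shape here are hypothetical stand-ins for the real gold_contract.v1.json; the point is only that an unknown field fails the compile, not a query at runtime.

```python
# Sketch of contract-first validation at compile time (contract content
# is a hypothetical stand-in for gold_contract.v1.json).
import json

contract = json.loads("""{
  "version": 1,
  "entities": {"gold_invoice": {"fields": ["invoice_id", "amount", "issued_at"]}}
}""")

def check_signal(signal: dict) -> None:
    """Fail the build if a signal references a field absent from the contract."""
    fields = set(contract["entities"][signal["entity"]]["fields"])
    unknown = [f for f in signal["uses"] if f not in fields]
    if unknown:
        raise SystemExit(f"signal {signal['name']}: unknown fields {unknown}")

# Passes: every referenced field exists in the contract.
check_signal({"name": "late_invoice", "entity": "gold_invoice",
              "uses": ["invoice_id", "issued_at"]})
```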


4. No Silent Filtering

Invalid rows are flagged, never dropped.

Every Silver model produces is_valid and invalid_reason for every row. Gold filters to is_valid = true, but the invalid rows remain queryable in Silver. Quality is a first-class artifact — metrics models (silver_quality_metrics, gold_quality_metrics) expose data quality as dbt models, not hidden log lines.


5. Tenant Isolation

Schema-per-tenant. Data never mingles.

Each tenant gets its own DuckDB schema. Tenant A cannot see Tenant B’s data at any layer. The platform layer unions across tenants for cross-tenant analytics, but individual tenants are resettable and rebuildable independently. Tenant isolation is structural (schema routing), not just a WHERE clause.
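Structural isolation versus a WHERE clause can be made concrete with a small sketch. The schema naming convention and the platform-union SQL below are assumptions for illustration, not the product's actual routing code.

```python
# Sketch: tenant isolation as schema routing (naming convention assumed).

def tenant_schema(tenant_id: str) -> str:
    """Each tenant's models live in their own schema -- isolation is
    structural, not a filter predicate."""
    return f"tenant_{tenant_id}"

def platform_union(tenants: list[str], table: str) -> str:
    """The platform layer unions across tenant schemas for
    cross-tenant analytics."""
    parts = [f"SELECT '{t}' AS tenant_id, * FROM {tenant_schema(t)}.{table}"
             for t in tenants]
    return "\nUNION ALL\n".join(parts)

print(platform_union(["acme", "globex"], "gold_invoice"))
```

Because the union is generated per tenant schema, dropping or rebuilding one tenant never touches another's data.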


6. Rebuildable by Default

The analytical database is ephemeral.

All dbt-managed layers are fully reproducible from source CSVs. Delete dev.duckdb and rebuild — you get the same result. Synthetic data is deterministic (seeded PRNG). Builds are idempotent. The only irreplaceable artifacts are the source CSVs in tenants/ and the YAML definitions.
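The deterministic-synthetic-data property reduces to a seeded PRNG. A minimal sketch, with invented row shapes, of why two rebuilds from the same seed produce identical data:

```python
# Seeded PRNG: same seed in, same synthetic rows out, so deleting and
# rebuilding the database reproduces identical content.
import random

def synth_rows(seed: int, n: int) -> list[dict]:
    rng = random.Random(seed)  # local generator; no global random state
    return [{"id": i, "amount": round(rng.uniform(10, 500), 2)}
            for i in range(n)]

assert synth_rows(42, 5) == synth_rows(42, 5)  # rebuilds are idempotent
```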


7. Processing and Exploration are Independent

They share a contract, not a runtime.

The processing pipeline (dbt) writes to DuckDB. The Explorer (SvelteKit) reads from DuckDB in read-only mode. No lock contention, no shared state, no runtime coupling. Processing could run on a schedule; exploration runs on demand. The Gold schema is the interface between them. Evidence (the BI layer) is similarly read-only and decoupled.


8. Metadata-Driven

Assume schema, not content.

Dimensions are discovered at runtime by querying information_schema.columns for gold_* tables — the Explorer never hardcodes entity lists. Signal registries are compiled from YAML. Taxonomy structures are generic trees, not domain-specific hierarchies. The system knows how to render a dimension; it does not know what a “cost center” is until the data tells it.
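A hedged sketch of what runtime discovery looks like: in DuckDB the Explorer would query information_schema.columns for gold_* tables; here the result set is mocked as a list of tuples so the registry-building step is visible on its own.

```python
# Sketch of metadata-driven dimension discovery. In DuckDB the query
# would be roughly:
#   SELECT table_name, column_name FROM information_schema.columns
#   WHERE table_name LIKE 'gold_%';
# The result set below is a mocked stand-in (table/column names invented).

discovered = [
    ("gold_invoice", "invoice_id"),
    ("gold_invoice", "amount"),
    ("gold_customer", "customer_id"),
]

registry: dict[str, list[str]] = {}
for table, column in discovered:
    registry.setdefault(table, []).append(column)

# The Explorer renders whatever it finds -- no hardcoded entity list.
assert set(registry) == {"gold_invoice", "gold_customer"}
```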


9. YAML-to-SQL Compilation

One compiler pattern for everything.

Signals, theses, verdicts, registries, lineage — all follow the same lifecycle: define in YAML, validate against a contract, compile to SQL, build with dbt. Every compiler supports --check for dry-run validation. The YAML is the source of truth; the SQL is a build artifact. Human-readable definition in, executable query out.
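The define-validate-compile-build lifecycle can be sketched end to end. The signal shape, field names, and generated SQL below are hypothetical; the dict stands in for what a YAML loader would produce from a signal definition file.

```python
# Hypothetical YAML-to-SQL compilation step: human-readable definition
# in, deterministic SQL build artifact out.

signal = {  # what a YAML loader would return for a signal definition
    "name": "stale_pricing",
    "entity": "gold_price",
    "where": "valid_to < CURRENT_DATE",
    "severity": "warn",
}

def compile_signal(sig: dict) -> str:
    """Compile one validated signal definition to SQL. The YAML is the
    source of truth; this output is a build artifact."""
    return (
        f"-- generated from {sig['name']} definition; do not edit\n"
        f"SELECT '{sig['name']}' AS signal, '{sig['severity']}' AS severity, *\n"
        f"FROM {sig['entity']}\n"
        f"WHERE {sig['where']}"
    )

print(compile_signal(signal))
```

A --check mode would run everything up to, but not including, writing the artifact.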

Don’t extend the DSL until a second instance validates the pattern.

If only one signal has a particular shape, hand-write it as SQL. The bar for a new DSL type is a second signal of the same shape. Premature abstraction is worse than duplication.


10. Human-in-the-Loop

The system detects and explains. Humans decide and act.

The system is the diagnostic lab, not the surgeon. It identifies what’s wrong (signals), scores entity health (perspectives), tests business concerns (theses), explains root causes (verdicts). But the treatment — what to actually do — requires floor knowledge that lives in people’s heads. Prescriptions are recommendations, not automated actions. Automation is additive, never mandatory.


11. AI as SME

AI captures domain knowledge. The pipeline executes it.

Claude writes signal YAML, thesis definitions, verdict rules, interpretation templates, tri-lingual display text. The AI is the SME encoder — it translates domain expertise into structured, validated, version-controlled artifacts. These artifacts are then compiled to deterministic SQL. The knowledge is captured once and runs forever without AI involvement.


12. AI as Colleague

The codebase is AI-native.

Claude Code builds the system itself — SQL, Python, Svelte, dbt macros, tests, documentation. CLAUDE.md is simultaneously a human-readable specification and an AI-executable instruction set. The project is designed to be worked on by both humans and AI from the start, not retrofitted for AI assistance. The AI reads the spec, understands the architecture, and writes code that follows the conventions.


13. No AI in Data Path

Zero LLM calls in production.

dbt builds are deterministic SQL. Explorer queries are deterministic SQL. Evidence reports are deterministic SQL. No model calls an API, no query involves probabilistic inference, no dashboard depends on a language model. AI stays in the development loop (writing code, writing YAML, writing docs) and never enters the processing or exploration loop. The system a user queries is fully reproducible and explainable without AI.


14. Privacy

All IDs are pseudonymised tokens.

Sensitive identifiers are pseudonymised before they enter the system. Anonymisation is a first-class service in the architecture, not an afterthought bolted on before go-live. The analytical database contains no directly identifying information.
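One common way to implement this is a keyed hash, which yields stable tokens (same input, same token) that are not reversible without the key. This is a sketch of the technique, not the product's actual anonymisation service; the key handling shown is illustrative only.

```python
# Sketch of ID pseudonymisation via HMAC-SHA256: deterministic tokens,
# no raw identifiers in the analytical database.
import hashlib
import hmac

SECRET = b"tenant-scoped-secret"  # illustrative; real keys come from secure storage

def pseudonymise(raw_id: str) -> str:
    """Map a sensitive identifier to a stable, non-reversible token."""
    digest = hmac.new(SECRET, raw_id.encode(), hashlib.sha256).hexdigest()
    return f"p_{digest[:16]}"

assert pseudonymise("patient-123") == pseudonymise("patient-123")  # stable
assert pseudonymise("patient-123") != pseudonymise("patient-124")
```

Stability matters: the same source entity must map to the same token across runs so joins and longitudinal analysis still work on the pseudonymised data.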


15. Tri-Lingual

Every user-facing string exists in DE/FR/EN.

Not a translation phase after launch. Signal descriptions, thesis interpretations, verdict explanations, Explorer labels — all are authored in three languages from the start. The registry compilers enforce completeness. If a language is missing, the build tells you.
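The completeness check a registry compiler could enforce is small. A minimal sketch, with a hypothetical key naming scheme:

```python
# Sketch of build-time language completeness enforcement: a missing
# language fails the build rather than surfacing as a blank label.

REQUIRED = {"de", "fr", "en"}

def check_display(key: str, translations: dict) -> None:
    missing = REQUIRED - translations.keys()
    if missing:
        raise SystemExit(f"{key}: missing languages {sorted(missing)}")

# Passes: all three languages present.
check_display("signal.stale_pricing.title",
              {"de": "Veralteter Preis", "fr": "Prix obsolète", "en": "Stale price"})
```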


16. Pragmatic Generalization

80% generic, 20% domain-specific. Keep it that way.

The analytical engine (compilers, Explorer shell, medallion pattern, taxonomy engine, findings contract) is domain-agnostic. The domain knowledge (entity schemas, source-system dispatch, signal definitions, interpretation text) is concentrated in well-identified files. Don’t abstract until a second domain validates the boundary. The first domain is the paying customer today — the split must not slow domain-specific iteration. If it does, the abstraction is wrong.


17. Calibrated Not Just Tested

Synthetic tenants carry deliberately injected defects.

Each synthetic tenant has seeded data quality issues: orphan billing, duplicate records, phantom usage, stale pricing, timing anomalies. A calibration harness measures precision and recall — does the signal find the injected defects, and does it avoid false positives? Real tenant data then validates that the system works beyond test fixtures. The gap between synthetic and real is itself a finding.
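The calibration arithmetic is simple set comparison between flags and ground truth. A sketch with invented row IDs:

```python
# Sketch of the calibration harness arithmetic: compare what a signal
# flagged against the injected-defect ground truth (row IDs invented).

injected = {"r2", "r5", "r9"}  # seeded defects in the synthetic tenant
flagged  = {"r2", "r5", "r7"}  # rows the signal actually flagged

tp = len(flagged & injected)
precision = tp / len(flagged)    # share of flags that were real defects
recall    = tp / len(injected)   # share of defects the signal found

assert (precision, recall) == (2/3, 2/3)
```

Here r7 is a false positive (hurts precision) and r9 a miss (hurts recall); the harness reports both per signal.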


18. Parallel Evolution

The pipeline and the instruments evolve continuously and independently.

The P-world (Pipeline) and the T-world (Talk) are two parallel evolution tracks sharing a stable interface: Entity + Contract. The engineer improves source adapters, validation rules, and data quality — without touching instruments. The consultant tunes signals, refines theses, captures expert knowledge — without touching the pipeline. Neither world blocks the other.

jinflow make rebuilds both worlds in one pass. jinflow evolve assists both — whether you’re debugging a dbt model or drafting a new thesis. The two worlds are not sequential phases (“build first, then analyze”). They are concurrent, ongoing, and independently improvable.


make is a pure function: AFS in → KLS out.

Every build must be reproducible from the AFS commit it was built on. make reads only from the AFS and writes only to the KLS — no SIS mutation, no external fetches, no network calls, no silent discovery. All side effects (extracting source data, fetching SIS content, touching the DLZ) happen in pre-make, which commits its outputs to the AFS before make runs.

This makes make the security boundary: whoever controls make controls the analytical truth. It also makes builds auditable — the AFS git history is the complete provenance of every KLS.

The companion principle for the ingress layer is Extractor Discipline: everything crossing the extract layer must be listed, pinned by SHA-256, verified on every run, logged in an append-only audit, and failure-intolerant. The configuration IS the contract, and the contract is the audit trail.
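The pinning-and-verification step of Extractor Discipline can be sketched in a few lines. The manifest shape, file name, and hash (the SHA-256 of an empty file, used as a placeholder) are illustrative, not the product's actual format:

```python
# Sketch of Extractor Discipline: every artifact crossing the extract
# layer is listed, pinned by SHA-256, and verified on every run.
import hashlib
import pathlib

# Hypothetical manifest; the pinned value is SHA-256 of an empty file.
MANIFEST = {
    "billing.csv": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
}

def verify(path: pathlib.Path) -> str:
    """Recompute the hash and compare to the pin. Failure-intolerant:
    a mismatch aborts the run instead of logging a warning."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    if digest != MANIFEST[path.name]:
        raise SystemExit(f"{path.name}: hash mismatch")
    return digest
```

An append-only audit log of each verification would complete the discipline, making the manifest both the contract and the audit trail.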


In the three-layer split (jinflow.core / jinflow.erp / domain packs):

  • jinflow.core owns principles 1–13 and 16–17 — the engine’s architecture, compilation model, AI philosophy, and generalization strategy.
  • Domain packs add principles 14–15 as domain and regulatory constraints specific to their context (e.g. healthcare, winemaking).

The engine doesn’t mandate privacy or tri-lingual text. The product does.

jazzisnow jinflow is a jazzisnow product
v0.45.1 · built 2026-04-17 08:14 UTC