Design Principles
These are the principles of the jinflow.core analytical engine — the generic framework that domain packs instantiate for specific verticals (e.g. healthcare, winemaking, logistics). Four core values define the identity; nineteen principles implement them.
The Four Values
| Value | Motto | Supporting principles |
|---|---|---|
| Declarative | You declare what to detect. The engine does the rest. | #9 YAML-to-SQL Compilation, #3 Contract-First, #8 Metadata-Driven, #16 Pragmatic Generalization |
| Transparent | Nothing is hidden. Everything is queryable and rebuildable. | #4 No Silent Filtering, #6 Rebuildable by Default, #17 Calibrated Not Just Tested, #1 Layered Architecture |
| Independent | Worlds, tenants, and layers don’t block each other. | #18 Parallel Evolution, #7 Processing and Exploration are Independent, #5 Tenant Isolation, #2 Source-System Agnosticism |
| Human | People decide. AI assists. Knowledge is captured. | #10 Human-in-the-Loop, #11 AI as SME, #12 AI as Colleague, #13 No AI in Data Path, #14 Privacy, #15 Tri-Lingual |
The 19 principles below are the implementation. These 4 values are the identity. If a design decision aligns with the principles but violates a value, the value wins.
The 19 Principles
1. Layered Architecture
Bronze is structure. Silver is domain truth. Gold is consumption.
Three medallion layers with strict responsibility boundaries. Bronze ingests and maps columns. Silver validates, casts, and flags. Gold filters to valid rows and presents the product contract. Layer responsibilities must not leak — Silver never drops rows, Gold never validates, Bronze never interprets.
No upward references.
Dependencies flow downward only: signals reference Gold, Gold references Silver, Silver references Bronze. No layer imports from or reaches into a layer above it.
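The downward-only rule lends itself to a mechanical check. A minimal sketch, assuming models carry their layer as a name prefix (the layer names and example model names are illustrative, not jinflow.core's actual API):

```python
# Enforce downward-only dependencies between medallion layers.
# Signals sit above Gold; each layer may only reference layers below it.

LAYER_ORDER = {"bronze": 0, "silver": 1, "gold": 2, "signal": 3}

def layer_of(model: str) -> int:
    """Infer a model's layer from its name prefix (e.g. 'silver_orders')."""
    return LAYER_ORDER[model.split("_", 1)[0]]

def check_refs(model: str, refs: list[str]) -> list[str]:
    """Return violations: any ref that points at a layer above the model."""
    return [r for r in refs if layer_of(r) > layer_of(model)]

# A Gold model may reference Silver...
assert check_refs("gold_orders", ["silver_orders"]) == []
# ...but a Silver model must not reach up into Gold.
assert check_refs("silver_orders", ["gold_orders"]) == ["gold_orders"]
```

A check like this can run at compile time alongside contract validation, so an upward reference fails the build rather than surfacing as a runtime surprise.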
2. Source-System Agnosticism
Dispatch once, at Bronze. Then forget where the data came from.
Source-system-specific column mappings are resolved in Bronze via dispatch macros. From Silver onward, the system operates on a canonical schema. No model above Bronze knows or cares which ERP produced the data. Adding a new source system means writing dispatch macros — nothing else changes.
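The dispatch idea can be sketched in a few lines. The source-system names and column mappings below are made up for illustration; the real dispatch lives in dbt macros:

```python
# Bronze-time dispatch: per-source column mappings resolve to one canonical
# schema, so nothing above Bronze sees source-specific names.

DISPATCH = {
    "erp_a": {"KUNNR": "customer_id", "NETWR": "net_amount"},
    "erp_b": {"cust_no": "customer_id", "amount_net": "net_amount"},
}

def to_canonical(source_system: str, row: dict) -> dict:
    """Map source-specific columns onto the canonical Silver schema."""
    mapping = DISPATCH[source_system]
    return {mapping[col]: val for col, val in row.items() if col in mapping}

# Two different ERPs, one canonical row shape.
row_a = to_canonical("erp_a", {"KUNNR": "C-1", "NETWR": 99.5})
row_b = to_canonical("erp_b", {"cust_no": "C-1", "amount_net": 99.5})
assert row_a == row_b == {"customer_id": "C-1", "net_amount": 99.5}
```

Adding a new source system is then a pure data change: one more entry in the dispatch table, nothing downstream touched.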
3. Contract-First
Versioned JSON schemas define every boundary.
Gold entities, Silver entities, signal findings, verdict findings — each has a
versioned contract (gold_contract.v1.json, findings_contract.v1.json, etc.).
Compilers validate references at build time, not runtime. If a signal references
a field that doesn’t exist in the contract, it fails to compile. The contract is
the API between layers.
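A minimal sketch of build-time validation, with a hypothetical contract fragment and signal shape (the actual contract files are richer than this):

```python
# Compile-time contract check: a signal that references a field missing
# from the gold contract fails to compile, not at runtime.

import json

gold_contract = json.loads("""
{"version": 1, "entities": {"gold_orders": {"fields": ["order_id", "net_amount"]}}}
""")

def validate_signal(signal: dict, contract: dict) -> list[str]:
    """Return compile errors for unknown entities or fields."""
    entity = contract["entities"].get(signal["entity"])
    if entity is None:
        return [f"unknown entity: {signal['entity']}"]
    return [f"{signal['entity']}: unknown field {f}"
            for f in signal["fields"] if f not in entity["fields"]]

assert validate_signal({"entity": "gold_orders", "fields": ["net_amount"]},
                       gold_contract) == []
assert validate_signal({"entity": "gold_orders", "fields": ["gross"]},
                       gold_contract) == ["gold_orders: unknown field gross"]
```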
4. No Silent Filtering
Invalid rows are flagged, never dropped.
Every Silver model produces is_valid and invalid_reason for every row. Gold
filters to is_valid = true, but the invalid rows remain queryable in Silver.
Quality is a first-class artifact — metrics models (silver_quality_metrics,
gold_quality_metrics) expose data quality as dbt models, not hidden log lines.
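The flag-don't-drop pattern, sketched in Python with illustrative field names (in the real system this is SQL in dbt models):

```python
# Silver keeps every row and attaches is_valid / invalid_reason;
# only Gold filters. The invalid rows stay queryable.

def silver_validate(rows):
    out = []
    for row in rows:
        reason = None
        if row.get("net_amount") is None:
            reason = "missing net_amount"
        elif row["net_amount"] < 0:
            reason = "negative net_amount"
        out.append({**row, "is_valid": reason is None, "invalid_reason": reason})
    return out

silver = silver_validate([{"net_amount": 10.0}, {"net_amount": -5.0}])
gold = [r for r in silver if r["is_valid"]]

assert len(silver) == 2          # nothing dropped in Silver
assert len(gold) == 1            # Gold filters to valid rows
assert silver[1]["invalid_reason"] == "negative net_amount"
```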
5. Tenant Isolation
Schema-per-tenant. Data never mingles.
Each tenant gets its own DuckDB schema. Tenant A cannot see Tenant B’s data at any layer. The platform layer unions across tenants for cross-tenant analytics, but individual tenants are resettable and rebuildable independently. Tenant isolation is structural (schema routing), not just a WHERE clause.
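What "structural" means in practice: every table reference passes through a schema router, so a query cannot reach another tenant's data by construction. The naming scheme below is an assumption for illustration:

```python
# Structural tenant isolation: table names are always qualified with the
# tenant's schema, never filtered with a WHERE clause.

def qualified(tenant: str, table: str) -> str:
    """Fully qualify a table name with the tenant's schema."""
    return f"tenant_{tenant}.{table}"

assert qualified("acme", "gold_orders") == "tenant_acme.gold_orders"
assert qualified("globex", "gold_orders") == "tenant_globex.gold_orders"
```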
6. Rebuildable by Default
The analytical database is ephemeral.
All dbt-managed layers are fully reproducible from source CSVs. Delete
dev.duckdb and rebuild — you get the same result. Synthetic data is
deterministic (seeded PRNG). Builds are idempotent. The only irreplaceable
artifacts are the source CSVs in tenants/ and the YAML definitions.
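The determinism guarantee for synthetic data reduces to one discipline: every generator takes an explicit seed. A minimal sketch:

```python
# Deterministic synthetic data: a seeded PRNG makes every rebuild of the
# analytical database identical. Never use the global random state.

import random

def synth_amounts(seed: int, n: int) -> list[float]:
    rng = random.Random(seed)                 # local, seeded generator
    return [round(rng.uniform(1, 100), 2) for _ in range(n)]

# Two independent builds from the same seed yield the same data...
assert synth_amounts(42, 5) == synth_amounts(42, 5)
# ...and a different seed yields a different tenant.
assert synth_amounts(42, 5) != synth_amounts(43, 5)
```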
7. Processing and Exploration are Independent
They share a contract, not a runtime.
The processing pipeline (dbt) writes to DuckDB. The Explorer (SvelteKit) reads from DuckDB in read-only mode. No lock contention, no shared state, no runtime coupling. Processing could run on a schedule; exploration runs on demand. The Gold schema is the interface between them. Evidence (the BI layer) is similarly read-only and decoupled.
8. Metadata-Driven, Not Hardcoded
Assume schema, not content.
Dimensions are discovered at runtime by querying information_schema.columns
for gold_* tables — the Explorer never hardcodes entity lists. Signal
registries are compiled from YAML. Taxonomy structures are generic trees, not
domain-specific hierarchies. The system knows how to render a dimension; it
does not know what a “cost center” is until the data tells it.
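The discovery step can be sketched as a filter over the column catalog. The rows below stand in for DuckDB's information_schema.columns; the table names are illustrative:

```python
# Runtime dimension discovery: scan the column catalog for gold_* tables
# instead of hardcoding an entity list.

catalog = [
    ("gold_orders", "order_id"), ("gold_orders", "cost_center"),
    ("gold_customers", "customer_id"), ("silver_orders", "order_id"),
]

def discover_gold(rows):
    """Group columns by table, keeping only the Gold layer."""
    entities: dict[str, list[str]] = {}
    for table, column in rows:
        if table.startswith("gold_"):
            entities.setdefault(table, []).append(column)
    return entities

assert discover_gold(catalog) == {
    "gold_orders": ["order_id", "cost_center"],
    "gold_customers": ["customer_id"],
}
```

Adding a new Gold entity therefore requires no Explorer change: the next catalog scan picks it up.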
9. YAML-to-SQL Compilation
One compiler pattern for everything.
Signals, theses, verdicts, registries, lineage — all follow the same
lifecycle: define in YAML, validate against a contract, compile to SQL, build
with dbt. Every compiler supports --check for dry-run validation. The YAML is
the source of truth; the SQL is a build artifact. Human-readable definition in,
executable query out.
Don’t extend the DSL until a second instance validates the pattern.
If only one signal has a particular shape, hand-write it as SQL. The bar for a new DSL type is a second signal of the same shape. Premature abstraction is worse than duplication.
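The lifecycle can be sketched end to end. The DSL shape below (a name, an entity, a where clause) is an assumption for illustration, not the actual signal schema:

```python
# YAML-to-SQL in miniature: a declarative definition compiles to a SQL
# build artifact, with a --check mode that validates without emitting.

signal = {  # what the signal YAML parses to
    "name": "negative_amount",
    "entity": "gold_orders",
    "where": "net_amount < 0",
}

def compile_signal(sig: dict, check_only: bool = False):
    if not sig["entity"].startswith("gold_"):
        raise ValueError("signals read from Gold only")
    if check_only:          # --check: validate, emit nothing
        return None
    return (f"select *, '{sig['name']}' as signal_name "
            f"from {sig['entity']} where {sig['where']}")

assert compile_signal(signal, check_only=True) is None
assert compile_signal(signal) == (
    "select *, 'negative_amount' as signal_name "
    "from gold_orders where net_amount < 0"
)
```

The YAML stays the source of truth; the emitted string is a disposable build artifact, regenerated on every compile.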
10. Human-in-the-Loop by Default
The system detects and explains. Humans decide and act.
The system is the diagnostic lab, not the surgeon. It identifies what’s wrong (signals), scores entity health (perspectives), tests business concerns (theses), explains root causes (verdicts). But the treatment — what to actually do — requires floor knowledge that lives in people’s heads. Prescriptions are recommendations, not automated actions. Automation is additive, never mandatory.
11. AI as Subject Matter Expert
AI captures domain knowledge. The pipeline executes it.
Claude writes signal YAML, thesis definitions, verdict rules, interpretation templates, tri-lingual display text. The AI is the SME encoder — it translates domain expertise into structured, validated, version-controlled artifacts. These artifacts are then compiled to deterministic SQL. The knowledge is captured once and runs forever without AI involvement.
12. AI as Programming Colleague
The codebase is AI-native.
Claude Code builds the system itself — SQL, Python, Svelte, dbt macros, tests, documentation. CLAUDE.md is simultaneously a human-readable specification and an AI-executable instruction set. The project is designed to be worked on by both humans and AI from the start, not retrofitted for AI assistance. The AI reads the spec, understands the architecture, and writes code that follows the conventions.
13. No AI in the Data Path
Zero LLM calls in production.
dbt builds are deterministic SQL. Explorer queries are deterministic SQL. Evidence reports are deterministic SQL. No model calls an API, no query involves probabilistic inference, no dashboard depends on a language model. AI stays in the development loop (writing code, writing YAML, writing docs) and never enters the processing or exploration loop. The system a user queries is fully reproducible and explainable without AI.
14. Privacy by Design
All IDs are pseudonymised tokens.
Sensitive identifiers are pseudonymised before they enter the system. Anonymisation is a first-class service in the architecture, not an afterthought bolted on before go-live. The analytical database contains no directly identifying information.
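One common way to produce stable pseudonymous tokens is a keyed hash. This is a sketch of the technique, not the project's actual anonymisation service; key management is out of scope here and the key below is a placeholder:

```python
# Keyed pseudonymisation: a stable HMAC token replaces each raw identifier
# before data enters the analytical database. Same input, same token, so
# joins still work -- but the raw id never appears downstream.

import hashlib
import hmac

SECRET_KEY = b"placeholder-key"  # in practice: managed outside the repo

def pseudonymise(raw_id: str) -> str:
    digest = hmac.new(SECRET_KEY, raw_id.encode(), hashlib.sha256).hexdigest()
    return f"p_{digest[:16]}"

tok = pseudonymise("patient-4711")
assert tok == pseudonymise("patient-4711")                  # stable
assert tok != pseudonymise("patient-4712")                  # distinct ids differ
assert tok.startswith("p_") and len(tok) == 18              # opaque token
```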
15. Tri-Lingual from Day One
Every user-facing string exists in DE/FR/EN.
Not a translation phase after launch. Signal descriptions, thesis interpretations, verdict explanations, Explorer labels — all are authored in three languages from the start. The registry compilers enforce completeness. If a language is missing, the build tells you.
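The completeness check is the kind of thing a registry compiler can enforce mechanically. A minimal sketch, with an assumed `display` key shape:

```python
# Build-time language completeness: every user-facing entry must carry
# DE, FR, and EN text, or the compiler reports exactly what is missing.

REQUIRED = {"de", "fr", "en"}

def missing_languages(entry: dict) -> set[str]:
    return REQUIRED - entry.get("display", {}).keys()

ok  = {"display": {"de": "Umsatz", "fr": "Chiffre d'affaires", "en": "Revenue"}}
bad = {"display": {"de": "Umsatz", "en": "Revenue"}}

assert missing_languages(ok) == set()
assert missing_languages(bad) == {"fr"}   # the build names the gap
```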
16. Pragmatic Generalization
80% generic, 20% domain-specific. Keep it that way.
The analytical engine (compilers, Explorer shell, medallion pattern, taxonomy engine, findings contract) is domain-agnostic. The domain knowledge (entity schemas, source-system dispatch, signal definitions, interpretation text) is concentrated in well-identified files. Don’t abstract until a second domain validates the boundary. The first domain is the paying customer today — the split must not slow domain-specific iteration. If it does, the abstraction is wrong.
17. Calibrated, Not Just Tested
Synthetic tenants carry deliberately injected defects.
Each synthetic tenant has seeded data quality issues: orphan billing, duplicate records, phantom usage, stale pricing, timing anomalies. A calibration harness measures precision and recall — does the signal find the injected defects, and does it avoid false positives? Real tenant data then validates that the system works beyond test fixtures. The gap between synthetic and real is itself a finding.
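The calibration arithmetic itself is small. A sketch, with hypothetical row ids standing in for injected defects and signal findings:

```python
# Calibration harness in miniature: compare a signal's findings against the
# defects deliberately seeded into a synthetic tenant.

def calibrate(injected: set[str], found: set[str]) -> tuple[float, float]:
    tp = len(injected & found)
    precision = tp / len(found) if found else 1.0
    recall = tp / len(injected) if injected else 1.0
    return precision, recall

injected = {"row-1", "row-7", "row-9"}       # seeded defects
found = {"row-1", "row-7", "row-4"}          # what the signal flagged

precision, recall = calibrate(injected, found)
assert precision == 2 / 3    # one false positive (row-4)
assert recall == 2 / 3       # one miss (row-9)
```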
18. Parallel Evolution (P/T-Worlds)
The pipeline and the instruments evolve continuously and independently.
The P-world (Pipeline) and the T-world (Talk) are two parallel evolution tracks sharing a stable interface: Entity + Contract. The engineer improves source adapters, validation rules, and data quality — without touching instruments. The consultant tunes signals, refines theses, captures expert knowledge — without touching the pipeline. Neither world blocks the other.
jinflow make rebuilds both worlds in one pass. jinflow evolve assists both — whether you’re debugging a dbt model or drafting a new thesis. The two worlds are not sequential phases (“build first, then analyze”). They are concurrent, ongoing, and independently improvable.
19. Pure Function (Make)
make is a pure function: AFS in → KLS out.
Every build must be reproducible from the AFS commit it was built on. make reads only from the AFS and writes only to the KLS — no SIS mutation, no external fetches, no network calls, no silent discovery. All side effects (extracting source data, fetching SIS content, touching the DLZ) happen in pre-make, which commits its outputs to the AFS before make runs.
This makes make the security boundary: whoever controls make controls the analytical truth. It also makes builds auditable — the AFS git history is the complete provenance of every KLS.
The companion principle for the ingress layer is Extractor Discipline: everything crossing the extract layer must be listed, pinned by SHA-256, verified on every run, logged in an append-only audit, and failure-intolerant. The configuration IS the contract, and the contract is the audit trail.
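The pin-and-verify loop at the heart of Extractor Discipline can be sketched as follows. The manifest format is an assumption for illustration:

```python
# Extractor Discipline in miniature: every artifact crossing the extract
# layer is pinned by SHA-256 and verified on each run; any mismatch is
# fatal -- no partial ingest.

import hashlib

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def verify(manifest: dict[str, str], fetched: dict[str, bytes]) -> None:
    for name, pinned in manifest.items():
        actual = sha256(fetched[name])
        if actual != pinned:     # failure-intolerant
            raise ValueError(f"{name}: pinned {pinned[:8]}, got {actual[:8]}")

payload = b"order_id,net_amount\n1,99.5\n"
manifest = {"orders.csv": sha256(payload)}

verify(manifest, {"orders.csv": payload})      # passes silently

tampered_caught = False
try:
    verify(manifest, {"orders.csv": payload + b"tampered"})
except ValueError:
    tampered_caught = True
assert tampered_caught
```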
Where These Principles Live
In the three-layer split (jinflow.core / jinflow.erp / domain packs):
- jinflow.core owns principles 1–13 and 16–19 — the engine’s architecture, compilation model, AI philosophy, and generalization strategy.
- Domain packs add principles 14–15 as domain and regulatory constraints specific to their context (e.g. healthcare, winemaking).
The engine doesn’t mandate privacy or tri-lingual text. The product does.