Build systems that
respect their users.

Principles. Technology. Roots.

jinflow is inspired by the virtue of jin:
benevolence, clarity, and flow.

Not a marketing exercise. A design constraint.
Every architectural decision is tested against these three qualities.

The qualities that shape every decision

Benevolence
The system serves its users, not itself.

Human-in-the-loop by default. The system diagnoses; humans decide and act. Automation is additive, never mandatory.

Respect for the person at the other end.

Clarity
Nothing is hidden. Everything is traceable.

No silent filtering. Contracts at every boundary. Quality is visible, queryable, first-class.

The system earns trust by showing its work.

Flow
Data flows through layers. Knowledge flows from people into artifacts. Understanding flows back.

Make → Explore → Evolve. The cycle never ends.

The system gets better because the artifacts get better.

Seventeen Commitments

Not aspirational. They describe what is built, not what's planned.

Three principles define
the character of the system

No Silent Filtering
Invalid rows are flagged, never dropped.
Every row gets is_valid and invalid_reason.

Only at the Gold layer do invalid rows disappear from views — and they remain queryable in Silver.

Data quality is not a prerequisite for analysis. It is analysis.
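A minimal sketch of flag-don't-drop in Python. Only `is_valid` and `invalid_reason` come from the text above; the validation rules, the field names (`material_id`, `quantity`), and the function names are hypothetical. The real pipeline is described as dbt models, so read this as an illustration of the idea, not the implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlaggedRow:
    data: dict
    is_valid: bool
    invalid_reason: Optional[str]


def flag_row(row: dict) -> FlaggedRow:
    """Validate one row and flag it; the row is kept either way."""
    # Hypothetical rules, for illustration only.
    if not row.get("material_id"):
        return FlaggedRow(row, False, "missing material_id")
    if row.get("quantity", 0) < 0:
        return FlaggedRow(row, False, "negative quantity")
    return FlaggedRow(row, True, None)


def to_silver(rows: list[dict]) -> list[FlaggedRow]:
    # Silver keeps every row, valid or not.
    return [flag_row(r) for r in rows]


def to_gold(silver: list[FlaggedRow]) -> list[dict]:
    # Gold exposes only valid rows; invalid ones stay queryable in Silver.
    return [r.data for r in silver if r.is_valid]


rows = [
    {"material_id": "M-1", "quantity": 5},
    {"material_id": "", "quantity": 2},
    {"material_id": "M-3", "quantity": -1},
]
silver = to_silver(rows)
gold = to_gold(silver)
```

Three rows go in, three rows stay in Silver with their reasons attached, and one reaches Gold.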

Contract-First
Versioned JSON schemas define every boundary.
The Gold layer is not “clean data.” It’s a published API.

Change it, and consumers break loudly.
Not silently, not next Monday, not in a dashboard nobody checks.

Schema changes are a deliberate, versioned act.
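One way "break loudly" could look, sketched in Python with the standard library only. The contract shape, the name `gold_movements`, and `check_consumer` are assumptions made for this example; the text only states that boundaries are defined by versioned JSON schemas.

```python
import json

# Hypothetical contract document, for illustration only.
CONTRACT_JSON = """
{
  "name": "gold_movements",
  "version": "2.0.0",
  "fields": {"material_id": "string", "quantity": "number"}
}
"""


class ContractError(Exception):
    """Raised when a consumer's expectations no longer match the contract."""


def check_consumer(contract: dict, expected_major: int, needed_fields: set[str]) -> None:
    """Fail immediately if the major version or the required fields differ."""
    major = int(contract["version"].split(".")[0])
    if major != expected_major:
        raise ContractError(
            f"{contract['name']}: expected major version {expected_major}, found {major}"
        )
    missing = needed_fields - set(contract["fields"])
    if missing:
        raise ContractError(f"{contract['name']}: missing fields {sorted(missing)}")


contract = json.loads(CONTRACT_JSON)
# A consumer pinned to major version 2 that needs material_id passes silently.
check_consumer(contract, expected_major=2, needed_fields={"material_id"})
```

A consumer still pinned to major version 1 would get a `ContractError` at startup instead of wrong numbers later.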

Human-in-the-Loop
The system is the diagnostic lab, not the surgeon.

It identifies what’s wrong. It scores severity.
It explains root causes. It recommends.

But the treatment requires floor knowledge that lives in people’s heads.

Automation is additive, never mandatory.

How the system is structured

  • 1. Layered architecture. Bronze is structure. Silver is domain truth. Gold is consumption. Layer responsibilities must not leak.
  • 2. Source-system agnosticism. Dispatch once, at Bronze. Then forget where the data came from.
  • 5. Tenant isolation. Schema-per-tenant. Data never mingles. Structural isolation, not just a WHERE clause.
  • 6. Rebuildable by default. The analytical database is ephemeral. Delete it and rebuild — same result.
  • 7. Processing and exploration are independent. They share a contract, not a runtime. No lock contention, no shared state.

Declare once. Compile forever.

  • 8. Metadata-driven, not hardcoded. The system knows how to render a dimension. It doesn’t know what a “cost center” is until the data tells it.
  • 9. YAML-to-SQL compilation. Define in YAML, validate against a contract, compile to SQL, build with dbt. The YAML is the source of truth. The SQL is a build artifact.
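A rough sketch of the declare-then-compile step, in Python. The probe shape, its keys, and `compile_probe` are hypothetical, and a plain dict stands in for the parsed YAML so the example needs no third-party parser. The actual DSL and compilers are not shown in this text.

```python
# Stand-in for a parsed YAML probe definition (hypothetical shape).
probe = {
    "id": "probe_negative_quantity",
    "version": 1,
    "source": "gold_movements",
    "select": ["material_id", "quantity"],
    "flag_when": "quantity < 0",
}


def compile_probe(definition: dict) -> str:
    """Compile a declarative probe definition into a SQL SELECT statement."""
    columns = ", ".join(definition["select"])
    return (
        f"-- generated from {definition['id']} v{definition['version']}; do not edit\n"
        f"SELECT {columns}\n"
        f"FROM {definition['source']}\n"
        f"WHERE {definition['flag_when']}"
    )


sql = compile_probe(probe)
print(sql)
```

The definition is what gets reviewed and versioned. The SQL is regenerated on every build, which is why the generated header says not to edit it.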

A materials manager can read the YAML and understand what a diagnostic checks
without knowing SQL. An engineer can read the SQL without knowing logistics.

Don’t extend the language until a second instance validates the pattern.
Premature abstraction is worse than duplication.

Three roles for AI. One absolute boundary.

P11: AI as Subject Matter Expert
AI captures domain knowledge —
translates expertise into structured,
validated, version-controlled artifacts.

The knowledge encoder.

P12: AI as Programming Colleague
The codebase is AI-native, not AI-assisted.
CLAUDE.md is simultaneously a
human-readable spec and an AI-executable
instruction set.

Specification engineering.

P13: No AI in the Data Path
Zero LLM calls in production.

Every query is deterministic SQL.
Reproducible. Auditable. Explainable.
No probabilistic inference in the pipeline.

The absolute boundary.

AI is brilliant at encoding knowledge.
But once knowledge is encoded,
you don’t need AI to execute it.

You need deterministic, reproducible, auditable computation.
SQL does that. Language models don’t.

AI writes the YAML. The pipeline runs the SQL. Forever.

Boring Where It Matters.
Sharp Where It Counts.

Every technology choice serves a principle.

Each choice earns its place

DuckDB
The Knowledge Store
Embedded. No server. No credentials. A single file you can hand to someone. The universal contract between make, explore, and evolve.
dbt
The Transformation Layer
SQL models, tests, lineage. The medallion pipeline is dbt models. Compilers generate dbt-compatible SQL. The ecosystem does the heavy lifting.
YAML
The Knowledge Format
Human-readable. Version-controllable. Diffable. A domain expert can review a probe definition. A compiler validates it. Both work from the same file.
Python
The Engine
CLI, compilers, validators. Same library locally and in Lambda. Nothing exotic — argparse, pathlib, json, yaml. The boring parts are boring on purpose.
SvelteKit
The Explorer
Fast, light, server-rendered. Opens DuckDB read-only. No framework overhead in the way of the data. Desktop and web from the same codebase.
Claude
The AI Backbone
Domain expertise on demand. Writes YAML, writes code, never runs in production. The knowledge encoder that never enters the data path.

The file is the database.
The database is the contract.

  • Portable. A single file. Copy it, email it, put it on S3, hand it to an auditor. No server, no connection string, no credentials.
  • Ephemeral. Delete it and rebuild from source — same result. The database is a deterministic function of artifacts + data.
  • Snapshottable. Freeze the working copy as an immutable file. Self-describing: git commit, build timestamp, tenant metadata stamped inside.
  • Swappable. DuckDB today. Snowflake or BigQuery tomorrow. The Explorer queries through an adapter. The Gold schema is standard SQL.
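The "queries through an adapter" idea might be sketched like this in Python. `GoldAdapter`, `SqliteAdapter`, `count_rows`, and the table name are invented for the example, and `sqlite3` is used purely as a stand-in engine because it ships with Python. It is not the engine the text names.

```python
import sqlite3
from typing import Protocol


class GoldAdapter(Protocol):
    """Anything that can run standard SQL against the Gold schema."""

    def query(self, sql: str) -> list[tuple]: ...


class SqliteAdapter:
    """Stand-in engine for the sketch; swap in another engine behind query()."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def query(self, sql: str) -> list[tuple]:
        return self._conn.execute(sql).fetchall()


def count_rows(adapter: GoldAdapter, table: str) -> int:
    # Explorer-side code depends on the adapter, not on the engine.
    return adapter.query(f"SELECT COUNT(*) FROM {table}")[0][0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gold_movements (material_id TEXT, quantity INTEGER)")
conn.execute("INSERT INTO gold_movements VALUES ('M-1', 5)")
total = count_rows(SqliteAdapter(conn), "gold_movements")
```

Because `count_rows` only sees `query()`, replacing the engine means writing one new adapter class and leaving the callers alone.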

Three things that matter enormously

Compile-Time Validation
Reference a field that doesn’t exist?
The compile fails.

Not the dashboard. Not the query at 2 AM.
Not the report the CFO reads on Monday.

Fail loud. Fail early. Fail here.
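A minimal sketch of failing at compile time, in Python. `KNOWN_FIELDS`, `CompileError`, and the definition shape are assumptions for this example; the text only says that a reference to a missing field makes the compile fail.

```python
class CompileError(Exception):
    """Raised when a definition references something the contract lacks."""


# Hypothetical contract: the fields each source exposes.
KNOWN_FIELDS = {"gold_movements": {"material_id", "quantity", "cost_center"}}


def validate_references(definition: dict) -> None:
    """Reject unknown sources and unknown fields before any SQL is generated."""
    available = KNOWN_FIELDS.get(definition["source"])
    if available is None:
        raise CompileError(f"{definition['id']}: unknown source {definition['source']!r}")
    unknown = [f for f in definition["fields"] if f not in available]
    if unknown:
        raise CompileError(f"{definition['id']}: unknown fields {unknown}")


# A definition with a typo in a field name.
bad_probe = {
    "id": "probe_typo",
    "source": "gold_movements",
    "fields": ["material_id", "quantty"],
}
try:
    validate_references(bad_probe)
except CompileError as err:
    message = str(err)
```

The typo is caught while the definition is being compiled, so no SQL containing it is ever built.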

Uniformity
Six compilers, same pattern.
ID matches filename. Version. Scope.
Multilingual text. Registry table.
--check for dry-run.

Understand one layer, understand all.
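Two of the shared conventions, sketched with `argparse` and `pathlib`. The program name `compile-probes`, the positional `paths` argument, and the helper name are hypothetical. Only the `--check` flag and the "ID matches filename" rule come from the text.

```python
import argparse
from pathlib import Path


def id_matches_filename(path: Path, definition_id: str) -> bool:
    """The definition's ID must equal the file name without its extension."""
    return path.stem == definition_id


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="compile-probes")  # hypothetical name
    parser.add_argument("paths", nargs="*", type=Path)
    parser.add_argument(
        "--check", action="store_true", help="validate only; write nothing"
    )
    return parser


args = build_parser().parse_args(["--check", "probes/probe_negative_quantity.yaml"])
ok = id_matches_filename(args.paths[0], "probe_negative_quantity")
```

If every compiler exposes the same flag and the same rule, what you learn about one carries over to the other five.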

Knowledge / Execution Split
YAML captures the “what” and “why.”
SQL captures the “how.”

A domain expert reads the YAML.
An engineer reads the SQL.
Neither needs the other’s domain.

Two audiences, one source of truth.

Where the Ideas Come From

The metaphors and convictions behind the architecture.

The system is a diagnostic lab.
Not a surgeon.

Probes: diagnostic tests
Assessments: health scores
Hypotheses: medical questions
Diagnoses: root causes
Treatment: human decision

Just as a doctor doesn’t start with a diagnosis,
jinflow doesn’t start with dashboards and hope the data is good enough.

It starts from the opposite end: build a rigorous, validated dataset first.
Only then diagnose.

Most analytics projects start with a dashboard
and work backwards.

They connect to a data source, build some charts, and hope the data is good enough.
When it isn’t — when numbers don’t add up, when a filter produces unexpected results,
when year-over-year breaks because someone renamed a cost centre —
they patch. Another filter. Another workaround.
Another dashboard that nobody trusts.

jinflow starts from the opposite end.
Build the data right. Validate every row. Flag every problem.
Make quality visible. Only then diagnose.

Tests check that code runs.
Calibration checks that code finds.

The Method
Synthetic tenants carry deliberately injected defects at known rates.

Revenue leakage at 10%. Missing mandatory items at 15%. Duplicates at 2%. Timing anomalies. Orphan records.

These aren’t bugs. They’re the signal the system is designed to detect.

The Measurement
A calibration harness computes precision and recall against the answer key.

“The probe catches 87% of planted cases.”

That’s a testable, reproducible number. Not a marketing claim. You can re-run it. You can challenge it. You can watch it improve.

The gap between synthetic and real is itself a finding.
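The measurement itself is simple arithmetic. Here is a small Python sketch, assuming the answer key and the probe output are both sets of row IDs. The IDs and the function name are made up for the example; the harness's real interface is not described here.

```python
def precision_recall(flagged: set[str], planted: set[str]) -> tuple[float, float]:
    """Compare what a probe flagged against the defects that were planted."""
    true_positives = len(flagged & planted)
    # Precision: of everything flagged, how much was really planted?
    precision = true_positives / len(flagged) if flagged else 0.0
    # Recall: of everything planted, how much was found?
    recall = true_positives / len(planted) if planted else 0.0
    return precision, recall


planted = {"r1", "r2", "r3", "r4"}  # answer key from the synthetic tenant
flagged = {"r1", "r2", "r3", "r9"}  # rows the probe reported
precision, recall = precision_recall(flagged, planted)
```

Here the probe finds three of the four planted rows and raises one false alarm, so precision and recall are both 0.75.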

The difference between data
and knowledge is the why.

jinflow has a concept called SMEbits — atomic pieces of expert knowledge.
Not derived from data. Contributed by people.

A process workaround that only the floor manager knows.
A system behavior that only the IT admin remembers.
A historical event that explains why this year’s numbers look different.

Each SMEbit carries attribution (who said it),
scope (where it applies), and lifecycle (is it still true?).

The validator warns when the “why” field is empty —
some things genuinely don’t have a known reason.
But the nudge ensures the question is always asked.
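What an SMEbit record and its nudge could look like, as a Python sketch. The field names, the `SMEbit` class layout, the sample content, and the warning text are all assumptions. The text specifies only that each SMEbit carries attribution, scope, lifecycle, and a “why” that may be empty.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SMEbit:
    statement: str
    author: str        # attribution: who said it
    scope: str         # where it applies
    still_valid: bool  # lifecycle: is it still true?
    why: Optional[str] = None


def validate(bit: SMEbit) -> list[str]:
    """Return warnings, not errors: an empty why is allowed but questioned."""
    warnings = []
    if not (bit.why or "").strip():
        warnings.append("why is empty: is the reason genuinely unknown?")
    return warnings


# Hypothetical example content.
bit = SMEbit(
    statement="Returns are booked a month late at this site.",
    author="floor manager",
    scope="one site",
    still_valid=True,
)
warnings = validate(bit)
```

The validator accepts the record and returns one warning, so an empty why is always questioned but never blocks a contribution.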

The spec is the product.

The project specification, CLAUDE.md,
is simultaneously a human-readable architectural document
and an AI-executable instruction set.

When Claude reads it, it understands the architecture,
the naming conventions, the layer responsibilities, the compilation patterns.
It can write a new diagnostic, add an entity, or build an Explorer page
that follows all conventions — because the conventions are documented
in a form both humans and AI can parse.

This is not prompt engineering. It’s specification engineering.
Precise enough that either a human or an AI produces correct,
convention-following code from the same document.

Design Choices That Compound

These aren’t compromises. They’re compounding investments.

A system that earns trust

  • Every finding is traceable — from the probe, through Gold, through Silver validation, all the way to the raw source row. No black box in the chain.
  • Every quality metric is visible — not hidden in log files. Data quality is a dbt model, not a footnote.
  • Every expert insight is attributed and versioned — who said it, when, for what scope, and is it still true?
  • Every diagnostic is calibrated against known defects — precision and recall, not marketing claims.
  • Every domain deployment enriches the engine — new patterns, better DSL, cross-domain learning.

All seventeen principles

Architecture & Pipeline
1. Layered architecture
2. Source-system agnosticism
3. Contract-first
4. No silent filtering
5. Tenant isolation
6. Rebuildable by default
7. Processing ≠ exploration

Compilation & Knowledge
8. Metadata-driven
9. YAML-to-SQL compilation
10. Human-in-the-loop
11. AI as subject matter expert
12. AI as programming colleague
13. No AI in the data path

Product & Quality
14. Privacy by design
15. Tri-lingual from day one
16. Pragmatic generalisation
17. Calibrated, not just tested

80% generic engine. 20% domain-specific. Keep it that way.

Benevolence.
Clarity.
Flow.

The engine doesn’t know what your domain is. That’s the point.
It knows how to diagnose. What it diagnoses is a matter
of configuration, not architecture.

The architecture is settled. The domains are open.

jinflow.io