jinflow.io

Build systems that
respect their users.

Principles. Technology. Roots.

The Name

jinflow is inspired by jin, the virtue of benevolence,
together with clarity and flow.

Not a marketing exercise. A design constraint.
Every architectural decision is tested against these three qualities.

Three Virtues

The qualities that shape every decision

Benevolence

The system serves its users, not itself.

Human-in-the-loop by default. The system diagnoses; humans decide and act. Automation is additive, never mandatory.

Respect for the person at the other end.

Clarity

Nothing is hidden. Everything is traceable.

No silent filtering. Contracts at every boundary. Quality is visible, queryable, first-class.

The system earns trust by showing its work.

Flow

Data flows through layers. Knowledge flows from people into artifacts. Understanding flows back.

Make → Explore → Evolve. The cycle never ends.

The system gets better because the artifacts get better.


The Principles

Seventeen Commitments

Not aspirational. They describe what is built, not what's planned.

The Character

Three principles define
the character of the system

No Silent Filtering

Invalid rows are flagged, never dropped.
Every row gets is_valid and invalid_reason.

Only at the Gold layer do invalid rows disappear from views — and they remain queryable in Silver.

Data quality is not a prerequisite for analysis. It is analysis.
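A minimal sketch of the flag-don't-drop pattern follows. It uses Python's built-in sqlite3 as a stand-in for DuckDB so it runs anywhere. The table names, the two validity rules, and the 0/1 encoding of is_valid are assumptions for illustration. Only the is_valid and invalid_reason columns come from the description above.

```python
import sqlite3

# sqlite3 stands in for DuckDB here; table and column names are hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE bronze_orders (order_id TEXT, cost_center TEXT, amount REAL);
INSERT INTO bronze_orders VALUES
    ('o1', 'CC10', 120.0),
    ('o2', NULL,    80.0),
    ('o3', 'CC20',  -5.0);

-- Silver: every Bronze row survives; bad rows are flagged with a reason.
CREATE VIEW silver_orders AS
SELECT *,
       CASE WHEN cost_center IS NULL OR amount < 0 THEN 0 ELSE 1 END AS is_valid,
       CASE
           WHEN cost_center IS NULL THEN 'missing cost_center'
           WHEN amount < 0 THEN 'negative amount'
       END AS invalid_reason
FROM bronze_orders;

-- Gold: the only layer where invalid rows drop out of view.
CREATE VIEW gold_orders AS
SELECT order_id, cost_center, amount
FROM silver_orders
WHERE is_valid = 1;
""")

silver = con.execute(
    "SELECT order_id, is_valid, invalid_reason FROM silver_orders ORDER BY order_id"
).fetchall()
gold = con.execute("SELECT order_id FROM gold_orders ORDER BY order_id").fetchall()

print(silver)  # [('o1', 1, None), ('o2', 0, 'missing cost_center'), ('o3', 0, 'negative amount')]
print(gold)    # [('o1',)]
```

Both flagged rows remain queryable in silver_orders even though gold_orders hides them.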

Contract-First

Versioned JSON schemas define every boundary.
The Gold layer is not “clean data.” It’s a published API.

Change it, and consumers break loudly.
Not silently, not next Monday, not in a dashboard nobody checks.

Schema changes are a deliberate, versioned act.
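The contract check can be sketched as follows. The contract shape here is a simplified, hypothetical stand-in for the versioned JSON schemas: a table name, a version string, and a column-to-type map. The names contract_drift and enforce_contract are illustrative, not part of jinflow.

```python
import json

# A simplified, hypothetical contract for one Gold table. The real contracts
# are versioned JSON schemas; this shape only illustrates the idea.
CONTRACT = json.loads("""
{
  "table": "gold_orders",
  "version": "1.2.0",
  "columns": {"order_id": "VARCHAR", "cost_center": "VARCHAR", "amount": "DOUBLE"}
}
""")


class ContractViolation(Exception):
    """Raised when a published Gold table drifts from its contract."""


def contract_drift(contract: dict, actual: dict) -> list[str]:
    """List every difference between the contract and the table as built."""
    expected = contract["columns"]
    problems = [f"missing column {c}" for c in sorted(set(expected) - set(actual))]
    problems += [f"unexpected column {c}" for c in sorted(set(actual) - set(expected))]
    problems += [
        f"type of {c} changed: {expected[c]} -> {actual[c]}"
        for c in sorted(set(expected) & set(actual))
        if expected[c] != actual[c]
    ]
    return problems


def enforce_contract(contract: dict, actual: dict) -> None:
    # Any drift, even an added column, fails the build: schema changes
    # are supposed to arrive as a new contract version, not by accident.
    problems = contract_drift(contract, actual)
    if problems:
        raise ContractViolation(
            f"{contract['table']} v{contract['version']}: " + "; ".join(problems)
        )


# Someone renamed a column without publishing a new contract version.
built = {"order_id": "VARCHAR", "cost_centre": "VARCHAR", "amount": "DOUBLE"}
print(contract_drift(CONTRACT, built))
# ['missing column cost_center', 'unexpected column cost_centre']
```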

Human-in-the-Loop

The system is the diagnostic lab, not the surgeon.

It identifies what’s wrong. It scores severity.
It explains root causes. It recommends.

But the treatment requires floor knowledge that lives in people’s heads.

Automation is additive, never mandatory.

Architecture

How the system is structured

1. Layered architecture. Bronze is structure. Silver is domain truth. Gold is consumption. Layer responsibilities must not leak.
2. Source-system agnosticism. Dispatch once, at Bronze. Then forget where the data came from.
5. Tenant isolation. Schema-per-tenant. Data never mingles. Structural isolation, not just a WHERE clause.
6. Rebuildable by default. The analytical database is ephemeral. Delete it and rebuild: same result.
7. Processing and exploration are independent. They share a contract, not a runtime. No lock contention, no shared state.
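Principle 2 can be illustrated with a small dispatch table. One adapter per source system maps raw records into a single Bronze shape, and nothing downstream branches on the source again. The source systems, field names, and the to_bronze function are hypothetical.

```python
from typing import Callable


# Hypothetical raw exports: two source systems, two vocabularies.
def from_system_a(raw: dict) -> dict:
    return {"order_id": raw["ORDER_NO"], "cost_center": raw["CC"], "amount": float(raw["AMT"])}


def from_system_b(raw: dict) -> dict:
    # System B reports amounts in cents.
    return {"order_id": raw["id"], "cost_center": raw["costCenter"], "amount": raw["amountCents"] / 100}


# The only place in the pipeline that knows which system the data came from.
BRONZE_ADAPTERS: dict[str, Callable[[dict], dict]] = {
    "system_a": from_system_a,
    "system_b": from_system_b,
}


def to_bronze(source_system: str, raw_rows: list[dict]) -> list[dict]:
    """Dispatch once on the source system; every later layer sees one shape."""
    adapter = BRONZE_ADAPTERS[source_system]
    return [adapter(row) for row in raw_rows]


rows = (
    to_bronze("system_a", [{"ORDER_NO": "o1", "CC": "CC10", "AMT": "120.5"}])
    + to_bronze("system_b", [{"id": "o2", "costCenter": "CC20", "amountCents": 8050}])
)
print(rows)
```

Adding a third source system means adding one adapter and one dictionary entry. Silver and Gold stay untouched.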

The YAML Philosophy

Declare once. Compile forever.

8. Metadata-driven, not hardcoded. The system knows how to render a dimension. It doesn’t know what a “cost center” is until the data tells it.
9. YAML-to-SQL compilation. Define in YAML, validate against a contract, compile to SQL, build with dbt. The YAML is the source of truth. The SQL is a build artifact.

A materials manager can read the YAML and understand what a diagnostic checks
without knowing SQL. An engineer can read the SQL without knowing logistics.
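Here is a sketch of what compiling one definition might look like, to make principle 9 concrete. The probe fields, the single-condition format, and compile_probe are assumptions. The input is shown as the dict that yaml.safe_load would return, so the example needs no third-party packages. The output uses dbt's ref() syntax to match the claim that compilers emit dbt-compatible SQL.

```python
# A hypothetical probe definition, written as the dict that yaml.safe_load
# would return for its YAML file. Field names are illustrative only.
PROBE = {
    "id": "negative_amounts",
    "version": 1,
    "entity": "silver_orders",
    "condition": {"column": "amount", "op": "<", "value": 0},
    "why": "Negative amounts usually mean a return was booked as a new order.",
}

ALLOWED_OPS = {"<", "<=", "=", "<>", ">=", ">"}


def compile_probe(probe: dict) -> str:
    """Compile a probe definition into a dbt-style SQL model (a build artifact)."""
    cond = probe["condition"]
    if cond["op"] not in ALLOWED_OPS:
        raise ValueError(f"unsupported operator {cond['op']!r}")
    if not isinstance(cond["value"], (int, float)):
        raise TypeError("this sketch only compiles numeric literals")
    entity = probe["entity"]
    return (
        f"-- generated from probe {probe['id']} v{probe['version']}; do not edit\n"
        f"SELECT '{probe['id']}' AS probe_id, *\n"
        f"FROM {{{{ ref('{entity}') }}}}\n"
        f"WHERE {cond['column']} {cond['op']} {cond['value']!r}"
    )


print(compile_probe(PROBE))
```

The why field never reaches the SQL. It stays in the YAML for the domain expert, while the generated model carries only the how.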

Don’t extend the language until a second instance validates the pattern.
Premature abstraction is worse than duplication.

AI Philosophy

Three roles for AI. One absolute boundary.

P11: AI as Subject Matter Expert

AI captures domain knowledge —
translates expertise into structured,
validated, version-controlled artifacts.

The knowledge encoder.

P12: AI as Programming Colleague

The codebase is AI-native, not AI-assisted.
CLAUDE.md is simultaneously a
human-readable spec and an AI-executable
instruction set.

Specification engineering.

P13: No AI in the Data Path

Zero LLM calls in production.

Every query is deterministic SQL.
Reproducible. Auditable. Explainable.
No probabilistic inference in the pipeline.

The absolute boundary.

The Deeper Point

AI is brilliant at encoding knowledge.
But once knowledge is encoded,
you don’t need AI to execute it.

You need deterministic, reproducible, auditable computation.
SQL does that. Language models don’t.

AI writes the YAML. The pipeline runs the SQL. Forever.


The Technology

Boring Where It Matters.
Sharp Where It Counts.

Every technology choice serves a principle.

The Stack

Each choice earns its place

DuckDB

The Knowledge Store

Embedded. No server. No credentials. A single file you can hand to someone. The universal contract between make, explore, and evolve.

dbt

The Transformation Layer

SQL models, tests, lineage. The medallion pipeline is dbt models. Compilers generate dbt-compatible SQL. The ecosystem does the heavy lifting.

YAML

The Knowledge Format

Human-readable. Version-controllable. Diffable. A domain expert can review a probe definition. A compiler validates it. Both work from the same file.

Python

The Engine

CLI, compilers, validators. Same library locally and in Lambda. Nothing exotic — argparse, pathlib, json, yaml. The boring parts are boring on purpose.

SvelteKit

The Explorer

Fast, light, server-rendered. Opens DuckDB read-only. No framework overhead in the way of the data. Desktop and web from the same codebase.

Claude

The AI Backbone

Domain expertise on demand. Writes YAML, writes code, never runs in production. The knowledge encoder that never enters the data path.

Why DuckDB

The file is the database.
The database is the contract.

Portable. A single file. Copy it, email it, put it on S3, hand it to an auditor. No server, no connection string, no credentials.
Ephemeral. Delete it and rebuild from source — same result. The database is a deterministic function of artifacts + data.
Snapshottable. Freeze the working copy as an immutable file. Self-describing: git commit, build timestamp, tenant metadata stamped inside.
Swappable. DuckDB today. Snowflake or BigQuery tomorrow. The Explorer queries through an adapter. The Gold schema is standard SQL.
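The rebuild and snapshot properties can be checked mechanically. The sketch below uses sqlite3 in place of DuckDB. It hashes a full SQL dump to show that two rebuilds from the same inputs match, and it writes provenance into a metadata table. The data, the model SQL, and the _snapshot_meta table are hypothetical.

```python
import hashlib
import sqlite3

# sqlite3 stands in for DuckDB; the data and model SQL are hypothetical.
SOURCE_ROWS = [("o2", "CC20", 80.5), ("o1", "CC10", 120.0), ("o3", "CC10", -5.0)]
MODEL_SQL = (
    "CREATE TABLE gold_orders AS "
    "SELECT * FROM raw WHERE amount >= 0 ORDER BY order_id"
)


def build(rows, model_sql: str) -> sqlite3.Connection:
    """Rebuild the analytical database from nothing but artifacts + data."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE raw (order_id TEXT, cost_center TEXT, amount REAL)")
    con.executemany("INSERT INTO raw VALUES (?, ?, ?)", rows)
    con.execute(model_sql)
    return con


def fingerprint(con: sqlite3.Connection) -> str:
    """Hash the full SQL dump (schema plus rows) of a database."""
    return hashlib.sha256("\n".join(con.iterdump()).encode()).hexdigest()


def stamp(con: sqlite3.Connection, git_commit: str, built_at: str, tenant: str) -> None:
    """Make a snapshot self-describing by writing its provenance inside it."""
    con.execute("CREATE TABLE _snapshot_meta (git_commit TEXT, built_at TEXT, tenant TEXT)")
    con.execute("INSERT INTO _snapshot_meta VALUES (?, ?, ?)", (git_commit, built_at, tenant))
    con.commit()


# Throw the database away and rebuild it: same inputs, same fingerprint.
first = fingerprint(build(SOURCE_ROWS, MODEL_SQL))
second = fingerprint(build(SOURCE_ROWS, MODEL_SQL))
print(first == second)  # True
```

The fingerprint is taken before stamping, since the build timestamp would otherwise make every snapshot unique. Saving the stamped database with sqlite3's Connection.backup would produce the frozen file. With DuckDB, the database already is a single file.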

Why YAML → SQL

Three things that matter enormously

Compile-Time Validation

Reference a field that doesn’t exist?
The compile fails.

Not the dashboard. Not the query at 2 AM.
Not the report the CFO reads on Monday.

Fail loud. Fail early. Fail here.
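A compile-time field check might look like the following sketch. KNOWN_COLUMNS, validation_errors, and check_or_fail are hypothetical names. The point is that a misspelled field is rejected before any SQL exists.

```python
# Hypothetical registry of columns per entity, derived from the Silver contracts.
KNOWN_COLUMNS = {
    "silver_orders": {"order_id", "cost_center", "amount", "is_valid", "invalid_reason"},
}


def validation_errors(probe: dict) -> list[str]:
    """Return the problems found; an empty list means the probe may compile."""
    entity = probe["entity"]
    if entity not in KNOWN_COLUMNS:
        return [f"{probe['id']}: unknown entity {entity!r}"]
    column = probe["condition"]["column"]
    if column not in KNOWN_COLUMNS[entity]:
        return [f"{probe['id']}: unknown field {column!r} on {entity}"]
    return []


def check_or_fail(probe: dict) -> None:
    """Stop the build before any SQL is generated if validation fails."""
    errors = validation_errors(probe)
    if errors:
        raise SystemExit("compile failed:\n  " + "\n  ".join(errors))


typo = {
    "id": "negative_amounts",
    "entity": "silver_orders",
    "condition": {"column": "ammount", "op": "<", "value": 0},
}
print(validation_errors(typo))
# ["negative_amounts: unknown field 'ammount' on silver_orders"]
```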

Uniformity

Six compilers, same pattern.
ID matches filename. Version. Scope.
Multilingual text. Registry table.
--check for dry-run.

Understand one layer, understand all.
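The shared pattern lends itself to one set of header checks reused by every compiler, plus a common command-line switch. This sketch assumes field names (id, version, scope, title) and language codes that the text does not specify. Only the id-matches-filename rule and the --check dry run come from the list above. Writing to the registry table is omitted.

```python
import argparse
from pathlib import Path

# Hypothetical language codes; the text only says "tri-lingual".
LANGUAGES = ("en", "de", "fr")


def header_errors(defn: dict, path: Path) -> list[str]:
    """Checks shared by every compiler, whatever kind of artifact it handles."""
    errors = []
    if defn.get("id") != path.stem:
        errors.append(f"id {defn.get('id')!r} does not match filename {path.name!r}")
    if not isinstance(defn.get("version"), int):
        errors.append("version must be an integer")
    if not defn.get("scope"):
        errors.append("scope is required")
    title = defn.get("title") or {}
    missing = [lang for lang in LANGUAGES if not title.get(lang)]
    if missing:
        errors.append(f"title missing languages: {missing}")
    return errors


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Compile YAML definitions to SQL.")
    parser.add_argument("paths", nargs="+", type=Path)
    parser.add_argument("--check", action="store_true",
                        help="validate only and write no SQL (dry run)")
    return parser


args = build_parser().parse_args(["--check", "probes/negative_amounts.yml"])
defn = {"id": "negative_amounts", "version": 1, "scope": "tenant",
        "title": {"en": "Negative amounts"}}
print(args.check, header_errors(defn, args.paths[0]))
# True ["title missing languages: ['de', 'fr']"]
```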

Knowledge / Execution Split

YAML captures the “what” and “why.”
SQL captures the “how.”

A domain expert reads the YAML.
An engineer reads the SQL.
Neither needs the other’s domain.

Two audiences, one source of truth.


The Roots

Where the Ideas Come From

The metaphors and convictions behind the architecture.

The Diagnostic Metaphor

The system is a diagnostic lab.
Not a surgeon.

Probes (diagnostic tests) → Assessments (health scores) → Hypotheses (medical questions) → Diagnoses (root causes) → Treatment (human decision)

Just as a doctor doesn’t start with a diagnosis,
jinflow doesn’t start with dashboards and hope the data is good enough.

It starts from the opposite end: build a rigorous, validated dataset first.
Only then diagnose.

Against Silent Loss

Most analytics projects start with a dashboard
and work backwards.

They connect to a data source, build some charts, and hope the data is good enough.
When it isn’t — when numbers don’t add up, when a filter produces unexpected results,
when year-over-year breaks because someone renamed a cost centre —
they patch. Another filter. Another workaround.
Another dashboard that nobody trusts.

jinflow starts from the opposite end.
Build the data right. Validate every row. Flag every problem.
Make quality visible. Only then diagnose.

Calibrated, Not Just Tested

Tests check that code runs.
Calibration checks that code finds.

The Method

Synthetic tenants carry deliberately injected defects at known rates.

Revenue leakage at 10%. Missing mandatory items at 15%. Duplicates at 2%. Timing anomalies. Orphan records.

These aren’t bugs. They’re the signal the system is designed to detect.
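Planting defects with an answer key can be sketched as below. The case shape, the three defect types, and make_synthetic_tenant are assumptions for illustration. What matters is the seeded generator and the record of exactly which cases were corrupted.

```python
import random

# Hypothetical defect rates echoing the examples above.
DEFECT_RATES = {"revenue_leakage": 0.10, "missing_mandatory_item": 0.15, "duplicate": 0.02}


def make_synthetic_tenant(n_cases: int, seed: int = 42):
    """Generate clean cases, plant defects at known rates, keep the answer key."""
    rng = random.Random(seed)  # seeded: every run plants exactly the same defects
    rows = []
    answer_key = {name: set() for name in DEFECT_RATES}
    for i in range(n_cases):
        case = {"case_id": f"c{i:05d}", "used": 100.0, "billed": 100.0, "items": ["A", "B"]}
        if rng.random() < DEFECT_RATES["revenue_leakage"]:
            case["billed"] = 0.0  # material used but never billed
            answer_key["revenue_leakage"].add(case["case_id"])
        if rng.random() < DEFECT_RATES["missing_mandatory_item"]:
            case["items"] = ["A"]  # mandatory item "B" left out
            answer_key["missing_mandatory_item"].add(case["case_id"])
        rows.append(case)
        if rng.random() < DEFECT_RATES["duplicate"]:
            rows.append(dict(case, items=list(case["items"])))  # exact duplicate record
            answer_key["duplicate"].add(case["case_id"])
    return rows, answer_key


rows, answer_key = make_synthetic_tenant(10_000)
for name, planted in answer_key.items():
    print(name, len(planted) / 10_000)  # close to, not exactly, the target rate
```

Each case draws its defects independently, so realised rates land near the targets rather than exactly on them. Sampling a fixed number of cases with rng.sample would hit the rates exactly, if that matters for a given calibration run.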

The Measurement

A calibration harness computes precision and recall against the answer key.

“The probe catches 87% of planted cases.”

That’s a testable, reproducible number. Not a marketing claim. You can re-run it. You can challenge it. You can watch it improve.

The gap between results on synthetic data and results on real data is itself a finding.
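Precision and recall against the answer key reduce to set arithmetic. The calibrate function and the made-up IDs below are illustrative, not jinflow's harness.

```python
def calibrate(flagged: set[str], planted: set[str]) -> dict[str, float]:
    """Score one probe's findings against the answer key for its defect type."""
    true_positives = len(flagged & planted)
    # Precision: how many findings were real. Recall: how many planted cases were found.
    # Empty sets are scored 0.0 here by convention (the ratios are undefined).
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(planted) if planted else 0.0
    return {"precision": precision, "recall": recall}


# Made-up IDs: 100 planted cases, the probe catches 87 of them plus 3 false alarms.
planted = {f"c{i:05d}" for i in range(100)}
flagged = {f"c{i:05d}" for i in range(87)} | {"x1", "x2", "x3"}
scores = calibrate(flagged, planted)
print(scores["recall"])     # 0.87
print(scores["precision"])  # 87 of 90 findings are real
```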

The “Why” Field

The difference between data
and knowledge is the why.

jinflow has a concept called SMEbits — atomic pieces of expert knowledge.
Not derived from data. Contributed by people.

A process workaround that only the floor manager knows.
A system behavior that only the IT admin remembers.
A historical event that explains why this year’s numbers look different.

Each SMEbit carries attribution (who said it),
scope (where it applies), and lifecycle (is it still true?).

The validator warns, rather than fails, when the “why” field is empty,
because some things genuinely don’t have a known reason.
But the nudge ensures the question is always asked.
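An SMEbit and its validator can be sketched with a dataclass. The field names and the error rules for missing attribution or scope are assumptions. The one behavior taken from the text is that an empty why produces a warning rather than an error.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SMEbit:
    """One atomic piece of expert knowledge (field names are illustrative)."""
    id: str
    statement: str
    contributed_by: str                 # attribution: who said it
    scope: str                          # where it applies
    valid_from: date                    # lifecycle: since when it holds
    valid_until: Optional[date] = None  # None means "still believed true"
    why: str = ""


def validate_smebit(bit: SMEbit, today: date) -> tuple[list[str], list[str]]:
    """Return (errors, warnings). Only errors should block a build."""
    errors, warnings = [], []
    if not bit.contributed_by:
        errors.append("attribution is required")
    if not bit.scope:
        errors.append("scope is required")
    if bit.valid_until is not None and bit.valid_until < today:
        warnings.append(f"expired on {bit.valid_until.isoformat()}; is it still true?")
    if not bit.why.strip():
        # A warning, not an error: some things genuinely have no known reason.
        warnings.append("'why' is empty; ask the contributor for the reason")
    return errors, warnings


bit = SMEbit(
    id="returns_as_negative_orders",
    statement="Returns are booked as new orders with negative amounts.",
    contributed_by="IT admin",
    scope="tenant_a",
    valid_from=date(2023, 1, 1),
)
print(validate_smebit(bit, today=date(2024, 6, 1)))
# ([], ["'why' is empty; ask the contributor for the reason"])
```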

Specification Engineering

The spec is the product.

The project specification — CLAUDE.md —
is simultaneously a human-readable architectural document
and an AI-executable instruction set.

When Claude reads it, it understands the architecture,
the naming conventions, the layer responsibilities, the compilation patterns.
It can write a new diagnostic, add an entity, or build an Explorer page
that follows all conventions — because the conventions are documented
in a form both humans and AI can parse.

This is not prompt engineering. It’s specification engineering.
Precise enough that either a human or an AI produces correct,
convention-following code from the same document.


Why It Matters

Design Choices That Compound

These aren’t compromises. They’re compounding investments.

Compounding

A system that earns trust

Every finding is traceable — from the probe, through Gold, through Silver validation, all the way to the raw source row. No black box in the chain.
Every quality metric is visible — not hidden in log files. Data quality is a dbt model, not a footnote.
Every expert insight is attributed and versioned — who said it, when, for what scope, and is it still true?
Every diagnostic is calibrated against known defects — precision and recall, not marketing claims.
Every domain deployment enriches the engine — new patterns, better DSL, cross-domain learning.

The Full Set

All seventeen principles

Architecture & Pipeline

1. Layered architecture
2. Source-system agnosticism
3. Contract-first
4. No silent filtering
5. Tenant isolation
6. Rebuildable by default
7. Processing ≠ exploration

Compilation & Knowledge

8. Metadata-driven
9. YAML-to-SQL compilation
10. Human-in-the-loop
11. AI as subject matter expert
12. AI as programming colleague
13. No AI in the data path

Product & Quality

14. Privacy by design
15. Tri-lingual from day one
16. Pragmatic generalisation
17. Calibrated, not just tested

80% generic engine. 20% domain-specific. Keep it that way.


Benevolence.
Clarity.
Flow.

The engine doesn’t know what your domain is. That’s the point.
It knows how to diagnose. What it diagnoses is a matter
of configuration, not architecture.

The architecture is settled. The domains are open.
