Build systems that
respect their users.
Principles. Technology. Roots.
benevolence, clarity, and flow.
Not a marketing exercise. A design constraint.
Every architectural decision is tested against these three qualities.
The qualities that shape every decision
Human-in-the-loop by default. The system diagnoses; humans decide and act. Automation is additive, never mandatory.
Respect for the person at the other end.
No silent filtering. Contracts at every boundary. Quality is visible, queryable, first-class.
The system earns trust by showing its work.
Make → Explore → Evolve. The cycle never ends.
The system gets better because the artifacts get better.
Seventeen Commitments
Not aspirational. They describe what is built, not what's planned.
Three principles define
the character of the system
Every row gets
is_valid and invalid_reason.Only at the Gold layer do invalid rows disappear from views — and they remain queryable in Silver.
Data quality is not a prerequisite for analysis. It is analysis.
The Gold layer is not “clean data.” It’s a published API.
Change it, and consumers break loudly.
Not silently, not next Monday, not in a dashboard nobody checks.
Schema changes are a deliberate, versioned act.
It identifies what’s wrong. It scores severity.
It explains root causes. It recommends.
But the treatment requires floor knowledge that lives in people’s heads.
Automation is additive, never mandatory.
How the system is structured
- 1Layered architecture. Bronze is structure. Silver is domain truth. Gold is consumption. Layer responsibilities must not leak.
- 2Source-system agnosticism. Dispatch once, at Bronze. Then forget where the data came from.
- 5Tenant isolation. Schema-per-tenant. Data never mingles. Structural isolation, not just a WHERE clause.
- 6Rebuildable by default. The analytical database is ephemeral. Delete it and rebuild — same result.
- 7Processing and exploration are independent. They share a contract, not a runtime. No lock contention, no shared state.
Declare once. Compile forever.
- 8Metadata-driven, not hardcoded. The system knows how to render a dimension. It doesn’t know what a “cost center” is until the data tells it.
- 9YAML-to-SQL compilation. Define in YAML, validate against a contract, compile to SQL, build with dbt. The YAML is the source of truth. The SQL is a build artifact.
A materials manager can read the YAML and understand what a diagnostic checks
without knowing SQL. An engineer can read the SQL without knowing logistics.
Don’t extend the language until a second instance validates the pattern.
Premature abstraction is worse than duplication.
Three roles for AI. One absolute boundary.
translates expertise into structured,
validated, version-controlled artifacts.
The knowledge encoder.
CLAUDE.md is simultaneously ahuman-readable spec and an AI-executable
instruction set.
Specification engineering.
Every query is deterministic SQL.
Reproducible. Auditable. Explainable.
No probabilistic inference in the pipeline.
The absolute boundary.
But once knowledge is encoded,
you don’t need AI to execute it.
You need deterministic, reproducible, auditable computation.
SQL does that. Language models don’t.
AI writes the YAML. The pipeline runs the SQL. Forever.
Boring Where It Matters.
Sharp Where It Counts.
Every technology choice serves a principle.
Each choice earns its place
The file is the database.
The database is the contract.
- Portable. A single file. Copy it, email it, put it on S3, hand it to an auditor. No server, no connection string, no credentials.
- Ephemeral. Delete it and rebuild from source — same result. The database is a deterministic function of artifacts + data.
- Snapshottable. Freeze the working copy as an immutable file. Self-describing: git commit, build timestamp, tenant metadata stamped inside.
- Swappable. DuckDB today. Snowflake or BigQuery tomorrow. The Explorer queries through an adapter. The Gold schema is standard SQL.
Three things that matter enormously
The compile fails.
Not the dashboard. Not the query at 2 AM.
Not the report the CFO reads on Monday.
Fail loud. Fail early. Fail here.
ID matches filename. Version. Scope.
Multilingual text. Registry table.
--check for dry-run.Understand one layer, understand all.
SQL captures the “how.”
A domain expert reads the YAML.
An engineer reads the SQL.
Neither needs the other’s domain.
Two audiences, one source of truth.
Where the Ideas Come From
The metaphors and convictions behind the architecture.
The system is a diagnostic lab.
Not a surgeon.
diagnostic tests
health scores
medical questions
root causes
human decision
Just as a doctor doesn’t start with a diagnosis,
jinflow doesn’t start with dashboards and hope the data is good enough.
It starts from the opposite end: build a rigorous, validated dataset first.
Only then diagnose.
and work backwards.
They connect to a data source, build some charts, and hope the data is good enough.
When it isn’t — when numbers don’t add up, when a filter produces unexpected results,
when year-over-year breaks because someone renamed a cost centre —
they patch. Another filter. Another workaround.
Another dashboard that nobody trusts.
jinflow starts from the opposite end.
Build the data right. Validate every row. Flag every problem.
Make quality visible. Only then diagnose.
Tests check that code runs.
Calibration checks that code finds.
Revenue leakage at 10%. Missing mandatory items at 15%. Duplicates at 2%. Timing anomalies. Orphan records.
These aren’t bugs. They’re the signal the system is designed to detect.
“The probe catches 87% of planted cases.”
That’s a testable, reproducible number. Not a marketing claim. You can re-run it. You can challenge it. You can watch it improve.
The gap between synthetic and real is itself a finding.
The difference between data
and knowledge is the why.
jinflow has a concept called SMEbits — atomic pieces of expert knowledge.
Not derived from data. Contributed by people.
A process workaround that only the floor manager knows.
A system behavior that only the IT admin remembers.
A historical event that explains why this year’s numbers look different.
Each SMEbit carries attribution (who said it),
scope (where it applies), and lifecycle (is it still true?).
The validator warns when the “why” field is empty —
some things genuinely don’t have a known reason.
But the nudge ensures the question is always asked.
The spec is the product.
The project specification — CLAUDE.md —
is simultaneously a human-readable architectural document
and an AI-executable instruction set.
When Claude reads it, it understands the architecture,
the naming conventions, the layer responsibilities, the compilation patterns.
It can write a new diagnostic, add an entity, or build an Explorer page
that follows all conventions — because the conventions are documented
in a form both humans and AI can parse.
This is not prompt engineering. It’s specification engineering.
Precise enough that either a human or an AI produces correct,
convention-following code from the same document.
Design Choices That Compound
These aren’t compromises. They’re compounding investments.
A system that earns trust
- Every finding is traceable — from the probe, through Gold, through Silver validation, all the way to the raw source row. No black box in the chain.
- Every quality metric is visible — not hidden in log files. Data quality is a dbt model, not a footnote.
- Every expert insight is attributed and versioned — who said it, when, for what scope, and is it still true?
- Every diagnostic is calibrated against known defects — precision and recall, not marketing claims.
- Every domain deployment enriches the engine — new patterns, better DSL, cross-domain learning.
All seventeen principles
2. Source-system agnosticism
3. Contract-first
4. No silent filtering
5. Tenant isolation
6. Rebuildable by default
7. Processing ≠ exploration
9. YAML-to-SQL compilation
10. Human-in-the-loop
11. AI as subject matter expert
12. AI as programming colleague
13. No AI in the data path
15. Tri-lingual from day one
16. Pragmatic generalisation
17. Calibrated, not just tested
80% generic engine. 20% domain-specific. Keep it that way.
Benevolence.
Clarity.
Flow.
The engine doesn’t know what your domain is. That’s the point.
It knows how to diagnose. What it diagnoses is a matter
of configuration, not architecture.
The architecture is settled. The domains are open.