002 · Selected work · Digital banking · 2022–2023
An active customer,
defined once.
Fourteen source systems, fourteen definitions of the same word, and an OSFI reviewer asking the same opening question every cycle: where does this number come from. We wrote a metric contract, landed bronze / silver / gold under audit lineage, and put one answer on every desk.
Duration: ~5 months (Build phase of a 14-month engagement) · Team: 4 · Programs: Build · Advance · Year: 2022–2023
Brief
The ask
A Canadian digital bank under OSFI supervision asked us to stand up one analytics layer the executive team could read together — product, finance, risk, and compliance all reconciling against the same row. The constraint was the regulator. Every transformation had to be auditable, versioned, and explainable to a reviewer who would never see the SQL. Ad-hoc query writing at the modeled layer was disallowed by the time we arrived.
Starting state: fourteen source systems — core banking, mobile-app telemetry, KYC, AML scoring, marketing automation, two CRM instances, a support tool, NPS, a rates engine, the card ledger, and three operational CSV drops. Sixty-plus analysts on read-replicas of operational systems, each reporting a different number for the same word. A quarterly board deck that shipped late and got reconciled by footnote. The Build phase ran ~5 months inside a 14-month engagement.
Problem
The real diagnosis
The reported symptom was clean: "we can't agree on how many active customers we have." The COO had asked the question in three consecutive ops reviews and received three different numbers. Marketing counted email-engaged in the rolling 30. Product counted weekly-app-launch. Core banking counted any settled transaction. KYC counted verified-and-funded. Compliance counted something else again. Pairwise, the definitions diverged by 18–34% depending on the business unit and the week.
The real diagnosis was not measurement — it was governance. Every team owned its own definition because nobody had ever been paid to write the one. Audit lineage was missing from anything that ended up on a slide. The opening question of every OSFI examination was the same: where does this number come from. The bank could not answer it from a report. A forensic crawl through three systems by an analyst with tribal knowledge counted as a control — until the next analyst hire.
Quantify it. Fourteen production systems feeding reports. Sixty-plus analysts writing ad-hoc SQL against operational read-replicas at a cadence that meant the same business question produced different answers on different desks the same morning. The quarterly board deck shipped reconciled by methodology footnote. The Head of Analytics ran a Friday standing call to reconcile two analyst numbers that should have been the same number. That call existed for eighteen months before we were brought in.
Approach
Five moves, in order
Move 01
Sign the metric contract. One definition, on one page.
Three weeks of interviews across product, marketing, core banking, KYC, AML, and compliance produced a one-page contract. Active customer was defined once: primary-product holder, at least one customer-initiated event in the trailing 30 days, opted-in to operational contact, not in a frozen or under-review state. The CFO signed it. The Chief Compliance Officer signed it. Every downstream model carried a comment tying the column it produced back to the contract clause it implemented.
Move 02
Land bronze. Fivetran for SaaS, custom CDC for the core, audit lineage everywhere.
Snowflake as the warehouse. Fivetran for the eleven SaaS sources. A custom CDC pipeline ran in-VPC for the core banking platform and the card ledger — regulated data, could not leave the network boundary. Bronze was strictly load-as-is, no transforms, PII tagged at ingest with classification inherited from the source contract. Datadog watched freshness against the source SLAs and paged before the analyst noticed.
┌────────────────────────────────────────────────┐
│ 14 SOURCE SYSTEMS │
│ core_banking · kyc · aml · mobile_app · crm_a │
│ crm_b · marketing_automation · support · nps │
│ rates_engine · card_ledger · csv_drops (x3) │
└─────────────────────┬──────────────────────────┘
│ Fivetran (SaaS)
│ Custom CDC (regulated, in-VPC)
▼
┌────────────────────────────────────────────────┐
│ BRONZE raw_<source>.<table> │
│ load-as-is · PII-tagged · freshness-monitored │
└─────────────────────┬──────────────────────────┘
│ dbt: staging models
▼
┌────────────────────────────────────────────────┐
│ SILVER stg_<entity> │
│ typed · de-duplicated · contract-conformed │
│ one row per business-key · lineage-tagged │
└─────────────────────┬──────────────────────────┘
│ dbt: dimensional models
▼
┌────────────────────────────────────────────────┐
│ GOLD dim_customer (SCD-2) │
│ fct_journey_event │
│ mart_weekly_funnel │
│ one canonical definition per metric · tested │
└────────────────────────────────────────────────┘Move 03
Build dim_customer as SCD-2, identity-resolved across fourteen sources.
One canonical customer dimension, slowly-changing on every attribute that affected the funnel — product holdings, KYC state, AML risk band, opt-in posture, frozen flag. Identity resolution joined the fourteen sources by a deterministic key set (core banking party-ID first, then a hashed-SIN fallback, then a phone+email composite for the long tail). The is_active flag is computed once, from the contract, on this table. Nothing downstream redefines it.
-- models/marts/dim_customer.sql
-- Slowly-changing dimension, Type 2, one row per (party_id, valid_from).
-- is_active follows the signed metric contract:
-- primary-product holder + customer-initiated event in trailing 30d
-- + opted-in + not in frozen/under-review state.
-- Lineage tag: METRIC-CONTRACT-2022-Q3 · clause 3.1 (active customer).
{{ config(
materialized = 'incremental',
unique_key = 'customer_sk',
on_schema_change = 'fail',
tags = ['gold', 'lineage:metric-contract-2022-q3']
) }}
with resolved as (
select
coalesce(cb.party_id, ph.party_id, pc.party_id) as party_id,
coalesce(cb.legal_name_hash, ph.sin_hash) as identity_hash,
cb.primary_product_code,
cb.primary_product_opened_at,
ks.kyc_state,
ks.kyc_verified_at,
am.aml_risk_band,
cm.opt_in_operational,
cb.account_status, -- 'active' / 'frozen' / 'under_review'
ev.last_customer_event_at, -- max across journey-affecting events
current_timestamp::timestamp_ntz as snapshot_at
from {{ ref('stg_core_banking__party') }} cb
left join {{ ref('stg_kyc__verification_state') }} ks
on ks.party_id = cb.party_id
left join {{ ref('stg_aml__risk_band') }} am
on am.party_id = cb.party_id
left join {{ ref('stg_crm__contact_marketing') }} cm
on cm.party_id = cb.party_id
left join {{ ref('stg_journey__last_event') }} ev
on ev.party_id = cb.party_id
-- fallback joins for the long tail (no core-banking party row yet)
left join {{ ref('stg_kyc__pending_handoff') }} ph
on ph.sin_hash = cb.legal_name_hash
left join {{ ref('stg_crm__phone_email_composite') }} pc
on pc.contact_hash = cb.legal_name_hash
),
scored as (
select
r.*,
case
when r.primary_product_code is not null
and r.last_customer_event_at >= dateadd(day, -30, r.snapshot_at)
and r.opt_in_operational = true
and r.account_status = 'active'
then true
else false
end as is_active
from resolved r
),
hashed as (
select
hash(party_id, snapshot_at) as customer_sk,
party_id,
identity_hash,
primary_product_code,
primary_product_opened_at,
kyc_state,
kyc_verified_at,
aml_risk_band,
opt_in_operational,
account_status,
last_customer_event_at,
is_active,
snapshot_at as valid_from,
cast(null as timestamp_ntz) as valid_to,
true as is_current
from scored
)
{% if is_incremental() %}
-- close out the previous current row when any tracked attribute changes
merge into {{ this }} tgt
using (
select h.*
from hashed h
join {{ this }} prev
on prev.party_id = h.party_id
and prev.is_current = true
where prev.primary_product_code is distinct from h.primary_product_code
or prev.kyc_state is distinct from h.kyc_state
or prev.aml_risk_band is distinct from h.aml_risk_band
or prev.account_status is distinct from h.account_status
or prev.opt_in_operational is distinct from h.opt_in_operational
or prev.is_active is distinct from h.is_active
) chg
on tgt.party_id = chg.party_id
and tgt.is_current = true
when matched then update set
valid_to = chg.valid_from,
is_current = false;
{% endif %}
select * from hashedMove 04
Capture every customer-affecting event in fct_journey_event.
One fact table, one row per event, stage-attributed at write time against the four contracted stages — onboarded, activated, engaged, retained. Source-system column on every row. Audit-lineage tag on every column tying it back to the metric clause it implemented. Datadog watched freshness; a missed source SLA paged the analytics engineering on-call before Monday's funnel rendered. The fact-table grain stayed event-level so any new stage definition could be replayed without a backfill.
Move 05
Decommission read-replica access. Move the sixty onto the modeled layer.
We did not flip a switch. Product analytics migrated first, then marketing, then risk, then operations. Each team got a one-week pairing engagement: their three highest-volume queries rewritten against gold, plus office hours. Read-replica grants on operational systems were revoked team by team, with the analyst's manager in the meeting. By the end of the Build phase, no analyst was querying operational systems directly. The weekly executive funnel — six panels, one PDF, Monday 07:00 — replaced the quarterly board deck with the CFO's blessing.
Outcome
60 analysts.
14 sources.
1 customer.
One definition, written and signed. Sixty analysts reading the same tables. Fourteen production systems collapsed into a single governed layer whose lineage survives a regulator's opening question — without anyone leaving the room to look something up.
14 → 1
Source systems collapsed into one governed customer-journey warehouse with audit lineage tagged to a signed metric contract.
60 → 0
Analysts writing ad-hoc SQL against operational read-replicas. All sixty moved onto tested dbt models at the modeled layer.
Quarterly → weekly
Executive funnel cadence. The deck nobody read was retired; the Monday funnel is the report the COO opens first.
1 definition
Of active customer — primary product, last 30 days, opted-in. CFO-signed, CCO-signed, used end to end.
What this enabled, beyond the headline. The first OSFI review after migration opened on a lineage screen, not a forensic crawl — the reviewer's opening question was answered by clicking a column. Compliance reviews of new marketing campaigns moved from days to hours because the contract was queryable. The product team began reading A/B results against the weekly funnel inside the same week, replacing a six-to-eight-week analyst cycle. The Friday reconciliation call the Head of Analytics had been running for eighteen months was cancelled and not rescheduled.
Credits
Who shipped this
- Lead Analyst
- Haris Naeem
- Analytics Engineering
- Contractor — anonymized
- Data Engineering
- Internal team — anonymized
- Client Stakeholder
- Head of Analytics — anonymized
- Client Stakeholder
- VP Product — anonymized
- Client Sponsor
- Chief Operating Officer — anonymized
- Governance Review
- Compliance team — anonymized
Client-side names withheld under a multi-year confidentiality agreement. Named references available on request after a discovery call.
Next
Building a governed warehouse under regulator supervision?
Send a note describing the source systems you're unifying, the regulator your team answers to, and where your current definitions diverge across business units. If it's a fit, a named reference from this engagement is available after a discovery call.