Case Studies

Work That Speaks for Itself

Real problems, real solutions — anonymized stories from the front lines of adtech data engineering.

When 30% of Ad Conversions Vanished

Attribution · Data Forensics · Airflow · BigQuery

A major streaming platform noticed a growing gap between what their DSP partner reported and what their internal attribution system counted. By the time they called me, the discrepancy had grown to 30% — meaning nearly a third of conversions were invisible to their optimization team.

The root cause turned out to be a timezone mismatch in the event ingestion layer. Conversion events arriving near midnight UTC were being assigned to the wrong attribution window, causing them to fall outside the lookback period and get silently dropped. The fix was surgical — a timezone normalization step in the Airflow DAG — but finding it required tracing events across three systems and two cloud providers.
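The bug class above can be illustrated with a minimal sketch (hypothetical function and window length, not the client's actual DAG code): if conversion timestamps aren't normalized to UTC before comparison, an event logged late in the evening local time can appear to fall inside a lookback window it has actually already left.

```python
from datetime import datetime, timedelta, timezone

LOOKBACK = timedelta(days=7)  # hypothetical attribution lookback window

def in_attribution_window(conversion_ts: datetime, click_ts: datetime) -> bool:
    """Attribute a conversion only if it falls within the lookback
    window of the originating click, comparing both timestamps in UTC."""
    # Normalize to UTC; naive timestamps are assumed to already be UTC.
    conv = (conversion_ts.astimezone(timezone.utc)
            if conversion_ts.tzinfo else conversion_ts.replace(tzinfo=timezone.utc))
    click = (click_ts.astimezone(timezone.utc)
             if click_ts.tzinfo else click_ts.replace(tzinfo=timezone.utc))
    return timedelta(0) <= conv - click <= LOOKBACK

# A conversion logged at 10:30 local (UTC-5) on day 7 is 15:30 UTC,
# i.e. 3.5 hours past a window that closed at 12:00 UTC. Comparing
# the naive local time against the UTC click would wrongly keep it.
est = timezone(timedelta(hours=-5))
click = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)
late_conv = datetime(2024, 3, 8, 10, 30, tzinfo=est)   # outside the window
ok_conv = datetime(2024, 3, 5, 12, 0, tzinfo=timezone.utc)  # inside the window
```

The production fix was the same idea applied once, upstream in the ingestion DAG, so every downstream consumer saw consistent timestamps.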

Result: Full conversion visibility restored within one week. The client estimated the fix recovered $2.4M in annual optimization value that had been invisible to their bidding algorithms.

Building an IP Intelligence Engine

Identity Resolution · Fraud Detection · Spark · Snowflake

An adtech startup needed to enrich bid requests with IP-derived intelligence — geolocation, connection type, datacenter flags, and proxy detection — at a rate of 500K+ requests per second. Their existing solution was a vendor black box with a six-figure annual cost and no transparency into methodology.

I built a custom IP enrichment pipeline using MaxMind, proprietary datacenter lists, and a Spark-based scoring engine that classified IPs by fraud risk. The system processed raw IP logs nightly, maintained a continuously updated lookup table in Snowflake, and served enrichment via a low-latency Redis cache for real-time bidding decisions.
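The core classification step can be sketched in a few lines. This is a simplified stand-in, not the production Spark job: the CIDR ranges, proxy list, and additive score weights here are hypothetical placeholders for the curated datacenter lists and multi-signal scoring the real engine used.

```python
import ipaddress

# Hypothetical datacenter CIDR blocks; in production these came from
# curated cloud-provider and hosting-ASN lists, refreshed nightly.
DATACENTER_RANGES = [ipaddress.ip_network(c) for c in ("3.0.0.0/9", "34.64.0.0/10")]
KNOWN_PROXIES = {"203.0.113.7"}  # hypothetical proxy blocklist entry

def score_ip(ip: str) -> dict:
    """Classify one IP and return an auditable set of fraud signals."""
    addr = ipaddress.ip_address(ip)
    is_dc = any(addr in net for net in DATACENTER_RANGES)
    is_proxy = ip in KNOWN_PROXIES
    # Simple additive risk score; the real engine weighted many more signals.
    risk = 0.6 * is_dc + 0.4 * is_proxy
    return {"ip": ip, "datacenter": is_dc, "proxy": is_proxy, "risk": round(risk, 2)}
```

Because every signal is computed from inspectable inputs, any risk score can be traced back to the exact list or rule that produced it, which is the auditability the vendor black box never offered.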

Result: 40% reduction in IVT (invalid traffic) spend, vendor cost eliminated entirely, and full auditability of every fraud signal — something the previous black-box solution never offered.

The 12% Gap

CTV · Revenue Reconciliation · dbt · GCP

A streaming media company was leaving money on the table — their internal ad revenue reports consistently showed 12% less than what their sell-side partners claimed to have paid. Finance couldn't close the books, and nobody could explain where the gap lived.

I built a reconciliation pipeline in dbt that joined internal impression logs (from GAM and proprietary ad servers) against partner settlement reports, matching on a composite key of placement ID, creative ID, timestamp, and device ID. The 12% gap broke down into three distinct causes: a VAST parsing bug that miscounted companion ads, a timezone issue in the SSAI logs, and a deduplication rule that was too aggressive on CTV device IDs.
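The matching logic can be sketched outside dbt in a few lines of Python (hypothetical field names and an hour-truncated timestamp, standing in for the actual dbt model): count how many impressions match on the composite key and surface the one-sided orphans on each side.

```python
from collections import Counter

def composite_key(row: dict) -> tuple:
    # Match on placement, creative, hour-truncated ISO timestamp, and device.
    return (row["placement_id"], row["creative_id"], row["ts"][:13], row["device_id"])

def reconcile(internal: list, partner: list) -> dict:
    """Return impressions matched on both sides plus one-sided orphans."""
    ours = Counter(map(composite_key, internal))
    theirs = Counter(map(composite_key, partner))
    matched = sum((ours & theirs).values())  # per-key minimum of the two counts
    return {"matched": matched,
            "internal_only": sum((ours - theirs).values()),
            "partner_only": sum((theirs - ours).values())}
```

Bucketing the orphans by dimension (ad server, hour, device type) is what isolated the three root causes: each bug left a distinctive fingerprint in one of the one-sided piles.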

Result: Revenue gap closed from 12% to under 0.5%. The reconciliation pipeline now runs daily and automatically flags discrepancies above threshold, giving finance confidence to close books on time.

Beyond Correlation

Experimentation · Incrementality · Python · BigQuery

A performance marketing agency was spending $50M/year across channels but had no way to answer the most basic question: "Would these conversions have happened anyway?" Their attribution model gave credit to the last touch, which meant retargeting always looked like a hero — even when it was just taking credit for organic intent.

I designed and built an incrementality testing framework: geo-based holdout experiments with automated power analysis, a pipeline that ingested conversion data from six platforms, and a Bayesian inference engine that estimated true incremental lift per channel. The system ran continuously, rotating test and control regions to maintain statistical validity.
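The lift-estimation step can be sketched with a deliberately simplified model: Beta(1,1) priors on the test- and control-region conversion rates and Monte Carlo draws of the relative lift. The real engine was richer (hierarchical structure across geos, rotating regions), so treat this as an illustration of the Bayesian idea, not the production model.

```python
import random

def incremental_lift(test_conv: int, test_n: int,
                     ctrl_conv: int, ctrl_n: int,
                     draws: int = 10_000, seed: int = 7) -> dict:
    """Posterior median and 90% credible interval of relative lift,
    (test_rate - ctrl_rate) / ctrl_rate, under Beta(1,1) priors."""
    rng = random.Random(seed)
    lifts = []
    for _ in range(draws):
        t = rng.betavariate(1 + test_conv, 1 + test_n - test_conv)
        c = rng.betavariate(1 + ctrl_conv, 1 + ctrl_n - ctrl_conv)
        lifts.append((t - c) / c)
    lifts.sort()
    return {"median": lifts[draws // 2],
            "ci90": (lifts[int(0.05 * draws)], lifts[int(0.95 * draws)])}

# Example: 6% conversion in test geos vs 5% in holdout geos
result = incremental_lift(test_conv=600, test_n=10_000, ctrl_conv=500, ctrl_n=10_000)
```

A channel "delivering near-zero incremental lift" is exactly the case where this posterior concentrates around zero even though last-touch attribution credits it heavily.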

Result: The agency discovered that one channel consuming 22% of budget was delivering near-zero incremental lift. Reallocating that spend to proven channels drove a 15% improvement in overall ROAS within two quarters.

Stitching Together the Invisible Viewer

CTV · Identity Graph · Kafka · Snowflake

A CTV advertising platform had an identity problem: viewers on Roku, Fire TV, Apple TV, and mobile all generated different device IDs, and without a unified identity layer, reach and frequency reporting was wildly inaccurate. Advertisers were unknowingly bombarding the same households with 15+ impressions while missing others entirely.

I built a household-level identity graph that ingested device signals from Kafka streams, applied deterministic matching on IP + user-agent patterns, and used probabilistic scoring for edge cases. The graph updated in near real-time and fed directly into their frequency capping and reach estimation systems. Privacy was baked in — all PII was hashed at ingestion, and the system supported configurable TTLs for compliance.
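The deterministic core of the approach can be sketched as follows. This is a minimal, hypothetical version: a salted hash of the IP stands in for the household key (so raw PII never persists downstream), and device IDs sharing that key are grouped into one household. The production system layered probabilistic scoring and TTL expiry on top.

```python
import hashlib
from collections import defaultdict

def household_key(ip: str, salt: str = "daily-salt") -> str:
    # Hash the IP at ingestion so the raw address never leaves this function.
    return hashlib.sha256(f"{salt}|{ip}".encode()).hexdigest()[:16]

def build_graph(events: list) -> dict:
    """Group device IDs under a hashed household key (deterministic
    IP-based matching; edge cases got probabilistic scoring instead)."""
    graph = defaultdict(set)
    for ev in events:
        graph[household_key(ev["ip"])].add(ev["device_id"])
    return graph

events = [
    {"ip": "192.0.2.10", "device_id": "roku-1"},
    {"ip": "192.0.2.10", "device_id": "mobile-9"},
    {"ip": "198.51.100.2", "device_id": "firetv-7"},
]
graph = build_graph(events)
```

With Roku and mobile devices resolved to one household, the frequency capper can count their impressions together instead of treating them as two separate viewers.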

Result: Reach accuracy improved from 62% to 91%. Frequency capping finally worked across devices, reducing wasted impressions by 34% and improving advertiser satisfaction scores to their highest level in two years.

Your Ad Spend Deserves Data You Can Trust

Let's figure out where your pipeline is leaking value — and fix it.