Netflix’s Time-Series Caching 🗄️, Airflow 3.2 Released 🚀, Meta’s Pipeline Context 🗺️

Stop Answering the Same Question Twice: Interval-Aware Caching for Druid at Netflix Scale (10 minute read)

Netflix built a caching layer in front of Apache Druid so that repeated time-series queries are not recomputed. The layer intercepts queries at the Druid Router, parses the query structure, and stores results in fine-grained time buckets in a Cassandra-backed cache. For overlapping windows, it serves cached data for settled intervals and fetches only the missing recent tail from Druid. Exponential TTLs and gap-aware merging balance freshness against cache hit rate.
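To make the idea concrete, here is a minimal Python sketch of interval-aware caching under stated assumptions. It is not Netflix's implementation: the names `IntervalCache`, `ttl_for_age`, and `fetch` are hypothetical, an in-memory dict stands in for the Cassandra-backed store, and the 60-second bucket width, 5-second base TTL, and 1-hour cap are illustrative values chosen for the example. The gap policy shown (serve the contiguous cached prefix, then fetch everything from the first missing or expired bucket onward in one backend call) is one simple way to merge around gaps.

```python
from typing import Callable, Dict, Tuple

BUCKET_SECONDS = 60  # assumed bucket width; illustrative only


def ttl_for_age(age_seconds: float, base_ttl: float = 5.0, max_ttl: float = 3600.0) -> float:
    """Exponential TTL: doubles per whole minute of data age, capped at max_ttl.

    The base, the doubling step, and the cap are assumptions for this sketch.
    """
    minutes = max(0, int(age_seconds // 60))
    # Clamp the exponent so the intermediate value stays small.
    return min(max_ttl, base_ttl * (2 ** min(minutes, 32)))


class IntervalCache:
    """Caches one value per time bucket and fetches only the missing tail."""

    def __init__(self, fetch: Callable[[int, int], Dict[int, float]],
                 clock: Callable[[], float]) -> None:
        self._fetch = fetch   # hypothetical backend: fetch(start, end) -> {bucket_start: value}
        self._clock = clock   # injected so tests can control time
        self._store: Dict[int, Tuple[float, float]] = {}  # bucket_start -> (value, expires_at)

    def query(self, start: int, end: int) -> Dict[int, float]:
        """Return per-bucket values for [start, end); start must be bucket-aligned."""
        now = self._clock()
        result: Dict[int, float] = {}
        first_miss = None
        for bucket in range(start, end, BUCKET_SECONDS):
            entry = self._store.get(bucket)
            if entry is None or entry[1] <= now:
                first_miss = bucket  # gap found: stop serving from cache here
                break
            result[bucket] = entry[0]
        if first_miss is not None:
            # One backend call covers the gap and everything after it.
            for bucket, value in self._fetch(first_miss, end).items():
                age = max(0.0, now - bucket)
                self._store[bucket] = (value, now + ttl_for_age(age))
                result[bucket] = value
        return result
```

With this policy, older buckets stay cached for longer while the most recent ones expire within seconds, so a dashboard that refreshes a rolling window re-queries only the newest slice of data.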

TLDR Data 2026-04-09

Deep Dives

Stop Answering the Same Question Twice: Interval-Aware Caching for Druid at Netflix Scale (10 minute read)

How Meta Used AI to Map Tribal Knowledge in Large-Scale Data Pipelines (8 minute read)

Proxy-Pointer RAG: Achieving Vectorless Accuracy at Vector RAG Scale and Cost (23 minute read)

Semantic Layer vs. Text-to-SQL: 2026 Benchmark Update (11 minute read)

Opinions & Advice

Is Data Visualization dead? (4 minute read)

SQL Superpowers: Your Streaming Delta Lake Pipeline Has Been Quietly Falling Apart (5 minute read)

Launches & Tools

❌ Stop doing SysAdmin. ✅ Start doing Data Science (Sponsor)

Apache Airflow 3.2.0: Data-Aware Workflows at Scale (6 minute read)

When Every Bit Counts: How Valkey Rebuilt Its Hashtable for Modern Hardware (37 minute video)

Introducing Metrics SQL: A SQL-based semantic layer for humans and agents (8 minute read)

Miscellaneous

Dagster vs Airflow 3 (Reddit Thread)

Simplest hash functions (11 minute read)

Quick Links

PhysioNet featured by MIT Jameel Clinic (3 minute read)

How we built a real-world evaluation platform for autonomous SRE agents at scale (15 minute read)

Curated deep dives, tools and trends in big data, data science and data engineering 📊