TLDR Data 2026-05-14
DuckDB Goes Remote 🦆, When Lakehouses Guess, Netflix Tames Data Governance 🎬
Data Projects: Managing Data Assets at Netflix Scale (6 minute read)
Netflix introduced Data Projects to replace brittle ACLs and human-owned workflow identities across millions of tables and thousands of jobs. Projects group tables, workflows, secrets, and assets under durable team-owned app identities, with scoped roles and tokens to reduce permission churn.
When 36,000 Tiny Files Break Your Spark Pipeline: A Deep Dive into S3 DNS Exhaustion and the Small File Problem (9 minute read)
Thousands of tiny Parquet files on S3 can break Spark reads with UnknownHostException even when networking is healthy: the flood of S3 LIST/GET calls and per-file metadata handling on the driver and tasks overwhelms DNS resolution. Spark partition tuning can help stabilize reads, but the real fix is compaction and table formats like Delta Lake or Iceberg.
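To see why compaction helps, here is a minimal planning sketch (the function name and greedy strategy are illustrative, not from the article): packing 36,000 one-megabyte files into ~128 MB output groups cuts the object count, and thus the LIST/GET and DNS pressure, by two orders of magnitude. Real pipelines would then rewrite each group as one Parquet file via Spark or a table format's rewrite/OPTIMIZE action.

```python
def plan_compaction(file_sizes, target_bytes=128 * 1024 * 1024):
    """Greedily pack small files into groups of roughly target_bytes each."""
    groups, current, current_size = [], [], 0
    for size in file_sizes:
        # Flush the current group before it would exceed the target size.
        if current and current_size + size > target_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups

# 36,000 files of 1 MiB each collapse into a few hundred ~128 MiB outputs.
groups = plan_compaction([1024 * 1024] * 36000)
```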
Why your AI agent has amnesia and why forgetting is the fix (16 minute read)
Enterprise AI agents fail in long workflows because they reset, lose context, and rely on bloated prompts or flat vector search. Microsoft's memory architecture uses consolidation, forgetting, and delayed maturation to keep high-value events, reaching 97.2% retention precision and stabilizing around 400 to 500 memories.
Migrating Data Ingestion Systems at Meta Scale (8 minute read)
Meta migrated its massive data ingestion system from legacy customer-owned pipelines to a simpler self-managed service using a phased Shadow → Reverse Shadow → Cleanup lifecycle, row count and checksum checks, automated promotion tooling, custom debugging infrastructure, and rollback mechanisms to prevent bad CDC data propagation.
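The row-count-and-checksum idea can be sketched in a few lines (this is a simplified illustration, not Meta's implementation): fingerprint each pipeline's output with a count plus an order-independent hash, and promote the shadow only when both match. Note the XOR makes the check order-independent but blind to duplicate-row pairs; a production check would be stricter.

```python
import hashlib

def table_fingerprint(rows):
    """Row count plus an order-independent XOR of per-row hashes."""
    acc = 0
    for row in rows:
        digest = hashlib.sha256(repr(row).encode("utf-8")).digest()
        acc ^= int.from_bytes(digest[:8], "big")
    return len(rows), acc

def shadow_matches(legacy_rows, shadow_rows):
    """True when both pipelines produced the same rows, in any order."""
    return table_fingerprint(legacy_rows) == table_fingerprint(shadow_rows)
```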
We need to talk about dbt (5 minute read)
dbt's growth has created tension between its practitioner-led roots and enterprise ambitions. dbt must better protect community trust, improve dbt Core, strengthen integrations, fix developer ergonomics, and make dbt Cloud feel like a real IDE. The risk is not adoption, but alienating the users who made dbt valuable.
April 2026 PDC State of Data Modeling Survey Results Are In! (9 minute read)
A 334-response April 2026 pulse survey shows data modeling pain is overwhelmingly organizational, not tooling: 28.1% want training, 24.6% clearer requirements, 21.6% more time, 21.0% dedicated ownership, and only 4.8% better tools. Modeling is often owned by whoever builds pipelines (42.5%), while only 19.2% have a dedicated modeler or architect, and 68.3% refactor only occasionally or rarely. Teams with enforced standards are about 5x more likely to say their models hold up.
Lakehouse statistics and why query engines get lost (6 minute read)
Lakehouse query engines often struggle because the statistical metadata they need to plan queries, skip irrelevant data, size joins, and handle skew is optional, inconsistent, or missing across formats like Iceberg, Delta Lake, and Parquet. Without reliable stats, engines are forced to guess, leading to bad query plans, wasted reads, higher costs, memory issues, and slow or failed queries.
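The core mechanism is min/max pruning, sketched below under assumed per-file stats (the dict layout is hypothetical): with stats, the planner proves a file cannot match the predicate and skips it; with stats missing, it has no choice but to read the file, which is exactly the wasted-read failure mode the article describes.

```python
def prune_files(files, lo, hi):
    """Keep files whose [min, max] stats overlap the predicate range [lo, hi].

    Files with missing stats must be read anyway, since the planner
    cannot prove they are irrelevant.
    """
    kept = []
    for f in files:
        stats = f.get("stats")
        if stats is None or (stats["min"] <= hi and stats["max"] >= lo):
            kept.append(f)
    return kept

files = [
    {"path": "a.parquet", "stats": {"min": 0, "max": 9}},
    {"path": "b.parquet", "stats": {"min": 10, "max": 19}},
    {"path": "c.parquet"},  # stats never written: always scanned
]
kept = prune_files(files, 12, 15)  # a.parquet is provably irrelevant
```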
Can Kafka Queues Make Consumers Faster? Part 2: Head-Of-Line Blocking (4 minute read)
Kafka Queues (Share Groups) shine when consumer processing involves delays or external I/O that causes Head-Of-Line Blocking. By allowing more consumer instances than partitions, share groups enable linear scaling of throughput (tested up to 8x with 32 instances) with no noticeable per-instance overhead, making them very effective for I/O-bound workloads.
Quack: The DuckDB Client-Server Protocol (12 minute read)
Quack is a new client-server protocol that lets separate DuckDB instances communicate over HTTP instead of only running in-process. It uses a request/response model with custom application/duckdb serialization, default token-based auth, localhost binding, and no SSL by default for local use, while supporting remote connections through standard HTTP infrastructure.
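Based only on the facts in the summary (HTTP transport, token-based auth, an application/duckdb content type, localhost with no SSL by default), a client request might be shaped roughly like the sketch below. The endpoint path, port, and header layout are guesses for illustration; the actual Quack wire format uses its own serialization, not raw SQL text.

```python
from urllib.request import Request

def build_query_request(sql, token, host="localhost", port=8000):
    """Assemble a Quack-style HTTP request (shape guessed for illustration)."""
    return Request(
        url=f"http://{host}:{port}/query",       # hypothetical endpoint path
        data=sql.encode("utf-8"),                # real protocol: custom serialization
        headers={
            "Authorization": f"Bearer {token}",  # hypothetical token-auth header
            "Content-Type": "application/duckdb",
        },
        method="POST",
    )
```

Because it is plain HTTP, such requests can traverse standard proxies and load balancers, which is what makes the remote-connection story work.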
Strong views on PostgreSQL VIEWs (19 minute read)
Views are just stored rewrite rules (macros) that get expanded at query time. They behave like tables for simple cases, but create hidden complexity through nested spirals, fragile dependencies on attribute numbers, painful schema changes, and limited writability, often leading to the classic advice: "use them, but don't treat them like tables."
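The "macro, not table" point is easy to demonstrate. The article is about PostgreSQL; SQLite is used below only because it ships with Python and views behave the same way for this point: a view is a stored query re-run at execution time, not a snapshot of data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.execute("CREATE VIEW big AS SELECT x FROM t WHERE x > 1")

# Rows inserted *after* the view was defined still appear, because the
# view's query is expanded against the current table each time it runs.
con.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])
rows = [r[0] for r in con.execute("SELECT x FROM big ORDER BY x")]
```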
Agentic search models (3 minute read)
Agentic search models are emerging to orchestrate the full retrieval workflow, replacing today's brittle stack of embeddings, rerankers, query classifiers, and BM25 with thinner backend primitives. Unlike frontier LLMs that handle the "80% case," models trained specifically for search can encode domain-specific intent and the "last 20%" of retrieval nuances, improving relevance in narrow contexts like e-commerce or job search. Early examples such as SID-1 and Waldo emphasize smaller size and lower latency.
Stop Starting Data Projects (9 minute read)
Many data projects fail not because of technical issues, but because engineers jump straight into building without properly understanding the stakeholders' real needs and processes. Instead, start by asking the stakeholder to walk through their current workflow, create a one-sentence Definition of Done, ship an ugly MVP, and iterate on it to turn vague requests into shipped, adopted work while dramatically reducing wasted effort.
Curated deep dives, tools and trends in big data, data science and data engineering
Join 400,000 readers for one daily email