TLDR Data 2026-07-20

Inside Spark 4.2 ✨, Data Modeling Still Matters 📐, Faster Spark With Rust 🦀

Your agent is failing to fetch web data; the average failure rate for hard-target web fetches is 40%. Ours is 4%. (Sponsor)

📱

Deep Dives

Accelerating Apache Spark Queries (and Iceberg Rust Development) with Apache DataFusion Comet (7 minute read)

From OTEL to SLMs: Distilling Frontier Model Behaviour from Production Telemetry (45 minute video)

Exploring Hierarchical Interest Representation For Meta Ads Deep Funnel Optimization (8 minute read)

🚀

Opinions & Advice

Data Modeling isn't Dead, You Just Stopped Doing It with Joe Reis (65 minute video)

Agents think in milliseconds, legacy infrastructure doesn't. LinkedIn, Walmart, and Zendesk shared how they closed the gap at VB Transform 2026 (4 minute read)

AI Agents Need Data Product Context Not More RAG (11 minute read)

💻

Launches & Tools

😥 CEO waiting for data? Cube agents deliver so your team can focus on thinking (Sponsor)

Introducing Apache Spark 4.2 (6 minute read)

Ontology Playground (GitHub Repo)

Postgres 19 Compression: from pglz to LZ4 (8 minute read)

Experience Graphs: The Data Foundation for Self-Improving Agents (28 minute read)

🎁

Miscellaneous

In-House LLM Serving at Netflix (7 minute read)

From Weeks to a Day: How We Made LLM Evaluation Fast Enough to Iterate on (10 minute read)

⚡️

Quick Links

Dremio's Exit Is the Clearest Sign Yet That Lakehouse-Only Won't Survive AI (3 minute read)

Databricks hits $188B valuation, extending its run as AI's favorite second act (4 minute read)

Curated deep dives, tools and trends in big data, data science and data engineering 📊

