Google’s Tabular Foundation Model 🧾, Meta’s Data Eng Agent 🛠️, LLM Spark Debugger 🚦

Never seen a data quality issue that wasn't actually an ownership problem (4 minute read)

Data quality failures are usually ownership failures: when multiple teams consume the same metric but no single person controls its definition, calculation, and change process, trust erodes and fixes stay temporary. The practical remedy is explicit metric governance: one named owner, clear decision rights, version/change control, and enforceable quality rules tied to the metric.
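
As a rough illustration of what "one named owner, version/change control, and enforceable quality rules tied to the metric" could look like in practice, here is a minimal Python sketch. Everything in it is hypothetical and not taken from the linked article: the `MetricContract` class, its fields, the example metric, and the owner name are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MetricContract:
    """Hypothetical record tying a metric to a single owner and its rules."""
    name: str
    owner: str                      # exactly one named person, not a team alias
    definition: str                 # human-readable calculation rule
    version: int = 1
    rules: dict[str, Callable[[float], bool]] = field(default_factory=dict)
    changelog: list[str] = field(default_factory=list)

    def change_definition(self, requested_by: str, new_definition: str, reason: str) -> None:
        # Decision rights: only the named owner may change the definition.
        if requested_by != self.owner:
            raise PermissionError(f"{requested_by} does not own metric {self.name!r}")
        self.version += 1
        self.definition = new_definition
        self.changelog.append(f"v{self.version}: {reason}")

    def violations(self, value: float) -> list[str]:
        # Return the names of the quality rules that the value fails.
        return [rule_name for rule_name, check in self.rules.items() if not check(value)]


# Illustrative usage with made-up values.
wau = MetricContract(
    name="weekly_active_users",
    owner="dana",
    definition="distinct users with at least one session in the last 7 days",
    rules={"non_negative": lambda v: v >= 0},
)
print(wau.violations(-5))  # lists the rules the value breaks
```

The point of the sketch is only that ownership, change history, and quality checks live on the same object, so a consumer can see who decides and which version they are reading.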

TLDR Data 2026-07-02

Deep Dives

Using LLMs to Analyze Spark SQL Plans: A Practical Approach to Debugging Long-Running Jobs (8 minute read)

Ontology Everywhere! (8 minute read)

How We Built DEmate: Taming LLMs for Data Engineering at Meta (7 minute read)

Building Indexes on a Moving Target (20 minute read)

Opinions & Advice

Never seen a data quality issue that wasn't actually an ownership problem (4 minute read)

Query Faster, Query Smarter: Our Move to DuckDB and What We Learned (4 minute read)

Too many tables are bad for you (6 minute read)

Launches & Tools

Introducing TabFM: A zero-shot foundation model for tabular data (4 minute read)

SedonaDB 0.4: GPU-Accelerated Spatial Joins (3 minute read)

TiDB (GitHub Repo)

Miscellaneous

How To Corrupt An SQLite Database File (14 minute read)

Data Residency Is Not a Legal Problem. It Is An Infrastructure Design Problem (5 minute read)

Quick Links

Your AI isn't underperforming. Your data foundation is (4 minute read)

No babysitting, not today (9 minute read)

Curated deep dives, tools and trends in big data, data science and data engineering 📊