TLDR Data 2025-05-19

Better SQL with AI 🤖, Multimodal data querying future 🔮, Flink CDC Updates 💾

📱 Deep Dives

Getting AI to Write Good SQL: Text-to-SQL Techniques Explained (8 minute read)

Turning Data Into Insight: Flexible Lakehouse with MinIO, Iceberg, Airflow, dbt, Spark, Pandera, & Superset (17 minute read)

DuckDB + PyIceberg + Lambda (8 minute read)

Handling GTFS Data with DuckDB (8 minute read)

🚀 Opinions & Advice

"Streaming vs. Batch" Is a Wrong Dichotomy, and I Think It's Confusing (3 minute read)

Building AI Agents? A2A vs. MCP Explained Simply (4 minute read)

We Need a New…Database? (12 minute read)

💻 Launches & Tools

Apache Flink CDC 3.4.0 Release Announcement (3 minute read)

Doctor (GitHub Repo)

🎁 Miscellaneous

So You Think You Want to Quit Your Job? (7 minute read)

Some English Hospitals Doubt Palantir's Utility: We'd “Lose Functionality Rather than Gain it” (3 minute read)

AI Agents Unite: Conference Reveals Next-Gen Frameworks (7 minute read)

⚡️ Quick Links

Curated deep dives, tools and trends in big data, data science and data engineering 📊