TLDR

TLDR Data 2025-06-26

New LLM Data Stack 🥞, Why Spark Feels Slow 🐌, MCP in Minutes ⌚

📱

Deep Dives

Why Apache Spark is Often Considered as Slow? (21 minute read)

Hands-on with Apache Iceberg (71 minute video)

🚀

Opinions & Advice

The Hidden Cost of Over-instrumentation: Why More Tracking Can Hurt Product Teams (4 minute read)

Data Integrity vs Data Security: Why You Need Both (3 minute read)

💻

Launches & Tools

Schema In, Data Out: A Smarter Way to Mock (4 minute read)

Langfuse and ClickHouse: A New Data Stack for Modern LLM Applications (8 minute read)

New With Confluent Platform 8.0: Stream Securely, Monitor Easily, and Scale Endlessly (9 minute read)

🎁

Miscellaneous

Data federation: Understanding What It Is and How It Works (8 minute read)

Google Donates the Agent2Agent Protocol to the Linux Foundation (3 minute read)

Plane Tracking with Apache Flink (GitHub Repo)

⚡️

Quick Links

FastMCP 2.0 (GitHub Repo)

Curated deep dives, tools and trends in big data, data science and data engineering 📊