TLDR Data 2025-10-09

Lakehouse Formats Compared ⚖️, Hashing vs Locality 🔀, Open Source Signals 📊 

Deep Dives

Building a Resilient Event Publisher with Dual Failure Capture (9 minute read)

Apache Iceberg vs Delta Lake vs Apache Hudi - Feature Comparison Deep Dive (15 minute read)

How OpenAI Uses Kubernetes And Apache Kafka for GenAI (15 minute read)

Opinions & Advice

The Single Node Rebellion (6 minute read)

Engineering Growth: The Data Layers Powering Modern GTM (12 minute read)

7 Questions Every Data Team Should Ask the Business (5 minute read)

Launches & Tools

OSS Insight (Tool)

Introducing OpenZL: An Open Source Format-Aware Compression Framework (8 minute read)

Examining Versionless Apache Spark: AI-powered upgrades and seamless stability for 2 billion workloads (4 minute read)

Arc Core (GitHub Repo)

Miscellaneous

Locality, and Temporal-Spatial Hypothesis (8 minute read)

Accelerating Large-Scale Data Analytics with GPU-Native Velox and NVIDIA cuDF (7 minute read)

Quick Links

Curated deep dives, tools and trends in big data, data science and data engineering 📊