TLDR Data 2026-01-12

Secure RAG Access 🔐, Iceberg Lacks Operational Guarantees ⚠️, Spark Declarative Pipelines ⚙️

📱

Deep Dives

RAG with Access Control (7 minute read)

The Journey to Zero-Copy: How chDB Became the Fastest SQL Engine on Pandas DataFrame (11 minute read)

Apache Hudi 1.1 Deep Dive: Async Instant Time Generation for Flink Writers (11 minute read)

🚀

Opinions & Advice

Beyond One-Size-Fits-All RAG: Why Different Knowledge Sources Need Different Retrieval Strategies (12 minute read)

Data Trust is Death by a Thousand Paper Cuts (8 minute read)

LLM Predictions for 2026, Shared with Oxide and Friends (5 minute read)

A Critique of Iceberg REST Catalog: A Classic Case of Why Semantic Spec Fails (5 minute read)

💻

Launches & Tools

OpenEverest, a Tool To Manage Multiple Databases on Kubernetes (3 minute read)

Spark Declarative Pipelines Programming Guide (6 minute read)

Supercharging LLMs: Scalable RL with torchforge and Weaver (4 minute read)

🎁

Miscellaneous

Introducing MCP CLI: A way to call MCP Servers Efficiently (7 minute read)

Databricks x Palantir | Partnership Deep Dive (16 minute video)

⚡️

Quick Links

Snowflake to acquire Observe to boost observability in AIOps (3 minute read)

Faster Is Not Always Better: Choosing the Right PostgreSQL Insert Strategy in Python (+Benchmarks) (4 minute read)

Curated deep dives, tools and trends in big data, data science and data engineering 📊
