Vortex Outperforms Parquet 🌟, Vibe Analysis 🌴, MCP for Enterprise Data 🗂️

Tutorial: Build an AI agent with Amazon Bedrock and Pinecone (Sponsor)

Building production-ready AI agents from scratch means a lot of moving pieces, especially if you need to scale quickly. Reduce workflow complexity and deploy agents in minutes with Amazon Bedrock and Pinecone through AWS Marketplace.

→ Follow this tutorial to build an agent that leverages your custom knowledge using retrieval-augmented generation (RAG).

→ You'll use Amazon Bedrock to create an agent (or a multi-agent system) that can access data from multiple sources, including Amazon S3 and third-party systems.

→ You'll use the Pinecone vector database to store and retrieve embeddings for RAG.

Read the step-by-step guide →

TLDR Data 2025-08-11

Deep Dives

Making Your Data Agent-Ready with EnrichMCP (22 minute video)

Redesigning Workers KV for increased availability and faster performance (13 minute read)

How I Won the “Mostly AI” Synthetic Data Challenge (8 minute read)

Opinions & Advice

The Inconvenient Truths of Self-Service Analytics (9 minute read)

Vibe Analysis (12 minute read)

The Pragmatic Guide to AI Agents in the Enterprise (50 minute podcast)

Launches & Tools

Spatial Joins in DuckDB (21 minute read)

Vectorless (GitHub Repo)

Hybrid Search Using Reciprocal Rank Fusion in SQL (4 minute read)

LF AI & Data Foundation Hosts Vortex Project to Power High Performance Data Access for AI and Analytics (5 minute read)

Miscellaneous

Kubernetes Will Solve YAML Headaches with KYAML (3 minute read)

Hashfuncs DuckDB Extension (6 minute read)

Quick Links

Achieving 10,000x training data reduction with high-fidelity labels (7 minute read)

The Amazon SageMaker lakehouse architecture now automates optimization configuration of Apache Iceberg tables on Amazon S3 (4 minute read)

Curated deep dives, tools and trends in big data, data science and data engineering 📊