TLDR AI 2025-03-19
Gemini update, Roblox Generative 3D Model, Hybrid Vision-Language Model
Research & Innovation
Single-View 3D Scene Reconstruction (3 minute read)
Niagara introduces a novel framework for reconstructing outdoor 3D scenes from a single image, combining depth and normal estimation with a geometric affine field and 3D Gaussian decoding.
SmolDocling (13 minute read)
A new, extremely small, state-of-the-art model for document OCR. It can run lightning fast and has sufficient accuracy for many document processing tasks.
reWordBench (21 minute read)
This work shows that many popular reward models are brittle to basic rewording of the prompts. It includes a benchmark and a potential strategy to make the reward models more robust.
Train XGB in the browser (8 minute read)
This blog post describes how to build a browser-based training system accelerated by WASM.
Google's new robot AI can fold delicate origami, close zipper bags without damage (6 minute read)
Google DeepMind has unveiled Gemini Robotics and Gemini Robotics-ER, AI models that enhance robots' fine motor skills and adaptability for real-world applications. Gemini Robotics integrates vision, language, and action capabilities, allowing robots to perform complex tasks like origami folding. Early results highlight significant improvements in generalization and dexterity over previous models, and Google is partnering with Apptronik and others for further development.
Personalize Anything (25 minute read)
You can personalize almost any image without additional fine-tuning or training by using token replacement in a Diffusion Transformer.
Concierge AI: Talk to your apps in natural language (Sponsor)
Connect your apps (email, calendar, files, payments, etc.). Select your AI model from GPT, Claude, Deepseek, Grok, or others. Then simply ask whatever you want.
Sign up for free
Google playing AI search catchup, and forming relationships with chatbots (6 minute read)
Google is adding new Gemini-powered AI features to its search, but the move is seen as playing catch-up to OpenAI's ChatGPT.
Google and NVIDIA Expand AI Collaboration (4 minute read)
Google announced an expanded partnership with NVIDIA at NVIDIA GTC that will focus on AI infrastructure, hardware optimizations, and applications in fields like healthcare, energy, and robotics.
Nvidia won the AI race, but inference is still anyone's game (6 minute read)
Nvidia currently dominates AI training, but the inference market remains open to competition.