TLDR AI 2023-10-20
DALLE-3 research paper 📃, Adept’s Fuyu-8B open source 💻, Waymo simulator 🚗
DALLE-3 research paper (24 minute read)
This paper outlines the main ingredients behind DALLE-3's strong performance: a synthetic caption generator, improved modeling with latent diffusion, and improved metrics for faithfulness, style, and coherence.
$13M seed for multimodal search (2 minute read)
Objective raised a seed round to build a low-code multimodal search toolkit for enterprise.
Fuyu-8B: A Multimodal Architecture for AI Agents (12 minute read)
Fuyu-8B, a multimodal model designed for digital agents, is now available on HuggingFace. Unlike other multimodal models, it has a simplified architecture and supports arbitrary image resolutions, responding to large images in under 100ms. Though tailored for specific applications, Fuyu-8B still excels at standard image understanding benchmarks.
👨‍💻
Engineering & Research
XAgent (GitHub Repo)
XAgent is an open-source experimental LLM-driven autonomous agent that can automatically solve various tasks.
Mojo🔥 available on Apple Silicon (3 minute read)
The systems language for AI is now available on Mac. It outperforms many powerful languages like C++ while being as easy to write as Python.
Waymo simulator (4 minute read)
Waymo has introduced its Waymax simulator for evaluating the performance of agents on self-driving tasks. It is written entirely in JAX.