TLDR AI 2024-06-18

Runway Gen 3 📹, DeepMind V2A 🎵, Depth Anything V2 2️⃣

🚀

Headlines & Launches

Runway's Gen 3 Video Model (6 minute read)

Runway has trained an extremely powerful new video generation model. It will power many of the existing features on its platform. Examples are available in the provided link.

DeepMind's New AI Generates Soundtracks And Dialogue For Videos (2 minute read)

DeepMind is developing an AI technology called V2A to generate synchronized soundtracks for videos. It uses diffusion models trained on audio, dialogue transcripts, and video clips to create music, sound effects, and dialogue.

Giant Chips Give Supercomputers a Run for Their Money (4 minute read)

Cerebras, a California-based company, has demonstrated that its second-generation wafer-scale engine is significantly faster than the world's fastest supercomputer in molecular dynamics calculations. It can also perform sparse large language model inference at one-third of the energy cost of a full model without losing any accuracy. Both achievements are possible due to the interconnects and fast memory access enabled by Cerebras' hardware. Cerebras is looking to extend the applications of its wafer-scale engine to a larger class of problems, including molecular dynamics simulations of biological processes and simulations of airflow around vehicles.

🧠

Research & Innovation

DeepSeek Coder V2 (GitHub Repo)

DeepSeek Coder V2 scores 90+ on HumanEval and matches GPT-4 Turbo performance on many other challenging benchmarks. It is available through an API and is free for commercial use.

Depth Anything V2 (16 minute read)

The new Depth Anything model was trained primarily on synthetic data and has dramatically improved performance on complex scenes.

HelpSteer2: Open-source dataset for training top-performing reward models (23 minute read)

Nvidia has released a dataset and recipe, along with a high-quality paper, for training reward models that align model output with human preferences.

👨‍💻

Engineering & Resources

LARS (GitHub Repo)

LARS is an application that enables you to run LLMs locally on your device. Upload your own documents and engage in conversations where the LLM grounds its responses in your uploaded content.

Hallucinations in Diffusion-Based Image Models (18 minute read)

This paper investigates why diffusion-based image generation models create "hallucinations" — images that never appeared in the training data.

Differentiable rasterization (25 minute read)

Everything should be differentiable. This post walks through how to write a differentiable, lightweight SVG rasterizer.
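To give a feel for what "differentiable rasterization" means, here is a minimal sketch. It is not the post's implementation: it assumes a common soft-rasterization trick, where pixel coverage is a sigmoid of the signed distance to the shape edge, so the image changes smoothly with the shape parameters. The names `render_circle`, `loss_and_grad`, and the `softness` parameter are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def render_circle(radius, size=16, softness=1.0):
    """Soft coverage of a centered circle on a size x size pixel grid.

    Assumption: coverage = sigmoid((radius - distance) / softness), which
    replaces the hard inside/outside test with a smooth one.
    """
    c = (size - 1) / 2.0
    return [
        [sigmoid((radius - math.hypot(x - c, y - c)) / softness)
         for x in range(size)]
        for y in range(size)
    ]

def loss_and_grad(radius, target, size=16, softness=1.0):
    """Squared-error loss against a target image and d(loss)/d(radius)."""
    c = (size - 1) / 2.0
    loss, grad = 0.0, 0.0
    for y in range(size):
        for x in range(size):
            s = sigmoid((radius - math.hypot(x - c, y - c)) / softness)
            diff = s - target[y][x]
            loss += diff * diff
            # Chain rule: d(sigmoid(u))/d(radius) = s * (1 - s) / softness
            grad += 2.0 * diff * s * (1.0 - s) / softness
    return loss, grad

# Recover the radius of a target circle by gradient descent on the pixels.
target = render_circle(5.0)
radius = 2.0
for _ in range(300):
    loss, grad = loss_and_grad(radius, target)
    radius -= 0.02 * grad
```

Because every pixel is a smooth function of `radius`, an image-space loss can be pushed back to the shape parameter. A hard inside/outside test would give a zero gradient almost everywhere.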

🎁

Miscellaneous

Deep Dive: Beyond the Basics of RAG (48 minute video)

The creator of RAGatouille gave an excellent talk discussing how to dramatically improve RAG performance, some of the open challenges, and ColBERT.

General intelligence (6 minute read)

What would it take to make a generally intelligent agent and what are we missing? This post explores the 3 ideas needed to make an agent and posits that we are only a few years away. The author is a researcher at OpenAI.

Logical Reasoning in Language Models (17 minute read)

Chain of Preference Optimization (CPO) is a method that improves the logical reasoning abilities of large language models (LLMs). By fine-tuning LLMs using search trees from the Tree-of-Thought (ToT) method, CPO aligns the reasoning steps of Chain-of-Thought (CoT) decoding with ToT's optimal paths.
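One way to picture the idea is as turning a ToT search tree into per-step preference pairs. The sketch below is an illustration under stated assumptions, not the paper's code: it assumes each thought records whether it lies on the final selected path, and that sibling thoughts off that path serve as the "rejected" side of a pair. The `Thought` class, its `on_final_path` field, and `preference_pairs` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Thought:
    text: str
    on_final_path: bool = False  # assumption: set by the ToT search
    children: List["Thought"] = field(default_factory=list)

def preference_pairs(question: str,
                     first_level: List[Thought]) -> List[Tuple[str, str, str]]:
    """Walk the selected path and emit (prompt, chosen, rejected) triples.

    At each depth, the thought on the final path is 'chosen' and each of
    its siblings is 'rejected'. The prompt is the question plus the
    chosen thoughts so far.
    """
    pairs = []
    prompt = question
    level = first_level
    while level:
        winners = [t for t in level if t.on_final_path]
        if not winners:
            break  # the selected path ends here
        winner = winners[0]
        for t in level:
            if t is not winner:
                pairs.append((prompt, winner.text, t.text))
        prompt = prompt + "\n" + winner.text
        level = winner.children
    return pairs

# Toy tree: two candidate first steps, then two candidate second steps.
tree = [
    Thought("Step 1: A", on_final_path=True, children=[
        Thought("Step 2: X"),
        Thought("Step 2: Y", on_final_path=True),
    ]),
    Thought("Step 1: B"),
]
pairs = preference_pairs("Q", tree)
```

Triples in this shape could then be fed to a preference-optimization objective such as DPO, so that ordinary CoT decoding is nudged toward the steps the tree search preferred.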
