TLDR AI 2024-04-15

Grok-1.5 Vision 👀, OpenAI fires researchers for leaks 🛑, decoding brain signals 🧠

🚀
Headlines & Launches

Grok-1.5 Vision Preview (6 minute read)

xAI has announced that its latest flagship model has vision capabilities on par with (and in some cases exceeding) state-of-the-art models.

Google's new chips look to challenge Nvidia, Microsoft, and Amazon (2 minute read)

Google's new AI chip, Cloud TPU v5p, is now available. It trains large language models nearly three times as fast as its predecessor, TPU v4. The release underscores Google's position in the AI hardware race alongside competitors like Nvidia. Google has also introduced the Axion CPU, built on Arm's chip architecture, which promises better performance and energy efficiency.

OpenAI Fires Researchers For Leaking Information (2 minute read)

OpenAI has reportedly fired two researchers allegedly linked to leaks of company secrets. The firings follow months of leaks and company efforts to crack down on them.

🧠
Research & Innovation

Manipulating LLMs to Increase Product Visibility (16 minute read)

Strategic text sequences added to product descriptions can manipulate large language models in search engines to favor certain products.

Decoding Brain Signals (4 minute read)

MindBridge is a single model capable of decoding brain signals from multiple subjects.

Tackling Distribution Shifts with State Space Models (23 minute read)

DGMamba is a new framework designed to handle domain generalization challenges using the innovative state space model Mamba.

👨‍💻
Engineering & Resources

Creating 360-Degree Images from Text (2 minute read)

This project introduces a dual-branch diffusion model, PanFusion, that crafts 360-degree panoramic images directly from text prompts. The method merges the Stable Diffusion approach with a specialized panorama branch, enhanced by a unique cross-attention mechanism to reduce image distortion.

LLM-friendly HTML conversion (GitHub Repo)

Jina AI's Reader converts URLs into LLM-friendly markdown that can be used as input for a variety of tasks.

Discrete diffusion implementation (GitHub Repo)

A clean implementation of discrete diffusion. The code includes many state-of-the-art components and trains quickly and stably.

🎁
Miscellaneous

BabyLM Challenge (4 minute read)

BabyLM is a challenge to see who can train the best text and vision models while only using as much data as a typical human baby sees (~10M tokens).

Does AI need a "body" to become truly intelligent? Meta researchers think so (5 minute read)

Meta AI research's embodiment hypothesis suggests that true intelligence requires a physical form for sensory and environmental interaction. AI Habitat 3.0, Meta's updated simulation platform, aims to bridge the sim-to-real gap by training AIs with virtual bodies in simulated environments. It now has human avatars. Real-world testing is underway, with companies like Agility Robotics and Apptronik deploying robots alongside humans in various research and industrial settings.

⚡️
Quick Links

Gemma- and SigLIP-based VLM (HuggingFace Hub)

A small and powerful visual language model trained on LAION and LLaVA data.

Andrew Ng joins Amazon's board of directors (4 minute read)

Dr. Andrew Ng is currently the Managing General Partner of AI Fund and is joining Amazon's Board of Directors.

Micromanaging AI (1 minute read)

AI currently falls into the micromanage category: motivation is high but skill is relatively low. Users have to define tasks, review work frequently, and guide progress at each step, much like managing high-school interns.