TLDR AI 2024-05-02

CoreWeave raises $1.1B 💰, Amazon Q 🤖, Retrieval-Augmented Language Models 🌐

🚀
Headlines & Launches

CoreWeave raises $1.1B (3 minute read)

CoreWeave is a cloud provider that rents out GPU superclusters. This latest funding round brings the company to a $19B valuation.

How Field AI Is Conquering Unstructured Autonomy (6 minute read)

Field AI, founded by former NASA JPL leader Ali Agha, specializes in autonomous robots that operate in unstructured environments without needing maps, GPS, or human supervision. Using advanced AI models developed from DARPA challenges, the company's robots can adapt to new and changing conditions, significantly enhancing their utility in industrial and construction settings.

Amazon Q, a generative AI-powered assistant for businesses and developers (4 minute read)

AWS has launched Amazon Q, a generative AI assistant aimed at improving software development and decision-making by leveraging a company's internal data. Amazon Q facilitates coding, testing, and app development for developers, while offering data-driven support for business users through natural language interaction. The service also includes Amazon Q Apps, enabling the creation of custom AI applications without coding expertise.
🧠
Research & Innovation

Reka Vibe-Eval (12 minute read)

Reka trains large foundation models and has quickly caught up to some of the best players despite having a fraction of the funding. It has released a subset of its internal evaluation suite, which it uses to determine how strong its models are.

KAN: Kolmogorov-Arnold Networks (60 minute read)

Multi-Layer Perceptrons are used widely in AI today, including in the Transformer between Attention layers. However, they use fixed activation functions. This paper proposes learning the activation functions on the network's edges, motivated by the Kolmogorov-Arnold representation theorem (any multivariate continuous function can be written as a superposition of univariate functions). Concretely, the researchers replace fixed scalar weights with learnable splines. The architecture is much more complicated, but it has some interesting properties that potentially make it more interpretable.
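The core idea can be sketched in a few lines: instead of multiplying each input by a scalar weight, every input-output edge evaluates its own learnable univariate function. The snippet below is a minimal, forward-pass-only illustration (not the paper's implementation) that uses piecewise-linear interpolation as a stand-in for the B-splines used in the actual KAN work; the class names are made up for this sketch.

```python
import numpy as np

class KANEdge:
    """One edge of a toy KAN layer: instead of a scalar weight, the edge
    carries a learnable univariate function, modeled here as a
    piecewise-linear spline over a fixed knot grid."""
    def __init__(self, n_knots=8, x_min=-2.0, x_max=2.0, rng=None):
        rng = rng or np.random.default_rng(0)
        self.grid = np.linspace(x_min, x_max, n_knots)      # fixed knot locations
        self.values = rng.normal(scale=0.1, size=n_knots)   # learnable heights

    def __call__(self, x):
        # Evaluate the spline; inputs outside the grid are clamped to the ends.
        return np.interp(x, self.grid, self.values)

class KANLayer:
    """Maps d_in -> d_out where output_j = sum_i phi_{j,i}(x_i):
    every input/output pair has its own learned activation function."""
    def __init__(self, d_in, d_out, **kw):
        self.edges = [[KANEdge(**kw) for _ in range(d_in)]
                      for _ in range(d_out)]

    def __call__(self, x):  # x: array of shape (d_in,)
        return np.array([sum(edge(xi) for edge, xi in zip(row, x))
                         for row in self.edges])

layer = KANLayer(d_in=3, d_out=2)
y = layer(np.array([0.5, -1.0, 1.5]))
print(y.shape)  # (2,)
```

A real KAN additionally learns the spline coefficients by gradient descent and uses smoother B-spline bases, but the structural contrast with an MLP (learned functions on edges rather than fixed activations on nodes) is already visible here.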

DeepDive: A transformer walk-through, with Gemma (36 minute read)

Understanding the Transformer is an endeavor that often takes several tries. This blog post walks through the Gemma architecture and explains everything in detail. It is clearly written, with accompanying code and figures.
👨‍💻
Engineering & Resources

Retrieval-Augmented Language Models: A Survey (GitHub Repo)

This survey paper delves into the world of Retrieval-Augmented Language Models (RALMs), highlighting their evolution, structure, and diverse applications in NLP tasks such as translation and dialogue systems.
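The common pattern behind the RALMs the survey covers is: retrieve relevant documents for a query, then condition the language model on them. A minimal, dependency-free sketch of that pattern (not any specific system from the survey; the corpus, function names, and bag-of-words retriever are illustrative assumptions):

```python
from collections import Counter
import math

# Toy document store standing in for a real retrieval corpus.
docs = [
    "Mamba is a state-space model for sequence modeling.",
    "CLIP aligns images and text with contrastive pretraining.",
    "KANs replace fixed activations with learned splines.",
]

def bow(text):
    """Bag-of-words term counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=1):
    """Rank documents by similarity to the query, return the top k."""
    q = bow(query)
    return sorted(docs, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]

def build_prompt(query):
    """Augment the LM prompt with retrieved context before generation."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("What does CLIP do?"))
```

Production RALMs swap the bag-of-words scorer for dense embeddings and an approximate-nearest-neighbor index, and the final prompt is passed to an actual language model, but the retrieve-then-condition structure is the same.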

CLIP Pretrained Mamba Models (6 minute read)

The new Mamba model, trained using contrastive language-image pretraining (CLIP), shows impressive efficiency and performance in zero-shot image classification.

3D Rendering with Memory-Efficient Techniques (2 minute read)

Lightplane is a new method, built from Renderer and Splatter components, that dramatically reduces memory usage in 2D-3D mappings. The Lightplane Renderer generates images from neural 3D fields, while the Lightplane Splatter efficiently projects images back into 3D hash structures.
🎁
Miscellaneous

3D Generation with the MicroDreamer Algorithm (GitHub Repo)

Researchers have developed an innovative 3D generation algorithm, dubbed MicroDreamer, which significantly accelerates the process by reducing the number of function evaluations required.

A.I. Start-Ups Face a Rough Financial Reality Check (6 minute read)

High-profile AI startups like Inflection AI, Stability AI, and Anthropic are facing financial pressures as they struggle with the high costs of developing generative AI models. While OpenAI, backed by Microsoft, has shown revenue growth, competitors like Anthropic and Stability AI grapple with substantial gaps between revenue and operating expenses. Microsoft's investment in AI hints at the tech industry's belief in AI's long-term profitability, despite the current challenges in monetizing these expensive technologies.

GPT-2? (7 minute read)

A mysterious AI model named gpt2-chatbot, displaying capabilities akin to GPT-4.5, has emerged on lmsys.org, prompting speculation of it being an unofficial OpenAI test for their next iteration. Key identifiers such as response quality, OpenAI-specific traits, and rate limits suggest a high level of sophistication, potentially hinting at a discreet benchmarking initiative by OpenAI. Investigations and discussions around gpt2-chatbot's origins and capabilities are ongoing in the AI community.