TLDR AI 2024-02-19

Anthropic Prompt Shield 🛡️, Gemini 1.5 🤖, Andrej Karpathy’s BPE implementation 💻

Headlines & Launches

Anthropic Takes Steps To Prevent Election Misinformation (2 minute read)

Anthropic is testing Prompt Shield, a technology designed to redirect U.S. users of its chatbot Claude seeking political and voting information to authoritative sources like TurboVote.

Gemini 1.5 Pro (12 minute read)

Google released Gemini 1.5 Pro, a new mixture-of-experts (MoE) model that matches Gemini 1.0 Ultra in performance while using less compute thanks to its smaller size. It supports a context window of up to 1M tokens and is natively multimodal.
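For readers new to mixture-of-experts: a router scores each token against every expert, but only the top-k experts actually run, so compute per token grows with k rather than with the total number of experts. Google has not published Gemini 1.5's routing details, so the sketch below is a generic top-2 softmax router, not Gemini's design; the gate vectors and the toy "experts" are hypothetical stand-ins.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]

def softmax(xs: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(
    x: Vector,
    gate_weights: Sequence[Vector],
    experts: Sequence[Callable[[Vector], Vector]],
    k: int = 2,
) -> Tuple[Vector, List[int]]:
    """Route one token vector x through its top-k experts and mix their outputs."""
    # Router logits: dot product of the token with each expert's (hypothetical) gate vector.
    logits = [sum(g * xi for g, xi in zip(gate, x)) for gate in gate_weights]
    # Keep only the k highest-scoring experts; the others are never evaluated.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalize the gate scores over the selected experts only.
    mix = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for p, i in zip(mix, top):
        y = experts[i](x)
        out = [o + p * yj for o, yj in zip(out, y)]
    return out, top

# Toy setup: four experts that just scale their input by 1, 2, 3, or 4.
experts = [lambda v, s=s: [s * t for t in v] for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
out, chosen = moe_layer([1.0, 2.0], gate_weights, experts, k=2)
print(chosen)  # [1, 0]: only experts 1 and 0 ran
```

With four experts and k=2, half of the expert computation is skipped for this token; real MoE layers also add load-balancing terms during training, which this sketch omits.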

OpenAI's next AI product could be after your job (again) (2 minute read)

OpenAI has reportedly been developing two types of AI agent software for over a year. The first can automate complex tasks by taking over a customer's device. The second handles web-based tasks and can gather public data. It is unclear when the company plans to release these agents.

Research & Innovation

Long is More for Alignment (28 minute read)

Choosing which examples to use when instruction-tuning language models is often difficult. This work proposes a surprisingly robust baseline: pick the 1,000 examples with the longest responses.
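The selection rule fits in a few lines. A minimal Python sketch, assuming each example is a dict with hypothetical `instruction` and `response` fields and measuring length in characters (the paper's exact length measure may differ):

```python
from typing import Dict, List

Example = Dict[str, str]

def select_longest(examples: List[Example], k: int = 1000) -> List[Example]:
    """Return the k examples with the longest responses.

    Length is measured in characters here; counting tokens would be a
    reasonable alternative. Python's sort is stable (even with reverse=True),
    so examples with equal-length responses keep their original order.
    """
    return sorted(examples, key=lambda ex: len(ex["response"]), reverse=True)[:k]

# Toy dataset with hypothetical field names.
data = [
    {"instruction": "q0", "response": "abc"},
    {"instruction": "q1", "response": "a much longer answer"},
    {"instruction": "q2", "response": "medium one"},
    {"instruction": "q3", "response": "another long answer!"},
]
subset = select_longest(data, k=2)
print([ex["instruction"] for ex in subset])  # ['q1', 'q3']
```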

Extreme video compression with pre-trained diffusion models (18 minute read)

As diffusion models get better at synthesizing images and videos, their broad ‘knowledge’ of the world can be repurposed for other tasks, such as compression. This paper reports compression as low as 0.02 bits per pixel. The key trick is to measure perceptual similarity along the way and resend an original video frame whenever the generated one is not close enough.
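The resend loop can be sketched as follows. This is a toy illustration of the idea, not the paper's method: `predict_next` stands in for the diffusion model (here it just extrapolates linearly from the last two decoded frames), `mean_abs_error` stands in for a perceptual metric, and the fixed `bits_per_keyframe` cost is a hypothetical placeholder for a real image codec. The encoder simulates the decoder so both sides share the same history, and only frames the model cannot predict well enough are transmitted. For scale, at 0.02 bits per pixel a 1280×720 frame would average 18,432 bits (2,304 bytes).

```python
from typing import List, Tuple

Frame = List[float]  # a flattened grayscale frame

def mean_abs_error(a: Frame, b: Frame) -> float:
    # Stand-in for a perceptual distance: lower means more similar.
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def predict_next(history: List[Frame]) -> Frame:
    # Stand-in for the diffusion model: extrapolate linearly from the last
    # two decoded frames (or repeat the only frame available).
    if len(history) < 2:
        return list(history[-1])
    prev, last = history[-2], history[-1]
    return [2 * cur - old for old, cur in zip(prev, last)]

def encode(frames: List[Frame], threshold: float, bits_per_keyframe: int) -> Tuple[List[int], int]:
    """Return the indices of frames that must be sent, plus the total bit cost."""
    decoded: List[Frame] = []  # mirrors exactly what the decoder will reconstruct
    sent: List[int] = []
    total_bits = 0
    for t, frame in enumerate(frames):
        if decoded:
            guess = predict_next(decoded)
            if mean_abs_error(guess, frame) <= threshold:
                # Good enough: the decoder can generate this frame on its own.
                decoded.append(guess)
                continue
        # First frame, or the prediction drifted too far: resend the original.
        sent.append(t)
        total_bits += bits_per_keyframe
        decoded.append(list(frame))
    return sent, total_bits

def bits_per_pixel(total_bits: int, num_frames: int, pixels_per_frame: int) -> float:
    return total_bits / (num_frames * pixels_per_frame)

# A 4-pixel "video" with smooth motion for five frames, then a scene cut.
frames = [[float(t + i) for i in range(4)] for t in range(5)] + [[100.0] * 4]
sent, bits = encode(frames, threshold=0.5, bits_per_keyframe=8)
print(sent, bits_per_pixel(bits, len(frames), 4))  # [0, 1, 5] 1.0
```

Frames 2 to 4 are predicted exactly and cost nothing; the scene cut at frame 5 forces a resend. Lowering `threshold` trades more bits for closer reconstructions.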

Improving Math Skills in LLMs (19 minute read)

Researchers have created OpenMathInstruct-1, a dataset of 1.8 million problem-solution pairs for training open-source large language models in math. Models trained on it match the performance of closed-source models, a breakthrough that opens the door to more accessible and competitive math instruction AI tools.

Engineering & Resources

GPTScript (GitHub Repo)

GPTScript is a new scripting language for automating interactions with OpenAI's large language models. The project's ultimate goal is a programming experience based entirely on natural language.

Qwen 1.8B and 72B LLMs (GitHub Repo)

These models, which are architecturally similar to Llama 2, were trained on 3T tokens and perform well across a range of tasks. The Qwen team has released chat versions and quantized weights. Notably, the models appear especially strong at reasoning, math, and code.

minbpe (GitHub Repo)

Andrej Karpathy released minbpe, a minimal, clean, and extensible implementation of the byte pair encoding (BPE) algorithm used in language model tokenizers.
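The core BPE training loop fits in a few functions: start from raw UTF-8 bytes (IDs 0 to 255), repeatedly merge the most frequent adjacent pair into a new token ID, and record each merge; encoding then replays the merges in the order they were learned. The Python sketch below follows that basic scheme but is not minbpe's actual code or API; the function names are illustrative, and it omits the regex pre-splitting and special tokens that production tokenizers typically add.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]

def pair_counts(ids: List[int]) -> Dict[Pair, int]:
    # Count every adjacent pair of token IDs.
    counts: Dict[Pair, int] = {}
    for pair in zip(ids, ids[1:]):
        counts[pair] = counts.get(pair, 0) + 1
    return counts

def merge(ids: List[int], pair: Pair, new_id: int) -> List[int]:
    # Replace each non-overlapping occurrence of `pair` (left to right) with new_id.
    out: List[int] = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train(text: str, vocab_size: int) -> Tuple[Dict[Pair, int], Dict[int, bytes]]:
    assert vocab_size >= 256, "the 256 raw byte values are always in the vocabulary"
    ids = list(text.encode("utf-8"))
    merges: Dict[Pair, int] = {}
    vocab: Dict[int, bytes] = {i: bytes([i]) for i in range(256)}
    for new_id in range(256, vocab_size):
        counts = pair_counts(ids)
        if not counts:
            break  # nothing left to merge
        best = max(counts, key=counts.get)  # most frequent pair (first seen wins ties)
        ids = merge(ids, best, new_id)
        merges[best] = new_id
        vocab[new_id] = vocab[best[0]] + vocab[best[1]]
    return merges, vocab

def encode(text: str, merges: Dict[Pair, int]) -> List[int]:
    ids = list(text.encode("utf-8"))
    while len(ids) >= 2:
        counts = pair_counts(ids)
        # Apply the earliest-learned merge that is present (lowest new ID).
        pair = min(counts, key=lambda p: merges.get(p, float("inf")))
        if pair not in merges:
            break  # no learned merge applies any more
        ids = merge(ids, pair, merges[pair])
    return ids

def decode(ids: List[int], vocab: Dict[int, bytes]) -> str:
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")

text = "aaabdaaabac"
merges, vocab = train(text, vocab_size=259)
ids = encode(text, merges)
print(ids)                 # [258, 100, 258, 97, 99]
print(decode(ids, vocab))  # aaabdaaabac
```

Three merges shrink the 11-byte input to 5 tokens, and because every token maps back to a byte string, `decode(encode(s))` round-trips any text, including non-ASCII characters that no merge covers.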

Sora reference papers (HuggingFace Hub)

A list of 30 papers related to Sora, OpenAI's newly announced video generation model.

The Data Revolution in Venture Capital (10 minute read)

Over 75% of public market trades (over $1T in AUM) are now driven by algorithms, a revolution that hedge funds started in the 1990s. That data-driven approach is now rapidly spreading into venture capital, with projections indicating that over 75% of VC deal reviews will involve AI and data analytics by 2025, transforming how investments are sourced, evaluated, and managed.

Community, collaboration, creativity in the age of AI (11 minute read)

How we talk to each other, collaborate, and take on creative tasks shifted meaningfully with the arrival of software. AI is now starting to drive another meaningful shift, and its magnitude is being underestimated. Startups that build AI into their products from day one will have a huge advantage over incumbents adding it on top of existing products.

Quick Links

NVIDIA Chat With RTX (1 minute read)

Chat with RTX now supports multiple file formats and can import content directly from YouTube playlists, so users can easily query that content.

Sam Altman Wants Washington Backing For His $7 Trillion AI Chip Venture (1 minute read)

OpenAI CEO Sam Altman is seeking US government approval for his chip project, which could raise national security and antitrust concerns.

OpenAI surpasses $2 billion in annualized revenue (3 minute read)

OpenAI's annualized revenue run rate has surpassed $2 billion, propelled by the immense success of ChatGPT, making it one of the fastest-growing tech companies. With strong interest from enterprise clients looking to adopt generative AI, OpenAI aims to more than double this figure in 2025.