TLDR AI 2024-05-22

Scale AI raises $1B 💰, xAI Grok Multimodal 📚, Phi-3 14B 🌐

Headlines & Launches

Scale AI raises $1B (6 minute read)

The massive Series F comes from Accel and previous investors. Demand for Scale's data services is enormous, and the company is uniquely positioned to keep powering the modern AI data wave.

Microsoft wants to make Windows an AI operating system, launches Copilot Plus PCs (4 minute read)

This article rounds up all of the major announcements Microsoft just made, including a new lineup of Copilot Plus PCs and generative AI-powered features like Recall, which helps users find apps, files, and other content they have viewed in the past.

xAI Working To Make Grok Multimodal (2 minute read)

Elon Musk's AI company, xAI, is advancing its Grok chatbot to support multimodal inputs, allowing users to upload photos and receive text-based answers.

Research & Innovation

Text Classification with LLMs (18 minute read)

The Smart Expert System is a new approach using Large Language Models (LLMs) for text classification. This system streamlines the process by reducing the need for extensive preprocessing and domain expertise.

Dictionary Learning on Claude Sonnet (35 minute read)

Anthropic has made a breakthrough in mechanistic interpretability by mapping millions of concepts inside Claude Sonnet. The researchers also found that pushing on these internal concepts can change Sonnet's sense of self (e.g., they made it believe it was the Golden Gate Bridge).

Phi-3 14B (Hugging Face Hub)

Microsoft's Phi-3 family of small models gains a 14B version. It performs on par with Command R+ (a 104B model).

Engineering & Resources

Enhanced Video Summarization (GitHub Repo)

This project introduces a novel CNN-based SpatioTemporal Attention (CSTA) method to improve video summarization. Unlike traditional attention mechanisms, CSTA effectively captures the visual significance of frames using a 2D CNN to understand relationships and key attributes in videos.

Debiasing Vision-Language Models (GitHub Repo)

This project identifies a critical bias in Large Vision-Language Models (LVLMs), where outputs lean more towards language model priors than actual visual inputs. The project effectively reduces this bias by introducing "calibration" and "debias sampling" techniques, leading to more accurate and vision-focused responses across various tasks.

A Vision-Language Model for Real-World Applications (GitHub Repo)

DeepSeek-VL is a new open-source Vision-Language Model tailored for real-world uses with an emphasis on diverse data from web screenshots to charts and OCR.

ChatGPT Can Talk, But OpenAI Employees Sure Can't (5 minute read)

The departures of Ilya Sutskever and Jan Leike have brought to light OpenAI's restrictive NDA, which prevents former employees from criticizing OpenAI under threat of losing their vested equity. CEO Sam Altman responded to the story by promising a fix.

Ask HN: If you've used GPT-4-Turbo and Claude Opus, which do you prefer? (Hacker News Thread)

This Hacker News thread compares GPT-4-Turbo, the default model in ChatGPT Plus, to Claude Opus, Anthropic's competing model. Most developers seem to prefer Claude as it is apparently better at coding and engineering work. Its writing style also seems to be preferred, but part of that is due to the recognizability of GPT-4-Turbo's style. OpenAI's app functionalities, like its code interpreter and ability to search the internet, are still big reasons to continue using ChatGPT Plus.

Apple announces new accessibility features, including Eye Tracking (9 minute read)

Apple has announced upcoming accessibility features that use AI and machine learning, including Eye Tracking, which lets users navigate iPad and iPhone using the front-facing camera, and Vocal Shortcuts, which let Siri execute tasks in response to custom sounds. Other upcoming features include Music Haptics for tactile music feedback and improved speech recognition for atypical speech patterns.

Quick Links

Auto Wiki (Product)

View high-quality, automatically-generated documentation for any repository.

ElevenLabs Launches Reader (1 minute read)

ElevenLabs has launched the ElevenLabs Reader app, which can recognize and voice text from various documents using 11 different voices.

Microsoft announces $3.3 billion investment in Wisconsin (6 minute read)

President Biden will participate in unveiling the plans, aimed at boosting the economy and creating tech jobs.