TLDR AI 2024-09-20
Apple Intelligence in public beta 📱, Cruise returns to SF 🌉, Snap AI video generation 📹
Your online privacy matters. Take back control with Incogni (Sponsor)
If you don't mind having your personal data available to every spammer, scammer, and bad actor who's willing to pay for it, skip this ad.
Still here? Check out Incogni — it's the hassle-free way to protect your data privacy:
- Incogni scans people search sites for your personal information and sends removal requests on your behalf.
- Within about 14 days, your records are off the dark corners of the internet.
- Every 10 days, Incogni does it all over again.
- You stay in the loop with regular privacy reports.
Take back control. Reduce spam, scam, and cyber risk.
Get 60% off Incogni with code TLDRAI (30 day money back guarantee)
Snap is introducing an AI video-generation tool for creators (2 minute read)
Snapchat has announced a new AI video-generation tool for select creators that enables video creation from text prompts and, soon, image prompts. The tool, powered by Snap's foundational video models, will be available in beta on the web. Snap aims to compete with companies like OpenAI and Adobe but has not yet shared output examples.
Apple Intelligence is now available in public betas (2 minute read)
Apple has released public betas of iOS 18.1, iPadOS 18.1, and macOS Sequoia 15.1 that feature new Apple Intelligence tools like text rewriting and photo cleanup. Only the iPhone 15 Pro, iPhone 16, iPhone 16 Pro, and iPads and Macs with M1 or later chips support these AI features. Final versions are expected in October.
Cruise robotaxis return to the Bay Area nearly one year after pedestrian crash (2 minute read)
Cruise is resuming operations in Sunnyvale and Mountain View, with human-driven vehicles for mapping and plans to progress to supervised AV testing later this fall. This follows a settlement and leadership change after an October 2023 crash. Cruise has issued software updates and signed a partnership with Uber for robotaxi services starting in 2025.
Heart Monitoring from Facial Videos (GitHub Repo)
PhysMamba is a new framework designed for remote heart monitoring via facial videos, addressing challenges in capturing long-range physiological signals.
Fast 3D Generation from Single Images (31 minute read)
Vista3D is a new framework that generates 3D models from a single image in just 5 minutes. Using a two-phase approach, it quickly forms rough geometry before refining the details, capturing both visible and hidden aspects of objects for more complete 3D reconstructions.
V-STaR: Training Verifiers for Self-Taught Reasoners (31 minute read)
V-STaR is a novel approach to improving large language models that utilizes both correct and incorrect solutions generated during self-improvement to train a verifier, which then selects the best solution at inference time. The method has shown significant improvements in accuracy on code generation and math reasoning benchmarks compared to existing approaches, potentially offering a more efficient way to enhance LLM performance.
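The verifier idea summarized above can be illustrated with a minimal sketch. It is not the paper's implementation: the helper names, the toy scoring function, and the labeled-example format are all assumptions made for illustration. It only shows the two pieces the summary mentions: keeping both correct and incorrect generated solutions as verifier training data, and using a verifier score to pick the best candidate at inference time.

```python
from typing import Callable, List, Sequence, Tuple


def build_verifier_data(
    solutions: Sequence[str], is_correct: Callable[[str], bool]
) -> List[Tuple[str, int]]:
    """Label every generated solution (hypothetical format: 1 = correct, 0 = incorrect).

    Incorrect solutions are kept rather than discarded, so a verifier
    could learn from both kinds of example.
    """
    return [(s, 1 if is_correct(s) else 0) for s in solutions]


def select_best(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the candidate with the highest verifier score (best-of-k selection)."""
    if not candidates:
        raise ValueError("need at least one candidate solution")
    return max(candidates, key=score)


def toy_verifier(solution: str) -> float:
    # Stand-in for a trained verifier (assumption): favors answers ending in "= 4".
    return 1.0 if solution.strip().endswith("= 4") else 0.0


candidates = ["2 + 2 = 5", "2 + 2 = 4", "2 + 2 = 22"]
training_data = build_verifier_data(candidates, lambda s: s.endswith("= 4"))
best = select_best(candidates, toy_verifier)
print(best)
```

In a real system the toy scoring function would be replaced by a learned model, and the candidates would be sampled from the generator being improved.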
👨‍💻
Engineering & Research
AIAI Boston: the East Coast's most significant summit for applied AI's builders & execs. 🚀 (Sponsor)
Uniting engineering teams & tech leadership unleashing the LLM revolution, AIAI Boston returns on October 16-18.
3 co-located summits. 500+ attendees. CXO speakers from Runway, NVIDIA, Takeda, Optum.
Leaders ➡️ apply for your Chief AI Officer Summit pass.
Engineers ➡️ explore Generative AI Summit & Computer Vision Summit.
GOT OCR (GitHub Repo)
A notable advance in general-purpose optical character recognition (OCR) that reads text from images with strong performance. This version also dramatically improves in-the-wild OCR.
1X Genie (GitHub Repo)
Genie is a video generation method for world model systems. 1X has open-sourced a version that mirrors the one it trained internally.
Fish Speech (GitHub Repo)
Powerful voice generation and single-shot voice cloning. Completely open source and easy to get running.
OpenAI Says It's Fixed Issue Where ChatGPT Appeared to Be Messaging Users Unprompted (3 minute read)
A Reddit user reported that OpenAI's ChatGPT initiated a conversation unprompted, leading to speculation about new engagement features. OpenAI acknowledged the issue and issued a fix, attributing it to a glitch with unsent messages. Debate continues over the authenticity of the incident, with similar reports from other users.
Announcing Pixtral 12B (8 minute read)
Pixtral 12B excels in multimodal tasks while maintaining state-of-the-art performance on text-only benchmarks, and it supports variable image sizes in a 128K token context window. Its architecture includes a new 400M parameter vision encoder and a 12B parameter multimodal decoder based on Mistral Nemo. Pixtral outperforms many open and closed models in multimodal reasoning and instruction following without compromising on text capabilities.
Scaling: The State of Play in AI (13 minute read)
LLMs like ChatGPT and Gemini are becoming increasingly capable as they scale up in size, data, and computing power, leading to improved performance across various tasks. Current Gen2 models like GPT-4 and Claude 3.5 are leading the market, with upcoming Gen3 models expected to further escalate capabilities and costs. The discovery of a new scaling law in AI, pertaining to increased "thinking" during inference, promises further advancements in AI performance beyond just model training.