TLDR AI 2024-04-02

OpenAI Voice Engine 🗣️, safety testing AI 🦺, LLMs on mobile phones 📱

🚀
Headlines & Launches

Apple Researchers Boast On-Device Model That Outperforms GPT-4 (3 minute read)

Apple's AI researchers have developed a new system called ReALM that improves Siri's ability to understand context by considering on-screen, conversational, and background entities. It outperforms GPT-4 in benchmarks.

Navigating The Challenges And Opportunities Of Synthetic Voices (5 minute read)

OpenAI's Voice Engine is a model that generates speech mimicking a speaker's voice from a 15-second audio sample. It can be used in applications like educational aids, translation, and support for non-verbal individuals. OpenAI is employing a cautious approach to deployment due to potential misuse.

Nobody Knows How to Safety-Test AI (9 minute read)

Beth Barnes' nonprofit METR is partnering with major AI companies like OpenAI and Anthropic to develop safety tests for advanced AI systems, a move echoed by government initiatives. The focus is on assessing risks such as AI autonomy and self-replication, though there's acknowledgment that safety evaluations are still in early stages and cannot guarantee AI safety. METR's work is seen as pragmatic, despite concerns that current tests may not be sufficiently reliable to justify the rapid advancement of AI technologies.

🧠
Research & Innovation

Unsolvable Problem Detection in VLMs (22 minute read)

There are times when Visual Language Models (VLMs) cannot answer a query given an input image. This is a challenge even for state-of-the-art VLMs like GPT-4V. This paper proposes a benchmark and some potential improvements for VLMs faced with Unsolvable Problems.

Transformer-Lite: LLMs on Mobile Phone GPUs (26 minute read)

Running language models on phones is challenging due to latency, bandwidth, and power constraints. Using quantization, elimination of KV cache copying, and other optimizations, this research shows how to reach 30 tokens/second generation with the Gemma 2B model, approximately 3x faster than other frameworks.

3D Scene Editing with Total-Decom (16 minute read)

Total-Decom offers a breakthrough in 3D scene reconstruction, allowing for easy editing and manipulation by accurately decomposing objects from multi-view images with minimal user effort.

👨‍💻
Engineering & Resources

OpenChat Gemma (HuggingFace Hub)

Gemma has been challenging to fine-tune. This work from the OpenChat team shows it is possible to match the performance of Mistral fine-tunes.

OpenUI (GitHub Repo)

Wandb has released a toolkit that allows you to describe a UI and have it rendered in React, Svelte, etc. It allows for text-based editing as well. The tool can be run locally with Ollama.

optimum-nvidia (GitHub Repo)

An update to TensorRT from Nvidia achieves speeds up to 28x faster than baseline. Llama 2 in particular can run at 1,200 tokens per second in benchmarks. The update takes advantage of the new Hopper and Ada chip architectures.

🎁
Miscellaneous

Beyond RPA: How LLMs are ushering in a new era of intelligent process automation (7 minute read)

Despite some early successes, RPA fell short of the enterprise-wide deployments it promised. A Deloitte survey revealed that only 3% of companies were able to successfully scale their RPA initiatives. Recent advances in AI are poised to change this. LLMs’ novel capabilities prime the market opportunity for intelligent process automation to grow by at least 10x in the coming decade.

Microsoft’s Generative AI for Beginners Course (GitHub Repo)

Version 2 of Microsoft's popular course on LLMs, vector databases, prompting, and low-code applications is on GitHub. It contains 18 lessons. Some of the content is aspirational, but it is still a good resource for getting started in the space.

We’re Focusing On The Wrong Kind Of AI Apocalypse (5 minute read)

Discussions about AI's future often focus on extreme scenarios, overlooking its immediate impact on jobs and misinformation. However, with thoughtful integration, AI has the potential to transform work into more meaningful and productive tasks rather than leading to apocalyptic outcomes.

⚡️
Quick Links

Bezi AI enables generative AI-based 3D design (3 minute read)

A significant milestone in the world of 3D design: the ability to ideate at the speed of thought with an infinite asset library.

You Can Now Use ChatGPT Without An Account (2 minute read)

OpenAI has made ChatGPT accessible without an account, adding safeguards for accountless users and an option to opt out of model training. The move aims to attract more users, widen access to AI's benefits, and gather more training data.

Robot, Can You Say ‘Cheese’? (2 minute read)

Emo is a robot capable of anticipating and mimicking human facial expressions in real time, significantly enhancing human-robot interaction and paving the way for future applications in assistance, education, and companionship.