Apple's AI researchers have developed a new system called ReALM that improves Siri's ability to understand context by considering on-screen, conversational, and background entities. Apple reports that its larger models outperform GPT-4 on its benchmarks.
OpenAI's Voice Engine is a model that generates speech mimicking a speaker's voice from a 15-second audio sample. It can be used in applications like educational aids, translation, and support for non-verbal individuals. OpenAI is employing a cautious approach to deployment due to potential misuse.
Beth Barnes' nonprofit METR is partnering with major AI companies like OpenAI and Anthropic to develop safety tests for advanced AI systems, a move echoed by government initiatives. The focus is on assessing risks such as AI autonomy and self-replication, though there's acknowledgment that safety evaluations are still in early stages and cannot guarantee AI safety. METR's work is seen as pragmatic, despite concerns that current tests may not be sufficiently reliable to justify the rapid advancement of AI technologies.
There are times when Visual Language Models (VLMs) cannot answer a query given an input image. This is a challenge even for state-of-the-art VLMs like GPT-4V. This paper proposes a benchmark and some potential improvements for VLMs faced with Unsolvable Problems.
Running language models on phones is challenging due to latency, bandwidth, and power constraints. Using quantization, removal of the KV cache, and other optimizations, this research shows how to reach 30 tokens/second generation with the Gemma 2B model, approximately 3x faster than other frameworks.
Total-Decom offers a breakthrough in 3D scene reconstruction, allowing for easy editing and manipulation by accurately decomposing objects from multi-view images with minimal user effort.
Wandb has released a toolkit that allows you to describe a UI and have it rendered in React, Svelte, etc. It allows for text-based editing as well. The tool can be run locally with Ollama.
An update to TensorRT from Nvidia achieves speeds up to 28x faster than baseline. Llama 2 in particular can run at 1,200 tokens per second in benchmarks. The update takes advantage of the new Hopper and Ada chip architectures.
Despite some early successes, robotic process automation (RPA) fell short of the enterprise-wide deployments it promised. A Deloitte survey revealed that only 3% of companies were able to successfully scale their RPA initiatives. Recent advances in AI are poised to change this. LLMs’ novel capabilities could grow the market opportunity for intelligent process automation by at least 10x in the coming decade.
Version 2 of Microsoft's popular course on LLMs, vector databases, prompting, and low-code applications is on GitHub. It contains 18 lessons. Some of the content is aspirational, but it is still a good resource for getting started in the space.
Discussions about AI's future often focus on extreme scenarios, overlooking its immediate impact on jobs and misinformation. However, with thoughtful integration, AI has the potential to transform work into more meaningful and productive tasks rather than leading to apocalyptic outcomes.
OpenAI has made ChatGPT accessible without an account. Users without accounts get additional safeguards and can opt out of data training. The move aims to attract more users to the platform, widening AI's benefits and gathering more training data.
Emo is a robot that can anticipate and mimic human facial expressions in real time, significantly enhancing human-robot interaction and paving the way for future applications in assistance, education, and companionship.