TLDR AI 2024-04-17

Stanford AI Index Report πŸ“ƒ, Google’s Infinite Context LLMs πŸ“š, Megalodon Efficient Transformer Pretraining 🌐

Headlines & Launches

Stanford HAI Releases 2024 AI Index Report (Website)

The Stanford Institute for Human-Centered AI has released its seventh annual AI Index report. This year's report covers the rise of multimodal foundation models, major cash investments into generative AI, new performance benchmarks, shifting global opinions, and new major regulations.

Apple iOS 18 Will Be On-Device (1 minute read)

Apple's upcoming AI features in iOS 18 are rumored to focus on privacy, with the initial set of enhancements running entirely on-device, without an internet connection or cloud-based processing. The features are reportedly powered by the company's in-house large language model, known internally as "Ajax."

Google's New Technique Gives LLMs Infinite Context (5 minute read)

Google researchers have introduced Infini-attention, a technique that enables LLMs to work with text of infinite length while keeping memory and compute requirements constant.
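The core trick is pairing local attention with a fixed-size compressive memory that accumulates key-value associations across segments, so memory stays constant no matter how long the input grows. A minimal numpy sketch of the memory component (simplified and with hypothetical dimensions, not the paper's exact formulation):

```python
import numpy as np

def elu1(x):
    # ELU + 1: the kind of positive feature map used in linear-attention-style memories
    return np.where(x > 0, x + 1.0, np.exp(x))

d = 4  # head dimension
rng = np.random.default_rng(0)
M = np.zeros((d, d))   # compressive memory: fixed size regardless of input length
z = np.zeros(d)        # running normalization term

# Stream segments one at a time; state stays O(d^2) however long the stream is.
for _ in range(3):
    K = rng.normal(size=(8, d))  # keys for one segment
    V = rng.normal(size=(8, d))  # values for one segment
    sK = elu1(K)
    M += sK.T @ V                # fold this segment's key-value pairs into memory
    z += sK.sum(axis=0)

Q = rng.normal(size=(2, d))      # queries from the current segment
sQ = elu1(Q)
retrieved = (sQ @ M) / (sQ @ z)[:, None]  # read distant context back out of memory
print(retrieved.shape)
```

In the full method, this memory read is blended with standard local attention over the current segment, which is what lets compute stay bounded while context is effectively unbounded.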

Research & Innovation

Compression represents intelligence linearly (18 minute read)

Most modern AI is built around the idea of compressing a training dataset into a model: the better the compression, the better the model. This paper makes that relation rigorous and finds that benchmark scores correlate strongly with a model's ability to compress novel text.
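The compression-prediction link is concrete: a model that assigns probability p to the observed text can code it in -log2(p) bits (arithmetic coding achieves this up to rounding), so lower loss on novel text literally means better compression. A toy illustration with a hypothetical bigram model:

```python
import math

text = "abababab"

# Toy bigram "model": after any character, predict a different character with p=0.9.
def prob(prev, ch):
    if prev is None:
        return 0.5          # uniform guess for the first character
    return 0.9 if ch != prev else 0.1

bits = 0.0
prev = None
for ch in text:
    bits += -math.log2(prob(prev, ch))  # ideal code length for this character
    prev = ch

bpc = bits / len(text)  # bits per character: lower = better compression
print(round(bpc, 3))
```

A better language model assigns higher probability to the text it sees, which drives bits-per-character down; the paper's claim is that this same quantity tracks benchmark performance.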

Megalodon Efficient Transformer Pretraining (17 minute read)

Another long-context paper: this time, a new architecture that uses two novel weight-updating schemes. It outperforms Llama 2 when trained on the same number of tokens (2T), and it scales to infinite context length at inference time.

Feedback in Transformers (24 minute read)

TransformerFAM provides a feedback mechanism that allows Transformers to attend to their own latent representations. This can, in theory, introduce recurrence into the model for processing extremely long inputs in context.
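One way to picture the feedback idea: a small memory state participates in attention alongside the ordinary tokens, and the memory's own attended output is carried forward as the next memory state, giving the stack a recurrent loop. A toy numpy sketch (single head, hypothetical sizes; the real design is more involved):

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention (single head, no masking)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

d = 8
rng = np.random.default_rng(1)
fam = np.zeros((1, d))  # feedback memory state, carried across segments

# Process a long input as segments; the memory attends like a token,
# and its attended output re-enters the computation on the next segment.
for _ in range(4):
    x = rng.normal(size=(16, d))            # one segment of latent representations
    seq = np.concatenate([x, fam], axis=0)  # tokens + memory attend jointly
    out = attention(seq, seq, seq)
    x_out, fam = out[:-1], out[-1:]         # feedback: updated memory loops back

print(fam.shape)
```

Because only the small memory state is carried between segments, the effective receptive field can grow with input length while per-segment attention cost stays fixed.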

Engineering & Resources

Enhanced Vision-Language Model (GitHub Repo)

Vision-language models (VLMs) often struggle with processing multiple queries per image and with identifying when objects are absent. This study introduces a new query format to tackle these issues and incorporates semantic segmentation into the training process.

AI system that creates detailed, cited reports with retrieval (GitHub Repo)

Stanford has released a neat research system called Storm that uses retrieval-guided language models to create cited reports on specific topics.

Road Line Segmentation for Autonomous Driving (16 minute read)

Accurately segmenting road lines and markings is crucial for autonomous driving but challenging due to occlusions caused by vehicles, shadows, and glare. The Homography Guided Fusion (HomoFusion) module uses video frames to identify and classify obscured road lines by leveraging a novel surface normal estimator and a pixel-to-pixel attention mechanism.

Qwen Coder (12 minute read)

CodeQwen1.5 is a new set of 7B models trained on 3T tokens of code-related data. It performs well on HumanEval and posts a non-zero score on SWE-bench. The chat variant shows particular promise for long-context retrieval tasks up to 64k tokens.

1-bit Quantization (7 minute read)

Extreme low-bit quantization of small pre-trained models like Llama2-7B is challenging, but fine-tuning just 0.65% of parameters significantly improves performance. Newly fine-tuned 1-bit models outperform 2-bit QuIP# models, while 2-bit models fine-tuned on specialized data can exceed their full-precision counterparts. This research suggests that proper fine-tuning and quantization can enhance efficiency without compromising model quality, potentially shifting focus from training smaller models to optimizing larger, quantized ones.
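To see what "1-bit" means here: each weight keeps only its sign, plus a small number of full-precision parameters (such as per-row scales) that the fine-tuning step can adjust. A toy numpy sketch of the codebook itself, with the fine-tuning omitted and the matrix purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.normal(size=(64, 64))  # toy stand-in for a pretrained weight matrix

# 1-bit code: sign of each weight, plus one full-precision scale per row.
signs = np.sign(W)
signs[signs == 0] = 1.0
scale = np.abs(W).mean(axis=1, keepdims=True)  # least-squares-optimal scale for sign codes
W_q = scale * signs                            # dequantized 1-bit approximation

# Relative reconstruction error; fine-tuning the retained full-precision
# parameters is what recovers the remaining quality in the paper's setup.
err = np.linalg.norm(W - W_q) / np.linalg.norm(W)
print(round(float(err), 3))
```

Storage drops from 16 or 32 bits per weight to roughly 1 bit plus a tiny overhead for the scales, which is why the trade-off against training a smaller full-precision model becomes interesting.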

Accelerating AI: Harnessing Intel(R) Gaudi(R) 3 with Ray 2.10 (5 minute read)

Anyscale's latest release of Ray, Ray 2.10, adds support for Intel Gaudi 3. Developers can now spin up and manage their own Ray Clusters, provisioning Ray Core Tasks and Actors on a Gaudi fleet directly through the Ray Core APIs; tap into Ray Serve on Gaudi for a higher-level experience; and configure Intel Gaudi accelerator infrastructure at the Ray Train layer.

Quick Links

Limitless (Product)

Personalized AI app and wearable powered by what you've seen, said, and heard.

Introducing ALOHA Unleashed (2 minute video)

Google DeepMind's ALOHA Unleashed is a program that pushes the boundaries of dexterity with low-cost robots and AI.

Four steps for founders integrating AI (4 minute read)

The pressure is immense right now to have a plan to factor AI into existing products. This short step-by-step guide will help you take the first step.