Stability AI has introduced Stable Artisan, a Discord bot that makes all of its models available directly in chat. This is likely a play to compete directly with Midjourney.
Voice AI startup ElevenLabs is previewing a new model that converts prompts into song lyrics. The company is using a promotional strategy similar to the one OpenAI used for Sora.
The YOCO architecture is a decoder-decoder model that reduces GPU memory demands while retaining global attention capabilities. It consists of a self-decoder and cross-decoder, allowing for efficient caching and reuse of key-value pairs. YOCO achieves favorable performance compared to traditional Transformers, with significant improvements in inference memory, latency, and throughput, making it suitable for large language models and long context lengths.
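The memory savings come from caching key-value pairs once rather than per layer. A minimal sketch of that idea (not YOCO's actual implementation; layer counts, dimensions, and the single-head attention here are illustrative assumptions):

```python
import numpy as np

def attention(q, k, v):
    # Scaled dot-product attention for a single head.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
seq_len, d = 8, 16
x = rng.normal(size=(seq_len, d))

# Self-decoder: encode the input once and emit a single shared
# key-value cache for the whole stack.
shared_k = x @ rng.normal(size=(d, d))
shared_v = x @ rng.normal(size=(d, d))

# Cross-decoder: every layer attends to the same cached K/V instead
# of storing its own, so cache memory no longer grows with depth.
h = x
for _ in range(4):  # 4 hypothetical cross-decoder layers
    q = h @ rng.normal(size=(d, d))
    h = h + attention(q, shared_k, shared_v)
```

In a standard Transformer each of those four layers would keep its own K/V cache; here all four reuse one, which is where the inference-memory improvement comes from.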
Predicting more than one token at a time is an interesting paradigm of active research. If successful, it would dramatically improve generation time for many large language models. The approach in this post, which mirrors consistency models from image synthesis, applies a parallel decoding strategy to fine-tuned LLMs to speed up generation. Early results match the roughly 3x speedup of speculative decoding.
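The core of parallel decoding is a Jacobi-style fixed-point iteration: guess all future tokens at once, then refine every position simultaneously until the sequence stops changing. A toy sketch with a made-up deterministic "model" standing in for an LLM's greedy next-token function:

```python
def toy_next_token(prefix):
    # Hypothetical stand-in for an LLM's greedy next-token function:
    # next token is the sum of the prefix modulo a small vocab.
    return sum(prefix) % 7

def sequential_decode(prompt, n):
    # Ordinary autoregressive decoding: one model call per token.
    seq = list(prompt)
    for _ in range(n):
        seq.append(toy_next_token(seq))
    return seq[len(prompt):]

def jacobi_decode(prompt, n, max_iters=50):
    # Parallel (Jacobi) decoding: refine all n positions at once
    # against the current guess until a fixed point is reached.
    guess = [0] * n
    for _ in range(max_iters):
        new = [toy_next_token(list(prompt) + guess[:i]) for i in range(n)]
        if new == guess:
            break
        guess = new
    return guess

prompt = [1, 2, 3]
assert jacobi_decode(prompt, 5) == sequential_decode(prompt, 5)
```

The fixed point is guaranteed to match greedy sequential decoding, and it often arrives in far fewer iterations than n because early positions lock in after one pass; with a real model the per-iteration cost is one batched forward pass instead of n sequential ones.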
DiffMatch is a novel semi-supervised change detection method that leverages visual language models to synthesize pseudo labels for unlabeled data, providing additional supervision signals.
Buzz is a novel dataset that includes preference data in the pretraining mix. Its researchers have also released several models that were trained using this data. They found that the models perform well on a number of human preference tasks.
This new post-processing algorithm addresses model bias by applying a "fairness cost" to recalibrate output scores, ensuring compliance with various group fairness criteria such as statistical parity, equal opportunity, and equalized odds.
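For the statistical-parity case, a post-processing recalibration of this flavor can be sketched as a per-group score shift; the function name, quantile-based cost, and target rate below are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def recalibrate_statistical_parity(scores, groups, target_rate=0.5, threshold=0.5):
    # Post-processing sketch: subtract a group-specific "fairness cost"
    # from raw scores so that the fraction of positive decisions
    # (adjusted score >= threshold) equals target_rate in every group,
    # i.e. statistical parity holds. Trained model is left untouched.
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    adjusted = scores.copy()
    for g in np.unique(groups):
        mask = groups == g
        # Cost = gap between the group's (1 - target_rate) score
        # quantile and the decision threshold.
        cost = np.quantile(scores[mask], 1 - target_rate) - threshold
        adjusted[mask] = scores[mask] - cost
    return adjusted

# Usage: two groups whose raw score distributions are shifted apart.
rng = np.random.default_rng(1)
scores = np.concatenate([rng.uniform(0.4, 1.0, 200), rng.uniform(0.0, 0.6, 200)])
groups = np.array(["a"] * 200 + ["b"] * 200)
adj = recalibrate_statistical_parity(scores, groups)
```

Equal opportunity and equalized odds would use the same pattern but compute the cost from label-conditional rates rather than raw positive rates.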
A discussion of the various ways to extend context length for language models. It doesn't provide much in the way of evaluation, but it's a fascinating look at the directions the field is exploring.
A comprehensive survey that explores Mamba's applications across various visual tasks and its evolving impact. The authors intend to keep it updated as new findings and advancements around Mamba emerge.
Alibaba Cloud has launched the latest version of its large language model, Tongyi Qianwen Qwen2.5, marking significant improvements in reasoning, code comprehension, and textual understanding over Qwen2.0.
OpenAI's Preferred Publishers Program, detailed in a pitch deck, offers financial incentives and enhanced visibility within ChatGPT for select news publishers.