Stability AI announced the first public release of StableCode, its new open large language model designed to help users generate code.
An awesome roundup covering what the big tech companies are up to in Generative AI, based on discussions in their Q2 earnings calls. Big tech's reaction to AI has been swift, demonstrating the near-term importance of capitalizing on this opportunity.
Hugging Face is working with AWS and Nvidia to bring one-click training to the platform. You can fine-tune state-of-the-art models directly from the Hub just by uploading your data.
This project introduces AgentBench, a benchmark for evaluating Large Language Models (LLMs) in various interactive settings. Initial tests across 25 LLMs showed that commercial models outperformed open-source ones.
This behemoth from Anthropic explores how individual training examples affect final model performance, using a counterfactual technique called "influence functions". It is a pretty technical paper that outlines many challenges with the approach, along with some surprising findings. Most strikingly, a training example's influence decays to nearly zero when the order of its key phrases is flipped. Language models are sensitive to word order!
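To give a feel for the core idea, here is a toy illustration of the classic influence-function estimate (influence ≈ -∇L_test · H⁻¹ ∇L_train) on a small least-squares problem. This is a simplified sketch for intuition only, not the scalable approximation Anthropic uses for LLMs; all variable names are made up for the example.

```python
import numpy as np

# Toy linear regression to illustrate influence functions.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=50)

# Fit by least squares; loss L(w) = 0.5 * sum((Xw - y)^2)
w = np.linalg.lstsq(X, y, rcond=None)[0]
H = X.T @ X  # Hessian of the total training loss

# Gradient of the loss on one held-out test point
x_test = rng.normal(size=3)
y_test = x_test @ w_true
grad_test = (x_test @ w - y_test) * x_test

def influence(i):
    """Estimated effect of upweighting training example i on the test loss."""
    grad_i = (X[i] @ w - y[i]) * X[i]
    return -grad_test @ np.linalg.solve(H, grad_i)

scores = np.array([influence(i) for i in range(len(X))])
# A large positive score means removing that example would lower the test loss.
```

The scores rank training examples by how much each one shaped this particular prediction, which is exactly the kind of attribution the paper studies at LLM scale.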
Researchers have developed the Dual Aggregation Transformer (DAT), which combines spatial and channel attention to improve image super-resolution. DAT outperforms current methods thanks to components such as its adaptive interaction module and spatial-gate feed-forward network.
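The spatial-vs-channel distinction comes down to which axis of the feature map you treat as the token dimension. Here is a minimal NumPy sketch of the two layouts, assuming plain single-head attention with no learned projections; it is an intuition aid, not the DAT implementation.

```python
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    # tokens: (n, d) — bare scaled dot-product self-attention
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[1])
    return softmax(scores) @ tokens

C, H, W = 8, 4, 4
feat = np.random.default_rng(1).normal(size=(C, H, W))

# Spatial attention: each of the H*W positions is a token of dimension C
spatial_tokens = feat.reshape(C, H * W).T            # (H*W, C)
spatial_out = self_attention(spatial_tokens).T.reshape(C, H, W)

# Channel attention: each of the C channels is a token of dimension H*W
channel_tokens = feat.reshape(C, H * W)              # (C, H*W)
channel_out = self_attention(channel_tokens).reshape(C, H, W)
```

DAT's contribution is aggregating both views (alternating and interacting between them) rather than relying on either one alone.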
Ever wanted to peer into the implementation details of the new SDXL model? This diffusers-compatible repo is just a few hundred lines of code. Perfect for learning.
Sweep is an open-source AI junior developer that turns issues into pull requests. You make a GitHub Issue like "use os agnostic temp directory for windows" and Sweep writes a pull request to replace all occurrences of "/tmp" with "tempfile.gettempdir()".
A pick-your-problem style guide that was created to educate everyone, from the leadership team to the ML engineers in a startup, on how to work with AI in production settings. This is the stuff you won't learn in most ML/AI courses.
Glaive is a platform that lets people train hyper-focused small language models for their use case, with the help of a synthetic data generation system. These small models are 2-20x faster, at least 2x cheaper, and outperform general-purpose LLM APIs on the task they were trained for.