Humane Inc unveiled the "Humane Ai Pin", a screenless wearable designed for AI integration, at Paris Fashion Week in collaboration with Coperni. The device offers AI-powered optical recognition and laser-projected displays, and it prioritizes user privacy without requiring smartphone pairing.
Training top-notch video models usually needs huge resources, often beyond what academia can access. Researchers have found a way to train these models in a day on a single machine with eight standard GPUs.
The UniLM group at Microsoft has done some great work in the past few years around natural language. With the suite of Kosmos models, they’ve recently moved into images. This specific instantiation is for reading text-intensive documents from an image and generating the text or markdown for that document. It is similar to the recent Meta work for academic OCR.
Language models are limited by their context length, which is in turn constrained by compute hardware and pushed further only by clever algorithmic updates. This is one such algorithmic update: it streams tokens through the attention mechanism to allow a theoretically infinite context window. Usually, these claims fall over at scale, but this one seems robust, as it works on existing pre-trained models without fine-tuning. Will it make the forgotten-middle issue worse, though?
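To make the streaming idea concrete, here is a minimal sketch of a bounded cache that keeps memory constant no matter how many tokens pass through. The blurb does not spell out the mechanism, so the design here (retain a few initial entries plus a window of the most recent ones) is an assumption for illustration, and `RollingCache`, `n_initial`, and `n_recent` are hypothetical names, not the paper's API.

```python
from collections import deque


class RollingCache:
    """Fixed-size cache: the first `n_initial` items plus the latest `n_recent`.

    Illustrative only; a real model would store per-token key/value tensors
    here instead of plain items.
    """

    def __init__(self, n_initial, n_recent):
        self.n_initial = n_initial
        self.initial = []                      # never evicted once filled
        self.recent = deque(maxlen=n_recent)   # oldest entry drops automatically

    def append(self, item):
        if len(self.initial) < self.n_initial:
            self.initial.append(item)
        else:
            self.recent.append(item)

    def contents(self):
        # What attention would be allowed to see at the current step.
        return self.initial + list(self.recent)


cache = RollingCache(n_initial=2, n_recent=3)
for position in range(10):
    cache.append(position)
print(cache.contents())  # [0, 1, 7, 8, 9]
```

Everything between the retained start and the recent window is dropped, which is why the question about the middle of the context is a fair one for any scheme of this shape.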
Optical flow estimates how things move between images. This study introduces new techniques that use Gaussian Attention to focus on finer details and match them better, resulting in a model named GAFlow.