TLDR AI 2024-05-14

OpenAI GPT-4o 🤖, IBM Granite Code Models 💻, Time-Evidence Fusion Network ⌛

Headlines & Launches

OpenAI's New Model (12 minute read)

OpenAI has announced GPT-4o (the "o" stands for "omni"), a natively multimodal model that outperforms GPT-4 on text and achieves state-of-the-art performance across a variety of modalities. It also announced a new desktop app, a near real-time audio interface, and a variety of improved reasoning features.

IBM Open Sources The Granite Code Models (6 minute read)

IBM is releasing its Granite code models, which range from 3 to 34 billion parameters and cover a variety of programming tasks, to the open-source community to facilitate easier and more efficient coding across numerous platforms.

Apple Finalizing Deal To Bring ChatGPT Features To The iPhone (1 minute read)

Apple is nearing an agreement with OpenAI to integrate ChatGPT technology into the iPhone, potentially featuring it in the upcoming iOS 18 as part of its AI enhancements.

Research & Innovation

MatterSim: A Deep Learning Atomistic Model Across Elements, Temperatures and Pressures (30 minute read)

Simulators can be powerful tools in AI, both for collecting training data and for letting models learn through interaction. MatterSim models atomic interactions across a variety of elements, temperatures, and pressures.

Scene Graph Generation with Transformers (18 minute read)

Researchers have developed a faster, more efficient method for generating scene graphs. Their transformer-based technique improves how the model understands and connects different elements in an image, leading to better performance on challenging tasks.

Text-Image Composition and Comprehension (11 minute read)

InternLM-XComposer2 is a vision-language model that excels at creating and understanding complex text-image content. It introduces a Partial LoRA approach for balanced vision and text comprehension, outperforming existing models in multimodal content creation and understanding.

Engineering & Resources

Pipecat (GitHub Repo)

A framework for building voice (and multimodal) conversational agents.

An Improved Long-Term Forecaster (GitHub Repo)

The Time-Evidence Fusion Network (TEFN) is a novel deep learning model designed to enhance long-term time series forecasting. It combines information fusion and evidence theory, utilizing a specialized module to improve prediction accuracy and stability.

Expanded MRI Scan Capabilities (GitHub Repo)

MRSegmentator is a new tool designed to enhance MRI scan segmentation, identifying 40 different organs and structures across the abdominal, pelvic, and thoracic regions.

WebLlama (GitHub Repo)

A model designed to browse the web and answer questions accordingly. This could be used to synthesize a high-quality pretraining dataset or perform research that requires querying information from a web page.

MoonDream COYO Captions (Hugging Face Hub)

5M novel captions based on the alt-text and images of a portion of the COYO dataset.

Large language models as research assistants (4 minute read)

AI tools like GPT-4 are increasingly assisting and even outperforming academics in tasks like writing research papers. Liang et al. found that up to 18% of papers in some fields are AI-assisted. This integration of AI could create a loop in which software both generates and reviews academic publications. However, the impact on scientific progress is nuanced: it could enable greater productivity, but there is also a risk of a phase in which more is produced with less understanding.

Quick Links

OpenAI Says It Can Now Mostly Identify Images Generated By DALL-E 3 (2 minute read)

OpenAI has launched a detection tool to identify images generated by its DALL-E 3 model as part of an effort to address concerns about AI-generated fake content and enhance digital content authenticity by incorporating C2PA standards.

Microsoft Is 'Turning Everyone Into A Prompt Engineer' (2 minute read)

Microsoft is updating Copilot for Microsoft 365 with new features such as auto-complete and prompt elaboration to make prompt creation for generative AI easier, alongside a new "Catch Up" chat interface and customizable prompt management in Copilot Lab.

Eraser AI (Website)

Create and edit diagrams and docs using natural language prompts that output diagram code.