TLDR AI 2025-11-20
GPT-5.1-Codex-Max 💻, Grok 4.1 Agent Tools API 👨‍💻, Segment Anything 3 3️⃣
Elon Musk's xAI Is in Advanced Talks to Raise $15 Billion, Lifting Valuation (2 minute read)
xAI is in advanced talks to raise $15 billion in new equity at a $230 billion valuation. The startup has raised billions of dollars to compete with OpenAI and expand the capabilities of its chatbot, Grok. The startup recently lost several senior executives. It was previously valued at $113 billion in March after it acquired social media site X.
Building more with GPT-5.1-Codex-Max (7 minute read)
GPT-5.1-Codex-Max is trained to operate across multiple context windows through "compaction", allowing it to work over millions of tokens and complete tasks spanning 24+ hours. The model achieves 77.9% on SWE-bench Verified while using 30% fewer thinking tokens than its predecessor.
Google Has Your Data. Gemini Barely Uses It (13 minute read)
Google is dramatically under-capitalizing on the strongest context position in the industry. Gemini hides its workspace connector in its settings, treating it as an optional enhancement, not the center of the product. This is almost certainly because Google is trying to play it safe. The company has an opportunity to turn its dormant context advantage into experiences that would be impossible anywhere else.
Gemini 3 Prompting: Best Practices for General Usage (6 minute read)
Gemini 3 Pro responds best to direct, structured prompts with behavioral constraints placed at the top rather than scattered throughout. Unlike previous versions, it defaults to concise responses unless explicitly asked to be conversational. The model handles long contexts better when instructions appear after data rather than before and treats multimodal inputs as equal-class data requiring explicit cross-modal instructions. It benefits from explicit planning steps with self-critique loops using XML or Markdown formatting.
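As a rough illustration of that layout, here is a minimal Python sketch that assembles a prompt with constraints at the top, the data next, and the task instructions last, followed by a plan-and-critique step. The `build_prompt` helper and the tag names (`constraints`, `data`, `task`) are hypothetical choices for this example, not part of any Gemini API or official template.

```python
def build_prompt(constraints, data, task):
    """Assemble a structured prompt: constraints first, then data, then the task.

    Hypothetical helper: the XML-style tag names are illustrative assumptions.
    """
    constraint_lines = "\n".join(f"- {c}" for c in constraints)
    return (
        "<constraints>\n"
        f"{constraint_lines}\n"
        "</constraints>\n\n"
        "<data>\n"
        f"{data}\n"
        "</data>\n\n"
        "<task>\n"
        f"{task}\n"
        # Explicit planning step with a self-critique pass.
        "Plan your steps, draft an answer, then critique and revise it.\n"
        "</task>"
    )


prompt = build_prompt(
    constraints=["Answer in a conversational tone.", "Cite only the supplied data."],
    data="Q3 revenue: 4.2M. Q4 revenue: 5.1M.",
    task="Summarize the revenue trend.",
)
print(prompt)
```

The resulting string can be sent as the text of a request through whichever client is in use; only the ordering of the sections matters here.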
My GPT-5.1 Pro Review (10 minute read)
GPT-5.1 Pro is a slow, heavyweight reasoning model that feels intelligent and can handle tough problems. It follows instructions well without going off the rails, making it feel like a contract engineer working from a spec rather than an assistant. Its biggest weakness is its interface, as it lives in ChatGPT, not in the IDE. Gemini 3 is still better for most day-to-day work. However, GPT-5.1 Pro wins in deep thought, planning, and research.
How evals drive the next chapter in AI for businesses (9 minute read)
OpenAI published a framework arguing that evaluation systems ("evals") are the bridge between AI's probabilistic nature and business outcomes. The framework breaks down into three phases: 1) experts define success, 2) teams stress-test systems with real-world edge cases, and 3) continuous monitoring builds proprietary datasets that compound into competitive moats.
“We're in an LLM bubble,” Hugging Face CEO says—but not an AI one (3 minute read)
Clem Delangue, CEO of Hugging Face, says that the LLM bubble may burst next year. However, that doesn't mean that AI will collapse: LLMs are just a subset of AI technology. We are still at the very beginning of AI, and we'll see much more in the next few years. It is more likely that we will end up with a multitude of models that can solve many different problems rather than one type of model that will solve all problems for all companies.
alphaXiv raises $7M in funding to become the GitHub of AI research (3 minute read)
alphaXiv has raised $7 million in seed funding in a round co-led by Menlo Ventures and Haystack. The startup aims to help engineers transform the latest academic discoveries into cutting-edge AI features by streamlining their paths from research to production. Its platform allows researchers to publish their latest papers, connecting them to engineers who use the knowledge to create new AI features. alphaXiv aims to become the de facto global workspace for AI researchers.