Gen AI in software engineering has moved well beyond autocomplete. The emerging frontier is agentic coding: AI systems capable of planning changes, executing them across multiple steps and iterating based on feedback. Yet despite the excitement around “AI agents that code,” most enterprise deployments underperform. The limiting factor is no longer the model. It’s context: The structure, history...
The Allen Institute for AI (Ai2) recently released what it calls its most powerful family of models yet, Olmo 3. But the company kept iterating on the models, expanding its reinforcement learning (RL) runs, to create Olmo 3.1.The new Olmo 3.1 models focus on efficiency, transparency, and control for enterprises. Ai2 updated two of the three versions of Olmo 2: Olmo 3.1 Think 32B, the flagship...
In a new paper that studies tool-use in large language model (LLM) agents, researchers at Google and UC Santa Barbara have developed a framework that enables agents to make more efficient use of tool and compute budgets. The researchers introduce two new techniques: a simple "Budget Tracker" and a more comprehensive framework called "Budget Aware Test-time Scaling." These techniques make agents...
OpenAI has officially released GPT-5.2, and the reactions from early testers — among whom OpenAI seeded the model several days prior to public release, in some cases weeks ago — paints a two toned picture: it is a monumental leap forward for deep, autonomous reasoning and coding, yet potentially an underwhelming "incremental" update for casual conversationalists.Following early access periods and...
The rumors were true: OpenAI on Thursday announced the release of its new frontier large language model (LLM) family, GPT-5.2.It comes at a pivotal moment for the AI pioneer, which has faced intensifying pressure since rival Google’s Gemini 3 LLM seized the top spot on major third-party performance leaderboards and many key benchmarks last month, though OpenAI leaders stressed in a press briefing...
Marble, a startup building artificial intelligence agents for tax professionals, has raised $9 million in seed funding as the accounting industry grapples with a deepening labor shortage and mounting regulatory complexity.The round, led by Susa Ventures with participation from MXV Capital and Konrad Capital, positions Marble to compete in a market where AI adoption has lagged significantly behind...
Nous Research, the San Francisco-based artificial intelligence startup, released on Tuesday an open-source mathematical reasoning system called Nomos 1 that achieved near-elite human performance on this year's William Lowell Putnam Mathematical Competition, one of the most prestigious and notoriously difficult undergraduate math contests in the world.The Putnam is known for its difficulty: While...
Presented by Oracle NetSuiteWhen any company tells you it is their biggest product release in almost three decades, it’s worth listening. When the person saying it founded the world’s first cloud computing company, it’s time to take note. At SuiteWorld 2025, Evan Goldberg, founder and EVP of Oracle NetSuite, did just that when he called NetSuite Next the company’s biggest product evolution in...
Almost a year after releasing Rerank 3.5, Cohere launched the latest version of its search model, now with a larger context window to help agents find the information they need to complete their tasks. Cohere said in a blog post that Rerank 4 has a 32K context window, representing a four-fold increase compared to 3.5. “This enables the model to handle longer documents, evaluate multiple passages...
There's no shortage of generative AI benchmarks designed to measure the performance and accuracy of a given model on completing various helpful enterprise tasks — from coding to instruction following to agentic web browsing and tool use. But many of these benchmarks have one major shortcoming: they measure the AI's ability to complete specific problems and requests, not how factual the model is...