Anthropic’s Claude Opus 4.5 is here: Cheaper AI, infinite chats, and coding skills that beat humans
Anthropic released its most capable artificial intelligence model yet on Monday, slashing prices by roughly two-thirds while claiming state-of-the-art performance on software engineering tasks — a strategic move that intensifies the AI startup's competition with deep-pocketed rivals OpenAI and Google.
The new model, Claude Opus 4.5, scored higher on Anthropic's most challenging internal engineering assessment than any human job candidate in the company's history, according to materials reviewed by VentureBeat. The result underscores both the rapidly advancing capabilities of AI systems and growing questions about how the technology will reshape white-collar professions.
The Amazon-backed company is pricing Claude Opus 4.5 at $5 per million input tokens and $25 per million output tokens — a dramatic reduction from the $15 and $75 rates for its predecessor, Claude Opus 4.1, released earlier this year. The move makes frontier AI capabilities accessible to a broader swath of developers and enterprises while putting pressure on competitors to match both performance and pricing.
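At those rates, the savings are straightforward to quantify. A minimal sketch (the workload figures below are illustrative, not from the article):

```python
def request_cost(input_tokens, output_tokens, in_rate, out_rate):
    """Return USD cost for a workload at per-million-token rates."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# A hypothetical month of 10M input and 2M output tokens:
opus_45 = request_cost(10_000_000, 2_000_000, in_rate=5, out_rate=25)
opus_41 = request_cost(10_000_000, 2_000_000, in_rate=15, out_rate=75)
# opus_45 -> 100.0, opus_41 -> 300.0: the same workload costs
# one-third as much, i.e. roughly a two-thirds reduction.
```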
"We want to make sure this really works for people who want to work with these models," said Alex Albert, Anthropic's head of developer relations, in an exclusive interview with VentureBeat. "That is really our focus: How can we enable Claude to be better at helping you do the things that you don't necessarily want to do in your job?"
The announcement comes as Anthropic races to maintain its position in an increasingly crowded field. OpenAI recently released GPT-5.1 and a specialized coding model called Codex Max that can work autonomously for extended periods. Google unveiled Gemini 3 just last week, prompting concerns even from OpenAI about the search giant's progress, according to a recent report from The Information.
Opus 4.5 demonstrates improved judgment on real-world tasks, developers say
Anthropic's internal testing revealed what the company describes as a qualitative leap in Claude Opus 4.5's reasoning capabilities. The model achieved 80.9% accuracy on SWE-bench Verified, a benchmark measuring real-world software engineering tasks, outperforming OpenAI's GPT-5.1-Codex-Max (77.9%), Anthropic's own Sonnet 4.5 (77.2%), and Google's Gemini 3 Pro (76.2%), according to the company's data. The result marks a notable advance over OpenAI's current state-of-the-art model, which was released just five days earlier.
But the technical benchmarks tell only part of the story. Albert said employee testers consistently reported that the model demonstrates improved judgment and intuition across diverse tasks — a shift he described as the model developing a sense of what matters in real-world contexts.
"The model just kind of gets it," Albert said. "It just has developed this sort of intuition and judgment on a lot of real world things that feels qualitatively like a big jump up from past models."
He pointed to his own workflow as an example. Previously, Albert said, he would ask AI models to gather information but hesitated to trust their synthesis or prioritization. With Opus 4.5, he's delegating more complete tasks, connecting it to Slack and internal documents to produce coherent summaries that match his priorities.
Opus 4.5 outscores all human candidates on company's toughest engineering test
The model's performance on Anthropic's internal engineering assessment marks a notable milestone. The take-home exam, designed for prospective performance engineering candidates, evaluates technical ability and judgment under a prescribed two-hour time limit.
Using a technique called parallel test-time compute — which aggregates multiple attempts from the model and selects the best result — Opus 4.5 scored higher than any human candidate who has taken the test, according to the company. Without a time limit, the model matched the performance of the best-ever human candidate when used within Claude Code, Anthropic's coding environment.
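The selection step of parallel test-time compute is simple to sketch. Assuming a `generate` function that produces one model attempt per seed and a `score` function (in practice, a test suite or verifier) that ranks candidates, best-of-n selection looks like:

```python
from concurrent.futures import ThreadPoolExecutor

def best_of_n(generate, score, n=8):
    """Parallel test-time compute as described above: run n independent
    attempts concurrently, score each candidate, and keep the best."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(generate, range(n)))
    return max(candidates, key=score)

# Toy stand-ins: `generate` would call the model with a different seed,
# and `score` would run an evaluator over each candidate solution.
best = best_of_n(generate=lambda seed: f"attempt-{seed}",
                 score=lambda c: int(c.split("-")[1]))
# best -> "attempt-7"
```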
The company acknowledged that the test doesn't measure other crucial professional skills such as collaboration, communication, or the instincts that develop over years of experience. Still, Anthropic said the result "raises questions about how AI will change engineering as a profession."
Albert emphasized the significance of the finding. "I think this is kind of a sign, maybe, of what's to come around how useful these models can actually be in a work context and for our jobs," he said. "Of course, this was an engineering task, and I would say models are relatively ahead in engineering compared to other fields, but I think it's a really important signal to pay attention to."
Dramatic efficiency improvements cut token usage by up to 76% on key benchmarks
Beyond raw performance, Anthropic is betting that efficiency improvements will differentiate Claude Opus 4.5 in the market. The company says the model uses dramatically fewer tokens — the units of text that AI systems process — to achieve similar or better outcomes compared to predecessors.
At a medium effort level, Opus 4.5 matches the previous Sonnet 4.5 model's best score on SWE-bench Verified while using 76% fewer output tokens, according to Anthropic. At the highest effort level, Opus 4.5 exceeds Sonnet 4.5 performance by 4.3 percentage points while still using 48% fewer tokens.
To give developers more control, Anthropic introduced an "effort parameter" that allows users to adjust how much computational work the model applies to each task — balancing performance against latency and cost.
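Conceptually, the effort parameter is a per-request knob trading answer quality against latency and token cost. The sketch below only builds a request payload; the exact field name and accepted values are assumptions for illustration, not Anthropic's documented wire format:

```python
def build_request(prompt, effort="medium"):
    """Illustrative request payload with an effort setting.
    The "effort" field name and its levels are assumed here;
    consult Anthropic's API documentation for the real schema."""
    assert effort in {"low", "medium", "high"}
    return {
        "model": "claude-opus-4-5",
        "max_tokens": 2048,
        "effort": effort,  # assumed field name, shown for illustration
        "messages": [{"role": "user", "content": prompt}],
    }
```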
Enterprise customers provided early validation of the efficiency claims. "Opus 4.5 beats Sonnet 4.5 and competition on our internal benchmarks, using fewer tokens to solve the same problems," said Michele Catasta, president of Replit, a cloud-based coding platform, in a statement to VentureBeat. "At scale, that efficiency compounds."
GitHub's chief product officer, Mario Rodriguez, said early testing shows Opus 4.5 "surpasses internal coding benchmarks while cutting token usage in half, and is especially well-suited for tasks like code migration and code refactoring."
Early customers report AI agents that learn from experience and refine their own skills
One of the most striking capabilities demonstrated by early customers involves what Anthropic calls "self-improving agents" — AI systems that can refine their own performance through iterative learning.
Rakuten, the Japanese e-commerce and internet company, tested Claude Opus 4.5 on automation of office tasks. "Our agents were able to autonomously refine their own capabilities — achieving peak performance in 4 iterations while other models couldn't match that quality after 10," said Yusuke Kaji, Rakuten's general manager of AI for business.
Albert explained that the model isn't updating its own weights — the fundamental parameters that define an AI system's behavior — but rather iteratively improving the tools and approaches it uses to solve problems. "It was iteratively refining a skill for a task and seeing that it's trying to optimize the skill to get better performance so it could accomplish this task," he said.
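The loop Albert describes — refine a skill, measure it, keep the best version — can be sketched as follows. Every name here is hypothetical: `evaluate` would run the task's success metric, and `improve` would be a model call that rewrites the skill given its current score:

```python
def refine_skill(skill, evaluate, improve, max_iters=10, target=None):
    """Iterative skill refinement: score the current skill, ask for an
    improved version, and keep whichever scores higher. No model weights
    change; only the skill artifact (a prompt, script, or tool) does."""
    best, best_score = skill, evaluate(skill)
    for _ in range(max_iters):
        candidate = improve(best, best_score)  # a model call in practice
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
        if target is not None and best_score >= target:
            break
    return best, best_score
```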
The capability extends beyond coding. Albert said Anthropic has observed significant improvements in creating professional documents, spreadsheets, and presentations. "They're saying that this has been the biggest jump they've seen between model generations," Albert said. "So going even from Sonnet 4.5 to Opus 4.5, bigger jump than any two models back to back in the past."
Fundamental Research Labs, a financial modeling firm, reported that "accuracy on our internal evals improved 20%, efficiency rose 15%, and complex tasks that once seemed out of reach became achievable," according to co-founder Nico Christie.
New features target Excel users, Chrome workflows and eliminate chat length limits
Alongside the model release, Anthropic rolled out a suite of product updates aimed at enterprise users. Claude for Excel became generally available for Max, Team, and Enterprise users with new support for pivot tables, charts, and file uploads. The Chrome browser extension is now available to all Max users.
Perhaps most significantly, Anthropic introduced "infinite chats" — a feature that eliminates context window limitations by automatically summarizing earlier parts of conversations as they grow longer. "Within Claude AI, within the product itself, you effectively get this kind of infinite context window due to the compaction, plus some memory things that we're doing," Albert explained.
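The compaction idea can be sketched in a few lines. This is an assumption-laden illustration, not Anthropic's implementation: `count_tokens` stands in for a real tokenizer, and `summarize` would itself be a model call:

```python
def compact(history, budget, count_tokens, summarize, keep_last=4):
    """Conversation compaction as described above: once the transcript
    exceeds the context budget, fold the older turns into a single
    summary turn so the conversation can continue indefinitely."""
    total = sum(count_tokens(turn) for turn in history)
    if total <= budget or len(history) <= keep_last:
        return history
    older, recent = history[:-keep_last], history[-keep_last:]
    summary_turn = {"role": "user", "content": summarize(older)}
    return [summary_turn] + recent
```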
For developers, Anthropic released "programmatic tool calling," which allows Claude to write and execute code that invokes functions directly. Claude Code gained an updated "Plan Mode" and became available on desktop in research preview, enabling developers to run multiple AI agent sessions in parallel.
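The shift programmatic tool calling makes is from one tool call per model turn to a model-written program that calls several tools directly. A minimal sketch of the executor side (the sandboxing shown is illustrative only, not a production isolation strategy):

```python
def run_program(tool_code, tools):
    """Execute a model-generated snippet in a namespace that exposes
    only the provided tool functions, then read back its result."""
    namespace = dict(tools)
    # Empty __builtins__ is a toy restriction, not real sandboxing.
    exec(tool_code, {"__builtins__": {}}, namespace)
    return namespace.get("result")

# Hypothetical model output invoking a hypothetical `price` tool:
total = run_program("result = price('sku-1') * 2", {"price": lambda sku: 10})
# total -> 20
```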
Market heats up as OpenAI, Google race to match performance and pricing
Anthropic reached $2 billion in annualized revenue during the first quarter of 2025, more than doubling from $1 billion in the prior period. The number of customers spending more than $100,000 annually jumped eightfold year-over-year.
The rapid release of Opus 4.5 — just weeks after Haiku 4.5 in October and Sonnet 4.5 in September — reflects broader industry dynamics. OpenAI released multiple GPT-5 variants throughout 2025, including a specialized Codex Max model in November that can work autonomously for up to 24 hours. Google shipped Gemini 3 in mid-November after months of development.
Albert attributed Anthropic's accelerated pace partly to using Claude to speed its own development. "We're seeing a lot of assistance and speed-up by Claude itself, whether it's on the actual product building side or on the model research side," he said.
The pricing reduction for Opus 4.5 could pressure margins while potentially expanding the addressable market. "I'm expecting to see a lot of startups start to incorporate this into their products much more and feature it prominently," Albert said.
Yet profitability remains elusive for leading AI labs as they invest heavily in computing infrastructure and research talent. The AI market is projected to top $1 trillion in revenue within a decade, but no single provider has established a dominant market position — even as models reach a threshold where they can meaningfully automate complex knowledge work.
Michael Truell, CEO of Cursor, an AI-powered code editor, called Opus 4.5 "a notable improvement over the prior Claude models inside Cursor, with improved pricing and intelligence on difficult coding tasks." Scott Wu, CEO of Cognition, an AI coding startup, said the model delivers "stronger results on our hardest evaluations and consistent performance through 30-minute autonomous coding sessions."
For enterprises and developers, the competition translates to rapidly improving capabilities at falling prices. But as AI performance on technical tasks approaches — and sometimes exceeds — human expert levels, the technology's impact on professional work becomes less theoretical.
When asked about the engineering exam results and what they signal about AI's trajectory, Albert was direct: "I think it's a really important signal to pay attention to."