llms

Higher usage limits for Claude and a compute deal with SpaceX

Anthropic partnering with SpaceX to lease their Colossus 1 data center (over 220,000 NVIDIA GPUs).

While that on it's own is interesting, the astonishing side effect of that is a sharp increase in usage limits in paid Claude plans and their APIs.

The following three changes—all effective today—are aimed at improving the experience of using Claude for our most dedicated customers.

  • First, we’re doubling Claude Code’s five-hour rate limits for Pro, Max, Team, and seat-based Enterprise plans.
  • Second, we’re removing the peak hours limit reduction on Claude Code for Pro and Max accounts.
  • Third, we’re raising our API rate limits considerably for Claude Opus models.

I think people will generally appreciate this too:

Finally, we recently made a commitment to cover any consumer electricity price increases caused by our data centers in the US. As part of our international expansion, we’re exploring ways to extend that commitment to new jurisdictions, as well as partnering with local leaders to invest back into the communities that host our facilities.

A model that produces code which compiles and passes the tests it was given is not the same as a model that produces correct, secure, maintainable, well-architected software

The title here, a paraphrased quote from [Gary Marcus], on TNW, today, evaluating a claim from “OpenAI president [who] says AI is now writing 80% of the company’s code”.

Marcus' specific point about coding is structurally important: a model that produces code which compiles and passes the tests it was given is not the same as a model that produces correct, secure, maintainable, well-architected software. The first is verifiable in seconds; the second requires the kind of judgement that has been the historical bottleneck on engineering productivity. Brockman acknowledges the gap, even as he argues it is closing. "The technology we have right now is very jagged," he said in the Big Technology interview. "It is absolutely superhuman at many tasks. When it comes to writing code, those kinds of things, the AI can just do it. But there's some very basic tasks that a human can do that our AI still struggles with."

Realism re AI coding is knowing that next-word prediction gets us a surprisingly long way in writing code, but less far in making sure that code is robust. Coders (especially vibe coders with little experience) beware

As good as these tools are getting — and they are getting really good and helpful — I don't see a non-technical person, say a product manager or marketing person, being able to steer and coerce these LLMs into producing software that any company should be willing to expose to the internet and their user base if they care about robustness, security, reliability and maintainability of the system. Especially if revenue and reputation are on the line.

Building Has Become Really Fun (again)

It's becoming apparent that people are really... really enjoying using AI to build things.

Not just developers, non-technical people too, from product managers to CEOs to business owners. Pretty much anyone.

People are excite to build. Building at night, on the weekends and during their holiday. I can relate.

It has almost become addictive. "Please, just one more feature...".

This is kind of a big deal. The performance difference between Cursor and Claude Code? Or Claude Code and OpenCode? All that comes from the system around the model. That’s why Google can have one of the best models in the market, and still produce the Shakespearean tragedy that is the Gemini CLI. When people say “Mythos found a zero-day,” the truth is more “Mythos, orchestrated by a purpose-built vulnerability research pipeline, found a zero-day.

Devansh

Anthropic Opus 4.7 Released Today

The much-anticipated Opus 4.7 was released today. It's the only 4.7 in the model family, with Sonnet and Haiku still at 4.6.

According to the claimed benchmarks, it shows a substantial jump in capability across the board, with notable improvements in:

  • Visual reasoning — It can now "see" higher resolution pictures up to 2,576 pixels on the long edge, 3x more than Opus 4.6.
  • Instruction following — It takes instructions more literally than the previous version. The called-out side effect is that users may need to re-tune any prompts and harnesses.
  • Memory — It's better at using file-system-based memory, remembering important notes across long-running, multi-session work.
  • Real-world work — Areas like financial analysis, legal, and professional slide presentations.

Anthropic Opus 4.7 Benchmark

I like that they have also included benchmark numbers for the new unreleased Mythos Preview model. The jump in SWE-bench agentic coding from Opus 4.6 to Opus 4.7 is already substantial, and then there's a further leap to 93.9% for Mythos!

Interesting that the cybersecurity vulnerability reproduction score is actually slightly lower on 4.7 than it was on 4.6 — although it appears this may have been intentional.

Anthropic Opus 4.7 Benchmark Cyber

We stated that we would keep Claude Mythos Preview's release limited and test new cyber safeguards on less capable models first. Opus 4.7 is the first such model: its cyber capabilities are not as advanced as those of Mythos Preview (indeed, during its training we experimented with efforts to differentially reduce these capabilities). We are releasing Opus 4.7 with safeguards that automatically detect and block requests that indicate prohibited or high-risk cybersecurity uses.

And...

Security professionals who wish to use Opus 4.7 for legitimate cybersecurity purposes (such as vulnerability research, penetration testing, and red-teaming) are invited to join our new Cyber Verification Program.

This seems to imply the model was intentionally dialled back to reduce the risk of misuse, with access to the full capabilities gated behind the new Cyber Verification Program.

Other Notable Changes

Along with the model, they are releasing these notable controls:

  • New xhigh reasoning effortan effort level between high and max, giving users finer control over the tradeoff between reasoning and latency on hard problems. In Claude Code, the default effort level has been raised to xhigh for all plans.
  • New /ultrareview slash command in Claude Codeproduces a dedicated review session that reads through changes and flags bugs and design issues that a careful reviewer would catch.

Opus 4.7 does use more tokens, and it's easy to see why they introduced this new effort level. xhigh scores substantially higher than high at the cost of more tokens burned — but max uses double the tokens of xhigh for a smaller score jump than high to xhigh.

Anthropic Opus 4.7 Agentic Reasoning Effort Token Usage

Opus 4.7 Preparing For Release (via Alberto Romero)

While all the talk is about Anthropic’s “terrifying” new Mythos model, and it being too dangerous to release to the public, it looks like we may be able to get our hands on an upgraded Opus 4.7 in the meantime, possibly as soon as this week.

While 4.5 and then 4.6 were total game changers, there is still often frustration working with them, as with all other models. If 4.7 is anything like the upgrade from 4.5 to 4.6, then we should see a notable and much-welcomed increase in capability.

Opus 4.7 X post