anthropic

Designing a Feature with Claude Design — Then Handing It to Claude Code

I've been wanting to give Claude Design a try. In this post I'll walk through my first use of it for designing some new functionality I wanted for my blog CMS. My use here is probably very basic, but it was an interesting exercise, and in particular I wanted to see how I could hand the design off for building.

Just a small disclaimer: I'm not a UX designer, or a creative person in general. But that's exactly why this exercise was interesting — it did a far better job than I ever could.

Getting into it...

I recently built the blog site you're reading this article on (I'll write about that separately). It's very new and the feature set is minimal: just enough functionality to manage, publish and serve blog content.

It has a custom-built CMS for managing the content, and supports slug-like tags that I can selectively apply to any of the content. These tags are visible on the right side of this article (or at the bottom on mobile).

When writing or editing content, there is a section where I can add new tags or use existing ones. The screenshot below shows the CMS editing an article I wrote recently about human reviews being a bottleneck.

Blog CMS Edit Post

The behaviour of the tags input field is as follows: on post creation, any never-before-seen tags are created in the tags table in the database, and existing tags are referenced.
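As a rough illustration of that create-or-reference behaviour, here is a minimal sketch using SQLite. The table layout, the `upsert_tags` name and the use of SQLite are all assumptions made for the example; the post doesn't say how the CMS backend actually stores or resolves tags.

```python
import sqlite3


def upsert_tags(conn, names):
    """Return {tag_name: tag_id}, creating rows only for tags not seen before.

    Hypothetical helper: assumes a `tags` table with a UNIQUE `name` column.
    """
    ids = {}
    for name in names:
        # INSERT OR IGNORE leaves an existing row untouched because name is UNIQUE.
        conn.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (name,))
        row = conn.execute("SELECT id FROM tags WHERE name = ?", (name,)).fetchone()
        ids[name] = row[0]
    return ids


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)")

# Two tags already exist, then a new post uses one existing tag and one new tag.
first = upsert_tags(conn, ["ai", "harness-engineering"])
post_tags = upsert_tags(conn, ["ai", "new-tag"])
total = conn.execute("SELECT COUNT(*) FROM tags").fetchone()[0]
print(post_tags, total)
```

The second call reuses the existing `ai` row and adds only `new-tag`, leaving three tags in total.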

At first sight the tags field looks ok, but:

  • It has no option to select existing tags.
  • It doesn't indicate if a tag being added was one that already existed or would be created.
  • It doesn't help me avoid creating near-duplicate tags with a similar name or even typos.

You don't want different blog articles using tags that mean the same thing under slightly different names, for example ai-engineering and engineering-with-ai, or a typo such as ai-eginering. With no way to select existing tags, and no feedback on whether a tag would be matched or created, the UX was painful (I had to open the tag management section in a different tab to check which tags were available) and the tags could easily become a mess across all the content.
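To make the problem concrete, here is a small sketch of how a typed tag could be checked against the existing ones. It is not what the CMS does today. The `suggest` function, the 0.8 similarity threshold and the filler-word list are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher

# Hypothetical filler words ignored when comparing the words in two tags.
FILLER_WORDS = {"with", "and", "of", "the", "for"}


def words(tag):
    """Split a slug-like tag into its meaningful words."""
    return {w for w in tag.split("-") if w and w not in FILLER_WORDS}


def suggest(typed, existing, threshold=0.8):
    """Classify what was typed against the list of existing tags."""
    typed = typed.strip().lower()
    if not typed:
        return {"exact": False, "matches": [], "near_duplicates": []}

    # Existing tags that contain the typed text.
    matches = [tag for tag in existing if typed in tag]

    near_duplicates = []
    for tag in existing:
        if tag in matches:
            continue
        # Character similarity catches typos such as ai-eginering.
        looks_similar = SequenceMatcher(None, typed, tag).ratio() >= threshold
        # Word comparison catches reordered names such as engineering-with-ai.
        same_words = bool(words(tag)) and words(typed) == words(tag)
        if looks_similar or same_words:
            near_duplicates.append(tag)

    return {
        "exact": typed in existing,
        "matches": matches,
        "near_duplicates": near_duplicates,
    }


existing = ["ai", "ai-engineering", "harness-engineering"]
print(suggest("eng", existing))
print(suggest("ai-eginering", existing))
print(suggest("engineering-with-ai", existing))
```

A tag with no matches and no near-duplicates would be treated as genuinely new. The threshold would need tuning against real tag names.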

Using Claude Design

Claude Design was introduced on 17 April. I had had a bit of a poke around in it before to see what it was about, but this was the first chance I had to try it out on something real.

I started by pasting four screenshots of the CMS and giving it the prompt below. (I used the term "upsert" for tags, which wasn't quite correct, but Claude got it.)

Attached are four screenshots of my blog CMS. One is where tags are managed, show when two tags existed; ai and harness-engineering. The second one is showing where I can add a post and enter tags. How it currently works is I can enter any tag value and when the post is created/published then the app(or probably the backend for the CMS) will upsert tags.

So you can see in the third image I enter two tags, one existing and the second one (new-tag) is new, so when the blog post got created is used the existing tag and then created a the new-tag, as shown in the fourth image, where I then have a total of three tags. (Two were existing one new one got created on that new post).

I like this functionality. However, right now when I enter tags on the create post page I don't get offered to select from existing tags, so if my new post should use an existing tag I have to carefully remember or go back and look in the tags section to remember the exact name of the tag so I don't end up with near duplicated for what should be the same tag.

Ideally when I start typing for a new tag, it should show existing tags that match (like contain) the text I have entered, with the option of selecting one of those. If none match it should allow me to add a new tag.

Come up with three options for the post create screen for selecting existing or adding tags. Should be simple and seamless and easy to use.

[... 1879 words]

Higher usage limits for Claude and a compute deal with SpaceX

Anthropic is partnering with SpaceX to lease its Colossus 1 data center (over 220,000 NVIDIA GPUs).

While that on its own is interesting, the astonishing side effect is a sharp increase in usage limits for paid Claude plans and the API.

The following three changes—all effective today—are aimed at improving the experience of using Claude for our most dedicated customers.

  • First, we’re doubling Claude Code’s five-hour rate limits for Pro, Max, Team, and seat-based Enterprise plans.
  • Second, we’re removing the peak hours limit reduction on Claude Code for Pro and Max accounts.
  • Third, we’re raising our API rate limits considerably for Claude Opus models.

I think people will generally appreciate this too:

Finally, we recently made a commitment to cover any consumer electricity price increases caused by our data centers in the US. As part of our international expansion, we’re exploring ways to extend that commitment to new jurisdictions, as well as partnering with local leaders to invest back into the communities that host our facilities.

Mozilla Used Anthropic's Mythos to Fix 271 Bugs In Firefox (via Simon Willison)

Mozilla is one of the companies that has been given access to Anthropic's new Mythos Preview model, and it has put it to good use.

As part of our continued collaboration with Anthropic, we had the opportunity to apply an early version of Claude Mythos Preview to Firefox. This week's release of Firefox 150 includes fixes for 271 vulnerabilities identified during this initial evaluation.

Our experience is a hopeful one for teams who shake off the vertigo and get to work. You may need to reprioritize everything else to bring relentless and single-minded focus to the task, but there is light at the end of the tunnel. We are extremely proud of how our team rose to meet this challenge, and others will too. Our work isn't finished, but we've turned the corner and can glimpse a future much better than just keeping up. Defenders finally have a chance to win, decisively.

This seems to validate a lot of Anthropic's claims about the new model's capabilities. It's also encouraging that models like this will likely become available to the general public at some point, where they can be used to strengthen the security posture of existing and new systems.

When that eventually happens, people are going to need to act fast to find and resolve security vulnerabilities before bad actors get a chance to exploit them.

There will be casualties.

Anthropic Opus 4.7 Released Today

The much-anticipated Opus 4.7 was released today. It's the only 4.7 in the model family, with Sonnet and Haiku still at 4.6.

According to the claimed benchmarks, it shows a substantial jump in capability across the board, with notable improvements in:

  • Visual reasoning — It can now "see" higher resolution pictures up to 2,576 pixels on the long edge, 3x more than Opus 4.6.
  • Instruction following — It takes instructions more literally than the previous version. The called-out side effect is that users may need to re-tune any prompts and harnesses.
  • Memory — It's better at using file-system-based memory, remembering important notes across long-running, multi-session work.
  • Real-world work — Areas like financial analysis, legal, and professional slide presentations.

Anthropic Opus 4.7 Benchmark

I like that they have also included benchmark numbers for the new unreleased Mythos Preview model. The jump in SWE-bench agentic coding from Opus 4.6 to Opus 4.7 is already substantial, and then there's a further leap to 93.9% for Mythos!

Interesting that the cybersecurity vulnerability reproduction score is actually slightly lower on 4.7 than it was on 4.6 — although it appears this may have been intentional.

Anthropic Opus 4.7 Benchmark Cyber

We stated that we would keep Claude Mythos Preview's release limited and test new cyber safeguards on less capable models first. Opus 4.7 is the first such model: its cyber capabilities are not as advanced as those of Mythos Preview (indeed, during its training we experimented with efforts to differentially reduce these capabilities). We are releasing Opus 4.7 with safeguards that automatically detect and block requests that indicate prohibited or high-risk cybersecurity uses.

And...

Security professionals who wish to use Opus 4.7 for legitimate cybersecurity purposes (such as vulnerability research, penetration testing, and red-teaming) are invited to join our new Cyber Verification Program.

This seems to imply the model was intentionally dialled back to reduce the risk of misuse, with access to the full capabilities gated behind the new Cyber Verification Program.

Other Notable Changes

Along with the model, they are releasing these notable controls:

  • New xhigh reasoning effort: an effort level between high and max, giving users finer control over the tradeoff between reasoning and latency on hard problems. In Claude Code, the default effort level has been raised to xhigh for all plans.
  • New /ultrareview slash command in Claude Code: produces a dedicated review session that reads through changes and flags bugs and design issues that a careful reviewer would catch.

Opus 4.7 does use more tokens, and it's easy to see why they introduced this new effort level. xhigh scores substantially higher than high at the cost of more tokens, but max uses double the tokens of xhigh for a smaller score gain than the step from high to xhigh.

Anthropic Opus 4.7 Agentic Reasoning Effort Token Usage

Opus 4.7 Preparing For Release (via Alberto Romero)

While all the talk is about Anthropic’s “terrifying” new Mythos model, and it being too dangerous to release to the public, it looks like we may be able to get our hands on an upgraded Opus 4.7 in the meantime, possibly as soon as this week.

While 4.5 and then 4.6 were total game changers, there is still often frustration working with them, as with all other models. If 4.7 is anything like the upgrade from 4.5 to 4.6, then we should see a notable and much-welcomed increase in capability.

Opus 4.7 X post