Loosely Coupled Thoughts

Higher usage limits for Claude and a compute deal with SpaceX

Anthropic partnering with SpaceX to lease their Colossus 1 data center (over 220,000 NVIDIA GPUs).

While that on it's own is interesting, the astonishing side effect of that is a sharp increase in usage limits in paid Claude plans and their APIs.

The following three changes—all effective today—are aimed at improving the experience of using Claude for our most dedicated customers.

First, we’re doubling Claude Code’s five-hour rate limits for Pro, Max, Team, and seat-based Enterprise plans.

Second, we’re removing the peak hours limit reduction on Claude Code for Pro and Max accounts.

Third, we’re raising our API rate limits considerably for Claude Opus models.

I think people will generally appreciate this too:

Finally, we recently made a commitment to cover any consumer electricity price increases caused by our data centers in the US. As part of our international expansion, we’re exploring ways to extend that commitment to new jurisdictions, as well as partnering with local leaders to invest back into the communities that host our facilities.

A model that produces code which compiles and passes the tests it was given is not the same as a model that produces correct, secure, maintainable, well-architected software

The title here, a paraphrased quote from [Gary Marcus], on TNW, today, evaluating a claim from “OpenAI president [who] says AI is now writing 80% of the company’s code”.

Marcus' specific point about coding is structurally important: a model that produces code which compiles and passes the tests it was given is not the same as a model that produces correct, secure, maintainable, well-architected software. The first is verifiable in seconds; the second requires the kind of judgement that has been the historical bottleneck on engineering productivity. Brockman acknowledges the gap, even as he argues it is closing. "The technology we have right now is very jagged," he said in the Big Technology interview. "It is absolutely superhuman at many tasks. When it comes to writing code, those kinds of things, the AI can just do it. But there's some very basic tasks that a human can do that our AI still struggles with."

Realism re AI coding is knowing that next-word prediction gets us a surprisingly long way in writing code, but less far in making sure that code is robust. Coders (especially vibe coders with little experience) beware

As good as these tools are getting — and they are getting really good and helpful — I don't see a non-technical person, say a product manager or marketing person, being able to steer and coerce these LLMs into producing software that any company should be willing to expose to the internet and their user base if they care about robustness, security, reliability and maintainability of the system. Especially if revenue and reputation are on the line.

Mozilla Used Anthropic's Mythos to Fix 271 Bugs In Firefox (via Simon Willison)

Mozilla has been one of the companies to get access to Anthropic's new Mythos Preview model. And have put it to good use.

As part of our continued collaboration with Anthropic, we had the opportunity to apply an early version of Claude Mythos Preview to Firefox. This week's release of Firefox 150 includes fixes for 271 vulnerabilities identified during this initial evaluation.

Our experience is a hopeful one for teams who shake off the vertigo and get to work. You may need to reprioritize everything else to bring relentless and single-minded focus to the task, but there is light at the end of the tunnel. We are extremely proud of how our team rose to meet this challenge, and others will too. Our work isn't finished, but we've turned the corner and can glimpse a future much better than just keeping up. Defenders finally have a chance to win, decisively.

This seems to validate a lot of Anthropic's claims about the new model's capabilities. And it's encouraging to know that this will likely soon be available to the general public, at some point, where they can be used to strengthen the security postures of existing and new systems.

When that eventually happens, people are going to need to act fast to find and resolve security vulnerabilities before bad actors get a chance to exploit them.

There will be casualties.

Opus 4.7 Preparing For Release (via Alberto Romero)

While all the talk is about Anthropic’s “terrifying” new Mythos model, and it being too dangerous to release to the public, it looks like we may be able to get our hands on an upgraded Opus 4.7 in the meantime, possibly as soon as this week.

While 4.5 and then 4.6 were total game changers, there is still often frustration working with them, as with all other models. If 4.7 is anything like the upgrade from 4.5 to 4.6, then we should see a notable and much-welcomed increase in capability.

Opus 4.7 X post

Feedback Flywheel

Rahul Garg at Thoughtworks talks about the Feedback Flywheel, a practice that encourages paying attention to signals in AI engineering that can be used to continuously improve the AI engineering setup and workflows to get better outcomes. Not at a personal level but at the team level.

Every AI interaction generates signal: prompts that worked, context that was missing, patterns that succeeded, failures worth preventing. Most teams discard this signal. I propose a structured feedback practice that harvests learnings from AI sessions and feeds them back into the team's shared artifacts, turning individual experience into collective improvement.

Anyone who has used AI when building software has felt all kinds of frustration when it's not doing what you expect, and what seemingly should be obvious.

Every interaction like this is a signal that the LLM hasn't been primed sufficiently with the right information or instruction for the task or workflow. At this point one could start instructing and steering the LLM right there and then with prompts to get a better result. But it will likely need to be repeated again later.

The idea with the Feedback Flywheel is to pay attention to these, ask what went wrong and what can be improved so next time it works better. Not just for yourself but for anyone on the team working on the project. And then taking steps to bake these in so the next encounter of that situation works better, and broadly for everyone.

Adopting AI practices can plateau once everyone gets comfortable.

With AI coding assistants, most teams reach a plateau. They adopt the tools, develop some fluency, and then stay there. The same prompting habits, the same frustrations, the same results month after month. Not because the tools stop improving, but because the team's practices around the tools stop improving

Worse than this, when every developer on the team is encountering their own flavours of these issues, and each has their own level of skills, prompting style, local setup and even different AI tools, what you can and likely will quickly end up with is snowflake pull requests of wildly varying style and quality.

All of this is signal that should be used to continuously make adjustments to the context and knowledge the LLM is primed with. These can be done in various ways, like adding specific content to the agent specific files (AGENTS.md, CLAUDE.md etc), well structured and discoverable context in additional markdown files, agent skills and custom commands.

These benefits will quickly start to compound and result in better outcomes across the team and improve the quality of the codebase and the pace at which things can be delivered with a high level of quality.

LLM Knowledge Bases

Andrej Karpathy recently shared his approach to building LLM-managed knowledge bases, and it resonates with me.

I've never been great at organizing notes. I suspect most people are the same — they want things organized, they just don't want to be the one doing it. Which makes this a perfect job for an LLM.

The core idea:

TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually, it's the domain of the LLM.

You provide raw information, the LLM organizes and makes it discoverable, and then it's accessible by you directly, and also made available to the LLM that you work with for looking up information, Q&A and outputting it in different formats.

In a way it's similar to RAG but less complicated, and probably a bit more like the progressive disclosure approach with agent skills where information is discoverable when needed in the context of the conversation but without blowing out the context with wiki information that's not relevant to the conversation.

The real payoff comes with scale:

Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc.

I can see this extending further — multiple wikis, all accessible to the LLM, with controlled access say by different agents. A personal wiki, a business wiki, and a general one for collected articles and research interests. Different agents, different scopes, same underlying approach. Or a single agent with access to all of them.

This feels like something we are going to see more of as a new product or enhancement to existing products.