LLM Knowledge Bases

Andrej Karpathy recently shared his approach to building LLM-managed knowledge bases, and it resonates with me.

I've never been great at organizing notes. I suspect most people are the same — they want things organized, they just don't want to be the one doing it. Which makes this a perfect job for an LLM.

The core idea:

TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually, it's the domain of the LLM.

You provide the raw information, and the LLM organizes it and makes it discoverable. The result is readable by you directly, and it is also available to the LLM you work with for lookups, Q&A, and output in different formats.

In a way it's similar to RAG but less complicated. It's probably closer to the progressive disclosure approach used with agent skills: information is discoverable when the conversation needs it, without filling the context window with wiki content that isn't relevant.
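To make the progressive disclosure idea concrete, here is a minimal sketch of how it might look, assuming the wiki is a flat folder of .md files. The function names `build_index` and `load_article` are hypothetical and not taken from Karpathy's setup. The agent first sees only a small index of article names and one-line summaries, and pulls a full article into context only when it needs one.

```python
from pathlib import Path


def build_index(wiki_dir: Path) -> dict[str, str]:
    """Map each article name to a one-line summary (its first non-empty line)."""
    index = {}
    for path in sorted(wiki_dir.glob("*.md")):
        lines = [ln.strip() for ln in path.read_text(encoding="utf-8").splitlines()]
        # Use the first non-empty line, dropping any leading markdown heading marks.
        summary = next((ln.lstrip("# ").strip() for ln in lines if ln), "")
        index[path.stem] = summary
    return index


def load_article(wiki_dir: Path, name: str) -> str:
    """Fetch the full text of one article, only when the conversation needs it."""
    return (wiki_dir / f"{name}.md").read_text(encoding="utf-8")
```

The index stays cheap to include in every conversation, while the full articles only cost context when they are actually relevant.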

The real payoff comes with scale:

Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc.

I can see this extending further: multiple wikis, all available to LLMs, with access controlled per agent. Say, a personal wiki, a business wiki, and a general one for collected articles and research interests. Different agents, different scopes, same underlying approach. Or a single agent with access to all of them.
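As a rough illustration of scoped access, the sketch below gives each agent a set of wiki names it may read. Everything here is an assumption made for the example: the `Agent` class, the `WIKIS` mapping, and the folder paths are hypothetical, and a real setup would need actual enforcement rather than a simple filter.

```python
from dataclasses import dataclass

# Hypothetical wiki names mapped to hypothetical folder paths.
WIKIS = {
    "personal": "wikis/personal",
    "business": "wikis/business",
    "general": "wikis/general",
}


@dataclass(frozen=True)
class Agent:
    name: str
    scopes: frozenset[str]  # names of the wikis this agent may read


def readable_wikis(agent: Agent, wikis: dict[str, str] = WIKIS) -> dict[str, str]:
    """Return only the wikis that fall inside the agent's scopes."""
    return {name: path for name, path in wikis.items() if name in agent.scopes}


work_agent = Agent("work", frozenset({"business", "general"}))
all_access = Agent("assistant", frozenset(WIKIS))
```

Here `readable_wikis(work_agent)` leaves out the personal wiki, while `readable_wikis(all_access)` returns all three, which covers the single-agent case.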

This feels like something we are going to see more of, either as new products or as enhancements to existing ones.