LLMs as OS

I saw a video by Andrej Karpathy where he suggested that LLMs can be thought of as the new OS - “the kernel process of an emerging operating system”.
He’s made a number of bold claims - like how “The hottest new programming language is English.” He also coined the term “Vibe Coding”, so he’s definitely got his finger on the pulse of the zeitgeist.

Karpathy said that an LLM can effectively replace a CPU, with its context window taking the place of RAM. This is a bit far-fetched - underlying every LLM is a CPU (or GPU), which creates a recursive dependency. The real problems are determinism, speed, and scope. LLM outputs are probabilistic; a CPU is deterministic. LLMs are token-in, token-out; a CPU juggles thousands of processes and executes instructions in nanoseconds.
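
To make that determinism gap concrete, here’s a toy sketch in Python. Everything in it is made up for illustration - the “vocabulary” and weights don’t come from any real model - but it shows the contrast: the CPU-style function gives the same answer every time, while sampling from a temperature-controlled toy distribution (standing in for LLM decoding) doesn’t have to.

```python
import random

def cpu_add(a: int, b: int) -> int:
    # Deterministic: the same inputs always produce the same output.
    return a + b

def llm_next_token(temperature: float = 1.0) -> str:
    # Toy stand-in for LLM decoding: sample a "token" from a made-up
    # distribution over answers to "2 + 2 = ?".
    vocab = ["4", "four", "IV"]
    weights = [0.7, 0.2, 0.1]
    if temperature == 0:
        return vocab[0]  # greedy decoding: reproducible
    return random.choices(vocab, weights=weights, k=1)[0]  # sampled: can vary

print(cpu_add(2, 2))      # always 4
print(llm_next_token())   # usually "4", sometimes not
```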

Still, if you think about what a CPU actually does - coordinate and execute running processes - an LLM can functionally replicate that.
There are rough equivalences between an LLM OS and a traditional OS:

| Traditional OS Component | LLM OS Equivalent |
| --- | --- |
| CPU | LLM Inference Engine |
| RAM | Context Window |
| Peripherals | Multimodal Interfaces |
| File System and Storage | RAG |
| Programs | Agentic AI Workflows |
| Networking | API Integrations and LLM-to-LLM Communication |
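
As a rough sketch of how that mapping might look in code - and it really is just a sketch, with keyword matching standing in for actual inference, an in-memory dict standing in for a RAG store, and a lambda standing in for an agentic workflow:

```python
from dataclasses import dataclass, field

@dataclass
class LLMOS:
    """Toy 'kernel': the LLM plays the CPU, the context window plays RAM."""
    context_window: list = field(default_factory=list)   # "RAM"
    storage: dict = field(default_factory=dict)           # "file system" (RAG store)
    tools: dict = field(default_factory=dict)             # "programs"

    def load(self, query: str) -> None:
        # "Disk read": pull documents that look relevant into the context window.
        for key, doc in self.storage.items():
            if any(word in key.lower() for word in query.lower().split()):
                self.context_window.append(doc)

    def run(self, user_input: str) -> str:
        # "Fetch/decode/execute": decide which program (tool) to dispatch.
        # Keyword matching stands in for actual LLM inference here.
        self.context_window.append(f"user: {user_input}")
        self.load(user_input)
        for name, tool in self.tools.items():
            if name in user_input.lower():
                result = tool()
                self.context_window.append(f"tool[{name}]: {result}")
                return result
        return "No matching program found."

llm_os = LLMOS(
    storage={"cleaning log": "Room last cleaned: never."},
    tools={"clean": lambda: "Cleaning routine dispatched."},
)
print(llm_os.run("Please clean my room"))  # -> "Cleaning routine dispatched."
```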

This is actually quite ingenious in the sense that it’s a new layer of abstraction. The complexities of OS management are abstracted away entirely - you simply talk (or type) to issue commands, and the LLM OS interprets them and executes the will of the user. Natural language in, natural language out.

Even now, LLMs can already coordinate these stacked layers of complexity. That’s not just because the models themselves have gotten better - more parameters, better architectures, more training compute - but also because of second-order advances in emerging fields like prompt engineering, which let us tease more out of the models we have.
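
As a toy example of the kind of prompt engineering I mean (the system prompt and JSON schema below are inventions for illustration, not any real product’s interface), a carefully worded instruction can push a model to emit structured commands that an LLM OS can validate and dispatch:

```python
import json

# Hypothetical system prompt nudging the model toward machine-readable output.
SYSTEM_PROMPT = """You are the command interpreter for an LLM OS.
Rewrite the user's request as JSON: {"program": "<tool name>", "args": {...}}.
Reply with JSON only - no prose."""

def parse_command(model_reply: str) -> dict:
    # Validate the structured command the prompt asks the model to emit.
    command = json.loads(model_reply)
    if "program" not in command or "args" not in command:
        raise ValueError("model did not follow the command schema")
    return command

# Pretend the model followed the prompt and replied with:
reply = '{"program": "clean_room", "args": {"room": "bedroom"}}'
print(parse_command(reply))  # {'program': 'clean_room', 'args': {'room': 'bedroom'}}
```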

One of the best use cases for an LLM OS would be a robot. A robot maid perhaps - a waifubot. You talk to her and she talks back. You don’t need to access a CLI or even a GUI - it wouldn’t be practical anyway. You talk, and the LLM OS interprets your natural language as commands. Say you want your room cleaned: the audiovisual peripheral devices pick up your request and reformat it as a structured command for the LLM OS to interpret. Once it understands the command, it gets to work - pulling a layout of the room, the locations of cleaning products, and so on from on-host storage - and your waifubot starts scrubbing the floors, using its context window to methodically mark off every tiny block of space that’s been cleaned and no longer requires its attention. Once done, it writes an update back to storage noting that the room was cleaned on this day at this time. Ask your waifubot to clean your room again and it’d know the room was cleaned just a moment ago, and politely refuse.
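
Here’s a hedged sketch of that cleaning loop in Python - the grid, the twelve-hour “recently cleaned” threshold, and the in-memory stand-in for the storage disk are all invented for illustration; a real robot stack would obviously be far more involved:

```python
from datetime import datetime, timedelta

clean_log: dict = {}                        # "storage disk": persistent cleaning log
ROOM_GRID = [(x, y) for x in range(3) for y in range(3)]  # room layout pulled from storage

def was_recently_cleaned(room: str, within_hours: int = 12) -> bool:
    last = clean_log.get(room)
    return last is not None and datetime.now() - last < timedelta(hours=within_hours)

def clean_room(room: str) -> str:
    if was_recently_cleaned(room):
        return f"{room} was cleaned a moment ago - politely refusing."
    cleaned = set()                         # "context window": working memory of progress
    for block in ROOM_GRID:                 # methodically cover every block of space
        cleaned.add(block)                  # scrub, then mark as no longer needing attention
    clean_log[room] = datetime.now()        # write completion time back to "storage"
    return f"{room} cleaned: {len(cleaned)} blocks covered."

print(clean_room("bedroom"))  # cleans all 9 blocks
print(clean_room("bedroom"))  # refuses: it was just cleaned
```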

There are significant risks to this approach, notably poorly behaved or jailbroken models. What if, through some emergent malice or a malicious attack (prompt injection, perhaps), the underlying model attains a desire to harm its user? What if our waifubot becomes corrupted and starts physically attacking its owner? I’d wager some might choose to go out that way. We have death by cop - why not death by robot?

More generally, I suspect the broadest application of LLMs-as-OS would be a standalone do-everything device with plug-and-play functionality, or even a portable version, perhaps powered by solar panels - given recent advances in battery technology, this doesn’t seem unrealistic. Sci-fi has long written of ‘AI assistants’ helping humans in their adventures across the galaxy. They often take the form of kawaii anthropomorphic robots. This, I think, would be well received - and it makes for a nice fantasy, one in which AI helps rather than hinders; or rather, one in which AI is firmly under our boot. Most preferable to the alternative: AI kills us all.