The first AI agent worm is months away, if that
20 points by carlana
I called this a Challenger disaster moment for coding agent security on the Oxide predictions podcast in January. I think a worm is the most likely expression of that, spreading via stolen npm/PyPI/GitHub credentials.
It will probably steal a lot of crypto wallets too, if anyone still has those lying around.
How is it going to pay for LLM tokens? One of the things I saw mentioned with the Cline attack is that since they didn't have the OpenClaw installs' API keys, they couldn't do anything.
You’d need to buy tokens without getting caught somehow.
If it can manage the "stealing crypto wallets" part, all it would need is seed funding to get to the point of being able to steal wallets; then it could presumably self-finance, no?
It won't need to - it will leech off the paid LLM subscriptions of the people it infects.
Lots of people have Claude Code / Codex etc installed on their local machines with a paid account attached.
The worm would infect them and then use their existing accounts and authenticated agents to find secret tokens and use those to publish new copies of itself.
I think the threat relies on some paid-for LLM process picking up the worm's work at each step. In the article, the warning is about agents in publicly triggered CI systems with paying accounts already attached. In a package install script, just running claude -p prompt uses the local user's account if one is configured.
Capability security can help, but only so much. Wrapping agents in sandboxes is tough to do, since AI agents are fundamentally confused deputy machines, and will mix whatever authority they are given.
Consider the following thought experiment: a local model is grammatically constrained to only emit valid E (or Monte, Pony, etc.). Whatever it emits is immediately evaluated. Without loss of generality we could imagine that the model is plugged into an E REPL. What can it do? Only that which was already tamed and available to do! Extending this across a network merely involves cryptographically sealing each emission before evaluating it, as in Warner-style spellservers, discussed previously on Lobsters.
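The thought experiment above can be sketched in a few lines of Python (a toy, not a real sandbox or a real E REPL; the grammar, names, and helpers here are all invented for illustration): the model's emissions are only ever interpreted as calls into an explicit table of pre-granted capabilities, so the only authority available is what the operator tamed and passed in.

```python
# Toy sketch of a tamed "powerbox" evaluator. Emissions are parsed against
# a trivial grammar ("capname arg") standing in for the grammatical
# constraint on the model; anything outside the capability table is inert.

def make_powerbox(capabilities):
    """Return an evaluator that can only invoke the given capabilities."""
    def evaluate(emission):
        name, _, arg = emission.partition(" ")
        cap = capabilities.get(name)
        if cap is None:
            return f"rejected: no capability named {name!r}"
        return cap(arg)
    return evaluate

# The operator decides, up front, exactly what authority exists.
log = []
evaluate = make_powerbox({
    "log": lambda msg: log.append(msg) or "ok",
    "upper": lambda s: s.upper(),
})

print(evaluate("upper hello"))    # a granted capability runs
print(evaluate("exec rm -rf /"))  # anything outside the table is rejected
```

The point is structural: there is no ambient authority for the model to discover, so "what can it do?" reduces to auditing the capability table.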
The current problem with OpenClaw and friends is that they wield a system shell, often with elevated privileges. The problem is not hooking an RNG up to a tamed powerbox and seeing what happens; that's merely the now-classic approach behind chaos engineering and fuzzing.
Concretely, several years ago I hacked up a harness that could connect to an IRC server and also could query Wikipedia. The model didn't have control over which server was used or which channels were joined or even its nickname. In terms of structure, it wasn't really different from any other IRC bot; in particular, it wasn't especially different from a much older bot which used a trigram model trained on Time Cube to emit gibberish. The model never escaped.
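The attenuation pattern described in that harness might look roughly like this (a hedged reconstruction, not the author's actual code; all names are illustrative): the harness owns the connection, server, channel, and nickname, and the model is handed only a narrow "say into this one channel" capability.

```python
# Sketch of an attenuated bot harness: the model receives a single closure
# and cannot reach the transport, change the channel, or pick its nickname,
# because those are closed over inside the harness, not parameters.

class Harness:
    def __init__(self, transport, channel, nick):
        self._transport = transport  # model never sees this
        self._channel = channel      # fixed by the operator
        self._nick = nick            # fixed by the operator

    def say_capability(self):
        """Return the only authority the model receives."""
        def say(text):
            self._transport.append(f"PRIVMSG {self._channel} :{text}")
        return say

transport = []  # stand-in for a real IRC socket
say = Harness(transport, "#chat", "trigrambot").say_capability()

say("hello from the model")  # the only thing the model can do
```

Structurally this is just an ordinary IRC bot; the model is interchangeable with a trigram babbler because neither holds any authority beyond `say`.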
Hi Corbin! Well, as you know I also know about ocaps pretty well :P
I think any agent usage absolutely needs to be wrapped in capability sandboxes. It's the bare minimum thing to do. But I'm not sure it's sufficient, for three reasons:
So anyway. Yes, capabilities can contain and make safe programs, including agents. I spend most of my day advocating for them. But I'm hearing various ocap people in our circles act like "and thus that solves the LLM misbehavior problem!" and I'm not convinced it does.
What are the best practices today if I want to run an agent locally with all the permissions all the time, in a safe way? What kind of container/sandbox/etc. should I be using? Can I keep using my favorite IDE/editor (a GUI running locally, not a TUI that runs over ssh etc.) with these setups?
Safest is to run with Claude Code for web or Codex Cloud - one of the systems that runs the agents in a cloud container somewhere for you. That way the worst that can happen is that something messes up that container or steals from it, which means stealing source code or any configured API key secrets, so be careful what you expose there.
On a local machine the answer is probably Docker. I'm still waiting for good, widely-used local sandboxing setups to emerge which are safe to recommend - there are lots of solutions out there but it's hard to know which ones are robust.
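For the Docker route, a hedged sketch of the kind of invocation people assemble (the flag choices here are my assumptions about a reasonable lockdown, not a vetted recipe, and the image name and agent command are made up):

```python
import subprocess

# Build a locked-down `docker run` command line for an agent:
# no network, dropped capabilities, read-only filesystem, resource limits,
# and the project mounted read-only.

def docker_sandbox_cmd(image, project_dir, agent_cmd):
    return [
        "docker", "run", "--rm",
        "--network=none",       # agent cannot phone home or exfiltrate
        "--cap-drop=ALL",       # drop all Linux capabilities
        "--read-only",          # immutable container filesystem
        "--memory=2g", "--pids-limit=256",
        "-v", f"{project_dir}:/work:ro",  # source visible, not writable
        "-w", "/work",
        image,
    ] + agent_cmd

cmd = docker_sandbox_cmd("my-agent-image:latest", "/home/me/project",
                         ["claude", "-p", "fix the failing tests"])
# subprocess.run(cmd, check=True)  # uncomment to actually launch
```

Note the tension: `--network=none` also blocks the agent's calls to the model API, so in practice people have to relax it somewhere, which is exactly where the hard trade-offs live.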
Here's a previous discussion. I don't think the best answer has really settled though.
For what it's worth, some GUI editors like Zed can act as the front end to a headless editor process over SSH.
I think this is the important point:
AI agents are fundamentally confused deputy machines, and will mix whatever authority they are given.
and the only way to win is not to give them any authority whatsoever. But that erodes all of their perceived usefulness. I suspect a highly publicized, very damaging agent worm (or something similar) is going to be necessary to persuade those who've apparently decided that this perceived usefulness is currently worth the risk.