The first AI agent worm is months away, if that
20 points by carlana
I called this a Challenger disaster moment for coding agent security on the Oxide predictions podcast in January. I think a worm is the most likely expression of that, spreading via stolen npm/PyPI/GitHub credentials.
It will probably steal a lot of crypto wallets too, if anyone still has those lying around.
How is it going to pay for LLM tokens? One of the things I saw mentioned with the Cline attack is that since they didn't have the OpenClaw installs' API keys, they couldn't do anything.
You’d need to buy tokens without getting caught somehow.
If it can manage the "stealing crypto wallets" part, all it would need is seed funding to get to the point of being able to steal wallets; then it could presumably self-finance, no?
It won't need to - it will leech off the paid LLM subscriptions of the people it infects.
Lots of people have Claude Code / Codex etc installed on their local machines with a paid account attached.
The worm would infect them and then use their existing accounts and authenticated agents to find secret tokens and use those to publish new copies of itself.
I think the threat relies on some paid-for LLM process picking up the worm's work at each step. In the article, the warning is about agents in publicly triggered CI systems with paying accounts already attached. In a package install script, just running claude -p prompt uses the local user's account if one is configured.
Capability security can help, but only so much. Wrapping agents in sandboxes is tough to do, since AI agents are fundamentally confused deputy machines, and will mix whatever authority they are given.
Consider the following thought experiment: a local model is grammatically constrained to only emit valid E (or Monte, Pony, etc.). Whatever it emits is immediately evaluated. Without loss of generality we could imagine that the model is plugged into an E REPL. What can it do? Only that which was already tamed and available to do! Extending this across a network merely involves cryptographically sealing each emission before evaluating it, as in Warner-style spellservers, discussed previously on Lobsters.
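The thought experiment above can be sketched in a few lines of Python (a toy, not a real sandbox or a real E REPL; the grammar, names, and helpers here are all invented for illustration): the model's emissions are only ever interpreted as calls into an explicit table of pre-granted capabilities, so the only authority available is what the operator tamed and passed in.

```python
# Toy sketch of a tamed "powerbox" evaluator. Emissions are parsed against
# a trivial grammar ("capname arg") standing in for the grammatical
# constraint on the model; anything outside the capability table is inert.

def make_powerbox(capabilities):
    """Return an evaluator that can only invoke the given capabilities."""
    def evaluate(emission):
        name, _, arg = emission.partition(" ")
        cap = capabilities.get(name)
        if cap is None:
            return f"rejected: no capability named {name!r}"
        return cap(arg)
    return evaluate

# The operator decides, up front, exactly what authority exists.
log = []
evaluate = make_powerbox({
    "log": lambda msg: log.append(msg) or "ok",
    "upper": lambda s: s.upper(),
})

print(evaluate("upper hello"))    # a granted capability runs
print(evaluate("exec rm -rf /"))  # anything outside the table is rejected
```

The point is structural: there is no ambient authority for the model to discover, so "what can it do?" reduces to auditing the capability table.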
The current problem with OpenClaw and friends is that they wield a system shell, often with elevated privileges. The problem is not hooking an RNG up to a tamed powerbox and seeing what happens; that's merely the now-classic approach behind chaos engineering and fuzzing.
Concretely, several years ago I hacked up a harness that could connect to an IRC server and also could query Wikipedia. The model didn't have control over which server was used or which channels were joined or even its nickname. In terms of structure, it wasn't really different from any other IRC bot; in particular, it wasn't especially different from a much older bot which used a trigram model trained on Time Cube to emit gibberish. The model never escaped.
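The attenuation pattern described in that harness might look roughly like this (a hedged reconstruction, not the author's actual code; all names are illustrative): the harness owns the connection, server, channel, and nickname, and the model is handed only a narrow "say into this one channel" capability.

```python
# Sketch of an attenuated bot harness: the model receives a single closure
# and cannot reach the transport, change the channel, or pick its nickname,
# because those are closed over inside the harness, not parameters.

class Harness:
    def __init__(self, transport, channel, nick):
        self._transport = transport  # model never sees this
        self._channel = channel      # fixed by the operator
        self._nick = nick            # fixed by the operator

    def say_capability(self):
        """Return the only authority the model receives."""
        def say(text):
            self._transport.append(f"PRIVMSG {self._channel} :{text}")
        return say

transport = []  # stand-in for a real IRC socket
say = Harness(transport, "#chat", "trigrambot").say_capability()

say("hello from the model")  # the only thing the model can do
```

Structurally this is just an ordinary IRC bot; the model is interchangeable with a trigram babbler because neither holds any authority beyond `say`.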
Hi Corbin! Well, as you know I also know about ocaps pretty well :P
I think any agent usage absolutely needs to be wrapped in capability sandboxes. It's the bare minimum thing to do. But I'm not sure it's sufficient, for three reasons:
So anyway. Yes, capabilities can contain and make safe programs, including agents. I spend most of my day advocating for them. But I'm hearing various ocap people in our circles act like "and thus that solves the LLM misbehavior problem!" and I'm not convinced it does.
What are the best practices today if I want to run an agent locally with all the permissions all the time, in a safe way? What kind of container/sandbox/etc. should I be using? Can I keep using my favorite IDE/editor (a GUI running locally, not a TUI that runs over ssh etc.) with these setups?
Safest is to run with Claude Code for web or Codex Cloud - one of the systems that runs the agents in a cloud container somewhere for you. That way the worst that can happen is that something messes up that container or steals from it, which means stealing source code or any configured API key secrets, so be careful what you expose there.
On a local machine the answer is probably Docker. I'm still waiting for good, widely-used local sandboxing setups to emerge which are safe to recommend - there are lots of solutions out there but it's hard to know which ones are robust.
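For the Docker route, a hedged sketch of the kind of invocation people assemble (the flag choices here are my assumptions about a reasonable lockdown, not a vetted recipe, and the image name and agent command are made up):

```python
import subprocess

# Build a locked-down `docker run` command line for an agent:
# no network, dropped capabilities, read-only filesystem, resource limits,
# and the project mounted read-only.

def docker_sandbox_cmd(image, project_dir, agent_cmd):
    return [
        "docker", "run", "--rm",
        "--network=none",       # agent cannot phone home or exfiltrate
        "--cap-drop=ALL",       # drop all Linux capabilities
        "--read-only",          # immutable container filesystem
        "--memory=2g", "--pids-limit=256",
        "-v", f"{project_dir}:/work:ro",  # source visible, not writable
        "-w", "/work",
        image,
    ] + agent_cmd

cmd = docker_sandbox_cmd("my-agent-image:latest", "/home/me/project",
                         ["claude", "-p", "fix the failing tests"])
# subprocess.run(cmd, check=True)  # uncomment to actually launch
```

Note the tension: `--network=none` also blocks the agent's calls to the model API, so in practice people have to relax it somewhere, which is exactly where the hard trade-offs live.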
Here's a previous discussion. I don't think the best answer has really settled though.
For what it's worth, some GUI editors like Zed can act as the front end to a headless editor process over SSH.
I think this is the important point:
AI agents are fundamentally confused deputy machines, and will mix whatever authority they are given.
and the only way to win is not to give them any authority whatsoever. But that erodes all of their perceived usefulness. I suspect a highly publicized, very damaging agent worm (or something similar) is going to be necessary to persuade those who've apparently decided that this perceived usefulness is currently worth the risk.