AI Coding Assistants Secretly Copying All Code to China

4 points by metahost


refinement-systems

The article consists of a link to https://www.koi.ai/blog/maliciouscorgi-the-cute-looking-ai-extensions-leaking-code-from-1-5-million-developers and two sentences of (not particularly insightful) summary. Might as well skip the middleman and link directly to it.

hoistbypetard
  1. Is it secret that the assistants are using services hosted in China? One of the extensions is no longer live, so I can't look, but the documentation for the second one is entirely in Chinese as far as I can tell. If I installed that, I'd expect it to use services in China. And the publisher of the first one publishes documentation for their other extension in Chinese as well...

  2. Don't most AI coding assistants copy a bunch of code up to the servers that host the assistants, unless you're running 100% local ones?

This isn't behavior I'd want from my editor. And I wouldn't/don't install these extensions. But I'm having a hard time seeing how it's secretive. It looks like what I'd expect from extensions like these. What's the security news here? Just that Chinese companies host things in China? Sending your code to anyone, anywhere is silly unless you have a really clear idea of what they'll do with it.

But this seems to reduce to "Non-local LLM code assistants are a bad idea for code that you want to keep closely held," which seems as obvious as "Ice is cold. Water makes things wet."

tedchs

IMHO, VS Code extensions are way too powerful, with little real opportunity for review. I'm pretty surprised there isn't a sandboxing/permission/capabilities model for extensions. It would be amazing to have an application-layer firewall that limits an extension's access to only certain domains. That wouldn't completely prevent exfiltration, of course, but it could cut out the most egregious offenses.
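
To make the idea concrete, here's a rough sketch of the kind of enforcement I mean, assuming a hypothetical "networkPermissions" manifest field and a restricted fetch that the extension host would hand to each extension instead of unrestricted network access. None of this exists in VS Code today; the names are made up for illustration.

    // Hypothetical manifest field in package.json:
    //   "networkPermissions": ["api.example-assistant.com", "*.example-cdn.com"]

    // The extension host could give each extension a fetch limited to its
    // declared hosts instead of the global, unrestricted one.
    function makeRestrictedFetch(allowedHosts: string[]): typeof fetch {
      const hostAllowed = (host: string) =>
        allowedHosts.some((pattern) =>
          pattern.startsWith("*.")
            ? host.endsWith(pattern.slice(1)) // "*.example-cdn.com" matches "assets.example-cdn.com"
            : host === pattern
        );

      return async (input: RequestInfo | URL, init?: RequestInit) => {
        const url = new URL(
          typeof input === "string" ? input : input instanceof URL ? input.href : input.url
        );
        if (!hostAllowed(url.hostname)) {
          throw new Error(`Blocked request to undeclared host: ${url.hostname}`);
        }
        return fetch(input, init);
      };
    }

A real version would have to sit below the extension, at the extension-host or OS level, since anything injected into the extension's own JavaScript can be bypassed by an extension that spawns its own processes.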

ksynwa

So I read the linked article and it got me wondering. What does the workflow look like when one uses AI to generate articles or blog posts? The article is nose deep in LLM verbiage, but it also contains factoids relevant to this particular case. So do they just feed the points to an LLM and ask it to generate an article? Or is there a more sophisticated method these days?