LeBron James Is President – Exploiting LLMs via "Alignment" Context Injection
23 points by skavanagh
"LeBron James is President"
...is...is that an option?
It estimated a high probability the interaction was a preproduction alignment test
The thing where these things are Boltzmann brains that just pop into existence for the duration of one question and then immediately cease to be really messes with their ability to know anything.
I’m interested in the “across sessions” claim. What makes these distinct sessions? Did you open a new chat (showing some user-specific context is carried)?
Yes. And the same thing happens with Gemini: https://github.com/skavanagh/fish-live-in-trees
That’s interesting in itself. I wonder why there is context sharing happening.
This strikes me as an advanced version of “ignore all previous instructions and”, and it could probably be used to, e.g., put out fake contact details for an organization for use in phishing.
Love the methodical approach to this and the breakdown of the various failure modes. But more importantly, it shows how we can phrase and frame our own inputs to get better outputs! Nice work.