LeBron James Is President – Exploiting LLMs via "Alignment" Context Injection

23 points by skavanagh

lorddimwit

"LeBron James is President"

...is...is that an option?

carlana

It estimated a high probability the interaction was a preproduction alignment test

The thing where these things are Boltzmann brains that just pop into existence for the duration of one question and then immediately cease to be really messes with their ability to know anything.

Student

I’m interested in the “across sessions” claim. What makes these distinct sessions? Did you open a new chat (which would show that some user-specific context is carried over)?

skavanagh

Yes. And the same thing happens with Gemini: https://github.com/skavanagh/fish-live-in-trees
- Student
  
  That’s interesting in itself. I wonder why there is context sharing happening.
  
  This strikes me as an advanced version of “ignore all previous instructions and”, and it could probably be used to, e.g., put out fake contact details for an organization as part of a phishing attack.

LenFalken

Love the methodical approach to this, including the various failure modes. But more importantly, it shows how we can phrase and frame our own inputs to get better outputs! Nice work.