Poor code examples cause LLM misalignment in unrelated domains

9 points by doctor_eval


Corbin

This is caused by reinforcement learning. Steering doesn't provoke this sort of generalized misalignment.