Porting MiniJinja to Go With an Agent
19 points by keybits
When you do ML training, you're supposed to keep some of the data segregated from the learning algorithm as a held-out set, so the model can't train itself to the test and you can check it afterwards. I worry that giving an LLM the full test suite has the same problem. You probably want to keep part of the test suite away from the agent so you can tell whether it's just building something that overfits the visible tests.
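One low-tech way to do that for a Go port would be a build tag that the agent's normal `go test ./...` runs never enable. A minimal sketch, with a made-up package name and a stub standing in for the real render function:

```go
//go:build holdout

// Cases behind the "holdout" build tag are invisible to the agent's regular
// `go test ./...` runs; a reviewer runs them with `go test -tags holdout ./...`
// to check whether the visible tests were simply overfit.
package templates

import (
	"strings"
	"testing"
)

// renderStub stands in for the ported engine's real render function,
// which is assumed to live elsewhere in the package.
func renderStub(src string) string {
	return strings.TrimSpace(src)
}

func TestHeldOutWhitespaceControl(t *testing.T) {
	if got, want := renderStub("  hello  "), "hello"; got != want {
		t.Errorf("renderStub() = %q, want %q", got, want)
	}
}
```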
I think the issue is less that you need to hide tests and more that some tests might be quite static. That said, I don't think agentic models have much of a desire to "cheat" on tests for the most part, so I'm not sure it would change much in practice. It would be an interesting experiment, though.
During Christmas I had an agent reverse-engineer the input generators for Advent of Code. I gave the agent its own solutions plus one or two others it found on GitHub, and it used those to validate that its input generator produced the kind of inputs its own and the other solutions would handle correctly. It was quite interesting to see how it validated the generators.
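A rough sketch of that kind of cross-check in Go, assuming the generator and the two solutions are separate binaries (all of the paths here are hypothetical):

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"os/exec"
)

// run feeds the generated input to a solver binary and returns its stdout.
func run(solver string, input []byte) (string, error) {
	cmd := exec.Command(solver)
	cmd.Stdin = bytes.NewReader(input)
	out, err := cmd.Output()
	return string(bytes.TrimSpace(out)), err
}

func main() {
	// Hypothetical binaries: one input generator plus two independent solutions.
	for i := 0; i < 100; i++ {
		input, err := exec.Command("./generate_input").Output()
		if err != nil {
			log.Fatal(err)
		}
		mine, err := run("./my_solution", input)
		if err != nil {
			log.Fatalf("my solution rejected generated input: %v", err)
		}
		theirs, err := run("./other_solution", input)
		if err != nil {
			log.Fatalf("other solution rejected generated input: %v", err)
		}
		if mine != theirs {
			log.Fatalf("solutions disagree on generated input:\n%s", input)
		}
	}
	fmt.Println("generator produced 100 inputs both solutions agree on")
}
```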
If your test suite covers all the equivalence classes of inputs to the system under test (which I admit is often not the case), then the result can't really be overfit, can it? The sort of real world problems with ML overfitting are because there are near-infinite ways to, e.g., handwrite English, and your OCR system can't train on all of them. But for a lot of codebases (or subsets of codebases), you actually can ensure full correctness via comprehensive tests.
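For example, a table-driven test with one representative per input class makes that kind of coverage explicit; the classifier below is just an illustrative stand-in, not anything from the actual port:

```go
package templates

import "testing"

// delimiterKind classifies a template token by its delimiters; it is an
// illustrative stand-in for a real lexer decision.
func delimiterKind(tok string) string {
	switch {
	case len(tok) >= 4 && tok[:2] == "{{" && tok[len(tok)-2:] == "}}":
		return "expression"
	case len(tok) >= 4 && tok[:2] == "{%" && tok[len(tok)-2:] == "%}":
		return "statement"
	case len(tok) >= 4 && tok[:2] == "{#" && tok[len(tok)-2:] == "#}":
		return "comment"
	default:
		return "text"
	}
}

// One representative per equivalence class: if these four pass, any other
// input in the same class should be handled the same way.
func TestDelimiterKind(t *testing.T) {
	cases := []struct{ in, want string }{
		{"{{ user.name }}", "expression"},
		{"{% if ok %}", "statement"},
		{"{# note #}", "comment"},
		{"plain words", "text"},
	}
	for _, c := range cases {
		if got := delimiterKind(c.in); got != c.want {
			t.Errorf("delimiterKind(%q) = %q, want %q", c.in, got, c.want)
		}
	}
}
```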