How should we peer review software?

38 points by MiraWelner


catwell

For software engineers who don't usually deal with the academic world and think scientific peer reviews are high quality: you couldn't be more wrong. Or rather, I suppose that depends on the field. But to take an example, currently in Computer Vision / Deep Learning, if you take three random peer-reviewed papers published in well-known journals or conferences and actually do the work of checking everything and reproducing the results, it is very likely that you will find something terribly wrong with one of them.

And indeed the code published with the paper is even worse. Either it does something different from the paper, or something very important to the method is only in the software and not even discussed in the paper. Of course the outputs of the software are used to draw the conclusions in the paper, which end up being wrong...

More generally, some (many) researchers fill papers with platitudes expressed as complex-looking mathematical equations to make them seem complex when the core idea is often very simple, and that shows when you implement them in software properly. More often than not it ends up being a diff of a handful of lines on a popular model in Diffusers.

This is not exceptional; this is the norm. Of course not all researchers or labs are like that. If you want examples of really insightful and well-written papers in that field, look at Kaiming He's work. But unless you know the team, you cannot trust a paper until you review it, implement it and do your own ablation study.
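
To make "do your own ablation study" concrete, here is a minimal sketch of the idea: switch each claimed contribution on and off and compare the scores. Everything in it is hypothetical. The component names, the `evaluate` stand-in and its numbers are made up for illustration; in practice `evaluate` would be a full train-and-evaluate run of your own reimplementation.

```python
from itertools import product

# Hypothetical components a paper might claim as contributions.
COMPONENTS = ("new_loss", "extra_augmentation")


def evaluate(enabled):
    """Stand-in for a full train-and-evaluate run; returns a made-up score.

    In a real study this would train the model with only the `enabled`
    components switched on and return the validation metric.
    """
    score = 0.70  # made-up baseline score
    if "new_loss" in enabled:
        score += 0.05
    if "extra_augmentation" in enabled:
        score += 0.01
    return round(score, 2)


def ablation_table(components, evaluate_fn):
    """Score every on/off combination of the given components."""
    results = {}
    for flags in product((False, True), repeat=len(components)):
        enabled = frozenset(c for c, on in zip(components, flags) if on)
        results[enabled] = evaluate_fn(enabled)
    return results


if __name__ == "__main__":
    table = ablation_table(COMPONENTS, evaluate)
    # Print configurations from worst to best score.
    for enabled, score in sorted(table.items(), key=lambda kv: kv[1]):
        print(sorted(enabled) or ["baseline"], score)
```

If one component accounts for nearly all of the gain and the rest are noise, that is usually the "handful of lines" the paper is really about.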