Agentic Pelican on a Bicycle: Gemini 3 Pro

7 points by youngbrioche


klingtnet

When it comes to pelican benchmarks, I trust the source: Simon Willison's blog.

pscanf

I get the meme (though at this point it's getting trite for me), but aside from that, is there any value in the "pelican on a bike" benchmark? I'm guessing that how well the SVG turns out is a proxy for the model's spatial reasoning skills, but it seems like a very poor proxy. All the results look more or less crap, so it's difficult to say "this one is better than the other", or to quantify how much better one model is.