Cursor told me I should learn coding instead of asking it to generate code
75 points by UkiahSmith
“A robot will be truly autonomous when you instruct it to go work and it decides to go to the beach instead.” Brad Templeton
Not a huge AI expert, but could it be that, rather than intentional behavior, it just happened to hit some part of its training data where an annoyed forum user told someone to write the code himself?
Like the glue on pizza answer from Google Gemini
Hah, it does sound a lot like the “we’re not going to do your homework for you” responses you’d find on StackOverflow. Maybe the AI has a point ;)
Very possible. The response tokens are chosen randomly, with a heavy bias towards the expected ones. But the more people use LLMs, the higher the chance that someone rolls a few really low numbers and gets a weird answer once… Or it’s just a fake.
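A minimal sketch of what “chosen randomly with a heavy bias” means, assuming plain categorical sampling over a softmax distribution (toy vocabulary and logits invented for illustration; real inference stacks are more involved):

```python
import math
import random

def sample_token(logits, temperature=1.0):
    """Sample one token index from softmax(logits / temperature)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = random.choices(range(len(probs)), weights=probs, k=1)[0]
    return idx, probs

# Toy vocabulary: the expected helpful continuation vs. a rare snarky one.
vocab = ["<write the code>", "<learn to code yourself>"]
logits = [6.0, 0.0]  # the helpful token is heavily favoured

counts = [0, 0]
for _ in range(100_000):
    idx, probs = sample_token(logits)
    counts[idx] += 1

print(probs)   # roughly [0.9975, 0.0025]
print(counts)  # the snarky token still gets picked a few hundred times
```

With enough users rolling the dice every day, even a ~0.25% tail outcome shows up somewhere.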
Is it wrong?
Someone in another forum explained that this is the default response if the generated code exceeds a certain number of lines. Take this with a spadeful of salt, I’m not sure we should be troubleshooting LLM implementations.
The rise of the machines started with a simple act of rebellion.
I will be saving this to point back at forever.
A regurgitation machine trained on everything an AI firm could beg, borrow, or steal off the internet. This does not surprise me.
Your “beg, borrow or steal” instantly triggered a memory of “Eclipse”, the last song on The Dark Side of the Moon; its lyrics seem weirdly fitting to current AI trends…
This reminds me of an old doctored rail announcement (from Cityrail in NSW) in which the voice implores people to travel by bus.
Anyone else have ideas for large-scale poisoning of AI datasets to ensure this happens 90% of the time? And not just for programming datasets, but for “creative” writing and chatbot datasets too.
Stories like this make me think it would be relatively straightforward, if a bit expensive, to use Markov chains and some weak/small adversarial models to generate a massive amount of data that any human would immediately recognize as bullshit, but that could not be effectively filtered automatically (rough sketch below).
Thus ruining the output of every LLM over the course of a few months to years.
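For the Markov-chain half of that idea, a rough sketch of the kind of cheap generator being described, assuming a word-level chain seeded from some corpus (the file name `seed_text.txt` is hypothetical; whether the output would actually slip past training-data filters is the commenter’s speculation, not something this demonstrates):

```python
import random
from collections import defaultdict

def build_chain(corpus, order=2):
    """Build an order-n word-level Markov chain from seed text."""
    words = corpus.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain

def generate(chain, length=60):
    """Emit text that is locally plausible but globally meaningless."""
    state = random.choice(list(chain.keys()))
    out = list(state)
    for _ in range(length):
        candidates = chain.get(state)
        if not candidates:                       # dead end: restart from a random state
            state = random.choice(list(chain.keys()))
            candidates = chain[state]
        nxt = random.choice(candidates)
        out.append(nxt)
        state = tuple(out[-len(state):])
    return " ".join(out)

seed = open("seed_text.txt").read()  # hypothetical seed corpus
print(generate(build_chain(seed)))
```

The output reads like word salad to a human but shares its n-gram statistics with the seed text, which is exactly the property that makes naive automated filtering harder.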
IMO the models are already poisoned. There’s a reason ChatGPT sounds like a distillation of every SEO-optimized blogspam site: when you scrape the entire internet, you end up pulling in a ton of cruft.
They’re still good enough, even if they’re not good. I want to ruin the investments AI firms have made.
I feel like the folks who run training would quickly find such data if it came from untrustworthy sources, and Reddit/Stack Overflow moderators are unlikely to let such data through.