GPTs and feeling left behind
21 points by cjoly
I wish we could organize a service or chatroom where we match successful vibecoders up with “LLMs don’t work for programming”ers and have them work on the same projects to see what’s going wrong.
I made an entire useful website (https://bridgedays.github.io) with Sonnet 4 in like three hours a few days ago. “LLMs don’t work for coding” is incomprehensible to me.
I don’t want to overly critique a site that isn’t a submission, but I’m seeing issues when Friday is a holiday. It looks like it suggests taking off Sunday to Wednesday, but not Thursday. But Sunday is listed as a weekend and Wednesday doesn’t bridge to Friday. Unless I’m misunderstanding the concept or the UI, this doesn’t appear to work.
It’s great that this was quick to build, and it would take me more than three hours to replicate it by hand, but it doesn’t seem usable yet.
Also, some of the sources don’t look entirely correct: Saxony in Germany, for example, definitely has more public holidays falling midweek than are shown in the calendar.
Unless it’s a display issue, that would probably be a problem with the source, OpenHolidays API. Check https://openholidaysapi.org/PublicHolidays?countryIsoCode=DE&validFrom=2025-01-01&validTo=2025-12-31&languageIsoCode=DE&subdivisionCode=DE-SN please?
I think it’s a display issue, although maybe more a UX issue than a bug per se. If I look at November 2025 in the calendar, I can see the 25th highlighted in bold, but also with a yellow background. I think the bold is because it’s a public holiday, but the yellow background should be for days that I should take off, right? I would expect this to show with a red background, with the weekdays on either side highlighted in yellow as potential bridge days.
The API returns the correct data, and I think the app is reading that data correctly (otherwise it wouldn’t calculate bridge days there at all) but it’s just displaying in a way that doesn’t quite seem right.
Also, I’m looking at this all on mobile, which is very cut off and only shows the calendar and a very thin strip of something underneath that I can’t read properly, so I might be missing something.
… November 2025 has the 25th highlighted in bold for Saxony 2025? That’s a bug, can you post a screenshot please? (Also, what’s your browser?) Nov 25 is not a holiday in Saxony, and it bolds Nov 19 here.
Also yeah the stuff underneath is kind of the entire site, ie. the list of bridge day holidays. :) Not sure how to fix that. When you click on a bridge day marker, it picks a time-off choice for that day from the list below, but it’s not necessarily the longest one.
You’re right, I meant the 19th. If that is the correct display (i.e. bold, yellow background), then you might want to highlight it a bit more somehow, because as it is I was struggling to see where any of the public holidays were.
For the section below, presumably the calendar is specified to be sticky in some way in the CSS (or the part below it is set to be scrollable). This makes the whole thing almost impossible to use on mobile, as you’ve only got access to the calendar. Maybe you can disable this stickiness, and just let the user scroll up and down between the calendar at the top and the calculations below? Maybe you could have the sticky effect only kick in if the window is tall enough, if you think it’s useful enough to keep.
Alright, try now. I didn’t consider phones at all. With smaller screens it should now disable the separate section and just show it as one flowing text. Also, tapping an opportunity should permanently select it (it uses mouseover on desktop). Then you can scroll back up to see it in the calendar.
Also, this is the total of my input:
Hi, can you change the site to not scroll the holidays panel but have it be part of normal page flow, on mobile only? On a desktop, the site is split into two halves, but on mobile the calendar dominates.
Alright, now to properly support mobile phones (no mouseover!) we also have to allow clicking on an opportunity and making the selection permanent (until another one is clicked). This should be the same mechanism as mouseover so that mouseover continues to work on desktop.
Nice! But now clicking out should deselect it again.
edit:
Woah! We seem to have added a bug when we added mobile and selection support: when we click on another opportunity, the calendar highlighting of the old opportunity is not removed.
Thanks! No, that’s great, I love feedback. Could I bother you to file a bug report? There’s a GitHub link on the top right.
There were definitely issues with the holidays, and it’s very plausible that I haven’t stamped them all out yet. I haven’t seen something this blatant in my own use since I released it, but it’s plausible I missed something.
I’ve done this with a few friends. For the most part, I’ve found that people just can’t write specs very well. They’ll ask the machine for something that can be interpreted in 50 different ways, or include long irrelevant parts of their thought process with relatively little detail about the desired solution. Having them practice writing specs that someone else could use has been really effective.
But a truly unambiguous spec would end up being code, right? Isn’t it better to write that spec, that code, in a succinct language that’s designed for that purpose, using a tool, a compiler or interpreter, that will handle it deterministically?
No, look at most RFCs and design documents. They detail what needs to happen, the edge cases, and sometimes rationale or test vectors. It can function as the source of truth for implementations without providing code.
A good RFC takes longer to write than the implementation. Also, I usually write the RFC after writing a POC.
I’ve written several RFCs that took weeks to make. I certainly would not recommend that level of detail for an LLM.
I’m more likely to use an LLM for prototyping. Another example: I was working on improving the performance of a data pipeline, where we were spending ~$10,000 in compute/month. I came up with several ideas, and asked the LLM to come up with more. There were something like 10 plausible ideas.
I ordered the ideas by how likely I thought they were to work, then tried implementing each one with the LLM, with shitty minimal code that was just enough to test the performance of the design. Each one took 30-90 minutes, and I would estimate that doing them myself would have taken me about 5 hours each, because that’s how long two ideas took before I tried the LLM. My process for each was roughly “give me a plan for implementing design X”, critiquing the plan back and forth and adding more detail, and then babysitting Claude Code while it actually implemented that plan, sometimes making corrections to the code it proposed.
Most of the ideas were a wash, but idea #8 was a 55% improvement in performance, which translated to a 40% improvement in cost. I definitely would not have even bothered to test it on my own.
After testing the ideas with the LLM, I created a new branch and implemented the best idea from scratch. I think this is generally a good way to work with them.
So here’s an example of what I’m talking about. Initial prompt: “deduplicate this code.” Revised prompt: “my codebase uses libraries Foo and Bar, which have very similar functionality. I would like to remove Foo and keep Bar. Search the web and compare the two libraries to see if I would lose any essential functionality this way. Then give me a plan for removing Foo.”
“Give me a plan in a markdown file” is a secret weapon, because it lets you reset the context if something goes wrong and still keep going in the same approach.
This has been my experience too. It seems like the initial prompt matters more than any future prompts that attempt to keep it on track - it seems better to just /reset and adjust the initial prompt.
+1, I vibe coded a pretty reasonable speed scrabble game (https://speedscrabble.netlify.app/) over parts of 3 days using Claude Code.
(That’s not to say that I don’t find it frustrating and maddening a lot of the time, but it really has changed the way I develop.)
ok, can we all (competent programmers trying AI tools) at least agree that “I vibecoded X with Y in Z hours the other day” is completely uninformative, given the wide spread of outcomes we keep reading about?
It would be far more instructive to show how it played out, how you interact with the tools, what you expect of them, etc.
I played through your game. There were some minor UI annoyances, but that was quite fun and well executed.
“Writing specs is hard and not fun for everyone” is certainly part of it, as another comment covers (I’d rather write the project than the specs, for sure). But it’s worth adding that, since models rely on some randomness, there will always be cases where one just does badly on a question, because luck didn’t favor you. If that was someone’s “give it a chance”, well…
Yep! As somebody who vibecodes a ton, it happens very frequently that I just have to go “well, it got it wrong, this context is ruined”, /undo, /clear, and re-ask the question. It’s easier to tell when the model has gone wrong than to get everything right yourself, so this is still a speedup, but it is absolutely something you have to be ready for.
I confess I don’t get the wave of posts that admit to finding LLMs useful for some things but spend most of their words being troubled by the gap between that reality and all the hype. The hype comes from clowns and grifters. There is always hype. Normal people have been finding normal ways (like the ones described in the post) to get value out of LLMs all along.
I know people are worried that the moves in the industry are being driven by the hype and not the reality. But imagine a chart with two lines on it: one of LLM capabilities over time as hyped and one, say, from a practical and realistic point of view. Both of those are going up and to the right, the hype one is just way taller. But they’re the same thing. The hype is an inflated version of the reality that posts like this admit. If investment was driven by the unhyped reality, I’m not sure it would look too different. I guess you’d have fewer CEOs talking about it, but it’s not clear their actions are actually being driven by these ideas — it seems like they’re using them to justify post hoc the things they would be doing anyway.
Even taking a sober look at the capabilities and limitations of LLMs at present and their trajectory over the past couple of years (and not overindexing on early reactions to GPT-5) it seems clear to me that the sheer amount of LLM inference taking place has a lot of room to grow. I think this is sufficient to explain the massive capex and doesn’t require any more elaborate theories about hedging against AGI risk or whatever.
ChatGPT told me the other day “if you don’t want any dependencies, you’re going to have to implement it yourself”.
Funny story: I had wanted to port a slow test suite from Python to Rust for a while, but I was too lazy to do it myself. I decided to give vibecoding a try for this task. I asked Gemini 2.5 Pro via gemini-cli to do it, because that’s what had given me the best results in the past. I asked it explicitly to not use any dependencies and only use Rust’s standard library (note: the Python code was only using the stdlib as well). Well, it ignored me completely and declared a ton of dependencies: 48 in total, including transitive ones. The code also didn’t work at all, and it was completely ugly. Waste of time.
I then decided to try the brand-new ChatGPT 5 with JetBrains Junie, with the exact same prompt. This time the result didn’t have any external dependencies, which was nice. The funny part is that I hadn’t realized that this test suite required some JSON parsing; had I known, I would have told it to use a JSON dependency. Instead, ChatGPT wrote a tiny JSON parser for this project, 200 lines of very simple Rust. That put a smile on my face. Yes, I realize that this tiny parser probably didn’t handle a lot of edge cases, but it worked well for the simple things that this project needed to load.
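For a sense of scale, a hand-rolled parser of that sort might look roughly like the minimal recursive-descent sketch below. This is not the generated code: it assumes the test data only needs null, booleans, numbers, unescaped strings, arrays, and objects, and it is deliberately permissive (e.g. about trailing commas).

```rust
// Minimal recursive-descent JSON parser over a byte slice.
// Limitations: no string escapes, no strict comma/structure validation.
#[derive(Debug, PartialEq)]
enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
    Obj(Vec<(String, Json)>),
}

struct Parser<'a> { bytes: &'a [u8], pos: usize }

impl<'a> Parser<'a> {
    fn new(s: &'a str) -> Self { Parser { bytes: s.as_bytes(), pos: 0 } }

    fn peek(&self) -> Option<u8> { self.bytes.get(self.pos).copied() }
    fn skip_ws(&mut self) {
        while self.peek().map_or(false, |c| c.is_ascii_whitespace()) { self.pos += 1; }
    }
    fn eat(&mut self, b: u8) -> Result<(), String> {
        if self.peek() == Some(b) { self.pos += 1; Ok(()) }
        else { Err(format!("expected {:?} at byte {}", b as char, self.pos)) }
    }
    fn lit(&mut self, s: &str) -> Result<(), String> {
        if self.bytes[self.pos..].starts_with(s.as_bytes()) { self.pos += s.len(); Ok(()) }
        else { Err(format!("expected {s}")) }
    }

    // Dispatch on the first non-whitespace byte of a value.
    fn value(&mut self) -> Result<Json, String> {
        self.skip_ws();
        match self.peek() {
            Some(b'n') => { self.lit("null")?; Ok(Json::Null) }
            Some(b't') => { self.lit("true")?; Ok(Json::Bool(true)) }
            Some(b'f') => { self.lit("false")?; Ok(Json::Bool(false)) }
            Some(b'"') => self.string().map(Json::Str),
            Some(b'[') => self.array(),
            Some(b'{') => self.object(),
            Some(c) if c == b'-' || c.is_ascii_digit() => self.number(),
            other => Err(format!("unexpected {other:?}")),
        }
    }

    fn string(&mut self) -> Result<String, String> {
        self.eat(b'"')?;
        let start = self.pos;
        while self.peek().map_or(false, |c| c != b'"') { self.pos += 1; }
        let s = String::from_utf8_lossy(&self.bytes[start..self.pos]).into_owned();
        self.eat(b'"')?;
        Ok(s)
    }

    fn number(&mut self) -> Result<Json, String> {
        let start = self.pos;
        while self.peek().map_or(false, |c| c.is_ascii_digit()
            || matches!(c, b'-' | b'+' | b'.' | b'e' | b'E')) { self.pos += 1; }
        std::str::from_utf8(&self.bytes[start..self.pos]).unwrap()
            .parse().map(Json::Num).map_err(|e| e.to_string())
    }

    fn array(&mut self) -> Result<Json, String> {
        self.eat(b'[')?;
        let mut items = Vec::new();
        loop {
            self.skip_ws();
            if self.peek() == Some(b']') { self.pos += 1; return Ok(Json::Arr(items)); }
            items.push(self.value()?);
            self.skip_ws();
            if self.peek() == Some(b',') { self.pos += 1; }
        }
    }

    fn object(&mut self) -> Result<Json, String> {
        self.eat(b'{')?;
        let mut pairs = Vec::new();
        loop {
            self.skip_ws();
            if self.peek() == Some(b'}') { self.pos += 1; return Ok(Json::Obj(pairs)); }
            let key = self.string()?;
            self.skip_ws();
            self.eat(b':')?;
            pairs.push((key, self.value()?));
            self.skip_ws();
            if self.peek() == Some(b',') { self.pos += 1; }
        }
    }
}
```

The real edge cases (escapes, Unicode, strict number grammar, recursion depth) are exactly where a sketch like this falls over, which is why it only makes sense when the input format is fully under your control.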
The code was relatively simple, and I appreciated that. But it didn’t work well: it handled some simple cases but had bugs with more complex ones. Now I need to decide whether to keep iterating with the LLM to fix the remaining issues, or to try to fix them myself. In the past, iterating with the LLM has usually been a frustrating endeavor that ends with me giving up.
I kinda share the OP’s underwhelmed feeling about this technology, but I keep trying.
On the other hand, the technology keeps improving. I try to use it for smaller tasks that I find boring. Another example: I wanted to get rid of the anyhow dependency in one of my projects, replacing its usage with plain Rust error handling. Both Claude and ChatGPT had failed at this in the past, producing ugly code that I considered unacceptable. Eventually Gemini did it perfectly. So one tip is to always save your prompts and try them again with different models.
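For readers unfamiliar with what that swap involves: removing anyhow typically means defining your own error enum with Display/Error impls and From conversions so that the `?` operator keeps working. A rough sketch, with invented names (AppError, read_port), not code from the actual project:

```rust
use std::fmt;

// Hand-rolled replacement for anyhow::Error: one enum variant per
// underlying error type the crate actually produces.
#[derive(Debug)]
enum AppError {
    Io(std::io::Error),
    ParseInt(std::num::ParseIntError),
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AppError::Io(e) => write!(f, "io error: {e}"),
            AppError::ParseInt(e) => write!(f, "parse error: {e}"),
        }
    }
}

impl std::error::Error for AppError {}

// These From impls are what keep `?` ergonomic, mirroring
// anyhow's blanket conversion from any std error.
impl From<std::io::Error> for AppError {
    fn from(e: std::io::Error) -> Self { AppError::Io(e) }
}
impl From<std::num::ParseIntError> for AppError {
    fn from(e: std::num::ParseIntError) -> Self { AppError::ParseInt(e) }
}

// Hypothetical caller: `?` converts ParseIntError into AppError automatically.
fn read_port(s: &str) -> Result<u16, AppError> {
    Ok(s.trim().parse::<u16>()?)
}
```

The tradeoff versus anyhow is some boilerplate per error type in exchange for an exhaustive, matchable error enum and zero dependencies.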
“I feel like I’m lagging behind, like I’m missing out on some big, useful tool, and my skills are about to become obsolete very soon.”
He’s not the only one; successful vibecoding posts create that feeling. I’d say it’s similar to ‘showing the good side’ on any social platform, really. Recently I dug into sources on AI productivity and its usefulness, and a stat that stood out to me was that developers took 19% longer to complete tasks with AI tools than without. Of course, there are people out there who have managed to find an efficient workflow, but in general it seems there is definitely still a gap between the hype and reality.