How to think about Gas Town
39 points by k-monk
One thing noted here is "rigor". I'm beginning to come to the conclusion that rigor as we know it will die with LLMs.
We (as in software engineers) can't keep talking about long-running agents, sub-agent fleets, agents changing themselves and writing tools for themselves at runtime, and so on, while fooling ourselves that this can all happen with any sort of upfront rigor check during CI with testing and such. The dynamic and non-deterministic outcomes prevent this.
I think we need to start running with the idea that we will essentially be yolo'ing things to prod without humans in the loop, and thinking about what that means for "rigor". For me, what we are looking at is moving a lot of software verification to long-running sandboxes and canaries. Is the agent doing what was asked? Are objectives being met? Are invariants being violated? Then we send it on to prod, and start acting with extreme prejudice against the process: as soon as it looks like it's going wrong, the process is aborted and started again.
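To sketch what I mean, here's a minimal supervisor loop. Everything in it is hypothetical: `agent-task` stands in for whatever long-running agent process you launch, and `invariants_hold` stands in for real probes (metrics, output diffs against a canary baseline, resource limits).

```rust
use std::io;
use std::process::{Child, Command};
use std::thread::sleep;
use std::time::Duration;

// Hypothetical probe: in practice this would query metrics, diff outputs
// against a canary baseline, check objectives, and so on.
fn invariants_hold() -> bool {
    true
}

// "agent-task" is a placeholder for whatever long-running agent process
// you actually run in the sandbox.
fn spawn_agent() -> io::Result<Child> {
    Command::new("agent-task").spawn()
}

fn main() -> io::Result<()> {
    let mut agent = spawn_agent()?;
    loop {
        sleep(Duration::from_secs(30));
        if !invariants_hold() {
            // "Extreme prejudice": abort the process and start again from a
            // known-good state rather than trying to repair it in place.
            agent.kill()?;
            agent.wait()?;
            agent = spawn_agent()?;
        }
    }
}
```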
I would recommend listening to the linked podcast episode where my coworkers and I talk about some of these things. Several of us have found ways to use LLMs to increase rigor well beyond what humans alone can achieve in the same amount of time.
There are a number of other examples. Another of my coworkers, for instance, has been prototyping a formally verified toolkit for doing a certain kind of Rust refactor. In some recent feature work I did, I wrote an implementation and an executable specification, then probabilistically verified that the implementation matches the spec through an oracle/model property-based test, and formally verified certain properties of the spec through exhaustive enumeration. (None of this is that fancy! It's just time-consuming to set up.) I know how to do all of these things, but simply wouldn't have had the time to do any of it without LLM assistance.
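As a rough sketch of that last part, with made-up toy functions (proptest is just one way to do the oracle/model test): write the spec as the obviously-correct version, check that the optimized implementation agrees with it on generated inputs, and exhaustively check a spec property over a small bounded domain.

```rust
use proptest::prelude::*;

// Executable spec: the obviously-correct (if slow) version.
fn spec_sum(xs: &[u32]) -> u64 {
    xs.iter().map(|&x| u64::from(x)).sum()
}

// Implementation under test (a hypothetical "optimized" version).
fn impl_sum(xs: &[u32]) -> u64 {
    xs.chunks(4)
        .map(|c| c.iter().map(|&x| u64::from(x)).sum::<u64>())
        .sum()
}

proptest! {
    // Oracle/model property-based test: the implementation must agree
    // with the spec on arbitrary generated inputs.
    #[test]
    fn impl_matches_spec(xs in proptest::collection::vec(any::<u32>(), 0..1024)) {
        prop_assert_eq!(impl_sum(&xs), spec_sum(&xs));
    }
}

#[test]
fn spec_is_monotone_exhaustively() {
    // Exhaustive enumeration over a small bounded domain: adding an
    // element never decreases the spec's result.
    for a in 0..=255u32 {
        for b in 0..=255u32 {
            assert!(spec_sum(&[a, b]) >= spec_sum(&[a]));
        }
    }
}
```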
Agreed, I listened to it this past week and just having the lens to apply has been interesting, i.e. asking myself whether I really would've bothered to set up GitHub Actions to trigger a nix flake check on a largely throw-away personal project.
It kind of ladders up to another point in the podcast, where I think Rain notes that they selected an API's return structure with a human reader in mind, and that may be wrong in the future. Maybe our sense of how to prioritize tasks will similarly shift.
On the upside this should provide some good contracting opportunities for people cleaning up.
Unless there are too many such opportunities, and the confidence in LLM stocks goes away. A crash measured in multiples of the mortgage crash might generally depress contracting opportunities, no matter how high the unmet need.
Yup that's been my experience in the last few market adjustments. Initial contraction and scarcity of gigs, followed by it picking up again and cleanup roles becoming common.
At some point it boils down to "do people trust your product enough to pay for it," and all you need to do is calibrate your level of rigor around that question. Maybe the ceiling of tolerable "vibe risk" is trending upward with LLMs (I think it has plateaued, but who knows), but you can't just stop asking the question altogether.
Is the agent doing what was asked? Are objectives being met? Are invariants being violated?
How do you even answer these questions without a rigorous definition of the objectives and invariants? Wherever you are "asking" these questions is precisely where you need rigor, whether that's with automated checks in CI, or some long-running canary instance, or whatever.
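For instance (a toy, hypothetical invariant): write it down once as ordinary code, and the same predicate can back both a CI assertion and a canary probe.

```rust
/// Hypothetical invariant, written once so the same predicate can be
/// checked in CI and polled by a long-running canary.
fn balances_never_negative(balances: &[i64]) -> bool {
    balances.iter().all(|&b| b >= 0)
}

#[cfg(test)]
mod tests {
    use super::*;

    // Rigor in CI: the invariant as an ordinary test.
    #[test]
    fn ci_check() {
        assert!(balances_never_negative(&[0, 10, 3]));
    }
}

fn main() {
    // Rigor in production: the same invariant as a canary probe that
    // triggers the abort-and-restart path instead of failing a build.
    let live_balances = [0_i64, 10, 3];
    if !balances_never_negative(&live_balances) {
        eprintln!("invariant violated: abort and restart the agent");
    }
}
```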
I guess you're talking about rigor as a property of the development lifecycle specifically, and that's kind of interesting. But yolo'ing something into production was always a choice you could make. I don't think LLMs fundamentally change that, and more importantly I don't think they lower the risk of doing so.
The dynamic and non-deterministic outcomes prevent this.
I don't follow this. What about non-deterministically produced code prevents verification, in CI or otherwise?
It doesn't matter how the code was produced. Once the code is the code, we can apply verification techniques to it. In fact, I think this is what will enable us to reliably apply LLMs. I don't see the issue here.
If you think Gas Town is just about agents doing work for you, then I agree that's inevitable (it turns out this is way more useful than AI-powered "tab completion").
But my understanding of Gas Town (from only reading about it; I've not personally used it) is that a defining feature of Gas Town is a full commitment to "vibe coding" where you don't really care if the AI solves your problem, you just care that the AI has done something.
And that feels very far from inevitable to me.
It doesn't to me. People are lazy, and there are incentives to not care about code even prior to LLMs.
What do we really mean by “rigor”?
You can start with determinism. Do we really want variance in life safety applications?
Where is there determinism anywhere in the software development lifecycle? If you give a task to 3 different engineers, they're going to produce slightly different code, no?
And what percentage of applications are "life safety" applications?
I'd say giving the same input to a compiler and expecting the same output is deterministic.
Life safety applications involve airline and train transportation, food and drugs, medical instruments, emergency communications, that sort of thing. Can't say what percentage that is. What's your cutoff for relevance or importance?
LLMs automate the input code to a compiler. That's the part that's already non-deterministic. The translation to executable code isn't the difficult part.
As for the ones where safety is on the line: beyond being a vast minority of cases, those are also domains that already had totally different engineering practices. You would never apply DO-178C to a social media platform, right? So it's a total straw man argument. You don't have to use LLMs there.
I'd say giving the same input to a compiler and expecting the same output is deterministic.
I would actually compare the issue with hosted-only models to the "same compiler" part here: after a minor update I wouldn't expect the output to be exactly the same, whether it's a routine automatic system update or a routine undisclosed model optimisation in the cloud.
Of course, you run things locally so you can pin both the compiler and the dependencies (declarations from there are part of the input!), with some effort. But then if I am running an LLM locally, I can also pin the seed. The problem here is as much the tooling being out of your control as it is what the tooling is.
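As a toy illustration of the seed-pinning point, with the rand crate (0.8-style API) standing in for the model's sampler: pin the seed and the "sampled" sequence is reproducible across runs; let the tooling change underneath you (e.g. StdRng's algorithm across rand versions) and it isn't.

```rust
use rand::rngs::StdRng;
use rand::{Rng, SeedableRng};

// Stand-in for token sampling: draw a short sequence from a seeded RNG.
fn sample(seed: u64) -> Vec<u32> {
    let mut rng = StdRng::seed_from_u64(seed);
    (0..8).map(|_| rng.gen_range(0..50_000)).collect()
}

fn main() {
    // Same pinned seed (and same rand version) => same "tokens" every run.
    assert_eq!(sample(42), sample(42));
    println!("{:?}", sample(42));
}
```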
where you don't really care if the AI solves your problem, you just care that the AI has done something.
You do care if it solves your problem, but you don't necessarily care specifically how it did so.
If it's effective, then people who often develop software by writing code could instead take a step back to writing product requirements and high-level design. I don't know if this is good or bad, but having the freedom to do this for some work seems like it's really powerful.
I saw Steve talking about cybernetics on bsky the other day so I assumed he was going to note the obvious: that Gas Town is an attempt (a pretty ugly one, but an attempt no less) at an implementation of Stafford Beer's Viable System Model (VSM) using agents (+ a human, as system 5). It's nominally got all of the relevant bits and pieces. Puzzled that nobody seems to have made this link yet?
I expect some Doppelganger-like confusion, like Naomi Klein had with Naomi Wolf, only with Steve Yegge (of Gas Town) and Steve Klabnik (author of this post).
I am not sure that all the jargon comes from aiming to filter people out. It's too readable for that (I say this as someone who has not interacted with the Mad Max or Waterworld settings, uses only local LLMs, and could follow without effort). There are simply more than 7 entities, so to fit the current state of the structure into working memory you need to make the entities one story instead of ten separate things. And oh well, nothing in Gas Town involves advance planning, not even the story being told about Gas Town.
Yeah, the names seem like the least bizarre part about it. I'm already used to having to remember the difference between a cask and a formula when I need to pour homebrew to cargo a crate, etc.