LLM-assisted coding is not deterministic. Does it matter?
6 points by vrypan
This roughly aligns with my experience both writing and analyzing code. Objective criteria (e.g., tests) can act as a stopgap against slop and hallucinations, letting only valuable code through. It also helps focus human effort on what is truly valuable.
Physics isn't deterministic. You were closest when you mentioned that chaotic systems diverge at an exponential rate from initial conditions; autoregressive language models diverge at a rate that is exponential per token. There is a barrier for each of the systems you've listed beyond which our predictions degrade into noise regardless of the quality of the underlying model: solar-system orbits are only predictable for about ten megayears, weather patterns for about two weeks, and dice (vigorously shaken in a cup) for about ten seconds.
How fast do LLMs diverge? That would be a very interesting question to study! Previously, on Lobsters, we noted that LLMs are (among other properties) sensitive to variance in initial conditions, so there ought to be an empirically-measurable Lyapunov exponent of some sort, but I can't find any papers which give precise numerical estimates. Personal experiments suggest divergence is at its worst after as few as ten thousand generated tokens on a 3.7B-parameter model. Of course, force-feeding tokens from any sort of steering, harness, or conversation will more-or-less reset the state to new initial conditions, so this is a surmountable architectural obstacle.
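For what it's worth, here's the shape of the measurement I have in mind, as a minimal sketch; the token lists are made up, and in practice they'd be two samples from the same model on the same (or one-token-perturbed) prompt:

```python
from itertools import zip_longest

def first_divergence(tokens_a, tokens_b):
    """Index of the first position where two generations disagree,
    or None if they never do."""
    for i, (a, b) in enumerate(zip_longest(tokens_a, tokens_b)):
        if a != b:
            return i
    return None

def disagreement_rate(tokens_a, tokens_b):
    """Fraction of positions (up to the shorter length) that differ."""
    n = min(len(tokens_a), len(tokens_b))
    return sum(a != b for a, b in zip(tokens_a, tokens_b)) / n if n else 0.0

# Made-up stand-ins for two sampled continuations of the same prompt:
run_a = "the cat sat on the mat and purred".split()
run_b = "the cat sat on the rug and purred".split()
print(first_divergence(run_a, run_b))   # 5
print(disagreement_rate(run_a, run_b))  # 0.125
```

Plotting first_divergence against perturbation size over many samples is the kind of thing that would put a number on the exponent.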
I agree. But the point is that neither humans nor LLMs are deterministic, so why is this an argument against LLMs?
What we really care about is predictability. For example, knowing that an agent tends to diverge after 10k tokens is good for predictability: it gives a bound within which one can feel safe using it, and lets you develop strategies to deal with the divergence. Similar to managing a dev team and knowing that after 20 hours of straight coding, their output will diverge from their normal quality. :-)
When you use a typical autocomplete in an IDE, write three letters and you'll see a list of the available functions that start with those letters. Write them again and you'll get the same list (maybe reordered by some LRU scheme or similar). When you use an LLM to help you complete, you might get the function you needed based on the context, or a Python script that downloads a library to analyze your code and prints a list of cities that share the same letters. Or maybe a foreach loop, when you wanted to write 'fortune'.
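To make the contrast concrete, the classic completer is a pure function of its inputs, which is the whole point; a minimal sketch:

```python
def autocomplete(prefix, symbols):
    """IDE-style completion: a pure function, so the same prefix and the
    same symbol table produce the same list, every single time."""
    return sorted(s for s in symbols if s.startswith(prefix))

symbols = {"foreach", "format", "fortune", "forward", "filter"}
print(autocomplete("for", symbols))  # ['foreach', 'format', 'fortune', 'forward']
```

There is no sampling step anywhere for randomness to creep in.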
Monte Carlo would be a bad implementation for autocomplete. Does that make Monte Carlo a bad tool? Is genetic programming useless?
Different tools are useful for different things and it takes skill to wield a powerful non-deterministic tool appropriately. Part of holding LLMs correctly is not picking them when you want simple deterministic results like autocomplete.
I think nondeterminism is a good argument for refusing to use an LLM for the same tasks as other software. The reasons to refuse to use an LLM for the same tasks as humans are more like lack of accountability or lack of mental processes crucial to some particular task or ethical concerns about the impact of replacing a class of human laborers. There can be many criticisms of how people use the technology, without all of them applying to all cases.
I was focused on programming, where until now there was no other software that could write it.
The post is not an argument why we should or should not let LLMs write code, but that the argument "I don't want LLMs to write code because they are not deterministic" is weak because a) the alternative, people, are also non-deterministic, and b) it's not what matters. There are good arguments for and against human and LLM coders depending on the case, determinism is not one of them imo.
Oh, I was replying to the article without replying directly to the point. I would suggest that LLMs are bad at writing, and we should reject bad writing regardless of whether it's predictably bad, deterministically bad, generated by chatbots, etc. I think that your point is mostly unsupported by the evidence presented; once your first table is undermined, it's hard to justify your second table. For example (and expect this to get a bit meta): what's your source for the following claims?
> Weather is a good example. The laws of physics governing the atmosphere have not changed, and they are deterministic. Yet our ability to predict the weather has improved over decades simply because our measurements, models, and computing power improved.
The standard understanding is that the laws of physics aren't sufficient to predict weather, requiring differential equations which estimate it thermodynamically. These equations aren't deterministic, and the underlying physics isn't deterministic either. Our ability to predict the weather is based on the 1960s paradigm of numerical prediction with an ensemble of initial conditions which mitigate chaos. While computing power improved, it was mostly spent on fine-grained measurements, allowing weather predictions which detail individual hours rather than individual days. There's a mathematical reason for this, quoting Wikipedia:
> A more fundamental problem lies in the chaotic nature of the partial differential equations that describe the atmosphere. It is impossible to solve these equations exactly, and small errors grow with time (doubling about every five days). Present understanding is that this chaotic behavior limits accurate forecasts to about 14 days even with accurate input data and a flawless model.
"doubling about every five days" is an indicator of the numerical value of the Lyapunov exponent.
Weather was used as an example of how "predictability often depends on our capabilities". We are better at predicting weather today than we were 100 years ago, not because the laws that govern it changed, but because we got better at it.
Weather is such a complex phenomenon that it can be studied at all levels, from micro to macro, so maybe it was not an ideal example. But do you disagree that determinism and predictability are two different things?
Sure! But LLMs are classically computed and chaotic, so they are deterministic and not predictable, like the pixels of visualizations of the Mandelbrot set. This is why I'm asking about the provenance of your claims; whoever told you that weather is deterministic was misleading you and may have misled you in other ways.
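The Mandelbrot comparison is easy to make concrete; here's a minimal escape-time sketch, completely deterministic, where points a millionth apart behave completely differently:

```python
def escape_time(c, max_iter=1000):
    """Iterate z -> z*z + c and count steps until |z| exceeds 2.
    Fully deterministic: the same c always gives the same count."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return n
    return max_iter

print(escape_time(-2.0 + 0j))       # 1000: c = -2 is in the set, never escapes
print(escape_time(-2.000001 + 0j))  # 0: a millionth away, escapes immediately
```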
Sure!
This is the important point.
The post was written for people who think that determinism guarantees predictability. It is not intended to exhaust the concepts or stand as an academic paper. When you use real-world examples to discuss determinism, you are eventually going to commit fouls, or get into deep discussions about the nature of physics and the nature of the universe.
I think there are two concepts that are worth distinguishing:

1. Indeterminism: the underlying laws are themselves stochastic, so even perfect knowledge of the initial conditions does not fix the outcome.
2. Deterministic chaos: the laws are deterministic, but sensitivity to initial conditions makes long-range prediction impossible in practice.
Weather forecasting sits squarely in the second category — the Navier–Stokes equations are deterministic PDEs, and prediction horizons come from chaos, not from any indeterminism in atmospheric physics.
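A one-line dynamical system makes the distinction vivid; this logistic-map sketch is fully deterministic, yet two starting points differing in the sixth decimal place end up in completely different states:

```python
def logistic(x, r=4.0, steps=30):
    """Iterate the deterministic logistic map x -> r*x*(1-x)."""
    out = [x]
    for _ in range(steps):
        x = r * x * (1.0 - x)
        out.append(x)
    return out

a = logistic(0.200000)
b = logistic(0.200001)  # initial condition differs by 1e-6
for n in (0, 10, 20, 30):
    print(n, abs(a[n] - b[n]))
# The gap grows roughly exponentially, from 1e-6 to order 1 within
# a couple dozen steps: same law, no randomness, no predictability.
```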
Anyways, I'm not sure that any of what I said applies to the discussion of LLMs, but the physicist in me felt compelled to address the analogy.
Worth checking out the concept of computational irreducibility, which is deterministic and non-predictable but not chaotic, if you're into these things.
One thing that I don't understand is why LLMs are not deterministic.

Take an LLM that has been trained on a data set: if I ask it a question, it will reply differently every time. Surely if the data set is fixed, the answer should be the same every time? Does anyone know why?
The first part of this is that they're made to be non-deterministic: that's the temperature / top-p / top-k parameters. You want a little bit of randomness so the output isn't just the single most likely thing, but a probabilistic mix of the most likely tokens. This produces more creative output, hence the oft-quoted advice to use temperature > 1 for creative writing.
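A minimal sketch of what those knobs do, with toy logits rather than any particular model's real implementation (greedy decoding is the temperature-to-zero limit of this):

```python
import math, random

def sample_next(logits, temperature=1.0, top_k=None):
    """Toy sampler: scale logits by 1/temperature, optionally keep only
    the top-k candidates, then draw from the resulting softmax."""
    items = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        items = items[:top_k]          # top-k: drop the unlikely tail
    scaled = [(tok, lg / temperature) for tok, lg in items]
    m = max(lg for _, lg in scaled)    # subtract max for numerical stability
    weights = [math.exp(lg - m) for _, lg in scaled]
    r = random.random() * sum(weights)
    for (tok, _), w in zip(scaled, weights):
        r -= w
        if r <= 0:
            return tok
    return scaled[-1][0]

logits = {"foreach": 3.0, "fortune": 1.8, "format": 0.5}  # made-up numbers
print([sample_next(logits, temperature=0.2) for _ in range(5)])  # almost always 'foreach'
print([sample_next(logits, temperature=1.5) for _ in range(5)])  # noticeably more varied
```

Low temperature sharpens the distribution toward the top token; high temperature flattens it, which is exactly the creative-writing knob.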
Even if you set temperature to 0 (not all models let you), you get a mostly deterministic result, but not 100%. We set temperature to 0 at work and I've seen this in practice. I'm not an ML researcher, so my understanding of this is fuzzy, but what I do know is that it comes from floating-point arithmetic: addition isn't associative, and the order of operations on the underlying hardware can vary between runs, creating subtle shifts that compound over time.
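The floating-point half of that is easy to demonstrate in pure Python (the GPU half, that parallel reductions don't fix a summation order, is my general understanding rather than a claim about any specific hardware):

```python
# Floating-point addition is not associative: grouping changes the result.
print((1e16 - 1e16) + 1.0)  # 1.0
print((1e16 + 1.0) - 1e16)  # 0.0 -- the 1.0 is absorbed by the big number

# A parallel reduction sums thousands of terms in whatever order the
# hardware schedules them, so logits can differ in their last bits from
# run to run. If two candidate tokens are nearly tied, even at
# temperature 0 the argmax can flip, and every token generated after
# that flip comes from a different prefix.
```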