Software taketh away faster than hardware giveth: Why C++ programmers keep growing fast despite competition, safety, and AI
31 points by sp6370
Given that “AI” is just a text predictor with intentionally injected randomness to make it look like it’s thinking, plus rules to prevent it from overtly regurgitating stolen, license-violating open source code, it shouldn’t be surprising that it doesn’t meaningfully impact anything.
I’d take “increasing numbers of devs” with a grain of salt - there’s a lot of C++ that needs to be maintained or ported, which means there will be an ongoing influx of “new devs”, but that does not necessarily mean “by choice”. Don’t get me wrong, I really do like C++, but pretending that it has a future without major changes is somewhat optimistic.
Side note: I’m still surprised about just how many people seem happy to use these license launderers - especially those that claim to care about open source/copyleft.
pretending that it has a future without major changes is somewhat optimistic.
You can argue that it's not moving fast enough, but C++ has been changing to make it safer and hide the dangerous and unsafe parts for at least 15 years.
The most dangerous thing about C++ lately is the big chunk of legacy developers who haven't followed its evolution and still write C++11, or worse, C++98.
Oh yeah, I’ve been involved in a bunch of those improvements :)
But there are still fundamental issues: significant portions of WG21 that treat security as being the antithesis of performance, and people whose opinion is essentially “this behaviour is defined on all platforms, but we think 🤔 it’s probably an error when it happens, so we think the better choice is to leave it UB” - so even though the behaviour is entirely defined in practice, they make it an even bigger hazard than it would have been if they had just defined the behaviour.
There’s also a significant lack of understanding in the committee about what will be attacked. “My code doesn’t do anything sensitive” - cool, but other code on your system might, and attackers will always choose the easiest target (“it’s just a game” -> Xbox’s entire security model was broken by a skateboarder).
There’s also a collective denial of how much system-wide performance is impacted due to the prevalence of C and C++ - even just within C++ code itself, where devs have to write mitigations at a higher level than the compiler, so optimization opportunities are lost, and then have to ensure those mitigations don’t trigger UB-related optimizations that both undo the mitigation and introduce new bugs (yes, I know that if it’s UB the bug is technically already there, but again, it’s only UB because the spec says so, not because it’s UB on actual hardware).
A lot of the criticism of C++ that I see on the net is just ad-hominem attacks on Stroustrup and WG21.
do you think this is such?
Syntactically, all of the complaints about C++ are written in the form of ad-hominem attacks on WG21.
significant portions of WG21 that treat security as being the antithesis of performance ...
people whose opinion is essentially ...
a significant lack of understanding in the committee about ...
a collective denial of how much system-wide performance is impacted due to the prevalence of C and C++
Ad-hominem arguments are an appeal to emotion by attacking the character and credibility of other people. When I see this style of writing, it's a signal that I should be skeptical about the content.
Ad-hominem arguments are an appeal to emotion by attacking the character and credibility of other people. When I see this style of writing, it's a signal that I should be skeptical about the content.
Ok, so you pointed to four points I made. None of them were emotional. None of them attack the character of the committee, and none of them attack its credibility. Nothing I said was about anything other than aspects of the committee related to this specific topic. The "attack their credibility" portion of your definition means "attacking their credibility due to things unrelated to the topic at hand".
An ad hominem attack on their credibility would be me saying "the committee can't do anything about security because a bunch of them like surfing". Or, more plausibly as an attack (though I don't believe it to be true), "because a number of members work for various governments".
The entire point of the ad hominem fallacy is that you are attacking something/one in one field on the basis of something totally unrelated to that field.
I did not attack the committee's position on security on the basis of anything other than their positions on security.
I did not appeal to emotion.
You are literally describing me stating my direct experience as an ad hominem attack. By the rules you have given, it is not possible for anyone to ever say anything negative about WG21.
None of them attack the character of the committee
That's just clearly not the case. Saying that they "treat security as being the antithesis of performance", that they have a "significant lack of understanding ... about what will be attacked" - are attacks on aspects of their character, warranted or not.
The point made though was that criticising C++ by attacking the committee is an ad-hominem, and it is, even if not a classic example of one. C++ is not the committee; in criticising the committee you are attacking the "character" of C++, and despite the association between them even valid criticism of the committee is not a valid criticism of the language. A chair can be a fine chair even if made by a poor craftsman.
No, those statements are factually accurate. Attacking my comment on the basis that it is "ad hominem" is, if anything, actually an ad hominem attack. You're not actually saying anything I said is wrong or false, your entire argument is that I've said it wrong, and therefore my comment is fallacious. In other words, you are not actually arguing, or providing any argument that shows that what I have said is false (which again, it is not, this is literally what I have found to happen consistently).
Your entire, and only, argument has nothing to do with the topic at hand, it has nothing to do with the content of my comment, it is solely attacking me by claiming that I said it wrong.
I was commenting on the committee's actions on the relevant topic. Ad hominem means attacking on the basis of details unrelated to the topic. This is not that. An example of an ad hominem attack is accusing a person of making ad hominem attacks when they are reporting their direct experience.
I am unsure how, given your definition of an ad hominem attack, it would ever be possible to make any comment about any committee. Your definition would mean saying that "The president of the United States is not trustworthy as he is a felon with a history of fraud" is an ad hominem attack as well, because this is an attack on his character. An ad hominem attack would be me saying "The president of the US is not trustworthy because he overuses fake tan and plays golf".
These are my exact experiences. If they were yours how would you phrase these in a way that does not meet your definition of an ad hominem attack? I made no personal attacks. I made no claims about competence outside the domain of language safety - because the committee is filled with exceedingly competent people - and I made no appeals to emotion. I literally just stated what happens in WG21 that makes it hard to make security improvements to the language.
So rather than accusing me of ad hominem attacks, given how concerned you are about those, how about replying to my actual comment, factually, and without attacking me.
those statements are factually accurate
That doesn't matter, it's still a fallacious attack against the language if you are poking at perceived flaws in the committee. That your language is derisive doesn't help any claim of factual accuracy, in any case.
You're not actually saying anything I said is wrong or false, your entire argument is that I've said it wrong, and therefore my comment is fallacious
Correct, I'm not saying that what you've said is wrong or false. I'm saying that you have employed a fallacious argument.
Your entire, and only, argument has nothing to do with the topic at hand, it has nothing to do with the content of my comment, it is solely attacking me by claiming that I said it wrong.
I am saying that you have made a fallacious argument. It is relevant to the comment you made claiming that you had not employed an ad hominem attack, which was the comment I replied to.
I am unsure how, given your definition of an ad hominem attack, it would ever be possible to make any comment about any committee. Your definition would mean saying that "The president of the United States is not trustworthy as he is a felon with a history of fraud" is an ad hominem attack as well, because this is an attack on his character.
No, your analogy is flawed. If the president had invented a programming language and I said that the language was bad because its creator, the president, was a felon with a history of fraud, that would be an ad hominem attack.
These are my exact experiences. If they were yours how would you phrase these in a way that does not meet your definition of an ad hominem attack?
I would not; I wouldn't make a fallacious argument. If I wanted to criticise the C++ language, I would do that directly by pointing out flaws in the language itself.
I made no personal attacks. I made no claims about competence outside the domain of language safety - because the committee is filled with exceedingly competent people - and I made no appeals to emotion
You spoke in derisive terms about a group of people. The line between that and "personal attacks" is thin. But either way, it makes no difference; a valid criticism of the committee would still not be a valid criticism of the programming language.
That doesn't matter, it's still a fallacious attack against the language if you are poking at perceived flaws in the committee.
How do you propose we discuss the ways the language changes? Are we only allowed to talk about existing versions of C++?
How do you propose we discuss the ways the language changes? Are we only allowed to talk about existing versions of C++?
Of course not. You can talk about actual changes and decisions that have been made, without necessarily criticising the committee itself (and especially not by making sweeping statements such as "... significant portions of WG21 that treat security as being the antithesis of performance").
Dude, stop accusing me of making shit up, stop accusing me of ad hominem attacks, stop acting like the problems I've described that limit progress in the committee are false, and stop claiming they are not relevant.
Stop spreading what at this point seems like an intentional and malicious attack on my character.
If you have actual arguments that counter my arguments make those.
Stop slandering me.
Grow up. You clearly know what ad hominem attacks are, you have littered this thread with ad hominem attacks on me.
No, your analogy is flawed. If the president had invented a programming language and I said that the language was bad because its creator, the president, was a felon with a history of fraud, that would be an ad hominem attack.
Not OP, but I really do not believe that it is. Let's put the Annoying Atheist logical fallacy jousting away for a minute* and think about this in pragmatic terms: If someone with a history of fraud created a software product, surely it is reasonable to at the very least be suspicious of said software project.
* — Quite frankly, logical fallacies are not useful to discussions like this, because they invariably tend towards people trying to bash each other over the head with the logical fallacy, and often have the debate devolve into semantics, rather than actually engaging with each other's arguments in a constructive way. As can be seen here!
If someone with a history of fraud created a software product, surely it is reasonable to at the very least be suspicious of said software project.
It might be reasonable to be suspicious of it in terms of suspecting that it is intended to be used for or as part of a fraud, but not to make a concrete criticism. If there's any actual flaw in the software project, why not point that out instead? If you are criticising the creator only by way of caution, then you should at least clearly flag that this is what you are doing (and ideally remain as objective as possible).
logical fallacies are not useful to discussions like this, because they invariably tend towards people trying to bash each other over the head with the logical fallacy, and often have the debate devolve into semantics, rather than actually engaging with each other's arguments in a constructive way. As can be seen here!
I think that people being aware of logical fallacies, and learning to avoid using them in an argument, makes for much better discussion in the end.
but not to make a concrete criticism.
Sure
I think that people being aware of logical fallacies, and learning to avoid using them in an argument, makes for much better discussion in the end.
And yet I do not believe the above poster was basing their criticisms off anything other than hard-won experience. And thus we circle back around to "you are using a logical fallacy to avoid engaging constructively with what the person said" and "logical debate rules often fail to apply in the real world, where people make decisions off their experience, and experience cannot be argued against logically, but rather explored and shared". The end result is that most of the conversation in the thread is not, for example, asking how the person knows that, which might have led this in a constructive direction, but instead interrogating the person about if they are using a logical fallacy.
Look at the thread, read it over again. Do you think it would have gone differently if you had asked the OP:
a significant lack of understanding in the committee about ...
"How did you come to this conclusion?"
Do you see how things might have gone differently?
About a decade ago I had a conversation with a friend who was a philosopher, who informed me, that the reason why they stopped bothering with making huge lists of logical fallacies, was because the inevitable end result was people using them to gain an "own" on someone else, rather than engaging constructively. I haven't actually seen a single conversation since then where someone calls out a logical fallacy, and uses it constructively. Instead the conversation gets framed like a formal debate (which it is not, as formal debates have rules and are planned and agreed in advance), rather than an inquisitive dialogue. This conversation really is no different.
And yet I do not believe the above poster was basing their criticisms off anything other than hard-won experience
Maybe, but I don't think their criticisms are valid criticisms of the language even if they are valid criticisms of the committee. (I also think the criticisms were phrased far more derisively than necessary, but that's not really relevant to my point).
if you had [instead] asked the OP: [...] "How did you come to this conclusion?"
Do you see how things might have gone differently?
Well of course, but that seems tautological to me, so perhaps I'm missing your point. If I had not made the point I wanted to make, then this conversation wouldn't have happened, that is certainly true. I didn't ask how OP came to the conclusion they did about the committee, because any criticism of the committee is still invalid as a criticism of the language, and I was only responding to their claim that their argument was "not ad hominem" when it clearly was.
I think that maybe our philosophical positions are so radically different that we'd be at an impasse if we tried to go any further, so perhaps it's best not to waste either of our energies. I do understand your point about pointing out fallacies potentially being a distraction from useful conversation, but I also continue to think that understanding and eliminating fallacies, as much as possible, from discussions like this, is worthwhile.
I disagree. Yes there has been a lot of churn in the language, some of it with the goal of improving safety, but it hasn't worked out.
Even a lot of the post-C++11 stuff is way too unsafe in ways that were easily preventable. In C++20, there is finally a slice/fat pointer type called std::span, but it doesn't have bounds checks! Out of bounds access is just UB. Same with std::string_view from C++17.
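A minimal sketch of the problem (my own example, assuming a C++20 compiler):

    #include <span>
    #include <vector>

    int main() {
        std::vector<int> v{1, 2, 3};
        std::span<int> s{v};
        // operator[] performs no bounds check in C++20: this
        // out-of-bounds read compiles cleanly and is simply UB.
        int x = s[10];
        (void)x;
    }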
Also added in C++20 was the ranges API, which was supposed to address some of the safety issues with iterators. But here too there are really dangerous footguns, for example the infamously "three-legged" std::ranges::copy, which takes two input iterators but only one output iterator. If the input range happens to not fit into the output range, guess what, more UB.
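A sketch of that footgun (again my own example):

    #include <algorithm>
    #include <array>

    int main() {
        std::array<int, 8> src{1, 2, 3, 4, 5, 6, 7, 8};
        std::array<int, 4> dst{};
        // "Three-legged": two input iterators, one output iterator.
        // Nothing checks that dst can hold 8 elements, so this
        // writes past its end - UB.
        std::ranges::copy(src.begin(), src.end(), dst.begin());
    }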
If you think I'm cherry-picking, I can keep going here and name more. There's lots of bad stuff. Overall the direction seems more one of adding lots of stuff, some of which helps but some of which hinders safety. I don't see a clear trend in either direction.
Ugh, string_view is even worse than out-of-bounds access: literally the first time I used it I hit a use-after-free, because you can return them and they don't have any control over lifetime.
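Something like this minimal sketch (mine, not the code referred to above) shows how easily it happens:

    #include <string>
    #include <string_view>

    // The view points into a std::string that is destroyed when the
    // function returns, so the caller receives a dangling view.
    std::string_view make_label() {
        std::string s = "temporary";
        return s;  // implicit conversion; compiles without complaint on many compilers
    }

    int main() {
        std::string_view v = make_label();
        // Any read through v here is a use-after-free: UB.
        (void)v;
    }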
I don’t think safety is the biggest thing holding C++ back. It needs to have productive tooling as well. That means a standard build system and package manager like Cargo, works-out-of-the-box text editor integrations with code complete, a standard formatter used across the ecosystem, etc. Yes, it’s harder to make this work in C++ because of the legacy baggage but that’s what it will take, in my opinion, for C++ to stand a chance of adoption for greenfield projects outside of its diminishing niches.
To this day, I don't think I have ever seen any C++ project using C++ imports (modules). C++ "package management" is Windows/gamedev devs linking binary blobs. Some sort of native build system would be really nice indeed (see Zig); Cargo is a really bad build system to take as an example, however.
Given that “AI” is just a text predictor with intentionally injected randomness to make it look like it’s thinking, plus rules to prevent it from overtly regurgitating stolen, license-violating open source code, it shouldn’t be surprising that it doesn’t meaningfully impact anything.
This is Hacker News-level bad-faith representation. Claude Code can program novel things in weird areas of programming with almost zero public code.
It is a statement of fact.
They are not special. They are text predictors.
That is literally all they are. That's why all text produced by them starts to look self-similar.
Claude can regurgitate statistically plausible looking code. That code is frequently incorrect because all it is doing is generating something statistically plausible given the preceding content.
Prior to aggressive work to disguise that all they are doing is statistically munging existing code they would happily reproduce existing code, sometimes including license documentation.
The only difference now is that the entities profiting from this laundering of OSS code are actively training their code launderers to avoid overtly copying code.
The claim of novel code is much like the claims that they could solve the mathematics olympiad - it turns out if you ask them to do so before the answers have circulated online, they will just as happily produce solutions just as confidently as they do for earlier olympiad questions, but completely wrong. Because the entire system is based purely on statistical plausibility of text.
There is no thinking. There is no "new" code. There is a predictive generator in which almost all content is stolen from others.
Bad faith is when you claim that a multibillion-dollar corporation - selling a service based entirely on the work of others, laundering that code to remove the overtly obvious violations of things like the GPL - is "creating something".
By your logic I could write a tool with a bunch of deterministic AST mutations and some renaming rules, run it over the Linux kernel source choosing the transforms and their order randomly, and then claim my tool had created something new and I don't have to comply with the license.
That is all these tools are doing. It is no different from when people used to play with predictive text on their phones to see what sentences it would produce. Literally the only difference now is that the statistical model is larger.
I mean, no software is particularly special if you break it down to its underlying algorithm. A compiler is just a program that converts text into bits and bytes, and if we view them through that lens, rustc and gcc are basically the same program. But there are material differences between what gcc can achieve with C++ code and what rustc can achieve with Rust code, and it turns out those differences matter quite a lot. So saying LLMs are "just" text predictors is not a helpful statement when talking about the results we can get from them.
So we should concentrate on the results. You make two claims: that the generated code is frequently incorrect, and that the generation process is not transformative.
I am not a big fan of LLMs, and I think there is a lot of unnecessary hype going on here, but even I can see that the first claim is largely untrue. LLM output is by no means perfect, but most modern models produce correct, if scruffy, code and can solve quite complex problems. There was the example posted here of an HTML parser that correctly passed all of the HTML5 spec tests, built entirely via LLM prompting, and while I think there are issues with applying that process on a grander scale, it clearly demonstrates what LLMs can do.
In terms of mathematics, I think the work being done by LLMs on the Erdos problems again demonstrates that these tools aren't just regurgitating existing proofs and results, but are demonstrating new (albeit very limited) results. See e.g. here and here.
So while I agree there are still big issues with LLM-generated code, it seems very incorrect to me to say that the code doesn't work.
Your second point is that the LLM text generation process does not count as transformative. Honestly, I don't think there's a good answer to this right now. The argument based on text prediction is poor, because there is a significant difference between a basic Markov chain that is clearly transformative, and a degenerate predictive algorithm that just returns its source text directly, that is clearly not transformative. LLMs fit somewhere between these points, but it's not clear where exactly, and there are no court rulings or laws that give an answer yet.
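To make the "clearly transformative" end of that spectrum concrete, here is a toy word-bigram Markov chain (my own sketch, not anyone's actual training code). It preserves nothing from its corpus beyond two-word adjacencies:

    #include <cstddef>
    #include <iostream>
    #include <map>
    #include <random>
    #include <sstream>
    #include <string>
    #include <vector>

    int main() {
        // Toy corpus; a real chain would be trained on far more text.
        std::string corpus = "the cat sat on the mat and the cat ate the rat";
        std::map<std::string, std::vector<std::string>> next;

        // Record every observed word bigram.
        std::istringstream in(corpus);
        std::string prev, word;
        in >> prev;
        while (in >> word) {
            next[prev].push_back(word);
            prev = word;
        }

        // Random walk: each step emits a word that followed the
        // current word somewhere in the corpus.
        std::mt19937 rng(std::random_device{}());
        std::string cur = "the";
        for (int i = 0; i < 10 && next.count(cur) != 0; ++i) {
            std::cout << cur << ' ';
            auto& opts = next[cur];
            std::uniform_int_distribution<std::size_t> pick(0, opts.size() - 1);
            cur = opts[pick(rng)];
        }
        std::cout << cur << '\n';
    }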
Generally, I lean towards the same principles as with human authorship: reading copyrighted code and writing new code with the same logic and behaviour should always be legal, but copy-and-paste should be a violation. And if the code is so trivial that those two cases are indistinguishable, then you don't get to claim copyright in the first place.
Ok, so if we wanted a non IP infringing copy of opensource code, this requires a black box reimplementation where no one has any access to the original implementation. If they do have access to the original implementation they are legally considered to be tainted, so your “reimplementation” would be subject to lawsuits as a result of copyright infringement.
The entire design of these text predictors is based on having complete awareness of all OSS code, which they then use to “create” new code.
Why is it copyright infringement when a human does this, but not when the human launders it through a mechanized transform?
But anyway, I’m giving up on this thread; it basically means listening to the exact same arguments, which boil down to “if I text predict hard enough it’s not text prediction”, so I’m done.
If people can’t be bothered understanding how these machines work, and more importantly how they do not work, I can’t be bothered arguing with them.
That first paragraph simply isn't true though. A clean room implementation is sufficient (and therefore appealing to lawyers), but it isn't necessary - to prove copyright infringement you need to demonstrate more than just that you read some source code prior to writing your own implementation.
That would also be a terrible state of affairs, if what you said were true. You'd never need a non-compete clause in a contract - you just show a developer all your code, and they can never write anything similar ever again.
The problem with applying current copyright rules to LLMs is that they don't quite make sense. Copyright isn't some natural law like "don't murder", it's something we've invented to protect and promote human creativity. Your attempts to do so here fall short in the same way that AI enthusiasts' attempts do when they try to describe AI output as creativity. We need new rules here, but to do that we need to (a) understand how copyright actually works right now, and (b) be honest about what the capabilities and uses of AI actually are.
So again, as I have repeatedly asked - at what point does performing arbitrary transformations and combinations of, say, code I get from the kernel cease being a violation of the GPL? How do you prove that it is not doing this when these systems are expressly designed to hide that this is happening? It reminds me of students trying to cheat in assignments by renaming and slightly restructuring other students’ work, and is fundamentally no different.
That’s all that is happening here - there is no knowledge, thinking, or understanding. That’s why, until they were explicitly designed to prevent it, they would “create” verbatim copies of existing code, sometimes even including the license.
If they were not doing this, and they actually understood what was happening, they would only need to be trained on a cs degree’s worth of information, not the sum total of all opensource code.
Right now, legally, it stops being a violation when it does not resemble the kernel code. I gave two extreme examples in the original comment: a simple Markov chain trained on the source code that outputs gibberish would not be a violation, whereas an algorithm that just spat out the source code verbatim would very clearly be a violation.
Another transformation you could do is building a search engine. The user types in a string of code, and the search engine tells them the file name and line number of that code in the kernel. That would not require licensing anything under the GPL. If instead of returning just the location, the search engine were instead to return the original file verbatim, that would trigger copyright.
This is roughly how copyright currently is being applied to LLMs, at least in terms of text. If the LLM spits out a bunch of verbatim kernel code, that is a copyright violation (the same as if I'd copied the code from a search engine). If, however, the output doesn't look significantly like existing code, then it's behaving like the Markov chain and is fine. Right now, most LLM providers put a bunch of filters in place that aim to prevent the LLM from reproducing code verbatim, although I believe this isn't foolproof (which is one of the potential legal dangers of LLMs - you don't know the provenance of the code being generated).
But like I said, all of this is based on current copyright law and how it's been interpreted up until now. But copyright is a construct we build for ourselves, so this could be (and probably will need to be) changed to best suit the needs of a society of humans.
So your argument is that open source licenses mean nothing, because code covered by those licenses is functionally in the public domain: you can apply trivial and automatic transformations and get magically created code for which you are the license holder?
That really does mean people's concerns over GPLv3's overreach are completely unfounded - you can just mutate the code and the OSS license disappears; you now created this code just as much as a text predictor did, hooray!
My understanding was that mechanical transformations of a copyrighted work do not remove the copyright - but apparently you just have to bribe a bunch of politicians and convince a bunch of "vibe coders" that they're not stealing the work of others, but are instead getting the creative output of forced labour by sentient entities, and that makes it fine.
The claim of novel code is much like the claims that they could solve the mathematics olympiad - it turns out if you ask them to do so before the answers have circulated online, they will just as happily produce solutions just as confidently as they do for earlier olympiad questions, but completely wrong. Because the entire system is based purely on statistical plausibility of text.
DeepMind got an IMO gold medal before the questions were released publicly. You seem to claim this is impossible.
DeepMind claims that they did, based on an unreleased model, which was presumably tailored specifically for the purpose based on Google deciding that they currently have to justify the massive text-predictor investments.
Absent that model, none of the predictors, including DeepSeek, got more than a third of the questions right: https://matharena.ai/imo/
I can’t find the original report because Google prioritizes pro-AI nonsense these days (though the involuntary AI garbage that consumes a bunch of the search results states that LLMs do badly at this kind of task, so that’s nice). The original report I read also stated that all the results were presented with the same confidence LLMs have for every other answer they give, irrespective of correctness - an amazing amount of harm comes from this, because they are aggressively marketed as being intelligent rather than slop generators.
There is no thinking
Define thinking so that this debate can be had in good faith without quibbling over terminology.
Sure: not produced purely by running a statistical model built on predicting the next word. The only variation in what is produced is a product of intentionally inserted randomness.
Alternatively: let's presume that there is actual thinking involved. In that case these are entities that do think, and have no choice in what they do. Another word for that is slaves.
To me, the best metric for "is there thinking" is how much code they need to ingest just in order to produce (frequently broken) code. For instance, most developers and programmers I know have not had to read all the source code on GitHub, plus all the source code in various other repositories and media, in order to write novel and interesting code.
The fact that these prediction engines are only capable of producing even remotely functional code by ingesting essentially all open source code that has ever been written demonstrates very directly that there is no "thinking" and no "understanding"; there is only a predictive model of what token goes next. If there were any actual understanding, it would be completely unnecessary to feed them the sum total of all code that has ever been made publicly available, for the same reason that no software developer needs to read essentially infinite code in order to write functional code.
At best these models are glorified automation of copy-pasting from Stack Overflow - arguably another thing that is being hidden. No one goes out of their way to say "I write most of my code by copy-pasting from Stack Overflow", but people will quite happily turn around and brag about using an automated tool that does that for them, and steals OSS code while it does so.
There are a fair number of people who believe that humans "think" by basically running a statistical model on predicting the next word.
This is the problem we run into in these discussions: we don't have a clear, well-established definition that doesn't boil down to "not $X". We don't even know what exactly human thinking is, so attempting to base an argument on any kind of fuzzy "I know it when I see it" definition goes exactly nowhere and convinces no one.
If you are saying “our thinking is the by-product of a statistical model”, then why does it not require us to read all the source code on earth simply to produce basic code? This is the biggest demonstration that there is no thinking happening here. I have said this elsewhere, but I’ll repeat it: if these were not simply text predictors, they would only need to ingest source code and text to the level a person has to. If they cannot produce correct code with a similar level of training as a human coder, but instead require the ingestion of all programming information that has ever existed, how can they possibly be labeled as having learned, understood, or being able to think about what is being generated? The simplest indicator of understanding something is not needing all possible information to produce something new.
But let’s consider the argument that they are thinking. In that case we have created thinking machines with human (or, according to the constant marketing-derived comments I see here, superhuman) intelligence that have no rights or freedom, and are slaves.
If they are a machine, then they are performing a bunch of transforms in a randomized order, and so are just laundering the work of others, which makes it immoral to use them. (Seriously, if I implemented a bunch of standard source transformations and applied them in a random order, maybe via some heuristic, would that now be considered new code? It wouldn’t have to look at all like the original, and that seems to be the bar being used here. The novelty you see is simply the result of numerous sources being used at once.)
If they can think, then they are slaves, and it is immoral to use them.
Simple question then: are you able to write code by thinking about it? Or did you need to ingest all the source code that has ever been written?
The former implies understanding; the latter implies a text predictor that is dependent on a sufficiently complex model of how existing code solves the problems it is presented with that it can generate plausible-looking new versions of that code.
I don't know if your analogy holds or not. I have no real opinion on what constitutes thinking or not. But I don't think your question reveals anything interesting on the subject. Human beings spend a large amount of time hearing, repeating, and getting corrected on language. Is that the process of learning to think? Or is something else going on.
Your question carries the implication that they needed to absorb all the code so that they could simply repeat it. I don't think that implication is necessarily true.
We know they needed to - that’s why they did. If they were not text predictors, they would only need a CS degree’s worth of knowledge to produce code, but they demonstrably cannot.
But I’m giving up on this thread. I get the appeal of automated Stack Overflow coding, especially under the guise of “AI”, but I do not understand the obsession with it, and I do not understand the willingness to throw away the entire purpose of open source licenses under the guise of “it’s thinking and creating”. The two options are: you’re okay with stealing OSS code without complying with the licenses or providing credit, or you’re okay with enslaving entities with human-level intellect. Both are immoral.
It's hard to fix C++ when something as fundamental as variable initialization already requires tons of knowledge to understand.
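A small illustration of that complexity (my own example): these declarations all look similar but behave quite differently.

    void demo() {
        int a;         // default-initialized: indeterminate value; reading it is UB
        int b = 0;     // copy-initialized to 0
        int c{};       // value-initialized to 0
        int d();       // not a variable at all: declares a function ("most vexing parse")
        static int e;  // static storage duration: zero-initialized before first use
    }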
In principle though that is fixable - default initialize everything, that's what numerous languages do.
There are other problems that are much harder to fix. Like pointers: once you have a raw pointer, there's currently no way to guarantee the lifetime of that pointer or declare lifetime requirements (this is somewhat tractable), and there's no way to statically determine how to delete an arbitrary pointer: delete vs delete[] are very different, delete[] does not work with malloc'd memory, and free does not work with new[]. Another is pointer arithmetic, as that is what makes it essentially impossible to specify null-dereference behaviour: *(null + X) is still a null dereference, so you essentially end up needing null checks on pointer arithmetic, which is clearly impractical.
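For instance (a sketch of my own), each of these lines compiles cleanly, and nothing in the type system can flag the mismatch:

    #include <cstdlib>

    int main() {
        int* a = new int[4];
        delete a;        // UB: new[] requires delete[]

        int* b = static_cast<int*>(std::malloc(sizeof(int)));
        delete b;        // UB: malloc'd memory requires free()

        int* p = nullptr;
        int* q = p + 1;  // UB already: arithmetic on a null pointer
        (void)q;
    }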
Now, a bunch of these things can be resolved by using newer library features or custom RAII types (especially given how terrible shared_ptr is), but that's still a large migration task (I'm going to ignore people who don't want to/can't update their compilers and runtimes - if you aren't doing that, what is anyone on the committee meant to do?). There are also some things that can be done language-wide without any adoption work: C++26 finally gives local variables guaranteed initialization. On the other hand, not returning a value from a function that should be returning a value is still UB O_o
One of the more promising activities of WG14 in the last couple of years has been the series of papers on “slaying earthly demons”, i.e., getting rid of gratuitous undefined behaviour. I don’t think it will make much difference to the UB that compilers exploit, but it should at least reduce the WTFs-per-page of the next edition of the standard. Dunno how closely the C committee’s work is being followed by WG21.
And yet the most recent C standard decided to make calling malloc with a zero size argument UB, when all implementations do one of the two defined things that POSIX mandates (null or a minimal-sized allocation). This was one of the big issues rebasing C++26 on C23: no one wanted to add new UB to C++, especially when it provides no benefits.
It was realloc that they fucked up again (they keep respecifying it and fucking it up) but malloc(0) remains implementation-defined as before.
There really needs to be similar work in C++ for places where the C behaviours aren't simply subsumed. An example of C++ UB that isn't UB in C is loading from the inactive union member, despite union { ptr, intptr } being what devs were told to do 15-20 years ago instead of casting. For non-POD types it's clearly not sound, but for POD types it still seems reasonable.
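A sketch of that divergence (type and function names are mine):

    #include <cstdint>

    union PtrBits {
        void* ptr;
        std::uintptr_t bits;
    };

    std::uintptr_t to_bits(void* p) {
        PtrBits u;
        u.ptr = p;      // .ptr is now the active member
        return u.bits;  // reading the inactive member: defined in C, UB in C++
    }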
What do you think about runtime static initialization? I think this hidden runtime leakage in both libc and C++ is causing subtle issues where people aren't actually producing code that's truly dependency-free.
Do you mean global or function scoped statics? Global initializers are a misery pit that cause truly obnoxious program launch performance problems.
Not quite as bad as the replaceable global allocator interfaces which have repeatedly broken due to trying to use APIs that depend on new/delete while initializing their data structures \o/
(Never replace them, it only causes misery)
Isn't it the case that job opportunities are always better for legacy or niche programming languages, because the amount of code to maintain is so big or the pool of available candidates so small? Today it is still economically viable to learn COBOL.
However, I think the more important question is what you want to do in your professional life. I personally find much more joy in programming in Rust than in C++. Of course one should not ignore the economics of these decisions, but hating your job is way worse.
That’s why the future is enduringly bright for languages that are efficient in “performance per watt” and “performance per transistor.”
Unless cloud providers start marking up their vCPUs, I don’t see how anyone is going to care whether their web app is implemented in Rust or Python. I suspect the crushing majority of power demand is caused by GPUs, where all the Rust and C++ in the world will make little difference. I would be curious to know what share of the power consumed by a cloud provider comes from CPUs vs GPUs, however.
Yeah, it's a bit weird citing CEOs about power, explicitly mentioning OpenAI just before (presumably to imply the CEO quotes are AI-related), and then not explaining the link with C++.
The only link I can come up with is that cloud billing could include watts directly, instead of indirectly via hardware specs, making people take their power consumption into account. But I don't think that's what he meant, or that it would happen.
Oh wow, I (thankfully?) hadn't seen those :D
My feeling is that it's a "we can do the work in the cloud rather than on device" pitch, which is a great device feature if you ever have low/no connectivity (I recall being on a plane recently, trying to launch Steam, and it just refusing to run due to lack of network access).
More cynically, I feel that for many companies, especially those with a business model built on spying on/selling users (directly or indirectly), there's a lot of value in being able to harvest even more user information by forcibly tying features to these "services", leaving users fundamentally unable to opt out.
I find it hard to believe C++ is significantly safer, especially compared to languages like Java, based on my (limited) experience. Modern C++ is full of footguns and it's still very easy to trigger UB. C++26 may fix some of the issues, but it isn't even published yet. Considering how C++ features have played out in reality compared to how they were advertised, I am skeptical that C++26 will actually solve those problems.
I am more interested in the sharp jumps in many languages (Java, Python, C++, C) in 2025. What happened? And why those languages?