LLVM: The bad parts

69 points by asb


olliej

This pretty much matches my model of the problems with the project. The PR vs reviewers mismatch has always been fairly tough, but now there's an increasingly large number of slop generated PRs being filed which have both gargantuan PR descriptions for trivial changes (because again slop generation), are grossly over commented (because slop), and then if you do ask for any changes the "author" cannot actually make them because they did not write, and do not understand, the PR they are presenting as their own.

Now if we see those PRs - generally the oversized PR description at least is such a clear indicator of slop that this isn't a problem - should we even bother reviewing them? Maybe they do fix something, but if we see any issues during the review then we know they're not going to be fixed and the author will simply go off and make more slop PRs. The first one I came across made me actually look up the account, and while being years old (e.g. not a new spam account), they had a fairly average GH user level of commits over time, and then in the last month had like 300 PRs filed against 10 or 20 projects, none of which they had ever contributed to before.

So now we have PRs that may well be attempting to fix a real bug, but we know:

In a similar vein, as with other OSS projects there's a constant trickle of made up security vulnerabilities from these garbage tools.

It's still amazing to me that people think these are good/trustworthy slop generators are given just how well it is documented that they are neither.

At the much more trivial end of the spectrum the style rules have issues.

First off PascalCase for local variables is horrifying (while this is true, it's not actually an actual problem :D). More seriously, there have been a number of naming rule changes over the years, but because existing code was not simply automatically updated (that kind of thing always makes blame and similar more annoying) there's a near random mix of new and old style code, and people tend to write code the matches the surrounding naming conventions rather than the new one so there's no forward progress on unifying it (and even reviewers have difficulty tracking what the capitalization rules are meant to be). This lack of consistency makes it confusing for new contributors as even the style bots don't enforce capitalization rules, so they write new code and the documented style does not match the code they're working in, and if they try to work out what the style should be by looking at prior recent commits even those are not consistent (even if the PRs had review comments about other style issues).

Similarly there's a hard 80 column cap, but the type names are long, and reasonable variable names are not trivially short, and so even trivial expressions end up having to be split across multiple lines, making the code harder to read. To try and deal with this problem, variables are frequently not given remotely useful names, for example the most common naming scheme for local variables is the acronym of the name of the variables type, e.g

TypeSourceInfo *TSI = ...;
RecordDecl *RD = ...;
...
auto *CXXRD = dyn_cast<CXXRecordDecl>(RD);

Sometimes that's fine due to small methods, but it's also used in large/complex functions simply because while the name is useless, the code is more readable due to the reduction if line breaks. The argument people often then use is to break "large" expressions into lots of different locals, but the combination of method names that are good, and the clear reason to use separate locals so those can also be named well means that even trivial expressions end up requiring line breaks, e.g. things like https://github.com/llvm/llvm-project/blob/c726fff61cb3bd5a3a98c52bb63b99fcb3b25524/clang/lib/Sema/SemaExprCXX.cpp#L1736

It is of course important to note that some of these issues are the result of C++ syntax, and many other cases of gratuitous line breaks in the existing code are simply due to the failure to use locals to hold complex subexpressions, and of course sometimes the reason that a local is not used is to scope lifetimes etc - e.g lifetime extension for temporaries only works for, well, temporaries - so storing what was previously a depending on lifetime extension in a local variable might become a UaF.