coreutils: a comprehensive review (2023)

30 points by apropos

z3bra

I’d love to know if there’s a compelling use case for install.

I use it in the "install" target of all my makefiles. install(1) is basically mkdir, cp, chown, chgrp and chmod wrapped in a single command. It's very verbose to type at a shell prompt, but very handy in scripts. The following snippets are equivalent:

mkdir -p /usr/local/sbin
cp pgm.sh /usr/local/sbin/pgm
chmod 755 /usr/local/sbin/pgm
chgrp daemon /usr/local/sbin/pgm

install -D -m0755 -g daemon pgm.sh /usr/local/sbin/pgm
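
A minimal sketch of the single-command form, assuming GNU install and GNU stat. The staging directory is a hypothetical stand-in for /usr/local, and -g daemon is left out because changing the group usually needs privileges:

```shell
# Stage into a throwaway prefix so no root access is needed.
prefix=$(mktemp -d)            # hypothetical stand-in for /usr/local
printf '#!/bin/sh\necho hello\n' > "$prefix/pgm.sh"

# -D creates the missing parent directory (GNU install); -m sets the mode.
install -D -m 0755 "$prefix/pgm.sh" "$prefix/sbin/pgm"

# Show the resulting mode and path (stat -c is GNU-specific).
stat -c '%a %n' "$prefix/sbin/pgm"
```

The copy, the directory creation and the chmod all happen in the one install call.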

intelfx

Also install /dev/stdin … <<EOF is a thing, which is similarly hugely useful to write a blob of text into a file while simultaneously doing all of the above.
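
A sketch of that here-document idiom, assuming GNU install; the target path and the file content are made up for illustration:

```shell
dest=$(mktemp -d)/etc/pgm.conf     # hypothetical target path

# install reads the here-document through /dev/stdin, creates the parent
# directory (-D, GNU install) and sets the mode, all in one step.
install -D -m 0644 /dev/stdin "$dest" <<EOF
# pgm configuration (example content)
verbose = yes
EOF

cat "$dest"
```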

icefox

Actually it's interesting to think about it that way, because what it's really trying to be is a way of specifying a file's state all at once, which Unix really doesn't have a clean way of doing...

fanf

Some versions of install(1) copy to a temporary file and atomically rename it into place. Recent-ish FreeBSD and OpenBSD do so by default; NetBSD needs the -r option.
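
To make "copy to a temporary file and atomically rename" concrete, here is a rough shell sketch of the pattern. atomic_install is a hypothetical helper, not how any particular install(1) is implemented:

```shell
# atomic_install SRC DEST [MODE]: hypothetical helper illustrating the pattern.
atomic_install() {
    src=$1 dest=$2 mode=${3:-0644}
    # The temporary file must live in the destination directory so that
    # the final mv is a rename(2) within one filesystem.
    tmp=$(mktemp "$(dirname "$dest")/.tmp.XXXXXX") || return 1
    if cp "$src" "$tmp" && chmod "$mode" "$tmp" && mv -f "$tmp" "$dest"; then
        return 0
    fi
    rm -f "$tmp"
    return 1
}

dir=$(mktemp -d)
echo old > "$dir/target"
echo new > "$dir/source"
atomic_install "$dir/source" "$dir/target" 0755
cat "$dir/target"
```

A reader opening the target during the copy sees either the complete old file or the complete new one, never a half-written file.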

pronoiac

Some random notes:

I tend to reach for pv when working with disk images, so I get a progress meter and an ETA.
ddrescue offers some niceties over dd, such as being able to repeat attempts at reading, which is useful if a drive is flaking out, and a less nervous-making command line interface: ddrescue infile outfile logfile
df also accepts a directory name. In the example, it would look like:

dave@europa~$ df .
Filesystem     1K-blocks      Used Available Use% Mounted on
/dev/sda3      894228804 113366324 735368584  14% /

du - the -d, for "depth", option, is helpful
nproc: I've tripped over the (smaller) "number of physical cores" vs (larger) "number of processors (or threads)" before. This reports the latter.
seq - the -w / --equal-width option can be useful, adding leading zeroes to line items up, e.g.

$ seq -w 9 10
09
10

sort - the -h option is great for sorting the human-readable output of du or df, like du -d 1 -h | sort -rh
stty - stty sane is nice if you've emitted binary to your terminal and it's acting concussed, not showing things, or using another language entirely
touch - I've used the -r option to set the times to that of a reference file
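
Two of the notes above, sort -h and touch -r, sketched on made-up data. This assumes a sort that supports -h (GNU and most BSDs); the sizes, names and timestamp are arbitrary:

```shell
# sort -h orders human-readable sizes by magnitude; -r puts the largest first.
sorted=$(printf '%s\n' '12K logs' '1.5G videos' '340M photos' | LC_ALL=C sort -rh)
printf '%s\n' "$sorted"

# touch -r copies the timestamps of a reference file onto another file.
work=$(mktemp -d)
touch -t 202301021530 "$work/reference"   # arbitrary fixed timestamp
touch "$work/copy"
touch -r "$work/reference" "$work/copy"
ls -l "$work"
```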

I... I'm kinda surprised I had so many feelings about these?

BD103

The section on dd is really interesting! TIL you can use cp to burn disk images, rather than dd. The article provided this interesting example:

I’m still writing this article and since then, I’ve written dozens of OS install images to USB using cp. It works great. I won’t claim any speed increases since I haven’t benchmarked the two methods, but the syntax is so much easier:
# cp downloads/os.3.9.iso /dev/sdb

z3bra

There's still one use of dd I cannot replace, though: extracting a specific number of bytes from a file. For some reason cut -b does not always work for me (because of line buffering and similar issues). For example, I now generate passwords like so: </dev/urandom tr -cd '!-z'|dd bs=20 count=1, because cut doesn't, well, cut it.
- ossguy
  
  You can use tail -c and head -c instead of dd for that.
  - z3bra
    
    Correct me if I'm wrong, but I doubt running tail -c against an infinite stream like /dev/urandom would be a good idea. head -c might work, I'll try. I'm not sure why it would work where cut -b wouldn't though?
    
    ossguy
    
    Sorry, I was replying directly to your initial sentence (about "a specific amount of bytes from a file"). I'm not sure what issues you were having with cut -b, but I've found head -c to be a lot more performant than dd since it picks the right block size automatically.
    
    Anyway, for a few bytes it doesn't matter, was just thinking about the general case.
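
A sketch of the head -c / tail -c approach discussed in this subthread. gen_password is a hypothetical helper that wraps the password example, and the sample file is made up:

```shell
# gen_password [LENGTH]: hypothetical helper built from the commands in the thread.
gen_password() {
    # tr filters the endless stream; head -c stops after LENGTH bytes,
    # at which point tr receives SIGPIPE and exits.
    LC_ALL=C tr -dc '!-z' </dev/urandom | head -c "${1:-20}"
}

# On a regular file, head -c and tail -c take bytes from either end.
sample=$(mktemp)
printf 'abcdefghij' > "$sample"
first=$(head -c 3 "$sample")
last=$(tail -c 3 "$sample")
printf '%s %s\n' "$first" "$last"
```

Calling gen_password 20 prints 20 characters. As z3bra suspected, tail -c only makes sense on finite input, since it has to reach the end of the input before it can print anything.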
  - dzwdz
    
    I think one major benefit of dd is oflag=direct (assuming you also use status=progress - which you'd presumably use pv for if you weren't using dd). Without it, the progress indicator is wrong, as it doesn't know how much of the image was actually written to disk as opposed to just being written to the cache. The copy might "finish" quickly, but then you'll have to wait who-knows-how-long for the cache to actually get flushed.
    
    Glaeqen
    
    I usually forget about the flag and instead run watch over /proc/meminfo with the Dirty entry grepped, to see how many bytes are left 🙈
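
A rough sketch of that trick, Linux only. dirty_kb is a hypothetical helper name, and the watch invocation in the comment is one possible way to poll it:

```shell
# dirty_kb: print the Dirty figure from /proc/meminfo in kB (Linux only).
dirty_kb() {
    awk '$1 == "Dirty:" { print $2 }' /proc/meminfo
}

# Typical use while a copy is running in another terminal:
#   watch -n 1 "grep -E '^(Dirty|Writeback):' /proc/meminfo"
if [ -r /proc/meminfo ]; then
    echo "Dirty: $(dirty_kb) kB"
fi
```

The figure covers all dirty pages on the system, not just the image being written, so it is only a rough indicator.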
  - BD103
    
    In the section on the yes command, it links to this post: How is GNU yes so fast? What an interesting read!
  - nortti
    
    People love taking a big old stinky dump on MD5 all the time, but it still does exactly what it was meant to do: fast, low-collision hashes.
    
    I've recently-ish moved to using b3sum for this use case. Not only is it not vulnerable to collisions, it's so much faster than md5sum.
  - jamesnvc
    
    One place where tr has an advantage over sed is replacing ranges, e.g.
    
    Upper-casing
    
    $ echo 'hello world' | tr 'a-z' 'A-Z'
    HELLO WORLD
    
    Rot-13
    
    $ echo 'hello world' | tr 'a-z' 'n-za-m'
    uryyb jbeyq
    $ echo 'uryyb jbeyq' | tr 'a-z' 'n-za-m'
    hello world
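
The same range mapping extends to upper case by listing both ranges in each set. A small sketch, with rot13 as a hypothetical function name:

```shell
rot13() {
    # Each character in the first set maps position-by-position onto the second.
    LC_ALL=C tr 'A-Za-z' 'N-ZA-Mn-za-m'
}

encoded=$(echo 'Hello, World' | rot13)
echo "$encoded"            # Uryyb, Jbeyq
echo "$encoded" | rot13    # Hello, World
```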
  - jeeger
    
    I didn't know about csplit, which seems generally useful. It's also funny how some of these commands are just "leftovers" that no one really seems to use anymore, but need to be retained for compatibility or standards reasons. I suspect this is the case with the chcon and runcon SELinux commands.
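
For anyone else who hadn't met csplit, a small sketch that splits a file at each heading. It assumes GNU csplit (for -z and '{*}'), and the input file is made up:

```shell
work=$(mktemp -d)
printf '%s\n' '## one' 'a' '## two' 'b' '## three' 'c' > "$work/notes.txt"

# Split before every line starting with "## ".
# -z drops the empty piece before the first heading, -s silences byte counts,
# -f sets the output prefix; '{*}' repeats the pattern as often as it matches.
csplit -s -z -f "$work/section_" "$work/notes.txt" '/^## /' '{*}'

ls "$work"
```

Each heading and the lines under it end up in their own section_NN file.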
  - apropos
    
    Some notes from myself: it's interesting that just about half of coreutils is junk! I'm not sure whether this is surprising, or even whether I expected more or less of it to be so, but it's interesting.
    
    I use the Fish shell, which overrides a couple of pieces of coreutils with its own builtins: echo printf pwd realpath true false test. It also adds path and string and math commands that envelop some specific functionality here. The remaining tools which I use the most often are the following: cat stat ls cp mv touch mkdir rm rmdir ln. I use head tail tee tac sort shuf wc less often for how broadly useful they seem to be for plaintext. I use install mktemp chgrp chmod chown and the base-N encoders/decoders and the checksum utilities as needed, on occasion. Same goes for env chroot and sleep timeout and sync and all the informational commands (groups users id who whoami date uname tty nproc).
    
    This post taught me a few good defaults. I now have mv and cp set to expand to mv -i and cp -i, ln to expand to ln -s, df to expand to df -h, and du to expand to du -sh.
    
    Recently, I also learned that df and du are ZFS-aware. I don't know why I didn't expect them to be... with transparent compression enabled, du lists the compressed size of files by default, but also has an --apparent-size flag to view the stated (uncompressed) size. Handy!
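
Transparent compression is hard to reproduce in a snippet, so this sketch uses a sparse file as a stand-in to show the same gap between allocated and apparent size. It assumes GNU du and truncate:

```shell
sparse=$(mktemp)
# truncate extends the file to 10 MiB without writing any data blocks.
truncate -s 10M "$sparse"

apparent_kb=$(du -k --apparent-size "$sparse" | cut -f1)
on_disk_kb=$(du -k "$sparse" | cut -f1)
echo "apparent: ${apparent_kb} KiB, on disk: ${on_disk_kb} KiB"
```

On filesystems that support sparse files, the on-disk figure is typically far smaller than the apparent one.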
    
    All the rest not mentioned above I don't think I'll ever use. (cya dd, cp is my new best friend...)
    
    There were a lot of commands that I thought would be in coreutils that weren't, instead coming from util-linux or procps-ng or inetutils or what have you. I'm kind of curious, actually. Aside from the already mentioned packages, what individual utilities (or packages of such) do you think of as providing a complementary function to or filling in the gaps of coreutils?
    
    A couple of commands I have in mind here are progress (complements cp and mv and similar), watchexec (runs a command every time a file is modified), and rsync (incremental file transfer). Maybe also curl (does urls). Also rg and fd, of course...
  - jmc
    
    I don't get the point of some of these tools: arch is redundant with uname -m, printenv is redundant with... just env! Etc...
    
    As for logname, I do agree that getlogin() is terrible, but in my local reimplementation of the tool I decided to write my own getlogin(). I don't think this is necessarily in conflict with the spirit of the specification, as, I quote:
    
    The login name shall be the string that would be returned by the getlogin() function defined in the System Interfaces volume of POSIX.1-2024. Under the conditions where the getlogin() function would fail, the logname utility shall write a diagnostic message to standard error and exit with a non-zero exit status.
    
    I read this as "the behaviour of the tool should be conformant with the behaviour of getlogin()", so I simply tried to write my own version in such a way that it behaves identically to the specification and the pre-installed logname I have on my system.
  - mccd
    
    Learning that you can just run cp linux.iso /dev/sdb instead of using dd makes me feel... wow, just why have I always used dd and every time needed to google its quirky syntax?
    
    Does that mean also that I can just drag and drop an iso onto the /dev/drive in Nautilus/Thunar?
  - dzwdz
    
    I'm surprised base64 doesn't include the url-safe alphabet. pinky, the "lightweight finger", doesn't even implement the finger protocol :(
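
A common workaround for the missing alphabet is to post-process the output with tr. A sketch, where b64url is a hypothetical filter name; recent GNU coreutils also ship basenc, which has a --base64url option:

```shell
# b64url: hypothetical filter that rewrites standard base64 output into the
# URL-safe alphabet of RFC 4648 and drops padding and line wrapping.
b64url() {
    base64 | tr -d '=\n' | tr '/+' '_-'
}

# The bytes 0xFB 0xFF encode to "+/8=" in standard base64.
printf '\373\377' | b64url
echo
```

The sets are written as '/+' and '_-' so that the second operand does not begin with a dash and get mistaken for an option.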
    
    I'd assume if you're ever using unexpand then you're doing something wrong. I immediately think of the troves of GNU code that are unreadable if you display them in an editor set to a different tabwidth than 8, because they use Emacs' braindead default of replacing spaces with tabs whenever possible - and, better yet, only using 2 spaces to indent while the tabwidth is 8. Tabs and spaces have different semantic meanings; replacing the latter with the former is almost never correct.