The Wizard and His Shell
55 points by abnercoimbre
they won’t put up with RTFM nor should they
I weep for the future in which people don’t read about and understand what they’re doing.
I don’t think that’s what the author is suggesting though, and I am always disappointed when people dismiss criticisms of our current tooling in this way. Yes, being able to look up documentation is an important skill, and there’s a lot of technical tools that you simply need to learn. But in my experience, RTFM-culture is less about encouraging people to learn and educate themselves, and more about gatekeeping tools behind arcane interfaces because that’s easier than figuring out how to make those tools more usable and useful.
Discoverability is one of the single biggest flaws of shell environments historically. It’s often not even a case of “just look it up”, because very often the problem is not even knowing what to look up. I spent years not knowing that I could search my shell history in bash, because as a new developer, I just didn’t realise that it was an option available to me. But within a few days of getting started with fish, I’d learned that if I typed some of my command into the terminal and pressed “up”, I had access to an amazing new feature.
I’m not saying this to big up Fish shell specifically — as Matklad has pointed out elsewhere, it has its own usability flaws (although it is still very good most of the time). But it’s an example of how quite subtle UX decisions make a big impact (in this case having both “previous command” and “search history” be triggered by the same key in different contexts). Yes, shells are complex beasts and therefore some degree of learning is always going to be required. But in my experience, a lot of what we currently accept as the inherent complexity of shells is just poor interface design, and improving that design reduces the amount of unnecessary learning and memorisation that everyone needs to do.
To expand a little on my admittedly brief lamentation: I think mostly what I object to is the bombastic rhetoric that beats the tired drum of “old things are bad!”, and the only way to fix things is ideally to replace them with something that intentionally has none of the properties that people enjoy about the thing they’re already using.
The Fish shell is not my cup of tea, but I’m thrilled that it exists and it works for people and is actively maintained. I feel like the Fish folks are very positive people, writing software that does something new and bold and friendly – and actually slots really well into fifty years of incremental improvements to the terminal model.
In contrast, with articles like this one, I stand behind the comments I made the last time I read one of these; e.g., in particular:
I guess the thing that animates people outside the target audience to respond at all is probably the exuberant nature of the invalidating language; the workflow we’re happy with is “archaic” and your alternative is “superior” and so on.
I essentially always agree that things could be more discoverable. I also think that even in 2025, sometimes books (whether physical or online) are still a good way for people to discover things. Or just articles, or videos, or even posts on short form social media sites, if you must! I think Julia Evans is doing some great work with accessible material on a variety of subjects (perhaps, ironically, even using the wizard moniker). I gather Julia is doing one on the terminal ecosystem specifically at the moment, or may already have put it out.
a lot of what we currently accept as the inherent complexity of shells is just poor interface design, and improving that design reduces the amount of unnecessary learning and memorisation that everyone needs to do
This is where I get off the train. Improvements in some areas are obviously welcome, but in many cases things work the way they do because it’s actually pretty good for users. I’m glad that I can just push up to browse, or ctrl-R to reverse search, my shell history; I would not be happy if some of my screen real estate had to be given over to persistently letting users know that this is possible, a la the hot key panel at the base of the nano text editor. It would be more discoverable, but if you read a one page summary of interacting with the shell I would absolutely expect this to be present in it, and once you’ve learnt it it’s fine.
As a peripheral example, there are a lot of hospital and point of sales systems that came up in the mainframe era, and thus have a lot of interfaces that are driven by keyboard shortcuts. Some of them on the menu, some of them you just have to be trained in. Sometimes those systems get replaced with a clicky mouse interface, or some web thing, and the people that actually have to use them 8+ hours a day can get pretty frustrated that the new interface is slower and more error prone than the carefully curated, but “old” text and keys interface. It’s true that some amount of upfront training or reading is required, but I don’t think that’s actually bad! At times, when you’re talking about tools that you use all the time for serious work rather than casual users who pop in one time and then leave, it’s a good trade-off.
Regarding nano, it’s a clone of pico, which was originally the pine composer, and was extracted from the email client to become a standalone editor. Pine is not as configurable as mutt, but it’s much easier to get started with, and pretty good for churning through high volumes of email. And it’s very discoverable: even if you turn off the key shortcut menu, it’s easy to pop it up (which I sometimes have to do in some modal contexts even after 30 years of using it!). I think more programs should aspire to be as accessible to newbies as pine, but still useful to power users such as Linus Torvalds.
I have a friend who runs a small pub company. He wrote his own point-of-sale software that runs in the Linux text mode console. He designed it so that most purchases require only one keypress. I was in a different pub this evening with a touch screen PoS interface that needed much more interaction to buy a couple of pints.
Sometimes 1990s software is still pretty good.
Regarding nano, it’s a clone of pico … it’s very discoverable
Right, I’m familiar with the history! I used pine/alpine for mail for a while, before moving to mutt, and then eventually abandoning hope on essentially any mail clients other than the regrettable gmail web application. Part of the endearing magic of the UNIX ecosystem is, I feel, that in many cases we don’t have to agree on what makes a good user experience. We don’t generally even need one program to cover the full gamut between new users and power users. Software being interchangeable across various interfaces (e.g., you could use nano with mutt, if you wanted) is a good thing. The often sordid history of the UNIX terminal interface is a pretty good example of incremental improvement and interchangeable components, over now many decades.
I have a friend who runs a small pub company. He wrote his own point-of-sale software that runs in the Linux text mode console. He designed it so that most purchases require only one keypress.
That is indeed delightful! At work I’m responsible for user interfaces for manufacturing line control, and though it’s the illumos text console rather than Linux, I have gone for a similar user experience and it’s been pretty well received by the folks using it.
With more context, I agree more with what you’re saying. That said, I still think there’s a middle ground here between “software for experts” and “software for beginners”.
Take the fish example. Like I said, it provides some great affordances for people who are just getting started with the shell to help them learn what they can actually do. But it’s also deeply configurable. You want Ctrl-R? You want the up key to only navigate through the history? You can configure that - the keybind mechanism is quite extensive. As you grow into the tool, it encourages you to read the manual more fully and make everything your own. A lot of modern TUIs provide hot key panels like nano, because it makes it easier to get started. But they also let you disable that panel when you’re familiar with it, or if you’d prefer to have the real estate and memorise the commands from the start.
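To make that concrete, here is a hypothetical sketch of such a configuration. The `fish_user_key_bindings` function and the binding names come from the fish documentation, but exact names vary between fish versions, so treat this as illustrative rather than copy-paste ready:

```fish
# Hypothetical ~/.config/fish/config.fish sketch: opting back in to
# "classic" behaviour. Binding names per the fish docs; adjust to
# your fish version.
function fish_user_key_bindings
    bind \cr history-pager    # Ctrl-R opens the history search pager
    bind up up-or-search      # up arrow: previous command / prefix search
end
```

The point is less the specific bindings than that the escape hatch exists: the beginner-friendly defaults are a starting point, not a ceiling.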
The other aspect is that often the mechanisms that we’re used to in certain tools simply are unnecessary. My favourite example of this at the moment is Jujutsu. In some ways it’s just a wrapper around Git - a simpler interface to the same tool. But it’s not simpler in the sense of “let’s hide away the complexity”, like many Git wrappers are. It’s simpler because it allows you to do everything you could before, but having to learn fewer commands and memorise fewer concepts. And for years, people who have come to Git and complained that it’s unnecessarily arcane have been dismissed with “RTFM”-style comments, because a lot of developers seem to think that, just because I needed to learn all these details, these are necessary for the task at hand. But that doesn’t have to be the case, and experiments like this shell are what help us to see different alternatives.
I’m also a bit sceptical of your example of PoS systems. I knew someone who stopped working because the PoS system became too complex, and I’ve rarely heard kind words spoken about these systems by their users. But even when the complex interface really is the best tool for the job, that is usually for very focused and specific tasks where lots of prior training makes sense. But, theoretically at least, we’re talking here about home computing, about giving people more chances to understand how their own tools work. If the shell really is the wizard’s cauldron for anyone who wants to be comfortable with the Unix way, then surely opening it up to as many people as possible is a good thing.
It’s true that some amount of upfront training or reading is required, but I don’t think that’s actually bad! At times, when you’re talking about tools that you use all the time for serious work rather than casual users who pop in one time and then leave, it’s a good trade-off.
This is the pitfall all these articles fall for. The specific topic of shells being old, and that we should have something better and more modern, has been covered in posts like this one a couple of times per year since the mid 90s. It’s always some clickable icons embedded in terminal windows, or “a shell that supports objects rather than text”, as if plain text had not been chosen by design.
I strongly believe 99% of these posts are just a result of either the author not really mastering terminal usage (wizardry?), or settling for the mindset that people don’t want to read documentation anymore. It won’t work; you need to learn; we can’t dumb down everything. Efficiency requires learning.
I agree with your overall premise… but not your example? I would like to pipe some objects, please and thank you. Worrying about the encoding that gets piped through tools that inconsistently support -0 (and when inevitably one doesn’t and I switch to e.g. tabs, now what if my content has hard tabs, etc. etc.) is silly. Yes, I am good enough to work around this, but it’s annoying and it’s mental overhead.
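As a quick illustration of the fragility (a sketch assuming a GNU or BSD userland; `demo` is just a scratch directory created here):

```shell
# Line-based piping miscounts files whose names contain newlines;
# NUL-delimited piping does not, but every tool must support it.
mkdir -p demo
touch "demo/plain" "demo/with
newline"
lines=$(find demo -type f | wc -l)                        # newline-delimited: 3
files=$(find demo -type f -print0 | tr -cd '\0' | wc -c)  # NUL-delimited: 2
echo "lines=$lines files=$files"
```

Every tool in the pipeline has to agree on the delimiter for the second form to work, which is exactly the inconsistency being complained about.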
But while that one I don’t actually feel super strongly about… I DO feel strongly about having some sort of richer metadata for streams. Under Unix, you can either have correctly interleaved/ordered stdout/stderr (if these file descriptors refer to the same underlying stream) or you can have separately ordered, distinguishable stdout and stderr. You cannot have both, i.e. you cannot distinguish which lines of output are error text vs. standard output while still printing the lines in order.
This is dumb. I understand there’s very solid technical and historical reasons for it! I don’t really care. Computers are fast now, and that impossibility is dumb.
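The trade-off can be demonstrated with a small script (a self-contained sketch using Python’s `subprocess`; the inline child program stands in for any tool that writes to both streams):

```python
import subprocess
import sys

# Child process that interleaves writes to stdout and stderr.
child = (
    "import sys\n"
    "sys.stdout.write('out 1\\n'); sys.stdout.flush()\n"
    "sys.stderr.write('err 1\\n'); sys.stderr.flush()\n"
    "sys.stdout.write('out 2\\n'); sys.stdout.flush()\n"
)

# Option A: separate pipes. The streams stay distinguishable, but the
# relative ordering between stdout and stderr lines is lost.
sep = subprocess.run(
    [sys.executable, "-c", child], capture_output=True, text=True
)

# Option B: one merged pipe. Ordering is preserved, but nothing marks
# which lines were error output.
merged = subprocess.run(
    [sys.executable, "-c", child],
    stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True,
)

print("separate:", sep.stdout.splitlines(), sep.stderr.splitlines())
print("merged:  ", merged.stdout.splitlines())
```

With separate pipes you know which stream each line came from but not the overall order; with a merged pipe you get the order but lose the distinction. Unix gives you no third option.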
<…> or how “a shell that supports objects rather than text”, as if plain text had not been chosen by design.
That’s a fallacy, though. It might very well have been chosen by design, but that doesn’t mean it’s not a crappy design.
Unix’s byte stream files and pipes were a revolutionary improvement over the status quo ante.
I’m sure they were. I’m sure it was the best thing possible under the circumstances. And so on.
It still does not mean it’s not a crappy design. I take issue with saying that “it is by design” and expecting that statement to automatically prove that it is a good thing (much less that it will remain a good thing forever).
The future is vibes, or so I am told.
Programming by teledildonics?
The Encyclopedia Galactica defines a robot as “a mechanical apparatus designed to do the work of a man”. The marketing division of the Sirius Cybernetics Corporation defines a robot as “Your Plastic Pal Who’s Fun to Be With”. The Hitchhiker’s Guide to the Galaxy defines the marketing division of the Sirius Cybernetics Corporation as “a bunch of mindless jerks who’ll be the first against the wall when the revolution comes”.
If it required crawling through broken glass, should people be expected to RTFM? Clearly there’s an equilibrium here: the more accessible documentation is, the more it will be read. If you’re the person controlling access to documentation, this is how you should model it. It’s not productive to model it as some kind of moral imperative.
I feel like what you’re saying is about as rhetorically bombastic as the article. I agree that documentation is communication, and communication is always a two-way street. I don’t think making things hard is in any way morally positive, but I would also be sad if what you’re saying is that, on the other end, taking things seriously and being curious and wanting to read and to learn is not at least a little on the side of good.
If we make documentation more discoverable and contextual, more people will read it.
Nobody is disagreeing, but this is not the point I was making and not what the article is saying! The article says:
The shell is powerful, yes, and it’s also ancient. I’m afraid to say it’s no longer the gateway to progress; this kind of tooling is a gatekeeper now, blocking the way for younger generations to carry the torch (they won’t put up with RTFM nor should they.)
viz., that shells are bad because you have to read about them to understand them, rather than just sit down and start pushing buttons, and the kids don’t like it.
If only because I have an eight year old son, I have to hope (somewhat desperately) that the kids will in fact still be able to read, and will remain curious about how even ostensibly complicated things work even and perhaps especially when it isn’t immediately obvious when you see them for the first time.
Your reaction makes sense from a parental perspective. You’re thinking about how best to teach your son. From his perspective, he should expect to have to RTFM. I agree with that. However from the application developer’s perspective, our roles should be to make reading documentation as enticing as possible. I think these ways of thinking about the problem are not in conflict.
And also reducing the amount of docs needed in the first place, by semantically compressing software.
I think you’re mixing up inherent complexity and incidental complexity in shells. The fact that to know you can search your history in your shell in e.g. bash or zsh requires reading a manual or someone telling you is not a virtue. Almost every other application and operating system makes its features much more discoverable without having to dive into a manual or rely on oral tradition to figure them out.
As someone who learned how to use a shell when your son was probably in preschool, I can tell you this sucks.
Hi, former kid here. The way I got hooked on Linux was the following:
Them: Look kid, this PC was returned to me by a friend who used it to run Linux, wanna take a look?
Me: What’s Linux?
Them: Not Windows nor DOS. Come, take a look. I’ll just quickly install XWindow so that we have a graphical interface…
[After 2 hours trying to configure X from terminal.]
Them: I am so sorry, we are out of time. I regret I won’t be able to show that to you after all.
[Kid’s eyes wide open, mouth agape. Colors in the terminal? Text editing, software management, the sheer novelty?]
Me: Can you leave the installation CD set, please?
These things are pretty random. I think reading is the gateway drug for sure.
I think you might be interpreting “RTFM” differently to me (and potentially the author). I think you’re interpreting it literally, so that the sentence reads, roughly, “younger generations should not have to read manuals”. I agree with you that every generation should have to read manuals, manuals are good. If this is what the author was trying to say, then I understand your concern.
But I think the author isn’t talking literally about reading manuals, they’re talking about the RTFM culture where all affordances and hints are looked down on, and asking questions is considered poor form. This is what I more usually associate with the acronym RTFM - not a genuine desire for people to educate themselves, but instead a form of gatekeeping. The “I had to learn this the hard way, so you need to learn it the hard way too, otherwise it’s not fair” mentality.
I am sure some people do use phrases like RTFM out of kindness or because they think it’s genuinely the best thing the other person can do, but most of the time I see it, it seems to be used more in the latter context. And I interpreted the author’s quote there as criticising that behaviour, rather than manuals in general. (After all, it would be odd for them to criticise manuals in general given much of the goal of their shell is to give people more access to manuals and documentation in a contextual way.)
Kids at my electronics & programming hobby group (14-18) don’t really have strong opinions on what they dislike, yet. They are willing to try whatever you throw at them. Sometimes they are dead tired from school, especially after battery of tests, but they recover quickly and are pretty fresh on weekends.
The terminal and shell are not stumbling blocks. They just remember a handful of commands as they go and run with them. When they hit a wall, they ask around, or search, or ask an LLM, and continue.
An empty editor window is the worst. Starting with a stubbed demo accelerates them a lot.
But no one is making anything hard. The manpages of most classic Unix tools are straightforward. There is absolutely nothing hard, let alone intentionally hard, about them. On the contrary, they are made to be as easy, quick, and direct as possible. Yet somehow even reading that became too much to ask, and people expect this to be a fixable problem under the premise of zero effort from the user.
That is not the right discussion to have. You can’t possibly learn things if you don’t put effort into them.
Most of these “innovations” look to me like established IDE features. I don’t really want an integrated terminal-shell with lots of mouse-driven actions, animations, or contextual pop-ups, but I’m not surprised that somebody would want these things. If you like that stuff, more power to you. I like small composable pieces and minimal visual clutter.
The shell is powerful, yes, and it’s also ancient. I’m afraid to say it’s no longer the gateway to progress; this kind of tooling is a gatekeeper now, blocking the way for younger generations to carry the torch (they won’t put up with RTFM nor should they.)
Is it gatekeeping if people are just too lazy to read a single manual?
Quick question for all non-lazy manual readers!
The standard shortcut to kill the word to the left is opt+delete (backspace). The standard shortcut to kill the word to the right is opt+fn delete (delete).
This works in every gui app.
Now, when I use that in terminal.app + fish, opt+delete deletes a single character, opt+fn delete beeps.
In Ghostty, opt+delete deletes a word (yay); opt+fn delete does nothing.
Just for fun, in iTerm opt+fn delete beeps and prints 3~
How do I fix it, and, most importantly, why do I need to fix it at all?
Ah, but this is just the start. How about selection? How about cut/copy/paste? The standard GUI “textarea” behaviors and the associated IBM CUA (or Apple HIG) keybindings have a history only a little shorter than emacs, vi, or even ed – and certainly more elaborate, given how many disparate implementations are included in this “common” standard.
Smalltalk-80 had a terminal-like “Workspace” widget that used all the same text conventions as the rest of the system: even those nifty newfangled variable-spaced fonts. Rob Pike yoinked that idea, put a unix spin on it, and called it acme. If you want a modern one, there’s anvil. But the evolutionary paths have diverged.
Yeah, “acme with ‘syntax highlighting’” is I think where we should have ended up ideally, but the evolution always works like https://en.m.wikipedia.org/wiki/Giraffe#Internal_systems. Thanks for the pointer to anvil, that looks interesting. Sadly, it seems that it just hard-codes syntax highlighting, instead of adding an Emacs-like attributed string as a core domain primitive? That’s sort of the money question of this whole shell thing — how to get colors (and hyperlinks! and magit-style TUIs! and dired!) in a way which is more reasonable than ANSI escapes.
To that end, I’d direct attention to the excellent work that @crazyloglad has been doing.
And yeah, that dang giraffe. Sometimes we need a clean break, but the costs are always high.
Macintosh Programmers Workshop did this too. There was no “terminal”, just command execution in-place in editor buffers (any buffers). It was fantastic and I miss it.
I went to a ton of effort to get all those shortcuts to work properly for me in kitty, fish, and vim (mk12/vim-meta being the only reusable one). It would indeed be so much nicer if it just worked like in GUI apps.
To be fair, kitty, fish, and ghostty do push the envelope of usability. Kitty in particular innovated a lot of modern extensions, and just today I’ve learned that fish is going to remove terminfo.
But I am skeptical that you can achieve something reasonable using purely evolutionary methods. I don’t see how you can remove kernel involvement in the pty pair, and that, I think, is one of the bigger sources of accidental complexity here.
For fish you can see the default keybindings here: https://fishshell.com/docs/current/interactive.html#shared-bindings (the rest of the documentation has more info as well).
You can set custom keybindings here: https://fishshell.com/docs/current/interactive.html#custom-bindings
why do I need to fix it at all?
That’s a question akin to “why does Cmd+W on macOS quit some apps, but on others just close the window?”. Different apps assign different meanings to the same keybindings based on their own tradeoffs.
Fish and other shells prioritize working roughly the same regardless of OS or if the system even has a graphical interface so following macOS UI conventions isn’t really a priority for them.
iTerm/other terminals also don’t send some key combinations to the shell by default because it conflicts with their own key bindings. For instance, if I wanted NeoVim to understand Cmd+T I can’t do that because iTerm intercepts it.
Other GUI apps have fewer such problems because they don’t have two or three levels of indirection (NeoVim is a process inside a process inside another process).
This may seem like nitpicking, but it simplifies the mental model (and is actually correct): on macOS, Cmd-W always closes the window. Some apps happen to quit when you close their only window, but that’s not because they bind Cmd-W to Quit.
With `bind alt-delete kill-word`, Ghostty indeed kills the word, but terminal.app beeps. And alt-backspace deletes only a single character in terminal.app. Why is the behavior different?
Because Mitchell Hashimoto cares a lot more about macOS UI conventions than Apple does. One of the reasons for him creating Ghostty was better native OS integration.
I don’t think that’s a fair characterization of this issue. Terminal.app cares very strongly about macOS UI conventions, and a lot of what Ghostty does for macOS is trying to match or exceed Terminal.app’s behavior.
In this particular case, there’s some things to check. Terminal.app has a default set of keyboard mappings, which can be extended on a per-profile basis. If you go to the Preferences and look at your profile’s Keyboard tab, you can check if there’s a binding for alt-backspace anywhere. You can also check for fn-delete and alt-fn-delete.
You can use `fish_key_reader` (a utility that comes with fish) to check what any individual keypress is coming across as. Invoke it with the `-c` flag to listen for multiple keypresses. For me, in both Terminal.app and Ghostty, pressing backspace shows this as `bind \x7F` and alt-backspace as `bind \e\x7F` (the `\e` prefix here is for alt). Note that I’m still on Fish 3.7; I don’t know if the Fish 4.0 keybind changes affect anything here.
If you do this you’ll see something interesting, which is that for Ghostty, Fish will also print `bind -k backspace`, but it doesn’t do this for Terminal.app, despite both sending the `\x7F` keystroke. If you read the `man bind` manpage you’ll see this is because the value for “backspace” comes from terminfo. It turns out that the `xterm-256color` (and related) terminfo entries on macOS define the `kbs` value as `^H`. So in Terminal.app, if I invoke `fish_key_reader` and press Ctrl-H, it gives me the `bind -k backspace` output (along with `bind \b`).
And the final thing to check is what Fish’s own keybinds are. If you run `bind | rg backward-kill-word` you can see exactly which bindings have been defined for this action. Or you can go the other way: take whatever `fish_key_reader` spits out and pass that to `bind` to see what that keystroke is bound to. When I look at my own output, I see that Fish has both `\e\x7F` and `\e\b` bound to `backward-kill-word`.
So to wrap this up: to diagnose the alt-backspace issue in Terminal.app, check if you’ve overridden it in the preset, check what `fish_key_reader` says for it, and check Fish’s bindings. As for the alt-fn-delete issue, Fish just doesn’t even have a builtin keybind for that one, and whatever Terminal.app emits for it is probably different from what Ghostty does. And `alt-delete` as a bind here probably won’t work; you need to bind the actual sequence that the terminal emits (or adjust the sequence the terminal emits).
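Condensed into commands, that checklist might look like this (an interactive sketch run inside fish; `\e\x7F` is just an example sequence, substitute whatever your terminal actually emits):

```fish
fish_key_reader -c                  # press the key; note the reported sequence
bind | grep backward-kill-word      # see which sequences fish maps to the action
bind \e\x7F backward-kill-word      # bind the sequence your terminal really sends
```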
But why would it matter? Aren’t I configuring keybindings in my shell, rather than in my terminal emulator?
There are layers at work. The OS has a particular model of the keyboard and what keys might be called to the user (e.g., via the physical silkscreen label on the keyboard) or to the application (e.g., as an enum of some kind). The terminal emulator is an application that then translates those into a mixture of raw text (probably UTF-8) and control sequences. Your shell understands some set of control sequences as being alt-delete, and your terminal emulator must be willing to translate the OS-side key events (which will usually be different on Windows, Mac OS X, and X11 systems, generally, and may even be different with keyboards other than a US 101-key layout) into a data stream that is sent on to the shell. It sounds like Ghostty has made some good choices in terms of those translations (from OS to terminal sequence) for Mac OS X users, and (sadly!) the Apple Terminal.app has perhaps not.
This problem also exists, for what it’s worth, with really any thin client system; e.g., VNC tries to take local OS keyboard events and translate them into something that makes sense on whatever OS and in whatever locale the remote graphical system is apparently using. Results are mixed there, as well. It’s a tough problem!
I would say it’s a tough solution, rather than a tough problem.
Communicating to an application which key is pressed is not that hard: you need to deal with layouts, modifiers, input methods, and such, but it still maps to not that complicated API.
Preserving all that through the pty pair detour is where the toughness lies.
Communicating to an application which key is pressed is not that hard: you need to deal with layouts, modifiers, input methods, and such, but it still maps to not that complicated API.
I think keyboard handling and text entry on modern systems, now that we have software complex enough to potentially encompass all languages, is actually an extremely hard problem, and it only seems “not that hard” when you’re considering only your immediately relevant portion of the problem space.
The pty pair itself is not actually that relevant here, the signalling issues would be the same even if you were using a real serial device, or no pty-like layer at all. The work pioneered in the kitty keyboard protocol is great incremental work that I’m glad more and more terminal emulators are adopting. It is a relatively comprehensive transport for keyboard events.
If you look in the specification, though, you can see where some of the complexity arises, even once you have a rich transport: your physical keyboard locally may have weird buttons on it, with new scan codes. Many existing scan codes are defined by USB/HID, but from time to time people add new ones and it’s a whole thing. Different client operating systems also have different ways to express keyboard events, different keyboard layouts (for, e.g., languages, or domain-specific devices) also exist and if you’re writing a program that is used via SSH from remote systems, and you decide what you want is a feed of untranslated keyboard events then you kind of have to be aware of all that. Different operating systems also have different conventions for certain things like, e.g., copying or pasting text. Having the terminal emulator be responsible for mapping locally relevant shortcuts from OS to terminal type sequences is potentially advantageous in that case. This is all, again, also a problem with other thin client systems like VNC when the local client and the remote system diverge in some of these particulars.
I know it’s trendy to dump on the terminal, but this truly is a large and complicated problem space.
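For a flavor of what that richer transport looks like on the wire, here is a sketch of the kitty keyboard protocol’s basic `CSI <codepoint>;<modifiers> u` encoding. This is simplified from the spec: it covers only the base modifier bits and ignores event types, alternate keys, and progressive enhancement negotiation.

```python
# Sketch of the kitty keyboard protocol's simplest encoding,
# "ESC [ codepoint ; modifiers u". Modifiers are encoded as
# 1 + a bitfield: shift=1, alt=2, ctrl=4 (per the kitty spec).
def encode_key(codepoint: int, shift=False, alt=False, ctrl=False) -> str:
    mods = 1 + (1 if shift else 0) + (2 if alt else 0) + (4 if ctrl else 0)
    if mods == 1:
        # Unmodified keys may omit the modifier field entirely.
        return f"\x1b[{codepoint}u"
    return f"\x1b[{codepoint};{mods}u"

# ctrl-a: codepoint of 'a' is 97; ctrl contributes 4, so modifiers = 5.
print(repr(encode_key(ord("a"), ctrl=True)))   # '\x1b[97;5u'
```

The win over legacy encodings is that ctrl-a is no longer conflated with the raw byte 0x01, so applications can finally tell modified keys apart.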
Why does the shell handle it at all? Great, you’ve configured fish; then you open a REPL and it works differently again.
This seems like the thing that should be handled at the terminal level.
I don’t think it’s laziness per se, just severely compressed attention spans. People will read most of the manuals eventually anyway, just as tiny fragmented snippets. Will they understand? Will they remember? Or will they come to depend on the hints and nudges, building muscle memory rather than working knowledge?
I think you are both suffering from Stockholm syndrome. The reason I shouldn’t have to read the manual is that the software should be easy enough to use that a manual is unnecessary, because how to perform the actions you need is obvious and consistent with how those actions are performed in other applications. It has nothing to do with attention spans or laziness, and everything to do with just making things not s*****. Nobody cares to memorize the syntax of the 57th b******* configuration language that someone invented for their file management utility / window manager / whatever. It is an endless parade of monotonous busywork where you never learn anything fundamentally new that you can leverage usefully in other domains.
I’m going to extract from that the message that “if something is going to be different from the normal, expected workflow, then it should be so only for a good reason rather than by whim or happenstance.”
I think the author should be commended for challenging the status quo of terminal usage, and deserves congratulations for their progress so far.
As I see it, the tone of this post (e.g., “I’m going to eat your lunch”) and the repeated appeals to “modernity” (e.g., “[a]nother terminal author who understood modernity” or “[i]t’s modernity 101”) are evidence of their enthusiasm and should not be taken too seriously. (I don’t think they’re trying to be offensive or provocative.)
Indeed, it’s not clear to me why these particular features are evidence of modernity other than in the most trivial historical sense—e.g., that early computers were much less likely to support mouse input than modern computers.
I echo other posters in that the features being showcased are not particularly compelling (e.g., gnome-terminal also lets me change preferences using a mouse, but this is not something I do frequently enough that it needs convenience beyond vim ~/.config/ghostty/config; in fact, if you think about it, vim ~/.config/wezterm/wezterm.lua is actually more modern in a more meaningful way, as programmatic configuration enables a shift from a static, long-lifetime, single-machine setup to a dynamic, short-lifetime, multi-machine one!)
(The mouse trap feature is pretty cool, though.)
Rather than litigate the showcased features one-by-one, I want to articulate some useful context from the perspective of a heavy terminal user around a central point that comes up frequently when talking about command-line tooling.
I don’t want to use a mouse.
Mouse, joystick, pen, touch, and voice are useful interface paradigms. There are even tasks at which they each excel! But, except for maybe pen, they are all very low fidelity: in general they are slow and imprecise, and accuracy comes at a high cost.
Touch is probably the worst common interface paradigm. It’s no wonder that social media apps center their UIs around swiping horizontally or vertically, since their goal is to provide an interface that requires no skill, effort, or thought to interact with. I have a number of iOS devices that I use throughout the day for work and other tasks, and Apple definitely deserves recognition for the work they have put into reducing input latency; without that work those devices would have been wholly unusable. (e.g., I have a bunch of hybrid tablet/laptop-format devices running Arch Linux; it’s not onboard’s fault (https://code.launchpad.net/onboard), but it’s almost impossible to perform anything but the simplest tasks via this interface paradigm!) I, like many of you, am also getting older, and if there’s even a little bit of latency, I find that any touch target smaller than about 200% the size of my thumb has a risk of being missed, and this risk increases dramatically as the touch target becomes smaller. In other words, I am significantly more effective at accomplishing any medium-complexity task using a mouse rather than using touch.
So mouse is definitely not the worst. First-person shooter games prove that, with extensive training, it’s possible to develop extremely high speed and accuracy with this paradigm for a limited set of tasks. But, in practice, for a normal person like me, using a mouse tends to be a slow and clumsy exercise (just not nearly as bad as using a touchscreen!). It requires fine motor skills, quick reaction time, and attention. If we contrast a single mouse “action” (navigate to and click a target) with a single keyboard “action” (press a key), it’s clear that the mouse has higher theoretical fidelity (there are 8,294,400 distinct pixels I could click on versus 109 distinct keys I could press) but lower practical fidelity (in the same unit of time, I could reliably hit maybe 20~40 distinct pixel regions versus the >70 keys I can hit without moving my hands). The keyboard also has the advantage that its units of action are nicely aligned with units of human language. As any Arma 3 player will tell you, you can get a lot of input fidelity out of ~50 or so single keys! Because the human side of the human-interface paradigm (i.e., your hands) tends to stay in the same position, and because the muscle action needed to activate a key is much grosser (e.g., a key has a very large relative threshold for where it must be struck and with how much pressure before it activates, and typically activates in a fashion that is largely indifferent to these variations), the keyboard sits at a very prime spot in the interface hierarchy.
It has enough fidelity to encode complex input, its units of input are aligned with human language (which aligns it nicely with the majority of work-related tasks), it doesn’t take extensive training to develop mastery, operator skill degrades less aggressively over time, and all of this is balanced against activation actions that are gross enough to be resilient to variations in pressure or striking inaccuracy. Keyboard-orientated interfaces also tend to be lower latency, or designed such that an operator can effectively overshoot. (It’s not uncommon to see an older person with no technical background who has been trained on a keyboard-orientated interface zoom through form-filling interfaces at the speed of thought and, in the case of interface lag, even speculatively fill in fields before they appear on their screen. Think: gate agents at an airport.)
Also, isn’t everyone always asking for more physical buttons in their cars? It’s for the exact same reason.
The mouse is simply not a more or less modern interface paradigm than the keyboard. It is, instead, an interface paradigm that may work well for certain tasks, but is significantly clumsier, slower, and less accurate than the keyboard, especially for tasks that are aligned with human language.
I’ll let someone else comment on how keyboard interfaces, at present, tend to be more compositional than mouse or touch interfaces.
In other words, no matter how trained a user is or how good a mouse or touch interface becomes, the transformational opportunities offered by composition mean that keyboard operation is in just a different universe. In fact, this is exactly what draws most terminal users to the terminal.
mv file folder/ is not why I use the command-line. I can do this with nemo and a mouse.

mv *.txt folder/ isn’t even why I use the command-line. You could make a better nemo that gets pretty close to this.

mv "${(@A0)"$(find -type f -size +3k -name '*.log' -print0 | grep -E -z '2020-01-\d{2}')"%$'\0'}" folder/ is why I use the command-line. (I just wish these things were easier to type!)
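For what it’s worth, here is a sketch of my reading of that one-liner in plain Python (my interpretation may not be exact: move .log files larger than 3 KiB whose names contain a January 2020 date into folder/). It makes a nice illustration of how much the shell’s composition compresses:

```python
import re
import shutil
from pathlib import Path

def move_january_logs(src: Path, dest: Path) -> None:
    """Move .log files over 3 KiB whose names contain a 2020-01-DD date."""
    dest.mkdir(parents=True, exist_ok=True)
    date = re.compile(r"2020-01-\d{2}")
    # Snapshot the matches first so moving files doesn't disturb iteration.
    for path in list(src.rglob("*.log")):
        if dest in path.parents:
            continue  # skip anything already moved into dest
        if path.is_file() and path.stat().st_size > 3 * 1024 and date.search(path.name):
            shutil.move(str(path), str(dest / path.name))
```

A dozen lines of a general-purpose language versus one (admittedly hard-to-type) pipeline.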
I think any appeal to modernity must necessarily involve finding new compositional paradigms.
It’s not just “at present”. Language is inherently combinatorial / compositional. It makes semantic tree structures, encoded as strings of discrete characters or phonemes. Vision is inherently spatial, which is to say “flat” 2d or 3d and continuous. We have dedicated neural systems for these different modalities.
Display technology has saturated the limits of human vision, at least in terms of resolution and latency. By contrast, our sense of touch is at a great disadvantage without tactile or kinesthetic feedback. The intelligence inherent in manual dexterity and bodily agility is offered no way to connect with the machine. Maybe “augmented reality” UIs will eventually change that.
Maybe haptics in touchpads would be more useful if they could somehow make the features on the screen that the pointer moves over more tactile. Before skeuomorphism was swept away and replaced by a featureless desert of dazzling white space, we used to have bumpy user interfaces which I think might be fun to mouse over if the mouse or trackpad bumps too.
I’m hedging, because it occurs to me that current keyboards make various practical compromises, driven by our current level of technology. I’m not smart enough or creative enough to imagine what a materially better keyboard looks like.
(I remember reading somewhere that the actors were given total freedom in how they operated the LCARS interfaces in Star Trek: The Next Generation, because the designer’s intent was that the interface had some mechanism that allowed them to automatically adapt to their user. If I remember correctly Star Trek: Discovery expanded on this with some kind of nanotechnology based, non-touchscreen, physically adaptive control mechanism, kind of like a dynamic keyboard that could add and remove keys.)
Setting aside the specific form of the contemporary keyboard and instead considering finger-button interfaces in general, are you suggesting that they are necessarily at some (practical) optimum with regards to compositional capability? Would this be the case only in contrast to other presently technically feasible modalities like mouse/touch, tactile, and voice, or would this be the case for any speculative future interface paradigms (assuming no fundamental changes to humanity)?
(As I see it, if we’re talking about “interface,” then we’re presuming the human and machine are separated, so if we broaden our consideration to some speculative future mind-control interface, if it’s actually an “interface,” then it would probably be driven by some kind of inner voice/inner monologue mechanism, so it would end up being an enhancement and synthesis of existing paradigms like voice and gaze…?)
mv "${(@A0)"$(find -type f -size +3k -name '*.log' -print0 | grep -E -z '2020-01-\d{2}')"%$'\0'}" folder/ is why I use the command-line. (I just wish these things were easier to type!)
And now, you want to change it from *.log
to *.lag
(or vice versa, you typo’d it the first time). This is where I like being able to grab the mouse, click on the letter, and change it rather than agonizingly slowly slam backward-word, then oops, I overshot, forward-word, backward-char to finally get on the thing I wanna change, ugh, what a pain.
All of these things can work at the same time.
Maybe.
There are definitely tasks where the mouse is the most efficient way to accomplish a task. It’s also definitely the case that in situations where the efficiency of keyboard and mouse are similar, I may prefer to invest more time into learning the former approach over the latter. (In the case you mention, I might use zsh’s zle command edit-command-line to launch vim and :%s/log/lag → :wq.)
My original comment was to refute the idea that keyboard vs mouse vs touch is somehow aligned with modernity. Instead, I asserted that many who prefer keyboard interfaces prefer them for non-ideological reasons. In many cases, the keyboard is, in fact, significantly better. (It’s not élitism or gate-keeping or about image or about ideology.)
However, we may make these choices absent any initial prejudice yet still develop preferences that reïnforce this choice, especially in cases where there isn’t a clear winner among the options. It may have originally been a practical choice, but there’s still a natural momentum and an uncertainty around change that one must face.
That’s why it’s important to challenge these habits and these choices, as the author of Terminal Click has, even if all that comes of it is reminding ourselves why things are the way they are.
It’s funny, despite truly believing that the shell and the terminal should be integrated and the crappy in-band signaling currently in use should be dropped, I’ve never been able to abandon my shell+terminal combination. In fact, I’ve been implementing terrible hacks using all the (not so) newfangled OSC codes to get more integration. Inertia might be the hardest thing to overcome for all these shell/terminal projects.
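One example of those (not so) newfangled codes, as a hedged sketch: OSC 7, understood by VTE-, iTerm2-, and WezTerm-style emulators, lets a shell report its working directory to the terminal in-band as a file:// URL. Building the sequence is trivial; the ugliness is that it is exactly the kind of in-band signaling being lamented here:

```python
import os
import socket
import urllib.parse

def osc7_cwd() -> str:
    """Build the OSC 7 escape sequence that reports the current working
    directory to the terminal emulator as a percent-encoded file:// URL."""
    host = socket.gethostname()
    path = urllib.parse.quote(os.getcwd())
    # ESC ] 7 ; file://host/path ESC \  (string terminator)
    return f"\033]7;file://{host}{path}\033\\"

# A shell hook would typically write this before each prompt:
# sys.stdout.write(osc7_cwd())
```

With an integrated shell+terminal, this information would just be shared state instead of an escape sequence smuggled through the output stream.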
We don’t use a terminal emulator because we have ** to. We use it because we like it.
** Except when we ssh into a machine and are too lazy to forward the display.
I really like the idea of revamping terminal and shell mercilessly. Let’s see which properties and abilities of the shell we really need or want, and keep only those.
And the features you demoed look nice. What I’m missing though is better discoverability of all this: if I’m new to shells or to Terminal Click, how do I find out what’s possible? That’s a general problem of shells (you can press Tab twice to see all available commands, but that’s not helpful either). Somehow IDEs feel much more inviting, or rather they have an easier intro curve (open a file and start writing, and then look for commands to work on that file).
Also, I wonder how tightly coupled Terminal Click is with the commands (e.g. Git). Do you need to explicitly add support for each of these commands? Or is there a “narrow waist” that allows to use TC (and other new terminals) with all shell commands, with full integration? Or is there at least an idea for creating such a narrow waist?
I’d absolutely love to see some real innovation adjacent to this space. I’d say adjacent because I think trying to displace terminals (or shells) from the starting gun is (to abuse another metaphor) trying to run before you walk.
There was a project about 20 years ago, I think in the GNOME orbit, which tried to do something really interesting with data objects, not entirely the same as but vaguely in the room with PowerShell. I can’t remember its name.
For me, I’d love an ergonomic, natural way to do incremental pipeline building and experimentation without running the whole thing again and again each time.
The longer I look at it, the more I get convinced that the terminal ecosystem is indeed about gatekeeping. The only catch is who is being kept outside the gate.
I think we have seen multiple takes on GUI get taken over by pixel-perfection, with composability and automate-ability and customisability falling out of the window, and tools for quickly sketching an interface falling into disfavour (thus harder to find / often harder to integrate with something).
Terminals are more or less the place where the main interaction is actively hostile to pixel-perfection-hunt in the UI, and scriptability the most likely to survive. Thus they go on as the defensible safe place with a gate in front.
Note that «I will prefer CLI tools over everything and I will write my tools with GUI» is about gatekeeping development more than use. When there is a CLI tool where gating access to it could plausibly be called gatekeeping against the users, and where adding a useful GUI is not that much work, that GUI typically gets added as a wrapper, and the CLI tool authors generally consider this a positive development.
I printed that Unix Magic poster on metal a few years ago and have it hanging in my office. Always brings me joy. Highly recommended for anyone who likes simple reminders that what we do is wonderful.
This seems pretty good, and a pretty good idea for making shell more accessible.
TBH this seems like something that should be fairly doable in emacs AND I haven’t used a truly good emacs shell.
I think this might have the most impact if it were integrated in vscode, which is what everyone not truly invested in another classic editor seems to be using.
IMO a better example for the spinning indicator would have been something like copying a very large file that doesn’t already have a progress indicator.
Also, the phrase “legacy terminal” really rubs me the wrong way.
Is the protocol between the terminal emulator and the shell going to be specified? If it is, why not point that out? If not, is it really compatible with existing CLIs, since they assume a model that has that traditional separation?
Very bold, nice screenshots :-) I would like some of these UI affordances, but I think you still need a shell. I don’t think they necessarily conflict.
That is, my reaction is similar to the last time I saw this – you didn’t get rid of the shell, you created your own shell as a part of your terminal.
It just lives in the same process, in the same codebase, no?
Though now that I think about it, that is not a bad idea in some ways. To have an embeddable shell library.
I have sort of had a revelation that shells are currently waitpid(-1) loops … But they could be select() / epoll() loops like node.js or Python asyncio (the enhanced xargs -P or ninja problem). I may work on that, either in YSH, or a “catbrain” experiment I proposed last year [1]
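A hedged sketch of that select()/epoll()-style idea (not YSH code, just an illustration using Python asyncio, since the comment mentions it): one event loop supervising several child processes at once, xargs -P-style, instead of a blocking waitpid(-1) on one child at a time:

```python
import asyncio

async def run(cmd: str) -> int:
    """Spawn a child process and await its exit without blocking the loop."""
    proc = await asyncio.create_subprocess_shell(cmd)
    return await proc.wait()

async def main() -> list[int]:
    # The event loop multiplexes all children at once -- an epoll()-style
    # wait -- rather than blocking on each child in turn.
    cmds = ["sleep 0.1", "echo hello", "true"]
    return await asyncio.gather(*(run(c) for c in cmds))

if __name__ == "__main__":
    print(asyncio.run(main()))  # prints the exit codes: [0, 0, 0]
```

A shell built on this foundation could keep accepting input, updating the UI, and reaping children concurrently.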
Though let me also object a bit to the shell you made:
2+2 … pretty soon you will want to add variables, no? I mean there are already environment variables. So how is this parsed: p*3? Is that an arithmetic expression or is that a glob? Will it run a program called python3? Or does it calculate the integer p times 3?
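To make the ambiguity concrete, a toy sketch (hypothetical semantics, not YSH’s actual rules): the very same token is both a valid glob and, given a variable p, a valid arithmetic expression, so a parser has to commit to one reading:

```python
import fnmatch

token = "p*3"

# Reading 1: a glob. It matches a program named python3 on $PATH.
assert fnmatch.fnmatch("python3", token)

# Reading 2: arithmetic. With a shell variable p bound to 7, p*3 is 21.
variables = {"p": 7}
assert eval(token, {"__builtins__": {}}, variables) == 21
```

This is exactly why bolting an expression language onto command syntax needs careful design up front.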
[1] https://www.oilshell.org/blog/2024/09/retrospective.html#help-wanted - I didn’t get too many responses, probably because this is vague. I want to write a bit more about it, but not sure if I’ll have time
The other thing I want to say is that while RTFM is useful and decent advice in many situations …
It doesn’t really apply to the bash manual … It will probably not make you more productive – it is more likely to make you weep at the mistakes that have been made …
e.g. Someone thought there was deep wisdom in the bash manual, but I disagreed: https://news.ycombinator.com/item?id=38414011
Now that I’ve read almost all of the bash manual, I’d say it’s not really a special document. It’s fine, but it’s not complete, and I wouldn’t say it’s particularly well written. It lacks examples.
Quoted here, along with a mistake we had to re-implement - https://www.oilshell.org/blog/2024/06/release-0.22.0.html#driven-by-nix
The point of YSH is to achieve the “semantic compression” that matklad mentioned – you shouldn’t have to read the manual!
Although a problem I thought of with the embedded approach is that it’s hard to have different working dirs in different GUI tabs …
Tabs seem like an essential UI feature, no?
Because a process is the thing with the PWD, and child processes inherit it.
Although maybe you could keep that state in a variable, and then when you do ls, you could chdir() after forking, rather than before … Although then builtins like test -f /tmp would have to be aware of that logic too …
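A hedged sketch of that per-tab idea in Python (the Tab class and its design are my assumptions, not how any existing terminal works): keep the “cwd” as plain per-tab state and only apply it in the child, which is exactly what subprocess’s cwd= argument does (fork, then chdir, then exec):

```python
import subprocess

class Tab:
    """Hypothetical per-tab state: the working directory lives in a
    plain variable, not in the GUI process's actual PWD."""

    def __init__(self, cwd: str = "/") -> None:
        self.cwd = cwd

    def run(self, *argv: str) -> str:
        # subprocess forks, chdir()s to self.cwd in the child, then execs,
        # so each tab gets its own directory while the single GUI process
        # never changes its own.
        return subprocess.run(
            argv, cwd=self.cwd, capture_output=True, text=True, check=True
        ).stdout

tab_a = Tab("/tmp")
tab_b = Tab("/")
print(tab_a.run("pwd"), tab_b.run("pwd"))
```

The catch raised above still applies: builtins like test -f would have to resolve relative paths against self.cwd themselves, since the parent process never actually chdir()s.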