Modal editing is a weird historical contingency we have through sheer happenstance
32 points by Sietsebb
Some very popular editors have modes; they just don’t edit text, and they call the modes “tools”. (Indeed, editing text in them usually is a mode.) Pretty much every mainstream editor for graphics, music, and engineering CAD has lots of modes.
Speaking of CAD, another historical UI fork that is interesting to follow is “noun verb” vs “verb noun” in GUI systems. Many CAD programs (e.g., Eagle, LTspice) come from early “verb noun” heritage (you say you want to move something, then click on what you want to move) and still work that way, which seems counterintuitive if you learned GUIs after 1985 or so, when “selection then command” became the standard. Kind of like vim vs. Kakoune.
I recently saw a Tantacrul video about the development of Audacity [0]. It's all about revamping Audacity and removing modes from it, and here [1] he goes into where you can actually find modes in other programs. I found this very interesting, since some of the examples I never realized were modes until he pointed them out, which made it kind of obvious in hindsight.
Even in my compiler for my trash language I track the state I'm in (string, comment, block, code, whatever) by a mode variable.
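A minimal sketch of what such a mode variable can look like in a tokenizer (hypothetical toy code in Python, not the actual compiler):

    # Toy lexer: one "mode" variable decides how each character is interpreted.
    def lex(src):
        mode = "code"            # current mode: "code", "string", or "comment"
        token, tokens = "", []
        for ch in src:
            if mode == "code":
                if ch == '"':
                    mode = "string"
                elif ch == '#':
                    mode = "comment"
                elif ch.isspace():
                    if token:
                        tokens.append(("word", token))
                        token = ""
                else:
                    token += ch
            elif mode == "string":
                if ch == '"':
                    tokens.append(("string", token))
                    token, mode = "", "code"
                else:
                    token += ch
            elif mode == "comment":
                if ch == "\n":
                    mode = "code"   # comment runs to end of line
        if token:
            tokens.append(("word", token))
        return tokens

    print(lex('print "hi there" # trailing comment\n'))
    # [('word', 'print'), ('string', 'hi there')]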
Modes really are everywhere once you look.
Many CAD programs (e.g., Eagle, LTspice) come from early “verb noun” heritage (you say you want to move something, then click on what you want to move) and still work that way, which seems counterintuitive if you learned GUIs after 1985 or so, when “selection then command” became the standard.
Which is weird, because Bravo was a "selection then command" GUI! That was in 1974. So it was invented first and then became mainstream, so it's odd that CAD does the opposite.
I was curious what Sketchpad (1963) did, and it seems like it might do a bit of both, depending on the operation. What does seem clear is that selection and command were separated into physically different input devices: the light pen is used for selections, but not for commands. However, it's less clear what the order is, or whether it varies. For example, p. 62 of Sutherland's thesis sounds a lot like it's doing command-then-selection:
The table of picture parts falling within the field of view of the light pen, assembled during a complete display cycle, contains all the picture parts which might form the object of a statement of the type:
apply function F to ______
e.g. erase this line (circle, etc.).
And then it describes the algorithm used to show candidate "demonstrative objects" on which to apply F. But other parts are a lot less clear; e.g., I think sometimes you press a hardware button to act on the currently selected object? Maybe I should look at videos instead of trying to parse the textual description further.
Reminds me of English being written and spoken backwards compared to my native language, which introduces all sorts of issues when I start writing in my native language after using English all day. It just makes the grammar feel as if I typed it while drunk. For example, we say Union European or States United, etc.
Btw. after years of vim, kakoune’s select -> delete (xd) instead of delete -> select (dx) was such a pleasure to use for the first time. It felt natural and aligned with modern UI/UX standards.
As wrs points out, modal editing is still common today in everything from Photoshop to Ableton Live.
I think the key difference is that modal editing is popular in applications that are primarily mouse-driven but not in apps that are keyboard-centric. That makes sense when you think about it.
The point of a mode is to multiply the number of possible inputs from a device by the number of modes. That expands the input space, at the cost of forcing the user to remember the current mode and perform a separate action to switch modes if the current mode doesn't expose the action they want.
With a keyboard and modifier keys, there are already hundreds of unique key presses the user can invoke. If you need more than that, then you've probably got bigger UX problems than modes will solve. But with a mouse, you've only got a couple of buttons and one cursor. If you want clicking and dragging to mean different things, then modes really do help.
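A back-of-the-envelope illustration of that multiplication (all counts below are rough assumptions, not measurements):

    # How modes multiply a device's input space; the numbers are just guesses.
    KEYS = 48                 # roughly the printable keys on a keyboard
    MODIFIER_COMBOS = 2 ** 3  # Shift / Ctrl / Alt, each on or off
    print("keyboard chords:", KEYS * MODIFIER_COMBOS)  # hundreds, with no modes at all

    MOUSE_GESTURES = 6        # left/middle/right button, each as click or drag
    for modes in (1, 4, 8):
        print(f"mouse with {modes} modes: {MOUSE_GESTURES * modes} distinct actions")

With one mode the mouse tops out at a handful of gestures, so tools/modes are how mouse-driven apps get anywhere near the keyboard's input density.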
It's also the case that every app still has modes in the form of modal dialogs. And mobile apps tend to be heavily modal where you have many different screens you can be in and each processes input in a different way. That makes sense when you consider that a phone not only doesn't have a keyboard, but it doesn't even have mouse hover, multiple buttons, or modifier keys.
Modes also didn't take off anywhere else. There's no modal word processor, spreadsheet editor, or email client. VisiData is an extremely cool modal data exploration tool, but it's pretty niche. Firefox used to have Vimperator (which was inspired by Vim), but that's defunct now. Modal software means modal editing, which means vi.
That's a (little) bit unfair. First of all, Tridactyl is a successor to Vimperator with more than zero users!
We should also think about the software users control with small input devices: games!
Some games have a "photo" mode where a joystick moves the camera instead of the character. Lock-on is a mode that changes the coordinate system from cartesian to polar. I've played roguelikes where hjkl move your character but you press x to enter 'eXamine' mode to check items on the map.
In some games you can toggle sprint/crouch/aim-down-sights "mode" rather than holding a button.
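A toy sketch of that lock-on idea, i.e. one mode flag reinterpreting the same stick input (hypothetical game code, Python only for illustration):

    import math

    # The same stick input means different things depending on the current mode.
    def apply_stick(pos, target, stick_x, stick_y, lock_on):
        x, y = pos
        if not lock_on:
            # Free mode: the stick moves the character in cartesian space.
            return (x + stick_x, y + stick_y)
        # Lock-on mode: the stick is read in polar terms around the target.
        tx, ty = target
        radius = math.hypot(x - tx, y - ty) - stick_y       # forward/back changes distance
        angle = math.atan2(y - ty, x - tx) + stick_x * 0.1  # left/right orbits the target
        return (tx + radius * math.cos(angle), ty + radius * math.sin(angle))

    print(apply_stick((5.0, 0.0), (0.0, 0.0), 1.0, 0.0, lock_on=False))  # walks right
    print(apply_stick((5.0, 0.0), (0.0, 0.0), 1.0, 0.0, lock_on=True))   # orbits the target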
I'm no expert but I think Blender is modal. Other multimedia software has "draw" vs "select" modes to change mouse click behavior. Again, not an expert but I think toggling "edit mode" is important in Renoise.
Other modal software: tmux, i3 (also allows user defined modes!)...
I think hwayne nails it: the motivation is getting more input density from your limited device. In games the controller is even smaller than a keyboard, but I couldn't come up with any excellent examples.
I feel like that's also not really accurate, at least with respect to spreadsheets. You have a navigation mode where you're moving between and selecting cells, and an edit mode you switch into when you're editing the contents of a particular cell.
I feel it's important to point out that qutebrowser exists! It's been my main browser for almost a decade now.
Neovim, which aims to fix all of the baggage in vim's legacy, didn't consider user-defined modes an important feature.
There is https://github.com/neovim/neovim/issues/992 which dates to the year of Neovim's creation. So some form of custom mode is planned.
Back in the day I was taught that it’s better to think of vi’s insert “mode” as just a long argument to the insert command, not really a separate mode. That was more true of the original vi, which didn’t allow you to change the insertion point while inserting: every key was inserted verbatim except escape. Vim is more modal than vi in this respect, because it has commands while inserting, a different set of commands than in normal mode. (nvi is somewhere between old vi and vim.)
There’s also a progression from sed to ed to vi: they are cryptically terse editing DSLs with a lot in common, but with different interaction models: sed is batch mode, ed is a REPL, and vi immediately displays the effect of each command. The DSL as the user interface is more important than the modality, which is why kakoune and helix are interesting as different takes on an interactive editing DSL.
A cool demo of the Bravo editor mentioned in the post:
https://youtu.be/q_Na1SJXSBg?si=6fxEmasEueACuNeE&t=390
Timestamp is the beginning of the demo but the introduction is interesting too.
Anyone know what Emacs version Bill Joy was talking about that cost "hundreds of dollars" in 1984?
For text editing I'm pretty sure modal editing is just a massive cope / workaround for our terrible keyboard layouts that make modifier keys hard to use.
I had RSI about a decade ago and the first thing I did was learn vim, thinking keeping my fingers on the home row would help. It helped, but not enough, so since then I've been using split keyboards like the Kinesis which put the modifiers in the middle with easy thumb access.
But I started to wonder recently: is modal editing actually efficient if you have something like a Kinesis? On one hand, the "actions" in normal mode are shorter because they don't require any modifier keys. On the other hand, entering and exiting normal mode requires at least two extra key presses (maybe three if you use jk or fd or Ctrl-C etc.) for every text insertion. The balance really depends on the ratio between editing and typing, but unless you're just moving code around I sort of doubt modal editing can come out ahead. Even if it were slightly ahead in pure keypress efficiency, I think the greatly increased cognitive overhead almost certainly makes it not worth it.
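A crude keystroke model of that trade-off (every per-action cost below is an assumption, so treat this as a sketch of the ratio argument rather than a benchmark):

    # Compare total key presses for a modal editor vs. modifier chords on a good keyboard.
    def modal_cost(edits, insertions, keys_per_edit=2, mode_switch=2):
        # each insertion pays for entering and leaving insert mode
        return edits * keys_per_edit + insertions * mode_switch

    def chord_cost(edits, insertions, keys_per_edit=3):
        # each edit is a chord (count the modifier as one extra press); insertions cost nothing extra
        return edits * keys_per_edit

    for edits_per_insertion in (1, 2, 5):
        edits, insertions = edits_per_insertion * 10, 10
        print(edits_per_insertion, modal_cost(edits, insertions), chord_cost(edits, insertions))

With these made-up numbers the chord approach wins until you're doing a couple of edits per insertion, which is exactly the "depends on the ratio between editing and typing" point.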