Linus Torvalds Expresses His Hatred For Case-Insensitive File-Systems
73 points by laktak
I think Linus’s greatest strength is his willingness to delegate despite people doing things that don’t live up to his sense of perfectionism.
I don’t read this as him being angry at kernel contributors, but rather at those inconsiderate fools who wrote those no-good filesystem specs in the first place. I imagine the authors of his pointed-to commits agreeing with him, along the lines of “I know, this is terrible, but what are we supposed to do, that’s just what the spec says.”
This is silly, fake drama. It doesn’t matter, at this point, whether the FAT16 spec was “brain dead” or whatever mean thing he wants to call it: it’s not being revised. The horse has left the barn. The only real question is whether or not Linux can have drivers that can implement the spec. Same deal with (more to the point) NTFS and HFS+ and APFS: these allow for case-sensitive configurations, but in practice are rarely configured that way. I wouldn’t choose to format a volume with these filesystems, but I would like to be able to read and write such a volume without breaking it for other users, thanks.
It is, however, a direct reply to the original author of Bcachefs, a Linux-targeting FS with a case insensitivity option, so in the case of this specific email, it is targeted directly at a kernel contributor for having chosen freely to implement case insensitivity.
Given that the FS already had some users, and some case-insensitivity users, by the time merging it into mainline got discussed, accepting it with insensitivity was the best of the options. But writing messages to LKML as if the original implementation of case insensitivity had been a good idea still counts as provocation enough for a rant in reply.
HFS+ and APFS: these allow for case-sensitive configurations, but in practice are rarely configured that way.
I don’t know if this is still true but last time I looked into using a case-sensitive HFS+ filesystem as my root on OS X, I came across some pretty dire warnings that there were a bunch of applications that would subtly break, Photoshop and Illustrator being examples that I remember off the top of my head. Which… given that I was doing pretty heavy web development at the time, would have not been a Fun Time.
Microsoft OneDrive also refuses to work on systems where /Users is on a case-sensitive APFS volume.
I’m so glad HFS+ got replaced with APFS. The (non-standard) Unicode normalization in HFS+ was a disaster: it made some software just not work, and it caused even more confusion if you tried to share these files with other systems.
See, I wish they had adopted ZFS instead of writing APFS. We both agree on retiring HFS+, though!
They considered it, there was a lot of excitement for ZFS in the Leopard era, this post covers the history in detail: https://ahl.dtrace.org/2016/06/15/apple_and_zfs/
Agreed, I was around during that time and heard all the rumors and hoped it would come true. Of course Ellison ruined it, what else would one expect from Oracle.
In fact, I’d submit that (at least on older OS X) case sensitivity is really what UFS was for, if you really had to have it. My 10.4 file server has HFS+ and UFS partitions: the HFS+ part is where the Mac clients get stuff, and is insensitive as they expect, and the UFS portion is where the Unix clients get stuff, and is case-sensitive as they expect.
Not sure why we are still linking to Phoronix dumbing down the original source for broad mainstream consumption, but that whole LKML thread is full of gold, e.g.:
I think this is something that NTFS actually got right. Each filesystem carries with it a 128KiB table that maps each codepoint to its case-insensitive equivalent. So there’s no ambiguity about “which version of the unicode standard are we using”, “Does the user care about Turkish language rules?”, “Is Aachen a German or Danish word?”. The sysadmin specified all that when they created the filesystem, and it doesn’t matter what the Unicode standard changes in the future; if you need to change how the filesystem sorts things, you can update the table.
Without context the thought of having that 128KiB translation table around sounds completely cursed to me, but I agree when you are stuck with a case-insensitive fs, it’s probably the only correct way to deal with this problem.
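To make the NTFS idea above concrete, here is a minimal sketch, assuming a 16-bit table indexed by UTF-16 code unit (65,536 entries × 2 bytes = 128 KiB, matching the size quoted). The helper names (`build_upcase_table`, `names_equal`) are hypothetical, the table is filled from Python's own Unicode data rather than anything NTFS ships, and this is not NTFS's on-disk format.

```python
from array import array

# Hypothetical model of a per-volume upcase table: one 16-bit entry per
# possible UTF-16 code unit, so 65,536 x 2 bytes = 128 KiB.
TABLE_SIZE = 0x10000


def build_upcase_table() -> array:
    """Build the table once, at 'format time', from this Python's Unicode data.

    Only simple one-to-one mappings are recorded; anything else
    (e.g. 'ß' -> 'SS') maps to itself, as a fixed 16-bit table must.
    """
    table = array("H", range(TABLE_SIZE))
    for cp in range(TABLE_SIZE):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogate code units are not characters on their own
        upper = chr(cp).upper()
        if len(upper) == 1 and ord(upper) < TABLE_SIZE:
            table[cp] = ord(upper)
    return table


def utf16_units(name: str) -> list[int]:
    """Split a name into UTF-16 code units (little-endian)."""
    data = name.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def names_equal(table: array, a: str, b: str) -> bool:
    """Compare two names by mapping every code unit through the stored table."""
    ua, ub = utf16_units(a), utf16_units(b)
    return len(ua) == len(ub) and all(table[x] == table[y] for x, y in zip(ua, ub))


table = build_upcase_table()
print(len(table) * table.itemsize)                      # 131072
print(names_equal(table, "Readme.TXT", "README.txt"))   # True
print(names_equal(table, "straße", "STRASSE"))          # False: no 1:1 mapping
```

The design point is that comparison only ever consults the stored table, so a later Unicode release cannot silently change which names collide on an existing volume.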
And it wouldn’t be LKML if there wasn’t a whole other sidethread about Linus arguing with another person, I thought their blunt phrasing was actually hilarious in that context:
Kent: The subject is CI lookups, and I’ll eat my shoe if you wrote that.
Linus: Start chomping. [proceeds to show code he wrote back in 1997]
Not sure why we are still linking to Phoronix dumbing down the original source for broad mainstream consumption
Probably because lobste.rs has rules against “linking into projects’ community spaces” and nobody wants to test the exact position of the line.
I think you are misinterpreting the rule here; linking to ML archives always seemed fine. This is about GitHub issues, where (probably) half of the readers here are already logged in and could click a button to react.
I’m very definitely not. There have been moderation actions against exactly this type of link not long ago.
(In hindsight, what was wrong was my remark about “testing the exact position of the line” — per the thread above, the rule is in fact designed to be heavy-handed and have no exceptions, so no line-testing is even needed. Links to LKML are banned per pushcx, full stop.)
As an alternative, someone with an LWN subscription could find the LKML mails in their archive and link there instead.
Given the choice, I’ve always preferred case-sensitive file systems, but thinking about it, are those really better from the user’s perspective? Is it correct to assume that “foo.txt” and “Foo.txt” are different files? If I have two paper files on my desk with the same title but different capitalization, I would assume the content to be the same.
With ASCII it’s easy. But you will quickly learn that case insensitive matching in Unicode is not just a can of worms, but of eldritch horrors.
One famous issue is the Turkish capital I problem. In English, lowercase I is i; in Turkish, however, it is ı, and uppercase i is İ.
Clearly, the solution is that the file system should do locale-aware case insensitive matching. What could possibly go wrong?
I was setting up my Minecraft mods exactly the way the tutorial did it, and it wasn’t working. Turns out the solution was changing my JVM locale! I only learned why that fixed it about 8 years later.
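A quick way to see the Turkish problem is Python's `str.lower()`, `str.upper()` and `str.casefold()`, which apply Unicode's default, locale-independent mappings. The sketch below shows the asymmetries described above; the naive `fold_upper` helper is hypothetical. Java, by contrast, has a no-argument `String.toLowerCase()` that uses the JVM's default locale, which is one plausible way a locale setting can change whether a name lookup succeeds.

```python
# Python's case mappings have no Turkish mode: they use Unicode's defaults.
dotted_capital = "\u0130"  # İ  LATIN CAPITAL LETTER I WITH DOT ABOVE
dotless_small = "\u0131"   # ı  LATIN SMALL LETTER DOTLESS I

print("I".lower())                  # 'i'  (Turkish rules would give 'ı')
print(dotless_small.upper())        # 'I'
print(len(dotted_capital.lower()))  # 2: 'i' followed by U+0307 COMBINING DOT ABOVE


def fold_upper(name: str) -> str:
    """Naive 'compare everything in uppercase' folding (hypothetical)."""
    return name.upper()


# Under default rules, names differing only in dotted vs dotless i collide:
print(fold_upper("kırık.txt") == fold_upper("kirik.txt"))  # True
```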
I think the most sensible answer is only ASCII should match different letter-case. Every other character is fringe in file names anyways.
Unless your system explicitly prevents non-ASCII, you’ll have to deal with weird cases. Even an all-ASCII string might up/downcase to non-ASCII codepoints in certain locales. One man’s fringe is another’s everyday.
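The ASCII-only rule proposed above is easy to state in code; this is a sketch under that assumption, with hypothetical helper names. It deliberately ignores locales, so it sidesteps the Turkish mappings, and it leaves every non-ASCII letter case-sensitive, which is exactly the trade-off the reply objects to.

```python
def ascii_fold(name: str) -> str:
    """Lowercase only A-Z; every other code point is left untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in name)


def same_name(a: str, b: str) -> bool:
    """Two names match if they are equal after ASCII-only folding."""
    return ascii_fold(a) == ascii_fold(b)


print(same_name("README.md", "readme.MD"))    # True
print(same_name("Éclair.txt", "éclair.txt"))  # False: É is outside ASCII
print(same_name("İstanbul", "istanbul"))      # False
```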
I would assume the total opposite. They’re not the same name, why would I assume they’re the same thing?
It only gets weirder though when you have Unicode paths. Are А.txt and A.txt the same file? One file name is Cyrillic, the other is Latin; they have different names in some sense but it’s often impossible to visually distinguish them.
It is also hard to visually distinguish letter “l” and “I” (lowercase L vs uppercase i). Do you want to treat them similarly too? Visual distinction is different from actual distinction.
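The confusable pairs above are different code points, and none of the usual folding steps merge them. A minimal check (the `compare` and `nfkc_casefold` helpers are hypothetical; `unicodedata` is the standard library module):

```python
import unicodedata


def nfkc_casefold(s: str) -> str:
    """Compatibility-normalize, then case-fold."""
    return unicodedata.normalize("NFKC", s).casefold()


def compare(a: str, b: str) -> dict[str, bool]:
    """Report whether two names match under increasingly loose rules."""
    return {
        "exact": a == b,
        "casefold": a.casefold() == b.casefold(),
        "nfkc+casefold": nfkc_casefold(a) == nfkc_casefold(b),
    }


cyrillic_a, latin_a = "\u0410.txt", "A.txt"  # А (Cyrillic) vs A (Latin)
small_l, capital_i = "l.txt", "I.txt"        # lowercase L vs uppercase i

for a, b in [(cyrillic_a, latin_a), (small_l, capital_i)]:
    print(unicodedata.name(a[0]), "/", unicodedata.name(b[0]), compare(a, b))
```

Detecting lookalikes is a separate problem; Unicode's security mechanisms (UTS #39) define confusable data for that purpose rather than folding such characters into equality.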
Is it correct to assume that “foo.txt” and “Foo.txt” are different files?
Are color.txt and colour.txt the same file? Are Steuern.txt (de: taxes) and steuern.txt (de: to navigate) supposed to be the same?
A bit off-topic, but here’s what happened to Spotify with that line of thinking: https://engineering.atspotify.com/2013/06/creative-usernames/
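Whatever canonical form a system picks for names, it has to be idempotent: applying it twice must give the same result as applying it once, or stored and looked-up names can drift apart. A sketch, assuming NFKC followed by `str.casefold()` as the canonical form (an illustrative choice, not taken from the Spotify post), with a spot check over a few sample strings; `canonical` and `is_idempotent` are hypothetical helpers.

```python
import unicodedata
from collections.abc import Callable, Iterable


def canonical(name: str) -> str:
    """One possible canonical form: NFKC, then Unicode case folding."""
    return unicodedata.normalize("NFKC", name).casefold()


def is_idempotent(fn: Callable[[str], str], samples: Iterable[str]) -> bool:
    """Spot-check fn(fn(x)) == fn(x) on the given samples only."""
    return all(fn(fn(s)) == fn(s) for s in samples)


samples = [
    "BigBird",
    "\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30",  # modifier letters ᴮᴵᴳᴮᴵᴿᴰ
    "Stra\u00dfe",                                  # Straße
]
print([canonical(s) for s in samples])    # ['bigbird', 'bigbird', 'strasse']
print(is_idempotent(canonical, samples))  # True
```

The check only covers the samples; it does not prove the property for all strings. Unicode also defines a dedicated NFKC_Casefold mapping for this kind of use.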
Ouuu, thanks for the link, it is a perfect illustration of the specific problems Linus calls attention to:
How about the files “Interesting.txt” and “interesting.txt”? In languages that have a dotless “i” those are different words, but in languages that don’t they’re the same. Should the filesystem be tracking what language filenames are written in so that it can accurately be case insensitive?
Yeah, I think the main problem with this line of thinking is that capitalization is only one example of many, many different situations where two strings can “mean the same thing” but be composed of different codepoints. With capitalization it “seems obvious” that they are the same, but there are a bunch of other situations in other writing systems that are harder to normalize. So saying that filenames should be case-insensitive is really just saying “normalization should happen, but only for characters that Americans use”.
Yes, if we’re considering that A and a are equivalent, then why not A and à, a and а, leet and 1337, etc.? You can argue that one is justified but not the other, but there’s no universal consensus. The only sane approach IMHO is to not consider any of them equivalent.
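Each of those equivalences is a different normalization function, and each merges a different set of names. A sketch that checks a pair against progressively looser rules (the rule names and the `strip_marks` and `matching_rules` helpers are hypothetical; `unicodedata` is the standard library module):

```python
import unicodedata


def strip_marks(s: str) -> str:
    """Decompose, then drop characters with a nonzero combining class ('à' -> 'a')."""
    return "".join(c for c in unicodedata.normalize("NFD", s)
                   if not unicodedata.combining(c))


# Each rule is looser than the one before it.
RULES = {
    "exact": lambda s: s,
    "casefold": str.casefold,
    "nfc+casefold": lambda s: unicodedata.normalize("NFC", s).casefold(),
    "strip marks+casefold": lambda s: strip_marks(s).casefold(),
}


def matching_rules(a: str, b: str) -> list[str]:
    """Names of the rules under which a and b count as the same name."""
    return [name for name, rule in RULES.items() if rule(a) == rule(b)]


print(matching_rules("A", "a"))             # ['casefold', 'nfc+casefold', 'strip marks+casefold']
print(matching_rules("\u00e0", "a\u0300"))  # ['nfc+casefold', 'strip marks+casefold']
print(matching_rules("A", "\u00e0"))        # ['strip marks+casefold']
print(matching_rules("a", "\u0430"))        # [] (Cyrillic а)
print(matching_rules("leet", "1337"))       # []
```

Every looser rule merges more names, and nothing in the data says where to stop.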
I mean Han unification is basically the east asian equivalent of saying that “a” and “α” are equivalent, and we all know how well that went over.
Isn’t that a problem in any case? If I downcase text in a text file, it’s not going to know the language, either, unless I tell it somehow.
The filesystem has to leave it the hell alone. There’s no way that it can make reliably correct decisions, and if it tries it’ll just get in the way of higher-level software that might have more context about the user or the data.
My point is, it’s hardly a unique problem to the file system. If somebody is using a language with those characters then they need the correct locale setup or they’re going to have problems all over the place. If they have the locale setup correctly, then switching case in the FS will be handled the same way it is everywhere else automatically.
Of course the best solution is case sensitive file systems.
The problem with the file system is that it’s uniquely poorly suited to this problem. Its user interface is just a set of syscalls for file manipulation. Higher layers of the stack can use everything from environment variables to GUIs to allow the user to configure their language.
Should the kernel make opinionated decisions to provide a user interface making file system files similar to papers on your desk? Or should that be relegated to applications?
Or should that be relegated to applications?
That’s exactly what’s being done here. Case insensitivity as implemented is a toggleable flag with per-directory granularity.
That’s not the same as relegating the case insensitivity to the application. That’s still the kernel doing case sensitivity, it’s just inconsistent between directories.
That’s a pretty uncharitable way to reinterpret my point.
Case sensitivity has to be physically done in kernel even if it’s “relegated to applications”, because the kernel is the entity doing directory lookups.
In practice, desktop applications work with mostly disjoint sets of files (yes, this is true even for user data). Thus, when an application decides it wants the case-insensitive semantics, it flips the case folding switch for the directories it “owns”. Thus, “relegated to applications”.
Applications don’t own (most) directories.
Well, most directories will not (and are not supposed to) have case insensitivity enabled. I don’t see how this contradicts anything I’ve written.
It means that, when case insensitivity is configured per-directory, it’s handled by the kernel and configured by the user. It’s hard to view that as being “relegated to applications”.
Applications can’t make that decision, since the kernel is the interface to the file system. One application deciding to not have case sensitivity will break as soon as another app decides to make files with the same name modulo case. It has to be handled uniformly across applications, and the only way to do that is in the kernel.
macOS does something in between: it doesn’t allow creating a file when one with the same case-normalized name exists, but it will allow referencing the file given a different case.
Essentially it always normalizes the case at the filesystem level, except it preserves the given case when creating the file.
$ echo wow > afile
$ cat aFile
wow
I’m not sure whether it’s the best or worst of both worlds!
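The behavior described (store the name as typed, match it folded) fits in a small in-memory model. This is a toy sketch, not APFS: `str.casefold()` stands in for whatever folding the real filesystem uses, and the `CasePreservingDir` class is hypothetical.

```python
class CasePreservingDir:
    """Toy case-insensitive, case-preserving directory.

    Names are stored exactly as created; lookups and uniqueness checks
    use a folded key (str.casefold here, an assumption).
    """

    def __init__(self) -> None:
        self._entries: dict[str, tuple[str, bytes]] = {}  # key -> (stored name, data)

    @staticmethod
    def _key(name: str) -> str:
        return name.casefold()

    def create(self, name: str, data: bytes = b"") -> None:
        if self._key(name) in self._entries:
            raise FileExistsError(name)
        self._entries[self._key(name)] = (name, data)

    def read(self, name: str) -> bytes:
        try:
            return self._entries[self._key(name)][1]
        except KeyError:
            raise FileNotFoundError(name) from None

    def exists(self, name: str) -> bool:
        return self._key(name) in self._entries

    def rename(self, old: str, new: str) -> None:
        old_key, new_key = self._key(old), self._key(new)
        if old_key not in self._entries:
            raise FileNotFoundError(old)
        # Re-casing the same entry is allowed; clobbering another entry is not.
        if new_key != old_key and new_key in self._entries:
            raise FileExistsError(new)
        _, data = self._entries.pop(old_key)
        self._entries[new_key] = (new, data)

    def listdir(self) -> list[str]:
        return sorted(stored for stored, _ in self._entries.values())


d = CasePreservingDir()
d.create("afile", b"wow\n")
print(d.read("aFile"))      # b'wow\n'  (like `cat aFile`)
print(d.exists("aFile"))    # True
d.rename("afile", "aFile")  # same entry, new case
print(d.listdir())          # ['aFile']
```

It also shows why a file manager that first asks whether aFile exists will refuse to re-case afile: under folded comparison, the answer is yes.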
From experience I can say that this is absolutely horrible and causes tons of issues.
Yes, some file managers (including Finder, the last time I tried) won’t let you rename afile to aFile. They check first and aFile already “exists”.
An experiment on macOS 15.3.2, on an APFS volume of the default kind, which is case-insensitive:
touch afile: creates it
ls: prints aFile
ls afile: prints afile
ls AFILE: prints AFILE
ls a<Tab>: completes aFile
ls af<Tab>: does not complete
The behavior is not different if I use mv instead of Finder to rename.
The primary problem I’ve experienced with this scheme is that although git is case-sensitive, if I re-case a file git is already tracking, git will not notice it as a rename because the file system still finds the file at the tracked name. Now my working copy is out of sync and it only Works on My Machine. At one point I used a case-sensitive APFS volume for repo storage, but it was more complicated than the workaround: Rename the file in a way that’s more than re-casing, commit that, do it again to the correct name and case, then amend.
Why do we even have that lever
Yeah, I encounter this all the time and it’s infuriating. I’ll make a source file representing a class, then realise I accidentally made it lower case instead of camel case, and then to rename the file to camel case, I need to do this stupid dance of first renaming it to something I don’t want and then renaming it a second time.
Some of the stickiest “things programmers believe about <domain>” come from assuming that a relationship is one-to-one. A letter is not a character; there could be many. A character is not a code point; there could be many. A code point is not a grapheme cluster; it may take many to make one. A grapheme cluster is not a glyph; how many it takes depends on the font. (I sure hope that was accurate.) Which of these things is a string a list of, such that you can derive length or equality from them? Even before we talk about file systems and VCS, programming languages disagree, and some take years to make up their minds.
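Python's `len` counts code points, so the same visible text can have different lengths depending on how it is composed and encoded. A small illustration (the `sizes` helper is hypothetical; Python's standard library has no grapheme-cluster segmentation, so that level is not shown):

```python
import unicodedata


def sizes(s: str) -> dict[str, int]:
    """Length of a string in three different units."""
    return {
        "code points": len(s),
        "utf-8 bytes": len(s.encode("utf-8")),
        "utf-16 units": len(s.encode("utf-16-le")) // 2,
    }


precomposed = "\u00e9"          # é as a single code point
decomposed = "e\u0301"          # e + COMBINING ACUTE ACCENT
flag = "\U0001F1E9\U0001F1EA"   # regional indicators D + E, displayed as one flag

for s in (precomposed, decomposed, flag):
    print(sizes(s))
print(precomposed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```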
At least in the Unix, Linux and BSD world, case-sensitive filesystems are an accepted norm, and trying to deviate from it is basically digging a hole for yourself and the users.
I understand that the man is a legend, and he will always be among geeks and nerds of the wonderland that we live in, but why do we dramatize, discuss, politicize, argue about every opinion of his?! It’s repetitive, unnecessary and not in the spirit of thoughtful community (Linux should run across the world, and galaxy and then some with full force of open-source nerds offering tech support to them all) :-)
That being said, designing anything, let alone an important building block like a file system, without case sensitivity opens a Pandora’s box of bugs, and I say that having some experience with authentication and other client-side validation.
At work, they give engineers Macs, but we have to ssh into Linux boxes to actually do development work, because we’ve got old Linux files whose names differ only by case, so the repos can’t be cloned on the Macs. I’d love to actually use my M3 Max cores for building, but instead I get an old 4C/8T workstation that’s a lot slower (though it’s fine).
Ultimately I’d love to just run a Linux distro of my choice on the Mac but IT isn’t going to allow that any time soon…
You can get around this issue entirely on macOS by using Disk Utility.app to create a new Volume Image with case sensitivity, and just use that as your work/repo folder, if possible.
If you want to do it at the command-line, maybe as some sort of repo setup requirement, you can do it like this:
$ hdiutil create -size 100m -fs "Case-sensitive APFS" -volname aCaseOfSensitivity aCaseOfSensitivity.dmg
$ open aCaseOfSensitivity.dmg
$ touch /Volumes/aCaseOfSensitivity/Yo
$ touch /Volumes/aCaseOfSensitivity/yo
$ ls /Volumes/aCaseOfSensitivity/
Yo yo
Disk Utility.app and hdiutil go hand in hand, so you can do it either way.
You don’t need to create a disk image. If you create a new case-sensitive volume within the main system APFS container, it will share space with the main volume and you don’t have to worry about deciding on its size or manually mounting it.
For a while, that was a necessary step for building AOSP on a Mac. I’m not sure if that’s still the case or not, since it’s been a long time since I’ve needed to build AOSP on a Mac or elsewhere. I think the ability to create case-sensitive images might be what keeps it from being a larger issue on the Mac. Unfortunately, they’re a little slow when it comes to something like compiling a large project that creates many thousands of small files, though probably not as slow as a VM would be.
A long time ago I was trying to build some software on OS X and the errors I was getting were completely baffling. On Linux it all built fine. On OS X it wasn’t even trying to compile the same files as it was on Linux. Eventually after doing some diffing I discovered that the Git repo had a file called Makefile and a file called makefile in it… Almost flipped my desk over.
The repo for the Linux kernel today has several file paths that differ only by case too. I’m not sure anyone besides me is interested in trying to cross-compile the Linux kernel from macOS (more of an academic exercise) but it mildly annoyed me…
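Before cloning a repository onto a case-insensitive volume, one can look for paths like Makefile/makefile ahead of time. A sketch, assuming `str.casefold()` approximates the target filesystem's folding (APFS or NTFS may differ in edge cases); `case_collisions` is a hypothetical helper. It checks directory prefixes too, since Docs/ and docs/ would also merge.

```python
from collections import defaultdict
from collections.abc import Iterable


def case_collisions(paths: Iterable[str]) -> list[list[str]]:
    """Group paths (and directory prefixes) that fold to the same name."""
    groups: dict[str, set[str]] = defaultdict(set)
    for path in paths:
        parts = path.split("/")
        # Every prefix counts, so "Docs/a.txt" and "docs/b.txt" collide on the directory.
        for i in range(1, len(parts) + 1):
            prefix = "/".join(parts[:i])
            groups[prefix.casefold()].add(prefix)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


paths = ["Makefile", "makefile", "src/main.c", "Docs/a.txt", "docs/b.txt"]
print(case_collisions(paths))  # [['Docs', 'docs'], ['Makefile', 'makefile']]
```

In a Linux checkout, the input could be the output of `git ls-files`, one path per line.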
(haven’t tried with a disk image or APFS volume yet so I’d still like to give it a shot sometime)
There was an option to format an HFS+ drive with case sensitivity. If that is still an option, you just have to make another volume (never the system volume).
Sadly it’s not, at least as far as I’m aware. They (IT) format the whole thing when they give it to us. I’ve been trying to use OCI containers but haven’t gotten it to work yet (nor tried too hard lol)
Perfect example of why I don’t bother getting familiar with osx or windows. Such ridiculous problems are just too frequent. And life is too short to waste time battling them.
I’ve always found it strange that the name “case-insensitive” means “a filesystem that considers a to be the same as A”. I instinctively interpret “insensitive” as “this filesystem doesn’t care about case at all”, so my brain defaults to “oh, so it distinguishes names based on their octet sequence alone, right?”
I have to think that terminology through every time too even though I’ve programmed for HFS+!
What filesystems are people using on the Linux desktop these days? I run a Linux server on a rolling distro (Gentoo) that has been running since 2015. It has ext4 and ZFS on the 4 × 2 TB disks (which are due for an upgrade in size).
If you don’t need to extract every last bit of I/O performance, just do btrfs. It works, it has all the features, and it is extremely flexible when it comes to resizing/reshaping the volumes.
It is also the default for Fedora. Raid1, with encryption and compression was a few clicks in the installer.
If you don’t need every last bit of I/O performance, just use ZFS. It’s very convenient and lets you use snapshots and zfs send to do semi-real-time backups to the ZFS pool on your server.
With one caveat: if you choose ZFS for your desktop, you cannot follow the latest kernel right away, because it takes time for the module to be ported to each new kernel.
ext4, it will not stop working when you get close to filling the filesystem, it has a good old fsck and it will keep your files.
ext4, <…> it has a good old fsck and it will keep your files.
Except when it won’t.
Just a week ago, I was helping a friend rescue his files from an ext4/lvm sandwich that somehow started spitting out nondeterministic garbage. This was caught when he downloaded a huge archive of historical data over many days, which failed to extract, then downloaded it two more times, at which point it turned out all three copies hash to different values and the first one changed its hash in the meantime. (No, there were no memory issues with the box, an LTS kernel was being used, and fsck was perfectly happy all the while.)
So yeah. It will keep your files, as long as you’re an ostrich.
Wondering if there’s a happy medium, e.g. you can have a folder called “Home”, but internally it’s “home”; you can rename it to “hOme”, but you can’t have two folders in the same directory called “Home” and “hOme”, respectively? Or maybe that’s already how things are?
I agree with Linus here. The concept of a path in Linux is simple: it’s a string with / as the path separator (terminated by a null character in C). Adding case insensitivity invites edge cases that shouldn’t exist. A glaring one I found recently is a file name whose case differs between the git index and the file on disk on Windows.
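The byte-oriented model described above can be sketched as follows, assuming a bytes path as the kernel receives it. `split_linux_path` is a hypothetical helper, and it ignores per-directory casefold features like the ones discussed in this thread.

```python
def split_linux_path(path: bytes) -> list[bytes]:
    """Split a path into components: raw bytes separated by b'/'.

    At this level a component may contain any byte except b'/' and NUL
    (individual filesystems may be stricter). Empty components from
    repeated slashes are dropped; '.' and '..' are kept as-is.
    """
    if b"\0" in path:
        raise ValueError("NUL byte in path")
    return [part for part in path.split(b"/") if part]


a = split_linux_path(b"/home/me/Makefile")
b = split_linux_path(b"/home/me/makefile")
print(a == b)                              # False: compared byte for byte
print(split_linux_path(b"//tmp//\xff"))    # [b'tmp', b'\xff'] (not UTF-8, still a legal name)
```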
When programs assume a case-insensitive file system (like some apps on macOS), I am just horrified. It looks like code ported from DOS or Windows that read and wrote the same files using different cases. Bad implementation, I say, and it should be fixed.
What if you are trying to copy files from a case-sensitive file system to an insensitive file system? I think it would be super annoying to do that.
Both directions pose problems, plenty of examples in this thread.
In some ways it’s a pointless debate: case-insensitive systems exist, so we have to make them work. But they’re a self-inflicted wound, so one could hope that they get deprecated over time, if not phased out.