The Audio Stack Is a Crime Scene
134 points by alterae
It started simple. ALSA was the kernel-level driver layer: Advanced Linux Sound Architecture.
That’s not where it started, that’s where it started to go wrong. Originally, there was the Open Sound System (OSS). OSS was developed by a small company. Their business model was simple: the core was open source, and they sold proprietary drivers on top of it.
This wasn’t working very well, so they decided that OSS 4 would be proprietary (this worked even less well, but that’s a much later and irrelevant bit of the story).
The FreeBSD reaction to this was to take the last BSD-licensed version of OSS 3 and fork it. They extended it to support the newer OSS 4 APIs (which were nicer, but mostly backwards compatible), to have multiple virtual channels, and to provide per-channel volume controls.
The Linux reaction was to have a temper tantrum, rip out OSS, and replace it with ALSA, which had a half-finished OSS compatibility layer but required rewriting your software to properly use ALSA.
For some reason, the kernel that has a reputation for not breaking userspace is Linux. Meanwhile, software that plays or records audio written for FreeBSD 4 has worked unmodified with every version of FreeBSD since.
And on the userspace side, PulseAudio wasn’t the beginning either. The Enlightened Sound Daemon (esd) was the first widely used sound server, especially once GNOME adopted it. When PA came along it had to provide esd compatibility.
Yup, that was the period where I abandoned Linux. KDE and GNOME each had their own sound daemon because it was the only way to have two programs go ‘ping’ on Linux. Meanwhile, on FreeBSD, there was in-kernel sound mixing and they could all just talk OSS directly.
Ironically enough, by the time PA popped up, that was no longer a problem, as ALSA already had softmixing support by then. It was part of the reason why people hated Pulse so much early on, it didn’t really do much that couldn’t already be done, and broke all the time.
ALSA had hardware mixing from the start. However, at the same time, laptops became popular and sound cards with hardware mixing became rarer. There were limitations on ALSA software mixing, notably that clients had to match the bitrate. Moreover, ALSA was not very user-friendly when you had Bluetooth or multiple output devices, and these things were becoming common at the same time. ALSA was also quite low-level and more complex to implement in applications than OSS (and than PulseAudio).
PulseAudio was a bit like systemd. People with simple usecases didn’t like the change when they had a working setup. But for people needing Bluetooth or having multiple output devices, PulseAudio was a blessing.
I’m talking about softmixing, not hardware mixing – I don’t think I ever had a system with a soundcard that supported hardware mixing (edit: actually, I did – but I definitely never had one that I ran a Unix on :-) ). And that was circa 2008, when ALSA wasn’t exactly the main obstacle when it came to Bluetooth headphones (or Bluetooth anything, for that matter) on Linux.
Easier Bluetooth interaction was definitely a plus, but the way I experienced that very painful transition was that, while it was generally easy to get Bluetooth headphones to pair and show up as valid sound sinks, getting them to actually put out so much as a beep was orders of magnitude harder than troubleshooting ALSA.
I recall that there was something annoying. I think ALSA had mixing for ALSA clients, but the OSS compatibility layer bypassed the ALSA layer that did the mixing, so things broke if you were using some things that hadn’t been ported to ALSA (and things being ported to ALSA took a long time).
Yeah, I remember that, too, anything that used OSS took over the soundcard. I don’t remember the mechanics of that, though – i.e. I don’t recall if there was an alsa-oss layer in-between and it didn’t mix well, or… that was forever ago.
I think it was kind of exacerbated by the fact that, throughout the time that the OSS compatibility layer was truly relevant, ALSA didn’t support proper softmixing, so basically everyone ran ESD or ARTS (I swear to God I’ll never remember how that’s capitalised, artS? ARTs?). Sound servers also took over the sound card, so you ended up with three things fighting over it (OSS applications, ALSA applications that didn’t go through the sound server, and the sound server itself).
By… I don’t know, 2005, 2006 or so, I think, that was no longer too relevant, as ALSA had proper mixing support by then and ESD/ARTS gradually lost relevance, although some distributions were slow to remove them. Sound basically worked fine for about a year or two, until 2008-ish, when PulseAudio got thrown into the mix. That’s how I ended up buying a Mac, actually :-D.
ALSA didn’t support proper softmixing, so basically everyone ran ESD or ARTS (I swear to God I’ll never remember how that’s capitalised, artS? ARTs?). Sound servers also took over the sound card, so you ended up with three things fighting over it (OSS applications, ALSA applications that didn’t go through the sound server, and the sound server itself).
Yup, that was what drove me to run FreeBSD. I was using Psi (KDE) and Evolution (GNOME) and they each talked to different userspace sound daemons and I wanted new-message notifications from both. I was also using XMMS to play music, which could talk OSS or ALSA but couldn’t (yet?) talk to either of the sound daemons. And I wanted all of these to keep working when I played a game full screen in the foreground.
With FreeBSD, this was possible (with FreeBSD 5 it was automatic, with 4 it required a bit of manual configuration). With Linux, it worked with my SB Live! but not my onboard audio. And the SB Live! used the emu10k driver, which was a buggy pile of awfulness that caused kernel panics at least once or twice a day (Creative’s Windows drivers were much better, they barely crashed the kernel once a week. The MS drivers didn’t enable most of the features, but didn’t crash), so I really wanted to use the onboard audio with Linux.
But those sound daemons were for OSS, because OSS did not support having two programs go ping. In fact, I believe ALSA supported having multiple programs go ping before OSS did.
Weren’t both ALSA and newpcm created several years before OSS became proprietary?
For some reason, the kernel that has a reputation for not breaking userspace is Linux.
Ah, well, you see, external kernel modules are literally evil and thus anyone relying on them had only themselves to blame ;)
Weren’t both ALSA and newpcm created several years before OSS became proprietary?
It seems like the ALSA project had started already, but OSS becoming proprietary is what triggered the kernel community to completely abandon OSS in favor of ALSA:
https://en.wikipedia.org/wiki/Open_Sound_System#History
In 2002, Savolainen was contracted by the company 4Front Technologies and made the upcoming OSS 4, which includes support for newer sound devices and improvements, proprietary. In response, the Linux community abandoned the OSS/free implementation included in the kernel and development effort switched to the replacement Advanced Linux Sound Architecture (ALSA).
That’s not where it started, that’s where it started to go wrong.
Sorry I missed out OSS.
I was aware of it, but that was before I started using Linux, therefore I have no experience with it.
Heh, the first time I compiled my own kernel was to get ALSA (which was experimental and not in my distro’s packages yet). Yet the reason I needed ALSA wasn’t for audio drivers, it was for a (win) modem driver!
I remember that back in the OSS days, one could even play audio by doing cat some-audio.wav > /dev/dsp
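(The rough modern equivalents, for anyone who wants to try the same trick today:)
aplay some-audio.wav     # ALSA's command-line player
pw-play some-audio.wav   # PipeWire's player, part of the pw-cat tools
paplay some-audio.wav    # via the PulseAudio API; also works against pipewire-pulse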
That’s not where it started, that’s where it started to go wrong.
I don’t know about that. ALSA is part of the Linux kernel, not something a third party corporation maintains. We can count on it being there. Improvements to it are global to all Linux systems. ALSA works really well in my experience. Every computer I’ve ever had with Intel HD Audio codecs pretty much just worked for me and I never had any issues.
Getting features into the Linux kernel is the best long term solution in my opinion. It’s Linux user space that is and has always been a mess.
And FreeBSD’s audio system is also part of FreeBSD. They already had a fully OSS-compatible sound system that didn’t need a third party corporation, before OSS became proprietary.
Exactly. The version of OSS in the Linux kernel was open (I think it was GPL’d, possibly just GPL-compatible; I never looked closely at how they sold proprietary device drivers). The same route that FreeBSD took of forking the last open version and updating it was available to the Linux kernel. But they decided to break userspace and force decades of churn on userspace programs instead.
I feel like I live in a complete alternate universe from a lot of folks, evidently including the author, wherein my audio on Linux just works these days, better than it ever has. PipeWire scared me a bit - oh great, more RedHat-isms, more GNOME-isms, more PulseAudio-isms… right? Well, by the time it became widespread - no, not right. My biggest bone to pick with it is that it’s kinda fussy about exact ordering of how you launch your Wayland compositor session, associated Dbus user socket, itself, and anything that plugs into it (like pipewire-pulse). Whoop-de-do. Otherwise, PipeWire is plug-and-play perfect for me on all hardware I’ve ever thrown at it. Bit-perfect resample-less plug-and-play at low latencies at any source sample rate that happens to be playing.
I don’t say this to discredit the author, I say this to paint an idea of how wide the range is. For some of us, PipeWire is the first and probably only audio tech stack that has ever worked correctly on Linux… or maybe the second, after OSSv4 (my beloved) way back in the day.
If you didn’t see the author’s first article you might not know the context: the author is visually impaired, and his use case requires specialized software that reads aloud what’s shown on the screen.
As another data point:
I’m a long time Linux user (since 1998), and I didn’t even know Pipewire exists. But I just checked and it’s what I’m currently using. I remember the Pulseaudio transition alright - it sometimes made sound quality awful, so I uninstalled it and everything got better. Later Pulseaudio seemed to just work and I stopped noticing it.
Now it’s switched to a completely different system and I had no idea at all.
Pipewire is kinda interesting because its main selling point is the flexible node graph. Under the hood pipewire uses a fork of the Wayland protocol, so it’s extensible. If you run pw-dump you can see the whole internal node graph. There’s lots of information about everything audio/video related and the processes which play audio/video. This means pw is flexible enough for all sorts of audio setups and things that haven’t even been thought of yet.
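A couple of concrete ways to poke at that graph (a sketch; the jq filter assumes pw-dump’s usual JSON field names):
pw-dump | jq -r '.[] | select(.type == "PipeWire:Interface:Node") | .info.props."node.name"'   # name of every node in the graph
pw-top          # live view of nodes with their rates, quantum and xruns
pw-cli ls Link  # raw listing of the links connecting ports together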
Pipewire improved things a lot, but I frequently still have to do a random sequencing of Pipewire and Chrome restarts to get audio working when I use a Bluetooth headphone. Wired works great though. So it’s most likely caused by another part of the stack (BlueZ?).
The lack of introspection into the sound system is a real problem that affects everyone: How is even a non-visually impaired user supposed to fix a sound system that doesn’t work under Linux right now? Where do you even start?
I suspect a big part of people’s frustration with Linux sound in the past has been due to this opacity - how do you fix something when you have no idea which part is broken, nor any obvious way to explore & debug the system?
The failure to make the tools that do exist properly accessible is just a further kick in the teeth if you’re actually visually impaired.
I am not blind, but my sound broke on my arch laptop (yes, yes, insert tired old “I use arch” meme here) a few weeks ago and I have literally no idea what’s wrong and no idea how to go about fixing it. I empathize with everything in this article.
I did learn about pipewire though (I’m still on pulseaudio) so maybe I’ll see if I can switch that out and if it solves anything.
arch broke my internal microphone recently and it turned out to be the alsa settings getting messed up, so that’s worth checking https://old.reddit.com/r/archlinux/comments/1k79zcm/loud_internal_microphone_hissing_after_switching/
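(If it does turn out to be ALSA mixer state, something along these lines is usually enough to inspect and reset it; assuming card 0, and root for the store/restore:)
alsamixer -c 0      # interactive view of the card's mixer controls, mute switches, boost levels
alsactl store       # snapshot the current (working) mixer state to /var/lib/alsa/asound.state
alsactl restore     # put the saved state back after an update scrambles it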
At work, an Ubuntu distro upgrade broke the audio system. Perfectly able-bodied people were incapable of fixing the breakage over weeks (without downgrading or restoring the system).
Ok but compare this experience to the popular alternatives: on Windows or OSX if audio is broken for you there’s absolutely no chance you can figure it out because the components are proprietary opaque boxes, and your only option is to wait quietly and hope that someone at Microsoft or Apple notices.
Audio on Linux has a lot of components that are hooked up in complex ways but least it’s not impossible to e.g. isolate which component is dropping the ball so you can report it.
There’s definitely room for improvement though. alsa-ucm-conf is an absolute mess, for instance.
At work, I’m using Ubuntu 22.04. Everything the author is talking about with respect to Bluetooth is an issue for me:
It never occurred to me how much worse the situation must be for the blind.
Bluetooth on Linux is 100% not worth it. It simply cannot be coerced into working anywhere near what could be described as well.
What you want is https://us.creative.com/p/speakers/creative-bt-w6 which will appear to Linux as an audio sink. The dongle then handles the Bluetooth part of it. It works reliably and supports loads of codecs to improve output.
Bluetooth on Linux is 100% not worth it.
This is a sweeping generalisation that’s simply untrue. Configuring Bluetooth on Linux for most users is 100% worth it, because they’ll get something that works at least good enough, if not completely well enough for their use case.
Yeah, but it’s also amazing just how broken it is. I don’t think I’ve ever had a hardware/distro/peripheral combination where Linux Bluetooth just works reliably. It’s a bloody mess.
ChromeOS gave up on BlueZ and is replacing it with Android’s Bluetooth stack, which is better designed anyway: it moves lots of complex code from the kernel into userspace. However, distros don’t ship it because, to say the least, it’s not friendly to distro packaging, and I suspect it has no integration with PipeWire either.
The one fundamental problem is that some of the better codecs require royalties, and many distros can’t ship anything that requires royalties. That means that these codecs won’t be available.
My experience for the past few years on Debian 12 + KDE is that it seems to work fine IFF you use bluetooth devices that behave sanely. For me, that’s just some cheapo headphones off Amazon and a bluetooth speaker. Whenever I have a problem, it’s almost always the device’s fault. (I also don’t use bluetooth mics, maybe that has something to do with it.)
I use bluetooth audio pretty much exclusively on my Linux systems because it Just Works 🤷♂️
This honestly sounds just like a friend’s experience with Bluetooth and audio under macOS. I can only conclude that the whole ecosystem is irreparably busted, and will hold onto my TRRS-ended wired headphones until there’s no way to use them.
For what it’s worth, I took a look at my notes on 22.04 and this guide I had marked as “the only thing that actually helped” with the Bluetooth audio profile: https://askubuntu.com/a/1389217
I had the same problems, I updated to 24 and it works fine. Ubuntu 22.04 does not use pipewire, rather the old (shitty) pulseaudio.
I’m on Debian 12 and Bluetooth has not been an issue for me (so far). I remember it being awful on EndeavourOS.
You want to play sound? You need:
ALSA, because it’s the actual driver
PipeWire, because it’s the new standard
Pulse emulation, because most apps still use Pulse
ALSA plugins, because some things bypass PipeWire
JACK shims, because a few pro audio tools never moved on
And config files for all of it—if they even exist
This isn’t backwards compatibility.
This is a graveyard, and we’re all just camping in it.
Linux Audio is one of the purest examples of JWZ’s CADT model of Linux Software development:
It hardly seems worth even having a bug system if the frequency of from-scratch rewrites always outstrips the pace of bug fixing. Why not be honest and resign yourself to the fact that version 0.8 is followed by version 0.8, which is then followed by version 0.8?
PipeWire solved many genuine problems, and handles both audio and video. Invoking CADT is nonsensical, since PW works better than the old stuff ever did. Bluetooth just works, the JACK shim just works. PW was created because Linux didn’t have a good solution for video; audio was added basically as an afterthought, and yet it ended up just being better.
I remember how it used to be, I had to run the Pulseaudio server on top of JACK if I wanted to do music production on my normal desktop without making all normal apps stop working. It was super janky. Now? I launch REAPER and it just works, it asks for low latency and it gets it.
Of course PipeWire solved genuine problems. Nobody, and certainly not me, said otherwise.
If you think invoking CADT is non-sensical, or implies that PipeWire is useless or solved nothing new, you’ve failed to understand what JWZ was criticizing with CADT. The issue isn’t that PipeWire was a useless rewrite, it’s that, inevitably, instead of being worked on and fixed until all of the corner cases are solved and it displaces all that came before it as the One True Linux Audio Solution that will be maintained in perpetuity and improved steadily, it will get about 80% of the way there before the community gets bored of doing the hard final 20% (which won’t be the same hard 20% as the hard 20% that led to Pulseaudio getting abandoned – every new rewrite that displaces the old does bring new things to the table) and decides that what PipeWire really needs is a from-scratch rewrite “done right this time” – at which point it will become yet another Jenga block in the ever-growing pile of 80% Linux audio solutions that are all trying to maintain compatibility for all the apps using some incoherent mix of all that came before.
The issue isn’t that PipeWire was a useless rewrite, it’s that, inevitably, instead of being worked on and fixed until all of the corner cases are solved and it displaces all that came before it as the One True Linux Audio Solution that will be maintained in perpetuity and improved steadily, it will get about 80% of the way there before the community gets bored of doing the hard final 20%
I see absolutely no reason to believe this will be the case. The argument seems to just be “because PW was created instead of making Pulse better, the same thing will happen with PW”. But there is no “the community” getting bored of one thing and picking up another. There was no decision that what we really need is a rewrite. There was one guy deciding to solve another problem, video, by happenstance also solving audio, and the community of users switching over because in the span of just a few years it far exceeded Pulse.
I think Pulse was simply fundamentally unsuited to being the universal solution to audio on Linux (hence JACK), and so it was never going to solve the problems solved by PW. It has nothing to do with boredom, or solving the hard 20%. By the time PW was created, Pulse was in fact working properly! The old days of Pulse never quite working were over. The developers had solved the problems! The issue is that it only solved the problems that were the problems that Pulse set out to solve, not the problems that it did not set out to solve. Pulse’s architecture is quite bad for low latency applications, for instance. No amount of “doing the final 20%” would fix that.
If we had stayed with Pulse, I personally would’ve had to stay on the actual Jenga tower of running Pulse on JACK forever.
If you want to invoke CADT, you need to point to the same people always reinventing stuff over and over. Not completely different people who have nothing to do with each other solving different problems.
Edit: in the context of this sub-thread, I find it slightly funny that the author of the article actually wants a total rewrite of everything: https://lobste.rs/c/zeds2s
If you want to invoke CADT, you need to point to the same people always reinventing stuff over and over. Not completely different people who have nothing to do with each other solving different problems.
I don’t think you do; I think you need to point to a project culture that tolerates such behaviour. And I think it’s pretty clear that the FLOSS community generally does.
I don’t think you do; I think you need to point to a project culture that tolerates such behaviour. And I think it’s pretty clear that the FLOSS community generally does.
I think it’s very insulting to call the developers of PipeWire “attention deficit teenagers” just because it seems to you like it is a common problem in the “FLOSS community”.
Like, what part of this describes PipeWire at all?
This is, I think, the most common way for my bug reports to open source software projects to ever become closed. I report bugs; they go unread for a year, sometimes two; and then (surprise!) that module is rewritten from scratch – and the new maintainer can’t be bothered to check whether his new version has actually solved any of the known problems that existed in the previous version.
Spoiler alert: bugs are fixed very frequently in PipeWire, and rarely are any rewrites involved.
Like, literally nothing in jwz’s CADT rant actually applies to PW. Nothing. The way it is being invoked in this thread is purely as a loose dot connect and thought terminating cliché. 25 years ago the Linux audio stack had parts constantly replaced and rewritten and everything was in flux, and 20 years later someone creates a replacement, which proves that the Linux world is still totally CADT! Despite the replacement solving almost all of the problems, and continuing to get constant care and attention and fixes without being abandoned.
Sorry about my anger. It upsets me that people are shitting on other people’s hard work with arguments that don’t work, with no evidence other than vibes.
Okay, fair. On the basis of that, PipeWire looks like it might be an exception - if the community as a whole sticks with it. I hope they do.
I’m not sure “solving almost all of the problems” is true, though, based on the author’s report. The Linux audio stack is still pretty cursed, even if one component of it (PipeWire) isn’t.
🙏
I should clarify. I think PipeWire has solved almost all of the problems that can be solved by improving the state of sound servers on Linux. It can’t solve problems that are in the kernel, with ALSA and BlueZ. So I said elsewhere in this thread that Bluetooth has been working super great with PW, and that is true, but I did have one problem since switching, caused I eventually found out, by the kernel. My old “Cambridge Silicon Radio, Ltd Bluetooth Dongle” stopped working, seemingly due to a bug in the kernel that has yet to be fixed. I had to buy a new Bluetooth card. So I should’ve been clearer, I didn’t want to bring up kernel problems when talking about PipeWire, as that seems unfair to me, but I should’ve been clearer about the scope of my statement.
So the other thing I brought up elsewhere in this thread, PW’s JACK being so smooth and just working, that’s more what I’m talking about. I plug in my guitar, route it to Reaper for recording or digital effects, and it all just works with no fuss. No special configuration, no special software. Back when I used JACK, it seemed like there was always stuff that needed poking for everything to work as it should. And now? Not only does it save me effort, I’ve even seen some audio professionals online talk about how PipeWire has finally made Linux viable, how it’s better and more reliable than low latency audio on Windows. Like, what? That’s crazy! And amazing!
As for solving the other stuff, I wonder how hard it would be to bring support for Android’s BT stack into PW, so we could replace BlueZ with something better.
I see absolutely no reason to believe this will be the case
Because that’s the history
But it isn’t. The Linux audio stack was in crazy flux with constant churning and replacement, 20 to 25 years ago. Which means we had 20 years of not seeing things always being rewritten from scratch instead of being fixed. Since PipeWire seems to be fundamentally more capable and flexible than Pulse, with very healthy development and issues constantly being fixed, I see no reason to think anything like what is described by jwz’s CADT will happen.
If you want to invoke CADT, you need to point to the same people always reinventing stuff over and over. Not completely different people who have nothing to do with each other solving different problems.
The hell I do. What, precisely, do you think the Cascade in CADT is referring to if not the constant churn of different developers showing up to reinvent the wheel?
CADT is a criticism of a software engineering culture writ large re: the Linux Desktop community. Not whatever this is you’ve made up in your head.
I get that you’ve decided that you’re a huge PipeWire fan and that you’ve decided my post is some kind of “insult” to PipeWire’s developers so you’re fundamentally arguing from a place of emotion, but you’re really missing the forest for the trees in this rush to defend PipeWire specifically. The Linux desktop development culture is one of constantly rewriting whenever a project is in danger of reaching mature stability and only requiring boring maintenance. Even if the PipeWire devs themselves avoid this, the larger community will move on to some shiny new half-broken hotness eventually. It’s happened over and over and over again.
But if you want to disagree, that’s fine. We can check back in 10 years.
No part of the CADT argument suggests that the replacement doesn’t fix issues with what came before.
But there is no argument. Only a complaint about a solution that solved 99% of the problems in a way that is totally transparent to 99% of the users. The part about version 0.8, followed by 0.8, followed by 0.8, suggests incompleteness, but PipeWire is very mature at this point. “the frequency of from-scratch rewrites always outstrips the pace of bug fixing” just doesn’t apply at all!
If CADT actually applied in this case, PW would be a buggy mess whose introduction would’ve been just as painful as the old transition to Pulse.
Edit: so today I re-read the CADT page for the first time in year, and
and the new maintainer can’t be bothered to check whether his new version has actually solved any of the known problems that existed in the previous version.
that sure seems to suggest the replacement not fixing things. Yeah, it doesn’t say the replacements don’t fix anything, but things never actually getting better seems like a pretty big part of the argument… The CADT Model very obviously does not describe PW.
Okay, that’s a good point. You’re right.
It’s actually remarkable how non-existent the stories of “the transition to PipeWire broke audio” have been.
REAPER via WINE? Which layers of the Linux audio stack does it talk to?
No, native REAPER for Linux x86_64: https://www.reaper.fm/download.php (though they also say there that the Windows version works great in WINE)
No, it’s native. It speaks either JACK or Pulse. I believe what lonjil is saying is that they used to run JACK (to talk to the hardware), Reaper’s JACK support (for low latency), and Pulse on top of JACK (for the benefit of the 99% of other apps). Now they just run PW, which makes everything happy (including Reaper, via PW’s Pulse emulation, which makes it happy enough to eliminate the need for JACK).
That is almost correct, but no, I do not run Reaper via PW’s Pulse emulation. PW has its own JACK shim as well.
This has also been my experience with Renoise (also native Linux). I run PW, tell Renoise to use JACK, and it Just Works.
Not to dismiss the author’s problems, of course, but just to say that it’s firmly put me in team “pipewire is good”.
Related to: https://lobste.rs/s/1t0gpl/curse_knowing_how_fixing_everything
Open source hasn’t figured out how to design things – it’s more about cloning, and then piling stuff on top.
I would make an analogy to government systems. It does seem like we go through cycles of “everyone makes locally optimal decisions”, which leads to a mess …
And then the system breaks down, and there is a new world order.
IMO rewrites that are too frequent can be a symptom of “local hill climbing”, rather than design
But design is expensive and takes a long time, so that’s what we’re stuck with.
Apple used to be good at this – e.g. they had consistent user interface guidelines and so forth – but it seems they lost institutional knowledge in the last 10 years.
Also related to:
If you want to go find English language docs, LKML is probably the place to go. That’s the Linux kernel mailing list. Here’s an exchange from 2009 where someone finds out a very surprising thing to them about file systems. They ask, where is this documented? A developer helpfully responds, this mailing list. A core file system developer responds, oh, probably some six to eight years ago in email postings that I made. So if you want to understand how to use file systems safely, you’re basically expected to read the last five to 10 years of LKML with respect to file systems.
https://www.deconstructconf.com/2019/dan-luu-files
( which I got from - https://lobste.rs/s/yx57uf/is_linux_collapsing_under_its_own_weight - although the problems discussed there are somewhat orthogonal to these design problems, IMO )
Apple used to be good at this – e.g. they had consistent user interface guidelines and so forth – but it seems they lost institutional knowledge in the last 10 years.
I think this gradual decline coincided with the people who worked at Apple or NeXT prior to the merger retiring.
Both companies had solid design philosophies and sold based on a consistent UI (and, in the case of NeXT, developer) model. People who joined either company were indoctrinated into these being their key differentiators and very important parts of system design from the kernel to the UI. OS X inherited these things, but there was some friction in merging the two philosophies (both of which worked well, but which were different). People coming into the company after around 2002ish did not grow up with consistent systems and were not given the same level of indoctrination as the company grew.
The problem for the F/OSS world is that most people now learn about computers with Windows or, occasionally, Linux-based systems that badly copy Windows. And Windows has never had a consistent UI model or consistent design. The UI model was always a mess because Windows and Office were the two big business units and the Office team didn’t want it to be easy to create Office competitors. Where Apple and NeXT produced reusable UI widgets, the Office team ensured that they developed in-house UI widgets that were not part of the standard Windows toolkit, so competitors never looked or felt like Office. And when the flagship apps on a system have UIs that are not consistent with anything else in the system, nothing else is going to care.
As Apple’s employees are increasingly people who grew up in this environment, people moving from macOS to *NIX are not bringing that experience with them either.
I’d love to make everyone who wants to write a *NIX desktop environment use a NeXT machine for six months before they start.
As time goes by, I’m more and more inclined to get defensive when I hear blanket generalizations that “open source cannot do this and that”. With design specifically, I can say that a lot of proprietary and internal projects I’ve seen had far worse design and implementation issues than any popular open-source project. It’s just hidden by the fact that no one can inspect the code and most projects just don’t live long enough for those problems to become apparent: either they run out of VC funding or get replaced by something else.
MS/Apple/IBM have the resources to maintain old stuff alongside rewrites for decades. FOSS projects don’t, so both design failures that lead to rewrites and those rewrites themselves are more visible.
I’m definitely not claiming that proprietary is better! I work on open source software because I like using it better :-)
I’m just saying there are downsides to the model where development is more distributed over space and time, and where the leadership isn’t really about design – it’s more about low-level software engineering issues (which is hugely important too)
But from a 1000 foot view, those downsides are lesser than the downsides of software shaped by typical commercial incentives
Unfortunately, the author of the blog post has fallen prey to the same trap. The article ends with:
And until someone tears this down and rebuilds it for real accessibility, it’s only going to get worse.
Fix the existing system? No! Rewrite a new system from scratch, with new bugs!
I’d love to say that I think the system can be fixed. I just don’t think it can be; we’ve been working on this problem for as long as I can remember (I got involved in 2009)
I remember back like eight years ago when I realized I could just …. remove pulseaudio, and it solved all my problems. There was a brief period where Firefox refused to use anything but pulse, but that was resolved relatively quickly.
I understand this doesn’t work for everyone because some people want more than one program to play audio at a time, but for me it was such a breath of fresh air to have everything suddenly start working again.
I understand this doesn’t work for everyone because some people want more than one program to play audio at a time,
ALSA supports this too, and has since like 2003 or something; the whole time I’ve been a linux user.
Drives me a bit nuts to see this repeated so often, the OP link too:
But back in the early 2000s, people realized ALSA wasn’t enough. It couldn’t mix audio from multiple sources.
Yes it can! It can kinda route sound across apps too but configuring that is a pain. But configuring the dmix thing for output mixing is trivial. As I type this, I have mpv playing a thing and I’m hearing a chat program in a browser booping at me. I tried pulseaudio once, it broke tons of things, so I deleted it and went back to plain alsa and have no trouble with it.
(When I was new, it was OSS that couldn’t mix things from different programs. And I enjoyed that, blocking other programs from playing sound I considered a nice feature for a while. But OSS can also mix things now - why is it so many people say “system X can’t do Y” and then insist on a from scratch rewrite with a whole new userland api instead of just… adding feature Y to system X? People love to claim that’s impossible, but then it inevitably happens successfully, proving them wrong again.)
I started using Linux right when the OSSv4->ALSA migration was going on. My recollection was that dmix was released around 2005. But I just checked and dmix already existed in 2001. I suffered through closing Firefox to let bmpx play music for 2 years without need! But yeah, I don’t understand how dmix was not more well known around that time. It was even brought up as a talking point by the PulseAudio people. It was around the time the Gentoo Wiki content was lost, so that might have been related.
But configuring the dmix thing for output mixing is trivial
This sounds interesting but the only relevant hit in apt search dmix is “apulse”, which I definitely don’t want. Can you point me to some more info? Most of the time I don’t want multiple things playing at once, but it would come in handy occasionally.
If it’s so easy to use, why doesn’t it come preinstalled?
If it’s so easy to use, why doesn’t it come preinstalled?
It does, it is part of the kernel. Preconfigured too on Slackware… I never had to actually configure it myself until I updated Slackware and found they surrendered to PulseAudio (slackware resisted it for many years saying it just breaks audio) and I had to undo that damage.
It is called a “plugin”, but it is part of the normal build. Here are some links:
https://www.alsa-project.org/wiki/Asoundrc#dmix
Interestingly, that claims “In practice not many applications are able to take advantage of this functionality yet.”, which the wiki history says goes back to at least 2007; it was part of the first version of the page, imported from some other existing website.
https://www.brain-dump.org/blog/sound-mixing-with-the-alsa-dmix-plugin-instead-of-a-sound-server/
This one, dated 2005, says “I honestly don’t know why they call it a plugin because at least in my ALSA installation it was already integrated and I just had to create a ~/.asoundrc configuration file for my sound chip”, so not preconfigured for that person, but still shipped with the system.
The note at the bottom of that blog doesn’t make too much sense - the default pcm device is set to dmix in the config, so applications should just work… except remember this is 2005, most programs were using OSS by default, so the configuration is not to make them use dmix specifically, it is to make them use ALSA at all.
(and alsa’s api is so different than oss, i resisted it myself for a while too…)
Can you point me to some more info?
https://wiki.archlinux.org/title/Advanced_Linux_Sound_Architecture#Software_mixing
If it’s so easy to use, why doesn’t it come preinstalled?
It is already installed with alsa, one just needs to configure it (I haven’t used dmix in ages so my information might be out of date)
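From memory, the minimal ~/.asoundrc looks roughly like this (a sketch assuming the first card, hw:0,0; the rate and device name will vary per machine):
# ~/.asoundrc: route the default PCM through dmix so several programs can play at once
pcm.!default {
    type plug            # plug converts each client's rate/format to what dmix expects
    slave.pcm "dmixed"
}
pcm.dmixed {
    type dmix
    ipc_key 1025         # any unique integer, shared by all processes using this dmix
    slave {
        pcm "hw:0,0"     # first device on the first card; adjust for your hardware
        rate 48000
    }
}
ctl.!default {
    type hw
    card 0               # keep alsamixer and friends pointed at the real card
}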
Some distributions package dmix in an alsa plugins package. It used to come preconfigured eons ago but it doesn’t mix well with Pulseaudio so it’s no longer done these days. I don’t know if that’s going to help with Firefox – at some point I remember they dropped the ALSA backend (that’s why apulse was necessary), I don’t know how it’s playing anything for you right now and if that needs additional hacks ¯\_(ツ)_/¯
I remember they dropped the ALSA backend
There was a time a while back when you had to recompile Firefox in order to enable ALSA; as far as I know it was never removed, just disabled by default. But they brought it back quite a while ago; maybe 4 or 5 years? It’s worked great out of the box since then, altho I’ve switched to Librewolf for unrelated reasons.
I don’t know how it’s playing anything for you right now and if that needs additional hacks
I don’t use firefox (well ok i use it for some things, but dropped it as a main browser when version 76 came out, i just got sick of the constant usability regressions. for a while, they’d break something, i’d work around it - apulse wrapper, userChrome.css things, even a few hacks to my window manager to tame some of its bad behavior, and carry on. but then they started breaking the fixes in all kinds of random ways too and i was just like…. forget this. It was less work to just write my own ui around a nested chromium window than it was to keep fixing firefox.). My CEF-based custom browser’s sound works just fine out of the box.
ALSA supports this too, and has since like 2003 or something; the whole time I’ve been a linux user.
Indeed, that was the main user-visible change (to me) of ALSA compared to OSS. The latter (up to v3) provided a /dev/dsp device, which only one process could write to at a time: audio mixing was done in userland, by giving an audio daemon exclusive access to /dev/dsp and having programs send their audio to that daemon. Except there were multiple competing audio daemons (e.g. esd and arts), which each required exclusive access…
In comparison, the main user-visible change (to me) of PulseAudio compared to ALSA was having per-application volume control.
In comparison, the main user-visible change (to me) of PulseAudio compared to ALSA was having per-application volume control.
Yeah, when Windows got this, I thought it was cool but…. never use it. Applications often have their own volume control anyway.
What I actually wanted PulseAudio for once and tried to actually embrace it for was to send sound over a network. At the time, I wanted to play something on my one computer and have it follow me through the house so I could redirect it to wherever I happened to be without interruption. idk if it can even actually do that, but it claims network transparency (I can do that kind of thing with X, though it is finicky and painful lol, xlib really doesn’t like you recovering from its “fatal” connection lost error, but you can) so i tried it.
After a bunch of trouble, I decided to say forget about it and ran an analog extension cord off a splitter instead.
Yeah, when Windows got this, I thought it was cool but…. never use it. Applications often have their own volume control anyway.
See, I use it all the time because various platforms can’t agree on a common normalization target.
When I was new, it was OSS that couldn’t mix things from different programs.
Do you happen to recall when OSS gained the ability to mix from multiple programs?
Edit: also, I recall running plain ALSA back in like, 2017, and some programs not working with the whole mixing multiple sources thing. They’d just grab exclusive control of the output. That led me to switch to Sndio, and later Pulse and then finally PipeWire.
Do you happen to recall when OSS gained the ability to mix from multiple programs?
I do not know. I recall a BSD fan telling me his computer could do it around 2007ish, and BSD stayed on OSS, so surely before then, but my memory of the exact time isn’t reliable.
This freebsd history page: https://wiki.freebsd.org/Sound#History implies some time after 2005 when it got a new maintainer over there and became compatible with oss4 (so technically they forked off, but i’ll count it anyway since it still achieves the goal without the api breakage, so mission accomplished).
So I can say probably between 2005 and 2010.
also, I recall running plain ALSA back in like, 2017, and some programs not working with the whole mixing multiple sources thing. They’d just grab exclusive control of the output.
Might be because they hardcoded the device name as like hw:1 or something. Or there’s some incompatibility I don’t know about (but my experience is programming against the dmix thing is easier than the hardware since it is less picky about making you match config parameters, but still, my experience isn’t saying much).
There were two different things. A lot of better sound hardware supported mixing in hardware. I remember the SoundBlaster Live! claiming to be as fast as a Pentium 166 MMX, which was true for specific floating-point operations for DSP, which mattered for a load of audio effects. On a 486 or early Pentium, offloading sound mixing to hardware was a big reduction in CPU load. On a 1 GHz Athlon, it was irrelevant, so these days people often don’t bother even if the hardware supports it. The same happened with MIDI synthesis. I remember some ISA sound cards being expandable to 32 MiB of samples, which was an immense amount of memory (most good computers had 4-8 MiB at the time!) and would do multiple channels of mixing and scaling in custom DSPs. Now, pure-software MIDI synths rarely miss in the data cache for samples and playing MIDI with a software synth will leave your CPU 99% idle.
From the start, I believe, OSS supported using hardware mixing. If your hardware could do mixing, two programs could play sound at the same time. The bit that was often missing was the software fallback for when you either didn’t have any or didn’t have enough hardware channels.
The thing Ariff added in FreeBSD 4.x was low-latency in-kernel sound mixing. He did some impressive work using fixed-precision integer arithmetic to get good quality (kernel threads back then didn’t have FPU state; this was added some time later because some crypto things wanted vector registers). He wrote most of these improvements up here.
I’m not sure exactly when the virtual channel (vchan) support was introduced. I was an undergrad when I switched to FreeBSD 4 from Linux to get working sound, so that would be 2002 or 2003. The down side of the FreeBSD 4 implementation was that each vchan got its own device node name. I had a /dev/dsp.1 for the GNOME sound daemon, a /dev/dsp.2 for the KDE sound daemon, and a /dev/dsp.3 for XMMS. Each of these allowed you to explicitly configure the sound device. Unfortunately, some things (BZFlag, I’m looking at you) didn’t (I think Quake did), so I left /dev/dsp free for them. With FreeBSD 5, these were dynamically assigned so you could just point everything at /dev/dsp and they’d each get their own vchan.
It couldn’t at the time, and then for a while it sort of could but required dmix and rather arcane configuration and only supported certain cards. By the time it properly could, the die was cast.
That’s what I’ve been doing on my desktop. With Gentoo it’s easy to make sure that everything is built with ALSA enabled and PulseAudio disabled so I don’t even have to have the PA client libs installed. Everything works great including mixing from multiple sources and games (Steam etc).
I am familiar with PA and PipeWire from other setups but I figured that I’d only switch this one to PA or PW when I actually needed to. It’s been 15 years and I haven’t missed them yet.
One possible catch is Bluetooth audio since that’s normally done through PA/PW. There is a bluealsa driver but since I don’t need BT audio on this computer I haven’t tried that yet.
It’s unfortunate “A Thank You, Where It’s Due” was merged with this. To quote from the fine article - “Progress Deserves to Be Seen”. Merging the article under the existing headline a day later means that the second article isn’t seen.
Not to make this about NixOS, but for anyone still on pulseaudio – the switch from pulse to pipewire is seamless.
services.pipewire = {
enable = true;
alsa.enable = true;
alsa.support32Bit = true;
pulse.enable = true;
jack.enable = true;
};
This starts pipewire-pulse, which makes the change invisible to pulse-dependent applications. Even if you’re playing audio during the switch to the new generation, most applications will simply reconnect to the new pipewire-pulse socket behind the scenes without a hitch.
If you’re on NixOS 24.11/unstable, pipewire is already the default audio server. But for anyone with service.pulseaudio.enable explicitly set, it’s probably time to update.
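A quick way to confirm that Pulse-speaking apps really did land on PipeWire after the switch (pactl ships with libpulse either way):
pactl info | grep -i 'server name'
# Expect something like: Server Name: PulseAudio (on PipeWire 1.0.0)
# A plain "pulseaudio" server string here means the old daemon is still running.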
The flip over to PW from PA was such a no-op on my NixOS machine I couldn’t have been happier.
Switching the packages was easy, but your mileage may still vary, e.g. I have one system with this:
# Avoid stuttering/under-runs. PipeWire assumes that (a) our system is
# reasonably fast (it's not), and (b) that we really care about low
# latency (we don't). Increasing ALSA's "headroom" allows more audio
# data to be buffered, so applications don't need to run as often to
# refill it. The downside is that this increases latency: data written
# into the buffer will only get played once everything before it has
# finished, and a bigger buffer means more milliseconds worth of audio
# to get through. That's fine for media players, etc. although high
# latency might be more noticeable in games or phone calls.
environment.etc."wireplumber/wireplumber.conf.d/91-increase-headroom.conf".text = ''
api.alsa.headroom = 2048
'';
And another that has a script with:
# If we're using PipeWire, ensure 44100Hz is allowed (avoids having to resample)
command -v pw-metadata > /dev/null &&
pw-metadata -n settings 0 clock.allowed-rates '[ 48000, 44100 ]'
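(To double-check that the setting stuck, pw-metadata can read the current values back:)
pw-metadata -n settings    # prints clock.rate, clock.allowed-rates, clock.quantum, etc.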
This seems part of a series.
https://lobste.rs/s/biswto/i_want_love_linux_it_doesn_t_love_me_back
TBH OSX has its issues, too. Audio works well, when it works.
A good chunk of the time either applications and/or the OS picks the wrong audio device. It shouldn’t be that complicated to figure it out.
“Hey, I know there are 15 possible audio devices configured. But there are only two connected. And one of them is my headphones, which just connected two seconds ago. No, I don’t want the mic input to be the laptop and the audio output to be the headset. That’s ridiculous. Just use the headset for everything immediately when it connects”.
As a minor note, the UI for the audio devices can’t be sorted. I can’t grab the “headset” line and drag it above the “laptop speakers” line to tell the OS that it needs to prioritize the headset over the laptop speakers.
It shouldn’t be that complicated to figure it out
It’s actually pretty complicated to figure it out, and not surprising that software sometimes behaves weirdly. A lot of people have a different concept of what a reasonable default should be, too (for example, I would find the OS changing my default input device when I connect a new one to be very surprising - that’s why software usually prompts you if you want to use it).
Hence, sorting. You sort headphones after other things, I sort them before.
It’s only hard to figure out because the implementers have made everything opaque. What order are the devices listed in? I dunno, it looks random. How are devices chosen or prioritized? I dunno, it’s not documented, and there’s no way to sort the devices.
The reasonable default is let the user choose. Refusing to acknowledge that the user wants to make a choice is just terrible UI.
I configure our TV NUC Fedora to output to all devices available simultaneously. That way we just turn on any combination of the two BT headsets and loudspeakers and have what we want.
I don’t do that on my laptop, but sometimes when we watch something on a train with my wife, I use helvum to send the audio into her headphones as well.
How does sorting help me with these?
I proposed sorting as a simple way to solve one particular use-case. I didn’t claim it was a magically perfect solution which would solve all possible problems seen by everyone.
Just some perspective from a music producer:
Maybe not as a default, but the ability to have separate devices handling input and output was traditionally a very advanced feature that no other OS offered.
Back on Ventrilo in the 2000s we often had a problem called the “mic of death” where someone would try to chat and instead Windows’ audio system would blast ear-splitting static. Something would trigger it to stop polling your audio input jack properly and pick up white noise until you manually pulled it out and plugged it back in (if you were lucky).
I don’t remember if it was Windows XP that sorted it out but it’s a problem I haven’t had for about 20 years…until yesterday when my work MacBook mic-of-deathed a whole Teams call. I wasted an hour investigating but restarting the machine sorted it out.
It was as you say, flawless until it completely broke.
Headset sensing pin… it shouldn’t be that complicated.
You literally need to tell the hardware audio codec whether to use the sensing and sometimes which pin it is. Which differs from board to board.
Not all codecs can detect wired microphone presence. In theory it’s just comparing to ground, since TRS jack shorts it out in the TRRS socket and there is the bias voltage otherwise, but mic input is AC coupled, so we’d need another pin to measure DC level. Not all codecs support that. In some cases, such sensing could be performed by a different part, e.g. via ADC input of the SoC.
OS X used to have the most annoying and obscure bug for the longest time. For whatever reason, sometimes the left/right balance of audio output would just drift to one side. (Honestly, I have no idea why anyone would even want to unbalance it in the first place…)
Last I checked, Apple never identified the cause, and it remained unfixed for long enough that an app called Balance Lock was written specifically to check for audio balance drift and recenter it.
For all I know it’s still there.
Thank you so much for posting this. Accessibility really is a huge barrier and problem in the UX of mainstream linux distros, and this issue tends to be ignored or hidden.
When I plug in my earphones/microphone, I have to go into pavucontrol and manually set the input device to “Headset Microphone” every time. I have no idea how to do it in PipeWire, I have no idea why the input priorities are wrong, I have no idea how to debug this, and I have no idea where to report the bug. Web searches, the Arch wiki (I don’t use Arch BTW), and even begging LLMs for advice have turned up nothing.
I deeply sympathise with the author, since even with decent vision it sometimes makes me want to frisbee the laptop straight out the window.
You need the “switch-on-connect” module loaded. For example here is my pipewire config on my nixos system https://github.com/Cloudef/nixos-flake/blob/master/modules/defaults-linux.nix#L114-L131
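For anyone not on NixOS, the equivalent is (as far as I can tell) a pipewire-pulse drop-in; a sketch, using PipeWire’s conf.d convention and the module name it borrows from PulseAudio:
# ~/.config/pipewire/pipewire-pulse.conf.d/99-switch-on-connect.conf
# Make a newly connected device (e.g. a headset that just showed up) the default sink/source.
pulse.cmd = [
    { cmd = "load-module" args = "module-switch-on-connect" }
]
Restart pipewire-pulse afterwards (systemctl --user restart pipewire-pulse), or set the default by hand once with wpctl status / wpctl set-default <ID> from WirePlumber.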
I find these articles interesting because they highlight problems that others may not notice. I dislike how the article is just complaining about the Linux audio “stack”, however. The author has clearly made their point, but they don’t offer any potential solutions to the problem. I wish these posts would offer concrete fixes, or at least a clear path forward, so that the angry energy can be directed into improving things. Getting visibility and awareness is important, but it’s just screaming into the void if nothing good comes from it.
It generally isn’t incumbent upon someone calling out a problem, particularly a social problem like “people are making non-accessible stuff”, to provide a solution. The fact that the problem exists is important information.
This isn’t a technical problem, it’s a social one. Technically we know perfectly well how to make accessible software, software that prioritizes making sure the computer stays interactive to a blind person, etc. Socially we have chosen not to. Socially we have chosen to start making Linux software less accessible, like moving to Wayland from X11 before figuring out how to make it as accessible.
Frankly the only playbook that has ever worked here is legislation saying “you can’t make money off the inaccessible thing”. Businesses don’t build wheelchair ramps because they like them, they build them because legislation requires them to. We need to do the same with software. To some degree we have started to, but the fact that RedHat and Ubuntu felt comfortable replacing X11 (accessible) with Wayland (not at the time accessible, I think still not?) is a clear sign that there’s a significant gap here.
I did propose a solution.
Tear the whole thing down and start again.
There’s really nothing else that can be done.
The rot is systemic, people say “that’s good enough” and leave whatever implementation at that.
Where would you start with designing a solution?
Here’s my attempt:
A single daemon running as a dedicated sound user. It is started at boot and never stopped unless the user manually stops it. If it crashes, it is automatically restarted. It is written in Rust to mitigate the security concerns to the maximum extent possible.
Thoughts?
Bluetooth is handled by Android’s stack to maximize hardware compatibility.
Please yes, or something like this. BT on Linux is so janky. I have a handful of BT devices that simply won’t pair with Linux systems, for no reason that I can fathom. They work perfectly with any Android box I try.
Bluetooth is handled by Android’s stack to maximize hardware compatibility.
Is Android’s bluetooth stack actually better? Or does it just have the advantage that on the devices it runs on there’s dedicated engineers making sure it works with that exact hardware?
Someone elsethread said that Chromebooks switched to it because Bluez doesn’t work well. Surely the engineers working on ChromeOS could’ve made Bluez work on all the Chromebook hardware, if that was the only difference? That’s not a super strong argument, but I’ve been hearing good things about Android’s audio stack for a while.
After writing the above, I did a little googling, and found this: https://chromeos.dev/en/posts/androids-bluetooth-stack-fluoride-comes-to-chromeos, which talks a bit about why. Getting BT out of the kernel and into userspace seems to be a pretty big part of it.
Side note: during the googling, I also found out that the Rust rewrite of Android’s BT stack uses Tokio. That library really is everywhere, huh!
It looks like that was recently (~a month ago) extended to ChromeOS flex (i.e. ChromeOS for non-chromebooks) too: https://chromeunboxed.com/chromeos-135-is-here-with-some-surprising-updates/
I might have to see if I can get this working on non-chromeos.
For what it’s worth, just anecdotal, my experience with Android phones and BT audio is that it works well enough that I just don’t think about it at all.
Wrt. patented codecs, Red Hat and SuSE legal/patent teams would screech about and block such a mechanism outright. The Android BT stack has the problem of not being packager-portable, discussed elsewhere in this thread. I’m not sure whether rust should handle soft-rt stuff like audio server. Debian developers will never cede control of what goes onto the default system to a 3rd party.
I’m not sure whether rust should handle soft-rt stuff like audio server.
Rust is fine for soft-rt stuff, it’s equivalent to C (and similar languages) with respect to RT requirements and since you’re running on top of a C kernel…
Just no. It’s going to be a security nightmare to guarantee different users cannot interact with each other. Maybe making it possible to mix audio from different users at the ALSA level would suffice? But not in general, since it wouldn’t be ideal to let a locked session keep playing, or to let students over SSH on a campus lab PC blast the speakers while someone else is using it…
Not gonna happen. If you want that, build the website and tell people here. I will gladly submit reports, rate the HW and remark on any issues. I am even open to some package asking me about my experience and to resubmit every 1/2 year. Provided all data gathered are anonymous and freely available to anyone.
Sounds pretty crazy. I don’t want to version control volume and which output is hot. RPM already saves previous config or the new one.
Work on BlueZ instead.
An isolated stack cannot integrate with the rest of the system. Always ship source and let the distro do the rest. Let me just attach with gdb, download symbols automatically and allow me to debug.
Patented does not mean closed source. Just ship source.
Drivers for what? You mean like exposing the I2C to the user space? Letting it configure DMA? Please elaborate.
Work on BlueZ instead.
I genuinely think it’s probably better to put effort into Android’s bluetooth stack, because BlueZ requires a lot of complex bluetooth code in the kernel that’s better served by living in userspace, like on Android.
/dev. Also, the common case by far is a single-user system that might well have sandboxed applications. In this case, the loss of security is minimal, while the gain in reliability (and thus accessibility) is huge.
I’m generally with you but:
But not in general, since it wouldn’t be ideal to let a locked session keep playing
That’s a super common use case for me. Start playing some music on my laptop and then lock the screen so the cats can’t screw anything up by walking across the keyboard.
Back in PA-times I was using FreeBSD as a daily driver. Something that worked really well is setting use/options flags to unset PA and enable sndio and portaudio. It made problems just disappear. Granted I am not and never have been a Bluetooth user. This made me feel very much in control, and I tried to replicate that on so many Linux systems, but every time something wanted to install PA, something wanted to not work, something wasn’t compiling with portaudio or sndio.
It made me very much appreciate the work that people put into the FreeBSD ports system.
I am trying to get the same using Nix, but it feels a lot more involved and harder, the quality is less consistent, and there are fewer situations where “someone else fixed that already” or made a knob I could use. Still the closest I get on Linux systems. It just feels that Nix assumes a lot more that the world is good and has a view on how the world should be, with as little state as possible (see also home directory management), where FreeBSD Ports is a lot more “wow, people put the work in to make this work easily” (maybe not simple in some situations though).
ALSA was one of the reasons I gave up on desktop Linux back around the turn of the century.
PipeWire just fixes all the things, if you should want to give it another go (at least for me).