WebRTC is the Problem
26 points by polywolf
Despite the page title mentioning OpenAI, it has very little to do with the actual AI parts, only the real-time audio transport parts, which is why I've matched the URL slug instead.
I had no idea WebRTC was so complicated behind the scenes. Now I know why all the video conference companies really, really want you to download their app instead. This post then sent me down a rabbit hole learning about WebTransport and Media over QUIC.
I guarantee you their app also uses WebRTC :)
Which he stated, but as a "fork" using "a tiny fraction of the protocol". I am curious to know more about what Discord specifically is doing.
You can read about Discord stuff here: https://discord.com/blog/how-discord-handles-two-and-half-million-concurrent-voice-users-using-webrtc
Additional (and more recent) stuff about voice:
i wanted to try using it for some peer to peer stuff (just a hobby project), because it was basically the only way to get a udp-like connection in a browser
but it's absolutely miserable. i gave up and watched brazil to cheer myself up
A lot of this needs an asterisk: ".. for the default existing implementations of WebRTC". Jitter buffer sizes and time-stamping etc can all be completely modified, especially if you control both ends of the pipe.
It may however require forking the WebRTC libraries. I've found that webrtc-rs has some.. questionable practices within the implementation and doesn't expose all the levers you would need for a good application on top.
But the core of WebRTC (ie the protocols and general ideas) I find pretty solid and honestly quite flexible. For instance we use it to receive video streams for archival and have configured an exceptionally long buffering time on the server and a custom time-stamping system on the client.
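To make the jitter buffer point concrete, here is a rough sketch of a receiver-side buffer with a configurable delay. This is not how any real WebRTC library does it; `Packet`, `JitterBuffer`, `target_delay_ms` and the millisecond clock are all made-up names for illustration, and it assumes sender and receiver timestamps share one timeline, which a real system would have to arrange separately.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Packet:
    capture_ms: int                       # sender-side timestamp (hypothetical custom clock)
    seq: int                              # tie-breaker for equal timestamps
    payload: bytes = field(compare=False)


class JitterBuffer:
    """Holds packets until capture time + target delay, releasing them in timestamp order."""

    def __init__(self, target_delay_ms: int):
        self.target_delay_ms = target_delay_ms
        self._heap: list[Packet] = []
        self.dropped_late = 0

    def push(self, packet: Packet, now_ms: int) -> bool:
        # A packet whose playout deadline has already passed is useless to a
        # live listener, so it is counted and dropped.
        if packet.capture_ms + self.target_delay_ms < now_ms:
            self.dropped_late += 1
            return False
        heapq.heappush(self._heap, packet)
        return True

    def pop_ready(self, now_ms: int) -> list[Packet]:
        # Release every packet whose playout deadline has been reached.
        ready = []
        while self._heap and self._heap[0].capture_ms + self.target_delay_ms <= now_ms:
            ready.append(heapq.heappop(self._heap))
        return ready


# Conferencing-style short buffer vs. archival-style long buffer.
live = JitterBuffer(target_delay_ms=100)
archive = JitterBuffer(target_delay_ms=5000)
late = Packet(capture_ms=10, seq=3, payload=b"late")
print(live.push(late, now_ms=130), archive.push(late, now_ms=130))
```

The only knob is `target_delay_ms`: a small value favors latency and drops stragglers, a large one (the archival case) trades latency for completeness. Duplicates and clock drift are deliberately ignored here.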
But I agree with the article overall. It's so obviously designed for video conferencing that the more you depart from it the more painful it becomes. I don't think there's a good reason to use it for voice at OpenAI. This is a simple, targeted thing that doesn't need the more advanced feature sets.
It would be interesting to read a side by side comparison of WebRTC and AES67 (Dante/Ravenna)
I understand at a high level that WebRTC was built for conferencing and sending media across networks while AES67 standardized what pro/commercial devices were doing over LANs, but I'm curious what stops everyone from using the same standard for AoIP
FWIW, even a WebRTC-specific website said WebRTC probably isn’t the best idea for Voice AI :)
https://webrtchacks.com/webrtc-vs-moq-by-use-case/#post-4716-_Toc213101666