lobste.rs migrates from MariaDB to SQLite
143 points by mort
lobste.rs entered read-only mode earlier today to migrate from MariaDB to SQLite. I found that interesting and came across this GitHub issue with the discussion, first about moving to PostgreSQL, then about moving to SQLite: https://github.com/lobsters/lobsters/issues/539
This is exactly the kind of interesting technical change that I'd like to see a lobste.rs discussion around, and I found the GitHub issue interesting to read, so I'm linking it here :) Obviously don't brigade the issue tracker, but since this isn't about anything controversial or dramatic, I think it should be fine to link?
If this thread steps on the toes of some official announcement, or if it still breaks the anti-brigading rules, feel free to delete this thread.
We reverted the migration, the site was slow as molasses. See https://gist.github.com/pushcx/eb7cdf2dc9707dc3ab9e7173d197ddfc
I didn't see a clear reason why PostgreSQL was dropped in favor of SQLite. Is there any discussion I'm missing?
I haven't watched these streams - only skimmed the transcripts - but I believe this was the main topic of the 2025-02-06 stream. For context, there's a gap in discussion on #539 between 2025-01-27 and 2025-02-06 (same day as that stream) where nothing happens and then we suddenly switch from wanting PostgreSQL to SQLite.
The [previous stream from 2025-02-03] mentioned it a few times too, most notably:
I had Joel Drapper as a guest on (I believe) the last stream; he made a pretty hard pitch for SQLite instead of Postgres [...] I emailed Rahul, who's the person who started a bunch of the Postgres migration stuff, and he gave me a brain dump on his opinion, which is generally that he likes SQLite and has a lot of experience with it too.
The transcript of the mentioned stream only mentions SQLite once, so presumably that hard pitch wasn't made there (it does look like a fun watch tho).
The 2025-01-27 stream has more in-depth discussion about SQLite and appears to be where it started:
Have you looked at SQLite?
No. Do you think we could actually... I don't have any feel for how much scale you can get out of it.
And the pitch continues from there. Haven't read/listened to the whole thing, but they start talking about SQLite not needing a separate DB server, reduced read latency with SQLite, and how those cheaper reads mean you don't need to worry as much about N+1 queries.
So someone who speaks in ChatGPT output convinced the site's admin to pivot to SQLite when Postgres would've probably just worked.
It's on MariaDB right now, not Postgres.
The project to move to another database was initially about moving to Postgres, and then that changed to moving to SQLite. Asenchi is asking why SQLite replaced Postgres as the database to replace MariaDB.
Interesting. It’s a bit unclear what the problem was with SQLite from the issue. Is the migration still planned? What went wrong?
Looks like this was the PR that reverted it - https://github.com/lobsters/lobsters/pull/1924
We deployed and it was not performant. We ran out of debugging time, but would like to find an expert and try again later.
I was surprised to see the title. SQLite is ... not really for this; I've never gotten it to work decently under any concurrency at all.
There are lots of sites using SQLite for write loads significantly higher than lobste.rs's. It just takes some tuning and can't be done with out-of-the-box settings.
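For anyone wondering what "some tuning" typically means in practice, here's a minimal sketch using Python's sqlite3 module. Lobste.rs itself is a Rails app, so this is purely illustrative; the file name and the specific values are my own assumptions.

```
import sqlite3

# Illustrative only: the path and values here are assumptions, not lobste.rs config.
conn = sqlite3.connect("app.db", timeout=5.0)   # wait up to 5s when the DB is locked

conn.execute("PRAGMA journal_mode=WAL")     # readers no longer block on the writer
conn.execute("PRAGMA synchronous=NORMAL")   # usually safe with WAL, much faster than FULL
conn.execute("PRAGMA busy_timeout=5000")    # retry internally instead of failing immediately
```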
Piece of anecdata: I'm running a homelab server that's under quite a bit of memory pressure (it's a Mac Mini M1 with 8 GB of RAM running NixOS with the Asahi kernel). Most of the services I run there have full SQLite support, and for some of them it's the primary engine. It's not under constant load, so I expected SQLite to be a great fit – when a service is not used, its memory can be paged out to swap.
Turns out that a number of processes, each running its own copy of a database engine, consume way more RAM and CPU than a single instance of PostgreSQL shared across these services, and under such circumstances SQLite is prone to random freezes after a day or two of uptime. It's running way more smoothly after migrating everything I could to Postgres [looks angrily at Jellyfin].
Maybe one service on a machine/VM, in one process, would run better on SQLite. When there are multiple processes and multiple DBs (I've got the databases of 12 services in that machine's Postgres now), it's better if the DB engine's caching, concurrency management, and I/O are centrally managed.
Lobste.rs is one application, but I expect it runs as multiple processes (web server, queue workers, and maybe some kind of cron/maintenance jobs), and at least the web server runs many replicas. With the experience above, I side-eyed yesterday's banner suspiciously. Reading about a rollback today has confirmed my suspicions.
I exclusively use SQLite in my homelab running on a Raspberry Pi 5 with 8 GB of RAM and have never had these issues. RAM utilization is at 25%. I don't run as many services as you do, but I've never had freezes with Jellyfin, for example.
I thought SQLite wasn’t great for concurrent writes?
This comment from the linked discussion addresses some of the concerns with SQLite, and it mentions that it can be run in WAL mode. You still don't get concurrent writes, but you get concurrent reads alongside sequential writes.
So they were definitely aware of SQLite's limits in this area. Presumably they had some numbers on typical write rates in production, and it was low enough?
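As a rough illustration of what WAL buys you, here's a small self-contained sketch in Python (the demo.db file and posts table are hypothetical) showing a reader proceeding while a write transaction is still open:

```
import sqlite3

# Self-contained sketch; hypothetical file and table names.
writer = sqlite3.connect("demo.db", isolation_level=None)
writer.execute("PRAGMA journal_mode=WAL")
writer.execute("CREATE TABLE IF NOT EXISTS posts (id INTEGER PRIMARY KEY, title TEXT)")

reader = sqlite3.connect("demo.db", isolation_level=None)

writer.execute("BEGIN IMMEDIATE")                        # hold the single write lock
writer.execute("INSERT INTO posts (title) VALUES ('hello')")

# With WAL, the reader sees the last committed snapshot and does not block.
print(reader.execute("SELECT COUNT(*) FROM posts").fetchone())  # -> (0,)

writer.execute("COMMIT")
```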
Yeah. We tried switching to SQLite for an internal bug tracker thing from MySQL, because it was like a team of five people and there was no reason to stand up a whole MySQL database or whatever...anyway, yeah, even with five people the concurrency issues were bad enough that we'd get timeouts on trying to open new tickets or whatever.
This was...probably a decade ago, though, and I'm sure it's gotten better but from what I understand it's still an issue.
Not saying it was or wasn’t the case for you, but folks should know SQLite needs some sensible pragmas for good concurrent performance. See for example this article and discussion: https://lobste.rs/s/rsagpv/sqlite_for_servers
There were also some interesting SQLite gotchas/tips in https://lobste.rs/s/gipvta/cysqlite_new_sqlite_driver tho I have no idea how likely Rails is to stumble into that kind of trap.
Opening a new ticket should be one write to the database, or a couple at worst, from one writer process, no? If that every few minutes was enough to lead to timeouts, it sounds like something was seriously wrong with your system.
Sqlite is easy until you actually need it to work.
If you use deferred transaction mode, and have reads before the first write, you can get deadlocks. That's 30s spent waiting, and if you don't have WAL then everything else is frozen during that time too. If you have a retry after the "sqlite busy" error (which you would, because it's terribly frustrating to lose writes randomly when some reporting background job runs a heavy query), then you can easily retry the two mutually-deadlocking transactions over and over again.
In my usage of SQLite I don’t allow two threads to open write transactions at the same time; I have a mutex to serialize them. Or alternatively, a connection pool where all connections are read-only but one. You lose some potential concurrency where two writers could be doing reads before writes, but on the other hand it works reliably.
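For what it's worth, here is a minimal sketch of that "one writer at a time" idea in Python; the table name and path are made up, and this only serializes writers within a single process.

```
import sqlite3
import threading

write_lock = threading.Lock()   # process-wide: serializes our own write transactions
DB_PATH = "app.db"              # assumed path

def run_write(txn_fn):
    """Run txn_fn(conn) inside a single write transaction, one writer at a time."""
    with write_lock:
        conn = sqlite3.connect(DB_PATH, isolation_level=None)
        try:
            conn.execute("BEGIN IMMEDIATE")   # take the write lock up front
            txn_fn(conn)
            conn.execute("COMMIT")
        except Exception:
            if conn.in_transaction:
                conn.execute("ROLLBACK")
            raise
        finally:
            conn.close()

# e.g. run_write(lambda c: c.execute("UPDATE stories SET score = score + 1 WHERE id = ?", (42,)))
```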
If you turn on WAL mode and set your timeouts properly, SQLite will handle serialization of writes for you. It's def. not the default though and I forget the timeout details, but you have to set them in a kind of weird way to get the right behavior.
Concurrency on writes doesn't exist in SQLite, but it can pretend pretty well (again, if you set all the settings correctly, away from the defaults).
In my opinion, if you use SQLite on the server you really need WAL mode on. With proper connection pooling, it doesn’t deadlock in my experience. But it is true that even with WAL mode, SQLITE_BUSY does happen sometimes and you actually need to retry transactions.
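A rough sketch of that retry-the-whole-transaction approach, using Python's standard sqlite3 module. It assumes the connection was opened with isolation_level=None so explicit BEGIN/COMMIT work, and the backoff numbers are arbitrary.

```
import sqlite3
import time

def with_retries(conn, txn_fn, attempts=5):
    """Re-run the whole transaction when SQLite reports the database as busy/locked."""
    for attempt in range(attempts):
        try:
            conn.execute("BEGIN IMMEDIATE")
            result = txn_fn(conn)
            conn.execute("COMMIT")
            return result
        except sqlite3.OperationalError as exc:
            if conn.in_transaction:
                conn.execute("ROLLBACK")
            if "locked" not in str(exc) and "busy" not in str(exc):
                raise
            time.sleep(0.05 * (attempt + 1))   # arbitrary backoff before retrying
    raise RuntimeError("transaction kept hitting SQLITE_BUSY")
```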
I've never found it very intuitive that deferred transactions are the default. Using immediate transactions for all write codepaths (or potential writes) is much easier to reason about. I figured I'd swap back to deferred if I had a workload that would specifically benefit, but it hasn't happened yet.
Edit: Just read snej's sibling comment and I guess it would be a strict optimisation in that case!
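For anyone who hasn't run into this, the deferred vs. immediate distinction is just about when the write lock is taken. A tiny sketch in Python with a hypothetical stories table:

```
import sqlite3

# Hypothetical table, just to make the difference concrete.
conn = sqlite3.connect("app.db", isolation_level=None)
conn.execute("CREATE TABLE IF NOT EXISTS stories (id INTEGER PRIMARY KEY, score INTEGER)")
conn.execute("INSERT OR IGNORE INTO stories (id, score) VALUES (1, 0)")

# Deferred (the default): the write lock is only taken at the first write,
# so a read-then-write transaction can fail part-way through with SQLITE_BUSY.
conn.execute("BEGIN")
row = conn.execute("SELECT score FROM stories WHERE id = 1").fetchone()
conn.execute("UPDATE stories SET score = ? WHERE id = 1", (row[0] + 1,))
conn.execute("COMMIT")

# Immediate: take the write lock before reading, so the transaction either
# starts cleanly or fails up front and can simply be retried as a whole.
conn.execute("BEGIN IMMEDIATE")
row = conn.execute("SELECT score FROM stories WHERE id = 1").fetchone()
conn.execute("UPDATE stories SET score = ? WHERE id = 1", (row[0] + 1,))
conn.execute("COMMIT")
```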
Sqlite can be made to work, but it's so annoying that it has so many default settings/behaviors that will bite you.
There's also a lot of bad advice on the web about sqlite performance, which ends up corrupting the database sooner or later.
So you have to hold it just right, knowing what to do and what not to do, and then it's okay.
Oh interesting, I didn't catch that. Hopefully that provided useful real-world performance data!
That seems like the obvious outcome. I would have been very surprised had it actually worked.
Am I reading this wrong, or did they not use WAL mode? https://github.com/lobsters/lobsters/pull/1871
Is there a link to the definitive reasoning for not switching to PostgreSQL? It's an odd move, considering that the reason to switch away from MariaDB was a lack of features, and SQLite isn't great as a server-side DB.
SQLite isn't great as a server side DB
With proper tuning and when used correctly, for read-heavy applications with a few occasional quick writes, it can be great in my experience. But it’s far from a drop-in replacement for MariaDB or PostgreSQL.
I'm also curious why the plans changed. This GitHub comment from pushcx is the closest to an explanation I've found so far:
We've started work migrating away from MariaDB. That issue only talks about postgres, but we've had some sqlite advocacy this week and the existence of this benchmark's partial port is encouraging. So either it's a lot of churn for you (postgres) or a lot less work for you (sqlite).
I presume there was a discussion of some sort but maybe the decision happened elsewhere, because the GitHub issue's history goes from full-steam-ahead Postgres to full-steam-ahead SQLite.
EDIT: It sounds like the discussion started during an office hours stream, where they talk about how SQLite makes idiomatic ORM queries in Rails more efficient?
I have just today finished a migration project for a client from MariaDB to SQLite, for an application that does several hundred writes a second in bursts, with many reads as well – no problem whatsoever.
SQLite needs to be set up with the correct PRAGMAs on each new connection; few PRAGMAs are "set and forget" – with the exception of journal_mode and a few others. This is perhaps the main mistake/misunderstanding I see most people make with SQLite when it's used in web-related setups.
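In case it helps, here's roughly what "set the PRAGMAs on each connection" can look like. Python and the specific pragma values are my own assumptions; the point is the distinction between settings stored in the database file and settings that reset on every new connection.

```
import sqlite3

def open_connection(path="app.db"):
    """Open a connection and apply the per-connection PRAGMAs.

    journal_mode=WAL is stored in the database file itself, but settings like
    busy_timeout, synchronous and foreign_keys reset on every new connection,
    so they belong in the connection factory.
    """
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")      # persistent, but harmless to repeat
    conn.execute("PRAGMA busy_timeout=5000")     # per-connection
    conn.execute("PRAGMA synchronous=NORMAL")    # per-connection
    conn.execute("PRAGMA foreign_keys=ON")       # per-connection
    return conn
```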
I might have missed it, but from what I can see you didn't set the busy_timeout?
Also, this blog post might be relevant: "What to do about SQLITE_BUSY errors despite setting a timeout".
I’ve also been running a website on SQLite for a few months. It doesn’t do hundreds of writes a second (it is mostly read-heavy), but it handles bursts of HTTP requests like a breeze, on seriously cheap hardware.
Obviously I’m using WAL mode, but IMO it’s also important to have proper connection pooling and to make sure queries run as fast as possible. Opening a database file can take a few milliseconds, and it’s usually not a good idea to share connections between threads. In my case I keep one connection open per thread, in thread-local storage, and I keep an eye on query and transaction durations. Most of them should not take more than a few microseconds. 10 milliseconds may not be a lot for PostgreSQL, but it can be too slow for SQLite!
I can confirm SQLITE_BUSY does happen sometimes in WAL mode. I have a mechanism that retries transactions from the start in this case (it’s a kind of optimistic locking basically).
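A small sketch of the one-connection-per-thread pattern described above, in Python; the path and PRAGMA values are illustrative.

```
import sqlite3
import threading

_local = threading.local()   # each thread gets its own connection

def get_connection(path="site.db"):
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = sqlite3.connect(path, isolation_level=None)
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute("PRAGMA busy_timeout=5000")
        _local.conn = conn
    return conn

# Each request-handler thread then calls get_connection() and runs its queries on it.
```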
Why is it usually not a good idea to share connections between threads? Assuming each connection is not accessed concurrently, i.e. you have some kind of pool that lets a thread grab a connection, use it, then return it. That’s a common pattern for using SQLite.
Yes, if the pool is correctly sized it works fine. My point is that in WAL mode concurrent reads work just fine as long as you have many connections, so you should use many connections for reads if you want the best performance.
I just read up on this because I’m going to roll it out on a tiny site myself (100 writes/month or so).
I’m guessing that if you get writes that block, your framework needs to have some way of resolving that. Could that be an issue?
Interesting; from my benchmarks, SQLite can easily do hundreds of writes per second and thousands to tens of thousands of reads - depending on the query, of course.
Does this site get more traffic than that? I would be surprised. Did you miss some basic but important detail, like connection pooling or enabling WAL mode?
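If anyone wants to sanity-check those numbers on their own hardware, a throwaway micro-benchmark like this is enough to get a ballpark (Python, arbitrary schema and file name; results will obviously vary with disk and settings):

```
import sqlite3
import time

# Throwaway micro-benchmark: one small autocommit write per loop iteration,
# roughly like independent web requests. Schema and counts are arbitrary.
conn = sqlite3.connect("bench.db", isolation_level=None)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA synchronous=NORMAL")
conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, body TEXT)")

n = 1000
start = time.perf_counter()
for i in range(n):
    conn.execute("INSERT INTO events (body) VALUES (?)", (f"event {i}",))
elapsed = time.perf_counter() - start
print(f"{n / elapsed:.0f} single-row writes/second")
```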
I recently found that having any amount of contention on the SQLite database itself made the performance much worse. It might be worth trying to have a single thread writing and queue writes at the application level. https://emschwartz.me/psa-your-sqlite-connection-pool-might-be-ruining-your-write-performance/
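The single-writer idea from that post can be as simple as a dedicated thread draining a queue. A rough sketch in Python: the table and path are made up, and a real version would want error handling and a way to return results to callers.

```
import queue
import sqlite3
import threading

# All writes funnel through one dedicated thread, so write transactions never
# contend with each other inside SQLite. Table and path are made up.
write_queue = queue.Queue()

def writer_loop(path="app.db"):
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")
    while True:
        sql, params, done = write_queue.get()
        conn.execute(sql, params)    # autocommit: one write per queue item
        done.set()                   # unblock the submitting thread

def submit_write(sql, params=()):
    done = threading.Event()
    write_queue.put((sql, params, done))
    done.wait()

threading.Thread(target=writer_loop, daemon=True).start()
# e.g. submit_write("INSERT INTO votes (story_id) VALUES (?)", (42,))
```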
Based on the messages in the IRC channel and the state of the migration checklist it appears that the migration was rolled back and will be attempted again after addressing some issues that popped up quickly after migrating.
I’m unsure why this would be done cowboy style. It’s possible to replicate live traffic to a test environment and scale-test before doing the real cutover.
I think lobste.rs can probably soak up the loss of revenue from that hour or so where it was out. Or rather, it was available but in maintenance mode.
This discussion has so far been very helpful and insightful. Saving it to read later in detail. Thank you for fostering it. I was following the lobste.rs PRs and had the same question about the reasons for this migration! :)