Go ahead, self-host Postgres
94 points by kowale
I've had data corruption when using a 3rd party vendor just the same as I've had when self-hosting.
As far as I'm concerned this is roughly comparable to the time you spend debugging RDS connection limits, working around parameter groups you can't modify, or dealing with surprise maintenance windows.
The main operational difference is that you're responsible for incident response. If your database goes down at 3 AM, you need to fix it. But here's the thing: RDS goes down too. And when it does, you're still the one getting paged, you just have fewer tools to fix the problem.
Doesn't look like that will ever stop being true…
None of this is technically complex
Proper automated failover is quite complex. Luckily you don't have to implement it yourself for Postgres.
We run our own Postgres just fine but I would love to not have to run MySQL. Cloud vendors typically handle database replication by replicating block devices, which gives them a generic solution that doesn’t have to deal with the problems of each database.
For what it's worth, we've been running MySQL clusters on VMs (where we can get local NVMe) for a long time using its replication, and this stuff is now mundane for us.
Initial setup isn't fire-and-forget, and if, for example, you don't go in aware of what replication likes and doesn't like (it doesn't like huge transactions or large ALTERs without an online-schema-change tool!) you might learn the hard way. But those seem like learning and start-up costs more than recurring ones. I haven't looked deeply into it, but it sounds a little like maybe Percona is trying to build an open-source RDS-like thing on top of k8s with their Everest product, which might help with one class of setup work.
Postgres is probably the route if you're just deciding what to use--the larger ecosystem seems to have gone that direction--but given my past experience I'd probably personally start a thing with MySQL again if I were starting over.
I mostly agree with the OP: the work that is specific to running our own instances happens, but is occasional and very manageable. (And the local-NVMe perf is sure nice.) Most of our database-related work is the stuff that you'd have to do no matter how you hosted: looking at what's inefficient and improving it, once every N years making sure your app remains happy across a major-version upgrade, things like that.
Of course, saying something works well is always calling the wrath of the ops gods, and I'm knocking wood just writing this. But empirically it has been working pretty well for us!
I'm curious, what online schema change tools do you use? And what do you use for automated failover?
We use gh-ost for many schema changes. Native ALTER TABLE is still much faster for adding indices, and it is generally safe for boxes in a cluster to have different indices, so we add indices to big tables with a script that connects to each box in the cluster and runs the ALTER with replication logging off. For anyone bewildered, this is because a large ALTER TABLE can stall replication even though it doesn't lock out writes, simply because it's a single large statement.
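For the curious, the per-box index add is roughly this kind of loop (hostnames, schema, and index names are made up; setting sql_log_bin needs a privileged account, and you have to be sure the DDL really is safe to run node by node):

    # Add an index on each box separately, skipping the binlog so the
    # ALTER never replicates as one giant statement.
    for host in db1 db2 db3; do            # hypothetical hostnames
      mysql -h "$host" -u admin -p"$PASS" mydb -e "
        SET SESSION sql_log_bin = 0;
        ALTER TABLE orders ADD INDEX idx_created_at (created_at);"
    done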
Failover is based on two scripts of our own, one that coordinates with the writer so it's clean, the other for if the writer's down. They're not doing anything super deep: stop writes, wait for new writer to catch up, move traffic to new writer, update followers' replication. It would be cool to have more operational stuff (incl. backups etc.) open source. We only have two clusters and failover steps are scripted but it's triggered by a human who gets paged, so you might or might not even call it fully automated.
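In MySQL terms, the clean-switchover path is roughly the following sketch (not the poster's actual scripts: hostnames are hypothetical, GTID-based replication is assumed, and real scripts sanitize the GTID string and verify every step):

    # 1. Stop writes on the old primary.
    mysql -h old-primary -e "SET GLOBAL super_read_only = ON;"
    # 2. Wait for the new primary to catch up to everything the old one executed.
    GTIDS=$(mysql -h old-primary -N -e "SELECT @@global.gtid_executed;")
    mysql -h new-primary -e "SELECT WAIT_FOR_EXECUTED_GTID_SET('$GTIDS', 60);"
    # 3. Promote the new primary and repoint the surviving replicas.
    mysql -h new-primary -e "STOP REPLICA; RESET REPLICA ALL; SET GLOBAL read_only = OFF;"
    mysql -h replica1 -e "STOP REPLICA;
      CHANGE REPLICATION SOURCE TO SOURCE_HOST='new-primary', SOURCE_AUTO_POSITION=1;
      START REPLICA;"
    # 4. Move application traffic (DNS / proxy / VIP) over to new-primary.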
We use Orchestrator and it’s miserable. Patroni is so much better. Unfortunately there is no straight “Patroni for MySQL” that’s not totally niche (Yandex has a repo somewhere, stuff like that)
Group Replication and Galera might work better but they have a limit on transaction size. Unfortunately we don’t make all the software so that was a problem for us.
Does something like https://pigsty.io/ help with that?
There's also https://github.com/patroni/patroni which only does the HA part, without being a full on postgres distro.
yeah you just need to assemble Patroni and etcd into your setup. Might take a few days but then you’re set and it will just keep chugging.
Those few days are mostly about thinking through what kind of durability and stuff you want to configure. I’ve found Patroni well documented but you wanna read through it.
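A rough sketch of the assembly, assuming etcd is already running on localhost and with paths and addresses that are only illustrative (the Patroni docs cover the full config surface, including the bootstrap/DCS durability settings mentioned above):

    # Install Patroni with etcd v3 support (package names vary by distro).
    pip install 'patroni[etcd3]'

    # Minimal patroni.yml -- a sketch, not a production config.
    cat > /etc/patroni.yml <<'EOF'
    scope: main-cluster
    name: node1
    restapi:
      listen: 0.0.0.0:8008
      connect_address: 10.0.0.1:8008
    etcd3:
      hosts: 127.0.0.1:2379
    postgresql:
      listen: 0.0.0.0:5432
      connect_address: 10.0.0.1:5432
      data_dir: /var/lib/postgresql/16/main
      authentication:
        superuser: {username: postgres, password: secret}
        replication: {username: replicator, password: secret}
    EOF

    patroni /etc/patroni.yml               # run under systemd in practice
    patronictl -c /etc/patroni.yml list    # check cluster state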
Perhaps this ZFS backup strategy from 2022 can help as well, as long as there is an equivalent to pg_start_backup/pg_stop_backup in MySQL?
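On the Postgres side, that strategy is roughly the flow below (dataset and host names are made up; pg_start_backup/pg_stop_backup were renamed pg_backup_start/pg_backup_stop in Postgres 15, and you still want WAL archiving alongside the snapshots):

    # Tell Postgres a base backup is starting, snapshot the dataset, then finish.
    psql -c "SELECT pg_start_backup('zfs-snap', true);"
    zfs snapshot tank/pgdata@nightly-$(date +%F)
    psql -c "SELECT pg_stop_backup();"
    # Ship the snapshot off-box.
    zfs send tank/pgdata@nightly-$(date +%F) | ssh backuphost zfs receive -F backup/pgdata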
So do that as well: use shared storage with dual controllers and fail over the whole DB VM. Use a replica as a manual failover in case things go wrong with the whole primary cluster.
Knowing what I know today, I might go that route if I started over. I generally prefer database replication over storage replication. It has some advantages like letting you upgrade incrementally.
Honestly my preferred option would be: no more MySQL clusters. Move as much as possible to PostgreSQL, and put the rest on standalone MySQL. Switchover times for maintenance are not that much faster than a quick restart of a standalone instance, and MySQL replication is a big penalty on write performance.
I would prefer DB replication too, but I simply don't trust people to be able to write it correctly. I do trust embedded HDD developers to be able to serialize requests over two ports, because that's just mutexes on a single device.
Non-critical logical replication that can be verified independently is fine, when backed up by, well, backups.
I think we'll need a different DB architecture for that to work. Something less synchronous, more eventually consistent.
It may be only a couple hours a month or so of maintenance, but it's the task switching and minutiae knowledge that kills you in these routines. You look at it so infrequently that actually changing anything becomes a behemoth task in my experience as you have to refresh yourself on the implementation details.
I find what the author says about «a similar amount of maintenance to what interfacing with RDS requires» believable, and then the exact same issue applies equally on both sides.
Yeah, it also feels like the author is significantly more knowledgeable about (and therefore confident in) his Postgres management skills. He knows the failure modes and how to recover from them. He knows how failover works and how to set up Patroni or whatever. He knows how to configure pgBackRest and has practiced recovery.
I don’t know that much and don’t have that confidence. And I’m sure I could learn it—it’s probably not all that much, and I’m sure I could find some good content on learning all of it. On the other hand, my company can pay Google perhaps tens or hundreds of dollars per month (on top of the cost of the underlying instances) to manage it for me, which is a rounding error on our cloud bill, and instead I can help increase revenue or decrease costs by some figure several orders of magnitude larger than what I would save running Postgres myself. We have people who write bad BigQuery queries that waste more money than I would save running Postgres myself. It just isn’t worth my time right now, and that’s probably true for many—if we get to an aggressive cost-optimization phase and that’s the next biggest bang/buck, then we’ll tackle it, but for now it would be the wrong move.
Also worth noting that it’s never “running it myself”—I also have to teach my team how to do that work.
I also don’t buy the argument that if RDS goes down you still have to deal with it. No I don’t—Amazon deals with it. I maybe have to do a bit of communication with stakeholders that RDS went down, but it’s much less work and much less stress than fixing it myself. More importantly, if I run the database myself and things break, it’s my fault, but if RDS goes down, my stakeholders and their stakeholders are understanding (maybe it shouldn’t be that way, but that’s the world we live in).
The article makes several strong points, but I disagree with it on a few important parts.
For context: I work on cloud orchestration for datastores. For compliance reasons I can't name the company, but I'm responsible for operating and scaling more than a thousand database clusters across AWS, Azure, Aiven, MongoDB, and other experiments.
"Cloud providers pushed a fear narrative: self-hosting is dangerous."
Every option has risk. Cloud providers are not a magic bullet either. But they do solve 90% of the problem space reliably, and the remaining 10% can still cost you a dedicated team, on-call maturity, and real operational depth.
"Managed Postgres is mostly vanilla Postgres with wrappers."
It depends on the provider. Postgres is becoming a standard and is starting to suffer the same fragmentation Linux distros did: different "flavors" with meaningful behavioral differences. Underneath, it's still Postgres primitives, but each vendor chooses to solve reliability, storage, replication, and failover in its own way, and those choices matter.
"DB engineers don't fix bad queries; abstraction hurts benchmarking/tuning."
You can have deep observability and tuning with cloud providers too. Monitoring is genuinely good across most serious vendors, and if you know what to look for you can squeeze a lot of performance out of managed offerings. Abstraction can hide details, but it can also standardize metrics, automate baselines, and reduce the time-to-diagnosis.
"Vendors still have corruption/outages, and you pay a markup."
Some do, sure, but the better vendors are doing real engineering there. Many are decoupling compute and storage and investing heavily in durability and recovery paths. If you've read the Aurora design papers, or looked at Azure's newer database architecture work, you'll see how deep this gets and how it can reduce corruption risk. Personally, I haven't had a high-availability Aurora outage caused by underlying infrastructure in the last five years.
"Self-hosting can be much cheaper."
It can be, on paper. But you're one incident away from blowing up that "cheap" argument. The hidden cost isn't the server, it's the operational blast radius when something goes wrong: paging, recovery time, data risk, customer impact, and the engineering time you burn.
"Self-hosting performs better because you can tune what managed services lock down."
Sometimes. In practice, at scale, a lot of my work with vendors is exactly this: unlocking their platform's potential, pushing past conservative defaults, and getting provider-side guardrails adjusted for our workload. The problem is you're usually dealing with support layers, not the people steering the product, and you spend extra energy proving what you already know.
"Operational overhead is smaller than people assume if you keep it simple."
No, it isn't. I joke internally that every extra "9" you sell a customer is an exponential increase in operational burden. Backups, restores, upgrades, replication, failover testing, incident response, capacity planning, security patching, observability, and compliance don't disappear. You're just choosing who owns them.
"With managed services you still get paged, but you have fewer tools and less control."
In my experience, tooling is usually sufficient. I can count on one hand the number of incidents where we couldn't solve a database performance issue with the information and controls available, and only one of those was Postgres.
"Self-hosting should be the default for most teams."
I don't buy that. Managed should be the default for most teams because it removes a huge amount of operational risk and undifferentiated work. Self-hosting can be right, but it's a deliberate trade, not a baseline recommendation.
"The future is hybrid."
Agreed. We already do this: we self-host for ephemeral dev environments and testbeds (version upgrades, experiments), and thanks to orchestration we mostly ignore them until needed. For production workloads, managed services are often worth it.
If you're running basically one database (prod + staging + dev), it makes no sense to absorb the operational cost of running those databases yourself. The "savings" are a rounding error compared to the risk, the context switching, and the tax of being your own DBA and SRE.
At scale, yes: you can squeeze a lot more performance per euro and reduce cost, because you can standardize patterns, automate everything, and run the same playbooks across fleets. But at that point you're also competing with the providers for the same scarce talent: people who really understand database orchestration, failure modes, kernel and storage behavior, replication internals, and performance tuning. That's a hard labor market to win, and it's expensive even when you do.
So for most teams, most of the time, you're better served with a cloud offering: you buy mature operations, proven HA patterns, predictable tooling, and a smaller blast radius for mistakes. Then you selectively self-host where it's clearly justified (ephemeral dev/testbeds, special performance needs, cost extremes, or compliance constraints that the managed product can't meet).
And for what it's worth: one of my skips is one of the people who originally wrote Postgres for Stonebraker, and he's a strong advocate of cloud offerings. Go figure. 😄
It can be, on paper. But you're one incident away from blowing up that "cheap" argument.
But if your database ties you to a hyperscaler's cloud infrastructure (which, as we all remember, does not have working cost capping), you are one incident away from blowing up the «within a couple of zeros of reasonable costs» argument.
Managed should be the default for most teams because it removes a huge amount of operational risk and undifferentiated work.
Interfacing with cloud services is also undifferentiated work, on the same scale, which is exactly what the author discusses.
Personally, I haven't had a high-availability Aurora outage caused by underlying infrastructure in the last five years.
Except not being hit by any of the AWS outages in those five years takes a lot of operational competence, much of it vendor-specific.
The operational complexity is largely independent of whether you self-host.
If you have a managed service you:
Managed Postgres gives you none of those things. The only thing you get is potentially a few minutes of setup reduction initially. That's it. Clouds aren't magic and the people working at cloud companies aren't wizards. People should leave the dark ages of believing in such things. Superstition in IT is a really bizarre thing to have.
The main operational difference is that you're responsible for incident response. If your database goes down at 3 AM, you need to fix it
The same is true when you use a managed database. Pretending that somehow because it's in some cloud there is magically nothing that can affect your database availability is just silly.
Sure, they have things like failover, but if you want automatic failover you can set that up yourself. Unlike you, the cloud provider has to turn these things into products, which means they need to do a lot more, and that introduces more potential for things to go wrong. You don't have many of those issues if you self-host. And with cloud providers you still end up working around the limitations of someone else managing it. E.g., good luck just doing pg_upgrade at most big providers.
Cloud providers need to add a lot more complexity, because they are not just self hosting.
While all comparisons have shortcomings, it's a bit like this: you can cook your scrambled eggs just the way you want with very little equipment and effort, but a restaurant chain needs a lot more around it. Sure, their egg dish will usually be a bit more consistent, and you can have it on a whim. They also take a lot more care about contamination, but there are typically a lot more opportunities for contamination there too. While you're happy to give the plate from the dishwasher a quick look and maybe crack the egg into a glass first for the float check, the company has to deal with large deliveries, make sure your order is taken and goes to the right cook, always keep a supply with none getting sent back, and check that no rogue employee is messing things up. Your company needs to do that too, but at a much smaller scale, where a single person can have a complete picture far more easily than in a large company.
Otherwise we'd all just copy-paste stuff and be happy. But the reality is that if you do that (e.g. with Kubernetes), it might be fun as a learning process, but you'll spend more time on the management part than on actually just having your Postgres run.
But as always, whether a managed DB makes sense is very dependent on the use case. If you run your company on it, in many cases I'd argue it's better to self-host; if you have your local sports club website, maybe just get a managed DB. But also maybe don't do that on a big, expensive cloud provider. Those just charge a lot for the brand name and "having everything". It's really not worth it. Overall you won't get less downtime there than if you pick one of the many thousand alternatives.
It's the same as running any kind of personal side project. You can sometimes compete with the biggest company when your goal is to have something specific work really well and to cut out all of the "running a huge company" part: it's just a very different goal.
One more aside: pg_basebackup is really nice. You can use it to have your server back up and running in no time. You can use it to easily create a replica. It takes care of configuration. It comes built in. I'd highly recommend everyone self-hosting to give the command-line tools that come with PG a try. They are really great and useful, and it's actually fun to use them. Just that plus something like restic gives you "ready to go" backups for very cheap.
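For example, a minimal version of that combo might look like this (hypothetical host, paths, and repo; assumes a replication-capable role and that you actually test restores):

    # Take a self-contained physical copy, streaming the WAL it needs.
    pg_basebackup -h db1 -U replicator -D /var/backups/pg/base -X stream -P
    # (Add -R if the copy is meant to come up as a replica.)

    # Push the copy into a restic repository for cheap, deduplicated history.
    restic -r /srv/restic-repo init        # first time only
    restic -r /srv/restic-repo backup /var/backups/pg/base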
if you have your local sports club website, maybe just get a managed DB.
I run a website for a local (non-sports) club and disagree. Our constraints are:
In this circumstance SQLite backed up with an rsync cron seemed the obvious option. I’m not sure what a managed db would get me other than a bigger bill
I totally agree with you. Nonetheless, SQLite backups with rsync can be dangerous. From the SQLite docs:
Ordinary rsync does not understand SQLite transactions. Rsync will make a copy of ORIGIN into REPLICA, however the copy might not be consistent. Parts of the copy might be from one transaction, while other parts might be from a different transaction. The database copy might be corrupt.
One alternative is using sqlite3_rsync which always generates a consistent copy of the DB.
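For example, either of these gives you a consistent copy to sync (paths and hostnames are made up; sqlite3_rsync ships with recent SQLite releases):

    # Option 1: let SQLite write a consistent snapshot, then rsync that file.
    sqlite3 /srv/club/app.db ".backup '/srv/club/backups/app.db'"
    rsync -a /srv/club/backups/app.db backuphost:/backups/club/

    # Option 2: sqlite3_rsync understands the database format and syncs it safely.
    sqlite3_rsync /srv/club/app.db backuphost:/backups/club/app.db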
Great article. One thing seems to be (partially) missing: patching the VM/server OS (along with dist-upgrades when the current version falls out of support).
As one alternative to self-managing, we're currently slicing up managed Postgres instances (from UpCloud), giving our test environments a separate database, user, and schema on a "shared" managed Postgres instance, which makes the "managed services tax" more reasonable.
For prod environments it can make sense to pay for one instance per service - because of the "now it's somebody else's problem"-feature, and making it trivial to do point in time recovery via just bringing a new service instance online for recovery.
But the general trend of steering users toward one postgres instance per service does become a little silly when you have important, but low volume services.
If I ever run Postgres myself, I'll build from source with vendored libs. Building postgres is easy, and it's really nice to pick a path for the databases, files, etc.
Debian’s PostgreSQL packages allow you to set up multiple database clusters (in the PostgreSQL jargon sense) with potentially different versions of PostgreSQL and lots of control over things like data directories. It’s pretty neat, especially for making major version upgrades easier. https://manpages.debian.org/stretch/postgresql-common/pg_createcluster.1.en.html
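For instance (version numbers and paths are just illustrative):

    # Create a PostgreSQL 16 cluster with its data directory on fast local storage.
    pg_createcluster 16 main --datadir=/srv/pg/16/main --start
    pg_lsclusters                  # list every cluster, its version, port, and data dir
    # Later, upgrade the old 15/main cluster in place to the newest installed version.
    pg_upgradecluster 15 main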
Is it just me, or does Pg seem very high maintenance compared to other databases? Operations- and tuning-wise relative to SQL Server, Oracle, or Db2, anyway. Most of what I hear is about constantly having to fight it (it's either time or money in the form of licensing costs, it seems).
I have heard stories about having to fight Oracle or MS SQL Server. Not much though, because I am not in the social circles of the remaining people still running those. And about PostgreSQL, there is really not much to tell until you hit the need to start tuning the parameters.
Like, my emails (and some web feeds like Lobste.rs, and like Freefall webcomics) are indexed in PostgreSQL on my laptop. Once in a few years I need to do the version migration ~five versions ahead, and last time the defaults about checksums changed (towards having checksums on by default), which the migration command explained to me, and I had to enable them on the old version then migrate. That's all there is to tell, for the last ten years.
No. I think MS SQL Server and Oracle are perhaps more likely to be running on over-provisioned hardware, reducing the need for tuning somewhat.
Which makes sense: if you start by paying 10,000 USD in license fees, you're likely not trying to fit that DB into an anemic VM. Postgres is free, so you might be more inclined to experiment with how far down you can reasonably scale the deployment.
Relative to these commercial DBs, sure. I don't think pgsql is that much work though: turning up configuration options to match the hardware, understanding autovacuum, and having some backup/recovery strategy... only one of those is unique to it.
I haven't encountered Oracle in a long time but that used to be the realm of expensive DBAs and lots of odd OS tuning and choices though.
I think it's just that PG has A LOT of knobs, but you don't really need to push them. The defaults are "good enough" most of the time. It's when you want to optimize, something developers are famous for trying to do, that you can go down giant rabbit holes and get lost in the tuning.
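When you do need to push them, the first handful are boringly consistent. A sketch of matching the memory settings to the box (the values are placeholders for a dedicated 32 GB machine, not recommendations):

    psql <<'SQL'
    -- Rough memory sizing; apply with a reload, except where a restart is needed.
    ALTER SYSTEM SET shared_buffers = '8GB';          -- needs a restart to apply
    ALTER SYSTEM SET effective_cache_size = '24GB';   -- planner hint only
    ALTER SYSTEM SET work_mem = '64MB';               -- per sort/hash node, per query
    ALTER SYSTEM SET maintenance_work_mem = '1GB';
    SELECT pg_reload_conf();
    SQL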
Good article! What I'd also be interested in: what are the usual reasons for outages (i.e. for 3am pager calls) with a self-hosted Postgres? Are there any common patterns? And is it possible for a non-expert to debug/fix such problems on short notice?
I was a senior manager at a small company that ran Postgres at its core, and quite honestly the number of Postgres-related after-hours calls we had in the 20+ years we were running could be counted on one hand. One example that stands out due to its severity was an instance that just kept crashing, but it turned out to be a RAID controller flipping bits.
While replication etc is great (and we did it with drbd for a long time before other techniques came along), pg is rock solid, I’d have no qualms running it solo and just relying on backups for a small project.
All of that said, I currently run a smallish instance at Vultr for $WORK. Backups, upgrades and failover just happen, and I’d need to pay for the CPU and disk anyway. So while I don’t think it’s hard to run your own PG, it is (was?) non trivial to set up and maintain that sweet sweet automation, and there are plenty of cheapish hosting options around that do it all for you.
If random people can send SQL to your PG, chances are it's one of those random SQL queries abusing the poor server. If you have a bunch of people sending terrible SQL, you're going to have a bad time.
Sometimes it can be locking, if you allow really long running queries against tables that are being inserted against.
As for figuring out whether these are the issues or not, you can query for both of those things; see the PG wiki for queries that test for them. There is also the server admin section of the PG manual: https://www.postgresql.org/docs/current/admin.html
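Two checks along those lines (in the spirit of the wiki queries, not copied from them):

    psql <<'SQL'
    -- Longest-running non-idle statements.
    SELECT pid, now() - query_start AS runtime, state, left(query, 80) AS query
    FROM pg_stat_activity
    WHERE state <> 'idle'
    ORDER BY runtime DESC
    LIMIT 10;

    -- Sessions currently blocked on a lock, and who is blocking them.
    SELECT pid, pg_blocking_pids(pid) AS blocked_by, left(query, 80) AS query
    FROM pg_stat_activity
    WHERE cardinality(pg_blocking_pids(pid)) > 0;
    SQL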
Also it should be noted, a hosted DB instance won't fix any of these issues for you either, they will always be your problem.
If it's not that, it's probably not PG's issue. It's probably hardware failure, OS crashing, no disk space, stuff like that.
Sometimes it can be locking, if you allow really long running queries against tables that are being inserted against.
I once saw someone call this "Don't OLAP where you OLTP" because analytic queries can often be long-running enough to cause outages if you run them on the production server that is actively modifying the data. Their suggestion was to run your OLAP queries on a read-only replica of production instead.
(I guess in the GDPR world that we live in now, having some analyst run unspecified queries against production is questionable because how sure am I that they stick within the subset of queries that the user's consent actually covers?)
Agreed, a read-only copy for long-running queries is a great idea, if you have the budget for one.
As for access issues (not just GDPR-related): PG has very fine-grained access controls that so many people just never bother to use. Docs: https://www.postgresql.org/docs/current/ddl-rowsecurity.html
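A tiny example of what row-level security looks like in practice (the table, role, and setting names are made up):

    psql -d app <<'SQL'
    -- Analysts may only read rows for the region their session is scoped to.
    ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
    CREATE POLICY analyst_region ON orders
        FOR SELECT TO analyst
        USING (region = current_setting('app.current_region'));
    GRANT SELECT ON orders TO analyst;
    SQL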
It's the same as for most databases: hardware issues (disk problems, IO glitches), bad plans getting generated and cached by the query planner, and query loads that cause lock contention or exhaust resources. The first aren't that common and having hot failover changes them from disasters at 3AM to a failover and then address it during business hours. Lock contention is actually an application error, and not the database's problem. Exhausting resources is from your application as well, so you have to shed load at the application level.
The benefit of hosted is getting access to closed-source distributed DBs like Aurora and AlloyDB. I never understood the appeal of RDS/CloudSQL.