Start with a clean slate: Integration testing with PostgreSQL
19 points by dogacel
Worth contrasting with this approach: https://calpaterson.com/against-database-teardown.html
This is the way. My prod DBs are full of crap (erm, data), added at different migration versions than $latest. Tests that run against a clean DB are testing the software in an unrealistic state and therefore provide no value to me, might as well skip writing them.
As the other reply suggests, this is where you want some sort of test-data “fixture”. Sometimes that might be hard-coded data that’s the same on every run, sometimes it might be partially or completely randomly generated data (and there are libraries to do this – a popular one I know well from the Python world is factory_boy).
Another plus of being able to generate realistic-looking test data is you can bootstrap local dev or non-production environments easily; if you have a factory or whatever that can generate one instance for a test, it can generate five thousand instances to seed a database.
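For anyone unfamiliar, here’s a minimal factory_boy sketch (the User dataclass is a stand-in for a real model; factory_boy also ships SQLAlchemy and Django integrations):

    import dataclasses
    import factory  # the factory_boy package

    @dataclasses.dataclass
    class User:  # stand-in for your real model
        name: str
        email: str

    class UserFactory(factory.Factory):
        class Meta:
            model = User

        name = factory.Faker("name")
        email = factory.Faker("email")

    user = UserFactory()                    # one instance for a test
    users = UserFactory.create_batch(5000)  # or thousands to seed a dev DB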
Realistic looking data is not realistic data though. Prod is full of surprises! The only realistic data is what I have in production. If I really need to fill a DB with realistic data, I prefer to invest in anonymizing a prod data dump and importing that instead of wasting LoC to build up fake data that gives me fake assurance.
You can always put in some specific fixed items if you need to regression-test their cases. Pulling all or most of your production data set for testing just feels like using nuclear weapons to kill a mosquito. Like, suppose I’m writing a sales-tax/VAT calculator: I’m not going to include every transaction in the history of my company in the test data set just to make sure the test encounters a few key edge cases. I’m going to include those specific cases.
We have abstracted all data requirements into “Smart Fixtures”, which are similar to scenarios. They bring in all data dependencies for an object with one method call. Therefore, before each test, you can choose and set up a scenario that fits your need.
This is only possible by ensuring your DB state is clean in between tests. On the contrary, I think what you said made me appreciate that we have the flexibility to provide many different scenarios to developers, rather than having our DBs contain unrelated and uncontrolled data, which really defeats the “isolation” aspect of integration tests.
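To make the idea concrete, a hypothetical sketch of such a scenario (schema and names invented for illustration; db is assumed to be something like a psycopg cursor, and fixed ids are only safe because the DB is wiped between tests):

    def scenario_active_user(db):
        db.execute("insert into teams (id, name) values (1, 'Acme')")
        db.execute("insert into users (id, team_id, active) values (1, 1, true)")
        return 1  # the user id

    def scenario_disabled_user(db):
        user_id = scenario_active_user(db)  # derive from the base scenario
        db.execute("update users set active = false where id = %s", (user_id,))
        return user_id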
I don’t see how that’s related? We can have scenarios and “smart fixtures” without cleaning the DB. The call can simply return the ids of the created objects, so the test can be written without hardcoding any identities.
If a few edge-casey tests really do need a clean DB, they can run against an ephemeral version. But the value they provide is less, and setting them up is more effort, and they might not be so easy to parallelize, so adding them to a codebase is a liability and needs to be considered carefully.
If you try to use scenarios without cleaning the db, you might see unique constraint errors. Also, existing fixtures can be modified after a test run and you wouldn’t know about it. It might also affect the results of some queries, especially if you query a secondary index.
If a few edge-casey tests really do need a clean DB
Almost all tests really do need them, for the reason I said. Most of the time our scenarios are slight derivations of each other; for example, you can have an active user or a disabled user, where the second is a derivation of the first. It uses the same ids, so you will definitely have clashing definitions if you don’t clean your db.
I think your assumption makes sense when you define everything with unique ids, but you will still face unique constraint errors or unintended items being returned in broader queries in general. How do you plan to solve them?
From the article you linked:
    def test_adding_a_user_2(db_session):
        user = make_user()
        db_session.add(user)
        # this is safe, doesn't assume no other users exist
        assert db_session.query(User).get(user.user_id) is not None
Is it safe if the user already exists? In production, trying to add a user that already exists should probably be a logic error, so you’d need to delete the user first, which your business logic might not even have a need for.
Of course, you can always generate a fresh user each time, at which point your tests are going to get slower and slower over time, which goes against the “cleaning the database is slow” argument that the article makes.
I’m using https://github.com/peterldowns/pgtestdb , works really well. Uses pg template database to create a new one for each test.
Will check it out. I will also try to evaluate it. Thank you!
It looks like pgtestdb is golang-specific, so you probably can’t use it with a Ruby codebase (without shelling out).
At work, we have a hand-rolled implementation in Clojure which also sets up the test database from a template at the beginning of each test. It isn’t too difficult to set this up though: wrap your tests in code which runs DROP DATABASE IF EXISTS test; followed by CREATE DATABASE test TEMPLATE test_template;. To speed things up further, we keep a SQL dump of the most recent schema, so we don’t need to run the migrations from scratch when setting up the template.
Note that you can also populate the template with common fixtures if you have them - it doesn’t have to be just an empty schema.
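In Python the same trick is only a few lines, e.g. with psycopg2 (database names are illustrative; CREATE/DROP DATABASE can’t run inside a transaction, hence autocommit):

    import psycopg2

    admin = psycopg2.connect(dbname="postgres", user="postgres")
    admin.autocommit = True  # CREATE/DROP DATABASE refuse to run in a transaction
    with admin.cursor() as cur:
        cur.execute("DROP DATABASE IF EXISTS test")
        cur.execute("CREATE DATABASE test TEMPLATE test_template")
    admin.close()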
Yes, it’s for Go only. But it isn’t that much code. You can read it and understand it so that you can port it to another language.
pgtestdb does run migrations on the template so that you don’t need to run migrations on the test db. It also supports multiple test dbs so that you can run tests in parallel. Sped up my test suite by quite a bit.
I have strong opinions about never mocking the database in tests, so tricks like this are very important to me.
I’ve gone to quite extensive lengths in the past to implement similar patterns for other data stores - I’ve written code to help clear an Elasticsearch index and recreate it at the start of every test run, for example.
https://testcontainers.com/ is a neat tool in this space - it provides Docker containers for all sorts of databases / search indexes / message queues / etc that are designed to be used in a test environment that needs to recreate a clean slate for every test runner.
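As a sketch, the Python flavor of testcontainers stands up a throwaway Postgres in a couple of lines (the image tag and the trivial assertion are just illustrative):

    from testcontainers.postgres import PostgresContainer
    from sqlalchemy import create_engine, text

    with PostgresContainer("postgres:16") as pg:  # starts a Docker container
        engine = create_engine(pg.get_connection_url())
        with engine.connect() as conn:
            assert conn.execute(text("select 1")).scalar() == 1
    # container is torn down here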
Some people will complain that this “isn’t unit testing” and that the performance isn’t good enough. I disagree with those people - I think you can almost always make it fast enough if you put some thought in, and unit tests that exercise a bunch of mocks are far less useful than more integration-style tests that actually show if your code works or not.
Yeah, I strongly agree. If you’re mocking the database, your tests are nearly pointless as almost all of the complexity (and thus risk) is in the SQL itself. You are just writing tests for the sake of writing tests—you aren’t actually derisking much of anything.
Moreover, I’ve never seen the database make tests slow—I’ve seen slow tests because the application was slow (e.g., Python services) but I typically run hundreds of tests against a Postgres database in a couple hundred milliseconds without doing anything special to tune the Postgres database for testing (it’s just a vanilla postgres container image running on a laptop).
I don’t really buy this. Let’s say I have a structure DbWrapper, which has constrained inputs and interacts with the database. I write tests against DbWrapper that hit the database directly. I then have a UserManager, which wraps DbWrapper with its own logic, and then TeamManager, which is exclusively implemented using UserManager and other pure constructs.
Maybe all tests against DbWrapper should not mock the database. Let’s imagine I have near total coverage of DbWrapper. Should UserManager not mock the database? We’ve got tested behaviors for DbWrapper already from its own tests. I can write a “MockDb” that DbWrapper can use instead of a connection to a real db, one that’s in memory and passes the exact same test suite as DbWrapper, even acting as an oracle. Probably there’s value in at least some of UserManager not using a database, but are all tests pointless if they use a mocked database?
What about TeamManager? We’re so abstracted from the database, and we have full coverage of the DB layer and the layer above it. How much value are we getting from using a real database now? I’m really skeptical, and there’s a lot of value to having really fast, deterministic, reliable tests. I could write generated property tests or fuzz tests that iterate hundreds of thousands of times if I use a mock; there’s no way I can do that to the same degree against a database. Is that not worthwhile?
Moreover, I’ve never seen the database make tests slow—I’ve seen slow tests because the application was slow (e.g., Python services) but I typically run hundreds of tests against a Postgres database in a couple hundred milliseconds without doing anything special to tune the Postgres database for testing (it’s just a vanilla postgres container image running on a laptop).
Hundreds of tests is nothing. If you use tools like quickcheck you’ll have 100s of thousands of tests, even millions - I’ve written tests that exhaustively bruteforce 0 to 2^20, you just aren’t going to do that with a database. I’ve had extremely optimized code with optimal DB usage, it doesn’t change the fact that talking to a DB is radically slower compared to talking to memory.
Recently I wrote a protobuf client. I have integration tests against real services, but I also have mocks so that I can specifically test things like invalid responses (ie: I need the service to do something that no real protobuf server would actually do). There’s just no practical way to do that without mocking.
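For a concrete sense of scale, the kind of test being described looks roughly like this with Python’s hypothesis library (the encode/decode pair is an invented function under test); it’s only practical because no I/O is involved:

    from hypothesis import given, settings, strategies as st

    def encode(n: int) -> bytes:  # invented pure function under test
        return n.to_bytes(4, "big")

    def decode(b: bytes) -> int:
        return int.from_bytes(b, "big")

    @given(st.integers(min_value=0, max_value=2**20 - 1))
    @settings(max_examples=50_000)
    def test_roundtrip(n: int) -> None:
        assert decode(encode(n)) == n

    def test_exhaustive() -> None:  # or simply brute-force the whole range
        for n in range(2**20):
            assert decode(encode(n)) == n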
I’m confused because you say you don’t buy my reasoning, but then you say that your tests for DbWrapper hit the database directly, which is precisely what I’m advocating. To be clear, I have no problem with mocking DbWrapper in unit tests for UserManager or other higher-level components.
Hundreds of tests is nothing. If you use tools like quickcheck you’ll have 100s of thousands of tests, even millions - I’ve written tests that exhaustively bruteforce 0 to 2^20, you just aren’t going to do that with a database. I’ve had extremely optimized code with optimal DB usage, it doesn’t change the fact that talking to a DB is radically slower compared to talking to memory.
Sure, if you’re executing that many tests for your SQL it probably makes a lot of sense to optimize.
To be clear, I have no problem with mocking DbWrapper in unit tests for UserManager or other higher-level components.
Then I misunderstood.
I do have a problem with mocking DbWrapper - that’s the exact kind of thing I argued against in an earlier comment.
I have written code like that in the past and regretted it. I have never regretted code that I have written that exercises a real database.
My ideal test suite is one where almost every test demonstrates that a feature of my software works. I’m not a fan of tests that exercise individual “units” of code that are detached from end-user functionality via several layers of abstraction.
I’ll write unit-style tests when they make sense - if I was writing a YAML parsing library, for example.
I still use mocks when there is a system I’m talking to that’s outside of my control - like a REST API hosted by an external company. I don’t want my tests to ever fail because some other system I depend on had an intermittent error.
I do want them to fail if eg a new MySQL version comes out that subtly changes the semantics of an obscure SQL feature that my code happens to rely on. Robust integration style tests make upgrading dependencies (including library dependencies) so much less painful!
I have never regretted code that I have written that exercises a real database.
I definitely have. Having slow tests is really annoying. I much prefer to focus on exhaustive testing for the areas I want to test.
I do want them to fail if eg a new MySQL version comes out that subtly changes the semantics of an obscure SQL feature that my code happens to rely on.
But those would (likely) get caught in my scenario because you wouldn’t be mocking the database for DbWrapper. And I even would say that some tests that don’t mock the Db in UserManager would make sense, maybe even some in TeamManager. But the further away you get from the database, the less useful it is, and the more you might benefit from having faster and faster tests.
It really depends what you are testing. If I’ve got a method with a bunch of logic and it happens to fetch a value from the db, but what I’m testing is what decision it makes with the value etc, then mocking the db hardly makes the test useless. Usually I use a repository object and send in a fake repo with such a test.
For that kind of thing I like to refactor my logic to a function or class that doesn’t load data from the database itself at all. It’s much easier to write tests against a function that eg takes a value and returns a decision - the fact the value was loaded from the database is irrelevant to testing if that logic works or not.
It really depends what you are testing. If I’ve got a method with a bunch of logic and it happens to fetch a value from the db, but what I’m testing is what decision it makes with the value etc, then mocking the db hardly makes the test useless. Usually I use a repository object and send in a fake repo with such a test.
Maybe I should clarify that I’m not opposed to mocking the database, I’m opposed to not testing non-trivial SQL. I often make a repo object which I will test against a real database and then mock in higher-level tests, but mocking the database in the unit tests for the repo object seems pointless to me.
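A sketch of that split, with invented names (the repo’s SQL gets exercised against real Postgres in its own tests; the fake stands in for it one layer up):

    class UserRepo:
        """Tested against a real Postgres; the non-trivial SQL lives here."""
        def __init__(self, conn):
            self.conn = conn

        def get_active_user_ids(self) -> list[int]:
            with self.conn.cursor() as cur:
                cur.execute("select id from users where active")
                return [row[0] for row in cur.fetchall()]

    class FakeUserRepo:
        """Drop-in fake for unit tests of higher-level components."""
        def __init__(self, active_ids: list[int]):
            self.active_ids = active_ids

        def get_active_user_ids(self) -> list[int]:
            return list(self.active_ids)

    class UserManager:
        def __init__(self, repo):
            self.repo = repo

        def count_active(self) -> int:
            return len(self.repo.get_active_user_ids())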
We use testcontainers in our microservices; it works pretty well!
unit tests that exercise a bunch of mocks are far less useful than more integration-style tests that actually show if your code works or not.
Totally agree. We write those tests to ensure our code works fine and doesn’t introduce bugs, and I think sometimes we forget why we write those tests in the first place. When you mock a bunch of stuff, it basically becomes an echo chamber. Every mock you make is an optimistic assumption of how the external system is going to behave.
The other trade-off, maybe not mentioned in the sibling comments, is that it’s a lot of stuff to set up, and to set up in CI: template dbs, spawning the app pointed at the right db, process management…
Sure, but that investment pays off every time the test suite runs. I’ve never regretted putting the work into CI to get that all working.
The harder part in my experience is usually the local development environment - I like developers to be able to run the tests locally! For that Docker helps a ton.
Specifically for Python + pytest I use and slightly prefer pytest-mock-resources, which provides very similar functionality but exposed through pytest fixtures.
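Usage is roughly like this (a sketch; check the pytest-mock-resources docs for the exact fixture API in your version):

    from pytest_mock_resources import create_postgres_fixture
    from sqlalchemy import text

    pg = create_postgres_fixture()  # each test requesting `pg` gets a clean engine

    def test_select_one(pg):
        with pg.connect() as conn:
            assert conn.execute(text("select 1")).scalar() == 1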
My experience with using this approach for all tests is that test times skyrocket. In my experience, a full test suite that runs longer than 1 hour is not good for productivity, because you want your test suite to run for every PR before it’s merged and every time a commit is made to main. Is it the case for you that you never have a complete test suite taking longer than 1 hour?
I’ve never had a test suite on one of my own projects that takes more than a few minutes, and honestly at a few minutes I start getting stressed out about it.
A lot of my current projects use plugins, which make this a whole lot easier - Datasette has 100+ plugins, but each of those has its own independent test suite, so I don’t have to run everything at the same time. I have a few projects that run tests exercising multiple plugins at the same time; I’d like to increase that.
There are a bunch of tricks you can use to optimize slow test suites. The most powerful I know of is the pattern implemented by https://pypi.org/project/pytest-parallel/ or https://pypi.org/project/pytest-xdist/ - you effectively split the test suite into 4 or 8 chunks and run them in parallel across multiple cores. It’s possible to get multiple machines to help here, though I’ve never worked with that setup myself.
You need to have discipline in making sure your tests can run independently of each other, but that’s good practice anyway.
I’d rather run a CI cluster of 16 machines and split a huge test suite across them than mock my database!
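A sketch of how the per-worker split can look with pytest-xdist (worker_id is an xdist-provided fixture; the template-database trick from upthread keeps creation cheap; names are illustrative):

    import psycopg2
    import pytest

    @pytest.fixture(scope="session")
    def test_db_name(worker_id):
        # worker_id is "gw0", "gw1", ... under `pytest -n 8`, or "master" without xdist
        name = f"test_{worker_id}"
        admin = psycopg2.connect(dbname="postgres", user="postgres")
        admin.autocommit = True
        with admin.cursor() as cur:
            cur.execute(f"DROP DATABASE IF EXISTS {name}")
            cur.execute(f"CREATE DATABASE {name} TEMPLATE test_template")
        admin.close()
        return name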
I’d rather run a CI cluster of 16 machines and split a huge test suite across them than mock my database!
That makes sense if you can afford it. But it’s certainly not free and too expensive for some companies to spawn 16 machines for every commit.
I’ve certainly seen companies reject the cost of running more machines for CI, which felt short-sighted to me given they were paying the salaries of several hundred full-time engineers who were all blocked for 20+ minutes at a time every time they had to wait for a CI job to complete.
My approach is to have my test setup run ephemeral PostgreSQL instances: https://michael.stapelberg.ch/posts/2024-11-19-testing-with-go-and-postgresql-ephemeral-dbs/ — this way, you don’t need to manage any databases, the app gets a fresh database on every test run, and everything is fast (no containers needed) :)
I have found transactions to offer sufficient isolation for testing. The only issues I encountered were for cases where I expected code to rollback the transaction, and I have a suitable workaround for those handful of cases.
We initially truncated tables, but ran into the foreign key issues the author mentions. Also, the number of tables has grown so much that manually maintaining the ordered list of tables to truncate is too toilsome.
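For reference, the transaction-per-test pattern is a small fixture; a sketch with SQLAlchemy (connection URL illustrative):

    import pytest
    from sqlalchemy import create_engine
    from sqlalchemy.orm import Session

    engine = create_engine("postgresql:///test")  # illustrative URL

    @pytest.fixture
    def session():
        conn = engine.connect()
        tx = conn.begin()
        s = Session(bind=conn)
        yield s            # the test runs here
        s.close()
        tx.rollback()      # everything the test wrote disappears
        conn.close()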
[With monoliths] You don’t need to open 3 pull requests just to do some CRUD operations on a basic database table, and jump back a couple PRs later, because you forgot to add a field to your proto definitions and you gotta open 3 more PRs to add that.
This kind of feels like blaming microservices for an architecture problem, by which I mean a good microservice architecture (as I conceive of it, anyway) shouldn’t require touching several components just to make a CRUD endpoint? If it does, the number of PRs seems like the least of your concerns. I agree that microservices make it more painful to “cheat” the architecture, which maybe feels like a feature if a poor architecture is forced upon you, but if you have a more reasonable architecture, you want to disincentivize this sort of “cheating”. Microservices aren’t the only way to do that, nor is that their primary purpose (and there are reasonable criticisms of microservices—e.g., you need to bring in something like Kubernetes or Docker compose sooner than with a monolith), but IMHO it’s an underappreciated benefit.
I’m curious to hear what others think?
The problem they’re describing isn’t architectural, it’s an issue of code organization. What they’re describing goes away in a monorepo - you would never need multiple PRs for that, nor would you forget a field in protobuf, etc, because your entire deployment is checked in lockstep in CI. The microservice vs monorepo distinction isn’t really relevant at all, it’s just that you wouldn’t run into this as often in a monorepo because you already have the lockstep check - well, except for dynamically linked dependencies, which goes to show my point.
What they’re describing goes away in a monorepo … The microservice vs monorepo distinction isn’t really relevant at all
Monorepo-vs-manyrepo is orthogonal to microservices-vs-monolith i.e. you can have microservices in a monorepo. Neither the article nor my comment mentions monorepo, so I suspect you may be confusing monolith and monorepo? Please correct me if I’m the one misunderstanding something.
That’s my point. They seem to be talking about an issue that isn’t related to microservice vs monolith. I miswrote “microservice vs monorepo”, I meant to write “The microservice vs monolith distinction isn’t really relevant at all”.
Not really; you still have to deploy each service carefully when your changes are not backwards compatible. I see that it decreases the number of hops between repositories, but still.
So it won’t be a single PR anyway, you need to deploy each service in order.
You could solve that problem with multiple PRs, I guess. But you would know about a client not being up to date at build time, that’s my point. This wouldn’t be something you could forget.
Depends. If you think about linked libraries, yes, it won’t be stale. However, it doesn’t guarantee your release won’t be causing issues, as the two services will be deployed independently, and they can both speak to the older version of each other.
But if you make certain assumptions that are not at the compiler level, such as assuming the field foo will be a non-empty string from now on, you still won’t catch it. And actually, if you follow gRPC standards to write wire-compatible code, the compiler won’t really be catching much.
If you can’t make compiler level assumptions then how would a monolith help? idk it isn’t that central to your post but I’m not convinced that this is much of a microservices problem.
A monolith can help because you don’t have RPC methods and don’t need to mock them. You can directly see how things work end to end. In microservices, your tests assume the other service behaves the way your mocks say it does.
You don’t have that in monoliths; you can just write code that is well-covered. That’s a distributed-system problem overall, it doesn’t have to be microservices. You can say the same thing for any kind of external dependency: message queues, external APIs, databases. That’s why I underline the importance of integration tests in this blog post.
Ah, I understand and I agree, although whether OP puts their code in a monorepo or a repo per component (I’m using “component” here to refer generically to some piece of code that could be either a component in a monolith or a microservice depending on the architecture) or whether their components are deployed as independent services or a single monolith, it’s suspicious that ordinary CRUD endpoint changes are traversing component boundaries. That seems like an architecture issue IMHO.
I guess I’m (over?) reacting to a common objection (that perhaps the OP isn’t actually making) that microservices make it hard to traverse component boundaries, and I think maybe that’s just an argument against architectural accountability. Whether you organize your code as microservices or components inside a monolith, you shouldn’t “cheat your architecture” and if the architecture is causing you pain, then the problem is probably with the architecture and not with the guard rails that enforce it.
I did not fully understand your point:
microservices make it hard to traverse component boundaries
Also, an ex-member who was a senior at the time told me, “Bad standards are better than no standards”. So it is fair to assume we had a bunch of “architectural issues”, because it was hard to change the architecture of everything all at once. Maybe from the business’s perspective, those issues are justifiable, since standardization also brings a bunch of advantages.
From the book “Microservices Patterns”, I remember the author mentioning how “microservices” are usually an organizational decision rather than architectural. Based on my recent research experience, I should say “modularization” is a sweet-spot for many things that microservices try to solve, while having the advantages of monoliths.
If you want to learn more about why I think so, I can provide some sources.
Based on my recent research experience, I should say “modularization” is a sweet-spot for many things that microservices try to solve, while having the advantages of monoliths.
Yeah, I think I mostly agree. I’m certainly a big fan of modularization. Hopefully all software, whether monolith or microservice, is not only modular but the modules are organized sensibly (I think of “architecture” as the way we organize these modules). If you have a modular monolith, I think you should be able to break it up into microservices pretty easily if you want to.
You might want to break one or more modules into a microservice if:
(Microservices aren’t the only solutions to these problems, but they are an established pattern for solving them)
However, if your monolith lacks a good architecture and you try to break it up into microservices which also lack a good architecture, then I think you’re going to feel a lot of pain, and I think a lot of people who criticize microservices are largely criticizing poorly-architected microservices. If you have a good architecture, then monolith vs microservice becomes a sort of implementation detail.
At our company we’re searching for a solution to this, so this blogpost is extremely appreciated!
Our stack is Prisma (Typescript) + PostgreSQL, and our integration tests use Prismock as a poor man’s substitute for an in-memory ephemeral PSQL database. It only looks like a database and doesn’t have the full functionality (e.g. foreign keys), so we’d love to switch to a real PSQL database, a clean one for each test.
Thanks for the trick for temporary disabling of constraints so that you can clear the tables more easily! Very cool.
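One common variant of that trick, sketched in Python (the article’s exact mechanism may differ; setting session_replication_role to replica disables FK triggers, so deletion order stops mattering, though it requires sufficient privileges):

    import psycopg2

    conn = psycopg2.connect(dbname="test", user="postgres")
    with conn, conn.cursor() as cur:
        cur.execute("SET session_replication_role = replica")  # FK triggers off
        for table in ("users", "teams", "categories"):  # illustrative table names
            cur.execute(f"DELETE FROM {table}")         # order no longer matters
        cur.execute("SET session_replication_role = DEFAULT")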
I’m glad that you liked it! I have included the installation strategies in the repository as well. I tried to make it as easy as possible.
We used H2 as our in-memory database in the past, but we also wanted to have the full functionality of PSQL after facing a couple of incidents which could have been caught easily with existing tests.
Let me know if you can successfully use it in your project, and send any kind of feedback! I would be very happy to hear that sharing our solution helped with others’ pain points as well :)
Interesting. For PostgreSQL testing we use https://pgtap.org and we separate concerns, only testing the app against PG directly with full e2e tests. We trust that the DB APIs can reliably talk to the DB.
We trust that the DB APIs can reliably talk to the DB
It is not necessarily about the DB APIs’ reliability; we would like to simulate the whole end-to-end flow. This makes sure we are catching any kind of bugs early on.
It would be an interesting read for me to see how you separated concerns and how you actually tested some behavior fully. For example, long transactional blocks or complex queries are really hard to test. I wonder how you dealt with them and how successful it was.
tl;dr: The database is smarter than we are 90+% of the time, so we rely on it to manage and control the data. We recognize that data is the most important part. Everything else is mostly just window dressing.
That’s all done in the DB, we rely heavily on DB features such as views, triggers, functions, etc. This is why we care so much about PG testing.
i.e. most of the logic around the data is done in the database. It handles auditing and access control too. I get this is weird compared to how most apps do it, since they treat databases as dumb data stores, but we believe strongly in making the DB guarantee the data integrity: FK’s, check constraints, etc.
I agree that if you are doing long transaction blocks and such from the app, rather than having that run in the DB process, then for sure you have messes if you can’t do a lot of E2E testing. In our case we would just have a function or view you call, and it would handle the transaction or complex query, which we then test with pgtap.
When we need a real language past SQL in the DB, we reach for PL/Pythonu, but there are plenty of options to choose from. This means most of our app logic lives in the database, but that’s good with us. Most of the external code outside the DB is just GUI/UI logic or CLI tools that interact with the DB. We haven’t quite gotten as far as something like SQLPage where even the web UI lives in the DB. That seems a touch too far, even for us!
This is all fine and dandy for applications that just use the database to manage state. I’ve really never found a good solution for the class of app that exists to process and serve up data produced outside of the world of the app, eg analytic data produced by other teams.
I’ve really never found a good solution for the class of app that exists to process and serve up data produced outside of the world of the app, eg analytic data produced by other teams.
What’s wrong with mocking in that case? I assume there is some agreed-upon schema for the data you are processing/serving. Generate mock data that is as realistic as can be and test against it. The alternative is to call those external systems, which will be slow and error-prone, and assumes the external systems have a useful test environment.
Generate mock data that is as realistic as can be and test against it.
This assumes you know the ways the data will change. Unless you control the real data, this is just writing down your assumptions twice.
With data coming from third and fourth parties, and other teams processing the data, we’re exposed to:
Obviously we have database schemas and can reject data that doesn’t conform to the schema. The problem is that the data is more complicated than that.
At first, it sounds like a good idea. In reality, it is a terrible idea. PostgreSQL supports some sort of nested transactions, also called SAVEPOINTs. However, a failure inside a transaction aborts the rest.
Surely this is fine if most of your tests don’t rely on nesting transactions? (Which I’m assuming is most). In my experience wrapping each test in a transaction and rolling back each time is much faster.
I’m actually looking into this; I guess what I said was wrong:
a failure inside a transaction aborts the rest
I tested it by running:

    begin;
    savepoint here;
    select * from categories;
    insert into categories (category, created_at, updated_at)
        values ('Foo', now(), now());
    savepoint there;
    select * from categories;
    -- this insert fails (a constraint violation), aborting the transaction...
    insert into categories (category) values ('Foo');
    -- ...but rolling back to the savepoint recovers it, keeping the first insert
    rollback to there;
    select * from categories;
    commit;
It only appeared so because our database library (Scala Slick) doesn’t support savepoints. That’s still a limitation; however, what I said seems wrong.