Avoiding PostgreSQL Pitfalls: The Hidden Cost of Failing Inserts

11 points by spookylukey

xfbs

TL;DR (correct me if I got this wrong): if you insert data into a table that has a (compound) UNIQUE constraint, and you expect the insertion to fail often (because of a UNIQUE constraint violation), then using ON CONFLICT DO NOTHING and not doing it within a transaction may help Postgres perform better. If you don’t do this, then even if there is a constraint violation, Postgres will still write the tuples to disk (they will be dead tuples) and perform a transaction rollback, which adds time and increases the amount of work that the autovacuum has to do.

mxey

Yeah, apparently ON CONFLICT does the uniqueness check before writing into the heap, so it it fails, there will be no dead tuples

I wonder why a standard INSERT without that clause doesn’t also do the index checks first.

squadette

Django wraps the call in a transaction, which adds slight overhead.

but wouldn’t the lone INSERT also auto-start and auto-commit a transaction? I wonder what was the performance improvement between Django call and raw INSERT execution. I’d be surprised if it was visible — what is it doing then?

riking

My guess from afar is that the client library is waiting for unnecessary protocol roundtrips before letting the application code move on to the next step, turning a 1-RTT BEGIN; INSERT; COMMIT, or a 1-RTT INSERT with the database adding the autotransaction, into 3RTT.