Summary of the Amazon DynamoDB Service Disruption in Northern Virginia (US-EAST-1) Region

31 points by dryya


fanf

One of the under-appreciated features of the standard DNS UPDATE protocol is that it is transactional. (AWS Route53 has proprietary APIs for managing DNS data.)

There are big caveats with DNS UPDATE, tho: the update is atomic on each authoritative server individually, but the secondaries will apply the atomic update later than the primary; and if the update modifies multiple RRsets then resolvers might cache a mixture of old and new states.

Transactional updates are mainly useful for eliminating any window where records are missing. (You can avoid spurious NXDOMAIN without transactions in many cases by adding new records before deleting old ones, but that doesn’t work if you are changing from A to CNAME or vice versa, because a CNAME can’t coexist with other records at the same name.) They also make it really easy to ensure an update is idempotent: the UPDATE message can simply say, delete all the old records regardless of what data they contain, and add these new records; and the change happens as an atomic unit.
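To make the "delete everything, then add, as one unit" idea concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not a DNS server or a real UPDATE client: the zone is a dict keyed by (name, type), and `apply_update`, the operation tuples, and the example names and addresses are all made up for illustration.

```python
import copy

def apply_update(zone, operations):
    """Apply all operations to a copy of the zone and return the copy.

    The caller swaps the returned dict in as a single step, so a reader
    sees either the old zone or the new one, never a half-applied state.
    (Hypothetical helper; a real server does this inside its own storage.)
    """
    new_zone = copy.deepcopy(zone)
    for op, name, rdtype, rdata in operations:
        if op == "delete_name":
            # Delete every RRset at this name, whatever data it contains.
            for key in [k for k in new_zone if k[0] == name]:
                del new_zone[key]
        elif op == "add":
            new_zone.setdefault((name, rdtype), set()).add(rdata)
        else:
            raise ValueError(f"unknown operation: {op}")
    return new_zone

# Toy zone: www currently has two A records.
zone = {("www.example.", "A"): {"192.0.2.1", "192.0.2.2"}}

# Illustrative change: switch www from A records to a CNAME.
to_cname = [
    ("delete_name", "www.example.", None, None),
    ("add", "www.example.", "CNAME", "cdn.example.net."),
]

once = apply_update(zone, to_cname)
twice = apply_update(once, to_cname)  # replaying the same update changes nothing
```

Because the update never looks at the old data, replaying it gives the same zone, which is what makes it idempotent. And because the delete and the add land together, there is no moment when www has neither the A records nor the CNAME.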

The risk with that kind of idempotent update is that last-write-wins is often not what you want. But that can be fixed with a more obscure feature: an UPDATE can include prerequisites. An UPDATE can check that records are present or absent or contain particular data before applying the change, and the prerequisite checks can examine different names or record types from the ones that are to be changed. This can be useful to prevent TOCTOU races.

So, for example, if you have a versioned RRset that might be updated concurrently by multiple clients, you can stuff the version number in (say) an HINFO record next to the A records, and your UPDATE message can have a prerequisite that the HINFO matches the preceding version before replacing the HINFO and A records as an atomic unit.
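The versioned-RRset trick is basically compare-and-swap, which can be sketched in plain Python like this. Again this is a toy in-memory model under stated assumptions, not a real UPDATE client: `update_with_prereq`, `make_update`, the "version N" HINFO text, and the names and addresses are all hypothetical, and the sketch is single-threaded, whereas a real server has to do the check and the change as one atomic step.

```python
def update_with_prereq(zone, prereq, replacements):
    """Replace RRsets only if the prerequisite RRset matches exactly.

    Returns True if the update was applied, False if the prerequisite
    failed (a real server would answer with an error rcode instead).
    """
    prereq_key, expected = prereq
    if zone.get(prereq_key) != expected:
        return False  # someone else changed the version first
    for key, rdata in replacements.items():
        zone[key] = set(rdata)
    return True

name = "svc.example."
zone = {
    (name, "HINFO"): {"version 7"},   # version marker next to the A records
    (name, "A"): {"192.0.2.1"},
}

def make_update(seen_version, new_addresses):
    """Build (prerequisite, replacements) from the version a client last read."""
    prereq = ((name, "HINFO"), {f"version {seen_version}"})
    replacements = {
        (name, "HINFO"): {f"version {seen_version + 1}"},
        (name, "A"): set(new_addresses),
    }
    return prereq, replacements

# Two clients both read version 7, then race to update.
first = update_with_prereq(zone, *make_update(7, ["192.0.2.10"]))
second = update_with_prereq(zone, *make_update(7, ["192.0.2.20"]))
```

The first client's update applies and bumps the marker to version 8; the second client's prerequisite no longer matches, so its update is rejected rather than silently overwriting the first. The loser can then re-read the current state and decide whether its change still makes sense.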

Sirikon

The race condition in the service updating DNS records sounds obvious. Surprising that no one caught that one before (or during the design phase, honestly). Also surprising that this didn't happen before.