Considering Strictly Monotonic Time
32 points by bakaq
I already use this rule with hybrid logical clocks: every read advances the time.
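A minimal sketch of that idea in C (names invented, single-threaded, not any particular HLC library): the clock keeps the largest physical timestamp it has seen plus a logical counter, and every read bumps one or the other, so no two reads return the same value.

    #include <stdint.h>
    #include <time.h>

    /* Hybrid-logical-clock sketch: keep the largest physical time seen plus a
     * logical counter. Treating every read as an event makes the output
     * strictly increasing even if the physical clock stalls or steps back. */
    typedef struct {
        int64_t last_physical_ns;  /* highest physical timestamp observed */
        int64_t logical;           /* ticks within that physical timestamp */
    } hlc;

    static int64_t physical_now_ns(void) {
        struct timespec ts;
        clock_gettime(CLOCK_REALTIME, &ts);
        return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
    }

    /* Called on every event, reads included. */
    static hlc hlc_read(hlc *clk) {
        int64_t pt = physical_now_ns();
        if (pt > clk->last_physical_ns) {
            clk->last_physical_ns = pt;
            clk->logical = 0;
        } else {
            clk->logical++;  /* same or earlier physical time: bump the counter */
        }
        return *clk;
    }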
This is semi-OT, but I thought I'd drop it here since it burned me: don't use CLOCK_MONOTONIC_RAW on Linux to try to get better perf than CLOCK_MONOTONIC. The latter has a vDSO fastpath and the former does not.
A kernel bug for this on x86_64 has been open since 2018:
https://bugzilla.kernel.org/show_bug.cgi?id=198961.
However, it appears to be fixed on ARM (but I haven't verified):
https://lore.kernel.org/linux-arm-kernel/20190621095252.32307-1-vincenzo.frascino@arm.com/.
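If you want to see the difference on your own machine, a rough micro-benchmark sketch (results vary by kernel, CPU, and clocksource):

    #include <stdio.h>
    #include <time.h>

    /* Time N back-to-back clock_gettime() calls per clock. Where
     * CLOCK_MONOTONIC_RAW misses the vDSO fast path it falls back to a real
     * syscall and shows up as a much higher per-call cost. */
    static double ns_per_call(clockid_t id, int n) {
        struct timespec start, end, tmp;
        clock_gettime(CLOCK_MONOTONIC, &start);
        for (int i = 0; i < n; i++)
            clock_gettime(id, &tmp);
        clock_gettime(CLOCK_MONOTONIC, &end);
        double ns = (end.tv_sec - start.tv_sec) * 1e9 + (end.tv_nsec - start.tv_nsec);
        return ns / n;
    }

    int main(void) {
        int n = 1000000;
        printf("CLOCK_MONOTONIC:     %.1f ns/call\n", ns_per_call(CLOCK_MONOTONIC, n));
        printf("CLOCK_MONOTONIC_RAW: %.1f ns/call\n", ns_per_call(CLOCK_MONOTONIC_RAW, n));
        return 0;
    }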
I like the Elixir/Erlang method of strictly monotonic time tagging.
tldr: You store monotonic time alongside a monotonic unique integer (and optionally an offset). Example Livebook code: https://gist.github.com/Nezteb/aff615f423fbd3e2fb219414e2172829
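The gist is Elixir, but the same idea in C looks roughly like this (types and names invented for illustration):

    #include <stdatomic.h>
    #include <stdint.h>
    #include <time.h>

    /* Tag = monotonic time paired with a process-wide strictly increasing
     * integer. Two tags taken within the same clock tick still order
     * deterministically by the sequence number. */
    typedef struct {
        int64_t  mono_ns;  /* CLOCK_MONOTONIC reading, in ns */
        uint64_t seq;      /* strictly increasing tie-breaker */
    } time_tag;

    static _Atomic uint64_t next_seq;

    static time_tag tag_now(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        time_tag t;
        t.mono_ns = (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
        t.seq     = atomic_fetch_add(&next_seq, 1);
        return t;
    }

    /* Compare by time first, sequence number second. */
    static int tag_cmp(time_tag a, time_tag b) {
        if (a.mono_ns != b.mono_ns) return a.mono_ns < b.mono_ns ? -1 : 1;
        return a.seq < b.seq ? -1 : (a.seq > b.seq ? 1 : 0);
    }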
We ran into this before Linux added monotonic clocks. Our distributed system software would see occasional wild swings in time, including >60s jumps forwards and backwards. This required a bunch of clock hacks that accreted over time:
(etc.)
In the absence of these, an NTP event could jump the clock and instantaneously trip all sorts of timeouts, which, as you can imagine, is a bad thing in a distributed system. And time going backwards was even worse, but that one was pretty easy to deal with.
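This is the kind of thing a monotonic clock makes trivial today; a sketch of timeouts whose deadlines come from CLOCK_MONOTONIC, so an NTP step of the wall clock can neither fire them early nor starve them:

    #include <stdbool.h>
    #include <stdint.h>
    #include <time.h>

    static int64_t mono_ns(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
    }

    /* Deadline derived from the monotonic clock: wall-clock jumps (NTP steps,
     * admin changes) have no effect on when it expires. */
    typedef struct { int64_t deadline_ns; } timeout_t;

    static timeout_t timeout_start(int64_t duration_ns) {
        return (timeout_t){ .deadline_ns = mono_ns() + duration_ns };
    }

    static bool timeout_expired(timeout_t t) {
        return mono_ns() >= t.deadline_ns;
    }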
Reading the linked issues, it seems like the problem is that monotonic time on Windows is broken: QueryPerformanceCounter should return monotonic values, but it doesn't (when running on a hypervisor?).
The Time to Live field in the IP header is specified in seconds, but the spec also says that every system that processes the packet must decrement the field, so it is effectively a hop count (even if a system can process the packet in less than a second; it was a different time).
For work, we're considering adding some fake-time infrastructure for testing; if we go for a "time only moves when the test fn adjusts it" model, we wouldn't be able to rely on strictly monotonic time. Aside from that, strictness would be cool. I wonder if there are more use cases for it.
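Something like this, I mean (names invented, just a sketch of the model): the fake clock is a variable the test advances explicitly, so it's monotonic but not strictly monotonic, since two reads with no advance in between return the same value.

    #include <stdint.h>

    /* Fake clock for tests: time moves only when the test says so. */
    typedef struct { int64_t now_ns; } fake_clock;

    static int64_t fake_now(const fake_clock *c)              { return c->now_ns; }
    static void    fake_advance(fake_clock *c, int64_t by_ns) { c->now_ns += by_ns; }

    /* Usage in a test: expire a 5 ms timeout deterministically.
     *   fake_clock clk = {0};
     *   int64_t deadline = fake_now(&clk) + 5000000;
     *   fake_advance(&clk, 5000000);
     *   // fake_now(&clk) >= deadline now holds, with no sleeping.
     */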
Can Windows handle 1ns time resolution these days?
nice, i’ve used this trick to make sure observability information is correctly ordered
I think this only works if you assume that getting the current time can only be done once per nanosecond. I think this is probably true in most cases, but might not be true if rdtsc is used as the clock source.
I did a bit of digging, and it seems it's fine for two reasons:
1. rdtsc is quite slow; its throughput seems to be around 30 cycles across various CPUs (see https://uops.info/table.html). Even at a frequency of 5 GHz, you'd get 5-6 ns between reads.
2. rdtsc counts individual clock cycles, not nanoseconds. If you're using the raw rdtsc count, you'll be fine. It's only a little bit more dicey if you want to convert to nanoseconds (and I'm not sure there's a good way to do that, because of frequency scaling).
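You can see the spacing yourself with back-to-back reads (x86-64 only; uses the GCC/Clang intrinsic, and the delta is in TSC ticks, not nanoseconds):

    #include <stdint.h>
    #include <stdio.h>
    #include <x86intrin.h>  /* __rdtsc() on GCC/Clang, x86-64 */

    int main(void) {
        /* Two TSC reads in a row: the delta is in reference-clock ticks and
         * is typically a few tens of ticks on current CPUs. */
        for (int i = 0; i < 5; i++) {
            uint64_t a = __rdtsc();
            uint64_t b = __rdtsc();
            printf("delta = %llu ticks\n", (unsigned long long)(b - a));
        }
        return 0;
    }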
Yeah, this was a well-known trick in the distributed systems world. Usually you can just use microseconds and append a counter, which covers the extremely unlikely case that you're tracking multiple events in the same nanosecond/microsecond, without having to move your clock forward. :)
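One way to encode that is to pack the microsecond timestamp into the high bits of a 64-bit ID and a small counter into the low bits (field widths here are just an example):

    #include <stdatomic.h>
    #include <stdint.h>
    #include <sys/time.h>

    /* 54 bits of microseconds since the Unix epoch (~570 years of range)
     * plus a 10-bit counter as a tie-breaker: up to 1024 distinct IDs per
     * microsecond before two IDs could collide. */
    #define COUNTER_BITS 10

    static _Atomic uint64_t counter;

    static uint64_t next_event_id(void) {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        uint64_t micros = (uint64_t)tv.tv_sec * 1000000ULL + (uint64_t)tv.tv_usec;
        uint64_t c = atomic_fetch_add(&counter, 1) & ((1ULL << COUNTER_BITS) - 1);
        return (micros << COUNTER_BITS) | c;
    }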