The Limits of NTP Accuracy on Linux
38 points by antonmosich
Does anyone know how common it is for datacenters to have a second low-latency network used only for things like PTP, so that time distribution isn't disrupted by normal traffic?
As I understand it, in PTP the switches actively assist the time distribution network so that queueing delay is accounted for, which means there should be no need for a quiet dedicated network. (Though I suppose ultra-high-precision time might still need one.)
AIUI the process is that the switch records the time when each incoming PTP packet is received, before it is enqueued. When the packet is sent, its timestamp is updated based on the amount of time the packet spent waiting. So even if there is variable latency, the packet is modified to appear as if it was switched in constant time.
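The residence-time arithmetic described above can be sketched in a few lines. This is a toy sketch with hypothetical nanosecond timestamps; a real transparent clock does this in hardware and accumulates the result into the PTP correctionField:

```python
def update_correction_field(correction_field_ns: int,
                            ingress_ts_ns: int,
                            egress_ts_ns: int) -> int:
    """Add this switch's residence time (time the packet spent queued)
    to the running correction, so downstream nodes can subtract all
    variable switching delay along the path."""
    residence_ns = egress_ts_ns - ingress_ts_ns
    return correction_field_ns + residence_ns

# Packet timestamped on arrival at t=1000 ns, sent at t=4500 ns:
cf = update_correction_field(0, ingress_ts_ns=1000, egress_ts_ns=4500)
print(cf)  # 3500 ns of queueing delay recorded
```

Each PTP-aware switch along the path repeats this, so the receiver sees the total variable delay and can cancel it out.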
The main sources of error are the accuracy of the PTP switch’s internal clock reference, and any variable delay components that may occur before the incoming time is recorded and after the outgoing timestamp is set. This is why e.g. 10GBase-T is bad, since the complex encoding needed to work over copper wire introduces variable delays in the PHY (which can’t be compensated for in the MAC). And don’t try to run PTP over a half duplex connection ;)
I don’t know for real, but some searching suggests it’s not common but you can find it if you need it. More common appears to be PTP-capable hardware running on the normal network? Most datacenters I’ve seen have a second network already for server management systems, and most servers I’ve seen have two ethernet ports, so adding a third would start needing more specialized hardware. I’ll defer to u/fanf on whether PTP would benefit from a dedicated network or not.
Oh yeah if you already have a separate network for server management and that’s 99% underutilised by design then you might as well stick timing on it too.
Great article; overlaying the three GPS module signals on an oscilloscope is genius for demonstrating the noise levels. People often assume too much of the PPS signal while not paying enough attention to the latency on that signal.
short of an atomic clock
These are easier to get than you might think, e.g., this component for $2k USD, that measures only 2×2×1”.
I raised an eyebrow at this bullet in the feature list: “Excellent short-term stability”.
“Short-term stability” here refers to how much the frequency drifts over a short time. All oscillators drift with age (even these ones), but that’s pretty easy to compensate for. What you want to look for is the “Allan deviation” in the datasheet for the actual measurements.
This is actually a very hard problem I’ve spent a lot of time grappling with at work. :-) A millisecond of precision, ez, no problem. A microsecond, that’s fairly difficult, but doable if you pay for the right hardware and know how to use it well. Less than that and you start getting into Actual Science Territory where stuff like physics can matter.
For example, one confounding factor he didn’t mention is RTK (real-time kinematic positioning), a method that uses carrier-phase corrections from a nearby reference station to improve positioning accuracy (and therefore also time, though not many people use it for that). It accounts for minor local errors from satellite orbit drift, atmospheric conditions, and other random stuff, so it basically requires a local side-channel of some kind giving you regular updates. Some GPS receivers have this; for example, your cell phone gets correction data automatically over the air, which is one reason why Google Maps knows exactly which street you are on, rather than having a 50 m uncertainty bubble around you at all times.
It’s good to keep in mind Grace Hopper’s lesson on nanoseconds.
The LeoNTP looks nice. I might consider it if my Time Machines TM1000A wears out, but the TM1000A is half the price, and it’s “good enough” GPS-based NTP.