Consistent Hashing
22 points by joshleeb
In a real application, we'd probably want to find a different hash function. md5 is relatively slow, and we don't need any of its cryptographic guarantees in this use case.
As an aside, it's not a good idea to use md5 in any new applications. There are practical collision attacks against it (so it's not secure), and it's also slower than actually secure cryptographic hash functions (like blake3).
For a consistent hashing application, what hash function would you recommend (ideally one with a Rust implementation)? Or can you point me to resources on it? Thanks!
Take a look at the smhasher project, which is a testing framework for hash functions.
I’m planning to use RapidHash for a new project based on its results.
Security in that sense is not a concern at all in this case. Say you have 1000 machines: you use blake3, it returns 256 bits, and you map them onto your cluster of ~1000 machines, so you're discarding at least 246 bits, or 96% of all the security you got from blake3.
I agree that security is likely not a concern in this use-case; I just wanted to make an aside about hashing :)
That being said, I'm not sure that "discarding bits" is necessarily the best way to evaluate a hash in this case. Some (non-cryptographically-secure) hash functions are biased in particular bits, which would lead to poor load balancing, which is bad even for a non-cryptographic setting.
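As a rough way to see how hash bias would show up as load skew, here is a small Rust sketch (the key names and the 1000-bucket figure are illustrative, and std's SipHash-based DefaultHasher just stands in for whatever hash you'd actually pick) that hashes a million keys into buckets and reports the spread:

    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    // Map a key to one of ~1000 buckets; only ~10 of the 64 hash bits
    // survive the reduction, so any bias in those bits shows up as skew.
    fn bucket_of(key: &str, buckets: u64) -> u64 {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        h.finish() % buckets
    }

    fn main() {
        let buckets = 1000u64;
        let mut counts = vec![0u32; buckets as usize];
        for i in 0..1_000_000 {
            counts[bucket_of(&format!("key-{i}"), buckets) as usize] += 1;
        }
        let max = counts.iter().max().unwrap();
        let min = counts.iter().min().unwrap();
        // A large min/max spread relative to the mean (~1000) suggests bias.
        println!("per-bucket load: min {min}, max {max}");
    }

Swapping DefaultHasher for each candidate (RapidHash, xxHash, etc.) gives a quick, if crude, comparison of how evenly they balance load.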
Consistent hashing using a hash ring is one of the most complicated ways to achieve consistent hashing. Amazon did a great disservice with their Dynamo paper in this regard.
It's far simpler, in my experience, to just use a hash function with bucket assignment, i.e. you hash to a value, take it modulo the bucket count to get a bucket id, which is an index into an array of bucket owners (node ids in a distributed system).
There are some clever mathematical ways to avoid this, but the amount of work goes up steeply, both in terms of implementation and in terms of runtime cost.
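A minimal Rust sketch of the hash-to-bucket-to-node indirection described above (all names and values are illustrative, not taken from any particular system):

    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    // The bucket count stays fixed; rebalancing only rewrites node_assignment,
    // so keys never have to be rehashed when nodes come and go.
    fn bucket_id(key: &str, bucket_count: u64) -> usize {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        (h.finish() % bucket_count) as usize
    }

    fn main() {
        let bucket_count = 8;
        // node_assignment[bucket] -> owning node id; maintained out of band.
        let node_assignment: Vec<u32> = vec![0, 1, 2, 0, 1, 2, 0, 1];
        let b = bucket_id("user:42", bucket_count);
        println!("bucket {} lives on node {}", b, node_assignment[b]);
    }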
Isn't the main reason for consistent hashing to handle nodes entering and leaving the system? For example, if we use just the modulo to determine which node our data is on, then when nodes enter and leave the system our data will move around considerably, whereas with consistent hashing we limit the amount of data that needs to move when we add/remove nodes.
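To make that remapping concrete, here is a small Rust sketch (hypothetical keys, std's default hasher as a stand-in) counting how many keys change nodes under naive hash % node_count when a cluster grows from 10 to 11 nodes:

    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    fn node_for(key: &str, node_count: u64) -> u64 {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        h.finish() % node_count
    }

    fn main() {
        let total = 100_000;
        let mut moved = 0;
        for i in 0..total {
            let key = format!("key-{i}");
            if node_for(&key, 10) != node_for(&key, 11) {
                moved += 1;
            }
        }
        // Expect roughly 10/11 of keys to move; consistent hashing (or a
        // fixed bucket table with reassignable owners) moves only ~1/11.
        println!("{moved} of {total} keys changed nodes");
    }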
Isn't the main reason for consistent hashing to handle nodes entering and leaving the system?
Yes (!!!), that is why I said: use a hash function with bucket assignment, i.e. you hash to a value, take it modulo the bucket count to get a bucket id, which is an index into an array of bucket owners (node ids in a distributed system).
So assume you have some key k: that key hashes to hash = f(k), that hash is reduced modulo the bucket count to bucketId = hash % bucketCount, and that bucket id looks up the node id: nodeId = nodeAssignment[bucketId].
For an HA system, the node assignments for each bucket are ordered (owner, first backup, second backup, etc.), so primaryNodeId = nodeAssignment[bucketId][0].
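A tiny Rust sketch of that HA layout, with made-up values (node_assignment mirrors the nodeAssignment array above; the table itself would be maintained by whatever rebalancing scheme the cluster uses):

    fn main() {
        // node_assignment[bucket_id] = [primary, first backup, second backup]
        let node_assignment: Vec<Vec<u32>> = vec![
            vec![0, 1, 2],
            vec![1, 2, 0],
            vec![2, 0, 1],
            vec![0, 2, 1],
        ];
        let bucket_id = 2; // would come from hash(key) % node_assignment.len()
        let primary_node_id = node_assignment[bucket_id][0];
        let backups = &node_assignment[bucket_id][1..];
        println!("primary: node {primary_node_id}, backups: {backups:?}");
    }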
I think you are suggesting Maglev hashing (summary), which has the caveat that it's a challenge to construct and distribute the lookup table.
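For reference, here is a rough Rust sketch of Maglev-style table construction, following the permutation-and-fill scheme from the paper; the hash choice, seeds, backend names, and table size are placeholders, not the actual Maglev parameters:

    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    // Table size should be a prime comfortably larger than the backend count.
    const M: usize = 13;

    fn seeded_hash(name: &str, seed: u64) -> u64 {
        let mut h = DefaultHasher::new();
        seed.hash(&mut h);
        name.hash(&mut h);
        h.finish()
    }

    // Each backend gets its own permutation of table slots, derived from
    // two hashes of its name (offset and skip).
    fn permutation(name: &str) -> Vec<usize> {
        let offset = (seeded_hash(name, 0xA) as usize) % M;
        let skip = (seeded_hash(name, 0xB) as usize) % (M - 1) + 1;
        (0..M).map(|j| (offset + j * skip) % M).collect()
    }

    // Backends take turns claiming their most preferred still-empty slot,
    // which keeps per-backend shares of the table nearly equal.
    fn build_table(backends: &[&str]) -> Vec<usize> {
        let perms: Vec<Vec<usize>> = backends.iter().map(|b| permutation(b)).collect();
        let mut next = vec![0usize; backends.len()];
        let mut entry = vec![usize::MAX; M];
        let mut filled = 0;
        while filled < M {
            for i in 0..backends.len() {
                let mut c = perms[i][next[i]];
                while entry[c] != usize::MAX {
                    next[i] += 1;
                    c = perms[i][next[i]];
                }
                entry[c] = i;
                next[i] += 1;
                filled += 1;
                if filled == M {
                    break;
                }
            }
        }
        entry
    }

    fn main() {
        // Each slot maps to a backend index; lookups do table[hash(key) % M].
        let table = build_table(&["node-a", "node-b", "node-c"]);
        println!("{table:?}");
    }

Distributing and versioning that table consistently across nodes is the part the comment above flags as the operational challenge.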
Thanks for the links. I’ll take a look at Maglev.
I was talking about the consistent hashing that we rolled out in early 2002 at Tangosol, in the Coherence product. It was the first clustered in-memory database, and it did HA and load balancing (horizontal scale out) from the beginning.
it’s a challenge to construct and distribute the lookup table.
One does no such thing; such a thing is a very non-distributed / single threaded / shared-everything way of thinking.
Nodes simply request additional responsibilities if they have capacity and judge the data to be at risk or the balance to be unfair. Over time, the cluster grows and shrinks, and it self heals, self balances, etc. Almost all work is completely point-to-point within the cluster, which scales well to the limit of the fabric.
The table itself is only a few kilobytes, but it’s managed independently (locally) by every node.
It seems like you’re describing the naive solution from the introduction to this post.
I'm not sure how to explain it any more clearly and simply than I already have. Please re-read what I posted until the difference between "bucket id" and "node id" is clear.
A hash bucket is a hash bucket.
A node is a node in a distributed system.
Those two concepts are different.
The “naive” approach is possible, and it’s how memcached (2003) was implemented, but I agree that it’s a very non-ideal approach.