Experimenting with Robin Hood hashing

8 points by dryya


cpurdy

I don't mean to imitate an engineer, but there are a number of problems with this example:

    struct Entry {
        union Storage {
            SingleEntry single;
            EntryTree* tree;
            EntryList* list;
        }
        uint8_t tag; // single entry vs tree vs list
    }
    struct SingleEntry {
        uint64_t hash;
        ValueT value;
    }
    struct EntryTree {
        uint16_t count;
        Entry entry; // array; each is single or list (begins with uint64_t hash)
    }
    struct EntryList {
        uint64_t hash;
        uint16_t count;
        ValueT value; // array
    }

Those minor nits aside, I did like the article, and it does a better job explaining the topic than most I've seen on it.

What I was really hoping to see in the article was a discussion of the impact of table fullness on the efficiency, i.e. efficiency with zero collisions, and a curve showing efficiency all the way to a fully saturated table. Mainly because I have no clue what those curves would look like, and I've always wondered.

justinhj

Nice explanation. Here is a db project in the wild that uses robin hood hashing https://github.com/spotify/sparkey