Building a Fast Hash Map with Hopscotch Hashing in C++

Introduction to Hopscotch Hashing

Hopscotch hashing is a technique used to implement fast and efficient hash maps and hash sets. It was designed to overcome the limitations of traditional hashing techniques, which can suffer from poor performance due to collisions and cache misses. Hopscotch hashing uses a combination of hashing and probing to find empty slots in the hash table, making it an attractive choice for applications that require high-performance and low-latency data structures.

How Hopscotch Hashing Works

Hopscotch hashing works by dividing the hash table into a series of "hopscotch blocks" or "neighborhoods". Each block contains a fixed number of slots, and the hash function maps each key to a specific block. When a collision occurs, the algorithm probes the neighboring slots in the block to find an empty slot. If all slots in the block are occupied, the algorithm will "hop" to the next block and repeat the probing process. This technique allows for fast and efficient lookup, insertion, and deletion of elements in the hash table.

Implementing Hopscotch Hashing in C++

Here is an example implementation of a hopscotch hash map in C++:

#include <iostream>
#include <vector>

class HopscotchHashMap {
private:
    int blockSize;
    int numBlocks;
    std::vector<std::vector<std::pair<int, int>>> table;

public:
    HopscotchHashMap(int blockSize, int numBlocks) : blockSize(blockSize), numBlocks(numBlocks) {
        table.resize(numBlocks, std::vector<std::pair<int, int>>(blockSize));
    }

    void insert(int key, int value) {
        int blockIndex = key % numBlocks;
        int slotIndex = key % blockSize;

        // Probe neighboring slots in the block
        for (int i = 0; i < blockSize; i++) {
            int probeIndex = (slotIndex + i) % blockSize;
            if (table[blockIndex][probeIndex].first == 0) {
                table[blockIndex][probeIndex] = std::make_pair(key, value);
                return;
            }
        }

        // Hop to the next block if all slots are occupied
        blockIndex = (blockIndex + 1) % numBlocks;
        insert(key, value);
    }

    int lookup(int key) {
        int blockIndex = key % numBlocks;
        int slotIndex = key % blockSize;

        // Probe neighboring slots in the block
        for (int i = 0; i < blockSize; i++) {
            int probeIndex = (slotIndex + i) % blockSize;
            if (table[blockIndex][probeIndex].first == key) {
                return table[blockIndex][probeIndex].second;
            }
        }

        // Return -1 if key is not found
        return -1;
    }
};

int main() {
    HopscotchHashMap hashMap(16, 1024);
    hashMap.insert(1, 10);
    hashMap.insert(2, 20);
    std::cout << hashMap.lookup(1) << std::endl;  // Output: 10
    std::cout << hashMap.lookup(2) << std::endl;  // Output: 20
    return 0;
}

This implementation demonstrates the basic principles of hopscotch hashing and provides a simple example of how to use it in a C++ application. In practice, you may want to add additional features such as handling collisions, implementing deletion, and optimizing performance.

Practical Considerations

When implementing hopscotch hashing in a real-world application, there are several practical considerations to keep in mind. First, the choice of block size and number of blocks will significantly impact performance. A larger block size can reduce the number of cache misses, but may also increase the number of collisions. Second, the hash function used to map keys to blocks should be designed to minimize collisions and ensure a uniform distribution of keys across the blocks. Finally, the implementation should be optimized for the specific use case and hardware platform to achieve the best possible performance.