Distributed Hash Table (DHT) traditionally found enough applicabilities in decentralized distributed systems. In DHT data are distributed among several nodes in the system via hashing techniques. Currently, DHT shows increasing popularity in the modern storage systems aka NoSql Systems. This post reveals some of the uses and applications of DHT in today's modern storage systems.
Hash-Table versus Distributed Hash Table:
Hash Table is one of the efficient and well-known data structure in programming languages. Hash Table let you write a data with a key, which makes it easy to lookup and retrieve the written data using the key. The key could be anything, more like a meaningful name representing the value as shown in the following figure.
The core of today’s modern storage systems is based on this Key-Value pair, Hash-Table like structure. The modern storage systems are of many forms such as Key-Value store, Column-Family store, Document database. In all this type the fundamental concept is the same. No need to worry anymore about Schemas and Relations, just write a data with a key and use the key to read the data when needed. Some of the databases are Distributed Hash Table but not all of them.
So which one are Distributed Hash Table?
If you dig into the working of Hash-Table a bit, you will get the answer. Distributed Hash Table, follows the similar concept of associative array as Hash-Table does in storing and locating a value with the help of a key. The concept that Hash-Table follows to store and retrieve a value in a software program is extended to store and retrieve the value across multiple machines. This is the reason, why it is called Distributed Hash Table.
As you know, when the amount of data needs to be stored and processed increases, people prefer vertical scaling and go for Distributed Storage Systems. When data are distributed over several machines, and when a client gives you a key and asks you to retrieve the value of that key, in which machine you will go and search? So it is very important to know what data are stored on each machine. In the vocabulary of distributed storage systems, people call it as ‘Global Conceptual Schema’. Global Conceptual Schema (GCS) is nothing but a chart about data stored on each machines. Maintaining this chart involves lot of challenges such as where do you store it? Whether you distributed it or store it in a centralized machine? Storing it in a centralized machine would be a single point of failure. If distributed, maintaining the consistency among the distributed information would be a nightmare. In either case, how many lookups to the chart are possible per second? and so on. Distributed Hash Table could easily tackle all these type of challenges.
Working of Distributed Hash Table:
Distributed Hash Table is nothing but an imaginary ring, where each machine is placed with a token number. The token numbers are assigned by hashing a unique id of the machine, which is its IP-address. While Reading or Writing a data, find the token number of the data by hashing the unique id of the data, which is the data-key. Remember in modern storage systems, data-key is the fundamental for reading and writing a data. Now just match the token number of the data against the token number of the machines over the ring to find the node. And there you go..!
For your better understanding, I am reproducing an example given by datastax from their website. For example, if your database cluster have four nodes, the nodes will be placed in the imaginary ring with a token number as shown in the figure. So before Read/ Write operations, hashing the data-key of the data will yield a token number within this range. So you just have to associate both the token numbers to find the appropriate node as shown in the figure.
Figure 2: DHT: Mapping data to corresponding nodes
At this point, if you would have read my earlier post, an obvious question that would come to your mind. Using DHT we can find the node that is responsible for a data item. But, in a distributed data store, a data has to be stored on more than one node to avoid single point of failure. So how could we find the remaining nodes where the copies have to stored. In distributed data store vocabulary, it was called ‘Data Replicas’. Actually, finding the remaining node (the data replicas) are not so tricky. If you could find the first node, you could play around different techniques to choose which nodes could become the replicas for the particular data. The specific vocabulary for this technique is called ‘Replication Strategy’. Some of the strategies used by Cassandra are as follows.
Assuming you chose the number of data replicas as three.
- Simple Strategy: In simple strategy (also known as Rack-Unaware Strategy), if you find the first node (replica), the next two replicas will be the two adjacent nodes of the first one irrespective of its rack or datacenter placement.
- OldNetworkTopology Strategy: OldNetworkTopology strategy (also known as Rack-Aware strategy) places the replica in different racks of same datacenter. However, if the cluster is spanned between more than one datacenter, the strategy also makes sure, one of the replicas are chosen from different datacenter. This strategy increases availability of the data in the event of failure of entire rack and/or datacenter.
- NetworkTopology Strategy: NetworkTopology strategy (also known as DataCentric Strategy) takes the input from the user directly about how the replica should be chosen.
At this point, you might end up with a question that ‘How Cassandra identifies a node belongs to particular a Rack in a particular Datacenter?’ If you google about ‘Snitches’ in Cassandra you will get all your answers. For your quick reference, I am explaining on of the snitches called ‘RackInferringSnitch’ with the help of an image taken from this link.
As you can see from the above image, the IP address octets are used to identify it. Snitch helps Cassandra to map nodes IP addresses to its corresponding Rack in a Datacenter. So that you can sort the replicas based on which one is closest to you (proximity).
The main intension of this sorting is two:
- Reduce the request latency: Of course, the more close the replica is, the faster would be the query response.
- Increase the data availability: In case, a complete rack or a whole datacenter is down, you still have the data available on the other rack/ datacenter.
There were more interesting things involved in these techniques, but since the post goes longer than expected, hoping you would have got the main idea, i prefer to stop this post here.