Delve into HashMap: ConcurrentHashMap walkthrough

ConcurrentHashMap uses the same concept of HashMap to store key value pairs. Before starting this section, please go through https://delveintojava.blogspot.com/2019/11/hashmap-walkthrough.html for better understanding of ConcurrentHashMap.

As you might have noticed, the put and remove methods in HashMap are not synchronized. That means, if multiple threads try to update the same HashMap object, there is a chance of data inconsistency. So, HashMap is not preferred for multi-threaded program. But you might have heard of a legacy class called Hashtable which is built just for that purpose. So, why do we need one more class to serve the purpose of concurrency ? Let's see few methods inside Hashtable class, and try to find out the flaw.

Did you see the pattern ? Take a look at one more.

Yes, you got that right. Every method is synchronized in Hashtable. So, if a thread tries to update Hashtable, all other operations are blocked for the rest of the threads since Hashtable object's lock is used for synchronization. This ensures that all the threads work with the latest data in Hashtable, however, this makes Hashtable slow and inefficient. Can we improve it's efficiency ? You now know that would be the exact reason why ConcurrentHashMap was introduced in java 1.5.

Let's find out the improvement ConcurrentHashMap has to offer.
We know that HashMap contains buckets to store the entries. Each bucket is like a different store house for the entries whose key evaluates to the same hash value and in turn the same bucket index. So, instead of locking the complete object, we can lock just the bucket that we are working on. This will allow just as many threads as the number of buckets to simultaneously operate on the Map object.

We know the default capacity of HashMap is 16, so is the initial capacity and concurrency level of ConcurrentHashMap. That means 16 threads can simultaneously update ConcurrentHashMap, provided each key used by different threads maps to different buckets.
Using significantly higher value of concurrency level can increase space and time complexity, and significantly lower value can lead to thread contention.

You can increase the concurrency level of ConcurrentHashMap with the three parameter constructor, last parameter of which serves the purpose.

This is all good. Now let's get to the part where the magic happens. We will take example of put method.

You see "synchronized" somewhere in there. You probably figured things out. But let's get there one step at a time.

1. The first line is important, it tells that ConcurrentHashMap does not allow null keys and null values. It throws a null pointer exception when you try to insert null key or value. Why does ConcurrentHashMap does not allow null keys and values.
One thing to note is that get operation in ConcurrentHashMap does not block, so it may overlap with update operations (put, remove). So, retrieval operation returns the value of the most recent update operation.

This causes ambiguity if we allow null keys and values. If get(key) returns null, you can't tell whether the key maps to null or the key isn't mapped at all. In case of HashMap, you can check this via contains() method, but in case of ConcurrentHashMap, the value might get changed between calls due to some other threads operating on that same node.

For example, after checking contains, the value might not be there because of removal from other threads. You might say the same thing could happen in case of HashMap as well. Well, in case of normal HashMap, we are assuming that the program is not the multi-threaded one. If multi-threading is your goal, then HashMap shouldn't be used in the first place.

2. The next line is calculation of hash value for the key.

3. Next few lines contain 3 conditional checks:
a. If the ConcurrentHashMap table is null or it's length is 0, then initialize the table, and the for loop continues.
b. If the Node in the index is null, then insert the new node with given key, value and hash as the first node in that index.
c. If re-size is in progress, the method helpTransfer helps to return the correct index for the new key value pairs and the for loop continues.

4. If the conditions in step 3 does not satisfy, that means there are already some entries present in the bucket. And, we need to check whether the bucket already contains the key, if it does then replace the value for that entry, and if it does not then insert the new entry with given key value pairs in the end.

The implementation is pretty much the same as a HashMap. However the update operation is blocked for other threads using the synchronized keyword. Here the lock of the first node is used for the synchronization. Since, we acquired the lock on the first entry of the bucket, all other threads trying to update on the same bucket will get blocked.

Get implementation is not synchronized.

However, get will always take the latest value for the key. This is ensured by the use of volatile keyword for the entry's value as you can see in it's Node's member variables.

References: http://androidxref.com/9.0.0_r3/xref/libcore/ojluni/src/main/java/java/util/concurrent/ConcurrentHashMap.java
http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/concurrent/ConcurrentHashMap.java

Delve into HashMap

Friday, December 13, 2019

ConcurrentHashMap walkthrough

No comments:

Post a Comment