Creating our own hashmap in java

March 11, 2013 by Niranjan Tallapalli 33 Comments

This is an attempt to come up with my own hashmap in java. It serves all basic needs of original java.util.HashMap with O(1) complexity in read operations. Full source code can be downloaded from here

To deep dive into the fundamentals of java.util.HashMap refer to this article

STEP1: Create a simple data structure with key, value and which can also extend as linked list

line #2,#3: references to key and value
line #4: link to itself (used when there is collision in the hashmap)

    class Entry {
    	Employee key;
        String value;
        Entry next;
 
        Entry(Employee k, String v) {
            key = k;
            value = v;
        }
 
        public String getValue() {
            return value;
        }
 
        public void setValue(String value) {
            this.value = value;
        }
 
        public Employee getKey() {
            return key;
        }
    }

STEP2: Couple of important utility methods

getSupplementalHash(): Supplemental hash function used to defend against the poor quality hash given by the user
getBucketNumber(): It makes sure the bucket number falls within the size of the hashmap based on the value of hash

    private int getSupplementalHash(int h) {
		// This function ensures that hashCodes that differ only by
		// constant multiples at each bit position have a bounded
		// number of collisions (approximately 8 at default load factor).
		h ^= (h >>> 20) ^ (h >>> 12);
		return h ^ (h >>> 7) ^ (h >>> 4);
	}
    
    private int getBucketNumber(int hash) {
		return hash & (SIZE - 1);
	}

STEP3: PUT Method

line #3: get the user defined hashcode
line #4: defend against the poor quality hash functions defined by the user (if Employee.hashcode() is poor, this call will do a better job)
line #8: get the bucket index
line #12: If the control reaches into for loop, it means that it should either be a duplicate or a collision
line #14-15: Its a duplicate, it will be replaced with old one
line #20: Its a collision, new pair will be appended to the list
line #28: Its either a new_pair/collision, add it to the map

    public void put(Employee key, String value) {
		// get the hashcode and regenerate it to be optimum
		int userHash = key.hashCode();
		int hash = getSupplementalHash(userHash);

		// compute the bucket number (0-15 based on our default size)
		// this always results in a number between 0-15
		int bucket = getBucketNumber(hash);
		Entry existingElement = table[bucket];

		for (; existingElement != null; existingElement = existingElement.next) {

			if (existingElement.key.equals(key)) {
				System.out
						.println("duplicate key value pair, replacing existing key "
								+ key + ", with value " + value);
				existingElement.value = value;
				return;
			} else {
				System.out.println("Collision detected for key " + key
						+ ", adding element to the existing bucket");

			}
		}

		//
		System.out.println("PUT adding key:" + key + ", value:" + value
				+ " to the list");
		Entry entryInOldBucket = new Entry(key, value);
		entryInOldBucket.next = table[bucket];
		table[bucket] = entryInOldBucket;
	}

STEP4: GET Method

line #3: defend against the poor quality hash functions defined by the user (if Employee.hashcode() is poor, this call will do a better job)
line #7: get the bucket index
line #14: This loop is iterated as many times as the number of collisions for every key
line #23: If nothing is found

    public Entry get(Employee key) {
		// get the hashcode and regenerate it to be optimum
		int hash = getSupplementalHash(key.hashCode());

		// compute the bucket number (0-15 based on our default size)
		// this always results in a number between 0-15
		int bucket = getBucketNumber(hash);

		// get the element at the above bucket if it exists
		Entry existingElement = table[bucket];

		// if bucket is found then traverse through the linked list and
		// see if element is present
		while (existingElement != null) {
			System.out
					.println("Traversing the list inside the bucket for the key "
							+ existingElement.getKey());
			if (existingElement.key.equals(key)) {
				return existingElement;
			}
			existingElement = existingElement.next;
		}

		// if nothing is found then return null
		return null;
	}

STEP5: Employee object as the key to our custom map (TESTING)
hashCode(): Make sure the hash code falls with 0-10 so that we can reproduce collision easily.
equals(): based on id and name of the person

	static class Employee {
		private Integer id;
		private String name;

		Employee(Integer id, String name) {
			this.id = id;
			this.name = name;
		}

		@Override
		public int hashCode() {
			// this ensures all hashcodes are between 0 and 15
			return id % 10;
		}

		@Override
		public boolean equals(Object obj) {
			Employee otherEmp = (Employee) obj;
			return this.name.equals(otherEmp.name);
		}

		@Override
		public String toString() {
			return this.id + "-" + name;
		}
	}

STEP6: Test Code

line #13-14: Demonstrate duplicate key replacement in hashmap
line #30-42 and #31-35: Demonstrate collision in hashmap
line #44-49: Demonstrate collision along with duplication in hashmap

		TMHashMap tmhm = new TMHashMap();

		System.out.println("============== Adding Element ===================");
		Employee e1 = new Employee(100, "Niranjan");
		tmhm.put(e1, "dept1");

		// duplicate
		System.out.println("============== Adding Duplicate =================");
		Employee e1_dup = new Employee(100, "Niranjan");
		tmhm.put(e1_dup, "dept12");
		// the above value "dept12" should replace the old value "dept1"
		Entry e = tmhm.get(e1_dup);
		System.out.println("GET element - " + e.getKey() + "::" + e.getValue());

		System.out.println("============== Adding Elements ==================");
		Employee e2 = new Employee(102, "Sravan");
		tmhm.put(e2, "dept3");

		Employee e3 = new Employee(104, "Ananth");
		tmhm.put(e3, "dept2");

		Employee e4 = new Employee(106, "Rakesh");
		tmhm.put(e4, "dept5");

		Employee e5 = new Employee(108, "Shashi");
		tmhm.put(e5, "dept2");

		// collision with e2
		System.out.println("============== Adding Collisions =================");
		Employee e2_collision = new Employee(112, "Chandu");
		tmhm.put(e2_collision, "dept16");
		e = tmhm.get(e2_collision);
		System.out.println("GET element - " + e.getKey() + "::" + e.getValue());

		// collision with e3
		Employee e3_collision = new Employee(114, "Santhosh");
		tmhm.put(e3_collision, "dept9");
		e = tmhm.get(e3_collision);
		System.out.println("GET element - " + e.getKey() + "::" + e.getValue());

		System.out
				.println("============== Adding Duplicate in Collision ===================");
		Employee e3_collision_dupe = new Employee(124, "Santhosh");
		tmhm.put(e3_collision_dupe, "dept91");
		e = tmhm.get(e3_collision_dupe);
		System.out.println("GET element - " + e.getKey() + "::" + e.getValue());

OUTPUT: of this program

============== Adding Element ===================
PUT adding key:100-Niranjan, value:dept1 to the list
============== Adding Duplicate =================
duplicate key value pair, replacing existing key 100-Niranjan, with value dept12
Traversing the list inside the bucket for the key 100-Niranjan
GET element - 100-Niranjan::dept12
============== Adding Elements ==================
PUT adding key:102-Sravan, value:dept3 to the list
PUT adding key:104-Ananth, value:dept2 to the list
PUT adding key:106-Rakesh, value:dept5 to the list
PUT adding key:108-Shashi, value:dept2 to the list
============== Adding Collisions =================
Collision detected for key 112-Chandu, adding element to the existing bucket
PUT adding key:112-Chandu, value:dept16 to the list
Traversing the list inside the bucket for the key 112-Chandu
GET element - 112-Chandu::dept16
Collision detected for key 114-Santhosh, adding element to the existing bucket
PUT adding key:114-Santhosh, value:dept9 to the list
Traversing the list inside the bucket for the key 114-Santhosh
GET element - 114-Santhosh::dept9
============== Adding Duplicate in Collision ===================
duplicate key value pair, replacing existing key 124-Santhosh, with value dept91
Traversing the list inside the bucket for the key 114-Santhosh
GET element - 114-Santhosh::dept91

Full source code can be downloaded from here

After receiving a lot of requests/questions related to hashmap behavior on collisions and element duplication, I updated the code. I hope one can take it further from here 🙂

Filed under algorithms and data structures

33 Responses to Creating our own hashmap in java

Pingback: Hashmap Internal Implementation Analysis in Java | java tech stack
Koushik says:

July 22, 2013 at 4:28 PM

Hi, How to calculate the time o-notation complexity for your algm / program

Reply
- Niranjan Tallapalli says:
  
  July 22, 2013 at 4:58 PM
  
  It is O(1) for insertion and retrieval. I am making use of implicit hashcode method of string object. While inserting and retrieving I will get the hashcode from String implicit hashcode method and look for it in the entry map which is O(1).
  
  Reply
tarun says:

August 2, 2013 at 8:22 PM

Nicely code and very to understand.Thanks for sharing …..

Reply
Abhik says:

September 23, 2013 at 6:31 AM

Niranjan,

Great job on the blogs. One question though: this custom hashmap can only hold upto 16 (SIZE) entries- is it an implementation detail that you want your followers to be able to pick up?

Reply
- Niranjan Tallapalli says:
  
  September 23, 2013 at 6:54 AM
  
  thank you Abhik, I took the size as 16 to make it simple (2^4), and why powers of 2 is explained in another article ‘https://tekmarathon.wordpress.com/2012/12/04/hashmap-internal-implementation-analysis-in-java/’.
  
  Reply
Nitish says:

October 26, 2013 at 9:13 AM

Hi Niranjan,
I really like the way you implemented it. I just have one quick question.
In put method you are just checking key for the head element and not for all elements. Is there any specific reason or I am not able to get the correct idea ?

Thanks

Reply
Udhav says:

November 4, 2013 at 12:08 PM

I think the put element should check for HashCode collision instead of checking the first Entry from the table [].

Reply
Eugene says:

November 30, 2013 at 3:50 AM

I Don’t understand: in this line – “Entry e = table[hash];”
I get this exeption table[hash] = java.lang.IndexOutOfBoundsException : Invalid array index: -5 ?

Reply
Vladislaaaav says:

January 30, 2014 at 7:04 PM

Hi,

thank you for your posts, especially the one explaining HashMap implementation. That was what i was looking for for a long time. Regarding your hashmap

1) In your put method you check equality of keys only for the 1st entry in the list. So, put doesn’t replace same keys afterwards, am I right?
2) you don’t use that stuff with hash function, so size == power of 2 is irrelevant

I’m just trying to understand if i get things correctly 🙂 Thank you once more

Reply
Raja says:

February 10, 2014 at 3:20 PM

really handy

Reply
- Pranav says:
  
  February 26, 2014 at 2:55 AM
  
  hi am not sure about the get function. When there is already something there, it means it is a collision right? so when e!=null we should do the link list part of the code, and when e is null, we put it there as it is. Please correct me if my understanding is wrong..
  
  Reply
HM says:

February 17, 2014 at 12:46 AM

In line 77, you’re only checking for the first bucket, isn’t it? What if the buckets with keys were as follows. 1–>11–>21 (Chaining). And then you wish to insert a duplicate key 21 with another value. Your code will not reach that Entry node because you’ve written a while loop to iterate over the linked list in line 88. Please correct me if I am wrong. Thanks. And, great explanation. Thank you, again. 🙂

Reply
dhruvaa says:

February 20, 2014 at 1:07 PM

Always nicely explained Niranjan! 🙂

umm.. would you guide me with any pointers for developing understanding for algorithm development, plz? 🙂

Reply
- Niranjan Tallapalli says:
  
  March 4, 2014 at 1:59 AM
  
  go through ‘Bookmarks’ tab, you will get some good tutorials there
  
  Reply
Pranav says:

February 26, 2014 at 2:55 AM

hi am not sure about the get function. When there is already something there, it means it is a collision right? so when e!=null we should do the link list part of the code, and when e is null, we put it there as it is. Please correct me if my understanding is wrong..

Reply
Abc says:

April 6, 2014 at 2:49 AM

Thanks for the great article.

I have been looking for such detailed explanations everywhere and i finally found this and your other one explaining the detailed implementation of the orginal hashmap.

I have one question though:

You said it does not use tail traversing when inserting new node in the linked list as it inserts at the head always. But in this implementation, why are you inserting at the tail?
Was this deliberate or Am I missing something.

Can you please answer this?

Thanks

Reply
- Niranjan Tallapalli says:
  
  November 17, 2015 at 1:08 PM
  
  yes, this is intentional just to keep the logic simple but in actual java source code it is the other way.
  
  Reply
Sumit says:

April 9, 2014 at 2:37 PM

I don’t think this implementation will work with every string. Taking a big string as key sometimes gives negative index.

Reply
- Niranjan Tallapalli says:
  
  November 17, 2015 at 3:09 PM
  
  changed the indexing from mod to AND, check out the new code
  
  Reply
Sumit says:

April 9, 2014 at 2:41 PM

Can you bit explain on hashing part…can you share some good example on hashing

Reply
kk says:

April 11, 2014 at 9:41 AM

Hi,
In the Put method, if a duplicate value is inserted, you are overwriting the old value with the new value. The duplicate value could be anywhere in the linked list. But you are check the condition only on the first entry in the linked list. Correct me if I am wrong here.

Reply
Himansu Nayak says:

April 22, 2014 at 2:21 PM

Hi,
The below line i think can be better written
Entry e = table[hash];

// if bucket is found then traverse through the linked list and
// see if element is present
while(e != null) {
if(e.key.equals(k)) {
return e;
}
e = e.next;

—————————–

for (Entry e = table[hash]; e != null; e = e.next) {
if (e.key.equals(k)) {
return e;
}
}

Reply
Sayooj Valsan says:

June 4, 2014 at 7:39 PM

There is a bug in the implementation. At line 82, the implementation just traverse to the end, it doesn’t check if the key exists. This could lead to chaining duplicate values.

The correct implementation is to check if the key exists inside the while loop. Please correct it.

Reply
Dheeraj says:

June 23, 2014 at 4:29 PM

Hello Sir,

many times i came across this statement about hashmap “HashMap doesn’t append the new element at tail instead it append new element at head to avoid tail traversing”.
What if i add new entry to last of linked list. because in any case i need to traverse through linked list for checking key’s hash and key by equals.

basically where this tail traversing applies, what can be the problem if i write the code in following way,

public V put(K key, V value) {

if (key == null)
return putForNullKey(value);
int hash = hash(key.hashCode());
int i = indexFor(hash, table.length);
Entry temp; //this temp will be updated and will store last element in linked list
for (Entry e = table[i]; e != null; e = e.next) {
temp = e;
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
V oldValue = e.value;
e.value = value;
e.recordAccess(this);
return oldValue;
}
}
//will add new entry to this temp (temp.e = new Entry(hash, key, value, null))
modCount++;
addEntry(hash, key, value, i);
return null;

}

Reply
bhuvnesh says:

June 24, 2014 at 4:42 AM

Hi TM,

please update the above code @line 83 with this :

if(e.next == null){
break;
}else{
e = e.next;
}

because it give null pointer if there will be more than one value in any linked list.

OR is there any alternative for this or there is some lack in my understanding of above code

please let me know thanks

Reply
Randheer says:

June 26, 2014 at 1:20 AM

Hey,

Nice implementation!!

Two things i would like to mention as modifications here are :

1. In the put() method you should first check for null key, if key is null store it at table[0] index as null key has hascode 0. This satisfies the HashMap accepting one null key criteria.

2. When you find an Entry present at a location in table, you start iterating the linked list and insert the new Entry at the end of the linked list in that bucket. Instead, you can add this new Entry to the head of the linked list. This would avoid unnecessary traversal of list. The same logic is used in the actual HashMap implementation of Java.

Please correct me if I am wrong. Thanks!!

Reply
Neelabh Singh says:

July 15, 2014 at 9:49 AM

In line 77. we are comparing the given k with the existing Key in the linked list.
This only done for the head of linked list. There is also possibility, Inside the linked list given K
and Key will be same so We have to override old value of Key (Key, oldValue) with new value of Key
(K, newValue );
Suppose for hash is same for the following (key, pairs) So they are in same linked list
(Key, Value)
(12, 4), (4, 5) (6, 8)
Here 12, 12, 4, 6 , 6 are key. Suppose they have same hashcode.
(12, 4)–> (4, 5)–>(6,8)
Now some new pairs comes
(12,5) 12 has same hashcode so updated value of head of the linkedlist with new value. So linked
list will be
(12, 5)–> (4, 5)–>(6,8)

(9, 3)

So insert new pair at the end of list
(12, 5)–> (4, 5)–>(6,8)–> (9,6)

Now (4, 9) comes linkedlist already contains it so update it with new value
(12, 5)–> (4, 9)–>(6,8)–> (9,6)// I thing this is not implemented in your code, may be I am wrong. Please
conform it.

Reply
Neelabh Singh says:

July 15, 2014 at 10:03 AM

I implemented the above my comments please check it,
public void put(String k, String v)
{
int hash=k.hashCode%SIZE;
Entry e=table[hash];
if(e!=null)
{
while(e.next!=null)
{
if(e.key.equal(k){
e.value=v;
return ;
}
e=e.next;
}
Entry entryInOldBucket=new Entry(k,v);
e.next=entryInOldBucket;
}
else
{
Entry entryInNewBucket=new Entry(k,v);
table[hash]=entryNewBucket;
}
}

Reply
Ming Fang says:

February 16, 2015 at 7:31 PM

Thanks for providing such brief code to implement HashMap.
I met a issue with the code.
key.hashCode(), if I put “auburn” as key, it will generate a value of -1406852093. This value can be used to get index.
Do you have any idea? Thank you.

Reply
Abhishek says:

May 30, 2015 at 8:46 AM

Hi Neelab Your solution is correct the if(e.key.equal(k) check needs to be present inside the while loop

Reply
Pingback: HashMap | dev4future
Zeroe says:

October 21, 2015 at 11:50 AM

Very clear, thanks. One question I have is how would you go about creating a HashMap that needs to store key value pairs of generic types? How would you use the hashcode() in the generic case?

Reply