Hashmap Internal Implementation Analysis in Java

A detailed analysis of java.util.HashMap's implementation, its internals, and its working concepts.

java.util.HashMap.java

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

This is the maximum capacity to which the HashMap's table can expand, i.e., 2^30 = 1,073,741,824 buckets.

java.util.HashMap.java

    /**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 16;

    /**
     * The load factor used when none specified in constructor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

This says the default size of the internal array is 16 (always a power of two; we will see why as we go further), and the load factor means that whenever the number of entries reaches 75% of the current capacity (i.e., 12 for capacity 16), the map doubles its capacity and recomputes the bucket index of every existing element (rehashing).

Hence, to avoid rehashing as elements grow, it is best practice to explicitly give the expected size of the HashMap while creating it, as sketched below.
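
For instance, a minimal sketch (the class name and the sizes are mine, just for illustration) of pre-sizing a map for roughly 1000 entries so that no resize ever happens:

    import java.util.HashMap;
    import java.util.Map;

    public class PresizedMapDemo {
        public static void main(String[] args) {
            // expecting ~1000 entries: choose capacity so that
            // capacity * loadFactor >= expected entries (2048 * 0.75 = 1536),
            // so no resize/rehash happens during these puts
            Map<String, Integer> map = new HashMap<String, Integer>(2048, 0.75f);
            for (int i = 0; i < 1000; i++) {
                map.put("key" + i, i);
            }
        }
    }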

Do you foresee any problem with this resizing of HashMap in Java? Since Java is multi-threaded, it is quite possible that more than one thread uses the same HashMap, and both may realize the need for resizing at the same time, which leads to a race condition.

What is a race condition with respect to HashMaps? When two or more threads see the need to resize the same HashMap, they might end up moving the elements of the old buckets to the new buckets simultaneously, which can lead to infinite loops. FYI: in case of collision, i.e., when there are different keys with the same hashcode, the entries are stored internally in a singly linked list, and every new element is inserted at the head of the list to avoid tail traversal. Because of this head insertion, the order of the list is reversed during resizing, and a concurrent resize can corrupt the links into a cycle.

For example, let's assume there are 3 keys with the same hashcode, and hence stored in a linked list inside one bucket [the format below is object_value(current_address, next_address)]:
Initial structure: 1(100, 200) –> 2(200, 300) –> 3(300, null)
After resizing by thread#1: 3(300, 200) –> 2(200, 100) –> 1(100, null)
When thread#2 starts resizing, it again starts with the 1st element by placing it at the head:
1(100, 300) –> 3(300, 200) –> 2(200, 100) ==> the links now form a cycle, so the next insertion loops forever and the thread hangs.

java.util.HashMap.java

	/**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with <tt>key</tt>, or
     *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
     *         (A <tt>null</tt> return can also indicate that the map
     *         previously associated <tt>null</tt> with <tt>key</tt>.)
     */
    public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key.hashCode());
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

Here it:
1. regenerates the hashcode using the hash(int h) method, passing the user-defined hashcode as an argument
2. computes the bucket index from the regenerated hashcode and the length of the table
3. if the key already exists, it overrides the value; otherwise it creates a new entry in the HashMap at the index computed in step 2

Step 3 is straightforward, but steps 1 and 2 need deeper understanding. Let us dive into the internals of these methods.

Note: these two methods are critical to understanding the internal working of HashMap in OpenJDK.

java.util.HashMap.java

	/**
     * Applies a supplemental hash function to a given hashCode, which
     * defends against poor quality hash functions.  This is critical
     * because HashMap uses power-of-two length hash tables, that
     * otherwise encounter collisions for hashCodes that do not differ
     * in lower bits. Note: Null keys always map to hash 0, thus index 0.
     */
    static int hash(int h) {
        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

	/**
     * Returns index for hash code h.
     */
    static int indexFor(int h, int length) {
        return h & (length-1);
    }

here:
'h' is the hashcode (an int, hence 32 bits)
'length' is the table length, initially DEFAULT_INITIAL_CAPACITY (an int, hence 32 bits)

Comment from above source code says…
Applies a supplemental hash function to a given hashCode, which defends against poor quality hash functions. This is critical because HashMap uses power-of-two length hash tables, that otherwise encounter collisions for hashCodes that do not differ in lower bits. What do this means???

It means that if the algorithm we wrote for hashcode generation does not distribute/mix the lower bits evenly, we will get more collisions. For example, suppose our hashcode logic is "empId*deptId": if deptId is even, it will always generate even hashcodes, because any number multiplied by an EVEN number is EVEN. If we directly depend on these hashcodes to compute the index and store our objects in the HashMap, then
1. odd positions in the table are always empty
2. because of #1, only the even positions get used, which doubles the number of collisions (see the sketch below)
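
Here is a hypothetical sketch of that scenario (EvenHashDemo and its numbers are made up for illustration):

    // with an even deptId, every hashcode is even, so (hashCode & 15) is
    // always even and all odd-numbered buckets stay empty
    public class EvenHashDemo {
        public static void main(String[] args) {
            int deptId = 4; // an even department id
            for (int empId = 1; empId <= 5; empId++) {
                int hashCode = empId * deptId;   // always even
                int index = hashCode & (16 - 1); // bucket index for table length 16
                System.out.println("hashCode=" + hashCode + " -> bucket " + index);
            }
        }
    }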

For example,
I am considering some hashcodes which our code might generate; they are perfectly valid as they are all different, but we will prove very soon that they all end up in the same bucket:

1111110111011010101101010111110
1100110111011010101011010111110
1100000111011010101110010111110

I am passing these sequences directly (without applying the supplemental hash function) to the indexFor method, where we AND the hashcode with 'length-1' (which is always a sequence of 1s, since length is a power of two). Why is the HashMap length always a power of two? Because when we compute (hash & (length-1)), only the lowest few bits (in this case 4 bits) are considered, and those bits decide the bucket location.
Since we are using the default length of 16, the binary representation of (16-1) is 1111.
this is what happens inside indexFor method

1111110111011010101101010111110 & 0000000000000000000000000001111 = 1110
1100110111011010101011010111110 & 0000000000000000000000000001111 = 1110
1100000111011010101110010111110 & 0000000000000000000000000001111 = 1110

From this we understand that all objects with these different hashcodes get the same index, which means they all go into the same bucket. This is a BIG FAIL, as lookups degrade to linked-list traversal, O(n), instead of O(1).

Comment from above source code says…
that otherwise encounter collisions for hashCodes that do not differ in lower bits.

Notice this sequence of 0-15 (2^4); 16 is the default capacity of the HashMap:

0000 - 0	1000 - 8
0001 - 1	1001 - 9
0010 - 2	1010 - 10
0011 - 3	1011 - 11
0100 - 4	1100 - 12
0101 - 5	1101 - 13
0110 - 6	1110 - 14
0111 - 7	1111 - 15

Notice that for a HashMap with power-of-two length 16 (2^4), only the last four bits matter in the allocation of buckets; these 4 low-bit variations play the prominent role in identifying the right bucket.

Keeping the above sequence in mind, we regenerate the hashcode via hash(int h), passing the existing hashcode, which ensures there is enough variation in the lower bits, and then pass the result to the indexFor() method. This ensures the (now well-mixed) lower bits of the hashcode identify the bucket while the higher bits are masked off.
For example, taking the same hashcode sequences from the above example:

Our hashcode 1111110111011010101101010111110, when regenerated with the hash(int h) method, becomes ==> 1111001111110011100110111011010

this is what happens inside the hash(int h) method:
1111110111011010101101010111110 ==> 1111001111110011100110111011010
1100110111011010101011010111110 ==> 1100000010010000101101110011001
1100000111011010101110010111110 ==> 1100110001001000011011110001011

passing this set of new hashcodes to the indexFor() method:
1111001111110011100110111011010 & 0000000000000000000000000001111 = 1010
1100000010010000101101110011001 & 0000000000000000000000000001111 = 1001
1100110001001000011011110001011 & 0000000000000000000000000001111 = 1011

So it is clear that, because of the regenerated hashcodes, the lower bits are well distributed/mixed, leading to distinct indices, which means different buckets and fewer collisions.

Why only these magic numbers 20, 12, 7 and 4? It is explained in the book "The Art of Computer Programming" by Donald Knuth.
Here we XOR the most significant bits of the number into the least significant bits (via the shifts by 20, 12, 7 and 4). The main purpose of this operation is to make the differences between hashcodes visible in the least significant bits, so that the HashMap elements get distributed evenly across the buckets.
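
The whole pipeline can be reproduced with a small runnable sketch: hash() and indexFor() are copied from the JDK 6 source quoted above, and the three constants are the same sample hashcodes from this post, written in hex.

    public class SupplementalHashDemo {

        static int hash(int h) {                 // JDK 6 HashMap.hash
            h ^= (h >>> 20) ^ (h >>> 12);
            return h ^ (h >>> 7) ^ (h >>> 4);
        }

        static int indexFor(int h, int length) { // JDK 6 HashMap.indexFor
            return h & (length - 1);
        }

        public static void main(String[] args) {
            int[] samples = {0x7EED5ABE, 0x66ED56BE, 0x60ED5CBE};
            for (int h : samples) {
                // without the supplemental hash all three land in bucket 14 (1110);
                // with it they spread out to buckets 10, 9 and 11
                System.out.println(Integer.toBinaryString(h)
                        + " -> raw bucket " + indexFor(h, 16)
                        + ", spread bucket " + indexFor(hash(h), 16));
            }
        }
    }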

Going back to the steps of the put() method shown earlier:
1. regenerate the hashcode using the hash(int h) method, passing the user-defined hashcode as an argument
2. compute the bucket index from the regenerated hashcode and the length of the table
3. if the key already exists, override the value; otherwise create a new entry in the HashMap at the index computed in step 2

Steps 1 and 2 should be clear by now.

Step 3:
What happens when two different keys have the same hashcode?
1. if the keys are equal, i.e., the to-be-inserted key and an already-inserted key have the same hashcode and are the same key (by reference or via the equals() method), then the previous key-value pair is overridden with the current key-value pair.
2. if the keys are not equal, the new key-value pair is stored in the same bucket as the existing keys, chained onto the list.

When does a collision happen in a HashMap? In case 2 of the above question.

How do you retrieve the value object when two keys with the same hashcode are stored in the HashMap? Using the hashcode we go to the right bucket, and using equals() we find the right element in the bucket and return it.

How are different keys with the same hashcode stored in a HashMap? The usual answer is "in a bucket", but technically they are all stored in a singly linked list. The subtle detail is that a new element is inserted at the head of the linked list instead of the tail, to avoid tail traversal.

What is a bucket, and what is the maximum number of buckets in a HashMap? A bucket is one slot of the table, holding a linked list of entries (the Entry inner class from my previous post). At maximum we can have as many buckets as the length of the table; for example, in a HashMap of length 8 there can be at most 8 buckets, each the head of a linked list.
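
A simplified, runnable sketch (not the exact JDK source) of how a JDK 6 style bucket chain grows at the head of its singly linked list, and how it is traversed on lookup:

    public class BucketSketch<K, V> {

        static class Entry<K, V> {
            final K key;
            V value;
            Entry<K, V> next;

            Entry(K key, V value, Entry<K, V> next) {
                this.key = key;
                this.value = value;
                this.next = next;    // old head becomes the second node
            }
        }

        @SuppressWarnings("unchecked")
        private final Entry<K, V>[] table = new Entry[16];

        void add(K key, V value) {
            int i = key.hashCode() & (table.length - 1);
            // new entry is linked in front of the existing chain: O(1), no tail traversal
            table[i] = new Entry<K, V>(key, value, table[i]);
        }

        V get(K key) {
            int i = key.hashCode() & (table.length - 1);
            for (Entry<K, V> e = table[i]; e != null; e = e.next) {
                if (e.key.equals(key)) {
                    return e.value;  // equals() picks the right entry in the chain
                }
            }
            return null;
        }
    }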

java.util.HashMap.java

	/**
     * Returns the value to which the specified key is mapped,
     * or {@code null} if this map contains no mapping for the key.
     *
     * <p>More formally, if this map contains a mapping from a key
     * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
     * key.equals(k))}, then this method returns {@code v}; otherwise
     * it returns {@code null}.  (There can be at most one such mapping.)
     *
     * <p>A return value of {@code null} does not <i>necessarily</i>
     * indicate that the map contains no mapping for the key; it's also
     * possible that the map explicitly maps the key to {@code null}.
     * The {@link #containsKey containsKey} operation may be used to
     * distinguish these two cases.
     *
     * @see #put(Object, Object)
     */
    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        int hash = hash(key.hashCode());
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
                return e.value;
        }
        return null;
    }

1. regenerates the hashcode using the hash(int h) method, passing the user-defined hashcode as an argument
2. computes the bucket index from the regenerated hashcode and the length of the table
3. points to the right bucket, i.e., table[i], and traverses the linked list built from the Entry inner class
4. when the hashcodes are equal and the keys are equal, returns the value mapped to that key
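
To see collision, overriding and retrieval working together, here is a small demo with a deliberately bad (constant) hashCode; the BadKey class is hypothetical:

    import java.util.HashMap;
    import java.util.Map;

    public class CollisionDemo {

        static class BadKey {
            final String name;
            BadKey(String name) { this.name = name; }

            @Override public int hashCode() { return 42; } // every key collides
            @Override public boolean equals(Object o) {
                return o instanceof BadKey && ((BadKey) o).name.equals(name);
            }
        }

        public static void main(String[] args) {
            Map<BadKey, String> map = new HashMap<BadKey, String>();
            map.put(new BadKey("a"), "first");
            map.put(new BadKey("b"), "second"); // same bucket, different key: chained
            map.put(new BadKey("a"), "third");  // equal key: value overridden

            System.out.println(map.get(new BadKey("a"))); // third
            System.out.println(map.get(new BadKey("b"))); // second
            System.out.println(map.size());               // 2
        }
    }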

Also look at: creating our own hashmap in java

password policy with regular expression

A password policy to match passwords of minimum 8 characters, containing alphabets in combination with numbers or special characters.

package regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Helper for password policy
 * 
 * @author ntallapa
 */
public class PwdPolicyVerifier {

	private Pattern pattern;
	private Matcher matcher;

	/**
	 * password policy to match passwords of at least 8 characters containing
	 * alphabets in combination with numbers or special characters
	 *
	 * (?=.*[a-zA-Z]) requires at least one alphabet, lowercase or uppercase
	 * (?=.*[0-9@#$%]) requires at least one numeral or special character
	 * .{8,} means the password length should be at least 8
	 */
	private String pwd_policy = "((?=.*[a-zA-Z])(?=.*[0-9@#$%]).{8,})";

	public PwdPolicyVerifier() {
		pattern = Pattern.compile(pwd_policy);
	}

	/**
	 * Verify the validity of the given client password
	 * @param password
	 * @return boolean whether the test string passed the password policy
	 */
	public boolean verify(String password) {
		matcher = pattern.matcher(password);
		boolean matched = matcher.matches();
		if (matched) {
			System.out.println(password + " matched the regexp");
		} else {
			System.out.println(password + " didn't match the regexp");
		}
		return matched;
	}

	// some test cases
	public static void main(String[] args) {
		PwdPolicyVerifier pv = new PwdPolicyVerifier();
		pv.verify("welcome1");
		pv.verify("welcomeA");
		pv.verify("Welcome@");
		pv.verify("12341234");
		pv.verify("####$$$$");
		pv.verify("Welcome@12");
		pv.verify("WELCOME12");
	}
}

Output:

welcome1 matched the regexp
welcomeA didn't match the regexp
Welcome@ matched the regexp
12341234 didn't match the regexp
####$$$$ didn't match the regexp
Welcome@12 matched the regexp
WELCOME12 matched the regexp

how to print leaf nodes in a binary search tree

The logic here is simple: recursively go through the left and then the right subtree; if a node's left and right children are both null, it is a leaf, so print its value.

Input binary search tree: (figure omitted)

    // print leaf nodes
    public void leafNodes(BinaryNode r) {
    	if(r!= null) {
    		leafNodes(r.left);
    		leafNodes(r.right);

    		if(r.left == null && r.right == null) {
    			System.out.println(r.element);
    		}
    	}
    }
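
For completeness, a minimal BinaryNode and driver (the node structure is assumed from the snippet above; the tree is just an example BST):

    public class LeafNodePrinter {

        static class BinaryNode {
            int element;
            BinaryNode left, right;
            BinaryNode(int element) { this.element = element; }
        }

        static void leafNodes(BinaryNode r) {
            if (r != null) {
                leafNodes(r.left);
                leafNodes(r.right);
                if (r.left == null && r.right == null) {
                    System.out.println(r.element);
                }
            }
        }

        public static void main(String[] args) {
            BinaryNode root = new BinaryNode(8);
            root.left = new BinaryNode(3);
            root.right = new BinaryNode(10);
            root.left.left = new BinaryNode(1);
            root.left.right = new BinaryNode(6);
            root.right.right = new BinaryNode(14);
            leafNodes(root); // prints 1, 6, 14
        }
    }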

Note: this logic is the same as a recursive post-order traversal, but instead of just printing the node value after the left and right traversals, we first check whether both children are null and only then print the node value.

find intersection of elements in two arrays

Assume two arrays, ‘A’ of size ‘m’ and ‘B’ of size ‘n’

case 1: Two arrays are unsorted

Here we pick one of the arrays and load it into a hash-based data structure, i.e., a HashSet, and then proceed to find the intersection of elements.

Since lookup in a hashed data structure is O(1), the total complexity to find the intersection becomes:

Algorithm Time Complexity: O(n) to build the set + O(m)*O(1) for the lookups = O(m + n)
Algorithm Space Complexity: O(n), for the HashSet built from array B

	public void intersect1(int[] a, int[] b) {
		// load every element of one array into a HashSet
		HashSet<Integer> hs = new HashSet<Integer>();
		for (int i = 0; i < b.length; i++) {
			hs.add(b[i]);
		}

		// for each element of the other array, an O(1) set lookup
		for (int i = 0; i < a.length; i++) {
			if (hs.contains(a[i])) {
				System.out.println(a[i] + " is present in both arrays");
			}
		}
	}

pros:
the best of these algorithms, provided an appropriate hashCode method is implemented
cons:
uses extra space, and as the data set grows the number of hash collisions may increase

Case 2: Two arrays are sorted

For every element in array A, do a binary search in array B (instead of the linear scan of case 3 below); for every value in 'A' we spend log(n) iterations in 'B' to find out whether the element exists in 'B'.

Algorithm Time Complexity: O(m)*O(log n) = O(m log n)


		public void intersect(int[] a, int[] b) {
			for (int i = 0; i < a.length; i++) {
				// binary search for a[i] in the sorted array b;
				// Arrays.binarySearch returns a non-negative index when found
				if (java.util.Arrays.binarySearch(b, a[i]) >= 0) {
					System.out.println(a[i] + " is present in both arrays");
				}
			}
		}

case 3: Two arrays are unsorted (brute force)

Brute force algorithm: For every element in array A, traverse through all the elements of array B and find out if the element exists or not.
Algorithm Time Complexity: O(m)*O(n)

In Java

		public void intersect(int[] a, int[] b) {
			for(int i=0; i<a.length; i++) {
				for(int j=0; j<b.length; j++) {
					if(a[i] == b[j]) {
						System.out.println(a[i]+" is present in both arrays");
						break;
					}
				}
			}
		}

pros:
works well for smaller arrays
works for unsorted arrays
cons:
the order/complexity is directly proportional to the product of the sizes of the arrays; this can take a huge amount of time for larger arrays

Acknowledgement modes in Java Message Service (JMS)

 

There are two types: auto-acknowledge and dups-ok-acknowledge.

auto-acknowledge: the acknowledgement from the MDB to the JMS provider is sent as soon as the message is received.

dups-ok-acknowledge: the acknowledgement from the MDB to the JMS provider can be sent any time later; hence there is a chance of the JMS provider delivering the message again, and the MDB should be able to handle duplicate messages.
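
As a sketch of where this gets configured, an EJB 3 style MDB might declare its acknowledgement mode like this (the class and queue names are hypothetical):

    import javax.ejb.ActivationConfigProperty;
    import javax.ejb.MessageDriven;
    import javax.jms.Message;
    import javax.jms.MessageListener;

    @MessageDriven(activationConfig = {
        @ActivationConfigProperty(propertyName = "destination",
                                  propertyValue = "queue/ExampleQueue"),
        @ActivationConfigProperty(propertyName = "acknowledgeMode",
                                  propertyValue = "Dups-ok-acknowledge") // or "Auto-acknowledge"
    })
    public class ExampleMDB implements MessageListener {
        public void onMessage(Message message) {
            // with Dups-ok-acknowledge, this method must tolerate duplicate deliveries
        }
    }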

 

where do the cookies get stored in the request object

 

Cookies are stored in HTTP headers: in the response they are written under the Set-Cookie header, and in the request they are sent back under the Cookie header.
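
As a sketch (the servlet and cookie names are hypothetical; the API calls are standard Servlet API):

    import javax.servlet.http.Cookie;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class CookieDemoServlet extends HttpServlet {
        protected void doGet(HttpServletRequest request, HttpServletResponse response) {
            // response side: addCookie() produces a Set-Cookie response header
            response.addCookie(new Cookie("theme", "dark"));

            // request side: getCookies() exposes the request's Cookie header
            Cookie[] cookies = request.getCookies(); // null if none were sent
        }
    }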

Is the exception implicit object available in all the JSP pages?

 

No, it is available only in error pages, i.e., in pages where the following directive is present:


<%@ page isErrorPage="true" import="java.io.*" %>

how do you navigate to error pages in JSP

 

In the page where an exception might occur, add a page directive as below:


<%@ page errorPage="ExceptionHandler.jsp" %>

 

Here ExceptionHandler.jsp is the actual error JSP, to which the exception implicit object is made available when an exception occurs. Inside this page we need to add a page directive as shown below:


<%@ page isErrorPage="true" import="java.io.*" %>
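
A minimal hypothetical ExceptionHandler.jsp showing the exception implicit object in use (the markup around it is just an example):

<%@ page isErrorPage="true" import="java.io.*" %>
<html>
  <body>
    <h3>Something went wrong</h3>
    <%-- exception is available here only because isErrorPage="true" --%>
    Error: <%= exception.getMessage() %>
  </body>
</html>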

Introduction to J2EE (2.x), its Architecture and Component Diagram

Java 2 Platform, Enterprise Edition is an integrated set of individual APIs required for developing server-side MULTI-TIERED, DISTRIBUTED, TRANSACTIONAL and SECURE applications.

J2EE Component Diagram (figure omitted)

J2EE is released in 4 parts:
• J2EE Specification: declares all the APIs included in the release
• J2EE Reference Implementation: an open-source implementation of the APIs in the J2EE spec, provided by Sun (j2sdkee). It exists to prove that the APIs in the spec can be properly implemented, and to serve as a reference for 3rd-party vendors implementing the spec
• J2EE Test Suite: provides a compatibility test suite which can be used to certify a J2EE product
• J2EE Blueprints: includes best practices for the J2EE APIs

J2EE Architecture

Components:
Component Provider: responsible for implementing components, i.e., the core business logic. The provider needs to be strong in programming, component architectures and component life cycles, with good business-domain knowledge. The outcome is JAR files containing components.
Application Assembler: takes the components developed by the component provider and assembles them into an application; may also write some simple client components. Uses J2EE standards to integrate components, such as parts of the deployment descriptors and some configuration files.
Container Provider: responsible for implementing the low-level services required to run the components, such as security, transactions and life-cycle management.
Server Provider: responsible for providing system-level services. The server and container providers must be the same.
Deployer: takes the application assembled by the assembler and configures it so that it can be placed on a server/container; writes the vendor-specific deployment descriptors and configuration files.
Administrator: responsible for configuring the resources required by the application in the server, such as creating users and configuring connection pools.

Inversion of Control and Dependency Injection

Inversion of Control is too broad a term for frameworks to claim simply that they are "based on IoC": it depends on which control of the application you are inverting. With the new generation of containers and frameworks, IoC is specifically about how they look up and inject third-party plugins into the application. These container/framework providers require the user to follow certain conventions that allow a separate assembler module (from the container provider) to inject dependencies into the application.
As a result, IoC is now seen as too generic a name, and a more specific one was needed; for this reason the term Dependency Injection (DI) was coined.

Let us start with some basic example

    public class StudentLister {
        private StudentFinder finder;

        public StudentLister() {
            finder = new ElectronicBranchStudents();
        }
    }

Here the StudentLister class depends on the StudentFinder interface and on its implementation. We would prefer the class to depend on the interface alone, not on the implementation class. But how can we make StudentLister aware of the implementation of the finder interface? We describe this scenario as a PLUGIN.
The aim is that the StudentLister class should not be linked to the finder implementation class at compile time, so that the implementation can be swapped at any later time based on the requirements.
Expanding this problem to a real-world scenario, there will be many services and components. In each case we abstract these components behind interfaces and access them via those interfaces. But if we want to deploy these components in different ways, we need a plugin mechanism in place.
This is where the new set of lightweight containers came in, by implementing the IoC design pattern.

There are three different ways of injecting the dependencies (a sketch of the first follows the list):

  • Constructor Injection
  • Injection through Setter Methods
  • Interface Injection
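
For contrast, here is a minimal constructor-injection sketch, reusing the StudentFinder interface from the example above (the wiring code that calls this constructor is assumed):

    public class StudentLister {
        private final StudentFinder finder;

        // whoever constructs StudentLister (an assembler, a container, a test)
        // chooses the implementation; this class never names a concrete class
        public StudentLister(StudentFinder finder) {
            this.finder = finder;
        }
    }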

Setter Injection via Spring framework

import org.springframework.context.ApplicationContext;
import org.springframework.context.support.FileSystemXmlApplicationContext;

public class StudentLister {
    private StudentFinder finder;
    ApplicationContext context = new FileSystemXmlApplicationContext("src\\applicationconfig.xml");

    public StudentLister() {
        //finder = new ElectronicsStudents();

        // get the instance from the Spring container; note the cast is to the
        // interface, so there is no compile-time dependency on the implementation
        finder = (StudentFinder) context.getBean("students");
    }

    public static void main(String[] args) {
        new StudentLister();
    }
}

Here there is no compile-time dependency on the finder implementation class, so any implementation can be plugged in at a later point in time via the bean definition in applicationconfig.xml.

Here is the complete source code
https://github.com/ntallapa/ForumWorkspace/tree/master/Spring082201DI

 
