Best possible way to find the duplicates in an array
November 17, 2015
This is one of the most frequently asked questions in interviews. It can be solved in many ways; the simplest is the brute-force approach using two nested for loops, with a time complexity of O(n*n). But the interviewer usually looks for a solution with better time complexity, and often without using the existing data structures available in Java or other languages.
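For reference, here is a minimal sketch of that brute-force approach (my own illustration; this code is not part of the original solution):

static void findDuplicatesBruteForce(int[] input) {
    // Compare every pair of elements: O(n*n) time, O(1) extra space
    for (int i = 0; i < input.length; i++) {
        for (int j = i + 1; j < input.length; j++) {
            if (input[i] == input[j]) {
                System.out.println(input[j] + " is a duplicate");
                break; // report this occurrence once, then move on
            }
        }
    }
}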
Now I am going to show you an approach with O(n) time complexity at the cost of O(n) space. Full source code can be downloaded from here.
Step 1: Define a simple linked-list node that holds an integer value.
class MyInt {
    int value;   // the element stored in this node
    MyInt next;  // next node in the same bucket's chain

    public MyInt(int value) {
        this.value = value;
    }
}
Step 2: Create the hash array (an array of MyInt nodes) with the size of the input array.
MyInt[] hashArray = new MyInt[size];
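For context, the fragments above might live in an enclosing class along these lines (the class name and constructor are assumptions for illustration; the original post only shows the fragments):

public class DuplicateFinder {
    private final MyInt[] hashArray;
    private final int size;

    public DuplicateFinder(int size) {
        this.size = size;
        this.hashArray = new MyInt[size]; // one bucket per input element
    }

    // findDuplicates() and getSupplementalHash() from the steps below go here
}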
Step 3: Loop through the input array and insert each element into the newly constructed hash array.
- In this method we compute the hash code of each element and then find the appropriate bucket (array index) in which the element should be stored.
public void findDuplicates(Integer currentElement) {
    // Supplemental hash spreads elements across buckets (see below)
    int idx = getSupplementalHash(currentElement.hashCode()) % size;
    // Naive alternative, used to demonstrate clustering in Example #2:
    // int idx = currentElement.hashCode() % size;

    // Walk this bucket's chain to see if the element already exists
    for (MyInt existingElement = hashArray[idx]; existingElement != null;
            existingElement = existingElement.next) {
        if (existingElement.value == currentElement) {
            // duplicate found
            System.out.println(currentElement + " is a duplicate");
            return;
        } else {
            System.out.println("identified collision, adding "
                    + existingElement.value + " to the list");
        }
    }

    System.out.println("adding " + currentElement + " to the list");
    MyInt mi = new MyInt(currentElement);
    // insert element at the head to avoid tail traversing
    mi.next = hashArray[idx];
    hashArray[idx] = mi;
    System.out.println("------------------------------------");
}
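A small driver for the walkthroughs below might look like this (a sketch under my assumptions, using the hypothetical DuplicateFinder class from Step 2; the downloadable source may wire things up differently):

public static void main(String[] args) {
    Integer[] input = {1, 2, 3, 3, 4, 5, 6, 8, 8, 1, 2, 8};
    System.out.println("Input array size is " + input.length);
    DuplicateFinder finder = new DuplicateFinder(input.length);
    for (Integer element : input) {
        finder.findDuplicates(element);
    }
}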
Example #1
Input Array: {1, 2, 3, 3, 4, 5, 6, 8, 8, 1, 2, 8}
Output:
3 is a duplicate
8 is a duplicate
1 is a duplicate
2 is a duplicate
8 is a duplicate
But there is a catch here: if the hash code is not properly implemented, the complexity can degrade to O(n*n). See the example below.
Example #2: In the findDuplicates() method above, comment out the getSupplementalHash() line and uncomment the naive hashCode() line.
Input array: {10, 0, 100, 20, 20}
Output:
Input array size is 5
adding 10 to the list
------------------------------------
identified collision, adding 10 to the list
adding 0 to the list
------------------------------------
identified collision, adding 0 to the list
identified collision, adding 10 to the list
adding 100 to the list
------------------------------------
identified collision, adding 100 to the list
identified collision, adding 0 to the list
identified collision, adding 10 to the list
adding 20 to the list
------------------------------------
20 is a duplicate
If you look at the output above, to find out whether 20 is a duplicate the chain had to be traversed 4 times. That is O(n) per element, and hence the overall complexity goes back to O(n*n).
Disadvantage: the downside of this approach is that when hash codes are poorly distributed (for example, all ending in 0, all even, or all odd), all the elements land in one bucket, which then degenerates into a plain linked list (with O(n) lookup) instead of a hash array (with O(1) lookup). Because of this, our target complexity of O(n) is lost.
To address this, we apply a supplemental hash on top of the element's hash code to make sure it distributes the bucket indexes well. Here is the code:
private int getSupplementalHash(int h) {
    // This function ensures that hash codes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at the default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
Example #2 (run with the getSupplementalHash() method): In the findDuplicates() method above, comment out the naive hashCode() line and uncomment the getSupplementalHash() line.
Output:
Input array size is 5
adding 10 to the list
------------------------------------
identified collision, adding 10 to the list
adding 0 to the list
------------------------------------
adding 100 to the list
------------------------------------
adding 20 to the list
------------------------------------
20 is a duplicate
If you notice, the collisions are greatly reduced.
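To double-check the spread, here is a small standalone harness (my own verification sketch, not part of the original post) that prints the bucket index each element gets with and without the supplemental hash:

public class HashSpreadDemo {
    private static int getSupplementalHash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    public static void main(String[] args) {
        int size = 5;
        int[] input = {10, 0, 100, 20, 20};
        for (int n : input) {
            int naiveIdx = Integer.valueOf(n).hashCode() % size;                       // 0 for every element
            int spreadIdx = getSupplementalHash(Integer.valueOf(n).hashCode()) % size; // 0, 0, 3, 1, 1
            System.out.println(n + ": naive idx = " + naiveIdx
                    + ", supplemental idx = " + spreadIdx);
        }
    }
}

With the naive index every element maps to bucket 0, while the supplemental hash leaves only the collision between 10 and 0 (both in bucket 0), which matches the output above.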
Full source code can be downloaded from here.