## best possible way to find the duplicates in an array

This is the most frequently askedÂ question in many interviews. This can be done in many ways and simplest way is the brute-force approach by taking 2 for loops with complexity of O(n*n). But the interviewer looks for the solution with the best time complexity and also without using any existing data structures that are available in Java or other languages.

Now I am going to show you the approach with O(n) complexity at the cost of O(n) space. Full source code can be downloaded from here.

Step1: Construct a simple linked list which holds the integer value.

```    class MyInt {
int value;
MyInt next;

public MyInt(int value) {
this.value = value;
}
}
```

Step2: Initialize the above array with the size of the input array.

```hashArray = new MyInt[size];
```

Step3: Loop through the input array and insert elements into the newly constructed hash array

• In this method we construct the hashcode of each element and then find out the appropriate bucket (array index) in which the element has to be stored.
```    public void findDuplicates(Integer currentElement) {
int idx = getSupplementalHash(currentElement.hashCode()) % size;
//int idx = (element.hashCode()) % size;
MyInt existingElement = hashArray[idx];

for(;existingElement != null; existingElement = existingElement.next) {
if(existingElement.value == currentElement) {
// duplicate
System.out.println(currentElement+" this is a duplicate");
return;
} else {
existingElement.value+" to the list");
}
}

MyInt mi = new MyInt(currentElement);
// insert element at the head to avoid tail traversing
mi.next = hashArray[idx];
hashArray[idx] = mi;

System.out.println("------------------------------------");
}
```

Example #1
Input Array: {1, 2, 3, 3, 4, 5, 6, 8, 8, 1, 2,8}
Output:

```3 is a duplicate
8 is a duplicate
1 is a duplicate
2 is a duplicate
8 is a duplicate
```

But there is a catch here, if the hashcode is not properly implemented by the user the complexity might go upto O(n*n), see the below example

Example #2: In findDuplicates() method above, comment line #2 and uncomment line #3
Input array: {10,0,100,20,20}
Output

```Input array size is 5
------------------------------------
identified collision, adding 10 to the list
------------------------------------
identified collision, adding 0 to the list
identified collision, adding 10 to the list
------------------------------------
identified collision, adding 100 to the list
identified collision, adding 0 to the list
identified collision, adding 10 to the list
------------------------------------
20 is a duplicate
```

If you notice above output, to find if 20 is a duplicate, it traversed 4 times which is nothing but O(n) for each element and hence it goes back to O(n*n) overall.

Disadvantage: The downside with this approach when hashcodes are not proper (like all ending with 0, all are even, all are odd, etc), they all go into one bucket and thus becoming another array (with complexity O(n)) instead or hasharray (with complexity O(1)). Because of this our target complexity O(n) will be lost.

To address this we will re-compute the hash code again at our end to make sure it is well distributes the index, here is the code

```    private int getSupplementalHash(int h) {
// This function ensures that hashCodes that differ only by
// constant multiples at each bit position have a bounded
// number of collisions (approximately 8 at default load factor).
h ^= (h >>> 20) ^ (h >>> 12);
return h ^ (h >>> 7) ^ (h >>> 4);
}
```

Example #2 (run with getSupplementalHash() method): : In findDuplicates() method above, comment line #3 and uncomment line #2
Output

```Input array size is 5
------------------------------------
identified collision, adding 10 to the list
------------------------------------
------------------------------------
------------------------------------
20 is a duplicate
```

If you notice, the collision is reduced to a greater extent.

Mawazo

Mostly technology with occasional sprinkling of other random thoughts

amintabar

Amir Amintabar's personal page

101 Books

Reading my way through Time Magazine's 100 Greatest Novels since 1923 (plus Ulysses)

Seek, Plunnge and more...

My words, my world...

ARRM Foundation

Do not wait for leaders; do it alone, person to person - Mother Teresa

Executive Management

An unexamined life is not worth living â€“ Socrates

javaproffesionals

A topnotch WordPress.com site

thehandwritinganalyst

Just another WordPress.com site

coding algorithms

"An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem." -- John Tukey