best possible way to find the duplicates in an array

This is the most frequently asked question in many interviews. This can be done in many ways and simplest way is the brute-force approach by taking 2 for loops with complexity of O(n*n). But the interviewer looks for the solution with the best time complexity and also without using any existing data structures that are available in Java or other languages.

Now I am going to show you the approach with O(n) complexity at the cost of O(n) space. Full source code can be downloaded from here.

Step1: Construct a simple linked list which holds the integer value.

class MyInt {
int value;
MyInt next;

public MyInt(int value) {
this.value = value;
}
}

Step2: Initialize the above array with the size of the input array.

hashArray = new MyInt[size];

Step3: Loop through the input array and insert elements into the newly constructed hash array

• In this method we construct the hashcode of each element and then find out the appropriate bucket (array index) in which the element has to be stored.
public void findDuplicates(Integer currentElement) {
int idx = getSupplementalHash(currentElement.hashCode()) % size;
//int idx = (element.hashCode()) % size;
MyInt existingElement = hashArray[idx];

for(;existingElement != null; existingElement = existingElement.next) {
if(existingElement.value == currentElement) {
// duplicate
System.out.println(currentElement+" this is a duplicate");
return;
} else {
existingElement.value+" to the list");
}
}

MyInt mi = new MyInt(currentElement);
// insert element at the head to avoid tail traversing
mi.next = hashArray[idx];
hashArray[idx] = mi;

System.out.println("------------------------------------");
}

Example #1
Input Array: {1, 2, 3, 3, 4, 5, 6, 8, 8, 1, 2,8}
Output:

3 is a duplicate
8 is a duplicate
1 is a duplicate
2 is a duplicate
8 is a duplicate

But there is a catch here, if the hashcode is not properly implemented by the user the complexity might go upto O(n*n), see the below example

Example #2: In findDuplicates() method above, comment line #2 and uncomment line #3
Input array: {10,0,100,20,20}
Output

Input array size is 5
------------------------------------
identified collision, adding 10 to the list
------------------------------------
identified collision, adding 0 to the list
identified collision, adding 10 to the list
------------------------------------
identified collision, adding 100 to the list
identified collision, adding 0 to the list
identified collision, adding 10 to the list
------------------------------------
20 is a duplicate

If you notice above output, to find if 20 is a duplicate, it traversed 4 times which is nothing but O(n) for each element and hence it goes back to O(n*n) overall.

Disadvantage: The downside with this approach when hashcodes are not proper (like all ending with 0, all are even, all are odd, etc), they all go into one bucket and thus becoming another array (with complexity O(n)) instead or hasharray (with complexity O(1)). Because of this our target complexity O(n) will be lost.

To address this we will re-compute the hash code again at our end to make sure it is well distributes the index, here is the code

private int getSupplementalHash(int h) {
// This function ensures that hashCodes that differ only by
// constant multiples at each bit position have a bounded
// number of collisions (approximately 8 at default load factor).
h ^= (h >>> 20) ^ (h >>> 12);
return h ^ (h >>> 7) ^ (h >>> 4);
}

Example #2 (run with getSupplementalHash() method): : In findDuplicates() method above, comment line #3 and uncomment line #2
Output

Input array size is 5
------------------------------------
identified collision, adding 10 to the list
------------------------------------
------------------------------------
------------------------------------
20 is a duplicate

If you notice, the collision is reduced to a greater extent.

Analogy between Binary Search Tree and Quicksort Algorithm

Binary Search Tree(BST) and Quicksort Algorithm(QA) are similar in nature in all the factors. They have same time complexities and same number of comparisons at every insertion. If one finds the quicksort logic difficult to understand and remember then he can always remember BST and memorize QA logic. We will now see how QA algorithm can be seen as simple BST by looking at both the concepts individually.

Binary Search Tree(BST)
Binary Search Tree(BST) property is that every inserted node in the BST would be inserted to the left of its parent node if its less than the parent node value and inserted to the right of its parent node if its more.

Binary Search Tree (constructed from random input array) Binary Search Tree (constructed from sorted input array) Complexity of BST: Randomized input would lead to complete BST in which case the BST insertion would take O(logn) time complexity. And if the input is sorted in some order(ascending or descending) then its going to be the length of the array, i.e, O(n)
So for n inserts it would be O(nlogn) or O(n^2)

Quicksort Algorithm
Now we will look at quick sort logic
Plain Quicksort Algorithm Quicksort algorithm with BST embedded Complexity of Quicksort Algorithm: Randomized input would lead to O(logn) time complexity as seen in above diagram. And if the input is sorted in some order then its going to be length of the array, i.e, O(n)
So for n inserts it would be O(nlogn) or O(n^2)

Understanding quicksort algorithm

I see many people finding it very difficult to remember this algorithm, so is this article. Complete source code with few unit tests can be downloaded here

To put it in a simple way, Quicksort algorithm is as simple as this diagram depicts. Explanation of the above diagram: Every time we are choosing first element as the pivot and making the partition, i.e, elements less than pivot comes to left partition and vice versa. And in what order these elements should be brought into left and right partitions is explained in partition logic below. And partitioning itself is a recursive mechanism we do that as part of actual quicksort logic.
Note: Just because this algorithm is in_place algorithm you might find it bit confusing(but efficient), but otherwise if you choose simplicity you can always implement it in other ways.

Complexity of Quicksort Algorithm: On an average Quicksort Algorithm has the complexity of O(nlogn) and in the worst case it has O(n^2) when the elements of the input array are sorted (ascending or descending order). It is an in place algorithm in the sense it does not takes any additional space.
It is a divide and conquer algorithm where we partition the given array with respect to a particular element(called as ‘PIVOT’) such that the lower partition elements of the array are less than the pivot and upper partition elements of the array are higher than the pivot.

Concept of Partition and Pivot:
Rule of Pivot: There is not any rule of pivot as such but it can be any element of the given array, here I am considering the first element as the pivot in every partition. How choosing of an pivot effects the distribution of the elements in partitioning and its role in the complexity of the quicksort logic will be discussed in future posts.
Rule of Partition: The lower partition should be less than the pivot and upper partition should be higher than the pivot. Running time of partition logic is linear.

Let us take an example array {14,12,16,13,11}
First Iteration: Lets say pivot is always the first element of the array(i.e, 14 in this case);

---------------------------
14  | 12  | 16  | 13  | 11 --> (a is not less than pivot, hence i is stopped
---------------------------   at index 0; a is not greater than pivot,hence
j is stopped at index 5. Now swap a and a)
---------------------------
11  | 12  | 16  | 13  | 14 --> (a is not less than pivot, hence i is stopped
---------------------------   at index 2; a is not greater than pivot,hence
j is stopped at index 3. Now swap a and a.)
---------------------------
11  | 12  | 13  | 16  | 14
---------------------------
0   | 1   | 2   | 3   | 4   -->  Index
---------------------------
At this point index i points to 3 and j points to 2 and since i > j,
partition logic will exit resulting into two partitions
----------
Left Partition:  14  | 12
---------------
Right Partition: 13  | 16  | 11
---------------

Partition logic implemented in Java

/**
* Partition logic
*
* @param a array to be modified based on pivot
* @param left lower bound of the array
* @param right upper bound of the array
* @return the partition index where elements to its left are less than it and
* elements to its right are more than it
*/
public int partition(int[] a, int left, int right) {
// Get the pivot element
int pivot = a[left];

// Break when left is  >  right
while(left  < = right) {
//increment the lower bound till you find the element more than the pivot
while(a[left] < pivot)
left++;
//decrement the upper bound till you find the element less than the pivot
while(a[right] > pivot)
right--;

// swap the values which are left by lower and upper bounds
if(left  < = right) {
int tmp = a[left];
a[left] = a[right];
a[right] = tmp;

//increment left index and decrement right index
left++;
right--;
}
}
return left;
}

Running time of quick sort logic depends on the distribution of elements across the partitions.

Quicksort Logic:
Quicksort Logic is straight forward with couple of recursive calls. Quicksort implementation in java is shown below

/**
* Recursive quicksort logic
* @param a input array
* @param i start index of the array
* @param j end index of the array
*/
public void recursiveQuickSort(int[] a, int i, int j) {
// Handle logic to divide the array
int idx = partition(a, i, j);

// Recursively call quick sort with left part of the divided array
if(i < idx-1) {
recursiveQuickSort(a, i, idx-1);
}

// Recursively call quick sort with right part of the divided array
if(j > idx) {
recursiveQuickSort(a, idx, j);
}
}

First recursive call ‘recursiveQuickSort(a, i, idx-1);’ picks the left partition of the array and repeats the process. And second recursive call ‘recursiveQuickSort(a, idx, j);’ picks the right partition of the array and repeats the process.

Since we are returning index ‘left’, i.e, i=2 from partition() method first recursive call ‘recursiveQuickSort(a, i, idx-1);’ would consider left partition array from index 0 to 1

----------
Left Partition:  14  | 12
----------

Second recursive call ‘recursiveQuickSort(a, idx, j);’ would consider right partition array from index 2 to 4

---------------
Right Partition: 13  | 16  | 11
---------------

Swap every pair of nodes in a single linked list

This algorithm is just the extension of ‘swapping any two given nodes’ in a single linked list.

It can be done in two ways – #1 by swapping the addresses of each node #2 by swapping data of each node. #2 is straight forward and doesnt need any explanation. We will discuss #1 here.

Following diagram depicts what are we trying to do: Let’s say Single Linked List is with 5 nodes n1,n2,n3,n4,n5
Analysis of the algorithm:
#1 – We swap two nodes as usual (n1 and n2 are swapped which leads to n2,n1,n3,n4,n5)
#2 – This is key step. before swapping next two nodes we should remember that n1’s next address should also be changed because when n3&n4 are swapped n1->n3 link will be broken and hence we should take care of this step. (n3 and n4 are swapped and n1’s next is linked to n4 – which leads to n2,n1,n4,n3,n5)

Source Code

public static Node swap(Node n) {
Node buffer = null;
Node newRoot = null;
while (n != null && n.next != null) {
if (buffer != null) {
buffer.next.next = n.next;
}

buffer = n.next;
n.next = n.next.next;
buffer.next = n;
if (newRoot == null)
newRoot = buffer;
n = n.next;
}
return newRoot;
}

algorithm to reverse words of a string

This is a bit tricky algorithm and most frequently asked question in interviews. I will now show how to do it in the best time complexity of [O(2n)]. Complete source code can be downloaded from here.

Here we will make use of algorithm to reverse the string which is explained in our previous posts at O(n/2) complexity.

Steps to solve this algorithm

• take two pointers, one pointing to the start of the array and another to the end of the array and swap the values at respective pointers.
• While traversing the array increment starting pointer and decrement ending pointer and continue swap the values at respective pointers
• continue this till we reach the middle of the array which will result in reversed string
• from above generated reversed string take a pointer which runs through this string and stops at spaces which indicates the occurance of a word
• go back to first three steps to reverse a word within this reversed string and go back to fourth step till the sentence is completed
• by end of this algorithm, last word will not be reversed because there is no space after it and hence reverse the last word at the end

Source Code

/**
* Algorithm to reverse word of a string
* @return reversed words array
*/
public char[] reverseWords(char[] arr) {
// reverse the string
reverse(arr, 0, arr.length / 2, arr.length);

// reverse words of a string
int wordIdx = 0;
int wordMidIdx = 0;
int prevWordLastIdx = 0;

// outer loop to track spaces
for (int sentenceIdx = 0; sentenceIdx < arr.length; sentenceIdx++) {
if (arr[sentenceIdx] != ' ')
continue;

wordIdx = prevWordLastIdx;
int wordLastIdx = sentenceIdx;
wordMidIdx = (sentenceIdx + wordIdx) / 2;
// reverse each word in a string
reverse(arr, wordIdx, wordMidIdx, wordLastIdx);
prevWordLastIdx = sentenceIdx + 1;
}

// reverse last word
wordIdx = prevWordLastIdx;
wordMidIdx = (arr.length + wordIdx) / 2;
reverse(arr, wordIdx, wordMidIdx, arr.length);

return arr;
}

How to arrive at the time complexity?

Step #1: reverse the complete string which is explained in our previous algorithm at O(n/2) complexity. https://tekmarathon.wordpress.com/2012/10/05/algorithm-to-reverse-a-string/

Step #2: keep a pointer which traverses through the string looking for spaces which takes O(n)

Step #3: using the above spaces reverse words within the string at O(n/2). Let us see why this is O(n/2)

'welcome to coding algorithms' would become 'smhtirogla gnidoc ot emoclew'

Now take the string 'smhtirogla gnidoc ot emoclew' and reverse each word within
it with same logic explained in reversing the string. This will take O(n)
iterations, let us see how is it O(n)

At first iteration
'smhtirogla gnidoc ot emoclew' would become 'algorithms gnidoc ot emoclew'
at 5 iterations i.e (length of 'smhtirogla')/2)

At second iteration
'algorithms gnidoc ot emoclew' would become 'algorithms coding ot emoclew'
at 3 iterations i.e (length of 'gnidoc')/2)

At third iteration
'algorithms coding ot emoclew' would become 'algorithms coding to emoclew'
at 1 iterations i.e (length of 'ot')/2)

At fourth iteration
'algorithms coding to emoclew' would become 'algorithms coding to welcome'
at 3 iterations i.e (length of 'emoclew')/2)

total length of the string = 29

total iterations = 12 + spaces = 15 (approximately half the length of the
total string)

Total complexity from Step #1, Step #2 and Step #3 = O(n/2) + O(n) + O(n/2) = O(2n)

algorithm to find if a linked list is cyclic

The best way of doing it is to take two pointers. For every step taken by one pointer, move another pointer two steps. In this manner if there is no cycle in the linked list(i.e, if the list is linear), then by the time first pointer reaches mid if the list, second pointer reaches end of the list.

The following figure depicts the cycle in Single Linked List. ---------------------------------------------------------------
Node        : 1->200 | 2->300 | 3->400 | 4->500 | 5->600 | 6->700 | 7->NULL |
-------------------------------------------------------------
Node Address: 100    | 200    | 300    | 400    | 500    | 600    | 700     |
---------------------------------------------------------------
Iteration-1 : p1     |        | p2     |        |        |        |         |
---------------------------------------------------------------
Iteration-2 :        | p1     |        |        | p2     |        |         |
---------------------------------------------------------------
Iteration-3 :        |        | p1     |        |        |        | p2      |
---------------------------------------------------------------

If you notice by the time the first pointer reaches mid of the list, second pointer has reached end of the list.

---------------------------------------------------------------
Node        : 1->200 | 2->300 | 3->400 | 4->500 | 5->600 | 6->700 | 7->300  |
-------------------------------------------------------------
Node Address: 100    | 200    | 300    | 400    | 500    | 600    | 700     |
---------------------------------------------------------------
Iteration-1 : p1     |        | p2     |        |        |        |         |
---------------------------------------------------------------
Iteration-2 :        | p1     |        |        | p2     |        |         |
---------------------------------------------------------------
Iteration-3 :        |        | p1     |        |        |        | p2      |
---------------------------------------------------------------
Iteration-3 :        |        |        | p2  p1 |        |        |         |
---------------------------------------------------------------

If you notice both the pointers are referring to the same node.

Steps to solve this Algorithm:
– For every step taken by one pointer, move another pointer two steps.
– Continue this till either second_pointer/second_pointer’s next pointer is same as that of the first pointer which means second pointer has come behind the first pointer resulting in a cycle.

Source Code

public static boolean findCylicSLL(Node root) {
if(root == null) {
return false;
}
Node ptr1 = root;

Node ptr2 = root.next.next;
while(ptr1 != null && ptr2 != null) {
if(ptr2 == ptr1 || ptr2.next == ptr1) {
return true;
}
ptr1 = ptr1.next;
ptr2 = ptr2.next.next;
}

return false;
}

A small change in this same logic will lead us to another important trick:
How do you find exact middle element of the list in O(n/2) complexity?

Steps to solve this Algorithm:
– For every step taken by one pointer, move another pointer two steps.
– Continue this till either second_pointer is null which means it reached end of the list and therefore first_pointer will be exactly in the middle of the list

public static boolean findMiddleElementOfList(Node root) {
if(root == null) {
return false;
}
Node ptr1 = root;
Node ptr2 = root.next.next;

while(ptr2 != null) {
ptr1 = ptr1.next;
ptr2 = ptr2.next.next;
}

// When ptr2 is null, it means ptr1 reached mid of the list
System.out.println("ptr1: "+ptr1);
}

spiral traversal of binary tree

This can be done using two stacks in iterative manner.

The below diagram depicts the spiral traversal of BST Steps to solve
push level_1 nodes (i.e, only root node) into stack_1;
next by popping elements of level_1, push children of level_l in to stack_2 (i.e, level_2);
next by popping elements of level_2, push children of level_2 in to stack_2 (i.e, level_3);
… so on

Code:

/**
* Algorithm to print binary tree in spiral manner
*
* @param node root node of the binary tree
*/
public void printTreeInSpiralForm(BinaryNode node) {
Stack<BinaryNode> stack1 = new Stack<BinaryNode>();
Stack<BinaryNode> stack2 = new Stack<BinaryNode>();
stack1.push(node);

while (!stack1.empty() || !stack2.empty()) {
//
while (!stack1.empty()) {
node = stack1.pop();
if(node != null) {
System.out.print(node.element+", ");
if(node.left != null)
stack2.push(node.left);
if(node.right != null)
stack2.push(node.right);
}
}

while (!stack2.empty() && node != null) {
node = stack2.pop();
if(node != null) {
System.out.print(node.element+", ");
if(node.left != null)
stack1.push(node.right);
if(node.right != null)
stack1.push(node.left);
}
}
}
}

Output:
20, 25, 15, 12, 18, 22, 28, 32, 26, 24, 21, 19, 17, 14, 6

algorithm to find if two linked lists are intersected

Two different lists will be intersected when one node of a list points to the other list’s node.

The below diagram depicts the intersection of lists How do you solve it:

• Get the difference of the two lists (In above diagram it is 5-4=1)
• Take a pointer to the bigger list and move it as many number of times as difference between the lists calculated above (bigger list with 5 nodes is moved 1 position forward with pointer1 pointing to 2)
• Take another pointer to the second list and start comparing node of first list with the node of the second list. (Take pointer2 pointing to 18)
• Whenever nodes are same we can say lists are intersected. (they get intersected at node 4)

Code:

/**
* Algorithm to detect the intersecting point of two lists
*
* @param s1 first list which is larger in size
* @param s2 second list
*/
public static void getIntersectPoint(SLLIntersection s1,SLLIntersection s2){
int s1Len = s1.getLength(s1.root);
int s2Len = s2.getLength(s2.root);

int diff = s1Len-s2Len;
System.out.println("Difference between two lists is "+diff);

// Move the pointer in the larger list to diff places
SLLNode s1List = s1.root;
if((s1Len-s2Len)>0) {
while((s1List != null) && diff>0){
s1List = s1List.next;
diff--;
}
}

// Now check the point where the lists are intersecting
// by checking each node in two lists simultaneously
SLLNode s2List = s2.root;
while(s1List != null && s1List.next != null) {
if(s1List == s2List) {
System.out.println("Intersection point is at "+s1List.value);
}
// increment the pointers
s1List = s1List.next;
s2List = s2List.next;
}
}

Output:

List1: 1 2 3 4 5
List2: 18 19 4 5
Difference between two lists is 1
Intersection point is at 4

subsequence of array with maximum sum (Kadane’s Algorithm)

This is given by Jay Kadane.

Problem Statement: In a given array, find out subarray such that the sum of its elements is maximum.

Lets take an array:
a[] = {-1,1,2,-3,3,-1,2,-2}
subarray with maximum sum is 4 i.e, {3,-1,2}

Note: If the array with all its elements are positives, then the result is the sum of all the elements. Say if a[]={1,2,3,4}, then result=10
Note: If the array with all its elements are negative, then the result is the minimum element of all the elements. Say if a[]={-1,-2,-3,-4}, then result=-1

The base rule here is that: include negative numbers if the Sum is greater than zero, if not then exclude negative numbers.

This algorithm uses dynamic programming concept to find the sum. what is dynamic programming? evaluate smaller problems first and use that smaller problems result to solve bigger problem.

/**
* @author ntallapa
*
*/

/**
* Algorithm to find max subarray sum
* @param a array for which max subarray sum needs to be found
* @return max subarray sum
*/
public int maxSubArraySum(int a[]) {
int totMaxSum = a;
int currMaxSum = a;

for (int i = 1; i < a.length; i++) {
currMaxSum = a[i]>(currMaxSum+a[i])?a[i]:(currMaxSum + a[i]);
totMaxSum = totMaxSum>currMaxSum?totMaxSum:currMaxSum;
}
}

/**
* Test
* @param args
*/
public static void main(String[] args) {
int a[] = {-1,1,2,-3,3,-1,2,-2};
int max_sum = la.maxSubArraySum(a);
System.out.println("Maximum contiguous sum is \n" + max_sum);
}
}

algorithm to remove element from single linked list

In this algorithm we were not given with the head node but instead given with the node that is to be deleted.

Lets look at an example:

--------------------------------------
Node        : 1->20  | 2->30  | 3->40  | 4->50  | 5
--------------------------------------
Node Address: 10     | 20     | 30     | 40     | 50

And assume we want to remove 3rd element where you are given only the THIRD node that is to be deleted.
Output should be: 1,2,4,5

--------------------------------------
Node            : 1->20  | 2->30  | 4->50  | 4->50  | 5
--------------------------------------
Node Address    : 10     | 20     | 30     | 40     | 50

Instead of deleting the 3rd node we are moving 4th node into 3rd node and ignoring 4th node.

/**
* Remove element from SLL
* @param node element to be removed
*/
public void removeElement(Node node) {
node.value = node.next.value;
node.next = node.next.next;
}
Mawazo

Mostly technology with occasional sprinkling of other random thoughts

amintabar

Amir Amintabar's personal page

101 Books

Reading my way through Time Magazine's 100 Greatest Novels since 1923 (plus Ulysses)

Seek, Plunnge and more...

My words, my world...

ARRM Foundation

Do not wait for leaders; do it alone, person to person - Mother Teresa

Executive Management

An unexamined life is not worth living – Socrates

javaproffesionals

A topnotch WordPress.com site

thehandwritinganalyst

Just another WordPress.com site

coding algorithms

"An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem." -- John Tukey