External Sorting - How do you sort a million 32-bit integers in 2MB RAM?

A million 32-bit integers is 4MB. We cannot fit them all in the given RAM size and we have to do external sorting. It is an open ended question which is interactive. Let’s assume that - The list of numbers is on disk (as we cannot fit them all in RAM). There is a sort function which sorts the given list effectively i.e. we do not have to worry about a particular sorting algorithm. [Read More]

maximum index of an array

Space for 200,000,000 integers : 800MB 256,000,000 integers : 1024MB Integer.MAX_VALUE (2,147,483,647) : ~8600MB Integer.MAX_VALUE/2 (1,073,741,823) : ~4300MB //arrayindextest.java class arrayindextest { public static void main(String[] args) { int[] a = new int[256000000]; //or 200000000 System.out.println(a.length); } } -Xmx set max heap size flag -Xms set min heap size flag 200,000,000 integers : 800MB With new int[200000000]: java -Xms32m -Xmx1024m arrayindextest 200000000 256,000,000 integers : 1024MB With new int[256000000]: [Read More]

Missing integers

If you have a cheap mainstream commodity hardware consisting of a decent processor and 1GB RAM, how would you find a few (more than one) missing integers in a list saved on disk containing all of the unique 2^32 integers? One approach is to use an array of boolean flags and marking them as true for present integers (index of the array denotes flag for the integer). Traversing the flags array again would give the missing integers. [Read More]

print combinations

All possible combinations of a 3 digit binary number are 2c1 * 2c1 * 2c1 = 8 (000, 001, 010, 011, 100, 101, 110, 111). Each digit can be 0 or 1 so 2c1. Below is the code for it. Starting with 000, we get the next combination by incrementing the LSbs and carrying out the ripple effect with the for loop. ` void printCombinations(int digits): char[] out = new char[digits] [Read More]

Trie examples

I mentioned Tries before here and here. I thought of two more popular examples. Google Suggestions (now in production) and Gmail autocomplete logic would be implemented as a Trie (prefix tree). I could think of two ways this could be done using a Trie. The first way is to DFS from the current prefix or node to get to all the possible words below. This approach is time-intensive due to its recursive nature and the user probably would not benefit when the Trie is large. [Read More]

Scrabble algorithm

If you are playing scrabble and you have n letters, what are all the possible words that you can make from these n letters? It looks as if it would involve permutations and combinations both of the letters or a union of permutations and combinations for n letters. Surprisingly, it is much simpler. Below are the possible words for letters ‘abc’. I have highlighted the patterns in them. a ab [Read More]

Permutations and Combinations of string

Permutations of string: For a string of length n, there are n! permutations. For string abcd, there are 24 permutations. We would have a wrapper permutation method which takes the string and calls the recursive permute method. The basic idea is that the permutations of string abc are a + permutations of string bc, b + permutations of string ac and so on. The permute method uses a bool[] used to indicate which characters of the string are already used in the buffer out (to avoid repetitions). [Read More]

Tries

Trie (pronounced as try even though it stands for tree and re_trie_val) is a prefix tree data structure. You can read more on it on topcoder and wiki. Trie mostly has a root node as an empty string ("") and each node has 26 child pointers letters (A to Z) at least. As we add words to it, the trie grows. Here is an example of how a trie looks (from www. [Read More]

Find the distinct words from a text file

The data structure of the trie and its addWord() method remains the same. ` class trieNode: trieNode[26] children char c int countPrefix // words with prefix from root to current node int countWord // complete words for this current node void addWord(trieNode node, string word): //end of the word if (word = “"): node.countWord = node.countWord +1 return  //do stuff with current node node.countPrefix = node.countPrefix + 1 //go to next node [Read More]

superstring

Some code for superstring given two strings. It assumes that first string is longer than second string. If the lengths of the strings first and second are m and n, it takes O(mn) time (as m >= n, the other two loops run for O(n^2) time). There are three loops, first loop looks for second string within the first string, second loop looks for the second string before the first string and third loop looks for after. [Read More]