Search a long string for small strings in an array

Problem Given a string s and an array of smaller strings T, design a method to search s for each small string in T. Solution Method 1 - Using suffix tree We can first get all the suffices of s and check for each element t in T to see if t is the beginning part of any suffices. To make this more efficient, we can build a suffix tree and do the operations. [Read More]

Common naming convention : Coding Style

**Variable names must be in mixed case starting with lower case. ** Common practice in the Java development community and also the naming convention for variables used by Sun for the Java core packages. Makes variables easy to distinguish from types, and effectively resolves potential naming collision as in the declaration eg. int state; Names representing constants (final variables) must be all uppercase using underscore to separate words. MAX_ITERATIONS, COLOR_RED [Read More]

BWT (Burrows Wheeler Transform) Encoding Algorithm

Encoding Procedure for implementing the algorithm: 1. Select a block size to be used. Block size is directly related to effectiveness of encoding and inversely related to the time required. Hence a compromise has to be reached. 2. Convert the data byte stream to blocks of n bytes where n is the block size chosen. 3. The following example illustrates the procedure to be done next. Let n=6 and the first string be “kerala”. [Read More]

Hirschberg's algorithm

Hirschberg’s algorithm, named after its inventor, Dan Hirschberg, is a dynamic programming algorithm that finds the least cost sequence alignment between two strings, where cost is measured as Levenshtein edit distance, defined to be the sum of the costs of insertions, replacements, deletions, and null actions needed to change one string to the other. Hirschberg’s algorithm is simply described as a divide and conquer version of the Needleman-Wunsch algorithm. The Needleman-Wunsch algorithm finds an optimal alignment in O(nm) time, using O(nm) space. [Read More]

Trie examples

I mentioned Tries before here and here. I thought of two more popular examples. Google Suggestions (now in production) and Gmail autocomplete logic would be implemented as a Trie (prefix tree). I could think of two ways this could be done using a Trie. The first way is to DFS from the current prefix or node to get to all the possible words below. This approach is time-intensive due to its recursive nature and the user probably would not benefit when the Trie is large. [Read More]