Find first occurrence of a string in another

Problem: Typical question for indexOf operation. Given two strings s1 and s2, find the first index of s2 in s1 in an efficient manner. Solution: The problem has very typical solution, starting with each character in the source string start looking if the string being searched for, exists or not. However, the implementations may differ on the way they optimize. One good approach is to first look for the starting character of the string being searched for, in the string being searched. [Read More]

Reducing the space for LCS lengths

When you’ve run out of main memory, any estimate of runtime based on big-O analysis becomes useless. The system either crashes or thrashes, paging in virtual memory. By contrast, if we can reduce the memory required by a program, the time analysis should still hold — we never have to page in more time. Once you’ve watched the dynamic programming solution a few times, you’ll realise that the LCS lengths (the numbers in the grid) are computed row by row, with each row only depending on the row above. [Read More]

Hirschberg’s linear space algorithm in C++

It should work on any container type which supports forward and reverse iteration. It’s similar to the Python implementation, but note: rather than copy slices of the original sequences, it passes around iterators and reverse iterators into these sequences rather than recursively accumulating sums of sub-LCS results, it ticks off items in a members array, xs_in_lcs /\* C++ implementation of "A linear space algorithm for computing maximal common subsequences" D. [Read More]

Hirschberg's algorithm

Hirschberg’s algorithm, named after its inventor, Dan Hirschberg, is a dynamic programming algorithm that finds the least cost sequence alignment between two strings, where cost is measured as Levenshtein edit distance, defined to be the sum of the costs of insertions, replacements, deletions, and null actions needed to change one string to the other. Hirschberg’s algorithm is simply described as a divide and conquer version of the Needleman-Wunsch algorithm. The Needleman-Wunsch algorithm finds an optimal alignment in O(nm) time, using O(nm) space. [Read More]

Dynamic Programming Algorithm (DPA) for Edit-Distance

The words `computer’ and `commuter’ are very similar, and a change of just one letter, p->m will change the first word into the second. The word `sport’ can be changed into `sort’ by the deletion of the `p’, or equivalently, `sort’ can be changed into `sport’ by the insertion of `p’. The edit distance of two strings, s1 and s2, is defined as the minimum number of point mutations required to change s1 into s2, where a point mutation is one of: [Read More]

Longest common subsequence : Dynamic programming

m: A B C B D A B yn: B D C A B A The LCS is B C B A For arbitrary inputs, it is NP-hard. For constant inputs, DP gives the solution in polynomial time ex: O(n^k). Also, there can be more than one LCSs and the solution for that is exponential time ex: O(k^n). Here we maintain an LCS matrix of m*n size. Initially, for i= 0 or j = 0, LCSij = 0. [Read More]