Last Thursday night at Oredev, after the sessions, was “Birds of a Feather” - a sort of mini-unconference. Anyone could write up a topic on the whiteboard; interested individuals added their names, and each group got allocated a room to chat about the topic. I joined the “Spatial Indexing” group, and we spent a fascinating hour and a half talking about spatial indexing methods, reminding me of several interesting algorithms and techniques.
[Read More]
Anagram Trees
When it comes to finding anagrams of words, a frequent approach is to use an anagram dictionary - simply put, sort the letters in your word to provide a unique key that all anagrams of a word have in common. Another approach is to generate a letter-frequency histogram for each letter in your word. (Both these approaches are more or less equivalent, in fact.) These approaches make the problem of finding exact single-word anagrams for strings very efficient - O(1) if you use a hashtable.
[Read More]
Secure permutations with block ciphers
It’s been too long since I blogged about anything much, and way too long since I posted the first Damn Cool Algorithms post, which I promised would be a series. So here’s part 2.
To start, I’m assuming you know what a permutation is - basically a shuffling of a sequence of items in a particular order. A permutation of the range 1-10, for example, is {5,2,1,6,8,4,3,9,7,10}. A secure permutation is one in which an attacker, given any subset of the permutation, cannot determine the order of any other elements.
[Read More]
BK-Treesal
I am making my own vocab software. I provided words, meanings, synonyms, search. I feel the project is partially over. So now I thought, why not provide google-like “Did you mean” option in my search.
So it all starts with this, I came closer to algorithms again. :)
BK-Trees
BK-Trees, or Burkhard-Keller Trees are a tree-based data structure engineered for quickly finding near-matches to a string, for example, as used by a spelling checker, or when doing a ‘fuzzy’ search for a term.
[Read More]
Reducing the space for LCS lengths
When you’ve run out of main memory, any estimate of runtime based on big-O analysis becomes useless. The system either crashes or thrashes, paging in virtual memory. By contrast, if we can reduce the memory required by a program, the time analysis should still hold — we never have to page in more time.
Once you’ve watched the dynamic programming solution a few times, you’ll realise that the LCS lengths (the numbers in the grid) are computed row by row, with each row only depending on the row above.
[Read More]
Hirschberg’s linear space algorithm in C++
It should work on any container type which supports forward and reverse iteration. It’s similar to the Python implementation, but note:
rather than copy slices of the original sequences, it passes around iterators and reverse iterators into these sequences rather than recursively accumulating sums of sub-LCS results, it ticks off items in a members array, xs_in_lcs /\* C++ implementation of "A linear space algorithm for computing maximal common subsequences" D.
[Read More]
Hirschberg's algorithm
Hirschberg’s algorithm, named after its inventor, Dan Hirschberg, is a dynamic programming algorithm that finds the least cost sequence alignment between two strings, where cost is measured as Levenshtein edit distance, defined to be the sum of the costs of insertions, replacements, deletions, and null actions needed to change one string to the other. Hirschberg’s algorithm is simply described as a divide and conquer version of the Needleman-Wunsch algorithm.
The Needleman-Wunsch algorithm finds an optimal alignment in O(nm) time, using O(nm) space.
[Read More]
Dynamic Programming Algorithm (DPA) for Edit-Distance
The words `computer’ and `commuter’ are very similar, and a change of just one letter, p->m will change the first word into the second. The word `sport’ can be changed into `sort’ by the deletion of the `p’, or equivalently, `sort’ can be changed into `sport’ by the insertion of `p’.
The edit distance of two strings, s1 and s2, is defined as the minimum number of point mutations required to change s1 into s2, where a point mutation is one of:
[Read More]
Dynamic Programming Practise Problem Set 3
Calculate M = M1 × M2 × M3 × M4, where the dimensions of the matrices are M1: 10,20 M2: 20,50 M3: 50,1 M4: 1,100
a) Calculating M = M1 × ( M2 × ( M3 × M4 ) )
b) Calculating M = ( M1 × ( M2 × M3 ) ) × M4
2 Calculate the length of longest sub string in Hello and Aloha
3. Calculate longest sub sequence in houseboat and computer.
[Read More]
Longest common substring problem
Given two sequences of letters, such as A = HELLO and B = ALOHA,
find the longest contiguous sequence appearing in both.
**One solution: ** (assume strings have lengths m and n)
For each of the m starting points of A, check for the longest common string starting at each of the n starting points of B.
The checks could average Θ(m) time → a total of Θ(m^2*n) time.
Dynamic programming solution:
[Read More]