Search a long string for small strings in an array

Problem

Given a string s and an array of smaller strings T, design a method to search s for each small string in T.

Solution

Method 1 - Using suffix tree
We can first get all the suffices of s and check for each element  t in T to see if t is the beginning part of any suffices. To make this more efficient, we can build a suffix tree and do the operations.

For example, s = “mississippi”; T = { “is”, “sip”, “hi”, “sis” }.
The suffices of s

[0]

“mississippi”

[1]

“ississippi”

[2]

“ssissippi”

[3]

“sissippi”

[4]

“issippi”

[5]

“ssippi”

[6]

“sippi”

[7]

“ippi”

[8]

“ppi”

[9]

“pi”

[10]

“i”

Then for “is” in T, we see it sits in the beginning of suffices [1] and [4]. for “sip”, we see it in [6]. We do not see any “hi” in the suffices but we do see “sis” in [3].

Java code

public static class SuffixTree {  
    SuffixTreeNode root = new SuffixTreeNode();  
   
    public SuffixTree(String s) {  
        for (int i = 0; i < s.length(); ++i) {  
            root.insert(s.substring(i), i);  
        }  
    }  
   
    public List<Integer> getIndexes(String s) {  
        return root.getIndices(s);  
    }  
}  
   
public static class SuffixTreeNode {  
    private char c;  
    private List<Integer> indices = new ArrayList<Integer>();;  
    private Map<Character, SuffixTreeNode> children =   
        new HashMap<Character, SuffixTreeNode>();  
   
    public void insert(String s, int index) {  
        indices.add(index);  
        if (s != null && s.length() > 0) {  
            char character = s.charAt(0);  
            if (children.keySet().contains(character)) {  
                children.get(character).insert(  
                        s.substring(1), index);  
            } else {  
                SuffixTreeNode child = new SuffixTreeNode();  
                children.put(character, child);  
                child.insert(s.substring(1), index);  
            }  
        }  
    }  
   
    public List<Integer> getIndices(String s) {  
        if (s == null || s.length() == 0)  
            return indices;  
        else {  
            char character = s.charAt(0);  
            if (children.containsKey(character))  
                return children.get(character).getIndices(  
                        s.substring(1));  
            else  
                return null;  
        }  
    }  
}  

Thanks.

References


See also