Suffix trees, suffix arrays, etc.
From MIT 6.851:
Given text upfront, preprocess text T and query pattern P. Goal - query in O(P). O(T) space.
Trie - rooted tree with child branches labeled with letters in Σ.
Task: find predecessor. Solved with tries, but the problem is how to represent the node in the trie. Array: too much memory. BST - too slow. Hash - doesn't preserve order, so doesn't solve the predecessor. There are tricks how to do it in O(P + logΣ).
Can use to sort strings: O(T + klgΣ).
Compressed trie - contract non-branching path into single edge. O(k) nodes.
Suffix trees (or suffix tries) - compressed trie of all |T| suffixes. T[i:] of T with $ appended.
Applications of suffix trees:
- Search for P gives subtree whose leaves correspond to all occurrences of P in T. - Which node representation to use? - Hashing => O(P) time. But not in order. 
- Trays => O(P + lgΣ) time. 
 
 
- LCP (T[i:], T[j:]) = LCA 
- All occurrences of T[i:j] = weighted level ancestor j-i of T[i:] leaf. O(lglgT). 
- Document retrieval: O(p + #docs) - all documents matching the pattern. With RMQ. 
Suffix array - sort suffixes of T (just store their indexes) in O(T) space.
LCP - largest common prefix or neighbour prefixes.
You can build suffix tree using suffix array and LCPs.
O(T + sort(Σ)) construction:
- Sort Σ. 
- Replace each letter by its rank in Σ. 
- Form T0=<(T[3i], T[3i+1], T[3i+2])> for i =0,1,... T1=<(T[3i+1], T[3i+2], T[3i+3])>. T2=... 
- Recurse on <T0, T1> 
- Radix sort of T2's suffixes. T2[i:] = T[3i+2] = <T[3i+2],T[3i+3:]> 
- Merge suffixes of T0 and T1 with suffixes of T2. 
Exercises:
- Build suffix tree. 
- Find substring. 
- Find longest repeated substring. Try wikipedia! 
Last updated
Was this helpful?