# Suffix trees, suffix arrays, etc.

From MIT 6.851:

* Video: <https://www.youtube.com/watch?v=NinWEPPrkDQ>
* <https://courses.csail.mit.edu/6.851/spring12/scribe/lec16.pdf>
* <https://courses.csail.mit.edu/6.851/spring12/lectures/L16.pdf>

Given text upfront, preprocess text T and query pattern P. Goal - query in O(P). O(T) space.

**Trie** - rooted tree with child branches labeled with letters in Σ.

**Task: find predecessor.** Solved with tries, but the problem is how to represent the node in the trie. Array: too much memory. BST - too slow. Hash - doesn't preserve order, so doesn't solve the predecessor. There are tricks how to do it in O(P + logΣ).

Can use to sort strings: O(T + klgΣ).

**Compressed trie** - contract non-branching path into single edge. O(k) nodes.

**Suffix trees** (or suffix tries) - compressed trie of all |T| suffixes. T\[i:] of T with $ appended.

Applications of suffix trees:

* Search for P gives subtree whose leaves correspond to all occurrences of P in T.
  * Which node representation to use?
    * Hashing => O(P) time. But not in order.
    * Trays => O(P + lgΣ) time.
* LCP (T\[i:], T\[j:]) = LCA
* All occurrences of T\[i:j] = weighted level ancestor j-i of T\[i:] leaf. O(lglgT).
* Document retrieval: O(p + #docs) - all documents matching the pattern. With RMQ.

**Suffix array** - sort suffixes of T (just store their indexes) in O(T) space.

LCP - largest common prefix or neighbour prefixes.

You can build suffix tree using suffix array and LCPs.

O(T + sort(Σ)) construction:

1. Sort Σ.
2. Replace each letter by its rank in Σ.
3. Form T0=<(T\[3i], T\[3i+1], T\[3i+2])> for i =0,1,... T1=<(T\[3i+1], T\[3i+2], T\[3i+3])>. T2=...
4. Recurse on \<T0, T1>
5. Radix sort of T2's suffixes. T2\[i:] = T\[3i+2] = \<T\[3i+2],T\[3i+3:]>
6. Merge suffixes of T0 and T1 with suffixes of T2.

Exercises:

* Build suffix tree.
* Find substring.
* Find longest repeated substring. Try wikipedia!


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://wiki.ainzzorl.lol/tech/algorithms/suffix-trees-suffix-arrays-etc..md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
