Figure 1.

Overview of the PFP workflow. (1) PFP takes as input a text T and two integer parameters, the window size w and the hash modulus p. In the example, w = 2 and pZ+, where Z+ represents the set of positive integers. (2) It internally prepends the $ symbol and appends w consecutive copies of # to T. (3) The modified text is parsed by sliding a window of length w across the text; when a trigger string is encountered (highlighted in green), the current phrase terminates and the next phrase begins. During parsing, the dictionary DT and parse PT are constructed, with DT initially unsorted and PT containing temporary phrase records (*). (4) Finally, DT is sorted and the records in PT are replaced by the ranks of the corresponding dictionary phrases; the dotted box shows the multiset S of suffixes longer than w derived from phrases in DT.

1081f01