...scenarios in which d can be 10,000 or more. For example, the SILVA NR99 SSU reference database (510,508 sequences, 1.2 GB) spans 9,118 distinct genera, which would require a document array profile structure >2 TB. Here,
we develop new methods that drastically reduce the space usage. In so doing, we sacrifice...