Accurate analysis of genuine CRISPR editing events with ampliCan

We present ampliCan, an analysis tool for genome editing that unites highly precise quantification and visualization of genuine genome editing events. ampliCan features nuclease-optimized alignments, filtering of experimental artifacts, event-specific normalization, and off-target read detection and quantifies insertions, deletions, HDR repair, as well as targeted base editing. It is scalable to thousands of amplicon sequencing–based experiments from any genome editing experiment, including CRISPR. It enables automated integration of controls and accounts for biases at every step of the analysis. We benchmarked ampliCan on both real and simulated data sets against other leading tools, demonstrating that it outperformed all in the face of common confounding factors.

created using base loci sequences, same as used for CRISPR editing simulation. This data is further detailed in Supplemental Note S4 . Complete overview of all datasets is presented in Supplemental Tab S6 .
Supplemental Note S1: Tools differ in estimating editing efficiency on real data.
We assessed how tools estimate editing efficiency on 263 real CRISPR experiments, of which 151 were previously published by us (Gagnon et al. 2014) , datasets from run 1 and run 5 available at BioProject under accession number PRJNA245510), and 112 novel experiments from 5 sets for this study (datasets from run 6-10 available at ArrayExpress: E-MTAB-6310, E-MTAB-6355, E-MTAB-6356, E-MTAB-6357, E-MTAB-6358). All experiments were conducted by injection into 1 cell zebrafish embryos and sequenced 2 days post-fertilization (Gagnon et al. 2014) . Due to the rapid cell division and development these experiments are likely to result in highly heterogeneous mutational efficiencies from mosaicism. For these experiments the true mutation efficiency is not known and we can therefore not assess how precise the tools are in their estimates. Instead, we quantified how much the tools differ in their estimates ( Supplemental Fig S1A) and, to qualitatively assess the underlying reason for their discrepancy, we plotted the estimated mutation efficiency values of the tools relative to the non-normalized ampliCan result ( Supplemental Fig S1B ). This showed that discrepancies are likely to originate from different causes. Some, those above the normalized ampliCan estimate, likely stem from a failure to consider control experiments. In our data the experiments impacted by controls is about 5%, but this will depend heavily on the reference genome, heterogeneity of the region and organism under study. Specific examples of the importance of normalization are shown in Supplemental Fig S5 . The discrepancies of the other experiments are due to the steps in the processing pipeline, e.g. off-target detection, primer dimer filtering, alignment strategy and read merging. To investigate the specific sources of these discrepancies and quantitatively assess the performance of the tools we created several synthetic benchmark datasets.
Supplemental Note S2: Automatic normalization using control reads By default ampliCan normalizes through the strict removal of all editing events (insertions, deletions, mutations) that are also found above a threshold in the control sample. The default threshold value is a frequency of 0.01 and was chosen based on the typical frequency of low-abundance editing events ('background noise') present in control experiments ( Supplemental Fig S4 ). These events are assumed to be technical or experimental artifacts present in most experiments. The threshold is also selected to be well above the expected Illumina error rate (Ross et al. 2013) .
The threshold can be adjusted to increase the precision of indel detection when sequencing depth is high or account for a higher error rate in low-depth/precision experiments. The threshold can be set for instance based on the background error in the user's own control experiments or in the absence of controls by inspecting the error rate outside of the target site (see for instance in Supplemental Fig S14A ). Alternatively, if the user has information about the level of variance or noise expected in the case and/or the controls (e.g. genetic variance restricted to 100% or 50%) the threshold can be raised for increased stringency. This may in particular be useful if the user expects high frequency background events in the control experiments. One such use case is if index hopping is likely to be an issue. Index hopping may cause reads from the edited sample to erroneously be assigned to the control sample. We therefore offer a second pipeline 'amplicanPipelineConservative' for experiments where this is expected to be a problem featuring a more stringent threshold of 0.15. For more information about index hopping and how to mitigate the problem see Illumina's webpage ( https://www.illumina.com/science/education/minimizing-index-hopping.html , Accessed January 22, 2019).
Given that the sequencing depth is sufficient the default settings should allow detection of indels as low as 0.01% ( Supplemental Tab S5 ). If higher accuracy is necessary this threshold can be lowered and the sequencing depth increased. In the extreme case of setting the normalization threshold to 0% any event found in control would be removed from the case sample. This may result in removal of real edits and consequently the underestimation of real editing events.
Due to the stochasticity in sequencing data this approach is better suited to handle more heterogeneous cases than the subtraction method where indel frequency is simply normalized by subtraction of the control indel frequency. In the latter case variation in the levels of indel  Fig S6 ). In the worst case fragmented alignments could shift the indel events resulting in a distortion of the mutation efficiency for those tools that only allow events within a certain distance from the expected site. A more likely outcome however, is the misinterpretation of the nature of the mutation.
Under certain assumptions the theoretically optimal alignment can be obtained by the Needleman-Wunsch algorithm. ampliCan uses this algorithm with optimized parameters to reflect the expectation that a CRISPR experiment should result in one deletion and/or insertion event, of unknown length (match = 5, mismatch = -4, gap opening = 25, gap extension = 0, no end gap penalties). With these parameters the Needleman-Wunsch algorithm performs well over a broad range of test cases (data not shown). ampliCan uses these optimized parameters by default, but also allows for supervision of the alignments through human readable output of individual alignment results ( Supplemental Table S4 ).

Supplemental Note S4: Synthetic data set evaluation
The latest available versions was used for all tools and packages. The assessment set from Lindsay et al. (Lindsay et al. 2016) paper (Synthetic Dataset 2, Supplemental 4) was replicated with the same settings and seed values as described ( Supplemental Fig S7 ) Third, paired-end sequencing of 200bp or longer is somewhat expensive and error-prone and most labs would seek to restrict this to shorter reads. To account for this we created an additional set, Synthetic Dataset 3, in a similar fashion to Synthetic Dataset 2, but with with the following minor modifications. First, the length of amplicons and reads (150 bp) were adjusted.
Second, gRNA target sites were designed to be covered by both reads. Third, PCR off-target reads were created without mutating the primer sequences. Finally, mutation efficiency was tested across a range of mismatch rates, 10%, 20% and 30% ( Fig 2 , Supplemental Fig S8 ), to reflect different levels of similarity to the contaminant reads.
For Synthetic Dataset 2 ampliCan matches the perfect score of CrispRVariants and AmpliconDivider. However, on Synthetic Dataset Dataset 3 ampliCan is more consistent at estimating the known mutation efficiencies within the dataset ( Fig 2 ). AmpliconDIVider has no filtering step and is confused by the contaminating reads. CrispRVariants has a filtering step, but is unable to discern divergent off-target sequences (e.g. homologous regions) that are still able to align to the correct target site. As in the benchmark from Lindsay et al. 2016 CRISPResso performs poorly on all benchmarks. When increasing mismatch rate (from 10% of all bases to 20% and 30%), AmpliconDIVider and CrispRVariants get closer to the correct estimated indel rate, but in all cases ampliCan obtains the highest precision and shows the most robust performance ( Fig 2 , Supplemental Fig S8 ).

Supplemental Note S5: ampliCan is able to correctly call longer indels
We have found that even without targeted insertion CRISPR mutagenesis can frequently result in some proportion of longer indels ( Supplemental Fig S9 ). In particular, we have observed unintended insertions from lentiviral vectors used to introduce the guides and Cas9 ( Supplemental Fig S10 , (Chari et al. 2015) .
Current tools primarily rely on either global mapping (CrispRVariants, AmpliconDIVider) ( Supplemental Tab S1 ) that can have problems identifying the correct loci in the presence of a larger insertion or have certain processing steps that are incompatible with longer events (see below for CrispRVariants). This mitigates primer dimer contamination problems (which can be identified by too large deletion gaps after alignments), but ignores bona fide large indels. These tools are therefore often unable to handle longer indels whether unintended or targeted. Long deletions are also a problem for some tools. For instance, CrispRVariants filters out any deletion that does not start or end within the gRNA complementary sequence plus a buffer of 5 bp. Any deletion spanning this region is ignored. This can be used as a strategy to filter primer-dimers, but also has the side-effect of ignoring any bona fide longer deletions.
ampliCan uses a local alignment strategy that can detect these longer indels and a more realistic model of primer-dimer artifacts ( Supplemental Note S8 ).
We Supplemental Note S6: ampliCan consistently recovers the true HDR efficiency when faced with diverse donor templates ampliCan takes into account the donor template and the original genomic sequence to define the set of events that corresponds to a correct HDR editing experiment, but allowing for some background sequencing noise (currently 3 mismatches by default). This is unlike CRISPResso, the other CRISPR tool that can handle HDR events, which do not model events but simply align reads against donor and original sequence picking the best-scoring instance. The advantage of ampliCan's approach is that it accounts for alignment imperfections in a more robust fashion, allowing for complex donor-amplicon relations and sequencing errors.
We designed a dataset for benchmarking the HDR calling capabilities of the most popular tools. Using the same loci as in Supplemental Note S1 we tested 20 different donor templates for each of three kinds of donor types: with point mutations, insertions or deletions of variable length from 5bp to 70bp introduced into the amplicon sequences. We simulated 2000 reads with different levels of HDR efficiency rate (0,33,66,90). In this benchmark set only ampliCan makes no errors ( Supplemental Fig S12 ).

Supplemental Note S7: Visualization and aggregation of the complete activity of gRNAs
While the default alignment plot shows the most abundant reads across the expected cut site it doesn't provide an overview over all editing events. ampliCan therefore also produces multiple plots that aggregate and visualize editing events ( Supplemental Fig S14 ). ampliCan supports multiple types of meta plots to facilitate comparison of not only the gRNAs, but also any group that a user wants e.g. barcode, amplicon, type of treatment ( Supplemental Figs S3 , S17 , S18 ).

Supplemental Note S8: Filtering of noise
Multiple sources of noise can confound the estimation of cut rates, low quality reads, primer-dimers, PCR off-target amplification and sequencing artifacts. ampliCan has three filters to remove noise from different sources: 1) low quality reads, 2) primer-dimers, and 3) erroneously assigned reads and sequencing artifacts.
Low quality reads ampliCan offers basic read quality overview with the use of ShortRead (Morgan et al. 2009) package and filters for minimum base quality (default: 0) in a read, average base minimum quality (default avg min: 30) and the presence of ambiguous (N) letters.

Primer-dimers
Filtering primer-dimers is a balance between getting rid of erroneous reads and allowing for longer deletions. For instance, CrispRVariants ignores all alignments with deletions larger than 33 bp (the guide plus a buffer of 5 bp) and is frequently unable to map long insertions ( Fig 2B ,   Supplemental Fig S11 ). This effectively removes all primer dimers, but also ignores any bona fide longer indels. ampliCan instead tries to estimate the likely length of a primer dimer deletion by taking the length of the amplicon, subtracting the primer lengths with a small buffer (30 bp) to arrive at maximally allowed deletion length. This results in a more realistic estimate of the length of artificial deletions that would result from primer-dimers. As an example, for an amplicon of size 150 with primers of length 20, the maximum deletion length would be 80 bp (150-(2*20+30)).

Erroneously assigned reads and sequencing artifacts
An assumption of ampliCan is that a CRISPR editing event will result in a low number of indel events resulting in a good alignment with few discrepancies to the reference sequence or (if available) the control experiment. To accomplish this ampliCan uses a two dimensional clustering method based on sequence alignment score and sequence alignment indel events to filter out erroneous reads and sequencing artifacts. This takes all alignments and performs k-means clustering with different 1-3 clusters. It then uses the silhouette criterion to determine the optimal number of clusters. In the case of 1 cluster, all reads are either edited or perfectly matching the reference/control. In the case of 2 you have both edited and unedited reads. In the case of 3 clusters, cluster with center that has the biggest number of events and lowest alignment score (in normalized relation on 0-1 scale) means that you in addition have a group of sequencing artifacts or reads that poorly align to the loci (example in Supplemental Fig   S19 ).
ampliCan provides plots that shows the impact of each of the filtering steps, across the whole library for read quality ( Supplemental Fig S16 ) and for each experiments for primer-dimer and assignment/artifacts issues ( Supplemental Fig S15 ).
Supplemental Note S9: Read assignment ampliCan assigns reads to the respective experiment by matching primers used in the amplification of the loci. These region should be immutable and match the reads since an indel spanning a primer would either result in failure to amplify the locus or be "corrected" by the primer when it amplifies the target site. However, since small sequencing and primer synthesis errors could potentially occur in the primer part ampliCan allows for up to 2 mismatches (user customizable) between the primers and reads. During this process it is possible that some reads will be unassigned and not match any of the experiments. While these reads are typically noise from the high-throughput nature of the sequencing experiment, they could in some cases be helpful in troubleshooting failed experiments. ampliCan therefore provides human readable alignments of the top 5 most abundant forward and reverse read pairs aligned to each other ( Supplemental Fig S24 ). In some cases these correspond to off-target PCR amplicons and close homologous regions as well as errors in the specification of the experiment.

Supplemental Figures
Supplemental Fig S1. Comparison of leading tools on real CRISPR experiments. A. Summary of differences between ampliCan and other tools. CrispRVariants reports similar editing efficiencies to ampliCan in~80% out of 263 experiments. The remaining experiments are due to controls (~5%) and processing (~15%) B. Experiments (x axis, sorted) where estimated mutation efficiency differs by at least 5% from non-normalized data. y-axis shows differences in relation to non-normalized ampliCan estimates. Differences between tools predictions staying above line created by ampliCan prediction are likely to be due to lack of normalization, while the predictions below the ampliCan are likely due to the alignments, processing and filtering of data.
Supplemental Fig S2. Two common methods for normalization using controls. In the first, the frequencies from events in the control sample are subtracted from the frequencies in the treated sample (subtraction method). In the second method, all events occurring in the control above a frequency threshold (ampliCan default: 1%) are removed from the treated sample. The latter is more robust to stochastic differences. percentage of all reads) with different mismatch rate of the off-target reads (10%, 20%, 30%).

Supplementary
Each point corresponds to one experiment. True mutation efficiency is indicated with dotted lines, and labelled to the right of the charts. Contamination is simulated by introducing random mismatches (Contaminant read mismatch rate) in reads mapping to the loci, similar to the benchmark in (Lindsay et al. 2016) . The mismatch rate is indicated at the top. Only ampliCan shows robustness to the whole range of different mismatch rates and mutational efficiencies.
Supplemental Fig S9. Fraction of reads having indels greater than 10 bp across 176 experiments.
Each dot represents one experiment and are grouped in rows by having the same gRNA (replicates). All experiments are normalized using wild type controls ensuring that these are real indel events. The higher mean for some of the replicated experiments suggests that some gRNAs have a higher chance of resulting in long indels. Here, an aggregation of editing events from many experiments (many targets) using the same gRNA is presented giving an overall gRNA cut profile. Position 0 is relative to the first 5' base of gRNA.

Supplemental
Supplemental Fig S19. Example of k-means clustering of reads during filtering of contaminant reads. A low alignment score (x axis) combined with a high number of events (y axis) indicate erroneous reads. Silhouette criterion is used to determine whether data should be clustered into two (read with no edits and reads with editing events) or three clusters (a noise cluster). In the case of three clusters, the cluster with its center (purple) closest to the upper left corner is filtered.
Supplemental Fig S20. The paired-end read consensus rules for ampliCan. Events marked in green are considered real cut sites while those in gray are not. When reads from forward (purple) and reverse (blue) reads are in agreement there is a consensus (top row). When two reads overlap, but disagrees the event from the strand with a higher alignment score is used (row 2). In situations where an event is only covered by one read, that read is preferred (row 3). In rare cases where there are events on one strand and the other has continuous alignment ampliCan allows users to define the behaviour (default is promiscuous rule enabled).
Supplemental Fig S21. Percentage of reads with ambiguous indels caused by disagreement of forward and reverse reads. ampliCan consensus rules help to mitigate mis-estimation that could arise from these events.

Supplemental Tables
One edited read with variable number of total reads.
Estimated editing efficiency.  CRISPRessoPooled uses local-global alignment strategy.

Number of reads
Supplemental Table S3. Supplemental Table S4.
Example of human readable output. Aligned reads are assigned to the experiment (ID, read_id) and sorted based on count (Count). For each pair alignment is presented with top part representing forward read aligned to amplicon and bottom presenting reverse read aligned to amplicon.
Example GenomicRanges  table  output with additional meta-columns. This representation of alignments allows for efficient manipulation and processing of the data.
Example ampliCan config file. ampliCan requires this file as minimal input, together with relevant fastq files. More precise, up to date description of the file can be found in the ampliCan vignettes.