Here are the QC steps used for every sample.

```sh
# 1. remove rRNA, a hisat2 wrapper
python rRNA_QC.py --fq1 xx_L2_1.clean.fq.gz --fq2 xx_L2_2.clean.fq.gz --name xx --r1 1 --r2 15 --readsNum 10000 --pNum 3 --tool hisat2 --outDir Qc/rRNA/

# 2. cutadapt wrapper
python adcut.py --fq1 Qc/rRNA/xx/xx_1.fq.gz --fq2 Qc/rRNA/xx/xx_2.fq.gz --name xx --ad7 AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC --ad5 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT --overlap 6 --err 0.1 --outDir Qc/adapter/xx

# 3. filter low quality reads, use reads created by step 2.
python tmkQC.py --fq1 Qc/adapter/xx/xx_adap_rm_1.fq.gz --fq2 Qc/adapter/xx/xx_adap_rm_2.fq.gz --name xx --lowQ 20 --lowP 0.1 --NP 0.15 --ncpu 2 --outDir Qc/tmkQC/xx

# 4. apply fastqc
fastqc -t 2 -o Qc/FastQC/xx Qc/tmkQC/xx/xx_1.clean.fq.gz Qc/tmkQC/xx/xx_2.clean.fq.gz
```

The main step in `rRNA_QC.py` calls `hisat2` to estimate the rRNA ratio and uses `--un-conc-gz` to write out the read pairs that do not align concordantly to the rRNA index.

```
hisat2 -p -x -1 -2 --un-conc-gz
```

Next, `adcut.py` runs cutadapt with these parameters:

```
--discard-trimmed -e 0.1 -O 6
```

Third, `tmkQC.py` calls the `tmkQC` command to filter low-quality reads. Its command-line usage is:

```
tmkQC
Usage:
    -f  the path of fq files or fq gzip files, seperated by comma.
    -a  the path of adaptor files or adaptor gzip files, seperated by comma.
    -p  threads number.
    -o  output directory.
    -s  sample name.
    -N  N rate. default: 0.1
    -l  low qual. default: 5
    -r  low qual bases rate ( low qual bases number/read length ). the reads will be discarded if the rate is greater that this value. default: 0.5
    -L  the minimal length of read. The reads(perhaps have been truncated) will be discarded if its length is less than this value.
    -g  the output sequence data is compressed by gzip.
```

The parameters we used are:

```
-p 2 -N 0.15 -l 20 -r 0.1
```

With these settings, a read is discarded if it meets either of the following conditions:

1. N rate greater than 0.15
2. low-quality base rate (number of bases below Q20 / read length) greater than 0.1

No read length filter is applied.
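
To make the two discard conditions concrete, here is a minimal Python sketch of the per-read decision. It is not the `tmkQC` implementation: the function names (`n_rate`, `low_qual_rate`, `should_discard`) are hypothetical, quality strings are assumed to be Phred+33 encoded, "low quality" is assumed to mean a score strictly below 20, and how `tmkQC` treats the two mates of a pair is not covered.

```python
def n_rate(seq: str) -> float:
    """Fraction of bases in the read that are N."""
    if not seq:
        return 0.0
    return seq.upper().count("N") / len(seq)


def low_qual_rate(qual: str, low_q: int = 20, offset: int = 33) -> float:
    """Fraction of bases with Phred score below low_q (Phred+33 is an assumption)."""
    if not qual:
        return 0.0
    low = sum(1 for ch in qual if ord(ch) - offset < low_q)
    return low / len(qual)


def should_discard(seq: str, qual: str,
                   max_n: float = 0.15,    # mirrors -N 0.15
                   low_q: int = 20,        # mirrors -l 20
                   max_low: float = 0.1    # mirrors -r 0.1
                   ) -> bool:
    """Return True if the read matches either discard condition."""
    return n_rate(seq) > max_n or low_qual_rate(qual, low_q) > max_low


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    print(should_discard("ACGTACGTAC", "IIIIIIIII#"))  # 1 of 10 low-quality bases
    print(should_discard("ACGTACGTAC", "IIIIIIII##"))  # 2 of 10 low-quality bases
    print(should_discard("NNACGTACGT", "IIIIIIIIII"))  # 2 of 10 bases are N
```

Both comparisons are strict, so a read sitting exactly on a threshold (for example, 1 low-quality base in a 10-base read) is kept under this reading of "greater than".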