Metadata-Version: 1.1
Name: transpo
Version: 0.7.1
Summary: Find chimeric transcripts in RNAseq data.
Home-page: https://github.com/jduc/transpo
Author: Julien Duc
Author-email: julien.duc.0@gmail.com
License: GPL
Description: # TranspoFinder v0.6: Find transpochimeras
        
        ## Intro
        This Python program finds transcripts that span different sets of regions
        across a set of samples. Using a two-step approach, it discovers and counts the
        occurrences of each chimeric transcript in each sample.
        
        #### Chimeric transcripts discovery 
        The first step of the program goes through the given BAM files to
        **discover** the chimeric transcripts. It starts by running *Stringtie* on all the
        input BAMs. Each resulting GTF is then overlapped with both of the BED files
        provided (using *bedtools*). The filtered GTF contains the chimeric transcripts and is saved for
        further analysis. At this point, you have 2 GTFs for each sample: one
        containing all the transcripts and the other containing only the chimeric
        transcripts, i.e. those that overlap your 2 input BEDs.
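        
        As an illustration of the overlap logic only (transpo itself relies on
        *bedtools*), the idea of a transcript touching both region sets can be
        sketched in plain Python. The function names, the use of exon intervals and
        the half-open, single-chromosome coordinates are assumptions made for this
        sketch, not transpo's actual implementation:
        
        ```python
        def overlaps(start, end, regions):
            """True if the half-open interval [start, end) overlaps any region."""
            return any(start < r_end and r_start < end for r_start, r_end in regions)
        
        
        def is_chimeric(exons, regions_a, regions_b):
            """True if the transcript's exons touch both region sets."""
            hits_a = any(overlaps(s, e, regions_a) for s, e in exons)
            hits_b = any(overlaps(s, e, regions_b) for s, e in exons)
            return hits_a and hits_b
        
        
        # Hypothetical coordinates, all on the same chromosome.
        exons = [(100, 200), (500, 600)]
        print(is_chimeric(exons, [(150, 180)], [(550, 700)]))  # True
        print(is_chimeric(exons, [(150, 180)], [(800, 900)]))  # False
        ```
        
        A transcript that only overlaps regions from one of the two files is not
        reported as chimeric in this sketch.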
        
        > This step can be run on vital-it, see below
        
        #### Chimeric transcripts analysis 
        In a second step, these chimeric transcripts are **analysed**. For each
        transcript, its occurrences in the samples are counted. Two transcripts are
        considered the same if they have the same number of exons and if each exon starts
        at the same location within a 10bp window.
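        
        A minimal sketch of this matching rule, assuming each transcript is reduced
        to its sorted list of exon start coordinates, that both transcripts lie on
        the same chromosome and strand, and that "within a 10bp window" means the
        starts differ by at most 10 bp. The function name `same_transcript` is
        hypothetical and not part of transpo:
        
        ```python
        def same_transcript(exon_starts_a, exon_starts_b, window=10):
            """Return True if two transcripts count as the same one.
        
            Each argument is the sorted list of exon start coordinates of a transcript.
            """
            # Different exon counts can never match.
            if len(exon_starts_a) != len(exon_starts_b):
                return False
            # Every pair of corresponding exon starts must fall within the window.
            return all(abs(a - b) <= window for a, b in zip(exon_starts_a, exon_starts_b))
        
        
        print(same_transcript([100, 500, 900], [104, 498, 905]))  # True
        print(same_transcript([100, 500, 900], [100, 520, 900]))  # False
        print(same_transcript([100, 500], [100, 500, 900]))       # False
        ```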
        
        The transcripts are then merged into a set of chimeric genes using the merge
        function of *Stringtie*. For each gene, the occurrences per sample and
        group are computed and statistics are derived for later analysis. See
        the results section for detailed information about the results.
        
        ## Installation
        Before installing, you need to create an account on 
        [c4science](https://c4science.ch). On the
        website, click "Login", then click the **blue** login button and select EPFL.
        Use your GASPAR password and you are set. Next, you need to be added to the
        group: ask for access by writing to the
        [batcave](mailto:lvg.batcave@epfl.ch?Subject=TranspoFinder%20access&Body=Please%20give%20me%20access%20to%20TranspoFinder%20Thank%20you.) (or by talking to us directly...) 
        
        - - -
        
        ### MAC OSX
        *Before anything*, you must install **XCode** and the developer tools to be able to
        run anything bioinformatics-related on your Mac. Go to the App Store and install
        XCode.
        
        First, you must install Stringtie, bedtools and gawk (because the Mac awk
        implementation fails). To do so, either compile from source or
        install via brew ([how to install brew?](https://brew.sh/)):
        
        ```
        brew install stringtie
        brew install bedtools
        brew install gawk
        ```
        
        Then, you should install python3. If you know what you're doing, you can also
        use the system python of your mac. Otherwise, just install python3 from source
        or via brew. 
        
        ```
        brew install python3
        ```
        
        Now you should have python3 up and running. Type `pip3` in your
        terminal; it should display the usage of pip. If this works, you can install
        transpo, but first you need cython. Simply run: 
        
        ```
        pip3 install --user -U cython
        ```
        Then, once this is successful, run:
        
        ```
        pip3 install --pre --user -U git+https://c4science.ch/source/transpo.git
        ```
        
        You may be prompted for a password; if so, use your c4science login, and 
        transpo should be installed shortly. If you have already set up ssh keys for c4science,
        you can install via:
        
        ```
        pip3 install --pre --user -U git+ssh://git@c4science.ch/source/transpo.git
        ```
        
        After the installation has finished successfully, you may need to add the following
        path to your `PATH` variable by running this line:
        
        ``` 
        echo 'export PATH="$HOME/Library/Python/3.5/bin:$PATH"' >> ~/.bash_profile
        ```
        
        - - -
        
        ### Linux
        Install stringtie, bedtools and python. Then just run 
        
        ```
        pip3 install --user -U cython
        pip3 install --user -U git+https://c4science.ch/source/transpo.git
        ```
        
        If you have created an account on c4science and been granted access, transpo should
        now be installing; take a break and come back in 5 minutes. Use the address
        `git+ssh://git@c4science.ch/source/transpo.git` if you have ssh keys.
        
        - - -
        
        ### Windows
        Install linux or buy a mac and read above. Also, slap yourself in the face.
        
        ## Usage
        To see the help of transpo, use `transpo --help` in your terminal. Read all the
        options. 
        
        > If you want to run transpo on a cluster, see the cluster section below
        
        To run transpo, you need to set up and gather: 
        
        * a **metadata** file
        * some **bams** of interest (mapped with HISAT2)
        * two **bed** files to define the chimeric transcripts. 
        
        ### Metadata 
        * [download template](metadata.xls)
        
        The metadata file is now required to run transpo. It is simply a **tab**-separated
        file structured like this: 
        
        > warning: make sure to add the header at the top of the file
        > warning: make sure the sample names are unique
        
        | bams | sample | groups | 
        |------|--------|-------|
        | /path/to/bam1 | sampleName1 | C |
        | /path/to/bam2 | sampleName2 | KO |
        
        The first column contains the path to each bam of interest. The sample column is
        a name you choose to give to the sample, typically KO1, KO2, etc. The group
        column is where you specify the group to which the sample belongs. You must have a
        control group. For each group except "control", transpo will run comparisons 
        and compute p-values. If you want to use another letter for the control group,
        simply use the `--control` option to specify which name you used.
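        
        Before launching a run, the constraints above can be checked with a few lines
        of Python. This is a sketch, not part of transpo: the function name
        `check_metadata` is hypothetical, the column names are taken from the example
        table, and the control label `C` is an assumption based on that example:
        
        ```python
        import csv
        import io
        
        
        def check_metadata(handle, control="C"):
            """Read a tab-separated metadata table and run basic sanity checks."""
            reader = csv.DictReader(handle, delimiter="\t")
            expected = ["bams", "sample", "groups"]
            # The header line must be present and use the expected column names.
            if reader.fieldnames != expected:
                raise ValueError(f"header must be {expected}, got {reader.fieldnames}")
            rows = list(reader)
            # Sample names must be unique.
            samples = [row["sample"] for row in rows]
            if len(samples) != len(set(samples)):
                raise ValueError("sample names must be unique")
            # At least one sample must belong to the control group.
            if control not in {row["groups"] for row in rows}:
                raise ValueError(f"no sample in control group {control!r}")
            return rows
        
        
        example = (
            "bams\tsample\tgroups\n"
            "/path/to/bam1\tsampleName1\tC\n"
            "/path/to/bam2\tsampleName2\tKO\n"
        )
        rows = check_metadata(io.StringIO(example))
        print([row["sample"] for row in rows])  # ['sampleName1', 'sampleName2']
        ```
        
        To check a real file, pass an open file handle instead of the `io.StringIO`
        example, for instance `check_metadata(open("metadata.tsv", newline=""))`,
        where `metadata.tsv` stands for your own file.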
        
        ### Advanced: Running transpo on a cluster
        This is a work in progress; more will come soon.
        
        #### On fidis
        First, run allBam2gtf.sh to get the transcriptome of each bam you have.
        Create the metadata as usual but enter `nobam` in the bams column.
        Install transpo and simply run it with the `--fidis` option. Use the `--gtfdir` option to 
        point to the folder containing the GTFs. 
        
        
        ## Results
        The results are structured like so: 
        
        ```
        transpo_res
        ├── all_chim_trans_cat.gtf
        ├── all_chim_exons.bed
        ├── all_chim_tss.bed
        ├── chimeric_genes_table.xls
        ├── transcripts_table.xls
        └── samples
            ├── BAM1_chimeric.gtf
            ├── BAM1.gtf
            ├── BAM1.log
            ├── BAM2_chimeric.gtf
            ├── BAM2.gtf
            └── BAM2.log
        ```
        
        
        In the general results folder, the important files are: 
        
        * **chimeric_genes_table.xls**: Your results. Table with the occurrences of each chimeric gene per
          sample and statistics. Can be opened in Excel
        * all_chim_trans_cat.gtf: concatenation of all the transcripts of all samples. Too big to use in
          a browser
        * transcripts_table.xls: all the transcripts of the samples with their occurrences in
          the samples. Too big to open with Excel
        * all_chim_tss.bed: bedfile with ALL the chimeric transcripts' TSSs
        * all_chim_exons.bed: bedfile with ALL the chimeric transcripts' exons (except
          the first exon of each transcript), useful for finding TEs that are
          exonized. 
        
        Each sample gets the following files in the samples directory: 
        
        * Stringtie GTF with all detected transcripts
        * GTF containing only the chimeric transcripts
        * the logs
        
        > In principle, you should never need these files: transpo computes summary statistics
        > for you
        
        
        ###  How to work with the results 
        Don't trust what you see in the table. This program is designed to help discover
        new chimeric transcripts, but it will return a lot of uninteresting candidates,
        so you should check them manually by loading the BAMs in IGV. 
        
        A good way to inspect the results is to load the all_chim_merged.gtf file in
        IGV with the bam, then inspect your top candidates and carefully make sure
        everything looks good. There is no free cake.
        
        ## Changelog / todo
        View this [changelog](changelog.md) file
        
Keywords: rnaseq bioinformatic genomic transcript genes chimeric
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: License :: OSI Approved :: GNU General Public License (GPL)
Classifier: Programming Language :: Python :: 3.5
