Metadata-Version: 1.1
Name: varcode
Version: 0.3.12
Summary: Variant annotation in Python
Home-page: https://github.com/hammerlab/varcode
Author: Alex Rubinsteyn
Author-email: UNKNOWN
License: http://www.apache.org/licenses/LICENSE-2.0.html
Description: Varcode
        =======
        
        Varcode is a library for working with genomic variant data in Python and
        predicting the impact of those variants on protein sequences.
        
        Example
        -------
        
        ::
        
            import varcode
        
            # Load TCGA MAF containing variants from their
            variants = varcode.load_maf("tcga-ovarian-cancer-variants.maf")
        
            print(variants)
            ### <VariantCollection from 'tcga-ovarian-cancer-variants.maf' with 6428 elements>
            ###  -- Variant(contig=1, start=69538, ref=G, alt=A, genome=GRCh37)
            ###  -- Variant(contig=1, start=881892, ref=T, alt=G, genome=GRCh37)
            ###  -- Variant(contig=1, start=3389714, ref=G, alt=A, genome=GRCh37)
            ###  -- Variant(contig=1, start=3624325, ref=G, alt=T, genome=GRCh37)
            ###  ...
        
            # you can index into a VariantCollection and get back a Variant object
            variant = variants[0]
        
            # groupby_gene_name returns a dictionary whose keys are gene names
            # and whose values are themselves VariantCollections
            gene_groups = variants.groupby_gene_name()
        
            # get variants which affect the TP53 gene
            TP53_variants = gene_groups["TP53"]
        
            # predict protein coding effect of every TP53 variant on
            # each transcript of the TP53 gene
            TP53_effects = TP53_variants.effects()
        
            print(TP53_effects)
            ### <EffectCollection with 789 elements>
            ### -- PrematureStop(variant=chr17 g.7574003G>A, transcript_name=TP53-001, transcript_id=ENST00000269305, effect_description=p.R342*)
            ### -- ThreePrimeUTR(variant=chr17 g.7574003G>A, transcript_name=TP53-005, transcript_id=ENST00000420246)
            ### -- PrematureStop(variant=chr17 g.7574003G>A, transcript_name=TP53-002, transcript_id=ENST00000445888, effect_description=p.R342*)
            ### -- FrameShift(variant=chr17 g.7574030_7574030delG, transcript_name=TP53-001, transcript_id=ENST00000269305, effect_description=p.R333fs)
            ### ...
        
            premature_stop_effect = TP53_effects[0]
        
            print(str(premature_stop_effect.mutant_protein_sequence))
            ### 'MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKGEPHHELPPGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMF'
        
            print(premature_stop_effect.aa_mutation_start_offset)
            ### 341
        
            print(premature_stop_effect.transcript)
            ### Transcript(id=ENST00000269305, name=TP53-001, gene_name=TP53, biotype=protein_coding, location=17:7571720-7590856)
        
            print(premature_stop_effect.gene.name)
            ### 'TP53'
        
        Effect Types
        ------------
        
        ``Intergenic``: Occurs outside of any annotated gene.
        
        ``Intragenic``: Within the annotated boundaries of a gene but not in a
        region that's transcribed into pre-mRNA.
        
        ``IncompleteTranscript``: Can't determine effect since transcript
        annotation is incomplete (often missing either the start or stop codon).
        
        ``NoncodingTranscript``: Transcript doesn't code for a protein.
        
        ``SpliceDonor``: Mutation in the first two nucleotides of an intron,
        likely to affect splicing.
        
        ``SpliceAcceptor``: Mutation in the last two nucleotides of an intron,
        likely to affect splicing.
        
        ``IntronicSpliceSite``: Mutation near the beginning or end of an intron
        but less likely to affect splicing than donor/acceptor mutations.
        
        ``ExonicSpliceSite``: Mutation at the beginning or end of an exon, may
        affect splicing.
        
        ``Intronic``: Variant occurs between exons and is unlikely to affect
        splicing.
        
        ``ThreePrimeUTR``: Variant affects 3' untranslated region after stop
        codon of mRNA.
        
        ``FivePrimeUTR``: Variant affects 5' untranslated region before start
        codon.
        
        ``Silent``: Mutation in coding sequence which does not change the amino
        acid sequence of the translated protein.
        
        ``Substitution``: Coding mutation which causes simple substitution of
        one amino acid for another.
        
        ``Insertion``: Coding mutation which causes insertion of amino acid(s).
        
        ``Deletion``: Coding mutation which causes deletion of amino acid(s).
        
        ``ComplexSubstitution``: Insertion and deletion of multiple amino acids.
        
        ``StartLoss``: Mutation causes loss of start codon, likely result is
        that an alternate start codon will be used down-stream (possibly in a
        different frame).
        
        ``AlternateStartCodon``: Replace annotated start codon with alternative
        start codon (e.g. ATG>CAG).
        
        ``StopLoss``: Loss of stop codon, causes extension of protein by
        translation of nucleotides from 3' UTR.
        
        ``PrematureStop``: Insertion of stop codon, truncates protein.
        
        ``FrameShift``: Out-of-frame insertion or deletion of nucleotides,
        causes novel protein sequence and often premature stop codon.
        
        ``FrameShiftTruncation``: A frameshift which leads immediately to a stop
        codon (no novel amino acids created).
        
        ``ExonLoss``: Deletion of entire exon, significantly disrupts protein.
        
        Coordinate System
        -----------------
        
        Varcode currently uses a "base counted, one start" genomic coordinate
        system, to match the Ensembl annotation database. We are planning to
        switch over to "space counted, zero start" (interbase) coordinates,
        since that system allows for more uniform logic (no special cases for
        insertions). To learn more about genomic coordinate systems, read this
        `blog
        post <http://alternateallele.blogspot.com/2012/03/genome-coordinate-conventions.html>`__.
        
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
