Metadata-Version: 2.1
Name: fuc
Version: 0.15.0
Summary: Frequently used commands in bioinformatics
Home-page: https://github.com/sbslee/fuc
Author: Seung-been "Steven" Lee
Author-email: sbstevenlee@gmail.com
License: MIT
Description: ..
           This file was automatically generated by docs/create.py.
        
        README
        ******
        
        .. image:: https://badge.fury.io/py/fuc.svg
            :target: https://badge.fury.io/py/fuc
        
        .. image:: https://readthedocs.org/projects/sbslee-fuc/badge/?version=latest
           :target: https://sbslee-fuc.readthedocs.io/en/latest/?badge=latest
           :alt: Documentation Status
        
        .. image:: https://anaconda.org/bioconda/fuc/badges/version.svg
           :target: https://anaconda.org/bioconda/fuc
        
        .. image:: https://anaconda.org/bioconda/fuc/badges/license.svg
           :target: https://github.com/sbslee/fuc/blob/main/LICENSE
        
        .. image:: https://anaconda.org/bioconda/fuc/badges/downloads.svg
           :target: https://anaconda.org/bioconda/fuc/files
        
        .. image:: https://anaconda.org/bioconda/fuc/badges/installer/conda.svg
           :target: https://conda.anaconda.org/bioconda
        
        Introduction
        ============
        
        The main goal of the fuc package (pronounced "eff-you-see") is to wrap some of the most **f**\ requently **u**\ sed **c**\ ommands in the field of bioinformatics into one place.
        
        The package supports both a command line interface (CLI) and an application programming interface (API), whose documentation is available at `Read the Docs <https://sbslee-fuc.readthedocs.io/en/latest/>`_.
        
        Currently, fuc can be used to analyze, summarize, visualize, and manipulate the following file formats:
        
        - Sequence Alignment/Map (SAM)
        - Binary Alignment/Map (BAM)
        - CRAM
        - Variant Call Format (VCF)
        - Mutation Annotation Format (MAF)
        - Browser Extensible Data (BED)
        - FASTQ
        - FASTA
        - Delimiter-separated values (e.g. comma-separated values or CSV)
        
        Additionally, fuc can be used to parse output data from the following programs:
        
        - Ensembl Variant Effect Predictor (VEP)
        - SnpEff
        - bcl2fastq and bcl2fastq2
        
        Your contributions (e.g. feature ideas, pull requests) are most welcome.
        
        | Author: Seung-been "Steven" Lee
        | Email: sbstevenlee@gmail.com
        | License: MIT License
        
        Installation
        ============
        
        The following packages are required to run fuc:
        
        .. parsed-literal::
        
           biopython
           lxml
           matplotlib
           matplotlib-venn
           numpy
           pandas
           pyranges
           pysam
           scipy
           seaborn
        
        There are various ways you can install fuc. The recommended way is via conda:
        
        .. code-block:: text
        
           $ conda install -c bioconda fuc
        
        The above command will automatically download and install all the dependencies as well. Alternatively, you can use pip to install fuc and all of its dependencies:
        
        .. code-block:: text
        
           $ pip install fuc
        
        Finally, you can clone the GitHub repository and then install fuc locally:
        
        .. code-block:: text
        
           $ git clone https://github.com/sbslee/fuc
           $ cd fuc
           $ pip install .
        
        The advantage of this approach is that you will have access to development versions that are not yet available on Anaconda or PyPI. For example, you can switch to a development branch with the ``git checkout`` command. When installing this way, please make sure your environment already has all the dependencies installed.
        
        Getting Help
        ============
        
        For detailed documentation of the fuc package's CLI and API, please refer to `Read the Docs <https://sbslee-fuc.readthedocs.io/en/latest/>`_.
        
        To get help on the fuc CLI:
        
        .. code-block:: text
        
           $ fuc -h
           usage: fuc [-h] [-v] COMMAND ...
           
           positional arguments:
             COMMAND
               bam_head     [BAM] Print the header of a SAM/BAM/CRAM file.
               bam_index    [BAM] Index a SAM/BAM/CRAM file.
               bam_rename   [BAM] Rename the samples in a SAM/BAM/CRAM file.
               bam_slice    [BAM] Slice a SAM/BAM/CRAM file.
               bed_intxn    [BED] Find the intersection of two or more BED files.
               bed_sum      [BED] Summarize a BED file.
               fq_count     [FASTQ] Count sequence reads in FASTQ files.
               fq_sum       [FASTQ] Summarize a FASTQ file.
               fuc_compf    [FUC] Compare the contents of two files.
               fuc_demux    [FUC] Parse the Reports directory from bcl2fastq.
               fuc_exist    [FUC] Check whether certain files exist.
               fuc_find     [FUC] Find all files with a certain extension recursively.
               maf_maf2vcf  [MAF] Convert a MAF file to a VCF file.
               maf_oncoplt  [MAF] Create an oncoplot with a MAF file.
               maf_sumplt   [MAF] Create a summary plot with a MAF file.
               maf_vcf2maf  [MAF] Convert a VCF file to a MAF file.
               tbl_merge    [TABLE] Merge two table files.
               tbl_sum      [TABLE] Summarize a table file.
               vcf_filter   [VCF] Filter a VCF file.
               vcf_merge    [VCF] Merge two or more VCF files.
               vcf_rename   [VCF] Rename the samples in a VCF file.
               vcf_slice    [VCF] Slice a VCF file.
               vcf_vcf2bed  [VCF] Convert a VCF file to a BED file.
               vcf_vep      [VCF] Filter a VCF file annotated by Ensemble VEP.
           
           optional arguments:
             -h, --help     Show this help message and exit.
             -v, --version  Show the version number and exit.
        
        To get help on a specific command (e.g. vcf_merge):
        
        .. code-block:: text
        
           $ fuc vcf_merge -h
        
        Below is the list of submodules available in the fuc API:
        
        - **common** : The common submodule is used by other fuc submodules such as pyvcf and pybed. It also provides many day-to-day actions used in the field of bioinformatics.
        - **pybam** : The pybam submodule is designed for working with sequence alignment files (SAM/BAM/CRAM). It essentially wraps the `pysam <https://pysam.readthedocs.io/en/latest/api.html>`_ package to allow fast computation and easy manipulation.
        - **pybed** : The pybed submodule is designed for working with BED files. It implements ``pybed.BedFrame`` which stores BED data as ``pandas.DataFrame`` via the `pyranges <https://github.com/biocore-ntnu/pyranges>`_ package to allow fast computation and easy manipulation. The submodule strictly adheres to the standard `BED specification <https://genome.ucsc.edu/FAQ/FAQformat.html>`_.
        - **pycov** : The pycov submodule is designed for working with depth of coverage data from sequence alignment files (SAM/BAM/CRAM). It implements ``pycov.CovFrame`` which stores read depth data as ``pandas.DataFrame`` via the `pysam <https://pysam.readthedocs.io/en/latest/api.html>`_ package to allow fast computation and easy manipulation.
        - **pyfq** : The pyfq submodule is designed for working with FASTQ files. It implements ``pyfq.FqFrame`` which stores FASTQ data as ``pandas.DataFrame`` to allow fast computation and easy manipulation.
        - **pymaf** : The pymaf submodule is designed for working with MAF files. It implements ``pymaf.MafFrame`` which stores MAF data as ``pandas.DataFrame`` to allow fast computation and easy manipulation. The ``pymaf.MafFrame`` class also contains many useful plotting methods such as ``MafFrame.plot_oncoplot`` and ``MafFrame.plot_summary``. The submodule strictly adheres to the standard `MAF specification <https://docs.gdc.cancer.gov/Data/File_Formats/MAF_Format/>`_.
        - **pysnpeff** : The pysnpeff submodule is designed for parsing VCF annotation data from the `SnpEff <https://pcingola.github.io/SnpEff/>`_ program. It should be used with ``pyvcf.VcfFrame``.
        - **pyvcf** : The pyvcf submodule is designed for working with VCF files. It implements ``pyvcf.VcfFrame`` which stores VCF data as ``pandas.DataFrame`` to allow fast computation and easy manipulation. The ``pyvcf.VcfFrame`` class also contains many useful plotting methods such as ``VcfFrame.plot_comparison`` and ``VcfFrame.plot_tmb``. The submodule strictly adheres to the standard `VCF specification <https://samtools.github.io/hts-specs/VCFv4.3.pdf>`_.
        - **pyvep** : The pyvep submodule is designed for parsing VCF annotation data from the `Ensembl VEP <https://asia.ensembl.org/info/docs/tools/vep/index.html>`_ program. It should be used with ``pyvcf.VcfFrame``.
        
        To get help on a specific submodule (e.g. pyvcf):
        
        .. code:: python3
        
           >>> from fuc import pyvcf
           >>> help(pyvcf)
        
        CLI Examples
        ============
        
        **SAM/BAM/CRAM**
        
        To print the header of a SAM file:
        
        .. code-block:: text
        
           $ fuc bam_head in.sam
        
        To index a CRAM file:
        
        .. code-block:: text
        
           $ fuc bam_index in.cram
        
        To rename the samples in a SAM file:
        
        .. code-block:: text
        
           $ fuc bam_rename in.sam NA12878 > out.sam
        
        To slice a BAM file:
        
        .. code-block:: text
        
           $ fuc bam_slice in.bam chr1:100-200 > out.bam
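        
        Region strings such as ``chr1:100-200`` are conventionally read the way samtools reads them: 1-based, with both ends inclusive. pysam's ``AlignmentFile.fetch(contig, start, stop)``, on the other hand, takes 0-based, half-open coordinates. The helper below is a hypothetical sketch (``parse_region`` is not part of fuc) of that conversion, assuming ``bam_slice`` follows the samtools convention; it handles only the full ``chrom:start-end`` form.
        
        .. code:: python3
        
           from typing import Tuple
        
           def parse_region(region: str) -> Tuple[str, int, int]:
               """Convert 'chrom:start-end' (1-based, inclusive) to 0-based, half-open."""
               # Hypothetical helper for illustration; not part of the fuc API.
               chrom, _, span = region.rpartition(":")
               if not chrom:
                   raise ValueError(f"expected 'chrom:start-end', got {region!r}")
               # samtools-style regions may contain thousands separators
               start_str, end_str = span.replace(",", "").split("-")
               start, end = int(start_str), int(end_str)
               if start < 1 or end < start:
                   raise ValueError(f"invalid interval in {region!r}")
               # 1-based [100, 200] covers the same bases as 0-based [99, 200)
               return chrom, start - 1, end
        
           print(parse_region("chr1:100-200"))  # ('chr1', 99, 200)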
        
        **BED**
        
        To find the intersection of BED files:
        
        .. code-block:: text
        
           $ fuc bed_intxn 1.bed 2.bed 3.bed > intersect.bed
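        
        BED intervals are 0-based and half-open, so two intervals overlap only when the larger start is strictly less than the smaller end. The stdlib-only sketch below illustrates one way to compute the regions shared by several BED files. It is not fuc's implementation (the pybed submodule relies on pyranges); it reads only the first three columns and merges overlapping intervals within each file before intersecting, and the function names are hypothetical.
        
        .. code:: python3
        
           from functools import reduce
           from typing import Dict, Iterable, List, Tuple
        
           Interval = Tuple[int, int]  # (start, end), 0-based and half-open
           Bed = Dict[str, List[Interval]]  # chromosome -> sorted, disjoint intervals
        
           def read_bed(lines: Iterable[str]) -> Bed:
               """Collect per-chromosome intervals from BED lines, merging overlaps."""
               raw: Dict[str, List[Interval]] = {}
               for line in lines:
                   if not line.strip() or line.startswith(("#", "track", "browser")):
                       continue  # skip blank, comment, and UCSC header lines
                   chrom, start, end = line.rstrip("\n").split("\t")[:3]
                   raw.setdefault(chrom, []).append((int(start), int(end)))
               merged: Bed = {}
               for chrom, intervals in raw.items():
                   out: List[Interval] = []
                   for start, end in sorted(intervals):
                       if out and start <= out[-1][1]:
                           out[-1] = (out[-1][0], max(out[-1][1], end))
                       else:
                           out.append((start, end))
                   merged[chrom] = out
               return merged
        
           def intersect_two(a: Bed, b: Bed) -> Bed:
               """Return the regions covered by both a and b (two-pointer sweep)."""
               # Illustrative sketch only; not how bed_intxn is implemented.
               result: Bed = {}
               for chrom in sorted(a.keys() & b.keys()):
                   x, y = a[chrom], b[chrom]
                   i = j = 0
                   out: List[Interval] = []
                   while i < len(x) and j < len(y):
                       start, end = max(x[i][0], y[j][0]), min(x[i][1], y[j][1])
                       if start < end:  # half-open intervals overlap only if start < end
                           out.append((start, end))
                       # Advance whichever interval ends first
                       if x[i][1] < y[j][1]:
                           i += 1
                       else:
                           j += 1
                   if out:
                       result[chrom] = out
               return result
        
           def intersect_all(beds: List[Bed]) -> Bed:
               """Intersect two or more BED files pairwise, left to right."""
               return reduce(intersect_two, beds)
        
        Merging each file first keeps the intervals disjoint, which is what lets the two-pointer sweep visit every interval only once.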
        
        **FASTQ**
        
        To count sequence reads in a FASTQ file:
        
        .. code-block:: text
        
           $ fuc fq_count example.fastq
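        
        A FASTQ record usually spans four lines (an ``@`` header, the sequence, a ``+`` separator, and the quality string), so for files without wrapped sequence lines the read count is simply the line count divided by four. The stdlib sketch below illustrates that idea; it is not necessarily how ``fq_count`` is implemented, the function names are hypothetical, and it treats a line count that is not a multiple of four (including a stray trailing blank line) as an error.
        
        .. code:: python3
        
           import gzip
           from typing import Iterable
        
           def count_reads(lines: Iterable[str]) -> int:
               """Count FASTQ records, assuming four lines per record (no wrapping)."""
               # Illustrative sketch; not fuc's fq_count implementation.
               n_lines = sum(1 for _ in lines)
               if n_lines % 4:
                   raise ValueError(f"{n_lines} lines is not a multiple of 4")
               return n_lines // 4
        
           def count_fastq_reads(path: str) -> int:
               """Count reads in a plain or gzip-compressed FASTQ file."""
               opener = gzip.open if path.endswith(".gz") else open
               with opener(path, "rt") as handle:
                   return count_reads(handle)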
        
        **FUC**
        
        To check whether a file exists in the operating system:
        
        .. code-block:: text
        
           $ fuc fuc_exist example.txt
        
        To recursively find all compressed VCF files (``.vcf.gz``) within the current directory:
        
        .. code-block:: text
        
           $ fuc fuc_find .vcf.gz
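        
        A multi-part extension such as ``.vcf.gz`` cannot be matched with ``pathlib.Path.suffix``, which returns only the last suffix (``.gz``). The stdlib sketch below, which is not fuc's implementation, walks a directory tree and compares the end of each file name instead; the name ``find_by_extension`` is hypothetical.
        
        .. code:: python3
        
           from pathlib import Path
           from typing import List
        
           def find_by_extension(root: str, extension: str) -> List[Path]:
               """Recursively list files under root whose names end with extension."""
               # Hypothetical helper for illustration; fuc_find may match differently.
               # Path.suffix would give '.gz' for 'a.vcf.gz', so compare names directly.
               return sorted(
                   p for p in Path(root).rglob("*")
                   if p.is_file() and p.name.endswith(extension)
               )
        
        For example, ``find_by_extension(".", ".vcf.gz")`` corresponds to the command above.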
        
        **TABLE**
        
        To merge two tab-delimited files:
        
        .. code-block:: text
        
           $ fuc tbl_merge left.tsv right.tsv > merged.tsv
        
        **VCF**
        
        To merge VCF files:
        
        .. code-block:: text
        
           $ fuc vcf_merge 1.vcf 2.vcf 3.vcf > merged.vcf
        
        To filter a VCF file annotated by Ensembl VEP:
        
        .. code-block:: text
        
           $ fuc vcf_vep in.vcf 'SYMBOL == "TP53"' > out.vcf
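        
        Ensembl VEP stores its annotations in the ``CSQ`` INFO field: annotations (typically one per transcript) are separated by commas, their sub-fields by pipes, and the sub-field names are listed after ``Format:`` in the ``##INFO=<ID=CSQ,...>`` header line. The stdlib sketch below, which is not fuc's implementation, shows how a condition such as ``SYMBOL == "TP53"`` could be checked against each annotation; keeping a record when *any* of its annotations matches is an assumption of this sketch, and the function names are hypothetical.
        
        .. code:: python3
        
           import re
           from typing import Dict, List
        
           def csq_fields(header_line: str) -> List[str]:
               """Return the CSQ sub-field names from VEP's ##INFO=<ID=CSQ,...> line."""
               match = re.search(r'Format: ([^"]+)"', header_line)
               if match is None:
                   raise ValueError("no 'Format:' list found in the CSQ header")
               return match.group(1).split("|")
        
           def parse_csq(info: str, fields: List[str]) -> List[Dict[str, str]]:
               """Split the CSQ value of a VCF INFO column into one dict per annotation."""
               for item in info.split(";"):
                   key, _, value = item.partition("=")
                   if key == "CSQ":
                       return [dict(zip(fields, ann.split("|"))) for ann in value.split(",")]
               return []  # no CSQ key (e.g. INFO is '.')
        
           header = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
                     'annotations from Ensembl VEP. Format: Allele|Consequence|SYMBOL">')
           info = "DP=30;CSQ=T|missense_variant|TP53,T|upstream_gene_variant|WRAP53"
           annotations = parse_csq(info, csq_fields(header))
           # Sketch assumption: keep the record if any annotation matches
           print(any(a["SYMBOL"] == "TP53" for a in annotations))  # True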
        
        API Examples
        ============
        
        **SAM/BAM/CRAM**
        
        To create a read depth profile of a region from a CRAM file:
        
        .. code:: python3
        
            >>> from fuc import pycov
            >>> cf = pycov.CovFrame.from_file('HG00525.final.cram', zero=True,
            ...    region='chr12:21161194-21239796', names=['HG00525'])
            >>> cf.plot_region('chr12:21161194-21239796')
        
        .. image:: https://raw.githubusercontent.com/sbslee/fuc-data/main/images/coverage.png
        
        **VCF**
        
        To filter a VCF file based on a BED file:
        
        .. code:: python3
        
           >>> from fuc import pyvcf
           >>> vf = pyvcf.VcfFrame.from_file('original.vcf')
           >>> filtered_vf = vf.filter_bed('targets.bed')
           >>> filtered_vf.to_file('filtered.vcf')
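        
        A detail worth keeping in mind when filtering with a BED file: VCF ``POS`` is 1-based, while BED intervals are 0-based and half-open, so a BED line ``chr1 99 100`` covers exactly position 100. The stdlib sketch below applies that rule to raw VCF lines. It is not ``filter_bed`` itself, the function names are hypothetical, and it checks only ``POS`` rather than the full span of ``REF``.
        
        .. code:: python3
        
           from typing import Iterable, Iterator, List, Tuple
        
           BedInterval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open
        
           def in_bed(chrom: str, pos: int, intervals: List[BedInterval]) -> bool:
               """True if the 1-based position pos lies in any BED interval."""
               # A BED interval (start, end) covers 1-based positions start+1 .. end.
               return any(c == chrom and start < pos <= end for c, start, end in intervals)
        
           def filter_vcf_lines(lines: Iterable[str],
                                intervals: List[BedInterval]) -> Iterator[str]:
               """Yield header lines plus data lines whose POS is inside a BED interval."""
               # Illustrative sketch only; a linear scan is fine for small target lists.
               for line in lines:
                   if line.startswith("#"):
                       yield line  # keep meta-information and the #CHROM header
                       continue
                   chrom, pos = line.split("\t", 2)[:2]
                   if in_bed(chrom, int(pos), intervals):
                       yield line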
        
        To remove indels from a VCF file:
        
        .. code:: python3
        
           >>> from fuc import pyvcf
           >>> vf = pyvcf.VcfFrame.from_file('with_indels.vcf')
           >>> filtered_vf = vf.filter_indel()
           >>> filtered_vf.to_file('no_indels.vcf')
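        
        In VCF, an indel can be recognized by an ALT allele whose length differs from that of REF (for example, ``REF=A`` and ``ALT=AT``). The classifier below is a stdlib sketch of that rule, not the exact logic of ``filter_indel``; how it treats multi-allelic sites (any indel allele counts) and symbolic alleles such as ``<DEL>`` (skipped) are assumptions, and ``is_indel`` is a hypothetical name.
        
        .. code:: python3
        
           def is_indel(ref: str, alt: str) -> bool:
               """Return True if any ALT allele differs in length from REF."""
               # Sketch assumptions: ALT alleles are comma-separated, and symbolic
               # (<...>), spanning-deletion (*), and missing (.) alleles are ignored.
               # Breakend (BND) notation is not handled.
               for allele in alt.split(","):
                   if allele.startswith("<") or allele in {"*", "."}:
                       continue
                   if len(allele) != len(ref):
                       return True
               return False
        
           records = [("A", "G"), ("A", "AT"), ("AT", "A"), ("AC", "GT"), ("A", "G,GT")]
           print([is_indel(ref, alt) for ref, alt in records])
           # [False, True, True, False, True]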
        
        To create a Venn diagram showing genotype concordance between groups:
        
        .. code:: python3
        
            >>> from fuc import pyvcf, common
            >>> common.load_dataset('pyvcf')
            >>> f = '~/fuc-data/pyvcf/plot_comparison.vcf'
            >>> vf = pyvcf.VcfFrame.from_file(f)
            >>> a = ['Steven_A', 'John_A', 'Sara_A']
            >>> b = ['Steven_B', 'John_B', 'Sara_B']
            >>> c = ['Steven_C', 'John_C', 'Sara_C']
            >>> vf.plot_comparison(a, b, c)
        
        .. image:: https://raw.githubusercontent.com/sbslee/fuc-data/main/images/plot_comparison.png
        
        To create various figures for normal-tumor analysis:
        
        .. code:: python3
        
            >>> import matplotlib.pyplot as plt
            >>> from fuc import common, pyvcf
            >>> common.load_dataset('pyvcf')
            >>> vf = pyvcf.VcfFrame.from_file('~/fuc-data/pyvcf/normal-tumor.vcf')
            >>> af = pyvcf.AnnFrame.from_file('~/fuc-data/pyvcf/normal-tumor-annot.tsv', 'Sample')
            >>> normal = af.df[af.df.Tissue == 'Normal'].index
            >>> tumor = af.df[af.df.Tissue == 'Tumor'].index
            >>> fig, [[ax1, ax2], [ax3, ax4]] = plt.subplots(2, 2, figsize=(10, 10))
            >>> vf.plot_tmb(ax=ax1)
            >>> vf.plot_tmb(ax=ax2, af=af, hue='Tissue')
            >>> vf.plot_hist('DP', ax=ax3, af=af, hue='Tissue')
            >>> vf.plot_regplot(normal, tumor, ax=ax4)
            >>> plt.tight_layout()
        
        .. image:: https://raw.githubusercontent.com/sbslee/fuc-data/main/images/normal-tumor.png
        
        **MAF**
        
        To create an oncoplot with a MAF file:
        
        .. code:: python3
        
            >>> from fuc import common, pymaf
            >>> common.load_dataset('tcga-laml')
            >>> f = '~/fuc-data/tcga-laml/tcga_laml.maf.gz'
            >>> mf = pymaf.MafFrame.from_file(f)
            >>> mf.plot_oncoplot()
        
        .. image:: https://raw.githubusercontent.com/sbslee/fuc-data/main/images/oncoplot.png
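        
        An oncoplot is built on a gene-by-sample summary of the MAF records, and genes are commonly ordered by how many samples carry a mutation in them. The sketch below computes such a summary with the standard library from the ``Hugo_Symbol`` and ``Tumor_Sample_Barcode`` columns defined in the MAF specification. It is an illustration only, not how ``MafFrame.plot_oncoplot`` works internally, and the function names are hypothetical.
        
        .. code:: python3
        
           import csv
           from collections import defaultdict
           from typing import Dict, Iterable, List, Set, Tuple
        
           def mutation_counts(lines: Iterable[str]) -> Dict[Tuple[str, str], int]:
               """Count MAF records per (gene, sample) pair."""
               # Illustrative sketch; not fuc's oncoplot logic.
               rows = (line for line in lines if not line.startswith("#"))  # skip '#version'
               counts: Dict[Tuple[str, str], int] = defaultdict(int)
               for record in csv.DictReader(rows, delimiter="\t"):
                   counts[(record["Hugo_Symbol"], record["Tumor_Sample_Barcode"])] += 1
               return dict(counts)
        
           def top_genes(counts: Dict[Tuple[str, str], int], n: int = 10) -> List[str]:
               """Rank genes by the number of distinct mutated samples (ties by name)."""
               samples: Dict[str, Set[str]] = defaultdict(set)
               for gene, sample in counts:
                   samples[gene].add(sample)
               return sorted(samples, key=lambda g: (-len(samples[g]), g))[:n]
        
        For a compressed file such as ``tcga_laml.maf.gz``, the lines can come from ``gzip.open(path, "rt")``.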
        
        To create a customized oncoplot with a MAF file, see the 'Create customized oncoplot' tutorial:
        
        .. image:: https://raw.githubusercontent.com/sbslee/fuc-data/main/images/customized_oncoplot.png
        
        To create a summary figure for a MAF file:
        
        .. code:: python3
        
            >>> from fuc import common, pymaf
            >>> common.load_dataset('tcga-laml')
            >>> f = '~/fuc-data/tcga-laml/tcga_laml.maf.gz'
            >>> mf = pymaf.MafFrame.from_file(f)
            >>> mf.plot_summary()
        
        .. image:: https://raw.githubusercontent.com/sbslee/fuc-data/main/images/maf_summary.png
        
        
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Description-Content-Type: text/x-rst
