Metadata-Version: 2.1
Name: fuc
Version: 0.10.0
Summary: Frequently used commands in bioinformatics
Home-page: https://github.com/sbslee/fuc
Author: Seung-been "Steven" Lee
Author-email: sbstevenlee@gmail.com
License: MIT
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Description-Content-Type: text/x-rst
Requires-Dist: biopython
Requires-Dist: lxml
Requires-Dist: matplotlib
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: pyranges
Requires-Dist: pysam
Requires-Dist: seaborn

..
   This file was automatically generated by docs/create.py.

README
******

.. image:: https://badge.fury.io/py/fuc.svg
    :target: https://badge.fury.io/py/fuc

.. image:: https://readthedocs.org/projects/sbslee-fuc/badge/?version=latest
   :target: https://sbslee-fuc.readthedocs.io/en/latest/?badge=latest
   :alt: Documentation Status

.. image:: https://anaconda.org/bioconda/fuc/badges/version.svg
   :target: https://anaconda.org/bioconda/fuc

.. image:: https://anaconda.org/bioconda/fuc/badges/license.svg
   :target: https://github.com/sbslee/fuc/blob/main/LICENSE

.. image:: https://anaconda.org/bioconda/fuc/badges/downloads.svg
   :target: https://anaconda.org/bioconda/fuc/files

.. image:: https://anaconda.org/bioconda/fuc/badges/installer/conda.svg
   :target: https://conda.anaconda.org/bioconda

Introduction
============

The main goal of the fuc package is to bring some of the most frequently used commands in bioinformatics together in one place.

fuc provides both a command line interface (CLI) and an application programming interface (API); documentation for both is available at `Read the Docs <https://sbslee-fuc.readthedocs.io/en/latest/>`_.

Currently, the following file formats are supported by fuc:

- Sequence Alignment/Map (SAM)
- Binary Alignment/Map (BAM)
- CRAM
- Variant Call Format (VCF)
- Mutation Annotation Format (MAF)
- Browser Extensible Data (BED)
- FASTQ
- Delimiter-separated values (e.g. comma-separated values, or CSV)

Additionally, you can use fuc to parse output data from the following programs:

- Ensembl Variant Effect Predictor (VEP)
- SnpEff
- bcl2fastq and bcl2fastq2

Your contributions (e.g. feature ideas, pull requests) are most welcome.

| Author: Seung-been "Steven" Lee
| Email: sbstevenlee@gmail.com
| License: MIT License

CLI Examples
============

To print the header of a BAM file:

.. code-block:: console

   $ fuc bam_head example.bam

To find the intersection of BED files:

.. code-block:: console

   $ fuc bed_intxn 1.bed 2.bed 3.bed > intersect.bed
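
As a rough illustration of what an interval intersection means, here is a minimal pure-Python sketch. It is not fuc's implementation, and the function name ``intersect_intervals`` is hypothetical. It assumes two sorted lists of non-overlapping, half-open intervals on a single chromosome:

.. code:: python3

   def intersect_intervals(a, b):
       """Return overlaps between two sorted lists of (start, end) intervals.

       Intervals are half-open, [start, end), as in BED. Each list is
       assumed to be sorted and free of internal overlaps.
       """
       result = []
       i = j = 0
       while i < len(a) and j < len(b):
           start = max(a[i][0], b[j][0])
           end = min(a[i][1], b[j][1])
           if start < end:
               result.append((start, end))
           # Advance whichever interval finishes first.
           if a[i][1] < b[j][1]:
               i += 1
           else:
               j += 1
       return result

   first = [(100, 200), (300, 400)]
   second = [(150, 350)]
   print(intersect_intervals(first, second))  # [(150, 200), (300, 350)]

Intersecting three or more files can then be viewed as applying the same pairwise step repeatedly.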

To count sequence reads in a FASTQ file:

.. code-block:: console

   $ fuc fq_count example.fastq
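
Conceptually, read counting relies on the standard four-line FASTQ record layout (header, sequence, separator, qualities). The following is a minimal sketch using only the standard library, not fuc's implementation. ``count_fastq_reads`` is a hypothetical helper, and it assumes unwrapped four-line records with no blank lines:

.. code:: python3

   import gzip

   def count_fastq_reads(path):
       """Count records in a FASTQ file, assuming four lines per record."""
       # Transparently handle gzip-compressed input based on the file name.
       opener = gzip.open if str(path).endswith('.gz') else open
       with opener(path, 'rt') as handle:
           n_lines = sum(1 for _ in handle)
       if n_lines % 4 != 0:
           raise ValueError('line count is not a multiple of four')
       return n_lines // 4

Under these assumptions, ``count_fastq_reads('example.fastq')`` would return the number of reads in the file.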

To check whether a file exists in the operating system:

.. code-block:: console

   $ fuc fuc_exist example.txt

To find all VCF files within the current directory recursively:

.. code-block:: console

   $ fuc fuc_find . vcf
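
For comparison, a recursive search by extension can be sketched with the standard library alone. This is only an illustration of the idea, not how fuc implements it; ``find_by_extension`` is a hypothetical name, and the sketch matches only file names ending in exactly the given extension (so ``.vcf.gz`` files would not match ``vcf``):

.. code:: python3

   from pathlib import Path

   def find_by_extension(root, extension):
       """Return sorted paths of files under root ending in .extension."""
       pattern = f'*.{extension}'
       # rglob walks the directory tree recursively.
       return sorted(p for p in Path(root).rglob(pattern) if p.is_file())

   for path in find_by_extension('.', 'vcf'):
       print(path)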

To merge two tab-delimited files:

.. code-block:: console

   $ fuc tbl_merge left.txt right.txt > merged.txt
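
To make the idea of merging tables concrete, below is a minimal standard-library sketch of an inner join on a shared column. The join type, the ``key`` parameter, and the name ``merge_tables`` are assumptions made for illustration; they do not describe the options of ``tbl_merge`` itself:

.. code:: python3

   import csv

   def merge_tables(left_path, right_path, key):
       """Inner-join two tab-delimited files on a shared column name."""
       # Index the right table by key so each left row is matched quickly.
       right_rows = {}
       with open(right_path, newline='') as handle:
           for row in csv.DictReader(handle, delimiter='\t'):
               right_rows.setdefault(row[key], []).append(row)
       merged = []
       with open(left_path, newline='') as handle:
           for row in csv.DictReader(handle, delimiter='\t'):
               for match in right_rows.get(row[key], []):
                   # Left columns first; right values win on shared names.
                   merged.append({**row, **match})
       return merged

Rows whose key appears in only one of the two files are dropped in this sketch.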

To merge VCF files:

.. code-block:: console

   $ fuc vcf_merge 1.vcf 2.vcf 3.vcf > merged.vcf

API Examples
============

To filter a VCF file based on a BED file:

.. code:: python3

   >>> from fuc import pyvcf
   >>> vf = pyvcf.VcfFrame.from_file('original.vcf')
   >>> filtered_vf = vf.filter_bed('targets.bed')
   >>> filtered_vf.to_file('filtered.vcf')
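
When relating VCF positions to BED intervals, keep in mind that VCF positions are 1-based while BED intervals are 0-based and half-open. The sketch below shows that conversion for a single-base position. ``position_in_bed_interval`` is a hypothetical helper for illustration, and it does not describe how ``filter_bed`` treats boundaries or multi-base variants:

.. code:: python3

   def position_in_bed_interval(pos, start, end):
       """Return True if a 1-based VCF position lies in a BED interval.

       BED intervals are 0-based and half-open, [start, end), so the
       1-based position ``pos`` corresponds to the 0-based index pos - 1.
       """
       return start <= pos - 1 < end

   print(position_in_bed_interval(101, 100, 200))  # True
   print(position_in_bed_interval(100, 100, 200))  # False
   print(position_in_bed_interval(200, 100, 200))  # True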

To remove indels from a VCF file:

.. code:: python3

   >>> from fuc import pyvcf
   >>> vf = pyvcf.VcfFrame.from_file('with_indels.vcf')
   >>> filtered_vf = vf.filter_indel()
   >>> filtered_vf.to_file('no_indels.vcf')
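
For readers unfamiliar with the term, one common convention treats a record as an indel when any ALT allele differs in length from REF. The sketch below encodes that convention only; ``is_indel`` is a hypothetical helper, and the exact criterion used by ``filter_indel`` may differ (for example, in how multi-nucleotide or symbolic alleles are handled):

.. code:: python3

   def is_indel(ref, alt):
       """Return True if any comma-separated ALT allele differs in length from REF."""
       return any(len(allele) != len(ref) for allele in alt.split(','))

   print(is_indel('A', 'G'))     # False: single-base substitution
   print(is_indel('A', 'AT'))    # True: insertion
   print(is_indel('AT', 'A'))    # True: deletion
   print(is_indel('A', 'G,AT'))  # True: one of the ALT alleles is an insertion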

To create an oncoplot with a MAF file:

.. code:: python3

    >>> import matplotlib.pyplot as plt
    >>> from fuc import common, pymaf
    >>> common.load_dataset('tcga-laml')
    >>> f = '~/fuc-data/tcga-laml/tcga_laml.maf.gz'
    >>> mf = pymaf.MafFrame.from_file(f)
    >>> mf.plot_oncoplot()

.. image:: https://raw.githubusercontent.com/sbslee/fuc-data/main/images/oncoplot.png

To create a customized oncoplot with a MAF file, see the 'Create customized oncoplot' tutorial:

.. image:: https://raw.githubusercontent.com/sbslee/fuc-data/main/images/customized_oncoplot.png

To create a summary figure for a MAF file:

.. code:: python3

    >>> import matplotlib.pyplot as plt
    >>> from fuc import common, pymaf
    >>> common.load_dataset('tcga-laml')
    >>> f = '~/fuc-data/tcga-laml/tcga_laml.maf.gz'
    >>> mf = pymaf.MafFrame.from_file(f)
    >>> mf.plot_summary()

.. image:: https://raw.githubusercontent.com/sbslee/fuc-data/main/images/maf_summary.png

To create a read depth profile of a region from a CRAM file:

.. code:: python3

    >>> import matplotlib.pyplot as plt
    >>> from fuc import pycov
    >>> cf = pycov.CovFrame.from_file('HG00525.final.cram', zero=True,
    ...    region='chr12:21161194-21239796', names=['HG00525'])
    >>> cf.plot_region('chr12', start=21161194, end=21239796)

.. image:: https://raw.githubusercontent.com/sbslee/fuc-data/main/images/coverage.png

Installation
============

The following packages are required to run fuc:

.. parsed-literal::

   biopython
   lxml
   matplotlib
   numpy
   pandas
   pyranges
   pysam
   seaborn

There are various ways you can install fuc. The recommended way is via conda:

.. code-block:: console

   $ conda install -c bioconda fuc

The above command will automatically download and install all the dependencies as well. Alternatively, you can use pip to install fuc and all of its dependencies:

.. code-block:: console

   $ pip install fuc

Finally, you can clone the GitHub repository and then install fuc this way:

.. code-block:: console

   $ git clone https://github.com/sbslee/fuc
   $ cd fuc
   $ pip install .

An advantage of this approach is that you will have access to development versions that are not available in Anaconda or PyPI. For example, you can access a development branch with the ``git checkout`` command.
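
Whichever method you choose, one way to confirm which version ended up installed is to query the package metadata. This is a minimal sketch using the standard library's ``importlib.metadata`` (Python 3.8 or later); ``installed_version`` is a hypothetical helper, not part of fuc:

.. code:: python3

   from importlib.metadata import PackageNotFoundError, version

   def installed_version(package):
       """Return the installed version string, or None if not installed."""
       try:
           return version(package)
       except PackageNotFoundError:
           return None

   print(installed_version('fuc'))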

Getting Help
============

For detailed documentation on fuc's CLI and API, please refer to `Read the Docs <https://sbslee-fuc.readthedocs.io/en/latest/>`_.

To get help on the CLI:

.. code-block:: console

   $ fuc -h
   usage: fuc [-h] [-v] COMMAND ...

   positional arguments:
     COMMAND        name of the command
       bam_head     [BAM] print the header of a BAM file
       bam_index    [BAM] index a BAM file
       bam_rename   [BAM] add a new sample name to a BAM file
       bam_slice    [BAM] slice a BAM file
       bed_intxn    [BED] find intersection of two or more BED files
       bed_sum      [BED] summarize a BED file
       fq_count     [FASTQ] count sequence reads in FASTQ files
       fq_sum       [FASTQ] summarize a FASTQ file
       fuc_compf    [FUC] compare contents of two files
       fuc_demux    [FUC] parse Reports directory from bcl2fastq or bcl2fastq2
       fuc_exist    [FUC] check whether files/dirs exist
       fuc_find     [FUC] find files with certain extension recursively
       maf_oncoplt  [MAF] create an oncoplot from a MAF file
       maf_sumplt   [MAF] create a summary plot for a MAF file
       maf_vcf2maf  [MAF] convert an annotated VCF file to a MAF file
       tbl_merge    [TABLE] merge two table files
       tbl_sum      [TABLE] summarize a table file
       vcf_merge    [VCF] merge two or more VCF files
       vcf_slice    [VCF] slice a VCF file
       vcf_vcf2bed  [VCF] convert a VCF file to a BED file

   optional arguments:
     -h, --help     show this help message and exit
     -v, --version  show the version number and exit

To get help on a specific command (e.g. vcf_merge):

.. code-block:: console

   $ fuc vcf_merge -h

Below is the list of submodules available in the API:

- **common** : The common submodule is used by other fuc submodules such as pyvcf and pybed. It also provides many day-to-day actions used in the field of bioinformatics.
- **pybam** : The pybam submodule is designed for working with sequence alignment files (SAM/BAM/CRAM). It essentially wraps the `pysam <https://pysam.readthedocs.io/en/latest/api.html>`_ package to allow fast computation and easy manipulation.
- **pybed** : The pybed submodule is designed for working with BED files. It implements ``pybed.BedFrame`` which stores BED data as ``pandas.DataFrame`` via the `pyranges <https://github.com/biocore-ntnu/pyranges>`_ package to allow fast computation and easy manipulation. The submodule strictly adheres to the standard `BED specification <https://genome.ucsc.edu/FAQ/FAQformat.html>`_.
- **pycov** : The pycov submodule is designed for working with depth of coverage data from sequence alignment files (SAM/BAM/CRAM). It implements ``pycov.CovFrame`` which stores read depth data as ``pandas.DataFrame`` via the `pysam <https://pysam.readthedocs.io/en/latest/api.html>`_ package to allow fast computation and easy manipulation.
- **pyfq** : The pyfq submodule is designed for working with FASTQ files. It implements ``pyfq.FqFrame`` which stores FASTQ data as ``pandas.DataFrame`` to allow fast computation and easy manipulation.
- **pymaf** : The pymaf submodule is designed for working with MAF files. It implements ``pymaf.MafFrame`` which stores MAF data as ``pandas.DataFrame`` to allow fast computation and easy manipulation. The class also contains many useful plotting methods such as ``MafFrame.plot_varcls`` and ``MafFrame.plot_waterfall``. The submodule strictly adheres to the standard `MAF specification <https://docs.gdc.cancer.gov/Data/File_Formats/MAF_Format/>`_.
- **pysnpeff** : The pysnpeff submodule is designed for parsing VCF annotation data from the `SnpEff <https://pcingola.github.io/SnpEff/>`_ program. It is designed to be used with ``pyvcf.VcfFrame``.
- **pyvcf** : The pyvcf submodule is designed for working with VCF files. It implements the ``pyvcf.VcfFrame`` class which stores VCF data as ``pandas.DataFrame`` to allow fast computation and easy manipulation. The submodule strictly adheres to the standard `VCF specification <https://samtools.github.io/hts-specs/VCFv4.3.pdf>`_.
- **pyvep** : The pyvep submodule is designed for parsing VCF annotation data from the `Ensembl VEP <https://asia.ensembl.org/info/docs/tools/vep/index.html>`_. It is designed to be used with ``pyvcf.VcfFrame``.

To get help on a specific module (e.g. pyvcf):

.. code:: python3

   from fuc import pyvcf
   help(pyvcf)



