Metadata-Version: 2.1
Name: usum
Version: 0.1.1
Summary: USUM: Plotting sequence similarity using USEARCH & UMAP
Home-page: https://github.com/prihoda/usum
Author: David Příhoda
Author-email: david.prihoda@gmail.com
License: MIT
Description: # USUM: Plotting sequence similarity using USEARCH & UMAP
        
        USUM uses [USEARCH](https://drive5.com/usearch/) and [UMAP](https://github.com/lmcinnes/umap) to plot DNA 🧬 and protein 🧶 sequence similarity embeddings.
        
        [![PyPI - Downloads](https://img.shields.io/pypi/dm/usum.svg?color=green&label=PyPI%20downloads)](https://pypi.python.org/pypi/usum/)
        [![PyPI license](https://img.shields.io/pypi/l/usum.svg)](https://pypi.python.org/pypi/usum/)
        [![PyPI version](https://badge.fury.io/py/usum.svg)](https://pypi.python.org/pypi/usum/)
        
        ## Installation
        
        Install `USEARCH` manually from https://drive5.com/usearch/download.html (consider supporting the author by buying the 64-bit license).
        
        Install `usum` using pip:
        
        ```bash
        pip install usum
        ```
        
        ## Usage
        
        ### Minimal example
        
        ```bash
        usum sequences.fa --maxdist 0.2 --termdist 0.3 --output umap
        ```
        
        ### Multiple input files with labels
        
        ```bash
        usum first.fa second.fa --labels First Second --maxdist 0.2 --termdist 0.3 --output umap
        ```
        
        This will produce a PNG plot:
        
        ![UMAP static example](docs/example1.png?raw=true "UMAP static example")
        
        An interactive [Bokeh](https://bokeh.org) HTML plot is also created:
        
        ![UMAP Bokeh example](docs/example2.png?raw=true "UMAP Bokeh example")
        
        ### Programmatic use
        
        ```python
        from usum import usum
        
        # Show help
        help(usum)
        
        # Run USUM
        usum(inputs=['input.fa'], output='usum', maxdist=0.2, termdist=0.3)
        ```
        
        ## How it works
        
        - A sparse distance matrix is calculated using the USEARCH [calc_distmx](https://drive5.com/usearch/manual/cmd_calc_distmx.html) command.
        - The distance matrix is embedded with [UMAP](https://github.com/lmcinnes/umap) using the `precomputed` metric.
        - The embedding is plotted using [umap.plot](https://umap-learn.readthedocs.io/en/latest/plotting.html).
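        
        The first two steps can be illustrated with a minimal sketch. It assumes the distances are available as tab-separated `label1`, `label2`, `distance` lines; the function name `read_tabbed_pairs` and the sample data are hypothetical and not part of the `usum` API.
        
        ```python
        import io
        
        
        def read_tabbed_pairs(handle):
            """Parse 'label1<TAB>label2<TAB>distance' lines into symmetric COO triplets.
        
            Hypothetical helper for illustration; not part of the usum package.
            """
            index = {}   # sequence label -> row/column index
            pairs = {}   # (i, j) with i < j -> distance, so repeated pairs are stored once
            for line in handle:
                line = line.strip()
                if not line:
                    continue
                a, b, dist = line.split("\t")
                i = index.setdefault(a, len(index))
                j = index.setdefault(b, len(index))
                if i == j:
                    continue  # self-distances are not needed
                pairs[(min(i, j), max(i, j))] = float(dist)
        
            rows, cols, vals = [], [], []
            for (i, j), d in pairs.items():
                # Store both directions so the resulting matrix is symmetric
                rows += [i, j]
                cols += [j, i]
                vals += [d, d]
            labels = sorted(index, key=index.get)
            return labels, rows, cols, vals
        
        
        # Made-up example data: three sequences, one pair listed in both directions
        example = io.StringIO("s1\ts2\t0.1\ns2\ts1\t0.1\ns2\ts3\t0.25\n")
        labels, rows, cols, vals = read_tabbed_pairs(example)
        print(labels)
        print(sorted(zip(rows, cols, vals)))
        ```
        
        With `scipy` and `umap-learn` installed, the triplets could then be turned into a sparse matrix with `scipy.sparse.coo_matrix((vals, (rows, cols)), shape=(n, n))` and passed to `umap.UMAP(metric='precomputed').fit_transform(...)`. Whether `usum` builds its matrix exactly this way is not confirmed here.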
        
Keywords: dna,protein,sequence,similarity,umap,usearch,uclust,plot
Platform: UNKNOWN
Requires-Python: >=3.6
Description-Content-Type: text/markdown
