Metadata-Version: 2.1
Name: Vicinator
Version: 0.0.26
Summary: A small python package to trace orthology neighborhood across feature files
Home-page: https://github.com/ba1/vicinator
Author: Ba1
Author-email: djahanschiri@bio.uni-frankfurt.de
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: ansi2html (>=1.5.2)
Requires-Dist: colorama (>=0.4.4)
Requires-Dist: ete3 (>=3.1.2)
Requires-Dist: pandas (>=1.1.3)
Requires-Dist: importlib-metadata (>=3.1.1)

[![Build Status](https://www.travis-ci.org/ba1/Vicinator.svg?branch=master)](https://www.travis-ci.org/ba1/Vicinator) 
[![codecov](https://codecov.io/gh/ba1/Vicinator/branch/master/graph/badge.svg)](https://codecov.io/gh/ba1/Vicinator) 
[![PyPI version](https://badge.fury.io/py/Vicinator.svg)](https://badge.fury.io/py/Vicinator) 
[![Requirements Status](https://requires.io/github/ba1/Vicinator/requirements.svg?branch=master)](https://requires.io/github/ba1/Vicinator/requirements/?branch=master) 
[![Documentation Status](https://readthedocs.org/projects/vicinator/badge/?version=latest)](https://vicinator.readthedocs.io/en/latest/?badge=latest) 
[![Code style:black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

# Vicinator

### What is Vicinator for?

Vicinator traces and visualizes the microsynteny of a window of orthologs across genomes. It takes as input a
mapping of proteins across different genomes to protein groups (typically orthologous groups) and
a collection of genome feature files, i.e. *.gff* or *_feature_table.txt*. Given a user-specified
center protein on a reference genome and a neighborhood size, the program traces
this window across the genomes.

### What is Vicinator not for?

Vicinator relies on a pre-computed grouping of proteins across genomes. It cannot compute these
groups of genes for you.

### Installation

Vicinator is written for Python 3.6+

It is recommended to install Vicinator in a virtual environment, e.g. with venv:

`python3 -m venv myenv`

This creates the new environment myenv. Activate it (e.g. with `source myenv/bin/activate` on Linux or macOS), then
install the latest version via pip. pip downloads and installs all unmet requirements automatically.

`pip install --upgrade vicinator`

Requirements:
  -    ansi2html>=1.5.2
  -    colorama>=0.4.4
  -    ete3>=3.1.2
  -    pandas>=1.1.3
  -    importlib-metadata>=3.1.1

### Options

```
python3 vicinator/vicinator.py --help

usage: Vicinator [-h] --tabular-ortholog-groups <orthology_table>
                 --feat-tables-dir <dir_path> --reference <file_path>
                 --centerprotein-accession <str> --extension-size <int>
                 [--tree <newick_tree_file_path>] [--outdir <dir_path>]
                 [--prefix <str>] [--outputlabel-map <file_path>]
                 [--nprocs <int>] [--force] [--version]

Track Microsynteny of target proteins and its orthologs across genomes.

required arguments:
  --tabular-ortholog-groups <orthology_table>
                        path to mapping file with format
                        ortholog_group_id<tab>genome_id<tab>protein_seq_id
  --feat-tables-dir <dir_path>
                        path to directory of *.feature_tables.txt or *.gff3
                        files that shall be screen

required arguments (neighborhood):
  --reference <file_path>
                        path to a ncbi style feature table file that acts as a
                        reference
  --centerprotein-accession <str>
                        unique identifier of the central gene of the window
  --extension-size <int>
                        defines the #features that are co-checked to the left
                        and right of the centerprotein

optional arguments (output):
  --tree <newick_tree_file_path>
                        path to newick tree that includes all taxa to be
                        screened
  --outdir <dir_path>   path to desired output directory
  --prefix <str>        if option is set, shows intergenic distances of genes
                        surrounding the center gene
  --outputlabel-map <file_path>
                        Attempts to replace genome accessions in the outputs
                        with a replacement string. Requires a two-column map
                        file formatted like so: 'genome file accession' <tab>
                        'replacement string'

optional arguments (run):
  --nprocs <int>        Number of CPUs for parallel processing of genomes.
                        Default: Number of CPUs-1
  --force               if option is set, existing ortholog databases in the
                        output dir are ignored and will be overwritten
```

### Input: Required Arguments

`--tabular-ortholog-groups <orthology_table>`

>Vicinator requires a tab-separated three-column mapping of orthologs that is formatted like so:
>
> **group_id** tab **genome_id** tab **protein_id**
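
The three-column layout above can be checked before a run with a few lines of standard-library Python. This is a minimal sketch, not Vicinator's own parser: the function name `read_ortholog_table` and the example rows are hypothetical, and it assumes the file has no header line.

```python
import csv
import io
from collections import defaultdict

# Hypothetical example rows; a real table would be opened from disk instead.
EXAMPLE_TABLE = (
    "ortho_group1\tgenome_1\tprotein_1\n"
    "ortho_group1\tgenome_2\tprotein_7\n"
    "ortho_group2\tgenome_1\tprotein_2\n"
)


def read_ortholog_table(handle):
    """Return {group_id: [(genome_id, protein_id), ...]} from a 3-column TSV."""
    groups = defaultdict(list)
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 tab-separated columns, got {len(row)}"
            )
        group_id, genome_id, protein_id = row
        groups[group_id].append((genome_id, protein_id))
    return dict(groups)


groups = read_ortholog_table(io.StringIO(EXAMPLE_TABLE))
print(groups["ortho_group1"])
```

Rows with a missing or extra column raise a `ValueError` that names the offending line, which is usually easier to act on than a failure later in the run.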

`--feat-tables-dir <dir_path>`

>Vicinator expects the path to a directory containing *.gff* format or *_feature_table.txt* 
> files of all the genomes you want to trace the microsynteny in.
>
> A recommended source for these files is NCBI RefSeq. For the mapping to work, the filenames 
> should correspond to the **genome_ids** specified in the mapping file:
> 
> e.g. the entry: **ortho_group1    genome_1   protein_1**
> corresponds to a feature file named **genome_1.gff** or **genome_1_feature_table.txt** 
> in the specified directory.
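
The naming convention above can be verified up front. The following is a rough sketch under stated assumptions: `missing_feature_files` is a hypothetical helper, and it only checks the two file names given in the example (`<genome_id>.gff` and `<genome_id>_feature_table.txt`); Vicinator's actual matching rule may be more permissive.

```python
import tempfile
from pathlib import Path

# Suffixes taken from the naming example above (an assumption, not Vicinator's code).
FEATURE_SUFFIXES = (".gff", "_feature_table.txt")


def missing_feature_files(genome_ids, feat_dir):
    """Return the genome_ids that have no matching feature file in feat_dir."""
    feat_dir = Path(feat_dir)
    missing = []
    for genome_id in sorted(set(genome_ids)):
        candidates = [feat_dir / f"{genome_id}{suffix}" for suffix in FEATURE_SUFFIXES]
        if not any(path.is_file() for path in candidates):
            missing.append(genome_id)
    return missing


# Demonstration with a throwaway directory and hypothetical genome ids.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "genome_1.gff").touch()
    (Path(tmp) / "genome_2_feature_table.txt").touch()
    result = missing_feature_files(["genome_1", "genome_2", "genome_3"], tmp)
    print(result)
```

In this demonstration only `genome_3` has no corresponding file, so it is the only id reported as missing.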

`--reference <file_path>`
> The path to the reference genome feature file, which must contain the center-protein accession.

`--centerprotein-accession <str>` & `--extension-size <int>`

> Together these define the window around the center protein in the reference genome; this window is then traced
> across the other genomes. The extension size is the number of features included on each side of the center protein.  
> E.g., with center protein GeneW and an extension size of 2 (brackets mark the window boundaries):  
> Reference Genome: ... GeneT [ GeneU GeneV **GeneW** GeneX GeneY ] GeneZ ...
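
The window definition above can be expressed as a simple slice over the ordered gene list. This is an illustrative sketch only: `neighborhood_window` is a hypothetical function, genes are treated as a flat list on one sequence, and clipping the window at the sequence ends is an assumption rather than documented Vicinator behavior.

```python
def neighborhood_window(genes, center, extension_size):
    """Return the genes within extension_size positions of center, in order."""
    idx = genes.index(center)  # raises ValueError if the center gene is absent
    start = max(0, idx - extension_size)  # clip at the start of the sequence
    end = min(len(genes), idx + extension_size + 1)  # clip at the end
    return genes[start:end]


reference = ["GeneT", "GeneU", "GeneV", "GeneW", "GeneX", "GeneY", "GeneZ"]
window = neighborhood_window(reference, "GeneW", 2)
print(window)  # ['GeneU', 'GeneV', 'GeneW', 'GeneX', 'GeneY']
```

With an extension size of 2 the window holds five genes, matching the bracketed region in the example.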

## Example Basic Usage

`vicinator --tabular-ortholog-groups orthogenome_map.tsv --feat-tables-dir ./gff_dir --outdir ./results --reference ./gff_dir/MUSMU@10090@1.gff --centerprotein-accession XP_006539605.1 --extension-size 3`


