Metadata-Version: 2.1
Name: seq-to-first-iso
Version: 0.4.2
Summary: Compute first two isotopologues intensity from peptide sequence.
Home-page: https://github.com/pierrepo/seq-to-first-iso
Author: Lilian Yang-crosson, Pierre Poulain
Author-email: pierre.poulain@cupnet.net
License: BSD 3-Clause License
Keywords: proteomics,spectrometry,isotopologues
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Chemistry
Classifier: Natural Language :: English
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: pandas
Requires-Dist: pyteomics[unimod]

# Seq-to-first-iso

> Compute first two isotopologues intensity from peptide sequence

The program differentiate labelled and unlabelled amino acids
for a 99.99 % 12C enrichment.

## Installation

### pip-based

```
$ pip install seq-to-first-iso
```

### Developer mode


Install [conda](https://conda.io/projects/conda/en/latest/user-guide/install/index.html)

Clone repo:
```
$ git clone https://github.com/pierrepo/seq-to-first-iso
$ cd seq-to-first-iso
```

Create conda environment:
```
$ conda env create -f environment.yml
```

Remark: for a fully reproducible environment, you could also use:
```
$ conda env create -f environment.lock.yml
```


Activate conda environment:
```
$ conda activate seq-to-first-iso
```

Install local package:
```
$ pip install -e .
```

## Usage

The script takes a file with one sequence of amino acids per line and returns a tsv of the file with columns:

|sequence|mass|formula|formula_X| M0_NC | M1_NC | M0_12C | M1_12C |
|--------|----|-------|---------|-------|-------|--------|--------|

Once installed, the script can be called with:

```shell
$ seq-to-first-iso filename [-o output_name] [-n amino_acids...]
```
Optional arguments are in square brackets  
This will create _filename_stfi.tsv_ if filename is a correct file

**0.3.0 :** The input file can have annotations separated by a tabulation before the sequences  
**0.4.0 :** Support for [X!Tandem](https://www.thegpm.org/tandem/) Post-Translational Modifications added

### Options

- `-h, --help`:  
Provide a help page

- `-o, --output`:  
Change the name of the output file

- `-n, --non-labelled-aa`:  
Take 1 or more amino acid separated by a comma


### Examples

- You can provide a list of amino acids which will keep default isotopic abundance:

Supposing [peptides.txt](https://github.com/pierrepo/seq-to-first-iso/blob/master/demo/peptides.txt) :

```
YAQEISR
VGFPVLSVKEHK
LAMVIIKEFVDDLK
```

The command
```shell
$ seq-to-first-iso peptides.txt -n V,W
```
will create [peptides_stfi.tsv](https://github.com/pierrepo/seq-to-first-iso/blob/master/demo/peptides_stfi.tsv) :

|sequence| mass| formula|formula_X| M0_NC| M1_NC| M0_12C| M1_12C|
|--------|-----|--------|---------|------|------|-------|-------|
YAQEISR| 865.42938099921| C37H59O13N11| C37H59O13N11| 0.6206414140575179|	0.280870823368276| 0.9206561231798033| 0.05161907174495234|
VGFPVLSVKEHK| 1338.7659712609| C63H102O16N16| C48H102O16N16X15|  0.4550358985377136| 0.34506032928190855| 0.7589558393662944| 0.18515489894512063|
LAMVIIKEFVDDLK| 1632.91606619252| C76H128O21N16S1| C66H128O21N16S1X10| 0.36994021481230627| 0.3373188347614264| 0.7475090558698947| 0.15292723586285323|

Where, in 12C enrichment conditions, the isotopologue intensity M0_12C and M1_12C are computed with unlabelled Valine and Tryptophan (V and W have default isotopic abundance)


- You can change the name of the output file:

```shell
$ seq-to-first-iso peptides.txt -o sequence
```
will create a file named *sequence.tsv*


## Credits


- **Bioconda**:
  - Grüning, Björn, Ryan Dale, Andreas Sjödin, Brad A. Chapman, Jillian Rowe, Christopher H. Tomkins-Tinch, Renan Valieris, the Bioconda Team, and Johannes Köster. 2018. “Bioconda: Sustainable and Comprehensive Software Distribution for the Life Sciences”. Nature Methods, 2018 doi:10.1038/s41592-018-0046-7.

- **Pyteomics**:
  - Goloborodko, A.A.; Levitsky, L.I.; Ivanov, M.V.; and Gorshkov, M.V. (2013) “Pyteomics - a Python Framework for Exploratory Data Analysis and Rapid Software Prototyping in Proteomics”, Journal of The American Society for Mass Spectrometry, 24(2), 301–304. DOI: 10.1007/s13361-012-0516-6

  - Levitsky, L.I.; Klein, J.; Ivanov, M.V.; and Gorshkov, M.V. (2018) “Pyteomics 4.0: five years of development of a Python proteomics framework”, Journal of Proteome Research. DOI: 10.1021/acs.jproteome.8b00717

- **MIDAs**:
  - Alves G, Ogurtsov AY, Yu YK (2014) Molecular Isotopic Distribution Analysis (MIDAs) with adjustable mass accuracy. J Am Soc Mass Spectrom, 25: 57-70. DOI: 10.1007/s13361-013-0733-7

# Changelog

**Dev**

**0.4.2** (2019-06-25)

- Fix *setup.cfg*'s installation requirements

**0.4.1** (2019-06-24)

- Extends numpydoc style to all functions in *seq_to_first_iso.py*

**0.4.0** (2019-06-21)

*Changed*
- Add support for Xtandem Parsing
  - **Breaking change:** sequence_parser() now returns a dict with "annotations", "raw_sequences", "sequences", "modifications" and "ignored_lines"
  - Add get_mods_composition() that returns a composition from a list of Unimod PTMs
  - Remove the appended "\_stfi" if -o flag is provided

**0.3.0** (2019-04-18)

*Changed*
- Add support for files with annotations before the sequences
  - **Breaking change:** sequence_parser() now returns (annotations, sequences, ignored_lines)
  - seq_to_tsv() now accepts (sequences, unlabelled_aa, annotations=None)

*Fixed*
- Output files now have "\_stfi" appended to differentiate from .tsv input files with the same name

**0.2.1** (2019-04-17)
- Format CHANGELOG

**0.2.0** (2019-04-17)
- Add bumpversion

*Changed*
- seq_to_tsv() no longer writes a file, instead it returns a dataframe

**0.1.0** (2019-04-08)
- First release

BSD 3-Clause License

Copyright (c) 2019, Lilian Yang-crosson All rights reserved.
Copyright (c) 2019, Pierre Poulain All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

* Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


