Metadata-Version: 2.1
Name: proteoformquant
Version: 1.16
Summary: A python command line tool for the quantification of peptidoform/proteoforms
Home-page: 
Author: Arthur Grimaud
Author-email: arthur582@hotmail.fr
Classifier: Programming Language :: Python :: 3.8
Description-Content-Type: text/markdown

# Proteoformquant

Proteoformquant is a Python tool for quantitative analysis of proteoforms from mass spectrometry data.

# Setup/Installation

## Via PyPI repository (Recommended)

### 1. Install Proteoformquant package

The simplest way to use Proteoformquant is to install it as a package from the PyPI repository using pip:

```bash
pip install proteoformquant
```
Once installed, Proteoformquant can be run directly from a terminal.

### 2. Run Proteoformquant

```bash
proteoformquant
```
To display the help message, run:

```bash
proteoformquant -h
```

More information on how to use Proteoformquant is available in the "Usage" section of this document.

## Via GitHub (Alternative)

Alternatively, you can clone the repository from GitHub and install the dependencies manually in a dedicated Conda environment.

### 1. Install Conda/Mamba

If it is not already installed, install Conda by following the [official installation guide](https://docs.conda.io/projects/conda/en/stable/user-guide/install/index.html#regular-installation).

Optionally, to create the environment with Mamba (a faster alternative to Conda), install Mamba in the base environment by running the following command in a terminal:

```bash
conda install mamba -n base -c conda-forge
```
### 2. Clone the Proteoformquant repository

```bash
git clone https://github.com/arthur-grimaud/Proteoformquant.git
```
### 3. Create and activate the environment

Next, create the environment with either Conda or Mamba by running one of the following commands in the folder containing `environment.yml`:

```bash
# With Conda
conda env create --file environment.yml
# With Mamba
mamba env create --file environment.yml
```
You should now be able to activate the environment:

```bash
# With Conda
conda activate pfq-env
# With Mamba
mamba activate pfq-env
```
### 4. Run Proteoformquant

From the root of the cloned repository, run the `proteoformquant.py` script located in `src/proteoformquant`:

```bash
python3 src/proteoformquant/proteoformquant.py
```

## Usage

(N.B. The commands listed here assume that Proteoformquant was installed as a package. If you used the GitHub installation method, replace `proteoformquant` with `python3 src/proteoformquant/proteoformquant.py`.)

Proteoformquant requires three input files:

- a spectra file (.mgf or .mzML)
- an identification file (.mzid) (recommended: MSAmanda output)
- a parameter file (.json)

A parameter file can be generated by running:

```bash
proteoformquant -cp
```

If you keep the default name and location of the parameter file, you can run Proteoformquant as follows:

```bash
proteoformquant -i path/to/identification/file.mzid -s path/to/spectra/file.mgf
```
By default, this creates an output folder `output/` in the current directory. To change the output location, use the `-d` parameter:

```bash
proteoformquant -i path/to/identification/file.mzid -s path/to/spectra/file.mgf -d path/to/my_output_folder
```

Similarly, you can change the output file name with the `-o` parameter:

```bash
proteoformquant -i path/to/identification/file.mzid -s path/to/spectra/file.mgf -d path/to/my_output_folder -o output_file_1
```
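To process several runs, the documented `-i`, `-s`, `-d` and `-o` flags can be combined in a small Python wrapper. The sketch below assumes that the `proteoformquant` command is on your `PATH` and that the parameter file is in its default location; the function names `build_command` and `run_batch` and the example file names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_command(identification, spectra, output_dir=None, output_name=None):
    """Assemble a proteoformquant call using only the -i, -s, -d and -o flags."""
    cmd = ["proteoformquant", "-i", str(identification), "-s", str(spectra)]
    if output_dir is not None:
        cmd += ["-d", str(output_dir)]  # output folder (default: output/)
    if output_name is not None:
        cmd += ["-o", str(output_name)]  # output file name
    return cmd


def run_batch(runs, output_dir):
    """Run proteoformquant once per (identification, spectra) pair.

    The output name is derived from each spectra file name (hypothetical
    convention) so that results from different runs stay apart.
    """
    for identification, spectra in runs:
        cmd = build_command(identification, spectra, output_dir, Path(spectra).stem)
        subprocess.run(cmd, check=True)  # raises CalledProcessError if a run fails


# Example (hypothetical file names):
# run_batch([("run1.mzid", "run1.mgf"), ("run2.mzid", "run2.mgf")], "path/to/my_output_folder")
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.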

## Output Format

### Quantification File 

The quantification table ("quant_XXX.csv") contains the following columns:

* proforma: Peptidoform in ProForma notation.
* sequence: Peptidoform amino acid sequence.
* brno: Modification notation indicating the type and location of post-translational modifications on the amino acid sequence.
* protein: Accession numbers of the proteins the peptidoform is associated with, delimited by a semicolon if multiple.
* intensity: Absolute intensity of the peptidoform after quantification in chimeric spectra.
* intensity_r1: Absolute intensity of the peptidoform using only rank 1 PSMs.
* linked_psm: The total number of PSMs corresponding to the peptidoform.
* linked_psm_validated: The number of PSMs validated after quantification in chimeric spectra.
* rt_peak: The retention time, in seconds, at the apex of the elution profile.
* auc: The area under the elution curve, which can be used for quantification but is not recommended.
* ambiguity: The number of spectra in which the peptidoform is identified but where site-determining ions were missing, preventing confident validation of all peptidoforms.


### Additional output files 

#### PSM file ("psm_XXX.csv"):

* spec: Index or identifier of each spectrum.
* rank: Rank of the PSM, with a lower number indicating higher confidence.
* sequence: Amino acid sequence of the peptide/protein.
* brno: Modifications in brno nomenclature.
* proforma: Peptidoform in ProForma notation.
* score: Match score of the peptide spectrum match (from the identification file provided).
* validated: Boolean value indicating whether the PSM has been validated.
* frag_cov: Proportion of the theoretical fragments observed.
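A common first step with the PSM table is to keep only validated PSMs. The sketch below uses the standard `csv` module and assumes a comma-separated file with a header row, a `validated` column written as text such as `True`/`False`, and numeric `rank` values; these formats and the function names are assumptions, not documented behavior.

```python
import csv


def is_true(value):
    """Interpret a CSV boolean cell; assumes values such as 'True'/'False' or '1'/'0'."""
    return str(value).strip().lower() in {"true", "1", "yes"}


def validated_psms_from_rows(rows, max_rank=1):
    """Keep validated PSMs whose rank is at most max_rank (lower rank = higher confidence)."""
    kept = []
    for row in rows:
        if is_true(row["validated"]) and int(float(row["rank"])) <= max_rank:
            kept.append(row)
    return kept


def validated_psms(path, max_rank=1):
    """Read a psm_XXX.csv file (assumed comma-separated with a header row)."""
    with open(path, newline="") as handle:
        return validated_psms_from_rows(csv.DictReader(handle), max_rank)
```

Raising `max_rank` keeps validated lower-ranked PSMs as well, which can matter for chimeric spectra where more than one peptidoform is validated per spectrum.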

#### Log file ("log_XXX.csv"):

General information on the number of PSMs and peptidoforms validated and not validated at each processing step.

#### Obj file ("obj_XXX.pkl"):

A pickled Python `ms_run` object, intended for visualization (work in progress).
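A minimal sketch for loading this file, assuming it is a standard pickle written with Python's `pickle` module; `load_run` is a hypothetical helper. Unpickling needs the modules that define the pickled classes (presumably those of the proteoformquant package) to be importable, and pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.

```python
import pickle


def load_run(path):
    """Unpickle an obj_XXX.pkl file and return the stored object.

    Only load files you produced yourself: unpickling untrusted data can
    execute arbitrary code.
    """
    with open(path, "rb") as handle:
        return pickle.load(handle)
```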


## Contributing

To update

## License

To update
