Metadata-Version: 2.1
Name: SMPrecursorPrediction
Version: 0.0.2
Summary: SMPrecursorPrediction
Author: Joao Capela, João Cheixo
License: MIT
Keywords: plants,secondary metabolism
Platform: unix
Platform: linux
Platform: osx
Platform: cygwin
Platform: win32
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: biopython==1.81
Requires-Dist: lxml==4.9.2
Requires-Dist: BeautifulSoup4==4.12.2
Requires-Dist: html5lib==1.1
Requires-Dist: luigi==3.2.1
Requires-Dist: pandas==2.0.1
Requires-Dist: networkx==3.1
Requires-Dist: deepmol[machine-learning]==1.1.2
Requires-Dist: retry
Requires-Dist: IPython
Provides-Extra: testing
Requires-Dist: pytest>=7.1.1; extra == "testing"
Requires-Dist: pytest-cov>=3.0.0; extra == "testing"
Requires-Dist: mypy>=0.942; extra == "testing"
Requires-Dist: flake8>=4.0.1; extra == "testing"
Requires-Dist: tox>=3.25.0; extra == "testing"

# SMPrecursorPredictor
A ML pipeline for the prediction of specialised metabolites starting substances.

## Installation

### Manually

1. Clone the repository and move into the directory:

```bash
git clone
cd SMPrecursorPredictor
```

2. Create a conda environment and activate it:

```bash
conda create -n sm_precursor_predictor python=3.10
conda activate sm_precursor_predictor
```

3. Install the dependencies:

```bash
pip install -r requirements.txt
```

4. Install the package:

```bash
pip install .
```

### Pypi

1. Create a conda environment and activate it:

```bash
conda create -n sm_precursor_predictor python=3.10
conda activate sm_precursor_predictor
pip install SMPrecursorPrediction
```

## Making predictions

Models available: 

- Layered FP + Low Variance FS + Ridge Classifier
- Morgan FP + Ridge Classifier

```python
from sm_precursor_predictor import predict_precursors
precursors = predict_precursors(
            ["[H][C@]89CN(CCc1c([nH]c2ccccc12)[C@@](C(=O)OC)(c3cc4c(cc3OC)N(C)[C@@]5([H])[C@@]"
             "(O)(C(=O)OC)[C@H](OC(C)=O)[C@]7(CC)C=CCN6CC[C@]45[C@@]67[H])C8)C[C@](O)(CC)C9",
             "COC1=C(C=CC(=C1)C2=C(C(=O)C3=C(C=C(C=C3O2)O)O)O[C@H]4[C@@H]([C@H]([C@H]([C@H](O4)CO)O)O)O)O"],
             model="Layered FP + Low Variance FS + Ridge Classifier")
print(precursors)
```

or

read a csv file with a column of SMILES and a column of IDs and save the predictions in a csv file:

```python
from sm_precursor_predictor import predict_from_csv
predictions = predict_from_csv("path_to_csv", 
                               smiles_field="SMILES", 
                               ids_field="ID",
                               model="Layered FP + Low Variance FS + Ridge Classifier")
predictions.to_csv("path_to_save_predictions.csv")
```

## Making and explaining predictions

This is only possible with one model: *Morgan FP + Ridge Classifier*.

Example with linalool:

```python
from sm_precursor_predictor import get_prediction_and_explanation

prediction, images, plots = get_prediction_and_explanation(smiles="CC(=CCCC(C)(C=C)O)C", threshold=0.20)
```
![feature_importance](feature_importance.png)

```
prediction
```


```
['Geranyl diphosphate']
```

```
images[0]
```
![Linalool](molecule_Geranyl_diphosphate.png)
