Metadata-Version: 2.1
Name: comptox_analysis
Version: 1.0
Summary: ML analysis
Home-page: https://github.com/PARODBE/comptox_analysis
Download-URL: https://github.com/PARODBE/comptox_analysis
Author: Pablo Rodríguez Belenguer
Author-email: parodbe@gmail.com
License: GNU
Description-Content-Type: text/markdown
License-File: LICENSE

[![Python](https://img.shields.io/pypi/pyversions/torchquad)](https://img.shields.io/pypi/pyversions/torchquad)
[![License](https://img.shields.io/badge/license-GPLv3-blue)](https://img.shields.io/badge/license-GPLv3-blue)

### 
In January 2024, I defended my doctoral thesis in computational toxicology, and since then, I have been contemplating how to contribute to the community with code that can simplify calculations needed according to the specific problem. Over these years, among many other things, I have worked on topics related to interpretability, visualization of chemical space, and combining the outputs of different models. In all cases, the problem was common: I often needed a lot of code for tasks that were part of my daily routine. That's why I decided to create ``comptox_analysis``.


# Installation

## Dependencies

comptox_analysis requires:

- rdkit=2023.09.6
- bokeh=3.1.1
- scikit-learn
- matplotlib
- numpy
- pandas

_ _ _

## User installation

The easiest way to install comptox_analysis is using ```pip```:

```bash
pip install comptox_analysis
```

_ _ _

# Example of Usage for Interpretability using highlighter

Imagine you obtain the feature importance, which in this case are fingerprints, using a ML or DL model, and you want to highlight the most important fingerprints for each molecule. To carry out this task, we first load the necessary libraries and the dataset, build a Random Forest model as an example, and then extract the feature importance. 

```python
# Import library
from comptox_analysis.highlighting.highlighting_atoms import highlighter
from rdkit import Chem
from rdkit.Chem import PandasTools
from rdkit.Chem.AllChem import GetMorganFingerprintAsBitVect
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

df=PandasTools.LoadSDF('caco2.sdf',smilesName='Smiles')

def fingerprints_inputs2(dataframe):
        X=np.array([GetMorganFingerprintAsBitVect(mol,radius=2,nBits=2048,useFeatures=True) for mol in [Chem.MolFromSmiles(m) for m in list(dataframe.Smiles)]])
        y=dataframe.activity.astype('float')
        return X,y

X,y=fingerprints_inputs2(df)

model=RandomForestRegressor(random_state=46).fit(X,y)

columns=[f'fp_{i}' for i in range(2048)]

imp=pd.Series(data=model.feature_importances_,index=columns).sort_values(ascending=False)
imp[:10].plot.bar();

```

<p align="center">
  <img src="https://raw.githubusercontent.com/phi-grib/comptox_analysis/main/images/importances.PNG" alt="Cover Page">
</p>

We are going to select compounds with two of the ten most important variables (in this case, we obtain a list with three compounds). Now, we render the image of the first compound using ```render_image``` function:

```python

df_fp=pd.DataFrame(X,columns=columns)

indexes=df_fp[(df_fp.fp_576==1)&(df_fp.fp_779==1)].index.values

mols=df.loc[indexes,'Smiles'].tolist()

highlighter_instance=highlighter(mols)

highlighter_instance.render_image(number=0,indexes=False)

```

<p align="center">
  <img src="https://raw.githubusercontent.com/phi-grib/comptox_analysis/main/images/molecule_rend..PNG" alt="Cover Page">
</p>


Also, you can visualize molecular fragments using ```fragmentation``` function:

```python

highlighter_instance.fragmentation(n=26,number=0)[0]

```

<p align="center">
  <img src="https://raw.githubusercontent.com/phi-grib/comptox_analysis/main/images/frag..PNG" alt="Cover Page">
</p>

And now, using ```highlighting``` function, you will be able to observe the fragments that you want to highlight (the package supports both, rdkit and morgan fingerprints):

```python

highlighter_instance = highlighter([mols[0]])

highlighter_instance.highlighting(type='morgan',fingerprint_numbers=[[1300]])

```

<p align="center">
  <img src="https://raw.githubusercontent.com/phi-grib/comptox_analysis/main/images/one_compound_highl..PNG" alt="Cover Page">
</p>

Also, it is possible to highlight multiple compounds (in this case we are highlighting the selected fingerprints as an example):

```python

#for multiple compounds

highlighter_instance = highlighter(mols)

highlighter_instance.highlighting(fingerprint_numbers=[[576,779],[576,779],[576,779]])


```

<p align="center">
  <img src="https://raw.githubusercontent.com/phi-grib/comptox_analysis/main/images/multiple_highl..PNG" alt="Cover Page">
</p>


_ _ _

# Example of Usage for chemical space visualization

Another option which is included in this library is visualize the chemical space. It is supported the use of three kind of descriptor:

- molecular descriptor: Mollogp, molweight... (207 variables)
- Morgan fingerprints
- Rdkit fingerprints

If you select molecular descriptor, a Bokeh plot will display the results of a PCA. For any type of fingerprint, a parametric t-SNE is constructed, where the t-SNE output is trained using an MLPRegressor. With just one line of code, you can generate an interactive HTML visualization of the chemical space, including molecular structures that appear when hovering over each scatter point. 

Both training and test sets can be visualized (if applicable), and the points on the graph can be color-coded according to the class each compound belongs to, supporting multi-class classification problems. Additionally, you can deselect compounds from specific classes by clicking on the corresponding legend in the plot.

```python

plotter = chemical_space_plotter('cox2_train.sdf', 'cox2_test.sdf')
plotter.visualizer(type='molecular_descriptor', test=True, Save=True)
```
