Metadata-Version: 2.1
Name: infoalign
Version: 0.1.1
Summary: A package for the paper: learning molecular representation in a cell
Author-email: Gang Liu <gliu7@nd.edu>
License: MIT
Project-URL: Homepage, https://github.com/liugangcode/InfoAlign
Project-URL: Bug Tracker, https://github.com/liugangcode/InfoAlign/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=2.2.0
Requires-Dist: torch-geometric>=2.6.1
Requires-Dist: numpy
Requires-Dist: pandas>=2.2.3
Requires-Dist: click
Requires-Dist: huggingface_hub>=0.22.2
Requires-Dist: joblib>=1.3.2
Requires-Dist: networkx>=3.2.1
Requires-Dist: ogb>=1.3.6
Requires-Dist: PyYAML>=6.0.2
Requires-Dist: rdkit>=2023.9.5
Requires-Dist: scikit_learn>=1.4.1.post1
Requires-Dist: scipy>=1.14.1
Requires-Dist: tqdm>=4.66.2
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: isort; extra == "dev"
Requires-Dist: flake8; extra == "dev"

# The Package for InfoAlign: Learning Molecular Representation in a Cell

**InfoAlign** is a package for learning molecular representations from bottleneck information derived from molecular structures, cell morphology, and gene expression. For details, please refer to our [paper](https://arxiv.org/abs/2406.12056v3).

This package uses a pretrained model based on the method described in the [paper](https://arxiv.org/abs/2406.12056v3). It takes molecules as input (e.g., a single SMILES string or a list of SMILES strings) and outputs their learned representations. These molecular representations can be applied to various downstream tasks, such as molecular property prediction.

For related projects by the main ML researcher and developer, visit: [https://github.com/liugangcode/InfoAlign](https://github.com/liugangcode/InfoAlign).

## Installation

Install the package via pip:

```
pip install infoalign
```

## Usage

### Command Line Interface (CLI)
```
# --output-to-input-column adds the representation to the input CSV as an additional column
infoalign_pred --input {path_to_input_smiles.csv} \
               --output {path_to_output.npy} \
               --output-to-input-column
```

### Python API
```
from infoalign.representer import InfoAlignRepresenter

model = InfoAlignRepresenter(model_path='infoalign_model/pretrain.pt')

# For a single SMILES string
one_rep = model.predict('CCC')

# For a list of SMILES strings
two_reps = model.predict(['CCC', 'CCC'])
```
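
As one illustration of a downstream use, the representations can be compared with cosine similarity. The sketch below is a minimal example and not part of the package: `cosine_similarity` is a hypothetical helper, and the three vectors are hand-written placeholders rather than real InfoAlign outputs. It assumes each representation can be treated as a flat sequence of floats of equal length, which the package may or may not guarantee.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length representation vectors."""
    if len(a) != len(b):
        raise ValueError("representations must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Placeholder vectors standing in for rows returned by model.predict([...]).
# They are NOT real InfoAlign outputs (assumption for illustration only).
rep_a = [0.2, -0.1, 0.7]
rep_b = [0.2, -0.1, 0.7]
rep_c = [-0.7, 0.1, 0.2]

same = cosine_similarity(rep_a, rep_b)       # identical vectors
different = cosine_similarity(rep_a, rep_c)  # nearly orthogonal vectors
```

With real outputs, the placeholder lists would be replaced by individual rows of the array returned by `model.predict`, converted to plain sequences if needed.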

## Citation

If you find this repository helpful, please cite our paper:

```
@article{liu2024learning,
  title={Learning Molecular Representation in a Cell},
  author={Liu, Gang and Seal, Srijit and Arevalo, John and Liang, Zhenwen and Carpenter, Anne E and Jiang, Meng and Singh, Shantanu},
  journal={arXiv preprint arXiv:2406.12056},
  year={2024}
}
```

## Acknowledgement

This project template was adapted from: [https://github.com/lwaekfjlk/python-project-template](https://github.com/lwaekfjlk/python-project-template). We thank the authors for their open-source contribution.
