Metadata-Version: 2.1
Name: lefshift
Version: 0.5.2
Summary: Predict 19F chemical shift
Author-email: Patrick Penner <patrick@ppenner.com>
Project-URL: Homepage, https://github.com/PatrickPenner/lefshift
Project-URL: Issues, https://github.com/PatrickPenner/lefshift/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE.md
Requires-Dist: numpy>=1.23.5
Requires-Dist: pandas>=1.5.2
Requires-Dist: rdkit>=2022.09.3
Requires-Dist: scikit-learn>=1.2.0
Requires-Dist: xgboost>=1.7.1

# Local Environment of Fluorine (LEF) Shift Prediction Tools

## Getting Started

A PyPI package is available:
```
pip install lefshift
```

You can also clone the repo manually and then install the lefshift package:
```
git clone https://github.com/PatrickPenner/lefshift.git
cd lefshift
pip install .
```
Be aware that the `data/` and `model/` directories used in the following
examples are in the `lefshift` repo.

Running `lefshift --help` gives you an overview of how to use the lefshift
command-line tool. You can quickly train a model using the Enamine data in
`data/` like this:
```
lefshift train data/train.csv --model models/my_model --id-column 'Catalog ID' --shift-column 'Shift 1 (ppm)' --smiles-column 'SMILES' --verbose
```
Training requires three columns: an ID column, a chemical shift column, and a
SMILES column. The above command names those columns explicitly. You can also
check the `lefshift train --help` output to see the default column names that
lefshift expects.
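
Before training on your own data, it can help to confirm that the CSV header actually contains the three columns. The following is a minimal sketch using only the standard library. The helper `missing_columns` is hypothetical and not part of lefshift, and the column names are the ones from the command above.

```python
import csv
import io

# Column names taken from the training command above.
REQUIRED_COLUMNS = ["Catalog ID", "Shift 1 (ppm)", "SMILES"]


def missing_columns(csv_file, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.reader(csv_file)
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Tiny in-memory example; the row values are made up for illustration.
example = io.StringIO("Catalog ID,SMILES\nEXAMPLE-1,FC(F)(F)c1ccccc1\n")
print(missing_columns(example))
```

For a real file, open it with `open("data/train.csv", newline="")` and pass the handle to `missing_columns`. An empty list means all three columns are present.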

You can perform a prediction with the model you just trained with the following command:
```
lefshift predict data/test.csv --model models/my_model data/test_predicted.csv --smiles-column 'SMILES' --verbose
```
Prediction only requires a SMILES column. It is once again given explicitly,
but you can omit the option if the input has a column named "SMILES".
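
If you run predictions from a Python script, one option is to assemble the command above as an argument list and hand it to `subprocess`. This is only a sketch: `build_predict_command` is a hypothetical helper, not part of lefshift, and it uses just the flags and argument order shown in the command above.

```python
import shlex


def build_predict_command(input_csv, model, output_csv,
                          smiles_column=None, verbose=False):
    """Assemble the argument list for `lefshift predict` as shown above."""
    command = ["lefshift", "predict", input_csv, "--model", model, output_csv]
    if smiles_column is not None:
        # Only needed when the SMILES column is not named "SMILES".
        command += ["--smiles-column", smiles_column]
    if verbose:
        command.append("--verbose")
    return command


command = build_predict_command(
    "data/test.csv", "models/my_model", "data/test_predicted.csv",
    smiles_column="SMILES", verbose=True,
)
# Print a shell-quoted version for inspection before running anything.
print(shlex.join(command))
```

To execute it, pass the list to `subprocess.run(command, check=True)`. Using a list rather than a single string avoids shell quoting problems with column names that contain spaces.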

To split a data set into the samples inside the applicability domain of the
model and those outside it, run this command:
```
lefshift split data/test.csv --model models/my_model data/test_known.csv data/test_unknown.csv --verbose
```
Splitting also only requires a SMILES column. The SMILES column in the input is
already called "SMILES" so we don't have to be explicit.
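
After a split, a quick way to see how much of a data set falls inside the applicability domain is to count the rows in the two output files. The sketch below assumes that both outputs are CSV files with a header row and one sample per row; that layout is an assumption, so check your actual files. The helper names are hypothetical.

```python
import csv
import io


def count_rows(csv_file):
    """Count non-empty data rows in an open CSV file, skipping the header."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row (assumed to exist)
    return sum(1 for row in reader if row)


def known_fraction(known_file, unknown_file):
    """Fraction of samples that landed in the 'known' output file."""
    known = count_rows(known_file)
    unknown = count_rows(unknown_file)
    total = known + unknown
    return known / total if total else 0.0


# In-memory stand-ins for the two split outputs; the contents are made up.
known = io.StringIO("SMILES\nFc1ccccc1\nFc1ccncc1\nFC(F)(F)c1ccccc1\n")
unknown = io.StringIO("SMILES\nFS(F)(F)(F)(F)c1ccccc1\n")
print(known_fraction(known, unknown))
```

With real files, open `data/test_known.csv` and `data/test_unknown.csv` with `newline=""` and pass both handles to `known_fraction`.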

For more advanced usage please check the `--help` output of `lefshift` and the
subtools `train`, `predict`, and `split`. There is also a
`QM Assisted ML Tutorial.md` that describes a workflow to combine `lefshift`
and `lefqm`.

## Development

### Formatting pre-commit hook

Pre-commit is used only for consistent formatting, which is based on the
black formatter and its code style.

Install the formatting pre-commit hook with:
```
pre-commit install
```

### Code Quality

The thresholds below are targets with some leniency, not hard limits.

| Criterion              | Threshold     |
| ---------------------- |:-------------:|
| pylint                 | \>9.0         |
| coverage (overall)     | \>90%         |
| coverage (single file) | \>80%         |

### Utility commands

Pre-commit on one file (replace `<path/to/file>` with the file to check):
```
pre-commit run --files <path/to/file>
```

Test command:

```
python -m unittest tests
```

PyLint command:

```
pylint lefshift tests > pylint.out
```
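
To compare the score in `pylint.out` against the pylint threshold from the Code Quality table, a small script can pull the score from pylint's summary line. This is a sketch that assumes the usual `Your code has been rated at X/10` summary line is present in the report. The helper `pylint_score` is hypothetical and the sample line is made up.

```python
import re

PYLINT_THRESHOLD = 9.0  # from the Code Quality table
SCORE_PATTERN = re.compile(r"rated at (-?\d+(?:\.\d+)?)/10")


def pylint_score(report_text):
    """Return the score from pylint's summary line, or None if it is absent."""
    match = SCORE_PATTERN.search(report_text)
    return float(match.group(1)) if match else None


# Made-up sample of the summary line at the end of a pylint report.
sample = "Your code has been rated at 9.52/10 (previous run: 9.40/10, +0.12)\n"
score = pylint_score(sample)
print(score, score is not None and score > PYLINT_THRESHOLD)
```

For a real report, read the text with `open("pylint.out").read()` and pass it to `pylint_score`.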

Coverage commands:
```
coverage run -m unittest tests
coverage html
```
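
The coverage thresholds from the Code Quality table can be checked in a similar way. The sketch below assumes a report produced with `coverage json`, which contains a `totals` entry and a per-file `summary`, each with a `percent_covered` field. The helper `coverage_failures`, the file names, and the numbers in the sample are made up for illustration.

```python
import json

OVERALL_THRESHOLD = 90.0      # coverage (overall), from the Code Quality table
SINGLE_FILE_THRESHOLD = 80.0  # coverage (single file)


def coverage_failures(report):
    """List the entries of a coverage JSON report that miss their threshold."""
    failures = []
    overall = report["totals"]["percent_covered"]
    if overall <= OVERALL_THRESHOLD:
        failures.append(f"overall: {overall:.1f}%")
    for path, entry in sorted(report["files"].items()):
        percent = entry["summary"]["percent_covered"]
        if percent <= SINGLE_FILE_THRESHOLD:
            failures.append(f"{path}: {percent:.1f}%")
    return failures


# Made-up report with hypothetical file names, reduced to the fields used here.
sample = json.loads("""
{
  "totals": {"percent_covered": 92.5},
  "files": {
    "lefshift/example_a.py": {"summary": {"percent_covered": 97.0}},
    "lefshift/example_b.py": {"summary": {"percent_covered": 75.0}}
  }
}
""")
print(coverage_failures(sample))
```

With a real run, load the report with `json.load(open("coverage.json"))`. Given the leniency noted above, the result is best read as a list of files to look at rather than a hard gate.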
