Metadata-Version: 2.1
Name: ivers
Version: 0.1.3
Summary: Python package for stratified dataset splitting based on endpoint distributions, plus two temporal split methods. Chemprop compatible.
Home-page: http://github.com/iversohlsson/ivers
Author: Philip Ivers Ohlsson
Author-email: philip.iversohlsson@gmail.com
Project-URL: Documentation, http://github.com/iversohlsson/ivers/docs/_build/html/index.html
Project-URL: Source, http://github.com/iversohlsson/ivers
Keywords: chemprop chemistry data science dataset splitting stratification temporal splits ivers
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas
Requires-Dist: scikit-learn

# Ivers


This project offers tools for managing data splits while preserving endpoint distributions, and introduces two novel temporal split techniques: 'leaky' and 'all for free' splits. Both are described under Features below.

**Note**: This library was used in this paper [PlaceHolder](https://github.com/IversOhlsson/ivers) to generate the data splits.

## Features
  - **Temporal Leaky**: Allows for forward-leakage in your data to simulate real-world scenarios where future data might influence the model subtly.
  - **Temporal AllForFree**: Provides a stricter temporal separation, ensuring that the training data is entirely independent of the test set, suitable for rigorous testing of model predictions over time.
  - **Temporal Fold Split**: Implements a novel approach that successively increases the training set size across multiple folds, following the temporal order of the data.
  - **Stratified Endpoint Split**: Our library introduces a stratified endpoint split, crucial for maintaining a consistent distribution of data across different categories or endpoints in your datasets. It is especially useful in scenarios where endpoint distributions are critical, such as in cheminformatics and bioinformatics.
  - **Cross-Validation Support**: Integrates capabilities to ensure that each cross-validation split maintains endpoint distribution, ideal for developing models that are generalizable across varied data conditions.
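
As a rough illustration of two of the ideas listed above, the sketch below shows an expanding temporal fold split and an endpoint-stratified split in plain Python. It is not the library's implementation or API: the function names `expanding_temporal_folds` and `stratified_endpoint_split`, the record layout, and the choice to stratify on which endpoints have a measured value are all assumptions made for this example.

```python
import random
from collections import defaultdict


def expanding_temporal_folds(records, date_key, n_folds):
    """Yield (train, test) pairs where the training set grows fold by fold."""
    # ISO-formatted date strings sort chronologically.
    ordered = sorted(records, key=lambda r: r[date_key])
    # Cut the timeline into n_folds + 1 contiguous chunks of near-equal size.
    n_chunks = n_folds + 1
    bounds = [round(i * len(ordered) / n_chunks) for i in range(n_chunks + 1)]
    for fold in range(1, n_chunks):
        train = ordered[:bounds[fold]]                 # everything up to this point in time
        test = ordered[bounds[fold]:bounds[fold + 1]]  # the next chunk in time
        yield train, test


def stratified_endpoint_split(records, endpoints, test_fraction, seed=0):
    """Split so every endpoint-availability pattern keeps about the same test share."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        # Stratum (assumption of this sketch): which endpoints are measured.
        pattern = tuple(rec.get(ep) is not None for ep in endpoints)
        groups[pattern].append(rec)
    train, test = [], []
    for members in groups.values():
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Hypothetical toy data: eight dated records, one per month.
timeline = [{"id": i, "date": f"2021-{i + 1:02d}-01"} for i in range(8)]
folds = list(expanding_temporal_folds(timeline, "date", n_folds=3))
for fold_train, fold_test in folds:
    print(len(fold_train), len(fold_test))

# Hypothetical toy data: ten records per endpoint, values are made up.
assays = ([{"id": i, "logD": 1.0, "solubility": None} for i in range(10)]
          + [{"id": i, "logD": None, "solubility": 2.0} for i in range(10, 20)])
train, test = stratified_endpoint_split(assays, ["logD", "solubility"], test_fraction=0.2)
print(len(train), len(test))
```

In the temporal folds, every training record is dated before every test record of the same fold, and each fold adds the previous test chunk to the training set. In the stratified split, each endpoint contributes the same fraction of its records to the test set.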

## Integration with Chemprop

- By setting the `chemprop` variable to `true`, the library will generate splits compatible with the Chemprop library. This ensures that the features and train-test splits are generated in a way that can easily be used with Chemprop.
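
For orientation, the sketch below writes one split to a CSV file with the SMILES column first and the target columns after it, which is the kind of tabular layout Chemprop reads. It is independent of this library and does not show its output format: the helper `write_split_csv`, the column names, the example values, and the choice to write missing endpoint values as empty cells are all assumptions of this example.

```python
import csv
import tempfile
from pathlib import Path


def write_split_csv(rows, path, smiles_column, target_columns):
    """Write one split as a CSV with the SMILES column first, then the targets."""
    fieldnames = [smiles_column, *target_columns]
    with open(path, "w", newline="") as handle:
        # extrasaction="ignore" drops keys (such as dates) that are not model inputs.
        writer = csv.DictWriter(handle, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        for row in rows:
            # Missing endpoint values become empty cells (assumption of this sketch).
            writer.writerow({name: "" if row.get(name) is None else row[name]
                             for name in fieldnames})
    return path


# Hypothetical rows; the logD value is made up for illustration.
train_rows = [
    {"smiles": "CCO", "logD": -0.3, "date": "2020-01-01"},
    {"smiles": "c1ccccc1", "logD": None, "date": "2020-02-01"},
]
out_dir = Path(tempfile.mkdtemp())
train_path = write_split_csv(train_rows, out_dir / "train.csv", "smiles", ["logD"])
print(train_path.read_text())
```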

## Getting Started or Contributing

To get started with this library, clone the repository and install the required dependencies:

```bash
git clone https://github.com/IversOhlsson/ivers.git
cd ivers
pip install -r requirements.txt
```

## Installation via pip
You can also install the package via pip:
```bash
pip install ivers
```
We welcome contributions! Feel free to open issues or pull requests on our GitHub repository.

## Guide

## Reference
When using this library, please cite the following paper:
```
@article{Ivers_1,
  title={PlaceHolder},
  author={PlaceHolder},
  journal={PlaceHolder},
  volume={PlaceHolder},
  number={PlaceHolder},
  pages={PlaceHolder},
  year={PlaceHolder},
  publisher={PlaceHolder}
}
```
