Metadata-Version: 2.1
Name: data-drift-detector
Version: 0.0.4
Summary: Compare differences between 2 datasets to identify data drift
Home-page: https://github.com/kelvnt/data-drift-detector
Author: Kelvin Tay
Author-email: btkelvin@gmail.com
License: GPLv3
Platform: UNKNOWN
Requires-Python: >=3.6.0
Description-Content-Type: text/markdown
Requires-Dist: category-encoders (>=2.2.2)
Requires-Dist: matplotlib (>=3.4.3)
Requires-Dist: numpy (>=1.19.0)
Requires-Dist: pandas (==1.0.0)
Requires-Dist: scikit-learn (>=0.24.1)
Requires-Dist: scipy (>=1.5.4)
Requires-Dist: seaborn (>=0.11.2)

# Data Drift Detector
[![PyPI version](https://badge.fury.io/py/data-drift-detector.svg)](https://badge.fury.io/py/data-drift-detector)

This package contains developmental tools to detect and compare statistical differences between two structurally similar pandas DataFrames (e.g., DataFrames sharing the same columns). Its intended purpose is to detect data drift, a change over time in the statistical properties of an input variable.
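
To make drift in a single numeric variable concrete, a common generic check is the two-sample Kolmogorov-Smirnov (KS) test. It compares the empirical distribution of a column in the two datasets. The minimal sketch below applies `scipy.stats.ks_2samp` to every numeric column that both DataFrames share. The helper name `numeric_drift_report` and the 0.05 significance threshold are illustrative assumptions. This is not necessarily how `DataDriftDetector` measures drift.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp


def numeric_drift_report(df_prior: pd.DataFrame, df_post: pd.DataFrame, alpha: float = 0.05) -> pd.DataFrame:
    """Hypothetical helper (not part of data-drift-detector's API).

    Runs a two-sample KS test on every numeric column present in both frames.
    """
    rows = []
    shared = [c for c in df_prior.columns if c in df_post.columns]
    for col in shared:
        # skip columns that are not numeric in both datasets
        if not (pd.api.types.is_numeric_dtype(df_prior[col])
                and pd.api.types.is_numeric_dtype(df_post[col])):
            continue
        result = ks_2samp(df_prior[col].dropna(), df_post[col].dropna())
        rows.append({
            "column": col,
            "ks_statistic": result.statistic,
            "p_value": result.pvalue,
            "drift": result.pvalue < alpha,  # flag drift below the assumed threshold
        })
    return pd.DataFrame(rows, columns=["column", "ks_statistic", "p_value", "drift"])


# synthetic example: "x" keeps its distribution, "y" shifts its mean from 5 to 6
rng = np.random.default_rng(0)
df_prior = pd.DataFrame({"x": rng.normal(0, 1, 1000), "y": rng.normal(5, 2, 1000)})
df_post = pd.DataFrame({"x": rng.normal(0, 1, 1000), "y": rng.normal(6, 2, 1000)})
print(numeric_drift_report(df_prior, df_post))
```

With 1,000 rows per dataset, the shifted column `y` should be flagged, while `x` will usually not be. With very large datasets the KS test flags even tiny shifts, so the statistic itself is often more informative than the p-value.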

The package provides a `DataDriftDetector` class, which takes two pandas DataFrames and offers methods to compare and analyze the differences between them.
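
Because the comparison assumes the two DataFrames are structurally similar, it can help to verify their schemas before constructing the detector. The sketch below is a hypothetical helper, `check_same_structure`, which is not part of the package. It reports columns missing from either frame and dtype mismatches in shared columns.

```python
from typing import List

import pandas as pd


def check_same_structure(df_prior: pd.DataFrame, df_post: pd.DataFrame) -> List[str]:
    """Hypothetical helper: return human-readable schema mismatches (empty list if none)."""
    problems = []
    missing_in_post = sorted(set(df_prior.columns) - set(df_post.columns))
    missing_in_prior = sorted(set(df_post.columns) - set(df_prior.columns))
    if missing_in_post:
        problems.append(f"columns missing from df_post: {missing_in_post}")
    if missing_in_prior:
        problems.append(f"columns missing from df_prior: {missing_in_prior}")
    # compare dtypes only for columns present in both frames
    for col in df_prior.columns.intersection(df_post.columns):
        if df_prior[col].dtype != df_post[col].dtype:
            problems.append(
                f"dtype mismatch in {col!r}: {df_prior[col].dtype} vs {df_post[col].dtype}"
            )
    return problems


df_a = pd.DataFrame({"age": [30, 40], "city": ["A", "B"]})
df_b = pd.DataFrame({"age": [31.5, 42.0], "city": ["A", "C"], "extra": [1, 2]})
for problem in check_same_structure(df_a, df_b):
    print(problem)
```

Here `df_b` has an extra column and stores `age` as floats, so both issues are reported.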

## Installation
Install the package with pip:

    pip install data-drift-detector

## Example Usage

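To compare two datasets:

    import pandas as pd
    from data_drift_detector import DataDriftDetector

    # load the two datasets to compare (file names are placeholders)
    df_1 = pd.read_csv("prior_data.csv")
    df_2 = pd.read_csv("post_data.csv")

    # initialize the detector with the earlier (prior) and later (post) datasets
    detector = DataDriftDetector(df_prior=df_1, df_post=df_2)

    # methods to compare and analyze differences
    detector.calculate_drift()
    detector.plot_numeric_to_numeric()
    detector.plot_categorical_to_numeric()
    detector.plot_categorical()
    detector.compare_ml_efficacy(target_column="some_target_column")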

An example notebook, `examples/example_usage.ipynb`, demonstrates in more detail how the package may be used.
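
For categorical columns, a simple way to quantify drift is to compare the proportion of each category in the two datasets and summarize the difference with the total variation distance. This distance is half the sum of absolute differences in proportions, ranging from 0 for identical distributions to 1 for disjoint ones. The helpers below (`category_shift`, `total_variation_distance`) are hypothetical sketches for illustration and are not part of `data-drift-detector`'s API.

```python
import pandas as pd


def category_shift(prior: pd.Series, post: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: category proportions in each dataset and their difference."""
    p = prior.value_counts(normalize=True)
    q = post.value_counts(normalize=True)
    # align on the union of categories; a category absent from one dataset gets proportion 0
    table = pd.concat([p, q], axis=1, keys=["prior", "post"]).fillna(0.0)
    table["diff"] = table["post"] - table["prior"]
    return table


def total_variation_distance(prior: pd.Series, post: pd.Series) -> float:
    """Hypothetical helper: half the sum of absolute proportion differences."""
    return float(category_shift(prior, post)["diff"].abs().sum() / 2)


prior = pd.Series(["a", "a", "b", "b"])
post = pd.Series(["a", "b", "c", "c"])
print(category_shift(prior, post))
print(total_variation_distance(prior, post))  # → 0.5
```

Category `c` never appears in `prior`, so it is filled with a proportion of 0 before the differences are taken.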


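`compare_ml_efficacy` takes a `target_column`, which suggests a model-based comparison. One generic form of such a check, sketched below under that assumption, trains a classifier on the prior data and compares its accuracy on held-out prior rows with its accuracy on the post data. A marked drop suggests the data has shifted in a way that matters for the model. The helper `ml_efficacy_gap`, the random-forest model, and the 70/30 split are illustrative choices, not necessarily what the package does.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def ml_efficacy_gap(df_prior, df_post, target_column, random_state=0):
    """Hypothetical helper: return (accuracy on held-out prior rows, accuracy on df_post).

    Assumes all non-target columns are numeric.
    """
    X = df_prior.drop(columns=[target_column])
    y = df_prior[target_column]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=random_state
    )
    model = RandomForestClassifier(n_estimators=100, random_state=random_state)
    model.fit(X_train, y_train)
    prior_acc = accuracy_score(y_test, model.predict(X_test))
    # score the same model on the post dataset, using the same feature columns
    post_acc = accuracy_score(df_post[target_column], model.predict(df_post[X.columns]))
    return prior_acc, post_acc


# synthetic example: the feature distribution stays the same,
# but the label threshold moves from 0 to 1 between the two datasets
rng = np.random.default_rng(0)
x_prior = rng.normal(0, 1, 2000)
x_post = rng.normal(0, 1, 2000)
df_prior = pd.DataFrame({"x": x_prior, "label": (x_prior > 0).astype(int)})
df_post = pd.DataFrame({"x": x_post, "label": (x_post > 1).astype(int)})
prior_acc, post_acc = ml_efficacy_gap(df_prior, df_post, target_column="label")
print(f"prior accuracy: {prior_acc:.2f}, post accuracy: {post_acc:.2f}")
```

Because only the feature-target relationship moves here, a per-column distribution test would not flag this change, while the accuracy drop (roughly from 1.0 to about 0.66) does.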
