Metadata-Version: 2.1
Name: data-complexity
Version: 0.1.3
Summary: Data Complexity Measures
Home-page: https://github.com/ngocson2vn/data-complexity
Author: Son Nguyen
Author-email: ngocson2vn@gmail.com
License: MIT
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: numpy
Requires-Dist: gower

# data-complexity
Data complexity measures for classification datasets, implemented in Python.


## Install
```bash
pip install data-complexity
```


## How it works
The examples below load sample datasets from scikit-learn, which is not among this package's declared dependencies (`numpy`, `gower`) and must be installed separately.

### Maximum Fisher's Discriminant Ratio (F1)
```python
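from dcm import dcm
from sklearn import datasets

# Load the Iris dataset as a feature matrix X and a label vector y.
iris = datasets.load_iris()
X = iris.data
y = iris.target

# dcm.F1 returns a pair; `index` is presumably the feature that attains the maximum ratio.
index, F1 = dcm.F1(X, y)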
```
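
For readers who want to see what the measure captures, here is a minimal NumPy sketch of the per-feature Fisher discriminant ratio as defined in the survey [1]: between-class scatter divided by within-class scatter. It is not the code behind `dcm.F1`, and the names `fisher_ratios` and `f1_survey` are hypothetical. The survey reports `1 / (1 + max ratio)`; this README does not say whether `dcm.F1` returns that value or the raw maximum.

```python
import numpy as np


def fisher_ratios(X, y):
    """Per-feature ratio of between-class scatter to within-class scatter."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        class_mean = Xc.mean(axis=0)
        between += len(Xc) * (class_mean - overall_mean) ** 2
        within += ((Xc - class_mean) ** 2).sum(axis=0)
    # Assumes no feature is constant inside every class (within > 0).
    return between / within


# Toy data: feature 0 separates the two classes, feature 1 does not.
X = np.array([[0.0, 1.0], [1.0, 3.0], [10.0, 1.0], [11.0, 3.0]])
y = np.array([0, 0, 1, 1])

ratios = fisher_ratios(X, y)
best_feature = int(np.argmax(ratios))   # most discriminative feature
f1_survey = 1.0 / (1.0 + ratios.max())  # normalized form used in the survey [1]
```

With the normalized form, values near 0 indicate that at least one feature separates the classes well.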

### Fraction of Borderline Points (N1)
```python
from dcm import dcm
from sklearn import datasets

# Load the breast cancer dataset directly as NumPy arrays.
X, y = datasets.load_breast_cancer(return_X_y=True)

# Fraction of borderline points for this dataset.
N1 = dcm.N1(X, y)
```
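
The idea behind N1, as described in the survey [1], is to build a minimum spanning tree over all points and count the points that are joined by a tree edge to a point of another class. The sketch below illustrates this with Prim's algorithm on Euclidean distances. Both the metric and the name `borderline_fraction` are assumptions made for illustration, and the implementation inside `dcm.N1` may differ (the package depends on `gower`, so it may well use Gower distance instead).

```python
import numpy as np


def borderline_fraction(X, y):
    """Fraction of points joined to a different-class point by an MST edge."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(X)
    # Pairwise Euclidean distances (an assumption; other metrics are possible).
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))

    # Prim's algorithm: grow the tree from point 0, one nearest point at a time.
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best_dist = dist[0].copy()          # cheapest known link from the tree to each point
    best_from = np.zeros(n, dtype=int)  # tree endpoint of that cheapest link
    borderline = np.zeros(n, dtype=bool)

    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best_dist)
        j = int(np.argmin(candidates))  # nearest point not yet in the tree
        i = int(best_from[j])
        if y[i] != y[j]:
            # The new tree edge crosses a class boundary: mark both endpoints.
            borderline[i] = borderline[j] = True
        in_tree[j] = True
        closer = dist[j] < best_dist
        best_dist = np.where(closer, dist[j], best_dist)
        best_from = np.where(closer, j, best_from)

    return borderline.mean()


# Toy data on a line: two tight clusters, labeled cleanly and then interleaved.
X_line = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y_separated = np.array([0, 0, 0, 1, 1, 1])
y_interleaved = np.array([0, 1, 0, 1, 0, 1])

n1_separated = borderline_fraction(X_line, y_separated)
n1_interleaved = borderline_fraction(X_line, y_interleaved)
```

In the cleanly labeled case only the single edge bridging the two clusters crosses classes, so a third of the points are borderline. In the interleaved case every point is.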

### Entropy of Class Proportions (C1) and Imbalance Ratio (C2)
```python
from dcm import dcm
from sklearn import datasets

# Load the breast cancer dataset directly as NumPy arrays.
X, y = datasets.load_breast_cancer(return_X_y=True)

# dcm.C12 computes both class-balance measures in a single call.
C1, C2 = dcm.C12(X, y)
```
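
Both measures depend only on the class counts. As a rough sketch (not the code behind `dcm.C12`), the snippet below computes the normalized entropy of the class proportions and the imbalance ratio `IR = (n_c - 1) / n_c * sum(n_i / (n - n_i))`, from which the survey [1] derives C2 as `1 - 1 / IR`. The name `class_balance` is hypothetical, and this README does not say whether `dcm.C12` reports the entropy directly or a transformation of it.

```python
import numpy as np


def class_balance(y):
    """Return (normalized entropy, imbalance ratio IR, 1 - 1/IR) for labels y."""
    _, counts = np.unique(np.asarray(y), return_counts=True)
    n = counts.sum()
    n_classes = len(counts)  # assumes at least two classes
    p = counts / n
    entropy = -(p * np.log(p)).sum() / np.log(n_classes)
    ir = (n_classes - 1) / n_classes * (counts / (n - counts)).sum()
    return entropy, ir, 1.0 - 1.0 / ir


# Balanced labels versus a 9:1 split.
entropy_bal, ir_bal, c2_bal = class_balance([0, 0, 1, 1])
entropy_skew, ir_skew, c2_skew = class_balance([0] * 9 + [1])
```

For balanced classes the normalized entropy is 1 and `IR` is 1, so `1 - 1 / IR` is 0. As the split becomes more skewed, the entropy falls and `1 - 1 / IR` rises toward 1.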

### Other Measures
Not yet implemented; more measures are planned.


## References
[1] How Complex is your classification problem? A survey on measuring classification complexity, https://arxiv.org/abs/1808.03591

[2] The Extended Complexity Library (ECoL), https://github.com/lpfgarcia/ECoL


