Metadata-Version: 2.1
Name: cefeste
Version: 1.2.4
Summary: Feature Selection and Elimination
Home-page: https://dev.azure.com/credem-data/DAT/_git/ce-feste
Author: DAT Team
Project-URL: Homepage, https://dev.azure.com/credem-data/DAT/_git/ce-feste
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9, <4
Description-Content-Type: text/markdown
License-File: LICENCE
Requires-Dist: typed-ast >=1.5.4
Requires-Dist: numpy ==1.22.4
Requires-Dist: pandas >=1.4.2
Requires-Dist: scikit-learn >=1.1.1
Requires-Dist: scipy >=1.8.1
Requires-Dist: statsmodels >=0.13.2
Requires-Dist: PyYAML >=6.0
Requires-Dist: shap ==0.41.0
Requires-Dist: ipython

# **C**redito **E**miliano -  **Fe**ature **S**election, **T**ransformation and **E**limination (CE - FeSTE)

This repo contains the 'FeSTE' Python package, which supports feature management from pre-filtering through pre-processing to feature elimination.

# Installation

To install it:

1) **Optional**: create a new Python virtual environment (from a bash terminal, run "py -m venv your_env_name" and then "source your_env_name/Scripts/activate")
2) Install the package:
    - User Mode: 
```pip install cefeste```


# Structure

The Python package is stored in src and contains 3 sub-modules:
- **selection**: contains the preliminary feature selection functions
- **transform**: contains the feature pre-processing functions
- **elimination**: contains the feature elimination functions

# Filters

## Selection

The main class of this module is FeatureSelection. It applies several filters, which can be grouped as follows:

- Univariate filters:
    - Constant features
    - Too few distinct values
    - Too many missing values
    - Too concentrated in the most frequent value
    - Unstable between sets
- Multivariate filters:
    - Spearman Correlation for numerical features
    - Cramer's V for categorical features
    - R2 for mixed features
    - VIF
- Explanatory filters:
    - Feature AUROC for classification 
    - Feature Correlation with target for regression
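
As a rough illustration of the univariate and Spearman filters listed above, here is a minimal pandas sketch. It is not the FeatureSelection implementation: the function names `univariate_filter` and `spearman_filter`, the thresholds, and the greedy "keep the first column of a correlated pair" rule are all assumptions made for this example.

```python
import numpy as np
import pandas as pd


def univariate_filter(df, min_unique=2, max_missing=0.5, max_concentration=0.95):
    """Return the columns that pass three simple univariate checks.

    Hypothetical helper: thresholds are illustrative, not cefeste defaults.
    """
    kept = []
    for col in df.columns:
        series = df[col]
        # Distinct non-missing values; a constant column has exactly one.
        if series.nunique(dropna=True) < min_unique:
            continue
        # Share of missing values in the column.
        if series.isna().mean() > max_missing:
            continue
        # Share of non-missing rows taken by the most frequent value.
        if series.value_counts(normalize=True).iloc[0] > max_concentration:
            continue
        kept.append(col)
    return kept


def spearman_filter(df, columns, threshold=0.9):
    """Greedily keep a column only if |Spearman rho| with every kept column
    is at most `threshold` (hypothetical rule chosen for this sketch)."""
    corr = df[columns].corr(method="spearman").abs()
    kept = []
    for col in columns:
        if all(corr.loc[col, other] <= threshold for other in kept):
            kept.append(col)
    return kept


# Toy data, made up for the example.
df = pd.DataFrame(
    {
        "constant": [1, 1, 1, 1, 1, 1],
        "mostly_missing": [1.0, np.nan, np.nan, np.nan, np.nan, 2.0],
        "concentrated": [0, 0, 0, 0, 0, 1],
        "x": [1, 2, 3, 4, 5, 6],
        "x_squared": [1, 4, 9, 16, 25, 36],
        "noise": [3, 1, 6, 2, 5, 4],
    }
)

survivors = univariate_filter(df, max_concentration=0.8)
selected = spearman_filter(df, survivors, threshold=0.9)
print(survivors)
print(selected)
```

Running the univariate checks first keeps the correlation matrix small, since constant or mostly empty columns never reach the pairwise step.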
    
## Transformation

This is a more technical module. It contains 3 classes that are useful for building the production pipeline:

- ColumnExtractor: to extract columns from a pd.DataFrame
- ColumnRenamer: to rename columns and to transform a np.ndarray to a pd.DataFrame
- Categorizer: to transform the dtype of pd.DataFrame columns from 'object' to 'category'
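
To make the 'object' to 'category' conversion concrete, the sketch below shows one way such a step could behave. The class `ObjectToCategory` is a hypothetical stand-in written for this example, not the package's Categorizer; in particular, remembering the categories seen at fit time is an assumption, not documented behaviour.

```python
import pandas as pd


class ObjectToCategory:
    """Hypothetical stand-in for a Categorizer-like step (not cefeste code)."""

    def fit(self, X, y=None):
        # Remember, for each 'object' column, the values seen at fit time.
        self.categories_ = {
            col: list(X[col].dropna().unique())
            for col in X.columns
            if pd.api.types.is_object_dtype(X[col])
        }
        return self

    def transform(self, X):
        X = X.copy()
        for col, cats in self.categories_.items():
            # With a fixed category set, values unseen during fit become NaN.
            X[col] = pd.Categorical(X[col], categories=cats)
        return X


# Toy frames, made up for the example; dtype=object is set explicitly.
train = pd.DataFrame(
    {
        "city": pd.Series(["Rome", "Milan", "Rome"], dtype=object),
        "amount": [1.0, 2.0, 3.0],
    }
)
new = pd.DataFrame(
    {
        "city": pd.Series(["Milan", "Paris"], dtype=object),
        "amount": [4.0, 5.0],
    }
)

step = ObjectToCategory().fit(train)
out = step.transform(new)
print(out.dtypes)
```

Fixing the category set at fit time keeps the encoded column consistent between training and production data, which is usually what a production pipeline needs.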

## Elimination

The main class of this module is FeatureElimination, which selects the most useful features to keep in the model while optimizing the hyperparameters at the same time.

It is a recursive method that, at each iteration, can:

- Optimize the hyperparameters using a user-defined model, grid, grid-search method and evaluation measure
- Calculate the SHAP importance value of each feature
- Identify the least important feature(s) and delete them for the next iteration
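
The loop described above can be sketched in plain Python. This is not the FeatureElimination API: `recursive_elimination`, its parameters, and the `fit_and_rank` callback are hypothetical names. The callback stands in for the tuning and SHAP steps, which are replaced here by fixed toy numbers so that the example runs without a model.

```python
def recursive_elimination(features, fit_and_rank, min_features=1, step=1):
    """Drop the `step` least important features per round and log each round.

    `fit_and_rank(features)` is a user-supplied callable (hypothetical) that
    tunes and fits a model on `features` and returns
    (validation_score, {feature: importance}).
    """
    remaining = list(features)
    history = []
    while True:
        score, importance = fit_and_rank(remaining)
        history.append((list(remaining), score))
        if len(remaining) <= min_features:
            break
        n_drop = min(step, len(remaining) - min_features)
        # Least important first; sorted() is stable, so ties keep input order.
        ranked = sorted(remaining, key=lambda f: importance[f])
        to_drop = set(ranked[:n_drop])
        remaining = [f for f in remaining if f not in to_drop]
    return history


# Toy importances, made up for the example (stand-in for SHAP values).
TOY_IMPORTANCE = {"income": 0.5, "age": 0.3, "zip": 0.05, "id": 0.0}


def toy_fit_and_rank(features):
    # Stand-in for "tune, fit, explain": the score rewards informative
    # features and applies a small penalty per feature kept.
    importance = {f: TOY_IMPORTANCE[f] for f in features}
    score = sum(importance.values()) - 0.02 * len(features)
    return score, importance


history = recursive_elimination(["income", "age", "zip", "id"], toy_fit_and_rank)
best_features, best_score = max(history, key=lambda item: item[1])
print(best_features)
```

Keeping the full history, rather than only the final feature set, lets the caller pick the round with the best validation score afterwards.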

