Metadata-Version: 2.1
Name: xklearn
Version: 0.0.1
Summary: Handy machine learning tools in the spirit of scikit-learn.
Home-page: https://github.com/simon-larsson/extrakit-learn
Author: Simon Larsson
Author-email: simonlarsson0@gmail.com
License: MIT
Description: # extrakit-learn
        
        [![PyPI version](https://badge.fury.io/py/xklearn.svg)](https://pypi.python.org/pypi/xklearn/) 
        [![License](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/simon-larsson/extrakit-learn/blob/master/LICENSE)
        
        Machine learnings components built to extend scikit-learn. All components use scikit's [object API](https://scikit-learn.org/stable/developers/contributing.html#apis-of-scikit-learn-objects) to work interchangably with scikit components. It is mostly a collection of tools that have been useful for [Kaggle](https://www.kaggle.com) competitions. extrakit-learn is in no way affiliated with scikit-learn in anyway, just inspired by it.
        
        ## Installation
        
            pip install xklearn
        
        ## Components
        - **TargetEncoder** - Categorical feature engineering based on target means.
        - **CountEncoder** - Categorical feature engineering based on value counts.
        - **FoldEstimator** - K-fold cross validation meta estimator.
        - **FoldLGBM** - K-fold cross validation meta LGBM.
        - **StackingClassifier** - Stack an ensemble of classifiers with a meta classifier.
        - **StackingRegressor** - Stack an ensemble of regressors with a meta regressor.
        
        ### Hierachy
            xklearn
            â”œâ”€â”€ preprocessing
            â”‚   â”œâ”€â”€ CountEncoder      
            â”‚   â””â”€â”€ TargetEncoder    
            â””â”€â”€ models
                â”œâ”€â”€ FoldEstimator
                â”œâ”€â”€ FoldLGBM
                â”œâ”€â”€ StackingClassifier
                â””â”€â”€ StackingRegressor
        
        ##### Example
        
            from xklearn.models import FoldEstimator
        
        ### TargetEncoder
        Performs target mean encoding of categorical features with optional smoothing.
        
        #### Arguments
        `smoothing` - Smoothing weight.
        
        `unseen` - Strategy for handling unseen values. Se replacement strategies below for options.
        
        `missing` - Strategy for handling missing values. Se replacement strategies below for options.
        
        ##### Replacement strategies
        
        `'one'` - Replace value with 1.
        
        `'nan'` - Replace value with np.nan.
        
        `'error'` - Raise ValueError.
        
        #### Example:
        
        ```python
        te = TargetEncoder(smoothing=10)
        X[0] = te.fit_transform(X[0], y)
        ```
        
        ### CountEncoder
        Replaces categorical values with their respective value count during training. Classes with a count of one and previously unseen classes during prediction are encoded as either one or nan.
        
        #### Arguments
        `unseen` - Strategy for handling unseen values. Se replacement strategies below for options.
        
        `missing` - Strategy for handling missing values. Se replacement strategies below for options.
        
        ##### Replacement strategies
        
        `'one'` - Replace value with 1.
        
        `'nan'` - Replace value with np.nan.
        
        `'error'` - Raise ValueError.
        
        #### Example:
        ```python
        ce = TargetEncoder(one_to_nan=True)
        X[0] = ce.fit_transform(X[0], y)
        ```
        
        ### FoldEstimator
        Meta estimator that performs cross validation over k folds. Can optionally be used as a stacked ensemble of k estimators.
        
        #### Arguments
        `est` - Base estimator.
        
        `fold` - Folding cross validation object, i.e KFold and StratifedKfold.
        
        `metric` - Evaluation metric.
        
        `ensemble` - Flag indicting post fit behaviour. True will make it a stacked ensemble, False will do a full refit on the full data.
        
        `verbose` - Flag for printing intermediate scores during fit.
        
        #### Example:
        ```python
        base = RandomForestRegressor(n_estimators=10)
        fold = KFold(n_splits=5)
        
        est = FoldEstimator(base, fold=fold, metric=mean_squared_error, verbose=1)
        
        est.fit(X_train, y_train)
        est.predict(X_test)
        ```
        
        ### FoldLGBM
        Meta estimator that performs cross validation over k folds on a LightGBM estimator. Can optionally be used as a ensemble of k estimators.
        
        #### Arguments
        `lgbm` - Base estimator.
        
        `fold` - Folding cross validation object, i.e KFold and StratifedKfold.
        
        `metric` - Evaluation metric.
        
        `fit_params` - Dictionary of parameter that should be fed to the fit method.
        
        `ensemble` - Flag indicting post fit behaviour. True will make it a stacked ensemble, False will do a full refit on the full data.
        
        `refit_params` - Dictionary of parameter that should be fed to the refit if `ensemble=False`.
        
        `verbose` - Flag for printing intermediate scores during fit.
        
        #### Example:
        ```python
        base = LGBMClassifier(n_estimators=1000)
        fold = KFold(n_splits=5)
        fit_params = {'eval_metric': 'auc',
                      'early_stopping_rounds': 50,
                      'verbose': 0}
                      
        fold_lgbm = FoldLGBM(base, 
                             fold=fold, 
                             metric=roc_auc_score,
                             fit_params=fit_params,
                             verbose=1)
                       
        fold_lgbm.fit(X_train, y_train)
        fold_lgbm.predict(X_test)
        ```
        
        ### StackingClassifier
        Ensemble classifier that stacks an ensemble of classifiers by using their outputs as input features.
        
        #### Arguments
        `clfs` - List of ensemble of classifiers.
        
        `meta_clf` - Meta classifier that stacks the predictions of the ensemble.
        
        `keep_features` - Flag to train the meta classifier on the original features too.
        
        `refit` - Flag to retrain the ensemble of classifiers.
        
        #### Example:
        ```python
        meta_clf = RidgeClassifier()
        ensemble = [RandomForestClassifier(), KNeighborsClassifier(), SVC()]
        
        stack_clf = StackingClassifier(clfs=ensemble, meta_clf=meta_clf, refit=True)
        
        stack_clf.fit(X_train, y_train)
        y_ = stack_clf.predict(X_test)
        ```
        
        ### StackingRegressor
        Ensemble regressor that stacks an ensemble of regressors by using their outputs as input features.
        
        #### Arguments
        `regs` - List of ensemble of regressors.
        
        `meta_reg` - Meta regressor that stacks the predictions of the ensemble.
        
        `keep_features` - Flag to train the meta regressor on the original features too.
        
        `refit` - Flag to retrain the ensemble of regressors.
        
        #### Example:
        ```python
        meta_reg = RidgeRegressor()
        ensemble = [RandomForestRegressor(), KNeighborsRegressor(), SVR()]
        
        stack_reg = StackingRegressor(regs=ensemble, meta_reg=meta_reg, refit=True)
        
        stack_reg.fit(X_train, y_train)
        y_ = stack_reg.predict(X_test)
        ```
        
Platform: UNKNOWN
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Topic :: Scientific/Engineering
Description-Content-Type: text/markdown
