Metadata-Version: 2.1
Name: ml_tooling
Version: 0.1.2
Summary: UNKNOWN
Home-page: https://github.com/andersbogsnes/ml_utils
Author: Anders Bogsnes
Author-email: abanbn@almbrand.dk
License: MIT
Description: # Model Utility library for Alm Brand
        [![Build Status](https://travis-ci.org/andersbogsnes/ml_utils.svg?branch=master)](https://travis-ci.org/andersbogsnes/ml_utils)
        [![Coverage Status](https://coveralls.io/repos/github/andersbogsnes/ml_utils/badge.svg?branch=master)](https://coveralls.io/github/andersbogsnes/ml_utils?branch=master)
        
        # Installation
        Use pip to install
        `pip install git+https://git@github.com/andersbogsnes/ml_utils.git`
        
        # Contents
        * Transformers
            - A library of transformers for use with Scikit-learn pipelines
        * Model base classes
            - Production base classes for subclassing; they guarantee a consistent interface for use in an API
                
        ## BaseClassModel
        A base class for defining your model.
        Your subclass must define two methods:
         
        - `get_prediction_data()`
         
            Function that, given an input, fetches corresponding features. Used for predicting an unseen observation
         
        - `get_training_data()`
            
            Function that retrieves all training data. Used for training and evaluating the model
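        
        The contract above can be illustrated with a minimal sketch built on Python's `abc` module. This is not the library's actual implementation: `SketchBaseModel`, `MeanEstimator` and `ToyModel` are hypothetical names made up for illustration, and the real `BaseClassModel` may enforce the interface differently.
        
        ```python
        from abc import ABC, abstractmethod
        
        
        class SketchBaseModel(ABC):
            """Hypothetical stand-in for a base class that requires two methods."""
        
            def __init__(self, estimator):
                self.estimator = estimator
        
            @abstractmethod
            def get_training_data(self):
                """Return (features, targets) used for training."""
        
            @abstractmethod
            def get_prediction_data(self, *args):
                """Return the features for one unseen observation."""
        
            def train(self):
                # Fetch all training data and fit the wrapped estimator
                x, y = self.get_training_data()
                self.estimator.fit(x, y)
                return self
        
            def predict_one(self, *args):
                # Look up features for the input, then delegate to the estimator
                x = self.get_prediction_data(*args)
                return self.estimator.predict(x)
        
        
        class MeanEstimator:
            """Toy estimator (assumption): always predicts the mean training target."""
        
            def fit(self, x, y):
                self.mean_ = sum(y) / len(y)
                return self
        
            def predict(self, x):
                return [self.mean_ for _ in x]
        
        
        class ToyModel(SketchBaseModel):
            # Made-up data keyed by an observation id
            DATA = {1: [0.5], 2: [1.5], 3: [2.5]}
            TARGETS = [10.0, 20.0, 30.0]
        
            def get_training_data(self):
                return list(self.DATA.values()), self.TARGETS
        
            def get_prediction_data(self, idx):
                return [self.DATA[idx]]
        
        
        model = ToyModel(MeanEstimator()).train()
        prediction = model.predict_one(2)
        ```
        
        A subclass that leaves out either method cannot be instantiated, which is one way a base class can guarantee the interface.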
        
        
        ### Example usage
        Define a class using BaseClassModel and implement the two required methods.
        Here we simply implement a linear regression on the Boston dataset using sklearn.datasets
        ```python
        from ml_utils import BaseClassModel
        from sklearn.datasets import load_boston
        from sklearn.linear_model import LinearRegression, Ridge, LassoLars
        
        # Define a new class
        
        class BostonModel(BaseClassModel):
            def get_prediction_data(self, idx):
                x, _ = load_boston(return_X_y=True)
                return x[idx] # Return given observation
                
            def get_training_data(self):
                return load_boston(return_X_y=True)
            
        # Use our new class to implement a given model - any sklearn compatible estimator
        linear_boston = BostonModel(LinearRegression())
        
        results = linear_boston.score_model()
        
        # Visualize results
        results.plot.residuals()
        results.plot.prediction_error()
        
        # Save our model
        linear_boston.save_model('.')
        
        # Recreate model
        BostonModel.load_model('./LinearRegression')
        
        # Train Different models and get the best performing
        models_to_try = [LinearRegression(), Ridge(), LassoLars()]
        
        # best_model will be BostonModel instantiated with the highest scoring model. all_results is a list of all results
        best_model, all_results = BostonModel.test_models(models_to_try, metric='neg_mean_squared_error')
        print(all_results)
        
        ```
        
        The BaseClass implements a number of useful methods
        
        #### `save_model()`
        Saves the model as a binary file
           
        #### `load_model()` 
        Instantiates the class with a joblib pickled model
           
        #### `score_model()`
        Loads all training data and trains the model on it, using a train_test split.
        Returns a Result object containing all result parameters
        
        #### `train_model()`
        Loads all training data and trains the model on all data. 
        Typically used as the last step when model tuning is complete
        
        #### `set_config({'CONFIG_KEY': 'VALUE'})`
        Set configuration options - existing configuration options can be seen using the `.config` property
           
        #### `make_prediction(*args)`
        Makes a prediction given an input, for example a customer number.
        The input is passed to the implemented `get_prediction_data()` method, and `.predict()` is called on the estimator with the returned features
           
        
        #### `test_models([model1, model2], metric='accuracy')`
        Runs `score_model()` on each model, saving the result.
        Returns the best model as well as a list of all results
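        
        The selection step described for `test_models()` can be sketched roughly as follows. `pick_best` and the scores are hypothetical and made up for illustration; the library's own method works with estimators and Result objects, and its internals may differ.
        
        ```python
        def pick_best(scores):
            """Return (best_name, results ranked best-first) from a name -> score mapping.
        
            Assumes a higher score is better, as with sklearn's 'neg_mean_squared_error'.
            """
            ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
            return ranked[0][0], ranked
        
        
        # Hypothetical scores, not real results
        scores = {"LinearRegression": -24.3, "Ridge": -23.9, "LassoLars": -30.1}
        best_name, all_results = pick_best(scores)
        ```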
        
        ## Visualizing results
        When a model is scored with `score_model()`, it returns a Result object.
        That object has a number of visualization options, available through its `.plot` attribute, depending on the type of model:
           
        ### Classifiers
           
        - `roc_curve()`
        - `confusion_matrix()`
        - `feature_importance()`
        - `lift_curve()`
           
        ### Regressors
           
        - `prediction_error()`
        - `residuals()`
        - `feature_importance()`
        
        # Transformers
        The library also provides a number of transformers for working with DataFrames in a pipeline
        ### Select
        A column selector - Provide a list of columns to be passed on in the pipeline
        #### Example
        ```python
        from ml_utils.transformers import Select
        import pandas as pd
        
        df = pd.DataFrame({
            "id": [1, 2, 3, 4],
            "status": ["OK", "Error", "OK", "Error"],
            "sales": [2000, 3000, 4000, 5000] 
        
        })
        
        select = Select(['id', 'status'])
        select.fit_transform(df)
        ```
        
        ### FillNA
        Fills NA values with the value given at instantiation, which is passed to `df.fillna()`
        #### Example
        ```python
        from ml_utils.transformers import FillNA
        import pandas as pd
        import numpy as np
        
        df = pd.DataFrame({
            "id": [1, 2, 3, 4],
            "status": ["OK", "Error", "OK", "Error"],
            "sales": [2000, 3000, 4000, np.nan] 
        
        })
        
        fill_na = FillNA(0)
        fill_na.fit_transform(df)
        ```
        
        ### ToCategorical
        Performs one-hot encoding of categorical values through pd.Categorical. 
        All categorical values not found in training data will be set to 0 
        
        #### Example
        ```python
        from ml_utils.transformers import ToCategorical
        import pandas as pd
        
        df = pd.DataFrame({
            "status": ["OK", "Error", "OK", "Error"] 
        
        })
        
        onehot = ToCategorical()
        onehot.fit_transform(df)
        ```
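        
        The handling of unseen values can be sketched in plain pandas. This assumes the transformer behaves like `pd.get_dummies` applied to a `pd.Categorical` whose categories were fixed at fit time; it illustrates the idea and is not the library's code.
        
        ```python
        import pandas as pd
        
        # Categories learned from the training data
        train = pd.Series(["OK", "Error", "OK", "Error"])
        categories = sorted(train.unique())  # ['Error', 'OK']
        
        # New data contains a value that was never seen during training
        new = pd.Series(["OK", "Unknown"])
        as_categorical = pd.Series(pd.Categorical(new, categories=categories))
        
        # Values outside the known categories become NaN, which encodes as all zeros
        encoded = pd.get_dummies(as_categorical, dtype=int)
        ```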
        
        ### FuncTransformer
        Applies a given function to each column
        
        #### Example
        ```python
        from ml_utils.transformers import FuncTransformer
        import pandas as pd
        
        df = pd.DataFrame({
            "status": ["OK", "Error", "OK", "Error"]
        })
        
        # Call .str.upper() so the function returns the transformed column, not the bound method
        uppercase = FuncTransformer(lambda x: x.str.upper())
        uppercase.fit_transform(df)
        ```
        
        ### Binner
        Bins numerical data into supplied bins
        
        #### Example
        ```python
        from ml_utils.transformers import Binner
        import pandas as pd
        
        df = pd.DataFrame({
            "sales": [1500, 2000, 2250, 7830]
        })
        
        binned = Binner(bins=[0, 1000, 2000, 8000])
        binned.fit_transform(df)
        ```
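        
        For orientation, the binning is comparable to what `pd.cut` does in plain pandas. Whether `Binner` wraps `pd.cut` internally is an assumption; the sketch only shows which interval each value from the example falls into.
        
        ```python
        import pandas as pd
        
        sales = pd.Series([1500, 2000, 2250, 7830], name="sales")
        
        # Intervals are right-inclusive by default: (0, 1000], (1000, 2000], (2000, 8000]
        binned = pd.cut(sales, bins=[0, 1000, 2000, 8000])
        
        # Integer position of the bin each value landed in
        bin_codes = binned.cat.codes.tolist()
        ```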
        
        ### Renamer
        Renames columns to the names in the passed list; the names must be in the same order as the existing columns
        
        #### Example
        ```python
        from ml_utils.transformers import Renamer
        import pandas as pd
        
        df = pd.DataFrame({
            "Total Sales": [1500, 2000, 2250, 7830]
        })
        
        rename = Renamer(['sales'])
        rename.fit_transform(df)
        ```
        
        
        ### DateEncoder
        Adds year, month, day and week columns based on a date field. Each date part can be toggled in the initializer
        
        #### Example
        ```python
        from ml_utils.transformers import DateEncoder
        import pandas as pd
        
        df = pd.DataFrame({
            "sales_date": [pd.to_datetime('2018-01-01'), pd.to_datetime('2018-02-02')]
        })
        
        dates = DateEncoder(week=False)
        dates.fit_transform(df)
        ```
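        
        The derived columns correspond to the standard pandas datetime accessors. The sketch below builds year, month and day by hand; the output column names are assumptions and may not match what `DateEncoder` produces.
        
        ```python
        import pandas as pd
        
        df = pd.DataFrame({
            "sales_date": pd.to_datetime(["2018-01-01", "2018-02-02"])
        })
        
        # Column names are hypothetical, chosen for illustration
        encoded = pd.DataFrame({
            "sales_date_year": df["sales_date"].dt.year,
            "sales_date_month": df["sales_date"].dt.month,
            "sales_date_day": df["sales_date"].dt.day,
        })
        ```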
        
        ### FreqFeature
        Converts a column into normalized frequencies
        
        #### Example
        ```python
        from ml_utils.transformers import FreqFeature
        import pandas as pd
        
        df = pd.DataFrame({
            "sales_category": ['Sale', 'Sale', 'Not Sale']
        })
        
        freq = FreqFeature()
        freq.fit_transform(df)
        ```
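        
        Assuming "normalized frequency" means the share of rows holding each value, the same result can be computed in plain pandas as follows. This is a sketch of the idea rather than the library's implementation.
        
        ```python
        import pandas as pd
        
        category = pd.Series(["Sale", "Sale", "Not Sale"], name="sales_category")
        
        # Share of rows per distinct value: Sale -> 2/3, Not Sale -> 1/3
        frequencies = category.value_counts(normalize=True)
        
        # Replace each value with its frequency
        encoded = category.map(frequencies)
        ```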
        
        ### DFFeatureUnion
        A FeatureUnion equivalent for DataFrames. Concatenates the result of multiple transformers
        
        #### Example
        ```python
        from ml_utils.transformers import FreqFeature, Binner, Select, DFFeatureUnion
        from sklearn.pipeline import make_pipeline
        import pandas as pd
        
        # Both columns must have the same number of rows
        df = pd.DataFrame({
            "sales_category": ['Sale', 'Sale', 'Not Sale', 'Not Sale'],
            "sales": [1500, 2000, 2250, 7830]
        })
        
        freq = make_pipeline(Select('sales_category'), FreqFeature())
        binned = make_pipeline(Select('sales'), Binner(bins=[0, 1000, 2000, 8000]))
        
        union = DFFeatureUnion([freq, binned])
        union.fit_transform(df)
        ```
Keywords: ml framework tooling
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6.0
Description-Content-Type: text/markdown
