Metadata-Version: 2.1
Name: hyanova
Version: 1.0.9
Summary: A pure Python implementation of the functional ANOVA algorithm.
Home-page: https://github.com/exiarepairii/hyanova
Author: Su Qiao
Author-email: qiaosu98@outlook.com
License: MIT
Description: HyANOVA
        =======
        
        HyANOVA is a pure Python implementation of the functional ANOVA
        algorithm, which can be used to analyze the importance of
        hyperparameters in machine learning algorithms.
        
        Quick Start
        ~~~~~~~~~~~
        
        To install the package, please use the ``pip`` installation as follows:
        
        .. code:: shell
        
           pip install hyanova
        
        Here is a short example of usage. You can download the
        `data <./examples/iris%5BGridSearchCV%5DModel1.csv>`__ from the example
        folder.
        
        .. code:: python
        
           import hyanova
        
           path = './iris[GridSearchCV]Model1.csv'         # gridsearch results generated by sklearn
           metric = 'mean_test_score'              # metric for model performance
           df,params = hyanova.read_csv(path,metric)
           # df,params = hyanova.read_df(df,metric)         You can also load data from pd.DataFrame
           importance = hyanova.analyze(df)
        
        The ``metric`` is the column used to evaluate model performance; it
        must appear as a column in the ``.csv`` file or in the
        ``pandas.DataFrame`` object. The result will look similar to the output
        below; see the ANOVA section for more details.
        
        .. code:: python
        
           print(importance)
           >>>              u       v_u  F_u(v_u/v_all)
           0           (alpha,)  0.056885        0.892057
           1        (l1_ratio,)  0.002489        0.039030
           2  (alpha, l1_ratio)  0.004394        0.068912
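        
        To rank the entries by importance, ordinary pandas sorting is enough.
        The sketch below rebuilds the table above by hand so that it runs on its
        own; it assumes the column names are exactly those shown in the printed
        output, and a real run would use the ``DataFrame`` returned by
        ``hyanova.analyze(df)`` instead.
        
        .. code:: python
        
           import pandas as pd
        
           # Values copied from the printed output above (assumed column names).
           importance = pd.DataFrame({
               'u': [('alpha',), ('l1_ratio',), ('alpha', 'l1_ratio')],
               'v_u': [0.056885, 0.002489, 0.004394],
               'F_u(v_u/v_all)': [0.892057, 0.039030, 0.068912],
           })
        
           # Put the most important hyperparameter (or interaction) first.
           ranked = importance.sort_values('F_u(v_u/v_all)', ascending=False)
           ranked = ranked.reset_index(drop=True)
           print(ranked)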
        
        APIs
        ~~~~
        
        Load Data
        '''''''''
        
        HyANOVA is designed to analyze the grid search results generated by
        sklearn. It provides two ways to load the data.
        
        -  You can use ``read_df(df,metric)`` to load data from a
           ``pandas.DataFrame`` object. It returns two objects:
        
           -  a ``DataFrame`` containing the value of every hyperparameter and
              of the metric you chose
           -  a ``list`` of the hyperparameter names
        
           Here is an example.
        
           .. code:: python
        
              print(df.head())
        
           .. code:: shell
        
              >>> mean_fit_time  std_fit_time  mean_score_time  std_score_time  param_alpha  \
              0       0.003899      0.000194         0.048513        0.007621     0.000977   
              1       0.003401      0.000584         0.042454        0.011295     0.000977   
              2       0.002706      0.000502         0.048544        0.009059     0.000977   
              3       0.003304      0.000531         0.040709        0.003031     0.000977   
              4       0.001801      0.000116         0.000289        0.000014     0.000977   
        
                 param_l1_ratio                                     params  \
              0            0.00   {'alpha': 0.0009765625, 'l1_ratio': 0.0}   
              1            0.25  {'alpha': 0.0009765625, 'l1_ratio': 0.25}   
              2            0.50   {'alpha': 0.0009765625, 'l1_ratio': 0.5}   
              3            0.75  {'alpha': 0.0009765625, 'l1_ratio': 0.75}   
              4            1.00   {'alpha': 0.0009765625, 'l1_ratio': 1.0}   
        
                 split0_test_score  split1_test_score  split2_test_score  mean_test_score  \
              0           0.828571           0.971429           0.971429         0.923810   
              1           0.885714           0.971429           0.942857         0.933333   
              2           0.885714           1.000000           0.942857         0.942857   
              3           0.885714           0.914286           0.914286         0.904762   
              4           0.885714           1.000000           0.942857         0.942857   
        
                 std_test_score  rank_test_score  
              0        0.067344                4  
              1        0.035635                3  
              2        0.046657                1  
              3        0.013469                5  
              4        0.046657                1  
        
           .. code:: python
        
              df,params = hyanova.read_df(df,'mean_test_score')
              print(df.head())
              >>>  alpha  l1_ratio  mean_test_score
              0  0.000977      0.00         0.923810
              1  0.000977      0.25         0.933333
              2  0.000977      0.50         0.942857
              3  0.000977      0.75         0.904762
              4  0.000977      1.00         0.942857
              print(params)
              >>> ['alpha', 'l1_ratio']
        
        -  Use ``hyanova.read_csv(path,metric)`` to load data from a ``.csv``
           file. The `template
           file <./examples/iris%5BGridSearchCV%5DModel1.csv>`__ can be found in
           the example folder. It is equivalent to
           ``hyanova.read_df(pandas.read_csv(path),metric)``.
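        
        The transformation shown in the ``read_df`` example above can be
        approximated with plain pandas. The function below is a hypothetical
        stand-in, not the library's actual code: it assumes that hyperparameter
        columns follow sklearn's ``param_<name>`` convention and that the metric
        should end up in the last column.
        
        .. code:: python
        
           import pandas as pd
        
           def select_params_and_metric(cv_results, metric):
               """Hypothetical sketch of what read_df appears to do."""
               # GridSearchCV stores each hyperparameter in a 'param_<name>' column.
               param_cols = [c for c in cv_results.columns if c.startswith('param_')]
               params = [c[len('param_'):] for c in param_cols]
               # Keep the hyperparameters plus the metric, with the metric last.
               out = cv_results[param_cols + [metric]].copy()
               out.columns = params + [metric]
               return out, params
        
           # A two-row excerpt in the shape of GridSearchCV's cv_results_.
           cv_results = pd.DataFrame({
               'mean_fit_time': [0.003899, 0.003401],
               'param_alpha': [0.000977, 0.000977],
               'param_l1_ratio': [0.00, 0.25],
               'mean_test_score': [0.923810, 0.933333],
           })
           df, params = select_params_and_metric(cv_results, 'mean_test_score')
           print(df)
           print(params)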
        
        ANOVA
        '''''
        
        Use ``hyanova.analyze(df)`` to perform the functional ANOVA
        decomposition. It needs a ``pandas.DataFrame`` object in a format
        similar to the following table. The loading functions described above
        produce this format for you.
        
        == ======== ======== ===============
        \  alpha    l1_ratio mean_test_score
        == ======== ======== ===============
        0  0.000977 0.00     0.923810
        1  0.000977 0.25     0.933333
        2  0.000977 0.50     0.942857
        3  0.000977 0.75     0.904762
        == ======== ======== ===============
        
        **Note:** The metric (``mean_test_score`` here) must always be in the
        last column.
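        
        If you build the ``DataFrame`` by hand and the metric is not already
        last, a plain pandas column reorder moves it there. This is a minimal
        sketch with made-up rows; it uses only pandas and is not part of
        HyANOVA itself.
        
        .. code:: python
        
           import pandas as pd
        
           metric = 'mean_test_score'
           df = pd.DataFrame({
               'mean_test_score': [0.923810, 0.933333],
               'alpha': [0.000977, 0.000977],
               'l1_ratio': [0.00, 0.25],
           })
        
           # Move the metric to the end, keeping the other columns in order.
           ordered = [c for c in df.columns if c != metric] + [metric]
           df = df[ordered]
           print(list(df.columns))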
        
        ``hyanova.analyze(df)`` returns a ``DataFrame`` listing each
        hyperparameter subset (``u``), its variance (``v_u``) and its
        importance (``F_u``).
        
        .. code:: python
        
           importance = hyanova.analyze(df)
           >>> 100%|██████████████████████████████████| 3/3 [00:00<00:00, 11.32it/s]
           print(importance)
           >>>              u       v_u  F_u(v_u/v_all)
           0           (alpha,)  0.056885        0.892057
           1        (l1_ratio,)  0.002489        0.039030
           2  (alpha, l1_ratio)  0.004394        0.068912
        
        **Note:** F_u is the ratio of the variance attributed to a
        hyperparameter subset (v_u) to the variance over all trials (v_all), so
        the F_u values always sum to 1. See the references for more details.
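        
        To see why the F_u values sum to 1, it can help to work through the
        decomposition on a tiny example. The sketch below is an illustration
        under simplifying assumptions, not the library's actual code: it uses a
        made-up 3 x 2 score grid over two hypothetical hyperparameters ``a`` and
        ``b``, covers every combination exactly once, and weights all
        combinations equally.
        
        .. code:: python
        
           import numpy as np
        
           # Toy scores on a full grid: rows are values of a hypothetical
           # hyperparameter 'a', columns are values of a hypothetical 'b'.
           scores = np.array([[0.90, 0.92],
                              [0.80, 0.85],
                              [0.60, 0.70]])
        
           f_0 = scores.mean()                             # overall mean
           f_a = scores.mean(axis=1, keepdims=True) - f_0  # main effect of a
           f_b = scores.mean(axis=0, keepdims=True) - f_0  # main effect of b
           f_ab = scores - f_0 - f_a - f_b                 # interaction of a and b
        
           # Variance of all trials and of each component.
           v_all = ((scores - f_0) ** 2).mean()
           v = {
               ('a',): (f_a ** 2).mean(),
               ('b',): (f_b ** 2).mean(),
               ('a', 'b'): (f_ab ** 2).mean(),
           }
        
           # Importance of each subset: its share of the total variance.
           F = {u: v_u / v_all for u, v_u in v.items()}
           for u, f_u in F.items():
               print(u, round(f_u, 6))
        
        On a full, equally weighted grid the three components are orthogonal,
        so their variances add up to ``v_all`` and the shares add up to 1.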
        
        Example usage
        ~~~~~~~~~~~~~
        
        You can use sklearn to run a hyperparameter search and then use hyanova
        to analyze the importance of the hyperparameters.
        
        .. code:: python
        
           import numpy as np
           import pandas as pd
           import sklearn.datasets
           from sklearn.model_selection import GridSearchCV
           from sklearn.svm import SVC
           import hyanova
        
           iris = sklearn.datasets.load_iris()
           X = iris.data
           y = iris.target
           model = SVC()
           # Every combination of C and kernel is evaluated by the grid search.
           grid = {'C': np.linspace(1e-9, 128, 10000),
                   'kernel': ('rbf', 'linear', 'poly', 'sigmoid')}
           grid_search = GridSearchCV(model, grid)
           result = grid_search.fit(X, y)
           # cv_results_ holds one row per hyperparameter combination.
           df = pd.DataFrame(result.cv_results_)
           metric = 'mean_test_score'
           df, params = hyanova.read_df(df, metric)
           importance = hyanova.analyze(df)
        
        Dependencies
        ~~~~~~~~~~~~
        
        -  numpy
        -  pandas
        -  tqdm
        
        Why I Created HyANOVA
        ~~~~~~~~~~~~~~~~~~~~~
        
        I am completing my undergraduate thesis. To better understand the
        models used in it, I looked at many algorithms that can measure the
        importance of hyperparameters. Among them, functional ANOVA seemed to
        be the most effective. However, the original author's implementation is
        based on Java and uses Python to call Java files, which I found
        confusing. I wanted a module that is easier to understand and written
        entirely in Python to help me with the ANOVA decomposition, so I
        created HyANOVA. I hope it helps you too!
        
        References
        ~~~~~~~~~~
        
        1. Hutter, F., Hoos, H. & Leyton-Brown, K. (2014). An Efficient
           Approach for Assessing Hyperparameter Importance. Proceedings of the
           31st International Conference on Machine Learning, in PMLR
           32(1):754-762.
        2. https://github.com/frank-hutter/fanova
        
Keywords: anova,sklearn,hyperparameter,hyperparameter importance
Platform: any
Requires-Python: >=3.6
Description-Content-Type: text/x-rst
