Metadata-Version: 2.1
Name: embedding-tool
Version: 0.1.1
Summary: An embedding toolkit that can perform multiple embedding process which are low-dimensional embedding (dimension reduction), categorical variable embedding, and financial time-series embedding.
Home-page: https://github.com/thisisphume/embedding_tool/tree/master/
Author: Phume Ngampornsukswadi
Author-email: thisisphume@gmail.com
License: Apache Software License 2.0
Description: # Embedding Tool
        > An embedding toolkit that can perform multiple embedding process which are low-dimensional embedding (dimension reduction), categorical variable embedding, and financial time-series embedding.
        
        
        ## Install
        
        `pip install embedding-tool`
        
        ```python
        from embedding_tool.core import *
        ```
        
        ## How to use
        
        ### Dimension Reduction: `dimensionReducer` class
        The function performs dimensionality reduction, pre-processing the data and comparing the reconstruction error via PCA and autoencoder.
        
        **Input data:**
        The input matrix has a size of 863 $\times$ 768.
        
        ```python
        print ("Data's size: ", testing_data.shape)
        ```
        
            Data's size:  (863, 768)
            
        
        **Performing dimension reduction:** we will reduce the number of dimension from 768 to 2. The learning rate of 0.002 will be use for the ADAM optimizer for the autoencoder model fitting.
        
        ```python
        dim_reducer = dimensionReducer(testing_data, 2, 0.002)
        dim_reducer.fit()
        ```
        
        **Calculating the MSE of the reconstructed vectors**
        
        ```python
        dim_reducer.rmse_result
        ```
        
        
        
        
        <div>
        <style scoped>
            .dataframe tbody tr th:only-of-type {
                vertical-align: middle;
            }
        
            .dataframe tbody tr th {
                vertical-align: top;
            }
        
            .dataframe thead th {
                text-align: right;
            }
        </style>
        <table border="1" class="dataframe">
          <thead>
            <tr style="text-align: right;">
              <th></th>
              <th>PCA</th>
              <th>1AE</th>
              <th>2AE</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <th>MSE</th>
              <td>0.740122</td>
              <td>0.741265</td>
              <td>0.65168</td>
            </tr>
          </tbody>
        </table>
        </div>
        
        
        
        ```python
        dim_reducer.rmse_result.T.sort_values('MSE').head(1).values[0][0]
        ```
        
        
        
        
            0.6516801665399286
        
        
        
        Here we can see that the two-layers autoencoder has the best performance with the lowest MSE of 0.64.
        
        **Observing the loss for each epoch:** If we see that the MSE doesn't converge fast enough, we could adjust the learning rate parameter. The default is 0.002. Try increase it to 0.005 if it doesn't converge or decrease to 0.001 if it converges way too fast and oscillating.
        
        ```python
        dim_reducer.plot_autoencoder_performance()
        ```
        
        
        ![png](docs/images/output_14_0.png)
        
        
        
        ![png](docs/images/output_14_1.png)
        
        
        **Result (Reduced Dimension Output):** There are three outputs from three different methods, which are PCA, 1-layer AE, and 2-layers AE.
        
        ```python
        ### Embedding from PCA
        dim_reducer.dfLowDimPCA.head()
        ```
        
        
        
        
        <div>
        <style scoped>
            .dataframe tbody tr th:only-of-type {
                vertical-align: middle;
            }
        
            .dataframe tbody tr th {
                vertical-align: top;
            }
        
            .dataframe thead th {
                text-align: right;
            }
        </style>
        <table border="1" class="dataframe">
          <thead>
            <tr style="text-align: right;">
              <th></th>
              <th>0</th>
              <th>1</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <th>0</th>
              <td>-16.078718</td>
              <td>-6.701481</td>
            </tr>
            <tr>
              <th>1</th>
              <td>-8.858150</td>
              <td>9.354204</td>
            </tr>
            <tr>
              <th>2</th>
              <td>4.305739</td>
              <td>-0.464707</td>
            </tr>
            <tr>
              <th>3</th>
              <td>-11.514311</td>
              <td>-0.687461</td>
            </tr>
            <tr>
              <th>4</th>
              <td>1.212006</td>
              <td>6.537965</td>
            </tr>
          </tbody>
        </table>
        </div>
        
        
        
        ```python
        ### Embedding from 1-layer autoencoder
        dim_reducer.dfLowDim1AE.head()
        ```
        
        
        
        
        <div>
        <style scoped>
            .dataframe tbody tr th:only-of-type {
                vertical-align: middle;
            }
        
            .dataframe tbody tr th {
                vertical-align: top;
            }
        
            .dataframe thead th {
                text-align: right;
            }
        </style>
        <table border="1" class="dataframe">
          <thead>
            <tr style="text-align: right;">
              <th></th>
              <th>0</th>
              <th>1</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <th>0</th>
              <td>-6.178097</td>
              <td>4.734626</td>
            </tr>
            <tr>
              <th>1</th>
              <td>2.075333</td>
              <td>5.529111</td>
            </tr>
            <tr>
              <th>2</th>
              <td>0.953502</td>
              <td>-1.667776</td>
            </tr>
            <tr>
              <th>3</th>
              <td>-2.488155</td>
              <td>4.001960</td>
            </tr>
            <tr>
              <th>4</th>
              <td>3.183654</td>
              <td>0.589496</td>
            </tr>
          </tbody>
        </table>
        </div>
        
        
        
        ```python
        ### Embedding from 2-layers autoencoder
        dim_reducer.dfLowDim2AE.head()
        ```
        
        
        
        
        <div>
        <style scoped>
            .dataframe tbody tr th:only-of-type {
                vertical-align: middle;
            }
        
            .dataframe tbody tr th {
                vertical-align: top;
            }
        
            .dataframe thead th {
                text-align: right;
            }
        </style>
        <table border="1" class="dataframe">
          <thead>
            <tr style="text-align: right;">
              <th></th>
              <th>0</th>
              <th>1</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <th>0</th>
              <td>32.622066</td>
              <td>54.652271</td>
            </tr>
            <tr>
              <th>1</th>
              <td>35.649811</td>
              <td>40.493984</td>
            </tr>
            <tr>
              <th>2</th>
              <td>15.314294</td>
              <td>5.869064</td>
            </tr>
            <tr>
              <th>3</th>
              <td>19.667603</td>
              <td>37.821194</td>
            </tr>
            <tr>
              <th>4</th>
              <td>36.183212</td>
              <td>25.429262</td>
            </tr>
          </tbody>
        </table>
        </div>
        
        
        
        **Plotting the embedding**
        
        ```python
        ### Embedding from 2-layers autoencoder
        plot_output(dim_reducer.dfLowDim2AE)
        ```
        
        
        ![png](docs/images/output_20_0.png)
        
        
        ```python
        ### Embedding from 1-layer autoencoder
        plot_output(dim_reducer.dfLowDim1AE)
        ```
        
        
        ![png](docs/images/output_21_0.png)
        
        
        ***
        
        # Reference: 
        - https://towardsdatascience.com/dimensionality-reduction-pca-versus-autoencoders-338fcaf3297d
        - https://towardsdatascience.com/autoencoders-vs-pca-when-to-use-which-73de063f5d7
        
        ***
        
Keywords: autoencoder PCA embedding
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.6
Description-Content-Type: text/markdown
