Metadata-Version: 2.0
Name: metta-data
Version: 0.2.2
Summary: Store/Read train and test matrices
Home-page: https://github.com/dssg/metta-data
Author: Center for Data Science and Public Policy
Author-email: datascifellows@gmail.com
License: BY DOWNLOADING metta-data PROGRAM YOU AGREE TO THE FOLLOWING TERMS OF USE:
Requires-Dist: PyYAML
Requires-Dist: boto3
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: python-dateutil
Requires-Dist: tables

Copyright ©2017.  The University of Chicago (“Chicago”). All Rights Reserved.  

Permission to use, copy, modify, and distribute this software, including all object code and source code, and any accompanying documentation (together the “Program”) for educational and not-for-profit research purposes, without fee and without a signed licensing agreement, is hereby granted, provided that the above copyright notice, this paragraph and the following three paragraphs appear in all copies, modifications, and distributions. For the avoidance of doubt, educational and not-for-profit research purposes excludes any service or part of selling a service that uses the Program. To obtain a commercial license for the Program, contact the Technology Commercialization and Licensing, Polsky Center for Entrepreneurship and Innovation, University of Chicago, 1452 East 53rd Street, 2nd floor, Chicago, IL 60615.

Created by Data Science and Public Policy, University of Chicago

The Program is copyrighted by Chicago. The Program is supplied "as is", without any accompanying services from Chicago. Chicago does not warrant that the operation of the Program will be uninterrupted or error-free. The end-user understands that the Program was developed for research purposes and is advised not to rely exclusively on the Program for any reason.

IN NO EVENT SHALL CHICAGO BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THE PROGRAM, EVEN IF CHICAGO HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. CHICAGO SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE PROGRAM PROVIDED HEREUNDER IS PROVIDED "AS IS". CHICAGO HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.

Description: # metta-data
        Train Matrix and Test Matrix Storage
        
        [![Build Status](https://travis-ci.org/dssg/metta-data.svg?branch=master)](https://travis-ci.org/dssg/metta-data)
        [![codecov](https://codecov.io/gh/dssg/metta-data/branch/master/graph/badge.svg)](https://codecov.io/gh/dssg/metta-data)
        
        
        ## Description
        
        Python library for storing and recalling the DataFrames of training and testing
        sets, together with their metadata.
        
        ## Installation
        To get the latest stable version:
        ```
        pip install metta-data
        ```
        
        To get the current master branch:
        ```
        pip install git+https://github.com/dssg/metta-data.git
        ```
        
        
        ## How-to
        
        `metta` expects a configuration dictionary for each DataFrame with the following keys:
        - `beginning_of_time` (`datetime.date`): The earliest time that enters your covariate calculations.
        - `end_time` (`datetime.date`): The last time that enters your covariate calculations.
        - `label_window` (str): The length of the labeling window used in this matrix, e.g. `'1y'` or `'6m'`.
        - `label_name` (str): The column name of the outcome variable. This column must be the last column of your DataFrame.
        - `matrix_id` (str): A human-readable ID for the dataset.
        
        The examples below also pass `label_type`, `feature_names` and `indices`.
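        
        Before handing a dictionary to `metta`, it can help to check it locally. The sketch below uses a hypothetical helper, `check_matrix_config` (not part of `metta`), that only verifies the keys and types listed above and that the time window is in order; it makes no assumption about what validation `metta` itself performs.
        
        ```python
        import datetime
        
        # Required keys and expected types, taken from the list above.
        # check_matrix_config is a hypothetical helper, not part of metta.
        REQUIRED_KEYS = {
            'beginning_of_time': datetime.date,
            'end_time': datetime.date,
            'label_window': str,
            'label_name': str,
            'matrix_id': str,
        }
        
        
        def _as_date(value):
            # datetime.datetime subclasses datetime.date but cannot be compared
            # with a plain date, so compare on the date part only.
            return value.date() if isinstance(value, datetime.datetime) else value
        
        
        def check_matrix_config(config):
            """Return a list of problems found in a config dict (empty if none)."""
            problems = []
            for key, expected_type in REQUIRED_KEYS.items():
                if key not in config:
                    problems.append('missing key: {}'.format(key))
                elif not isinstance(config[key], expected_type):
                    problems.append('{} should be {}, got {}'.format(
                        key, expected_type.__name__, type(config[key]).__name__))
            start = config.get('beginning_of_time')
            end = config.get('end_time')
            if isinstance(start, datetime.date) and isinstance(end, datetime.date):
                if _as_date(start) > _as_date(end):
                    problems.append('beginning_of_time is after end_time')
            return problems
        
        
        config = {'beginning_of_time': datetime.date(2012, 12, 20),
                  'end_time': datetime.date(2016, 12, 20),
                  'label_window': '3m',
                  'label_name': 'inspection_1yr',
                  'matrix_id': 'CDPH_2012'}
        print(check_matrix_config(config))  # → []
        ```
        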
        
        ### Storing a train and test pair
        ```
        import datetime
        
        import metta
        
        # X_train and X_test are pandas DataFrames you have already built;
        # the label column must be the last column of each.
        train_config = {'beginning_of_time': datetime.date(2012, 12, 20),
                        'end_time': datetime.date(2016, 12, 20),
                        'label_window': '3m',
                        'label_name': 'inspection_1yr',
                        'label_type': 'binary',
                        'matrix_id': 'CDPH_2012',
                        'feature_names': ['break_last_3yr', 'soil', 'pressure_zone'],
                        'indices': ['entity_id', 'as_of_date']}
        
        test_config = {'beginning_of_time': datetime.date(2015, 12, 20),
                       'end_time': datetime.date(2016, 12, 21),
                       'label_window': '3m',
                       'label_name': 'inspection_1yr',
                       'label_type': 'binary',
                       'matrix_id': 'CDPH_2015',
                       'feature_names': ['break_last_3yr', 'soil', 'pressure_zone'],
                       'indices': ['entity_id', 'as_of_date']}
        
        # Store both matrices and their metadata in ./old_matrices
        metta.archive_train_test(train_config,
                                 X_train,
                                 test_config,
                                 X_test,
                                 directory='./old_matrices',
                                 format='hd5',
                                 overwrite=False)
        ```
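        
        `metta` requires the label column to be the last column of each DataFrame. If your features and label come out of a query in a different order, a small reordering step before archiving avoids that mistake. `label_last` below is a hypothetical helper built on plain pandas, and the toy values are made up; only the column names come from the configs above.
        
        ```python
        import pandas as pd
        
        
        def label_last(df, label_name):
            """Return a new DataFrame with label_name moved to the last column."""
            if label_name not in df.columns:
                raise KeyError('label column {!r} not found'.format(label_name))
            other_columns = [col for col in df.columns if col != label_name]
            return df[other_columns + [label_name]]
        
        
        # Toy matrix with illustrative values; the label starts in the first column.
        raw = pd.DataFrame({'inspection_1yr': [0, 1],
                            'break_last_3yr': [2, 0],
                            'soil': [1, 3],
                            'pressure_zone': [4, 4]})
        X_train = label_last(raw, 'inspection_1yr')
        print(list(X_train.columns))
        # → ['break_last_3yr', 'soil', 'pressure_zone', 'inspection_1yr']
        ```
        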
        
        ### Storing a train and multiple test sets
        ```
        import datetime
        
        from dateutil.relativedelta import relativedelta
        
        import metta
        
        # X_train is the training DataFrame and df_data the test DataFrame;
        # the label column must be the last column of each.
        train_config = {'beginning_of_time': datetime.date(2012, 12, 20),
                        'end_time': datetime.date(2016, 12, 20),
                        'label_window': '3m',
                        'label_name': 'inspection_1yr',
                        'label_type': 'binary',
                        'matrix_id': 'CDPH_2012',
                        'feature_names': ['break_last_3yr', 'soil', 'pressure_zone'],
                        'indices': ['entity_id', 'as_of_date']}
        
        base_test_config = {'beginning_of_time': datetime.date(2015, 12, 20),
                            'end_time': datetime.date(2016, 12, 21),
                            'label_window': '3m',
                            'label_name': 'inspection_1yr',
                            'label_type': 'binary',
                            'matrix_id': 'CDPH_2015',
                            'feature_names': ['break_last_3yr', 'soil', 'pressure_zone'],
                            'indices': ['entity_id', 'as_of_date']}
        
        # Archive the training matrix once and keep its uuid
        train_uuid = metta.archive_matrix(train_config, X_train, directory='./matrices')
        
        test_uuids = []
        
        # Shift the test window forward 1 to 4 years and pass train_uuid
        # so each test matrix is associated with the training matrix
        for years in range(1, 5):
            test_config = base_test_config.copy()
            test_config['beginning_of_time'] += relativedelta(years=years)
            test_config['end_time'] += relativedelta(years=years)
            test_config['matrix_id'] = 'CDPH_{}'.format(test_config['end_time'].year)
            test_uuids.append(metta.archive_matrix(
                test_config,
                df_data,
                directory='./matrices',
                overwrite=False,
                format='csv',
                train_uuid=train_uuid
            ))
        ```
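        
        To see which test windows and matrix IDs the loop produces before anything is written to disk, the same date arithmetic can be run on its own. This sketch only reproduces the shifting with `dateutil.relativedelta` (`python-dateutil` is already a dependency) and does not call `metta`.
        
        ```python
        import datetime
        
        from dateutil.relativedelta import relativedelta
        
        base_start = datetime.date(2015, 12, 20)
        base_end = datetime.date(2016, 12, 21)
        
        # Same shifting as the archiving loop: move both ends forward 1..4 years.
        windows = []
        for years in range(1, 5):
            start = base_start + relativedelta(years=years)
            end = base_end + relativedelta(years=years)
            windows.append(('CDPH_{}'.format(end.year), start, end))
        
        for matrix_id, start, end in windows:
            print(matrix_id, start, end)
        # CDPH_2017 2016-12-20 2017-12-21
        # CDPH_2018 2017-12-20 2018-12-21
        # CDPH_2019 2018-12-20 2019-12-21
        # CDPH_2020 2019-12-20 2020-12-21
        ```
        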
        
        
        ### Uploading to S3
        ```
        import yaml
        
        import metta
        
        # aws_keys.yaml holds the AWSAccessKey, AWSSecretKey, Bucket and Folder entries
        with open('aws_keys.yaml') as f:
            dict_config = yaml.safe_load(f)
        
        metta.upload_to_s3(access_key_id=dict_config['AWSAccessKey'],
                           secret_access_key=dict_config['AWSSecretKey'],
                           bucket=dict_config['Bucket'],
                           folder=dict_config['Folder'],
                           directory='./old_matrices')
        ```
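        
        The upload reads four entries from `aws_keys.yaml`. A loader that fails early when one is missing gives a clearer error than a `KeyError` in the middle of the call. `load_aws_config` is a hypothetical helper; the key names are simply the ones the upload example uses, and the values written to the throwaway file are placeholders.
        
        ```python
        import os
        import tempfile
        
        import yaml
        
        # Key names used by the upload example.
        AWS_KEYS = ('AWSAccessKey', 'AWSSecretKey', 'Bucket', 'Folder')
        
        
        def load_aws_config(path):
            """Load the YAML file at path and check that every AWS_KEYS entry exists."""
            with open(path) as f:
                config = yaml.safe_load(f) or {}
            missing = [key for key in AWS_KEYS if key not in config]
            if missing:
                raise KeyError('missing keys in {}: {}'.format(path, ', '.join(missing)))
            return config
        
        
        # Usage with a temporary file holding placeholder values
        with tempfile.NamedTemporaryFile('w', suffix='.yaml', delete=False) as tmp:
            tmp.write('AWSAccessKey: my-access-key\n'
                      'AWSSecretKey: my-secret-key\n'
                      'Bucket: my-bucket\n'
                      'Folder: matrices\n')
        dict_config = load_aws_config(tmp.name)
        os.remove(tmp.name)
        print(dict_config['Bucket'])  # → my-bucket
        ```
        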
        
        
        
Keywords: metta
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
