Metadata-Version: 2.1
Name: data-science-toolbox
Version: 0.1.0
Summary: Various code to aid in data science projects for tasks involving data cleaning, ETL, EDA, NLP, viz, feature engineering, feature selection, model validation, etc.
Home-page: https://github.com/safurrier/data_science_utils
License: MIT
Author: safurrier
Author-email: safurrier@gmail.com
Requires-Python: >=3.7,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Requires-Dist: click (>=7.0)
Requires-Dist: dask (>=2.8.1)
Requires-Dist: matplotlib (>=3.0.2)
Requires-Dist: numpy (>=1.15.2)
Requires-Dist: pandas (>=0.25.3)
Requires-Dist: pandas_profiling (>=2.3.0)
Requires-Dist: papermill (>=1.2.1)
Requires-Dist: pytest (>=5.2.4)
Requires-Dist: scikit_learn (>=0.21.3)
Requires-Dist: scipy (>=1.1.0)
Requires-Dist: seaborn (>=0.9.0)
Requires-Dist: spacy (>=2.2.3)
Requires-Dist: textacy (>=0.9.1)
Requires-Dist: textblob (>=0.15.3)
Requires-Dist: vaderSentiment (>=3.2.1)
Description-Content-Type: text/markdown

data-science-utils
==============================

Utility code for data science projects, covering tasks such as data cleaning,
ETL, EDA, NLP, visualization, feature engineering, and feature selection.


Project Organization
------------
    ├── README.md          <- The top-level README for developers using this project.
    ├── gists              <- Gists of commonly used code (change to root directory,
    │                         connect to database, profile data, etc.)
    ├── io                 <- Code for input/output utilities
    ├── etl                <- For building reproducible ETL pipelines, including data
    │                         checks and transformers
    ├── ml                 <- Machine learning utility code (feature engineering, etc.)
    ├── pandas             <- Pandas related utility code
    │   ├── analysis                  
    │   ├── cleaning
    │   ├── engineering
    │   ├── text    
    │   ├── datetime     
    │   ├── optimization       
    │   └── profiling      
    ├── text               <- Code for dealing with text. Includes distributed loading of a text
    │                         corpus, entity statement extraction, sentiment analysis, etc.
    ├── __init__.py        <- Makes data_science_utils a Python package
    ├── project_utils.py   <- Project-specific utilities
    └── LICENSE
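
The `gists` entry above mentions changing to the project root directory. As an illustration only, here is a minimal sketch of how such a helper could look. The function names `find_project_root` and `change_to_project_root` and the marker files are assumptions made for this example, not the code shipped in this repository.

```python
import os
from pathlib import Path


def find_project_root(start, markers=("README.md", "LICENSE")):
    """Walk upward from `start` and return the first directory holding a marker file.

    Hypothetical helper: the marker file names are an assumption for this sketch.
    """
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if any((candidate / marker).exists() for marker in markers):
            return candidate
    raise FileNotFoundError(f"No project root found above {start}")


def change_to_project_root(start=None, markers=("README.md", "LICENSE")):
    """Change the working directory to the detected project root and return it."""
    root = find_project_root(start if start is not None else Path.cwd(), markers)
    os.chdir(root)
    return root
```

Calling `change_to_project_root()` at the top of a notebook would make relative paths resolve from the project root, regardless of which subdirectory the notebook lives in.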
    

