Metadata-Version: 2.1
Name: data-science-toolbox
Version: 0.1.2
Summary: Various code to aid in data science projects for tasks involving data cleaning, ETL, EDA, NLP, viz, feature engineering, feature selection, model validation, etc.
Home-page: https://github.com/safurrier/data_science_toolbox
License: MIT
Author: safurrier
Author-email: safurrier@gmail.com
Requires-Python: >=3.7,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Requires-Dist: click (>=7.0)
Requires-Dist: dask (>=2.8.1)
Requires-Dist: matplotlib (>=3.0.2)
Requires-Dist: numpy (>=1.15.2)
Requires-Dist: pandas (>=0.25.3)
Requires-Dist: pandas_profiling (>=2.3.0)
Requires-Dist: papermill (>=1.2.1)
Requires-Dist: pytest (>=5.2.4)
Requires-Dist: scikit_learn (>=0.21.3)
Requires-Dist: scipy (>=1.1.0)
Requires-Dist: seaborn (>=0.9.0)
Requires-Dist: spacy (>=2.2.3)
Requires-Dist: textacy (==0.7.1)
Requires-Dist: textblob (>=0.15.3)
Requires-Dist: vaderSentiment (>=3.2.1)
Description-Content-Type: text/markdown

data-science-toolbox
=====================

Utility code for data science projects: data cleaning, ETL, EDA, NLP, visualization,
feature engineering, feature selection, model training and validation, etc.

Project Organization
---------------------

    ├── README.md              
    ├── data_science_toolbox   <- Project source code
    │   │
    │   ├── gists                  <- Gists of commonly used code (change to root
    │   │                             directory, connect to database, profile data, etc.)
    │   ├── io                     <- Code for input/output utilities
    │   ├── etl                    <- For building reproducible ETL pipelines, including data
    │   │                             checks and transformers
    │   ├── ml                     <- Machine learning utility code (feature engineering, etc.)
    │   ├── pandas                 <- Pandas related utility code
    │   │   ├── analysis                  
    │   │   ├── cleaning
    │   │   ├── engineering
    │   │   ├── text    
    │   │   ├── datetime     
    │   │   ├── optimization       
    │   │   └── profiling   
    │   ├── project_utils.py   <- For project specific utilities
    │   │
    │   ├── text               <- Code for working with text: distributed loading of a text corpus,
    │   │                         entity statement extraction, sentiment analysis, PII removal, etc.
    │   └── __init__.py        <- Makes data_science_toolbox a Python package
    ├── tests
    ├── LICENSE
    ├── poetry.lock
    └── pyproject.toml 
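
The `gists` entry above mentions changing to the project root directory. As a rough illustration of that idea only, here is a minimal sketch that walks upward from a starting directory until it finds `pyproject.toml`, which the tree places at the repository root. The function name `find_project_root` and its parameters are hypothetical and are not taken from this package's actual API.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Optional[Path] = None,
                      marker: str = "pyproject.toml") -> Path:
    """Return the nearest directory at or above `start` that contains `marker`.

    Hypothetical helper for illustration; not part of data_science_toolbox.
    """
    current = (start or Path.cwd()).resolve()
    # Check the starting directory first, then each parent up to the filesystem root.
    for candidate in (current, *current.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(
        f"No {marker} found in {current} or any parent directory"
    )
```

A notebook several folders deep could then call `os.chdir(find_project_root())` so that relative paths resolve from the repository root. Using a marker file avoids hard-coding how many directory levels to climb.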
