Metadata-Version: 2.1
Name: data-science-toolbox
Version: 0.1.3
Summary: Various code to aid in data science projects for tasks involving data cleaning, ETL, EDA, NLP, viz, feature engineering, feature selection, model validation, etc.
Home-page: https://github.com/safurrier/data_science_toolbox
License: MIT
Author: safurrier
Author-email: safurrier@gmail.com
Requires-Python: >=3.7,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Requires-Dist: click (>=7.0)
Requires-Dist: dask (>=2.8.1)
Requires-Dist: matplotlib (>=3.0.2)
Requires-Dist: numpy (>=1.15.2)
Requires-Dist: pandas (>=0.25.3)
Requires-Dist: pandas_profiling (>=2.3.0)
Requires-Dist: papermill (>=1.2.1)
Requires-Dist: pytest (>=5.2.4)
Requires-Dist: scikit_learn (>=0.21.3)
Requires-Dist: scipy (>=1.1.0)
Requires-Dist: seaborn (>=0.9.0)
Requires-Dist: spacy (>=2.2.3)
Requires-Dist: textacy (==0.8.0)
Requires-Dist: textblob (>=0.15.3)
Requires-Dist: vaderSentiment (>=3.2.1)
Description-Content-Type: text/markdown

# data_science_toolbox

A collection of utility code for data science projects, covering data cleaning,
ETL, EDA, NLP, visualization, feature engineering, feature selection, and model training and validation.

## Installation

### Using pip

Install the package with pip (Python 3.7 or later, below 4.0, is required):

    pip install data-science-toolbox
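
To confirm that the install succeeded, you can query the installed distribution's version from Python. The snippet below is a minimal sketch, not part of this package: `installed_version` is a hypothetical helper name, and it relies only on the standard library's `importlib.metadata` (Python 3.8+), falling back to the third-party `importlib_metadata` backport on Python 3.7, which is assumed to be installed there.

```python
from typing import Optional

try:
    # Standard library on Python 3.8 and later.
    from importlib.metadata import PackageNotFoundError, version
except ImportError:
    # Python 3.7: assumes the separate importlib_metadata backport is installed.
    from importlib_metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "data-science-toolbox") -> Optional[str]:
    """Return the installed version of dist_name, or None if it is not installed.

    Hypothetical helper for illustration; it is not provided by the package.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("data-science-toolbox is not installed in this environment")
    else:
        print(f"data-science-toolbox {found} is installed")
```

Note that the distribution name on PyPI uses hyphens (`data-science-toolbox`), while the importable package uses underscores (`data_science_toolbox`).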

## Project Organization

    ├── README.md              
    ├── data_science_toolbox   <- Project source code
    │   │
    │   ├── gists              <- Code gists with commonly used code (change to root
    │   │                         directory, connect to database, profile data, etc)
    │   ├── data_checks        <- Code for data checks and assertions
    │   ├── io                 <- Code for input/output utilities
    │   ├── etl                <- For building reproducible ETL pipelines, including data
    │   │                         checks and transformers
    │   ├── ml                 <- Machine Learning utility code (feature engineering, etc) 
    │   ├── pandas             <- Pandas related utility code
    │   │   ├── analysis                  
    │   │   ├── cleaning
    │   │   ├── engineering
    │   │   ├── text    
    │   │   ├── datetime     
    │   │   ├── optimization       
    │   │   └── profiling   
    │   ├── project_utils.py   <- For project specific utilities
    │   │
    │   ├── text               <- Code for working with text, including distributed loading of a text corpus,
    │   │                         entity statement extraction, sentiment analysis, PII removal, etc.
    │   └── __init__.py        <- Makes data_science_toolbox a Python package
    ├── tests                  <- Pytest unit tests 
    ├── dist                   <- Tarballs and wheels of version builds
    ├── LICENSE
    ├── poetry.lock
    └── pyproject.toml 
