Metadata-Version: 2.1
Name: HackDuck
Version: 0.1.2
Summary: Machine learning data flow for reproducible data science
Home-page: https://github.com/AlexandreKempf/HackDuck
Author: Alexandre Kempf
Author-email: alexandre.kempf@cri-paris.org
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: prefect
Requires-Dist: mlflow
Requires-Dist: pyyaml
Requires-Dist: numpy
Requires-Dist: torch
Requires-Dist: simplejson
Requires-Dist: TaskBank

# IDEAL HACKDUCK PROJECT


Several pipelines for dataflow (Prefect):
  - nothing -> data_generation -> save to disk
  - preprocessing
  - augmentation
  - postprocessing
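
As a rough illustration of the stages listed above, here is a minimal sketch in plain Python rather than Prefect. Every function name in it (`data_generation`, `preprocessing`, `augmentation`, `postprocessing`, `run_pipeline`) is hypothetical and is not part of HackDuck's API; the toy transformations are assumptions chosen only to make the example runnable.

```python
import json
import tempfile
from pathlib import Path


def data_generation(n, out_path):
    """Generate n toy samples from nothing and save them to disk."""
    data = [float(i) for i in range(n)]
    Path(out_path).write_text(json.dumps(data))
    return out_path


def preprocessing(path):
    """Load the saved samples and rescale them to the [0, 1] range."""
    data = json.loads(Path(path).read_text())
    top = max(data) or 1.0  # avoid dividing by zero when all samples are 0
    return [x / top for x in data]


def augmentation(data):
    """Append a mirrored copy (1 - x) of every sample."""
    return data + [1.0 - x for x in data]


def postprocessing(data):
    """Round values so that downstream output is stable."""
    return [round(x, 3) for x in data]


def run_pipeline(n):
    """Chain the four stages, using a temporary directory as the disk store."""
    with tempfile.TemporaryDirectory() as tmp:
        path = data_generation(n, Path(tmp) / "data.json")
        return postprocessing(augmentation(preprocessing(path)))
```

In a real project each function would presumably become a Prefect task, so that the framework can track dependencies between stages.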

Model handling (PyTorch & Ignite):
  - fit -> give X and Y and learn
  - evaluate -> give X and Y, predict and return metrics
  - predict -> give X, return Y
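
The three methods above could be sketched as the following interface. This is an assumption about the shape of the API, not HackDuck's actual code: `MeanModel` is a hypothetical toy class written in plain Python (no PyTorch) that always predicts the mean target seen during `fit`.

```python
class MeanModel:
    """Toy stand-in for a real model: predicts the mean of Y seen during fit."""

    def __init__(self):
        self.mean = 0.0

    def fit(self, X, Y):
        """Give X and Y and learn (here: memorize the mean target)."""
        self.mean = sum(Y) / len(Y)
        return self

    def predict(self, X):
        """Give X, return Y (one prediction per input)."""
        return [self.mean for _ in X]

    def evaluate(self, X, Y):
        """Give X and Y, predict, and return metrics as a dict."""
        preds = self.predict(X)
        mse = sum((p - y) ** 2 for p, y in zip(preds, Y)) / len(Y)
        return {"mse": mse}
```

Returning a dict from `evaluate` is a design choice of this sketch; it keeps metrics easy to pass to a logger one key at a time.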

Save logs and artifacts (MLflow):
  - save metrics during training (ignite)
  - save a bunch of data before and after each pipeline

Run the model from a REST app (MLflow):
  - save a GitHub folder for each project
  - can easily get predictions on a batch of data



# FEATURES:
 - seed for reproducibility
 - map arguments to loop over a list
 - MLflow integration (logs parameters automatically; can log metrics or artifacts)
 - all Prefect advantages
 - handle subflows
 - task bank to do basic operations
 - unit tests handled by Ward
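
The first two features (seeding and mapping an argument over a list) can be illustrated with a minimal sketch in plain Python. The helpers `run_with_seed`, `map_task`, and `noisy_scale` are hypothetical names invented for this example and are not HackDuck functions; only Python's `random` module is seeded here, whereas a real flow would likely also need to seed NumPy and PyTorch.

```python
import random


def run_with_seed(func, seed, *args, **kwargs):
    """Seed Python's random module, then call func, so reruns give the same result."""
    random.seed(seed)
    return func(*args, **kwargs)


def map_task(func, values, **fixed):
    """Call func once per element of values, passing the other arguments unchanged."""
    return [func(v, **fixed) for v in values]


def noisy_scale(x, factor=1.0):
    """Toy task: scale x and add random noise drawn from [0, 1)."""
    return x * factor + random.random()


# Two runs with the same seed produce identical mapped results.
first = run_with_seed(map_task, 42, noisy_scale, [1, 2, 3], factor=2.0)
second = run_with_seed(map_task, 42, noisy_scale, [1, 2, 3], factor=2.0)
```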


# TODO:
- [ ] map over subflows?
- [ ] pip package for TaskBank and save its commit (needed to rerun the flow)
- [ ] save Python files inside mlruns/..., commit them to git, and save the git commit
- [ ] be able to rerun a previous flow (save args, kwargs, and output references)
- [ ] run it in a Docker container
- [ ] deploy to production via Travis CI, which creates the MLflow git repo
- [ ] do deep learning with it


