Metadata-Version: 2.0
Name: cacheflow
Version: 0.1
Summary: Caching Workflow Engine
Home-page: https://gitlab.com/remram44/cacheflow
Author: Remi Rampin
Author-email: remi.rampin@nyu.edu
Maintainer: Remi Rampin
Maintainer-email: remi.rampin@nyu.edu
License: BSD-3-Clause
Project-URL: Source, https://gitlab.com/remram44/cacheflow
Project-URL: Tracker, https://gitlab.com/remram44/cacheflow/issues
Project-URL: Say Thanks, https://saythanks.io/to/remram44
Keywords: cache,workflow,pipeline,dataflow,flow,execution,engine
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: POSIX
Classifier: Operating System :: Microsoft :: Windows
Classifier: Programming Language :: Python
Classifier: Programming Language :: SQL
Classifier: Programming Language :: Unix Shell
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development :: Build Tools
Classifier: Topic :: Software Development :: Interpreters
Classifier: Topic :: Text Processing :: Markup
Classifier: Topic :: Utilities
Requires-Dist: markdown
Requires-Dist: PyYAML
Requires-Dist: requests

CacheFlow
=========

CacheFlow is a caching workflow engine: it executes dataflows and, for
efficiency, reuses previous results where appropriate. It is highly
extensible and can be embedded in other projects.

Goals
-----

* ☑ Python 3 workflow system
* ☑ Executes dataflows from JSON files
* ☐ Can also load from SQL database
* ☐ Parallel execution
* ☐ Streaming
* ☑ Extensible: can add new modules, new storage formats, new caching mechanisms, new executors
* ☐ Pluggable: extensions can be installed from PyPI without forking
* ☑ Reusable: can execute workflows on its own, but can also be embedded into applications, some of which I plan to develop myself:

  * Literate programming app: snippets or modules embedded in a Markdown file are executed on render (similar to R Markdown). Results would be cached, making subsequent renders fast
  * Integrate into some of my NYU research projects (VisTrails Vizier, D3M)
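
Executing dataflows from JSON with an extensible set of modules might look roughly like the sketch below. The JSON schema, the `MODULES` registry, `register`, and `execute()` are hypothetical illustrations, not CacheFlow's actual format or API. The sketch relies on the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to run steps in dependency order.

```python
import json
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical registry of module types; extensions would add entries here.
MODULES = {}


def register(name):
    """Decorator that registers a function under a module-type name."""
    def wrapper(func):
        MODULES[name] = func
        return func
    return wrapper


@register("constant")
def constant(value):
    return value


@register("add")
def add(a, b):
    return a + b


def execute(dataflow_json):
    """Execute a dataflow described as JSON.

    Assumed (hypothetical) schema:
    {"steps": {step_id: {"module": name, "params": {...},
                         "inputs": {arg_name: upstream_step_id}}}}
    """
    steps = json.loads(dataflow_json)["steps"]
    # Map each step to the set of steps feeding its inputs (its predecessors).
    graph = {
        step_id: set(step.get("inputs", {}).values())
        for step_id, step in steps.items()
    }
    results = {}
    # static_order() yields every step after all of its predecessors.
    for step_id in TopologicalSorter(graph).static_order():
        step = steps[step_id]
        func = MODULES[step["module"]]
        kwargs = dict(step.get("params", {}))
        for arg, upstream in step.get("inputs", {}).items():
            kwargs[arg] = results[upstream]
        results[step_id] = func(**kwargs)
    return results


flow = """
{"steps": {
  "x": {"module": "constant", "params": {"value": 2}},
  "y": {"module": "constant", "params": {"value": 3}},
  "sum": {"module": "add", "inputs": {"a": "x", "b": "y"}}
}}
"""
print(execute(flow)["sum"])  # 5
```

In this sketch, adding a module type only requires decorating a function with `register`. A pluggable design along the lines of the goals above would let separately installed packages perform that registration themselves.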

Other ideas:

* ☐ Use Jupyter kernels as backends to execute code (giving me quick access to all the languages they support)
* ☐ Isolate script execution (to run untrusted Python/... code, for example with Docker)

Non-goals
---------

* Building a highly scalable, fast workflow execution engine: I'd rather write executors based on Spark, Dask, or Ray than reimplement those systems

Status
------

The basic structures, extracted from D3M, are in place, and execution works.


