Metadata-Version: 2.1
Name: dalymi
Version: 0.1.5
Summary: [data like you mean it] A lightweight, data-focused and non-opinionated pipeline manager written in and for Python.
Home-page: https://github.com/joschnitzbauer/dalymi
Author: Joerg Schnitzbauer
Author-email: joschnitzbauer@gmail.com
License: MIT
Description: # dalymi
        
        *[data like you mean it]*
        
        [![Documentation Status](https://readthedocs.org/projects/dalymi/badge/?version=latest)](http://dalymi.readthedocs.io/en/latest/?badge=latest)
        
        A lightweight, data-focused and non-opinionated pipeline manager written in and for Python.
        
        --------------------------------------------------------------------------------
        
        _dalymi_ allows to build data processing pipelines as [directed acyclic graphs]([https://en.wikipedia.org/wiki/Directed_acyclic_graph]) (DAGs) and facilitates rapid, but controlled, model development. The goal is to prototype quickly, but scale to production with ease.
        To achieve this, _dalymi_ uses "make"-style workflows, _i.e._ tasks with missing input trigger the execution of input-producing tasks before being executed themselves. At the same time, _dalymi_ provides fine control to run and undo specific pipeline parts for quick test iterations. This ensures output reproducability and minimizes manual errors.
        
        Several features facilitate _dalymi_'s goal:
        
        - simple, non-opinionated API (most choices left to user)
        - no external dependencies for pipeline execution
        - one-line installation (ready for use)
        - no configuration
        - auto-generated command line interface for pipeline execution
        - quick start, but high flexibility to customize and extend:
            - task output can be stored in any format Python can touch (local files being the default)
            - customizable command line arguments
            - templated output location (e.g. timestamped files)
            - support for automated checks on data integrity during runtime
        - DAG visualization using [graphviz](https://www.graphviz.org/)
        - API design encourages good development practices (modular code, defined data schemas, self-documenting code, easy workflow viz, etc.)
        
        ## Installation
        _dalymi_ requires Python >= 3.5.
        
        ``` bash
        pip install dalymi
        ```
        
        For the latest development:
        ``` bash
        pip install git+https://github.com/joschnitzbauer/dalymi.git
        ```
        
        ## Documentation
        http://dalymi.readthedocs.io/
        
        ## Simple example
        simple.py:
        ``` python
        from dalymi import Pipeline
        from dalymi.resources import PandasCSV
        import pandas as pd
        
        
        # Define resources:
        numbers_resource = PandasCSV(name='numbers', loc='numbers.csv', columns=['number'])
        squares_resource = PandasCSV(name='squares', loc='squares.csv', columns=['number', 'square'])
        
        
        # Define the pipeline
        pl = Pipeline()
        
        
        @pl.output(numbers_resource)
        def create_numbers(**context):
            return pd.DataFrame({'number': range(11)})
        
        
        @pl.output(squares_resource)
        @pl.input(numbers_resource)
        def square_numbers(numbers, **context):
            numbers['square'] = numbers['number']**2
            return numbers
        
        
        if __name__ == '__main__':
            # Run the default command line interface
            pl.cli()
        ```
        
        Command line:
        ```bash
        python simple.py run     # executes the pipeline. skips tasks for which output already exists.
        ```
        
        More examples can be found [here](https://github.com/joschnitzbauer/dalymi/tree/master/examples).
        
        ## Roadmap
        - More docstrings
        - Unit tests
        - Continuous integration
        - Parallel task processing
        - REST API during pipeline run
        - Web interface for pipeline run
        
        ## Warranty
        Although _dalymi_ is successfully used in smaller applications, it is not battle-tested yet and lacks unit tests. If you decide to use it, be prepared to communicate issues or fix bugs (it's not a lot of code... :)).
        
        ## Contributions
        ... are welcome!
        
Platform: UNKNOWN
Description-Content-Type: text/markdown
