Metadata-Version: 1.1
Name: vipe
Version: 0.5.3
Summary: Tool for visualizing Apache Oozie pipelines
Home-page: https://github.com/openaire/vipe
Author: Mateusz Kobos
Author-email: mkobos@icm.edu.pl
License: Apache License, Version 2.0
Description: About
        =====
        
        |Build Status|
        
        This is a tool for visualizing Apache Oozie workflows as data flow
        pipelines.
        
        .. figure:: docs/summary_diagram.png
           :alt: Visual summary of what the tool does
        
        *Fig. 1*: Visual summary of what the tool does.
        
        The tool is a command-line application that ingests imperative
        description of a workflow in Apache Oozie XML file and converts it to a
        data pipeline representation in PNG image file. Note that in order for
        the application to be able to extract the pipeline representation,
        content of the Oozie XML file has to follow certain conventions (e.g.,
        the names of Oozie action properties that correspond to ports have to
        follow a convention of being prefixed with "input" or "output" string).
        See file ``vipe/oozie/converter/iis.py`` for a code which follows such
        conventions used in workflow definitions of `OpenAIRE IIS
        project <https://github.com/openaire/iis>`__.
        
        How to install and run
        ======================
        
        Run ``pip install vipe`` to install the stable version of the software
        from PyPI repository. After installing the software, you can run it by
        executing ``vipe-oozie2png`` (run ``vipe-oozie2png --help`` for usage
        instructions).
        
        Note that the following libraries have to be installed in the system for
        the tool to work:
        
        -  ``libyaml`` (this is required by ``pyyaml`` library used by the
           solution) - on Ubuntu 14.04 system, this can be installed by running
           ``apt-get install libyaml-dev``
        -  `GraphViz <graphiz.org>`__ - on Ubuntu 14.04 system, this can
           installed by running ``apt-get install graphviz``.
        
        Goals
        =====
        
        There are two main goals of the solution:
        
        -  Show existing **workflows without distracting technical details**
           (i.e. a high-level/business view). In order to achieve it, the
           application shows only data dependencies between workflow nodes, i.e.
           if one node is a producer of data consumed by other workflow node, a
           link between them is shown. If two nodes are executed one after
           another but their order is not really important (in such case the
           order is defined just out of convenience or because both of them need
           to have access to full computational resources), the information
           about their order is not visible. The user of the application can
           also specify a detail level of the visualization.
        -  Make **data passed between workflow nodes a first-class citizen**.
           The user of the visualization should focus on the most important
           aspect of the defined workflows - flow of the data between the
           modules.
        
        Example visualizations
        ======================
        
        This section contains example visualization of various workflows. The
        visualization were generated with the application version 0.5.
        
        Simple workflow
        ---------------
        
        Below we show visualization of Oozie workflow
        ```vipe/oozie/test/data/bypass/workflow.xml`` <vipe/oozie/test/data/bypass/workflow.xml>`__.
        Internally, this workflow is converted to ``OozieGraph`` representation
        (see its YAML representation in
        ```vipe/oozie/test/data/bypass/workflow.yaml`` <vipe/oozie/test/data/bypass/workflow.yaml>`__)
        and then subsequently to ``Pipeline`` representation (see its YAML
        representation in
        ```vipe/oozie/test/data/bypass/pipeline.yaml`` <vipe/oozie/test/data/bypass/pipeline.yaml>`__)
        and then finally to a PNG image.
        
        See Fig. 2-5 for visualizations of the workflow with different levels of
        details as specified by the user.
        
        .. figure:: docs/example_visualizations/bypass/detail_lowest-ports_none.png
           :alt: 
        
        *Fig. 2*: Simple workflow visualized with the lowest level of details.
        
        .. figure:: docs/example_visualizations/bypass/detail_medium-ports_none.png
           :alt: 
        
        *Fig. 3*: Simple workflow visualized with medium level of details.
        
        .. figure:: docs/example_visualizations/bypass/detail_medium-ports_input_output.png
           :alt: 
        
        *Fig. 4*: Simple workflow visualized with medium level of details with
        input and output ports shown.
        
        .. figure:: docs/example_visualizations/bypass/detail_highest-ports_input_output.png
           :alt: 
        
        *Fig. 5*: Simple workflow visualized with the highest level of detail
        with input and output ports shown.
        
        Workflows from OpenAIRE IIS project
        -----------------------------------
        
        In this section, we show visualizations generated for real-life
        workflows from `OpenAIRE IIS
        project <https://github.com/openaire/iis>`__ - see Fig. 6-8.
        
        .. figure:: docs/example_visualizations/iis/primary-main-medium_detail.png
           :alt: 
        
        *Fig. 6*: Primary-main workflow from OpenAIRE IIS project with medium
        level of detail.
        
        .. figure:: docs/example_visualizations/iis/primary-processing-lowest_detail.png
           :alt: 
        
        *Fig. 7*: Primary-processing workflow from OpenAIRE IIS project with the
        lowest level of detail.
        
        .. figure:: docs/example_visualizations/iis/primary-processing-medium_detail.png
           :alt: 
        
        *Fig. 8*: Primary-processing workflow from OpenAIRE IIS project with
        medium level of detail.
        
        Features
        ========
        
        User-visible features
        ---------------------
        
        Features visible to the user of the application are listed below. Note
        that we use a notion of port (see chapter 3 of Gregor Hohpe, Bobby
        Woolf: "Enterprise Integration Patterns: Designing, Building, and
        Deploying Messaging Solutions", Addison-Wesley, 2003) corresponding to a
        join point between node and connection in a data pipeline graph.
        
        -  The produced pipeline representation can be either **PNG image or
           YAML-formatted text file**.
        -  Each workflow node can have its **input and output ports shown**;
           ports are connected using arrows to show the producer-consumer
           dependencies.
        -  There are **many detail levels** on which the graph can be shown. The
           amount of detail shown on each detail level depends on priority
           assigned to different kind of nodes and on options given to the
           application. The priority is implemented in a well-separated part of
           the code responsible for interpreting custom conventions used in
           Oozie XML file (file ``vipe/oozie/converter/iis.py`` in the source
           code contains such code for conventions used in workflow definitions
           of `OpenAIRE IIS project <https://github.com/openaire/iis>`__) and
           thus it is reasonably easy for a developer to adjust it to Oozie XML
           conventions used in a different project.
        -  The produced graph can have either **horizontal or vertical
           orientation**.
        
        Developer-visible features
        --------------------------
        
        In this section, we describe internal features of the solution that are
        of interest of people who want to extend its code.
        
        **Extensibility areas**. The application was designed and implemented
        with extensibility in mind - we wanted to make it **easily extensible in
        the following areas**.
        
        -  **Input descriptions of workflow**, e.g. possibility to analyze
           source code instead of Oozie XML file.
        -  **Conventions used in the Oozie XML** file, i.e. possibility to use
           different conventions of describing workflows, other than the ones
           used in `OpenAIRE IIS project <https://github.com/openaire/iis>`__.
           Namely, the developer should be required only to implement a new
           ``PipelineConverter``-derived class.
        -  **Output artefacts**, e.g. producing website or interactive web
           applications instead of static images.
        
        **Processing stages**. In order to attain mentioned extensibility goals,
        the processing in the application was separated into stages shown in
        Fig. 9.
        
        .. figure:: docs/data_processing.png
           :alt: Data processing in the application
        
        *Fig. 9*: Data processing in the application. Boxes correspond to data
        structures or files while the arrows correspond to processing steps. The
        area enclosed with dotted line shows discussed potential future
        extensions of the application. Names highlighted in gray correspond to
        names of classes in the source code.
        
        **Intermediate representations**. It is worth noting that there are two
        intermediate representations of the workflow (as shown in Fig. 9):
        
        -  ``OozieGraph`` class that corresponds directly to objects defined in
           Oozie XML workflow file,
        -  ``Pipeline`` class corresponding to data pipeline representation of
           the processing. It contains information about dependencies between
           workflow nodes and data passed between them. It doesn’t store
           information about the order in which the workflow nodes are defined.
        
        A ``PipelineConverter``-derived class is used to translate
        ``OozieGraph`` into ``Pipeline``.
        
        Code development
        ================
        
        Python packages that the application depends on are listed in the
        ``requirements.txt`` file. Note that the project is written in Python 3,
        so you need to install Python 3 version of these dependencies (on Ubuntu
        14.04 system you can do it by executing, e.g.
        ``sudo pip3 install pytest``).
        
        The **docstrings** in the code follow `Google style
        guide <https://google-styleguide.googlecode.com/svn/trunk/pyguide.html#Comments>`__
        with types declared in accordance to
        `Sphinx <http://sphinx-doc.org/>`__'s `type annotating
        conventions <http://sphinx-doc.org/latest/ext/example_google.html>`__.
        Note that you have to use Sphinx version at least 1.3 if you want to
        generate documentation with type annotations.
        
        Future work
        ===========
        
        Possible future extensions of the application are listed below.
        
        -  Generate an interlinked website containing visualization of all
           workflows and subworkflows along with some additional information,
           like a list of all input and output ports with the type of data they
           ingest or produce.
        -  Show types of data related to each port.
        -  Show links from the names of types of data related to each port to
           their schemas (extracted from surrounding system’s source code).
        -  Show link to the Oozie XML workflow corresponding to given diagram
           (it should be extracted from surrounding system’s source code).
        -  Show comments and descriptions from the original Oozie workflow
           definition.
        -  Show some statistics related to the workflow (e.g., number of nodes).
        -  Check whether data passed between workflow nodes is compatible (i.e.
           check that there is no such situation that a data produced by a
           certain workflow node is incompatible with data consumed by its
           consumer). This would be akin to static type checking for the
           workflow.
        
        License
        =======
        
        The code is licensed under Apache License, Version 2.0
        
        .. |Build Status| image:: https://travis-ci.org/openaire/vipe.png?branch=master
           :target: https://travis-ci.org/openaire/vipe
        
Keywords: Oozie,workflow visualization,pipeline visualization
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.4
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Topic :: Scientific/Engineering :: Visualization
