Metadata-Version: 1.1
Name: taggedartifacts
Version: 0.1.1
Summary: Simple artifact versioning and caching for scientific workflows
Home-page: https://github.com/jisantuc/taggedartifacts
Author: James Santucci
Author-email: james.santucci@gmail.com
License: MIT license
Description: =======
        taggedartifacts
        =======
        
        
        .. image:: https://img.shields.io/pypi/v/taggedartifacts.svg
                :target: https://pypi.python.org/pypi/taggedartifacts
        
        .. image:: https://circleci.com/gh/jisantuc/taggedartifacts.svg?style=svg
            :target: https://circleci.com/gh/jisantuc/taggedartifacts
        
        .. image:: https://pyup.io/repos/github/jisantuc/taggedartifacts/shield.svg
             :target: https://pyup.io/repos/github/jisantuc/taggedartifacts/
             :alt: Updates
        
        
        
        Simple artifact versioning and caching for scientific workflows
        
        
        * Free software: MIT license
        * Documentation: https://taggedartifacts.readthedocs.io.
        
        
        Features
        --------
        
        :code:`taggedartifacts` exists to provide a simple interface for versioning functions that produce artifacts.
        An "artifact" could be anything -- maybe you have some sort of ETL pipeline that writes intermediate files,
        or you have a plotting function that writes a bunch of plots to disk, or you have a machine learning
        workflow that produces a bunch of model files somewhere. The purpose of :code:`taggedartifacts` is to allow you
        to write normally -- give your output its regular name, like :code:`plot.png` -- and automatically attach
        git commit and configuration information as part of the path.
        
        Example
        -------
        
        The following example shows how to use :code:`taggedartifacts` to tag an output file with commit and config info:
        
        .. codeblock:: python
        
            from taggedartifacts import Artifact
            @Artifact(keyword='outpath', config={}, allow_dirty=True)
            def save_thing(outpath):
                with open(outpath, 'w') as outf:
                    outf.write('good job')
            
        
            save_thing(outpath='foo.txt')
        
        The resulting file that would be created would be :code:`foo-<commit>-<config-hash>.txt`, without having to
        litter string formats and fetching git commit info throughout the code.
        
        Why
        ---
        
        It's really easy, once you start running a lot of experiments, to end up with a ton of output files
        produced at different times with names like :code:`plot.png`, :code:`plot2.png`, :code:`plot-please-work.png`, etc.
        Later, you'll maybe want to show someone a plot, and they'll try to reproduce it, and you won't be
        able to tell them the state of the code when the plot was produced. That's not great! :code:`taggedartifacts`
        offers one solution to this problem, where you can tell at a glance whether two files were produced
        by the same code and the same configuration.
        
        Isn't this just another workflow library
        ----------------------------------------
        
        It's not! I promise.
        
        The workflow library ecosystem in python already has a lot of entrants, like Luigi_, Airflow_, 
        Pinball_, and probably many I haven't heard of. There are also experiment and data/code versioning systems
        around like DVC_, and older solutions to DAGs that understand how not to redo work, like :code:`make`. :code:`taggedartifacts`
        isn't really like any of those. It isn't aware of a DAG of all of your tasks at any point, and it doesn't
        know anything about data science workflows in general. It only knows about tagging some sort of file-based
        output with git commit and configuration information so that you can tell whether two artifacts produced
        potentially on different computers should match.
        
        As a result, you don't have to have a separate daemon running, you don't get anything like task
        distribution and parallelization for free, and you don't get a special CLI. :code:`taggedartifacts` only attempts to
        solve one problem.
        
        .. _Luigi: https://github.com/spotify/luigi
        .. _Airflow: https://github.com/apache/airflow
        .. _Pinball: https://github.com/pinterest/pinball
        .. _DVC: https://github.com/iterative/dvc
        
        Credits
        -------
        
        This package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template.
        
        .. _Cookiecutter: https://github.com/audreyr/cookiecutter
        .. _`audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage
        
        
        =======
        History
        =======
        
        0.1.1 (2019-04-27)
        ------------------
        
        * Renamed to avoid pypi name conflict. 
        
        0.1.0 (2019-04-27)
        ------------------
        
        * First release on PyPI.
        
Keywords: taggedartifacts
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
