Metadata-Version: 1.2
Name: decimate
Version: 0.9.1
Summary: A fault-tolerant SLURM scheduler extension
Home-page: https://github.com/samkos/decimate
Author: Samuel KORTAS
Author-email: samuel.kortas@kaust.edu.sa
License: UNKNOWN
Description: NAME
        
               decimate - a fault-tolerant SLURM scheduler extension
        
        SYNOPSIS
        
               dbatch [ Slurm options ] [ --check=<user_script> ]
                                        [ --max-retry=<number of restarts> ]
                                        script [args...]
        
        DESCRIPTION
        
               Developed by the KAUST Supercomputing Laboratory (KSL),
               decimate is a SLURM extension written in Python designed to handle
               dependent jobs more easily and efficiently.
        
               Decimate transparently adds parameters to the SLURM sbatch
               command to check the correctness of jobs and automatically
               reschedules jobs found to be faulty.
        
               Using Decimate on Shaheen II, one can submit, run, monitor or
               terminate a workflow composed of dependent jobs. On request,
               through standardized or customized messages, the user is
               informed by mail of the progress of their workflow on the system.
        
               If one part of the workflow fails, decimate automatically
               detects the failure, signals it to the user and relaunches
               the misbehaving part after fixing the job dependencies. By
               default, if the same failure happens three consecutive times,
               decimate cancels the whole workflow, removing all the
               dependent jobs from the schedule. In a future version,
               decimate will allow the workflow to be restarted
               automatically once the problem causing its failure has been
               fixed.
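        
               The retry policy described above can be sketched in shell as
               follows. This is an illustration only, not decimate's actual
               implementation: run_with_retry, flaky_step and always_ok are
               hypothetical names, and both the step and its check are
               assumed to report success through a zero exit status.
        
               ```shell
               # Illustration only, NOT decimate's implementation.
               # Usage: run_with_retry MAX_RETRY STEP_CMD CHECK_CMD
               # Assumption: STEP_CMD and CHECK_CMD return 0 on success.
               run_with_retry() {
                   max_retry=$1; step_cmd=$2; check_cmd=$3
                   failures=0
                   while [ "$failures" -lt "$max_retry" ]; do
                       # A step is successful only if its check passes.
                       if $step_cmd && $check_cmd; then
                           echo "step ok after $failures failure(s)"
                           return 0
                       fi
                       failures=$((failures + 1))
                       echo "attempt $failures failed" >&2
                   done
                   # Limit reached: the workflow would be cancelled here.
                   echo "giving up after $failures failures" >&2
                   return 1
               }

               # Hypothetical step that fails twice, then succeeds.
               attempts=0
               flaky_step() {
                   attempts=$((attempts + 1))
                   [ "$attempts" -ge 3 ]
               }
               always_ok() { true; }

               run_with_retry 3 flaky_step always_ok
               ```
        
               With a limit of 3, the hypothetical step succeeds on its
               third attempt. With a limit of 2, run_with_retry would return
               1, the point at which the dependent jobs would be cancelled.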
        
               decimate also allows users to define their own mail alerts,
               which can be sent at any point of the workflow through a call
               to a Python method. This feature will also be available from
               bash in a future version.
        
               Users can also design customized checking functions. Their
               purpose is to validate whether a step of the workflow was
               successful. This could involve checking for the presence of
               some result files, grepping them for error or success
               messages, or computing a ratio or a checksum. These
               intermediate results can easily be passed to decimate to
               confirm or reject the correctness of any step. They can also
               be forwarded by mail to the user while the workflow is
               executing.
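        
               A check of this kind could look like the shell sketch below.
               The check_step function, the ERROR and RUN COMPLETED markers
               and the temporary result file are hypothetical. The sketch
               also assumes that the outcome is reported through the exit
               status (0 for success), which this document does not specify.
        
               ```shell
               # Sketch of a check; names and markers are hypothetical.
               # Assumption: exit status 0 means the step succeeded.
               check_step() {
                   result_file=$1
                   # The result file must exist and be non-empty.
                   if [ ! -s "$result_file" ]; then
                       echo "missing or empty: $result_file"
                       return 1
                   fi
                   # An error message marks the step as failed.
                   if grep -q "ERROR" "$result_file"; then
                       echo "error message found in $result_file"
                       return 1
                   fi
                   # A success marker must be present.
                   if ! grep -q "RUN COMPLETED" "$result_file"; then
                       echo "no success marker in $result_file"
                       return 1
                   fi
                   echo "step ok"
               }

               # Demonstration on a temporary result file.
               tmp=$(mktemp)
               printf 'iteration 10\nRUN COMPLETED\n' > "$tmp"
               check_step "$tmp"
               rm -f "$tmp"
               ```
        
               Each failure branch prints a short message, which is the sort
               of intermediate result that could be forwarded to the user.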
        
        USE
        
               At the moment, jobs only need to be submitted through the
                   dbatch
               command, which accepts exactly the same parameters as the
               original SLURM sbatch command plus the following new ones:
               
                         --check=SCRIPT_FILE
                                       where SCRIPT_FILE is a Python or
                                       shell script that checks whether
                                       the results are ok.
        
                         --max-retry=MAX_RETRY
                                       number of times a step can fail and
                                       be restarted automatically before
                                       the whole workflow is failed
                                       (default: 3)
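        
               As an illustration, a submission combining ordinary sbatch
               options with the two new parameters might look as follows.
               job.sh and check_step.sh are hypothetical file names, and the
               guard only prints the command when dbatch is not installed.
        
               ```shell
               # job.sh and check_step.sh are hypothetical file names.
               cmd="dbatch --ntasks=4 --time=00:30:00"
               cmd="$cmd --check=check_step.sh --max-retry=5 job.sh"
               if command -v dbatch >/dev/null 2>&1; then
                   $cmd
               else
                   echo "would run: $cmd"
               fi
               ```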
        
               sslog tails the decimate log file attached to the current
               directory, tracking all the jobs that were launched with
               dbatch from this directory.
        
               sstatus gives the current status of the workflow executing
               in the current directory.
               
               Decimate is still in a beta phase and under test with some of
               our KSL users. More documentation will be provided once the
               stabilized and fully tested version is made available by the
               end of June 2018.
        
               If interested in testing decimate or contributing, please send
               a mail to help@hpc.kaust.edu.sa
        
        AUTHOR
        
               Written by Samuel Kortas (samuel.kortas (at) kaust.edu.sa)
        
        REPORTING BUGS
        
               Report decimate bugs to help@hpc.kaust.edu.sa
        
        
        COPYRIGHT
               Copyright (c) 2017, KAUST Supercomputing Laboratory
               All rights reserved.
        
               Redistribution and use in source and binary forms, with or without
               modification, are permitted provided that the following conditions are met:
        
               * Redistributions of source code must retain the above copyright notice, this
                 list of conditions and the following disclaimer.
        
               * Redistributions in binary form must reproduce the above copyright notice,
                 this list of conditions and the following disclaimer in the documentation
                 and/or other materials provided with the distribution.
        
               THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
               AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
               IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
               DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
               FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
               DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
               SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
               CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
               OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
               OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
        
        SEE ALSO
        
               decimate development (non-stable) version home page:
                        <https://bitbucket.org/kaust_KSL/decimate>
        
               KAUST Supercomputing Laboratory: <http://hpc.kaust.edu.sa/>
        
Keywords: scheduler extension workflow parametric
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Other Environment
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Software Development :: Build Tools
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Operating System :: POSIX
Classifier: Topic :: System :: Distributed Computing
Classifier: Topic :: Utilities
Requires-Python: >=2.7,  <3
