Metadata-Version: 2.1
Name: nextflowpy
Version: 0.4.2
Summary: A Python wrapper around Nextflow.
Home-page: https://github.com/goodwright/nextflow.py
Author: Sam Ireland
Author-email: sam@goodwright.com
License: MIT
Keywords: nextflow bioinformatics pipeline
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Requires-Python: !=2.*, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, !=3.5.*
Description-Content-Type: text/x-rst
License-File: LICENSE

nextflow.py
===========

|ci| |version| |pypi| |nextflow| |license|

.. |ci| image:: https://github.com/goodwright/nextflow.py/actions/workflows/main.yml/badge.svg
  :target: https://github.com/goodwright/nextflow.py/actions/workflows/main.yml

.. |version| image:: https://img.shields.io/pypi/v/nextflowpy.svg
  :target: https://pypi.org/project/nextflowpy/

.. |pypi| image:: https://img.shields.io/pypi/pyversions/nextflowpy.svg
  :target: https://pypi.org/project/nextflowpy/

.. |nextflow| image:: https://img.shields.io/badge/Nextflow-22.04%20%7C%2021.10%20%7C%2020.10-orange
  :target: https://www.nextflow.io/

.. |license| image:: https://img.shields.io/pypi/l/nextflowpy.svg?color=blue
  :target: https://github.com/goodwright/nextflow.py/blob/master/LICENSE

nextflow.py is a Python wrapper around the Nextflow pipeline framework. It lets
you run Nextflow pipelines from Python code.

Example
-------

   >>> import nextflow
   >>> execution = nextflow.run(path="main.nf", params={"param1": "123"})
   >>> print(execution.status)


Installing
----------

pip
~~~

nextflow.py can be installed using pip::

    $ pip install nextflowpy

If you get permission errors, try using ``sudo``::

    $ sudo pip install nextflowpy


Development
~~~~~~~~~~~

The repository for nextflow.py, containing the most recent iteration, can be
found `here <https://github.com/goodwright/nextflow.py/>`_. To clone the
nextflow.py repository directly from there, use::

    $ git clone https://github.com/goodwright/nextflow.py.git


Nextflow
~~~~~~~~

nextflow.py requires the Nextflow executable to be installed and in your PATH.
Instructions for installing Nextflow can be found at
`their website <https://www.nextflow.io/docs/latest/getstarted.html#installation>`_.
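
A quick way to confirm this before running anything is to look the executable
up on your PATH. The sketch below uses only the Python standard library;
``nextflow_on_path`` is a hypothetical helper written for illustration and is
not part of nextflow.py:

.. code-block:: python

    import shutil


    def nextflow_on_path(executable="nextflow"):
        """Return the resolved path of the executable, or None if it is not on PATH."""
        return shutil.which(executable)


    if __name__ == "__main__":
        location = nextflow_on_path()
        print(location or "Nextflow was not found on PATH")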


Testing
~~~~~~~

To test a local version of nextflow.py, cd to the nextflow.py directory and run::

    $ python -m unittest discover tests

You can opt to only run unit tests or integration tests::

    $ python -m unittest discover tests.unit
    $ python -m unittest discover tests.integration



Overview
--------

The starting point for any nextflow.py pipeline is the ``Pipeline``
object. This is initialised with a path to the pipeline file, and,
optionally, the location of an accompanying config file:

    >>> pipeline1 = nextflow.Pipeline("pipelines/my-pipeline.nf")
    >>> pipeline2 = nextflow.Pipeline("main.nf", config="nextflow.config")

Running
~~~~~~~

To actually execute the pipeline, the ``run`` method is used:

    >>> execution = pipeline.run()

This will return an ``Execution`` object, which represents the pipeline
execution that just took place. You can customise the execution with various
options:

    >>> execution = pipeline.run(location="./rundir", params={"param1": "123"}, profile=["docker", "test"], version="22.0.1")

This sets the execution to take place in a different location, passes
``--param1=123`` as a command line argument when the pipeline is run, uses the
Nextflow profiles 'docker' and 'test', and runs with Nextflow version 22.0.1
(regardless of what version of Nextflow is installed).
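
To make the ``params`` mapping concrete, the sketch below shows one way a
dictionary could be turned into ``--name=value`` arguments of the kind
described above. ``params_to_args`` is a hypothetical helper written for
illustration; it is not nextflow.py's own implementation, which may quote or
order arguments differently:

.. code-block:: python

    def params_to_args(params):
        """Turn {"param1": "123"} into ["--param1=123"] (illustrative only)."""
        return [f"--{name}={value}" for name, value in params.items()]


    print(params_to_args({"param1": "123"}))  # ['--param1=123']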

Executions
##########

An ``Execution`` represents a single execution of a
``Pipeline``. It has properties for:

* ``id`` - the unique ID of that run, generated by Nextflow.

* ``started`` - when the pipeline ran (as a UNIX timestamp).

* ``started_dt`` - when the pipeline ran (as a Python datetime).

* ``duration`` - how long the execution took in seconds.

* ``status`` - the status Nextflow reports on completion.

* ``command`` - the command used to run the pipeline.

* ``stdout`` - the stdout of the execution process.

* ``stderr`` - the stderr of the execution process.

* ``log`` - the full text of the log file produced.

* ``returncode`` - the exit code of the run - usually 0 or 1.

* ``pipeline`` - the ``Pipeline`` that created the execution.

It also has a ``process_executions`` property, which is a list of
``ProcessExecution`` objects. Nextflow processes data by chaining
together isolated 'processes', and each of these has a
``ProcessExecution`` object representing its execution. These have the
following properties:

* ``hash`` - the unique ID generated by Nextflow, of the form ``xx/xxxxxx``.

* ``process`` - the name of the process that spawned the process execution.

* ``name`` - the name of this specific process execution.

* ``status`` - the status Nextflow reports on completion.

* ``stdout`` - the stdout of the process execution.

* ``stderr`` - the stderr of the process execution.

* ``started`` - when the process execution ran (as a UNIX timestamp).

* ``started_dt`` - when the process execution ran (as a Python datetime).

* ``duration`` - how long the process execution took in seconds.

* ``returncode`` - the exit code of the process execution - usually 0 or 1.
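
As an illustration of how these properties fit together, the sketch below
totals the ``duration`` of process executions per ``process`` name. It uses
``SimpleNamespace`` stand-ins rather than real ``ProcessExecution`` objects so
that it runs without Nextflow, and it assumes ``duration`` is a number of
seconds. ``total_duration_by_process`` and the process names are hypothetical,
not part of nextflow.py:

.. code-block:: python

    from collections import defaultdict
    from types import SimpleNamespace


    def total_duration_by_process(process_executions):
        """Sum the durations (in seconds) of process executions, keyed by process name."""
        totals = defaultdict(float)
        for process_execution in process_executions:
            totals[process_execution.process] += process_execution.duration
        return dict(totals)


    # Stand-ins exposing only the attributes used above.
    examples = [
        SimpleNamespace(process="ALIGN", name="ALIGN (1)", duration=12.5),
        SimpleNamespace(process="ALIGN", name="ALIGN (2)", duration=7.5),
        SimpleNamespace(process="SORT", name="SORT (1)", duration=3.0),
    ]
    print(total_duration_by_process(examples))  # {'ALIGN': 20.0, 'SORT': 3.0}

With a real run, ``execution.process_executions`` would be passed in place of
``examples``.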

Process executions can have various files passed to them, and will create files
during their execution too. These can be obtained as follows:

    >>> process_execution.input_data() # Full absolute paths
    >>> process_execution.input_data(include_path=False) # Just file names
    >>> process_execution.all_output_data() # Full absolute paths
    >>> process_execution.all_output_data(include_path=False) # Just file names

.. note::
   Nextflow makes a distinction between process output files which were
   'published' via some channel, and those which weren't. It is not possible to
   distinguish these once execution is complete, so nextflow.py reports all
   output files, not just those which are 'published'.
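
The ``include_path=False`` form of these methods can be pictured with the
standard library: dropping the directory part of an absolute path leaves just
the file name. The paths below are made up for illustration, and this is a
sketch of the idea rather than how nextflow.py computes its results:

.. code-block:: python

    import os.path


    def file_names(paths):
        """Strip the directory part from each path, keeping only the file name."""
        return [os.path.basename(path) for path in paths]


    # Hypothetical absolute paths, shaped like files in a Nextflow work directory.
    paths = ["/data/work/ab/123456/reads.fastq", "/data/work/ab/123456/report.txt"]
    print(file_names(paths))  # ['reads.fastq', 'report.txt']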

Polling
~~~~~~~

The method described above will run the pipeline and wait while it does, with
the completed ``Execution`` being returned only at the end.

An alternative is to use ``run_and_poll``, a generator which yields an
``Execution`` object every few seconds, each representing the state of the
pipeline execution at that moment in time::

    for execution in pipeline.run_and_poll(sleep=2, location="./rundir", params={"param1": "123"}, profile=["docker", "test"], version="22.0.1"):
        print("Processing intermediate execution")

By default, an ``Execution`` will be returned every 5 seconds, but you
can adjust this as required with the ``sleep`` parameter. This is useful if you
want to get information about the progress of the pipeline execution as it
proceeds.
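
A common pattern with a polling generator is to react only when something
changes and to keep the final ``Execution`` once the loop ends. The sketch
below shows that pattern with a stand-in generator so that it runs without
Nextflow; ``fake_poll``, ``follow`` and the status strings are hypothetical,
and with a real pipeline the result of ``pipeline.run_and_poll(...)`` would
take the stand-in's place:

.. code-block:: python

    from types import SimpleNamespace


    def fake_poll():
        """Stand-in for a polling generator: yields successive execution snapshots."""
        for status in ["-", "-", "OK"]:  # made-up status values
            yield SimpleNamespace(status=status)


    def follow(executions):
        """Consume snapshots, recording each status change, and return the last one."""
        changes = []
        last = None
        for execution in executions:
            if last is None or execution.status != last.status:
                changes.append(execution.status)
            last = execution
        return last, changes


    final, changes = follow(fake_poll())
    print(final.status, changes)  # OK ['-', 'OK']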

Direct Running
~~~~~~~~~~~~~~

If you just want to run a single pipeline without initialising a
``Pipeline`` object first, you can call the module-level ``run`` or
``run_and_poll`` functions directly:

    >>> import nextflow
    >>> execution = nextflow.run(path="pipeline.nf", config="settings.config", params={"param1": "123"})

Changelog
---------

Release 0.4.2
~~~~~~~~~~~~~

`26th September, 2022`

* Added ``bash`` attribute to process executions.


Release 0.4.1
~~~~~~~~~~~~~

`11th September, 2022`

* Fixed issue in execution polling where previous execution interferes initially.
* Execution parsing now checks directory is fully ready for parsing.
* Fixed issue where logs are unparseable in certain locales.


Release 0.4.0
~~~~~~~~~~~~~

`13th July, 2022`

* Process executions now report their input files as paths.
* Process executions now report all their output files as paths.
* Executions now have properties for their originating pipeline.
* Removed schema functionality.


Release 0.3.1
~~~~~~~~~~~~~

`15th June, 2022`

* Process polling now accesses stdout and stderr while process is ongoing.


Release 0.3
~~~~~~~~~~~

`4th June, 2022`

* Allow module-level run methods for directly running pipelines.
* Allow for running pipelines with different Nextflow versions.
* Improved datetime parsing.
* Simplified process execution parsing.
* Fixed concatenation of process executions with no parentheses.
* Tests now check compatibility with different Nextflow versions.

Release 0.2.2
~~~~~~~~~~~~~

`21st March, 2022`

* Log outputs now have ANSI codes removed.

Release 0.2.1
~~~~~~~~~~~~~

`19th February, 2022`

* Execution polling now handles unready execution directory.
* Better detection of failed process executions mid execution.


Release 0.2
~~~~~~~~~~~

`14th February, 2022`

* Added method for running while continuously polling pipeline execution.
* Optimised process execution object creation from file state.

Release 0.1.4
~~~~~~~~~~~~~

`12th January, 2022`

* Pipeline command generation no longer applies quotes if there are already quotes.


Release 0.1.3
~~~~~~~~~~~~~

`24th November, 2021`

* Fixed Windows file separator issues.
* Renamed NextflowProcess -> ProcessExecution.

Release 0.1.2
~~~~~~~~~~~~~

`3rd November, 2021`

* Better handling of missing Nextflow executable.

Release 0.1.1
~~~~~~~~~~~~~

`29th October, 2021`

* Renamed ``nextflow_processes`` to ``process_executions``.
* Added quotes around paths to handle spaces in paths.

Release 0.1
~~~~~~~~~~~

`18th October, 2021`

* Basic Pipeline object.
* Basic Execution object.
* Basic ProcessExecution object.

