Metadata-Version: 2.1
Name: sapporo
Version: 1.7.1
Summary: The sapporo-service is a standard implementation conforming to the Global Alliance for Genomics and Health (GA4GH) Workflow Execution Service (WES) API specification.
Home-page: https://github.com/sapporo-wes/sapporo-service
Author: DDBJ(Bioinformatics and DDBJ Center)
Author-email: tazro.ohta@chiba-u.jp
License: Apache2.0
Platform: any
Classifier: Development Status :: 5 - Production/Stable
Classifier: Framework :: Flask
Classifier: Intended Audience :: Healthcare Industry
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: cryptography
Requires-Dist: cwl-inputs-parser>=1.0.2
Requires-Dist: flask_jwt_extended
Requires-Dist: flask-cors
Requires-Dist: flask
Requires-Dist: jsonschema
Requires-Dist: multiqc
Requires-Dist: psutil
Requires-Dist: python-magic
Requires-Dist: pyyaml
Requires-Dist: requests
Requires-Dist: rocrate
Provides-Extra: tests
Requires-Dist: flake8; extra == "tests"
Requires-Dist: isort; extra == "tests"
Requires-Dist: mypy; extra == "tests"
Requires-Dist: pytest; extra == "tests"
Requires-Dist: types-jsonschema; extra == "tests"
Requires-Dist: types-Flask-Cors; extra == "tests"
Requires-Dist: types-Flask; extra == "tests"
Requires-Dist: types-PyYAML; extra == "tests"
Requires-Dist: types-requests; extra == "tests"
Requires-Dist: types-setuptools; extra == "tests"
Requires-Dist: typing-extensions; extra == "tests"

# sapporo-service

[![pytest](https://github.com/sapporo-wes/sapporo-service/actions/workflows/pytest.yml/badge.svg)](https://github.com/sapporo-wes/sapporo-service/actions/workflows/pytest.yml)
[![flake8](https://github.com/sapporo-wes/sapporo-service/actions/workflows/flake8.yml/badge.svg)](https://github.com/sapporo-wes/sapporo-service/actions/workflows/flake8.yml)
[![isort](https://github.com/sapporo-wes/sapporo-service/actions/workflows/isort.yml/badge.svg)](https://github.com/sapporo-wes/sapporo-service/actions/workflows/isort.yml)
[![mypy](https://github.com/sapporo-wes/sapporo-service/actions/workflows/mypy.yml/badge.svg)](https://github.com/sapporo-wes/sapporo-service/actions/workflows/mypy.yml)
[![DOI](https://zenodo.org/badge/220937589.svg)](https://zenodo.org/badge/latestdoi/220937589)
[![Apache License](https://img.shields.io/badge/license-Apache%202.0-orange.svg?style=flat&color=important)](http://www.apache.org/licenses/LICENSE-2.0)

<img src="https://raw.githubusercontent.com/sapporo-wes/sapporo/main/logo/sapporo-service.svg" width="400" style="display: block; margin-left: auto; margin-right: auto; margin-top: 30px; margin-bottom: 30px;" alt="sapporo-service logo">

The sapporo-service is a standard implementation conforming to the [Global Alliance for Genomics and Health](https://www.ga4gh.org) (GA4GH) [Workflow Execution Service](https://github.com/ga4gh/workflow-execution-service-schemas) (WES) API specification.

We have also extended the API specification.
For more details, please refer to [`./sapporo-wes-1-1-0-openapi-spec.yml`](./sapporo-wes-1-1-0-openapi-spec.yml) for more details.

One of the key features of the sapporo-service is its ability to abstract workflow engines, making it easy to adapt various workflow engines to the WES standard.
Currently, we have verified compatibility with the following workflow engines:

- [cwltool](https://github.com/common-workflow-language/cwltool)
- [nextflow](https://www.nextflow.io)
- [Toil (experimental)](https://toil.ucsc-cgl.org)
- [cromwell](https://github.com/broadinstitute/cromwell)
- [snakemake](https://snakemake.readthedocs.io/en/stable/)
- [ep3 (experimental)](https://github.com/tom-tan/ep3)
- [StreamFlow (experimental)](https://github.com/alpha-unito/streamflow)

Another unique feature of the sapporo-service is a mode that permits only workflows registered by the system administrator to be executed.
This feature is particularly beneficial when setting up a WES in a shared HPC environment.

## Installation and Startup

The sapporo-service is compatible with Python 3.8 or later versions.

You can install it using pip:

```bash
pip3 install sapporo
```

To start the sapporo-service, run the following command:

```bash
sapporo
```

### Using Docker

Alternatively, you can run the sapporo-service using Docker.
If you want to use Docker-in-Docker (DinD), make sure to mount `docker.sock`, `/tmp`, and other necessary directories.

To start the sapporo-service using Docker, run the following command:

```bash
docker compose up -d
```

## Usage

You can view the help for the sapporo-service as follows:

```bash
$ sapporo --help
usage: sapporo [-h] [--host] [-p] [--debug] [-r] [--disable-get-runs]
               [--disable-workflow-attachment] [--run-only-registered-workflows]
               [--service-info] [--executable-workflows] [--run-sh]
               [--url-prefix] [--auth-config]

This is an implementation of a GA4GH workflow execution service that can easily
support various workflow runners.

optional arguments:
  -h, --help            show this help message and exit
  --host                Specify the host address for Flask. (default: 127.0.0.1)
  -p , --port           Specify the port for Flask. (default: 1122)
  --debug               Enable Flask's debug mode.
  -r , --run-dir        Specify the run directory. (default: ./run)
  --disable-get-runs    Disable the `GET /runs` endpoint.
  --disable-workflow-attachment
                        Disable the `workflow_attachment` feature on the `Post
                        /runs` endpoint.
  --run-only-registered-workflows
                        Only run registered workflows. Check the registered
                        workflows using `GET /executable-workflows`, and specify
                        the `workflow_name` in the `POST /run` request.
  --service-info        Specify the `service-info.json` file. The
                        `supported_wes_versions` and `system_state_counts` will
                        be overwritten by the application.
  --executable-workflows
                        Specify the `executable-workflows.json` file.
  --run-sh              Specify the `run.sh` file.
  --url-prefix          Specify the prefix of the URL (e.g., --url-prefix /foo
                        will result in /foo/service-info).
  --auth-config         Specify the `auth-config.json` file.
```

### Operating Mode

The sapporo-service can be started in one of the following two modes:

- Standard WES mode (Default)
- Execute only registered workflows mode

You can switch between these modes using the `--run-only-registered-workflows` startup argument or by setting the `SAPPORO_ONLY_REGISTERED_WORKFLOWS` environment variable to `True` or `False`.
Note that startup arguments take precedence over environment variables.

#### Standard WES Mode

In this mode, the sapporo-service conforms to the standard WES API specification.

#### Execute Only Registered Workflows Mode

In this mode, the sapporo-service only allows workflows registered by the system administrator to be executed.

The key changes in this mode are:

- `GET /executable_workflows` returns the list of executable workflows.
- `POST /runs`, use `workflow_name` instead of `workflow_url`.

The list of executable workflows is managed in [`executable_workflows.json`](./sapporo/executable_workflows.json).
By default, this file is located in the application directory of the sapporo-service.
However, you can override it using the startup argument `--executable-workflows` or the environment variable `SAPPORO_EXECUTABLE_WORKFLOWS`.

### Run Directory

The sapporo-service organizes all submitted workflows, workflow parameters, output files, and related data within a specific directory on the file system.
This directory, known as the "run directory".
To specify a different location for the run directory, use the startup argument `--run-dir` or set the environment variable `SAPPORO_RUN_DIR`.

The run dir structure is as follows:

```bash
$ tree run
.
└── 29
    └── 29109b85-7935-4e13-8773-9def402c7775
        ├── cmd.txt
        ├── end_time.txt
        ├── exe
        │   └── workflow_params.json
        ├── exit_code.txt
        ├── outputs
        │   ├── <output_file>
        ├── outputs.json
        ├── run.pid
        ├── run_request.json
        ├── start_time.txt
        ├── state.txt
        ├── stderr.log
        ├── stdout.log
        └── workflow_engine_params.txt
├── 2d
│   └── ...
└── 6b
    └── ...
```

You can manage each run by physically deleting it using the `rm` command.

Executing `POST /runs` can be quite complex.
For your convenience, we've provided examples using `curl` in the [`./tests/curl_example`](./tests/curl_example) directory.
Please refer to these examples as a guide.

### `run.sh`

The [`run.sh`](./sapporo/run.sh) script is used to abstract the workflow engine.
When `POST /runs` is invoked, the sapporo-service forks the execution of `run.sh` after preparing the necessary files in the run directory.
This allows you to adapt various workflow engines to WES by modifying `run.sh`.

By default, `run.sh` is located in the application directory of the sapporo-service.
You can override this location using the startup argument `--run-sh` or the environment variable `SAPPORO_RUN_SH`.

### Other Startup Arguments

You can modify the host and port used by the application using the startup arguments `--host` and `--port` or the environment variables `SAPPORO_HOST` and `SAPPORO_PORT`.

The following three startup arguments and corresponding environment variables can be used to limit the WES:

- `--disable-get-runs` / `SAPPORO_GET_RUNS`: Disables `GET /runs`. This can be useful when using WES with an unspecified number of users, as it prevents users from viewing or cancelling other users' runs by knowing the run_id.
- `--disable-workflow-attachment` / `SAPPORO_WORKFLOW_ATTACHMENT`: Disables the `workflow_attachment` field in `POST /runs`. This field is used to attach files for executing workflows, and disabling it can address security concerns.
- `--url-prefix` / `SAPPORO_URL_PREFIX`: Sets the URL prefix. For example, if `--url-prefix /foo/bar` is set, `GET /service-info` becomes `GET /foo/bar/service-info`.

The response content of `GET /service-info` is managed in [`service-info.json`](./sapporo/service-info.json).
By default, this file is located in the application directory of the sapporo-service.
You can override this location using the startup argument `--service-info` or the environment variable `SAPPORO_SERVICE_INFO`.

### Generate Download Link

The sapporo-service allows you to generate download links for files and directories located under the `run_dir`.

For more details, please refer to the `GetData` section in [`./sapporo-wes-1-1-0-openapi-spec.yml`](./sapporo-wes-1-1-0-openapi-spec.yml).

### Parse Workflow

The sapporo-service offers a feature to inspect the type, version, and inputs of a workflow document.

For more details, please refer to the `ParseWorkflow` section in [`./sapporo-wes-1-1-0-openapi-spec.yml`](./sapporo-wes-1-1-0-openapi-spec.yml).

### Generate RO-Crate

Upon completion of workflow execution, the sapporo-service generates an RO-Crate from the `run_dir`, which is saved as `ro-crate-metadata.json` within the same directory. You can download the RO-Crate using the `GET /runs/{run_id}/ro-crate/data/ro-crate-metadata.json` endpoint.

Additionally, you can generate an RO-Crate from the `run_dir` as follows:

```bash
# Inside the Sapporo run_dir
$ ls
cmd.txt                     run.sh                      state.txt
exe/                        run_request.json            stderr.log
executable_workflows.json   sapporo_config.json         stdout.log
outputs/                    service_info.json           workflow_engine_params.txt
run.pid                     start_time.txt              yevis-metadata.yml

# Execute the sapporo/ro_crate.py script
$ docker run --rm -v /var/run/docker.sock:/var/run/docker.sock -v $PWD:$PWD -w $PWD ghcr.io/sapporo-wes/sapporo-service:latest python3 /app/sapporo/ro_crate.py $PWD
```

For more information on RO-Crate, please also refer to [`./tests/ro-crate`](./tests/ro-crate).

### Authentication

The sapporo-service supports authentication, configurable via the [`./sapporo/auth_config.json`](./sapporo/auth_config.json).
By default, this configuration is as follows:

```json
{
  "auth_enabled": false,
  "auth_provider": "local",
  "local_auth": {
    "jwt_secret_key": "spr_secret_key_please_change_this",
    "users": [
      {
        "username": "spr_test_user",
        "password": "spr_test_password"
      }
    ]
  },
  "oidc_auth": {
    "realm_url": "http://localhost:8080/realms/sapporo-dev",
    "username_claim": "sub"
  }
}
```

This configuration file can be directly edited or relocated using the `--auth-config` startup argument or the `SAPPORO_AUTH_CONFIG` environment variable.

#### Configuration Fields

- `auth_enabled`: Determines if JWT authentication is activated. If set to `true`, JWT authentication is enabled.
- `auth_provider`: Specifies the type of authentication provider, supporting:
  - `local`: Uses a locally managed list of users for authentication.
    - Tokens are issued by Sapporo.
    - Usernames and passwords are referenced from the `auth_config.json`.
  - `oidc`: Uses an OpenID Connect (OIDC) provider like [Keycloak](https://www.keycloak.org).
    - Tokens are issued by the OIDC provider.
    - User information is managed by the OIDC provider.
- `local_auth`: Configuration for local authentication includes:
  - `jwt_secret_key`: Secret key for signing JWTs. Changing this key is highly recommended.
  - `users`: List of users eligible for JWT authentication, specifying username and password.
- `oidc_auth`: Configuration for OIDC authentication includes:
  - `realm_url`: URL of the OIDC realm.
  - `username_claim`: JWT claim used as the username.

#### Authentication Endpoints

When JWT authentication is enabled, endpoints requiring authentication include:

- `GET /runs`
- `POST /runs`
- `GET /runs/{run_id}`
- `POST /runs/{run_id}/cancel`
- `GET /runs/{run_id}/status`
- `GET /runs/{run_id}/data`

Each run is associated with a `username`, ensuring that only the user who created a run can access details like `GET /runs/{run_id}`.

#### Local Authentication

For local JWT authentication, configure `auth_config.json` as shown:

```json
{
  "auth_enabled": true,
  "auth_provider": "local",
  "local_auth": {
    "jwt_secret_key": "new_secret_key",
    "users": [
      {
        "username": "user1",
        "password": "password1"
      },
      {
        "username": "user2",
        "password": "password2"
      }
    ]
  },
  "oidc_auth": {
    "realm_url": "http://localhost:8080/realms/sapporo-dev",
    "username_claim": "sub"
  }
}
```

Starting sapporo-service with this configuration allows access to the `GET /service-info` endpoint, while `GET /runs` will require authentication:

```bash
# Start sapporo-service
$ sapporo

# GET /service-info
$ curl -X GET localhost:1122/service-info
{
  "auth_instructions_url": "https://github.com/sapporo-wes/sapporo-service",
  "contact_info_url": "https://github.com/sapporo-wes/sapporo-service",
...

# Generate JWT for authentication
$ TOKEN=$(curl -s -X POST \
    -H "Content-Type: application/json" \
    -d '{"username":"user1", "password":"password1"}' \
    localhost:1122/auth | jq -r '.access_token')

# Authenticate and access runs
$ curl -X GET -H "Authorization: Bearer $TOKEN" localhost:1122/runs
{
  "runs": []
}
```

#### OpenID Connect (OIDC) Authentication

For OIDC authentication, ensure the `auth_provider` is set to `oidc` and appropriate configurations are specified under `oidc_auth`. Users must obtain a token from the OIDC provider and attach it to the Authorization header for authentication.

## Development

To start the development environment, follow these steps:

```bash
$ docker compose -f compose.dev.yml up -d --build
$ docker compose -f compose.dev.yml exec app bash
# inside container
$ sapporo
```

We utilize [flake8](https://pypi.org/project/flake8/), [isort](https://github.com/timothycrosley/isort), and [mypy](http://mypy-lang.org) for linting and style checking.

```bash
bash ./tests/lint_and_style_check/flake8.sh
bash ./tests/lint_and_style_check/isort.sh
bash ./tests/lint_and_style_check/mypy.sh

bash ./tests/lint_and_style_check/run_all.sh
```

For testing, we use [pytest](https://docs.pytest.org/en/latest/).

```bash
pytest .
```

## Adding New Workflow Engines to Sapporo Service

Take a look at the [`run.sh`](./sapporo/run.sh) script, which is invoked from Python. This shell script receives a request with a Workflow Engine such as cwltool and triggers the `run_cwltool` bash function.

This function executes a Bash Shell command to start a Docker container for the Workflow Engine and monitors its exit status. For a comprehensive example, please refer to this pull request: <https://github.com/sapporo-wes/sapporo-service/pull/29>

## License

This project is licensed under [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0). See the [LICENSE](./LICENSE) file for details.

## Notice

Please note that this repository is participating in a study into sustainability of open source projects. Data will be gathered about this repository for approximately the next 12 months, starting from 2021-06-16.

Data collected will include number of contributors, number of PRs, time taken to close/merge these PRs, and issues closed.

For more information, please visit [our informational page](https://sustainable-open-science-and-software.github.io/) or download our [participant information sheet](https://sustainable-open-science-and-software.github.io/assets/PIS_sustainable_software.pdf).
