Metadata-Version: 2.3
Name: moriarty
Version: 0.2.6
Summary: moriarty
Project-URL: Source, https://github.com/wh1isper/moriarty
Author-email: wh1isper <jizhongsheng957@gmail.com>
License: BSD 3-Clause License
License-File: LICENSE
Keywords: moriarty
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10
Requires-Dist: anyio
Requires-Dist: async-timeout; python_version < '3.11'
Requires-Dist: brq>=0.3.6
Requires-Dist: click
Requires-Dist: httpx
Requires-Dist: importlib-metadata
Requires-Dist: loguru
Requires-Dist: pydantic>=2
Provides-Extra: docs
Requires-Dist: autodoc-pydantic; extra == 'docs'
Requires-Dist: pydata-sphinx-theme; extra == 'docs'
Requires-Dist: sphinx; extra == 'docs'
Requires-Dist: sphinx-click; extra == 'docs'
Provides-Extra: matrix
Requires-Dist: alembic; extra == 'matrix'
Requires-Dist: asyncpg; extra == 'matrix'
Requires-Dist: boto3; extra == 'matrix'
Requires-Dist: escapism; extra == 'matrix'
Requires-Dist: fastapi; extra == 'matrix'
Requires-Dist: jinja2; extra == 'matrix'
Requires-Dist: kubernetes-asyncio; extra == 'matrix'
Requires-Dist: pluggy; extra == 'matrix'
Requires-Dist: psycopg2-binary; extra == 'matrix'
Requires-Dist: sqlalchemy[asyncio]; extra == 'matrix'
Requires-Dist: uvicorn[standard]; extra == 'matrix'
Provides-Extra: service
Requires-Dist: alembic; extra == 'service'
Requires-Dist: asyncpg; extra == 'service'
Requires-Dist: escapism; extra == 'service'
Requires-Dist: fastapi; extra == 'service'
Requires-Dist: jinja2; extra == 'service'
Requires-Dist: kubernetes-asyncio; extra == 'service'
Requires-Dist: pluggy; extra == 'service'
Requires-Dist: psycopg2-binary; extra == 'service'
Requires-Dist: sqlalchemy[asyncio]; extra == 'service'
Requires-Dist: uvicorn[standard]; extra == 'service'
Provides-Extra: sqs
Requires-Dist: boto3; extra == 'sqs'
Provides-Extra: test
Requires-Dist: alembic; extra == 'test'
Requires-Dist: asyncpg; extra == 'test'
Requires-Dist: boto3; extra == 'test'
Requires-Dist: boto3-stubs[s3,sqs]; extra == 'test'
Requires-Dist: docker; extra == 'test'
Requires-Dist: escapism; extra == 'test'
Requires-Dist: fastapi; extra == 'test'
Requires-Dist: jinja2; extra == 'test'
Requires-Dist: kubernetes-asyncio; extra == 'test'
Requires-Dist: pluggy; extra == 'test'
Requires-Dist: psycopg2-binary; extra == 'test'
Requires-Dist: pytest; extra == 'test'
Requires-Dist: pytest-asyncio; extra == 'test'
Requires-Dist: pytest-cov; extra == 'test'
Requires-Dist: sqlalchemy[asyncio]; extra == 'test'
Requires-Dist: uvicorn[standard]; extra == 'test'
Description-Content-Type: text/markdown

![](https://img.shields.io/github/license/wh1isper/moriarty)
![](https://img.shields.io/github/v/release/wh1isper/moriarty)
![](https://img.shields.io/docker/image-size/wh1isper/moriarty)
![](https://img.shields.io/pypi/dm/moriarty)
![](https://img.shields.io/github/last-commit/wh1isper/moriarty)
![](https://img.shields.io/pypi/pyversions/moriarty)
[![codecov](https://codecov.io/gh/Wh1isper/moriarty/graph/badge.svg?token=NKHSM0W8L5)](https://codecov.io/gh/Wh1isper/moriarty)

# moriarty

Moriarty is a set of components for building asynchronous inference clusters.

It relies on a global queue service, either provided by a cloud vendor or self-hosted, so an asynchronous inference cluster can be built without exposing ports to the public.

## Why asynchronous inference, why moriarty?

- Prevent client timeouts.
- Avoid HTTP disconnections caused by network or other issues.
- Reduce HTTP queries by using queues.
- Deploy on multi, hybrid, or private cloud, or even on bare metal.

## Alternatives

This project grew out of my extensive use of [Asynchronous Inference for AWS SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference.html). As far as I know, only AWS and Aliyun provide asynchronous inference support.

Among open source projects there are many deployment solutions, but most of them provide synchronous inference (based on HTTP or RPC). I have not found an alternative for asynchronous inference. Kubeflow Pipelines could perhaps be used for it, but without serving support (keeping the model loaded on the GPU as a service rather than loading it for each job), there is significant overhead from GPU memory caching and model load time.

## Architecture Overview

![Architecture Overview](./assets/Architecture.png)

Key Components:

- Matrix: single producer, multiple consumers. The `Connector` is the producer: it provides an HTTP API for the _Backend Service_ and pushes invoke requests to the global **Job Queue**. The `Operator` is the consumer: it pulls tasks from the **Job Queue** and pushes them to a local queue. Whether it pulls depends on the load of the inference cluster. The `Operator` also autoscales inference containers when needed.
- Endpoint: Deploy a function as an HTTP service.
- Sidecar: Proxy that transforms queue messages into HTTP requests.
- Init: Init script for the inference container.
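
The flow between these components can be sketched with plain `asyncio` queues. This is only an illustration of the producer/consumer pattern described above, not Moriarty's actual API: the function names (`connector`, `operator`, `sidecar`, `endpoint`, `run_demo`), the in-memory queues, and the semaphore used to model cluster load are all assumptions made for the example, and the HTTP call to the endpoint is replaced by a direct function call.

```python
import asyncio


async def endpoint(job: dict) -> dict:
    """Hypothetical inference function; a real endpoint would sit behind HTTP."""
    return {"id": job["id"], "output": job["x"] * 2}


async def connector(job_queue: asyncio.Queue, jobs: list[dict]) -> None:
    """Producer: push each invoke request onto the global job queue."""
    for job in jobs:
        await job_queue.put(job)


async def operator(
    job_queue: asyncio.Queue, local_queue: asyncio.Queue, slots: asyncio.Semaphore
) -> None:
    """Consumer: pull from the global queue only when the cluster has a free slot."""
    while True:
        await slots.acquire()        # wait for spare capacity (models cluster load)
        job = await job_queue.get()  # only then pull from the global job queue
        await local_queue.put(job)   # hand the job over to the local queue
        job_queue.task_done()


async def sidecar(
    local_queue: asyncio.Queue, slots: asyncio.Semaphore, results: list[dict]
) -> None:
    """Turn each local queue message into a call to the endpoint."""
    while True:
        job = await local_queue.get()
        results.append(await endpoint(job))  # stands in for an HTTP request
        slots.release()                      # free the slot for the operator
        local_queue.task_done()


async def run_demo(n_jobs: int, capacity: int = 2) -> list[dict]:
    job_queue: asyncio.Queue = asyncio.Queue()    # stands in for the global queue
    local_queue: asyncio.Queue = asyncio.Queue()
    slots = asyncio.Semaphore(capacity)
    results: list[dict] = []
    workers = [
        asyncio.create_task(operator(job_queue, local_queue, slots)),
        asyncio.create_task(sidecar(local_queue, slots, results)),
    ]
    await connector(job_queue, [{"id": i, "x": i} for i in range(n_jobs)])
    await job_queue.join()    # every job has been moved to the local queue
    await local_queue.join()  # every job has been handled by the sidecar
    for worker in workers:
        worker.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results


if __name__ == "__main__":
    print(asyncio.run(run_demo(5)))
```

In this sketch the semaphore is what makes pulling depend on load: the operator takes a job from the global queue only after a slot is free, so pending work stays in the global queue instead of piling up locally.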

CLIs:

- `moriarty-matrix`: Manage matrix components
- `moriarty-operator`: Start the operator component
- `moriarty-connector`: Start the connector component
- `moriarty-sidecar`: Start the sidecar component
- `moriarty-deploy`: Call the `operator`'s API or database to deploy an inference endpoint

## Install

`pip install moriarty[matrix]` for all components.

Or use the Docker image:

`docker pull wh1isper/moriarty` or `docker pull ghcr.io/wh1isper/moriarty`

> `docker pull wh1isper/moriarty:dev` for the development version

## Develop

Install pre-commit before committing

```
pip install pre-commit
pre-commit install
```

Install the package locally with test dependencies

```
pip install -e ".[test]"
```

Run tests with pytest

```
pytest -v tests/
```
