Metadata-Version: 2.1
Name: tensor-parallel
Version: 1.0.19
Summary: Automatically shard your large model between multiple GPUs, works without torch.distributed
Home-page: https://github.com/BlackSamorez/tensor_parallel
Author: Andrei Panferov and Yaroslav Lisnyak
Author-email: yalisnyak@nes.com
Project-URL: Bug Tracker, https://github.com/BlackSamorez/tensor_parallel/issues
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENCE
Requires-Dist: torch (>=1.11)
Requires-Dist: transformers (>=4.25.1)
Provides-Extra: dev
Requires-Dist: pytest (==6.2.5) ; extra == 'dev'
Requires-Dist: pytest-forked ; extra == 'dev'
Requires-Dist: pytest-asyncio (==0.16.0) ; extra == 'dev'
Requires-Dist: accelerate (==0.15.0) ; extra == 'dev'
Requires-Dist: black (==22.3.0) ; extra == 'dev'
Requires-Dist: isort (==5.10.1) ; extra == 'dev'
Requires-Dist: psutil ; extra == 'dev'

# tensor_parallel

Run your PyTorch model on multiple GPUs from plain Python.

```python
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

from tensor_parallel import tensor_parallel # <- interface for automatic optimal backend selection

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

model = tensor_parallel(model, ["cuda:0", "cuda:1"]) # <- magic happens here
# each GPU holds only half of the model, halving the per-GPU memory footprint

inputs = tokenizer("translate English to German: How are you?", return_tensors="pt")["input_ids"].to("cuda:0")
outputs = model.generate(inputs, num_beams=5)
print(tokenizer.decode(outputs[0]))  # Wie sind Sie?
```
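
To build intuition for what sharding a model means, here is a minimal sketch of the general idea behind tensor parallelism, written on plain Python lists so it runs without GPUs or PyTorch. It is not taken from this package's implementation: the helper names (`matmul`, `split_columns`, `sharded_matmul`) are hypothetical, and a column-wise split of a single linear layer is only one of several possible sharding schemes.

```python
# Toy illustration only: a linear layer y = x @ W is split column-wise,
# each "device" computes its slice of the output, and the slices are
# concatenated. All helper names here are hypothetical.

def matmul(x, w):
    """Multiply a vector x (length n) by a matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def split_columns(w, num_shards):
    """Split matrix w into num_shards equally sized contiguous column blocks."""
    cols = len(w[0])
    assert cols % num_shards == 0, "this sketch assumes an even split"
    step = cols // num_shards
    return [[row[start:start + step] for row in w] for start in range(0, cols, step)]

def sharded_matmul(x, shards):
    """Compute each shard's partial output and concatenate the results."""
    out = []
    for shard in shards:  # on real hardware each shard would live on its own device
        out.extend(matmul(x, shard))
    return out

x = [1.0, 2.0, 3.0]
w = [
    [1.0, 0.0, 2.0, 1.0],
    [0.0, 1.0, 1.0, 3.0],
    [2.0, 1.0, 0.0, 1.0],
]

shards = split_columns(w, num_shards=2)  # each shard stores 6 of the 12 weights
full = matmul(x, w)
parallel = sharded_matmul(x, shards)
print(full == parallel)
```

Each shard stores half of the weights, yet the concatenated result matches the unsharded computation, which is why splitting a layer across two devices can roughly halve the per-device weight memory.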

## Installation

The recommended way to install this package is to use [pip](https://pypi.org/project/pip/):
```
pip install tensor_parallel
```
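
To confirm that the installation succeeded, one option is to query the installed distribution's version from Python. The sketch below uses only the standard library and assumes Python 3.8 or newer, where `importlib.metadata` is available. The helper name `installed_version` is hypothetical, and the distribution name is the one used in the `pip` command above.

```python
from importlib import metadata  # standard library since Python 3.8

def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

# Prints a version string if the package is installed, otherwise None.
print(installed_version("tensor_parallel"))
```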

### Code style

We use [black](https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html) and [isort](https://pycqa.github.io/isort/) for all pull requests.
Before committing your code, simply run `black . && isort .` and you will be fine.

--------------------------------------------------------------------------------
