Metadata-Version: 2.1
Name: cjm-diffusers-utils
Version: 0.0.1
Summary: Some utility functions I frequently use with 🤗 diffusers.
Home-page: https://github.com/cj-mills/cjm-diffusers-utils
Author: cj-mills
Author-email: millscj.mills2@gmail.com
License: Apache Software License 2.0
Keywords: nbdev jupyter notebook python
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Provides-Extra: dev
License-File: LICENSE

cjm-diffusers-utils
================

## Install

``` sh
pip install cjm_diffusers_utils
```

## How to use

### pil_to_latent

``` python
from cjm_diffusers_utils.core import pil_to_latent
from PIL import Image  # For working with images
from torchvision import transforms  # PyTorch module for image transformations
# VAE that encodes images to latents and decodes latents back to images
from diffusers import AutoencoderKL
```

``` python
model_name = "stabilityai/stable-diffusion-2-1"
vae = AutoencoderKL.from_pretrained(model_name, subfolder="vae")
```

``` python
img_path = '../images/cat.jpg'
src_img = Image.open(img_path).convert('RGB')
print(f"Source Image Size: {src_img.size}")

img_latents = pil_to_latent(src_img, vae)
print(f"Latent Dimensions: {img_latents.shape}")
```

    Source Image Size: (768, 512)
    Latent Dimensions: torch.Size([1, 4, 64, 96])
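
The latent dimensions above are consistent with a VAE that shrinks each spatial side by a factor of 8 and produces 4 latent channels. The sketch below only reproduces that arithmetic. `expected_latent_shape` is a hypothetical helper, not part of `cjm_diffusers_utils`, and the factor of 8 and the channel count are assumptions inferred from the printed output rather than read from the model config.

```python
def expected_latent_shape(img_size, downscale=8, latent_channels=4, batch=1):
    """Return (batch, channels, height, width) for a PIL-style (width, height) size.

    Assumption: each spatial side shrinks by `downscale` and the latent has
    `latent_channels` channels, as the output above suggests.
    """
    width, height = img_size
    if width % downscale or height % downscale:
        # This sketch only covers sizes that divide evenly.
        raise ValueError("width and height must be multiples of the downscale factor")
    # PIL reports (width, height); tensors are laid out as (..., height, width).
    return (batch, latent_channels, height // downscale, width // downscale)


print(expected_latent_shape((768, 512)))  # (1, 4, 64, 96)
```

Note the order swap: PIL reports `(width, height)`, while the tensor shape lists height before width.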

### latent_to_pil

``` python
from cjm_diffusers_utils.core import latent_to_pil
```

``` python
decoded_img = latent_to_pil(img_latents, vae)
print(f"Decoded Image Size: {decoded_img.size}")
```

    Decoded Image Size: (768, 512)

### text_to_emb

``` python
from cjm_diffusers_utils.core import text_to_emb
# CLIP tokenizer and text encoder from 🤗 transformers
from transformers import CLIPTextModel, CLIPTokenizer
```

``` python
# Load the tokenizer for the specified model
tokenizer = CLIPTokenizer.from_pretrained(model_name, subfolder="tokenizer")
# Load the text encoder for the specified model
text_encoder = CLIPTextModel.from_pretrained(model_name, subfolder="text_encoder")
```

``` python
prompt = "A cat sitting on the floor."
text_emb = text_to_emb(prompt, tokenizer, text_encoder)
text_emb.shape
```

    torch.Size([2, 77, 1024])

### prepare_noise_scheduler

``` python
from cjm_diffusers_utils.core import prepare_noise_scheduler
from diffusers import DDIMScheduler
```

``` python
noise_scheduler = DDIMScheduler.from_pretrained(model_name, subfolder='scheduler')
print(f"Number of timesteps: {len(noise_scheduler.timesteps)}")
print(noise_scheduler.timesteps[:10])

noise_scheduler = prepare_noise_scheduler(noise_scheduler, 70, 1.0)
print(f"Number of timesteps: {len(noise_scheduler.timesteps)}")
print(noise_scheduler.timesteps[:10])
```

    Number of timesteps: 1000
    tensor([999, 998, 997, 996, 995, 994, 993, 992, 991, 990])
    Number of timesteps: 70
    tensor([967, 953, 939, 925, 911, 897, 883, 869, 855, 841])
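
The 70 timesteps printed above match an evenly strided subset of the 1000 training timesteps, shifted by an offset of 1. Here is a minimal sketch of that arithmetic in plain Python, assuming a stride of `1000 // 70` and an offset of 1. Both values are inferred from the output, not taken from the scheduler's source, and `strided_timesteps` is a hypothetical name.

```python
def strided_timesteps(num_train_steps=1000, num_inference_steps=70, offset=1):
    """Return inference timesteps in descending order.

    Assumption: timesteps are spaced by an integer stride starting from 0,
    then shifted by `offset`, which reproduces the values printed above.
    """
    stride = num_train_steps // num_inference_steps  # 1000 // 70 == 14
    return [i * stride + offset for i in reversed(range(num_inference_steps))]


steps = strided_timesteps()
print(len(steps))   # 70
print(steps[:10])   # [967, 953, 939, 925, 911, 897, 883, 869, 855, 841]
```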

### prepare_depth_mask

``` python
from cjm_diffusers_utils.core import prepare_depth_mask
```

``` python
depth_map_path = '../images/depth-cat.png'
depth_map = Image.open(depth_map_path)
print(f"Depth map size: {depth_map.size}")

depth_mask = prepare_depth_mask(depth_map)
depth_mask.shape, depth_mask.min(), depth_mask.max()
```

    Depth map size: (768, 512)

    (torch.Size([1, 1, 64, 96]), tensor(-1.), tensor(1.))
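
The output above shows a mask whose values span -1 to 1 at one eighth of the depth map's resolution. One way to get that value range is min-max rescaling, sketched here on a plain list of depth values. This only illustrates the idea and is an assumption about the approach, not the library's implementation. `rescale_to_unit_range` is a hypothetical helper.

```python
def rescale_to_unit_range(values):
    """Linearly map values so the minimum becomes -1.0 and the maximum 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant input has no range to stretch.
        raise ValueError("values must not all be equal")
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


depth_values = [0, 64, 128, 255]  # made-up 8-bit depth samples
scaled = rescale_to_unit_range(depth_values)
print(min(scaled), max(scaled))  # -1.0 1.0
```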
