Metadata-Version: 2.1
Name: pgsocr
Version: 0.1.4
Summary: A command line utility for converting Blu-ray subs to SRT or ASS using AI Language Models.
Author-Email: pcpeasant <pcpeasant25@gmail.com>
License: MIT
Requires-Python: >=3.11
Requires-Dist: pillow>=10.4.0
Requires-Dist: tesserocr>=2.7.0
Requires-Dist: numpy>=2.0.0
Requires-Dist: tqdm>=4.66.4
Requires-Dist: torch>=2.3.1; extra == "lm"
Requires-Dist: torchvision>=0.18.1; extra == "lm"
Requires-Dist: torchaudio>=2.3.1; extra == "lm"
Requires-Dist: transformers>=4.41.2; extra == "lm"
Requires-Dist: einops>=0.8.0; extra == "lm"
Requires-Dist: timm>=1.0.7; extra == "lm"
Requires-Dist: bitsandbytes>=0.43.3; extra == "lm"
Provides-Extra: lm
Description-Content-Type: text/markdown

# pgsocr
Convert Blu-ray SUP subtitles to SRT or ASS using AI language models or Tesseract.

### Prerequisites

If you plan to use Tesseract, install it first: https://tesseract-ocr.github.io/tessdoc/Installation.html \
Install every language pack you need and note the location of the `tessdata` directory. \
Then set the `TESSDATA_PREFIX` environment variable to that `tessdata` directory.
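
As a rough sketch for Linux or macOS shells, the variable can be exported before running pgsocr. The path below is only an assumed example, not a guaranteed default; substitute the `tessdata` location you noted during the Tesseract install.

```shell
# Assumed example path; replace it with your own 'tessdata' directory.
export TESSDATA_PREFIX="/usr/share/tesseract-ocr/5/tessdata"

# Optional sanity check: list the installed language data files, if any.
ls "$TESSDATA_PREFIX"/*.traineddata 2>/dev/null \
    || echo "No .traineddata files found in $TESSDATA_PREFIX"
```

Adding the `export` line to your shell profile keeps the setting across sessions.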

### Installation

Download the latest `.whl` from the Releases tab and install it with pip. \
Install the `[lm]` extras as well if you want to use the AI models.
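
A minimal sketch of that install step, assuming the wheel was saved to the current directory. The filename below is hypothetical (the actual name depends on the release you download); the `[lm]` suffix is pip's standard syntax for requesting an extra.

```shell
# Hypothetical wheel filename; use the name of the file you downloaded.
WHEEL="pgsocr-0.1.4-py3-none-any.whl"

if [ -f "$WHEEL" ]; then
    # Appending [lm] also installs the optional AI model dependencies.
    # Drop the suffix for a Tesseract-only install.
    pip install "${WHEEL}[lm]"
else
    echo "Wheel not found: $WHEEL"
fi
```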

### Usage

    Options:
    -i: Specify the path to the SUP file or (batch mode) directory.
    -o: Specify the path to the output directory.
    -m: Specify the OCR engine to use (florence2, tesseract, or minicpmv).
    -l: (Only if using Tesseract) Specify the list of languages to use separated by spaces. Defaults to English.
    -b: (Only if using Tesseract) Specify a custom character blacklist for Tesseract. Enter an empty string to turn off the default blacklist.
    -f: Specify the output format (SRT or ASS). ASS output also supports subtitle positioning.

    Note: The AI models are more accurate than Tesseract but far more resource-heavy. A recent GPU with a large amount of VRAM is recommended.

    Examples:
    # Single file
    pgsocr -i /path/to/file -o /path/to/outputdir -m tesseract -l eng jpn

    # Multiple files in a directory
    pgsocr -i /path/to/inputdir -o /path/to/outputdir -m florence2
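
The Tesseract-only `-b` option is not covered by the examples above. The following is a sketch of how the default character blacklist might be turned off by passing an empty string, as the option description suggests. The paths are placeholders, and the guard around the call is only there so the snippet does not fail when pgsocr is not installed.

```shell
# Placeholder paths; point these at a real SUP file and output directory.
INPUT="/path/to/file"
OUTDIR="/path/to/outputdir"

if command -v pgsocr >/dev/null 2>&1; then
    # -b "" passes an empty blacklist, which disables the default one.
    pgsocr -i "$INPUT" -o "$OUTDIR" -m tesseract -l eng -b ""
else
    echo "pgsocr is not installed or not on PATH"
fi
```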
