Metadata-Version: 2.1
Name: start-ocr
Version: 0.0.6
Summary: Applying pdfplumber + opencv + pytesseract to extract content and metadata from formal PDF files.
Author-email: Marcelino Veloso III <hi@mv3.dev>
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: python-dotenv>=1.0
Requires-Dist: pdfplumber>=0.10.3
Requires-Dist: opencv-python>=4.9
Requires-Dist: pytesseract>=0.3.10
Requires-Dist: rich>=13.7.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-env>=0.8; extra == "dev"
Requires-Dist: pytest-datadir>=1.5; extra == "dev"
Requires-Dist: pytest-cov>=4.1; extra == "dev"
Requires-Dist: pre-commit>=3.3; extra == "dev"
Requires-Dist: mkdocs>=1.5; extra == "dev"
Requires-Dist: mkdocstrings[python]>=0.22; extra == "dev"
Requires-Dist: mkdocs-material>=9.5; extra == "dev"
Requires-Dist: ipykernel>=6.23; extra == "dev"
Requires-Dist: build>=1.0.3; extra == "dev"
Requires-Dist: twine>=4.0.2; extra == "dev"

# start-ocr

![GitHub CI](https://github.com/justmars/start-ocr/actions/workflows/ci.yml/badge.svg)

1. Applies pdfplumber, opencv, and pytesseract to extract content and metadata from formal PDF files.
2. pdfplumber's `page.extract_text_lines()` is experimental, so it may or may not work depending on the PDF file.
3. See the [documentation](https://justmars.github.io/start-ocr).
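
Because `page.extract_text_lines()` may return nothing for some files, one way to guard against that is to fall back to OCR on a rendered image of the page. The sketch below is only an illustration and is not taken from this package's API: the helper names `text_lines_or_fallback`, `ocr_page`, and `extract_pdf_lines`, and the 300 dpi resolution, are assumptions made for the example.

```python
from typing import Any, Callable


def text_lines_or_fallback(
    page: Any, fallback: Callable[[Any], list[str]]
) -> list[str]:
    """Return text lines from a pdfplumber page, else the fallback's result.

    Hypothetical helper: `page` is expected to behave like a
    pdfplumber page, i.e. to offer `extract_text_lines()`.
    """
    try:
        lines = page.extract_text_lines()
    except Exception:
        # The method is experimental, so treat any failure as "no lines".
        lines = []
    # Each line is a dict; keep only entries with non-blank text.
    texts = [line["text"] for line in lines if line.get("text", "").strip()]
    return texts if texts else fallback(page)


def ocr_page(page: Any, resolution: int = 300) -> list[str]:
    """Render the page to an image and read it with pytesseract.

    Hypothetical fallback; the resolution is an assumed default.
    """
    import pytesseract  # imported lazily; needs the tesseract binary

    image = page.to_image(resolution=resolution).original  # PIL image
    return pytesseract.image_to_string(image).splitlines()


def extract_pdf_lines(path: str) -> list[list[str]]:
    """Collect text lines per page, using OCR where extraction is empty."""
    import pdfplumber  # imported lazily so the helpers above stay testable

    with pdfplumber.open(path) as pdf:
        return [text_lines_or_fallback(page, ocr_page) for page in pdf.pages]


# Example call, assuming a local file named "sample.pdf" exists:
# pages = extract_pdf_lines("sample.pdf")
```

Passing the fallback in as a callable keeps the decision logic independent of pytesseract, so it can be exercised with a stub page object and no PDF on hand.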

## Installation

```sh
just start
```
