Metadata-Version: 2.1
Name: DocumentAI_std
Version: 0.3.0.dev1
Summary: The main standards for Latis Document AI project
Home-page: https://github.com/LATIS-DocumentAI-Group/DocumentAI-std
Author: Hamza Gbada
Author-email: hamza.gbada@gmail.com
License: MIT
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX :: Linux
Requires-Python: >=3.11, <3.13
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pytest==7.4.4
Requires-Dist: easyocr==1.7.1
Requires-Dist: pytesseract==0.3.10
Requires-Dist: numpy<2.0,>=1.26.3
Requires-Dist: paddleocr==2.7.3
Requires-Dist: Pillow==10.3.0
Requires-Dist: PyMuPDF>=1.21.1
Requires-Dist: setuptools
Requires-Dist: paddlepaddle
Requires-Dist: black

# DocumentAI-std

**DocumentAI-std** is a Python library designed to facilitate and standardize document analysis and processing tasks. It offers functionality for handling document elements, performing optical character recognition (OCR), and managing document datasets.

## Installation

Install **DocumentAI-std** with `pip`. The package requires Python 3.11 or 3.12:

```sh
pip install DocumentAI-std
```
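To confirm that the environment matches the package metadata, a minimal sketch such as the following may help. It uses only the standard library. The helper names `python_version_supported` and `installed_version` are hypothetical and are not part of DocumentAI-std. The version bounds mirror the `Requires-Python: >=3.11, <3.13` constraint.

```python
import sys
from importlib import metadata


def python_version_supported(version_info=sys.version_info) -> bool:
    """Return True if the interpreter is Python 3.11 or 3.12.

    Hypothetical helper: mirrors the package's Requires-Python constraint.
    """
    return (3, 11) <= tuple(version_info[:2]) < (3, 13)


def installed_version(dist_name: str = "DocumentAI_std"):
    """Return the installed version string, or None if the distribution is absent.

    Hypothetical helper built on importlib.metadata from the standard library.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print("Python version supported:", python_version_supported())
print("DocumentAI_std version:", installed_version())
```

If the second line prints `None`, the package is not installed in the active environment.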


## Example of Usage

Here's an example demonstrating how to use the `Wildreceipt` dataset:

```python
from DocumentAI_std.datasets import Wildreceipt

# Define train and test sets
train_set = Wildreceipt(
    train=True,
    img_folder="/path/to/train/images/",
    label_path="/path/to/train/annotations.txt",
)
test_set = Wildreceipt(
    train=False,
    img_folder="/path/to/test/images/",
    label_path="/path/to/test/annotations.txt",
)

# Assert the number of data samples in train and test sets
assert len(train_set.data) == 1267
assert len(test_set.data) == 472
```

In the above example:
- We import the `Wildreceipt` dataset class from `DocumentAI_std.datasets`.
- We create train and test dataset instances, passing the image folder and the annotation file for each split.
- We assert that the train and test sets contain the expected number of samples (1267 and 472, respectively).
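Because the example depends on placeholder paths, it can be useful to check them before building the dataset. The sketch below assumes only that `img_folder` should be an existing directory and `label_path` an existing file. The helper `check_dataset_paths` is hypothetical and is not provided by DocumentAI-std.

```python
from pathlib import Path


def check_dataset_paths(img_folder: str, label_path: str) -> list[str]:
    """Return a list of problems found; an empty list means both paths exist.

    Hypothetical helper, not part of DocumentAI-std.
    """
    problems = []
    if not Path(img_folder).is_dir():
        problems.append(f"image folder not found: {img_folder}")
    if not Path(label_path).is_file():
        problems.append(f"annotation file not found: {label_path}")
    return problems


# Report any missing path for the placeholder locations used above.
for problem in check_dataset_paths(
    "/path/to/train/images/", "/path/to/train/annotations.txt"
):
    print(problem)
```

Running this check first separates path mistakes from problems in the annotations themselves.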
