Metadata-Version: 2.1
Name: document_image_utils
Version: 0.1.18.6.2
Summary: Toolkit for document image processing
Author: Gonçalo Afonso
Author-email: brazafonso2001@gmail.com
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: opencv-python
Requires-Dist: scipy
Description-Content-Type: text/markdown

# Contextualization

A work-in-progress toolkit for document image preprocessing.

Aimed at images that will be processed with OCR.

# Main available methods

- Auto rotate image

    Uses the left margin of the document to calculate the rotation angle present, and corrects it accordingly.

    The rotation direction can be given explicitly (clockwise or counter_clockwise). In auto mode, the method tries to determine the side to which the document is tilted; the result can be none, in which case the image is not rotated.

- Calculate rotation direction

    Calculates the rotation direction of an image by finding, for each direction (clockwise, counter_clockwise and none), the largest set of first black pixel appearances in the image, with outliers removed.

    For the none direction, the set is built from pixels that share the same 'x' coordinate and differ in height by less than 5% of the image's height.

- Binarize document

- Split document into columns

    Analyzes the pixel color frequency of the document image and splits it into columns.

- Auto crop document

    Analyzes the pixel color frequency of the document image and cuts the document margins, aiming mostly to remove possible folds in the corners.

- Identify document images

    Identifies images within the document image, using an algorithm available in Leptonica's repository that finds potential image masks.

- Get document delimiters

    Gets the document delimiters, using image transformations.

- Segment document

    Segments the document image into header, body and footer, using delimiters. Only the body is guaranteed to always have a value.
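
As a rough illustration of the column-splitting idea described above, the sketch below locates column boundaries from a vertical projection of black pixels. It is not this package's implementation: `find_column_spans` and `min_gap` are hypothetical names, and the image is assumed to be already binarized and passed as plain nested lists so that the example needs no dependencies.

```python
def find_column_spans(binary, min_gap=2):
    """Return (start, end) x-ranges of text columns; end is exclusive.

    binary:  list of rows, each a list of 0 (white) or 1 (black) pixels.
    min_gap: hypothetical parameter, the minimum width (in pixels) of an
             empty vertical strip for it to count as a gutter between columns.
    """
    width = len(binary[0])
    # Vertical projection: how many black pixels each x position holds.
    profile = [sum(row[x] for row in binary) for x in range(width)]

    spans = []
    start = None  # x where the current column began, or None if outside one
    gap = 0       # length of the current run of empty x positions
    for x, count in enumerate(profile):
        if count > 0:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The empty run is wide enough: close the column just
                # after its last inked x position.
                spans.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # Close a column that reaches (or nearly reaches) the right edge.
        spans.append((start, width - gap))
    return spans


# Tiny synthetic page: two 4-pixel-wide columns separated by a 2-pixel gutter.
page = [[0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0] for _ in range(5)]
column_spans = find_column_spans(page, min_gap=2)
```

Each returned span could then be used to slice the original image into one image per column. On real scans the projection is rarely exactly zero in the gutters, so a noise threshold in place of `count > 0` would likely be needed.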

# Bash commands

- binarize : binarize document image.

- rotate_document : rotate document image.

- split_columns : split document into column images.

- d_auto_crop : auto crop document image.
