Metadata-Version: 2.1
Name: image_text_reader
Version: 0.1.2
Summary: A library to read text from images
Home-page: https://github.com/yourusername/image_text_reader
Author: barmin15
Author-email: bokora@lauderalumni.hu
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pytesseract
Requires-Dist: Pillow

# Image Text Reader

The `image-text-reader` library extracts text from images using Optical Character Recognition (OCR). It relies on `pytesseract` for text recognition and on `Pillow` for image processing.

## Table of Contents

- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Usage](#usage)
- [Code Explanation](#code-explanation)
- [Contributing](#contributing)
- [License](#license)

## Prerequisites

- Python 3.6 or later
- Tesseract-OCR

## Installation

1. **Install the library** (`pytesseract` and `Pillow` are installed as dependencies):

    ```bash
    pip install image-text-reader
    ```

2. **Install Tesseract-OCR:**
    - **Windows:** Download and run the installer from the [UB Mannheim Tesseract wiki](https://github.com/UB-Mannheim/tesseract/wiki).
    - **macOS:** Use Homebrew to install:

      ```bash
      brew install tesseract
      ```

    - **Linux:** Use your package manager, for example:

      ```bash
      sudo apt-get install tesseract-ocr
      ```

## Usage

1. **Create a Python script** (e.g., `test_script.py`) and import the `ocr_image` function from the `image_text_reader` library:

    ```python
    from image_text_reader import ocr_image
    ```

2. **Set the paths to your image and the Tesseract-OCR executable, then call `ocr_image`** (`tesseract_cmd` is optional):

    ```python
    # Update these paths for your system
    image_path = 'C:/path_to_your_image.jpg'  # Replace with the path to your test image
    tesseract_cmd = 'C:/Program Files/Tesseract-OCR/tesseract.exe'  # Path to Tesseract executable

    extracted_text = ocr_image(image_path, tesseract_cmd=tesseract_cmd)
    print("Extracted Text:")
    print(extracted_text)
    ```

3. **Run your script:**

    ```bash
    python test_script.py
    ```
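
Step 2 hard-codes a Windows path to the Tesseract executable. As a minimal sketch of one way to avoid that, the helper below looks for `tesseract` on `PATH` first and then tries a few fallback locations. The name `find_tesseract` and the fallback paths are assumptions for illustration and are not part of `image_text_reader`.

```python
import shutil
from pathlib import Path

# Hypothetical fallback locations; adjust these for your own system.
DEFAULT_CANDIDATES = (
    "C:/Program Files/Tesseract-OCR/tesseract.exe",
    "/usr/bin/tesseract",
    "/opt/homebrew/bin/tesseract",
)


def find_tesseract(candidates=DEFAULT_CANDIDATES):
    """Return a path to the Tesseract executable, or None if none is found."""
    # Prefer whatever is already available on PATH.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise fall back to the first candidate that exists on disk.
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    return None


if __name__ == "__main__":
    print(find_tesseract())
```

The result can be passed to `ocr_image` as `tesseract_cmd`. Because that parameter defaults to `None`, a `None` result leaves the `pytesseract` default command in place.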

## Code Explanation

- **Preprocessing Function:**

    The `preprocess_image` function prepares the image for OCR by converting it to grayscale, sharpening it, and enhancing its contrast:

    ```python
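    from PIL import Image, ImageEnhance, ImageFilter

    def preprocess_image(image_path):
        # Convert to 8-bit grayscale ('L' mode)
        image = Image.open(image_path).convert('L')
        # Sharpen edges so characters stand out
        image = image.filter(ImageFilter.SHARPEN)
        # Boost contrast with factor 2 (1.0 would leave the image unchanged)
        enhancer = ImageEnhance.Contrast(image)
        image = enhancer.enhance(2)
        return image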
    ```

- **OCR Function:**

    The `ocr_image` function preprocesses the image with `preprocess_image` and then extracts English text (`lang='eng'`) using `pytesseract`:

    ```python
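    import pytesseract

    def ocr_image(image_path, tesseract_cmd=None):
        # Optionally point pytesseract at a specific Tesseract executable
        if tesseract_cmd:
            pytesseract.pytesseract.tesseract_cmd = tesseract_cmd
        image = preprocess_image(image_path)
        # Recognize English text in the preprocessed image
        text = pytesseract.image_to_string(image, lang='eng')
        return text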
    ```
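
To make the contrast step in `preprocess_image` more concrete, here is a simplified pure-Python model of what a contrast factor of 2 does to grayscale values: each pixel is pushed away from the mean brightness and the result is clamped to the 0-255 range. The function name `stretch_contrast` is hypothetical, and this is only an approximation for illustration; Pillow's own implementation may differ in details such as rounding.

```python
def stretch_contrast(pixels, factor):
    """Simplified model of contrast enhancement on grayscale values (0-255)."""
    # Reference level: the mean brightness, rounded to the nearest integer.
    mean = int(sum(pixels) / len(pixels) + 0.5)
    # Move each pixel away from the mean by `factor`, then clamp to 0-255.
    return [max(0, min(255, round(mean + factor * (p - mean)))) for p in pixels]


if __name__ == "__main__":
    print(stretch_contrast([100, 150], 2))  # [75, 175]
```

A factor of 1 returns the input unchanged, while larger factors widen the gap between dark and light pixels, which tends to separate text from its background before OCR.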

## Contributing

Contributions are welcome! Please open an issue or submit a pull request for any changes.

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.

For more information, visit the [image-text-reader library page on PyPI](https://pypi.org/project/image-text-reader/).
