Metadata-Version: 2.1
Name: gpt-pdf-reader
Version: 1.3
Summary: A Python package that utilizes GPT-4V and other tools to extract and process information from PDF files
Author: Max Hager
Author-email: maxhager28@gmail.com
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: cachetools ==5.3.2
Requires-Dist: certifi ==2023.7.22
Requires-Dist: cffi ==1.16.0
Requires-Dist: charset-normalizer ==3.3.2
Requires-Dist: cryptography ==41.0.5
Requires-Dist: google-api-core ==2.13.0
Requires-Dist: google-auth ==2.23.4
Requires-Dist: google-cloud-core ==2.3.3
Requires-Dist: google-cloud-storage ==2.13.0
Requires-Dist: google-crc32c ==1.5.0
Requires-Dist: google-resumable-media ==2.6.0
Requires-Dist: googleapis-common-protos ==1.61.0
Requires-Dist: idna ==3.4
Requires-Dist: load-dotenv ==0.1.0
Requires-Dist: pdf2image ==1.16.3
Requires-Dist: pdfminer.six ==20221105
Requires-Dist: Pillow ==10.1.0
Requires-Dist: protobuf ==4.25.0
Requires-Dist: pyasn1 ==0.5.0
Requires-Dist: pyasn1-modules ==0.3.0
Requires-Dist: pycparser ==2.21
Requires-Dist: PyMuPDF ==1.23.6
Requires-Dist: PyMuPDFb ==1.23.6
Requires-Dist: pypandoc ==1.12
Requires-Dist: PyPDF2 ==3.0.1
Requires-Dist: python-dotenv ==1.0.0
Requires-Dist: requests ==2.31.0
Requires-Dist: rsa ==4.9
Requires-Dist: urllib3 ==2.0.7

# GPT PDF Reader

GPT PDF Reader is a Python package that utilizes GPT-4V and other tools to extract and process information from PDF files.

## Features

- Extracts figures from PDF files using the `pdffigures2` Scala library.
- Converts PDF pages to images and uploads them to a Google Cloud Storage bucket.
- Uses GPT-4V to generate Markdown content from the PDF, then inserts the image URLs into the Markdown.

## Installation

The installation process requires Java and Scala. The following instructions are for macOS users:

```bash
brew tap AdoptOpenJDK/openjdk
brew install --cask adoptopenjdk11
brew install jenv
echo 'export PATH="$HOME/.jenv/bin:$PATH"' >> ~/.zshrc
echo 'eval "$(jenv init -)"' >> ~/.zshrc
```

After updating your shell configuration, close and reopen your terminal, then set Java 11 as the global version using jenv:

```bash
jenv add /Library/Java/JavaVirtualMachines/adoptopenjdk-11.jdk/Contents/Home/
jenv global 11.0.11
```

Install GPT PDF Reader via pip:

```bash
pip install gpt-pdf-reader
```

Set the required environment variables in your `.env` file. Write each entry as `KEY=value`, with no spaces around `=` and no quotes around the value:

```env
OPENAI_API_KEY=open_ai_key
GOOGLE_ID=google_project_id
GOOGLE_BUCKET=google_bucket_name
```
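Formatting mistakes in a `.env` file can be hard to spot, so a quick check before the first run may help. The following is a minimal sketch, not part of the package: the helper name `find_env_problems` is hypothetical, and it only checks the three keys listed above against the "no spaces, no quotes" rule.

```python
from typing import List

# The three keys this README asks for.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GOOGLE_ID", "GOOGLE_BUCKET")


def find_env_problems(text: str) -> List[str]:
    """Return human-readable problems found in the contents of a .env file.

    Hypothetical helper for illustration; it is not provided by gptpdfreader.
    """
    problems = []
    seen = set()
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and comments.
        if not stripped or stripped.startswith("#"):
            continue
        if "=" not in stripped:
            problems.append(f"line {number}: missing '='")
            continue
        key, value = line.split("=", 1)
        # Any whitespace next to the key or the value breaks the rule.
        if key != key.strip() or value != value.strip():
            problems.append(f"line {number}: stray spaces around key or value")
        key, value = key.strip(), value.strip()
        # A value wrapped in matching quotes counts as "unnecessary quotes".
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            problems.append(f"line {number}: value is quoted")
        seen.add(key)
    for key in REQUIRED_KEYS:
        if key not in seen:
            problems.append(f"missing key: {key}")
    return problems
```

Calling `find_env_problems(open(".env").read())` returns an empty list when the file follows the format shown above.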

## Usage

To process a PDF and generate Markdown content:

```python
from gptpdfreader.reader import main

main('path_to_your_pdf.pdf')
```

This will process the specified PDF and output a Markdown file with the extracted information in the same directory.
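To process several PDFs in one go, the call above can be wrapped in a loop. This is a minimal sketch under the assumption that `main` accepts a single path string, as in the example above. The helper `process_directory` is hypothetical and takes the processing function as an argument, so it does not depend on the package itself.

```python
from pathlib import Path
from typing import Callable, List


def process_directory(directory: str, process: Callable[[str], object]) -> List[str]:
    """Call `process` on every .pdf file in `directory`, in sorted order.

    Returns the list of paths that were passed to `process`.
    Hypothetical helper for illustration; not part of gptpdfreader.
    """
    processed = []
    for pdf_path in sorted(Path(directory).glob("*.pdf")):
        process(str(pdf_path))
        processed.append(str(pdf_path))
    return processed


# Intended use (assumes the package is installed and configured):
#   from gptpdfreader.reader import main
#   process_directory("papers", main)
```

Each PDF triggers GPT-4V requests and uploads to the bucket, so it may be worth trying a single file first.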

## Limitations 

some limitations

## Contributing

We welcome contributions! Please open an issue or submit a pull request on our GitHub repository.

## Support

For questions and support, please open an issue in the GitHub issue tracker.

## License

See the `LICENSE` file included with this package.