Metadata-Version: 2.1
Name: lemonpdf
Version: 2.0rc2
Summary: Python 3 library to extract URLs from PDF files.
Author: zudefoque
Author-email: Juan Bindez <juanbindez780@gmail.com>
License: MIT license
Project-URL: Homepage, https://github.com/juanbindez/lemonpdf
Project-URL: Bug Reports, https://github.com/juanbindez/lemonpdf/issues
Project-URL: Read the Docs, http://lemonpdf.readthedocs.io/
Keywords: PDF,Extractor,cli,tools
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python
Classifier: Topic :: Internet
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Terminals
Classifier: Topic :: Utilities
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Requires-Dist: packaging
Requires-Dist: pdf2image
Requires-Dist: pillow
Requires-Dist: PyMuPDF
Requires-Dist: PyMuPDFb
Requires-Dist: pytesseract

# lemonpdf

![PyPI - Downloads](https://img.shields.io/pypi/dm/lemonpdf)
![PyPI - License](https://img.shields.io/pypi/l/lemonpdf)
![GitHub Tag](https://img.shields.io/github/v/tag/JuanBindez/lemonpdf?include_prereleases)
<a href="https://pypi.org/project/lemonpdf/"><img src="https://img.shields.io/pypi/v/lemonpdf" /></a>

### A Python 3 library for extracting URLs and domains from PDF files.


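### Install

On Debian/Ubuntu, first install the system dependencies (the Tesseract OCR engine and the Poppler utilities), then install the package from PyPI:

    sudo apt install tesseract-ocr poppler-utils
    pip install lemonpdf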
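The `apt` command above targets Debian/Ubuntu. On other systems you need the same tools from your package manager. The optional sketch below is not part of lemonpdf: it checks whether the `tesseract` executable (from Tesseract OCR) and `pdftoppm` (from Poppler, used by `pdf2image`) are on your `PATH`. The helper name `missing_system_tools` is hypothetical.

```python
import shutil


def missing_system_tools(tools=("tesseract", "pdftoppm")):
    """Return the names of required executables that are not on PATH."""
    # shutil.which returns None when an executable cannot be found on PATH.
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_system_tools()
    if missing:
        print("Missing system tools:", ", ".join(missing))
    else:
        print("tesseract and pdftoppm were found on PATH.")
```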

### Quickstart


### Command-line interface (CLI)

#### Get URLs

    lemonpdf -u file.pdf

#### Save the URL list to a text file

    lemonpdf -u file.pdf -o urls.txt -s

#### Get domains

    lemonpdf -d file.pdf

#### Save the domain list to a text file

    lemonpdf -d file.pdf -o domains.txt -s
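To reuse a saved list from Python, the hypothetical helpers below read it back. They assume the file is UTF-8 text with one URL or domain per line, which this README does not confirm, so inspect a real output file before relying on them.

```python
from pathlib import Path


def parse_saved_list(text):
    """Split saved output into entries, assuming one entry per line."""
    # Strip surrounding whitespace, skip blank lines, and de-duplicate
    # while preserving first-seen order (dict keys keep insertion order).
    entries = (line.strip() for line in text.splitlines())
    return list(dict.fromkeys(entry for entry in entries if entry))


def read_saved_list(path):
    """Read a saved output file such as urls.txt or domains.txt."""
    return parse_saved_list(Path(path).read_text(encoding="utf-8"))
```

For example, call `read_saved_list('urls.txt')` after running the `-u ... -o urls.txt -s` command above.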

### Python scripts

#### Get URLs and save them to a text file

```python
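from lemonpdf import Extractor

pdf_path = 'file.pdf'             # PDF to scan for URLs
output_txt_path = 'out_file.txt'  # where the URL list is written

extractor = Extractor(pdf_path=pdf_path, output_txt_path=output_txt_path)

# save=True also writes the extracted URLs to output_txt_path
urls = extractor.extract_urls(save=True)

print(urls)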
```
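`extract_urls` does this work for you. If you are curious how URL extraction from plain text can work in general, here is a hypothetical regex-based sketch. It is not lemonpdf's implementation, and the pattern is deliberately simple: it only matches `http`/`https` URLs and ignores bare `www.` hosts.

```python
import re

# A deliberately simple pattern: an http(s) scheme followed by any run of
# characters that are not whitespace, quotes, brackets, or angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\"'()\[\]]+")


def find_urls(text):
    """Return http(s) URLs found in text, in order, without duplicates."""
    found = []
    for match in URL_PATTERN.finditer(text):
        # Trailing punctuation usually belongs to the sentence, not the URL.
        url = match.group(0).rstrip(".,;:!?")
        if url not in found:
            found.append(url)
    return found
```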

#### Get domains and save them to a text file

```python
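from lemonpdf import Extractor

pdf_path = 'file.pdf'            # PDF to scan for domains
output_txt_path = 'domains.txt'  # where the domain list is written

extractor = Extractor(pdf_path=pdf_path, output_txt_path=output_txt_path)

# save=True also writes the extracted domains to output_txt_path
domains = extractor.extract_domains(save=True)

print(domains)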
```
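`extract_domains` handles this for you, and this README does not define exactly what lemonpdf counts as a "domain". As a rough, hypothetical sketch, assuming a domain means the host part of each URL, the standard library's `urllib.parse.urlparse` can derive one from a list of URLs you already have:

```python
from urllib.parse import urlparse


def domains_from_urls(urls):
    """Return the unique host names in a list of URLs, in first-seen order."""
    domains = []
    for url in urls:
        # urlparse only fills in the host when the URL has a scheme such as
        # "https://"; .hostname is lowercased and excludes any port.
        host = urlparse(url).hostname
        if host and host not in domains:
            domains.append(host)
    return domains
```

This keeps subdomains as-is (`docs.python.org` is not reduced to `python.org`), and strings without a scheme are skipped.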
