Metadata-Version: 2.1
Name: pdfdata
Version: 0.1.3.2
Summary: Extracting text and data from PDFs
Home-page: https://github.com/petermeissner/pdfdata
Author: Peter Meissner
Author-email: retep.meissner@gmail.com
License: MIT
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: PyMuPDF

![Python 3.6, 3.7, 3.8, 3.9](https://github.com/petermeissner/pdfdata/workflows/Python%20package/badge.svg) [![Downloads Total](https://pepy.tech/badge/pdfdata)](https://pepy.tech/project/pdfdata) [![Downloads per Month](https://pepy.tech/badge/pdfdata/month)](https://pepy.tech/project/pdfdata)

# {pdfdata}

Python package for extracting text and data from PDFs. 

# Installation

```shell
pip install pdfdata
```

# Usage

```python
from pprint import pprint

from pdfdata import (
    parse_pdf,
    pdf_doc_extract_span_df,
    pdf_doc_extract_span_list,
    pdf_text_to_jsonnl,
)

# parse PDF as dictionary
pdf_parsed = parse_pdf('pdfs/0641-20.pdf')
res = pdf_doc_extract_span_list(pdf_parsed)

pprint(res, depth=3)

# parse PDF as list of spans
pdf_parsed = parse_pdf('pdfs/0641-20.pdf')
res = pdf_doc_extract_span_df(pdf_parsed)

pprint(res[0])

# transform PDF text to jsonnl
pdf_text_to_jsonnl('pdfs/0641-20.pdf', '0641-20.jsonnl')
```
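
To work with the exported file afterwards, it can be read back with the standard library. The following is a minimal sketch, assuming that the `.jsonnl` file holds one JSON document per line; that layout is inferred from the function name and not confirmed here. The helper `read_json_lines` is hypothetical and not part of `pdfdata`.

```python
import json
from pathlib import Path


def read_json_lines(path):
    """Parse a line-delimited JSON file into a list of Python objects.

    Assumption: every non-empty line is one complete JSON document.
    This helper is illustrative only and not part of pdfdata.
    """
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records
```

Calling `read_json_lines('0641-20.jsonnl')` should then give one Python object per line of the export, provided the file follows the assumed layout.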





# DevNotes

**build**

```shell
python -m build
```


**upload to TestPyPI**

```shell
python -m twine upload --repository testpypi dist/* --skip-existing
```


