Metadata-Version: 2.1
Name: pdfreader
Version: 0.1.14
Summary: Pythonic API for parsing PDF files
Home-page: http://github.com/maxpmaxp/pdfreader
Author: Maksym Polshcha
Author-email: maxp@sterch.net
Maintainer: Maksym Polshcha
Maintainer-email: maxp@sterch.net
License: MIT Licence
Keywords: pdf,pdfreader,pdfparser,adobe,parser,cmap
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: POSIX
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.4
Description-Content-Type: text/x-rst
License-File: LICENSE
Requires-Dist: bitarray >=1.1.0
Requires-Dist: pillow >=7.1.0
Requires-Dist: pycryptodome >=3.9.9
Requires-Dist: python-dateutil >=2.8.1

=========
pdfreader
=========
:Info: See `the tutorials & documentation <https://pdfreader.readthedocs.io>`_ for more information.
:Author & Maintainer: Maksym Polshcha <maxp@sterch.net>

See `GitHub <https://github.com/maxpmaxp/pdfreader>`_ for the latest source.

About
=====

*pdfreader* is a Pythonic API for:
    * extracting text, images and other data from PDF documents (plain or protected)
    * accessing different objects within PDF documents


*pdfreader* is **NOT** a tool (maybe one day it will become one!):
    * to create or update PDF files
    * to split PDF files into pages or other pieces
    * to convert PDFs to any other format

Nevertheless, it can be used as part of such tools.

See `Tutorials & Documentation <https://pdfreader.readthedocs.io>`_.

Features
========

* Extracts text (plain text and formatted text objects)
* Extracts PDF form data (plain strings and formatted text objects)
* Supports all PDF encodings, CMaps and predefined CMaps
* Extracts images and image masks as `Pillow/PIL Images <https://pillow.readthedocs.io/en/stable/reference/Image.html>`_
* Supports encrypted and password-protected PDF documents
* Lets you browse any document objects and resources and extract any data you need (fonts, annotations, metadata, multimedia, etc.)
* Follows the `PDF-1.7 specification <https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/pdfstandards/pdf/PDF32000_2008.pdf>`_
* Lazy object access makes it possible to process huge PDF documents quickly
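
The text-extraction feature above can be sketched roughly as follows. This is a
minimal illustration, not the canonical usage: ``read_page_strings`` and
``open_viewer`` are hypothetical helper names introduced here, and the sketch
assumes the ``SimplePDFViewer`` class with its ``navigate()``, ``render()`` and
``canvas.strings`` members as described in the project documentation.

.. code-block:: python

    def read_page_strings(viewer, page_number):
        """Return the plain-text strings found on one page (pages start at 1).

        ``viewer`` is assumed to behave like pdfreader's SimplePDFViewer.
        """
        viewer.navigate(page_number)  # move to the requested page
        viewer.render()               # interpret the page content stream
        return list(viewer.canvas.strings)


    def open_viewer(path):
        """Open a PDF file with pdfreader (hypothetical convenience wrapper).

        The import is local so that read_page_strings() can be tried out
        with any viewer-like object, even without pdfreader installed.
        """
        from pdfreader import SimplePDFViewer

        # Objects are read lazily, so the file must stay open while the
        # viewer is in use; the caller is responsible for closing it.
        fd = open(path, "rb")
        return SimplePDFViewer(fd)

With a real file, ``read_page_strings(open_viewer("example.pdf"), 1)`` should
return the strings of the first page; ``example.pdf`` is a placeholder path.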

Installation
============

*pdfreader* can be installed with `pip <http://pypi.python.org/pypi/pip>`_::

  $ python -m pip install pdfreader

Or ``easy_install`` from
`setuptools <http://pypi.python.org/pypi/setuptools>`_::

  $ python -m easy_install pdfreader

You can also download the project source and do::

  $ python setup.py install


Tutorial and Documentation
===========================

`Tutorial, real-life examples and documentation <https://pdfreader.readthedocs.io>`_


Support, Bugs & Feature Requests
================================

*pdfreader* uses `GitHub issues <https://github.com/maxpmaxp/pdfreader/issues>`_ to keep track of bugs,
feature requests, etc.


Related Projects
================

* `pdfminer <https://github.com/euske/pdfminer>`_
* `PyPDF2 <https://github.com/py-pdf/PyPDF2>`_
* `xpdf <http://www.foolabs.com/xpdf/>`_
* `pdfbox <http://pdfbox.apache.org/>`_
* `mupdf <http://mupdf.com/>`_


References
==========

* `Document management - Portable document format - PDF 1.7 <https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf>`_
* `Adobe CMap and CIDFont Files Specification <https://www.adobe.com/content/dam/acom/en/devnet/font/pdfs/5014.CIDFont_Spec.pdf>`_
* `PostScript Language Reference Manual <https://www-cdf.fnal.gov/offline/PostScript/PLRM2.pdf>`_
* `Adobe CMap resources <https://github.com/adobe-type-tools/cmap-resources>`_
* `Adobe glyph list specification (AGL) <https://github.com/adobe-type-tools/agl-specification>`_


Donation
========
If this project is helpful, you can treat me to coffee :-)

.. image:: https://www.paypalobjects.com/en_US/i/btn/btn_donateCC_LG.gif
   :target: https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=VMVFZSDHDFVK6&item_name=PDFReader+support&currency_code=USD&source=url
