Metadata-Version: 2.1
Name: pyDHTMLParser
Version: 2.2.3
Summary: Python HTML/XML parser for easy web scraping.
Home-page: https://github.com/Bystroushaak/pyDHTMLParser
Author: Bystroushaak
Author-email: bystrousak@kitakitsune.org
License: MIT
Platform: UNKNOWN
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Markup :: HTML
Classifier: Topic :: Text Processing :: Markup :: XML
Provides-Extra: docs
Requires-Dist: sphinx ; extra == 'docs'
Requires-Dist: sphinxcontrib-napoleon ; extra == 'docs'
Provides-Extra: test
Requires-Dist: pytest ; extra == 'test'
Requires-Dist: pytest-cov ; extra == 'test'


.. image:: https://badge.fury.io/py/pyDHTMLParser.png
    :target: https://pypi.python.org/pypi/pyDHTMLParser

.. image:: https://img.shields.io/pypi/dm/pyDHTMLParser.svg
    :target: https://pypi.python.org/pypi/pyDHTMLParser

.. image:: https://readthedocs.org/projects/pyDHTMLParser/badge/?version=latest
    :target: http://pyDHTMLParser.readthedocs.org/

.. image:: https://img.shields.io/github/issues/Bystroushaak/pyDHTMLParser.svg
    :target: https://github.com/Bystroushaak/pyDHTMLParser/issues

.. image:: https://img.shields.io/pypi/l/pyDHTMLParser.svg

What is it?
===========
DHTMLParser is a lightweight HTML/XML parser created for one purpose: quickly and easily
picking selected tags out of the DOM.

It is especially useful when you need to write your own "guerrilla" API for a web page, or a scraper.

If you want, you can also create HTML/XML documents more easily than by joining strings.

Documentation
=============

Full module documentation can be found here: http://pyDHTMLParser.rtfd.org

Changelog
=========

2.2.3
-----
    - 2020-04-12 Fix by #25 (thanks to https://github.com/fm4d).

2.2.2
-----
    - Attempt to fix strange recursive inheritance problem.

2.2.0
-----
    - Rewritten for compatibility with Python 3.

2.1.0 - 2.1.8
-------------
    - Fixed the state parser; it can now recover from invalid HTML like ``<invalid tag=something">``.
    - Rewritten to use ``StateEnum`` in parser for better readability.
    - Garbage collector is now disabled during ``_raw_split()``.
    - Fixed #16 - recovery after tags which don't end with ``>`` (``</code`` for example).
    - Closed #17 - ``<`` is now ignored when it is used as a less-than sign.
    - Restored support of multiline attributes.
    - ``.parseString()`` no longer tries to parse HTML element parameters.
    - Implemented ``first()`` getter.
    - License changed to MIT.
    - Fixed #18: bug which in some cases caused invalid output.
    - Added ``HTMLElement.__repr__()``.
    - Added ``test_coverage.sh``.
    - Added extended ``test_equality()`` coverage.
    - Formatting improvements.
    - Improved constructor handling, which is now much more readable.
    - Updated formatting of ``setup.py``.
    - Added more tests.
    - Fixed #22: bug in ``SpecialDict``.
    - Fixed some nasty Unicode problems.
    - Fixed Python 2 / 3 problem in ``docs/__init__.py``.
    - Renamed ``getVersion()`` to ``get_version()``.

2.0.10
------
    - Added more tests of ``removeTags()``.
    - ``run_tests.sh`` now accepts arguments.
    - Check for string in ``removeTags()`` changed from ``str`` to ``basestring``.

2.0.6 - 2.0.9
-------------
    - Fixed behaviour of ``toString()`` and ``tagToString()``.
    - ``SpecialDict`` is now derived from ``OrderedDict``.
    - Changed and added tests of ``.params`` attribute (``OrderedDict`` is now used).
    - Fixed bug in ``_repair_tags()``.
    - Removed ``_repair_tags()`` - it wasn't really necessary.
    - Fixed nasty bug which *could* cause invalid XML output.

2.0.1 - 2.0.5
-------------
    - Fixed bugs in ``.match()``.
    - Fixed broken links in documentation.
    - Fixed bugs in ``.isAlmostEqual()``.
    - Fixed bug in ``.find()`` which prevented ``tag_name`` from being ``None``.
    - Added ``.__eq__()`` operator to ``SpecialDict``.
    - Added new method ``.containsParamSubset()`` to ``HTMLElement``.

2.0.0
-----
    - Rewritten, refactored, and split into multiple files.
    - Added unittest coverage of almost 100% of the code.
    - Added better selector methods (``.wfind()``, ``.match()``).
    - Added Sphinx documentation.
    - Fixed a lot of bugs.


