Metadata-Version: 1.2
Name: speedparser3
Version: 0.3.1
Summary: Python3 version of speedparser https://github.com/jmoiron/speedparser
Home-page: https://github.com/nikegp/speedparser3
Author: Nike Gurin-Petrovych
Author-email: nike.gurin@gmail.com
License: MIT
Description: speedparser3
        ------------
        
        Speedparser3 is a Python 3.5+ port of the original Speedparser by jmoiron
        (https://github.com/jmoiron/speedparser).
        
        speedparser
        -----------
        
        Speedparser is a black-box "style" reimplementation of the `Universal Feed
        Parser <http://code.google.com/p/feedparser/>`_.  It uses some feedparser code
        for date and authors, but mostly re-implements its data normalization algorithms
        based on feedparser output.  It uses ``lxml`` for feed parsing and for optional
        HTML cleaning.  Its compatibility with ``feedparser`` is very good for a strict
        subset of fields, but poor for fields outside that subset.  See
        ``tests/speedparsertests.py`` for more information on which fields are more or
        less compatible and which are not.
        
        On an Intel(R) Core(TM) i5 750, running on only one core, ``feedparser`` managed
        ``2.5 feeds/sec`` on the test feed set (roughly 4200 "feeds" in
        ``tests/feeds.tar.bz2``), while ``speedparser`` managed around ``65 feeds/sec``
        with HTML cleaning on and ``200 feeds/sec`` with cleaning off.
        
        installing
        ----------
        
        ``pip3 install speedparser3``
        
        usage
        -----
        
        Usage is similar to feedparser::
        
            >>> import speedparser3
            >>> result = speedparser3.parse(feed)
            >>> result = speedparser3.parse(feed, clean_html=False)
        
        differences
        -----------
        
        There are a few interface differences and many result differences between
        speedparser3 and feedparser.  The main similarities are that both return
        a ``FeedParserDict()`` object (with keys accessible as attributes), both
        set the ``bozo`` key when an error is encountered, and various aspects of the
        ``feed`` and ``entries`` keys are likely to be identical *or* very similar.
        
        ``speedparser3`` uses different data cleaning algorithms than ``feedparser``
        (in some cases less cleaning, or none at all; buyer beware).  When cleaning is
        enabled, lxml's ``html.clean`` module is used to clean HTML and gives similar
        but not identical protection against various attributes and elements.  If you
        supply your own ``Cleaner`` instance via the ``clean_html`` kwarg, it will be
        used by ``speedparser3`` to clean the various attributes of the feed and entries.
        
        ``speedparser3`` does not attempt to fix character encoding by default because
        this processing can take a long time for large feeds.  If the encoding declared
        by the feed is wrong, or if you want this extra level of error tolerance, you
        can either use the ``chardet`` module to detect the encoding based on the
        document or pass ``encoding=True`` to ``speedparser3.parse``, which then falls
        back to encoding detection if it encounters encoding errors.
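        
        As a minimal sketch of the first option, the helper below asks ``chardet``
        for its best guess at the encoding of the raw feed bytes.  The name
        ``guess_encoding`` and the ``utf-8`` default are assumptions made for
        illustration and are not part of ``speedparser3``; what you then do with
        the detected name is left open here::
        
            def guess_encoding(raw, default="utf-8"):
                """Return chardet's best guess for the encoding of ``raw`` (bytes)."""
                try:
                    import chardet  # third-party; may not be installed
                except ImportError:
                    # Without chardet there is nothing to detect with.
                    return default
                # chardet.detect returns a dict; its "encoding" value can be None.
                return chardet.detect(raw).get("encoding") or default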
        
        If your application is using ``feedparser`` to consume many feeds at once and
        CPU is becoming a bottleneck, you might want to try out ``speedparser3`` as an
        alternative (using ``feedparser`` as a backup).  If you are writing an
        application that does not ingest many feeds, or where CPU is not a problem,
        you should use ``feedparser`` as it is flexible with bad or malformed data and
        has a much better test suite.
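        
        The "``feedparser`` as a backup" arrangement might look like the sketch
        below.  ``parse_with_fallback`` is a hypothetical helper, not part of either
        library; it takes the two parse functions as arguments and treats an
        exception or a set ``bozo`` key as the signal to retry, which is only one
        possible policy::
        
            def parse_with_fallback(document, fast_parse, slow_parse):
                """Try fast_parse first; fall back to slow_parse on failure."""
                try:
                    result = fast_parse(document)
                except Exception:
                    # The fast parser raised outright; let the tolerant one try.
                    return slow_parse(document)
                # Both parsers set the ``bozo`` key when they hit an error.
                if result.get("bozo"):
                    return slow_parse(document)
                return result
        
            # Intended use (assumes both packages are installed):
            #   import feedparser, speedparser3
            #   result = parse_with_fallback(data, speedparser3.parse, feedparser.parse)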
        
        
        
Keywords: feedparser speedparser rss atom rdf lxml python3
Platform: UNKNOWN
Classifier: License :: OSI Approved :: MIT License
Classifier: Intended Audience :: Developers
Classifier: Operating System :: POSIX
Classifier: Development Status :: 4 - Beta
Requires-Python: >=3.5
