Metadata-Version: 2.0
Name: requests-html
Version: 0.1.1
Summary: HTML Parsing for Humans.
Home-page: https://github.com/requests/requests
Author: Kenneth Reitz
Author-email: me@kennethreitz.org
License: MIT
Description-Content-Type: UNKNOWN
Platform: UNKNOWN
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Requires-Dist: requests
Requires-Dist: pyquery
Requires-Dist: html2text
Requires-Dist: fake-useragent
Requires-Dist: parse
Requires-Dist: bs4


Requests-HTML: HTML Parsing for Humans™
=======================================

This library intends to make parsing HTML (e.g. scraping the web) as
simple and intuitive as possible.

When using this library you automatically get:

- CSS Selectors (a.k.a. jQuery-style, thanks to PyQuery).
- XPath Selectors, for the faint of heart.
- Mocked user-agent (like a real web browser).
- Automatic following of redirects.
- Connection pooling and cookie persistence.
- The Requests experience you know and love, with magic parsing abilities.

Other nice features include:

- Markdown export of pages and elements.


Usage
=====

Make a GET request to 'python.org', using Requests:

.. code-block:: pycon

    >>> from requests_html import session
    >>> r = session.get('https://python.org/')

Grab a list of all links on the page, as-is (anchors excluded):

.. code-block:: pycon

    >>> r.html.links
    {'/users/membership/', '/about/gettingstarted/', 'http://feedproxy.google.com/~r/PythonInsider/~3/zVC80sq9s00/python-364-is-now-available.html', '/about/success/', 'http://flask.pocoo.org/', 'http://www.djangoproject.com/', '/blogs/', ... '/psf-landing/', 'https://wiki.python.org/moin/PythonBooks'}
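
As a rough illustration of what "as-is, anchors excluded" can mean, the
following standard-library sketch collects ``href`` values from a small,
made-up HTML snippet and skips those that start with ``#``. The
``LinkCollector`` class, the sample markup, and the reading of "anchor" as
"href beginning with ``#``" are all assumptions made for this example; it is
not a description of how Requests-HTML is implemented.

.. code-block:: python

    from html.parser import HTMLParser


    class LinkCollector(HTMLParser):
        """Hypothetical helper: gather the href values of <a> tags."""

        def __init__(self):
            super().__init__()
            self.links = set()

        def handle_starttag(self, tag, attrs):
            if tag != 'a':
                return
            href = dict(attrs).get('href')
            # Assumption: an "anchor" is an href that starts with '#'.
            if href and not href.startswith('#'):
                self.links.add(href)


    # Made-up markup, used only for illustration.
    sample = (
        '<a href="/about/">About</a> '
        '<a href="#top">Top</a> '
        '<a href="http://flask.pocoo.org/">Flask</a>'
    )

    collector = LinkCollector()
    collector.feed(sample)
    print(sorted(collector.links))

Links are kept exactly as written in the markup, so relative paths such as
``/about/`` stay relative.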

Grab a list of all links on the page, in absolute form (anchors excluded):

.. code-block:: pycon

    >>> r.html.absolute_links
    {'http://feedproxy.google.com/~r/PythonInsider/~3/zVC80sq9s00/python-364-is-now-available.html', 'https://www.python.org/downloads/mac-osx/', 'http://flask.pocoo.org/', 'https://www.python.org//docs.python.org/3/tutorial/', 'http://www.djangoproject.com/', 'https://wiki.python.org/moin/BeginnersGuide', 'https://www.python.org//docs.python.org/3/tutorial/controlflow.html#defining-functions', 'https://www.python.org/about/success/', 'http://twitter.com/ThePSF', 'https://www.python.org/events/python-user-group/634/', ..., 'https://wiki.python.org/moin/PythonBooks'}
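
One common way to turn relative links into absolute ones is
``urllib.parse.urljoin`` from the standard library. The sketch below is only
an illustration under that assumption: the ``base`` URL and the sample links
are chosen for the example, and the library may resolve links differently.

.. code-block:: python

    from urllib.parse import urljoin

    # Illustrative inputs; not taken from a live response.
    base = 'https://www.python.org/'
    links = {
        '/about/success/',                # site-relative path
        '//docs.python.org/3/tutorial/',  # protocol-relative URL
        'http://flask.pocoo.org/',        # already absolute
    }

    # Resolve every link against the base URL.
    absolute = {urljoin(base, link) for link in links}

    for url in sorted(absolute):
        print(url)

Note that ``urljoin`` resolves a protocol-relative link such as
``//docs.python.org/3/tutorial/`` to ``https://docs.python.org/3/tutorial/``,
which differs from the ``https://www.python.org//docs.python.org/...`` entries
in the sample output above.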

Select an element with a CSS (jQuery-style) selector:

.. code-block:: pycon

    >>> about = r.html.find('#about', first=True)

Grab an element's text contents:

.. code-block:: pycon

    >>> print(about.text)
    About
    Applications
    Quotes
    Getting Started
    Help
    Python Brochure

Introspect an Element's attributes:

.. code-block:: pycon

    >>> about.attrs
    {'id': 'about', 'class': 'tier-1 element-1  ', 'aria-haspopup': 'true'}

Select Elements within Elements:

.. code-block:: pycon

    >>> about.find('a')
    [<Element 'a' href='/about/' title='' class=''>, <Element 'a' href='/about/apps/' title=''>, <Element 'a' href='/about/quotes/' title=''>, <Element 'a' href='/about/gettingstarted/' title=''>, <Element 'a' href='/about/help/' title=''>, <Element 'a' href='http://brochure.getpython.info/' title=''>]

Render an Element as Markdown:

.. code-block:: pycon

    >>> print(about.markdown)

    * [About](/about/)

      * [Applications](/about/apps/)
      * [Quotes](/about/quotes/)
      * [Getting Started](/about/gettingstarted/)
      * [Help](/about/help/)
      * [Python Brochure](http://brochure.getpython.info/)

Search for text on the page:

.. code-block:: pycon

    >>> r.html.search('Python is a {} language')[0]
    programming
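
The idea behind this kind of template search can be approximated with the
standard library's ``re`` module. The ``search_template`` helper below is
hypothetical and written only for this sketch; it swaps in a regular
expression for whatever the library actually uses (``parse`` is listed among
its dependencies) and handles only bare ``{}`` placeholders that are followed
by literal text.

.. code-block:: python

    import re


    def search_template(template, text):
        """Hypothetical helper: return the fields captured by the first match."""
        # Escape the literal pieces and join them with non-greedy capture
        # groups, so each '{}' in the template becomes one captured field.
        literal_parts = [re.escape(part) for part in template.split('{}')]
        pattern = '(.+?)'.join(literal_parts)
        match = re.search(pattern, text)
        return match.groups() if match else None


    # Sample sentence used only for illustration.
    text = 'Python is a programming language that lets you work quickly.'
    result = search_template('Python is a {} language', text)
    print(result[0])

When the template does not occur in the text, the helper returns ``None``, so
callers should check the result before indexing into it.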

More complex CSS Selector example (copied from Chrome dev tools):

.. code-block:: pycon

    >>> r = session.get('https://github.com/')
    >>> sel = 'body > div.application-main > div.jumbotron.jumbotron-codelines > div > div > div.col-md-7.text-center.text-md-left > p'

    >>> print(r.html.find(sel)[0].text)
    GitHub is a development platform inspired by the way you work. From open source to business, you can host and review code, manage projects, and build software alongside millions of other developers.

XPath is also supported:

.. code-block:: pycon

    >>> r.html.xpath('a')
    [<Element 'a' class='btn' href='https://help.github.com/articles/supported-browsers'>]
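
For readers new to XPath, the difference between a bare step such as ``a``
and a descendant search such as ``.//a`` can be seen with the standard
library's ``xml.etree.ElementTree``, which supports a limited XPath subset.
The snippet below is made-up, well-formed markup used purely for
illustration, and this sketch does not go through Requests-HTML at all.

.. code-block:: python

    import xml.etree.ElementTree as ET

    # Made-up, well-formed markup: one <a> directly under the root and one
    # nested inside a <p>.
    snippet = (
        "<div>"
        "<a class='btn' href='https://help.github.com/articles/supported-browsers'>Browsers</a>"
        "<p><a href='/about'>About</a></p>"
        "</div>"
    )
    root = ET.fromstring(snippet)

    # 'a' matches only <a> elements that are direct children of the context node.
    direct = root.findall('a')

    # './/a' matches <a> elements at any depth below the context node.
    anywhere = root.findall('.//a')

    print([el.attrib['href'] for el in direct])
    print([el.attrib['href'] for el in anywhere])

The first query finds only the top-level link, while the second finds both.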

Installation
============

.. code-block:: shell

    $ pipenv install requests-html
    ✨🍰✨



