Metadata-Version: 2.0
Name: scrapelib
Version: 1.2.0
Summary: a library for scraping things
Home-page: http://github.com/jamesturk/scrapelib
Author: James Turk
Author-email: dev@jamesturk.net
License: BSD
Platform: any
Classifier: Development Status :: 6 - Mature
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Provides-Extra: dev
Requires-Dist: requests[security] (>=2)
Requires-Dist: coveralls; extra == 'dev'
Requires-Dist: flake8; extra == 'dev'
Requires-Dist: mock; extra == 'dev'
Requires-Dist: pytest; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: sphinx; extra == 'dev'
Requires-Dist: sphinx-rtd-theme; extra == 'dev'

=========
scrapelib
=========

.. image:: https://travis-ci.org/jamesturk/scrapelib.svg?branch=master
    :target: https://travis-ci.org/jamesturk/scrapelib

.. image:: https://coveralls.io/repos/jamesturk/scrapelib/badge.png?branch=master
    :target: https://coveralls.io/r/jamesturk/scrapelib

.. image:: https://img.shields.io/pypi/v/scrapelib.svg
    :target: https://pypi.python.org/pypi/scrapelib

.. image:: https://readthedocs.org/projects/scrapelib/badge/?version=latest
    :target: https://readthedocs.org/projects/scrapelib/?badge=latest
    :alt: Documentation Status

scrapelib is a library for making requests to less-than-reliable websites. It is implemented
(as of 0.7) as a wrapper around `requests <http://python-requests.org>`_.

scrapelib originated as part of the `Open States <http://openstates.org/>`_
project to scrape the websites of all 50 state legislatures. As a result, it
was designed with features that are useful when dealing with sites that
have intermittent errors or require rate-limiting.

Advantages of using scrapelib over alternatives like httplib2 or simply using
requests as-is:

* All of the power of the superb `requests <http://python-requests.org>`_ library
* HTTP, HTTPS, and FTP requests via an identical API
* Support for simple caching with pluggable cache backends
* Request throttling
* Configurable retries for non-permanent site failures

Written by James Turk <dev@jamesturk.net>, with thanks to Michael Stephens for
the initial urllib2/httplib2 version.

See https://github.com/jamesturk/scrapelib/graphs/contributors for contributors.

Requirements
============

* Python 2.7 or >= 3.3
* requests >= 2.0 (earlier versions may work but aren't tested)


Example Usage
=============

Documentation: http://scrapelib.readthedocs.org/en/latest/

::

  import scrapelib
  s = scrapelib.Scraper(requests_per_minute=10)

  # Grab Google front page
  s.get('http://google.com')

  # Will be throttled to 10 HTTP requests per minute
  while True:
      s.get('http://example.com')
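
To make the throttling in the example above concrete: a limit of 10 requests
per minute works out to at least 6 seconds between consecutive requests. The
following is a minimal standard-library sketch of that idea only. It is not
scrapelib's actual implementation, and the ``Throttle`` class, its ``wait``
method, and the injectable ``clock`` and ``sleep`` arguments are hypothetical
names introduced for illustration.

::

  import time


  class Throttle(object):
      """Hypothetical helper (not part of scrapelib): spaces calls so that
      they are at least 60 / requests_per_minute seconds apart."""

      def __init__(self, requests_per_minute, clock=time.time, sleep=time.sleep):
          # Minimum number of seconds allowed between two requests
          self.interval = 60.0 / requests_per_minute
          self._clock = clock
          self._sleep = sleep
          self._last = None  # time of the previous request, if any

      def wait(self):
          """Block until the next request is allowed; return seconds slept."""
          delay = 0.0
          if self._last is not None:
              elapsed = self._clock() - self._last
              delay = max(0.0, self.interval - elapsed)
              if delay > 0:
                  self._sleep(delay)
          self._last = self._clock()
          return delay

Calling ``wait()`` before each request reproduces the pacing described above:
the first call returns immediately, and later calls sleep only for whatever
part of the interval has not yet elapsed. Passing in ``clock`` and ``sleep``
keeps the sketch easy to exercise without real delays.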


