Metadata-Version: 2.1
Name: crawlib
Version: 0.1.0
Summary: tool set for crawler project.
Home-page: https://github.com/MacHu-GWU/
Author: Sanhe Hu
Author-email: husanhe@gmail.com
Maintainer: Unknown
License: MIT
Download-URL: https://pypi.python.org/pypi/crawlib/0.1.0#downloads
Platform: Windows
Platform: MacOS
Platform: Unix
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: MacOS
Classifier: Operating System :: Unix
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Requires-Dist: requests
Requires-Dist: requests-html
Requires-Dist: beautifulsoup4
Requires-Dist: mongoengine-mate (==0.0.5)
Requires-Dist: pymongo-mate (==0.0.4)
Requires-Dist: rolex (==0.0.8)
Requires-Dist: constant2 (==0.0.13)
Requires-Dist: attrs-mate (==0.0.5)
Requires-Dist: diskcache (==4.1.0)
Requires-Dist: atomicwrites (==1.3.0)
Requires-Dist: loggerFactory (==0.0.5)
Provides-Extra: docs
Requires-Dist: sphinx (==1.8.1) ; extra == 'docs'
Requires-Dist: sphinx-rtd-theme ; extra == 'docs'
Requires-Dist: sphinx-jinja ; extra == 'docs'
Requires-Dist: sphinx-copybutton ; extra == 'docs'
Requires-Dist: docfly (>=0.0.17) ; extra == 'docs'
Requires-Dist: rstobj (>=0.0.5) ; extra == 'docs'
Requires-Dist: pygments ; extra == 'docs'
Provides-Extra: tests
Requires-Dist: pytest (==3.2.3) ; extra == 'tests'
Requires-Dist: pytest-cov (==2.5.1) ; extra == 'tests'
Requires-Dist: pathlib-mate ; extra == 'tests'
Requires-Dist: flask ; extra == 'tests'
Requires-Dist: scrapy ; extra == 'tests'


.. image:: https://readthedocs.org/projects/crawlib/badge/?version=latest
    :target: https://crawlib.readthedocs.io/index.html
    :alt: Documentation Status

.. image:: https://travis-ci.org/MacHu-GWU/crawlib-project.svg?branch=master
    :target: https://travis-ci.org/MacHu-GWU/crawlib-project?branch=master

.. image:: https://codecov.io/gh/MacHu-GWU/crawlib-project/branch/master/graph/badge.svg
  :target: https://codecov.io/gh/MacHu-GWU/crawlib-project

.. image:: https://img.shields.io/pypi/v/crawlib.svg
    :target: https://pypi.python.org/pypi/crawlib

.. image:: https://img.shields.io/pypi/l/crawlib.svg
    :target: https://pypi.python.org/pypi/crawlib

.. image:: https://img.shields.io/pypi/pyversions/crawlib.svg
    :target: https://pypi.python.org/pypi/crawlib

.. image:: https://img.shields.io/badge/STAR_Me_on_GitHub!--None.svg?style=social
    :target: https://github.com/MacHu-GWU/crawlib-project

------


.. image:: https://img.shields.io/badge/Link-Document-blue.svg
      :target: https://crawlib.readthedocs.io/index.html

.. image:: https://img.shields.io/badge/Link-API-blue.svg
      :target: https://crawlib.readthedocs.io/py-modindex.html

.. image:: https://img.shields.io/badge/Link-Source_Code-blue.svg
      :target: https://crawlib.readthedocs.io/py-modindex.html

.. image:: https://img.shields.io/badge/Link-Install-blue.svg
      :target: `install`_

.. image:: https://img.shields.io/badge/Link-GitHub-blue.svg
      :target: https://github.com/MacHu-GWU/crawlib-project

.. image:: https://img.shields.io/badge/Link-Submit_Issue-blue.svg
      :target: https://github.com/MacHu-GWU/crawlib-project/issues

.. image:: https://img.shields.io/badge/Link-Request_Feature-blue.svg
      :target: https://github.com/MacHu-GWU/crawlib-project/issues

.. image:: https://img.shields.io/badge/Link-Download-blue.svg
      :target: https://pypi.org/pypi/crawlib#files


Welcome to ``crawlib`` Documentation
==============================================================================

``crawlib`` is a breadth-first-search crawler framework for targeted crawling, that is, for cases where you already know where your data is located and how it is organized. You only need to focus on the data model and the HTML extraction logic; the framework takes care of the rest, such as:

- duplicate filter
- recursive crawling
- status tracking
- periodic updates

Currently, MongoDB is the only supported storage backend.
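
To make the ideas above concrete, here is a minimal, framework-independent sketch of a breadth-first crawl with a duplicate filter and simple status tracking. It does **not** use the ``crawlib`` API: the names ``bfs_crawl``, ``get_links``, ``SITE`` and ``fake_get_links`` are hypothetical, and an in-memory dictionary stands in for real HTTP requests and for the MongoDB backend.

.. code-block:: python

    from collections import deque


    def bfs_crawl(start_url, get_links, max_pages=100):
        """Visit pages breadth-first, skipping URLs that were already seen.

        ``get_links`` is a caller-supplied function (an assumption of this
        sketch) that returns the list of URLs found on a page.
        """
        seen = {start_url}          # duplicate filter
        queue = deque([start_url])  # FIFO queue gives breadth-first order
        status = {}                 # status tracking: url -> "ok" or "failed"
        order = []                  # order in which pages were visited

        while queue and len(order) < max_pages:
            url = queue.popleft()
            try:
                links = get_links(url)
                status[url] = "ok"
            except Exception:
                # Record the failure and keep crawling the rest of the queue.
                links = []
                status[url] = "failed"
            order.append(url)

            # Recursive crawling: enqueue every link not seen before.
            for link in links:
                if link not in seen:
                    seen.add(link)
                    queue.append(link)

        return order, status


    # A toy "website": each URL maps to the links found on that page.
    SITE = {
        "/": ["/a", "/b"],
        "/a": ["/a1", "/"],
        "/b": ["/a1", "/b1"],
        "/a1": [],
        "/b1": [],
    }


    def fake_get_links(url):
        return SITE[url]


    order, status = bfs_crawl("/", fake_get_links)
    print(order)

Pages are visited level by level, and ``/a1`` is fetched only once even though two pages link to it. A real crawler would persist ``seen`` and ``status`` in a database so that the crawl can be resumed and periodically refreshed.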


.. _install:

Install
------------------------------------------------------------------------------

``crawlib`` is released on PyPI, so all you need is:

.. code-block:: console

    $ pip install crawlib

To upgrade to the latest version:

.. code-block:: console

    $ pip install --upgrade crawlib

