Metadata-Version: 2.0
Name: webstruct
Version: 0.3
Summary: A library for creating statistical NER systems that work on HTML data
Home-page: https://github.com/scrapinghub/webstruct
Author: Mikhail Korobov, Terry Peng
Author-email: kmike84@gmail.com, pengtaoo@gmail.com
License: MIT
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Text Processing :: Linguistic
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Requires: sklearn
Requires: lxml

Webstruct
=========

.. image:: https://travis-ci.org/scrapinghub/webstruct.svg?branch=master
    :target: https://travis-ci.org/scrapinghub/webstruct

Webstruct is a library for creating statistical NER_ systems that work
on HTML data, i.e. a library for building tools that extract named
entities (addresses, organization names, open hours, etc) from webpages.

Unlike most NER systems, webstruct works on HTML data, not only
on text data. This allows to define features that use HTML structure,
and also to embed annotation results back into HTML.

Read the docs_ for more info.

License is MIT.

.. _docs: http://webstruct.readthedocs.org/en/latest/
.. _NER: http://en.wikipedia.org/wiki/Named-entity_recognition

Contributing
------------

* Source code: https://github.com/scrapinghub/webstruct
* Bug tracker: https://github.com/scrapinghub/webstruct/issues

To run tests, make sure nose_ is installed, then run ``runtests.sh`` script.

.. _nose: https://github.com/nose-devs/nose


