Metadata-Version: 2.1
Name: data_extractor
Version: 0.10.2
Summary: Combine XPath, CSS Selectors and JSONPath for Web data extracting.
License: MIT
Keywords: data-extractor,data-extraction,xpath,css-selectors,jsonpath
Author-email: 林玮 (Jade Lin) <linw1995@icloud.com>
Requires-Python: >=3.7
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Dist: typing-extensions==3.7.4.3
Provides-Extra: cssselect
Requires-Dist: lxml<5,>=4.3; extra == "cssselect"
Requires-Dist: cssselect<2,>=1.0.3; extra == "cssselect"
Provides-Extra: jsonpath-extractor
Requires-Dist: jsonpath-extractor<0.9,>=0.5; extra == "jsonpath-extractor"
Provides-Extra: jsonpath-rw
Requires-Dist: jsonpath-rw<2,>=1.4; extra == "jsonpath-rw"
Provides-Extra: jsonpath-rw-ext
Requires-Dist: jsonpath-rw<2,>=1.4; extra == "jsonpath-rw-ext"
Requires-Dist: jsonpath-rw-ext<2,>=1.2; extra == "jsonpath-rw-ext"
Provides-Extra: lxml
Requires-Dist: lxml<5,>=4.3; extra == "lxml"
Project-URL: documentation, https://data-extractor.readthedocs.io/en/latest/
Project-URL: homepage, https://github.com/linw1995/data_extractor
Project-URL: repository, https://github.com/linw1995/data_extractor
Description-Content-Type: text/x-rst

==============
Data Extractor
==============

|license| |Pypi Status| |Python version| |Package version| |PyPI - Downloads|
|GitHub last commit| |Code style: black| |Build Status| |codecov|
|Documentation Status| |PDM managed|

Combine **XPath**, **CSS Selectors** and **JSONPath** for Web data extracting.

Quickstarts
<<<<<<<<<<<

Installation
~~~~~~~~~~~~

Install the stable version from PYPI.

.. code-block:: shell

    pip install "data-extractor[jsonpath-extractor]"  # for extracting JSON data
    pip install "data-extractor[lxml]"  # for extracting HTML data

Or install the latest version from Github.

.. code-block:: shell

    pip install "data-extractor[jsonpath-extractor] @ git+https://github.com/linw1995/data_extractor.git@master"

Extract JSON data
~~~~~~~~~~~~~~~~~

Currently supports to extract JSON data with below optional dependencies

- jsonpath-extractor_
- jsonpath-rw_
- jsonpath-rw-ext_

.. _jsonpath-extractor: https://github.com/linw1995/jsonpath
.. _jsonpath-rw: https://github.com/kennknowles/python-jsonpath-rw
.. _jsonpath-rw-ext: https://python-jsonpath-rw-ext.readthedocs.org/en/latest/

install one dependency of them to extract JSON data.

Extract HTML(XML) data
~~~~~~~~~~~~~~~~~~~~~~

Currently supports to extract HTML(XML) data with below optional dependencies

- lxml_ for using XPath_
- cssselect_ for using CSS-Selectors_

.. _lxml: https://lxml.de/
.. _XPath: https://www.w3.org/TR/xpath-10/
.. _cssselect: https://cssselect.readthedocs.io/en/latest/
.. _CSS-Selectors: https://www.w3.org/TR/selectors-3/

Usage
~~~~~

.. code-block:: python3

    from data_extractor import Field, Item, JSONExtractor


    class Count(Item):
        followings = Field(JSONExtractor("countFollowings"))
        fans = Field(JSONExtractor("countFans"))


    class User(Item):
        name_ = Field(JSONExtractor("name"), name="name")
        age = Field(JSONExtractor("age"), default=17)
        count = Count()


    assert User(JSONExtractor("data.users[*]"), is_many=True).extract(
        {
            "data": {
                "users": [
                    {
                        "name": "john",
                        "age": 19,
                        "countFollowings": 14,
                        "countFans": 212,
                    },
                    {
                        "name": "jack",
                        "description": "",
                        "countFollowings": 54,
                        "countFans": 312,
                    },
                ]
            }
        }
    ) == [
        {"name": "john", "age": 19, "count": {"followings": 14, "fans": 212}},
        {"name": "jack", "age": 17, "count": {"followings": 54, "fans": 312}},
    ]

Changelog
<<<<<<<<<

v0.10.2
~~~~~~~

**Build**

- upgrade jsonpath-extractor to v0.8.0

v0.10.1
~~~~~~~

**Fix**

- typo in .utils.Property

v0.10.0
~~~~~~~~~~

**Feature**

- supports PEP 561 -- Distributing and Packaging Type Information

**Fix**

- remove LICENSE file from dist files
- duplicated extracting if class attrs overlap happened #67
- remove super class sub-extractors error #68

**Refactor**

- remove duplciated module "data_extractor.abc"
- remove the lazy build mechanism of extractors
- JSON backend invoking mechanism
- make all properties of extractors immutable

**Document**

- fix wrong docstring of "data_extractor.utils.Property"


Contributing
<<<<<<<<<<<<


Environment Setup
~~~~~~~~~~~~~~~~~

Clone the source codes from Github.

.. code-block:: shell

    git clone https://github.com/linw1995/data_extractor.git
    cd data_extractor

Setup the development environment.
Please make sure you install the pdm_,
pre-commit_ and nox_ CLIs in your environment.

.. code-block:: shell

    make init
    make PYTHON=3.7 init  # for specific python version

Linting
~~~~~~~

Use pre-commit_ for installing linters to ensure a good code style.

.. code-block:: shell

    make pre-commit

Run linters. Some linters run via CLI nox_, so make sure you install it.

.. code-block:: shell

    make check-all

Testing
~~~~~~~

Run quick tests.

.. code-block:: shell

    make

Run quick tests with verbose.

.. code-block:: shell

    make vtest

Run tests with coverage.
Testing in multiple Python environments is powered by CLI nox_.

.. code-block:: shell

    make cov

.. _pdm: https://github.com/pdm-project/pdm
.. _pre-commit: https://pre-commit.com/
.. _nox: https://nox.thea.codes/en/stable/

.. |license| image:: https://img.shields.io/github/license/linw1995/data_extractor.svg
    :target: https://github.com/linw1995/data_extractor/blob/master/LICENSE

.. |Pypi Status| image:: https://img.shields.io/pypi/status/data_extractor.svg
    :target: https://pypi.org/project/data_extractor

.. |Python version| image:: https://img.shields.io/pypi/pyversions/data_extractor.svg
    :target: https://pypi.org/project/data_extractor

.. |Package version| image:: https://img.shields.io/pypi/v/data_extractor.svg
    :target: https://pypi.org/project/data_extractor

.. |PyPI - Downloads| image:: https://img.shields.io/pypi/dm/data-extractor.svg
    :target: https://pypi.org/project/data_extractor

.. |GitHub last commit| image:: https://img.shields.io/github/last-commit/linw1995/data_extractor.svg
    :target: https://github.com/linw1995/data_extractor

.. |Code style: black| image:: https://img.shields.io/badge/code%20style-black-000000.svg
    :target: https://github.com/ambv/black

.. |Build Status| image:: https://github.com/linw1995/data_extractor/workflows/Lint&Test/badge.svg
    :target: https://github.com/linw1995/data_extractor/actions?query=workflow%3ALint%26Test

.. |codecov| image:: https://codecov.io/gh/linw1995/data_extractor/branch/master/graph/badge.svg
    :target: https://codecov.io/gh/linw1995/data_extractor

.. |Documentation Status| image:: https://readthedocs.org/projects/data-extractor/badge/?version=latest
    :target: https://data-extractor.readthedocs.io/en/latest/?badge=latest

.. |PDM managed| image:: https://img.shields.io/badge/pdm-managed-blueviolet
    :target: https://pdm.fming.dev

