Metadata-Version: 2.1
Name: wikidatasets
Version: 0.2.0
Summary: Break WikiData dumps into smaller knowledge graphs
Home-page: https://github.com/armand33/wikidatasets
Author: Armand Boschin
Author-email: aboschin@enst.fr
License: BSD license
Keywords: wikidatasets
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.7
Requires-Dist: tqdm (==4.32.2)
Requires-Dist: sparqlwrapper (==1.8.2)
Requires-Dist: pandas (==0.24.1)

============
WikiDataSets
============


.. image:: https://img.shields.io/pypi/v/wikidatasets.svg
        :target: https://pypi.python.org/pypi/wikidatasets

.. image:: https://img.shields.io/travis/armand33/wikidatasets.svg
        :target: https://travis-ci.org/armand33/wikidatasets

.. image:: https://readthedocs.org/projects/wikidatasets/badge/?version=latest
        :target: https://wikidatasets.readthedocs.io/en/latest/?badge=latest
        :alt: Documentation Status


.. image:: https://pyup.io/repos/github/armand33/wikidatasets/shield.svg
     :target: https://pyup.io/repos/github/armand33/wikidatasets/
     :alt: Updates



WikiDataSets breaks WikiData dumps into smaller knowledge graphs (e.g. the graph of human entities).


* Free software: BSD license
* Documentation: https://wikidatasets.readthedocs.io.
* Paper: https://arxiv.org/abs/1906.04536

Data Sets
---------
Data sets are available on this `page <https://graphs.telecom-paristech.fr/Home_page.html#wikidatasets-section>`_.

Features
--------

This is a non-exhaustive list of useful functions:

* ``wikidatasets.processFunctions.get_subclasses``: Gets a list of WikiData IDs of entities which are subclasses of the subject.
* ``wikidatasets.processFunctions.query_wikidata_dump``: Goes through a WikiData dump and collects the entities that are instances of ``test_entities``, the dictionary of labels, or both.
* ``wikidatasets.processFunctions.build_dataset``: Builds datasets from the pickle files produced by ``query_wikidata_dump``.
* ``wikidatasets.utils.load_data_labels``: Loads the edges and attributes files into Pandas dataframes and merges in the labels of entities and relations.
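
To give an idea of what collecting "entities that are instances of ``test_entities``" involves, here is a minimal, standalone sketch. It is not the package's implementation: the helper names ``instance_of_ids`` and ``filter_dump_lines`` are hypothetical, and the toy input only imitates the layout of a WikiData JSON dump (a JSON array with one entity per line, where ``P31`` is the "instance of" property).

```python
import json


def instance_of_ids(entity):
    """Return the set of IDs that `entity` is an instance of (property P31)."""
    ids = set()
    for statement in entity.get("claims", {}).get("P31", []):
        snak = statement.get("mainsnak", {})
        # Snaks of type 'novalue' or 'somevalue' carry no datavalue.
        value = snak.get("datavalue", {}).get("value", {})
        if isinstance(value, dict) and "id" in value:
            ids.add(value["id"])
    return ids


def filter_dump_lines(lines, test_entities):
    """Yield the IDs of entities that are instances of any of `test_entities`."""
    test_entities = set(test_entities)
    for line in lines:
        # Each entity sits on its own line, followed by a trailing comma.
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        entity = json.loads(line)
        if instance_of_ids(entity) & test_entities:
            yield entity["id"]


# Toy stand-in for a dump file (hypothetical, heavily trimmed entities).
sample = [
    "[",
    '{"id": "Q42", "claims": {"P31": [{"mainsnak": {"snaktype": "value", '
    '"datavalue": {"value": {"id": "Q5"}}}}]}},',
    '{"id": "Q64", "claims": {"P31": [{"mainsnak": {"snaktype": "value", '
    '"datavalue": {"value": {"id": "Q515"}}}}]}}',
    "]",
]

matches = list(filter_dump_lines(sample, ["Q5"]))
print(matches)
```

With a real dump, ``lines`` would typically be the lines of the (decompressed) dump file read as a stream, so that the whole dump never has to be held in memory.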

The ``examples/`` folder contains example scripts for creating datasets (e.g. `build_humans.py <https://github.com/armand33/WikiDataSets/blob/master/examples/build_humans.py>`_).
Such scripts should be placed in the main directory (alongside ``utils.py`` and ``processFunctions.py``), and their hard-coded paths should be adjusted to match your installation.
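
Once a dataset has been built, merging labels into an edge list (the kind of step described for ``load_data_labels`` above) might look like the following sketch. The column names, the in-memory toy data, and the ``attach_labels`` helper are all assumptions made for illustration; the actual file layout and function signature should be checked in the documentation.

```python
import pandas as pd

# Hypothetical toy tables; real files and column names may differ.
edges = pd.DataFrame({
    "head": [0, 1],
    "relation": [0, 1],
    "tail": [1, 2],
})
entities = pd.DataFrame({
    "entity_id": [0, 1, 2],
    "label": ["Marie Curie", "Pierre Curie", "Irène Joliot-Curie"],
})
relations = pd.DataFrame({
    "relation_id": [0, 1],
    "label": ["spouse", "child"],
})


def attach_labels(edges, entities, relations):
    """Return a copy of `edges` with human-readable label columns added."""
    entity_labels = entities.set_index("entity_id")["label"]
    relation_labels = relations.set_index("relation_id")["label"]
    out = edges.copy()
    # Map each integer ID to its label; unknown IDs become NaN.
    out["head_label"] = out["head"].map(entity_labels)
    out["relation_label"] = out["relation"].map(relation_labels)
    out["tail_label"] = out["tail"].map(entity_labels)
    return out


labelled = attach_labels(edges, entities, relations)
print(labelled[["head_label", "relation_label", "tail_label"]])
```

Mapping through an indexed ``Series`` keeps the edge order intact and leaves missing labels visible as ``NaN`` instead of silently dropping rows.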

Citations
---------

If you find this code useful in your research, please consider citing our `paper <https://arxiv.org/abs/1906.04536>`_:

.. code-block::

    @misc{arm2019wikidatasets,
        title={WikiDataSets : Standardized sub-graphs from WikiData},
        author={Armand Boschin},
        year={2019},
        eprint={1906.04536},
        archivePrefix={arXiv},
        primaryClass={cs.LG}
    }

Credits
-------

This package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template.

.. _Cookiecutter: https://github.com/audreyr/cookiecutter
.. _`audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage


=======
History
=======

0.2.0 (2019-07-02)
------------------

* Added export of a ``nodes.txt`` file to the ``build_dataset`` function.



0.1.0 (2019-07-01)
------------------

* First release on PyPI.


