Metadata-Version: 2.1
Name: kgdata
Version: 5.3.4
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: Implementation :: CPython
Requires-Dist: orjson >=3.9.0, <4.0.0
Requires-Dist: tqdm >=4.64.0, <5.0.0
Requires-Dist: beautifulsoup4 >=4.9.3, <5.0.0
Requires-Dist: loguru >=0.6.0, <0.7.0
Requires-Dist: rdflib >=6.1.1, <7.0.0
Requires-Dist: six >=1.16.0, <2.0.0
Requires-Dist: ruamel.yaml >=0.17.21, <0.18.0
Requires-Dist: chardet >=5.0.0, <6.0.0
Requires-Dist: ujson >=5.5.0, <6.0.0
Requires-Dist: redis >=3.5.3, <4.0.0
Requires-Dist: numpy >=1.22.3, <2.0.0
Requires-Dist: fastnumbers >=3.2.1, <4.0.0
Requires-Dist: requests >=2.28.0, <3.0.0
Requires-Dist: sem-desc >=5.1.0, <6.0.0
Requires-Dist: click >=8.1.3, <9.0.0
Requires-Dist: parsimonious >=0.8.1, <0.9.0
Requires-Dist: hugedict >=2.12.5, <3.0.0
Requires-Dist: rsoup >=3.1.7, <4.0.0
Requires-Dist: lxml >=4.9.0, <5.0.0
Requires-Dist: ray >=2.0.1, <3.0.0
Requires-Dist: pqdict >=1.3.0, <2.0.0
Requires-Dist: python-dotenv >=0.19.0, <0.20.0 ; extra == 'dev'
Requires-Dist: pytest >=7.1.3, <8.0.0 ; extra == 'dev'
Requires-Dist: black >=22.10.0, <23.0.0 ; extra == 'dev'
Requires-Dist: pyspark >=3.4.1, <4.0.0 ; extra == 'spark'
Requires-Dist: kgdata[dev,spark] ; extra == 'all'
Provides-Extra: dev
Provides-Extra: spark
Provides-Extra: all
License-File: LICENSE
Summary: Library to process dumps of knowledge graphs (Wikipedia, DBpedia, Wikidata)
Keywords: knowledge-graph,wikidata,wikipedia,dbpedia
Home-Page: https://github.com/binh-vu/kgdata
Author-email: Binh Vu <binh@toan2.com>
Requires-Python: >=3.8
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: homepage, https://github.com/binh-vu/kgdata
Project-URL: repository, https://github.com/binh-vu/kgdata

# kgdata ![PyPI](https://img.shields.io/pypi/v/kgdata) ![Documentation](https://readthedocs.org/projects/kgdata/badge/?version=latest&style=flat)

KGData is a library for processing dumps of knowledge graphs such as Wikipedia, DBpedia, and Wikidata. What it can do:

- Clean up the dumps to ensure the data is consistent (resolving redirects, removing dangling references).
- Create embedded key-value databases to access entities from the dumps.
- Extract Wikidata ontology.
- Extract Wikipedia tables and convert the hyperlinks to Wikidata entities.
- Create Pyserini indices to search Wikidata’s entities.
- and more

For full documentation, please see [the website](https://kgdata.readthedocs.io/).

## Installation

From PyPI (using pre-built binaries):

```bash
pip install kgdata[spark]   # omit the [spark] extra if your cluster runs a different Spark version
```
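If your cluster runs a Spark version that does not match the extra's pinned range, one approach is to install the core package and pin `pyspark` yourself (the version number below is illustrative; substitute the one your cluster actually uses):

```shell
# Install kgdata without the spark extra...
pip install kgdata

# ...then pin pyspark to match your cluster
# (example version only -- adjust to your deployment)
pip install "pyspark==3.4.1"
```

Quoting the requirement specifier also avoids shells such as zsh interpreting the brackets or comparison operators.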

