Metadata-Version: 2.1
Name: cldfzenodo
Version: 0.2.0
Summary: functionality to retrieve CLDF datasets deposited on Zenodo
Home-page: https://github.com/cldf/cldfzenodo
Author: Robert Forkel
Author-email: dlce.rdm@eva.mpg.de
License: Apache 2.0
Platform: any
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: attrs
Requires-Dist: html5lib
Requires-Dist: nameparser
Requires-Dist: pycldf (>=1.23.0)
Provides-Extra: cli
Requires-Dist: cldfbench ; extra == 'cli'
Provides-Extra: dev
Requires-Dist: flake8 ; extra == 'dev'
Requires-Dist: twine ; extra == 'dev'
Requires-Dist: wheel ; extra == 'dev'
Provides-Extra: test
Requires-Dist: cldfbench ; extra == 'test'
Requires-Dist: coverage (>=4.2) ; extra == 'test'
Requires-Dist: pytest-cov ; extra == 'test'
Requires-Dist: pytest-mock ; extra == 'test'
Requires-Dist: pytest (>=5) ; extra == 'test'

# cldfzenodo

[![Build Status](https://github.com/cldf/cldfzenodo/workflows/tests/badge.svg)](https://github.com/cldf/cldfzenodo/actions?query=workflow%3Atests)
[![PyPI](https://img.shields.io/pypi/v/cldfzenodo.svg)](https://pypi.org/project/cldfzenodo)

`cldfzenodo` provides programmatic access to CLDF data deposited on [Zenodo](https://zenodo.org).


## Install

```shell
pip install cldfzenodo
```

## CLI

`cldfzenodo` provides a subcommand to be run from [cldfbench](https://github.com/cldf/cldfbench).
To make use of this command, you have to install `cldfbench`, which can be done via
```shell
pip install cldfzenodo[cli]
```
Then you can download CLDF datasets from Zenodo, using the DOI for identification. E.g.
```shell
cldfbench zenodo.download 10.5281/zenodo.4683137  --directory wals-2020.1/
```
will download WALS Online as CLDF dataset into `wals-2020.1`:
```shell
$ tree wals-2020.1/
wals-2020.1/
├── areas.csv
├── chapters.csv
├── codes.csv
├── contributors.csv
├── countries.csv
├── examples.csv
├── language_names.csv
├── languages.csv
├── parameters.csv
├── sources.bib
├── StructureDataset-metadata.json
└── values.csv

0 directories, 12 files
```


## API

Metadata and data of (potential) CLDF datasets deposited on Zenodo is accessed via `cldfzenodo.Record`
objects. Such objects can be obtained in various ways:
- Via DOI:
  ```python
  import cldfzenodo
  rec = cldfzenodo.Record.from_doi('https://doi.org/10.5281/zenodo.4762034')
  ```
- From deposits grouped into a Zenodo community (and obtained through OAI-PMH):
  ```python
  import cldfzenodo.oai
  for rec in cldfzenodo.oai.iter_records('dictionaria'):
    print(rec)
  ```
- From search results using keywords:
  ```python
  import cldfzenodo
  for rec in cldfzenodo.search_wordlists():
    print(rec)
  ```

`cldfzenodo.Record` objects provide sufficient metadata to allow identification and data access:
```python
>>> from cldfzenodo import Record
>>> print(Record.from_doi('10.5281/zenodo.4762034').bibtex)
@misc{zenodo-4762034,
  author    = {Hammarström, Harald and Forkel, Robert and Haspelmath, Martin and Bank, Sebastian},
  title     = {glottolog/glottolog: Glottolog database 4.4 as CLDF},
  keywords  = {cldf:StructureDataset, linguistics},
  publisher = {Zenodo},
  year      = {2021},
  doi       = {10.5281/zenodo.4762034},
  url       = {https://doi.org/10.5281/zenodo.4762034},
  copyright = {Creative Commons Attribution 4.0}
}
```

One can download the full deposit (and access - possible multiple - CLDF datasets):
```python
from pycldf import iter_datasets
record.download('my_directory')
for cldf in iter_datasets('my_directory'):
    pass
```

But often, only the "pure" CLDF data is of interest - and not the additional metadata and curation
context, e.g. of [cldfbench](https://github.com/cldf/cldfbench)-curated datasets. This can be done
via
```python
from pycldf import Dataset
mdpath = record.download_dataset('my_directory')
cldf = Dataset.from_metadata(mdpath)
```


