Metadata-Version: 2.1
Name: nerdpool-client
Version: 1.1.0
Summary: A client for Nerdpool-Api
Home-page: https://github.com/acdh-oeaw/nerdpool-client
Author: Peter Andorfer
Author-email: peter.andorfer@oeaw.ac.at
License: MIT
Platform: UNKNOWN
Description-Content-Type: text/markdown
Requires-Dist: requests

[![Run Tests](https://github.com/acdh-oeaw/nerdpool-client/actions/workflows/test.yml/badge.svg)](https://github.com/acdh-oeaw/nerdpool-client/actions/workflows/test.yml) [![codecov](https://codecov.io/gh/acdh-oeaw/nerdpool-client/branch/master/graph/badge.svg?token=LXKIMGDXXF)](https://codecov.io/gh/acdh-oeaw/nerdpool-client) [![PyPI version](https://badge.fury.io/py/nerdpool-client.svg)](https://badge.fury.io/py/nerdpool-client)

# nerdpool-client

A Python client for downloading data from https://nerdpool-api.acdh-dev.oeaw.ac.at

## install

`pip install nerdpool-client`

## usage

### list data set titles


```python
from nerdpool_client import NerdPoolClient

client = NerdPoolClient()
print(client.data_sets)
# ['RTA', 'RITA', 'MRP', 'Chronik Aldersbach', 'DIPKO']
```

### download samples as .jsonl file

* Go to [nerdpool-api](https://nerdpool-api.acdh-dev.oeaw.ac.at/) and create or filter your preferred data sample, e.g. all samples from MRP:

```python
from nerdpool_client import NerdPoolClient

url = "https://nerdpool-api.acdh-dev.oeaw.ac.at/api/ner-sample/?format=json&ner_ent_type__contains=&ner_source__title=MRP"
client = NerdPoolClient()
client.dump_to_jsonl(url)
# 'out.jsonl'
```
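Instead of copying the filter URL from the browser, it can also be assembled in code. The following is a minimal sketch using only the standard library; the helper `build_sample_url` and the constant `BASE_URL` are hypothetical names and not part of `nerdpool_client`. The query parameters are the ones visible in the URL above, and other filters the API may support are not covered.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL above.
BASE_URL = "https://nerdpool-api.acdh-dev.oeaw.ac.at/api/ner-sample/"


def build_sample_url(source_title, ent_type=""):
    """Hypothetical helper: build a filter URL for one data set title."""
    params = {
        "format": "json",
        "ner_ent_type__contains": ent_type,  # empty string = no entity type filter
        "ner_source__title": source_title,
    }
    # urlencode escapes special characters, e.g. spaces become "+"
    return f"{BASE_URL}?{urlencode(params)}"


print(build_sample_url("MRP"))
# https://nerdpool-api.acdh-dev.oeaw.ac.at/api/ner-sample/?format=json&ner_ent_type__contains=&ner_source__title=MRP
```

The resulting string can be passed to `client.dump_to_jsonl(url)` in the same way as the hand-copied URL.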

### download samples as train.jsonl and eval.jsonl files

* With `file_name_prefix` you can add a custom prefix to the default file names `train.jsonl` and `eval.jsonl`.
* The `split` parameter controls the train/eval split: every `split`-th sample is saved to `eval.jsonl` instead of `train.jsonl` (e.g. every 10th sample for `split=10`).

```python
from nerdpool_client import NerdPoolClient

url = "https://nerdpool-api.acdh-dev.oeaw.ac.at/api/ner-sample/?format=json&ner_ent_type__contains=&ner_source__title=MRP"
client = NerdPoolClient()
client.dump_to_train_eval(url, file_name_prefix="mrp__", split=10)
# ['mrp__train.jsonl', 'mrp__eval.jsonl']
```
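To illustrate what an "every n-th sample" split does to the proportions, here is a small standalone sketch. It is not the client's actual implementation: `split_every_nth` is a hypothetical function, and which exact positions the client sends to `eval.jsonl` is an assumption.

```python
def split_every_nth(samples, split=10):
    """Hypothetical sketch: route every `split`-th sample to the eval set."""
    train, evaluation = [], []
    # Count from 1 so that with split=10 the 10th, 20th, ... samples go to eval.
    for position, sample in enumerate(samples, start=1):
        if position % split == 0:
            evaluation.append(sample)
        else:
            train.append(sample)
    return train, evaluation


train, evaluation = split_every_nth(list(range(1, 21)), split=10)
print(len(train), len(evaluation))
# 18 2
```

Under this assumption, `split=10` yields roughly a 90/10 train/eval ratio, and a smaller `split` value puts a larger share of the samples into the eval file.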

