Metadata-Version: 2.1
Name: nlp-datasets
Version: 0.0.7
Summary: A dataset utils repository. For tensorflow 2.x only!
Home-page: https://github.com/naivenmt/datasets
Author: ZhouYang Luo
Author-email: zhouyang.luo@gmail.com
License: MIT License
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3.6
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
Requires-Dist: deprecated (>=1.2.5)

# datasets
A repository of dataset utilities that build `tf.data.Dataset` instances. **Requires tensorflow>=2.0.0b!**

## Requirements

* python 3.6
* tensorflow>=2.0.0b

## Installation

```bash
pip install nlp-datasets
```

## Contents

* Build datasets for seq2seq models. [seq2seq_dataset.py](nlp_datasets/seq2seq/seq2seq_dataset.py)
* Build datasets for NMT. [nmt_dataset.py](nlp_datasets/nmt/nmt_dataset.py)
* Build datasets for DSSM. [dssm_dataset.py](nlp_datasets/dssm/dssm_dataset.py)
* Build datasets for MatchPyramid. [match_pyramid_dataset.py](nlp_datasets/matchpyramid/match_pyramid_dataset.py)

## Usage

### For NMT task

```python
from nlp_datasets import NMTSameFileDataset

o = NMTSameFileDataset(config=None, logger_name=None)
train_files = []  # paths to your training files
# train_dataset is an instance of tf.data.Dataset
train_dataset = o.build_train_dataset(train_files)
```
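The `train_files` list above is left empty for you to fill in. One way to populate it is sketched below, assuming your training data are plain-text files in a single directory. The helper name `collect_files`, the `*.txt` pattern, and the directory layout are illustrative assumptions and not part of `nlp_datasets`.

```python
from pathlib import Path
from typing import List


def collect_files(data_dir: str, pattern: str = "*.txt") -> List[str]:
    """Return the sorted paths of regular files in data_dir that match pattern.

    Hypothetical helper: nlp_datasets does not provide this function.
    """
    # Path.glob also yields directories, so keep regular files only.
    return sorted(str(p) for p in Path(data_dir).glob(pattern) if p.is_file())
```

The result can then be passed on, for example `train_files = collect_files("data/train")` followed by `o.build_train_dataset(train_files)`, where `data/train` is a placeholder path.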

```python
from nlp_datasets import NMTSeparateFileDataset

o = NMTSeparateFileDataset(config=None, logger_name=None)
feature_files = []  # paths to your feature files
label_files = []  # paths to your label files
train_dataset = o.build_train_dataset(feature_files, label_files)
```
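When features and labels live in separate files, it can help to sanity-check the inputs before building the dataset. The sketch below assumes that the i-th feature file pairs with the i-th label file and that paired files are aligned line by line. This README does not state that requirement, so treat it as an assumption. The helpers `count_lines` and `find_misaligned` are hypothetical and not part of `nlp_datasets`.

```python
from typing import List, Sequence, Tuple


def count_lines(path: str) -> int:
    """Count the lines in a UTF-8 text file."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


def find_misaligned(
    feature_files: Sequence[str], label_files: Sequence[str]
) -> List[Tuple[str, str, int, int]]:
    """Return (feature_file, label_file, n_feature_lines, n_label_lines)
    for every pair whose line counts differ.

    Assumption: files are paired by position and aligned line by line.
    """
    if len(feature_files) != len(label_files):
        raise ValueError(
            "expected the same number of feature and label files, got "
            f"{len(feature_files)} and {len(label_files)}"
        )
    mismatches = []
    for feat, lab in zip(feature_files, label_files):
        n_feat, n_lab = count_lines(feat), count_lines(lab)
        if n_feat != n_lab:
            mismatches.append((feat, lab, n_feat, n_lab))
    return mismatches
```

An empty result from `find_misaligned(feature_files, label_files)` means every pair has matching line counts under the assumption above.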

### For DSSM task

```python
from nlp_datasets import DSSMSameFileDataset

o = DSSMSameFileDataset(config=None, logger_name=None)
train_files = []  # paths to your training files
train_dataset = o.build_train_dataset(train_files=train_files)
```

```python
from nlp_datasets import DSSMSeparateFileDataset

o = DSSMSeparateFileDataset(config=None, logger_name=None)
query_files = []  # paths to your query files
doc_files = []  # paths to your document files
label_files = []  # paths to your label files
train_dataset = o.build_train_dataset(query_files, doc_files, label_files)
```

### For MatchPyramid task

```python
from nlp_datasets import MatchPyramidSameFileDataset

o = MatchPyramidSameFileDataset(config=None, logger_name=None)
train_files = []  # paths to your training files
train_dataset = o.build_train_dataset(train_files=train_files)
```

```python
from nlp_datasets import MatchPyramidSeparateFilesDataset

o = MatchPyramidSeparateFilesDataset(config=None, logger_name=None)
query_files = []  # paths to your query files
doc_files = []  # paths to your document files
label_files = []  # paths to your label files
train_dataset = o.build_train_dataset(query_files, doc_files, label_files)
```

