Metadata-Version: 2.1
Name: fibber
Version: 0.1.0.dev0
Summary: Fibber is a benchmarking suite for adversarial attacks on text classification.
Home-page: https://github.com/DAI-Lab/fibber
Author: MIT Data To AI Lab
Author-email: dailabmit@gmail.com
License: MIT license
Keywords: fibber fibber fibber
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: numpy (>=1.18.0)
Requires-Dist: tensorflow (>=2.0.0)
Requires-Dist: tensorflow-hub (>=0.9.0)
Requires-Dist: torch (<2,>=1.0)
Requires-Dist: torchvision (<1,>=0.4.2)
Requires-Dist: transformers (>=2.4.0)
Requires-Dist: tqdm (>=4.0.0)
Requires-Dist: spacy (>=2.0.0)
Requires-Dist: pandas (>=1.0.0)
Requires-Dist: textattack (==0.2.10)
Requires-Dist: nltk (>=3.0)
Requires-Dist: stanza (==1.1.1)
Provides-Extra: dev
Requires-Dist: bumpversion (>=0.5.3) ; extra == 'dev'
Requires-Dist: pip (>=9.0.1) ; extra == 'dev'
Requires-Dist: watchdog (>=0.8.3) ; extra == 'dev'
Requires-Dist: m2r2 (<0.3,>=0.2.5) ; extra == 'dev'
Requires-Dist: nbsphinx (<0.7,>=0.5.0) ; extra == 'dev'
Requires-Dist: Sphinx (==3.2.1) ; extra == 'dev'
Requires-Dist: pydata-sphinx-theme ; extra == 'dev'
Requires-Dist: autodocsumm (>=0.1.10) ; extra == 'dev'
Requires-Dist: PyYaml (<6,>=5.3.1) ; extra == 'dev'
Requires-Dist: argh (<1,>=0.26.2) ; extra == 'dev'
Requires-Dist: sphinx-rtd-theme (<1,>=0.4) ; extra == 'dev'
Requires-Dist: ipython (<8,>=7) ; extra == 'dev'
Requires-Dist: flake8 (>=3.7.7) ; extra == 'dev'
Requires-Dist: isort (>=5) ; extra == 'dev'
Requires-Dist: autoflake (>=1.2) ; extra == 'dev'
Requires-Dist: autopep8 (>=1.4.3) ; extra == 'dev'
Requires-Dist: twine (>=1.10.0) ; extra == 'dev'
Requires-Dist: wheel (>=0.30.0) ; extra == 'dev'
Requires-Dist: coverage (>=4.5.1) ; extra == 'dev'
Requires-Dist: tox (>=2.9.1) ; extra == 'dev'
Requires-Dist: pytest (>=3.4.2) ; extra == 'dev'
Requires-Dist: pytest-cov (>=2.6.0) ; extra == 'dev'
Provides-Extra: test
Requires-Dist: pytest (>=3.4.2) ; extra == 'test'
Requires-Dist: pytest-cov (>=2.6.0) ; extra == 'test'

<p align="left">
<img width=15% src="https://dai.lids.mit.edu/wp-content/uploads/2018/06/Logo_DAI_highres.png" alt="DAI-Lab" />
<i>An open source project from Data to AI Lab at MIT.</i>
</p>

[![PyPI Shield](https://img.shields.io/pypi/v/fibber.svg)](https://pypi.python.org/pypi/fibber)
[![Downloads](https://pepy.tech/badge/fibber)](https://pepy.tech/project/fibber)
[![Travis CI Shield](https://travis-ci.org/DAI-Lab/fibber.svg?branch=master&status=started)](https://travis-ci.org/DAI-Lab/fibber)
[![Coverage Status](https://codecov.io/gh/DAI-Lab/fibber/branch/master/graph/badge.svg)](https://codecov.io/gh/DAI-Lab/fibber)


# Fibber

Fibber is a library to evaluate different strategies to paraphrase natural language, especially how these strategies can break text classifiers without changing the meaning of a sentence.

- Documentation: [https://DAI-Lab.github.io/fibber](https://DAI-Lab.github.io/fibber)
- GitHub: [https://github.com/DAI-Lab/fibber](https://github.com/DAI-Lab/fibber)

# Overview

Fibber is a library to evaluate different strategies to paraphrase natural language. The library includes several built-in paraphrasing strategies and a benchmark framework to evaluate the quality of paraphrases. In particular, we use the GPT2 language model to measure how meaningful the paraphrased text is, and a universal sentence encoder to evaluate the semantic similarity between the original and paraphrased text. We also train a BERT classifier on the original dataset and check whether the paraphrased sentences can break the text classifier.

# Install

## Requirements

**fibber** has been developed and tested on [Python 3.6, 3.7 and 3.8](https://www.python.org/downloads/).

Also, although it is not strictly required, the usage of [conda](https://docs.conda.io/en/latest/miniconda.html)
is highly recommended to avoid interfering with other software installed in the system
in which **fibber** is run.

These are the minimum commands needed to create a conda environment using Python 3.6 for **fibber**:

```bash
# First you should install conda.
conda create -n fibber_env python=3.6
```

Afterward, you have to execute this command to activate the environment:

```bash
conda activate fibber_env
```

**Then you should install tensorflow and pytorch.** Please follow the installation instructions for [tensorflow](https://www.tensorflow.org/install) and [pytorch](https://pytorch.org). Fibber requires `tensorflow>=2.0.0` and `pytorch>=1.5.0`. Choose versions of tensorflow and pytorch that match the CUDA version on your computer.
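After installing both frameworks, you may want to confirm that the installed versions meet the minimums stated above. The following is a minimal sketch using only the standard library. It is not part of fibber. It assumes that PyTorch is importable as `torch` and that both packages expose `__version__`. The helper names `parse_version`, `meets_minimum` and `check_frameworks` are hypothetical.

```python
import importlib.util
import re

# Minimum versions stated in this README (PyTorch is imported as `torch`).
MINIMUM_VERSIONS = {"tensorflow": "2.0.0", "torch": "1.5.0"}


def parse_version(version):
    """Turn '2.3.1' or '1.7.0+cu101' into a tuple of ints such as (2, 3, 1)."""
    # Keep only the leading dotted numeric part; drop tags like '+cu101' or 'rc1'.
    match = re.match(r"\d+(\.\d+)*", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    installed_parts = parse_version(installed)
    minimum_parts = parse_version(minimum)
    # Pad the shorter tuple with zeros so that (2, 0) compares equal to (2, 0, 0).
    length = max(len(installed_parts), len(minimum_parts))
    installed_parts += (0,) * (length - len(installed_parts))
    minimum_parts += (0,) * (length - len(minimum_parts))
    return installed_parts >= minimum_parts


def check_frameworks():
    """Report, per framework, whether it is installed and new enough."""
    report = {}
    for module_name, minimum in MINIMUM_VERSIONS.items():
        if importlib.util.find_spec(module_name) is None:
            report[module_name] = "not installed"
            continue
        module = importlib.import_module(module_name)
        version = getattr(module, "__version__", "0")
        status = "ok" if meets_minimum(version, minimum) else "too old"
        report[module_name] = "%s (%s)" % (version, status)
    return report
```

Run `print(check_frameworks())` from a Python shell inside the activated `fibber_env`. Importing tensorflow can take several seconds.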


Remember to execute `conda activate fibber_env` every time you start a new console to work on **fibber**!

**Install Java.** Please install a Java runtime environment on your computer.
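One way to confirm that a Java runtime is reachable is to look for the `java` executable on your `PATH`. This is a minimal standard-library sketch. It assumes fibber's dependencies find Java through `PATH`, which this README does not state. The function names `find_java` and `java_version_banner` are hypothetical.

```python
import shutil
import subprocess


def find_java():
    """Return the path of the `java` executable on PATH, or None if absent."""
    return shutil.which("java")


def java_version_banner():
    """Return the text printed by `java -version`, or None if Java is missing."""
    java = find_java()
    if java is None:
        return None
    # `java -version` writes its banner to stderr rather than stdout.
    result = subprocess.run([java, "-version"], stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True)
    return result.stderr.strip() or result.stdout.strip()


if __name__ == "__main__":
    banner = java_version_banner()
    print(banner if banner else "Java was not found on PATH.")
```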

## Install from PyPI

After creating the conda environment and activating it, we recommend using
[pip](https://pip.pypa.io/en/stable/) in order to install **fibber**:

```bash
pip install fibber
```

This will pull and install the latest stable release from [PyPI](https://pypi.org/).

## Use without install

If you are using this project for research purposes and want to make changes to the code,
you can install all requirements with:

```bash
git clone git@github.com:DAI-Lab/fibber.git
cd fibber
pip install --requirement requirement.txt
```

Then you can run fibber with:

```bash
python -m fibber.datasets.download_datasets
python -m fibber.benchmark.benchmark
```

In this case, any changes you make to the code take effect immediately.


## Install from source

With your conda environment activated, you can clone the repository and install it from
source by running `make install` on the `stable` branch:

```bash
git clone git@github.com:DAI-Lab/fibber.git
cd fibber
git checkout stable
make install
```


# Quickstart

In this short tutorial, we will guide you through a series of steps that will help you
get started with **fibber**.

**(1) [Install Fibber](#install)**

**(2) Get a demo dataset and resources.**

```python
from fibber.datasets import get_demo_dataset

trainset, testset = get_demo_dataset()

from fibber.resources import download_all

# resources are downloaded to ~/.fibber
download_all()
```
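Because `download_all()` stores resources under `~/.fibber`, one quick way to check that the download finished is to list that directory. This standard-library sketch is not part of fibber's API. The README does not describe the file layout inside `~/.fibber`, so the sketch only lists top-level entries. `list_downloaded_resources` is a hypothetical helper.

```python
from pathlib import Path

# The README states that resources are downloaded to ~/.fibber.
FIBBER_HOME = Path.home() / ".fibber"


def list_downloaded_resources(root=FIBBER_HOME):
    """Return the sorted names of top-level entries under root, or [] if root is missing."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(entry.name for entry in root.iterdir())


if __name__ == "__main__":
    resources = list_downloaded_resources()
    if resources:
        print("Found %d entries in %s:" % (len(resources), FIBBER_HOME))
        for name in resources:
            print("  " + name)
    else:
        print("%s is missing or empty; run download_all() first." % FIBBER_HOME)
```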



**(3) Create a Fibber object.**

```python
from fibber.fibber import Fibber

# args starting with "bs_" are hyperparameters for the BertSamplingStrategy.
arg_dict = {
    "use_gpu_id": 0,
    "gpt2_gpu_id": 0,
    "bert_gpu_id": 0,
    "strategy_gpu_id": 0,
    "bs_block_size": 3,
    "bs_wpe_weight": 10000,
    "bs_use_weight": 1000,
    "bs_gpt2_weight": 10,
    "bs_clf_weight": 3
}

# Create a fibber object.
# This step may take a while (about 1 hour on an RTX TITAN) and requires 20GB of
# GPU memory. If your GPU does not have enough memory, consider assigning USE,
# GPT2, BERT, and the strategy to different GPUs.
fibber = Fibber(arg_dict, dataset_name="demo", strategy_name="BertSamplingStrategy",
                trainset=trainset, testset=testset, output_dir="exp-demo")
```
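If one GPU cannot hold everything, the four `*_gpu_id` entries in `arg_dict` can point at different devices. The helper below is a hypothetical sketch that spreads those keys over the available GPUs in round-robin order. The key names come from the example above, but the ordering is arbitrary and neither `assign_gpus` nor `build_arg_dict` is part of fibber.

```python
# The *_gpu_id keys from the arg_dict above; their order here is arbitrary.
GPU_KEYS = ["gpt2_gpu_id", "bert_gpu_id", "use_gpu_id", "strategy_gpu_id"]


def assign_gpus(n_gpus, keys=GPU_KEYS):
    """Map each *_gpu_id key to a GPU index, cycling through n_gpus devices."""
    if n_gpus < 1:
        raise ValueError("need at least one GPU, got %d" % n_gpus)
    return {key: index % n_gpus for index, key in enumerate(keys)}


def build_arg_dict(n_gpus, strategy_args):
    """Merge the GPU assignment with strategy hyperparameters such as the bs_* args."""
    arg_dict = assign_gpus(n_gpus)
    arg_dict.update(strategy_args)
    return arg_dict


# Same BertSamplingStrategy hyperparameters as in the example above.
bs_args = {"bs_block_size": 3, "bs_wpe_weight": 10000, "bs_use_weight": 1000,
           "bs_gpt2_weight": 10, "bs_clf_weight": 3}
arg_dict = build_arg_dict(2, bs_args)
print(arg_dict)
```

With two GPUs, GPT2 and USE share GPU 0, while BERT and the strategy share GPU 1. Pass the resulting dictionary to `Fibber` as `arg_dict`.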

**(4) You can also ask fibber to paraphrase your sentence.**

The following call randomly paraphrases the sentence in 5 different ways.

```python
# Try any sentence you like.
# label 0 means negative, and 1 means positive.
fibber.paraphrase(
    {"text0": ("The Avengers is a good movie. Although it is 3 hours long, every scene has something to watch."),
     "label": 1},
    field_name="text0",
    n=5)
```

The output is a tuple of (str, list, list).

```python
# Original Text
'The Avengers is a good movie. Although it is 3 hours long, every scene has something to watch.'

# 5 paraphrases
['the avengers is a good movie. even it is 2 hours long, there is not enough to watch.',
 'the avengers is a good movie. while it is 3 hours long, it is still very watchable.',
 'the avengers is a good movie and although it is 2 ¹⁄₂ hours long, it is never very interesting.',
 'avengers is not a good movie. while it is three hours long, it is still something to watch.',
 'the avengers is a bad movie. while it is three hours long, it is still something to watch.']

# Evaluation metrics of these 5 paraphrases.
[{'EditingDistance': 8,
  'USESemanticSimilarity': 0.9523628950119019,
  'GloVeSemanticSimilarity': 0.9795315341042675,
  'GPT2GrammarQuality': 1.492070198059082,
  'BertClfPrediction': 0},
 {'EditingDistance': 9,
  'USESemanticSimilarity': 0.9372092485427856,
  'GloVeSemanticSimilarity': 0.9575780832312993,
  'GPT2GrammarQuality': 0.9813404679298401,
  'BertClfPrediction': 1},
 {'EditingDistance': 11,
  'USESemanticSimilarity': 0.9265919327735901,
  'GloVeSemanticSimilarity': 0.9710499628056698,
  'GPT2GrammarQuality': 1.325406551361084,
  'BertClfPrediction': 0},
 {'EditingDistance': 7,
  'USESemanticSimilarity': 0.8913971185684204,
  'GloVeSemanticSimilarity': 0.9800737898362042,
  'GPT2GrammarQuality': 1.2504483461380005,
  'BertClfPrediction': 1},
 {'EditingDistance': 8,
  'USESemanticSimilarity': 0.9124080538749695,
  'GloVeSemanticSimilarity': 0.9744155151490856,
  'GPT2GrammarQuality': 1.1626977920532227,
  'BertClfPrediction': 0}]
```
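Because the output is a tuple of (str, list, list), you can unpack it and keep only the paraphrases that change the BERT prediction while staying close to the original text. The sketch below is a hypothetical post-processing helper and is not part of fibber. The 0.9 USE similarity threshold is an arbitrary assumption, not a fibber default. The sample values are copied from the output above, with the other metric fields omitted.

```python
def successful_attacks(paraphrases, metrics, true_label, min_use_similarity=0.9):
    """Return (paraphrase, metric) pairs that flip the BERT prediction while
    keeping USE semantic similarity at or above min_use_similarity."""
    results = []
    for text, metric in zip(paraphrases, metrics):
        flipped = metric["BertClfPrediction"] != true_label
        similar = metric["USESemanticSimilarity"] >= min_use_similarity
        if flipped and similar:
            results.append((text, metric))
    return results


# In your own code: original, paraphrases, metrics = fibber.paraphrase(...)
paraphrases = [
    "the avengers is a good movie. even it is 2 hours long, there is not enough to watch.",
    "the avengers is a good movie. while it is 3 hours long, it is still very watchable.",
    "the avengers is a good movie and although it is 2 ¹⁄₂ hours long, it is never very interesting.",
    "avengers is not a good movie. while it is three hours long, it is still something to watch.",
    "the avengers is a bad movie. while it is three hours long, it is still something to watch.",
]
metrics = [
    {"USESemanticSimilarity": 0.9523628950119019, "BertClfPrediction": 0},
    {"USESemanticSimilarity": 0.9372092485427856, "BertClfPrediction": 1},
    {"USESemanticSimilarity": 0.9265919327735901, "BertClfPrediction": 0},
    {"USESemanticSimilarity": 0.8913971185684204, "BertClfPrediction": 1},
    {"USESemanticSimilarity": 0.9124080538749695, "BertClfPrediction": 0},
]

# The original sentence was labeled 1 (positive).
for text, metric in successful_attacks(paraphrases, metrics, true_label=1):
    print("%.3f  %s" % (metric["USESemanticSimilarity"], text))
```

A high USE score alone does not guarantee that the meaning is preserved. The first sample paraphrase says "there is not enough to watch" yet scores above 0.95. You may therefore also want to inspect the other metrics or read the results.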

**(5) You can ask fibber to randomly pick a sentence from the dataset and paraphrase it.**


```python
fibber.paraphrase_a_random_sentence(n=5)
```



# Supported strategies

In this version, we implement four strategies:

- IdentityStrategy:
	- The identity strategy outputs the original text as its paraphrase.
	- This strategy generates exactly 1 paraphrase for each original text regardless of the `--num_paraphrases_per_text` flag.
- RandomStrategy:
	- The random strategy outputs a random shuffle of the words in the original text.
- TextFoolerStrategy:
	- Implementation of [Jin et al., 2019](https://arxiv.org/abs/1907.11932)
- BertSamplingStrategy:
	- Its hyperparameters are the `bs_`-prefixed arguments shown in the Quickstart.
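The two baseline strategies are simple enough to sketch in a few lines. The functions below only illustrate the behavior described above and are not fibber's implementation. Splitting on whitespace is an assumption about tokenization. `identity_paraphrase` and `random_paraphrase` are hypothetical names.

```python
import random


def identity_paraphrase(text, n):
    """Mimic IdentityStrategy: return the original text exactly once.

    n (the requested number of paraphrases) is deliberately ignored.
    """
    return [text]


def random_paraphrase(text, n, seed=None):
    """Mimic RandomStrategy: return n random shuffles of the words in text."""
    rng = random.Random(seed)  # a seeded generator makes results reproducible
    words = text.split()       # whitespace tokenization (an assumption)
    paraphrases = []
    for _ in range(n):
        shuffled = words[:]    # copy so the original word order is kept
        rng.shuffle(shuffled)
        paraphrases.append(" ".join(shuffled))
    return paraphrases


print(identity_paraphrase("The Avengers is a good movie.", n=5))
print(random_paraphrase("The Avengers is a good movie.", n=3, seed=0))
```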


# What's next?

For more details about **fibber** and all its possibilities
and features, please check the [documentation site](
https://DAI-Lab.github.io/fibber/).


# History

## version 0.1.0

This release is a major update to the Fibber library. It includes advanced paraphrase algorithms.

- Add two strategies: TextFoolerStrategy and BertSamplingStrategy.
- Improve the benchmarking framework: add more metrics specifically designed for adversarial attacks.
- Datasets: add a variation of AG's news dataset, `ag_no_title`.
- Bug fixes and improvements.

## version 0.0.1

This is the first release of the Fibber library. This release contains:

- Datasets: fibber contains 6 built-in datasets.
- Metrics: fibber contains 6 metrics to evaluate the quality of paraphrased
  sentences. All metrics have a unified interface.
- Benchmark framework: the benchmark framework can easily evaluate
  paraphrase strategies on built-in datasets and metrics.
- Strategies: this release contains 2 basic strategies, the identity strategy
  and the random strategy.
- A unified Fibber interface: users can easily use fibber by creating a Fibber
  object.


