Metadata-Version: 2.1
Name: sentivi
Version: 1.0.7
Summary: A simple tool for Vietnamese Sentiment Analysis
Home-page: https://github.com/vndee/sentivi
Author: Duy V. Huynh
Author-email: hvd.huynhduy@gmail.com
License: MIT
Platform: UNKNOWN
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Description-Content-Type: text/markdown
Requires-Dist: alabaster (==0.7.12)
Requires-Dist: Babel (==2.8.0)
Requires-Dist: bleach (==3.2.1)
Requires-Dist: blis (==0.4.1)
Requires-Dist: bump2version (==1.0.0)
Requires-Dist: bumpversion (==0.6.0)
Requires-Dist: catalogue (==1.0.0)
Requires-Dist: certifi (==2020.6.20)
Requires-Dist: cffi (==1.14.3)
Requires-Dist: chardet (==3.0.4)
Requires-Dist: click (==7.1.2)
Requires-Dist: colorama (==0.4.3)
Requires-Dist: cryptography (==3.1.1)
Requires-Dist: cymem (==2.0.3)
Requires-Dist: dill (==0.3.2)
Requires-Dist: docutils (==0.16)
Requires-Dist: fastapi (==0.61.1)
Requires-Dist: filelock (==3.0.12)
Requires-Dist: future (==0.18.2)
Requires-Dist: gensim (==3.8.3)
Requires-Dist: h11 (==0.10.0)
Requires-Dist: idna (==2.10)
Requires-Dist: imagesize (==1.2.0)
Requires-Dist: jeepney (==0.4.3)
Requires-Dist: Jinja2 (==2.11.2)
Requires-Dist: joblib (==0.16.0)
Requires-Dist: keyring (==21.4.0)
Requires-Dist: MarkupSafe (==1.1.1)
Requires-Dist: msmb-theme (==1.2.0)
Requires-Dist: murmurhash (==1.0.2)
Requires-Dist: numpy (==1.19.2)
Requires-Dist: packaging (==20.4)
Requires-Dist: pkginfo (==1.5.0.1)
Requires-Dist: plac (==1.1.3)
Requires-Dist: preshed (==3.0.2)
Requires-Dist: pycparser (==2.20)
Requires-Dist: pydantic (==1.6.1)
Requires-Dist: Pygments (==2.7.1)
Requires-Dist: pyparsing (==2.4.7)
Requires-Dist: python-crfsuite (==0.9.7)
Requires-Dist: python-multipart (==0.0.5)
Requires-Dist: pytz (==2020.1)
Requires-Dist: pyvi (==0.1)
Requires-Dist: readme-renderer (==26.0)
Requires-Dist: regex (==2020.9.27)
Requires-Dist: requests (==2.24.0)
Requires-Dist: requests-toolbelt (==0.9.1)
Requires-Dist: rfc3986 (==1.4.0)
Requires-Dist: sacremoses (==0.0.43)
Requires-Dist: scikit-learn (==0.23.2)
Requires-Dist: scipy (==1.5.2)
Requires-Dist: SecretStorage (==3.1.2)
Requires-Dist: sentencepiece (==0.1.91)
Requires-Dist: six (==1.15.0)
Requires-Dist: sklearn-crfsuite (==0.3.6)
Requires-Dist: smart-open (==2.2.0)
Requires-Dist: snowballstemmer (==2.0.0)
Requires-Dist: spacy (==2.3.2)
Requires-Dist: Sphinx (==3.2.1)
Requires-Dist: sphinx-drove-theme (==1.11.0)
Requires-Dist: sphinx-rtd-theme (==0.5.0)
Requires-Dist: sphinxcontrib-applehelp (==1.0.2)
Requires-Dist: sphinxcontrib-devhelp (==1.0.2)
Requires-Dist: sphinxcontrib-htmlhelp (==1.0.3)
Requires-Dist: sphinxcontrib-jsmath (==1.0.1)
Requires-Dist: sphinxcontrib-qthelp (==1.0.3)
Requires-Dist: sphinxcontrib-serializinghtml (==1.1.4)
Requires-Dist: srsly (==1.0.2)
Requires-Dist: starlette (==0.13.6)
Requires-Dist: tabulate (==0.8.7)
Requires-Dist: thinc (==7.4.1)
Requires-Dist: threadpoolctl (==2.1.0)
Requires-Dist: tokenizers (==0.8.1rc2)
Requires-Dist: tqdm (==4.49.0)
Requires-Dist: transformers (==3.1.0)
Requires-Dist: twine (==3.2.0)
Requires-Dist: urllib3 (==1.25.10)
Requires-Dist: uvicorn (==0.12.0)
Requires-Dist: wasabi (==0.8.0)
Requires-Dist: webencodings (==0.5.1)

## A Simple Tool For Sentiment Analysis

**Sentivi** is a simple tool for sentiment analysis that wraps [scikit-learn](https://scikit-learn.org) and
[PyTorch Transformers](https://huggingface.co/transformers/) models. It is designed to make it easier and faster to build a pipeline that trains and evaluates several
classification algorithms. For more specialized purposes, using the native libraries directly is recommended.

Documentation: https://sentivi.readthedocs.io/en/latest/index.html

### Classifiers

- [x] Decision Tree
- [x] Gaussian Naive Bayes
- [x] Gaussian Process
- [x] Nearest Centroid
- [x] Support Vector Machine
- [x] Stochastic Gradient Descent
- [ ] Character Convolutional Neural Network
- [x] Multi-Layer Perceptron
- [x] Long Short Term Memory
- [x] Text Convolutional Neural Network
- [x] Transformer
- [ ] Ensemble
- [ ] Lexicon-based 

### Text Encoders

- [x] One-hot
- [x] Bag of Words
- [x] Term Frequency - Inverse Document Frequency
- [x] Word2Vec
- [x] Transformer Tokenizer (for Transformer classifier only)
- [ ] WordPiece
- [ ] SentencePiece
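
To make the difference between the first three encoders concrete, here is a minimal pure-Python sketch of one-hot, bag-of-words, and TF-IDF vectors over whitespace-separated tokens. It is an illustration only, not Sentivi's implementation: the function names (`build_vocabulary`, `one_hot`, `bag_of_words`, `tf_idf`) are hypothetical, and the TF-IDF variant shown (relative term frequency times `log(N / df)`) is one common formulation assumed here for simplicity.

```python
import math
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in doc.split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def one_hot(document, vocab):
    """1 if the token occurs in the document, 0 otherwise."""
    present = set(document.split())
    return [1 if tok in present else 0 for tok in vocab]


def bag_of_words(document, vocab):
    """Number of times each vocabulary token occurs in the document."""
    counts = Counter(document.split())
    return [counts[tok] for tok in vocab]


def tf_idf(documents, vocab):
    """Relative term frequency scaled by log(N / document frequency)."""
    n_docs = len(documents)
    doc_freq = Counter(tok for doc in documents for tok in set(doc.split()))
    vectors = []
    for doc in documents:
        counts = Counter(doc.split())
        total = sum(counts.values()) or 1  # avoid dividing by zero on empty text
        vectors.append([
            (counts[tok] / total) * math.log(n_docs / doc_freq[tok])
            if doc_freq[tok] else 0.0
            for tok in vocab
        ])
    return vectors


docs = ["good good item", "bad item"]
vocab = build_vocabulary(docs)      # columns: bad, good, item
print(one_hot(docs[0], vocab))      # [0, 1, 1]
print(bag_of_words(docs[0], vocab)) # [0, 2, 1]
print(tf_idf(docs, vocab)[0])
```

A token that appears in every document (here `item`) gets a TF-IDF weight of zero, which is the main practical difference from the raw counts of bag of words.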

### Install
- Install the legacy version from PyPI:
    ```bash
    pip install sentivi
    ```

- Install the latest version from source:
    ```bash
    git clone https://github.com/vndee/sentivi
    cd sentivi
    pip install .
    ```

### Example

```python
from sentivi import Pipeline
from sentivi.data import DataLoader, TextEncoder
from sentivi.classifier import SVMClassifier
from sentivi.text_processor import TextProcessor

text_processor = TextProcessor(methods=['word_segmentation', 'remove_punctuation', 'lower'])

pipeline = Pipeline(DataLoader(text_processor=text_processor, n_grams=3),
                    TextEncoder(encode_type='one-hot'),
                    SVMClassifier(num_labels=3))

train_results = pipeline(train='./data/dev.vi', test='./data/dev_test.vi')
print(train_results)

pipeline.save('./weights/pipeline.sentivi')
_pipeline = Pipeline.load('./weights/pipeline.sentivi')

predict_results = _pipeline.predict(['hàng ok đầu tuýp có một số không vừa ốc siết. chỉ được một số đầu thôi .cần '
                                    'nhất đầu tuýp 14 mà không có. không đạt yêu cầu của mình sử dụng',
                                    'Son đẹpppp, mùi hương vali thơm nhưng hơi nồng, chất son mịn, màu lên chuẩn, '
                                    'đẹppppp'])
print(predict_results)
print(f'Decoded results: {_pipeline.decode_polarity(predict_results)}')
```
Take a look at more examples in [example/](https://github.com/vndee/sentivi/tree/master/example).

### Pipeline Serving

Sentivi uses [FastAPI](https://fastapi.tiangolo.com/) to serve a pipeline. Run a web service as follows:

```python
# serving.py
from sentivi import Pipeline, RESTServiceGateway

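# Load a previously saved pipeline and expose it as an application object
# that uvicorn can run (referenced below as `serving:server`)
pipeline = Pipeline.load('./weights/pipeline.sentivi')
server = RESTServiceGateway(pipeline).get_server()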
```

```bash
# pip install uvicorn python-multipart
uvicorn serving:server --host 127.0.0.1 --port 8000
```
Access the Swagger UI at http://127.0.0.1:8000/docs or ReDoc at http://127.0.0.1:8000/redoc. For example, you can use
[curl](https://curl.haxx.se/) to send POST requests:

```bash
curl --location --request POST 'http://127.0.0.1:8000/get_sentiment/' \
     --form 'text=Son đẹpppp, mùi hương vali thơm nhưng hơi nồng'

# response
{ "polarity": 2, "label": "#POS" }
```
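
The same request can be sent from Python. The sketch below mirrors the curl call above using only the standard library: it posts a single `text` form field as `multipart/form-data` to `/get_sentiment/`. The helper names `build_sentiment_request` and `get_sentiment` are hypothetical, and the shape of the JSON response is assumed to match the sample shown above.

```python
import json
import uuid
from urllib import request


def build_sentiment_request(text, base_url="http://127.0.0.1:8000"):
    """Build a multipart/form-data POST carrying one `text` field."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="text"\r\n'
        "\r\n"
        f"{text}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    return request.Request(
        f"{base_url}/get_sentiment/",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def get_sentiment(text, base_url="http://127.0.0.1:8000"):
    """Send the request and decode the JSON response into a dict."""
    with request.urlopen(build_sentiment_request(text, base_url)) as response:
        return json.loads(response.read().decode("utf-8"))


# Requires the service started with uvicorn to be running:
# print(get_sentiment("Son đẹpppp, mùi hương vali thơm nhưng hơi nồng"))
```

Splitting request construction from sending keeps the payload easy to inspect without a running server.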

#### Deploy using Docker
```dockerfile
FROM tiangolo/uvicorn-gunicorn-fastapi:python3.7

COPY . /app

ENV PYTHONPATH=/app
ENV APP_MODULE=serving:server
ENV WORKERS_PER_CORE=0.75
ENV MAX_WORKERS=6
ENV HOST=0.0.0.0
ENV PORT=80

RUN pip install -r requirements.txt
```

```bash
docker build -t sentivi .
docker run -d -p 8000:80 sentivi
```

### Future Releases

- Lexicon-based
- CharCNN
- Ensemble learning methods
- Model serving (Back-end and Front-end)


