Metadata-Version: 2.1
Name: rivertext
Version: 0.0.2
Summary: A Python library for training and evaluating incremental word embeddings.
Home-page: https://github.com/dccuchile/rivertext
Download-URL: https://github.com/dccuchile/rivertext
Author: Rivertext Team
Maintainer: Rivertext Team
Maintainer-email: gabrielturrab@ug.uchile.cl
License: new BSD
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved
Classifier: Programming Language :: Python
Classifier: Topic :: Software Development
Classifier: Topic :: Scientific/Engineering
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX
Classifier: Operating System :: Unix
Classifier: Operating System :: MacOS
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE

RiverText
===================================================================================

RiverText is an open-source library for modeling and training the incremental word vector architectures proposed in the state-of-the-art literature.

It brings many existing incremental word vector algorithms into a unified framework with a standardized
interface, which also facilitates the development of new methods.

RiverText provides two training paradigms:

* `learn_one`, which trains one instance at a time;

* and `learn_many`, which trains a mini-batch of instances at a time.

Supporting both paradigms allows text representation models to be trained more efficiently on text data streams.
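As a rough illustration of how the two paradigms relate, the sketch below uses a toy co-occurrence counter. `ToyCooccurrenceModel` and `batched` are hypothetical names made up for this example and are not part of the RiverText API; only the method names `learn_one` and `learn_many` come from the description above.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List


class ToyCooccurrenceModel:
    """Hypothetical stand-in for an incremental model (not a RiverText class)."""

    def __init__(self, window: int = 2) -> None:
        self.window = window
        self.counts: Counter = Counter()

    def learn_one(self, text: str) -> "ToyCooccurrenceModel":
        # Update the co-occurrence counts from a single instance.
        tokens = text.lower().split()
        for i, target in enumerate(tokens):
            for context in tokens[i + 1 : i + 1 + self.window]:
                self.counts[(target, context)] += 1
        return self

    def learn_many(self, texts: Iterable[str]) -> "ToyCooccurrenceModel":
        # Update from a mini-batch; this toy version simply loops over learn_one.
        for text in texts:
            self.learn_one(text)
        return self


def batched(stream: Iterable[str], size: int) -> Iterator[List[str]]:
    """Group a stream of instances into mini-batches of at most `size` items."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


stream = ["the river flows", "the river bends", "text streams flow"]

# Paradigm 1: one instance at a time.
one_by_one = ToyCooccurrenceModel()
for text in stream:
    one_by_one.learn_one(text)

# Paradigm 2: one mini-batch at a time.
in_batches = ToyCooccurrenceModel()
for batch in batched(stream, size=2):
    in_batches.learn_many(batch)

print(one_by_one.counts[("the", "river")])  # 2
```

In this toy model both paradigms end in the same state. A real model may process a mini-batch in a vectorized way instead of looping over single instances.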

RiverText also provides an interface similar to that of the [`river`](https://riverml.xyz) package, so developers familiar with it can
quickly train text representation models.

The official documentation is available at [dccuchile.github.io/rivertext](https://dccuchile.github.io/rivertext/).

Installation
============

Requirements
------------

The following packages will be installed along with RiverText if they are not already installed:

1. nltk
2. numpy
3. river
4. scikit_learn
5. scipy
6. torch
7. tqdm
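To see which of these requirements are already present in an environment, a small check along the following lines may help. It is only a sketch: the import names are assumptions (for example, `scikit_learn` is assumed to be imported as `sklearn`), and `missing_modules` is a hypothetical helper, not something shipped with RiverText.

```python
from importlib.util import find_spec

# Assumed import names for the requirements listed above;
# scikit_learn is assumed to be importable as "sklearn".
REQUIRED_MODULES = ["nltk", "numpy", "river", "sklearn", "scipy", "torch", "tqdm"]


def missing_modules(modules):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```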

Contributing
------------

Development Requirements
------------------------

Testing
-------

All unit tests are in the `rivertext/tests` folder and are run with `pytest`.

To run the tests, execute:

```
pytest tests
```

To check the coverage, run:

```
pytest tests --cov-report xml:cov.xml --cov rivertext
```

And then:

```
coverage report -m
```

Build the documentation
-----------------------

The documentation is built with `mkdocs` and `mkdocs-material`. Its sources are in the `docs` folder at the root of the project. First, install the required packages:

```
pip install mkdocs
pip install "mkdocstrings[python]"
pip install mkdocs-material
```

Then, to build the documentation and preview it locally, run:

```
mkdocs build
mkdocs serve
```

Changelog
=========

References
==========

```bibtex
@article{montiel2021river,
  title={River: machine learning for streaming data in Python},
  author={Montiel, Jacob and Halford, Max and Mastelini, Saulo Martiello and Bolmier, Geoffrey and Sourty,
    Raphael and Vaysse, Robin and Zouitine, Adil and Gomes, Heitor Murilo and Read, Jesse and Abdessalem,
    Talel and others},
  year={2021}
}

@article{bravo2022incremental,
  title={Incremental Word Vectors for Time-Evolving Sentiment Lexicon Induction},
  author={Bravo-Marquez, Felipe and Khanchandani, Arun and Pfahringer, Bernhard},
  journal={Cognitive Computation},
  volume={14},
  number={1},
  pages={425--441},
  year={2022},
  publisher={Springer}
}

@article{kaji2017incremental,
  title={Incremental skip-gram model with negative sampling},
  author={Kaji, Nobuhiro and Kobayashi, Hayato},
  journal={arXiv preprint arXiv:1704.03956},
  year={2017}
}
```

Team
====

- [Gabriel Iturra](https://giturra.github.io/)
- [Felipe Bravo-Marquez](https://felipebravom.com/)

Contact
------------
Please write to gabrieliturrab at ug.chile.cl for inquiries about the software. You are also welcome to open a pull request or an issue in the RiverText repository on GitHub.
