Metadata-Version: 2.1
Name: retvec
Version: 1.0.0
Summary: Resilient and Efficient Text Vectorizer
Home-page: https://github.com/google-research/retvec
Author: Google
Author-email: retvec@google.com
License: Apache License 2.0
Description: # RETVec: Resilient & Efficient Text Vectorizer
        
        
        ## Overview
        RETVec is a next-gen text vectorizer designed to offer built-in adversarial resilience using robust word embeddings. Read the paper here: https://arxiv.org/abs/2302.09207.
        
        RETVec is trained to be resilient against character manipulations including insertion, deletion, typos, homoglyphs, LEET substitution, and more. The RETVec model is trained on top of a novel character embedding which can encode all UTF-8 characters and words. Thus, RETVec works out-of-the-box on over 100 languages without the need for a lookup table or fixed vocabulary size. Furthermore, RETVec is a layer, which means that it can be inserted into any TF model without the need for a separate pre-processing step.
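        
        To illustrate why a character-level encoding needs no lookup table or fixed vocabulary, here is a minimal sketch that turns each character into a fixed-width binary vector of its Unicode code point. This is not the RETVec API: the helper names `encode_char` and `encode_word`, the 24-bit width, and the 16-character word length are assumptions chosen for illustration only.
        
        ```python
        from typing import List


        def encode_char(ch: str, bits: int = 24) -> List[int]:
            """Return the code point of `ch` as `bits` binary digits, least significant bit first."""
            code_point = ord(ch)
            if code_point >= 1 << bits:
                raise ValueError("code point does not fit in the requested width")
            return [(code_point >> i) & 1 for i in range(bits)]


        def encode_word(word: str, max_chars: int = 16, bits: int = 24) -> List[List[int]]:
            """Encode a word as a max_chars x bits matrix, truncating or zero-padding as needed."""
            rows = [encode_char(ch, bits) for ch in word[:max_chars]]
            padding = [[0] * bits for _ in range(max_chars - len(rows))]
            return rows + padding


        # Any script is handled the same way, with no vocabulary to build or look up.
        latin = encode_word("hello")
        cyrillic = encode_word("привет")
        ```
        
        Because every word maps to a matrix of the same shape regardless of script, a downstream model can consume it directly; the learned embedding that RETVec trains on top of its character encoding is outside the scope of this sketch.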
        
        
        ### Getting started
        
        #### Installation
        
        You can use pip to install the TensorFlow version of RETVec:
        
        ```bash
        pip install retvec
        ```
        
        RETVec has been tested on TensorFlow 2.6+ and Python 3.7+.
        
        ### Basic usage
        
        `training/train_tf_retvec_models.py` is the RETVec model training script. Example usage:
        
        ```bash
        python training/train_tf_retvec_models.py --train_config <train_config_path> --model_config <model_config_path> --output_dir <output_path>
        ```
        
        Configurations for our base models are under the `configs/` folder.
        
        ### Colab
        
        Colab for training and releasing a new RETVec model: `notebooks/train_and_relase_a_rewnet.ipynb`
        
        Hello world colab: `notebooks/hello_world.ipynb`
        
        ## Disclaimer
        This is not an official Google product.
        
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Provides-Extra: tensorflow
Provides-Extra: dev
