Metadata-Version: 2.1
Name: nmtpytorch
Version: 3.0.0
Summary: Sequence-to-Sequence Framework in Pytorch
Home-page: https://github.com/lium-lst/nmtpytorch
Author: Ozan Caglayan
Author-email: ozan.caglayan@univ-lemans.fr
License: MIT
Project-URL: Wiki, https://github.com/lium-lst/nmtpytorch/wiki
Description: ![nmtpytorch](https://github.com/lium-lst/nmtpytorch/blob/master/docs/logo.png?raw=true "nmtpytorch")
        
        [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
        [![Python 3.6](https://img.shields.io/badge/python-3.6-blue.svg)](https://www.python.org/downloads/release/python-360/)
        
        `nmtpytorch` allows training of various end-to-end neural architectures including
        but not limited to neural machine translation, image captioning and automatic
        speech recognition systems. The initial codebase was in `Theano` and was
        inspired by the famous [dl4mt-tutorial](https://github.com/nyu-dl/dl4mt-tutorial)
        codebase.
        
        `nmtpytorch` is mainly developed by the **Language and Speech Team** of **Le Mans University** but
        receives valuable contributions from the [Grounded Sequence-to-sequence Transduction Team](https://github.com/srvk/jsalt-2018-grounded-s2s)
        of *Frederick Jelinek Memorial Summer Workshop 2018*:
        
        Loic Barrault, Ozan Caglayan, Amanda Duarte, Desmond Elliott, Spandana Gella, Nils Holzenberger,
        Chirag Lala, Jasmine (Sun Jae) Lee, Jindřich Libovický, Pranava Madhyastha,
        Florian Metze, Karl Mulligan, Alissa Ostapenko, Shruti Palaskar, Ramon Sanabria, Lucia Specia and Josiah Wang.
        
        If you use **nmtpytorch**, you may want to cite the following [paper](https://ufal.mff.cuni.cz/pbml/109/art-caglayan-et-al.pdf):
        ```
        @article{nmtpy2017,
          author    = {Ozan Caglayan and
                       Mercedes Garc\'{i}a-Mart\'{i}nez and
                       Adrien Bardet and
                       Walid Aransa and
                       Fethi Bougares and
                       Lo\"{i}c Barrault},
          title     = {NMTPY: A Flexible Toolkit for Advanced Neural Machine Translation Systems},
          journal   = {Prague Bull. Math. Linguistics},
          volume    = {109},
          pages     = {15--28},
          year      = {2017},
          url       = {https://ufal.mff.cuni.cz/pbml/109/art-caglayan-et-al.pdf},
          doi       = {10.1515/pralin-2017-0035},
          timestamp = {Tue, 12 Sep 2017 10:01:08 +0100}
        }
        ```
        
        ## Installation
        
        `nmtpytorch` currently requires `python>=3.6` and `torch==0.4.1`.
        We are not planning to support Python 2.x.
        
        **IMPORTANT:** After installing `nmtpytorch`, you **need** to run `nmtpy-install-extra`
        to download METEOR related files into your `${HOME}/.nmtpy` folder.
        This step is only required once.
        
        ### pip
        
        You can install `nmtpytorch` from `PyPI` using `pip` (or `pip3` depending on your
        operating system and environment):
        
        ```
        $ pip install nmtpytorch
        ```
        
        This will automatically fetch and install the dependencies as well. For the `torch`
        dependency it will specifically install the `torch 0.4.1` package from `PyPI` that
        ships `CUDA 9.0` within.
        
        ### conda
        
        We provide an `environment.yml` file in the repository that you can use to create
        a ready-to-use anaconda environment for `nmtpytorch`:
        
        ```
        $ conda update --all
        $ git clone https://github.com/lium-lst/nmtpytorch.git
        $ conda env create -f nmtpytorch/environment.yml
        ```
        
        ### Development Mode
        
        For continuous development and testing, it is sufficient to run `python setup.py develop`
        in the root folder of your Git checkout. From now on, all modifications to the source
        tree are directly taken into account without requiring reinstallation.
        
        ## Documentation
        
        We currently only provide some preliminary documentation in our [wiki](https://github.com/lium-lst/nmtpytorch/wiki).
        
        ## Release Notes
        
        ### v3.0.0 (05/10/2018)
        Major release that brings support for **Pytorch 0.4** and drops support for **0.3**.
        
        Training and testing on **CPUs** are now supported thanks to the simpler device
        semantics of Pytorch 0.4: just pass `-d cpu` to `nmtpy` to switch to CPU mode.
        NOTE: Training on CPUs only makes sense for debugging; otherwise it is very slow.
        
          - NOTE: `device_id` is no longer a configuration option. It should be removed
            from your old configurations.
          - Multi-GPU is not supported. Always restrict to a single GPU using the
            `CUDA_VISIBLE_DEVICES` environment variable.
        
        At inference time, you can now override the config options that were used
        to train a model. Example: `nmtpy translate (...) -x model.att_temp:0.9`
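        
        A minimal sketch of how such `section.key:value` overrides could be applied
        to a nested options dictionary. This is not the actual `nmtpy` parser: the
        `apply_overrides` helper and the literal evaluation of values are assumptions
        made for illustration only.
        
        ```python
        import ast
        
        
        def apply_overrides(config, overrides):
            """Return a copy of `config` with `section.key:value` overrides applied."""
            # Copy one level deep so the caller's sections are left untouched.
            result = {section: dict(opts) for section, opts in config.items()}
            for item in overrides:
                target, _, raw = item.partition(":")
                section, _, key = target.partition(".")
                if not section or not key or not raw:
                    raise ValueError(f"expected section.key:value, got {item!r}")
                try:
                    # Interpret numbers, booleans, etc.; fall back to a plain string.
                    value = ast.literal_eval(raw)
                except (ValueError, SyntaxError):
                    value = raw
                result.setdefault(section, {})[key] = value
            return result
        
        
        # Hypothetical options stored with a trained model.
        trained = {"model": {"att_temp": 1.0, "sampler_type": "bucket"}}
        updated = apply_overrides(trained, ["model.att_temp:0.9"])
        print(updated["model"]["att_temp"])
        ```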
        
        `nmtpy train` now detects invalid/old `[train]` options and refuses to
        train the model.
        
        **New sampler:** `ApproximateBucketBatchSampler`
        Similar to the default `BucketBatchSampler` but more efficient for sparsely
        distributed sequence-lengths as in speech recognition. It bins similar-length
        items to buckets. It no longer guarantees that the batches are completely
        made of same-length sequences so **care has to be taken in the encoders**
        to support packing/padding/masking. `TextEncoder` already does this automatically
        while speech encoder `BiLSTMp` does not care.
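        
        The binning idea can be sketched as follows. This is an illustration only,
        not the actual `ApproximateBucketBatchSampler` code: the function name
        `approximate_buckets` and the fixed `bin_width` parameter are assumptions.
        
        ```python
        import random
        from collections import defaultdict
        
        
        def approximate_buckets(lengths, batch_size, bin_width=10, seed=0):
            """Group item indices into batches of similar (not identical) length."""
            rng = random.Random(seed)
            # Items whose lengths fall into the same `bin_width`-wide range share a bin.
            bins = defaultdict(list)
            for idx, length in enumerate(lengths):
                bins[length // bin_width].append(idx)
        
            batches = []
            for members in bins.values():
                rng.shuffle(members)
                # Slice every bin into batches of at most `batch_size` items.
                for start in range(0, len(members), batch_size):
                    batches.append(members[start:start + batch_size])
            rng.shuffle(batches)
            return batches
        
        
        lengths = [3, 12, 5, 18, 7, 11, 25, 29]
        for batch in approximate_buckets(lengths, batch_size=2):
            # Lengths inside a batch may differ, hence the need for padding/masking.
            print([lengths[i] for i in batch])
        ```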
        
        **EXPERIMENTAL**: You can decode an ASR system using the approximate sampler
        although the model does not take care of the padded positions (a warning
        will be printed at each batch).
        The degradation is 0.2% WER for a specific dataset that we tried. So although the computations
        in the encoder become noisy and not totally correct, the model can handle
        this noise quite robustly:
        
        `$ nmtpy translate -s val -o hyp -x model.sampler_type:approximate best_asr.ckpt`
        
        This type of batching cuts ASR decoding time by a factor of almost 2-3.
        
        #### Other changes
          - Vocabularies generated by `nmtpy-build-vocab` now contain frequency
            information as well. The code is backward-compatible with old vocab files.
          - `Batch` objects should now be explicitly moved to the allocated device
            using the `.device()` method. See `mainloop.py` and `test_performance()` from
            the `NMT` model.
          - Training no longer shows the cached GPU allocation from `nvidia-smi` output
            as it was in the end a hacky thing to call `nvidia-smi` periodically. We
            plan to use `torch.cuda.*` to get an estimate on memory consumption.
          - NOTE: Multi-process data loading is temporarily disabled as it was
            crashing from time to time so `num_workers > 0` does not have an effect
            in this release.
          - `Attention` is separated into `DotAttention` and `MLPAttention` and a
            convenience function `get_attention()` is provided to select between them
            during model construction.
          - `get_activation_fn()` should be used to select between non-linearities
            dynamically instead of doing `getattr(nn.functional, activ)`. The latter
            will not work for `tanh` and `sigmoid` in the next Pytorch releases.
          - Simplification: `ASR` model is now derived from `NMT`.
        
        
        ### v2.0.0 (26/09/2018)
          - Ability to install through `pip`.
          - Advanced layers are now organized into subfolders.
          - New basic layers: Convolution over sequence, MaxMargin.
          - New attention layers: Co-attention, multi-head attention, hierarchical attention.
          - New encoders: Arbitrary sequence-of-vectors encoder, BiLSTMp speech feature encoder.
          - New decoders: Multi-source decoder, switching decoder, vector decoder.
          - New datasets: Kaldi dataset (.ark/.scp reader), Shelve dataset, Numpy sequence dataset.
          - Added learning rate annealing: See `lr_decay*` options in `config.py`.
          - Removed subword-nmt and METEOR files from repository. We now depend on
            the PIP package for subword-nmt. For METEOR, `nmtpy-install-extra` should
            be launched after installation.
          - More multi-task and multi-input/output `translate` and `training` regimes.
          - New early-stopping metrics: Character and word error rate (cer,wer) and ROUGE (rouge).
          - Curriculum learning option for the `BucketBatchSampler`, i.e. length-ordered batches.
          - New models:
             - ASR: Listen-attend-and-spell like automatic speech recognition
             - Multitask*: Experimental multi-tasking & scheduling between many inputs/outputs.
        
        ### v1.4.0 (09/05/2018)
          - Add `environment.yml` for easy installation using `conda`. You can now
          create a ready-to-use `conda` environment by just calling `conda env create -f environment.yml`.
          - Make `NumpyDataset` memory efficient by keeping `float16` arrays as they are
          until batch creation time.
          - Rename `Multi30kRawDataset` to `Multi30kDataset` which now supports both
          raw image files and pre-extracted visual features file stored as `.npy`.
          - Add CNN feature extraction script under `scripts/`.
          - Add doubly stochastic attention to `ShowAttendAndTell` and multimodal NMT.
          - New model `MNMTDecinit` to initialize decoder with auxiliary features.
          - New model `AMNMTFeatures` which is the attentive MMT but with features file
          instead of end-to-end feature extraction which was memory hungry.
        
        ### v1.3.2 (02/05/2018)
        
          - Updates to `ShowAttendAndTell` model.
        
        ### v1.3.1 (01/05/2018)
        
          - Removed old `Multi30kDataset`.
          - Sort batches by source sequence length instead of target.
          - Fix `ShowAttendAndTell` model. It should now work.
        
        ### v1.3 (30/04/2018)
        
         - Added `Multi30kRawDataset` for training end-to-end systems from raw images as input.
         - Added `NumpyDataset` to read `.npy/.npz` tensor files as input features.
         - You can now pass `-S` to `nmtpy train` to produce shorter experiment file names that do not include all the hyperparameters.
         - New post-processing filter option `de-spm` for Google SentencePiece (SPM) processed files.
         - `sacrebleu` is now a dependency as it is now accepted as an early-stopping metric.
         It only makes sense to use it with SPM processed files since they are detokenized
         once post-processed.
         - Added `sklearn` as a dependency for some metrics.
         - Added `momentum` and `nesterov` parameters to `[train]` section for SGD.
         - `ImageEncoder` layer is improved in many ways. Please see the code for further details.
         - Added unmerged upstream [PR](https://github.com/pytorch/pytorch/pull/5297/files) for `ModuleDict()` support.
         - `METEOR` will now fall back to English if the language cannot be detected from file suffixes.
         - `-f` now produces a separate numpy file for token frequencies when building vocabulary files with `nmtpy-build-vocab`.
         - Added new command `nmtpy test` for non beam-search inference modes.
         - Removed `nmtpy resume` command and added `pretrained_file` option for `[train]` to initialize model weights from a checkpoint.
         - Added `freeze_layers` option for `[train]` to give comma-separated list of layer name prefixes to freeze.
         - Improved seeding: seed is now printed in order to reproduce the results.
         - Added IPython notebook for attention visualization.
         - **Layers**
           - New shallow `SimpleGRUDecoder` layer.
           - `TextEncoder`: Ability to set `maxnorm` and `gradscale` of embeddings and work with or without sorted-length batches.
           - `ConditionalDecoder`: Make it work with GRU/LSTM, allow setting `maxnorm/gradscale` for embeddings.
           - `ConditionalMMDecoder`: Same as above.
         - **nmtpy translate**
           - `--avoid-double` and `--avoid-unk` removed for now.
           - Added Google's length penalty normalization switch `--lp-alpha`.
           - Added ensembling which is enabled automatically if you give more than one model checkpoint.
         - New machine learning metric wrappers in `utils/ml_metrics.py`:
           - Label-ranking average precision `lrap`
           - Coverage error
           - Mean reciprocal rank
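        
        For reference, the length penalty proposed for Google's NMT system
        (Wu et al., 2016) divides a hypothesis log-probability by
        `((5 + |Y|) / 6) ** alpha`. The sketch below only illustrates that formula.
        How `--lp-alpha` is wired into `nmtpy translate` is not shown, and the
        function names are hypothetical.
        
        ```python
        def length_penalty(length, alpha):
            """GNMT length penalty: ((5 + |Y|) / (5 + 1)) ** alpha."""
            return ((5.0 + length) / 6.0) ** alpha
        
        
        def normalized_score(log_prob, length, alpha):
            """Divide a hypothesis log-probability by its length penalty."""
            return log_prob / length_penalty(length, alpha)
        
        
        # With alpha = 0 the penalty is 1 and scores are left untouched.
        print(normalized_score(-6.0, length=7, alpha=0.0))
        # With alpha = 1 a 7-token hypothesis is divided by (5 + 7) / 6 = 2.
        print(normalized_score(-6.0, length=7, alpha=1.0))
        ```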
        
        ### v1.2 (20/02/2018)
        
         - You can now use `$HOME` and `$USER` in your configuration files.
         - Fixed an overflow error that would cause NMT with more than 255 tokens to fail.
         - METEOR worker process is now correctly killed after validations.
         - Multiple runs of an experiment are now suffixed with a unique random string instead of incremental integers to avoid race conditions in cluster setups.
         - Replaced `utils.nn.get_network_topology()` with a new `Topology` [class](nmtpytorch/utils/topology.py) that will parse the `direction` string of the model in a smarter way.
         - If `CUDA_VISIBLE_DEVICES` is set, the `GPUManager` will always honor it.
         - Dropped creation of temporary/advisory lock files under `/tmp` for GPU reservation.
         - Time measurements during training are now structured into batch overhead, training and evaluation timings.
         - **Datasets**
           - Added `TextDataset` for standalone text file reading.
           - Added `OneHotDataset`, a variant of `TextDataset` where the sequences are not prefixed/suffixed with `<bos>` and `<eos>` respectively.
           - Added experimental `MultiParallelDataset` that merges an arbitrary number of parallel datasets together.
         - **nmtpy translate**
           - `.nodbl` and `.nounk` suffixes are now added to output files for `--avoid-double` and `--avoid-unk` arguments respectively.
           - A model-agnostic enough `beam_search()` is now separated out into its own file `nmtpytorch/search.py`.
           - `max_len` default is increased to 200.
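        
        The effect of `$HOME`/`$USER` expansion can be illustrated with Python's
        standard library. This is a sketch of the behaviour, not the code that
        `nmtpy` uses: the `expand_value` helper and the sample path are hypothetical.
        
        ```python
        import os
        from string import Template
        
        
        def expand_value(value, env=None):
            """Replace $HOME and $USER in a configuration value."""
            env = os.environ if env is None else env
            # Only these two variables are expanded; anything else is left as-is.
            allowed = {name: env[name] for name in ("HOME", "USER") if name in env}
            return Template(value).safe_substitute(allowed)
        
        
        # A fixed environment keeps the example reproducible.
        fake_env = {"HOME": "/home/alice", "USER": "alice"}
        print(expand_value("$HOME/experiments/$USER/nmt", fake_env))
        ```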
        
        ### v1.1 (25/01/2018)
        
         - New experimental `Multi30kDataset` and `ImageFolderDataset` classes
         - `torchvision` dependency added for CNN support
         - `nmtpy-coco-metrics` now computes one METEOR without `norm=True`
         - Mainloop mechanism is completely refactored with **backward-incompatible**
           configuration option changes for `[train]` section:
            - `patience_delta` option is removed
            - Added `eval_batch_size` to define batch size for GPU beam-search during training
            - `eval_freq` default is now `3000` which means per `3000` minibatches
            - `eval_metrics` now defaults to `loss`. As before, you can provide a list
              of metrics like `bleu,meteor,loss` to compute all of them and early-stop
              based on the first
            - Added `eval_zero (default: False)` which evaluates the model
              once on dev set right before the training starts. Useful for sanity
              checking if you fine-tune a model initialized with pre-trained weights
            - Removed `save_best_n`: we no longer save the best `N` models on dev set
              w.r.t. early-stopping metric
            - Added `save_best_metrics (default: True)` which will save best models
              on dev set w.r.t each metric provided in `eval_metrics`. This kind of
              remedies the removal of `save_best_n`
            - `checkpoint_freq` now defaults to `5000` which means per `5000`
              minibatches.
            - Added `n_checkpoints (default: 5)` to define the number of last
              checkpoints that will be kept if `checkpoint_freq > 0` i.e. checkpointing enabled
          - Added `ExtendedInterpolation` support to configuration files:
            - You can now define intermediate variables in `.conf` files to avoid
              typing same paths again and again. A variable can be referenced
              from within its **section** using `tensorboard_dir: ${save_path}/tb` notation
              Cross-section references are also possible: `${data:root}` will be replaced
              by the value of the `root` variable defined in the `[data]` section.
          - Added `-p/--pretrained` to `nmtpy train` to initialize the weights of
            the model using another checkpoint `.ckpt`.
          - Improved input/output handling for `nmtpy translate`:
            - `-s` accepts a comma-separated list of test sets **defined** in the configuration
              file of the experiment to translate them at once. Example: `-s val,newstest2016,newstest2017`
            - The mutually exclusive counterpart of `-s` is `-S` which receives a
              single input file of source sentences.
            - In both cases, an output prefix **should now be** provided with `-o`.
              In the case of multiple test sets, the name of the test set and the beam
              size will be appended to the output prefix. If you just provide a single file
              with `-S`, the final output name will only reflect the beam size information.
         - Two new arguments for `nmtpy-build-vocab`:
            - `-f`: Stores frequency counts as well inside the final `json` vocabulary
            - `-x`: Does not add special markers `<eos>,<bos>,<unk>,<pad>` into the vocabulary
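        
        The two reference styles described for `ExtendedInterpolation` above can be
        tried with Python's standard `configparser` module. The section contents,
        the paths and the `train_set` key are made up for this example. Only the
        `${save_path}` and `${data:root}` notations come from the notes above.
        
        ```python
        from configparser import ConfigParser, ExtendedInterpolation
        
        # Hypothetical configuration; paths and `train_set` are illustrative only.
        CONF = """
        [data]
        root: /data/multi30k
        [train]
        save_path: /tmp/nmtpy/models
        tensorboard_dir: ${save_path}/tb
        train_set: ${data:root}/train.en
        """
        
        parser = ConfigParser(interpolation=ExtendedInterpolation())
        parser.read_string(CONF)
        
        # Same-section reference: ${save_path} is looked up inside [train].
        print(parser["train"]["tensorboard_dir"])
        # Cross-section reference: ${data:root} is looked up inside [data].
        print(parser["train"]["train_set"])
        ```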
        
        #### Layers/Architectures
        
         - Added `Fusion()` layer to `concat,sum,mul` an arbitrary number of inputs
         - Added *experimental* `ImageEncoder()` layer to seamlessly plug a VGG or ResNet
           CNN using `torchvision` pretrained models
         - `Attention` layer arguments improved. You can now select the bottleneck
           dimensionality for MLP attention with `att_bottleneck`. The `dot`
           attention is **still not tested** and probably broken.
        
        New layers/architectures:
        
         - Added **AttentiveMNMT** which implements modality-specific multimodal attention
           from the paper [Multimodal Attention for Neural Machine Translation](https://arxiv.org/abs/1609.03976)
         - Added **ShowAttendAndTell** [model](http://www.jmlr.org/proceedings/papers/v37/xuc15.pdf)
        
        Changes in **NMT**:
        
          - `dec_init` defaults to `mean_ctx`, i.e. the decoder will be initialized
            with the mean context computed from the source encoder
          - `enc_lnorm` which was just a placeholder is now removed since we do not
            provide layer-normalization for now
          - Beam Search is completely moved to GPU
        
Keywords: nmt neural-mt translation sequence-to-sequence deep-learning pytorch
Platform: UNKNOWN
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.6
Classifier: Operating System :: POSIX
Requires-Python: ~=3.6
Description-Content-Type: text/markdown
