Metadata-Version: 2.1
Name: ecco
Version: 0.1.0
Summary: Visualization tools for NLP machine learning models.
Home-page: https://github.com/jalammar/ecco
Author: Jay Alammar
Author-email: alammar@gmail.com
License: BSD-3-Clause
Project-URL: Changelog, https://github.com/jalammar/ecco/blob/master/CHANGELOG.rst
Project-URL: Issue Tracker, https://github.com/jalammar/ecco/issues
Keywords: Natural Language Processing,Explainable AI
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: Unix
Classifier: Operating System :: POSIX
Classifier: Operating System :: Microsoft :: Windows
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Utilities
Requires-Python: !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*
Requires-Dist: transformers (~=4.2)
Requires-Dist: seaborn (~=0.11)
Requires-Dist: scikit-learn (~=0.23)
Requires-Dist: PyYAML (~=5.4)
Requires-Dist: captum (~=0.4)
Provides-Extra: dev
Requires-Dist: pytest (>=6.1) ; extra == 'dev'


.. image:: https://ar.pegg.io/img/ecco-logo-w-800.png
    :alt: Ecco Logo




Ecco is a Python library for explaining Natural Language Processing models using interactive visualizations.

It provides multiple interfaces to aid the explanation and intuition of `Transformer
<https://jalammar.github.io/illustrated-transformer/>`_-based language models. Read: `Interfaces for Explaining Transformer Language Models <https://jalammar.github.io/explaining-transformers/>`_.

Ecco runs inside Jupyter notebooks. It is built on top of `PyTorch
<https://pytorch.org/>`_ and `transformers
<https://github.com/huggingface/transformers>`_.

The library is currently an alpha release of a research project and is not production ready. Contributions to make it better are welcome!

Installation
============


.. code-block:: bash

    # Assumes PyTorch is already installed
    pip install ecco


Documentation
=============


To use the project:

.. code-block:: python

    import ecco

    # Load pre-trained language model. Setting 'activations' to True tells Ecco to capture neuron activations.
    lm = ecco.from_pretrained('distilgpt2', activations=True)

    # Input text
    text = "The countries of the European Union are:\n1. Austria\n2. Belgium\n3. Bulgaria\n4."

    # Generate 20 tokens to complete the input text.
    output = lm.generate(text, generate=20, do_sample=True)

    # Ecco will output each token as it is generated.

    # 'output' now contains the data captured from this run, including the input and output tokens
    # as well as neuron activations and input saliency values. 

    # To view the input saliency
    output.saliency()

This does the following:

1. It loads a pretrained Hugging Face DistilGPT2 model and wraps it in an Ecco ``LM`` object that adds useful functionality (e.g. it calculates input saliency and can collect neuron activations).
2. We tell the model to generate 20 tokens.
3. The model returns an Ecco ``OutputSeq`` object. This object holds the output sequence along with data captured during the generation run, including the input sequence and input saliency values. If ``activations=True`` is set in ``from_pretrained()``, it also contains neuron activation values.
4. ``output`` can now produce various interactive explorables. Examples include:

- ``output.saliency()`` to generate an input saliency explorable [`Input Saliency Colab Notebook <https://colab.research.google.com/github/jalammar/ecco/blob/main/notebooks/Ecco_Input_Saliency.ipynb>`_]
- ``output.run_nmf()`` to explore non-negative matrix factorization of neuron activations [`Neuron Activation Colab Notebook <https://colab.research.google.com/github/jalammar/ecco/blob/main/notebooks/Ecco_Neuron_Factors.ipynb>`_]

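As a rough illustration of what an input saliency score can look like, the sketch below computes gradient × input scores with NumPy on synthetic arrays. It is not Ecco's implementation: the function name ``gradient_x_input_saliency``, the random data, and the choice of gradient × input as the attribution method are all assumptions made for this example.

.. code-block:: python

    import numpy as np

    def gradient_x_input_saliency(gradients, embeddings):
        """Return one normalized saliency score per input token.

        Both arrays have shape (tokens, embedding_dim). A token's score is the
        L2 norm of the element-wise product, rescaled so all scores sum to 1.
        """
        products = gradients * embeddings
        norms = np.linalg.norm(products, axis=1)
        return norms / norms.sum()

    rng = np.random.default_rng(0)
    tokens = ["The", " countries", " of", " the"]

    # Synthetic stand-ins: a real run would take these from the model.
    embeddings = rng.normal(size=(len(tokens), 8))
    gradients = rng.normal(size=(len(tokens), 8))

    scores = gradient_x_input_saliency(gradients, embeddings)

    # Print each token with its share of the total saliency.
    for token, score in zip(tokens, scores):
        print(f"{token!r}: {score:.1%}")

Because the scores are normalized to sum to 1, each one can be read as a token's share of the total attribution.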

.. code-block:: python

    # To view the input saliency explorable
    output.saliency()

    # To view input saliency with more details (a bar and % value for each token)
    output.saliency(style="detailed")

    # output.activations contains the neuron activation values. It has the shape: (layer, neuron, token position)

    # We can run non-negative matrix factorization using run_nmf. We pass the number of factors/components to break down into
    nmf_1 = output.run_nmf(n_components=10)

    # nmf_1 now contains the necessary data to create the interactive NMF explorable:
    nmf_1.explore()

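To make the factorization step more concrete, here is a minimal NumPy sketch of non-negative matrix factorization applied to an array with the ``(layer, neuron, token position)`` shape described above. It is an illustration under stated assumptions, not what ``run_nmf`` does internally: the ``nmf`` helper, the multiplicative-update algorithm, the synthetic non-negative activations, and the way layers are flattened together are all choices made for this example.

.. code-block:: python

    import numpy as np

    def nmf(matrix, n_components, n_iter=200, seed=0, eps=1e-9):
        """Factor a non-negative matrix into w @ h using multiplicative updates."""
        rng = np.random.default_rng(seed)
        rows, cols = matrix.shape
        w = rng.random((rows, n_components)) + eps
        h = rng.random((n_components, cols)) + eps
        for _ in range(n_iter):
            # Each update rescales the current factors, so they stay non-negative.
            h *= (w.T @ matrix) / (w.T @ w @ h + eps)
            w *= (matrix @ h.T) / (w @ h @ h.T + eps)
        return w, h

    rng = np.random.default_rng(0)
    n_layers, n_neurons, n_tokens = 2, 16, 12

    # Synthetic, non-negative stand-in for output.activations.
    activations = np.abs(rng.normal(size=(n_layers, n_neurons, n_tokens)))

    # Treat every neuron in every layer as one row: (layers * neurons, tokens).
    flat = activations.reshape(n_layers * n_neurons, n_tokens)

    neuron_weights, token_factors = nmf(flat, n_components=3)
    print(token_factors.shape)  # (3, 12): one row per factor, one column per token

Each row of ``token_factors`` shows how strongly one factor is active at each token position, which is the kind of per-token signal a factor visualization can display.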



Changelog
=========

0.0.8 (2020-11-20)
------------------

* Allowing the project some fresh air.

