Metadata-Version: 2.1
Name: whatlies
Version: 0.5.0
Summary: Tools to help uncover `whatlies` in word embeddings.
Home-page: https://rasahq.github.io/whatlies/
Author: Vincent D. Warmerdam
License: UNKNOWN
Project-URL: Documentation, https://rasahq.github.io/whatlies/
Project-URL: Source Code, https://github.com/RasaHQ/whatlies/
Project-URL: Issue Tracker, https://github.com/RasaHQ/whatlies/issues
Platform: UNKNOWN
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: License :: OSI Approved :: MIT License
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Description-Content-Type: text/markdown
Requires-Dist: numpy (>=1.16.0)
Requires-Dist: scipy (>=1.2.0)
Requires-Dist: scikit-learn (>=0.20.2)
Requires-Dist: umap-learn (>=0.3.10)
Requires-Dist: altair (>=4.0.1)
Requires-Dist: matplotlib (>=3.2.0)
Requires-Dist: spacy (>=2.2.3)
Requires-Dist: spacy-lookups-data (>=0.3.2)
Requires-Dist: networkx (>=2.4)
Requires-Dist: sense2vec (>=1.0.2)
Requires-Dist: fasttext (>=0.9.1)
Requires-Dist: bpemb (>=0.3.0)
Requires-Dist: gensim (>=3.8.3)
Provides-Extra: all
Requires-Dist: tensorflow (>=2.3.0) ; extra == 'all'
Requires-Dist: tensorflow-text (>=2.3.0) ; extra == 'all'
Requires-Dist: tensorflow-hub (>=0.8.0) ; extra == 'all'
Requires-Dist: transformers (>=3.0.0) ; extra == 'all'
Requires-Dist: ivis[cpu] (>=1.8.0) ; extra == 'all'
Requires-Dist: opentsne (>=0.4.3) ; extra == 'all'
Provides-Extra: base
Requires-Dist: numpy (>=1.16.0) ; extra == 'base'
Requires-Dist: scipy (>=1.2.0) ; extra == 'base'
Requires-Dist: scikit-learn (>=0.20.2) ; extra == 'base'
Requires-Dist: umap-learn (>=0.3.10) ; extra == 'base'
Requires-Dist: altair (>=4.0.1) ; extra == 'base'
Requires-Dist: matplotlib (>=3.2.0) ; extra == 'base'
Requires-Dist: spacy (>=2.2.3) ; extra == 'base'
Requires-Dist: spacy-lookups-data (>=0.3.2) ; extra == 'base'
Requires-Dist: networkx (>=2.4) ; extra == 'base'
Requires-Dist: sense2vec (>=1.0.2) ; extra == 'base'
Requires-Dist: fasttext (>=0.9.1) ; extra == 'base'
Requires-Dist: bpemb (>=0.3.0) ; extra == 'base'
Requires-Dist: gensim (>=3.8.3) ; extra == 'base'
Provides-Extra: dev
Requires-Dist: mkdocs (==1.1) ; extra == 'dev'
Requires-Dist: mkdocs-material (==4.6.3) ; extra == 'dev'
Requires-Dist: mkdocstrings (==0.8.0) ; extra == 'dev'
Requires-Dist: jupyterlab (>=0.35.4) ; extra == 'dev'
Requires-Dist: nbstripout (>=0.3.7) ; extra == 'dev'
Requires-Dist: nbval (>=0.9.5) ; extra == 'dev'
Requires-Dist: torch (>=1.6.0) ; extra == 'dev'
Requires-Dist: flake8 (>=3.6.0) ; extra == 'dev'
Requires-Dist: pytest (>=4.0.2) ; extra == 'dev'
Requires-Dist: black (>=19.3b0) ; extra == 'dev'
Requires-Dist: pytest-cov (>=2.6.1) ; extra == 'dev'
Requires-Dist: pre-commit (>=2.2.0) ; extra == 'dev'
Requires-Dist: tensorflow (>=2.3.0) ; extra == 'dev'
Requires-Dist: tensorflow-text (>=2.3.0) ; extra == 'dev'
Requires-Dist: tensorflow-hub (>=0.8.0) ; extra == 'dev'
Requires-Dist: transformers (>=3.0.0) ; extra == 'dev'
Requires-Dist: ivis[cpu] (>=1.8.0) ; extra == 'dev'
Requires-Dist: opentsne (>=0.4.3) ; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs (==1.1) ; extra == 'docs'
Requires-Dist: mkdocs-material (==4.6.3) ; extra == 'docs'
Requires-Dist: mkdocstrings (==0.8.0) ; extra == 'docs'
Requires-Dist: jupyterlab (>=0.35.4) ; extra == 'docs'
Requires-Dist: nbstripout (>=0.3.7) ; extra == 'docs'
Requires-Dist: nbval (>=0.9.5) ; extra == 'docs'
Provides-Extra: ivis
Requires-Dist: ivis[cpu] (>=1.8.0) ; extra == 'ivis'
Provides-Extra: opentsne
Requires-Dist: opentsne (>=0.4.3) ; extra == 'opentsne'
Provides-Extra: test
Requires-Dist: torch (>=1.6.0) ; extra == 'test'
Requires-Dist: flake8 (>=3.6.0) ; extra == 'test'
Requires-Dist: pytest (>=4.0.2) ; extra == 'test'
Requires-Dist: black (>=19.3b0) ; extra == 'test'
Requires-Dist: pytest-cov (>=2.6.1) ; extra == 'test'
Requires-Dist: nbval (>=0.9.5) ; extra == 'test'
Requires-Dist: pre-commit (>=2.2.0) ; extra == 'test'
Provides-Extra: tfhub
Requires-Dist: tensorflow (>=2.3.0) ; extra == 'tfhub'
Requires-Dist: tensorflow-text (>=2.3.0) ; extra == 'tfhub'
Requires-Dist: tensorflow-hub (>=0.8.0) ; extra == 'tfhub'
Provides-Extra: transformers
Requires-Dist: transformers (>=3.0.0) ; extra == 'transformers'

![](https://img.shields.io/pypi/v/whatlies)
![](https://img.shields.io/pypi/pyversions/whatlies)
![](https://img.shields.io/github/license/rasahq/whatlies)
[![Downloads](https://pepy.tech/badge/whatlies)](https://pepy.tech/project/whatlies)

<img src="docs/logo.png" width=255 height=255 align="right">

# whatlies

A library that tries to help you understand (note the pun).

> "What lies in word embeddings?"

This small library offers tools that make it easier to visualise
word embeddings as well as operations on them.

Feedback is welcome.

<img src="docs/square-logo.svg" width=200 height=200 align="right">

## Produced

This project was initiated at [Rasa](https://rasa.com) as a fun side project
that supports the research and developer advocacy teams at Rasa.
It is maintained by Vincent D. Warmerdam, Research Advocate at Rasa.

## Getting Started

For a quick overview, check out our introductory video on
[YouTube](https://www.youtube.com/watch?v=FwkwC7IJWO0&list=PL75e0qA87dlG-za8eLI6t0_Pbxafk-cxb&index=9&t=0s). More
in-depth getting-started guides can be found on the [documentation page](https://rasahq.github.io/whatlies/).

## Features

The idea is that you can load embeddings from a language backend
and apply mathematical operations to them.

```python
from whatlies import EmbeddingSet
from whatlies.language import SpacyLanguage

lang = SpacyLanguage("en_core_web_md")
words = ["cat", "dog", "fish", "kitten", "man", "woman",
         "king", "queen", "doctor", "nurse"]

emb = EmbeddingSet(*[lang[w] for w in words])
emb.plot_interactive(x_axis=emb["man"], y_axis=emb["woman"])
```
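Passing an embedding as `x_axis` or `y_axis` turns that embedding into an axis of the chart. The sketch below illustrates the idea with plain numpy, assuming that each point's coordinate is the projection of its vector onto the axis vector, measured in units of that axis (`v · a / (a · a)`). The toy 3-dimensional vectors and the helper `axis_coordinate` are hypothetical and are not part of the whatlies API.

```python
import numpy as np


def axis_coordinate(vector: np.ndarray, axis: np.ndarray) -> float:
    """Projection of `vector` onto `axis`, measured in units of `axis` (assumed scheme)."""
    return float(vector.dot(axis) / axis.dot(axis))


# Toy 3-dimensional "embeddings"; real word vectors have hundreds of dimensions.
vectors = {
    "man": np.array([1.0, 0.2, 0.0]),
    "woman": np.array([0.2, 1.0, 0.0]),
    "king": np.array([0.9, 0.3, 0.8]),
}

# Use two embeddings as the x and y axes, like x_axis=emb["man"], y_axis=emb["woman"].
x_axis, y_axis = vectors["man"], vectors["woman"]
points = {
    word: (axis_coordinate(v, x_axis), axis_coordinate(v, y_axis))
    for word, v in vectors.items()
}

for word, (x, y) in points.items():
    print(f"{word:>6}: x={x:.2f}, y={y:.2f}")
```

Under this assumption, "man" lands exactly at `x=1` on its own axis, and every other word is placed according to how much it points in the "man" and "woman" directions.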

![](docs/gif-zero.gif)

You can even do fancy operations, like projecting onto and away
from vector embeddings! You can perform these on single embeddings as
well as on sets of embeddings. In the example below we attempt
to filter away gender bias using linear algebra operations.

```python
orig_chart = emb.plot_interactive('man', 'woman')

new_ts = emb | (emb['king'] - emb['queen'])
new_chart = new_ts.plot_interactive('man', 'woman')
```
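"Projecting away" from a direction means removing the component of each vector that points along that direction, so that what remains is orthogonal to it. Here is a minimal numpy sketch of that linear algebra, assuming `|` behaves as the text above describes; `project_onto`, `project_away` and the toy vectors are hypothetical illustrations, not whatlies internals.

```python
import numpy as np


def project_onto(v: np.ndarray, u: np.ndarray) -> np.ndarray:
    """Component of v that points along u."""
    return (v.dot(u) / u.dot(u)) * u


def project_away(v: np.ndarray, u: np.ndarray) -> np.ndarray:
    """Component of v that is orthogonal to u (v minus its projection onto u)."""
    return v - project_onto(v, u)


# Toy vectors; `direction` plays the role of emb['king'] - emb['queen'].
king = np.array([0.9, 0.3, 0.8])
queen = np.array([0.3, 0.9, 0.8])
direction = king - queen

doctor = np.array([0.5, 0.2, 0.6])
doctor_debiased = project_away(doctor, direction)

# After projecting away, nothing of `direction` is left in the vector.
print(np.isclose(doctor_debiased.dot(direction), 0.0))  # True
# The projection and the remainder add back up to the original vector.
print(np.allclose(project_onto(doctor, direction) + doctor_debiased, doctor))  # True
```

Applying the same function to every vector in a set gives the set-level version of the operation.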

![](docs/gif-one.gif)

There are also transformations like **pca** and **umap**.

```python
from whatlies.transformers import pca, umap

orig_chart = emb.plot_interactive('man', 'woman')
pca_plot = emb.transform(pca(2)).plot_interactive('pca_0', 'pca_1')
umap_plot = emb.transform(umap(2)).plot_interactive('umap_0', 'umap_1')

pca_plot | umap_plot
```
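The axes `'pca_0'` and `'pca_1'` in the snippet refer to the first two components produced by `pca(2)`. As a rough illustration of what a two-component PCA does, here is a plain numpy version based on the singular value decomposition. This is a sketch, not how whatlies implements its transformer, and the random matrix is a stand-in for real word vectors. UMAP is a non-linear method and is not covered here.

```python
import numpy as np


def pca_2d(matrix: np.ndarray) -> np.ndarray:
    """Project the rows of `matrix` onto their first two principal components."""
    # PCA works on mean-centred data.
    centered = matrix - matrix.mean(axis=0)
    # The right singular vectors of the centred data are the principal axes,
    # ordered by how much variance they explain.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


rng = np.random.default_rng(42)
words = ["cat", "dog", "fish", "kitten", "man", "woman"]
matrix = rng.normal(size=(len(words), 300))  # stand-in for 300-d word vectors

coords = pca_2d(matrix)
for word, (pca_0, pca_1) in zip(words, coords):
    print(f"{word:>6}: pca_0={pca_0:+.2f}, pca_1={pca_1:+.2f}")
```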

![](docs/gif-two.gif)

We even support BERT-style contextual embeddings. Just mark the token you want with square brackets.

```python
lang = SpacyLanguage("en_trf_robertabase_lg")
lang['programming in [python]']
```

You'll now get the embedding for the token "python", but in the context of "programming in python".
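One way to read the bracket convention is: strip the brackets to recover the full sentence, and remember the character span of the marked token so that a contextual model can embed the whole sentence and then pick out the vector for that span. The helper below, `parse_context`, is hypothetical and only illustrates that assumed parsing step; it is not whatlies' internal code.

```python
import re
from typing import Tuple

# Matches one bracketed token such as "[python]" and captures its contents.
BRACKETED = re.compile(r"\[([^\[\]]+)\]")


def parse_context(query: str) -> Tuple[str, Tuple[int, int]]:
    """Split 'programming in [python]' into the plain sentence and the
    (start, end) character span of the bracketed token."""
    matches = list(BRACKETED.finditer(query))
    if len(matches) != 1:
        raise ValueError("mark exactly one token with [square brackets]")
    match = matches[0]
    token = match.group(1)
    start = match.start()
    # Rebuild the sentence without the brackets.
    sentence = query[:start] + token + query[match.end():]
    return sentence, (start, start + len(token))


sentence, (start, end) = parse_context("programming in [python]")
print(sentence)             # programming in python
print(sentence[start:end])  # python
```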

## Documentation

To learn more and for a getting started guide, check out the [documentation](https://rasahq.github.io/whatlies/).

## Installation

To install the package along with its core dependencies, simply run:

```bash
pip install whatlies
```

## Similar Projects

There are other projects working on similar tools, and we figured it was fair to mention and compare them here.

##### Julia Bazińska & Piotr Migdal Web App

The original inspiration for this project came from [this web app](https://lamyiowce.github.io/word2viz/) and [this PyData talk](https://www.youtube.com/watch?v=AGgCqpouKSs). The web app takes a while to load,
but it is really fun to play with. The goal of this project is to make it
easier to create similar charts from Jupyter using different language backends.


##### Tensorflow Projector

From Google there's the [TensorFlow Projector](https://projector.tensorflow.org/). It offers
highly interactive 3D visualisations as well as some transformations via TensorBoard.

- The TensorFlow Projector creates projections in TensorBoard, which you can also load
into a Jupyter notebook, whereas whatlies makes visualisations directly.
- The TensorFlow Projector supports interactive 3D visuals, which whatlies currently doesn't.
- Whatlies offers lego bricks that you can chain together to get a visualisation started. This
also means that you're more flexible when it comes to transforming data before visualising it.

##### Parallax

From Uber AI Labs there's [parallax](https://github.com/uber-research/parallax), which is described
in a paper [here](https://arxiv.org/abs/1905.12099). The two tools share a common mindset:
the goal is to use arbitrary user-defined projections to understand embedding spaces better.
That said, there are some differences worth mentioning.

- Parallax relies on Bokeh as a visualisation backend and offers many visualisation types
(like radar plots). Whatlies uses Altair and tries to stick to simple scatter charts.
Altair can export interactive HTML/SVG, but it will not scale as well if you're drawing
many points at the same time.
- Parallax is meant to be run as a stand-alone app from the command line, while Whatlies is
meant to be run from a Jupyter notebook.
- Parallax gives a full user interface, while Whatlies offers lego bricks that you can chain
together to get a visualisation started.
- Whatlies relies on language backends to fetch word embeddings. Parallax instead allows you to
load raw files from disk.
- Parallax has been around for a while; Whatlies is newer and therefore more experimental.

## Local Development

If you want to develop locally, you can start by running this command.

```bash
make develop
```

### Documentation

The documentation is generated via:

```bash
make docs
```


