Metadata-Version: 2.1
Name: modmod
Version: 0.2.3
Summary: modular models for efficient ML development
Home-page: https://github.com/Remesh/modmod
Author: Nicholas Tietz-Sokolsky
Author-email: me@ntietz.com
License: Apache 2.0
Platform: UNKNOWN
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Requires-Python: >=3.6.0
Description-Content-Type: text/markdown



# modmod

modmod is a library for making *Mod*-ular *Mod*-els. The primary problem that
modmod solves is how to load models at runtime without instantiating them
multiple times; in that respect, it is essentially a dependency injection
system for models.

# Installation

To use modmod, just install it with your package manager in the usual way. If
you use [Pipenv](https://docs.pipenv.org/), you can copy/paste this:

```
pipenv install modmod
```

# Usage

There are two main pieces of modmod: Models and Pools.

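A `Pool` is a container for models: it creates each model the first time it is
requested and returns the same instance afterwards. A `Model` can be treated
like an augmented function: you call an instance like a function, and the
class itself acts as a factory for that instance.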

Here's an example of defining the simplest possible model:

```
from modmod.model import Model

class AddThings(Model):
    def call(self, x: int, y: int) -> int:
        return x + y
```

And here is how you would use it:

```
import modmod.pool

pool = modmod.pool.get()

adder = pool.get(AddThings)

z = adder(1, 2)
print(z) # prints 3
```

You can also take a shortcut to get the model:

```
adder = AddThings.get()
```

However, this should never be done inside a model, because it will use the
default pool and will have strange side effects if anyone tries to use your
model in a non-default pool.

## Models with initialization

Sometimes a model needs to be initialized to load in data or do other one-time
startup tasks. To do this, you just override the constructor and the `create`
method. Here's an example for stripping stopwords:

```
from typing import List

import nltk
from modmod.model import Model

class RemoveStopwords(Model):
  def __init__(self, pool, config, stopwords):
    super().__init__(pool, config)
    self.stopwords = stopwords

  @classmethod
  def create(cls, pool, config):
    nltk.download('stopwords')
    stopwords = nltk.corpus.stopwords.words('english')
    stopwords.append('')
    stopwords.remove('not')
    stopwords.remove('no')
    return cls(pool, config, stopwords)

  def call(self, words: List[str]) -> List[str]:
    return list(filter(lambda w: w not in self.stopwords, words))
```

The `create` method is invoked when you call `RemoveStopwords.get()`. It is
only called the _first_ time you get a model; after that, the created model
lives in the pool, and it will not be re-initialized.

*Why are `__init__` and `create` both required?* The reason comes down to
configurability and use in testing environments. In the example above, if you
wanted to experiment with a new list of stopwords, you could use the
constructor to create a model with that list and then add it into the pool:

```
import modmod.pool

pool = modmod.pool.get('stopwords-experiment')
config = {}

remove_new_stopwords = RemoveStopwords(pool, config, ['stop', 'word', 'list'])
pool.add_model(remove_new_stopwords, RemoveStopwords)
```

Once it's added to the pool, any calls to
`RemoveStopwords.get('stopwords-experiment')` will find and retrieve the
manually created model.

Note: `create` is generally overridden if you have to do a heavy operation,
like downloading a file or reading in some data. If you are just using the pool
and the config object, it's perfectly acceptable to override `__init__` and
leave the default behavior for `create`.
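
The lifecycle described in this section, together with the earlier advice to
avoid the `get()` shortcut inside a model, can be illustrated with a small
self-contained sketch. `ToyPool` and `ToyModel` below are hypothetical
stand-ins written for this example, not modmod's actual classes; they only
mimic the `create(cls, pool, config)` and `call` conventions shown above. The
sketch shows that `create` runs once per pool, and that a model can fetch its
dependencies from the pool it was handed.

```python
from typing import List


class ToyPool:
    """Hypothetical stand-in for a pool: caches one instance per model class."""

    def __init__(self, config=None):
        self.config = config or {}
        self._models = {}

    def get(self, model_cls):
        # Run create() only on the first request, then reuse the instance.
        if model_cls not in self._models:
            self._models[model_cls] = model_cls.create(self, self.config)
        return self._models[model_cls]


class ToyModel:
    """Hypothetical stand-in for a model base class."""

    def __init__(self, pool, config):
        self.pool = pool
        self.config = config

    @classmethod
    def create(cls, pool, config):
        # Default behavior: just call the constructor.
        return cls(pool, config)

    def __call__(self, *args, **kwargs):
        return self.call(*args, **kwargs)


class Tokenize(ToyModel):
    create_calls = 0  # counts how often the one-time setup runs

    @classmethod
    def create(cls, pool, config):
        cls.create_calls += 1
        return cls(pool, config)

    def call(self, text: str) -> List[str]:
        return text.lower().split()


class CountTokens(ToyModel):
    def __init__(self, pool, config, tokenize):
        super().__init__(pool, config)
        self.tokenize = tokenize

    @classmethod
    def create(cls, pool, config):
        # Ask the pool we were given for the dependency, so this model
        # works in any pool rather than only the default one.
        return cls(pool, config, pool.get(Tokenize))

    def call(self, text: str) -> int:
        return len(self.tokenize(text))


pool = ToyPool()
counter = pool.get(CountTokens)
tokenizer = pool.get(Tokenize)

print(counter("Modular models are modular"))  # prints 4
print(counter.tokenize is tokenizer)          # prints True
print(Tokenize.create_calls)                  # prints 1
```

Because `CountTokens` receives its tokenizer through the pool, swapping in a
different pool (for example in a test) changes which `Tokenize` instance it
uses without touching the model's code.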


## Configuring the pool

Every model has configuration passed into it, and that configuration comes
from the pool. So, if your models need configuration, you need to configure
the pool.

**Note:** the pool must be configured *before* you get any models, since
configuring it overwrites the existing pool.

To configure the default pool:

```
import modmod.pool

config = {'opt1': 2}

modmod.pool.configure(config)
```

## Non-default Pools

Sometimes you will want separate pools for separate tasks. One example of this
is for unit testing: you may want to test with multiple configurations of the
model. To do this, you can use separate pools.

The first step is to configure the pool:

```
import modmod.pool

poolname = 'my-pool'
config = {'opt1': 2}

modmod.pool.configure(config, poolname)
```

The second step is just to use the pool!

```
import modmod.pool

pool = modmod.pool.get('my-pool')

adder = pool.get(AddThings)
# Equivalent:
adder = AddThings.get('my-pool')
```
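
Why configuration has to happen before any model is fetched, and why named
pools stay independent of each other, can be seen in a rough sketch of a
name-to-pool registry. This is an assumption about the general shape of such a
registry, written purely for illustration: the `ToyPool`, `configure`, and
`get` below are hypothetical stand-ins, not modmod's implementation.

```python
class ToyPool:
    """Hypothetical stand-in for a pool with a config and a model cache."""

    def __init__(self, config):
        self.config = dict(config)
        self.models = {}  # model class -> cached instance


_pools = {}  # pool name -> ToyPool


def configure(config, name="default"):
    # Installs a fresh pool under this name. Any pool that was there
    # before is replaced, together with every model it had cached.
    _pools[name] = ToyPool(config)


def get(name="default"):
    # Create an unconfigured pool on demand if none exists yet.
    if name not in _pools:
        _pools[name] = ToyPool({})
    return _pools[name]


configure({"opt1": 2}, "my-pool")
configure({"opt1": 5}, "other-pool")

# Each named pool carries its own configuration.
print(get("my-pool").config["opt1"])     # prints 2
print(get("other-pool").config["opt1"])  # prints 5

# Reconfiguring swaps in a new pool object, so earlier references
# (and anything cached in them) are left behind.
before = get("my-pool")
configure({"opt1": 3}, "my-pool")
print(get("my-pool") is before)          # prints False
```

In a registry shaped like this, a model fetched before `configure` would live
in the old pool object and would never see the new configuration, which is
the situation the note on configuring first is meant to prevent.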

# Roadmap

We have a few initiatives on the roadmap. Each of these will be a version bump:

* [ ] Add support for data and model versioning, add support for model training
* [ ] Add hooks for profiling, debugging, caching



