Metadata-Version: 2.1
Name: distrel
Version: 0.2.1
Summary: Calculate the relationship between parameters with known distributions.
License: AGPL-3.0-or-later
Author: Alex
Author-email: adrysdale@protonmail.com
Requires-Python: >=3.11,<4.0
Classifier: License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: nevergrad (>=1.0.1,<2.0.0)
Requires-Dist: scipy (>=1.11.4,<2.0.0)
Requires-Dist: torch (>=2.1.1,<3.0.0)
Requires-Dist: tqdm (>=4.66.1,<5.0.0)
Description-Content-Type: text/markdown

# Distrel

Distrel (distribution relations) is a package used to approximate the relationship between dependent distributions when only the properties of the distribution are known.

## Example use case

Suppose we have 3 parameters with known distributions, such that: $`a \sim \mathcal{N}(\mu_a, \sigma_a)`$, $`b \sim \mathcal{N}(\mu_b, \sigma_b)`$ and $`c \sim \mathcal{N}(\mu_c, \sigma_c)`$ and distributions are related by $c = a \times b$.

Distrel attempts to solve the question, how do we correctly pseudo-randomly sample from $a$ and $b$ such then $c$ follows the correct distribution.

### How it works

Distrel is a wrapper around a simple neural network written in [pytorch](https://pytorch.org/) that finds the function $`f_{\mu}(a)`$ and $`f_{\sigma}(a)`$ from:

$`b \sim \mathcal{N}(f_{\mu}(a), f_{\sigma}(a))`$

such that:

$`a \times b \sim \mathcal{N}(\mu_c, \sigma_c)`$

That is, Distrel assumes that $a$ and $b$ are weakly related and hence samples $b$ from a normal distribution defined by a particular value of $a$.

Below is a diagram of the distrel neural network architecture:

![A diagram of the network architectures](drn.png) 

## Installation

With [poetry](https://python-poetry.org/)
```bash
poetry add distrel
```

or with [pip](https://pypi.org/)

```bash
pip install distrel
```

## Usage

To make things interesting, let's consider the relationship $a \times b = c$ but each distribution is a unique non-central t-distribution.
The key thing to note is that non-central t-distributions are defined by their degrees of freedom and a non-centrality parameter.
We could use any distribution but currently distributions can only be characterised by their mean, variance, skewness and kurtosis - if more properties are required then please do open an issue.
```python
from scipy.stats import nct
import distrel

# Defines the degrees of freedom (df) and non-centrality parameter (nc) for each distribution
a_df, a_nc = 9, 52
b_df, b_nc = 4, 30
c_df, c_nc = 143, 90


```

The distribution relation network (drn) needs three things:
    - A generator function to generate the input data.
    - A calculator function to obtain the 3rd distribution from the output of the generator and the output of the neural network.
    - The properties of each of the 3 distributions.
    
```python
# We need to define a generator function that can generate samples of A.
# It must take N (the number of samples as it's only argument.
# It doesn't matter that gen_a doesn't return a torch tensor, this is accounted for internally.
def gen_a(N):
    return nct.rvs(a_df, a_nc, size=N)
    
# We also need to define a function that calculates $c$ from $a$ and $b$.
# That is, the neural network will take input $a$, generate $b$ and then calculate $c$ from $a$ and 
# $b$.
def calc_c(a, b):
    return a * b
    
# Now we define the distribution relation network
drn = distrel.drn(gen=gen_a, calc=calc_c, seed=42)



# We need to define the distribution properties with the following:
# nct.stats returns mean, var, skew, kurt
a_mean, a_var, a_skew, a_kurt = nct.stats(a_df, a_nc, moments='mvsk')
b_mean, b_var, b_skew, b_kurt = nct.stats(b_df, b_nc, moments='mvsk')
c_mean, c_var, c_skew, c_kurt = nct.stats(c_df, c_nc, moments='mvsk')

drn.set_properties(
    {"mean": a_mean, "var": a_var, "skew": a_skew, "kurt": a_kurt}, 
    {"mean": b_mean, "var": b_var, "skew": b_skew, "kurt": b_kurt}, 
    {"mean": c_mean, "var": c_var, "skew": c_skew, "kurt": c_kurt},
)
# We could have alternatively initialised the drn with the following
# drn = distrel.drn(
#    gen=gen_a, 
#    calc=calc_c, 
#    properties=[
#        {"mean": a_mean, "var": a_var},
#        {"mean": b_mean, "var": b_var},
#        {"mean": c_mean, "var": c_var},
#    ],
#)
# This also shows that we don't /have/ to test for all four metrics.
# Currently, only mean, variance, skewness and kurtosis are supported.
```


Now the drn is ready to train, a tolerance can be provided to allow for early stopping if ALL of 
the distribution properties are within a percentage tolerance.
The tolerance defaults to zero which means no early stopping occurs.
Additionally, the maximum number of epochs can be set (this defaults to 100) and then drn will
stop training upon reaching this number regardless of performance.

The training starts off with gradient free optimisation before switching to a gradient based
optimiser.
To disable the gradient free optimiser, set budget to 1.

You can set the number of samples generated by the generator with N.

For reproducible results, you can also set the seed which sets the seed for pytorch, numpy and python's random.

`drn.train` returns a boolean, True if the training exited via to early stopping due to 
convergence, and False otherwise.
```python
converge = drn.train(
    max_epochs=1000, 
    tol=1e-2, 
    progress_bar=False,
    lr=1e-3
    budget=1000,
    N=1000,
    seed=42,
)
```

If you want more fine grain controlled over the optimiser - you can do that!
```python
from torch.optim import LBFGS

converge = drn.train(
    max_epochs=max_epochs,
    tol=tol,
    lr=lr,
    progress_bar=True,
    require_closure=True,
    optim=LBFGS,
    optim_kwargs={'line_search_fn': 'strong_wolfe'},
)

print(f"Converged?:\t{converge}")
```
The `optim_kwargs` are unpacked when calling the optimiser.
When the optimiser is called internally, set `require_closure` to True if you'd like to call:
```python
optim.step(closure)
```
or `require_closure` to False if you'd like to call:
```python
optim.step()
```

```python
# To view the network waits simply print the drn
print(drn)
```
Which returns something like:
```example
        _l2_ mu
in _l1_/
       \_l3_ sigma
out ~ N(mu, sigma)
---

l1:
Linear(in_features=1, out_features=1, bias=True)
Bias: -3.3627309799194336
Weight: 0.027783172205090523

l2:
Linear(in_features=1, out_features=1, bias=True)
Bias: 0.14515426754951477
Weight: 3.0655031204223633

l3:
Linear(in_features=1, out_features=1, bias=True)
Bias: -13.486244201660156
Weight: 2.5820298194885254
```

```python
# To use the drn as a predictor, call the predict method, a seed is optional but recommend for.
# reproducible results.
new_a = 37
new_b = drn.predict(37, seed=42)
new_c = drn.calc(new_a, new_b) # Wrapps around the calc_c function
```

## Contributing

Please feel free to submit an issue for something you'd like to see fixed or implemented.
Alternatively merge requests are always welcome!

## Copying

This project is licenced under the GNU GPL version 3 license - to read the license see [LICENSE](LICENSE).

