Metadata-Version: 2.1
Name: doframework
Version: 0.2.1
Summary: A testing framework for decision optimization model learning algorithms.
Home-page: https://github.com/IBM/doframework
Author: Orit Davidovich
Author-email: orit.davidovich@ibm.com
License: Apache-2.0
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: matplotlib
Requires-Dist: ray[default,k8s,serve] (>=1.13.0)
Requires-Dist: rayvens (>=0.7.0)
Requires-Dist: ibm-cos-sdk (>=2.10.0)
Requires-Dist: boto3 (>=1.17.110)
Requires-Dist: aiohttp (>=3.7.4)
Requires-Dist: aioredis (>=1.3.1)
Requires-Dist: scikit-learn (>=0.24.1)
Requires-Dist: scipy
Requires-Dist: PuLP (>=2.4)
Requires-Dist: GPy (>=1.9.9)

<!--
# Copyright IBM Corporation 2022
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
-->

# DOFramework

`doframework` is a testing framework for decision-optimization model learning algorithms. Such algorithms learn part or all of a decision-optimization model from data and solve the model to produce a predicted optimal solution. 

`doframework` randomly generates multiple optimization problems (f,O,D,x*) for your algorithm to learn and solve: 
* f is a continuous piece-wise linear function defined over a domain in d-dimensional space (d>1),
* O is a feasibility region in dom(f) defined by linear constraints,
* D = (X,y) is a dataset derived from f,
* x* is the true optimum of f in O (minimum or maximum).

`doframework` feeds your algorithm constraints and data (O,D) and collects its predicted optimum. The algorithm's predicted optimal value can then be compared to the true optimal value f(x*). By comparing the two over multiple randomly generated optimization problems, `doframework` produces a **prediction profile** for your algorithm.
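
As a rough illustration of this comparison, the sketch below computes a relative gap between true and predicted optimal values over several problems and reports the fraction of problems that fall within a tolerance. The function names, the gap definition, and the sample values are assumptions made for illustration; they are not the metric `doframework` itself uses for its prediction profile.

```python
def relative_gaps(true_vals, predicted_vals, eps=1e-12):
    """Relative gap |f(x*) - predicted| / max(|f(x*)|, eps), one per problem."""
    return [abs(t - p) / max(abs(t), eps)
            for t, p in zip(true_vals, predicted_vals)]

def fraction_within(gaps, tolerance):
    """Share of problems whose relative gap is at most `tolerance`."""
    return sum(g <= tolerance for g in gaps) / len(gaps)

# Made-up optimal values for four generated problems (illustration only).
true_vals = [4.0, 2.0, 5.0, 1.0]
predicted_vals = [3.75, 2.0, 2.5, 1.25]

gaps = relative_gaps(true_vals, predicted_vals)
for tol in (0.1, 0.3):
    print(f"within {tol:.0%}: {fraction_within(gaps, tol):.2f}")
```

Sweeping the tolerance this way gives one simple summary of how often the predicted optimal value lands close to the true one.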

`doframework` integrates with your algorithm (written in Python).

# Design

`doframework` was designed for optimal cloud distribution following an event-driven approach. 

`doframework` was built on top of [ray](https://www.ray.io/ "Ray -- fast and simple distributed computing") for cloud distribution and [rayvens](https://github.com/project-codeflare/rayvens "Rayvens augments Ray with events") for event-driven management.

# Requirements

`doframework` was written for Python version >= 3.8.0. 

`doframework` can run either locally or remotely. For optimal performance, run it on a Kubernetes cluster. Cloud configuration is currently available for AWS and IBM Cloud [OpenShift](https://docs.openshift.com/ "RedHat OpenShift Documentation") clusters.

The framework uses storage (local or S3) to interact with simulation products. Configuration is currently available for [AWS](https://aws.amazon.com/s3/ "AWS S3") or [IBM Cloud Object Storage COS](https://www.ibm.com/cloud/object-storage "IBM Cloud Object Storage").

# Install

To run `doframework` locally, install with

```
$ pip install doframework
```

# Configs

Storage specifications are provided in a `configs.yaml`. You'll find examples under `./configs/*`.

The `configs.yaml` includes the list of source and target bucket names (under `buckets`). If necessary, S3 credentials are added under designated fields.

Here is the format of the `configs.yaml` either for local storage
```
local:
    buckets:
        inputs: '<inputs-folder>'
        inputs_dest: '<inputs-dest-folder>'
        objectives: '<objectives-folder>'
        objectives_dest: '<objectives-dest-folder>'
        data: '<data-folder>'
        data_dest: '<data-dest-folder>'
        solutions: '<solutions-folder>'
```
or S3
```
s3:
    buckets:
        inputs: '<inputs-bucket>'
        inputs_dest: '<inputs-dest-bucket>'
        objectives: '<objectives-bucket>'
        objectives_dest: '<objectives-dest-bucket>'
        data: '<data-bucket>'
        data_dest: '<data-dest-bucket>'
        solutions: '<solutions-bucket>'
    aws_secret_access_key: 'xxxx'
    aws_access_key_id: 'xxxx'
    endpoint_url: 'https://xxx.xxx.xxx'
    region: 'xx-xxxx'
    cloud_service_provider: 'aws'
```
Currently, two S3 providers are available under `s3:cloud_service_provider`: either `aws` or `ibm`. The `endpoint_url` is _optional_ for AWS.

**Bucket / folder names must be distinct**.
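
A quick way to catch a repeated name before launching an experiment is sketched below. It assumes the configs have already been loaded into a dictionary with the layout shown above; the helper `duplicate_bucket_names` is hypothetical and not part of `doframework`.

```python
def duplicate_bucket_names(configs):
    """Return the sorted names that appear under more than one bucket key."""
    # The top-level key is either 'local' or 's3', as in the formats above.
    storage = configs.get('local') or configs.get('s3') or {}
    buckets = storage.get('buckets', {})
    seen, dupes = set(), set()
    for name in buckets.values():
        if name in seen:
            dupes.add(name)
        seen.add(name)
    return sorted(dupes)

# Example configs in which 'solutions' wrongly reuses the inputs folder.
configs = {
    'local': {
        'buckets': {
            'inputs': 'in',
            'inputs_dest': 'in_done',
            'objectives': 'obj',
            'objectives_dest': 'obj_done',
            'data': 'data',
            'data_dest': 'data_done',
            'solutions': 'in',
        }
    }
}

print(duplicate_bucket_names(configs))
```

An empty list means all bucket / folder names are distinct.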

# Inputs

`input.json` files provide the necessary metadata for the random generation of optimization problems.

`doframework` will run end to end, once `input.json` files are uploaded to `<inputs-bucket>` / `<inputs-folder>`. 

The Jupyter notebook `./notebooks/inputs.ipynb` allows you to automatically generate input files and upload them to `<inputs-bucket>`.

Here is an example of an input file (see input samples `input_basic.json` under `./inputs`).


```
{     
    "f": {
        "vertices": {
            "num": 7,
            "range": [[5.0,20.0],[0.0,10.0]]
        },
        "values": {
            "range": [0.0,5.0]
        }
    },
    "omega" : {
        "ratio": 0.8
    },
    "data" : {
        "N": 750,
        "noise": 0.01,
        "policy_num": 2,
        "scale": 0.4
    },
    "input_file_name": "input_basic.json"
}
```

`f:vertices:num`: number of vertices in the piece-wise linear graph of f.<br>
`f:vertices:range`: f domain will be inside this range.<br>
`f:values:range`: range of f values.<br>
`omega:ratio`: vol(O) / vol(dom(f)) >= ratio.<br>
`data:N`: number of data points to sample.<br>
`data:noise`: response variable noise.<br>
`data:policy_num`: number of centers in Gaussian mix distribution of data.<br>
`data:scale`: max STD of Gaussian mix distribution of data (as a ratio of domain diameter).
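
If you prefer to script input files rather than use the notebook, a minimal sketch with the standard `json` module could look like the following. The helper `make_input` and its defaults are assumptions that mirror the sample above; they are not a `doframework` utility.

```python
import json

def make_input(file_name, num_vertices=7, n_points=750):
    """Build an input dictionary with the fields described above."""
    return {
        "f": {
            "vertices": {"num": num_vertices,
                         "range": [[5.0, 20.0], [0.0, 10.0]]},
            "values": {"range": [0.0, 5.0]},
        },
        "omega": {"ratio": 0.8},
        "data": {"N": n_points, "noise": 0.01,
                 "policy_num": 2, "scale": 0.4},
        "input_file_name": file_name,
    }

# Serialize to JSON text; json.dumps never emits trailing commas.
text = json.dumps(make_input("input_basic.json"), indent=4)

# Round-trip to confirm the text parses back into the same structure.
parsed = json.loads(text)
print(parsed["f"]["vertices"]["num"], len(parsed["f"]["vertices"]["range"]))
```

The length of `f:vertices:range` sets the dimension of the problem, so adding a third `[low, high]` pair would describe a 3-dimensional domain.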

It's a good idea to start experimenting on low-dimensional problems. 

# User App Integration

Your algorithm will be integrated into `doframework` once it is decorated with `doframework.resolve`. 

A `doframework` experiment runs with `doframework.run()`. The `run()` utility accepts the decorated model and an absolute path to the `configs.yaml`.

Here is an example of a user application `module.py`.

```
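import numpy as np
import doframework as dof

@dof.resolve
def alg(data: np.ndarray, constraints: np.ndarray, **kwargs):
    # Fit a model to `data` and optimize it subject to `constraints`.
    ...
    return optimal_arg, optimal_val, regression_model

if __name__ == '__main__':

    # Further run() options (listed below) are passed as keyword arguments.
    dof.run(alg, 'configs.yaml', objectives=5, datasets=3)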
```

`doframework` provides the following inputs to your algorithm: 

`data`: 2D np.array with features X = data[ : , :-1] and response variable y = data[ : ,-1].<br>
`constraints`: linear constraints as a 2D numpy array A. A data point x satisfies the constraints when A[ : , :-1] @ x + A[ : ,-1] <= 0 holds in every row.<br>

It feeds your algorithm additional inputs in kwargs: 

`lower_bound`: lower bound per feature variable.<br>
`upper_bound`: upper bound per feature variable.<br>
`init_value`: optional initial value.<br>
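
To make these array conventions concrete, the sketch below splits a small dataset into features and response and tests points against a constraint matrix. The helpers `split_data` and `is_feasible`, the tolerance, and the unit-square example are assumptions for illustration only.

```python
import numpy as np

def split_data(data):
    """Features are all but the last column; the response is the last column."""
    return data[:, :-1], data[:, -1]

def is_feasible(x, constraints, tol=1e-9):
    """True when A[:, :-1] @ x + A[:, -1] <= 0 holds in every row (up to tol)."""
    A, b = constraints[:, :-1], constraints[:, -1]
    return bool(np.all(A @ x + b <= tol))

# Unit square 0 <= x1 <= 1, 0 <= x2 <= 1 written as rows [a1, a2, b].
constraints = np.array([
    [ 1.0,  0.0, -1.0],   #  x1 - 1 <= 0
    [-1.0,  0.0,  0.0],   # -x1     <= 0
    [ 0.0,  1.0, -1.0],   #  x2 - 1 <= 0
    [ 0.0, -1.0,  0.0],   # -x2     <= 0
])

# Two made-up data rows: features (x1, x2) followed by the response y.
data = np.array([
    [0.5, 0.5, 1.0],
    [2.0, 0.5, 3.0],
])

X, y = split_data(data)
feasible = [is_feasible(x, constraints) for x in X]
print(feasible)
```

The first point lies inside the square and the second violates `x1 <= 1`, so only the first is feasible.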

The `run()` utility accepts the arguments:

`objectives`: number of objective targets to generate per input file.<br>
`datasets`: number of datasets to generate per objective target.<br>
`distribute`: True to run distributively, False to run sequentially.<br>
`logger`: True to see `doframework` logs, False otherwise.<br>
`after_idle_for`: stop running when event stream is idle after this many seconds.<br>
`alg_num_cpus`: number of CPUs to dedicate to your algorithm on each optimization task.<br>
`data_num_cpus`: number of CPUs to dedicate to data generation (useful in high dimensions).


# Algorithm Prediction Profile

Once you are done running a `doframework` experiment, run the notebook `notebooks/profile.ipynb`. It will fetch the relevant experiment products from the target buckets and produce the algorithm's prediction profile and prediction probabilities.

`doframework` produces three types of experiment product files:

* `objective.json`: containing information on (f,O,x*) 
* `data.csv`: containing the dataset the algorithm accepts as input
* `solution.json`: containing the algorithm's predicted optimum

See sample files under `./outputs`.

# Kubernetes Cluster

To run `doframework` on a K8S cluster, make sure you are on the cluster's local `kubectl` context. Log into your cluster, if necessary (applicable to OpenShift, see `./doc/openshift.md`).

You can check your local `kubectl` context and change it if necessary with
```
$ kubectl config current-context
$ kubectl config get-contexts
$ kubectl config use-context cluster_name
>> Switched to context "cluster_name".
```

Now `cd` into your project's folder and run the setup bash script `doframework-setup.sh`. The setup script will generate the cluster configuration file `doframework.yaml` in your project's folder. The setup script requires the absolute path to your `configs.yaml`. Running the setup `.sh` script will establish the `ray` cluster. 

```
$ cd <user_project_folder>
$ doframework-setup.sh --configs ~/path/to/configs.yaml
```

You have the option to adapt `doframework.yaml` to fit your application. 

Use the flag `--project-requirements` to specify the absolute path to your `requirements.txt` file. Its contents will be installed with `pip install -r requirements.txt` on your cluster nodes. 

Use the flag `--project-dir` to specify the absolute path to your project. It will be pip installed on your cluster nodes. 
```
$ doframework-setup.sh --configs ~/path/to/configs.yaml --project-requirements <absolute_requirements_path> --project-dir <absolute_project_path>
```

Use the `--skip` flag to skip re-generating the `doframework.yaml`.
```
$ doframework-setup.sh --skip
```
Or, in case you are familiar with `ray`, run instead 
```
$ ray up doframework.yaml --no-config-cache --yes
```
Upload `input.json` file(s) to your `<inputs-bucket>`. Now you can submit your application `module.py` to the cluster
```
$ ray submit doframework.yaml module.py
```

# Ray Cluster

To observe the `ray` dashboard, connect to `http://localhost:8265` in your browser. See `./doc/openshift.md` for OpenShift-specific instructions.

Some useful health-check commands: 

* Check the status of `ray` pods
```
$ kubectl get pods -n ray
```
* Check the status of the `ray` head node
```
$ kubectl describe pod rayvens-cluster-head-xxxxx -n ray
```
* Monitor autoscaling with
```
$ ray exec doframework.yaml 'tail -n 100 -f /tmp/ray/session_latest/logs/monitor*'
```
* Connect to a terminal on the head node
```
$ ray attach doframework.yaml
$ ...
$ exit
```
* Get a remote shell to the cluster manually (find the head node ID with `kubectl describe`)
```
$ kubectl -n ray exec -it rayvens-cluster-head-z97wc -- bash
```

After introducing manual changes to `doframework.yaml`, update with
```
$ ray up doframework.yaml --no-config-cache --yes
```

Shut down the `ray` cluster with
```
$ ray down -y doframework.yaml
```

# Test

Run the setup bash script `doframework-setup.sh` with the `--example` flag to generate the test script `doframework_example.py` in your project folder.
```
$ cd <user_project_folder>
$ doframework-setup.sh  --configs ~/path/to/configs.yaml --example
```

To run the test script locally, use
```
$ python doframework_example.py --configs ~/path/to/configs.yaml
```

To run the test script on your K8S cluster, use
```
$ ray submit doframework.yaml doframework_example.py --configs configs.yaml
```
[NOTE: we are using the path to the `configs.yaml` that was mounted on cluster nodes under `$HOME`.]

Make sure to upload input json files to `<inputs-bucket>` / `<inputs-folder>` once you run `doframework_example.py`.



