Metadata-Version: 2.1
Name: cloudsecpy
Version: 0.1.1
Summary: Library and toolkit for formally analyzing security policies in cloud systems using Python.
Author: Joe Stubbs
Author-email: jstubbs@tacc.utexas.edu
Requires-Python: >=3.8,<4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: backcall (==0.2.0)
Requires-Dist: boto (==2.49.0)
Requires-Dist: cvc5 (==1.0.1)
Requires-Dist: decorator (==5.0.9)
Requires-Dist: ipython (==7.16.1)
Requires-Dist: ipython-genutils (==0.2.0)
Requires-Dist: jedi (==0.17.2)
Requires-Dist: parso (==0.7.1)
Requires-Dist: pexpect (==4.8.0)
Requires-Dist: pickleshare (==0.7.5)
Requires-Dist: prompt-toolkit (==3.0.20)
Requires-Dist: ptyprocess (==0.7.0)
Requires-Dist: pygments (==2.10.0)
Requires-Dist: six (==1.16.0)
Requires-Dist: traitlets (==4.3.3)
Requires-Dist: wcwidth (==0.2.5)
Requires-Dist: z3-solver (==4.8.12.0)
Description-Content-Type: text/markdown

# About `cloudsec`
CloudSec (`cloudsecpy` on PyPI) is a library and toolkit for formally analyzing security policies in cloud and API systems using Python. It has a modular and extensible architecture allowing it to support multiple backends; 
currently, [z3](https://github.com/z3prover) and [cvc5](https://cvc5.github.io/) are supported, but more backends will be supported in the future. 
The CloudSec project takes influence from a number of related projects, such as the [Z3 Firewall Checker](https://github.com/Z3Prover/FirewallChecker).


## Table of Contents

1. [About](#about-cloudsec)
2. [Background](#background)
3. [Introduction](#introduction)
4. [Installation](#installation)
5. [Building the Images From Source](#building-the-images-from-source)
6. [Trying the Examples with Docker](#trying-the-examples-with-docker)
7. [Beyond the Introduction: Component Types, Policy Specifications, Variables and Policy Templates](#beyond-the-introduction-component-types-policy-specifications-variables-and-policy-templates)

   7a. [Defining Custom Policy Types](#defining-custom-policy-types)

   7b. [Policy Specifications, Variables and Policy Templates](#policy-specifications-variables-and-policy-templates)

8. [Developing CloudSec](#developing-cloudsec)


## Background
Cloud and API systems are commonly secured through the use of security *policies* -- rules governing the set of  actions that agents (users, services, etc) are authorized to take within the system. For example, AWS uses IAM Policies; Kubernetes provides pod security policies, network policies, RBAC, etc. 

Within this context, a common question arises: given two sets of security policies, `P` and `Q`, are the rules generated by `P` and `Q` equivalent, or is one set strictly more permissive than the other? A particular common case is to let `P` be some existing policy set and let `Q` be a policy set that includes a security vulnerability. In this case, determining that `P` is equivalent to `Q` (or even just that `P=>Q`) establishes that the existing policy set contains a vulnerability. These are the types of questions `cloudsec` tries to help answer.

## Introduction

To get a quick sense of what CloudSec is about, let's assume we have a web API with an authorization 
model that grants users access to different endpoints in the API. For simplicity, an "endpoint" will be
a URL path (e.g., `/apps`, `/jobs`, etc) and an HTTP verb (e.g., `GET`, `POST`, `DELETE`, etc.). We can use
CloudSec to define a new policy type for our web API with just a few lines of code:

```python
from cloudsec.core import StringComponent, StringEnumComponent, Policy, PolicyType, OneWildcardMatching

username = StringComponent(name="username", 
                           char_set=ALPHANUM_SET, 
                           max_len=25,             
                           matching_type=OneWildcardMatching())
path = StringComponent(name="path", 
                       char_set=PATH_CHAR_SET, 
                       max_len=250, 
                       matching_type=OneWildcardMatching())
verb = StringEnumComponent(name="verb", 
                           values=["GET", "POST", "PUT", "DELETE"], 
                           matching_type=OneWildcardMatching())
WebAPIPolicyType = PolicyType(components=[username, path, verb])
```

In the code above, we create a new security policy type (`WebAPIPolicyType`) defined by three parts:
the username, representing the identity to be authorized, and the path and verb, together representing an 
endpoint in our Web API. Note that this small amount of code is all that is needed to define a new policy
type -- it would be similarly easy to define a policy type for other kinds of systems. 

Once we have a policy type defined, we can create and analyze policies. Creating policies amounts to 
providing values for each of the components (`username`, `path` and `verb`, in the case above) as well as 
a specical `decision` field which all policy types inherit. The `decision` field takes one of two values:
`allow` or `deny`, indicating whether the policy authorizes or does not authorize the action.

Here we create two policies:

```python
p = Policy(policy_type=WebAPIPolicyType, 
           username="jstubbs", 
           path="/apps",
           verb="GET",
           decision="allow")

q = Policy(policy_type=WebAPIPolicyType, 
           username="jstubbs", 
           path="/apps",
           verb="*",
           decision="allow")

```
In the first policy, `p`, we authorize the user "jstubbs" for the GET /apps endpoint. In the second, `q`,
we authorize the same user for *all* verbs on the /apps path. The `*` is a wildcard which indicates any
value. We can use wildcard characters when defining any of the values in our policies.

We can now analyze these two policies using a cloudSec `PolicyEquivalenceChecker`. 

```python
checker = PolicyEquivalenceChecker(policy_type=WebAPIPolicyType, 
                                   policy_set_p=[p],
                                   policy_set_q=[q])
```

The `checker` object has methods that allow us to analyze the equivalence of the two policy sets, `[p]` and 
`[q]`:

```python
# p implies q here, because p is less permissive
result = checker.p_implies_q()
result.proved
  --> True

# q does not imply p; q is strictly more permissive
result = checker.q_implies_p()
result.proved
  --> False
result.found_counter_ex
  --> True
result.model
  --> [verb = "PUT", path = "/apps", username = "jstubbs"]
```

First we check if `p => q`, and cloudSec finds the result is proved. That's because every action authorized 
by our `p` policy is indeed authorized by `q`. Next, we check if `q => p`. Here, cloudSec shows the result is 
not proved and in fact it found a counter example. Indeed, `p` is strictly less permissive than `q`, and the 
`result.model` provides an example of an action that is authorized by `q` but not `p`. Note that in the case
above, cloudSec is leveraging the z3 theorem prover, but we could have used cvc5 instead -- all we would need
to do is specify `backend="c5c5"` when constructing the `PolicyEquivalenceChecker` instance.  


## Installation

The CloudSec library is available on pypi as `cloudsecpy`. Install using poetry, pip, etc. For example:

```bash
poetry add cloudsecpy
```
Note that CloudSec depends on SMT solvers, such as Z3 and cvc5. The python package will install the 
necessary dependent packages, e.g., `z3-solver`.

The project also provides Docker images with `cloudsecpy` and its dependencies preinstalled. THe following images are maintained:

* `ghcr.io/applyfmsec/cloudsec` -- Base image with CloudSec software and dependencies.
* `ghcr.io/applyfmsec/cloudsec-exs` -- Extends the base image with examples.
* `ghcr.io/applyfmsec/cloudsec-tests-perf` -- Extends the examples image with a complete performance test suite.

You can download any of the images above using the Docker CLI, for example:

```bash
docker pull ghcr.io/applyfmsec/cloudsec
```

## Building the Images From Source

You can also build any and all of the Docker images from source. First, clone this git repository:

```bash
git clone https://github.com/applyfmsec/cloudsec.git
cd cloudsec
```

Then, use the  `Makefile` to generate the images. For example, to build all of the images with one command:

```bash
make build
```

See the Makefile for commands to build specific images.


## Trying the Examples with Docker

With the images installed locally, we can try out the examples in a Docker container.
Start a container with the CloudsSec software and examples using `docker` (change the image name if you built 
from source):

```bash
$ docker run -it --rm --entrypoint=bash --name=sec  ghcr.io/applyfmsec/cloudsec-exs
```

From within the container, start a Python shell and import the examples:
(Have a look at the [examples_z3.py](examples/examples_z3.py) file, contained within the [examples](examples) directory, for all the definitions
of the objects used below.)

```python
# from within the container started above,

>>> from examples_z3 import *

>>> result = checker.p_implies_q()

# p => q is True, meaning that the p policy set is less permissive than the q policy set
# i.e., any activity allowed by p is also allowed by q.
>>>  result.proved
True

>>> result = checker.q_implies_p()

# q => p is False, however, because q is NOT less permissive than p; that is, q allows some activities
# that p does not allow.
>>> result.proved
False

# In fact, cloudsec was able to find a counter example to the statement q => p; i.e., it was able to find
# an activity allowed by q but not by p.
>>> result.found_counter_ex
True

# the result.model contains a counter example to the q => p statement; i.e., it contains an example of
# an activity that is allowed by the q policy set but not by the q policy set. 
>>> result.model
[action = "PUT",
 resource_path = "s2/home/jstubbs",
 resource_service = "files",
 resource_tenant = "a2cps",
 principal_username = "jstubbs",
 principal_tenant = "a2cps"]

```

## Beyond the Introduction: Component Types, Policy Specifications, Variables and Policy Templates


### Defining Custom Policy Types
The CloudSec library is designed to provide reusable components you can use to build your own security policy types representing the policies of your application. There are two types of building blocks provided by CloudSec that can be used for defining your own policy types: types where the underlying data are strings, and types where the underlying data are bit vectors. We discuss the string types first.

#### String Types
CloudSec currently provides 3 different base types for working with string data in security policies: `StringComponent`, `StringEnumComponent`, and `TupleComponent`. All three descend from the base class, `BaseComponent`, and allow for policies with wildcard (`*`) characters by utilizing regular expression constraints over the theory of strings. 

##### The `StringEnumComponent` Type
The `core.StringEnumComponent` type can be used for strings that can take a fixed, finite set of values. The `cloud.action` type, representing an HTTP verb, is a good example of a `StringEnumComponent` because instances of the type can only take on one of the following values: GET, POST, PUT, DELETE or *. In general, `StringEnumComponent` values cannot contain a wildcard with other characters; e.g., `P*` is not a valid `action` value.

To create a new type based on `StringEnumComponent`, specify the `name` of the component, the allowable `values` for the new type and a `matching_type`:

```python
example_enum = StringEnumComponent(name="example_enum", 
                                   values=['value_1', 'value_2', 'value_3'],
                                   matching_type=OneWildcardMatching()
```
The `matching_type=OneWildcardMatching()` specifies that we allow matching with one wildcard character. Currently, this is one of two supported matching types in CloudSec, the other being `ExactMatching()` which precludes the use of wildcards. Support for additional matching types is planned for future releases. 


##### The `StringComponent` Type
The `core.StringComponent` type can be used for strings that can take arbitrary values from a specified character set. A username in a cloud system is an example of something we might model with a `StringComponent` type, if we do not consider the list of usernames in our system to be a fixed, finite list (otherwise, `StringEnumComponent` would be a better choice). Unlike `StringEnumComponent`, a `StringComponent` value can contain wildcard characters with other characters.

To create a type based on `StringComponent`, one needs to specify the name, character set, maximum length, and the matching type. Here's an 
example representing a "path" (from the `cloudsec.cloud` module):

```python
path = StringComponent(name="path", 
                       char_set=PATH_CHAR_SET, 
                       max_len=250, 
                       matching_type=one_wildcard_matching)
```
The `cloud` module makes use of `StringComponent` for usernames as well. 

##### The `TupleComponent` Type
The `core.TupleComponent` type is useful for types that are really the composition of multiple `StringComponent` and/or `StringEnumComponent` types that should be thought of individually for the purposes of wildcard matching, but should be thought of as a single value in the overall policy. 

We illustrate the concept using the example of a principal (a user identity) in a multi-tenant cloud system. In such
a system, every user belongs to some tenant, and it is typical to represent an end-user identity as a username together with a tenant id. For example, for the `jsmith` user in the `foo` tenant, we might write the principal as `jsmith.foo` (or `jsmith@foo`, etc). For security policies in such a system that authorize principal(s) for one or more resources/actions, it is important to match the entire principal (user and tenant). For example, a security policy that authorizes `jsmith.bar` (`jsmith` in the `bar` tenant) for some resources has no bearing on the `jsmith.foo` user.
However, with wildcard characters, we want the wildcard to apply to only one component of the principal. For example, `*.foo` would be all users in the foo tenant, while `jsmith*.foo` would be all users in the foo tenant whose username starts with `jsmith`. 

We can create a new tuple type from existing types by using the `TupleComponent` class and specifying a `fields` attribute. The `fields` attribute is a list of previously defined components (`StringComponent` and/or `StringEnumComponent`).

To complete the example, here is how we might define our `principal` type as a tuple of `tenant` and `username` types, described above:

```python
tenant = StringEnumComponent(name="tenant", 
                             values=set(["tenant1", "tenant2", "tenant3"]),
                             matching_type=ExactMatching())

username = StringComponent(name="username", 
                           char_set=ALPHANUM_SET, 
                           max_len=25, 
                           matching_type=OneWildcardMatching())
principal = TupleComponent(name="principal", fields=[tenant, username])
```

### Policy Specifications, Variables and Policy Templates

A primary motivating use case for CloudSec is to formally prove or disprove that a set of policies in the real world conforms to a set of rules that dictates what kinds of policies are "allowable". CloudSec suggests the following approach: let `P` be the set of policies that exist in our real world system, and define the complete set of all "allowable" policies as `Q`. Then, use CloudSec to show either: 1) `P => Q`, in which case the policies in our real world system are all "safe"; or 2) `P NOT=> Q`, in which case CloudSec can return a counter example which will constitute a policy that is not safe.

We refer to the set of all allowable policies for a system (the set `Q` in the example above) as a policy specification. 
In general, defining the policy specification for a system is very difficult. One of the challenges is that policies evolve over time: 
new users are added, resources are created, modified and deleted, etc., and the security policies must be updated to reflect the changes. 

To support capturing the evolving nature of the security policies in a concise way, CloudSec provides support for defining policies with variables. We refer to policies that include variables as *policy templates* because typically, they represent a family of policies. 

As a motivating example, consider a shared Unix file system supporting multiple users. Each user has a home directory where they have exclusive read/write access. We might model the security policies for such as a system in CloudSec as follows:

```python
username = StringComponent(name="username", 
                           char_set=ALPHANUM_SET, 
                           max_len=25,             
                           matching_type=OneWildcardMatching())
path = StringComponent(name="path", 
                       char_set=PATH_CHAR_SET, 
                       max_len=250, 
                       matching_type=OneWildcardMatching())
permission_level = StringEnumComponent(name="permission_level", 
                                       values=["read", "write", "execute"], 
                                       matching_type=OneWildcardMatching())
FileSystemPolicyType = PolicyType(components=[username, path, permission_level])
```

With this model, we can represent the permission individual users have
on their home directory as follows:

```python
joe_home_dir_policy = Policy(policy_type=FileSystemPolicyType,
                             username="jstubbs", 
                             path="/home/jstubbs/",
                             permission_level="*",
                             decision="allow")
smruti_home_dir_policy = Policy(policy_type=FileSystemPolicyType,
                                username="spadhy", 
                                path="/home/spadhy/",
                                permission_level="*",
                                decision="allow")
```
Here, we have declared that user `jstubbs` and user `spadhy` should have
access to their respective home directories. But what about other users? 
Using variables, we can specify that every user should have access to their
home directory in one, succint policy template, as follows:

```python
home_dir_policy_template = Policy(policy_type=FileSystemPolicyType,
                                  username="{{ username }}", 
                                  path="/home/{{ username }}/*",
                                  permission_level="*",
                                  decision="allow")
```

We enclose CloudSec variable names in  `{{ }}`; here, we are using the ``username`` variable. Note that CloudSec defines free variables for each field in a policy; for fields within a tuples, the variable name is `<tuple_name>_<field_name>`.

We can now use CloudSec to check that our first two policies imply (i.e., are no more permissive) than the policy template:

```python
checker = PolicyEquivalenceChecker(policy_type=FileSystemPolicyType,
                                   policy_set_p=[joe_home_dir_policy, smruti_home_dir_policy],
                                   policy_set_q=[home_dir_policy_template])

result = checker.p_implies_q()
result.proved()
  --> True
```
We can think of `home_dir_policy_template` as the policy specification for this
case. With this approach, if we add another user to the system and a corresponding third home directory policy to our set, the implication will still hold. 


## Developing CloudSec

The `cloudsec` library includes a test suite based on `pytest`. The Makefile can be used to build
the tests container image and run the tests:

```bash
# Build the tests image 
$ make build-tests
```

```bash
# Run the tests
$ make test
```
