Metadata-Version: 2.1
Name: sensorizer
Version: 0.0.3
Summary: Timeseries data generation and preparation for batch jobs at scale
Home-page: https://github.com/equinor/sensorizer
Author: Jesus Gazol
Author-email: jgaz@equinor.com
Maintainer: jgaz@equinor.com
Maintainer-email: jgaz@equinor.com
License: GPL 3
Project-URL: Source, https://github.com/
Project-URL: Tracker, https://github.com/
Keywords: IoT,sensor
Platform: any
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: POSIX
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Topic :: Software Development :: Libraries
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Requires-Dist: numpy (>=1.16.3)
Requires-Dist: fastavro (>=0.22.13)
Requires-Dist: azure-eventhub (==1.3.1)
Provides-Extra: dev
Requires-Dist: pre-commit (>=2.1.1) ; extra == 'dev'
Provides-Extra: docs
Requires-Dist: sphinx (<3,>=2.0.0) ; extra == 'docs'
Requires-Dist: towncrier (>=18.5.0) ; extra == 'docs'
Requires-Dist: pygments-github-lexers (>=0.0.5) ; extra == 'docs'
Requires-Dist: sphinxcontrib-autoprogram (>=0.1.5) ; extra == 'docs'
Provides-Extra: testing
Requires-Dist: pytest (==5.3.5) ; extra == 'testing'
Requires-Dist: pytest-cov ; extra == 'testing'

# Sensorizer

Sensorizer is a Python library that simulates a flow of sensor data to disk (Avro) or to an Azure Event Hub. It is meant
to be the starting point of a data pipeline.

The library has a companion Docker container, so you can have a source of sensor data running in approximately 5 minutes.
See the Docker deployment section if your sink is either an Avro file or an Azure Event Hub. If you need an
additional sink, have a look at the issues section.

Its main characteristic is that it tries to simulate traffic with
timings similar to those of real sensors: it releases up to 400K readings per second,
one by one. The readings can then be sent to a streaming sink (Azure Event Hub is implemented)
or written to disk (Avro is implemented).
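
To make the "one by one" release concrete, here is a minimal sketch of how readings from sensors with different intervals can be interleaved in timestamp order using a priority queue. It is only an illustration under stated assumptions: `merged_readings` is a hypothetical helper, not part of the sensorizer API, and the library may order its readings differently.

```python
import heapq
from typing import Iterator, List, Tuple


def merged_readings(periods: List[float], duration: float) -> Iterator[Tuple[float, int]]:
    """Yield (timestamp, sensor_index) pairs in time order.

    Hypothetical helper for illustration only; not part of sensorizer.
    Each sensor emits at t = 0 and then once every `periods[i]` seconds.
    """
    # Each heap entry is (next emission time, sensor index).
    heap = [(0.0, i) for i in range(len(periods))]
    heapq.heapify(heap)
    while heap:
        ts, i = heapq.heappop(heap)
        if ts >= duration:
            continue  # this sensor has no more readings inside the window
        yield ts, i
        heapq.heappush(heap, (ts + periods[i], i))


# Three sensors (1 s, 60 s and 3600 s per reading) over a two-minute window.
events = list(merged_readings([1.0, 60.0, 3600.0], duration=120.0))
```

A real-time generator would additionally sleep until each timestamp before handing the reading to the sink.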

## Docker deployment

The deployment is container-based. Pull the container image:
```
docker pull jgc31416/sensorizer:latest
```

Then pass the configuration as environment variables, choosing the variables
according to the sink you want. This example runs the Avro file sink using an environment file
(see the /docs folder):
```
docker run --env-file=avro_sink.cfg jgc31416/sensorizer:latest
```

You will get the generated files in the container.


### Avro file sink

You might want to mount a folder from the container host so that the dump file is available outside the container.

```
export NUMBER_OF_SENSORS="10000"
export NUMBER_OF_HOURS="1"
export SINK_TYPE="file"                            # store sensor readings to a file
export RUNNING_MODE="batch"                        # send the readings one by one or in batch mode
export EVENT_DUMP_FILENAME="/tmp/event_dump.avro"  # Where to save the data
```


### Event Hub sink

```bash
export NUMBER_OF_SENSORS="10000"
export NUMBER_OF_HOURS="1"
export SINK_TYPE="event_hub"                       # send sensor readings to an Azure Event Hub
export RUNNING_MODE="batch"                        # send the readings one by one or in batch mode
export EVENT_HUB_ADDRESS="amqps://<EventHubNamespace>.servicebus.windows.net/<EventHub>"
export EVENT_HUB_SAS_POLICY="<PolicyName>"
export EVENT_HUB_SAS_KEY="<SAS_KEY>"
```
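
If you want to validate a configuration before starting a run, the variables for both sinks can be read into a small typed object. The sketch below is an assumption about how that could look: `SinkConfig` and `load_config` are hypothetical names and are not sensorizer's own configuration code. Only the variable names and the `file` / `event_hub` sink types come from the examples above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class SinkConfig:
    """Hypothetical container for the settings above; not part of sensorizer."""

    number_of_sensors: int
    number_of_hours: float
    sink_type: str
    running_mode: str
    event_dump_filename: Optional[str] = None
    event_hub_address: Optional[str] = None
    event_hub_sas_policy: Optional[str] = None
    event_hub_sas_key: Optional[str] = None


# Variables each sink type needs, taken from the two examples above.
REQUIRED_BY_SINK = {
    "file": ["EVENT_DUMP_FILENAME"],
    "event_hub": ["EVENT_HUB_ADDRESS", "EVENT_HUB_SAS_POLICY", "EVENT_HUB_SAS_KEY"],
}


def load_config(env: Mapping[str, str] = os.environ) -> SinkConfig:
    """Build a SinkConfig from environment-style key/value pairs."""
    sink_type = env["SINK_TYPE"]
    if sink_type not in REQUIRED_BY_SINK:
        raise ValueError(f"Unknown SINK_TYPE: {sink_type!r}")
    missing = [name for name in REQUIRED_BY_SINK[sink_type] if not env.get(name)]
    if missing:
        raise ValueError(f"Missing variables for sink {sink_type!r}: {missing}")
    return SinkConfig(
        number_of_sensors=int(env["NUMBER_OF_SENSORS"]),
        number_of_hours=float(env["NUMBER_OF_HOURS"]),
        sink_type=sink_type,
        running_mode=env["RUNNING_MODE"],
        event_dump_filename=env.get("EVENT_DUMP_FILENAME"),
        event_hub_address=env.get("EVENT_HUB_ADDRESS"),
        event_hub_sas_policy=env.get("EVENT_HUB_SAS_POLICY"),
        event_hub_sas_key=env.get("EVENT_HUB_SAS_KEY"),
    )


# Usage with an explicit mapping instead of the real environment.
example = load_config(
    {
        "NUMBER_OF_SENSORS": "10000",
        "NUMBER_OF_HOURS": "1",
        "SINK_TYPE": "file",
        "RUNNING_MODE": "batch",
        "EVENT_DUMP_FILENAME": "/tmp/event_dump.avro",
    }
)
```

Passing a plain dictionary keeps the check independent of the process environment; calling `load_config()` with no argument reads `os.environ` instead.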

### Distribution of the sensor readings

The distribution of the sensor readings is the following:

- Frequencies: 15% at 1.0 s, 65% at 60.0 s, 20% at 3600.0 s (percentages are of the number of sensors;
the value is seconds per reading)
- Base reading values: 50% 1, 40% 500, 10% 1000
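
As a rough sizing aid, the shares above can be turned into an expected number of readings. The sketch below assumes each sensor emits exactly one reading per interval. `expected_readings` and `assign_sensors` are hypothetical helpers, and whether the library draws each sensor's interval at random or allocates exact shares is not stated here, so the random assignment is only one possible reading of the list.

```python
import random
from typing import List, Tuple

# (percent of sensors, seconds per reading), copied from the list above.
PERIOD_SHARES = [(15, 1.0), (65, 60.0), (20, 3600.0)]
# (percent of sensors, base reading value), copied from the list above.
BASE_VALUE_SHARES = [(50, 1.0), (40, 500.0), (10, 1000.0)]


def expected_readings(number_of_sensors: int, hours: float) -> float:
    """Expected reading count, assuming one reading per sensor per interval."""
    seconds = hours * 3600.0
    return sum(
        number_of_sensors * percent / 100 * seconds / period
        for percent, period in PERIOD_SHARES
    )


def assign_sensors(number_of_sensors: int, seed: int = 0) -> List[Tuple[float, float]]:
    """Draw an (interval, base value) pair per sensor.

    ASSUMPTION: independent random draws weighted by the shares above.
    """
    rng = random.Random(seed)
    periods = rng.choices(
        [p for _, p in PERIOD_SHARES], weights=[w for w, _ in PERIOD_SHARES], k=number_of_sensors
    )
    bases = rng.choices(
        [b for _, b in BASE_VALUE_SHARES], weights=[w for w, _ in BASE_VALUE_SHARES], k=number_of_sensors
    )
    return list(zip(periods, bases))


total = expected_readings(number_of_sensors=10000, hours=1)
sensors = assign_sensors(10000)
```

For the 10,000 sensors and one hour used in the examples above, this works out to 5,400,000 + 390,000 + 2,000 = 5,792,000 readings, an average of roughly 1,609 readings per second.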


### Sensor format

```python
from dataclasses import dataclass


@dataclass
class TimeserieRecord:
    """
    Class for time series
    """

    ts: float  # epoch
    data_type: str  # string(3)
    plant: str  # string(3)
    quality: float
    schema: str  # string(6)
    tag: str  # UUID
    value: float
```
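
A minimal sketch of building one such record and turning it into a plain dictionary (the shape a serializer would typically consume) might look like this. The field values are made up for illustration, `validate` is a hypothetical helper, and reading `string(N)` as "at most N characters" is an assumption.

```python
import time
import uuid
from dataclasses import asdict, dataclass


@dataclass
class TimeserieRecord:
    """Same fields as the record format above, repeated to keep the sketch self-contained."""

    ts: float  # epoch
    data_type: str  # string(3)
    plant: str  # string(3)
    quality: float
    schema: str  # string(6)
    tag: str  # UUID
    value: float


def validate(record: TimeserieRecord) -> None:
    """Hypothetical check; ASSUMPTION: string(N) means at most N characters."""
    limits = {"data_type": 3, "plant": 3, "schema": 6}
    for field_name, max_len in limits.items():
        if len(getattr(record, field_name)) > max_len:
            raise ValueError(f"{field_name} is longer than {max_len} characters")
    uuid.UUID(record.tag)  # raises ValueError if the tag is not a valid UUID


# All values below are illustrative placeholders, not real plant data.
record = TimeserieRecord(
    ts=time.time(),
    data_type="flt",
    plant="abc",
    quality=1.0,
    schema="sch001",
    tag=str(uuid.uuid4()),
    value=500.0,
)
validate(record)
payload = asdict(record)  # plain dict with one key per field
```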



## Getting started with the library development

Clone the project from GitHub and enjoy.

### Prerequisites

This software has been tested on Linux. It might work on other operating systems, but
that is not guaranteed.

- Ubuntu latest stable / Debian Stretch / Fedora 25+
- Python 3.7 or later (the code uses dataclasses and typing)
- Docker (if you want a container deployment)

### Installing

Python requirements:

```
pip install -r requirements.txt
```


## Running the tests

As simple as:

```
pytest sensorizer/tests/
```


## Built With

* Python 3.7
* Docker

## Contributing

Simply put: one feature per branch, merged to master through a pull request, so:
- Fork the repo.
- Make a feature branch and develop.
- Test :)
- Create a pull request for your new feature.


