Metadata-Version: 2.1
Name: melon
Version: 0.0.5
Summary: Lightweight package meant to simplify data processing for Deep Learning
Home-page: https://github.com/evoneutron/melon
Author: evoneutron
Author-email: evoneutron@gmail.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
Requires-Dist: click (==7.0)
Requires-Dist: tqdm (==4.29.0)
Requires-Dist: pillow (==5.4.1)

.. |Build-Status| image:: https://travis-ci.com/evoneutron/melon.svg?branch=master
   :target: https://travis-ci.com/evoneutron/melon

Melon
=====

| Melon is a lightweight package meant to simplify data processing for Deep Learning.

| It removes the need for boilerplate code to pre-process data prior to model training, testing and inference.
| It aims to standardize data serialization and manipulation approaches.
|
| The default formats align with the requirements of frameworks such as **TensorFlow** / **PyTorch**.
| The tool also provides various levels of customization depending on the use case.


Installation
------------

Install and update using `pip`_:

.. code-block:: text

    $ pip install melon

Requires Python 3.4.0 or later.

.. _pip: https://pip.pypa.io/en/stable/quickstart/


Examples
----------------

**Images**

| *With default* options_:

.. code-block:: python

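    import tensorflow as tf  # TensorFlow 1.x API (tf.Session)
    from melon import ImageReader

    def train():
        source_dir = "resources/images"
        # Without a batch_size option, a single read() returns all images and labels
        reader = ImageReader(source_dir)
        X, Y = reader.read()
        ...
        # X_placeholder and Y_placeholder are the model's input placeholders,
        # defined in the elided code above
        with tf.Session() as s:
            s.run(..., feed_dict={X_placeholder: X, Y_placeholder: Y})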

| The ``source_dir`` directory should contain the images to be read. See ``tests/resources/images/sample`` for a sample directory. The sample directory also contains an optional ``labels.txt`` file, which is described in Labeling_.

-------

| Since the number of images may be too large to fit into memory, the tool supports batch processing.
|

.. code-block:: python

    from melon import ImageReader

    def train():
        source_dir = "resources/images"
        options = { "batch_size": 32 }
        reader = ImageReader(source_dir, options)
        while reader.has_next():
            X, Y = reader.read()
            ...

| This reads images in batches of 32 until all images have been read. If ``batch_size`` is not specified, ``reader.read()`` reads all images at once.
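
| To make the batch arithmetic concrete, here is a minimal, self-contained sketch of splitting a file list into batches. It is not Melon's implementation: the ``batches`` helper and the file names are hypothetical, and it assumes that the last batch simply holds whatever images remain.

.. code-block:: python

    def batches(items, batch_size=None):
        """Yield consecutive slices of items; a single slice if batch_size is None."""
        if batch_size is None:
            # Assumption: no batch size means everything in one batch
            batch_size = len(items) or 1
        for start in range(0, len(items), batch_size):
            yield items[start:start + batch_size]

    # Hypothetical directory listing with 70 images
    files = ["img%03d.jpg" % i for i in range(70)]

    sizes = [len(batch) for batch in batches(files, 32)]
    print(sizes)  # [32, 32, 6]

| Under that assumption, 70 images with a ``batch_size`` of 32 take three reads, the last one holding the remaining 6 images.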

---------------

.. _Custom options:

| *With custom* options_:

.. code-block:: python

    from melon import ImageReader

    def train():
        source_dir = "resources/images"
        options = { "data_format": "channels_last", "normalize": False }
        reader = ImageReader(source_dir, options)
        ...

| This changes the data format to ``channels_last`` (each sample will be ``Height x Width x Channel``) and disables normalization. See options_ for the available options.
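
| The difference between the two data formats is only the position of the channel axis. The sketch below illustrates this with NumPy on a made-up batch; it assumes a leading batch dimension and is not how Melon performs the conversion internally.

.. code-block:: python

    import numpy as np

    # Hypothetical batch: 2 images, 3 channels, 4 x 5 pixels (channels_first)
    batch_chw = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)

    # Move the channel axis to the end: (N, C, H, W) -> (N, H, W, C)
    batch_hwc = np.transpose(batch_chw, (0, 2, 3, 1))

    print(batch_chw.shape)  # (2, 3, 4, 5)
    print(batch_hwc.shape)  # (2, 4, 5, 3)

| The pixel values are unchanged; only the order in which the axes are indexed differs.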

.. _options:

Options
------------------

**Images**

    width
        Width of the output (pixels). default: ``255``

    height
        Height of the output (pixels). default: ``255``

    batch_size
        Batch size of each read. default: All images in a directory

    data_format
        Format of the image data

            | ``channels_first`` - `Channel x Height x Width` (default)
            | ``channels_last`` - `Height x Width x Channel`

    label_format
        Format of the label data

            | ``one_hot`` - as a matrix, with one-hot vector per image (default)
            | ``label`` - as a vector, with a single label per image


    normalize
        Normalize data. default: ``True``

    num_threads
        Number of threads for parallel processing. default: Number of cores of the machine
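
| As an illustration of the two ``label_format`` values, the following NumPy sketch converts a vector of integer labels into a one-hot matrix and back. The label values and the number of classes are made up for the example, and the code does not reflect Melon's internals.

.. code-block:: python

    import numpy as np

    # "label" format: one integer label per image (hypothetical values)
    labels = np.array([1, 2, 3, 1, 0, 4])
    num_classes = 5

    # "one_hot" format: one row per image with a single 1 in the label's column
    one_hot = np.eye(num_classes, dtype=int)[labels]

    print(one_hot.shape)  # (6, 5)
    print(one_hot[0])     # [0 1 0 0 0]

    # The integer labels can be recovered with argmax
    recovered = one_hot.argmax(axis=1)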

.. _Labeling:

Labeling
-----------------

| In supervised learning, each image needs to be mapped to a label.
| While the tool supports reading images without labels (e.g. for inference), it also provides a way to label them.

-----

**Generating labels file**

| To generate a ``labels`` file, use the following command:

.. code-block:: text

    $ melon generate
    > Source dir:

| After you provide the source directory, the tool generates a ``labels`` file with blank labels in that directory.
| The final step is to add a label to each row in the generated file.
|
| For reference see ``tests/resources/images/sample/labels.txt``:

.. code-block:: text

    #legend
    pedestrian:0
    cat:1
    parrot:2
    car:3
    apple tree:4

    #map
    img275.jpg:1
    img324.jpg:2
    img551.jpg:3
    img928.jpg:1
    img999.png:0
    img736.png:4

| The ``#legend`` section is optional, but the ``#map`` section is required to map a label to each image.
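
| To make the file layout concrete, here is a minimal sketch of a parser for this format. The ``parse_labels`` function is hypothetical and is not part of Melon; it assumes ``name:label`` rows grouped under ``#legend`` and ``#map`` headers, as in the sample above, and integer labels.

.. code-block:: python

    def parse_labels(text):
        """Parse labels text into (legend, mapping) dictionaries."""
        legend, mapping = {}, {}
        section = None
        for raw in text.splitlines():
            line = raw.strip()
            if not line:
                continue
            if line.startswith("#"):
                # Section header such as "#legend" or "#map"
                section = line[1:]
                continue
            # Split on the last colon to separate the name from the label
            name, _, label = line.rpartition(":")
            if section == "legend":
                legend[int(label)] = name
            elif section == "map":
                mapping[name] = int(label)
        return legend, mapping

    sample = "#legend\npedestrian:0\ncat:1\n\n#map\nimg275.jpg:1\nimg999.png:0\n"
    legend, mapping = parse_labels(sample)
    print(legend)   # {0: 'pedestrian', 1: 'cat'}
    print(mapping)  # {'img275.jpg': 1, 'img999.png': 0}

| Rows whose label is still blank, as in a freshly generated file, would raise a ``ValueError`` in this sketch, which is a reminder that every row needs a label before training.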

-----

**Format of the labels**

| The label format can be specified in `Custom options`_. It defaults to the ``one_hot`` format.

Roadmap
-------

- Support for video data

- Support for textual data

