Metadata-Version: 2.1
Name: rawbuilder
Version: 0.0.6
Summary: an elegant datasets factory
Home-page: https://github.com/M-Farag/rawbuilder
Author: Mina Farag Amin
Author-email: mina.farag@icloud.com
License: MIT license
Keywords: rawbuilder
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.6
Description-Content-Type: text/x-rst
Requires-Dist: pandas
Requires-Dist: faker
Requires-Dist: numpy

==========
rawbuilder
==========


.. image:: https://img.shields.io/pypi/v/rawbuilder.svg
        :target: https://pypi.python.org/pypi/rawbuilder

.. image:: https://readthedocs.org/projects/rawbuilder/badge/?version=latest
        :target: https://rawbuilder.readthedocs.io/en/latest/?version=latest
        :alt: Documentation Status

.. image:: https://app.travis-ci.com/M-Farag/rawbuilder.svg?branch=main
        :target: https://app.travis-ci.com/M-Farag/rawbuilder

.. image:: https://codecov.io/gh/M-Farag/rawbuilder/branch/main/graph/badge.svg?token=H6YCKETJRV
        :target: https://codecov.io/gh/M-Farag/rawbuilder


an elegant datasets factory


* Free software: MIT license
* Documentation: https://rawbuilder.readthedocs.io.



Features
========

* Schema-oriented dataset builder

How to Use it
=================

From a terminal Python session::

    # Import the package into any Python app
    import rawbuilder

    # Initialize the dataset object as ds
    ds = rawbuilder.DataSet(
        size=1000,
        task='user',
        schema_path='path/to/any/custom/json/schema'
    )

    # Build the dataset
    ds.build()

    # Get the schema location to edit with any IDE
    ds.schema_path

Schema
=================
- The schema is a JSON object that describes three main components:
  the *model names*, the *column names*, and the *data type* of each column.
- In the example below, the model name is "student", and it contains five properties: id, first_name, last_name, email, and math_test_results.
- Each model, such as "student", defines a task with its own columns and a data source description for each column.
- The builder uses all the information in the schema to build the required tasks, or datasets.

Student data model example::

    "student": {
        "id": "int",
        "first_name": "first_name",
        "last_name": "last_name",
        "email": "email",
        "math_test_results": "random_int between,0,30"
    }
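
The schema above can be written to a file with the standard library before handing its path to the builder. This is a minimal sketch: the file location is hypothetical, and wrapping the model in an outer JSON object keyed by model name is an assumption based on the description above.

```python
import json
import tempfile
from pathlib import Path

# The "student" model from the example, wrapped in an outer object
# keyed by model name (assumed layout).
schema = {
    "student": {
        "id": "int",
        "first_name": "first_name",
        "last_name": "last_name",
        "email": "email",
        "math_test_results": "random_int between,0,30",
    }
}

# Hypothetical location; any writable path works.
schema_file = Path(tempfile.mkdtemp()) / "schema.json"
schema_file.write_text(json.dumps(schema, indent=4))

# Reading the file back confirms that it is valid JSON.
loaded = json.loads(schema_file.read_text())
print(list(loaded["student"]))
```

The resulting path can then be passed as ``schema_path`` when creating the ``DataSet`` object, with ``task`` set to the model name.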

Data types you can use in the schema
************************************
- int: build a column of integers from 1 to the requested dataset size.
- decrement: build a column of decreasing integers from the requested dataset size down to 1.
- random_int: build a column of random integers, between 0 and 100 by default.
- first_name: build a column of first names.
- last_name: build a column of last names.
- email: build a column of fake emails.
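
To make the numeric data types concrete, the sketch below reproduces their described output with plain Python. It is an illustration only, not rawbuilder's implementation; the variable names are hypothetical, and treating both ends of the default 0 to 100 range as inclusive is an assumption.

```python
import random

size = 5  # requested dataset size

# int: integers from 1 up to the requested size
int_column = list(range(1, size + 1))

# decrement: integers from the requested size down to 1
decrement_column = list(range(size, 0, -1))

# random_int: random integers in the default range
# (inclusive bounds are an assumption)
random_int_column = [random.randint(0, 100) for _ in range(size)]

print(int_column)
print(decrement_column)
print(random_int_column)
```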

Data Modifiers
==============
Combine data modifiers with the data types above to adjust values, change the nature of the data, and gain more control over the final output.

The modifier syntax is simple::

    "modifier,argument_1,arg_2,arg_*"

Append the modifier to the data type, separated by a space. For example, use the modifier *between* to generate a column of random integers between 0 and 30::

    "math_test_results": "random_int between,0,30"
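
One way to read a schema value of this form is to split it into a data type, a modifier name, and a list of arguments. The helper below, ``parse_column_spec``, is a hypothetical sketch of that idea and is not part of rawbuilder; it assumes the data type and the modifier are separated by a single space.

```python
def parse_column_spec(spec):
    """Split a schema value into (data_type, modifier, arguments)."""
    # Everything before the first space is the data type.
    data_type, _, modifier_part = spec.partition(" ")
    if not modifier_part:
        # No modifier was given, e.g. "email".
        return data_type, None, []
    # The modifier name comes first, followed by its arguments.
    modifier, *args = modifier_part.split(",")
    return data_type, modifier, args


print(parse_column_spec("random_int between,0,30"))
print(parse_column_spec("email"))
```

The arguments are kept as strings here; converting them to integers would be up to the code that applies the modifier.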

All Modifiers
*************

1) **Ranges**
--------------
Use this modifier to set the lower and upper bounds for a specific data type.

Syntax::

    "between,10,1000"

Supported with:

random_int::

    "math_test_results": "random_int between,0,30"
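
The effect of a range on ``random_int`` can be pictured with the standard library. The function below, ``random_int_between``, is a hypothetical stand-in rather than rawbuilder's own code, and it assumes that both bounds are inclusive.

```python
import random


def random_int_between(size, low, high):
    """Return `size` random integers with low <= value <= high."""
    if low > high:
        raise ValueError("low must not exceed high")
    return [random.randint(low, high) for _ in range(size)]


# Equivalent in spirit to "random_int between,0,30" for 1000 rows.
math_test_results = random_int_between(1000, 0, 30)
print(min(math_test_results), max(math_test_results))
```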


History
=======

0.0.4 (2021-11-13)
******************
* Data modifiers

0.0.3 (2021-11-05)
******************
* Migrate to JSON
* Generate simple datasets

0.0.2 (2021-11-05)
******************
* Proof of concept

0.0.1 (2021-10-24)
******************
* First release on PyPI.


