Metadata-Version: 1.1
Name: mutate
Version: 0.0.1.dev13
Summary: Data processing tool that produces CDM-conforming, readily indexable JSON docs
Home-page: https://github.com/redhat-performance/scribe
Author: redhat-performance
Author-email: browbench@redhat.com
License: UNKNOWN
Description-Content-Type: UNKNOWN
Description: Scribe
        ======
        
        Python library for generating docs that conform to a common data model.
        
        scribe is a Python library that processes collected data and conforms it
        to a common data model (CDM) to facilitate indexing the data into
        Elasticsearch. Currently scribe only supports data collected through
        stockpile, but other data sources can be integrated by adding a new
        scribe (see "Adding new scribes").
        
        It is suggested to use a venv to install and run scribe.
        
        ::
        
            python3 -m venv /path/to/new/virtual/environment
            source /path/to/new/virtual/environment/bin/activate
            pip install mutate
            # For testing purposes you can clone the repo first
            # and then install it as a local project
            git clone https://github.com/redhat-performance/scribe.git
            pip install -e scribe/
        
        Note: we create a python3 venv here because scribe is written in python3
        and is currently incompatible with python2.
        
        Using scribe as a python library
        --------------------------------
        
        Scribe is provided by the Python library mutate, which transcribes an
        input document into a series of documents that conform to the CDM. It
        can be used as follows.
        
        ::
        
            from mutate.transcribe import scribe
            for scribe_object in scribe('/tmp/stockpile.json','stockpile'):
                print(scribe_object)
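        
        What to do with each yielded object depends on your indexing setup. The
        following is only a minimal sketch of writing the documents out as
        newline-delimited JSON. It assumes each yielded object is either a JSON
        string or a dictionary; 'fake\_scribe' and its sample documents are
        hypothetical stand-ins so the snippet runs without mutate installed.
        
        .. code:: python
        
            import io
            import json
        
        
            def fake_scribe(path, source_type):
                """Hypothetical stand-in for mutate.transcribe.scribe."""
                # The document layout below is invented for illustration only.
                yield {"module": "scribe_module_1", "host": "localhost",
                       "value": "sample_value_1"}
                yield {"module": "scribe_module_1", "host": "host1",
                       "value": "sample_value_2"}
        
        
            def write_json_lines(documents, stream):
                """Write one JSON document per line and return how many were written."""
                count = 0
                for doc in documents:
                    if isinstance(doc, str):
                        # Already serialized: parse it so every line is normalized.
                        doc = json.loads(doc)
                    stream.write(json.dumps(doc, sort_keys=True) + "\n")
                    count += 1
                return count
        
        
            buffer = io.StringIO()
            written = write_json_lines(
                fake_scribe('/tmp/stockpile.json', 'stockpile'), buffer)
        
        With the real library, pass scribe('/tmp/stockpile.json', 'stockpile') in
        place of the stand-in, provided the assumption about the yielded objects
        holds for your version.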
        
        Contributing(create patchsets)
        ------------------------------
        
        Please visit http://gerrithub.io/ and sign in with your GitHub account.
        Make sure to import your SSH keys.
        
        Next, clone the GitHub repository:
        
        ::
        
                $ git clone https://github.com/redhat-performance/scribe.git
        
        Make sure you have git-review installed; the following should work:
        
        ::
        
                $ sudo pip install git-review
        
        To set up your cloned repository to work with Gerrit
        
        ::
        
                $ git review -s
        
        It's suggested to create a branch for your work and name it after the
        change you'd like to introduce.
        
        ::
        
                $ git branch my_special_enhancement
                $ git checkout !$
        
        Make your changes and then commit them using the instructions below.
        
        ::
        
                $ git add /path/to/files/changed
                $ git commit -m "your commit title"
        
        Use a descriptive commit title followed by a blank line. Below it, write
        a short justification of what you are changing and why.
        
        Now you're ready to submit your changes for review:
        
        ::
        
                $ git review
        
        If you want to make another patchset from the same commit, you can use
        the amend feature after making and saving further modifications. Make
        sure you are on the same branch; if you don't have the branch, follow
        the next set of instructions (git review -d) instead.
        
        ::
        
                $ git add /path/to/files/changed
                $ git commit --amend
                $ git review
        
        If you want to submit a new patchset from a different location (for
        example, from a different machine), you can clone the repo again (if it
        doesn't already exist) and then use git review against your unique
        Change-Id:
        
        ::
        
                $ git review -d Change-Id
        
        Change-Id is the change ID as seen in Gerrit and is generated after
        your first successful submission. For example, for
        https://review.gerrithub.io/#/c/redhat-performance/scribe/+/425014/
        
        you can either run git review -d 425014, using the change number, or
        git review -d If0b7b4f30615e46f009759b32a3fc533e811ebdc, where
        If0b7b4f30615e46f009759b32a3fc533e811ebdc is the Change-Id of that review.
        
        Make the changes on the branch that was set up by git review -d
        (the name of the branch is along the lines of
        review/username/branch\_name/patchsetnumber).
        
        Add the files to git and commit your changes using:
        
        ::
        
                $ git commit --amend
        
        You can also edit your commit message in the editor that opens when you
        run the above command.
        
        Finally, push the patch for review using:
        
        ::
        
                $ git review
        
        Adding Depends-On to commit message
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        Often, especially when adding a new module, changes made to scribe will
        not pass CI until the corresponding changes to stockpile are merged. In
        this case, to make CI run against the patchset you submitted to
        stockpile rather than against stockpile's master branch, you can use the
        Depends-On functionality.
        
        To use Depends-On, copy the Change-Id of the patchset you submitted to
        stockpile and add it at the end of the commit message.
        
        Note: Please add it after Change-Id in commit message.
        
        The commit message should look like:
        
        ::
        
            Your commit message
        
            Change-Id: I9bc121f076b8625da88705c9d96bd00117f94c22
        
            Depends-On: {Change-Id of the review submitted to stockpile}
        
        Say, for example, you're working on adding a module to process satellite
        data. The CI won't be able to test it because stockpile doesn't have a
        satellite collection yet. However, if you have a commit that's yet to
        be merged, such as
        https://review.gerrithub.io/#/c/redhat-performance/stockpile/+/425015/
        
        you can still verify the stockpile-scribe workflow by adding Depends-On
        to your commit message in scribe, so the commit message will look like:
        
        ::
        
            Adding satellite Module to work with stockpile.
        
            Change-Id: some_random_change_id_generated_after_git_review
        
            Depends-On: I66329511b38a558ce61efb7edb4c3be18625b252
        
        Note that the change ID in Depends-On is the same one in
        https://review.gerrithub.io/#/c/redhat-performance/stockpile/+/425015/
        
        For another example look at:
        https://review.gerrithub.io/#/c/redhat-performance/scribe/+/425969/
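        
        The ordering rule above (Depends-On placed after Change-Id) can be
        checked before running git review. This is only a sketch: the helper
        'check\_depends\_on' is hypothetical, and the pattern assumes both IDs
        have the form seen in the examples above, an 'I' followed by 40
        hexadecimal characters.
        
        .. code:: python
        
            import re
        
            # Assumed ID format: 'I' followed by 40 hexadecimal characters.
            CHANGE_ID = re.compile(r"^Change-Id: (I[0-9a-f]{40})$")
            DEPENDS_ON = re.compile(r"^Depends-On: (I[0-9a-f]{40})$")
        
        
            def check_depends_on(message):
                """Return (change_id, depends_on) or raise ValueError."""
                change = depends = None
                for number, line in enumerate(message.splitlines()):
                    line = line.strip()
                    match = CHANGE_ID.match(line)
                    if match:
                        change = (number, match.group(1))
                    match = DEPENDS_ON.match(line)
                    if match:
                        depends = (number, match.group(1))
                if change is None or depends is None:
                    raise ValueError("both Change-Id and Depends-On are required")
                if depends[0] < change[0]:
                    raise ValueError("Depends-On must come after Change-Id")
                return change[1], depends[1]
        
        
            message = "\n".join([
                "Adding satellite Module to work with stockpile.",
                "",
                "Change-Id: I9bc121f076b8625da88705c9d96bd00117f94c22",
                "",
                "Depends-On: I66329511b38a558ce61efb7edb4c3be18625b252",
            ])
            change_id, depends_on = check_depends_on(message)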
        
        Contributing(making changes)
        ----------------------------
        
        The scribe package is basically made of two modules:
        
        1. scribes
        2. scribe\_modules
        
        These two modules serve different purposes: scribes read the input data
        and pre-process it into a structure that can be used to create
        scribe\_modules objects.
        
        The pre-processed dictionary structure can look like this:
        
        .. code:: json
        
              {
              "scribe_module_1": [
                  {
                      "host": "localhost",
                      "value": "sample_value_1"
                  },
                  {
                      "host": "host1",
                      "value": "sample_value_2"
                  },
                  {
                      "host": "host2",
                      "value": "sample_value_3"
                  }
              ],
              "scribe_module_2": [
                  {
                      "host": "host2",
                      "value": {
                          "field1": "sample_filed1_value_3",
                          "field2": "sample_field2_value_3"
                      }
                  },
                  {
                      "host": "host1",
                      "value": [
                          {
                              "field1": "sample_filed1_value_1",
                              "field2": "sample_field2_value_1"
                          },
                          {
                              "field1": "sample_filed1_value_2",
                              "field2": "sample_field2_value_2"
                          }
                      ]
                  }
              ]
              }
        
        The first-level keys of the dictionary must be the names of the
        'scribe\_modules' you've written, matching the file names in
        scribe\_modules/. Each child of a module in the dictionary should have
        two keys, 'host' and 'value'. The value for the key 'value' can be a
        plain value (as in scribe\_module\_1 above), a dictionary or a list of
        dictionaries.
        
        Please note that the value for the key 'value' is what is passed to the
        scribe\_modules class when the object is created.
        
        Take the simple example of scribe\_module\_2 for host2: just one object
        would be created, and the value passed would be:
        
        .. code:: json
        
              {
              "field1": "sample_filed1_value_3",
              "field2": "sample_field2_value_3"
              }
        
        Likewise, for host1 two objects would be created.
        
        For object 1, the following value would be passed:
        
        .. code:: json
        
              {
              "field1": "sample_filed1_value_1",
              "field2": "sample_field2_value_1"
              }
        
        For object 2, the following value would be passed:
        
        .. code:: json
        
              {
              "field1": "sample_filed1_value_2",
              "field2": "sample_field2_value_2"
              }
        
        For scribe\_module\_1 for host1, the value passed would be:
        "sample\_value\_2"
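        
        The expansion described above can be sketched as a small generator. This
        is not the library's actual implementation; 'expand\_scribe\_dict' is a
        hypothetical helper that only mirrors the rule that a list of
        dictionaries produces one object per item, while any other value
        produces a single object.
        
        .. code:: python
        
            def expand_scribe_dict(scribe_dict):
                """Yield one (module_name, host, value) tuple per object to create."""
                for module_name, entries in scribe_dict.items():
                    for entry in entries:
                        value = entry["value"]
                        # A list means one object per item; otherwise one object.
                        values = value if isinstance(value, list) else [value]
                        for item in values:
                            yield module_name, entry["host"], item
        
        
            sample = {
                "scribe_module_1": [
                    {"host": "host1", "value": "sample_value_2"},
                ],
                "scribe_module_2": [
                    {"host": "host2", "value": {"field1": "a", "field2": "b"}},
                    {"host": "host1", "value": [{"field1": "c"}, {"field1": "d"}]},
                ],
            }
            objects = list(expand_scribe_dict(sample))
        
        Here scribe\_module\_2 yields one object for host2 and two for host1,
        matching the walk-through above.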
        
        Adding new scribes
        ~~~~~~~~~~~~~~~~~~
        
        Steps to extend scribe to work with a new input-type 'example1' would
        involve:
        
        1. Creating 'example1.py' in the 'mutate/scribes/' directory. The
           sample code would look like:
        
        .. code:: python
        
        
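            from . import ScribeBaseClass
            # Assumption: load_file is a helper in lib/util.py; adjust the import
            # to wherever your file-loading helper lives.
            from ..lib.util import load_file
        
        
            class Example1(ScribeBaseClass):
        
                def example1_build_initial_dict(self):
                    output_dict = {}
                    example1_data = load_file(self._path)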
                    # .... some sort of data manipulation
                    # .... to build the output_dict
                    return output_dict
        
                def __init__(self, path=None, source_type=None):
                    ScribeBaseClass.__init__(self, source_type=source_type, path=path)
                    self._dict = self.example1_build_initial_dict()
        
                def emit_scribe_dict(self):
                    return self._dict
        
        Note the following:
        
        a) from . import ScribeBaseClass needs to be present, as we are
           inheriting from ScribeBaseClass.
        
        b) class Example1(ScribeBaseClass) is where inheritance occurs; ensure
           that '(ScribeBaseClass)' is present when you write the class
           definition.
        
        c) The first letter of the class name must be uppercase, because that
           is how the factory method is defined.
        
        d) The \_\_init\_\_ function first calls the parent's \_\_init\_\_
           function and passes the default arguments, path and source\_type.
           More arguments can be added to your own \_\_init\_\_, and they
           don't need to be passed on to the parent class's \_\_init\_\_
           function.
        
        e) emit\_scribe\_dict is an abstractmethod and thus must be defined in
           every class you write. Its implementation can differ, but it should
           return the dictionary object as described above.
        
        2. Add the module to the choices list in scribe.py at L14. Currently it
           looks like choices=['stockpile'], because at the time of writing
           this documentation only stockpile data could be transcribed using
           scribe.
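        
        Notes c) and e) can be illustrated with a self-contained sketch. The
        base class below is a stand-in, not the real ScribeBaseClass, and
        'build\_scribe' is a hypothetical factory; the real factory method may
        look up classes differently. It only shows why the class name has to be
        the capitalized input type and why emit\_scribe\_dict must be defined.
        
        .. code:: python
        
            from abc import ABC, abstractmethod
        
        
            class ScribeBaseClass(ABC):
                """Stand-in base class covering only what the notes describe."""
        
                def __init__(self, source_type=None, path=None):
                    self._source_type = source_type
                    self._path = path
        
                @abstractmethod
                def emit_scribe_dict(self):
                    """Return the pre-processed dictionary."""
        
        
            class Example1(ScribeBaseClass):
        
                def __init__(self, path=None, source_type=None):
                    ScribeBaseClass.__init__(self, source_type=source_type,
                                             path=path)
                    self._dict = {"scribe_module_1": []}
        
                def emit_scribe_dict(self):
                    return self._dict
        
        
            def build_scribe(source_type, path, namespace):
                """Hypothetical factory: find the class named after the input type."""
                class_name = source_type.capitalize()  # 'example1' -> 'Example1'
                return namespace[class_name](path=path, source_type=source_type)
        
        
            scribe_obj = build_scribe('example1', '/tmp/example1.json',
                                      {"Example1": Example1})
        
        A subclass that omits emit\_scribe\_dict cannot be instantiated in this
        sketch; Python raises TypeError for classes with unimplemented abstract
        methods.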
        
        Adding new scribe\_modules
        ~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        Steps to extend scribe\_modules to work with a new module
        'scribe\_module\_1' would involve:
        
        1. Adding a new class file 'scribe\_module\_1.py' to the directory
           'mutate/scribe\_modules'. It'd probably look something like this:
        
        .. code:: python
        
            from . import ScribeModuleBaseClass
        
        
            class Scribe_module_1(ScribeModuleBaseClass):
        
                def __init__(self, input_dict=None, module_name=None, host_name=None,
                             input_type=None, scribe_uuid=None):
                    ScribeModuleBaseClass.__init__(self, module_name=module_name,
                                                   input_dict=input_dict,
                                                   host_name=host_name,
                                                   input_type=input_type,
                                                   scribe_uuid=scribe_uuid)
                    if input_dict:
                        new_dict = {}
                        # ... this is where transformation occurs
                        # ... can call other member functions of class
                        # ... can set the entities of the class object like
                        self.entity_1 = input_dict
        
                # __iter__ is already defined in the parent class and is not an
                # abstractmethod, so you don't need it here. Only if you'd like to
                # change how iteration works for your class should you uncomment
                # and fill in the lines below.
                # Not recommended, unless you know what you're doing.
                # def __iter__(self):
                #     ... your definition of how to make it iterable
        
        Note the following important things:
        
        a) from . import ScribeModuleBaseClass needs to be present, as we are
           inheriting from ScribeModuleBaseClass.
        
        b) class Scribe\_module\_1(ScribeModuleBaseClass) is where inheritance
           occurs; ensure that '(ScribeModuleBaseClass)' is present when you
           write the class.
        
        c) The first letter of the class name must be uppercase, because that
           is how the factory method is defined.
        
        d) The \_\_init\_\_ function first calls the parent's \_\_init\_\_
           function and passes the default arguments, which are module\_name,
           input\_dict, host\_name, input\_type and scribe\_uuid. Please note
           that no more arguments can be passed.
        
        e) Setting the new entities should be done inside the \_\_init\_\_
           function only, but you are free to call another method, either from
           the same class or from lib/util.py, to do the transformation.
        
        2. Add a schema for the new class, 'example1.yml', to the directory
           'mutate/schema'. Scribe currently uses cerberus to validate the
           iterable produced by the scribe\_modules subclass. Please look at
           http://docs.python-cerberus.org/en/stable/validation-rules.html for
           more information on how to write the schema for your class's output.
        
        Note: The name of the yml file should match that of the scribe\_modules
        class that you create it for. Thus, for the 'example1' class the file
        should be named 'example1.yml'.
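        
        Step 1 mentions that the parent class already defines \_\_iter\_\_ and
        that entities are set inside \_\_init\_\_. The sketch below shows one way
        such an iterable could work, so that dict(obj) gives the document to be
        validated. The base class here is a stand-in written for illustration;
        the attribute names 'module', 'host' and 'source\_type' are assumptions
        and may not match the real ScribeModuleBaseClass.
        
        .. code:: python
        
            class ScribeModuleBaseClass:
                """Stand-in base class; assumed to yield (name, value) pairs."""
        
                def __init__(self, input_dict=None, module_name=None,
                             host_name=None, input_type=None, scribe_uuid=None):
                    # Attribute names are hypothetical, chosen for this sketch.
                    self.module = module_name
                    self.host = host_name
                    self.source_type = input_type
                    self.scribe_uuid = scribe_uuid
        
                def __iter__(self):
                    # Yield every public attribute set on the instance.
                    for name, value in vars(self).items():
                        if not name.startswith('_'):
                            yield name, value
        
        
            class Scribe_module_1(ScribeModuleBaseClass):
        
                def __init__(self, input_dict=None, module_name=None,
                             host_name=None, input_type=None, scribe_uuid=None):
                    ScribeModuleBaseClass.__init__(self, module_name=module_name,
                                                   input_dict=input_dict,
                                                   host_name=host_name,
                                                   input_type=input_type,
                                                   scribe_uuid=scribe_uuid)
                    if input_dict:
                        # Entities set here become part of the iterable.
                        self.entity_1 = input_dict
        
        
            doc = dict(Scribe_module_1(input_dict="sample_value_2",
                                       module_name="scribe_module_1",
                                       host_name="host1",
                                       input_type="stockpile",
                                       scribe_uuid="hypothetical-uuid"))
        
        In this sketch every entity assigned in \_\_init\_\_ ends up as a key in
        the resulting dictionary, which is why the schema file needs a rule for
        each entity your class sets.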
        
        Data Model and ES templates
        ---------------------------
        
        The directory 'mutate/schema' will essentially contain the data model.
        Work needs to be done so that these yml files can be used to create
        templates for Elasticsearch, along the lines of ViaQ's elasticsearch
        templates work.
        
        Please refer to https://github.com/ViaQ/elasticsearch-templates for more
        info on how templates can be created.
        
        Do note that ViaQ/elasticsearch-templates currently doesn't support
        creating templates from the schema files present in 'mutate/schema'.
        
        
Platform: UNKNOWN
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
