Metadata-Version: 1.1
Name: dtool-lookup-server-annotation-filter-plugin
Version: 0.1.1
Summary: Extend dtool-lookup-server with ability to filter by annotations
Home-page: https://github.com/jic-dtool/dtool-lookup-server-annotation-filter-plugin
Author: Tjelvar Olsson
Author-email: tjelvar.olsson@gmail.com
License: MIT
Download-URL: https://github.com/jic-dtool/dtool-lookup-server-annotation-filter-plugin/tarball/0.1.1
Description: dtool-lookup-server-annotation-filter-plugin
        ============================================
        
        .. image:: https://badge.fury.io/py/dtool-lookup-server-annotation-filter-plugin.svg
           :target: http://badge.fury.io/py/dtool-lookup-server-annotation-filter-plugin
           :alt: PyPI package
        
        .. image:: https://travis-ci.org/jic-dtool/dtool-lookup-server-annotation-filter-plugin.svg?branch=master
           :target: https://travis-ci.org/jic-dtool/dtool-lookup-server-annotation-filter-plugin
           :alt: Travis CI build status (Linux)
        
        .. image:: https://codecov.io/github/jic-dtool/dtool-lookup-server-annotation-filter-plugin/coverage.svg?branch=master
           :target: https://codecov.io/github/jic-dtool/dtool-lookup-server-annotation-filter-plugin?branch=master
           :alt: Code Coverage
        
        - GitHub: https://github.com/jic-dtool/dtool-lookup-server-annotation-filter-plugin
        - PyPI: https://pypi.python.org/pypi/dtool-lookup-server-annotation-filter-plugin
        - Free software: MIT License
        
        
        Introduction
        ------------
        
        This `dtool-lookup-server <https://github.com/jic-dtool/dtool-lookup-server>`_
        plugin adds the ability to get an overview of the datasets a user has access
        to, based on how those datasets have been annotated with key/value pairs.
        
        The purpose of this API is to give users an overview of all the datasets
        available to them and to allow them to drill down on those results by filtering
        based upon keys and key/value pairs.
        
        This API could be used to build a webapp that allows users to get an
        "eagle-eye" view of their data.
        
        
        Installation
        ------------
        
        This plugin depends on an installed and configured `dtool-lookup-server
        <https://github.com/jic-dtool/dtool-lookup-server>`_. The plugin can then
        be installed by running the commands below.
        
        This plugin only works with Python 3 (i.e. not with Python 2).
        
        ::
        
            git clone https://github.com/jic-dtool/dtool-lookup-server-annotation-filter-plugin.git
            cd dtool-lookup-server-annotation-filter-plugin
            python setup.py install
        
        See `dtool-lookup-server <https://github.com/jic-dtool/dtool-lookup-server>`_
        for more information about the setup of the base system.
        
        
        Routes
        ------
        
        This plugin has five routes.
        
        - POST /annotation_filter_plugin/annotation_keys
        - POST /annotation_filter_plugin/annotation_values
        - POST /annotation_filter_plugin/num_datasets
        - POST /annotation_filter_plugin/datasets
        - GET /annotation_filter_plugin/version
        
        The first gives access to all annotation keys that are present on at
        least one dataset with a basic value. The keys will only be extracted from
        datasets that pass any annotation filter in the POST request. The response from
        this route includes information about the number of datasets associated with
        each key.
        
        The second gives access to all values for the keys specified in the POST
        request. The values will only be extracted from the datasets that pass the
        annotation filter in the POST request. The response from this route includes
        information about the number of datasets associated with each key/value pair.
        
        The third gives the number of datasets given a particular annotation filter.
        
        The fourth gives the list of datasets given a particular annotation filter.
        
        The fifth returns the version of the plugin.
        
        
        Filter syntax
        -------------
        
        Below are examples of JSON queries that can be posted to the routes.
        
        No filters, i.e. get all (this only really makes sense for the
        /annotation_filter_plugin/annotation_keys route).
        
        ::
        
            {}
        
        Get only datasets that have the key "color"::
        
            {
                "annotation_keys": ["color"]
            }
        
        Get only datasets where "color" is set to "red"::
        
            {
                "annotations": {"color": "red"}
            }
        
        Get only datasets that have both the keys "color" and "pattern"::
        
            {
                "annotation_keys": ["color", "pattern"]
            }
        
        Get only datasets where "color" is set to "red" and
        "pattern" is set to "stripey"::
        
            {
                "annotations": {"color": "red", "pattern": "stripey"}
            }
        
        Get only datasets that have the keys "color" and "pattern" and where the
        "color" is set to "red"::
        
            {
                "annotation_keys": ["color", "pattern"],
                "annotations": {"color": "red"}
            }
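        
        The way these queries read can be sketched in Python. This is only an
        illustration of the filter semantics described above, not the plugin's
        implementation; the ``matches`` helper and the sample annotations are
        hypothetical.
        
        ```python
        def matches(annotations, query):
            """Return True if a dataset's annotations satisfy the query.
        
            Hypothetical helper: every key listed in "annotation_keys" must be
            present, and every pair in "annotations" must match exactly.
            """
            required_keys = query.get("annotation_keys", [])
            required_pairs = query.get("annotations", {})
            has_keys = all(key in annotations for key in required_keys)
            has_pairs = all(
                key in annotations and annotations[key] == value
                for key, value in required_pairs.items()
            )
            return has_keys and has_pairs
        
        
        # Made-up annotations for three datasets.
        datasets = [
            {"color": "red", "pattern": "stripey"},
            {"color": "red"},
            {"color": "blue", "pattern": "wavy"},
        ]
        
        # Require both keys, and require "color" to be "red".
        query = {"annotation_keys": ["color", "pattern"], "annotations": {"color": "red"}}
        selected = [d for d in datasets if matches(d, query)]
        print(selected)
        ```
        
        In this sketch an empty query selects every dataset, which mirrors the
        "no filters" case.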
        
        
        
        Limitations
        -----------
        
        - This plugin only recognises annotations where the value is a basic type, such
          as a string, a number or a boolean value. In other words, a dataset's
          annotations where the value is a data structure, such as a list or a
          dictionary, will be ignored.
        - Datasets that do not have any annotation with a basic type as a value will
          not be picked up by this plugin.
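        
        As a rough illustration of what "basic type" means here, the sketch below
        keeps only string, number and boolean values. The ``basic_annotations``
        helper and the sample data are hypothetical, and treating ``None`` as
        non-basic is an assumption rather than something the plugin documents.
        
        ```python
        def basic_annotations(annotations):
            """Keep only annotations whose value is a string, number or boolean."""
            # Assumption: None, lists and dictionaries are not basic values.
            basic_types = (str, int, float, bool)
            return {
                key: value
                for key, value in annotations.items()
                if isinstance(value, basic_types)
            }
        
        
        # Made-up annotations; "tags" and "stats" hold data structures.
        annotations = {
            "color": "red",
            "size": 10,
            "published": True,
            "tags": ["a", "b"],
            "stats": {"mean": 1.5},
        }
        print(basic_annotations(annotations))
        ```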
        
        
        Usage
        -----
        
        Preparation
        ~~~~~~~~~~~
        
        The dtool lookup server makes use of the Authorization header to pass through the
        JSON web token for authorization. Below we create shell variables for the
        token and the header used in the ``curl`` commands::
        
            $ TOKEN=$(flask user token olssont)
            $ HEADER="Authorization: Bearer $TOKEN"
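        
        The same header can be prepared from Python. The sketch below uses only the
        standard library and builds the request without sending it; the
        ``build_request`` helper and the placeholder token are assumptions made for
        illustration.
        
        ```python
        import json
        import urllib.request
        
        
        def build_request(url, token, query):
            """Build (but do not send) a POST request carrying a JSON filter."""
            return urllib.request.Request(
                url,
                data=json.dumps(query).encode("utf-8"),
                headers={
                    "Authorization": "Bearer " + token,
                    "Content-Type": "application/json",
                },
                method="POST",
            )
        
        
        # Placeholder token; in practice use the output of "flask user token".
        request = build_request(
            "http://localhost:5000/annotation_filter_plugin/annotation_keys",
            "PLACEHOLDER_TOKEN",
            {},
        )
        print(request.get_method(), request.full_url)
        ```
        
        Against a running server the request could then be sent with
        ``urllib.request.urlopen(request)``.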
        
        
        Find keys available for filtering and the number of datasets associated with them
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        The command below finds all annotation keys available for further filtering::
        
            $ curl -H "$HEADER" -H "Content-Type: application/json"  \
                -X POST -d '{}'  \
                http://localhost:5000/annotation_filter_plugin/annotation_keys
        
        The response below means that the annotation key "color" has 120 datasets
        associated with it, the annotation key "pattern" has 50 and the annotation
        key "size" has 10.
        
        ::
        
            {"color": 120, "pattern": 50, "size": 10}
        
        Suppose that one chooses to filter further based on the "pattern" annotation key.
        Using the command below one could find the annotation keys that are still relevant
        given that each dataset has to have the annotation key "pattern".
        
        ::
        
            $ curl -H "$HEADER" -H "Content-Type: application/json"  \
                -X POST -d '{"annotation_keys": ["pattern"]}'  \
                http://localhost:5000/annotation_filter_plugin/annotation_keys
        
        The response below shows that none of the remaining datasets have the key
        "size" and that 45 of the datasets with the key "pattern" also have the key
        "color".
        
        ::
        
            {"color": 45, "pattern": 50}
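        
        A client could compare two such responses to see what a filter removed. The
        short sketch below works on the two example responses shown so far; the
        variable names are arbitrary.
        
        ```python
        # Responses copied from the two examples above.
        before = {"color": 120, "pattern": 50, "size": 10}
        after = {"color": 45, "pattern": 50}
        
        # Keys that no longer appear once the "pattern" key is required.
        dropped = sorted(set(before) - set(after))
        
        # How many datasets each remaining key lost through the filter.
        lost = {key: before[key] - after[key] for key in after}
        
        print(dropped)
        print(lost)
        ```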
        
        It is possible to filter based on an annotation key/value pair. For example, to
        limit the datasets to the case where the "pattern" is "stripey" one could use
        the command below.
        
        ::
        
            $ curl -H "$HEADER" -H "Content-Type: application/json"  \
                -X POST -d '{"annotations": {"pattern": "stripey"}}'  \
                http://localhost:5000/annotation_filter_plugin/annotation_keys
        
        The response below shows that this is more specific and that there are fewer
        results.
        
        ::
        
            {"color": 5, "pattern": 10}
        
        It is possible to make more complex queries. The command below also requires
        that the datasets have the key "color".
        
        ::
        
            $ curl -H "$HEADER" -H "Content-Type: application/json"  \
                -X POST -d '{"annotation_keys": ["color"], "annotations": {"pattern": "stripey"}}'  \
                http://localhost:5000/annotation_filter_plugin/annotation_keys
        
        In the response below there are now fewer datasets with the "pattern" key. That
        is because some of the datasets that were picked up previously did not have the
        "color" key.
        
        ::
        
            {"color": 5, "pattern": 3}
        
        It is also possible to filter using base URIs. The command below limits the
        keys to those from the base URIs "s3://snow-white" and "s3://mr-men"::
        
            $ curl -H "$HEADER" -H "Content-Type: application/json"  \
                -X POST -d '{"base_uris": ["s3://snow-white", "s3://mr-men"]}'  \
                http://localhost:5000/annotation_filter_plugin/annotation_keys
        
        The response below shows that there are fewer hits than when all base URIs
        are included.
        
        ::
        
            {"color": 77, "pattern": 35, "size": 4}
        
        
        Find annotations available for filtering and the number of datasets associated with them
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        The pattern for finding annotation key/value pairs and the number of datasets associated
        with them is similar to that of finding the keys (above).
        
        The command below can be used to find all the values associated with the "color" key and
        the number of datasets that have been annotated with each particular value.
        
        ::
        
            $ curl -H "$HEADER" -H "Content-Type: application/json"  \
                -X POST -d '{"annotation_keys": ["color"]}'  \
                http://localhost:5000/annotation_filter_plugin/annotation_values
        
        The response below shows that there are five colors available and that most datasets
        have the color "red".
        
        ::
        
            {
                "color": {
                    "red": 50,
                    "pink": 30,
                    "blue": 20,
                    "green": 15,
                    "yellow": 5
                }
            }
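        
        A response of this shape is straightforward to summarise on the client side.
        The sketch below, using the example response above, picks the most common
        value and sums the counts. The total agrees with the 120 datasets reported
        for "color" in the earlier keys example, as expected when each dataset
        carries one basic value per key.
        
        ```python
        # Response copied from the example above.
        response = {
            "color": {"red": 50, "pink": 30, "blue": 20, "green": 15, "yellow": 5}
        }
        
        counts = response["color"]
        most_common = max(counts, key=counts.get)  # value with the highest count
        total = sum(counts.values())  # datasets annotated with any color
        
        print(most_common, total)
        ```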
        
        To get data for more keys they need to be included in the filter. The command below
        returns the values from datasets that have annotations for both "color" and "pattern".
        
        ::
        
            $ curl -H "$HEADER" -H "Content-Type: application/json"  \
                -X POST -d '{"annotation_keys": ["color", "pattern"]}'  \
                http://localhost:5000/annotation_filter_plugin/annotation_values
        
        The response contains fewer colors because some of the datasets annotated with a color
        did not have a pattern annotation.
        
        ::
        
            {
                "color": {
                    "red": 15,
                    "pink": 10,
                    "blue": 10,
                    "green": 10
                },
                "pattern": {
                    "stripey": 40,
                    "wavy": 10
                }
            }
        
        It is possible to make more specific queries. The command below also requires
        that the datasets have the stripey pattern.
        
        ::
        
            $ curl -H "$HEADER" -H "Content-Type: application/json"  \
                -X POST -d '{"annotation_keys": ["color"], "annotations": {"pattern": "stripey"}}'  \
                http://localhost:5000/annotation_filter_plugin/annotation_values
        
        The response below shows that fewer datasets have been used to collect the
        annotation information.
        
        ::
        
            {
                "color": {
                    "red": 15,
                    "pink": 10,
                    "blue": 10,
                    "green": 5
                },
                "pattern": {
                    "stripey": 40
                }
            }
        
        It is also possible to filter using base URIs. The command below limits the
        values to those from the base URIs "s3://snow-white" and "s3://mr-men"::
        
            $ curl -H "$HEADER" -H "Content-Type: application/json"  \
                -X POST -d '{"annotation_keys": ["color"], "base_uris": ["s3://snow-white", "s3://mr-men"]}'  \
                http://localhost:5000/annotation_filter_plugin/annotation_values
        
        The response below shows that there are fewer hits than when all base URIs
        are included.
        
        ::
        
            {
                "color": {
                    "red": 50,
                    "pink": 20,
                    "blue": 7
                }
            }
        
        
        Listing the number of datasets available for a particular filter
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        The number of datasets selected, using a particular filter, can be determined using the
        ``/annotation_filter_plugin/num_datasets`` route. The command below selects all datasets
        with at least one basic value (see the Limitations section above for an explanation
        of what a basic value is).
        
        ::
        
            $ curl -H "$HEADER" -H "Content-Type: application/json"  \
                -X POST -d '{}'  \
                http://localhost:5000/annotation_filter_plugin/num_datasets
        
        The response below shows that there are 145 such datasets.
        
        ::
        
            145
        
        The command below uses a filter to select only datasets that have the key/value
        pair "pattern"/"stripey".
        
        ::
        
            $ curl -H "$HEADER" -H "Content-Type: application/json"  \
                -X POST -d '{"annotations": {"pattern": "stripey"}}'  \
                http://localhost:5000/annotation_filter_plugin/num_datasets
        
        The response shows that there are 10 such datasets.
        
        ::
        
            10
        
        Retrieving information about datasets selected by a particular filter
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        It is possible to get information about the datasets selected by a particular
        filter using the ``/annotation_filter_plugin/datasets`` route. The command
        below uses a filter to select only datasets that have the key/value pair
        "pattern"/"stripey".
        
        ::
        
            $ curl -H "$HEADER" -H "Content-Type: application/json"  \
                -X POST -d '{"annotations": {"pattern": "stripey"}}'  \
                http://localhost:5000/annotation_filter_plugin/datasets
        
        Below is a truncated version of the response.
        
        ::
        
            [
              {
                "annotations": {
                  "pattern": "stripey"
                },
                "base_uri": "s3://dtool-demo",
                "created_at": "1530803916.74",
                "creator_username": "olssont",
                "dtoolcore_version": "3.3.0",
                "frozen_at": "1536749825.85",
                "name": "hypocotyl3",
                "type": "dataset",
                "uri": "s3://dtool-demo/ba92a5fa-d3b4-4f10-bcb9-947f62e652db",
                "uuid": "ba92a5fa-d3b4-4f10-bcb9-947f62e652db"
              }
              ...
            ]
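        
        The list returned by this route can be processed like any other JSON array.
        The sketch below parses a single record shaped like the truncated response
        above and prints a few fields. Reading ``created_at`` as seconds since the
        Unix epoch is an assumption based on the look of the value, not something
        stated by the plugin.
        
        ```python
        import json
        from datetime import datetime, timezone
        
        # One record shaped like the truncated response above.
        body = """
        [
          {
            "annotations": {"pattern": "stripey"},
            "base_uri": "s3://dtool-demo",
            "created_at": "1530803916.74",
            "name": "hypocotyl3",
            "uri": "s3://dtool-demo/ba92a5fa-d3b4-4f10-bcb9-947f62e652db",
            "uuid": "ba92a5fa-d3b4-4f10-bcb9-947f62e652db"
          }
        ]
        """
        
        datasets = json.loads(body)
        for dataset in datasets:
            # Assumption: "created_at" holds seconds since the Unix epoch.
            created = datetime.fromtimestamp(float(dataset["created_at"]), tz=timezone.utc)
            print(dataset["name"], dataset["uri"], created.date().isoformat())
        ```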
        
Platform: UNKNOWN
