Metadata-Version: 2.1
Name: dask-geomodeling
Version: 2.0.2
Summary: On-the-fly operations on geographical maps.
Home-page: https://github.com/nens/dask-geomodeling
Author: Casper van der Wel
Author-email: casper.vanderwel@nelen-schuurmans.nl
License: BSD 3-Clause License
Description: dask-geomodeling
        ==========================================
        
        Dask-geomodeling is a collection of classes that can be stacked together to
        create configurations for on-the-fly operations on geographical maps. By
        generating `Dask <https://dask.pydata.org/>`_ compute graphs, these operations
        may be parallelized and (intermediate) results may be cached.
        
        Multiple Block instances together make a view. Each Block has the ``get_data``
        method that fetches the data in one go, as well as a ``get_compute_graph``
        method that creates a graph to compute the data later.
        
        Constructing a view
        -------------------
        
        A dask-geomodeling view can be constructed by creating a Block instance:
        
        .. code:: python
        
           from dask_geomodeling.raster import RasterFileSource
           source = RasterFileSource('/path/to/geotiff')
        
        
        The view can now be used to obtain data from the specified file. More
        complex views can be created by nesting block instances:
        
        .. code:: python
        
           from dask_geomodeling.raster import Add, Multiply
           add = Add(source, 2.4)
           mult = Multiply(source, add)
        
        
        Obtaining data from a view
        --------------------------
        
        Dask-geomodeling revolves around *lazy data evaluation*. Each Block first
        evaluates what needs to be done for a given request, storing that in a
        *compute graph*. This graph can then be evaluated to obtain the data. The data
        is evaluated with dask, and the specification of the compute graph also comes
        from dask. For more information about how a graph works, consult the dask
        docs_:
        
        .. _docs: http://docs.dask.org/en/latest/custom-graphs.html
        
        We use the previous example to demonstrate how this works:
        
        .. code:: python
        
           import dask
           request = {
               "mode": "vals",
               "bbox": (138000, 480000, 139000, 481000),
               "projection": "epsg:28992",
               "width": 256,
               "height": 256
           }
           compute_graph, compute_token = add.get_compute_graph(**request)
           data = dask.get(compute_graph, compute_token)
        
        Here, we first generate a compute graph using dask-geomodeling, then evaluate
        the graph using dask. The power of this two-step procedure is twofold:
        
        1. Dask supports multi-threading, multi-processing, and cluster processing.
        2. The compute_token is a unique identifier of this computation: this can
           easily be used in caching methods.
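
        To make this concrete, here is a hand-rolled sketch of what such a compute
        graph looks like and how it is evaluated. The keys and callables are
        hypothetical; real graphs use dask-generated tokens and the blocks'
        ``process`` methods:

        ```python
        # Hypothetical sketch of a compute graph for the ``add`` view above.
        # Keys and callables are illustrative, not what dask-geomodeling generates.
        def fetch_tile():
            # stands in for RasterFileSource reading a 2x2 tile
            return [[1.0, 2.0], [3.0, 4.0]]

        def add_constant(data, value):
            # stands in for the Add block's process step
            return [[x + value for x in row] for row in data]

        graph = {
            "source-abc123": (fetch_tile,),
            "add-def456": (add_constant, "source-abc123", 2.4),
        }

        def get(graph, key):
            """Minimal recursive evaluator in the style of ``dask.get``."""
            func, *args = graph[key]
            resolved = [get(graph, a) if a in graph else a for a in args]
            return func(*resolved)

        result = get(graph, "add-def456")
        ```

        Handing the same graph to ``dask.threaded.get`` or a distributed scheduler
        parallelizes the evaluation without changing the graph itself.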
        
        
        The Block class
        ----------------
        
        To write a new geoblock class, we need to implement the following:
        
        1. the ``__init__`` that validates the arguments when constructing the block
        2. the ``get_sources_and_requests`` that processes the request
        3. the ``process`` that processes the data
        4. a number of attribute properties such as ``extent`` and ``period``
        
        About the 2-step data processing
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        The ``get_sources_and_requests`` method of any block is called recursively from
        ``get_compute_graph`` and feeds the request from the block to its sources. It
        does so by returning a list of (source, request) tuples. During the data
        evaluation, each of these 2-tuples is converted to a single data object that is
        supplied to the ``process`` function.
        
        An example in words. We ask the ``add`` block from the previous example to do the
        following:
        
        - give me a 256x256 raster at location (138000, 480000)
        
        The ``get_sources_and_requests`` would respond with the following:
        
        - I need a 256x256 raster at location (138000, 480000) from
          ``RasterFileSource('/path/to/geotiff')``
        - I need the number 2.4
        
        The ``get_compute_graph`` method works recursively, so it also calls the
        ``get_sources_and_requests`` of the ``RasterFileSource``. The result is a
        dask compute graph.
        
        When this compute graph is evaluated, the ``process`` method of the ``add``
        geoblock will ultimately receive two arguments:
        
        - the 256x256 raster from ``RasterFileSource('/path/to/geotiff')``
        - the number 2.4
        
        And the process method produces the end result.
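
        In code, this final step looks roughly as follows. This is a hypothetical
        sketch of an Add-like ``process``, not the actual implementation in
        dask-geomodeling:

        ```python
        import numpy as np

        def process(data, value):
            # stands in for the Add block's process step; the real one differs
            if data is None:
                return None
            return {"values": data["values"] + value,
                    "no_data_value": data["no_data_value"]}

        raster = {"values": np.zeros((1, 256, 256)), "no_data_value": -9999}
        result = process(raster, 2.4)  # every pixel is now 2.4
        ```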
        
        Implementation example
        ~~~~~~~~~~~~~~~~~~~~~~
        
        As an example, we use a simplified Dilate geoblock, which adds a buffer of 1
        pixel around all pixels of a given value:
        
        .. code:: python
        
            class Dilate(RasterBlock):
                def __init__(self, store, values):
                    if not isinstance(store, RasterBlock):
                        raise TypeError("'{}' object is not allowed".format(type(store)))
                    values = np.asarray(values, dtype=store.dtype)
                    super(Dilate, self).__init__(store, values)
        
                @property
                def store(self):
                    return self.args[0]
        
                @property
                def values(self):
                    return self.args[1]
        
                def get_sources_and_requests(self, **request):
                    new_request = expand_request_pixels(request, radius=1)
                    return [(self.store, new_request), (self.values, None)]
        
                @staticmethod
                def process(data, values=None):
                    if data is None or values is None or 'values' not in data:
                        return data
                    original = data['values']
                    dilated = original.copy()
                    for value in values:
                        dilated[ndimage.binary_dilation(original == value)] = value
                    dilated = dilated[:, 1:-1, 1:-1]
                    return {'values': dilated, 'no_data_value': data['no_data_value']}
        
                @property
                def extent(self):
                    return self.store.extent
        
                @property
                def period(self):
                    return self.store.period
        
        
        In this example, we see all the essentials of a geoblock implementation.
        
        - The ``__init__`` checks the types of the provided arguments and calls the
          ``super().__init__`` that further initializes the geoblock.
        
        - The ``get_sources_and_requests`` expands the request with 1 pixel, so that
          dilation will have no edge effects. It returns two (source, request) tuples.
        
        - The ``process`` (static)method takes as many arguments as
          ``get_sources_and_requests`` produces (source, request) tuples. It does
          the actual work and returns a data response.
        
        - Some attributes like ``extent`` and ``period`` need manual specification, as
          they might change through the geoblock.
        
        - The class derives from ``RasterBlock``, which sets the type of geoblock, and
          through that its request/response schema and its required attributes.
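
        The request expansion mentioned above can be sketched as follows. This is a
        guess at what a helper like ``expand_request_pixels`` does; the actual helper
        in dask-geomodeling may differ in its details:

        ```python
        def expand_request_pixels(request, radius=1):
            """Grow a raster request by ``radius`` pixels on every side (sketch)."""
            x1, y1, x2, y2 = request["bbox"]
            width, height = request["width"], request["height"]
            dx = (x2 - x1) / width * radius   # one pixel in map units, x direction
            dy = (y2 - y1) / height * radius  # one pixel in map units, y direction
            expanded = dict(request)
            expanded["bbox"] = (x1 - dx, y1 - dy, x2 + dx, y2 + dy)
            expanded["width"] = width + 2 * radius
            expanded["height"] = height + 2 * radius
            return expanded
        ```

        After ``process`` has run, the extra border is cut off again (the
        ``dilated[:, 1:-1, 1:-1]`` line above), so the caller receives exactly the
        shape it asked for.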
        
        
        Block types
        -----------
        
        A block type sets three things:
        
        1. the response schema: e.g. "RasterBlock.process returns a dictionary with
           a numpy array and a no data value"
        
        2. the request schema: e.g. "RasterBlock.get_sources_and_requests expects a
           dictionary with the fields 'mode', 'bbox', 'projection', 'height', 'width'"
        
        3. the attributes to be implemented on each geoblock
        
        This is not enforced at the code level; it is up to the developer to stick to
        this specification. The specification is written down in the type baseclasses
        ``RasterBlock``, ``GeometryBlock``, etc.
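
        For the raster case, these schemas can be illustrated with a small
        conformance check. The field names follow the text above; the values and the
        ``conforms`` helper are made up for illustration:

        ```python
        import numpy as np

        REQUEST_FIELDS = {"mode", "bbox", "projection", "height", "width"}
        RESPONSE_FIELDS = {"values", "no_data_value"}

        def conforms(request, response):
            # a loose check of the documented RasterBlock schemas; the library
            # itself does not enforce this at the code level
            return REQUEST_FIELDS <= request.keys() and RESPONSE_FIELDS <= response.keys()

        request = {
            "mode": "vals",
            "bbox": (138000, 480000, 139000, 481000),
            "projection": "epsg:28992",
            "width": 256,
            "height": 256,
        }
        response = {
            "values": np.full((1, 256, 256), 7, dtype=np.uint8),
            "no_data_value": 255,
        }
        ```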
        
        Local setup (for development)
        -----------------------------
        
        These instructions assume that ``git``, ``python3``, ``pip``, and
        ``virtualenv`` are installed on your host machine.
        
        First make sure you have the GDAL libraries installed. On Ubuntu::
        
            $ sudo apt install libgdal-dev
        
        Take note of the GDAL version::
        
            $ apt show libgdal-dev
        
        Create and activate a virtualenv::
        
            $ virtualenv --python=python3 .venv
            $ source .venv/bin/activate
        
        Install PyGDAL with the correct version (example assumes GDAL 2.2.3)::
        
            $ pip install pygdal==2.2.3.*
        
        Install dask-geomodeling::
        
            $ pip install -e .[test]
        
        Run the tests::
        
            $ pytest
        
        Or optionally, with coverage and code style checking::
        
            $ pytest --cov=dask_geomodeling --black
        
        
        Changelog of dask-geomodeling
        ===================================================
        
        
        2.0.2 (2019-09-04)
        ------------------
        
        - Clean up the .check() method for RasterBlocks.
        
        - Added a Travis file testing against dependency versions since 2017 on Linux
          and OSX.
        
        - Took some python 3.5 compatibility measures.
        
        - Added fix in ParseText block for pandas 0.23.
        
        - Changed underscores in config to dashes for dask 0.18 compatibility.
        
        - Constrained dask to >= 0.18, numpy to >= 1.12, pandas to >= 0.19,
          geopandas to >= 0.4, scipy to >= 0.19.
        
        - Removed the explicit (py)gdal dependency.
        
        
        2.0.1 (2019-08-30)
        ------------------
        
        - Renamed the package to dask-geomodeling.
        
        - Integrated the settings with dask.config.
        
        - Added BSD 3-Clause license.
        
        
        2.0.0 (2019-08-27)
        ------------------
        
        - Remove raster-store dependency.
        
        - Removed RasterStoreSource, ThreediResultSource, Result, Interpolate,
          DeprecatedInterpolate, GeoInterface, and GroupTemporal geoblocks.
        
        - Removed all django blocks GeoDjangoSource, AddDjangoFields, GeoDjangoSink.
        
        - Simplified tokenization of Block objects.
        
        - Implemented construct_multiple to construct multiple blocks at once.
        
        - Implemented MemorySource and GeoTIFFSource as new raster sources.
        
        - Add `Cumulative` geoblock for performing temporal cumulatives.
        
        
        1.2.13 (2019-08-20)
        -------------------
        
        - Add `TemporalAggregate` geoblock for performing temporal aggregates on
          raster data.
        
        - Fix raster math geoblocks to not have byte-sized integers 'wrap around'
          when they are added. All integer types are now at least int32 and all float
          types at least float32.
        
        
        1.2.12 (2019-07-30)
        -------------------
        
        - Made GeoDjangoSource backwards compatible with existing graph definitions.
        
        - Fix Interpolate wrapper.
        
        
        1.2.11 (2019-07-19)
        -------------------
        
        - Added new parameter `filters` to GeoDjangoSource.
        
        
        1.2.10 (2019-07-05)
        -------------------
        
        - Classify block returns a single series with the dtype of `labels`
          if `labels` are floats or integers.
        
        
        1.2.9 (2019-06-29)
        ------------------
        
        - Fix bug introduced in tokenization fix.
        
        
        1.2.8 (2019-06-29)
        ------------------
        
        - Skip tokenization if a block was already tokenized.
        
        
        1.2.7 (2019-06-28)
        ------------------
        
        - Implemented AggregateRasterAboveThreshold.
        
        
        1.2.6 (2019-06-27)
        ------------------
        
        - Fix in `ParseTextColumn` for empty column `description`.
        
        - Fix empty dataset case in ClassifyFromColumns.
        
        
        1.2.5 (2019-06-26)
        ------------------
        
        - Skip (costly) call to tokenize() when constructing without validation. If a
          graph was supplied that was generated by geoblocks, the token should be
          present in the name. If the name has incorrect format, a warning is emitted
          and tokenize() is called after all.
        
        - Deal with empty datasets in ClassifyFromColumns.
        
        
        1.2.4 (2019-06-21)
        ------------------
        
        - Updated ParseTextColumn: allow spaces in values.
        
        
        1.2.3 (2019-06-21)
        ------------------
        
        - Rasterize geoblock has a limit of 10000 geometries.
        
        - Implemented Choose geoblock for Series.
        
        - Added the block key in the exception message when construction failed.
        
        - Added caching to get_compute_graph to speedup graph generation.
        
        - Improved the documentation.
        
        
        1.2.2 (2019-06-13)
        ------------------
        
        - Fix tokenization of a geoblock when constructing with validate=False.
        
        - The raster requests generated in AggregateRaster have their bbox now snapped
          to (0, 0) for better reproducibility.
        
        
        1.2.1 (2019-06-12)
        ------------------
        
        - Fix bug in geoblocks.geometry.constructive.Buffer that was introduced in 1.2.
        
        
        1.2 (2019-06-12)
        ----------------
        
        - Extend geometry.field_operations.Classify for classification outside of
          the bins. For example, you can now supply 2 bins and 3 labels.
        
        - Implemented geometry.field_operations.ClassifyFromColumns that takes its bins
          from columns in a GeometryBlock, so that classification can differ per
          feature.
        
        - Extend geometry.base.SetSeriesBlock to setting constant values.
        
        - Implemented geometry.field_operations.Interp.
        
        - Implemented geometry.text.ParseTextColumn that parses a text column into
          multiple value columns.
        
        - AddDjangoFields converts columns to Categorical dtype automatically if the
          data is of 'object' dtype (e.g. strings). This makes the memory footprint of
          large text fields much smaller.
        
        - Make validation of a graph optional when constructing.
        
        - Use dask.get in construct and compute as to not doubly construct/compute.
        
        - Fix bug in geoblocks.geometry.constructive.Buffer that changed the compute
          graph inplace, prohibiting 2 computations of the same graph.
        
        
        1.1 (2019-06-03)
        ----------------
        
        - GeoDjangoSink returns a dataframe with the 'saved' column indicating whether
          the save succeeded. IntegrityErrors result in saved=False.
        
        - Added projection argument to `GeometryTiler`. The GeometryTiler only accepts
          requests that have a projection equal to the tiling projection.
        
        - Raise a RuntimeError if the amount of returned geometries by GeoDjangoSource
          exceeds the GEOMETRY_LIMIT setting.
        
        - Added `auto_pixel_size` argument to geometry.AggregateRaster. If this
          is False, the process raises a RuntimeError when the required raster exceeds
          the `max_size` argument.
        
        - If `max_size` in the geometry.AggregateRaster is None, it defaults to
          the global RASTER_LIMIT setting.
        
        - Remove the index_field_name argument in GeoDjangoSource, instead obtain it
          automatically from model._meta.pk.name. The index can be added as a normal
          column by including it in 'fields'.
        
        - Change the default behaviour of 'fields' in GeoDjangoSource: if not given, no
          extra fields are included. Also start and end field names are not included.
        
        - Added the 'columns' attribute to all geometry blocks except for
          the GeometryFileSource.
        
        - Added tests for SetSeriesBlock and GetSeriesBlock.
        
        - Added check that column exist in GetSeriesBlock, AddDjangoFields and
          GeoDjangoSink.
        
        - Implemented Round geoblock for Series.
        
        - Fixed AggregateRaster when aggregating in a different projection than the
          request projection.
        
        - Allow GeometryTiler to tile in a different projection than the request
          geometry is using.
        
        
        1.0 (2019-05-09)
        ----------------
        
        - Improved GeoDjangoSink docstring + fixed bug.
        
        - Bug fix in GeoInterface for handling `inf` values.
        
        - Added `Area` Geoblock for area calculation in Geometry blocks.
        
        - Added MergeGeometryBlocks for `merge` operation between GeoDataFrames.
        
        - Added `GeometryBlock.__getitem__` and `GeometryBlock.set`, getting single
          columns from and setting multiple columns to a GeometryBlock. Corresponding
          geoblocks are geometry.GetSeriesBlock and geometry.SetSeriesBlock.
        
        - Added basic operations for `add`, `sub`, `mul`, `div`, `truediv`, `floordiv`,
          `mod`, `eq`, `neq`, `ge`, `gt`, `le`, `lt`, `and`, `or`, `xor`, and `not`
          operations on SeriesBlocks.
        
        - Documented the request and response protocol for GeometryBlock.
        
        - Added a tokenizer for shapely geometries, so that GeometryBlock request
          hashes are deterministic.
        
        - Added a tokenizer for datetime and timedelta objects.
        
        - Added geopandas dependency.
        
        - Removed GeoJSONSource and implemented GeometryFileSource. This new reader has
          no simplify and intersect functions.
        
        - Implemented geometry.set_operations.Intersection.
        
        - Implemented geometry.constructive.Simplify.
        
        - Adjusted the MockGeometry test class.
        
        - Reimplemented utils.rasterize_geoseries and fixed raster.Rasterize.
        
        - Reimplemented geometry.AggregateRaster.
        
        - Fixed time requests for 3Di Result geoblocks that are outside the range of
          the dataset.
        
        - Implemented geometry.GeoDjangoSource.
        
        - Implemented geometry.GeoDjangoSink.
        
        - Added support for overlapping geometries when aggregating.
        
        - Increased performance of GeoSeries coordinate transformations.
        
        - Fixed inconsistent naming of the extent-type geometry response.
        
        - Consistently return an empty geodataframe in case there are no geometries.
        
        - Implemented geometry.Difference.
        
        - Implemented geometry.Classify.
        
        - Implemented percentile statistic for geometry.AggregateRaster.
        
        - Implemented geometry.GeometryTiler.
        
        - Explicitly set the result column name for AggregateRaster (default: 'agg').
        
        - Implemented count statistic for geometry.AggregateRaster.
        
        - Implemented geometry.AddDjangoFields.
        
        - Added temporal filtering for Django geometry sources.
        
        - Allow boolean masks in raster.Clip.
        
        - Implemented raster.IsData.
        
        - Implemented geometry.Where and geometry.Mask.
        
        - Extended raster.Rasterize to rasterize float, int and bool properties.
        
        - Fixed bug in Rasterize that set 'min_size' wrong.
        
        
        0.6 (2019-01-18)
        ----------------
        
        - Coerce the geo_transform to a list of floats in the raster.Interpolate,
          preventing TypeErrors in case it consists of decimal.Decimal objects.
        
        
        0.5 (2019-01-14)
        ----------------
        
        - Adapted path URLs to absolute paths in RasterStoreSource, GeoJSONSource, and
          ThreediResultSource. They still accept paths relative to the one stored in
          settings.
        
        
        0.4 (2019-01-11)
        ----------------
        
        - The `'store_resolution'` result field of `GeoInterface` now returns the
          resolution as integer (in milliseconds) and not as datetime.timedelta.
        
        - Added metadata fields to Optimizer geoblocks.
        
        - Propagate the union of the geometries in a Group (and Optimizer) block.
        
        - Propagate the intersection of the geometries in elementwise blocks.
        
        - Implement the projection metadata field for all blocks.
        
        - Fixed the Shift geoblock by storing the time shift in milliseconds instead of
          a datetime.timedelta, which is not JSON-serializable.
        
        
        0.3 (2018-12-12)
        ----------------
        
        - Added geoblocks.raster.Classify.
        
        - Let the raster.Interpolate block accept the (deprecated) `layout` kwarg.
        
        
        0.2 (2018-11-20)
        ----------------
        
        - Renamed ThreediResultSource `path` property to `hdf5_path` and fixed it.
        
        
        0.1 (2018-11-19)
        ----------------
        
        - Initial project structure created.
        
        - Copied graphs.py, tokenize.py, wrappers.py, results.py, interfaces.py,
          and relevant tests and factories from raster-store.
        
        - Wrappers are renamed into 'geoblocks', which are all subclasses of `Block`. The
          wrappers were restructured into submodules core, raster, geometry, and interfaces.
        
        - The new geoblocks.Block baseclass now provides the infrastructure for
          a) describing a relational block graph and b) generating compute graphs from a
          request for usage in parallelized computations.
        
        - Each element in a relational block graph or compute graph is hashed using the
          `tokenize` module from `dask` which is able to generate unique and deterministic
          tokens (hashes).
        
        - Blocks are saved to a new json format (version 2).
        
        - Every block supports the attributes `period`, `timedelta`, `extent`,
          `dtype`, `fillvalue`, `geometry`, and `geo_transform`.
        
        - The `check` method is implemented on every block and refreshes the
          primitives (`stores.Store` / `results.Grid`).
        
        - `geoblocks.raster.sources.RasterStoreSource` should now be wrapped around a
          `raster_store.stores.Store` in order to include it as a datasource inside a graph.
        
        - Reformatted the code using black code formatter.
        
        - Implemented `GroupTemporal` as replacement for multi-store Lizard objects.
        
        - Adapted `GeoInterface` to mimic now deprecated lizard_nxt.raster.Raster.
        
        - Fixed an issue with ciso8601 2.*.
        
        - Bumped raster-store dependency to 4.0.0.
        
        
Keywords: dask
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Topic :: Scientific/Engineering :: GIS
Requires-Python: >=3.5
Provides-Extra: test
