Metadata-Version: 2.1
Name: split-dataset
Version: 0.4.2
Summary: A package for HDF5-based chunked arrays
Home-page: https://github.com/portugueslab/split_dataset
Author: Vilim Stih & Luigi Petrucco @portugueslab
Author-email: luigi.petrucco@gmail.com
License: GNU General Public License v3
Description: 
        [![Python Version](https://img.shields.io/pypi/pyversions/split_dataset.svg)](https://pypi.org/project/split_dataset)
        [![PyPI](https://img.shields.io/pypi/v/split_dataset.svg)](
            https://pypi.python.org/pypi/split_dataset)
        [![Tests](https://img.shields.io/github/workflow/status/portugueslab/split_dataset/tests)](
            https://github.com/portugueslab/split_dataset/actions)
        [![Coverage Status](https://coveralls.io/repos/github/portugueslab/split_dataset/badge.svg?branch=master)](https://coveralls.io/github/portugueslab/split_dataset?branch=master)
        [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/python/black)
        [![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
        
        
        
        A minimal package for saving and reading large HDF5-based chunked arrays.
        
        This package was developed in the [`Portugues lab`](http://www.portugueslab.com) for volumetric calcium imaging data. `split_dataset` is used extensively in the calcium imaging analysis package [`fimpy`](https://github.com/portugueslab/fimpy). The microscope control libraries [`sashimi`](https://github.com/portugueslab/sashimi) and [`brunoise`](https://github.com/portugueslab/brunoise) save files as split datasets.
        
        [`napari-split-dataset`](https://github.com/portugueslab/napari-split-dataset) supports the visualization of SplitDatasets in `napari`.
        
        ## Why use split datasets?
        Split datasets are numpy-like arrays saved over multiple h5 files. The concept of split datasets is similar to that of e.g. [zarr arrays](https://zarr.readthedocs.io/en/stable/); however, relying on h5 files allows for partial reading even within the same file, which is crucial for visualizing volumetric time series, the main application `split_dataset` was developed for (see [this discussion](https://github.com/zarr-developers/zarr-python/issues/521) on the limitation of zarr arrays).
        
        # Structure of a split dataset
        A split dataset is stored in a folder containing multiple numbered h5 files (one file per chunk) and a json metadata file with information on the shape of the full dataset and of its chunks.
        The h5 files are saved using the [flammkuchen](https://github.com/portugueslab/flammkuchen) library (formerly [deepdish](https://deepdish.readthedocs.io/en/latest/)). Each file contains a dictionary with the data under the `stack` key.
        
        `SplitDataset` objects can then be instantiated from the dataset path, and numpy-style indexing can be used to load data as numpy arrays. Any number of dimensions and any block size are supported in principle; the package has been used mainly with 3D and 4D arrays.
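        
        To illustrate the idea behind numpy-style indexing over chunk files, here is a minimal sketch of how a slice along one axis could be mapped to the blocks it touches. It is not the package's implementation: the function name `blocks_for_slice` is hypothetical, and the sketch assumes equally sized blocks along the indexed axis.
        
        ```python
        def blocks_for_slice(start, stop, block_size):
            """Map the half-open interval [start, stop) to the blocks it overlaps.

            Returns a list of (block_index, local_start, local_stop) tuples, where
            the local indices are relative to the beginning of each block.
            Hypothetical helper, assuming equally sized blocks along this axis.
            """
            if block_size <= 0:
                raise ValueError("block_size must be positive")
            if not 0 <= start <= stop:
                raise ValueError("expected 0 <= start <= stop")
            if start == stop:
                return []  # empty selection: no block needs to be read

            first_block = start // block_size
            last_block = (stop - 1) // block_size
            pieces = []
            for block in range(first_block, last_block + 1):
                block_start = block * block_size
                # Clip the requested interval to this block, then shift to local indices.
                local_start = max(start, block_start) - block_start
                local_stop = min(stop, block_start + block_size) - block_start
                pieces.append((block, local_start, local_stop))
            return pieces


        # Time points 15..41 of a dataset saved in blocks of 20 time points.
        pieces = blocks_for_slice(15, 42, block_size=20)
        ```
        
        With this mapping, only the listed blocks would need to be opened, and only the local interval would need to be read from each of them.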
        
        
        
        ## Minimal example
        ```python
        # Load a split dataset from its folder path via a SplitDataset object:
        from split_dataset import SplitDataset
        ds = SplitDataset(path_to_dataset)
        
        # Retrieve data in an interval:
        data_array = ds[n_start:n_end, :, :, :]
        ```
        
        ## Creating split datasets
        New split datasets can be created with the `split_dataset.save_to_split_dataset` function, provided that the original data is fully loaded in memory. Alternatively, e.g. for acquisitions over time, a split dataset can be saved one chunk at a time. It is enough to save correctly formatted .h5 files with `flammkuchen`, together with the corresponding json metadata file describing the full split dataset shape (this is [what happens in sashimi](https://github.com/portugueslab/sashimi/blob/01046f2f24483ab702be379843a1782ababa7d2d/sashimi/processes/streaming_save.py#L186)).
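        
        As a rough sketch of the bookkeeping involved in chunk-by-chunk saving, the snippet below plans the chunks for a time series and writes a json description of the full and block shapes. The metadata key names, the metadata file name, and the chunk file naming are all hypothetical placeholders, not the format the package expects; check the package source or the sashimi code linked above for the real one. Actual chunks would be written with `flammkuchen`, with the data under the `stack` key.
        
        ```python
        import json
        import math
        import tempfile
        from pathlib import Path


        def plan_split(full_shape, block_shape):
            """Describe how an array of full_shape divides into blocks of block_shape."""
            if len(full_shape) != len(block_shape):
                raise ValueError("full_shape and block_shape need the same number of dimensions")
            # Ceiling division: the last block along an axis may be only partly filled.
            blocks_per_axis = [math.ceil(f / b) for f, b in zip(full_shape, block_shape)]
            return {
                "full_shape": list(full_shape),      # hypothetical key name
                "block_shape": list(block_shape),    # hypothetical key name
                "blocks_per_axis": blocks_per_axis,  # hypothetical key name
            }


        # 100 time points of a 10 x 64 x 64 volume, saved 20 time points per chunk.
        metadata = plan_split((100, 10, 64, 64), (20, 10, 64, 64))
        n_chunks = math.prod(metadata["blocks_per_axis"])
        chunk_names = ["{:04d}.h5".format(i) for i in range(n_chunks)]  # hypothetical naming

        with tempfile.TemporaryDirectory() as folder:
            metadata_path = Path(folder) / "metadata.json"  # hypothetical file name
            metadata_path.write_text(json.dumps(metadata))
            # Read the file back to check that the description survives the round trip.
            reloaded = json.loads(metadata_path.read_text())
        ```
        
        During an acquisition, each entry of `chunk_names` would be written as soon as its block of data is complete, so the full array never has to be held in memory.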
        
        
        # TODO
        * provide utilities for partial saving of split datasets
        * support for more advanced indexing (step and vector indexing)
        * support for cropping a `SplitDataset`
        * support for resolution and frequency metadata
        
        
        # History
        
        ### 0.4.0 (2021-03-23)
        * Added support for using a `SplitDataset` as data in a `napari` layer.
        
        ...
        
        ### 0.1.0 (2020-05-06)
        * First release on PyPI.
        
        
        Credits
        -------
        
        Part of this package was inspired by [Cookiecutter](https://github.com/audreyr/cookiecutter) and [this](https://github.com/audreyr/cookiecutter-pypackage) template.
        
        
Keywords: split_dataset
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.5
Description-Content-Type: text/markdown
Provides-Extra: dev
