Metadata-Version: 2.1
Name: h5dataframe
Version: 0.2.2
Summary: Drop-in replacement for pandas DataFrames that stores data in an HDF5 file and manipulates it directly from that file without loading it into memory.
License: CECILL-B
Author: Matteo Bouvier
Author-email: m.bouvier@vidium-solutions.com
Requires-Python: >=3.10,<4.0
Classifier: License :: CeCILL-B Free Software License Agreement (CECILL-B)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: ch5mpy (>=0.4.6,<0.5.0)
Requires-Dist: pandas (>=2.2,<3.0)
Requires-Dist: pandas-stubs (>=2.2.2.240603,<3.0.0.0)
Description-Content-Type: text/markdown

# h5dataframe

Drop-in replacement for pandas DataFrames that stores data in an HDF5 file and lets you manipulate it directly from that file without loading it into memory.

# Warning

This is very much a **work in progress**: some features might not work yet or might cause bugs.
**Save** your data elsewhere before converting it to an `H5DataFrame`.

If you miss a feature from pandas DataFrames, please file an issue or feel free to contribute.

# Overview

This library provides the `H5DataFrame` object, replacing the regular `pandas.DataFrame`.

An `H5DataFrame` can be created from a `pandas.DataFrame` or from a dictionary of (column_name -> column_values).

```python
>>> import pandas as pd
>>> from h5dataframe import H5DataFrame
>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]},
...                   index=['r1', 'r2', 'r3'])
>>> hdf = H5DataFrame(df)
>>> hdf
    a  b
r1  1  4
r2  2  5
r3  3  6
[RAM]
[3 rows x 2 columns]
```

At this point, all the data is still loaded in RAM, as indicated by the `[RAM]` marker on the second-to-last line. To write the data to an HDF5 file, use the `H5DataFrame.write()` method.

```python
>>> hdf.write('path/to/file.h5')
>>> hdf
    a  b
r1  1  4
r2  2  5
r3  3  6
[FILE]
[3 rows x 2 columns]
```

The `H5DataFrame` is now backed by an HDF5 file, as indicated by the `[FILE]` marker, and only loads data into RAM when requested.

Alternatively, an `H5DataFrame` can be read directly from a previously created HDF5 file with the `H5DataFrame.read()` method.

```python
>>> from h5dataframe import H5Mode
>>> H5DataFrame.read('path/to/file.h5', mode=H5Mode.READ)
    a  b
r1  1  4
r2  2  5
r3  3  6
[FILE]
[3 rows x 2 columns]
```

The default mode is `READ` (`'r'`), which creates a **read-only** `H5DataFrame`. To modify the data, use `mode=H5Mode.READ_WRITE` (`'r+'`).

# Installation

From pip:
```shell
pip install h5dataframe
```

From source:
```shell
git clone git@github.com:Vidium/h5dataframe.git
```
