Metadata-Version: 2.1
Name: rds2py
Version: 0.5.1
Summary: Parse and construct Python representations for datasets stored in RDS files
Home-page: https://github.com/biocpy/rds2py
Author: jkanche
Author-email: jayaram.kancherla@gmail.com
License: MIT
Project-URL: Documentation, https://biocpy.github.io/rds2py/
Project-URL: Source, https://github.com/biocpy/rds2py
Platform: Mac
Platform: Linux
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Python
Requires-Python: >=3.8
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
License-File: LICENSE.txt
Requires-Dist: importlib-metadata; python_version < "3.8"
Requires-Dist: numpy
Requires-Dist: scipy
Requires-Dist: biocframe
Requires-Dist: biocutils>=0.1.5
Requires-Dist: genomicranges>=0.4.9
Requires-Dist: summarizedexperiment>=0.4.1
Requires-Dist: singlecellexperiment>=0.4.1
Requires-Dist: multiassayexperiment
Provides-Extra: optional
Requires-Dist: pandas; extra == "optional"
Requires-Dist: hdf5array; extra == "optional"
Provides-Extra: testing
Requires-Dist: setuptools; extra == "testing"
Requires-Dist: pytest; extra == "testing"
Requires-Dist: pytest-cov; extra == "testing"
Requires-Dist: pandas; extra == "testing"
Requires-Dist: hdf5array; extra == "testing"

[![Project generated with PyScaffold](https://img.shields.io/badge/-PyScaffold-005CA0?logo=pyscaffold)](https://pyscaffold.org/)
[![PyPI-Server](https://img.shields.io/pypi/v/rds2py.svg)](https://pypi.org/project/rds2py/)
![Unit tests](https://github.com/BiocPy/rds2py/actions/workflows/run-tests.yml/badge.svg)

# rds2py

Parse and construct Python representations for datasets stored in RDS files. `rds2py` supports various base classes from R, and Bioconductor's `SummarizedExperiment` and `SingleCellExperiment` S4 classes. ***For more details, check out the [rds2cpp library](https://github.com/LTLA/rds2cpp).***

---
**Version 0.5.0** brings major changes to the package:
- Complete overhaul of the codebase using pybind11
- Streamlined readers for R data types
- Updated API for all classes and methods

Please refer to the [documentation](https://biocpy.github.io/rds2py/) for the latest usage guidelines. Previous versions may have incompatible APIs.

---

The package provides:

- Efficient parsing of RDS files with *minimal* memory overhead
- Support for R's basic data types and complex S4 objects
  - Vectors (numeric, character, logical)
  - Factors
  - Data frames
  - Matrices (dense and sparse)
  - Run-length encoded vectors (Rle)
- Conversion to appropriate Python/NumPy/SciPy data structures
  - dgCMatrix (sparse column matrix)
  - dgRMatrix (sparse row matrix)
  - dgTMatrix (sparse triplet matrix)
- Preservation of metadata and attributes from R objects
- Integration with BiocPy ecosystem for Bioconductor classes
  - SummarizedExperiment
  - RangedSummarizedExperiment
  - SingleCellExperiment
  - GenomicRanges
  - MultiAssayExperiment

## Installation

The package is published to [PyPI](https://pypi.org/project/rds2py/):

```shell
pip install rds2py

# or install with the optional dependencies
# (quoted so that shells such as zsh do not expand the brackets)
pip install "rds2py[optional]"
```

## Usage

If you do not have an RDS object handy, feel free to download one from [single-cell-test-files](https://github.com/jkanche/random-test-files/releases).

### Basic Usage

```python
from rds2py import read_rds
r_obj = read_rds("path/to/file.rds")
```

The returned `r_obj` is an instance of an appropriate Python class if a parser is implemented for the stored R type; otherwise it is a dictionary containing the data from the RDS file.
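Because the return type depends on whether a parser exists, downstream code may want to branch on it. The following is a minimal sketch; `describe_result` is a hypothetical helper written for this example and is not part of `rds2py`.

```python
def describe_result(obj):
    """Summarize an object returned by `read_rds` (hypothetical helper)."""
    if isinstance(obj, dict):
        # No dedicated parser was available: this is the raw dictionary
        # representation of the RDS contents.
        keys = sorted(str(key) for key in obj)
        return f"dict with keys: {keys}"
    # Otherwise a parser produced a concrete Python object.
    return f"parsed object of type {type(obj).__name__}"
```

With the package installed, this could be called as `describe_result(read_rds("path/to/file.rds"))`.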

## Write-your-own-reader

The package also exposes the dictionary representation of an RDS file through `parse_rds`, so you can write custom readers that convert it into appropriate Python representations.

```python
from rds2py import parse_rds

data = parse_rds("path/to/file.rds")
print(data)
```

If you know this RDS file contains a `GenomicRanges` object, you can use the built-in reader or write your own reader to convert this dictionary.

```python
from rds2py.read_granges import read_genomic_ranges

gr = read_genomic_ranges(data)
```
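One way to organize custom readers is a small registry keyed by the R class name. The sketch below is illustrative only: it assumes the parsed dictionary carries `class_name`, `data`, and `attributes` keys (inspect the output of `print(data)` to confirm the actual layout), and `READERS`, `register`, `convert`, and `my_custom_class` are hypothetical names introduced for this example. It runs on a hand-made dictionary rather than a real RDS file.

```python
# Hypothetical registry mapping an R class name to a reader function.
READERS = {}


def register(class_name):
    """Decorator that registers a reader for the given class name."""
    def decorator(func):
        READERS[class_name] = func
        return func
    return decorator


@register("my_custom_class")
def read_my_custom_class(robj):
    # Assumed layout: payload under "data", R attributes under "attributes".
    return {
        "values": list(robj.get("data", [])),
        "attrs": robj.get("attributes", {}),
    }


def convert(robj):
    """Dispatch to a registered reader, or return the raw dictionary."""
    reader = READERS.get(robj.get("class_name"))
    return reader(robj) if reader is not None else robj


# Hand-made stand-in for the output of parse_rds, for illustration only.
example = {
    "class_name": "my_custom_class",
    "data": [1, 2, 3],
    "attributes": {"names": ["a", "b", "c"]},
}
result = convert(example)
```

Unregistered classes fall through unchanged, which mirrors how `read_rds` is described above as returning the dictionary when no parser is available.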

## Type Conversion Reference

| R Type | Python/NumPy Type |
|--------|------------------|
| numeric | numpy.ndarray (float64) |
| integer | numpy.ndarray (int32) |
| character | list of str |
| logical | numpy.ndarray (bool) |
| factor | list |
| data.frame | BiocFrame |
| matrix | numpy.ndarray or scipy.sparse matrix |
| dgCMatrix | scipy.sparse.csc_matrix |
| dgRMatrix | scipy.sparse.csr_matrix |

## Developer Notes

This project uses pybind11 to provide bindings to the rds2cpp library. Please make sure a C++ compiler is installed on your system.

<!-- pyscaffold-notes -->

## Note

This project has been set up using PyScaffold 4.5. For details and usage
information on PyScaffold see https://pyscaffold.org/.
