Metadata-Version: 2.1
Name: sqlite-memory-vfs
Version: 0.0.5
Summary: Python writable in-memory virtual filesystem for SQLite
Project-URL: Homepage, https://github.com/michalc/sqlite-memory-vfs
Author-email: Michal Charemza <michal@charemza.name>, Department for International Trade <sre@digital.trade.gov.uk>
License-File: LICENSE
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Database
Requires-Python: >=3.7.11
Requires-Dist: apsw>=3.43.0.0
Requires-Dist: sortedcontainers>=2.4.0
Provides-Extra: dev
Requires-Dist: pytest-cov>=3.0.0; extra == 'dev'
Requires-Dist: pytest>=6.2.5; extra == 'dev'
Description-Content-Type: text/markdown

# sqlite-memory-vfs

[![PyPI package](https://img.shields.io/pypi/v/sqlite-memory-vfs?label=PyPI%20package&color=%234c1)](https://pypi.org/project/sqlite-memory-vfs/) [![Test suite](https://img.shields.io/github/actions/workflow/status/michalc/sqlite-memory-vfs/test.yml?label=Test%20suite)](https://github.com/michalc/sqlite-memory-vfs/actions/workflows/test.yml) [![Code coverage](https://img.shields.io/codecov/c/github/michalc/sqlite-memory-vfs?label=Code%20coverage)](https://app.codecov.io/gh/michalc/sqlite-memory-vfs)

Python virtual filesystem for SQLite to read from and write to memory.

While SQLite supports the special filename `:memory:` for creating empty in-memory databases, and `sqlite3_deserialize` can populate an in-memory database from a _contiguous_ block of raw bytes of a serialized database, there is no built-in way to populate such a database from _non-contiguous_ bytes. Similarly, `sqlite3_serialize` can only serialize a database to a contiguous block of memory. This virtual filesystem overcomes these limitations, and so allows larger databases to be downloaded and queried without hitting disk.

Based on [simonwo's gist](https://gist.github.com/simonwo/b98dc75feb4b53ada46f224a3b26274c) and [uktrade's sqlite-s3vfs](https://github.com/uktrade/sqlite-s3vfs), and inspired by [phiresky's sql.js-httpvfs](https://github.com/phiresky/sql.js-httpvfs), [dacort's Stack Overflow answer](https://stackoverflow.com/a/59434097/1319998) and [michalc's sqlite-s3-query](https://github.com/michalc/sqlite-s3-query).


## Installation

sqlite-memory-vfs can be installed from PyPI using `pip`.

```bash
pip install sqlite-memory-vfs
```

This will automatically install [APSW](https://rogerbinns.github.io/apsw/) along with any other dependencies.


### Deserializing (getting a regular SQLite file into the VFS)

This library allows the raw bytes of a SQLite database to be queried without having to save it to disk. This can be done by using the `deserialize_iter` method of `MemoryVFS`, passing it an iterable of `bytes` instances that contain the SQLite database.

```python
import apsw
import httpx
import sqlite_memory_vfs

memory_vfs = sqlite_memory_vfs.MemoryVFS()

# Any iterable of bytes can be used. In this example, they come via HTTP
with httpx.stream("GET", "https://data.api.trade.gov.uk/v1/datasets/uk-trade-quotas/versions/v1.0.366/data?format=sqlite") as r:
    memory_vfs.deserialize_iter('quota_balances.sqlite', r.iter_bytes())

with apsw.Connection('quota_balances.sqlite', vfs=memory_vfs.name) as db:
    cursor = db.cursor()
    cursor.execute('SELECT * FROM quotas;')
    print(cursor.fetchall())
```

If the `deserialize_iter` step is omitted, an empty database is automatically created in memory.

See the [APSW documentation](https://rogerbinns.github.io/apsw/) for more usage examples.


### Serializing (getting a regular SQLite file out of the VFS)

The bytes corresponding to each SQLite database in the VFS can be extracted with the `serialize_iter` function, which returns an iterable of `bytes`:

```python
with open('my_db.sqlite', 'wb') as f:
    for chunk in memory_vfs.serialize_iter('my_db.sqlite'):
        f.write(chunk)
```


### Comparison with `sqlite_deserialize`

The main reason for using sqlite-memory-vfs over `sqlite3_deserialize` is the lower memory usage for larger databases. For example, the following may not even complete, due to running out of memory:

```python
import resource

import apsw
import httpx

url = "https://data.api.trade.gov.uk/v1/datasets/uk-tariff-2021-01-01/versions/v4.0.46/data?format=sqlite"

with apsw.Connection(':memory:') as db:
    db.deserialize('main', httpx.get(url).read())
    cursor = db.cursor()
    cursor.execute('SELECT count(*) FROM measures;')
    print(cursor.fetchall())

print('Max memory usage:', resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
```

But the following should complete, and report a lower peak memory usage:

```python
import resource

import apsw
import httpx
import sqlite_memory_vfs

url = "https://data.api.trade.gov.uk/v1/datasets/uk-tariff-2021-01-01/versions/v4.0.46/data?format=sqlite"
memory_vfs = sqlite_memory_vfs.MemoryVFS()

with httpx.stream("GET", url) as r:
    memory_vfs.deserialize_iter('tariff.sqlite', r.iter_bytes())

with apsw.Connection('tariff.sqlite', vfs=memory_vfs.name) as db:
    cursor = db.cursor()
    cursor.execute('SELECT count(*) FROM measures;')
    print(cursor.fetchall())

print('Max memory usage:', resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
```


## Tests

The tests require the dev dependencies to be installed

```bash
pip install -e ".[dev]"
```

and can then be run with pytest.

```bash
pytest
```
