Metadata-Version: 2.1
Name: pg-bulk-ingest
Version: 0.0.45
Summary: A collection of Python utility functions for ingesting data into SQLAlchemy-defined PostgreSQL tables, automatically migrating them as needed, and minimising locking
Project-URL: Source, https://github.com/uktrade/pg-bulk-ingest
Author-email: Department for International Trade <sre@digital.trade.gov.uk>
License-File: LICENSE
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.7
Requires-Dist: pg-force-execute>=0.0.10
Requires-Dist: sqlalchemy>=1.4.24
Requires-Dist: to-file-like-obj>=0.0.5
Provides-Extra: ci
Requires-Dist: coverage; extra == 'ci'
Requires-Dist: pytest; extra == 'ci'
Requires-Dist: pytest-cov; extra == 'ci'
Provides-Extra: ci-psycopg2-sqlalchemy1
Requires-Dist: psycopg2==2.9.2; extra == 'ci-psycopg2-sqlalchemy1'
Requires-Dist: sqlalchemy==1.4.24; extra == 'ci-psycopg2-sqlalchemy1'
Requires-Dist: to-file-like-obj==0.0.5; extra == 'ci-psycopg2-sqlalchemy1'
Provides-Extra: ci-psycopg2-sqlalchemy2
Requires-Dist: psycopg2==2.9.2; extra == 'ci-psycopg2-sqlalchemy2'
Requires-Dist: sqlalchemy==2.0.0; extra == 'ci-psycopg2-sqlalchemy2'
Requires-Dist: to-file-like-obj==0.0.5; extra == 'ci-psycopg2-sqlalchemy2'
Provides-Extra: ci-psycopg3-sqlalchemy2
Requires-Dist: psycopg==3.1.4; extra == 'ci-psycopg3-sqlalchemy2'
Requires-Dist: sqlalchemy==2.0.0; extra == 'ci-psycopg3-sqlalchemy2'
Requires-Dist: to-file-like-obj==0.0.5; extra == 'ci-psycopg3-sqlalchemy2'
Provides-Extra: dev
Requires-Dist: psycopg2>=2.9.2; extra == 'dev'
Requires-Dist: psycopg>=3.1.4; extra == 'dev'
Requires-Dist: pytest>=7.2.1; extra == 'dev'
Description-Content-Type: text/markdown

# pg-bulk-ingest

[![PyPI package](https://img.shields.io/pypi/v/pg-bulk-ingest?label=PyPI%20package&color=%234c1)](https://pypi.org/project/pg-bulk-ingest/) [![Test suite](https://img.shields.io/github/actions/workflow/status/uktrade/pg-bulk-ingest/test.yml?label=Test%20suite)](https://github.com/uktrade/pg-bulk-ingest/actions/workflows/test.yml) [![Code coverage](https://img.shields.io/codecov/c/github/uktrade/pg-bulk-ingest?label=Code%20coverage)](https://app.codecov.io/gh/uktrade/pg-bulk-ingest)

A Python utility function for ingesting data into a SQLAlchemy-defined PostgreSQL table, automatically migrating it as needed, allowing concurrent reads as much as possible.

Allowing concurrent writes is not an aim of pg-bulk-ingest. It is designed for use in ETL pipelines where PostgreSQL is used as a data warehouse, and the only writes to the table are from pg-bulk-ingest. It is assumed that there is only one pg-bulk-ingest running against a given table at any one time.


## Features

pg-bulk-ingest exposes a single function as its API that:

- Creates the table if necessary
- Migrates any existing table if necessary, minimising locking
- Ingests data in batches, where each batch is ingested in its own transaction
- Handles "high-watermarking" to carry on from where a previous ingest finished or errored
- Optionally performs an "upsert", matching rows on primary key
- Optionally deletes all existing rows before ingestion
- Optionally calls a callback just before each batch is visible to other database clients


---

Visit the [pg-bulk-ingest documentation](https://pg-bulk-ingest.docs.trade.gov.uk/) for usage instructions.
