Metadata-Version: 2.1
Name: sql-test-kit
Version: 0.4.0
Summary: Framework for testing SQL queries
Author: victorlandeau
Author-email: vlandeau@gmail.com
Requires-Python: >=3.8,<3.12
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Provides-Extra: bigquery
Requires-Dist: db-dtypes[bigquery] (>=1.0.0)
Requires-Dist: google-cloud-bigquery[bigquery] (>=3.0.0)
Requires-Dist: pandas (>=1.0.0)
Description-Content-Type: text/markdown

# sql-test-kit

This is a framework for testing SQL queries.
It works by directly running the queries against the targeted engine, thus being robust to any change in the
corresponding SQL dialect.
Moreover, it is currently focused on interpolating test data directly inside the SQL queries, making the test much
quicker than if it was creating actual tables.

# Installation

This package is available on Pypi, so you can use your favorite dependency managment tool to install it. For example :
* with pip : 
```shell
pip install sql-test-kit
```
* with poetry : 
```shell
poetry add sql-test-kit
```

# Usage example

`sql-test-kit` is currently available for most SQL engines (Postgres, Redshift, Snowflakes...), and is particularly useful
for those where no framework exists for locally testing SQL queries.

Nevertheless, a specific implementation has been written for BigQuery, in order to facilitate Table object initialization,
as well as null values interpolation in tests.

Here is a simple example of instantiating a Table object and a SQL query for BigQuery :
```python
from sql_test_kit import BigqueryTable, Column


sales_amount_col = "SALES_AMOUNT"
sales_date_col = "SALES_DATE"
sales_table = BigqueryTable(
    project="project",
    dataset="dataset",
    table="table",
    columns=[
        Column(sales_amount_col, "FLOAT64"),
        Column(sales_date_col, "STRING"),
    ],
)
current_year_sales_by_day_query = f"""
    SELECT {sales_date_col}, SUM({sales_amount_col}) AS {sales_amount_col}
    FROM {sales_table}
    WHERE {sales_date_col} >= "2023-01-01"
    GROUP BY {sales_date_col}
    """
```

You can then test it this way :
```python
import pandas as pd
from google.cloud.bigquery import Client

from sql_test_kit import QueryInterpolator


def test_current_year_sales_by_day_query():
    # Given
    sales_data = pd.DataFrame(
        {
            "SALES_ID": [1, 2, 3, 4],
            sales_date_col: ["2022-12-31", "2023-01-01", "2023-01-01", "2023-01-02"],
            sales_amount_col: [10, 20, 30, 40],
        }
    )

    # When
    interpolated_query = QueryInterpolator() \
        .add_input_table(sales_table, sales_data) \
        .interpolate_query(current_year_sales_by_day_query)
    current_year_sales_by_day_data = Client().query(interpolated_query).to_dataframe()

    # Then
    expected_current_year_sales_by_day_data = pd.DataFrame(
        {
            sales_date_col: ["2023-01-01", "2023-01-02"],
            sales_amount_col: [50, 40],
        }
    )

    pd.testing.assert_frame_equal(
        current_year_sales_by_day_data,
        expected_current_year_sales_by_day_data,
        check_dtype=False,
    )
```

