Metadata-Version: 2.1
Name: qapi-sdk
Version: 0.3.1
Summary: QAPI SDK provides a library of classes for working with Query API in your Python code.
Home-page: https://github.com/oneaudience/data-team-qapi-sdk
Author: TheBridgeDan
Author-email: 97176881+TheBridgeDan@users.noreply.github.com
Requires-Python: >=3.10,<4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Requires-Dist: boto3 (>=1.23.10,<2.0.0)
Requires-Dist: python-dotenv (>=0.20.0,<0.21.0)
Requires-Dist: requests (>=2.27.1,<3.0.0)
Requires-Dist: smart-open (>=6.0.0,<7.0.0)
Project-URL: Repository, https://github.com/oneaudience/data-team-qapi-sdk
Description-Content-Type: text/markdown

# QAPI SDK

QAPI SDK provides a library of classes for working with Query API in your Python code.

## Requirements

* Python 3.6+
* Must be logged into the private VPN.

## Installation

```bash
pip install qapi-sdk 
```

## Environment Variables

- `QAPI_URL`: QAPI Base URL.
- `EMAIL`: Your Email

  *Optional*: If you choose to add your AWS credentials, you can use the `read_columns` method to read in the
  headers of your CSV file automatically.
- `AWS_ACCESS_KEY_ID`: AWS Access Key ID
- `AWS_SECRET_ACCESS_KEY`: AWS Secret Access Key
- `AWS_DEFAULT_REGION`: AWS Default Region

## Examples

### Query

- `FEED ID`: The table must exist in Athena.
- `QUERY ID`: The query id is used as an identifier for the query. Query id must be unique. Once you have retrieved your
  data from S3 it is advised to delete the query.
- `SQL`: The SQL query to be executed.

```python
import time

from dotenv import load_dotenv

from qapi_sdk import Query

load_dotenv()

# Step 1: Assign your FEED ID, QUERY ID, and SQL QUERY
feed_id = "[FEED/TABLE NAME]"
query_id = "[QUERY NAME]"
query = f"SELECT * FROM {feed_id}"

# Step 2: Create a Query object
my_query = Query(
    feed_id=feed_id,
    query_id=query_id
)

# Step 3: Execute the query push
my_query.push_query(sql=query)

# Step 4: Wait for the query to complete
while my_query.query_status():
    print("Waiting for query to complete...")
    time.sleep(10)

# Step 5 (Optional): Delete the query
my_query.delete_query()
```

### Feed

- `FEED ID`: The table name you want to create in Athena.
- `PUSH ID`: The push id is used as an identifier for the query. Push id must be unique.
- `COLUMNS`: The name of the columns that will be pushed to Athena.

```python
import time

from dotenv import load_dotenv

from qapi_sdk import Feed

load_dotenv()

# Step 1: Assign your FEED ID, PUSH ID, and COLUMNS
feed_id = "test_feed"
push_id = "test_push"

# Step 1A: You can manually assign the columns
columns = [
    {
        "name": "email",
        "type": "string"
    },
    {
        "name": "md5email",
        "type": "string"
    },
    {
        "name": "firstname",
        "type": "string"
    }
]

# Step 1B (Optional): If you added AWS credentials, you can use the `read_columns` method to read 
# in the headers of your CSV file automatically.
columns = my_feed.read_columns(
    data_bucket="[DATA BUCKET]",
    data_key_dir="path/to/your/data/",
    delimiter=","
)

# Step 2: Create a Feed object
my_feed = Feed(feed_id=feed_id, push_id=push_id)

# Step 3: Define where to grab the data and format of the data.Then push the data to Athena.
my_feed.push_feed(
    pull_path_bucket="[DATA BUCKET]",
    pull_path_key="path/to/your/data/",
    columns=columns,
    separator=","
)

# Step 4: Wait for the push to complete
while my_feed.push_status():
    print("Waiting for push to complete...")
    time.sleep(10)

# Step 5 (Optional): Delete the push
my_feed.delete_push()

# Step 6 (Optional): Delete the feed
my_feed.delete_feed()
```

## Redshift

- `FEED ID`: The table must exist in Redshift.
- `QUERY ID`: The query id is used as an identifier for the query. Query id must be unique. Once you have retrieved your
  data from S3 it is advised to delete the query.
- `SQL`: The SQL query to be executed.
- If you query an Athena table from Redshift, you must append the Athena schema to the table name.
    - For example: `SELECT * FROM [query_api].[TABLE NAME]`
- If you use a `LIMIT` clause, you must wrap the query in a `SELECT * FROM ()` clause.
    - For example: `SELECT * FROM (SELECT * FROM [TABLE NAME] LIMIT 100)`

```python
import time

from dotenv import load_dotenv

from qapi_sdk import Redshift

load_dotenv()

# Step 1: Assign your FEED ID, QUERY ID, and SQL QUERY
feed_id = "dans_feed"
query_id = "dans_query"
query = "SELECT * FROM (SELECT * FROM [SCHEMA].[TABLE NAME] LIMIT 10)"

# Step 2: Create a Redshift object
my_query = Redshift(
    feed_id=feed_id,
    query_id=query_id
)

# Step 3: Execute the query push
my_query.push_query(sql=query)

# Step 4: Wait for the query to complete
while my_query.query_status():
    print("Waiting for query to complete...")
    time.sleep(10)

# Step 5 (Optional): Delete the query
# my_query.delete_query()

```

## CHANGELOG

### [0.3.1] - 2020-05-30

- Updated `README.md`

### [0.3.0] - 2020-05-30

- Added `Redshift` object to the SDK.
- Added `delete_query` method to Redshift class.
- Added `query_status` method to Redshift class.
- Added `push_query` method to Redshift class.
- Updated `README.md`

### [0.2.1] - 2020-05-30

- Added `homepage` and `repository` links to the `pyproject.toml` file.

### [0.2.0] - 2020-05-29

- Added `FEED` object to the SDK.
- Added `read_columns` method to Feed class.
- Added `delete_push` method to Feed class.
- Added `delete_feed` method to Feed class.
- Added `push_status` method to Feed class.
- Added `push_feed` method to Feed class.
- Updated `README.md`

### [0.1.4] - 2022-05-29

- Added `QUERY` object to the SDK.
- Added `delete_query` method to Query class.
- Added `query_status` method to Query class.
- Added `push_query` method to Query class.
- Added the `CHANGELOG` section.
- Updated `README.md`

