Metadata-Version: 2.1
Name: datajunction
Version: 0.0.1a12
Summary: DataJunction client library for connecting to a DataJunction server
Project-URL: repository, https://github.com/DataJunction/dj
Author-email: DataJunction Authors <yian.shang@gmail.com>
License: MIT
License-File: LICENSE.txt
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: <4.0,>=3.8
Requires-Dist: alive-progress>=3.1.2
Requires-Dist: pydantic>=1.10.7
Requires-Dist: requests<3.0.0,>=2.28.2
Provides-Extra: pandas
Requires-Dist: pandas>=2.0.2; extra == 'pandas'
Description-Content-Type: text/markdown

# DataJunction Python Client

## Installation
Install the client with pip:

```
pip install datajunction
```

## Examples

To initialize the client:

```python
from datajunction import DJClient

dj = DJClient("http://dj-endpoint:8000")
```
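
It may be convenient to check the endpoint URL before handing it to `DJClient`. The following is a minimal sketch using only the standard library; `normalize_endpoint` is a hypothetical helper introduced here for illustration and is not part of the `datajunction` package.

```python
from urllib.parse import urlparse


def normalize_endpoint(url: str) -> str:
    """Return the URL without trailing slashes, or raise ValueError if it is not HTTP(S)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not a valid HTTP(S) endpoint: {url!r}")
    return url.rstrip("/")


# Hypothetical usage: the cleaned value would then be passed to DJClient(...)
endpoint = normalize_endpoint("http://dj-endpoint:8000/")
print(endpoint)  # http://dj-endpoint:8000
```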

### Catalogs and Engines

To list available catalogs for the DJ host:
```python
dj.catalogs()
```

To list available engines for the DJ host:
```python
dj.engines()
```

To create a catalog:

```python
from datajunction import Catalog

catalog = Catalog(
    name="prod"
)
catalog.publish()
```

To create an engine:

```python
from datajunction import Engine

engine = Engine(
    name="spark",
    version="3.2.2",
    uri="..."
)
engine.publish()
```

To attach an engine to a catalog:
```python
catalog.add_engine(engine)
```

### Nodes

To list all nodes in a namespace:
```python
dj.namespace("default").nodes()
```

To list only the nodes of a specific type:
```python
dj.namespace("default").sources()
dj.namespace("default").dimensions()
dj.namespace("default").metrics()
dj.namespace("default").transforms()
dj.namespace("default").cubes()
```
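
The per-type calls above can be gathered into a single summary. This is a sketch only: `nodes_by_type` and `FakeNamespace` are hypothetical names, and the fake class stands in for `dj.namespace("default")` so the example runs without a server. The return values of the real methods are not assumed here; the lists below are made up.

```python
NODE_TYPE_METHODS = ("sources", "dimensions", "metrics", "transforms", "cubes")


def nodes_by_type(namespace):
    """Call each per-type listing method and collect the results in a dict."""
    return {method: getattr(namespace, method)() for method in NODE_TYPE_METHODS}


class FakeNamespace:
    """Offline stand-in for dj.namespace("default"); the node names are made up."""

    def sources(self):
        return ["default.repair_orders"]

    def dimensions(self):
        return ["default.repair_order"]

    def metrics(self):
        return ["default.num_repair_orders"]

    def transforms(self):
        return []

    def cubes(self):
        return []


summary = nodes_by_type(FakeNamespace())
for node_type, nodes in summary.items():
    print(f"{node_type}: {len(nodes)}")
```

With a live server, the same function would presumably be called as `nodes_by_type(dj.namespace("default"))`.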

To create a source node:

```python
from datajunction import NodeMode

repair_orders = dj.new_source(
    name="repair_orders",
    display_name="Repair Orders",
    description="Repair orders",
    catalog="dj",
    schema_="roads",
    table="repair_orders",
)
repair_orders.save(mode=NodeMode.PUBLISHED)
```

Nodes can also be created as drafts with:
```python
repair_orders.save(mode=NodeMode.DRAFT)
```
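
When several nodes are defined together, the same `save(mode=...)` call can be applied in a loop. The sketch below is illustrative: `save_all` and `FakeNode` are hypothetical, the `name` attribute on node objects is an assumption, and the fake nodes replace real ones so the code runs offline.

```python
def save_all(nodes, mode):
    """Save every node with the given mode; return (name, error) pairs for failures."""
    failures = []
    for node in nodes:
        try:
            node.save(mode=mode)
        except Exception as exc:  # broad on purpose: the client's exception types are not assumed
            failures.append((node.name, str(exc)))
    return failures


class FakeNode:
    """Offline stand-in for a DJ node object; only mimics name and save(mode=...)."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail
        self.saved_mode = None

    def save(self, mode=None):
        if self.fail:
            raise RuntimeError(f"could not save {self.name}")
        self.saved_mode = mode


nodes = [FakeNode("repair_orders"), FakeNode("broken_node", fail=True)]
# With the real client, NodeMode.DRAFT would be passed instead of the string.
failures = save_all(nodes, mode="draft")
print(failures)  # [('broken_node', 'could not save broken_node')]
```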

To create a dimension node:
```python
repair_order = dj.new_dimension(
    name="repair_order",
    query="""
    SELECT
      repair_order_id,
      municipality_id,
      hard_hat_id,
      dispatcher_id
    FROM repair_orders
    """,
    description="Repair order dimension",
    primary_key=["repair_order_id"],
)
repair_order.save()
```

To create a transform node:
```python
large_revenue_payments_only = dj.new_transform(
    name="large_revenue_payments_only",
    query="""
    SELECT
      payment_id,
      payment_amount,
      customer_id,
      account_type
    FROM revenue
    WHERE payment_amount > 1000000
    """,
    description="Only large revenue payments",
)
large_revenue_payments_only.save()
```

To create a metric node:
```python
num_repair_orders = dj.new_metric(
    name="num_repair_orders",
    query="""
    SELECT
      count(repair_order_id)
    FROM repair_orders
    """,
    description="Number of repair orders",
)
num_repair_orders.save()
```
