Metadata-Version: 2.1
Name: ssb-ipython-kernels
Version: 0.2.26
Summary: Jupyter kernels for working with dapla services
Home-page: https://github.com/statisticsnorway/dapla-ipython-kernels
Author: Statistics Norway
Author-email: bjorn.skaar@ssb.no
License: MIT
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Description-Content-Type: text/markdown
Requires-Dist: ipython (==7.22.0)
Requires-Dist: pyspark (==3.1.1)
Requires-Dist: jupyterhub (==1.3.0)
Requires-Dist: oauthenticator (==14.0.0)
Requires-Dist: jwt (==1.2.0)
Requires-Dist: requests (==2.25.1)
Requires-Dist: requests-cache (==0.5.2)
Requires-Dist: responses (==0.13.2)
Requires-Dist: ipykernel (==5.5.3)
Requires-Dist: notebook (==6.3.0)
Requires-Dist: tornado (==6.1)
Requires-Dist: gcsfs (==0.6.2)
Requires-Dist: pyarrow (==3.0.0)
Requires-Dist: pandas (==1.2.4)
Requires-Dist: google-auth (==1.28.1)
Requires-Dist: google-auth-oauthlib (==0.4.4)
Requires-Dist: ipywidgets (==7.6.3)

# dapla-ipython-kernels
Python module for use within Jupyter notebooks. It contains kernel extensions for integrating with Apache Spark, 
Google Cloud Storage and custom dapla services.

[![PyPI version](https://img.shields.io/pypi/v/ssb-ipython-kernels.svg)](https://pypi.python.org/pypi/ssb-ipython-kernels/)
[![Status](https://img.shields.io/pypi/status/ssb-ipython-kernels.svg)](https://pypi.python.org/pypi/ssb-ipython-kernels/)
[![License](https://img.shields.io/pypi/l/ssb-ipython-kernels.svg)](https://pypi.python.org/pypi/ssb-ipython-kernels/)

## Getting Started

Install the module from pip:

```bash
# pip
pip install ssb-ipython-kernels
```

Now the module is ready to use with a single import:

```python
import dapla as dp
```

This module targets Python kernels in Jupyter, but it may work in any IPython environment. 
It also depends on a number of custom dapla services, e.g. [the custom auth service](dapla/jupyterextensions/authextension.py).

To test it, create any pandas DataFrame. It can be stored in Google Cloud Storage at a specific path:

```python
import pandas as pd
import dapla as dp

data = {
    'apples': [3, 2, 0, 1], 
    'oranges': [0, 3, 7, 2]
}
# Create pandas DataFrame
purchases = pd.DataFrame(data, index=['June', 'Robert', 'Lily', 'David'])

# Write pandas DataFrame to parquet
dp.write_pandas(purchases, '/testfolder/python/purchases', valuation='INTERNAL', state='INPUT')
```

Conversely, parquet files can be read from a path directly into a pandas DataFrame. 

```python
import dapla as dp
# Read path into pandas dataframe 
purchases = dp.read_pandas('/testfolder/python/purchases')
```

## Other functions

Since the module integrates with Google Cloud Storage and custom dapla services, 
some other functions are available as well:

```python
import dapla as dp

# List path by prefix
dp.show('/testfolder/python')
```
| Path                          | Timestamp     |
| ----------------------------- | ------------- |
| /testfolder/python/purchases  | 1593120298095 |
| /testfolder/python/other      | 1593157667793 |
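The `Timestamp` column appears to be milliseconds since the Unix epoch (an assumption based on the magnitude of the values above); pandas can convert such a value to a readable datetime:

```python
import pandas as pd

# Interpret a timestamp from dp.show as epoch milliseconds (assumed unit)
ts = pd.to_datetime(1593120298095, unit='ms')
print(ts)  # a pandas Timestamp for 2020-06-25 21:24:58.095 UTC
```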


```python
import dapla as dp

# Show file details
dp.details('/testfolder/python/purchases')
```
| Size  | Name |
| ----- | -------------------------------------- |
| 2908  | 42331105444c9ca0ce049ef6de7160.parquet |


See also the [example notebook](examples/dapla_notebook.ipynb) written for Jupyter.

## Deploy to SSB Jupyter

### Release a new version to PyPI
Make sure you have a clean master branch.<br>
Run `make bump-version-patch` - this updates the version and commits it to git.<br>
Run `git push --tags origin master` - the `--tags` flag is required for the automatic deploy to PyPI.

If everything went OK, we should see a new release here: https://pypi.org/project/ssb-ipython-kernels/

### Update the Jupyter image on staging
* Bump ssb-ipython-kernels in the dapla-gcp-jupyter [Dockerfile](https://github.com/statisticsnorway/dapla-gcp-jupyter/blob/master/jupyter/Dockerfile) <br>
    * Example of a previous [update]( https://github.com/statisticsnorway/dapla-gcp-jupyter/commit/8027dc1cbad15dadb1347fe452c78711463e9f3c) <br> 
* Check the new tag from the build on [Azure Pipelines](https://dev.azure.com/statisticsnorway/Dapla/_build/results?buildId=11202&view=logs&jobId=2143f898-48de-5476-aeb8-70e74f8d7c33&j=667c30d6-a912-540e-a406-35cd05a9f751&t=fb539ba6-e537-5346-19c8-c46f7dd4b185)
* Update the [platform-dev jupyter kubespawner-config](https://github.com/statisticsnorway/platform-dev/blob/master/flux/staging-bip-app/dapla-spark/jupyter/kubespawner-config.yaml) with the new tag
    * [Example](https://github.com/statisticsnorway/platform-dev/commit/b063b830deb6bc0d6a485d7f08fda473cf340ff6)

For now, we have to delete the running JupyterHub instance to make it pick up the new config.


