Metadata-Version: 2.1
Name: datacatalog-object-storage-processor
Version: 0.1.1
Summary: A package for performing Data Catalog operations on object storage solutions
Home-page: https://github.com/mesmacosta/datacatalog-object-storage-processor
Author: Marcelo Costa
Author-email: mesmacosta@gmail.com
License: MIT license
Platform: Posix; MacOS X; Windows
Classifier: Development Status :: 3 - Alpha
Classifier: Natural Language :: English
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: google-cloud-datacatalog
Requires-Dist: google-cloud-storage
Requires-Dist: pandas

# datacatalog-object-storage-processor

A package for performing Data Catalog operations on object storage solutions.

[![CircleCI][1]][2] [![PyPi][7]][8] [![License][9]][9] [![Issues][10]][11]

<!--
  ⚠️ DO NOT UPDATE THE TABLE OF CONTENTS MANUALLY ️️⚠️
  run `npx markdown-toc -i README.md`.

  Please stick to 80-character line wraps as much as you can.
-->

## Table of Contents

<!-- toc -->

- [1. Environment setup](#1-environment-setup)
  * [1.1. Get the code](#11-get-the-code)
  * [1.2. Auth credentials](#12-auth-credentials)
      - [1.2.1. Create a service account and grant it below roles](#121-create-a-service-account-and-grant-it-below-roles)
      - [1.2.2. Download a JSON key and save it as](#122-download-a-json-key-and-save-it-as)
  * [1.3. Virtualenv](#13-virtualenv)
      - [1.3.1. Install Python 3.6+](#131-install-python-36)
      - [1.3.2. Create and activate an isolated Python environment](#132-create-and-activate-an-isolated-python-environment)
      - [1.3.3. Install the dependencies](#133-install-the-dependencies)
      - [1.3.4. Set environment variables](#134-set-environment-variables)
  * [1.4. Docker](#14-docker)
- [2. Create DataCatalog entries based on object storage files](#2-create-datacatalog-entries-based-on-object-storage-files)
  * [2.1. python main.py](#21-python-mainpy)
- [3 Delete up object storage entries on entry group](#3-delete-up-object-storage-entries-on-entry-group)
- [Disclaimers](#disclaimers)

<!-- tocstop -->

-----

## 1. Environment setup

### 1.1. Get the code

````bash
git clone https://github.com/mesmacosta/datacatalog-object-storage-processor
cd datacatalog-object-storage-processor
````

### 1.2. Auth credentials

##### 1.2.1. Create a service account and grant it below roles

- Data Catalog Admin
- Storage Admin or Custom Role with storage.buckets.list acl

##### 1.2.2. Download a JSON key and save it as
- `./credentials/datacatalog-object-storage-processor-sa.json`

### 1.3. Virtualenv

Using *virtualenv* is optional, but strongly recommended unless you use [Docker](#24-docker).

##### 1.3.1. Install Python 3.6+

##### 1.3.2. Create and activate an isolated Python environment

```bash
pip install --upgrade virtualenv
python3 -m virtualenv --python python3 env
source ./env/bin/activate
```

##### 1.3.3. Install the dependencies

```bash
pip install --upgrade --editable .
```

##### 1.3.4. Set environment variables

```bash
export GOOGLE_APPLICATION_CREDENTIALS=./credentials/datacatalog-object-storage-processor-sa.json
```

### 1.4. Docker

Docker may be used as an alternative to run all the scripts. In this case, please disregard the [Virtualenv](#23-virtualenv) install instructions.

## 2. Create DataCatalog entries based on object storage files

### 2.1. python main.py 

- python

```bash
datacatalog-object-storage-processor \
  object-storage create-entries --type cloud-storage \
  --project-id my_project \
  --entry-group-name my_entry_group_name \
  --bucket-prefix my_bucket
```

- docker

```bash
docker build --rm --tag datacatalog-object-storage-processor .
docker run --rm --tty -v your_credentials_folder:/data datacatalog-object-storage-processor \
  --type cloud-storage \
  --project-id my_project \
  --entry-group-name my_entry_group_name \
  --bucket-prefix my_bucket
```

## 3 Delete up object storage entries on entry group
Delete entries for given entry group

```bash
datacatalog-object-storage-processor \
  object-storage delete-entries --type cloud-storage \
  --project-id my_project \
  --entry-group-name my_entry_group_name
```

## Disclaimers

This is not an officially supported Google product.

[1]: https://circleci.com/gh/mesmacosta/datacatalog-object-storage-processor.svg?style=svg
[2]: https://circleci.com/gh/mesmacosta/datacatalog-object-storage-processor
[3]: https://virtualenv.pypa.io/en/latest/
[7]: https://img.shields.io/pypi/v/datacatalog-object-storage-processor.svg
[8]: https://pypi.org/project/datacatalog-object-storage-processor/
[9]: https://img.shields.io/github/license/mesmacosta/datacatalog-object-storage-processor.svg
[10]: https://img.shields.io/github/issues/mesmacosta/datacatalog-object-storage-processor.svg
[11]: https://github.com/mesmacosta/datacatalog-object-storage-processor/issues


## History


0.1.0 (2020-05-01)
------------------

* First release on PyPI.


