Metadata-Version: 2.1
Name: ngiab_data_preprocess
Version: 1.4.3
Summary: Graphical Tools for creating Next Gen Water model input data.
Author-email: Josh Cunningham <jcunningham8@ua.edu>
Project-URL: Homepage, https://github.com/CIROH-UA/NGIAB_data_preprocess
Project-URL: Issues, https://github.com/CIROH-UA/NGIAB_data_preprocess/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: pyarrow==15.0.2
Requires-Dist: PyYAML==6.0.1
Requires-Dist: pyogrio==0.7.2
Requires-Dist: pyproj==3.6.1
Requires-Dist: Flask==3.0.2
Requires-Dist: Flask-Cors==4.0.1
Requires-Dist: geopandas==0.14.3
Requires-Dist: requests==2.32.2
Requires-Dist: igraph==0.11.4
Requires-Dist: s3fs==2024.3.1
Requires-Dist: xarray==2024.2.0
Requires-Dist: rioxarray==0.15.1
Requires-Dist: zarr==2.17.1
Requires-Dist: netCDF4==1.6.5
Requires-Dist: dask==2024.4.1
Requires-Dist: dask[distributed]==2024.4.1
Requires-Dist: black==24.3.0
Requires-Dist: isort==5.13.2
Requires-Dist: h5netcdf==1.3.0
Requires-Dist: numba==0.59.1
Requires-Dist: exactextract==0.2.0.dev0
Requires-Dist: numpy==1.26.4
Requires-Dist: flaskwebgui==1.1.0
Requires-Dist: tqdm==4.66.4
Requires-Dist: rich==13.7.1
Requires-Dist: colorama==0.4.6
Requires-Dist: bokeh==3.5.1

# NGIAB Data Preprocess

This repository contains tools for preparing data to run a [NextGen](https://github.com/NOAA-OWP/ngen) simulation using [NGIAB](https://github.com/CIROH-UA/NGIAB-CloudInfra). The tools let you select a catchment of interest on an interactive map, choose a date range, and prepare the data with just a few clicks!

![map screenshot](https://github.com/CIROH-UA/NGIAB_data_preprocess/blob/main/modules/map_app/static/resources/screenshot.png)

## Table of Contents

1. [What does this tool do?](#what-does-this-tool-do)
2. [Requirements](#requirements)
3. [Installation and Running](#installation-and-running)
4. [Development Installation](#development-installation)
5. [Usage](#usage)
6. [CLI Documentation](#cli-documentation)
   - [Arguments](#arguments)
   - [Examples](#examples)
   - [File Formats](#file-formats)
   - [Output](#output)

## What does this tool do?

This tool prepares the data needed to run a NextGen simulation by creating a run package that can be used with NGIAB. It uses the following default data sources: the [v20.1 hydrofabric](https://www.lynker-spatial.com/data?path=hydrofabric%2Fv20.1%2F) and the [NWM retrospective v3 forcing](https://noaa-nwm-retrospective-3-0-pds.s3.amazonaws.com/index.html#CONUS/zarr/forcing/) data.

## Requirements

* This tool is officially supported on macOS and Ubuntu (tested on 22.04 and 24.04). To use it on Windows, please install [WSL](https://learn.microsoft.com/en-us/windows/wsl/install).
* GDAL must be installed.
* The `ogr2ogr` command must work in your terminal. On Ubuntu / WSL, `sudo apt install gdal-bin` installs both GDAL and `ogr2ogr`.
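
A quick way to check the GDAL requirement before installing is sketched below. It only verifies that `ogr2ogr` is on your `PATH` and prints its version; the install hint simply repeats the Ubuntu / WSL command from this section.

```bash
# Check whether GDAL's ogr2ogr command is available on PATH.
if command -v ogr2ogr >/dev/null 2>&1; then
  # Print the GDAL version that ogr2ogr was built against.
  ogr2ogr --version
else
  # Not found: point to the package that provides it on Ubuntu / WSL.
  echo "ogr2ogr not found. On Ubuntu / WSL, try: sudo apt install gdal-bin" >&2
fi
```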

## Installation and Running

```bash
# optional but encouraged: create a virtual environment
python3 -m venv env
source env/bin/activate
# installing and running the tool
pip install ngiab_data_preprocess
python -m map_app
```

The first time you run this command, it will download the hydrofabric and model parameter files from Lynker Spatial. If you already have them, place `conus.gpkg` and `model_attributes.parquet` into `modules/data_sources/`.

## Development Installation

<details>
  <summary>Click to expand installation steps</summary>

To install and run the tool, follow these steps:

1. Clone the repository:
   ```bash
   git clone https://github.com/CIROH-UA/NGIAB_data_preprocess
   cd NGIAB_data_preprocess
   ```
2. Create a virtual environment and activate it:
   ```bash
   python3 -m venv env
   source env/bin/activate
   ```
3. Install the tool:
   ```bash
   pip install -e .
   ```
4. Run the map app:
   ```bash
   python -m map_app
   ```
</details>

## Usage

Running `python -m map_app` opens the app in a new browser tab. Alternatively, while the app is running, you can open it manually at [http://localhost:5000](http://localhost:5000).

To use the tool:
1. Select the catchment you're interested in on the map.
2. Pick the time period you want to simulate.
3. Click the following buttons in order:
    1) Create subset gpkg
    2) Create Forcing from Zarrs
    3) Create Realization

Once all the steps are finished, you can run NGIAB on the folder shown underneath the subset button.

**Note:** When using the tool, the output is stored in the `./output/<your-first-catchment>/` folder. There is no overwrite protection, so running the tool again for the same first catchment may overwrite existing files in that folder.

## CLI Documentation

<details>
<summary>Click to expand CLI documentation</summary>

## Arguments

- `-h`, `--help`: Show the help message and exit.
- `-i INPUT_FILE`, `--input_file INPUT_FILE`: Path to a CSV or TXT file containing a list of waterbody IDs, lat/lon pairs, or gage IDs; or a single waterbody ID (e.g., `wb-5173`), a single lat/lon pair, or a single gage ID.
- `-l`, `--latlon`: Use latitude and longitude instead of waterbody IDs. When used with `-i`, the file should contain lat/lon pairs.
- `-g`, `--gage`: Use gage IDs instead of waterbody IDs. When used with `-i`, the file should contain gage IDs.
- `-s`, `--subset`: Subset the hydrofabric to the given waterbody IDs, locations, or gage IDs.
- `-f`, `--forcings`: Generate forcings for the given waterbody IDs, locations, or gage IDs.
- `-r`, `--realization`: Create a realization for the given waterbody IDs, locations, or gage IDs.
- `--start_date START_DATE`: Start date for forcings/realization (format YYYY-MM-DD).
- `--end_date END_DATE`: End date for forcings/realization (format YYYY-MM-DD).
- `-o OUTPUT_NAME`, `--output_name OUTPUT_NAME`: Name of the subset to be created (default is the first waterbody ID in the input file).

## Examples

The flags `-l`, `-g`, `-s`, `-f`, and `-r` can be combined like standard short CLI flags. For example, to subset, generate forcings, and create a realization, you can use either `-sfr` or `-s -f -r`.

1. Subset hydrofabric using waterbody IDs:
   ```
   python -m ngiab_data_cli -i waterbody_ids.txt -s
   ```

2. Generate forcings using a single waterbody ID:
   ```
   python -m ngiab_data_cli -i wb-5173 -f --start_date 2023-01-01 --end_date 2023-12-31
   ```

3. Create realization using lat/lon pairs from a CSV file:
   ```
   python -m ngiab_data_cli -i locations.csv -l -r --start_date 2023-01-01 --end_date 2023-12-31 -o custom_output
   ```

4. Perform all operations using a single lat/lon pair:
   ```
   python -m ngiab_data_cli -i 54.33,-69.4 -l -s -f -r --start_date 2023-01-01 --end_date 2023-12-31
   ```

5. Subset hydrofabric using gage IDs from a CSV file:
   ```
   python -m ngiab_data_cli -i gage_ids.csv -g -s
   ```

6. Generate forcings using a single gage ID:
   ```
   python -m ngiab_data_cli -i 01646500 -g -f --start_date 2023-01-01 --end_date 2023-12-31
   ```

## File Formats

### 1. Waterbody ID input:
- CSV file: A single column of waterbody IDs, or a column named 'wb_id', 'waterbody_id', or 'divide_id'.
- TXT file: One waterbody ID per line.

Example CSV (waterbody_ids.csv):
```
wb_id,soil_type
wb-5173,some
wb-5174,data
wb-5175,here
```
Or:
```
wb-5173
wb-5174
wb-5175
```
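
One way to create these input files from a terminal is with heredocs, sketched below using the same example IDs. The file names match the examples in this section; the `soil_type` column is just the extra column from the CSV example and is not required.

```bash
# TXT format: one waterbody ID per line.
cat > waterbody_ids.txt <<'EOF'
wb-5173
wb-5174
wb-5175
EOF

# CSV format: a named 'wb_id' column (extra columns, as in the example, may follow).
cat > waterbody_ids.csv <<'EOF'
wb_id,soil_type
wb-5173,some
wb-5174,data
wb-5175,here
EOF
```

Either file can then be passed to `-i`, for example `python -m ngiab_data_cli -i waterbody_ids.txt -s`.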

### 2. Lat/Lon input:
- CSV file: Two columns named 'lat' and 'lon', or two unnamed columns in that order.
- Single pair: Comma-separated values passed directly to the `-i` argument.

Example CSV (locations.csv):
```
lat,lon
54.33,-69.4
55.12,-68.9
53.98,-70.1
```
Or:
```
54.33,-69.4
55.12,-68.9
53.98,-70.1
```
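
A minimal sketch for writing the lat/lon CSV with a header row, reusing the example coordinates; latitude comes first and longitude second on each line.

```bash
# Named 'lat' and 'lon' columns, one latitude,longitude pair per line.
cat > locations.csv <<'EOF'
lat,lon
54.33,-69.4
55.12,-68.9
53.98,-70.1
EOF
```

Remember to add `-l` so the values are read as coordinates, as in `python -m ngiab_data_cli -i locations.csv -l -r --start_date 2023-01-01 --end_date 2023-12-31 -o custom_output`.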

### 3. Gage ID input:
- CSV file: A single column of gage IDs, or a column named 'gage' or 'gage_id'.
- TXT file: One gage ID per line.
- Single gage ID: Passed directly to the `-i` argument.

Example CSV (gage_ids.csv):
```
gage_id,station_name
01646500,Potomac River
01638500,Shenandoah River
01578310,Susquehanna River
```
Or:
```
01646500
01638500
01578310
```
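
Gage IDs such as `01646500` begin with a zero, and spreadsheet programs may drop leading zeros when saving numeric-looking values. Writing the file as plain text, as sketched below with a single named `gage_id` column, keeps the IDs exactly as typed.

```bash
# Single named 'gage_id' column; the heredoc preserves leading zeros.
cat > gage_ids.csv <<'EOF'
gage_id
01646500
01638500
01578310
EOF
```

Pass the file with `-g`, for example `python -m ngiab_data_cli -i gage_ids.csv -g -s`.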

## Output

The script creates an output folder named after the value given to `-o`/`--output_name`. If no output name is provided, the folder is named after the first waterbody ID in the input, or derived from the first lat/lon pair or gage ID. This folder contains the results of the subsetting, forcing generation, and realization creation steps.

</details>
