Metadata-Version: 2.1
Name: elwood
Version: 0.1.2
Summary: An open source dataset transformation, standardization, and normalization python library.
Home-page: https://github.com/jataware/elwood
Author: Brandon Rose, Powell Fendley
Author-email: info@jataware.com
License: MIT license
Keywords: elwood
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: AUTHORS.md
Requires-Dist: bump2version (==1.0.1)
Requires-Dist: Click (==8.0)
Requires-Dist: coverage (==4.5.4)
Requires-Dist: Cython (==0.29.23)
Requires-Dist: flake8 (==3.7.8)
Requires-Dist: fuzzywuzzy (>=0.18.0)
Requires-Dist: GDAL (==3.1.4)
Requires-Dist: geofeather (>=0.3.0)
Requires-Dist: geopandas (==0.8.1)
Requires-Dist: netCDF4 (==1.5.3)
Requires-Dist: numpy (==1.22)
Requires-Dist: openpyxl (==3.0.7)
Requires-Dist: pandas (==1.5.3)
Requires-Dist: pip (>=21.1)
Requires-Dist: pydantic (>=1.8.2)
Requires-Dist: pyproj (==2.6.1.post1)
Requires-Dist: python-Levenshtein (>=0.12.2)
Requires-Dist: Rtree (==0.8.3)
Requires-Dist: shapely
Requires-Dist: Sphinx (==1.8.5)
Requires-Dist: tox (==3.14.0)
Requires-Dist: tqdm (<5.0.0,>=4.41.1)
Requires-Dist: twine (==1.14.0)
Requires-Dist: watchdog (==0.9.0)
Requires-Dist: wheel (==0.33.6)
Requires-Dist: xarray (==0.16.1)
Requires-Dist: xlrd (==2.0.1)

# Elwood
An open source dataset transformation, standardization, and normalization Python library.

# Usage

To start using Elwood, run:

`pip install elwood`

You can now use any of the dataset transformation, standardization, and normalization functions exposed by this library. To start, add `from elwood import elwood` to your Python file.

## Standardization
`elwood.process(args)`

#TODO STUB

## Transformation

The transformation functions include geographical extent clipping (latitude/longitude), geographical regridding (gridded data such as NetCDF or GeoTIFF), temporal clipping, and temporal scaling. 

### Geospatial Clipping

`elwood.clip_geo(dataframe, geo_columns, polygons_list)`

This function takes a pandas dataframe, a `geo_columns` list naming the latitude and longitude columns (e.g. `["lat", "lng"]`), and a `polygons_list`: a list of polygons, where each polygon is itself a list of objects giving its vertices as `lat`/`lng` pairs. The data is clipped to these polygons. For example:
```
[
     [
        {
            "lat": 11.0,
            "lng": 42.0
        },
        {
            "lat": 11.0,
            "lng": 43.0
        },
        {
            "lat": 12.0,
            "lng": 43.0
        },
        {
            "lat": 12.0,
            "lng": 42.0
        }
    ],
    ...
]
```
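
To make the input shapes concrete, here is a minimal, dependency-free sketch of this kind of clipping. It is not Elwood's implementation: `point_in_polygon` and `clip_rows` are hypothetical helpers, plain dicts stand in for dataframe rows, and keeping a row that falls inside any one of the polygons is an assumption.

```python
def point_in_polygon(lat, lng, polygon):
    """Ray-casting test for a point against one polygon.

    `polygon` is a list of {"lat": ..., "lng": ...} vertices, as in polygons_list.
    Points lying exactly on an edge may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        a, b = polygon[i], polygon[(i + 1) % n]
        # Does the edge a-b straddle the horizontal line at this latitude?
        if (a["lat"] > lat) != (b["lat"] > lat):
            # Longitude at which the edge crosses that line.
            cross = a["lng"] + (lat - a["lat"]) * (b["lng"] - a["lng"]) / (b["lat"] - a["lat"])
            if lng < cross:
                inside = not inside
    return inside


def clip_rows(rows, geo_columns, polygons_list):
    """Keep rows whose point falls inside at least one polygon (assumed semantics)."""
    lat_col, lng_col = geo_columns
    return [
        row for row in rows
        if any(point_in_polygon(row[lat_col], row[lng_col], poly) for poly in polygons_list)
    ]


polygons_list = [
    [
        {"lat": 11.0, "lng": 42.0},
        {"lat": 11.0, "lng": 43.0},
        {"lat": 12.0, "lng": 43.0},
        {"lat": 12.0, "lng": 42.0},
    ]
]
rows = [
    {"lat": 11.5, "lng": 42.5, "value": 1.0},  # inside the square
    {"lat": 13.0, "lng": 42.5, "value": 2.0},  # north of it
    {"lat": 11.5, "lng": 44.0, "value": 3.0},  # east of it
]
clipped = clip_rows(rows, ["lat", "lng"], polygons_list)
```

Only the first row survives the clip. With Elwood itself, the same `polygons_list` and `["lat", "lng"]` would be passed to `elwood.clip_geo` together with a dataframe.
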

### Geospatial Regridding

`elwood.regrid_dataframe_geo(dataframe, geo_columns, scale_multi)`

This function takes a dataframe and regrids its geography by the provided scale multiplier. The multiplier is used to divide the current geographical scale, producing a coarser-resolution dataset. The dataframe must have a detectable geographical scale, meaning each lat/lon pair represents the center point of a grid cell in the data provided. The latitude and longitude columns are identified by the `geo_columns` argument, a list of column names, e.g. `["lat", "lng"]`.
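
The idea of coarsening a grid can be sketched without any dependencies. The following is an illustration only, not Elwood's implementation: `regrid_rows` is a hypothetical helper, the cell size is passed in explicitly rather than detected, a multiplier of 2 is read as "cells twice as wide", and averaging the values that fall in each coarse cell is an assumption.

```python
import math
from collections import defaultdict


def regrid_rows(rows, geo_columns, value_column, cell_size, scale_multi):
    """Snap each point to the center of a coarser cell and average the values."""
    lat_col, lng_col = geo_columns
    coarse = cell_size * scale_multi  # width of a coarse cell, in degrees
    buckets = defaultdict(list)
    for row in rows:
        # Center of the coarse cell that contains this point.
        key = (
            math.floor(row[lat_col] / coarse) * coarse + coarse / 2,
            math.floor(row[lng_col] / coarse) * coarse + coarse / 2,
        )
        buckets[key].append(row[value_column])
    return [
        {lat_col: lat, lng_col: lng, value_column: sum(vals) / len(vals)}
        for (lat, lng), vals in sorted(buckets.items())
    ]


# A 2 x 4 grid of 0.5-degree cells; each point is the center of its cell.
rows = [
    {"lat": 0.25, "lng": 0.25, "value": 1.0},
    {"lat": 0.25, "lng": 0.75, "value": 2.0},
    {"lat": 0.25, "lng": 1.25, "value": 3.0},
    {"lat": 0.25, "lng": 1.75, "value": 4.0},
    {"lat": 0.75, "lng": 0.25, "value": 5.0},
    {"lat": 0.75, "lng": 0.75, "value": 6.0},
    {"lat": 0.75, "lng": 1.25, "value": 7.0},
    {"lat": 0.75, "lng": 1.75, "value": 8.0},
]
coarse_rows = regrid_rows(rows, ["lat", "lng"], "value", cell_size=0.5, scale_multi=2)
```

Here the eight 0.5-degree cells collapse into two 1-degree cells centered at (0.5, 0.5) and (0.5, 1.5).
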

### Temporal Clipping
`elwood.clip_dataframe_time(dataframe, time_column, time_ranges)`

This function produces a dataframe that includes only the rows whose `time_column` values fall within one of the `time_ranges`. The `time_ranges` argument is a list of objects, each containing a start and end time, e.g. `[{"start": datetime, "end": datetime}, ...]`
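
A small, dependency-free sketch shows the shape of `time_ranges` and the filtering it describes. It is not Elwood's implementation: `clip_rows_time` is a hypothetical helper, plain dicts stand in for dataframe rows, and treating both `start` and `end` as inclusive is an assumption.

```python
from datetime import datetime


def clip_rows_time(rows, time_column, time_ranges):
    """Keep rows whose timestamp falls inside any range (endpoints assumed inclusive)."""
    return [
        row for row in rows
        if any(r["start"] <= row[time_column] <= r["end"] for r in time_ranges)
    ]


time_ranges = [
    {"start": datetime(2020, 1, 1), "end": datetime(2020, 1, 31)},
    {"start": datetime(2020, 6, 1), "end": datetime(2020, 6, 30)},
]
rows = [
    {"date": datetime(2020, 1, 15), "value": 1},  # inside the first range
    {"date": datetime(2020, 3, 1), "value": 2},   # between the two ranges
    {"date": datetime(2020, 6, 10), "value": 3},  # inside the second range
    {"date": datetime(2021, 1, 1), "value": 4},   # after both ranges
]
clipped_time = clip_rows_time(rows, "date", time_ranges)
```

The same `time_ranges` list is the third argument expected by `elwood.clip_dataframe_time`.
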

### Temporal Scaling
`elwood.rescale_dataframe_time(dataframe, time_column, time_bucket, aggregation_function_list)`

This function produces a dataframe whose rows are the data aggregated by the given time bucket, using the given list of aggregation functions. The `time_column` is the name of the column containing the time values to rescale. The `time_bucket` is a DateOffset, Timedelta, or str representing the desired time granularity, e.g. `'M', 'A', '2H'`. The `aggregation_function_list` is a list of aggregation functions to apply to the data, e.g. `['sum']` or `['sum', 'min', 'max']`
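
To illustrate bucketing and aggregation, here is a dependency-free sketch restricted to monthly buckets. It is not Elwood's implementation: `rescale_rows_monthly` is a hypothetical helper, the output column names (`value_sum`, `value_max`) are invented for the example, and each bucket is labeled with the first day of its month, which may differ from how a month-based `time_bucket` labels rows in Elwood.

```python
from collections import defaultdict
from datetime import datetime

# Aggregation names mapped to plain Python functions for this sketch.
AGGREGATIONS = {"sum": sum, "min": min, "max": max}


def rescale_rows_monthly(rows, time_column, value_column, aggregation_function_list):
    """Group rows by calendar month and apply each named aggregation."""
    buckets = defaultdict(list)
    for row in rows:
        t = row[time_column]
        buckets[(t.year, t.month)].append(row[value_column])
    result = []
    for (year, month), values in sorted(buckets.items()):
        out = {time_column: datetime(year, month, 1)}
        for name in aggregation_function_list:
            out[f"{value_column}_{name}"] = AGGREGATIONS[name](values)
        result.append(out)
    return result


rows = [
    {"date": datetime(2020, 1, 5), "value": 1.0},
    {"date": datetime(2020, 1, 20), "value": 3.0},
    {"date": datetime(2020, 2, 2), "value": 10.0},
]
monthly = rescale_rows_monthly(rows, "date", "value", ["sum", "max"])
```

The two January rows collapse into one row with a sum of 4.0 and a max of 3.0, and February keeps its single value.
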

## 0 to 1 Normalization

`elwood.normalize_features(dataframe, output_file)`

This function expects long-format data: a dataframe with a "feature" column and a "value" column, where each entry for a feature has its own feature/value row.
It returns a dataframe in which the numerical values in the "value" column have been scaled to the 0 to 1 range, separately for each "feature".
Optionally, you may specify an `output_file` name to also write the dataframe to a parquet file.
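
The per-feature scaling can be sketched with plain Python as min-max normalization. This is an illustration rather than Elwood's implementation: `normalize_rows` is a hypothetical helper, plain dicts stand in for dataframe rows, every value is assumed to be numeric, and mapping a feature whose values are all equal to 0.0 is an assumption.

```python
def normalize_rows(rows):
    """Min-max scale "value" to the 0 to 1 range within each "feature"."""
    by_feature = {}
    for row in rows:
        by_feature.setdefault(row["feature"], []).append(row["value"])
    # Minimum and maximum per feature.
    bounds = {feature: (min(vals), max(vals)) for feature, vals in by_feature.items()}

    out = []
    for row in rows:
        low, high = bounds[row["feature"]]
        span = high - low
        # A constant feature has no range; map it to 0.0 (assumption).
        scaled = (row["value"] - low) / span if span else 0.0
        out.append({**row, "value": scaled})
    return out


rows = [
    {"feature": "rainfall", "value": 0.0},
    {"feature": "rainfall", "value": 5.0},
    {"feature": "rainfall", "value": 10.0},
    {"feature": "temperature", "value": 20.0},
    {"feature": "temperature", "value": 30.0},
]
normalized = normalize_rows(rows)
```

Each feature is scaled independently, so the largest rainfall value and the largest temperature value both become 1.0.
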


# History



