Metadata-Version: 2.1
Name: explosig-data
Version: 0.0.5
Summary: Process mutation data into standard formats originally developed for the ExploSig family of tools
Home-page: https://github.com/lrgr/explosig-data
Author: Leiserson Research Group
License: MIT
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: requests (>=2.22.0)
Requires-Dist: pandas (>=0.25.1)
Requires-Dist: numpy (>=1.17.0)
Requires-Dist: snakemake (>=5.3)
Requires-Dist: biopython (>=1.75)
Requires-Dist: twobitreader (>=3.1)
Requires-Dist: tqdm (>=4.39.0)

[![Build Status](https://travis-ci.org/lrgr/explosig-data.svg?branch=master)](https://travis-ci.org/lrgr/explosig-data)
[![PyPI](https://img.shields.io/pypi/v/explosig-data)](https://pypi.org/project/explosig-data/)

## ExploSig Data

Helpers for processing mutation data into standard formats originally developed for the [ExploSig](https://github.com/lrgr/explosig) family of tools.

- [Documentation](https://lrgr.github.io/explosig-data/)

### Installation

```sh
pip install explosig-data
```

### Example

With a raw SSM/MAF file from ICGC or TCGA:

```python
>>> import explosig_data as ed

>>> # Step 1: Process into the ExploSig "standard format":
>>> data_container = ed.standardize_ICGC_ssm_file('path/to/ssm.tsv') # if ICGC
>>> data_container = ed.standardize_TCGA_maf_file('path/to/maf.tsv') # if TCGA

>>> # Step 2: Extend the dataframe, then compute counts for the SBS_96 categories:
>>> data_container.extend_df().to_counts_df('SBS_96', ed.categories.SBS_96_category_list())

>>> # Step 3: Access any processed dataframe of interest:
>>> ssm_df = data_container.ssm_df
>>> extended_df = data_container.extended_df
>>> counts_df = data_container.counts_dfs['SBS_96']


>>> # Alternatively, use without the chaining API:
>>> ssm_df = ed.standardize_ICGC_ssm_file('path/to/ssm.tsv', wrap=False) # if ICGC
>>> ssm_df = ed.standardize_TCGA_maf_file('path/to/maf.tsv', wrap=False) # if TCGA
>>> extended_df = ed.extend_ssm_df(ssm_df)
>>> counts_df = ed.counts_from_extended_ssm_df(
...     extended_df,
...     category_colname='SBS_96',
...     category_values=ed.categories.SBS_96_category_list()
... )
```
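
To make the idea of a category list and a counts table more concrete, here is a minimal standard-library sketch. It is not how `explosig-data` is implemented. The helper names (`sbs_96_categories`, `count_by_sample`), the `A[C>A]G` label format, and the toy sample names are all assumptions made for illustration, and the strings returned by `ed.categories.SBS_96_category_list()` may be formatted differently.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
# Six substitution classes, written from the pyrimidine (C or T) reference base.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def sbs_96_categories():
    """Return 96 labels such as 'A[C>A]G' (this label format is an assumption)."""
    return [
        f"{five}[{sub}]{three}"
        for sub, five, three in product(SUBSTITUTIONS, BASES, BASES)
    ]


def count_by_sample(mutations, categories):
    """Tally (sample, category) pairs into {sample: [count per category]}."""
    index = {cat: i for i, cat in enumerate(categories)}
    table = {}
    for (sample, cat), n in Counter(mutations).items():
        # Each sample gets one row with a slot for every category.
        row = table.setdefault(sample, [0] * len(categories))
        row[index[cat]] = n
    return table


categories = sbs_96_categories()
# Toy input: hypothetical sample names paired with already-assigned categories.
toy = [("S1", "A[C>A]A"), ("S1", "A[C>A]A"), ("S2", "T[T>G]T")]
table = count_by_sample(toy, categories)
```

Each row of `table` has one slot per category, in the same order as `categories`. This is roughly the samples-by-categories shape one would expect from a counts dataframe, although the real dataframe layout is not specified here.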

With data already in the ExploSig "standard format":

```python
>>> import explosig_data as ed
>>> import pandas as pd

>>> # Step 0: Load the data into a dataframe, for example by reading from a TSV file.
>>> ssm_df = pd.read_csv('path/to/standard.tsv', sep='\t')

>>> # Step 1: Wrap the dataframe using the container class to allow use of the chainable functions.
>>> data_container = ed.SimpleSomaticMutationContainer(ssm_df)

>>> # Continue from step 2 above (or use the non-chaining functions shown above).
```


### Development

Install for development (in editable mode):

```sh
pip install -e .
```

Build and push to PyPI:

```sh
python setup.py sdist bdist_wheel
python -m twine upload dist/*
```

