Metadata-Version: 2.1
Name: hypercluster
Version: 0.1.12
Summary: A package for automatic clustering hyperparameter optimization
Home-page: https://github.com/liliblu/hypercluster
Author: Lili Blumenberg, Ruggles Lab
Author-email: lili.blumenberg@gmail.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3.7
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS
Classifier: Operating System :: Unix
Description-Content-Type: text/markdown
Requires-Dist: pandas (>=0.24.2)
Requires-Dist: numpy (>=1.16.4)
Requires-Dist: scipy (>=1.2.1)
Requires-Dist: matplotlib (>=3.1.0)
Requires-Dist: seaborn (>=0.9.0)
Requires-Dist: scikit-learn (>=0.22.0)
Requires-Dist: hdbscan (>=0.8.24)
Requires-Dist: snakemake (>=5.8.2)
Requires-Dist: igraph (>=0.7.1)
Requires-Dist: leidenalg (>=0.7.0)
Requires-Dist: louvain (>=0.6.1)

# Hypercluster
A package for clustering optimization with sklearn. 

### Requirements:  
pandas  
numpy  
scipy  
matplotlib  
seaborn  
scikit-learn  
hdbscan  

Optional:  
snakemake  


### Install  
With pip:
```
pip install hypercluster
```

or with conda:
```
conda install hypercluster
# or
conda install -c conda-forge -c bioconda hypercluster
```
If you have problems installing with conda, try changing your channel priority. The recommended priority is conda-forge > bioconda > defaults.  
To check your channel priority, run `conda config --get channels`.  
The output should look like:
```
--add channels 'defaults'   # lowest priority
--add channels 'bioconda'
--add channels 'conda-forge'   # highest priority
```

If it doesn't look like that, try:
```
conda config --add channels bioconda
conda config --add channels conda-forge
```

### Docs 
https://hypercluster.readthedocs.io/en/latest/index.html  

It is also useful to read sklearn's pages on [clustering](https://scikit-learn.org/stable/modules/clustering.html) 
and [evaluation metrics](https://scikit-learn.org/stable/modules/clustering.html#clustering-performance-evaluation). 
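
As a rough illustration of what those two sklearn pages cover, the sketch below sweeps one hyperparameter by hand and scores each result with one metric that needs ground-truth labels and one that does not. It uses only sklearn, not hypercluster, and the choice of KMeans, the range of `n_clusters`, and the toy data are assumptions made for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score

# Toy data with known labels, so both kinds of metric can be computed.
data, labels = make_blobs(n_samples=150, centers=3, random_state=0)

results = {}
for n_clusters in range(2, 6):
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    predicted = model.fit_predict(data)
    results[n_clusters] = {
        # Needs ground-truth labels.
        "adjusted_rand_score": adjusted_rand_score(labels, predicted),
        # Computed from the data and predicted labels alone.
        "silhouette_score": silhouette_score(data, predicted),
    }

# Pick the setting with the highest silhouette score.
best = max(results, key=lambda k: results[k]["silhouette_score"])
print(best, results[best])
```

Selecting on silhouette score is only one possible criterion; when ground-truth labels exist, a metric such as the adjusted Rand index may be a better guide.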

### Examples
https://github.com/liliblu/hypercluster/tree/dev/examples

### Quickstart with Snakemake

The default `config.yml` and `hypercluster.smk` files are in the snakemake directory of the repo linked above.  
Edit `config.yml`, or override its values on the command line with `--config`:
```bash
snakemake -s hypercluster.smk --configfile config.yml --config input_data_files=test_data input_data_folder=. 
```
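
If the run is launched from a script, the same command line can be assembled in Python. This is a minimal sketch: `build_snakemake_command` is a hypothetical helper written for this example, not part of hypercluster or snakemake, and it only uses the `-s`, `--configfile` and `--config` flags shown above.

```python
def build_snakemake_command(overrides, snakefile="hypercluster.smk",
                            configfile="config.yml"):
    """Return the argument list for a snakemake run with --config overrides.

    Hypothetical helper for illustration only.
    """
    command = ["snakemake", "-s", snakefile, "--configfile", configfile]
    if overrides:
        # Each override is passed as key=value after a single --config flag.
        command.append("--config")
        command.extend(f"{key}={value}" for key, value in overrides.items())
    return command


command = build_snakemake_command(
    {"input_data_files": "test_data", "input_data_folder": "."}
)
print(" ".join(command))

# To actually launch the workflow (requires snakemake on the PATH):
# import subprocess
# subprocess.run(command, check=True)
```

Building the arguments as a list rather than a single string avoids shell quoting issues when values contain spaces.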

Example of editing the config with Python:
```python
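import os

import yaml

# safe_load is used because yaml.load without a Loader fails on recent PyYAML
with open('config.yml', 'r') as fh:
    config = yaml.safe_load(fh)

input_data_prefix = 'test_data'
config['input_data_folder'] = os.path.abspath('.')
config['input_data_files'] = [input_data_prefix]
# Keyword arguments used when reading each input file, keyed by file prefix
config['read_csv_kwargs'] = {input_data_prefix: {'index_col': [0]}}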

with open('config.yml', 'w') as fh:
    yaml.dump(config, stream=fh)
```

Then call snakemake. 
```bash
snakemake -s hypercluster.smk
```

Or submit the snakemake scheduler as an sbatch job, e.g. with BigPurple Slurm:
```bash
module add slurm
sbatch snakemake_submit.sh
```
Examples of `snakemake_submit.sh` and `cluster.json` are in the scRNA-seq example. 

### Quickstart with Python
```python
import pandas as pd
from sklearn.datasets import make_blobs
import hypercluster

data, labels = make_blobs()
data = pd.DataFrame(data)
labels = pd.Series(labels, index=data.index, name='labels')

# With a single clustering algorithm
clusterer = hypercluster.AutoClusterer()
clusterer.fit(data).evaluate(
    methods=hypercluster.constants.need_ground_truth + hypercluster.constants.inherent_metrics,
    gold_standard=labels,
)

clusterer.visualize_evaluations()

# With a range of algorithms
clusterer = hypercluster.MultiAutoClusterer()
clusterer.fit(data).evaluate(
    methods=hypercluster.constants.need_ground_truth + hypercluster.constants.inherent_metrics,
    gold_standard=labels,
)

clusterer.visualize_evaluations()
```


