Metadata-Version: 2.1
Name: scanpy-scripts
Version: 0.1.0.1
Summary: Scripts for using scanpy from the command line
Home-page: https://github.com/ebi-gene-expression-group/scanpy-scripts
Author: nh3
Author-email: 
License: UNKNOWN
Platform: UNKNOWN
Description-Content-Type: text/markdown
Requires-Dist: matplotlib
Requires-Dist: pandas
Requires-Dist: scanpy (==1.3.2)
Requires-Dist: louvain

# scanpy-scripts
Scripts for using scanpy from the command line

To wrap scanpy's workflow in any given workflow language, each step needs a script that can be called on its own. Those scripts are being written here and will become more complete over time.

## Install

```bash
conda install scanpy-scripts
# or
pip3 install scanpy-scripts
```

## Test installation

There is an example script included:

```bash
scanpy-scripts-tests.sh
```

This downloads [a well-known test 10X dataset](https://s3-us-west-2.amazonaws.com/10x.files/samples/cell/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz) and executes all of the scripts described below.

## Commands

The wrapper scripts currently available are described below. Each script prints usage instructions with `--help`; consult the function documentation in scanpy for further details.

### read-10x: read 10X data and create the AnnData object (calls `sc.read()`)

```bash
scanpy read-10x -d <10x data directory> -o <raw data object in .h5ad format>
```

### filter-cells: filter out poor-quality cells (calls `sc.pp.filter_cells()`)

```bash
scanpy filter-cells -i <raw data object in .h5ad format> -p n_genes,n_counts -l <min_genes>,<min_counts> -o <cell-filtered object in .h5ad format>
``` 
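
As a rough illustration of what `-p` and `-l` express, the sketch below applies one lower bound per parameter to toy per-cell summaries. It is plain Python rather than the scanpy implementation; the cell names, values, thresholds, and the helper `filter_by_lower_bounds` are all hypothetical, and treating the bound as inclusive is an assumption.

```python
# Conceptual sketch only: plain Python, not the scanpy implementation.
# Hypothetical per-cell summaries (toy values).
cells = {
    "cell_a": {"n_genes": 850, "n_counts": 2400},
    "cell_b": {"n_genes": 120, "n_counts": 300},
    "cell_c": {"n_genes": 400, "n_counts": 90},
}


def filter_by_lower_bounds(cells, parameters, lower_bounds):
    """Keep cells whose value for every parameter is >= its lower bound."""
    if len(parameters) != len(lower_bounds):
        raise ValueError("need exactly one lower bound per parameter")
    thresholds = dict(zip(parameters, lower_bounds))
    return {
        name: stats
        for name, stats in cells.items()
        if all(stats[p] >= bound for p, bound in thresholds.items())
    }


# Mirrors "-p n_genes,n_counts -l 200,100" (hypothetical thresholds).
kept = filter_by_lower_bounds(cells, ["n_genes", "n_counts"], [200, 100])
print(sorted(kept))  # ['cell_a']
```

Here `cell_b` fails the `n_genes` bound and `cell_c` fails the `n_counts` bound, so only `cell_a` is kept.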

### filter-genes: filter out poorly covered genes (calls `sc.pp.filter_genes()`)

```bash
scanpy filter-genes -i <cell-filtered object in .h5ad format> -p n_cells,n_counts -l <min_cells>,<min_counts> -o <cell-and-gene-filtered object in .h5ad format>
``` 

### normalise-data: normalise the expression values (calls `sc.pp.normalize_per_cell()`)

```bash
scanpy normalise-data -i <cell-and-gene-filtered object in .h5ad format> -s <scale_factor> -o <object with normalised expression values in .h5ad format> [--save-raw]
```
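
To make the effect of the scale factor concrete, here is a minimal sketch of per-cell normalisation in plain Python. It assumes that `-s` sets the total count each cell should have after normalisation; the helper `normalise_per_cell` and the toy counts are hypothetical and are not taken from scanpy or from these scripts.

```python
# Conceptual sketch only: plain Python, not the scanpy implementation.
def normalise_per_cell(counts, scale_factor):
    """Rescale one cell's gene counts so that they sum to scale_factor."""
    total = sum(counts)
    if total == 0:
        # An empty cell has nothing to rescale; return it unchanged.
        return list(counts)
    return [c * scale_factor / total for c in counts]


cell = [2, 6, 12]  # hypothetical raw counts for three genes
normalised = normalise_per_cell(cell, 10000)
print(normalised)  # [1000.0, 3000.0, 6000.0]
```

The relative proportions between genes within the cell are preserved; only the cell total changes.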

### find-variable-genes: find variable genes (calls `sc.pp.filter_genes_dispersion()`)

```bash
scanpy find-variable-genes -i <object with normalised expression values in .h5ad format> --flavor <method to normalise dispersion> -p mean,disp -l <min_mean>,<min_disp> -j <high_mean>,<high_disp> -b <n bins> -n <n top genes> -o <object with variable genes in .h5ad format>
```

### scale-data: scale expression values (calls `sc.pp.scale()`, optionally `sc.pp.log1p()` and `sc.pp.regress_out()`)

```bash
scanpy scale-data -i <object with variable genes in .h5ad format> -V <variables to regress> -x <scale max> -o <object with scaled expression values in .h5ad format>
```

### run-pca: run principal components analysis (calls `sc.tl.pca()`, `sc.pl.pca()`)

```bash
scanpy run-pca -i <object with scaled expression values in .h5ad format> -n <number of pcs to compute> -o <output object with PCs in .h5ad format> --output-embeddings-file <pca embedding in csv format> --output-loadings-file <pca loadings file in csv format> --output-stdev-file <pca stdev file in text format> --output-var-ratio-file <pca proportion of explained variance file in text format> -P <PCA plot file> --color <variable to color cells by>
```

### neighbours: compute neighbourhood graph (calls `sc.pp.neighbors()`)

```bash
scanpy neighbours -i <object with PCs in .h5ad format> -N <number of neighbors to consider> -n <number of PCs to use> -m <method to compute connectivity> -M <distance metric> -o <output object with neighbourhood graph in .h5ad format>
```

### find-cluster: find clusters (calls `sc.tl.louvain()`)

```bash
scanpy find-cluster -i <object with neighbourhood graph in .h5ad format> --flavor <method to compute clustering> -o <output object with clusters in .h5ad format> --output-text-file <output cluster assignment table in csv format>
```

### run-umap: run UMAP analysis (calls `sc.tl.umap()`, `sc.pl.umap()`)

```bash
scanpy run-umap -i <object with clusters in .h5ad format> -n <number of dimensions> -o <output object with umap embeddings in .h5ad format> --output-embeddings-file <umap embeddings in csv format> -P <umap plot file> --color <variable to color cells by>
```

### run-tsne: run tSNE analysis (calls `sc.tl.tsne()`, `sc.pl.tsne()`)

```bash
scanpy run-tsne -i <object with clusters in .h5ad format> -n <number of dimensions> -o <output object with tsne embeddings in .h5ad format> --output-embeddings-file <tsne embeddings in csv format> -P <tsne plot file> --color <variable to color cells by>
```

### find-markers: find marker genes for each group/cluster of cells (calls `sc.tl.rank_genes_groups()`)

```bash
scanpy find-markers -i <object with clusters in .h5ad format> -g <groupby> -n <number of genes to test for each group> -m <method of testing> --reference <reference group to compare against> -o <output object in .h5ad format> --output-text-file <table of top tested candidate marker genes in csv format> -P <plot of candidate gene expression across groups> --show-n-genes <number of genes to plot>
```
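
## Chaining the commands

Since each command writes an `.h5ad` object that the next one reads, the steps can be chained from a wrapper. The sketch below only builds the command lines for the first three steps and prints them; it does not run anything. The file names, threshold values, and the helper `build_commands` are hypothetical, and only flags shown in the usage lines above are used.

```python
import shlex

# Hypothetical inputs, thresholds and output names; the flags come from the
# usage lines in this README.
steps = [
    ("read-10x", ["-d", "pbmc3k_data"], "raw.h5ad"),
    ("filter-cells", ["-p", "n_genes,n_counts", "-l", "200,100"], "cells_filtered.h5ad"),
    ("filter-genes", ["-p", "n_cells,n_counts", "-l", "3,10"], "genes_filtered.h5ad"),
]


def build_commands(steps):
    """Return one argv list per step, feeding each output to the next step's -i."""
    commands = []
    previous_output = None
    for subcommand, options, output in steps:
        argv = ["scanpy", subcommand]
        if previous_output is not None:
            argv += ["-i", previous_output]
        argv += options + ["-o", output]
        commands.append(argv)
        previous_output = output
    return commands


commands = build_commands(steps)
for argv in commands:
    print(shlex.join(argv))
```

To execute the steps rather than print them, each `argv` list could be passed to `subprocess.run(argv, check=True)`, which stops the chain at the first failing command.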


