Metadata-Version: 2.1
Name: hypothesize
Version: 0.1.dev20
Summary: A Python package for comparing groups and measuring associations using robust statistics.
Home-page: https://github.com/Alcampopiano/hypothesize
Author: Allan Campopiano
Author-email: campopianoa@hcdsb.org
License: BSD 3-clause
Platform: UNKNOWN
Classifier: Development Status :: 1 - Planning
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: BSD License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: scipy
Requires-Dist: pandas
Requires-Dist: numpy
Requires-Dist: more-itertools

# Hypothesize
[![PyPI version](https://img.shields.io/pypi/v/hypothesize?style=flat-square)](https://pypi.org/project/hypothesize/)
[![PyPI - Downloads](https://img.shields.io/pypi/dw/hypothesize?style=flat-square)](https://pypistats.org/packages/hypothesize)
[![license](https://img.shields.io/pypi/l/hypothesize?style=flat-square)](https://github.com/Alcampopiano/hypothesize/blob/master/LICENSE)
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Alcampopiano/hypothesize/blob/master/examples/hypothesize_notebook_for_colab.ipynb)

A Python package for comparing groups and measuring associations using robust statistics.

This package is a port of Rand R. Wilcox's R library [WRS](https://dornsife.usc.edu/labs/rwilcox/software/). The functions in this repository, as well as the issues of robustness, are described in his book "[Introduction to Robust Estimation and Hypothesis Testing](https://play.google.com/store/books/details?id=8f8nBb4__EYC)".

Please visit the [Hypothesize documentation site](https://Alcampopiano.github.io/hypothesize/).

### :warning: This repository is still in the early stages of development

## Installation
The Hypothesize package is available on [PyPI](https://pypi.org/project/hypothesize/). To install it, run:

```shell
pip install hypothesize
```

## Expand the following topics to see examples

<details>
<summary><strong> <font size="3">How to compare two groups </font></strong></summary>

#### Load data from a CSV or create some example data

```python
from hypothesize.utilities import create_example_data

# Create example data for a two-group design (one column per group)
df = create_example_data(design_values=2)

df.head()
```


|    |   cell_1 |   cell_2 |
|---:|----------:|----------:|
|  0 | 0.0446518 |  0.90675  |
|  1 | 0.763458  |  0.291555 |
|  2 | 0.71039   |  0.59828  |
|  3 | 0.175208  |  0.268073 |
|  4 | 0.957819  |  0.222688 |


#### Import the desired function and pass in the data for each group
- This example uses the bootstrap-t method with 20% trimmed means
- The output is a dictionary containing the results (95% confidence interval, p-value, test statistic, etc.)


```python
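from hypothesize.compare_groups_with_single_factor import yuenbt

# Compare the 20% trimmed means of the two groups with the bootstrap-t method
results = yuenbt(df.cell_1, df.cell_2)

# Confidence interval for the difference between the trimmed means
print(results['ci'])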
```

<p>

[-0.3115715617702292, 0.10636703554225341]

<p>
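
For intuition about the 20% trimmed mean used in the example above, here is a minimal sketch that relies only on the standard library. It is not the package's implementation: `trimmed_mean` is a hypothetical helper written for illustration, and it assumes the usual definition in which the lowest and highest 20% of the sorted values are dropped before averaging.

```python
import math


def trimmed_mean(values, proportion=0.2):
    """Mean of the values left after trimming each tail (illustrative helper)."""
    ordered = sorted(values)
    # Number of observations to drop from each end of the sorted data
    g = math.floor(proportion * len(ordered))
    kept = ordered[g:len(ordered) - g]
    return sum(kept) / len(kept)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

print(trimmed_mean(data))     # 5.5, barely affected by the outlier at 100
print(sum(data) / len(data))  # 14.5, the ordinary mean is pulled upward
```

The single extreme value moves the ordinary mean well away from the bulk of the data, while the trimmed mean stays put. That resistance to outliers is the motivation for robust estimators.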

</details>

<details>
 <summary> <strong> <font size="3">How to compare groups in a factorial design</font></strong></summary>

#### Load data from a CSV or create some example data

```python
from hypothesize.utilities import create_example_data

# Create example data for a 2-by-3 design (one column per cell)
df = create_example_data(design_values=[2, 3])

df.head()
```

|    |   cell_1_1 |   cell_1_2 |   cell_1_3 |   cell_2_1 |   cell_2_2 |   cell_2_3 |
|---:|-----------:|-----------:|-----------:|-----------:|-----------:|-----------:|
|  0 |  0.0446518 |   0.90675  |   0.795696 |  0.519486  |   0.333636 |  0.232153  |
|  1 |  0.763458  |   0.291555 |   0.84158  |  0.0339891 |   0.511235 |  0.732503  |
|  2 |  0.71039   |   0.59828  |   0.110407 |  0.898072  |   0.769496 |  0.0484005 |
|  3 |  0.175208  |   0.268073 |   0.888728 |  0.287442  |   0.100153 |  0.210394  |
|  4 |  0.957819  |   0.222688 |   0.834161 |  0.599158  |   0.655308 |  0.203486  |

#### Import the desired function and pass in the data
- This example uses a 2-by-3 design
- One approach is to use a set of linear contrasts that test all main effects and interactions
- Each contrast is then tested using the bootstrap-t method with 20% trimmed means
- The result is a dictionary of DataFrames containing statistics for each factor and for the interaction

```python
from hypothesize.compare_groups_with_two_factors import bwmcp

# J and K are the numbers of levels of factors A and B
results = bwmcp(J=2, K=3, x=df)
```
<p>

```python
results['factor_A']
```
<p>    

|    |   con_num |    psihat |       se |     test |   crit_value |   p_value |
|---:|----------:|----------:|---------:|---------:|-------------:|----------:|
|  0 |         0 | 0.0393584 | 0.169849 | 0.231726 |      3.35959 |  0.941569 |   


<p>

```python
results['factor_B']
```
<p>    

|    |   con_num |     psihat |       se |       test |   crit_value |   p_value |
|---:|----------:|-----------:|---------:|-----------:|-------------:|----------:|
|  0 |         0 | -0.104506  | 0.126135 | -0.828529  |       2.4329 |  0.452421 |
|  1 |         1 | -0.0931364 | 0.151841 | -0.613382  |       2.4329 |  0.552588 |
|  2 |         2 |  0.01137   | 0.135392 |  0.0839783 |       2.4329 |  0.923205 |

<p>

```python
results['factor_AB']
```
<p>    

|    |   con_num |     psihat |       se |      test |   crit_value |   p_value |
|---:|----------:|-----------:|---------:|----------:|-------------:|----------:|
|  0 |         0 | -0.100698  | 0.126135 | -0.798336 |       2.3771 |  0.410684 |
|  1 |         1 | -0.037972  | 0.151841 | -0.250078 |       2.3771 |  0.804674 |
|  2 |         2 |  0.0627261 | 0.135392 |  0.463291 |       2.3771 |  0.659432 |
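
To make the idea of linear contrasts in a 2-by-3 design concrete, the sketch below builds one plausible set of contrast coefficients with the standard library. It is an illustration only: `pairwise_contrasts`, `kron`, and `two_way_contrasts` are hypothetical helpers, and the package may construct its contrasts differently. The sketch assumes the cells are ordered as in the DataFrame above (`cell_1_1`, `cell_1_2`, ..., `cell_2_3`).

```python
from itertools import combinations


def pairwise_contrasts(levels):
    """One contrast per pair of levels: +1 for the first level, -1 for the second."""
    contrasts = []
    for i, j in combinations(range(levels), 2):
        row = [0] * levels
        row[i], row[j] = 1, -1
        contrasts.append(row)
    return contrasts


def kron(a, b):
    """Kronecker product of two coefficient vectors."""
    return [x * y for x in a for y in b]


def two_way_contrasts(J, K):
    """Contrast coefficients over the J*K cells for main effects and interactions."""
    ones_J, ones_K = [1] * J, [1] * K
    pairs_J, pairs_K = pairwise_contrasts(J), pairwise_contrasts(K)
    return {
        # Compare levels of A, pooling over the levels of B
        "factor_A": [kron(a, ones_K) for a in pairs_J],
        # Compare levels of B, pooling over the levels of A
        "factor_B": [kron(ones_J, b) for b in pairs_K],
        # Differences of differences
        "factor_AB": [kron(a, b) for a in pairs_J for b in pairs_K],
    }


contrasts = two_way_contrasts(J=2, K=3)

print(contrasts["factor_A"])      # [[1, 1, 1, -1, -1, -1]]
print(contrasts["factor_AB"][0])  # [1, -1, 0, -1, 1, 0]
```

With two levels of A and three levels of B, this yields one contrast for factor A and three each for factor B and the interaction, which matches the number of rows in the tables above.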



</details>

<details>
 <summary> <strong> <font size="3">How to compute a robust correlation</font></strong></summary>

#### Load data from a CSV or create some example data

```python
from hypothesize.utilities import create_example_data

# Create two columns of example data to correlate
df = create_example_data(design_values=2)

df.head()
```
|    |   cell_1 |   cell_2 |
|---:|----------:|----------:|
|  0 | 0.0446518 |  0.90675  |
|  1 | 0.763458  |  0.291555 |
|  2 | 0.71039   |  0.59828  |
|  3 | 0.175208  |  0.268073 |
|  4 | 0.957819  |  0.222688 |


#### Import the desired function and pass in the data for each group
- One approach is to winsorize the x and y data
- A heteroscedastic method for testing zero correlation is also provided in this package but is not shown here
  - Please see the function `corb`, which uses the percentile bootstrap to compute a 1-alpha confidence interval and p-value for any correlation
- The output is a dictionary containing various statistics (the winsorized correlation, winsorized covariance, etc.)

```python
from hypothesize.measuring_associations import wincor

# Compute the winsorized correlation between the two columns
results = wincor(df.cell_1, df.cell_2)

print(results['wcor'])
```

<p>

-0.05690314435050796
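
As a rough illustration of what winsorizing does before a correlation is computed, here is a minimal standard-library sketch. It is not the package's `wincor` implementation: `winsorize`, `pearson`, and `winsorized_correlation` are hypothetical helpers, and the 20% winsorizing amount is an assumption made for this example.

```python
import math


def winsorize(values, proportion=0.2):
    """Pull the most extreme values in each tail in to the nearest retained value."""
    ordered = sorted(values)
    # Number of observations to winsorize in each tail
    g = math.floor(proportion * len(ordered))
    low, high = ordered[g], ordered[len(ordered) - g - 1]
    # Clamp each value to [low, high], keeping the original order
    return [min(max(v, low), high) for v in values]


def pearson(x, y):
    """Ordinary Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def winsorized_correlation(x, y, proportion=0.2):
    """Pearson correlation computed on the winsorized x and y values."""
    return pearson(winsorize(x, proportion), winsorize(y, proportion))


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
y = [2, 4, 6, 8, 10, 12, 14, 16, 18, -50]

print(winsorize(x))  # [3, 3, 3, 4, 5, 6, 7, 8, 8, 8]
print(pearson(x, y))
print(winsorized_correlation(x, y))
```

Comparing the last two printed values shows how much a single outlying pair can influence the ordinary correlation relative to the winsorized one.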

</details>

<p>

