Metadata-Version: 2.1
Name: PySurv
Version: 0.3.0
Summary: A Python tool for generating and plotting Kaplan-Meier survival curves.
Home-page: https://github.com/pushkarmutha/PySurv
Author: Pushkar Mutha
Author-email: pushkar.mutha@outlook.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: lifelines
Requires-Dist: matplotlib

# PySurv

PySurv is a Python package for generating and plotting Kaplan-Meier survival curves. It is designed to be simple and intuitive and is inspired from [MatSurv](https://github.com/aebergl/MatSurv).

![KM Curve Example](images/KMCurve.svg)
![Summary Example](images/Summary.png)

## Features

- Generate Kaplan-Meier (KM) survival curves
- Calculate hazard ratios (HR) with 95% confidence intervals using Cox PH method
- Option to show or hide confidence intervals (CI) for KM curves
- Flexible plotting with customizable styles, colors, and labels
- Generate synthetic time-to-event data by specifying hazard ratio and censoring rate.

## Installation

You can install PySurv via PyPI:
```
pip install pysurv
```
Make sure you have the following dependencies installed:
```
- numpy
- pandas
- lifelines
- matplotlib
```
These will be installed automatically when you install PySurv using pip.

## Usage

Here’s a detailed example of how to use PySurv for generating data, plotting KM curves, and customizing the output:
```
import pysurv

# Generate synthetic data
data = pysurv.generate_time_to_event_data(n_samples=2000, hazard_ratio=0.5, censoring_rate=0.3)

# Plot Kaplan-Meier curve with custom labels and colors
pysurv.plot_km_curve(
    data, 
    time_col='time', 
    event_col='event', 
    group_col='group', 
    group_labels=('Control Group', 'Treatment Group'),
    title="Survival Analysis of Treatment vs Control",
    y_label="Survival Probability (%)",
    x_label="Time (months)",
    colors=['blue', 'orange'], 
    show_ci=True,
    survival_time_points=[12, 24],  # Show % survival at 12 and 24 months
    return_summary=False,
    savepath='./KMCurve.svg'
)
```

### Output:

- A Kaplan-Meier curve
- Summary statistics, including median survival time and percentage survival at given time point(s)
- Hazard ratio and p-value displayed on the plot

## Function Reference

### plot_km_curve

Plots Kaplan-Meier survival curves and calculates hazard ratios, p-values, and confidence intervals.

#### Parameters:
- data: A Pandas DataFrame containing time-to-event data.
- time_col: Column name for the time data.
- event_col: Column name for the event data (1 for event, 0 for censored).
- group_col: Column name for the binary group data.
- group_labels: Labels for the groups (default: None).
- title: Title for the plot (default: None).
- y_label: Label for the y-axis (default: 'Survival Probability').
- x_label: Label for the x-axis (default: 'Time (months)').
- colors: List of colors for the groups (default: ['r', 'b']). If more than two groups, please manually provide a list of colors.
- line_styles: Line styles for the groups (default: ’-‘).
- fontsize: Font size for text on KM curve including title, axis labels, and risk tables (default: 18).
- linewidth: Line width of KM curves (default: 2.5).
- show_ci: Whether to show confidence intervals on the KM curves (default: False).
- show_inverted_hr: Whether to show inverted hazard ratio (default: False).
- survival_time_points (list): One or more time point(s) at which to estimate percentage survival (default: None).
- return_summary: Whether to return a summary of survival and hazard ratio statistics (default: False).
- savepath: Complete path (including filename and extension) to save the KM curve plot (default: None).
- **kwargs: Additional matplotlib arguments to pass for plotting KM curves.

#### \*Note:
- PySurv uses a univariable Cox regression model on the 'group_col' variable to compute hazard ratios and p-values. 
- MatSurv deafaults to the 'Mantel-Haenszel' (MH) and 'log rank' method. 
- Under the proportional hazards assumption, Cox regression is equivalent, if not better, to the MH/log-rank method.
- For more information, see [this](https://www.fharrell.com/post/logrank/) and [this](https://discourse.datamethods.org/t/when-is-log-rank-preferred-over-univariable-cox-regression/2344). 

## ToDo
- Clean up and organize the code
- Support m-ary cases
- Color risk tables below the KM curves

If you’d like to contribute to PySurv, here’s how you can help:

1. Fork the repository.
2. Create a new branch (git checkout -b feature/your-feature).
3. Make your changes and commit them (git commit -am 'Add some feature').
4. Push to the branch (git push origin feature/your-feature).
5. Create a pull request.

Please make sure to update tests as appropriate and follow the style guide used in the project.

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Disclaimer
This code is still in the initial stages and may contain bugs. I make no guarantees for the accuracy of this code. By using this code, you agree to take full responsibility for any issues or inaccuracies that may arise.
