Metadata-Version: 2.1
Name: litelearn
Version: 0.3.0
Summary: a python library for quickly building and evaluating models
Author: Aviad Rozenhek
Author-email: aviadr1@gmail.com
Requires-Python: >=3.8,<4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: catboost[widget] (>=1.2,<2.0)
Requires-Dist: llvmlite (>=0.39.1,<0.40.0)
Requires-Dist: numba (>=0.56,<0.57)
Requires-Dist: pandas (>=1.3.5,<2.0.0)
Requires-Dist: scikit-learn (>=1.0.2,<2.0.0)
Requires-Dist: seaborn (>=0.11.2,<0.12.0)
Requires-Dist: shap (>=0.41.0,<0.42.0)
Description-Content-Type: text/markdown

# litelearn

a python library for building models without fussing
over the nitty-gritty details of data munging


## installation
`pip install litelearn`

## usage

once you have a `pandas` dataframe you can create a model 
for your dataset in 3 lines of code:

### Regression
```python
# load some dataset
import seaborn as sns

dataset = "penguins"
target = "body_mass_g"
df = sns.load_dataset(dataset).dropna(subset=[target])

# just 3 lines of code to create and evaluate a model
import litelearn as ll

model = ll.regress_df(df, target)
model.display_evaluation() 
```

### Classification
```python
# load some dataset
import seaborn as sns

dataset = "penguins"
target = "species"
df = sns.load_dataset(dataset).dropna(subset=[target])

# just 3 lines of code to create and evaluate a model
import litelearn as ll

model = ll.classify_df(df, target)
model.display_evaluation()
```

### Prediction
prediction is easy too: it works on any data that resembles the training data.
dtypes don't have to match, and your prediction data can even have extra columns.
missing values and unknown categories are imputed using values from the training data.

```python
df = ...  # load some dataframe
split = int(len(df) * 0.8)
train_df, val_df = df[:split], df[split:]
model = ...  # build some model on train_df
pred = model.predict(val_df)  # predict on unseen data
```
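
to make the behaviour described above more concrete, here is a minimal sketch of what aligning
prediction data to the training data could look like in plain `pandas`. this is not litelearn's
actual implementation: the helper name `align_to_training` is hypothetical, and the choice of
the training median for numeric columns and the most common training value for categorical
columns is an assumption made for illustration.

```python
import pandas as pd


def align_to_training(train_df: pd.DataFrame, new_df: pd.DataFrame) -> pd.DataFrame:
    """hypothetical helper: reshape new_df so it matches train_df's columns."""
    # keep only the training columns, in training order:
    # extra columns are dropped, absent columns are created as all-missing
    out = new_df.reindex(columns=train_df.columns)
    for col in train_df.columns:
        train_col = train_df[col]
        if pd.api.types.is_numeric_dtype(train_col):
            # coerce e.g. the string "2.5" to a number; unparseable values become NaN
            out[col] = pd.to_numeric(out[col], errors="coerce")
            # assumption: fill missing numbers with the training median
            out[col] = out[col].fillna(train_col.median())
        else:
            known = set(train_col.dropna().unique())
            most_common = train_col.mode().iloc[0]
            # assumption: missing values and unknown categories fall back
            # to the most common training value
            out[col] = out[col].where(out[col].isin(known), most_common)
    return out


train = pd.DataFrame({"bill": [1.0, 2.0, 3.0], "island": ["a", "a", "b"]})
new = pd.DataFrame({"bill": ["2.5", None], "island": ["zzz", "b"], "extra": [1, 2]})
aligned = align_to_training(train, new)
```

with litelearn itself none of this is needed, since `model.predict` is described as handling
these cases for you.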

## features
+ does all the data munging for you, including handling of missing data and categorical data
+ uses the robust [catboost](https://catboost.ai/) library for gradient boosting, which is known for generating
  high quality models with little tuning
+ supports [shap](https://github.com/slundberg/shap) for explainability.
  call `model.display_shap()` or `model.get_shap()` to get the shap values for your model
+ supports sklearn's [permutation importance](https://scikit-learn.org/stable/modules/permutation_importance.html).
  call `model.display_permutation_importance()` or `model.get_permutation_importance()`
  to get feature importances that reflect the model's performance on test data.
+ supports easy pickling: to save your model simply call `model.save("path/to/model.pkl")`
  and to load your model call `model.load("path/to/model.pkl")`
+ for regression models, you can call `model.display_residuals()` to see the residuals of your model
+ it also supports segments for your data using the `model.set_segments()` method.
  this will create a new column in your dataframe called `segment` which you can use to
  group your data. this is useful for seeing how your model performs on different segments of your data.
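
as a rough illustration of the idea behind permutation importance mentioned in the list above,
here is a minimal standard-library sketch: shuffle one feature at a time and measure how much
the error on held-out data grows. it is neither sklearn's nor litelearn's implementation; the
function name, the toy model and the use of mean absolute error are all assumptions.

```python
import random


def permutation_importance(predict, rows, targets, n_features, n_repeats=5, seed=0):
    """mean increase in mean absolute error when one feature column is shuffled."""
    rng = random.Random(seed)

    def mae(data):
        return sum(abs(predict(r) - t) for r, t in zip(data, targets)) / len(targets)

    baseline = mae(rows)
    importances = []
    for j in range(n_features):
        increases = []
        for _ in range(n_repeats):
            # shuffle column j only, leaving every other feature untouched
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mae(shuffled) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# toy "model" that only looks at feature 0 (purely illustrative)
def toy_predict(row):
    return 2.0 * row[0]


rows = [(float(i), float(i % 3)) for i in range(20)]  # held-out rows as tuples
targets = [2.0 * r[0] for r in rows]
scores = permutation_importance(toy_predict, rows, targets, n_features=2)
```

a feature the model ignores gets an importance of zero, because shuffling it cannot change the
predictions, while a feature the model relies on gets a positive score.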
 
