Metadata-Version: 2.1
Name: cbrkit
Version: 0.3.1
Summary: Customizable Case-Based Reasoning (CBR) toolkit for Python with a built-in API and CLI.
Home-page: https://wi2trier.github.io/cbrkit/
License: MIT
Keywords: cbr,case-based reasoning,api,similarity,nlp,retrieval,cli,tool,library
Author: Mirko Lenz
Author-email: mirko@mirkolenz.com
Requires-Python: >=3.11,<3.13
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Framework :: Pytest
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Utilities
Classifier: Typing :: Typed
Provides-Extra: all
Provides-Extra: api
Provides-Extra: cli
Provides-Extra: nlp
Provides-Extra: transformers
Requires-Dist: fastapi[all] (>=0.100,<1.0) ; extra == "all" or extra == "api"
Requires-Dist: levenshtein (>=0.23,<1.0) ; extra == "all" or extra == "nlp"
Requires-Dist: nltk (>=3.8,<4.0) ; extra == "all" or extra == "nlp"
Requires-Dist: openai (>=1.5,<2.0) ; extra == "all" or extra == "nlp"
Requires-Dist: orjson (>=3.9,<4.0)
Requires-Dist: pandas (>=2.1,<3.0)
Requires-Dist: pyarrow (>=13.0)
Requires-Dist: pyyaml (>=6.0,<7.0)
Requires-Dist: sentence-transformers (>=2.2,<3.0) ; extra == "all" or extra == "transformers"
Requires-Dist: spacy (>=3.7,<4.0) ; extra == "all" or extra == "nlp"
Requires-Dist: torch (>=2.1.1,<3.0.0) ; extra == "all" or extra == "transformers"
Requires-Dist: transformers (>=4.35,<5.0) ; extra == "all" or extra == "transformers"
Requires-Dist: typer[all] (>=0.9,<0.10) ; extra == "all" or extra == "cli"
Requires-Dist: uvicorn[standard] (>=0.24,<1.0) ; extra == "all" or extra == "api"
Requires-Dist: xmltodict (>=0.13,<0.14)
Project-URL: Repository, https://github.com/wi2trier/cbrkit
Description-Content-Type: text/markdown

<!-- markdownlint-disable MD033 MD041 -->
<h2><p align="center">CBRkit</p></h2>
<p align="center">
  <img width="256px" alt="cbrkit logo" src="https://raw.githubusercontent.com/wi2trier/cbrkit/main/assets/logo.png" />
</p>
<p align="center">
  <a href="https://pypi.org/project/cbrkit/">PyPI</a> |
  <a href="https://wi2trier.github.io/cbrkit/">Docs</a> |
  <a href="https://github.com/wi2trier/cbrkit/tree/main/tests/test_retrieve.py">Example</a>
</p>
<p align="center">
  Customizable Case-Based Reasoning (CBR) toolkit for Python with a built-in API and CLI.
</p>

---

# CBRkit

## Installation

The library is available on [PyPI](https://pypi.org/project/cbrkit/), so you can install it with `pip`:

```shell
pip install cbrkit
```

It comes with several optional dependencies for specific tasks such as NLP, which can be installed with:

```shell
pip install "cbrkit[EXTRA_NAME,...]"
```

where `EXTRA_NAME` is one of the following:

- `nlp`: Standalone NLP tools `levenshtein`, `nltk`, `openai`, and `spacy`
- `transformers`: NLP tools based on `sentence-transformers`, `torch`, and `transformers`
- `cli`: Command Line Interface (CLI)
- `api`: REST API Server
- `all`: All of the above
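
To check which extras are usable in the current environment, a small standard-library script can look up the modules each extra provides. This is only a sketch: the mapping from extras to import names is inferred from the package requirements, and `EXTRA_MODULES` and `missing_modules` are hypothetical names that are not part of CBRkit.

```python
from importlib.util import find_spec

# Import names per extra, inferred from the package requirements (an assumption).
# Note that the `levenshtein` distribution is imported as `Levenshtein`.
EXTRA_MODULES = {
    "nlp": ["Levenshtein", "nltk", "openai", "spacy"],
    "transformers": ["sentence_transformers", "torch", "transformers"],
    "cli": ["typer"],
    "api": ["fastapi", "uvicorn"],
}


def missing_modules(modules: list[str]) -> list[str]:
    """Return the modules that cannot be found in the current environment."""
    return [name for name in modules if find_spec(name) is None]


# Report the status of every extra without importing the (heavy) packages.
for extra, modules in EXTRA_MODULES.items():
    missing = missing_modules(modules)
    status = "ok" if not missing else f"missing: {', '.join(missing)}"
    print(f"{extra}: {status}")
```

Because `find_spec` only locates a module without importing it, the check stays fast even when large packages such as `torch` are installed.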

## Usage

CBRkit lets you define similarity metrics through _composition_.
This means you can build complex similarity measures by combining built-in and custom ones.
CBRkit also includes predefined aggregation functions for merging several similarity values into one.
To get started, we provide a [demo project](https://github.com/wi2trier/cbrkit-demo) that shows how to use the library in a real-world scenario.
The following modules are part of CBRkit:

- `loaders`: Functions for loading cases and queries.
- `sim`: Similarity generator functions for various data types (e.g., strings, numbers).
- `global_sim`: Similarity generator functions for aggregating the similarities produced by `sim`.
- `retrieval`: Functions for retrieving cases based on a query.
- `typing`: Generic type definitions for defining custom functions.

CBRkit is fully typed, so IDEs like VS Code and PyCharm can provide autocompletion and type checking.
The following sections explain each module and its basic usage.
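
To make the idea of composition concrete, here is a minimal, library-independent sketch in plain Python. It does not use the CBRkit API: `equality`, `linear`, and `attribute_value` are hypothetical generator functions written for illustration, and the weighted mean is just one possible aggregation.

```python
from collections.abc import Callable, Mapping
from typing import Any

# A similarity function maps two values to a score between 0.0 and 1.0.
SimFunc = Callable[[Any, Any], float]


def equality() -> SimFunc:
    """Generate a similarity that is 1.0 for equal values and 0.0 otherwise."""

    def sim(x: Any, y: Any) -> float:
        return 1.0 if x == y else 0.0

    return sim


def linear(max_diff: float) -> SimFunc:
    """Generate a numeric similarity that decreases linearly with the distance."""

    def sim(x: float, y: float) -> float:
        return max(0.0, 1.0 - abs(x - y) / max_diff)

    return sim


def attribute_value(
    attributes: Mapping[str, SimFunc], weights: Mapping[str, float]
) -> SimFunc:
    """Compose per-attribute similarities into one score via a weighted mean."""
    total = sum(weights[name] for name in attributes)

    def sim(x: Mapping[str, Any], y: Mapping[str, Any]) -> float:
        weighted = sum(
            weights[name] * func(x[name], y[name])
            for name, func in attributes.items()
        )
        return weighted / total

    return sim


# Mix two local measures into a single measure for dictionary-shaped cases.
car_sim = attribute_value(
    attributes={"make": equality(), "year": linear(max_diff=10)},
    weights={"make": 1.0, "year": 1.0},
)

case = {"make": "audi", "year": 2015}
query = {"make": "audi", "year": 2020}
score = car_sim(case, query)
print(score)  # 0.75
```

Each generator returns a plain function, so a composed measure can itself be passed as an attribute similarity to another composition, which is what allows nesting to arbitrary depth.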

### Loading Cases

The first step is to load cases and queries.
We provide predefined functions for the most common formats like CSV, JSON, and XML.
`cbrkit` also integrates with `pandas` for loading data frames.
The following example shows how to load cases from a CSV file using `pandas`:

```python
import pandas as pd
import cbrkit

df = pd.read_csv("path/to/cases.csv")
cases = cbrkit.loaders.dataframe(df)
```

Queries can either be loaded using the same loader functions or constructed manually.

```python
queries = cbrkit.loaders.dataframe(pd.read_csv("path/to/queries.csv"))
```
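
As a rough illustration of manual construction, the sketch below uses only the standard library. It assumes that a casebase is a mapping from a key to a case and that a query has the same shape as a case. That assumption, the inline sample data, and the names `csv_text` and `manual_queries` are illustrative and not taken from the CBRkit API.

```python
import csv
import io

# Inline sample data so the sketch runs without any files (hypothetical content).
csv_text = "make,year\naudi,2015\nbmw,2018\n"

# Build a casebase as a mapping from row index to a dictionary-shaped case.
reader = csv.DictReader(io.StringIO(csv_text))
cases = {
    key: {"make": row["make"], "year": int(row["year"])}
    for key, row in enumerate(reader)
}

# Construct queries by hand, using the same attributes as the cases.
manual_queries = {
    "q1": {"make": "audi", "year": 2020},
}

print(cases)
print(manual_queries)
```

Keeping queries in the same shape as the cases means the same similarity functions can be applied to both without conversion.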

