Metadata-Version: 2.1
Name: topic-benchmark
Version: 0.1.4
Summary: Benchmarking topic models for a paper
License: MIT
Author: Márton Kardos
Author-email: power.up1163@gmail.com
Requires-Python: >=3.9,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: catalogue (>=2.0.0,<3.0.0)
Requires-Dist: gensim (>=4.3.2,<5.0.0)
Requires-Dist: hdbscan (>=0.8.0,<0.9.0)
Requires-Dist: numpy (>=1.23.0,<2.0.0)
Requires-Dist: pyro-ppl (>=1.8.0,<2.0.0)
Requires-Dist: radicli (>=0.0.25,<0.0.26)
Requires-Dist: rich (>=13.6.0,<14.0.0)
Requires-Dist: scikit-learn (>=1.2.0,<2.0.0)
Requires-Dist: scipy (>=1.10.0,<2.0.0)
Requires-Dist: sentence-transformers (>=2.2.0,<3.0.0)
Requires-Dist: torch (>=2.1.0,<3.0.0)
Requires-Dist: turftopic (>=0.2.9,<0.3.0)
Requires-Dist: umap-learn (>=0.5.0,<0.6.0)
Description-Content-Type: text/markdown

# topic-benchmark
Just Benchmarking Topic Models :)

## Todo:

 - [ ] Run benchmark with these models and upload the results:
   - [ ] all-MiniLM-L6-v2 ⌛
   - [ ] all-mpnet-base-v2 ⌛
   - [ ] sentence-transformers/average_word_embeddings_glove.6B.300d ⌛
   - [ ] intfloat/e5-large-v2 (or intfloat/multilingual-e5-large-instruct; to my knowledge they are the same size, but the latter performs considerably better on MTEB)
 - [ ] Implement pretty printing and formatting of results as LaTeX and Markdown tables (a possible shape is sketched after this list).
 - [ ] _(Maybe)_ Implement speed tracking.
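
A minimal sketch of what the table-formatting todo could look like, using `rich` (already a dependency) for console output and plain string building for the Markdown and LaTeX variants. Everything here is hypothetical: `ResultRow`, `results_to_markdown`, `results_to_latex`, and `print_results` are not part of the package's current API, and the model names and scores in the demo are placeholders.

```python
# Hypothetical sketch for the pretty-printing / LaTeX / Markdown todo item.
# None of these names exist in topic_benchmark yet; the demo data is made up.
from dataclasses import dataclass, fields

from rich.console import Console
from rich.table import Table


@dataclass
class ResultRow:
    model: str
    coherence: float
    diversity: float


def results_to_markdown(rows: list[ResultRow]) -> str:
    """Render results as a GitHub-flavored Markdown table."""
    header = [f.name for f in fields(ResultRow)]
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join("---" for _ in header) + "|",
    ]
    for row in rows:
        cells = [str(getattr(row, name)) for name in header]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


def results_to_latex(rows: list[ResultRow]) -> str:
    """Render results as a bare LaTeX tabular environment."""
    header = [f.name for f in fields(ResultRow)]
    body = [" & ".join(header) + r" \\ \hline"]
    for row in rows:
        cells = [str(getattr(row, name)) for name in header]
        body.append(" & ".join(cells) + r" \\")
    cols = "l" * len(header)
    return "\\begin{tabular}{" + cols + "}\n" + "\n".join(body) + "\n\\end{tabular}"


def print_results(rows: list[ResultRow]) -> None:
    """Pretty-print results to the terminal with rich."""
    table = Table(title="Benchmark results")
    for f in fields(ResultRow):
        table.add_column(f.name)
    for row in rows:
        table.add_row(*(str(getattr(row, f.name)) for f in fields(ResultRow)))
    Console().print(table)


if __name__ == "__main__":
    # Placeholder rows, only to exercise the formatters.
    rows = [ResultRow("NMF", 0.41, 0.72), ResultRow("Top2Vec", 0.38, 0.81)]
    print_results(rows)
    print(results_to_markdown(rows))
    print(results_to_latex(rows))
```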

## Usage:

```bash
# Install from PyPI
pip install topic-benchmark

# Run the benchmark with a given sentence-transformers embedding model,
# e.g. -e "all-MiniLM-L6-v2"
python3 -m topic_benchmark run -e "embedding_model_name"
```

