Metadata-Version: 2.1
Name: tm-eval
Version: 0.0.1a0
Summary: A toolkit to quickly evaluate topic model goodness over number of topics
Home-page: https://github.com/dhchenx/tm-eval
Author: Donghua Chen
Author-email: douglaschan@126.com
License: MIT
Project-URL: Bug Reports, https://github.com/dhchenx/tm-eval/issues
Project-URL: Source, https://github.com/dhchenx/tm-eval
Keywords: topic modeling,metrics
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Build Tools
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3 :: Only
Requires-Python: >=3.6, <4
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: gensim
Provides-Extra: dev
Requires-Dist: check-manifest ; extra == 'dev'
Provides-Extra: test
Requires-Dist: coverage ; extra == 'test'

## Topic Modeling Evaluation
A toolkit to quickly evaluate how the goodness of a topic model changes with the number of topics.

### Metrics
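Topic coherence is computed with the coherence measures supported by gensim's `CoherenceModel` (`'u_mass'`, `'c_v'`, `'c_uci'`, `'c_npmi'`):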

- The fastest methods are `'u_mass'` and `'c_uci'` (also known as `c_pmi`).

- For `'u_mass'`, a `corpus` should be provided; if `texts` is provided instead, it is converted to a corpus using the dictionary.

- For `'c_v'`, `'c_uci'` and `'c_npmi'`, `texts` should be provided (`corpus` isn't needed).

### Examples

Example 1: estimate metrics for one topic model with a specific number of topics
```python
from tm_eval import evaluate_all_metrics_from_lda_model

if __name__ == "__main__":
    # Input: a pickled dict mapping each document id (key) to its terms separated by ',' (value)
    input_file = "datasets/covid19_symptoms.pickle"
    output_folder = "outputs"
    model_name = "symptom"
    num_topics = 10
    # Evaluate all metrics for an LDA model with `num_topics` topics
    results = evaluate_all_metrics_from_lda_model(input_file=input_file,
                                                  output_folder=output_folder,
                                                  model_name=model_name,
                                                  num_topics=num_topics)
    print(results)
```
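Both examples expect `input_file` to point at a pickled Python dict of documents. Going by the comment in the code above, a minimal sketch for building such a file from your own data could look like the following. It assumes each value is a single string of terms separated by ',' (check this against `datasets/covid19_symptoms.pickle` before relying on it); the documents and the file name `my_corpus.pickle` are hypothetical.

```python
import os
import pickle

# Hypothetical documents: document id -> terms separated by ','
# (assumed input format, not confirmed by the tm-eval docs)
docs = {
    "doc1": "fever,cough,fatigue",
    "doc2": "cough,sore throat,fever",
    "doc3": "headache,fatigue,nausea",
}

os.makedirs("datasets", exist_ok=True)
input_file = os.path.join("datasets", "my_corpus.pickle")  # hypothetical file name

with open(input_file, "wb") as f:
    pickle.dump(docs, f)

# Round-trip check: reload the dict and recover each document's term list
with open(input_file, "rb") as f:
    loaded = pickle.load(f)
term_lists = {doc_id: terms.split(",") for doc_id, terms in loaded.items()}
print(term_lists["doc2"])
```

The resulting path can then be passed as `input_file` in the examples.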
Example 2: track how model goodness changes over the number of topics
```python
from tm_eval import explore_topic_model_metrics, show_topic_model_metric_change

if __name__ == "__main__":
    # start configuration
    # Input: a pickled dict mapping each document id (key) to its terms separated by ',' (value)
    input_file = "datasets/covid19_symptoms.pickle"
    output_folder = "outputs"
    model_name = "symptom"
    start = 2   # lower bound of the number-of-topics range
    end = 20    # upper bound of the number-of-topics range
    # end configuration
    # run and explore
    list_results = explore_topic_model_metrics(input_file=input_file,
                                               output_folder=output_folder,
                                               model_name=model_name,
                                               start=start,
                                               end=end)
    # summarize how the metrics change and save them as CSV
    show_topic_model_metric_change(list_results, save=True,
                                   save_path=f"{output_folder}/metrics.csv")
```

### License

The `tm-eval` toolkit is provided by [Donghua Chen](https://github.com/dhchenx) under the MIT License.


