Metadata-Version: 2.0
Name: tmtoolkit
Version: 0.1.2
Summary: Text Mining and Topic Modeling Toolkit
Home-page: https://github.com/WZBSocialScienceCenter/tmtoolkit
Author: Markus Konrad
Author-email: markus.konrad@wzb.eu
License: Apache 2.0
Description-Content-Type: UNKNOWN
Keywords: textmining textanalysis text mining analysis preprocessing topicmodeling topic modeling evaluation
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Utilities
Requires-Python: >=2.7
Requires-Dist: six
Requires-Dist: numpy
Requires-Dist: scipy
Requires-Dist: pandas
Requires-Dist: nltk
Requires-Dist: pyphen
Provides-Extra: excel_export
Requires-Dist: openpyxl; extra == 'excel_export'
Provides-Extra: improved_german_lemmatization
Requires-Dist: pattern; extra == 'improved_german_lemmatization'
Provides-Extra: plotting
Requires-Dist: matplotlib; extra == 'plotting'
Provides-Extra: topic_modeling_eval_griffiths_2004
Requires-Dist: gmpy2; extra == 'topic_modeling_eval_griffiths_2004'
Provides-Extra: topic_modeling_gensim
Requires-Dist: gensim; extra == 'topic_modeling_gensim'
Provides-Extra: topic_modeling_lda
Requires-Dist: lda; extra == 'topic_modeling_lda'
Provides-Extra: topic_modeling_sklearn
Requires-Dist: scikit-learn; extra == 'topic_modeling_sklearn'

tmtoolkit is a set of tools for text mining and topic modeling with Python. It contains
functions for text preprocessing such as lemmatization, stemming and POS tagging, especially for English and German
texts. Preprocessing is done in parallel, using all available processors on your machine. The topic modeling
features include topic model evaluation metrics, which let you compute models with different parameters in parallel
and compare them (e.g. to find the best number of topics for a given set of documents). Topic models can
be generated in parallel for different corpora and/or parameter sets using the LDA implementations from
lda, scikit-learn or gensim.
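
The idea of evaluating several parameter sets concurrently and keeping the best one can be sketched with the
standard library alone. This is only an illustration of the pattern and does not use tmtoolkit's own API: the
names evaluate_model and best_number_of_topics are hypothetical, and the score is a toy stand-in for a real
evaluation metric computed from a fitted model. Threads are used to keep the sketch simple; for CPU-bound model
fitting, ProcessPoolExecutor from the same module offers the same map interface.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_model(n_topics):
    """Hypothetical stand-in for fitting a topic model and scoring it.

    Returns (n_topics, score). The toy score peaks at 20 topics; a real
    metric would be computed from a model fitted to the documents.
    """
    return n_topics, -abs(n_topics - 20)


def best_number_of_topics(candidates, max_workers=4):
    """Score all candidate topic counts concurrently and pick the best."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves the order of the candidates in its results
        results = list(pool.map(evaluate_model, candidates))
    scores = dict(results)
    # Higher score is better in this sketch
    best = max(scores, key=scores.get)
    return best, scores


best, scores = best_number_of_topics([5, 10, 20, 40])
print(best, scores)
```

With a real metric, only evaluate_model would need to change; the surrounding sweep stays the same.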

