Metadata-Version: 2.1
Name: text-preprocessing
Version: 0.1.1
Summary: A python package for text preprocessing task in natural language processing
Home-page: https://github.com/berknology/text-preprocessing
Author: He Hao
Author-email: berknology@gmail.com
License: BSD license
Keywords: NLP
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
License-File: LICENSE
Requires-Dist: nltk
Requires-Dist: pyspellchecker
Requires-Dist: contractions
Requires-Dist: names-dataset (==2.1)
Requires-Dist: unittest-xml-reporting

==================================================
Text preprocessing for Natural Language Processing
==================================================

A python package for text preprocessing task in natural language processing.

Usage
-----
To use this text preprocessing package, first install it using pip:

.. code-block:: python

    pip install text-preprocessing


Then, import the package in your python script and call appropriate functions:

.. code-block:: python

    from text_preprocessing import preprocess_text
    from text_preprocessing import to_lower, remove_email, remove_url, remove_punctuation, lemmatize_word

    # Preprocess text using default preprocess functions in the pipeline
    text_to_process = 'Helllo, I am John Doe!!! My email is john.doe@email.com. Visit our website www.johndoe.com'
    preprocessed_text = preprocess_text(text_to_process)
    print(preprocessed_text)
    # output: hello email visit website

    # Preprocess text using custom preprocess functions in the pipeline
    preprocess_functions = [to_lower, remove_email, remove_url, remove_punctuation, lemmatize_word]
    preprocessed_text = preprocess_text(text_to_process, preprocess_functions)
    print(preprocessed_text)
    # output: helllo i am john doe my email is visit our website


Features
--------

.. csv-table::
   :header: "Feature", "Function"
   :widths: 50, 35

    "convert to lower case", "to_lower"
    "convert to upper case", "to_upper"
    "keep only alphabetic and numerical characters", "keep_alpha_numeric"
    "check and correct spellings", "check_spelling"
    "expand contractions", "expand_contraction"
    "remove URLs", "remove_url"
    "remove names", "remove_name"
    "remove emails", "remove_email"
    "remove phone numbers", "remove_phone_number"
    "remove SSNs", "remove_ssn"
    "remove credit card numbers", "remove_credit_card_number"
    "remove numbers", "remove_number"
    "remove bullets and numbering", "remove_itemized_bullet_and_numbering"
    "remove special characters", "remove_special_character"
    "remove punctuations", "remove_punctuation"
    "remove extra whitespace", "remove_whitespace"
    "normalize unicode (e.g., café -> cafe)", "normalize_unicode"
    "remove stop words", "remove_stopword"
    "tokenize words", "tokenize_word"
    "tokenize sentences", "tokenize_sentence"
    "substitute custom words (e.g., vs -> versus)", "substitute_token"
    "stem words", "stem_word"
    "lemmatize words", "lemmatize_word"
    "preprocess text through a sequence of preprocessing functions", "preprocess_text"
