Metadata-Version: 1.1
Name: text_preprocessing
Version: 0.0.8
Summary: A python package for text preprocessing task in natural language processing
Home-page: https://github.com/berknology/text-preprocessing
Author: He Hao
Author-email: berknology@gmail.com
License: BSD license
Description: ==================================================
        Text preprocessing for Natural Language Processing
        ==================================================
        
        A python package for text preprocessing task in natural language processing.
        
        Usage
        -----
        To use this text preprocessing package, first install it using pip:
        
        .. code-block:: python
        
            pip install text-preprocessing
        
        
        Then, import the package in your python script and call appropriate functions:
        
        .. code-block:: python
        
            from text_preprocessing import preprocess_text
            from text_preprocessing import to_lower, remove_email, remove_url, remove_punctuation, lemmatize_word
        
            # Preprocess text using default preprocess functions in the pipeline
            text_to_process = 'Helllo, I am John Doe!!! My email is john.doe@email.com. Visit our website www.johndoe.com'
            preprocessed_text = preprocess_text(text_to_process)
            print(preprocessed_text)
            # output: hello email visit website
        
            # Preprocess text using custom preprocess functions in the pipeline
            preprocess_functions = [to_lower, remove_email, remove_url, remove_punctuation, lemmatize_word]
            preprocessed_text = preprocess_text(text_to_process, preprocess_functions)
            print(preprocessed_text)
            # output: helllo i am john doe my email is visit our website
        
        
        Features
        --------
        
        .. csv-table::
           :header: "Feature", "Function"
           :widths: 50, 35
        
            "convert to lower case", "to_lower"
            "convert to upper case", "to_upper"
            "keep only alphabetic and numerical characters", "keep_alpha_numeric"
            "check and correct spellings", "check_spelling"
            "expand contractions", "expand_contraction"
            "remove URLs", "remove_url"
            "remove names", "remove_name"
            "remove emails", "remove_email"
            "remove phone numbers", "remove_phone_number"
            "remove SSNs", "remove_ssn"
            "remove credit card numbers", "remove_credit_card_number"
            "remove numbers", "remove_number"
            "remove bullets and numbering", "remove_itemized_bullet_and_numbering"
            "remove special characters", "remove_special_character"
            "remove punctuations", "remove_punctuation"
            "remove extra whitespace", "remove_whitespace"
            "normalize unicode (e.g., café -> cafe)", "normalize_unicode"
            "remove stop words", "remove_stopword"
            "tokenize words", "tokenize_word"
            "tokenize sentences", "tokenize_sentence"
            "substitute custom words (e.g., vs -> versus)", "substitute_token"
            "stem words", "stem_word"
            "lemmatize words", "lemmatize_word"
            "preprocess text through a sequence of preprocessing functions", "preprocess_text"
Keywords: NLP
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
