Metadata-Version: 1.2
Name: preprocessing
Version: 0.1.13
Summary: pre-processing package for text strings
Home-page: https://github.com/SpotlightData/preprocessing
Author: Mitchell Murphy
Author-email: mwtmurphy@gmail.com
License: MIT
Description-Content-Type: UNKNOWN
Description: .. image:: logo.png
           :align: center
           :alt: Spotlight Data Logo
        
        preprocessing
        =============
        .. image:: https://readthedocs.org/projects/preprocessing/badge/?version=latest
           :target: http://preprocessing.readthedocs.io/en/latest/?badge=latest
           :alt: Documentation Status
        
        Summary
        -------
        
        Text pre-processing package to aid NLP package development for Python 3. With this package you 
        can apply text-cleaning functions in whatever order you prefer, rather than relying on the 
        fixed order of an arbitrary NLP package.
        
        Installation
        ------------
        
        pip:
        
        .. code-block:: console
        
           pip install preprocessing
        
        Alternatively, you can download the source distribution from PyPI:
        
        `https://pypi.python.org/pypi/preprocessing/ 
        <https://pypi.python.org/pypi/preprocessing/>`_
        
        You can then install *preprocessing* from the downloaded tar file:
        
        .. code-block:: console
        
           pip install <path_to_tar_file>
        
        or, from inside the extracted package directory:
        
        .. code-block:: console
        
           python setup.py install
        
        Example
        -------
        
        Once the package is installed, using it in Python 3 takes the following form:
        
        .. code-block:: python
        
           import preprocessing.text as ptext
           from preprocessing.text import keyword_tokenize, remove_unbound_punct, remove_urls
        
           text_string = "important string at: http://example.com"
        
           clean_string = ptext.preprocess_text(text_string, [
               remove_urls,
               remove_unbound_punct,
               keyword_tokenize
           ])
        
        >>> print(clean_string)
        "important string"
        
        Should the functions be performed in a different order (e.g. keyword_tokenize -> remove_urls -> 
        remove_unbound_punct), the output changes:
        
        >>> print(clean_string)
        "important string http example.com"
        
        Organisation
        ------------
        
        This package comprises a single module, with no subpackages currently planned. The 
        *preprocessing* package depends on NLTK for its tokenizers and stopwords; beyond that, it 
        relies only on the Python 3 standard library.
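        
        Since the package draws on NLTK data, you may need to fetch those files once after 
        installation. The exact corpora required are an assumption here -- check the NLTK 
        documentation for details:
        
        .. code-block:: console
        
           python -m nltk.downloader punkt stopwords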
        
        Contributing
        ------------
        
        If you feel like contributing:
        
        * `Check for open issues <https://github.com/SpotlightData/preprocessing/issues>`_ or open a new issue
        * Fork the preprocessing repository to start making your changes
        * Write a test which shows the bug was fixed or that the feature works as expected
        * Send a pull request and remember to add yourself to `CONTRIBUTORS.md <https://github.com/SpotlightData/preprocessing/blob/master/CONTRIBUTORS.md>`_
        
        License
        -------
        
        This project is licensed under the MIT license (see `LICENSE <https://github.com/SpotlightData/preprocessing/blob/master/LICENSE>`_)
Keywords: text pre-processing
Platform: UNKNOWN
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Requires-Python: >=3
