Metadata-Version: 1.1
Name: textcleaner
Version: 0.4.9
Summary: textcleaner: text-data pre-processing library
Home-page: https://github.com/YugantM
Author: Yugant Hadiyal
Author-email: hadiyalyugant@gmail.com
License: MIT
Download-URL: https://github.com/YugantM/textcleaner/archive/V0.1.tar.gz
Description-Content-Type: UNKNOWN
Description: textcleaner V0.4.9
        ==================
        
        Text-Cleaner is a utility library for text-data pre-processing. Use it
        before passing the text data to a model.
        
        `website <https://yugantm.github.io/textcleaner/>`__ - for more
        
        Features!
        =========
        
        -  main\_cleaner to do all the below in one call ! or
        -  remove unnecessary blank lines
        -  stip out a perticular character or default one
        -  transfer all characters to lowercase if needed
        -  remove numbers, symblos and stop-words from the whole text
        -  tokenize the text-data on one call
        -  stemming & lemmatization powered by NLTK
        
            The goal is to make basic cleaning of data hassle free. Most of the
            developers who are working with text data have faced this situation
            where data is not consumable and they end up wasting their time on
            these issues rather than fine tunning the model and get better
            accuracy. In that scenario this library can be useful and save you a
            tone of time.
        
        Tech
        ~~~~
        
        textcleaner uses a number of open source projects to work properly:
        
        -  `NLTK <https://www.nltk.org/>`__ - for advanced cleaning
        -  `REGEX <https://pypi.org/project/regex/>`__ - for regular expression
        
        And of course textcleaner itself is open source with a `public
        repository <https://github.com/YugantM/textcleaner>`__ on GitHub.
        
        Installation
        ~~~~~~~~~~~~
        
        textcleaner requires `Python 3.x <https://www.python.org/downloads/>`__
        to run.
        
        Install the dependencies if you have not already installed it!
        
        -  NLTK : steps to install
           [`documentation <https://www.nltk.org/install.html>`__\ ]
        -  REGEX :
        
        .. code:: sh
        
            pip install regex
        
        -  textcleaner :
        
        .. code:: sh
        
            pip install textcleaner
        
        or
        
        .. code:: sh
        
            pip install textcleaner==0.4.9
        
        Usage
        ~~~~~
        
        .. code:: python
        
            import textcleaner as tc
            tc.main_cleaner('<FILE_NAME>')
            #or
            tc.document('<FILE_NAME>')
        
        Above command will convert the text file into list of words with
        cleaning. Default response of the function is list of list use *op*
        argument and set it to 'words' and you will get a flat list of words.
        
        Todos
        ~~~~~
        
        -  more advanced features
        -  ability to read more formats rather than only .txt
        
        License
        -------
        
        MIT
        
        **Free Software, Hell Yeah!**
        
Keywords: MEANINGFULL,NLP,DATASCIENCE,DATACLEANING,PREPROCESSING
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Build Tools
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
