Metadata-Version: 2.1
Name: bonltk
Version: 0.0.1
Summary: BoNLTK aims to provide out-of-the-box support for various NLP tasks that an application developer might need for the Boyig (Tibetan) language.
Home-page: https://github.com/Esukhia/bonltk
Author: Tenzin
Author-email: 10zin@esukhia.org
License: Apache Software License 2.0
Description: # Natural Language Toolkit for Bokey (BoNLTK)
        > BoNLTK aims to provide out-of-the-box support for various NLP tasks that an application developer might need for Bokey, the Tibetan language.
        
        [![](https://sourcerer.io/fame/10zinten/10zinten/bonltk/images/0)](https://sourcerer.io/fame/10zinten/10zinten/bonltk/links/0)[![](https://sourcerer.io/fame/10zinten/10zinten/bonltk/images/1)](https://sourcerer.io/fame/10zinten/10zinten/bonltk/links/1)[![](https://sourcerer.io/fame/10zinten/10zinten/bonltk/images/2)](https://sourcerer.io/fame/10zinten/10zinten/bonltk/links/2)[![](https://sourcerer.io/fame/10zinten/10zinten/bonltk/images/3)](https://sourcerer.io/fame/10zinten/10zinten/bonltk/links/3)[![](https://sourcerer.io/fame/10zinten/10zinten/bonltk/images/4)](https://sourcerer.io/fame/10zinten/10zinten/bonltk/links/4)[![](https://sourcerer.io/fame/10zinten/10zinten/bonltk/images/5)](https://sourcerer.io/fame/10zinten/10zinten/bonltk/links/5)[![](https://sourcerer.io/fame/10zinten/10zinten/bonltk/images/6)](https://sourcerer.io/fame/10zinten/10zinten/bonltk/links/6)[![](https://sourcerer.io/fame/10zinten/10zinten/bonltk/images/7)](https://sourcerer.io/fame/10zinten/10zinten/bonltk/links/7)
        
        ## Install
        
        `pip install bonltk`
        
        ## How to use
        
        Coming soon
        
        ## Todo:
         - Tokenizers:
            - [ ] Hugging Face [tokenizers](https://github.com/huggingface/tokenizers/tree/master/bindings/python)
            - [x] [sentencepiece tokenizer](https://github.com/google/sentencepiece/tree/master/python)
            - [ ] Compare the above tokenizers with [botok](https://github.com/esukhia/botok)
         - WordVectors:
            - [x] Word2Vec with [gensim](https://github.com/RaRe-Technologies/gensim)
            - [ ] ELMo
         - Language Models:
            - [ ] Hugging Face [transformers](https://github.com/huggingface/transformers)
            - [ ] ULMFiT language model with [fastai](https://forums.fast.ai/t/language-model-zoo-gorilla/14623)
        
         - Text Similarity:
            - [ ] Sentence similarity using ULMFiT, as in [inltk](https://github.com/goru001/inltk/blob/e6baa7f03164e977da899548a5c6e42a2a60db77/inltk/inltk.py#L120)
            - [ ] Implement the text similarity techniques mentioned [here](https://medium.com/@adriensieg/text-similarities-da019229c894)
            - [ ] Compare all the text similarity algorithms
        
        ### Resource links:
        - [ULMFiT for sequence tagging](https://forums.fast.ai/t/ulmfit-for-sequence-tagging/20328)
        - [Text Similarities : Estimate the degree of similarity between two texts](https://medium.com/@adriensieg/text-similarities-da019229c894)
        
Keywords: nlp lstm pytorch fastai language-model
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.6
Description-Content-Type: text/markdown
