Metadata-Version: 2.1
Name: randomlib
Version: 4.5
Summary: An NLP Library for Marathi Language
Home-page: https://github.com/l3cube-pune/MarathiNLP.git
Author: Random
Author-email: random.randomlib@gmail.com
License: MIT
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
License-File: LICENSE

# **randomlib**

  

- **randomlib** is a Python-based natural language processing library focused on the Indian language **Marathi**. It provides an easy interface for NLP features such as sentiment analysis, named entity recognition, and hate speech detection, exclusively for Marathi text.

- **randomlib**, the author of this library, aims to bring Marathi to the forefront of IndicNLP. Our vision is to make Marathi a resource-rich language and promote AI for Maharashtra!

- [Github Repo](https://github.com/randomlib-pune/MarathiNLP)
- [Demonstration with examples](https://cutt.ly/f1FYQak)
  

## **Features:**

##### **This library is designed to be used by both basic programmers and ML practitioners.**

***

#### **1. Basic Usage:**

This mode of access is designed from a basic programmer's point of view and follows a simpler way of performing the desired tasks. It provides the following features:

- **Datasets:** Provides the functionality to load the dataset

- **Autocomplete:** Text prediction

- **Preprocess:** Data cleaning

- **Tokenizer:** Tokenizes text

- **Tagger:** Named entity recognition

- **MaskFill:** Predicts the masked tokens

- **Hate:** Detects hate speech

- **Sentiment:** Sentiment analysis

- **Similarity:** Detects similarity

  
  

#### **2. Advanced Usage:**

This way of accessing the library is designed from an ML Practitioner's point of view and has more flexibility to choose a model for the desired task.

* **MaskFill Model:** Predicts the masked tokens

* **GPT Model:** Text prediction

* **Hate Model:** Detects hate speech

* **NER Model:** Named entity recognition

* **Sentiment Model:** Sentiment analysis

* **Similarity Model:** Detects similarity

  

Some of the models mentioned above have sub-models, which can be listed using the **listModels()** function.

  

## **Installation:**

- **pip install randomlib==[version]**
*E.g.: pip install randomlib==0.6*

- or simply use:
***pip install randomlib***
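
To confirm from Python that the install succeeded, one option is to query the package metadata with the standard library. This is a minimal sketch: `installed_version` is a hypothetical helper name, not part of randomlib, and it assumes Python 3.8 or newer.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("randomlib")
if version is None:
    print("randomlib is not installed; run: pip install randomlib")
else:
    print(f"randomlib {version} is installed")
```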
  
***

## **A Few Examples:**

### **1. MaskFill (from a basic usage point of view)**

Stepwise execution:

- import
from randomlib.mask_fill import MaskPredictor

- create an object
model = MaskPredictor()

It provides one functionality
* **predict_mask:** Predicts the masked token

- **Example:**
* *pass the string with the word to be predicted replaced with '[MASK]':*
**text = 'मी महाराष्ट्रात [MASK].'**
*English Translation:
'I in Maharashtra [MASK]'*
* **model.predict_mask(text)**

* The output will contain some predictions like:
	* मी महाराष्ट्रात **आहे**.
	* मी महाराष्ट्रात **राहणार**.
	* मी महाराष्ट्रात **नाही**.
	* मी महाराष्ट्रात**च**.
	* मी महाराष्ट्रात **राहतो**.

* There are some optional parameters:
	-   **details**  (minimum, medium, all) in string - Default: minimum
	    -   Sets the level of detail included in the output
	-   **as_dict**  (True, False) in boolean - Default: False
	    -   Sets the output format; True gives the dictionary form shown in the example below

* Example:
	- model.predict_mask(text, 'all', True)
	- Output:
	[{'score': 0.46560075879096985, 'token': 1155, 'token_str': 'आहे', 'sequence': 'मी महाराष्ट्रात आहे.'},
	{'score': 0.07969045639038086, 'token': 92222, 'token_str': 'राहणार', 'sequence': 'मी महाराष्ट्रात राहणार.'},
	{'score': 0.07400081306695938, 'token': 1826, 'token_str': 'नाही', 'sequence': 'मी महाराष्ट्रात नाही.'},
	{'score': 0.050422605127096176, 'token': 1617, 'token_str': '##च', 'sequence': 'मी महाराष्ट्रातच.'},
	{'score': 0.04373728483915329, 'token': 62560, 'token_str': 'राहतो', 'sequence': 'मी महाराष्ट्रात राहतो.'}]
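
Because the detailed output is a list of dictionaries, it can be filtered with plain Python. The sketch below is illustrative only: the sample data is hand-copied from the output above (scores rounded), and `confident_sequences` and its `min_score` threshold are hypothetical names, not part of randomlib.

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[float, int, str]]

# Sample copied from the documented output above, with scores rounded.
predictions: List[Prediction] = [
    {'score': 0.4656, 'token': 1155, 'token_str': 'आहे', 'sequence': 'मी महाराष्ट्रात आहे.'},
    {'score': 0.0797, 'token': 92222, 'token_str': 'राहणार', 'sequence': 'मी महाराष्ट्रात राहणार.'},
    {'score': 0.0740, 'token': 1826, 'token_str': 'नाही', 'sequence': 'मी महाराष्ट्रात नाही.'},
    {'score': 0.0504, 'token': 1617, 'token_str': '##च', 'sequence': 'मी महाराष्ट्रातच.'},
    {'score': 0.0437, 'token': 62560, 'token_str': 'राहतो', 'sequence': 'मी महाराष्ट्रात राहतो.'},
]


def confident_sequences(preds: List[Prediction], min_score: float = 0.07) -> List[str]:
    """Return completed sentences scoring at least `min_score`, best first."""
    ranked = sorted(preds, key=lambda p: p['score'], reverse=True)
    return [str(p['sequence']) for p in ranked if p['score'] >= min_score]


top = confident_sequences(predictions)
for sentence in top:
    print(sentence)
```

The threshold of 0.07 is arbitrary; a suitable value depends on the model and the task.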

### **2. Sentiment (from an advanced usage point of view)**

Stepwise execution:

- import
from randomlib.model_repo import SentimentModel

- create an object
modelSentiment = SentimentModel()

- list the available models
	* modelSentiment.list_models()
	* Output:
		- sentiment models: MarathiSentiment : randomlib-pune/MarathiSentiment
		- tagger models: marathi-ner : randomlib-pune/marathi-ner
		- autocomplete models: marathi-gpt : randomlib-pune/marathi-gpt
		- similarity models: marathi-sentence-similarity-sbert : randomlib-pune/marathi-sentence-similarity-sbert
		marathi-sentence-bert-nli : randomlib-pune/marathi-sentence-bert-nli
		- mask_fill models: marathi-bert-v2 : randomlib-pune/marathi-bert-v2
		marathi-roberta : randomlib-pune/marathi-roberta marathi-albert : randomlib-pune/marathi-albert
		- hate models: mahahate-bert : randomlib-pune/mahahate-bert
		mahahate-multi-roberta : randomlib-pune/mahahate-multi-roberta

The library lists the models available for every task, not only for sentiment. The default model for a task can be changed by the user.

 **To change the default model:**
Pass the name of the model as the argument:
modelSentiment = SentimentModel('name of model')
Eg.: modelSentiment = SentimentModel('MarathiSentiment')
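
Before passing a name to a model constructor, it can help to check it against the listing. The sketch below hand-copies the listing shown above into a dictionary; `AVAILABLE_MODELS` and `resolve_model` are hypothetical names for illustration and are not part of randomlib.

```python
from typing import Dict

# Task -> {short model name -> full repository name}, copied from the listing above.
AVAILABLE_MODELS: Dict[str, Dict[str, str]] = {
    'sentiment': {'MarathiSentiment': 'randomlib-pune/MarathiSentiment'},
    'tagger': {'marathi-ner': 'randomlib-pune/marathi-ner'},
    'autocomplete': {'marathi-gpt': 'randomlib-pune/marathi-gpt'},
    'similarity': {
        'marathi-sentence-similarity-sbert': 'randomlib-pune/marathi-sentence-similarity-sbert',
        'marathi-sentence-bert-nli': 'randomlib-pune/marathi-sentence-bert-nli',
    },
    'mask_fill': {
        'marathi-bert-v2': 'randomlib-pune/marathi-bert-v2',
        'marathi-roberta': 'randomlib-pune/marathi-roberta',
        'marathi-albert': 'randomlib-pune/marathi-albert',
    },
    'hate': {
        'mahahate-bert': 'randomlib-pune/mahahate-bert',
        'mahahate-multi-roberta': 'randomlib-pune/mahahate-multi-roberta',
    },
}


def resolve_model(task: str, name: str) -> str:
    """Return the full repository name for `name`, or raise ValueError with the valid options."""
    if task not in AVAILABLE_MODELS:
        raise ValueError(f"unknown task {task!r}; choose from {sorted(AVAILABLE_MODELS)}")
    models = AVAILABLE_MODELS[task]
    if name not in models:
        raise ValueError(f"unknown {task} model {name!r}; choose from {sorted(models)}")
    return models[name]


print(resolve_model('sentiment', 'MarathiSentiment'))
```

A name that passes this check could then be given to the constructor, as in the SentimentModel('MarathiSentiment') example above.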

* Sentiment provides one functionality
	- **get_polarity_score:**  Gives the polarity label of a sentence (Neutral, Positive, or Negative) along with its score
	- Example:
	text = 'दिवाळीच्या सणादरम्यान सगळे आनंदी असतात.'
	*English Translation:
	'Everyone is happy during Diwali festival.'*
	- modelSentiment.get_polarity_score(text)
	- Output:
	label: Positive
	score: 0.995338
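
If the result is only available as printed text in the form shown above, it can be turned back into values with a few lines of Python. This is a sketch under that assumption; `parse_polarity` is a hypothetical helper, not part of randomlib, and the return type of get_polarity_score itself is not specified here.

```python
from typing import Tuple


def parse_polarity(printed: str) -> Tuple[str, float]:
    """Parse text of the form 'label: <name>' and 'score: <number>' into a (label, score) pair."""
    fields = {}
    for line in printed.splitlines():
        key, sep, value = line.partition(':')
        if sep:  # skip lines that contain no colon
            fields[key.strip()] = value.strip()
    return fields['label'], float(fields['score'])


label, score = parse_polarity("label: Positive\nscore: 0.995338")
print(label, score)
```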

***

**The entire working of randomlib is explained in this [demo file](https://cutt.ly/f1FYQak). Please have a look at it to get a better idea!**

Thank you
Team randomlib

***
