Metadata-Version: 2.1
Name: outlier-python-souravdlboy
Version: 1.0
Summary: Python package for Outlier Removal Algorithm using z_score or iqr.
Home-page: https://github.com/souravs17031999/outlier-python
Author: sourav kumar
Author-email: sauravkumarsct@gmail.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.2
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.0
Description-Content-Type: text/markdown

# outlier-python

# Package Description :
Python package for Outlier Removal Algorithm using z_score or iqr.   
# Motivation :   
This is a part of project - II made for UCS633 - Data analytics and visualization at TIET.     
@Author : Sourav Kumar    
@Roll no. : 101883068    
# Algorithm :       
* Z-SCORE : If the population mean and population standard deviation are known, the standard score of a raw score x is calculated as:     
z = (x - mean) / std.          
mean : is the mean of the sample.     
std : is the standard deviation of the sample.    

* Interquartile range : interquartile range (IQR), also called the midspread, middle 50%, or Hâ€‘spread, is a measure of statistical dispersion, being equal to the difference between 75th and 25th percentiles, or between upper and lower quartiles.     
IQR = Q3 âˆ’  Q1       
The IQR of a set of values is calculated as the difference between the upper and lower quartiles, Q3 and Q1. Each quartile is a median calculated as follows :     
Given an even 2n or odd 2n+1 number of values.      
first quartile Q1 = median of the n smallest values          
third quartile Q3 = median of the n largest values       
The second quartile Q2 is the same as the ordinary median.        

### Getting started Locally :  
> Run On Terminal       
```python -m outlier.outlier inputFilePath outputFilePath z_score```     
or
```python -m outlier.outlier inputFilePath outputFilePath iqr```       
ex. python -m outlier outlier C:/Users/DELL/Desktop/train.csv C:/Users/DELL/Desktop/output.csv z_score     

> Run In IDLE   
```from outlier import outlier```   
```o = outlier.outlier(inputFilePath, outputFilePath)```     
```o.outlier_main('z_score')```
or    
```o.outlier_main('iqr')```     

> Run on Jupyter   
Open terminal (cmd)   
```jupyter notebook```   
Create a new python3 file.     
```from outlier import outlier```   
```o = outlier.outlier(inputFilePath, outputFilePath)```
```o.outlier_main('z_score')```
or    
```o.outlier_main('iqr')```       

* NOTE : ```outlier_main()``` doesn't necessarily require any ```method``` argument , if no argument is provided, it uses ```z_score``` by default as the algorithm for removal of outliers from the dataset.    
* The algorithm only reports missing data containing columns and not handles them, it assumes that it has been handled already.   
Also in case of z-score method, it will not affect much, but it may be possible to give wrong output in case of IQR if missing values are found.    
### OUTPUT :
Removes all the valid rows contaning outlier values from the dataset and prints the number of rows removed along with the columns which were considered for the algorithm.    
Also , the final dataframe will be written to the output file path you provided.

![output result on jupyter](/test_images/t.JPG)
![output result on idle](/test_images/t1.JPG)
![output result on cmd](/test_images/t2.JPG) 

# TESTING : 
* The package has been extensively tested on various datasets consisting varied types of expected and unexpected input data and any preprocessing , if required has been taken care of.



