Metadata-Version: 2.1
Name: werpy
Version: 2.1.2
Summary: A powerful yet lightweight Python package to calculate and analyze the Word Error Rate (WER).
Keywords: wer word error rate python python package stt speech-to-text levenshtein distance nlp metrics
Author-Email: Ross Armstrong <ross.armstrong@analyticsinmotion.com>
License: BSD 3-Clause License
        
        Copyright (c) 2023-2024, Analytics in Motion
        
        Redistribution and use in source and binary forms, with or without
        modification, are permitted provided that the following conditions are met:
        
        1. Redistributions of source code must retain the above copyright notice, this
           list of conditions and the following disclaimer.
        
        2. Redistributions in binary form must reproduce the above copyright notice,
           this list of conditions and the following disclaimer in the documentation
           and/or other materials provided with the distribution.
        
        3. Neither the name of the copyright holder nor the names of its
           contributors may be used to endorse or promote products derived from
           this software without specific prior written permission.
        
        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
        AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
        IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
        DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
        FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
        DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
        SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
        CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
        OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
        OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Cython
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: MacOS
Classifier: Operating System :: POSIX :: Linux
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development
Classifier: Topic :: Text Processing
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Mathematics
Project-URL: Repository, https://github.com/analyticsinmotion/werpy
Project-URL: Documentation, https://werpy.readthedocs.io/
Project-URL: Bug tracker, https://github.com/analyticsinmotion/werpy/issues
Requires-Python: >=3.8
Requires-Dist: numpy>=1.21.6; python_version < "3.11"
Requires-Dist: numpy>=1.23.2; python_version >= "3.11"
Requires-Dist: pandas>=1.3.0
Requires-Dist: sphinx==7.2.6; extra == "docs"
Requires-Dist: sphinx-nefertiti==0.3.2; extra == "docs"
Provides-Extra: docs
Description-Content-Type: text/markdown


![werpy-logo-word-error-rate](https://user-images.githubusercontent.com/52817125/235063664-2f21629c-0fad-46b6-a487-c2b5ef6f80e9.png)

<h1 align="center">Word Error Rate for Python
<a href="https://twitter.com/intent/tweet?text=Introducing%20%23werpy%20-%20the%20Python%20package%20for%20fast%20and%20accurate%20Word%20Error%20Rate%20(WER)%20calculation.%20Analyze%20text%20accuracy%2C%20enhance%20%23NLP%20models%2C%20and%20improve%20%23SpeechRecognition%20systems.%20Try%20it%20now%3A%20&url=https://github.com/analyticsinmotion/werpy&via=analyticsmotion&hashtags=PythonPackage,WordErrorRate,WER,NLP">
    <img src="https://img.shields.io/twitter/url/http/shields.io.svg?style=social" alt="Tweet">
  </a>
</h1>

<!-- badges: start -->

| | |
| --- | --- |
| Meta | [![Python Version](https://img.shields.io/badge/python-3.8%7C3.9%7C3.10%7C3.11%7C3.12-blue?logo=python&logoColor=ffdd54)](https://www.python.org/downloads/)&nbsp;&nbsp; [![werpy License](https://img.shields.io/badge/License-BSD_3--Clause-blue.svg)](https://github.com/analyticsinmotion/werpy/blob/main/LICENSE)&nbsp;&nbsp;<!-- ![Maintained](https://img.shields.io/badge/Maintained%3F-yes-green.svg)&nbsp;&nbsp;--> [![Black Code Style](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)&nbsp;&nbsp; [![Analytics in Motion](https://raw.githubusercontent.com/analyticsinmotion/.github/main/assets/images/analytics-in-motion-github-badge-rounded.svg)](https://www.analyticsinmotion.com) |
| Testing | [![CodeQL](https://github.com/analyticsinmotion/werpy/actions/workflows/codeql.yml/badge.svg)](https://github.com/analyticsinmotion/werpy/actions/workflows/codeql.yml)&nbsp;&nbsp; [![Codacy Security Scan](https://github.com/analyticsinmotion/werpy/actions/workflows/codacy.yml/badge.svg)](https://github.com/analyticsinmotion/werpy/actions/workflows/codacy.yml)&nbsp;&nbsp; [![CodeFactor](https://www.codefactor.io/repository/github/analyticsinmotion/werpy/badge)](https://www.codefactor.io/repository/github/analyticsinmotion/werpy)&nbsp;&nbsp; <!--[![Pylint](https://github.com/analyticsinmotion/werpy/actions/workflows/pylint.yml/badge.svg)](https://github.com/analyticsinmotion/werpy/actions/workflows/pylint.yml)&nbsp;&nbsp;--> <!--[![Bandit](https://github.com/analyticsinmotion/werpy/actions/workflows/bandit.yml/badge.svg)](https://github.com/analyticsinmotion/werpy/actions/workflows/bandit.yml)&nbsp;&nbsp;--> [![CircleCI](https://dl.circleci.com/status-badge/img/gh/analyticsinmotion/werpy/tree/main.svg?style=svg)](https://dl.circleci.com/status-badge/redirect/gh/analyticsinmotion/werpy/tree/main)&nbsp;&nbsp; [![codecov](https://codecov.io/gh/analyticsinmotion/werpy/graph/badge.svg?token=GGT823AVM8)](https://codecov.io/gh/analyticsinmotion/werpy) |
| Package | [![Pypi](https://img.shields.io/pypi/v/werpy?label=PyPI&color=blue)](https://pypi.org/project/werpy/)&nbsp;&nbsp; [![PyPI Downloads](https://img.shields.io/pypi/dm/werpy?label=PyPI%20downloads)](https://pypi.org/project/werpy/)&nbsp;&nbsp; [![Downloads](https://static.pepy.tech/badge/werpy)](https://pepy.tech/project/werpy)&nbsp;&nbsp;<!--[![sourcerank](https://img.shields.io/librariesio/sourcerank/pypi/werpy)](https://libraries.io/pypi/werpy)&nbsp;&nbsp;--> [![Documentation Status](https://readthedocs.org/projects/werpy/badge/?version=latest)](https://werpy.readthedocs.io/en/latest/?badge=latest)&nbsp;&nbsp; |


<!-- badges: end -->

## What is werpy?
**werpy** is a powerful yet lightweight Python package that rapidly calculates and analyzes the Word Error Rate (WER) between two sets of text. 
It has been designed with the flexibility to handle multiple input data types such as strings, lists and NumPy arrays.<br />

The package also includes a full set of features such as normalizing the input text to account for data collection variability and the capability to easily assign different weights/penalties to specific error classifications (insertions, deletions, and substitutions).
Additionally, the summary function provides a comprehensive breakdown of the calculated results to assist in analyzing the specific errors quickly and in more detail.
<br />

## Functions available in werpy
The following table provides an overview of the functions that can be used in werpy.

| Function  | Description | 
| ------------- | ------------- |
| normalize(text)  | Preprocess input text to remove punctuation, remove duplicated spaces, leading/trailing blanks and convert all words to lowercase. |
| wer(reference, hypothesis)  | Calculate the overall Word Error Rate for the entire reference and hypothesis texts. |
| wers(reference, hypothesis)  | Calculates a list of the Word Error Rates for each of the reference and hypothesis texts. |
| werp(reference, hypothesis)  | Calculates a weighted Word Error Rate for the entire reference and hypothesis texts. |
| werps(reference, hypothesis)  | Calculates a list of weighted Word Error Rates for each of the reference and hypothesis texts. |
| summary(reference, hypothesis)  | Provides a comprehensive breakdown of the calculated results including the WER, Levenshtein Distance and all the insertion, deletion and substitution errors. |
| summaryp(reference, hypothesis)  | Delivers an in-depth breakdown of the results, covering metrics like WER, Levenshtein Distance, and a detailed account of insertion, deletion, and substitution errors, inclusive of the weighted WER. |


## Installation
You can install the latest **werpy** release with Python's pip package manager:

```python
# Install werpy from PyPi
pip install werpy
```


## Usage
**Import the werpy package**

*Python Code:*
```python
import werpy
```
<br />

**Example 1 - Normalize a list of text**

*Python Code:*
```python
input_data = ["It's very popular in Antarctica.","The Sugar Bear character"]
reference = werpy.normalize(input_data)
print(reference)
```

*Results Output:*
```
['its very popular in antarctica', 'the sugar bear character']
```
<br />

**Example 2 - Calculate the overall Word Error Rate on a set of strings**

*Python Code:*
```python
wer = werpy.wer('i love cold pizza', 'i love pizza')
print(wer)
```

*Results Output:*
```
0.25
```
<br />

**Example 3 - Calculate the overall Word Error Rate on a set of lists**

*Python Code:*
```python
ref = ['i love cold pizza','the sugar bear character was popular']
hyp = ['i love pizza','the sugar bare character was popular']
wer = werpy.wer(ref, hyp)
print(wer)
```

*Results Output:*
```
0.2
```
<br />

**Example 4 - Calculate the Word Error Rates for each set of texts**

*Python Code:*
```python
ref = ['no one else could claim that','she cited multiple reasons why']
hyp = ['no one else could claim that','she sighted multiple reasons why']
wers = werpy.wers(ref, hyp)
print(wers)
```

*Results Output:*
```
[0.0, 0.2]
```
<br />

**Example 5 - Calculate the weighted Word Error Rates for the entire set of text**

*Python Code:*
```python
ref = ['it was beautiful and sunny today']
hyp = ['it was a beautiful and sunny day']
werp = werpy.werp(ref, hyp, insertions_weight=0.5, deletions_weight=0.5, substitutions_weight=1)
print(werp)
```

*Results Output:*
```
0.25
```
<br />

**Example 6 - Calculate a list of weighted Word Error Rates for each of the reference and hypothesis texts**

*Python Code:*
```python
ref = ['it blocked sight lines of central park', 'her father was an alderman in the city government']
hyp = ['it blocked sightlines of central park', 'our father was an elder man in the city government']
werps = werpy.werps(ref, hyp, insertions_weight = 0.5, deletions_weight = 0.5, substitutions_weight = 1)
print(werps)
```

*Results Output:*
```
[0.21428571428571427, 0.2777777777777778]
```
<br />

**Example 7 - Provide a complete breakdown of the Word Error Rate calculations for each of the reference and hypothesis texts**

*Python Code:*
```python
ref = ['it is consumed domestically and exported to other countries', 'rufino street in makati right inside the makati central business district', 'its estuary is considered to have abnormally low rates of dissolved oxygen', 'he later cited his first wife anita as the inspiration for the song', 'no one else could claim that']
hyp = ['it is consumed domestically and exported to other countries', 'rofino street in mccauti right inside the macasi central business district', 'its estiary is considered to have a normally low rates of dissolved oxygen', 'he later sighted his first wife anita as the inspiration for the song', 'no one else could claim that']
summary = werpy.summary(ref, hyp)
print(summary)
```

*Results Output:*
<!-- <img src=".github/assets/images/werpy-example-summary-results-word-error-rate-breakdown.png" width=100% height=100%> -->
<!-- <img src="https://github.com/analyticsinmotion/werpy/blob/main/.github/assets/images/werpy-example-summary-results-word-error-rate-breakdown.png" width=100% height=100%> -->
<!-- ![werpy summary DataFrame](.github/assets/images/werpy-example-summary-results-word-error-rate-breakdown.png)-->

![werpy-example-summary-results-word-error-rate-breakdown](https://user-images.githubusercontent.com/52817125/234950114-7efcce9b-7a76-4413-830f-7deda20cad75.png)

<br />

**Example 8 - Provide a complete breakdown of the Weighted Word Error Rate for each of the input texts**

*Python Code:*
```python
ref = ['the tower caused minor discontent because it blocked sight lines of central park', 'her father was an alderman in the city government', 'he was commonly referred to as the blacksmith of ballinalee']
hyp = ['the tower caused minor discontent because it blocked sightlines of central park', 'our father was an alderman in the city government', 'he was commonly referred to as the blacksmith of balen alley']
weighted_summary = werpy.summaryp(ref, hyp, insertions_weight = 0.5, deletions_weight = 0.5, substitutions_weight = 1)
print(weighted_summary)
```

*Results Output:*

![werpy-example-summaryp-results-word-error-rate-breakdown](https://github.com/analyticsinmotion/werpy/assets/52817125/e6678997-d797-4767-bd8a-c141331b034b)

<br />

## Dependencies
- <a href="https://www.numpy.org">NumPy</a> - Provides an assortment of routines for fast operations on arrays
- <a href="https://pandas.pydata.org/">Pandas</a> - Powerful data structures for data analysis, time series, and statistics

## Licensing

``werpy`` is released under the terms of the BSD 3-Clause License. Please refer to the <a href="https://github.com/analyticsinmotion/werpy/blob/main/LICENSE">LICENSE</a> file for full details.

This project also includes third-party packages distributed under the BSD-3-Clause license (NumPy, Pandas) and the Apache License 2.0 (Cython).

The full NumPy, Pandas and Cython licenses can be found in the <a href="https://github.com/analyticsinmotion/werpy/tree/main/LICENSES">LICENSES</a> directory in this repository. 

They can also be found directly in the following source codes:

- NumPy - <a href="https://github.com/numpy/numpy/blob/main/LICENSE.txt">https://github.com/numpy/numpy/blob/main/LICENSE.txt</a>
- Pandas - <a href="https://github.com/pandas-dev/pandas/blob/main/LICENSE">https://github.com/pandas-dev/pandas/blob/main/LICENSE</a>
- Cython - <a href="https://github.com/cython/cython/blob/master/LICENSE.txt">https://github.com/cython/cython/blob/master/LICENSE.txt</a>

