Metadata-Version: 2.1
Name: unitxt
Version: 1.0.45
Summary: Load any mixture of text to text data in one line of code
Home-page: https://github.com/ibm/unitxt
Author: IBM Research
Author-email: elron.bandel@ibm.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: datasets
Requires-Dist: evaluate
Requires-Dist: nltk
Requires-Dist: sacrebleu
Requires-Dist: absl-py
Requires-Dist: rouge-score
Requires-Dist: scikit-learn
Requires-Dist: dpath
Requires-Dist: jiwer
Requires-Dist: editdistance

<div align="center">
    <img src="./assets/banner.png" alt="Unitxt banner" width="100%" />
</div>

Unitxt is a Python library for getting data prepared and ready for use.
In one line of code, it prepares a dataset, or a mixture of datasets, in an input-output format for training and evaluation.
We aspire to be simple, adaptable, and transparent.
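
To make "an input-output format" concrete, here is a minimal, self-contained sketch in plain Python. It is not Unitxt's API: the record fields and the helpers `sentiment_to_io`, `qa_to_io`, and `mix` are hypothetical, and round-robin interleaving is just one simple way to combine datasets, assumed here for illustration. Each dataset keeps its own shape until it is mapped to shared `source`/`target` fields, after which its examples can be mixed freely with the others.

```python
from itertools import chain, zip_longest
from typing import Dict, Iterable, List

# Hypothetical raw records from two differently shaped datasets.
SENTIMENT = [
    {"text": "I loved it", "label": "positive"},
    {"text": "Terrible service", "label": "negative"},
]
QA = [
    {"question": "What is 2 + 2?", "answer": "4"},
]

# Hypothetical helpers for illustration; not part of Unitxt.
def sentiment_to_io(rec: Dict[str, str]) -> Dict[str, str]:
    # Map a sentiment record to the shared input-output fields.
    return {"source": f"Classify the sentiment: {rec['text']}", "target": rec["label"]}

def qa_to_io(rec: Dict[str, str]) -> Dict[str, str]:
    # Map a QA record to the same shared fields.
    return {"source": f"Answer the question: {rec['question']}", "target": rec["answer"]}

def mix(*streams: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    # Interleave examples round-robin; zip_longest pads shorter streams with None,
    # which is filtered out so every dataset is fully consumed.
    interleaved = chain.from_iterable(zip_longest(*streams))
    return [ex for ex in interleaved if ex is not None]

mixture = mix(map(sentiment_to_io, SENTIMENT), map(qa_to_io, QA))
for ex in mixture:
    print(ex)
```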

Unitxt builds on separation. Separation allows adding a dataset without knowing anything about the models that use it. It also allows training without caring about preprocessing, switching models without loading the data differently, and changing formats (instruction, ICL, etc.) without changing anything else.
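
The sketch below illustrates the idea of separation under stated assumptions; it does not reproduce Unitxt's internals. The hypothetical `Pipeline` class combines three independent components: a loader that knows only the data, a template that knows only the task wording, and a format that knows only how the model expects its input. Swapping `plain_format` for `chat_format` changes the model input while the data, the template, and the targets stay untouched.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, str]

# Hypothetical component names for illustration; not Unitxt's classes.
@dataclass(frozen=True)
class Pipeline:
    load: Callable[[], List[Record]]      # knows only about the data
    template: Callable[[Record], Record]  # knows only about the task wording
    fmt: Callable[[Record], str]          # knows only about the model's input format

    def __call__(self) -> List[Record]:
        out = []
        for rec in self.load():
            io = self.template(rec)
            out.append({"source": self.fmt(io), "target": io["target"]})
        return out

def load_nli() -> List[Record]:
    return [{"premise": "A man is sleeping.", "hypothesis": "A man is awake.", "label": "contradiction"}]

def nli_template(rec: Record) -> Record:
    # Turn a raw record into instruction, input, and target fields.
    return {
        "instruction": "Does the premise entail the hypothesis?",
        "input": f"Premise: {rec['premise']}\nHypothesis: {rec['hypothesis']}",
        "target": rec["label"],
    }

def plain_format(io: Record) -> str:
    return f"{io['instruction']}\n{io['input']}\nAnswer:"

def chat_format(io: Record) -> str:
    return f"<user>\n{io['instruction']}\n{io['input']}\n<assistant>\n"

# Only the format changes between these two runs.
plain = Pipeline(load_nli, nli_template, plain_format)()
chat = Pipeline(load_nli, nli_template, chat_format)()
print(plain[0]["source"])
print(chat[0]["source"])
```

Because each component depends only on the shared record fields, a new dataset needs only a loader and a template, and a new model format needs only a formatting function.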

# 
[![version](https://img.shields.io/pypi/v/unitxt)](https://pypi.org/project/unitxt/)
![license](https://img.shields.io/github/license/ibm/unitxt)
![python](https://img.shields.io/badge/python-3.8%20|%203.9-blue)
![tests](https://img.shields.io/github/actions/workflow/status/ibm/unitxt/tests.yml?branch=main&label=tests)
[![codecov](https://codecov.io/gh/IBM/unitxt/branch/main/graph/badge.svg?token=mlrWq9cwz3)](https://codecov.io/gh/IBM/unitxt)
![Read the Docs](https://img.shields.io/readthedocs/unitxt)
[![downloads](https://static.pepy.tech/personalized-badge/unitxt?period=total&units=international_system&left_color=grey&right_color=green&left_text=downloads)](https://pepy.tech/project/unitxt)

#

<div align="center">
    <img src="./assets/unitxt_flow_light.gif" alt="Unitxt Flow" width="100%" />
</div>

# Where to start? 🦄
[![Button](https://img.shields.io/badge/Overview-pink?style=for-the-badge)](https://unitxt.readthedocs.io/)
[![Button](https://img.shields.io/badge/Concepts-pink?style=for-the-badge)](https://unitxt.readthedocs.io/en/latest/concepts.html)
[![Button](https://img.shields.io/badge/Tutorial-pink?style=for-the-badge)](https://unitxt.readthedocs.io/)
[![Button](https://img.shields.io/badge/Examples-pink?style=for-the-badge)](https://unitxt.readthedocs.io/)
[![Button](https://img.shields.io/badge/Docs-pink?style=for-the-badge)](https://unitxt.readthedocs.io/)
# Why Unitxt? 🦄

### 🦄 Simplicity
Everything in Unitxt is simple and designed to feel natural and self-explanatory.
### 🦄 Adaptability
Adding new datasets, loading recipes, instructions, and formatters is possible and encouraged!
### 🦄 Transparency
The resources and formatters of Unitxt are stored as shared datasets, so they can easily be reviewed by the community. Moreover, when a dataset is assembled with Unitxt, it is clear to others what is in it.

#



