Metadata-Version: 2.1
Name: ragbuilder
Version: 0.0.2
Summary: RagBuilder is a toolkit that helps you create an optimal, production-ready Retrieval-Augmented Generation (RAG) pipeline for your data
Home-page: https://github.com/kruxai/ragbuilder
Author: Ashwin Aravind, Aravind Parameswaran
Author-email: ashwin@krux.ai, aravind@krux.ai
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: Flask
Requires-Dist: pytest ==7.2.1
Requires-Dist: pytest-xdist ~=3.2.0
Requires-Dist: coverage ~=7.1.0
Requires-Dist: black ~=23.1.0
Requires-Dist: pytest-timeout ~=2.1.0
Requires-Dist: pytest-env ~=0.8.1
Requires-Dist: python-dotenv
Requires-Dist: langchain
Requires-Dist: langchain-community
Requires-Dist: langchainhub
Requires-Dist: langchain-openai
Requires-Dist: langchain-chroma
Requires-Dist: bs4
Requires-Dist: langchain-core
Requires-Dist: unstructured
Requires-Dist: pdf2image
Requires-Dist: pdfminer.six
Requires-Dist: langchain-experimental
Requires-Dist: scikit-learn
Requires-Dist: ragas ==0.1.7
Requires-Dist: inquirer
Requires-Dist: llama-index
Requires-Dist: chromadb
Requires-Dist: sentence-transformers
Requires-Dist: llama-index-vector-stores-chroma
Requires-Dist: llama-index-readers-web
Requires-Dist: IPython
Requires-Dist: llama-index-retrievers-bm25
Requires-Dist: rake-nltk
Requires-Dist: llama-index-embeddings-langchain
Requires-Dist: llama-index-vector-stores-faiss
Requires-Dist: faiss-cpu
Requires-Dist: llama-index-llms-mistralai
Requires-Dist: llama-index-embeddings-mistralai
Requires-Dist: llama-index-embeddings-openai
Requires-Dist: llama-index-postprocessor-longllmlingua
Requires-Dist: llmlingua
Requires-Dist: llama-index-postprocessor-cohere-rerank
Requires-Dist: llama-index-postprocessor-jinaai-rerank
Requires-Dist: llama-index-postprocessor-rankgpt-rerank
Requires-Dist: llama-index-postprocessor-colbert-rerank
Requires-Dist: llama-index-postprocessor-rankllm-rerank
Requires-Dist: llama-index-llms-openai
Requires-Dist: langchain-huggingface
Requires-Dist: rank-bm25
Requires-Dist: ragas
Requires-Dist: flask
Requires-Dist: pandas
Requires-Dist: mixpanel
Requires-Dist: langchain-mistralai
Requires-Dist: huggingface-hub
Requires-Dist: datasets
Requires-Dist: langchain-text-splitters
Requires-Dist: llama-index-core
Requires-Dist: requests
Requires-Dist: markdown
Requires-Dist: singlestoredb
Requires-Dist: langchain-pinecone

# RagBuilder

RagBuilder is a toolkit that helps you create an optimal, production-ready Retrieval-Augmented Generation (RAG) pipeline for your data. RagBuilder iterates through all possible combinations of RAG parameters (e.g., chunking strategy: semantic, character, etc.; chunk size: 1000, 2000, etc.), evaluates each combination against a test dataset, and provides a dashboard for reviewing which parameter values work best for your source data. RagBuilder also includes several pre-defined state-of-the-art RAG templates that have shown strong performance across various other datasets.

## Table of Contents

- [Features](#features)
- [Installation](#installation)
- [Set your OpenAI API key](#set-your-openai-api-key)
- [Quickstart Guide](#quickstart-guide)
- [License](#license)

## Features

- **Pre-defined RAG Templates**: Use state-of-the-art templates that have demonstrated strong performance across various datasets.
- **Custom RAG Configurations**: Create configurations from granular parameters such as chunking strategy, chunk size, embedding model, and retriever type.
- **Evaluation Dataset Options**: Choose to generate a synthetic test dataset or provide your own.
- **Automatic Reuse**: Automatically re-uses previously generated synthetic test data when applicable.
- **Easy-to-use Interface**: Intuitive UI to guide you through setting up, configuring, and reviewing your RAG configurations.

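To picture how custom configurations multiply, here is a minimal sketch of enumerating every combination from a parameter grid. It is not RagBuilder's actual implementation; the grid keys, the values, and the `enumerate_configs` helper are hypothetical names chosen for illustration.

```python
from itertools import product

# Hypothetical parameter grid; keys and values are illustrative assumptions,
# not RagBuilder's internal option names.
PARAM_GRID = {
    "chunking_strategy": ["semantic", "character", "markdown"],
    "chunk_size": [1000, 2000],
    "embedding_model": ["text-embedding-3-large"],
    "retriever": ["similarity"],
    "top_k": [5],
}


def enumerate_configs(grid):
    """Return one dict per combination of parameter values in the grid."""
    keys = list(grid)
    value_lists = [grid[key] for key in keys]
    return [dict(zip(keys, values)) for values in product(*value_lists)]


configs = enumerate_configs(PARAM_GRID)
print(len(configs))  # 3 * 2 * 1 * 1 * 1 = 6 combinations
```

Because the count is the product of the number of values per parameter, adding a few more values to each parameter quickly leads to hundreds of configurations.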

## Installation

Install using pip:

```
pip install ragbuilder
```

## Set your OpenAI API key

Make sure your OpenAI API key is available by setting it as an environment variable. On macOS and Linux, run:

```
export OPENAI_API_KEY=XXXXX
```

On Windows (Command Prompt), run:

```
set OPENAI_API_KEY=XXXXX
```
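
If you want to confirm that the variable is visible to Python before launching the tool, a small check like the one below can help. This is only a sketch: the `require_openai_key` helper is a hypothetical name, and nothing here is part of RagBuilder itself.

```python
import os


def require_openai_key(env=os.environ):
    """Return the API key, or raise a clear error if it is missing or blank."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; set it before starting ragbuilder."
        )
    return key


try:
    require_openai_key()
    print("OPENAI_API_KEY is set.")
except RuntimeError as err:
    print(err)
```

The check only reports whether the variable is present; it does not verify that the key is valid with OpenAI.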

## Quickstart Guide

Now, run `ragbuilder` from your command line:

```
ragbuilder
```

This starts the RagBuilder Flask app and opens your browser. If the browser window doesn't open automatically, go to http://localhost:8001/ in your browser to access the RagBuilder dashboard.

1. Click **New Project** to start building your RAG.
2. **Description:** Describe your use-case. Let's specify "Q&A Chatbot" as the description for our demo.
3. **Source Data:** Specify the path to your source data. This could be a URL, local directory or local file path. For the sake of our demo, let's specify the URL: https://lilianweng.github.io/posts/2023-06-23-agent/
4. **Select Ragbuilder options:**
   - **Use Pre-defined RAG Templates:** When selected, RagBuilder includes pre-defined RAG configuration templates that have demonstrated strong performance across various datasets and related use cases. These templates are evaluated against your data, giving you performance metrics for each pre-defined configuration.
   - **Create Custom RAG Configurations:** When selected, RagBuilder generates multiple RAG configurations based on detailed parameters such as chunking strategy, chunk size, embedding model, and retriever type. This option may yield hundreds or even thousands of unique configurations to compare, offering a comprehensive performance analysis tailored to your dataset. *Note: This may take several minutes to complete.*
5. To tailor your RAG configurations, unselect any options you wish to exclude (for example, unselecting "Chunking Strategy: Character" excludes all RAG configurations that use the CharacterTextSplitter). For best results, you may want to leave all settings unchanged. For our Quickstart demo, however, unselect everything except the following:
    - Chunking strategy: Markdown
    - Embedding model: text-embedding-3-large
    - Retriever: Vector DB - Similarity Search
    - Top k: 5
    - LLM: GPT-3.5 Turbo
6. Next, in Evaluation dataset options, choose one of the following (for our demo, select **Generate Synthetic Test Data from My Dataset**):
    - **Use Existing Synthetic Test Data:** If synthetic test data was previously generated for your dataset, this option appears alongside the path of the existing test data.
    - **Generate Synthetic Test Data from My Dataset:** Create a new synthetic test dataset based on your existing data.
    - **Provide a Manually Created Test Dataset:** Use your own test dataset file (CSV format with "question" and "ground_truth" columns).
7. Review all your selections and click **Confirm**.
8. After processing completes, the dashboard displays the results.

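If you choose to provide a manually created test dataset, the file needs "question" and "ground_truth" columns, as noted in the evaluation dataset options above. The sketch below shows one way to produce such a CSV with Python's standard library. The rows are placeholders and the `write_test_dataset` helper is a hypothetical name; any other requirements RagBuilder may place on the file are not covered here.

```python
import csv
import io

# Placeholder rows; replace them with real questions and reference answers
# about your own source data.
ROWS = [
    {"question": "Example question 1?", "ground_truth": "Example answer 1."},
    {"question": "Example question 2?", "ground_truth": "Example answer 2."},
]


def write_test_dataset(rows, fileobj):
    """Write rows as CSV with the 'question' and 'ground_truth' columns."""
    writer = csv.DictWriter(fileobj, fieldnames=["question", "ground_truth"])
    writer.writeheader()
    writer.writerows(rows)


# Write to an in-memory buffer so the example has no side effects on disk.
buffer = io.StringIO()
write_test_dataset(ROWS, buffer)
print(buffer.getvalue())
```

To write an actual file instead, pass a handle opened with `open("test_data.csv", "w", newline="")`, where the file name is your choice.
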
## License

This project is licensed under the Apache License - see the [LICENSE](LICENSE) file for details.

---
