Metadata-Version: 2.1
Name: safe_llm_parser
Version: 0.1.0
Summary: Reliable way to parse LLM outputs
License: MIT
Author: wmpons.pro
Author-email: wmpons.pro@gmail.com
Requires-Python: >=3.8,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: beautifulsoup4 (>=4.12.3,<5.0.0)
Requires-Dist: lxml (>=5.3.0,<6.0.0)
Requires-Dist: pydantic (>=2.9.1,<3.0.0)
Requires-Dist: types-beautifulsoup4 (>=4.12.0.20240907,<5.0.0.0)
Description-Content-Type: text/markdown

# SafeXMLParser - README

## Overview

`SafeXMLParser` is a Python class that provides a safer, fault-tolerant way to parse XML strings. When the initial parse fails, it can use a **Large Language Model (LLM)** to correct the malformed XML and try again. It supports multiple parsing attempts and logs each one, including successful parses, errors, and any LLM-based corrections.

## Features

- **Multiple Attempts**: Provides the option to specify multiple parsing attempts to handle malformed XML.
- **LLM-based Correction**: Uses a specified LLM model to attempt XML correction if parsing fails.
- **Logging**: Records all attempts, including input, output, errors, and LLM correction details.
- **Flexible Configuration**: Set the LLM model and number of attempts once as parser defaults, or override them per call.

## Installation

1. Clone this repository or download the code.
2. Install the required dependencies (`beautifulsoup4`, `lxml`, and `pydantic`), plus any client library your LLM needs:
   ```bash
   pip install beautifulsoup4 lxml pydantic
   ```

## Example Usage

### 1. Importing the Class

First, import the `SafeXMLParser` class and any other necessary components:

```python
from safe_xml_parser import SafeXMLParser  # Example import path
```

### 2. Basic Usage (Single Parsing Attempt)

Here is an example of how to use `SafeXMLParser` for a basic XML parsing operation without a fallback model:

```python
# Example XML string
xml_data = "<root><child>data</child></root>"

# Initialize the parser
parser = SafeXMLParser()

# Attempt to parse the XML data (one attempt, no LLM correction)
try:
    parsed_data = parser.safe_parse(xml_data)
    print("Parsed Data:", parsed_data)
except Exception as e:
    print(f"Parsing failed: {e}")

# Output: {'root': {'child': 'data'}}
```
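The README does not spell out how the parsed tree becomes the dictionary shown above. The following is a minimal sketch of one plausible mapping. It uses the standard library's `xml.etree.ElementTree` rather than the package's BeautifulSoup/lxml stack: leaf elements map to their text, and elements with children map to a dict keyed by child tag. `element_to_dict` and `xml_to_dict` are hypothetical names, not part of the package.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Union

# Recursive type: a leaf is its text, a branch is a dict of its children.
Node = Union[str, Dict[str, "Node"]]


def element_to_dict(elem: ET.Element) -> Node:
    """Map a leaf element to its text and a branch to {child_tag: child_value}."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    # Note: in this simple sketch, repeated sibling tags overwrite each other.
    return {child.tag: element_to_dict(child) for child in children}


def xml_to_dict(text: str) -> Dict[str, Node]:
    """Parse an XML string strictly and wrap the result under the root tag."""
    root = ET.fromstring(text)  # raises ET.ParseError on malformed input
    return {root.tag: element_to_dict(root)}


print(xml_to_dict("<root><child>data</child></root>"))
# {'root': {'child': 'data'}}
```

A real implementation would need a policy for repeated tags (for example, collecting them into lists) and for attributes, both of which this sketch ignores.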

### 3. Multiple Attempts with LLM Correction

If you are dealing with malformed XML, you can provide a custom **LLM model** to correct the data between attempts. The example below uses a plain Python function in place of a real LLM:

```python
def fallback_correction(text):
    # Simple stand-in for an LLM: rename <broken> to <child> and add the missing closing tag
    return text.replace("<broken>data</root>", "<child>data</child></root>")

# Malformed XML string
malformed_xml = "<root><broken>data</root>"

# Initialize the parser with the fallback correction model
parser = SafeXMLParser(default_llm_model=fallback_correction, default_nb_attempts=2)

# Attempt to parse the malformed XML data
try:
    parsed_data = parser.safe_parse(malformed_xml)
    print("Parsed Data:", parsed_data)
except Exception as e:
    print(f"Parsing failed: {e}")

# Output: {'root': {'child': 'data'}}
```
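Real LLM calls are slow and non-deterministic, so a deterministic stand-in can be useful in tests. The sketch below is a hypothetical `close_unclosed_tags` callable (not part of the package) that repairs one common failure mode of LLM output: truncation. It appends closing tags for every element still open at the end of the text. It assumes the callable receives the raw XML. If the parser instead passes it a prompt built by `correctness_prompt_maker`, the XML would first have to be extracted from that prompt.

```python
import re
from typing import List

# Matches opening, closing, and self-closing tags; skips <?...?> and <!...>.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w.:-]*)[^<>]*?(/?)>")


def close_unclosed_tags(text: str) -> str:
    """Append closing tags for elements left open at the end of `text`.

    Only handles truncation after a complete tag; a tag cut off mid-name
    (e.g. "<roo") is not detected.
    """
    stack: List[str] = []
    for match in TAG_RE.finditer(text):
        is_closing, name, is_self_closing = match.groups()
        if is_self_closing:
            continue  # <tag/> opens and closes in a single token
        if is_closing:
            # Pop back to the matching opener, if there is one.
            if name in stack:
                while stack and stack.pop() != name:
                    pass
        else:
            stack.append(name)
    # Close the remaining open elements, innermost first.
    return text + "".join(f"</{name}>" for name in reversed(stack))


print(close_unclosed_tags("<root><child>data"))
# <root><child>data</child></root>
```

Under the assumption above, it could be passed as `default_llm_model=close_unclosed_tags` when constructing the parser.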

### 4. Accessing Parsing Logs

Logs are available for every parsing attempt, showing the input, output, error messages, and any LLM-based corrections applied:

```python
# Access logs after parsing
logs = parser.logs()
for log in logs:
    print(log)
```

Logs provide insights into each step of the parsing process, including what the LLM model was prompted with and what corrections it made.

### 5. Dynamic Configuration of LLM and Attempts

You can also pass the **LLM model** and **number of attempts** per call instead of setting them as parser defaults:

```python
# Initialize parser without setting defaults
parser = SafeXMLParser()

# Dynamically pass custom LLM model and attempts
try:
    parsed_data = parser.safe_parse(
        malformed_xml,
        nb_attempts=3,
        llm_model=fallback_correction
    )
    print("Parsed Data:", parsed_data)
except Exception as e:
    print(f"Parsing failed: {e}")
```

## Method Summary

### 1. `safe_parse(text_to_parse: str, nb_attempts: Optional[int] = None, llm_model: Optional[Callable] = None, correctness_prompt_maker: Callable = create_fix_xml_prompt) -> Dict[str, Union[Dict, str]]`

- **Description**: Attempts to parse the XML string multiple times, with the option of using an LLM model to correct any errors between attempts.
- **Args**:
  - `text_to_parse`: The XML string to be parsed.
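  - `nb_attempts`: The number of parsing attempts. If `None`, the parser's `default_nb_attempts` is used (1 unless configured otherwise).
  - `llm_model`: The callable used to correct the XML between attempts. If `None`, the parser's `default_llm_model` is used (none unless configured otherwise).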
  - `correctness_prompt_maker`: A function that creates prompts for LLM correction (default: `create_fix_xml_prompt`).
- **Returns**: A dictionary representation of the parsed XML.
- **Raises**: An `Exception` if all attempts fail.
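The retry behaviour described above can be pictured as the loop below. This is a minimal sketch under stated assumptions, not the package's implementation. It uses `xml.etree.ElementTree` as the strict parser. It assumes the prompt maker takes the failing text and the error, and that the model takes a prompt string and returns candidate XML. It returns the parsed `Element` rather than a dict to stay short. `safe_parse_sketch`, `simple_fix_prompt`, `fake_model`, and the log keys are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional


def simple_fix_prompt(text: str, error: Exception) -> str:
    """Hypothetical prompt maker: the broken XML is placed on the last line."""
    return f"Fix this XML. Parser error: {error}\nReturn only the XML.\n{text}"


def safe_parse_sketch(
    text_to_parse: str,
    nb_attempts: int = 1,
    llm_model: Optional[Callable[[str], str]] = None,
    correctness_prompt_maker: Callable[[str, Exception], str] = simple_fix_prompt,
    logs: Optional[List[Dict[str, str]]] = None,
) -> ET.Element:
    """Parse strictly; after each failure, ask `llm_model` for corrected XML."""
    text = text_to_parse
    last_error: Optional[Exception] = None
    for attempt in range(nb_attempts):
        entry = {"input": text, "output": "N/A", "error": "N/A",
                 "correctness_prompt": "N/A", "correctness_output": "N/A"}
        try:
            root = ET.fromstring(text)
            entry["output"] = ET.tostring(root, encoding="unicode")
            return root
        except ET.ParseError as exc:
            last_error = exc
            entry["error"] = str(exc)
            # Request a correction only if a model is set and an attempt remains.
            if llm_model is not None and attempt < nb_attempts - 1:
                prompt = correctness_prompt_maker(text, exc)
                text = llm_model(prompt)
                entry["correctness_prompt"] = prompt
                entry["correctness_output"] = text
        finally:
            # Record every attempt, successful or not.
            if logs is not None:
                logs.append(entry)
    raise Exception(f"All {nb_attempts} parsing attempts failed") from last_error


def fake_model(prompt: str) -> str:
    """Stand-in for an LLM: takes the XML from the prompt's last line and fixes it."""
    broken = prompt.splitlines()[-1]
    return broken.replace("data</root>", "data</child></root>")


attempt_logs: List[Dict[str, str]] = []
root = safe_parse_sketch("<root><child>data</root>", nb_attempts=2,
                         llm_model=fake_model, logs=attempt_logs)
print(root.find("child").text)  # data
print(len(attempt_logs))  # 2
```

The first attempt fails on the unclosed `<child>`, the stand-in model returns repaired XML, and the second attempt succeeds, leaving two log entries.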

### 2. `logs(timestamp: bool = True) -> List[Dict[str, str]]`

- **Description**: Returns logs of each parsing attempt, with an option to include/exclude timestamps.
- **Args**:
  - `timestamp`: Whether to include timestamps in the logs (default: True).
- **Returns**: A list of dictionaries containing details of each parsing attempt.

## Logging Structure

Logs include the following information:

- **Input**: The XML string that was parsed.
- **Output**: The resulting parsed output (if successful) or "N/A".
- **Error**: Any error encountered during parsing, or "N/A" if successful.
- **Correctness Prompt**: The prompt sent to the LLM for correction (if applicable).
- **Correctness Output**: The corrected output from the LLM model (if applicable).
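The fields above, together with the `timestamp` flag of `logs()`, suggest a simple record per attempt. Here is a minimal sketch, assuming one dataclass per attempt and `"N/A"` as the placeholder for missing values. `AttemptLog`, `LogStore`, and the snake_case key names are hypothetical, not the package's actual types.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class AttemptLog:
    """One parsing attempt; "N/A" marks fields that do not apply."""
    input: str
    output: str = "N/A"
    error: str = "N/A"
    correctness_prompt: str = "N/A"
    correctness_output: str = "N/A"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class LogStore:
    """Collects attempt records and exports them as plain dicts."""

    def __init__(self) -> None:
        self._entries: List[AttemptLog] = []

    def add(self, entry: AttemptLog) -> None:
        self._entries.append(entry)

    def logs(self, timestamp: bool = True) -> List[Dict[str, str]]:
        """Return entries as dicts, dropping the timestamp when asked."""
        result = []
        for entry in self._entries:
            record = asdict(entry)
            if not timestamp:
                record.pop("timestamp")
            result.append(record)
        return result


store = LogStore()
store.add(AttemptLog(input="<root>", error="no element found: line 1, column 6"))
print(store.logs(timestamp=False))
```

Dropping the timestamp makes log output stable across runs, which is convenient when comparing logs in tests.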

## Handling Edge Cases

- If parsing fails after all attempts, the parser raises an exception.
- The LLM model can be customized to handle different error types or malformed XML structures.
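Correction can also be tailored to different error types in the prompt rather than in the model. Below is a sketch of a hypothetical prompt maker that adds a targeted hint based on the error message. The `(text, error)` signature is an assumption. The matched substrings ("mismatched tag", "no element found", "not well-formed") are messages that Python's expat-based `xml.etree.ElementTree` reports; other parsers, such as lxml, phrase errors differently.

```python
import xml.etree.ElementTree as ET

# Hints keyed by substrings of expat error messages (assumes an ElementTree backend).
ERROR_HINTS = {
    "mismatched tag": "A closing tag does not match its opening tag; fix the nesting.",
    "no element found": "The XML looks truncated; close every open element.",
    "not well-formed": "There is an invalid character or token; escape '&' and '<' in text.",
}


def error_aware_prompt(text: str, error: Exception) -> str:
    """Build a correction prompt whose hint depends on the parse error."""
    message = str(error)
    hint = next(
        (h for key, h in ERROR_HINTS.items() if key in message),
        "Fix any syntax errors.",  # fallback when no known pattern matches
    )
    return (
        "The following XML failed to parse.\n"
        f"Parser error: {message}\n"
        f"Hint: {hint}\n"
        "Reply with the corrected XML only.\n"
        f"{text}"
    )


broken = "<root><child>data</root>"
try:
    ET.fromstring(broken)
except ET.ParseError as exc:
    print(error_aware_prompt(broken, exc))
```

If the package's prompt-maker signature matches this assumption, such a function could be passed as `correctness_prompt_maker`.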

## Conclusion

`SafeXMLParser` combines repeated parsing attempts, optional LLM-based correction, and detailed logging for easier debugging. It is suited to cases where XML, such as LLM output, may be incomplete or malformed and more than one attempt may be needed to parse it.

### License

This project is licensed under the MIT License.


