Metadata-Version: 2.1
Name: llama-index-readers-structured-data
Version: 0.2.0
Summary: llama-index readers structured_data integration
License: MIT
Author: Haoran Chen
Author-email: congling.chr@alibaba-inc.com
Requires-Python: >=3.8.1,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: llama-index-core (>=0.11.0,<0.12.0)
Requires-Dist: pandas
Description-Content-Type: text/markdown

# LlamaIndex Readers Integration: Structured-Data

The `StructuredDataReader` class reads files in JSON, JSONL, CSV, and XLSX formats. Its `col_index` parameter selects the columns written into the document's main text, and `col_metadata` selects the columns stored as additional metadata. As the usage examples show, each parameter accepts a column name, an integer position, or a list of these.

## Install package

```bash
pip install llama-index-readers-structured-data
```

Or install locally:

```bash
pip install -e llama-index-integrations/readers/llama-index-readers-structured-data
```

## Usage

1. For a single document:

```python
from pathlib import Path
from llama_index.readers.structured_data.base import StructuredDataReader

# Columns "col1" and "col2" go into the document text; the column at position 0 becomes metadata.
parser = StructuredDataReader(col_index=["col1", "col2"], col_metadata=0)
documents = parser.load_data(Path("your/file/path.json"))
```

2. For a directory of documents:

```python
from pathlib import Path
from llama_index.core import SimpleDirectoryReader
from llama_index.readers.structured_data.base import StructuredDataReader

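# Columns are selected by position here (second and last); "col3" becomes metadata.
parser = StructuredDataReader(col_index=[1, -1], col_metadata="col3")
# Use the same reader for every supported file extension.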
file_extractor = {
    ".xlsx": parser,
    ".csv": parser,
    ".json": parser,
    ".jsonl": parser,
}
documents = SimpleDirectoryReader(
    "your/dir/path", file_extractor=file_extractor
).load_data()
```
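
To make the split between text columns and metadata columns more concrete, here is a minimal standalone sketch that uses only the Python standard library. It is not the implementation of `StructuredDataReader`. The sample data, the `split_rows` helper, and the `column: value` text layout are all assumptions made for illustration, and the real reader may format its output differently.

```python
import csv
import io

# Hypothetical sample data; the column names are made up for illustration.
SAMPLE_CSV = """title,body,source
Intro,Hello world,wiki
Setup,Install the package,docs
"""


def split_rows(csv_text, text_cols, metadata_cols):
    """Split each CSV row into a (text, metadata) pair by column name.

    Illustration only: this is NOT how StructuredDataReader is implemented.
    """
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Columns chosen for the main text are joined into one string.
        text = "\n".join(f"{col}: {row[col]}" for col in text_cols)
        # Columns chosen as metadata are kept as a separate dictionary.
        metadata = {col: row[col] for col in metadata_cols}
        records.append((text, metadata))
    return records


records = split_rows(
    SAMPLE_CSV, text_cols=["title", "body"], metadata_cols=["source"]
)
for text, metadata in records:
    print(text, metadata)
```

In this sketch `text_cols` plays the role that `col_index` plays in the reader, and `metadata_cols` plays the role of `col_metadata`. Each input row yields one text string plus one metadata dictionary.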

