Metadata-Version: 2.1
Name: datafog-instructor
Version: 0.1.0b6
Summary: Scan, redact, and manage PII in your documents before they get uploaded to a Retrieval Augmented Generation (RAG) system.
Home-page: https://datafog.ai
Author: Sid Mohan
Author-email: sid@datafog.ai
Maintainer: DataFog
Maintainer-email: hi@datafog.ai
License: MIT
Project-URL: Homepage, https://datafog.ai
Project-URL: Documentation, https://docs.datafog.ai
Project-URL: Discord, https://discord.gg/bzDth394R4
Project-URL: Twitter, https://twitter.com/datafoginc
Project-URL: GitHub, https://github.com/datafog/datafog-instructor
Keywords: pii,redaction,nlp,rag,retrieval augmented generation,entity recognition
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Framework :: Pytest
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: System Administrators
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pydantic==2.7.1
Requires-Dist: ollama<0.3.0,>=0.2.0
Requires-Dist: ollama-instructor==0.2.0
Requires-Dist: python-dotenv==1.0.1
Requires-Dist: openai==1.12.0
Requires-Dist: click==8.1.7
Provides-Extra: dev
Requires-Dist: pytest==8.0.0; extra == "dev"
Requires-Dist: black==24.1.1; extra == "dev"
Requires-Dist: flake8==7.0.0; extra == "dev"
Requires-Dist: mypy==1.8.0; extra == "dev"
Requires-Dist: types-requests==2.31.0.20240218; extra == "dev"
Provides-Extra: docs
Requires-Dist: sphinx==7.2.6; extra == "docs"
Requires-Dist: sphinx-rtd-theme==2.0.0; extra == "docs"

# DataFog Instructor SDK

DataFog Instructor is a Python SDK for named entity recognition (NER) that supports multiple LLM backends, including Ollama and OpenAI. Its `DataFog` class detects and classifies entities in text.

## Installation

Install the DataFog Instructor SDK with pip:

```
pip install datafog-instructor
```

To also install the development (testing, linting) and documentation extras:

```
pip install "datafog-instructor[dev,docs]"
```

## Quick Start

Here's a simple example to get you started with DataFog Instructor:

```python
from datafog_instructor import DataFog

# Initialize DataFog with default settings (uses Ollama)
datafog = DataFog()

# Detect entities in text
text = "Cisco acquires Hess for $20 billion"
result = datafog.detect_entities(text)

# Print results
for entity in result.entities:
    print(f"Text: {entity.text}, Type: {entity.type.value}")
```

## Configuration

You can customize the DataFog instance using environment variables or by passing parameters directly:

- `DATAFOG_LLM_BACKEND`: Choose the LLM backend ("ollama" or "openai")
- `DATAFOG_LLM_HOST`: The host URL for the Ollama service (default: "http://localhost:11434")
- `DATAFOG_LLM_MODEL`: The model to use for entity detection (default: "phi3")
- `OPENAI_API_KEY`: Your OpenAI API key (required if using the OpenAI backend)

Example with custom settings:

```python
import os
os.environ['DATAFOG_LLM_BACKEND'] = 'openai'
os.environ['OPENAI_API_KEY'] = 'your-api-key-here'

from datafog_instructor import DataFog

datafog = DataFog()
```

Or, configure directly in code:

```python
from datafog_instructor import DataFog, OpenAILLM

llm = OpenAILLM(api_key="your-api-key-here", model="gpt-4o")
datafog = DataFog(llm=llm)
```
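The environment variables listed above can also be checked before a `DataFog` instance is created. The following is a minimal sketch, not part of the SDK: `read_datafog_env` and `DEFAULTS` are hypothetical names, and the default backend of `"ollama"` is an assumption based on the Quick Start comment.

```python
import os

# Defaults as documented above. The "ollama" backend default is an assumption
# inferred from the Quick Start; this helper is hypothetical, not an SDK API.
DEFAULTS = {
    "DATAFOG_LLM_BACKEND": "ollama",
    "DATAFOG_LLM_HOST": "http://localhost:11434",
    "DATAFOG_LLM_MODEL": "phi3",
}


def read_datafog_env(environ=None):
    """Collect the documented settings and fail early on obvious mistakes."""
    env = os.environ if environ is None else environ
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}

    backend = settings["DATAFOG_LLM_BACKEND"]
    if backend not in ("ollama", "openai"):
        raise ValueError(f"Unsupported DATAFOG_LLM_BACKEND: {backend!r}")
    # The OpenAI backend needs an API key, as noted in the variable list.
    if backend == "openai" and not env.get("OPENAI_API_KEY"):
        raise ValueError("OPENAI_API_KEY is required for the OpenAI backend")
    return settings
```

Calling `read_datafog_env()` with no arguments reads from `os.environ`; passing a plain dictionary instead makes the check easy to exercise in tests.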

## Features

### Detect Entities

Use the `detect_entities` method to identify and classify named entities in a given text:

```python
text = "Apple Inc. reported $100 billion in revenue for Q4 2023"
result = datafog.detect_entities(text)

for entity in result.entities:
    print(f"Text: {entity.text}, Type: {entity.type.value}")
```
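Detected entities are often easier to work with when grouped by type. The sketch below assumes only what the example above shows, namely that each entity exposes `text` and `type.value`; `group_by_type` is a hypothetical helper, not an SDK function.

```python
from collections import defaultdict


def group_by_type(entities):
    """Map each entity type label to the list of matched text spans.

    Hypothetical helper: it relies only on the `text` and `type.value`
    attributes used in the example above.
    """
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity.type.value].append(entity.text)
    # Return a plain dict so missing keys raise KeyError as usual.
    return dict(grouped)
```

With the result from the previous example, `group_by_type(result.entities)` would return a dictionary keyed by type label.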

### Manage Entity Types

You can add or remove entity types dynamically:

```python
# Add a new entity type
datafog.add_entity_type("CUSTOM", "Custom Entity")

# Remove an entity type
datafog.remove_entity_type("CUSTOM")

# Get all entity types
entity_types = datafog.get_entity_types()
print(entity_types)
```

## Default Entity Types

The SDK ships with a list of predefined entity types, including:

- Organization Information: ORG, PERSON, TRANSACTION_TYPE, DEAL_STRUCTURE, FINANCIAL_INFO, PRODUCT, LOCATION, DATE, INDUSTRY, ROLE, REGULATORY, SENSITIVE_INFO, CONTACT, ID, STRATEGY, COMPANY, MONEY
- Personal Information: EMAIL, PHONE, SSN, CREDIT_CARD, IP_ADDRESS, URL, AGE, NATIONALITY, JOB_TITLE, EDUCATION
- Location Information: ADDRESS, CITY, STATE, ZIP, COUNTRY, REGION

## Error Handling

If the SDK cannot process the LLM response, or the response arrives in an unexpected format, it raises a `ValueError` with details about the error.
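One way to handle this in calling code is to catch the `ValueError` around `detect_entities`. This is a minimal sketch: `detect_or_none` is a hypothetical wrapper, and returning `None` on failure is just one possible policy.

```python
import logging

logger = logging.getLogger(__name__)


def detect_or_none(detector, text):
    """Return the detection result, or None if a ValueError is raised.

    Hypothetical wrapper: `detector` is any object with a
    `detect_entities(text)` method, such as a `DataFog` instance.
    """
    try:
        return detector.detect_entities(text)
    except ValueError as exc:
        # Log the details carried by the exception and let the caller decide.
        logger.warning("Entity detection failed: %s", exc)
        return None
```

A call such as `detect_or_none(datafog, text)` then lets a batch job skip a failing document instead of stopping.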

## Development and Testing

For development purposes, you can install additional dependencies:

```
pip install "datafog-instructor[dev]"
```

This includes tools like pytest, black, flake8, and mypy for testing and code quality.

## Documentation

To build the documentation locally:

```
pip install "datafog-instructor[docs]"
cd docs
make html
```

The documentation will be available in the `docs/_build/html` directory.

## Contributing

Contributions to the DataFog Instructor SDK are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the MIT License.

## Support

If you encounter any problems or have any questions, please open an issue on the GitHub repository or join our Discord community at https://discord.gg/bzDth394R4.

## Links

- Homepage: https://datafog.ai
- Documentation: https://docs.datafog.ai
- Twitter: https://twitter.com/datafoginc
- GitHub: https://github.com/datafog/datafog-instructor
