Metadata-Version: 2.1
Name: llmgraph
Version: 1.0.0
Summary: Create knowledge graphs with LLMs
Home-page: https://github.com/dylanhogg/llmgraph
License: MIT
Keywords: Knowledge graph,LLM
Author: Dylan Hogg
Author-email: dylanhogg@gmail.com
Requires-Python: >=3.9,<4.0
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Visualization
Requires-Dist: beautifulsoup4 (>=4.12.2,<5.0.0)
Requires-Dist: click (>=8.1.7,<9.0.0)
Requires-Dist: httpx (>=0.25.0,<0.26.0)
Requires-Dist: joblib (>=1.3.2,<2.0.0)
Requires-Dist: loguru (>=0.7.2,<0.8.0)
Requires-Dist: matplotlib (>=3.8.0,<4.0.0)
Requires-Dist: networkx (>=3.1,<4.0)
Requires-Dist: omegaconf (>=2.3.0,<3.0.0)
Requires-Dist: openai (>=0.28.1,<0.29.0)
Requires-Dist: python-dotenv (>=1.0.0,<2.0.0)
Requires-Dist: pyvis (>=0.3.2,<0.4.0)
Requires-Dist: rich (>=13.6.0,<14.0.0)
Requires-Dist: tenacity (>=8.2.3,<9.0.0)
Requires-Dist: tqdm (>=4.66.1,<5.0.0)
Requires-Dist: typer (>=0.9.0,<0.10.0)
Project-URL: Repository, https://github.com/dylanhogg/llmgraph
Description-Content-Type: text/markdown

# llmgraph

[![pypi Version](https://img.shields.io/pypi/v/llmgraph.svg?logo=pypi)](https://pypi.org/project/llmgraph/)
[![build](https://github.com/dylanhogg/llmgraph/actions/workflows/python-poetry-app.yml/badge.svg)](https://github.com/dylanhogg/llmgraph/actions/workflows/python-poetry-app.yml)

<!-- [![Dependencies](https://img.shields.io/librariesio/github/dylanhogg/llmgraph)](https://libraries.io/github/dylanhogg/llmgraph) -->

llmgraph enables you to create knowledge graphs in [GraphML](http://graphml.graphdrawing.org/), [GEXF](https://gexf.net/), and HTML formats (the HTML is generated via [pyvis](https://github.com/WestHealth/pyvis)) from the Wikipedia page of a given source entity. The knowledge graphs are generated by extracting world knowledge from ChatGPT or other large language models (LLMs).

## Features

- Create knowledge graphs, given a source entity.
- Uses ChatGPT (or another LLM) to extract world knowledge.
- Generate knowledge graphs in HTML, GraphML, and GEXF formats.
- Many entity types and relationships supported by [customised prompts](https://github.com/dylanhogg/llmgraph/blob/main/llmgraph/prompts.yaml).
- Cache support to grow a knowledge graph iteratively and efficiently.
- Reports the `total tokens` used so you can track LLM costs (a default run costs only about 1 cent).
- Customisable model (default is `gpt-3.5-turbo` for speed and cost).

## Installation

You can install llmgraph using pip:

```bash
pip install llmgraph
```

## Example Output

In addition to the GraphML and GEXF formats, a physics-enabled HTML graph rendered with [pyvis](https://github.com/WestHealth/pyvis) can be viewed:

![example machine learning output](https://github.com/dylanhogg/llmgraph/blob/main/docs/img/machine-learning_artificial-intelligence_v0.3.0_level3.png?raw=true)

## Example Usage

The example above was generated with the following command, which requires an `entity_type` and a quoted `entity_wikipedia` source URL:

```bash
llmgraph machine-learning "https://en.wikipedia.org/wiki/Artificial_intelligence" --levels 3
```

This example creates a 3-level graph, starting from the given root node 'Artificial Intelligence'.

Note that you will need to set the `OPENAI_API_KEY` environment variable before running. See the [OpenAI docs](https://platform.openai.com/docs/quickstart/step-2-setup-your-api-key) for more info. The `total tokens used` count is output as the run progresses. For reference, this 3-level example used a total of 7,650 gpt-3.5-turbo tokens, which cost approximately 1.5 cents as of Oct 2023.
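
As a rough check on that figure, the arithmetic can be sketched as follows. The flat price of $0.002 per 1,000 tokens is an assumption made for illustration (real pricing differs between input and output tokens and changes over time), and `estimate_cost_usd` is a hypothetical helper, not part of llmgraph.

```python
def estimate_cost_usd(total_tokens: int, usd_per_1k_tokens: float = 0.002) -> float:
    """Estimate spend from a token count at a flat, ASSUMED per-1K-token price."""
    return total_tokens / 1000 * usd_per_1k_tokens


# Token count reported for the 3-level example above.
cost = estimate_cost_usd(7650)
print(f"{cost * 100:.2f} cents")
```

The same helper can be used to judge whether a planned run is likely to stay within a chosen `--max-sum-total-tokens` budget.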

The entity type sets the LLM prompt used to find related entities to include in the graph. The full list can be seen in [prompts.yaml](https://github.com/dylanhogg/llmgraph/blob/main/llmgraph/prompts.yaml) and includes the following entity types:

- `automobile`
- `book`
- `computer-game`
- `concepts-general`
- `creative-general`
- `documentary`
- `food`
- `machine-learning`
- `movie`
- `music`
- `people-historical`
- `podcast`
- `software-engineering`
- `tv`

### Required Arguments

- `entity_type` (TEXT): Entity type (e.g. movie)
- `entity_wikipedia` (TEXT): Full Wikipedia link to the root entity

### Optional Arguments

- `--entity-root` (TEXT): Optional root entity name override if different from the Wikipedia page title [default: None]
- `--levels` (INTEGER): Number of levels deep to construct from the central root entity [default: 2]
- `--max-sum-total-tokens` (INTEGER): Maximum sum of tokens for graph generation [default: 200000]
- `--output-folder` (TEXT): Folder location to write outputs [default: ./_output/]
- `--llm-model` (TEXT): The model name [default: gpt-3.5-turbo]
- `--llm-temp` (FLOAT): LLM temperature value [default: 0.0]
- `--llm-use-localhost` (INTEGER): LLM use localhost:8081 instead of OpenAI [default: 0]
- `--help`: Show this message and exit.

## Example of Prompt Used to Generate Graph

Here is an example of the prompt template, with placeholders, used to generate related entities from a given source entity. The prompt is applied recursively to create a knowledge graph, merging duplicated nodes as required.

```
You are knowledgeable about {knowledgeable_about}.
List, in json array format, the top {top_n} {entities} most like '{{entity_root}}'
with Wikipedia link, reasons for similarity, similarity on scale of 0 to 1.
Format your response in json array format as an array with column names: 'name', 'wikipedia_link', 'reason_for_similarity', and 'similarity'.
Example response: {{{{"name": "Example {entity}","wikipedia_link": "https://en.wikipedia.org/wiki/Example_{entity_underscored}","reason_for_similarity": "Reason for similarity","similarity": 0.5}}}}
```

The prompt works well on the primary tested LLM, OpenAI gpt-3.5-turbo. Results with Llama2 are acceptable, but not as good. The prompt source of truth and additional details can be seen in [prompts.yaml](https://github.com/dylanhogg/llmgraph/blob/main/llmgraph/prompts.yaml).
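
The recursive expansion described above can be sketched as a breadth-first traversal in which an entity that has already been seen is merged into its existing node rather than added again. This is a minimal illustration, not llmgraph's actual implementation: `build_graph`, the `related` callback (a stand-in for the LLM call), the `FAKE_RELATED` data, and the exact meaning of `levels` are all assumptions.

```python
from collections import deque
from typing import Callable


def build_graph(
    root: str, related: Callable[[str], list[str]], levels: int
) -> dict[str, set[str]]:
    """Expand outward from root, querying each entity at most once."""
    graph: dict[str, set[str]] = {root: set()}
    queue = deque([(root, 0)])
    while queue:
        entity, depth = queue.popleft()
        if depth >= levels:
            continue  # deepest level: keep the node but do not expand it
        for name in related(entity):
            graph[entity].add(name)
            if name not in graph:  # duplicates merge into the existing node
                graph[name] = set()
                queue.append((name, depth + 1))
    return graph


# Made-up relations standing in for LLM responses.
FAKE_RELATED = {"A": ["B", "C"], "B": ["C", "D"], "C": ["A"], "D": ["E"]}
graph = build_graph("A", lambda e: FAKE_RELATED.get(e, []), levels=2)
print(sorted(graph))
```

Here `C` is suggested by both `A` and `B` but appears as a single node, and `D` is included but not expanded because it sits at the depth limit.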

Each entity type has custom placeholders, for example `concepts-general` and `documentary`:

```yaml
concepts-general:
    system: You are a highly knowledgeable ontologist and creator of knowledge graphs.
    knowledgeable_about: many concepts and ontologies.
    entities: concepts
    entity: concept name
    top_n: 5

documentary:
    system: You are knowledgeable about documentaries of all types, and genres.
    knowledgeable_about: documentaries of all types, and genres
    entities: Documentaries
    entity: Documentary
    top_n: 5
```
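
The doubled and quadrupled braces in the prompt template are consistent with two passes of Python's `str.format`: one that fills in the per-entity-type placeholders, and a second that fills in the root entity. That reading is an inference from the template, not something confirmed by the source. The sketch below uses an abbreviated copy of the template, the `documentary` values shown above, and a made-up root entity (`Planet Earth`).

```python
# Abbreviated copy of the prompt template, for illustration only.
TEMPLATE = (
    "You are knowledgeable about {knowledgeable_about}.\n"
    "List, in json array format, the top {top_n} {entities} most like '{{entity_root}}'\n"
    'Example response: {{{{"name": "Example {entity}"}}}}'
)

# Placeholder values taken from the `documentary` entry above.
placeholders = {
    "knowledgeable_about": "documentaries of all types, and genres",
    "entities": "Documentaries",
    "entity": "Documentary",
    "top_n": 5,
}

# Pass 1 fills the entity-type placeholders; "{{x}}" collapses to "{x}".
per_type_prompt = TEMPLATE.format(**placeholders)

# Pass 2 fills the root entity; the remaining "{{...}}" become literal JSON braces.
prompt = per_type_prompt.format(entity_root="Planet Earth")
print(prompt)
```

Under this scheme the first pass can be done once per entity type, leaving only `entity_root` to substitute for each node visited.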

## Cached LLM API Calls

Each call to the LLM API (and Wikipedia) is cached locally in a `.joblib_cache` folder. This allows an interrupted run to be resumed without duplicating identical calls. It also allows a re-run with a higher `--levels` value to reuse results from the lower-level run (assuming the same entity type and source).
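
The idea behind this kind of on-disk memoisation can be sketched with the standard library alone. The folder name suggests that llmgraph relies on joblib for this; the code below is not its implementation, and `cached_call` and `fake_llm` are hypothetical names used only for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cached_call(cache_dir: Path, func, *args):
    """Return the stored result for (function name, args), computing it only once."""
    key = hashlib.sha256(json.dumps([func.__name__, args]).encode()).hexdigest()
    path = cache_dir / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())  # cache hit: no call is made
    result = func(*args)
    path.write_text(json.dumps(result))
    return result


calls = []


def fake_llm(entity: str) -> list:
    """Stand-in for a paid LLM request; records each real invocation."""
    calls.append(entity)
    return [f"{entity} related"]


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    first = cached_call(cache, fake_llm, "Artificial intelligence")
    second = cached_call(cache, fake_llm, "Artificial intelligence")  # served from disk

print(len(calls))
```

Because results are keyed by the call's arguments, a deeper run repeats the shallow run's requests as cache hits and pays only for the entities on the new level.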

## Future Improvements

- Improve support for locally running LLM server (e.g. via [ollama](https://ollama.ai/))
- Contrast graph output from different LLM models (e.g. [Llama2](https://huggingface.co/docs/transformers/model_doc/llama2) vs [Mistral](https://huggingface.co/docs/transformers/model_doc/mistral) vs [ChatGPT-4](https://openai.com/chatgpt))
- Investigate the hypothesis that this approach provides insight into how an LLM views the world.
- Include more examples in this documentation and make examples available for easy browsing.
- Instructions for running locally and adding a custom `entity_type` prompt.
- Better pyvis HTML output, in particular including the reasons for entity relationships in the UI.
- Parallelise API calls and result processing.
- Remove dependency on Wikipedia entities as a source.
- Contrast results from llmgraph with other non-LLM graph construction methods, e.g. using Wikipedia page links or [direct article embeddings](https://txt.cohere.com/embedding-archives-wikipedia/).

## Contributing

Contributions to llmgraph are welcome. Please follow these steps:

1. Fork the repository.
2. Create a new branch for your feature or bug fix.
3. Make your changes and commit them.
4. Create a pull request with a description of your changes.

