Metadata-Version: 2.1
Name: streamlit-chromadb-connection
Version: 0.0.5
Summary: A simple adapter connection for any Streamlit LLM-powered app to use ChromaDB vector database.
Author-email: Dev317 <mineskiroxro@gmail.com>
Project-URL: Homepage, https://github.com/Dev317/streamlit_chromadb_connection
Project-URL: Issues, https://github.com/Dev317/streamlit_chromadb_connection/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: altair ==5.2.0
Requires-Dist: annotated-types ==0.6.0
Requires-Dist: anyio ==3.7.1
Requires-Dist: asgiref ==3.7.2
Requires-Dist: attrs ==23.1.0
Requires-Dist: backoff ==2.2.1
Requires-Dist: bcrypt ==4.1.1
Requires-Dist: blinker ==1.7.0
Requires-Dist: build ==1.0.3
Requires-Dist: cachetools ==5.3.2
Requires-Dist: certifi ==2023.11.17
Requires-Dist: charset-normalizer ==3.3.2
Requires-Dist: chroma-hnswlib ==0.7.3
Requires-Dist: chromadb ==0.4.18
Requires-Dist: click ==8.1.7
Requires-Dist: coloredlogs ==15.0.1
Requires-Dist: Deprecated ==1.2.14
Requires-Dist: docutils ==0.20.1
Requires-Dist: exceptiongroup ==1.2.0
Requires-Dist: fastapi ==0.104.1
Requires-Dist: filelock ==3.13.1
Requires-Dist: flatbuffers ==23.5.26
Requires-Dist: fsspec ==2023.10.0
Requires-Dist: gitdb ==4.0.11
Requires-Dist: GitPython ==3.1.40
Requires-Dist: google-auth ==2.24.0
Requires-Dist: googleapis-common-protos ==1.61.0
Requires-Dist: grpcio ==1.59.3
Requires-Dist: h11 ==0.14.0
Requires-Dist: httptools ==0.6.1
Requires-Dist: huggingface-hub ==0.19.4
Requires-Dist: humanfriendly ==10.0
Requires-Dist: idna ==3.6
Requires-Dist: importlib-metadata ==6.9.0
Requires-Dist: importlib-resources ==6.1.1
Requires-Dist: jaraco.classes ==3.3.0
Requires-Dist: Jinja2 ==3.1.2
Requires-Dist: jsonschema ==4.20.0
Requires-Dist: jsonschema-specifications ==2023.11.2
Requires-Dist: keyring ==24.3.0
Requires-Dist: kubernetes ==28.1.0
Requires-Dist: markdown-it-py ==3.0.0
Requires-Dist: MarkupSafe ==2.1.3
Requires-Dist: mdurl ==0.1.2
Requires-Dist: mmh3 ==4.0.1
Requires-Dist: monotonic ==1.6
Requires-Dist: more-itertools ==10.1.0
Requires-Dist: mpmath ==1.3.0
Requires-Dist: nh3 ==0.2.14
Requires-Dist: numpy ==1.26.2
Requires-Dist: oauthlib ==3.2.2
Requires-Dist: onnxruntime ==1.16.3
Requires-Dist: opentelemetry-api ==1.21.0
Requires-Dist: opentelemetry-exporter-otlp-proto-common ==1.21.0
Requires-Dist: opentelemetry-exporter-otlp-proto-grpc ==1.21.0
Requires-Dist: opentelemetry-instrumentation ==0.42b0
Requires-Dist: opentelemetry-instrumentation-asgi ==0.42b0
Requires-Dist: opentelemetry-instrumentation-fastapi ==0.42b0
Requires-Dist: opentelemetry-proto ==1.21.0
Requires-Dist: opentelemetry-sdk ==1.21.0
Requires-Dist: opentelemetry-semantic-conventions ==0.42b0
Requires-Dist: opentelemetry-util-http ==0.42b0
Requires-Dist: overrides ==7.4.0
Requires-Dist: packaging ==23.2
Requires-Dist: pandas ==2.1.3
Requires-Dist: Pillow ==10.1.0
Requires-Dist: pkginfo ==1.9.6
Requires-Dist: posthog ==3.0.2
Requires-Dist: protobuf ==4.25.1
Requires-Dist: pulsar-client ==3.3.0
Requires-Dist: pyarrow ==14.0.1
Requires-Dist: pyasn1 ==0.5.1
Requires-Dist: pyasn1-modules ==0.3.0
Requires-Dist: pydantic ==2.5.2
Requires-Dist: pydantic-core ==2.14.5
Requires-Dist: pydeck ==0.8.1b0
Requires-Dist: Pygments ==2.17.2
Requires-Dist: PyPika ==0.48.9
Requires-Dist: pyproject-hooks ==1.0.0
Requires-Dist: python-dateutil ==2.8.2
Requires-Dist: python-dotenv ==1.0.0
Requires-Dist: pytz ==2023.3.post1
Requires-Dist: PyYAML ==6.0.1
Requires-Dist: readme-renderer ==42.0
Requires-Dist: referencing ==0.31.1
Requires-Dist: requests ==2.31.0
Requires-Dist: requests-oauthlib ==1.3.1
Requires-Dist: requests-toolbelt ==1.0.0
Requires-Dist: rfc3986 ==2.0.0
Requires-Dist: rich ==13.7.0
Requires-Dist: rpds-py ==0.13.2
Requires-Dist: rsa ==4.9
Requires-Dist: six ==1.16.0
Requires-Dist: smmap ==5.0.1
Requires-Dist: sniffio ==1.3.0
Requires-Dist: starlette ==0.27.0
Requires-Dist: streamlit ==1.29.0
Requires-Dist: sympy ==1.12
Requires-Dist: tenacity ==8.2.3
Requires-Dist: tokenizers ==0.15.0
Requires-Dist: toml ==0.10.2
Requires-Dist: tomli ==2.0.1
Requires-Dist: toolz ==0.12.0
Requires-Dist: tornado ==6.4
Requires-Dist: tqdm ==4.66.1
Requires-Dist: twine ==4.0.2
Requires-Dist: typer ==0.9.0
Requires-Dist: typing-extensions ==4.8.0
Requires-Dist: tzdata ==2023.3
Requires-Dist: tzlocal ==5.2
Requires-Dist: urllib3 ==1.26.18
Requires-Dist: uvicorn ==0.24.0.post1
Requires-Dist: uvloop ==0.19.0
Requires-Dist: validators ==0.22.0
Requires-Dist: watchfiles ==0.21.0
Requires-Dist: websocket-client ==1.6.4
Requires-Dist: websockets ==12.0
Requires-Dist: wrapt ==1.16.0
Requires-Dist: zipp ==3.17.0

# 📂 ChromaDBConnection

`ChromaDBConnection` is a Streamlit connection for the Chroma vector database that makes it easy to connect any Streamlit LLM-powered app to Chroma.

With `st.connection()`, connecting to a Chroma vector database becomes just a few lines of code:


```python
import streamlit as st
from streamlit_chromadb_connection.chromadb_connection import ChromadbConnection

configuration = {
    "client": "PersistentClient",
    "path": "/tmp/.chroma"
}

collection_name = "documents_collection"

conn = st.connection("chromadb",
                    type=ChromadbConnection,
                    **configuration)
documents_collection_df = conn.get_collection_data(collection_name)
st.dataframe(documents_collection_df)
```

## 📑 ChromaDBConnection API

### _connect()
There are 2 ways to connect to a Chroma client:
1. **PersistentClient**: Data is persisted on the local machine
    ```python
    import streamlit as st
    from streamlit_chromadb_connection.chromadb_connection import ChromadbConnection

    configuration = {
        "client": "PersistentClient",
        "path": "/tmp/.chroma"
    }

    conn = st.connection(name="persistent_chromadb",
                        type=ChromadbConnection,
                        **configuration)
    ```

2. **HttpClient**: Data is persisted on the server where Chroma is running (a remote or cloud host, or `localhost` as in this example)
    ```python
    import streamlit as st
    from streamlit_chromadb_connection.chromadb_connection import ChromadbConnection

    configuration = {
        "client": "HttpClient",
        "host": "localhost",
        "port": 8000,
    }

    conn = st.connection(name="http_connection",
                         type=ChromadbConnection,
                         **configuration)
    ```
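One way to switch between the two clients without editing code is to derive the configuration from the environment. The sketch below is illustrative only: the helper `build_configuration` and the `CHROMA_HOST`, `CHROMA_PORT`, and `CHROMA_PATH` variable names are assumptions, not part of this package. Only the `client`, `path`, `host`, and `port` keys come from the examples above.

```python
import os


def build_configuration(env=None):
    """Return a configuration dict for st.connection (hypothetical helper).

    If CHROMA_HOST (an assumed variable name) is set, an HttpClient
    configuration is returned; otherwise a PersistentClient one.
    """
    env = os.environ if env is None else env
    host = env.get("CHROMA_HOST")
    if host:
        return {
            "client": "HttpClient",
            "host": host,
            "port": int(env.get("CHROMA_PORT", "8000")),
        }
    return {
        "client": "PersistentClient",
        "path": env.get("CHROMA_PATH", "/tmp/.chroma"),
    }
```

The returned dict can then be passed as `**build_configuration()` to `st.connection()` in the same way as the `configuration` dicts above.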


### create_collection()
To create a Chroma collection, one needs to supply a `collection_name`, an `embedding_function_name`, an `embedding_config`, and (optionally) `metadata`.

The currently supported options for `embedding_function_name` are:
- DefaultEmbeddingFunction
- SentenceTransformerEmbeddingFunction
- OpenAIEmbeddingFunction
- CohereEmbeddingFunction
- GooglePalmEmbeddingFunction
- GoogleVertexEmbeddingFunction
- HuggingFaceEmbeddingFunction
- InstructorEmbeddingFunction
- Text2VecEmbeddingFunction
- ONNXMiniLM_L6_V2

For `DefaultEmbeddingFunction`, the `embedding_config` argument can be left empty. Other embedding functions, such as `OpenAIEmbeddingFunction`, require configuration, for example:

```python
embedding_config = {
    "api_key": "{OPENAI_API_KEY}",
    "model_name": "{OPENAI_MODEL}",
}
```
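Rather than hard-coding the key, the configuration can be built from an environment variable. This is a minimal sketch under stated assumptions: the helper `openai_embedding_config` is hypothetical, `OPENAI_API_KEY` is assumed to be the variable holding the key, and the default model name is only an example. The `api_key` and `model_name` keys are the ones shown above.

```python
import os


def openai_embedding_config(model_name="text-embedding-ada-002", env=None):
    """Build an embedding_config dict for OpenAIEmbeddingFunction (hypothetical helper)."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")  # assumed variable name
    if not api_key:
        # Fail early instead of passing an empty key to the embedding function.
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {"api_key": api_key, "model_name": model_name}
```

The result can be passed as `embedding_config=openai_embedding_config()` to `create_collection()`.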

One can also change the distance function through the `metadata` argument, for example:

```python
metadata = {"hnsw:space": "l2"} # Squared L2 norm
metadata = {"hnsw:space": "cosine"} # Cosine similarity
metadata = {"hnsw:space": "ip"} # Inner product
```
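To make the three options concrete, the sketch below computes each distance in plain Python. It assumes the formulas documented by hnswlib, the index library Chroma builds on (note that `cosine` and `ip` are expressed as distances, i.e. one minus the similarity). The function names are illustrative and not part of this package or of Chroma.

```python
import math


def squared_l2(a, b):
    """'l2': sum of squared component differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product_distance(a, b):
    """'ip': one minus the dot product."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """'cosine': one minus the cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


a = [1.0, 2.0]
b = [2.0, 4.0]  # same direction as a, twice the length
print(squared_l2(a, b), inner_product_distance(a, b), cosine_distance(a, b))
```

Because `b` points in the same direction as `a`, the cosine distance is approximately zero while the other two measures still reflect the difference in magnitude, which is the usual reason to prefer `cosine` for text embeddings of varying norm.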

Sample code to create a collection:

```python
collection_name = "documents_collection"
embedding_function_name = "DefaultEmbeddingFunction"
conn.create_collection(collection_name=collection_name,
                       embedding_function_name=embedding_function_name,
                       embedding_config={},
                       metadata = {"hnsw:space": "cosine"})
```

### get_collection_data()
This method returns the data of a collection as a DataFrame.
The `attributes` argument is a list of attributes to include as columns in the DataFrame.
The following code snippet returns all data in a collection as a DataFrame with 2 columns: `documents` and `embeddings`.

```python
collection_name = "documents_collection"
conn.get_collection_data(collection_name=collection_name,
                        attributes= ["documents", "embeddings"])
```

### delete_collection()
This method deletes the collection with the given name.

```python
collection_name = "documents_collection"
conn.delete_collection(collection_name=collection_name)
```

### upload_document()
This method uploads documents to a collection.
If embeddings are not provided, the method will embed the documents using the embedding function specified in the collection.


```python
collection_name = "documents_collection"
conn.upload_document(collection_name=collection_name,
                     documents=["lorem ipsum", "doc2", "doc3"],
                     metadatas=[{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}, {"chapter": "29", "verse": "11"}],
                     ids=["id1", "id2", "id3"],
                     embeddings=None)
```
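The `documents`, `metadatas`, and `ids` arguments are parallel lists, so it can help to build them from a single list of records to keep them aligned. The following is a sketch only: the helper `to_upload_args` and the record shape (`id`, `text`, `metadata` keys) are assumptions made for illustration, not part of this package.

```python
def to_upload_args(records):
    """Split records into parallel lists for upload_document (hypothetical helper).

    Each record is assumed to be a dict with 'id', 'text', and 'metadata' keys.
    """
    ids = [record["id"] for record in records]
    if len(set(ids)) != len(ids):
        # Sanity check: duplicate ids would refer to the same entry.
        raise ValueError("ids must be unique")
    return {
        "documents": [record["text"] for record in records],
        "metadatas": [record["metadata"] for record in records],
        "ids": ids,
    }


records = [
    {"id": "id1", "text": "lorem ipsum", "metadata": {"chapter": "3", "verse": "16"}},
    {"id": "id2", "text": "doc2", "metadata": {"chapter": "3", "verse": "5"}},
]
upload_args = to_upload_args(records)
```

The result can be unpacked into the call, for example `conn.upload_document(collection_name=collection_name, embeddings=None, **upload_args)`.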

### query()
This method retrieves the top k relevant documents for each query in the supplied list.
The result is a dataframe in which each row shows the top k relevant documents for one query.

```python
collection_name = "documents_collection"
conn.upload_document(collection_name=collection_name,
                     documents=["lorem ipsum", "doc2", "doc3"],
                     metadatas=[{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}, {"chapter": "29", "verse": "11"}],
                     ids=["id1", "id2", "id3"],
                     embeddings=None)

queried_data = conn.query(collection_name=collection_name,
                          query=["random_query1", "random_query2"],
                          num_results_limit=10,
                          attributes=["documents", "embeddings", "metadatas", "data"])
```

Metadata and document filters can be supplied through the `where_metadata_filter` and `where_document_filter` arguments, respectively, to narrow the search. For details on the usage of where filters, please refer to: https://docs.trychroma.com/usage-guide#using-where-filters

```python
queried_data = conn.query(collection_name=collection_name,
                         query=["this is"],
                         num_results_limit=10,
                         attributes=["documents", "embeddings", "metadatas", "data"],
                         where_metadata_filter={"chapter": "3"})
```
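Filters on several metadata fields, or on document text, can be composed with Chroma's `$and`, `$eq`, and `$contains` operators. The sketch below is an assumption-laden illustration: `metadata_filter`, `document_contains`, and `query_filtered` are hypothetical helpers, and `conn` is assumed to be a connection created as shown earlier. Only the `conn.query()` argument names come from the examples above.

```python
def metadata_filter(**fields):
    """Build a where filter that matches all given metadata fields (hypothetical helper)."""
    if not fields:
        raise ValueError("at least one metadata field is required")
    clauses = [{key: {"$eq": value}} for key, value in fields.items()]
    # $and expects two or more clauses, so a single clause is returned as-is.
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}


def document_contains(text):
    """Build a document filter that matches documents containing `text`."""
    return {"$contains": text}


def query_filtered(conn, collection_name, queries, where=None, where_document=None):
    """Call conn.query, passing only the filters that were provided."""
    kwargs = {}
    if where is not None:
        kwargs["where_metadata_filter"] = where
    if where_document is not None:
        kwargs["where_document_filter"] = where_document
    return conn.query(collection_name=collection_name,
                      query=queries,
                      num_results_limit=10,
                      attributes=["documents", "metadatas"],
                      **kwargs)
```

For example, `query_filtered(conn, collection_name, ["this is"], where=metadata_filter(chapter="3", verse="16"), where_document=document_contains("lorem"))` would restrict results to documents from chapter 3, verse 16 whose text contains "lorem".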


***
🎉 That's it! `ChromaDBConnection` is ready to be used with `st.connection()`. 🎉
***

## Contribution 🔥
```
author={Vu Quang Minh},
github={Dev317},
year={2023}
```
