Metadata-Version: 2.1
Name: streamlit-chromadb-connection
Version: 1.0.0
Summary: A simple adapter connection for any Streamlit LLM-powered app to use ChromaDB vector database.
Author-email: Dev317 <mineskiroxro@gmail.com>
Project-URL: Homepage, https://github.com/Dev317/streamlit_chromadb_connection
Project-URL: Issues, https://github.com/Dev317/streamlit_chromadb_connection/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: altair==5.2.0
Requires-Dist: annotated-types==0.6.0
Requires-Dist: anyio==3.7.1
Requires-Dist: asgiref==3.7.2
Requires-Dist: attrs==23.1.0
Requires-Dist: backoff==2.2.1
Requires-Dist: bcrypt==4.1.1
Requires-Dist: blinker==1.7.0
Requires-Dist: build==1.0.3
Requires-Dist: cachetools==5.3.2
Requires-Dist: certifi==2023.11.17
Requires-Dist: charset-normalizer==3.3.2
Requires-Dist: chroma-hnswlib==0.7.3
Requires-Dist: chromadb==0.4.18
Requires-Dist: click==8.1.7
Requires-Dist: coloredlogs==15.0.1
Requires-Dist: Deprecated==1.2.14
Requires-Dist: docutils==0.20.1
Requires-Dist: exceptiongroup==1.2.0
Requires-Dist: fastapi==0.104.1
Requires-Dist: filelock==3.13.1
Requires-Dist: flatbuffers==23.5.26
Requires-Dist: fsspec==2023.10.0
Requires-Dist: gitdb==4.0.11
Requires-Dist: GitPython==3.1.40
Requires-Dist: google-auth==2.24.0
Requires-Dist: googleapis-common-protos==1.61.0
Requires-Dist: grpcio==1.59.3
Requires-Dist: h11==0.14.0
Requires-Dist: httptools==0.6.1
Requires-Dist: huggingface-hub==0.19.4
Requires-Dist: humanfriendly==10.0
Requires-Dist: idna==3.6
Requires-Dist: importlib-metadata==6.9.0
Requires-Dist: importlib-resources==6.1.1
Requires-Dist: jaraco.classes==3.3.0
Requires-Dist: Jinja2==3.1.2
Requires-Dist: jsonschema==4.20.0
Requires-Dist: jsonschema-specifications==2023.11.2
Requires-Dist: keyring==24.3.0
Requires-Dist: kubernetes==28.1.0
Requires-Dist: markdown-it-py==3.0.0
Requires-Dist: MarkupSafe==2.1.3
Requires-Dist: mdurl==0.1.2
Requires-Dist: mmh3==4.0.1
Requires-Dist: monotonic==1.6
Requires-Dist: more-itertools==10.1.0
Requires-Dist: mpmath==1.3.0
Requires-Dist: nh3==0.2.14
Requires-Dist: numpy==1.26.2
Requires-Dist: oauthlib==3.2.2
Requires-Dist: onnxruntime==1.16.3
Requires-Dist: opentelemetry-api==1.21.0
Requires-Dist: opentelemetry-exporter-otlp-proto-common==1.21.0
Requires-Dist: opentelemetry-exporter-otlp-proto-grpc==1.21.0
Requires-Dist: opentelemetry-instrumentation==0.42b0
Requires-Dist: opentelemetry-instrumentation-asgi==0.42b0
Requires-Dist: opentelemetry-instrumentation-fastapi==0.42b0
Requires-Dist: opentelemetry-proto==1.21.0
Requires-Dist: opentelemetry-sdk==1.21.0
Requires-Dist: opentelemetry-semantic-conventions==0.42b0
Requires-Dist: opentelemetry-util-http==0.42b0
Requires-Dist: overrides==7.4.0
Requires-Dist: packaging==23.2
Requires-Dist: pandas==2.1.3
Requires-Dist: Pillow==10.1.0
Requires-Dist: pkginfo==1.9.6
Requires-Dist: posthog==3.0.2
Requires-Dist: protobuf==4.25.1
Requires-Dist: pulsar-client==3.3.0
Requires-Dist: pyarrow==14.0.1
Requires-Dist: pyasn1==0.5.1
Requires-Dist: pyasn1-modules==0.3.0
Requires-Dist: pydantic==2.5.2
Requires-Dist: pydantic_core==2.14.5
Requires-Dist: pydeck==0.8.1b0
Requires-Dist: Pygments==2.17.2
Requires-Dist: PyPika==0.48.9
Requires-Dist: pyproject_hooks==1.0.0
Requires-Dist: python-dateutil==2.8.2
Requires-Dist: python-dotenv==1.0.0
Requires-Dist: pytz==2023.3.post1
Requires-Dist: PyYAML==6.0.1
Requires-Dist: readme-renderer==42.0
Requires-Dist: referencing==0.31.1
Requires-Dist: requests==2.31.0
Requires-Dist: requests-oauthlib==1.3.1
Requires-Dist: requests-toolbelt==1.0.0
Requires-Dist: rfc3986==2.0.0
Requires-Dist: rich==13.7.0
Requires-Dist: rpds-py==0.13.2
Requires-Dist: rsa==4.9
Requires-Dist: six==1.16.0
Requires-Dist: smmap==5.0.1
Requires-Dist: sniffio==1.3.0
Requires-Dist: starlette==0.27.0
Requires-Dist: streamlit==1.29.0
Requires-Dist: sympy==1.12
Requires-Dist: tenacity==8.2.3
Requires-Dist: tokenizers==0.15.0
Requires-Dist: toml==0.10.2
Requires-Dist: tomli==2.0.1
Requires-Dist: toolz==0.12.0
Requires-Dist: tornado==6.4
Requires-Dist: tqdm==4.66.1
Requires-Dist: twine==4.0.2
Requires-Dist: typer==0.9.0
Requires-Dist: typing_extensions==4.8.0
Requires-Dist: tzdata==2023.3
Requires-Dist: tzlocal==5.2
Requires-Dist: urllib3==1.26.18
Requires-Dist: uvicorn==0.24.0.post1
Requires-Dist: uvloop==0.19.0
Requires-Dist: validators==0.22.0
Requires-Dist: watchfiles==0.21.0
Requires-Dist: websocket-client==1.6.4
Requires-Dist: websockets==12.0
Requires-Dist: wrapt==1.16.0
Requires-Dist: zipp==3.17.0

# 📂 ChromaDBConnection

![Screenshot](demo_ss.png)

`ChromaDBConnection` is a Streamlit connection for the Chroma vector database that makes it easy to connect any Streamlit LLM-powered app to Chroma.

With `st.connection()`, connecting to a Chroma vector database becomes just a few lines of code:


```python
import streamlit as st
from streamlit_chromadb_connection.chromadb_connection import ChromadbConnection

configuration = {
    "client": "PersistentClient",
    "path": "/tmp/.chroma"
}

collection_name = "documents_collection"

conn = st.connection("chromadb",
                     type=ChromadbConnection,
                     **configuration)
documents_collection_df = conn.get_collection_data(collection_name)
st.dataframe(documents_collection_df)
```

## 📑 ChromaDBConnection API

### _connect()
There are two ways to connect to a Chroma client:
1. **PersistentClient**: data is persisted on the local machine, in the directory given by `path`
    ```python
    import streamlit as st
    from streamlit_chromadb_connection.chromadb_connection import ChromadbConnection

    configuration = {
        "client": "PersistentClient",
        "path": "/tmp/.chroma"
    }

    conn = st.connection(name="persistent_chromadb",
                         type=ChromadbConnection,
                         **configuration)
    ```

2. **HttpClient**: data is persisted on the Chroma server that the app connects to over HTTP, identified by `host` and `port`
    ```python
    import streamlit as st
    from streamlit_chromadb_connection.chromadb_connection import ChromadbConnection

    configuration = {
        "client": "HttpClient",
        "host": "localhost",
        "port": 8000,
    }

    conn = st.connection(name="http_connection",
                         type=ChromadbConnection,
                         **configuration)
    ```
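
Both configurations are plain dictionaries passed to `st.connection()` as keyword arguments. The sketch below checks a configuration before connecting, assuming (from the two examples above) that `PersistentClient` needs a `path` and `HttpClient` needs a `host` and an integer `port`. The helper `validate_configuration` and the `REQUIRED_KEYS` table are hypothetical and not part of this package.

```python
from typing import Any, Dict

# Required keys per client type, inferred from the two examples above.
# This is an assumption for illustration, not the package's own validation.
REQUIRED_KEYS = {
    "PersistentClient": {"path"},
    "HttpClient": {"host", "port"},
}


def validate_configuration(configuration: Dict[str, Any]) -> None:
    """Hypothetical helper: raise ValueError if a configuration looks incomplete."""
    client = configuration.get("client")
    if client not in REQUIRED_KEYS:
        raise ValueError(
            f"Unknown client {client!r}; expected one of {sorted(REQUIRED_KEYS)}"
        )
    # Set difference: required keys that the configuration does not provide
    missing = REQUIRED_KEYS[client] - set(configuration)
    if missing:
        raise ValueError(f"{client} configuration is missing: {sorted(missing)}")
    if client == "HttpClient" and not isinstance(configuration["port"], int):
        raise ValueError("port must be an integer")


validate_configuration({"client": "PersistentClient", "path": "/tmp/.chroma"})
validate_configuration({"client": "HttpClient", "host": "localhost", "port": 8000})
```

Running such a check just before `st.connection()` surfaces a readable error early instead of a failure deep inside client construction.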


### create_collection()
To create a Chroma collection, supply a `collection_name`, an `embedding_function_name`, an `embedding_config` and, optionally, `metadata`.

The currently supported options for `embedding_function_name` are:
- DefaultEmbeddingFunction
- SentenceTransformerEmbeddingFunction
- OpenAIEmbeddingFunction
- CohereEmbeddingFunction
- GooglePalmEmbeddingFunction
- GoogleVertexEmbeddingFunction
- HuggingFaceEmbeddingFunction
- InstructorEmbeddingFunction
- Text2VecEmbeddingFunction
- ONNXMiniLM_L6_V2

For `DefaultEmbeddingFunction`, the `embedding_config` argument can be left empty (e.g. `{}`). Other embedding functions, such as `OpenAIEmbeddingFunction`, need configuration such as:

```python
embedding_config = {
    "api_key": "{OPENAI_API_KEY}",
    "model_name": "{OPENAI_MODEL}",
}
```
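
Rather than pasting an API key into source code, the `embedding_config` can be built from environment variables. A minimal sketch, assuming the key lives in an `OPENAI_API_KEY` environment variable; the helper name and the `OPENAI_MODEL` variable are illustrative, and `text-embedding-ada-002` is only used as a fallback model name.

```python
import os
from typing import Dict


def openai_embedding_config_from_env(
    default_model: str = "text-embedding-ada-002",
) -> Dict[str, str]:
    """Hypothetical helper: build an embedding_config for OpenAIEmbeddingFunction."""
    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("Set the OPENAI_API_KEY environment variable first")
    return {
        "api_key": api_key,
        # OPENAI_MODEL is an illustrative variable name, not a Chroma setting
        "model_name": os.environ.get("OPENAI_MODEL", default_model),
    }
```

In a deployed Streamlit app the key could equally be read from `st.secrets`; the point is simply to keep secrets out of the code that builds the collection.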

One can also set the distance function through the `metadata` argument, for example:

```python
# Pick one of the following (Chroma's default is "l2"):
metadata = {"hnsw:space": "l2"}      # Squared L2 norm
metadata = {"hnsw:space": "cosine"}  # Cosine similarity
metadata = {"hnsw:space": "ip"}      # Inner product
```
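
To make the three options concrete, here is a plain-Python sketch of the distance formulas as described in Chroma's documentation, where a smaller value means "more similar". Chroma computes these inside its HNSW index; the functions below are illustrative only and are not called by the connection.

```python
import math
from typing import Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    # "l2": sum of squared component differences (no square root is taken)
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # "ip": 1 minus the dot product, so a larger dot product gives a smaller distance
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # "cosine": 1 minus cosine similarity; 0 for vectors pointing the same way
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


a, b = [1.0, 0.0], [0.0, 2.0]
print(squared_l2(a, b))              # 5.0
print(inner_product_distance(a, b))  # 1.0
print(cosine_distance(a, b))         # 1.0
```

Cosine ignores vector length, while squared L2 and inner product do not, which is why the choice matters when embeddings are not normalized.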

Sample code to create a collection:

```python
collection_name = "documents_collection"
embedding_function_name = "DefaultEmbeddingFunction"
conn.create_collection(collection_name=collection_name,
                       embedding_function_name=embedding_function_name,
                       embedding_config={},
                       metadata={"hnsw:space": "cosine"})
```

### get_collection_data()
This method returns a DataFrame containing the data stored in a collection.
The `attributes` argument lists which attributes to include as columns of the DataFrame.
The following snippet returns all records in a collection as a DataFrame with two columns, `documents` and `embeddings`.

```python
collection_name = "documents_collection"
conn.get_collection_data(collection_name=collection_name,
                         attributes=["documents", "embeddings"])
```
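
Chroma's `collection.get()` returns a dictionary of parallel lists keyed by `ids`, `embeddings`, `documents`, `metadatas` and so on, with fields that were not requested set to `None`. The sketch below shows, assuming that shape, how an `attributes` list could select columns into a row-oriented table. `to_rows` is a hypothetical helper and `sample` is made-up illustrative data, not output from this package.

```python
from typing import Any, Dict, List


def to_rows(result: Dict[str, Any], attributes: List[str]) -> List[Dict[str, Any]]:
    """Hypothetical helper: turn a get()-style dict of parallel lists into rows."""
    missing = [attr for attr in attributes if result.get(attr) is None]
    if missing:
        raise KeyError(f"Attributes not present in result: {missing}")
    # Zip the selected columns together so each row holds one record
    columns = [result[attr] for attr in attributes]
    return [dict(zip(attributes, values)) for values in zip(*columns)]


# Illustrative data shaped like a collection.get(include=[...]) result
sample = {
    "ids": ["id1", "id2"],
    "documents": ["lorem ipsum", "doc2"],
    "embeddings": [[0.1, 0.2], [0.3, 0.4]],
    "metadatas": None,  # not requested, so absent
}
for row in to_rows(sample, ["documents", "embeddings"]):
    print(row)
```

Passing either the selected dict of lists or these rows to `pandas.DataFrame` yields one column per attribute.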

### delete_collection()
This method deletes the collection with the given name.

```python
collection_name = "documents_collection"
conn.delete_collection(collection_name=collection_name)
```

### upload_document()
This method uploads documents to a collection.
If embeddings are not provided, the method will embed the documents using the embedding function specified in the collection.


```python
collection_name = "documents_collection"
conn.upload_document(collection_name=collection_name,
                     documents=["lorem ipsum", "doc2", "doc3"],
                     metadatas=[{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}, {"chapter": "29", "verse": "11"}],
                     ids=["id1", "id2", "id3"],
                     embeddings=None)
```
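
Chroma expects `documents`, `metadatas`, `ids` and, when given, `embeddings` to be parallel lists with one entry per record, and ids must be unique. A hypothetical pre-flight check like the one below can catch mismatches before calling `upload_document()`; it is a sketch under those assumptions, not validation performed by this package.

```python
from collections import Counter
from typing import Any, Dict, List, Optional


def check_upload_args(
    documents: List[str],
    metadatas: List[Dict[str, Any]],
    ids: List[str],
    embeddings: Optional[List[List[float]]] = None,
) -> None:
    """Hypothetical helper: raise ValueError if the parallel lists don't line up."""
    lengths = {"documents": len(documents), "metadatas": len(metadatas), "ids": len(ids)}
    if embeddings is not None:
        lengths["embeddings"] = len(embeddings)
    if len(set(lengths.values())) != 1:
        raise ValueError(f"List lengths differ: {lengths}")
    # Any id appearing more than once would collide inside the collection
    duplicates = [doc_id for doc_id, count in Counter(ids).items() if count > 1]
    if duplicates:
        raise ValueError(f"Duplicate ids: {duplicates}")


check_upload_args(
    documents=["lorem ipsum", "doc2", "doc3"],
    metadatas=[
        {"chapter": "3", "verse": "16"},
        {"chapter": "3", "verse": "5"},
        {"chapter": "29", "verse": "11"},
    ],
    ids=["id1", "id2", "id3"],
)
```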

### query()
This method retrieves the top k most relevant documents for each query in the supplied list, where k is set by `num_results_limit`.
The result is a DataFrame in which each row holds the top k relevant documents for one query.

```python
collection_name = "documents_collection"
conn.upload_document(collection_name=collection_name,
                     documents=["lorem ipsum", "doc2", "doc3"],
                     metadatas=[{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}, {"chapter": "29", "verse": "11"}],
                     ids=["id1", "id2", "id3"],
                     embeddings=None)

queried_data = conn.query(collection_name=collection_name,
                          query=["random_query1", "random_query2"],
                          num_results_limit=10,
                          attributes=["documents", "embeddings", "metadatas", "data"])
```
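
Conceptually, a query embeds each query string and returns the stored documents nearest to it. Chroma does this with an approximate HNSW index; the exact brute-force sketch below only illustrates the idea, using made-up 2-D embeddings and squared L2 distance. `top_k` is a hypothetical helper, not part of this package or of Chroma.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(
    query: Sequence[float], store: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Exact nearest neighbours as (id, distance) pairs, closest first."""
    scored = ((doc_id, squared_l2(query, vec)) for doc_id, vec in store.items())
    # nsmallest returns fewer than k items when the store is smaller than k
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


# Illustrative embeddings; real ones come from the collection's embedding function
store = {"id1": [0.0, 0.0], "id2": [1.0, 1.0], "id3": [5.0, 5.0]}
queries = [[0.9, 1.2], [4.0, 4.0]]
for q in queries:
    # One result list per query, mirroring one row per query in the DataFrame
    print(top_k(q, store, k=2))
```

Brute force costs one distance computation per stored vector per query, which is fine for small collections; HNSW trades a little accuracy for much faster lookups at scale.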

Metadata and document filters can also be passed through the `where_metadata_filter` and `where_document_filter` arguments, respectively, to narrow the search. For details on the `where` filter syntax, see: https://docs.trychroma.com/usage-guide#using-where-filters

```python
queried_data = conn.query(collection_name=collection_name,
                          query=["this is"],
                          num_results_limit=10,
                          attributes=["documents", "embeddings", "metadatas", "data"],
                          where_metadata_filter={"chapter": "3"})
```
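
To illustrate how a metadata filter narrows results, here is a small pure-Python evaluator for a subset of Chroma's `where` syntax: the `{"key": value}` equality shorthand, the comparison operators `$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`, and the logical `$and`/`$or`. It is a sketch for understanding only; Chroma evaluates filters itself, and `matches_where` is a hypothetical name. In this sketch a field missing from the metadata never matches.

```python
from typing import Any, Dict

_COMPARISONS = {
    "$eq": lambda a, b: a == b,
    "$ne": lambda a, b: a != b,
    "$gt": lambda a, b: a > b,
    "$gte": lambda a, b: a >= b,
    "$lt": lambda a, b: a < b,
    "$lte": lambda a, b: a <= b,
}


def matches_where(metadata: Dict[str, Any], where: Dict[str, Any]) -> bool:
    """Return True if a metadata dict satisfies a (subset of) Chroma where filter."""
    for key, condition in where.items():
        if key == "$and":
            if not all(matches_where(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches_where(metadata, sub) for sub in condition):
                return False
        else:
            if key not in metadata:
                return False  # in this sketch a missing field never matches
            if isinstance(condition, dict):
                # Operator form: {"field": {"$op": value}}
                ((op, value),) = condition.items()
                if not _COMPARISONS[op](metadata[key], value):
                    return False
            elif metadata[key] != condition:
                # Shorthand form: {"field": value} means equality
                return False
    return True


metadatas = {
    "id1": {"chapter": "3", "verse": "16"},
    "id2": {"chapter": "3", "verse": "5"},
    "id3": {"chapter": "29", "verse": "11"},
}
print([i for i, m in metadatas.items() if matches_where(m, {"chapter": "3"})])
```

Note that comparisons in this sketch are type-sensitive: metadata stored as the string `"3"` does not match the integer `3`.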


***
🎉 That's it! `ChromaDBConnection` is ready to be used with `st.connection()`. 🎉
***

## Contribution 🔥
```
author={Vu Quang Minh},
github={Dev317},
year={2023}
```
