Metadata-Version: 2.1
Name: rir-api
Version: 0.1.0
Summary: A reverse image search API for image captioning and visual question answering.
Home-page: https://github.com/mi92/reverse-image-rag
Author: Michael Moor
Author-email: 
License: UNKNOWN
Platform: UNKNOWN
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: playwright ==1.41.2
Requires-Dist: openai ==1.12.0
Requires-Dist: requests ==2.31.0
Requires-Dist: pandas ==2.2.0
Requires-Dist: numpy ==1.26.4
Requires-Dist: requests

# Reverse Image RAG - (RIR) 



![](img/slide1.png)

![](img/slide2.png)

### Synopsis: 
We build an API to retrieval-augment vision-language models with visual context retrieved from the web.

Concretely, for a query image and query text (e.g. a question), we leverage reverse image search to find most similar images and their titles / captions.

The final product is a VLM-API that allows to automatically leverage reverse-image-search based retrieval augmentation.  


### Usage:  


```python
api = RIR_API(openai_api_key)

image_url = "https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcSgN8RDkURVE8mgOf-n02TqJdC2l1o5cVFA32NpZtuVp8MaFfZY"
query_text = "What is in this image?"
response = api.query_with_image(image_url, query_text)
# >> runs reverse image search
# >> formats image-text context prompt
# >> queries VLM with full query
```



