Metadata-Version: 2.1
Name: contentmap
Version: 0.3.0
Summary: 
Author: Philippe Oger
Author-email: phil.oger@gmail.com
Requires-Python: >=3.9,<4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: aiohttp (>=3.9.1,<4.0.0)
Requires-Dist: langchain (>=0.1.8,<0.2.0)
Requires-Dist: lxml (==4.9.4)
Requires-Dist: requests (>=2.31.0,<3.0.0)
Requires-Dist: sentence-transformers (>=2.3.1,<3.0.0)
Requires-Dist: sqlite-vss (>=0.1.2,<0.2.0)
Requires-Dist: tqdm (>=4.66.1,<5.0.0)
Requires-Dist: trafilatura (>=1.6.4,<2.0.0)
Description-Content-Type: text/markdown

# Content map

Content map shares the content of a specific domain as a SQLite database, an
alternative to RSS feeds. The library builds a dataset of all the content on
your website, using the XML sitemap as a starting point.


## Installation

```bash
pip install contentmap
```

## Quickstart

To build a `contentmap.db` file containing all your content, starting from your
XML sitemap, you only need the following:

```python
from contentmap.sitemap import SitemapToContentDatabase

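# Point the builder at your site's XML sitemap.
database = SitemapToContentDatabase("https://yourblog.com/sitemap.xml")
# Crawl the listed pages and write their content to contentmap.db.
database.load()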
```
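
Once the database exists, it can be opened with any SQLite client. The table
layout is not described here, so the sketch below makes no assumption about it
and only lists whatever tables the file contains, using the standard `sqlite3`
module. The helper name `list_tables` is hypothetical and not part of
contentmap.

```python
import sqlite3
from contextlib import closing
from typing import List


def list_tables(db_path: str) -> List[str]:
    """Return the names of all tables in the SQLite file at db_path."""
    # closing() makes sure the connection is released once the query is done.
    with closing(sqlite3.connect(db_path)) as connection:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    # Assumes contentmap.db was already built in the current directory.
    print(list_tables("contentmap.db"))
```

Listing the tables first is a reasonable way to discover the schema before
writing queries against it.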

You can also limit how many URLs are crawled concurrently and set a timeout.
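
To illustrate what these two settings mean, here is a minimal sketch of a
concurrency limit and a per-page timeout using only `asyncio`. It is not
contentmap's implementation or API: `fake_fetch`, `crawl`, and the
`concurrency` and `timeout` parameter names are assumptions made for this
example, and the "request" is simulated with a sleep instead of real network
access.

```python
import asyncio
from typing import Dict, Optional


async def fake_fetch(url: str, delay: float) -> str:
    """Stand-in for an HTTP request: waits `delay` seconds, then returns text."""
    await asyncio.sleep(delay)
    return f"content of {url}"


async def crawl(
    delays: Dict[str, float], concurrency: int, timeout: float
) -> Dict[str, Optional[str]]:
    """Fetch every URL, at most `concurrency` at a time, giving up after `timeout` seconds."""
    semaphore = asyncio.Semaphore(concurrency)

    async def fetch_one(url: str, delay: float) -> Optional[str]:
        # The semaphore caps how many fetches are in flight at once.
        async with semaphore:
            try:
                return await asyncio.wait_for(fake_fetch(url, delay), timeout)
            except asyncio.TimeoutError:
                # A page that is too slow is recorded as missing, not fatal.
                return None

    pages = await asyncio.gather(*(fetch_one(u, d) for u, d in delays.items()))
    return dict(zip(delays, pages))


# Hypothetical URLs mapped to simulated response times in seconds.
results = asyncio.run(
    crawl(
        {"https://yourblog.com/a": 0.01, "https://yourblog.com/slow": 0.5},
        concurrency=2,
        timeout=0.1,
    )
)
```

A lower concurrency is gentler on the target server, while a shorter timeout
keeps one slow page from stalling the whole crawl.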
