Metadata-Version: 2.1
Name: webparser-py
Version: 0.3
Summary: A web parser wrapper on top of lxml and selectolax
Home-page: https://contentstudio.io
Author: Muhammad Azhar
Author-email: azhar@contentstudio.io
Maintainer: Muhammad Azhar
Maintainer-email: azhar@contentstudio.io
License: MIT License
Platform: UNKNOWN
Provides-Extra: full
Requires-Dist: lxml
Requires-Dist: feedparser
Requires-Dist: requests
Provides-Extra: full
Requires-Dist: urllib3; extra == 'full'
Requires-Dist: certifi; extra == 'full'

A web content parser using Python lxml


Compatibility
-------------

The library is compatible with Python3. Python2 is currently not supported.


Usage
-----

Install the package using pip.

```
pip install webparser-py
```

**Convert to Document**

Accept the html content document, convert it to the doc element, if we want to convert relative links to absolute links, 
we pass the domain url to the absolute links.

**convert_to_doc()**

```
from webparser.parser import convert_to_doc

doc = convert_to_doc('HTML content', 'http://yourwebsite.com')

```

**class FeedParser()**

Feed parser class is used for parsing the feed through the response content or using a URL.


```
from webparser.parser import FeedParser

feed = FeedParser() # optional feed URL can be provided.
parsed_links = feed.parse(url='http://viralnova.com/feed') # url will override constructor feed URL. 
```

**has_rss_feed()**

Check if the website/URL has a RSS feed link present.

 - Check the document with Mimetype of links using XPATH.
 - Fuzzy URL search e.g /feed at the end of the website URL. (Attempted if no links for the RSS URL found)

```
from webparser.parser import has_rss_feed
rss_links = has_rss_feed(doc=html_content, url=website_url)
```



