Metadata-Version: 2.1
Name: unique_queue
Version: 0.2
Summary: A UniqueQueue class: a FIFO queue that ignores attempts to re-add items it has already seen, even after they have been popped.
Project-URL: Homepage, https://gitopia.com/StudioBreeze/UniqueQueue
Project-URL: Issues, https://gitopia.com/StudioBreeze/UniqueQueue/issues
Project-URL: License, https://gitopia.com/StudioBreeze/UniqueQueue/tree/main/LICENSE
Author-email: StudioBreeze <StudioBreeze+unique_queue@proton.me>
License-File: LICENSE
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.5
Description-Content-Type: text/markdown

# UniqueQueue

A Python `UniqueQueue` class: a FIFO queue that ignores attempts to re-add items it has already seen, even after those items have been popped.
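To make that behavior concrete, here is a minimal sketch of how a queue with these semantics could be built from a `deque` and a `set`. It is not the package's actual implementation: the class name `SketchUniqueQueue` and the `push` method are hypothetical, and the other method names simply mirror those used in the example further down.

```python
from collections import deque


class SketchUniqueQueue:
    """Illustrative only: a FIFO queue that remembers every item it has ever held."""

    def __init__(self, items=()):
        self._pending = deque()  # items waiting to be popped, in FIFO order
        self._seen = set()       # every item ever accepted, popped or not
        self.extend(items)

    def push(self, item):
        """Hypothetical helper: add item unless already seen; return True if added."""
        if item in self._seen:
            return False
        self._seen.add(item)
        self._pending.append(item)
        return True

    def extend(self, items):
        for item in items:
            self.push(item)

    def pop(self):
        # The item leaves the pending deque but stays in the seen set
        return self._pending.popleft()

    def empty(self):
        return not self._pending

    def remaining_count(self):
        return len(self._pending)

    def completed_count(self):
        return len(self._seen) - len(self._pending)


q = SketchUniqueQueue(["a", "b"])
first = q.pop()        # "a" leaves the queue...
q.extend(["a", "c"])   # ...and is still rejected when re-added; "c" is new

drained = [first]
while not q.empty():
    drained.append(q.pop())

print(drained)  # ['a', 'b', 'c']
```

Keeping the seen set separate from the pending deque is what lets the queue reject an item after it has been popped. The trade-off is that memory grows with every distinct item ever added, and items must be hashable.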

## Example

When web scraping, you typically keep a queue of pending pages to visit, built from the links found on pages you have already scraped. You must also avoid re-visiting pages, or you risk duplicates in the output dataset. A `UniqueQueue` covers both needs: it serves pending pages in order and ignores any URL it has already accepted.

This is an example of one such simple scraper:

```python
from unique_queue import UniqueQueue
import requests

def extract_urls(html: str) -> list[str]:
    """Crude function to extract a list of URL links from a webpage."""
    urls: list[str] = []
    if '<a href="' in html:
        for x in html.split('<a href="')[1:]:
            url = x.split('"')[0]
            if 'choosealicense.com' in url and url.startswith('https://'):
                urls.append(url)
    return urls

# Seed URL to start traversal
seed_url = "https://choosealicense.com/"

# Initialize a UniqueQueue instance
q = UniqueQueue([seed_url])

# For this example, the goal is to count
# the number of times the string "example" appears
number_of_times_EXAMPLE_appears = 0

# Perform URL traversal
while not q.empty():
    # Get the next URL from the queue
    url = q.pop()

    # Load the page (the timeout keeps a stalled server from hanging the loop)
    html = requests.get(url, timeout=10).text

    # Update the end result goal
    number_of_times_EXAMPLE_appears += html.lower().count('example')

    # Add the new URLs to search
    q.extend(extract_urls(html))

    # Print stats
    print(f"Completed: {q.completed_count()}. Remaining: {q.remaining_count()}.")

print(f"The string 'example' appears {number_of_times_EXAMPLE_appears} times on 'choosealicense.com'.")
```
