Metadata-Version: 2.1
Name: jwm.robotstxt
Version: 1.0.8
Summary: Provides Python access to Google's parser for robots.txt files as used by their GoogleBot web scraper.
Author-email: Joel Morley <jwmorley73@gmail.com>
License: MIT License
        
        Copyright (c) 2024 Joel William Morley
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/jwmorley73/jwm.robotstxt
Project-URL: Documentation, https://jwmorley73.github.io/jwm.robotstxt/
Keywords: robotstxt,wrapper,google,parser
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: doc
Requires-Dist: sphinx; extra == "doc"
Requires-Dist: furo; extra == "doc"
Provides-Extra: test
Requires-Dist: coverage; extra == "test"
Requires-Dist: pytest; extra == "test"
Provides-Extra: format
Requires-Dist: isort; extra == "format"
Requires-Dist: black; extra == "format"
Requires-Dist: clang-format; extra == "format"
Provides-Extra: dev
Requires-Dist: jwm.robotstxt[doc,format,test]; extra == "dev"
Requires-Dist: cibuildwheel; extra == "dev"
Requires-Dist: setuptools; extra == "dev"
Requires-Dist: cmake; extra == "dev"
Requires-Dist: pybind11; extra == "dev"

[![build status](https://github.com/jwmorley73/jwm.robotstxt/actions/workflows/release.yaml/badge.svg)](https://github.com/jwmorley73/jwm.robotstxt/actions/workflows/release.yaml)

# jwm.robotstxt

## Python Wrapper for Google's Robotstxt Parser

Provides Python access to Google's parser for `robots.txt` files, as used by their `GoogleBot` web scraper.

Websites may provide an optional `robots.txt` file in their domain's root to govern the access and behavior of web scrapers. `GoogleBot`, one of the most famous web scrapers, is responsible for promoting this standard, and sites interested in SEO will closely conform to `GoogleBot` behavior.

All credit for the parser goes to the Google team who created, open sourced, and promoted it.

> SEO (Search Engine Optimization): The process of modifying a website's content or metadata to boost its ranking in search engines' page indexes. Higher rankings lead to higher positions in user searches, which in turn lead to more visitors. For further details, see the [SEO Wikipedia page](https://en.wikipedia.org/wiki/Search_engine_optimization).

## Usage

Basic usage with the `RobotsMatcher` class provided by Google.
```python
import jwm.robotstxt.googlebot

robotstxt = """
user-agent: GoodBot
allow: /path
"""

matcher = jwm.robotstxt.googlebot.RobotsMatcher()
assert matcher.AllowedByRobots(robotstxt, ("GoodBot",), "/path")
```
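
When several paths need to be checked against the same `robots.txt` body, the call above can be wrapped in a small helper. The following is a minimal sketch, not part of the library: `filter_allowed` is a hypothetical name, and it assumes only that the matcher exposes `AllowedByRobots(robots_body, user_agents, url)` as used in the example above.

```python
from typing import Iterable, List


def filter_allowed(
    matcher, robots_body: str, user_agent: str, urls: Iterable[str]
) -> List[str]:
    """Return the URLs that `user_agent` may fetch according to `robots_body`.

    `matcher` is any object with an `AllowedByRobots(robots_body, user_agents, url)`
    method, such as `jwm.robotstxt.googlebot.RobotsMatcher()` (assumed signature,
    taken from the basic usage example).
    """
    return [
        url
        for url in urls
        if matcher.AllowedByRobots(robots_body, (user_agent,), url)
    ]
```

Passing the matcher in as an argument keeps the helper independent of the package import, so it can be exercised with a stand-in object in tests.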

Check out the [documentation](https://jwmorley73.github.io/jwm.robotstxt/) for further details. For more use cases, see the test cases for [jwm.robotstxt](https://github.com/jwmorley73/jwm.robotstxt/blob/7a4bb603e6abedf39805f76c5999cedcf1f0ed07/tests/jwm/robotstxt/test_googlebot.py) and [robotstxt](https://github.com/google/robotstxt/blob/a732377373e8bbee9f720b52020e2a8d5dd19cf8/robots_test.cc).

## Installation

Install from PyPI under the `jwm.robotstxt` distribution.

```shell
pip install jwm.robotstxt
```

Import into your program through the `jwm.robotstxt.googlebot` package.

```python
import jwm.robotstxt.googlebot
```

### Virtual Environment

It is highly recommended to install Python projects into a virtual environment; see [PEP 405](https://peps.python.org/pep-0405/) for the motivation.

Create a virtual environment in the `.venv` directory.

```shell
python3 -m venv ./.venv
```

Activate with the correct command for your system.
```shell
# Linux/macOS
. ./.venv/bin/activate
```

```shell
# Windows
.\.venv\Scripts\activate
```
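
To confirm that activation worked, the interpreter itself can be asked whether it is running inside a virtual environment. This is a small sketch based on the behavior described in PEP 405; the function name `in_virtual_environment` is illustrative only.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # PEP 405: inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
```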

### Installing from source

Make sure you have cloned the repository **and** its submodules.

```shell
git clone --recurse-submodules https://github.com/jwmorley73/jwm.robotstxt.git
```

Install the project using pip. This builds the required `robotstxt` static library files and links them into the produced Python package.

```shell
pip install .
```

If you want to include the developer tooling, add the `dev` optional dependencies.

```shell
pip install ".[dev]"
```
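
After installing, a quick check that the extension module can be found may save some debugging. The snippet below is a hedged sketch that uses only the standard library; `module_available` is a hypothetical helper, and the module name passed to it is the import path given earlier in this README.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be located by the import system."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is missing.
        return False


if __name__ == "__main__":
    print(
        "jwm.robotstxt.googlebot importable:",
        module_available("jwm.robotstxt.googlebot"),
    )
```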

## Known Issues

 - 32-bit Windows is not supported.
