Metadata-Version: 2.1
Name: scrapy-selenium-middleware
Version: 0.0.3
Summary: Scrapy middleware that downloads a page's HTML source with Selenium, lets the spider interact with the web driver in the request context, and returns an HtmlResponse to the spider
Home-page: https://github.com/Tal-Leibman/scrapy-selenium-middleware
Author: Tal Leibman
Author-email: leibman2@gmail.com
License: UNKNOWN
Keywords: scrapy,selenium,middleware,proxy,web scraping,render javascript,selenium-wire,headless browser
Platform: UNKNOWN
Description-Content-Type: text/markdown
Requires-Dist: scrapy (==2.4.0)
Requires-Dist: selenium-wire (==2.1.1)
Requires-Dist: selenium (==3.141.0)

# scrapy-selenium-middleware

## requirements
* This downloader middleware should be used inside an existing [Scrapy](https://scrapy.org/) project
* Install Firefox and [geckodriver](https://github.com/mozilla/geckodriver/releases) on the machine running this middleware

## pip
* `pip install scrapy-selenium-middleware`

## usage example
The middleware reads its configuration from the [Scrapy project settings](https://docs.scrapy.org/en/latest/topics/settings.html).

Add the following settings to your Scrapy project's `settings.py` file:
```python
DOWNLOADER_MIDDLEWARES = {"scrapy_selenium_middleware.SeleniumDownloader":451}
CONCURRENT_REQUESTS = 1 # multiple concurrent browsers are not supported yet
SELENIUM_IS_HEADLESS = False
SELENIUM_PROXY = "http://user:password@my-proxy-server:port" # set to None to not use a proxy
SELENIUM_USER_AGENT = "Mozilla/5.0 (<system-information>) <platform> (<platform-details>) <extensions>" # the User-Agent header value, without the "User-Agent:" header name
SELENIUM_REQUEST_RECORD_SCOPE = ["api*"] # a list of regular expressions; incoming requests whose URL matches any of them are recorded
SELENIUM_FIREFOX_PROFILE_SETTINGS = {}
```
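Note that the `SELENIUM_REQUEST_RECORD_SCOPE` entries are regular expressions, not shell globs, so `"api*"` matches any URL containing `ap` followed by zero or more `i` characters. A minimal sketch of this matching behavior using Python's standard `re` module (the exact call the middleware uses internally is an assumption here):

```python
import re

# Hypothetical record-scope list copied from the settings example above.
# As a regular expression, "api*" means: "ap" followed by zero or more "i"s.
record_scope = ["api*"]

def is_recorded(url: str) -> bool:
    """Return True if any scope pattern matches the URL (assumed re.search semantics)."""
    return any(re.search(pattern, url) for pattern in record_scope)

print(is_recorded("https://example.com/api/v1/items"))   # matches: contains "api"
print(is_recorded("https://example.com/static/app.js"))  # also matches: "ap" alone satisfies "api*"
print(is_recorded("https://example.com/home"))           # no match
```

If you want glob-style behavior instead, anchor and escape the pattern explicitly (e.g. `"/api.*"`), since a bare `*` quantifies the preceding character.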

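`SELENIUM_FIREFOX_PROFILE_SETTINGS` takes a dict of Firefox profile preferences. The keys below are standard Firefox `about:config` preference names; the specific choices (disabling image loading and notification prompts) are only illustrative, and it is an assumption that the middleware applies each pair to the browser profile (e.g. via Selenium's `FirefoxProfile.set_preference`):

```python
# Illustrative Firefox about:config preferences; which preferences you need
# depends entirely on your crawl.
SELENIUM_FIREFOX_PROFILE_SETTINGS = {
    "permissions.default.image": 2,         # do not load images
    "dom.webnotifications.enabled": False,  # suppress web notification prompts
}
```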