Metadata-Version: 2.1
Name: engineCrawler
Version: 1.1
Summary: Scraping images from the web.
Home-page: UNKNOWN
Author: Erich Garcia
Author-email: erich.info.work@gmail.com
License: UNKNOWN
Keywords: crawler,scraping,search engines,artificial intelligence,databases,images
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: Microsoft :: Windows :: Windows 7
Classifier: Operating System :: Microsoft :: Windows :: Windows 8
Classifier: Operating System :: Microsoft :: Windows :: Windows 8.1
Classifier: Operating System :: Microsoft :: Windows :: Windows 10
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Topic :: Database
Classifier: Topic :: Internet
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Image Processing
Classifier: Topic :: Scientific/Engineering :: Image Recognition
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Utilities
Description-Content-Type: text/markdown
Requires-Dist: selenium
Requires-Dist: requests
Requires-Dist: imagehash
Requires-Dist: PySocks (!=1.5.7,>=1.5.6)

# Image Scraper

A simple Python module for scraping images from the web, created for AI development.

## Features

* Scrape images from google.com and duckduckgo.com.
* Find duplicate images and remove them.
* Build large image databases from the engines' top searches for each supplied keyword.
* Optionally use the Tor network with Firefox for scraping.
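The duplicate-removal step can be illustrated with a minimal sketch. It is not the module's own implementation: the package depends on `imagehash`, which suggests perceptual hashing, whereas this sketch swaps in exact SHA-256 hashing from the standard library and therefore only catches byte-identical files. The function name `remove_exact_duplicates` is hypothetical.

```python
import hashlib
from pathlib import Path


def remove_exact_duplicates(folder):
    """Delete files whose bytes match an earlier file; return the removed paths.

    Hypothetical helper, not part of engineCrawler. Exact hashing only:
    resized or re-encoded copies of the same picture are NOT detected.
    """
    seen = {}     # digest -> first path found with that content
    removed = []
    # Sort so that the file kept for each digest is deterministic.
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            path.unlink()
            removed.append(path)
        else:
            seen[digest] = path
    return removed
```

Calling `remove_exact_duplicates("engines/cats")` would keep the first file of each identical group (in sorted name order) and delete the rest.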

## Basic Usage

```
>> python imageCrawler.py -k cats dogs
>> Select by number the queries to ignore:
>> ( 0 ) cats
>> ( 1 ) cats with hats
>> 1
>> Start with cats download 4000 at engines\cats
>> 100%
>> Select by number the queries to ignore:
>> ( 0 ) dogs
>> ( 1 ) dogs with hats
>> 1
>> Start with dogs download 4000 at engines\dogs
>> 100%
>> Searching duplicated...
>> END
```

### Results:

    \engines
        \cats
            \ keys.json
            \ +4000 image files
        \dogs
            \ keys.json
            \ +4000 image files

## Installing

### (1) Install.

* [Firefox](https://www.mozilla.org/en-US/firefox/new/)
* [Tor Browser](https://www.torproject.org/) (OPTIONAL).
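For readers curious how Firefox can be pointed at Tor, here is a minimal sketch. How engineCrawler itself configures the browser is not shown in this README, so the helper name `tor_firefox_prefs` and the default host and port are assumptions. The preference names are standard Firefox proxy settings. A standalone Tor daemon usually listens on port 9050, and the Tor bundled with Tor Browser on 9150.

```python
def tor_firefox_prefs(host="127.0.0.1", port=9050):
    """Return Firefox preferences that route traffic through a SOCKS5 proxy.

    Hypothetical helper, not part of engineCrawler. Use port 9150 if Tor
    is provided by a running Tor Browser instead of a standalone daemon.
    """
    return {
        "network.proxy.type": 1,                # 1 = manual proxy configuration
        "network.proxy.socks": host,
        "network.proxy.socks_port": port,
        "network.proxy.socks_version": 5,
        "network.proxy.socks_remote_dns": True,  # resolve DNS through the proxy
    }


# Applying the preferences with selenium (requires Firefox and geckodriver):
#
#   from selenium import webdriver
#   options = webdriver.FirefoxOptions()
#   for name, value in tor_firefox_prefs().items():
#       options.set_preference(name, value)
#   driver = webdriver.Firefox(options=options)
```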

### (2) Download and add to PATH.

#### geckodriver compatibility [check](https://stackoverflow.com/questions/45329528/which-firefox-browser-versions-supported-for-given-geckodriver-version)

* [geckodriver](https://github.com/mozilla/geckodriver/releases)
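A quick way to confirm that the driver is reachable on the PATH is sketched below. The helper name `missing_tools` is hypothetical and not part of engineCrawler. It only checks that an executable with the given name can be found, not that its version matches the installed Firefox.

```python
import shutil


def missing_tools(names=("geckodriver",)):
    """Return the executables from `names` that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("geckodriver is available.")
```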

### (3) Run this command.
```
pip install engineCrawler
```

