Metadata-Version: 1.1
Name: warcprox
Version: 2.1b1.dev71
Summary: WARC writing MITM HTTP/S proxy
Home-page: https://github.com/internetarchive/warcprox
Author: Noah Levitt
Author-email: nlevitt@archive.org
License: GPL
Description: warcprox - WARC writing MITM HTTP/S proxy
        -----------------------------------------
        .. image:: https://travis-ci.org/internetarchive/warcprox.svg?branch=master
            :target: https://travis-ci.org/internetarchive/warcprox
        
        Based on the excellent and simple pymiproxy by Nadeem Douba.
        https://github.com/allfro/pymiproxy
        
        Install
        ~~~~~~~
        
        Warcprox runs on Python 2.7 or 3.4+.
        
        To install the latest release, run:
        
        ::
        
            # apt-get install libffi-dev libssl-dev python3-gdbm
            pip install warcprox
        
        You can also install the latest bleeding-edge code directly from GitHub:
        
        ::
        
            pip install git+https://github.com/internetarchive/warcprox.git
        
        
        Trusting the CA cert
        ~~~~~~~~~~~~~~~~~~~~
        
        For best results while browsing through warcprox, add its CA cert (the
        file named by ``--cacert``) as a trusted cert in your browser. If you
        don't, you will get a certificate warning when you visit each new site.
        Worse, any embedded https content on a different server will simply fail
        to load, because the browser will reject the certificate without telling
        you.
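        
        The same idea applies to command-line clients. As a rough sketch, curl
        can be pointed at the proxy and told to trust the CA cert explicitly.
        The port, the ``./warcprox-ca.pem`` path and the helper name
        ``fetch_via_warcprox`` are illustrative assumptions, not warcprox
        defaults; ``-x`` and ``--cacert`` are standard curl options.
        
        ```shell
        # Assumptions (not from warcprox itself): warcprox was started with
        #   warcprox -p 8000 -c ./warcprox-ca.pem
        # and is listening on localhost.
        WARCPROX_PROXY="${WARCPROX_PROXY:-http://localhost:8000}"
        WARCPROX_CACERT="${WARCPROX_CACERT:-./warcprox-ca.pem}"
        
        # Hypothetical helper: fetch a URL through the proxy, trusting the
        # given CA cert for the certificates warcprox generates.
        fetch_via_warcprox() {
            curl --silent --show-error \
                 -x "$WARCPROX_PROXY" \
                 --cacert "$WARCPROX_CACERT" \
                 "$1"
        }
        ```
        
        With warcprox running, ``fetch_via_warcprox https://example.com/``
        would be expected to complete without a certificate error. Dropping
        ``--cacert`` should reproduce the rejection described above.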
        
        Usage
        ~~~~~
        
        ::
        
            usage: warcprox [-h] [-p PORT] [-b ADDRESS] [-c CACERT]
                            [--certs-dir CERTS_DIR] [-d DIRECTORY] [-z] [-n PREFIX]
                            [-s SIZE] [--rollover-idle-time ROLLOVER_IDLE_TIME]
                            [-g DIGEST_ALGORITHM] [--base32]
                            [--method-filter HTTP_METHOD]
                            [--stats-db-file STATS_DB_FILE] [-P PLAYBACK_PORT]
                            [--playback-index-db-file PLAYBACK_INDEX_DB_FILE]
                            [-j DEDUP_DB_FILE | --rethinkdb-servers RETHINKDB_SERVERS]
                            [--rethinkdb-db RETHINKDB_DB] [--rethinkdb-big-table]
                            [--kafka-broker-list KAFKA_BROKER_LIST]
                            [--kafka-capture-feed-topic KAFKA_CAPTURE_FEED_TOPIC]
                            [--onion-tor-socks-proxy ONION_TOR_SOCKS_PROXY]
                            [--version] [-v] [--trace] [-q]
        
            warcprox - WARC writing MITM HTTP/S proxy
        
            optional arguments:
              -h, --help            show this help message and exit
              -p PORT, --port PORT  port to listen on (default: 8000)
              -b ADDRESS, --address ADDRESS
                                    address to listen on (default: localhost)
              -c CACERT, --cacert CACERT
                                    CA certificate file; if file does not exist, it
                                    will be created (default: ./MacBook-Pro.local-
                                    warcprox-ca.pem)
              --certs-dir CERTS_DIR
                                    where to store and load generated certificates
                                    (default: ./MacBook-Pro.local-warcprox-ca)
              -d DIRECTORY, --dir DIRECTORY
                                    where to write warcs (default: ./warcs)
              -z, --gzip            write gzip-compressed warc records (default:
                                    False)
              -n PREFIX, --prefix PREFIX
                                    WARC filename prefix (default: WARCPROX)
              -s SIZE, --size SIZE  WARC file rollover size threshold in bytes
                                    (default: 1000000000)
              --rollover-idle-time ROLLOVER_IDLE_TIME
                                    WARC file rollover idle time threshold in seconds
                                    (so that Friday's last open WARC doesn't sit there
                                    all weekend waiting for more data) (default: None)
              -g DIGEST_ALGORITHM, --digest-algorithm DIGEST_ALGORITHM
                                    digest algorithm, one of sha1, md5, sha512,
                                    sha224, sha384, sha256 (default: sha1)
              --base32              write digests in Base32 instead of hex (default:
                                    False)
              --method-filter HTTP_METHOD
                                    only record requests with the given http method(s)
                                    (can be used more than once) (default: None)
              --stats-db-file STATS_DB_FILE
                                    persistent statistics database file; empty string
                                    or /dev/null disables statistics tracking
                                    (default: ./warcprox-stats.db)
              -P PLAYBACK_PORT, --playback-port PLAYBACK_PORT
                                    port to listen on for instant playback (default:
                                    None)
              --playback-index-db-file PLAYBACK_INDEX_DB_FILE
                                    playback index database file (only used if
                                    --playback-port is specified) (default:
                                    ./warcprox-playback-index.db)
              -j DEDUP_DB_FILE, --dedup-db-file DEDUP_DB_FILE
                                    persistent deduplication database file; empty
                                    string or /dev/null disables deduplication
                                    (default: ./warcprox-dedup.db)
              --rethinkdb-servers RETHINKDB_SERVERS
                                    rethinkdb servers, used for dedup and stats if
                                    specified; e.g.
                                    db0.foo.org,db0.foo.org:38015,db1.foo.org
                                    (default: None)
              --rethinkdb-db RETHINKDB_DB
                                    rethinkdb database name (ignored unless
                                    --rethinkdb-servers is specified) (default:
                                    warcprox)
              --rethinkdb-big-table
                                    use a big rethinkdb table called "captures",
                                    instead of a small table called "dedup"; table is
                                    suitable for use as index for playback (ignored
                                    unless --rethinkdb-servers is specified) (default:
                                    False)
              --kafka-broker-list KAFKA_BROKER_LIST
                                    kafka broker list for capture feed (default: None)
              --kafka-capture-feed-topic KAFKA_CAPTURE_FEED_TOPIC
                                    kafka capture feed topic (default: None)
              --onion-tor-socks-proxy ONION_TOR_SOCKS_PROXY
                                    host:port of tor socks proxy, used only to connect
                                    to .onion sites (default: None)
              --version             show program's version number and exit
              -v, --verbose
              --trace
              -q, --quiet
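        
        The options above can be combined freely. A minimal sketch of one
        possible invocation follows; the helper name ``warcprox_cmd`` and the
        chosen values (such as a one-hour idle rollover) are assumptions for
        illustration, and only flags listed in the usage text are used. The
        helper prints the command instead of running it, so it can be
        inspected first.
        
        ```shell
        # Hypothetical helper: print a warcprox command line built only from
        # flags listed in the usage text. Nothing is executed here.
        warcprox_cmd() {
            port="${1:-8000}"      # -p: port to listen on
            dir="${2:-./warcs}"    # -d: where to write warcs
            idle="${3:-3600}"      # --rollover-idle-time in seconds (assumed value)
            echo warcprox -p "$port" -d "$dir" -z \
                --rollover-idle-time "$idle" \
                -g sha256 --base32
        }
        
        warcprox_cmd                       # all defaults
        warcprox_cmd 8080 /tmp/warcs 600   # custom port, directory, idle time
        ```
        
        Here ``-z`` turns on gzip-compressed records, and ``-g sha256
        --base32`` switches the digest from the default hex SHA-1 to
        Base32-encoded SHA-256.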
        
        License
        ~~~~~~~
        
        Warcprox is a derivative work of pymiproxy, which is licensed under the GPL.
        Thus warcprox is also licensed under the GPL.
        
        * Copyright (C) 2012 Cygnos Corporation
        * Copyright (C) 2013-2017 Internet Archive
        
        This program is free software; you can redistribute it and/or
        modify it under the terms of the GNU General Public License
        as published by the Free Software Foundation; either version 2
        of the License, or (at your option) any later version.
        
        This program is distributed in the hope that it will be useful,
        but WITHOUT ANY WARRANTY; without even the implied warranty of
        MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
        GNU General Public License for more details.
        
        You should have received a copy of the GNU General Public License
        along with this program; if not, write to the Free Software
        Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
        
        
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: License :: OSI Approved :: GNU General Public License (GPL)
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Topic :: Internet :: Proxy Servers
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: System :: Archiving
