Metadata-Version: 1.1
Name: s3
Version: 3.0.0
Summary: Python module which connects to Amazon's S3 REST API
Home-page: https://bitbucket.org/prometheus/s3/
Author: Paul Wexler
Author-email: paul@prometheusresearch.com
License: MIT
Description: 
        =========
        s3
        =========
        
        .. contents::
        
        Overview
        ========
        
        s3 is a connector to S3, Amazon's Simple Storage Service REST API.
        
        Use it to upload, download, delete, copy, test files for existence in S3, or 
        update their metadata.
        
        S3 files may have metadata in addition to their content.  
        Metadata is a set of key/value pairs.  
        Metadata may be set when the file is uploaded 
        or it can be updated subsequently.
        
        S3 files are stored in S3 buckets.  Buckets can be created, listed, 
        configured, and deleted.  The bucket configuration can be read and the 
        bucket contents can be listed.
        
        In addition to the s3 Python module, 
        this package contains a command line tool also named s3.  
        The tool imports the module and offers a command line
        interface to some of the module's capability.
        
        Installation
        ============
        
        From PyPI
        ::
        
            $ pip install s3 
        
        From source
        ::
        
            $ hg clone ssh://hg@bitbucket.org/prometheus/s3
            $ pip install -e s3 
        
        The installation is successful if you can import s3 
        and run the command line tool.  The following commands 
        must produce no errors:
        ::
        
            $ python -c 'import s3'
            $ s3 --help
        
        API to remote storage
        =====================
        
        S3 Buckets
        ----------
        
        Buckets store files.  Buckets may be created and deleted.  They may be
        listed, configured, and loaded with files.  The configuration can be read,
        and the files in the bucket can be listed.
        
        Bucket names must be unique across S3 so it is best to use a unique prefix on
        all bucket names.  S3 forbids underscores in bucket names, and although
        it allows periods, these confound DNS and should be avoided.
        For example, at Prometheus Research 
        we prefix all of our bucket names with: **com-prometheus-**
        
        All the bucket configuration options work the same way - the caller
        provides XML or JSON data and perhaps headers or params as well.
        
        s3 accepts a python object for the data argument instead of a string.
        The object will be converted to XML or JSON as required.
        
        Likewise, s3 returns a python dict instead of the XML or JSON string
        returned by S3.  However, that string is readily available if need be,
        because the response returned by requests.request() is exposed to the
        caller.
        
        S3 Filenames
        ------------
        
        An S3 file name consists of a bucket and a key.  This pair of
        strings uniquely identifies the file within S3.
        
        The S3Name class is instantiated with a key and a bucket; the key
        is required and the bucket defaults to None.
        
        The Storage class methods take a **remote_name** argument which
        can be either a string which is the key, or an instance of the
        S3Name class.  When no bucket is given (or the bucket is None) then
        the default_bucket established when the connection is instantiated
        is used.  If no bucket is given (or the bucket is None) and there
        is no default bucket then a ValueError is raised.
        
        In other words, the S3Name class provides a means of using a bucket
        other than the default_bucket.
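        
        The bucket resolution rule described above can be sketched in plain
        Python.  This is only an illustration under stated assumptions, not the
        module's implementation: ``NameSketch`` and ``resolve`` are hypothetical
        names standing in for S3Name and for the logic inside Storage.
        
        ```python
        class NameSketch(object):
            """Hypothetical stand-in for S3Name: a key plus an optional bucket."""
        
            def __init__(self, key, bucket=None):
                self.key = key
                self.bucket = bucket
        
        
        def resolve(remote_name, default_bucket=None):
            """Return (bucket, key) following the rule described above."""
            if isinstance(remote_name, str):
                # A plain string is just the key; no bucket was given.
                remote_name = NameSketch(remote_name)
            bucket = remote_name.bucket
            if bucket is None:
                # Fall back to the default bucket of the connection.
                bucket = default_bucket
            if bucket is None:
                raise ValueError('no bucket given and no default bucket')
            return bucket, remote_name.key
        ```
        
        A plain string resolves against the default bucket, while a name object
        with its own bucket overrides it.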
        
        S3 Directories
        --------------
        
        Although S3 storage is flat (buckets simply contain keys), S3 lets you
        impose a directory tree structure on your bucket by using a delimiter
        in your keys.
        
        For example, if you name a key 'a/b/f', and use '/' as the delimiter,
        then S3 will consider that 'a' is a directory, 'b' is a sub-directory
        of 'a', and 'f' is a file in 'b'.
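        
        The grouping that a delimiter produces can be imitated locally.  The
        sketch below is an assumption-laden illustration, not code from this
        package: ``split_listing`` is a hypothetical helper that separates the
        keys directly under a prefix from the "directories" one level below it.
        
        ```python
        def split_listing(keys, delimiter='/', prefix=''):
            """Group keys the way a delimiter-based listing does.
        
            Returns (files, common_prefixes): the keys directly under prefix,
            and the distinct directory-like prefixes one level below it.
            """
            files = []
            common_prefixes = []
            for key in sorted(keys):
                if not key.startswith(prefix):
                    continue
                rest = key[len(prefix):]
                head, found, _tail = rest.partition(delimiter)
                if not found:
                    # No delimiter after the prefix: this is a file.
                    files.append(key)
                else:
                    # Everything up to the first delimiter acts as a directory.
                    directory = prefix + head + delimiter
                    if directory not in common_prefixes:
                        common_prefixes.append(directory)
            return files, common_prefixes
        ```
        
        With the keys 'a/b/f', 'a/g' and 'h', listing the top level gives the
        file 'h' and the directory 'a/', and listing with the prefix 'a/' gives
        the file 'a/g' and the directory 'a/b/'.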
        
        
        Headers and Metadata
        --------------------
        
        Additional http headers may be sent using the methods which write
        data.  These methods accept an optional **headers** argument which
        is a python dict.  The headers control various aspects of how the
        file may be handled.  S3 supports a variety of headers which are not
        discussed here; see Amazon's S3 documentation for details.
        
        Those headers whose key begins with the special prefix:
        **x-amz-meta-** are considered to be metadata headers and are
        used to set the metadata attributes of the file.
        
        The methods which read files also return the metadata which
        consists of only those response headers which begin with
        **x-amz-meta-**.
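        
        Selecting the metadata from a set of headers amounts to a prefix
        filter.  A minimal sketch, assuming the headers arrive as an ordinary
        dict (``metadata_from_headers`` is a hypothetical helper, not part of
        the module):
        
        ```python
        META_PREFIX = 'x-amz-meta-'
        
        
        def metadata_from_headers(headers):
            """Keep only the headers whose name starts with x-amz-meta-."""
            return {
                name: value
                for name, value in headers.items()
                # Header names are case-insensitive, so compare in lower case.
                if name.lower().startswith(META_PREFIX)
            }
        ```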
        
        Python classes for S3 data
        --------------------------
        
        To facilitate the transfer of data between S3 and applications, various
        classes are defined which correspond to the data returned by S3.
        
        All attributes of these classes are strings.
        
        * S3Bucket
            * creation_date
            * name
        
        * S3Key
            * e_tag
            * key
            * last_modified
            * owner
            * size
            * storage_class
        
        * S3Owner
            * display_name
            * id
        
        XML strings and Python objects
        ------------------------------
        
        An XML string consists of a series of nested tags.  An XML tag can be
        represented in python as an entry in a dict.  An OrderedDict from the
        collections module should be used when the order of the keys is
        important.
        
        The opening tag (everything between the '<' and the '>') is the key and
        everything between the opening tag and the closing tag is the value of
        the key.
        
        Since every value must be enclosed in a tag, not every python object can
        represent XML in this way.  In particular, lists may only contain dicts
        which have a single key.
        
        For example this XML::
        
            <a xmlns="foo">
                <b1>
                    <c1> 1 </c1>
                </b1>
                <b2>
                    <c2> 2 </c2>
                </b2>
            </a>
        
        is equivalent to this object::
        
            {'a xmlns="foo"': [{'b1': {'c1': 1}}, {'b2': {'c2': 2}}] }
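        
        The convention can be made concrete with a small converter.  This is a
        sketch written for illustration only; ``to_xml`` is a hypothetical
        function and may differ from what the module does internally (for
        example in whitespace handling).
        
        ```python
        from xml.sax.saxutils import escape
        
        
        def to_xml(obj):
            """Render a python object as an XML string using the convention above."""
            if isinstance(obj, dict):
                parts = []
                for tag, value in obj.items():
                    # The key is the full opening tag, attributes included; the
                    # closing tag uses only the element name (the first word).
                    name = tag.split()[0]
                    parts.append('<%s>%s</%s>' % (tag, to_xml(value), name))
                return ''.join(parts)
            if isinstance(obj, (list, tuple)):
                for item in obj:
                    if not (isinstance(item, dict) and len(item) == 1):
                        raise ValueError('lists may only contain single-key dicts')
                return ''.join(to_xml(item) for item in obj)
            # Anything else is leaf text and must be escaped.
            return escape(str(obj))
        ```
        
        Applied to the object shown above, the sketch produces the same nested
        tags as the XML example, without the indentation.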
        
        Storage Methods
        ---------------
        
        The arguments **remote_source**, **remote_destination**, and
        **remote_name** may be either a string, or an S3Name instance.
        
        **local_name** is a string and is the name of the file on the
        local system.  This string is passed directly to open().
        
        **bucket** is a string and is the name of the bucket.
        
        **headers** is a python dict used to encode additional request headers.
        
        **params** is either a python dict used to encode the request
        parameters, or a string containing all the text of the url query string
        after the '?'.
        
        **data** is a string or an object and is the body of the message.  The
        object will be converted to an XML or JSON string as appropriate.
        
        All methods return on success or raise StorageError on failure.
        
        Upon return **storage.response** contains the raw response object which
        was returned by the requests module.  So for example,
        storage.response.headers contains the response headers returned by S3.
        See http://docs.python-requests.org/en/latest/api/ for a description of
        the response object.
        
        See http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketOps.html
        for a description of the available bucket operations and their arguments.
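        
        The two forms accepted for **params** described above end up as the
        same thing: the text after the '?' in the url.  A minimal sketch using
        only the standard library (``query_string`` is a hypothetical helper;
        the module itself may leave this encoding to requests):
        
        ```python
        from urllib.parse import urlencode
        
        
        def query_string(params):
            """Return the text that follows the '?' in the request url."""
            if isinstance(params, str):
                # Already a query string: use it unchanged.
                return params
            # Sort the items so that the result is deterministic.
            return urlencode(sorted(params.items()))
        ```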
        
        **storage.bucket_create(bucket, headers={}, data=None)**
            Create a bucket named **bucket**.  **headers** may be used to set
            either ACL or explicit access permissions.  **data** may be used to
            override the default region.  If data is None, data is set as
            follows::
        
                data = {
                        'CreateBucketConfiguration'
                        ' xmlns="http://s3.amazonaws.com/doc/2006-03-01/"': {
                                'LocationConstraint': self.connection.region}}
        
        **storage.bucket_delete(bucket)**
            Delete a bucket named **bucket**.
        
        **storage.bucket_delete_cors(bucket)**
            Delete cors configuration of bucket named **bucket**.
        
        **storage.bucket_delete_lifecycle(bucket)**
            Delete lifecycle configuration of bucket named **bucket**.
        
        **storage.bucket_delete_policy(bucket)**
            Delete policy of bucket named **bucket**.
        
        **storage.bucket_delete_tagging(bucket)**
            Delete tagging configuration of bucket named **bucket**.
        
        **storage.bucket_delete_website(bucket)**
            Delete website configuration of bucket named **bucket**.
        
        **exists = storage.bucket_exists(bucket)**
            Test if **bucket** exists in storage.
        
            exists - boolean.
        
        **storage.bucket_get(bucket, params={})**
            Gets the next block of keys from the bucket based on params.
        
        **d = storage.bucket_get_acl(bucket)**
            Returns bucket acl configuration as a dict.
        
        **d = storage.bucket_get_cors(bucket)**
            Returns bucket cors configuration as a dict.
        
        **d = storage.bucket_get_lifecycle(bucket)**
            Returns bucket lifecycle as a dict.
        
        **d = storage.bucket_get_location(bucket)**
            Returns bucket location configuration as a dict.
        
        **d = storage.bucket_get_logging(bucket)**
            Returns bucket logging configuration as a dict.
        
        **d = storage.bucket_get_notification(bucket)**
            Returns bucket notification configuration as a dict.
        
        **d = storage.bucket_get_policy(bucket)**
            Returns bucket policy as a dict.
        
        **d = storage.bucket_get_request_payment(bucket)**
            Returns bucket requestPayment configuration as a dict.
        
        **d = storage.bucket_get_tagging(bucket)**
            Returns bucket tagging configuration as a dict.
        
        **d = storage.bucket_get_versioning(bucket)**
            Returns bucket versioning configuration as a dict.
        
        **d = storage.bucket_get_versions(bucket, params={})**
            Returns bucket versions as a dict.
        
        **d = storage.bucket_get_website(bucket)**
            Returns bucket website configuration as a dict.
        
        **for bucket in storage.bucket_list():**
            Returns a generator which yields every bucket in the
            authenticated user's account.
        
            Each bucket is returned as an S3Bucket instance.
        
        **for key in storage.bucket_list_keys(bucket, delimiter=None, prefix=None, params={}):**
            Returns a generator which yields all the keys in the bucket.
            
            Each key is returned as an S3Key instance.
        
            * bucket - the name of the bucket to list
            * delimiter - used to request common prefixes
            * prefix - used to filter the listing
            * params - additional parameters.
        
            When delimiter is used, the keys (i.e. file names) are returned
            first, followed by the common prefixes (i.e. directory names).
            Each key is returned as an S3Key instance.  Each common prefix
            is returned as a string.
        
            As a convenience, the delimiter and prefix may be
            provided as either keyword arguments or as keys in params.  If the
            arguments are provided, they are used to update params.  In any case,
            params are passed to S3.
        
            See http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGET.html
            for a description of delimiter, prefix, and the other parameters.
        
        **storage.bucket_set_acl(bucket, headers={}, data='')**
            Configure bucket acl using xml data, or request headers.
        
        **storage.bucket_set_cors(bucket, data='')**
            Configure bucket cors with xml data.
        
        **storage.bucket_set_lifecycle(bucket, data='')**
            Configure bucket lifecycle with xml data.
        
        **storage.bucket_set_logging(bucket, data='')**
            Configure bucket logging with xml data.
        
        **storage.bucket_set_notification(bucket, data='')**
            Configure bucket notification with xml data.
        
        **storage.bucket_set_policy(bucket, data='')**
            Configure bucket policy using json data.
        
        **storage.bucket_set_request_payment(bucket, data='')**
            Configure bucket requestPayment with xml data.
        
        **storage.bucket_set_tagging(bucket, data='')**
            Configure bucket tagging with xml data.
        
        **storage.bucket_set_versioning(bucket, headers={}, data='')**
            Configure bucket versioning using xml data and request headers.
        
        **storage.bucket_set_website(bucket, data='')**
            Configure bucket website with xml data.
        
        **storage.copy(remote_source, remote_destination, headers={})**
            Copy **remote_source** to **remote_destination**.
        
            The destination metadata is copied from **headers** when it
            contains metadata; otherwise it is copied from the source
            metadata.
        
        **storage.delete(remote_name)**
            Delete **remote_name** from storage.
        
        **exists, metadata = storage.exists(remote_name)**
            Test if **remote_name** exists in storage, retrieve its
            metadata if it does.
        
            exists - boolean, metadata - dict.
        
        **metadata = storage.read(remote_name, local_name)**
            Download **remote_name** from storage, save it locally as
            **local_name** and retrieve its metadata.
        
            metadata - dict.
        
        **storage.update_metadata(remote_name, headers)**
            Update (replace) the metadata associated with **remote_name**
            with the metadata headers in **headers**.
        
        **storage.write(local_name, remote_name, headers={})**
            Upload **local_name** to storage as **remote_name**, and set
            its metadata if any metadata headers are in **headers**.
        
        StorageError
        ------------
        
        There are two forms of exceptions.  
        
        The first form is when a request to S3 completes but fails.  For example a 
        read request may fail because the user does not have read permission.  
        In this case a StorageError is raised with:
        
        * msg - The name of the method that was called (e.g. 'read', 'exists', etc.)
          
        * exception - A detailed error message
        
        * response - The raw response object returned by requests.
        
        The second form is when any other exception happens, for example a disk
        or network error.  In this case a StorageError is raised with:
        
        * msg - A detailed error message.
        
        * exception - The exception object
        
        * response - None
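        
        Code that catches the error can tell the two forms apart by checking
        whether a response is attached.  The sketch below uses a hypothetical
        stand-in class, ``ErrorSketch``, which carries the three attributes
        listed above; it is not the module's StorageError.
        
        ```python
        class ErrorSketch(Exception):
            """Hypothetical stand-in carrying msg, exception and response."""
        
            def __init__(self, msg, exception, response=None):
                Exception.__init__(self, msg)
                self.msg = msg
                self.exception = exception
                self.response = response
        
        
        def describe(error):
            """Summarize an error according to which of the two forms it is."""
            # Compare with None explicitly: a requests response object is
            # falsy when its status code indicates an error.
            if error.response is not None:
                # First form: S3 answered, but the request failed.
                return 'S3 rejected %s: %s' % (error.msg, error.exception)
            # Second form: something else went wrong (disk, network, ...).
            return 'local failure: %s (%r)' % (error.msg, error.exception)
        ```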
        
        Usage
        =====
        
        Configuration
        -------------
        
        First configure your yaml file.
        
        - **access_key_id** and **secret_access_key** are generated by the S3 
          account manager.  They are effectively the username and password for the 
          account.
        
        - **default_bucket** is the name of the default bucket to use when referencing
          S3 files.  Bucket names must be unique across all of S3, so by convention we
          use a prefix on all our bucket names: com-prometheus-  (NOTE: Amazon forbids
          underscores in bucket names, and although it allows periods, periods
          confound DNS, so it is best not to use periods in bucket names.)
          
        - **endpoint** and **region** are the Amazon server url to connect to and
          its associated region.  See 
          http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region for a list
          of the available endpoints and their associated regions.
        
        - **tls** selects the url scheme: True means use https://, False means use
          http://.  The default is True.
        
        - **retry** contains values used to retry requests.request().
          If a request fails with an error listed in `status_codes`,
          and the `limit` of tries has not been reached, 
          then a retry message is logged,
          the program sleeps for `interval` seconds, 
          and the request is sent again.   
          Default is::
        
            retry:
                limit: 5
                interval: 2.5
                status_codes: 
                  - 104
        
          **limit** is the number of times to try to send the request.
          0 means unlimited retries.
        
          **interval** is the number of seconds to wait between retries.
        
          **status_codes** is a list of request status codes (errors) to retry.
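        
        The retry behaviour described above can be sketched as a loop.  This is
        an illustration only: ``send_with_retry`` and its ``send`` argument (a
        callable that returns a status code) are hypothetical, and the retry
        message is left out for brevity.
        
        ```python
        import time
        
        
        def send_with_retry(send, limit=5, interval=2.5, status_codes=(104,),
                            sleep=time.sleep):
            """Call send() until it returns a code that is not retryable.
        
            limit is the total number of tries; 0 means unlimited.
            Returns (code, tries).
            """
            tries = 0
            while True:
                tries += 1
                code = send()
                if code not in status_codes:
                    # Not a retryable code: hand the result back.
                    return code, tries
                if limit and tries >= limit:
                    # The limit of tries has been reached: give up.
                    return code, tries
                sleep(interval)
        ```
        
        The ``sleep`` argument exists so that the waiting can be replaced when
        experimenting with the sketch.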
          
        Here is an example s3.yaml
        ::
        
            ---
            s3: 
                access_key_id: "XXXXX"
                secret_access_key: "YYYYYYY"
                default_bucket: "ZZZZZZZ"
                endpoint: "s3-us-west-2.amazonaws.com"
                region: "us-west-2"
        
        Next configure your S3 bucket permissions.  You can use s3 to create, 
        configure, and manage your buckets (see the examples below) or you can 
        use Amazon's web interface:
        
        - Log onto your Amazon account.
        - Create a bucket or click on an existing bucket.
        - Click on Properties.
        - Click on Permissions.
        - Click on Edit Bucket Policy.
        
        Here is an example policy with the required permissions:
        ::
        
            {
                "Version": "2008-10-17",
                "Id": "Policyxxxxxxxxxxxxx",
                "Statement": [
                    {
                        "Sid": "Stmtxxxxxxxxxxxxx",
                        "Effect": "Allow",
                        "Principal": {
                            "AWS": "arn:aws:iam::xxxxxxxxxxxx:user/XXXXXXX"
                        },
                        "Action": [
                            "s3:AbortMultipartUpload",
                            "s3:GetObjectAcl",
                            "s3:GetObjectVersion",
                            "s3:DeleteObject",
                            "s3:DeleteObjectVersion",
                            "s3:GetObject",
                            "s3:PutObjectAcl",
                            "s3:PutObjectVersionAcl",
                            "s3:ListMultipartUploadParts",
                            "s3:PutObject",
                            "s3:GetObjectVersionAcl"
                        ],
                        "Resource": [
                            "arn:aws:s3:::com.prometheus.cgtest-1/*",
                            "arn:aws:s3:::com.prometheus.cgtest-1"
                        ]
                    }
                ]
            }
        
        Examples
        --------
        
        Once the yaml file is configured, instantiate an S3Connection and 
        use that connection to instantiate a Storage instance.
        ::
        
            import s3
            import yaml
            
            with open('s3.yaml', 'r') as fi:
                # safe_load parses plain data only and needs no Loader argument
                config = yaml.safe_load(fi)
        
            connection = s3.S3Connection(**config['s3'])    
            storage = s3.Storage(connection)
        
        Then you call methods on the Storage instance.  
        
        The following code creates a bucket called "com-prometheus-my-bucket" and  
        asserts the bucket exists.  Then it deletes the bucket, and asserts the 
        bucket does not exist.
        ::
        
            my_bucket_name = 'com-prometheus-my-bucket'
            storage.bucket_create(my_bucket_name)
            assert storage.bucket_exists(my_bucket_name)
            storage.bucket_delete(my_bucket_name)
            assert not storage.bucket_exists(my_bucket_name)
        
        The following code lists all the buckets and all the keys in each bucket.
        ::
        
            for bucket in storage.bucket_list():
                print('%s %s' % (bucket.name, bucket.creation_date))
                for key in storage.bucket_list_keys(bucket.name):
                    print('\t%s %s %s %s' % (
                        key.key, key.size, key.last_modified,
                        key.owner.display_name))
                    
        The following code uses the default bucket and uploads a file named "example" 
        from the local filesystem as "example-in-s3" in s3.  It then checks that 
        "example-in-s3" exists in storage, downloads the file as "example-from-s3", 
        compares the original with the downloaded copy to ensure they are the same, 
        deletes "example-in-s3", and finally checks that it is no longer in storage.
        ::
        
            import subprocess
            try:
                storage.write("example", "example-in-s3")
                exists, metadata = storage.exists("example-in-s3")
                assert exists
                metadata = storage.read("example-in-s3", "example-from-s3")
                assert 0 == subprocess.call(['diff', "example", "example-from-s3"])
                storage.delete("example-in-s3")
                exists, metadata = storage.exists("example-in-s3")
                assert not exists
            except s3.StorageError as e:
                print('failed: %s' % e)
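        
        The comparison step above shells out to ``diff``, which is not
        available on every system.  As a possible alternative, the standard
        library can compare the two files directly.  ``same_content`` is a
        hypothetical helper name used only for this sketch.
        
        ```python
        import filecmp
        
        
        def same_content(path_a, path_b):
            """Return True when both files hold exactly the same bytes."""
            # shallow=False forces a comparison of the contents instead of
            # relying on the os.stat() signatures alone.
            return filecmp.cmp(path_a, path_b, shallow=False)
        ```
        
        In the example above, the ``subprocess.call`` assertion could then be
        written as ``assert same_content("example", "example-from-s3")``.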
                
        The following code again uploads "example" as "example-in-s3".  This time it 
        uses the bucket "my-other-bucket" explicitly, and it sets some metadata and 
        checks that the metadata is set correctly.  Then it changes the metadata 
        and checks that as well.
        ::
        
            headers = {
                'x-amz-meta-state': 'unprocessed',
                }
            remote_name = s3.S3Name("example-in-s3", bucket="my-other-bucket")
            try:
                storage.write("example", remote_name, headers=headers)
                exists, metadata = storage.exists(remote_name)
                assert exists
                assert metadata == headers
                headers['x-amz-meta-state'] = 'processed'
                storage.update_metadata(remote_name, headers)
                metadata = storage.read(remote_name, "example-from-s3")
                assert metadata == headers
            except s3.StorageError as e:
                print('failed: %s' % e)
        
        The following code configures "com-prometheus-my-bucket" with a policy 
        that restricts "myuser" to write-only.  myuser can write files but 
        cannot read them back, delete them, or even list them.
        ::
        
            storage.bucket_set_policy("com-prometheus-my-bucket", data={
                    "Version": "2008-10-17",
                    "Id": "BucketUploadNoDelete",
                    "Statement": [
                            {
                            "Sid": "Stmt01",
                            "Effect": "Allow",
                            "Principal": {
                                    "AWS": "arn:aws:iam::123456789012:user/myuser"
                                    },
                            "Action": [
                                    "s3:AbortMultipartUpload",
                                    "s3:ListMultipartUploadParts",
                                    "s3:PutObject",
                                    ],
                            "Resource": [
                                    "arn:aws:s3:::com-prometheus-my-bucket/*",
                                    "arn:aws:s3:::com-prometheus-my-bucket"
                                    ]
                            }
                            ]
                    })
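        
        When the same policy is needed for several buckets or users, the object
        can be built by a small function.  This is a sketch, not part of the
        module: ``write_only_policy`` is a hypothetical helper, and
        ``json.dumps`` is used only to show the kind of JSON string such an
        object corresponds to.
        
        ```python
        import json
        
        
        def write_only_policy(bucket, user_arn):
            """Build a policy object that lets one user upload but not read."""
            return {
                "Version": "2008-10-17",
                "Id": "BucketUploadNoDelete",
                "Statement": [{
                    "Sid": "Stmt01",
                    "Effect": "Allow",
                    "Principal": {"AWS": user_arn},
                    "Action": [
                        "s3:AbortMultipartUpload",
                        "s3:ListMultipartUploadParts",
                        "s3:PutObject",
                    ],
                    "Resource": [
                        "arn:aws:s3:::%s/*" % bucket,
                        "arn:aws:s3:::%s" % bucket,
                    ],
                }],
            }
        
        
        policy = write_only_policy(
            "com-prometheus-my-bucket", "arn:aws:iam::123456789012:user/myuser")
        # The JSON text that corresponds to the policy object.
        policy_json = json.dumps(policy, indent=2)
        ```
        
        The resulting ``policy`` object has the same content as the literal
        passed to ``storage.bucket_set_policy`` in the example above.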
        
        
        s3 Command Line Tool
        ====================
        
        This package installs both the s3 Python module 
        and the s3 command line tool.
        
        The command line tool provides a convenient way to upload and download 
        files to and from S3 without writing python code.
        
        The tool supports the get, put, delete, and list commands for files, and
        the create-bucket, delete-bucket, and list-buckets commands for buckets,
        but it does not support all the features of the module API.
        
        s3 expects to find ``s3.yaml`` in the current directory.
        If it is not there you must tell s3 where it is using the --config option.
        For example::
        
            $ s3 --config /path/to/s3.yaml command [command arguments]
        
        You must provide a command.  Depending on the command, 
        its arguments may be required or optional.
        
        Use the --help option to see 
        a list of supported commands and their arguments::
        
            $ s3 --help
            usage: s3 [-h] [-c CONFIG] [-v] [-b BUCKET]
                      {get,put,delete,list,create-bucket,delete-bucket,list-buckets} ...
        
            Commands operate on the default bucket unless the --bucket option is used.
        
            Create a bucket
              create-bucket [bucket_name]
              The default bucket_name is the default bucket.
               
            Delete a file from S3
              delete delete_file
        
            Delete a bucket
              delete-bucket [bucket_name]
              The default bucket_name is the default bucket.
        
            Get a file from S3
              get remote_src [local_dst]
        
            List all files or list a single file and its metadata.
              list [list_file]
        
            List all buckets or list a single bucket.  
              list-buckets [bucket_name]
              If bucket_name is given but does not exist, this is printed::
               
                  '%s NOT FOUND' % bucket_name
        
            Put a file to S3
              put local_src [remote_dst]
        
            arguments:
              bucket_name
                The name of the bucket to use.  
              delete_file
                The remote file to delete.
              list_file
                If present, the file to list (with its metadata),
                otherwise list all files.
              local_dst
                The name of the local file to create (or overwrite).
                The default is the basename of the remote_src.
              local_src
                The name of the local file to put.
              remote_dst
                The name of the s3 file to create (or overwrite).
                The default is the basename of the local_src.
              remote_src
                The name of the file in S3 to get.
        
            positional arguments:
              {get,put,delete,list,create-bucket,delete-bucket,list-buckets}
        
            optional arguments:
              -h, --help            show this help message and exit
              -c CONFIG, --config CONFIG
                                    CONFIG is the configuration file to use.
                                    Default is s3.yaml
              -v, --verbose         Show results of commands.
              -b BUCKET, --bucket BUCKET
                                    Use BUCKET instead of the default bucket.
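        
        The defaults for ``local_dst`` and ``remote_dst`` in the help text are
        both basenames.  A minimal sketch of that rule, assuming that remote
        names use '/' as their separator (the two function names are
        hypothetical and are not taken from the tool's source):
        
        ```python
        import os.path
        import posixpath
        
        
        def default_remote_dst(local_src):
            """Default s3 file name for put: the basename of the local file."""
            return os.path.basename(local_src)
        
        
        def default_local_dst(remote_src):
            """Default local file name for get: the basename of the s3 file."""
            # Remote names are assumed to use '/' whatever the local platform.
            return posixpath.basename(remote_src)
        ```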
        
        See `s3 Command Line Tool`_  in the API Reference. 
        
        .. _`s3 Command Line Tool`: reference.html#module-bin_s3
        
        
Keywords: amazon,aws,s3,upload,download
Platform: Any
Classifier: Programming Language :: Python
Classifier: Intended Audience :: Developers
