Metadata-Version: 1.0
Name: rdfadict
Version: 0.2
Summary: Simple RDFa parser and dictionary-like interface.
Home-page: http://wiki.creativecommons.org/RdfaDict
Author: Nathan R. Yergler
Author-email: nathan@creativecommons.org
License: MIT
Description: ========
        rdfadict
        ========
        
        :Date: $LastChangedDate: 2006-11-21 11:23:54 -0500 (Tue, 21 Nov 2006) $
        :Version: $LastChangedRevision: 4737 $
        :Author: Nathan R. Yergler <nathan@creativecommons.org>
        :Organization: `Creative Commons <http://creativecommons.org>`_
        :Copyright:
          2006, Nathan R. Yergler, Creative Commons;
          licensed to the public under the `MIT license
          <http://opensource.org/licenses/mit-license.php>`_.
        
        Installation
        ************
        
        rdfadict and its dependencies may be installed using `easy_install
        <http://peak.telecommunity.com/DevCenter/EasyInstall>`_ (recommended) ::
        
            $ easy_install rdfadict
        
        or by using the standard distutils setup.py::
        
            $ python setup.py install
        
        If installing using setup.py, `lxml <http://codespeak.net/lxml>`_
        will also need to be installed.
        
        Usage
        *****
        
        **rdfadict** parses RDFa metadata encoded in HTML or XHTML documents.  It can
        parse a block of text (as a string), or a URL.  For example, given the
        following block of sample text::
        
        >>> rdfa_sample = """
        ... <h1 property="dc:title">Vacation in the South of France</h1>
        ... <h2>created
        ... by <span property="dc:creator">Mark Birbeck</span>
        ... on <span property="dc:date" type="xsd:date"
        ...          content="2006-01-02">
        ...   January 2nd, 2006
        ...    </span>
        ... </h2>"""
        
        We'll import ``pprint`` so our output is reasonably formatted:
        
        >>> from pprint import pprint
        
        Triples can be extracted using **rdfadict**::
        
        >>> import rdfadict
        >>> base_uri = "http://example.com/rdfadict/"
        >>> parser = rdfadict.RdfaParser()
        >>> triples = parser.parsestring(rdfa_sample, base_uri)
        
        The ``base_uri`` variable tells the parser which URI to use as the subject
        of assertions that do not specify one.
        
        Based on our example text, we expect to get three triples back -- title,
        creator and date.  Triples are indexed as a dictionary, first by subject,
        then by predicate, finally returning a ``list`` of objects.  For example,
        a list of all subjects is retrieved using::
        
        >>> triples.keys()
        ['http://example.com/rdfadict/']
        
        If assertions were made about resources other than the default, those URIs
        would appear in this list.  We can verify how many predicates were found
        for this subject by accessing the next level of the dictionary::
        
        >>> len(triples['http://example.com/rdfadict/'].keys())
        3
        
        Finally, we can retrieve the value for the title by fully dereferencing
        the dictionary::
        
        >>> triples['http://example.com/rdfadict/'][
        ...     'http://purl.org/dc/elements/1.1/title']
        ['Vacation in the South of France']
        
        Note that the objects are stored as a list by the default triple sink.
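        
        Because the result is an ordinary nested dictionary, it can be walked with
        plain iteration.  The following is a minimal sketch, assuming a hand-built
        dictionary shaped like the one above; ``flatten_triples`` is a hypothetical
        helper and not part of **rdfadict**.

        ```python
        # Hand-built stand-in for the default sink's layout:
        # {subject: {predicate: [object, ...]}}.  This is sample data, not parser output.
        DC = "http://purl.org/dc/elements/1.1/"
        triples = {
            "http://example.com/rdfadict/": {
                DC + "title": ["Vacation in the South of France"],
                DC + "creator": ["Mark Birbeck"],
            },
        }


        def flatten_triples(nested):
            """Yield (subject, predicate, object) tuples from the nested mapping."""
            for subject, predicates in nested.items():
                for predicate, objects in predicates.items():
                    for obj in objects:
                        yield (subject, predicate, obj)


        # Sort so the order does not depend on dictionary ordering.
        flat = sorted(flatten_triples(triples))
        ```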
        
        Triple Sinks
        ============
        
        **rdfadict** uses a simple interface (the triple sink) to pass extracted RDF
        triples back to some storage mechanism.  A class which acts as a triple
        sink only needs to define a single method, ``triple``.  For example::
        
            class StdOutTripleSink(object):
                """A triple sink which prints out the triples as they are received."""

                def triple(self, subject, predicate, object):
                    """Process the given triple."""

                    print subject, predicate, object
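        
        A sink can also keep whatever state it needs.  The following is a minimal
        sketch of a sink that tallies predicates instead of storing triples;
        ``PredicateCountSink`` is a hypothetical name, not part of **rdfadict**, and
        ``triple`` is called by hand here to stand in for the parser.

        ```python
        class PredicateCountSink(object):
            """Hypothetical sink that counts how often each predicate is seen."""

            def __init__(self):
                self.counts = {}

            def triple(self, subject, predicate, object):
                """Record one occurrence of the predicate; subject and object are ignored."""
                self.counts[predicate] = self.counts.get(predicate, 0) + 1


        DC = "http://purl.org/dc/elements/1.1/"
        sink = PredicateCountSink()

        # Feed the sink directly; a parser would make these calls as it finds triples.
        sink.triple("http://example.com/rdfadict/", DC + "title",
                    "Vacation in the South of France")
        sink.triple("http://example.com/rdfadict/", DC + "creator", "Mark Birbeck")
        ```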
        
        The default triple sink models the triples as a nested dictionary,
        as described above.  Also included with the package is a list triple sink,
        which stores the triples as a list of 3-tuples.  To use a different sink,
        pass an instance in as the ``sink`` parameter to either parse method.  For
        example::
        
        >>> parser = rdfadict.RdfaParser()
        >>> list_sink = rdfadict.SimpleTripleSink()
        >>> parser.parsestring(rdfa_sample, base_uri, sink=list_sink)
        [('http://example.com/rdfadict/', 'http://purl.org/dc/elements/1.1/title', 'Vacation in the South of France'), ('http://example.com/rdfadict/', 'http://purl.org/dc/elements/1.1/creator', 'Mark Birbeck'), ('http://example.com/rdfadict/', 'http://purl.org/dc/elements/1.1/date', '2006-01-02')]
        
        >>> len(list_sink)
        3
        
        Note that the parse method returns the sink used.  Since the sink we're using
        is really just a ``list``, the interpreter prints the contents upon return.
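        
        The two storage styles described above can be approximated in a few lines.
        This is only a sketch of how sinks that subclass ``list`` and ``dict`` might
        look, not the package's actual source; ``ListSinkSketch`` and
        ``DictSinkSketch`` are hypothetical names, and the sample triples are written
        out by hand.

        ```python
        class ListSinkSketch(list):
            """Store each triple as a (subject, predicate, object) 3-tuple."""

            def triple(self, subject, predicate, object):
                self.append((subject, predicate, object))


        class DictSinkSketch(dict):
            """Index triples by subject, then predicate, ending in a list of objects."""

            def triple(self, subject, predicate, object):
                self.setdefault(subject, {}).setdefault(predicate, []).append(object)


        SUBJECT = "http://example.com/rdfadict/"
        DC = "http://purl.org/dc/elements/1.1/"
        sample = [
            (SUBJECT, DC + "title", "Vacation in the South of France"),
            (SUBJECT, DC + "creator", "Mark Birbeck"),
            (SUBJECT, DC + "date", "2006-01-02"),
        ]

        list_sink = ListSinkSketch()
        dict_sink = DictSinkSketch()
        for s, p, o in sample:
            # Both sinks expose the same single-method interface.
            list_sink.triple(s, p, o)
            dict_sink.triple(s, p, o)
        ```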
        
        Limitations and Known Issues
        ****************************
        
        **rdfadict** currently does not implement the following areas properly:
        
        * Namespaces are expanded, but the triples passed to the sink are fully
          qualified.  The sink should receive some notification of namespaces
          (or be able to assert whether it wants to have shortened or fully
          qualified URIs passed in).
        * The ``class`` attribute (used to make ``rdf:type`` assertions) is not
          parsed.
        * The ``type`` attribute (used to assert the datatype) is not implemented.
        * Subject resolution for ``<meta>`` and ``<link>`` tags (which does not
          traverse up the entire DOM tree) is not correctly implemented.
        
        Change History
        **************
        
        0.2 (2006-11-21)
        ================
        
        * Directly subclass list and dict for our sample triple sinks
        * Additional package metadata for PyPI
        * Additional documentation of sink interface and tests for the SimpleTripleSink
        
        0.1 (2006-11-20)
        ================
        
        * Initial public release
        
        
Platform: UNKNOWN
