Metadata-Version: 1.0
Name: cubicweb-dataio
Version: 0.2.0
Summary: Cube for data input/output, import and export
Home-page: http://www.cubicweb.org/project/cubicweb-dataio
Author: LOGILAB S.A. (Paris, FRANCE)
Author-email: contact@logilab.fr
License: LGPL
Description: =======
        Summary
        =======
        
        Cube for data input/output, import and export
        
        
        Massive Store
        =============
        
        The Massive Store is a CW store used to push massive amounts
        of data using pure SQL logic, thus bypassing the CW checks.
        It is faster than the other CW stores (it does not check the eid at each step
        and it uses the COPY FROM method), but it is less safe (no data integrity checks),
        and its create_entity function does not return an eid.
        
        WARNING: For now, this store can only be used with PostgreSQL, as it
        relies on the COPY FROM method and on PostgreSQL-specific tables
        to get all the indexes.
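        
        The COPY FROM method loads many rows from a single text stream rather than issuing one
        INSERT per row. As an illustration only (this is not the store's own code, and the helper
        names are hypothetical), here is a minimal sketch of serializing rows into PostgreSQL's
        default COPY text format: one row per line, tab-separated columns, ``\N`` for NULL, and
        backslash escapes for backslash, tab, newline and carriage return::
        
            ```python
            import io
        
            def copy_escape(value):
                """Escape one value for PostgreSQL's COPY text format (hypothetical helper)."""
                if value is None:
                    return "\\N"  # default NULL marker of the text format
                text = str(value)
                # Escape backslashes first so that the escapes added below are not doubled.
                text = text.replace("\\", "\\\\")
                text = text.replace("\t", "\\t").replace("\n", "\\n").replace("\r", "\\r")
                return text
        
            def rows_to_copy_buffer(rows):
                """Serialize an iterable of row tuples into a file-like COPY buffer."""
                buf = io.StringIO()
                for row in rows:
                    buf.write("\t".join(copy_escape(v) for v in row))
                    buf.write("\n")
                buf.seek(0)  # rewind so a reader starts at the first row
                return buf
        
            buffer = rows_to_copy_buffer([(1, "Alice", None), (2, "tab\there", "a\\b")])
            print(repr(buffer.getvalue()))
            ```
        
        With psycopg2, such a buffer could then be passed to cursor.copy_from(buffer, table_name);
        which tables and columns the Massive Store actually fills is not shown here.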
        
        It should be used as follows::
        
            # Initialize the store
            store = MassiveObjectStore(session)
            # Initialize the Relation table
            store.init_rtype_table('Person', 'lives', 'Location')
        
            # Import logic
            ...
            store.create_entity('Person', ...)
            store.create_entity('Location', ...)
        
            # Flush the data in memory to sql database
            store.flush()
        
            # Import logic
            ...
            store.create_entity('Person', ...)
            store.create_entity('Location', ...)
            # person_iid and location_iid are unique, data-dependent iids (e.g. URIs)
            store.relate_by_iid(person_iid, 'lives', location_iid)
            ...
        
            # Flush the data in memory to sql database
            store.flush()
        
            # Build the meta data
            store.flush_meta_data()
        
            # Convert the relation
            store.convert_relations('Person', 'lives', 'Location')
        
            # Clean the store / rebuild indexes
            store.cleanup()
        
        Here, the subject and object iids passed to relate_by_iid (person_iid and location_iid
        above) are unique ids (e.g. a URI, or an id from the imported database) that can be
        used to create relations after the entities have been imported.
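        
        The idea behind relate_by_iid and convert_relations can be pictured as a two-phase join:
        relations are first recorded with data-dependent iids, and only once every entity has an
        eid are those iids translated. The sketch below models this with plain dictionaries; the
        class and method names are hypothetical, it does not reproduce the store's SQL, and
        dropping unresolved iids is an assumption (it mirrors what an SQL inner join would do)::
        
            ```python
            from collections import defaultdict
        
            class IidRelationBuffer:
                """Hypothetical in-memory model of iid-based relation conversion."""
        
                def __init__(self):
                    self.iid_to_eid = {}              # iid (e.g. URI) -> eid
                    self.pending = defaultdict(list)  # rtype -> [(iid_subj, iid_obj)]
        
                def register_entity(self, iid, eid):
                    self.iid_to_eid[iid] = eid
        
                def relate_by_iid(self, iid_subj, rtype, iid_obj):
                    # Only iids are stored: the entities may not have an eid yet.
                    self.pending[rtype].append((iid_subj, iid_obj))
        
                def convert_relations(self, rtype):
                    """Return (eid_subj, eid_obj) pairs; unresolved iids are skipped."""
                    converted = []
                    for iid_subj, iid_obj in self.pending.pop(rtype, []):
                        eid_subj = self.iid_to_eid.get(iid_subj)
                        eid_obj = self.iid_to_eid.get(iid_obj)
                        if eid_subj is not None and eid_obj is not None:
                            converted.append((eid_subj, eid_obj))
                    return converted
        
            buf = IidRelationBuffer()
            # The relation is recorded before either entity exists.
            buf.relate_by_iid("http://example.org/person/1", "lives", "http://example.org/city/paris")
            buf.register_entity("http://example.org/person/1", 1001)
            buf.register_entity("http://example.org/city/paris", 2001)
            print(buf.convert_relations("lives"))  # [(1001, 2001)]
            ```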
        
        
        
        
        RDF Store
        =========
        
        The RDF Store is used to import RDF data into a CubicWeb instance, based on a Yams <-> RDF schema conversion.
        The conversion rules are stored in an XY structure.
        
        
        Building an XY structure
        ------------------------
        
        You have to create a file (usually called `xy.py`) in your cube, and import the dataio version of xy::
        
        
            from cubes.dataio import xy
        
        
        You have to register the different prefixes (common prefixes such as skos or foaf are already registered)::
        
            xy.register_prefix('diseasome', 'http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseasome/')
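        
        Registering a prefix lets the conversion rules refer to properties by qualified names such
        as diseasome:genes. A minimal sketch of the expansion this implies, assuming a plain
        prefix-to-namespace dictionary (the PREFIXES table and the expand function are hypothetical,
        not part of the xy module; the skos and foaf namespaces are the standard ones)::
        
            ```python
            # Hypothetical registry: prefix -> namespace URI.
            PREFIXES = {
                "skos": "http://www.w3.org/2004/02/skos/core#",
                "foaf": "http://xmlns.com/foaf/0.1/",
            }
        
            def register_prefix(prefix, namespace):
                PREFIXES[prefix] = namespace
        
            def expand(qname):
                """Expand 'prefix:local' into a full URI; leave anything else unchanged."""
                prefix, sep, local = qname.partition(":")
                if sep and prefix in PREFIXES:
                    return PREFIXES[prefix] + local
                return qname  # e.g. an already absolute 'http://...' URI
        
            register_prefix("diseasome", "http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseasome/")
            print(expand("diseasome:genes"))
            ```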
        
        
        By default, the entity type is based on the RDF property "rdf:type", but you may change it using::
        
           xy.register_rdf_etype_property('skos:inScheme')
        
        
        It is also possible to give a specific callback to determine the entity type from the rdf properties::
        
           def _rameau_etype_callback(rdf_properties):
               if 'skos:inScheme' in rdf_properties and 'skos:prefLabel' in rdf_properties:
                   return 'Rameau'
        
           xy.register_etype_callback(_rameau_etype_callback)
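        
        One plausible way to combine the two mechanisms above is to try the registered callbacks
        first and fall back to the entity type property. This is a hypothetical sketch: the
        resolve_etype function, the ETYPE_EQUIVALENCES table and the precedence order are
        assumptions, and rdf_properties is modelled as a dict mapping qualified property names to
        lists of values::
        
            ```python
            ETYPE_PROPERTY = "rdf:type"  # the default named in the text above
            ETYPE_CALLBACKS = []
            # Hypothetical mapping from RDF type values to Yams entity types.
            ETYPE_EQUIVALENCES = {"diseasome:genes": "Gene", "diseasome:diseases": "Disease"}
        
            def register_etype_callback(callback):
                ETYPE_CALLBACKS.append(callback)
        
            def _rameau_etype_callback(rdf_properties):
                if 'skos:inScheme' in rdf_properties and 'skos:prefLabel' in rdf_properties:
                    return 'Rameau'
                return None  # let the other rules decide
        
            def resolve_etype(rdf_properties):
                # 1. The first callback returning a non-None entity type wins.
                for callback in ETYPE_CALLBACKS:
                    etype = callback(rdf_properties)
                    if etype is not None:
                        return etype
                # 2. Otherwise, look up the values of the entity type property.
                for value in rdf_properties.get(ETYPE_PROPERTY, []):
                    if value in ETYPE_EQUIVALENCES:
                        return ETYPE_EQUIVALENCES[value]
                return None
        
            register_etype_callback(_rameau_etype_callback)
            print(resolve_etype({'skos:inScheme': ['x'], 'skos:prefLabel': ['Paris']}))  # Rameau
            print(resolve_etype({'rdf:type': ['diseasome:genes']}))                      # Gene
            ```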
        
        
        
        The URI is fetched from the "rdf:about" property, and can be normalized using a specific callback::
        
            def normalize_uri(uri):
                # Strip a trailing '.rdf' extension so both spellings map to the same URI
                if uri.endswith('.rdf'):
                    return uri[:-4]
                return uri
        
            xy.register_uri_conversion_callback(normalize_uri)
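        
        Normalizing URIs matters because the same resource may be referenced under several
        spellings, here with and without a '.rdf' suffix. The following sketch, using a
        hypothetical group_by_normalized_uri helper and made-up example.org URIs, shows how the
        properties from both spellings end up under a single subject::
        
            ```python
            from collections import defaultdict
        
            def normalize_uri(uri):
                if uri.endswith('.rdf'):
                    return uri[:-4]
                return uri
        
            def group_by_normalized_uri(triples, normalize=normalize_uri):
                """Group (subject, predicate, object) triples under their normalized subject."""
                grouped = defaultdict(lambda: defaultdict(list))
                for subj, pred, obj in triples:
                    grouped[normalize(subj)][pred].append(obj)
                return grouped
        
            triples = [
                ("http://example.org/doc/42.rdf", "rdfs:label", "Answer"),
                ("http://example.org/doc/42", "skos:note", "Same resource"),
            ]
            grouped = group_by_normalized_uri(triples)
            print(list(grouped))  # ['http://example.org/doc/42']
            ```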
        
        
        
        Defining the conversion rules
        -----------------------------
        
        
        Then, you may write the conversion rules:
        
         - xy.add_equivalence allows you to add a basic equivalence between entity types / attributes / relations
           and RDF properties. You may use "*" as a wildcard in the Yams part.
           E.g. for entity types::
        
              xy.add_equivalence('Gene', 'diseasome:genes')
              xy.add_equivalence('Disease', 'diseasome:diseases')
        
           E.g. for attributes::
        
              xy.add_equivalence('* name', 'diseasome:name')
              xy.add_equivalence('* label', 'rdfs:label')
              xy.add_equivalence('* label', 'diseasome:label')
              xy.add_equivalence('* class_degree', 'diseasome:classDegree')
              xy.add_equivalence('* size', 'diseasome:size')
        
        
           E.g. for relations::
        
              xy.add_equivalence('Disease close_match ExternalUri', 'diseasome:classes')
              xy.add_equivalence('Disease subtype_of Disease', 'diseasome:diseaseSubtypeOf')
              xy.add_equivalence('Disease associated_genes Gene', 'diseasome:associatedGene')
              xy.add_equivalence('Disease chromosomal_location ExternalUri', 'diseasome:chromosomalLocation')
              xy.add_equivalence('* sameas ExternalUri', 'owl:sameAs')
              xy.add_equivalence('Gene gene_id ExternalUri', 'diseasome:geneId')
              xy.add_equivalence('Gene bio2rdf_symbol ExternalUri', 'diseasome:bio2rdfSymbol')
        
        
         - A base URI can be given to automatically determine if a Resource should be considered
           as an external URI or an internal relation::
        
              xy.register_base_uri('http://www4.wiwiss.fu-berlin.de/diseasome/resource/')
        
        
           More complex logic can be implemented with a specific callback::
        
             def externaluri_callback(uri):
                 if uri.startswith('http://www4.wiwiss.fu-berlin.de/diseasome/resource/'):
                     # Base-URI resources ending in 'disease' or 'gene' are not external
                     if uri.endswith('disease') or uri.endswith('gene'):
                         return False
                     return True
                 return True
        
             xy.register_externaluri_callback(externaluri_callback)
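        
        To make the rule syntax above concrete, the sketch below stores rules of the forms
        'Etype', 'Etype attribute' and 'Subject relation Object', resolves a wildcard subject, and
        classifies URIs against a base URI. It is a hypothetical model (EQUIVALENCES, yams_name_for
        and is_external are illustrative names, the first matching rule wins, and URIs under the
        base URI are assumed to be internal), not the dataio implementation::
        
            ```python
            EQUIVALENCES = {}  # rdf property -> [(subject etype, yams name, object etype)]
        
            def add_equivalence(yams, rdf):
                """Store a rule such as 'Gene', '* label' or 'Disease subtype_of Disease'."""
                parts = yams.split()
                if len(parts) == 1:    # entity type rule
                    parts = [parts[0], None, None]
                elif len(parts) == 2:  # attribute rule
                    parts = [parts[0], parts[1], None]
                EQUIVALENCES.setdefault(rdf, []).append(tuple(parts))
        
            def yams_name_for(etype, rdf_property):
                """Return the attribute or relation name for etype; '*' matches any etype."""
                for subj, name, _obj in EQUIVALENCES.get(rdf_property, []):
                    if name is not None and subj in ('*', etype):
                        return name
                return None
        
            BASE_URI = 'http://www4.wiwiss.fu-berlin.de/diseasome/resource/'
        
            def is_external(uri, base_uri=BASE_URI):
                # Assumption: anything outside the base URI becomes an ExternalUri.
                return not uri.startswith(base_uri)
        
            add_equivalence('* label', 'rdfs:label')
            add_equivalence('Disease subtype_of Disease', 'diseasome:diseaseSubtypeOf')
            print(yams_name_for('Gene', 'rdfs:label'))                  # label
            print(yams_name_for('Gene', 'diseasome:diseaseSubtypeOf'))  # None
            print(is_external('http://www.ncbi.nlm.nih.gov/gene/1'))     # True
            ```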
        
        
        
        Attribute values are built based on the Yams type, but you can use a specific
        callback to compute the correct values from the RDF properties::
        
            from datetime import datetime
        
            def _convert_date(_object, datetime_format='%Y-%m-%d'):
                """ Convert an RDF value to a date """
                try:
                    return datetime.strptime(_object.format(), datetime_format)
                except (TypeError, ValueError):
                    # Values that cannot be parsed are ignored
                    return None
        
            xy.register_attribute_callback('Date', _convert_date)
        
        or::
        
            def format_isbn(rdf_properties):
                if 'bnf-onto:isbn' in rdf_properties:
                    isbn = rdf_properties['bnf-onto:isbn'][0]
                    # Keep only the digits (separators such as '-' are dropped)
                    isbn = [i for i in isbn if i in '0123456789']
                    return int(''.join(isbn)) if isbn else None
        
            xy.register_attribute_callback('Manifestation formatted_isbn', format_isbn)
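        
        The two registrations above differ both in their key (a Yams type such as 'Date' versus an
        'Etype attribute' pair) and in the callback argument (a single RDF value versus all the RDF
        properties of the resource). A hypothetical dispatcher, in which the attribute-specific
        callback takes precedence over the type-level one, could look like this; the names, the
        precedence rule, the 'publication_date' attribute and the 'dc:date' property are all
        assumptions::
        
            ```python
            from datetime import datetime
        
            TYPE_CALLBACKS = {}       # Yams type -> callback(value)
            ATTRIBUTE_CALLBACKS = {}  # 'Etype attribute' -> callback(rdf_properties)
        
            def register_attribute_callback(key, callback):
                # Hypothetical rule: a key containing a space names a specific attribute.
                if ' ' in key:
                    ATTRIBUTE_CALLBACKS[key] = callback
                else:
                    TYPE_CALLBACKS[key] = callback
        
            def compute_value(etype, attr, yams_type, rdf_properties, rdf_property):
                """Compute an attribute value, trying the most specific callback first."""
                callback = ATTRIBUTE_CALLBACKS.get('%s %s' % (etype, attr))
                if callback is not None:
                    return callback(rdf_properties)
                values = rdf_properties.get(rdf_property, [])
                if not values:
                    return None
                callback = TYPE_CALLBACKS.get(yams_type)
                return callback(values[0]) if callback is not None else values[0]
        
            def _convert_date(_object, datetime_format='%Y-%m-%d'):
                try:
                    # str() stands in for _object.format() since values here are plain strings
                    return datetime.strptime(str(_object), datetime_format)
                except (TypeError, ValueError):
                    return None
        
            register_attribute_callback('Date', _convert_date)
            props = {'dc:date': ['2011-05-23']}
            print(compute_value('Manifestation', 'publication_date', 'Date', props, 'dc:date'))
            ```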
        
        
        
        Importing data
        --------------
        
        Data may thus be imported using the "import-rdf" command of cubicweb-ctl::
        
        
            cubicweb-ctl import-rdf <my-instance> <file-or-folder>
        
        The default library used for reading the data is "rdflib", but "librdf" may be used instead with the "--lib" option.
        
        The RDF format is determined automatically, but since this may sometimes lead to errors,
        it can be forced with the "--rdf-format" option.
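        
        Auto-detection of this kind commonly relies on file extensions, which is why an unusual
        extension can lead to errors. The following sketch, with a hypothetical extension table
        (the format names are those used by rdflib parsers) and hypothetical helper names, shows
        how a file-or-folder argument could be expanded and how a forced format overrides the
        guess; it is not the command's actual implementation::
        
            ```python
            import os
        
            # Hypothetical extension -> parser format table.
            EXTENSION_FORMATS = {
                '.rdf': 'xml', '.owl': 'xml', '.xml': 'xml',
                '.n3': 'n3', '.ttl': 'turtle', '.nt': 'nt',
            }
        
            def collect_rdf_files(file_or_folder):
                """Return the path itself, or every file below it if it is a folder."""
                if os.path.isdir(file_or_folder):
                    return sorted(
                        os.path.join(root, name)
                        for root, _dirs, names in os.walk(file_or_folder)
                        for name in names
                    )
                return [file_or_folder]
        
            def guess_rdf_format(path, forced_format=None):
                """A forced format (cf. --rdf-format) always wins over the extension guess."""
                if forced_format:
                    return forced_format
                extension = os.path.splitext(path)[1].lower()
                return EXTENSION_FORMATS.get(extension)  # None: unknown, must be forced
        
            print(guess_rdf_format('diseasome.nt'))        # nt
            print(guess_rdf_format('dump.txt'))            # None
            print(guess_rdf_format('dump.txt', 'turtle'))  # turtle
            ```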
        
        
        
        Exporting data
        --------------
        
        The 'rdf' view creates an RDF file from the result set. It is a modified version of the
        CubicWeb RDFView that takes into account the more complex conversion rules of the dataio cube.
        The output format (xml, n3 or nt; XML by default) can be forced using the "--format" option in the URL.
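        
        For reference, the "nt" format is N-Triples: one triple per line, URIs in angle brackets,
        literals in double quotes, and each line ending with " .". A minimal serializer sketch
        (hypothetical helpers that ignore blank nodes, language tags and datatypes; not the view's
        code) shows the shape of that output::
        
            ```python
            def nt_term(term):
                """Serialize a term given as ('uri', value) or ('literal', value)."""
                kind, value = term
                if kind == 'uri':
                    return '<%s>' % value
                # Escape backslash first, then quotes and line breaks.
                escaped = (value.replace('\\', '\\\\').replace('"', '\\"')
                           .replace('\n', '\\n').replace('\r', '\\r'))
                return '"%s"' % escaped
        
            def to_ntriples(triples):
                """Render (subject, predicate, object) term triples as N-Triples lines."""
                return ''.join('%s %s %s .\n' % tuple(nt_term(t) for t in triple)
                               for triple in triples)
        
            doc = to_ntriples([
                (('uri', 'http://example.org/disease/1'),
                 ('uri', 'http://www.w3.org/2000/01/rdf-schema#label'),
                 ('literal', 'Asthma')),
            ])
            print(doc, end='')
            ```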
        
        
        
        Examples
        --------
        
        Examples of the dataio RDF import can be found in the nytimes and diseasome cubes.
Platform: UNKNOWN
Classifier: Environment :: Web Environment
Classifier: Framework :: CubicWeb
Classifier: Programming Language :: Python
Classifier: Programming Language :: JavaScript
