Metadata-Version: 1.1
Name: cdata
Version: 0.1.6
Summary: see data, handy snippets for conversion, and ETL.
Home-page: http://github.com/cnschema/cdata
Author: Li Ding
Author-email: lidingpku@gmail.com
License: Apache 2.0
Description: cdata
        -------------
        
        cdata ("see data"): handy snippets for data conversion, cleaning and integration.
        
        install
        -------------
          pip install cdata
        
        
        json data manipulation
        ----------------------
        
        * json (and json stream) file IO, e.g.  items2file(...)
        * json data access, e.g. json_get(...), any2utf8, json_dict_copy
        * json array statistics, e.g. stat(...)
        
        .. code-block:: python
        
          import logging
          from cdata.core import any2utf8

          # convert every unicode string in the structure to a utf-8 byte string
          the_input = {"hello": u"世界"}
          the_output = any2utf8(the_input)
          logging.info((the_input, the_output))
        
        
        .. code-block:: python
        
          from cdata.core import json_dict_copy

          # each property gives a canonical name plus the alternate keys that map to it
          property_list = [
              { "name":"name", "alternateName": ["name","title"]},
              { "name":"birthDate", "alternateName": ["dob","dateOfBirth"] },
              { "name":"description" }
          ]
          json_object = {"dob":"2010-01-01","title":"John","interests":"data","description":"a person"}
          ret = json_dict_copy(json_object, property_list)
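        
        One way to read the property list above: for each property, look for the canonical name and then each alternateName in the input, and store the first value found under the canonical name. The block below is a minimal sketch of that reading only. ``copy_by_property_list`` is a hypothetical helper written for illustration, not cdata's implementation, and the real ``json_dict_copy`` may differ (for example in key precedence or in how empty values are handled).
        
        .. code-block:: python
        
          def copy_by_property_list(json_object, property_list):
              """Hypothetical stand-in: normalize keys using name/alternateName."""
              ret = {}
              for prop in property_list:
                  canonical = prop["name"]
                  # try the canonical name first, then each declared alias
                  candidates = [canonical] + list(prop.get("alternateName", []))
                  for key in candidates:
                      if key in json_object:
                          ret[canonical] = json_object[key]
                          break
              return ret

          property_list = [
              {"name": "name", "alternateName": ["name", "title"]},
              {"name": "birthDate", "alternateName": ["dob", "dateOfBirth"]},
              {"name": "description"},
          ]
          json_object = {"dob": "2010-01-01", "title": "John",
                         "interests": "data", "description": "a person"}
          # keys not listed in property_list ("interests") are dropped in this sketch
          normalized = copy_by_property_list(json_object, property_list)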
        
        
        table data manipulation
        -----------------------
        
        * json array to/from excel
        
        .. code-block:: python
        
          import json
          from cdata.table import excel2json,json2excel
          filename = "test.xls"
          items = [{"first":"hello", "last":"world" }]
          json2excel(items, ["first","last"], filename)
          ret = excel2json(filename)
          print(json.dumps(ret))
        
        
        
        JSON data returned by reading a single-sheet Excel file:
        
        .. code-block:: json
        
          {
            "fields": {
                "00": [
                    "name",
                    "年龄",
                    "notes"
                ]
            },
            "data": {
                "00": [
                    {
                        "notes": "",
                        "年龄": 18.0,
                        "name": "张三"
                    },
                    {
                        "notes": "this is li si",
                        "年龄": 18.0,
                        "name": "李四"
                    }
                ]
            }
          }
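        
        The structure above can be turned back into plain rows by walking ``data`` and using ``fields`` for the column order. This is a sketch on a literal copy of the sample output; it assumes that the ``"00"`` key identifies a sheet and that ``fields`` and ``data`` share the same keys, which the sample suggests but does not state.
        
        .. code-block:: python
        
          # literal copy of the sample output shown above
          workbook = {
              "fields": {"00": ["name", "年龄", "notes"]},
              "data": {
                  "00": [
                      {"notes": "", "年龄": 18.0, "name": "张三"},
                      {"notes": "this is li si", "年龄": 18.0, "name": "李四"},
                  ]
              },
          }

          tables = {}
          for sheet_key in sorted(workbook["data"]):
              columns = workbook["fields"][sheet_key]
              # one list per item, with values ordered by the sheet's column list
              tables[sheet_key] = [
                  [item.get(column, "") for column in columns]
                  for item in workbook["data"][sheet_key]
              ]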
        
        web stuff
        -------------
        
        * url domain extraction
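        
        As an illustration of what domain extraction involves, the sketch below uses only the standard library. ``extract_domain`` is a hypothetical name chosen for this example and is not cdata's API. It returns the host part of a URL, lowercased and without the port.
        
        .. code-block:: python
        
          try:
              from urllib.parse import urlparse  # Python 3
          except ImportError:
              from urlparse import urlparse  # Python 2

          def extract_domain(url):
              """Hypothetical helper: host of a URL, lowercased, port removed."""
              # hostname is None when the URL has no scheme/netloc part
              return urlparse(url).hostname or ""

          domain = extract_domain("http://github.com/cnschema/cdata")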
        
        entity manipulation
        -------------------
        
        * entity.SimpleEntity.ner()
        
        .. code-block:: python
        
          import json
          import logging
          from cdata.entity import SimpleEntity

          # build a matcher from a list of named entities, then tag a sentence
          entity_list = [{"@id":"1","name":u"张三"},{"@id":"2","name":u"李四"}]
          ner = SimpleEntity(entity_list)
          sentence = u"张三给了李四一个苹果"
          ret = ner.ner(sentence)
          logging.info(json.dumps(ret, ensure_ascii=False, indent=4))
          """
          [{
              "text": "张三",
              "entities": [
                  {
                      "@id": "1",
                      "name": "张三"
                  }
              ],
              "index": 0
          },
          {
              "text": "李四",
              "entities": [
                  {
                      "@id": "2",
                      "name": "李四"
                  }
              ],
              "index": 4
          }]
          """
        
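        The output shape above (matched text, matching entities, character index) can be reproduced with a plain dictionary lookup over entity names. The block below is a minimal sketch written for Python 3. ``match_entities`` is a hypothetical function, not the algorithm used by ``SimpleEntity``, which may handle overlapping or ambiguous names differently.
        
        .. code-block:: python
        
          def match_entities(entity_list, sentence):
              """Hypothetical exact-substring matcher keyed on entity names."""
              # group entities by name so one surface form can map to several entities
              by_name = {}
              for entity in entity_list:
                  by_name.setdefault(entity["name"], []).append(entity)

              matches = []
              for name, entities in by_name.items():
                  if not name:
                      continue  # an empty name would match at every position
                  start = sentence.find(name)
                  while start != -1:
                      matches.append({"text": name, "entities": entities, "index": start})
                      start = sentence.find(name, start + len(name))
              # report matches in the order they occur in the sentence
              return sorted(matches, key=lambda match: match["index"])

          entity_list = [{"@id": "1", "name": "张三"}, {"@id": "2", "name": "李四"}]
          matches = match_entities(entity_list, "张三给了李四一个苹果")
        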
        * region.RegionEntity.guess_all()
        
        .. code-block:: python
        
          import json
          import logging
          from cdata.region import RegionEntity
          addresses = ["北京海淀区阜成路52号（定慧寺）", "北京大学肿瘤医院"]
        
          city_data = RegionEntity()
          result = city_data.guess_all(addresses)
          logging.info(json.dumps(result, ensure_ascii=False))
          """
             {"province": "北京市",
             "city": "市辖区",
             "name": "海淀区",
             "district": "海淀区",
             "cityid": "110108",
             "type": "district"}
          """
        
        wikification
        -------------
        
        * Search Wikidata to locate the matching entity and look up its attributes, such as the Chinese label and aliases, using wikidata_search (item/property) and wikidata_get.
        
        .. code-block:: python
        
          import logging

          # search by label, then fetch the first hit and read its Chinese label
          query = u"居里夫人"
          ret = wikidata_search(query, lang="zh")
          logging.info(ret)

          nodeid = ret["itemList"][0]["identifier"]
          ret = wikidata_get(nodeid)
          label_zh = ret["entities"][nodeid]["labels"]["zh"]["value"]
          logging.info(label_zh)
        
        
        misc
        -------------
        
        * support simple cli function using argparse
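        
        For readers unfamiliar with argparse, a small command-line entry point might look like the sketch below. The argument names (``task``, ``--input``, ``--output``) are assumptions made for this example and do not describe cdata's actual CLI helper.
        
        .. code-block:: python
        
          import argparse

          def build_parser():
              """Hypothetical parser: one positional task name plus optional file paths."""
              parser = argparse.ArgumentParser(description="run a named task")
              parser.add_argument("task", help="name of the task to run")
              parser.add_argument("--input", help="input file path")
              parser.add_argument("--output", help="output file path")
              return parser

          # an explicit argument list is passed here so the sketch runs anywhere;
          # a real script would call parse_args() with no arguments to read sys.argv
          args = build_parser().parse_args(["stat", "--input", "items.json"])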
        
        
        notes
        -------------
        release package using https://github.com/pypa/twine
        
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 2.7
Classifier: Topic :: Text Processing
