Metadata-Version: 2.0
Name: MetaCSV
Version: 0.0.8
Summary: Tools for documentation-aware data reading, writing, and analysis
Home-page: https://github.com/delgadom/metacsv
Author: Michael Delgado
Author-email: delgado.michaelt@gmail.com
License: MIT
Keywords: MetaCSV
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Requires-Dist: netCDF4
Requires-Dist: numpy (>1.10)
Requires-Dist: pandas (>=0.17)
Requires-Dist: pyyaml
Requires-Dist: xarray (>=0.7)

=======
MetaCSV
=======


.. image:: https://travis-ci.org/delgadom/metacsv.svg?branch=master
    :target: https://travis-ci.org/delgadom/metacsv

.. image:: https://badge.fury.io/py/metacsv.svg
    :target: https://badge.fury.io/py/metacsv

.. image:: https://coveralls.io/repos/github/delgadom/metacsv/badge.svg?branch=master 
    :target: https://coveralls.io/github/delgadom/metacsv?branch=master


``metacsv`` - Tools for documentation-aware data reading, writing, and analysis

See the full documentation at ReadTheDocs_ 

.. _ReadTheDocs: http://metacsv.rtfd.org

Overview
=========

Read in CSV data with a yaml-compliant header directly into 
a ``pandas`` ``Series``, ``DataFrame``, or ``Panel`` or an ``xarray`` 
``DataArray`` or ``Dataset``.

Data specification
----------------------------

Data can be specified using a yaml-formatted header, with the doc-separation 
string (``---``) above and below the yaml block. Only one yaml block is allowed. 
If the doc-separation string is not the first (non-whitespace) line in the file, 
all of the file's contents will be interpreted by the csv reader. The yaml data 
can have arbitrary complexity.

.. code-block:: python

    >>> import metacsv, io
    >>> doc = io.StringIO('''
    ---
    author: A Person
    date:   2000-01-01
    variables:
        pop:
          name: Population
          unit: millions
        gdp:
          name: Product
          unit: 2005 $Bn
    ---
    region,year,pop,gdp
    USA,2010,309.3,13599.3
    USA,2011,311.7,13817.0
    CAN,2010,34.0,1240.0
    CAN,2011,34.3,1276.7
    ''')


Special attributes
~~~~~~~~~~~~~~~~~~~~~~~

The ``coords`` and ``variables`` attributes are keywords and are not simply 
passed to the MetaCSV object's ``attrs`` attribute.

``variables`` describes columns in the resulting ``DataFrame`` or 
``Data variables`` in the resulting ``xarray.Dataset``. Variables is not used 
when the CSV has only one column and the argument ``squeeze=True`` is passed to 
``read_csv``.

``coords`` describes indices in the resulting ``DataFrame``/``Series``, or 
``Coordinates`` in the resulting ``xarray.Dataset/xarray.DataArray``. 
Coordinates are categorical or independent variables which index the object's 
``values``. 



Using MetaCSV-formatted files in python
--------------------------------------------

Read MetaCSV-formatted data into python using pandas-like syntax: 

.. code-block:: python

    >>> metacsv.read_csv(doc, index_col=[0,1])
    >>> df
    <metacsv.core.containers.DataFrame (4, 2)>
                   pop      gdp
    region year
    USA    2010  309.3  13599.3
           2011  311.7  13817.0
    CAN    2010   34.0   1240.0
           2011   34.3   1276.7

    Coordinates
      * region     (region) object CAN, USA
      * year       (year) int64 2010, 2011
    Variables
        pop
        gdp
    Attributes
        date: 2000-01-01
        author: A Person

Exporting MetaCSV data to other formats
-----------------------------------------------

CSV
~~~~~~~~~

A MetaCSV ``Series`` or ``DataFrame`` can be written as a yaml-prefixed CSV 
using the same ``to_csv`` syntax as it's ``pandas`` counterpart:

.. code-block:: python

    >>> df.attrs['new attribute'] = 'changed in python!'
    >>> # includes changes to data, attributes, variables, and coordinates
    ... df.to_csv('my_new_data.csv')




pandas
~~~~~~~~~~~~~~~

The coordinates and MetaCSV attributes can be easily stripped from a MetaCSV 
Container:

.. code-block:: python

    >>> df.to_pandas()
                   pop      gdp
    region year
    USA    2010  309.3  13599.3
           2011  311.7  13817.0
    CAN    2010   34.0   1240.0
           2011   34.3   1276.7



xarray/netCDF
~~~~~~~~~~~~~~~

``xarray``__ provides a pandas-like interface to operating on indexed ``ndarray`` 
data. It is modeled on the ``netCDF`` data storage format used frequently in 
climate science, but is useful for many applications with higher-order data.

.. __: http://xarray.pydata.org/


.. code-block:: python

    >>> ds = df.to_xarray()
    >>> ds
    <xarray.Dataset>
    Dimensions:  (region: 2, year: 2)
    Coordinates:
      * region   (region) object 'USA' 'CAN'
      * year     (year) int64 2010 2011
    Data variables:
        pop      (region, year) float64 309.3 311.7 34.0 34.3
        gdp      (region, year) float64 1.36e+04 1.382e+04 1.24e+03 1.277e+03
    Attributes:
        date: 2000-01-01
        author: A Person
    >>> ds.to_netcdf('my_netcdf_data.nc')

Others
~~~~~~~~~

Currently, MetaCSV only supports conversion back to CSV and to 
netCDF through the ``xarray`` module. However, feel free to suggest 
additional features and to contribute your own!


TODO
============

* Make ``coords`` and ``attrs`` persistent across slicing operations 
  (try ``df['pop'].to_xarray()`` from above example and watch it 
  fail...)

* Improve hooks between ``pandas`` and ``metacsv``:

  - update ``coord`` names on ``df.index.names`` assignment
  - update ``coords`` on stack/unstack
  - update ``coords`` on 

* Handle attributes indexed by coord/variable names --> assign to 
  coord/variable-specific ``attrs``

* Let's start an issue tracker and get rid of this section!

* Should we rethink "special attributes," e.g. coords? Maybe these should 
  have some special prefix like ``_coords`` when included in yaml headers to 
  avoid confusion with other generic attributes...

* Allow special attributes (``coords``, ``variables``) in ``read_csv`` call

* Allow external file headers

* Write tests

* Write documentation

* Maybe steal xarray's coordinate handling and save ourselves a whole lotta 
  work?


Feature Requests
==================
* Create syntax for ``multi-csv`` --> ``Panel`` or combining using filename 
  regex 
* Eventually? allow for on-disk manipulation of many/large files with 
  dask/xarray 
* Eventually? add xml, SQL, other structured syntax language conversions




==============  ==========================================================
Python support  Python 2.7, >= 3.3
Source          https://github.com/delgadom/metacsv
Docs            http://metacsv.rtfd.org
Changelog       http://metacsv.readthedocs.org/en/latest/history.html
API             http://metacsv.readthedocs.org/en/latest/api.html
Issues          https://github.com/delgadom/metacsv/issues
Travis          http://travis-ci.org/delgadom/metacsv
Test coverage   https://coveralls.io/r/delgadom/metacsv
pypi            https://pypi.python.org/pypi/metacsv
Ohloh           https://www.ohloh.net/p/metacsv
License         `BSD`_.
git repo        .. code-block:: bash

                    $ git clone https://github.com/delgadom/metacsv.git
install dev     .. code-block:: bash

                    $ git clone https://github.com/delgadom/metacsv.git metacsv
                    $ cd ./metacsv
                    $ virtualenv .env
                    $ source .env/bin/activate
                    $ pip install -e .
tests           .. code-block:: bash

                    $ python setup.py test
==============  ==========================================================

.. _BSD: http://opensource.org/licenses/BSD-3-Clause
.. _Documentation: http://metacsv.readthedocs.org/en/latest/
.. _API: http://metacsv.readthedocs.org/en/latest/api.html


=========
Changelog
=========

Here you can find the recent changes to MetaCSV..

.. changelog::
    :version: dev
    :released: Ongoing

    .. change::
        :tags:  docs

        Updated CHANGES.

.. changelog::
    :version: 0.0.1
    :released: 2016-05-04

    .. change::
        :tags: project

        First release on PyPi.

.. todo:: vim: set filetype=rst:


