Metadata-Version: 2.1
Name: ukbparse
Version: 0.14.0
Summary: UK Biobank data processing library
Home-page: https://git.fmrib.ox.ac.uk/fsl/ukbparse
Author: Paul McCarthy
Author-email: pauldmccarthy@gmail.com
License: Apache License Version 2.0
Description: ``ukbparse`` - the UK BioBank data parser
        =========================================
        
        
        .. image:: https://img.shields.io/pypi/v/ukbparse.svg
           :target: https://pypi.python.org/pypi/ukbparse/
        
        .. image:: https://anaconda.org/conda-forge/ukbparse/badges/version.svg
           :target: https://anaconda.org/conda-forge/ukbparse
        
        .. image:: https://zenodo.org/badge/DOI/10.5281/zenodo.1997626.svg
           :target: https://doi.org/10.5281/zenodo.1997626
        
        .. image:: https://git.fmrib.ox.ac.uk/fsl/ukbparse/badges/master/coverage.svg
           :target: https://git.fmrib.ox.ac.uk/fsl/ukbparse/commits/master/
        
        
        ``ukbparse`` is a Python library for pre-processing of UK BioBank data.
        
        
        Installation
        ------------
        
        
        Install ``ukbparse`` via pip::
        
        
            pip install ukbparse
        
        
        Or from ``conda-forge``::
        
            conda install -c conda-forge ukbparse
        
        
        Comprehensive documentation does not yet exist.
        
        
        Introductory notebook
        ---------------------
        
        
        The ``ukbparse_demo`` command will start a Jupyter Notebook which introduces
        the main features provided by ``ukbparse``. To run it, you need to install a
        few additional dependencies::
        
        
            pip install ukbparse[demo]
        
        
        You can then start the demo by running ``ukbparse_demo``.
        
        
        .. note:: The introductory notebook uses ``bash``, so is unlikely to work on
                  Windows.
        
        
        Usage
        -----
        
        
        General usage is as follows::
        
        
            ukbparse [options] output.tsv input1.tsv input2.tsv
        
        
        You can get information on all of the options by typing ``ukbparse --help``.
        
        
        Options can be specified on the command line, and/or stored in a configuration
        file. For example, the options in the following command line::
        
        
            ukbparse \
              --overwrite \
              --import_all \
              --log_file log.txt \
              --icd10_map_file icd_codes.tsv \
              --category 10 \
              --category 11 \
              output.tsv input1.tsv input2.tsv
        
        
        Could be stored in a configuration file ``config.txt``::
        
        
            overwrite
            import_all
            log_file       log.txt
            icd10_map_file icd_codes.tsv
            category       10
            category       11
        
        
        And then executed as follows::
        
        
            ukbparse -cfg config.txt output.tsv input1.tsv input2.tsv
        
        
        Customising
        -----------
        
        
        ``ukbparse`` contains a large number of built-in rules which have been
        specifically written to pre-process UK BioBank data variables. These rules are
        stored in the following files:
        
        
         * ``ukbparse/data/variables.tsv``: Cleaning rules for individual variables
         * ``ukbparse/data/datacodings.tsv``: Cleaning rules for data codings
         * ``ukbparse/data/types.tsv``: Cleaning rules for specific types
         * ``ukbparse/data/processing.tsv``: Processing steps
        
        You can customise or replace these files as you see fit. You can also pass
        your own versions of these files to ``ukbparse`` via the ``--variable_file``,
        ``--datacoding_file``, ``--type_file`` and ``--processing_file`` command-line
        options respectively.
        
        The ``variables.tsv`` file defines all of the variables that ``ukbparse`` is
        aware of.  If your UK BioBank data set contains variables which are not listed
        in this file, you may wish to generate your own version - you can do so
        by following these steps:
        
        1. Use the ``ukbconv`` utility (available through the `BioBank Data showcase
           <http://biobank.ctsu.ox.ac.uk/showcase/>`_) to generate a HTML file
           describing all of the variables in your data set, and data codings used by
           them.
        
        2. Use the ``ukbparse_htmlparse`` command to convert this ``html`` file into
           variable and data coding "base" files, which just contain the meta-data
           for each variable/data coding.
        
        3. Code up your custom cleaning rules for each variable and data coding, in
           the same format as can be seen in the ``ukbparse/data/`` directory. For
           data codings, create these flies:
        
             * ``datacodings_navalues.tsv``: contains NA value replacement rules
             * ``datacodings_recoding.tsv``: contains categorical recoding rules
        
           And for variables, create these files:
        
             * ``variables_navalues.tsv``: Contains NA value replacement rules
             * ``variables_recoding.tsv``: Contains categorical recoding rules
             * ``variables_clean.tsv``: Contains variable-specific cleaning functions
             * ``variables_parentvalues.tsv``: Contains child value replacement rules.
        
        4. Use the ``ukbparse_join`` command to generate the final variable and data
           coding tables from your base files, e.g.::
        
             ukbparse_join final_variables_table.tsv \
                           variables_base.tsv \
                           variables_navalues.tsv \
                           variables_recoding.tsv \
                           variables_parentvalues.tsv \
                           variables_clean.tsv
             ukbparse_join final_datacodings.tsv \
                           datacodings_base.tsv \
                           datacodings_navalues.tsv \
                           datacodings_recoding.tsv
        
        
        Tests
        -----
        
        
        To run the test suite, you need to install some additional dependencies::
        
        
              pip install ukbparse[test]
        
        
        Then you can run the test suite using ``pytest``::
        
            pytest
        
        
        Citing
        ------
        
        
        If you would like to cite ``ukbparse``, please refer to its `Zenodo page
        <https://zenodo.org/record/2203808#.XBDJ-xP7RE4>`_.
        
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Provides-Extra: test
Provides-Extra: demo
