Metadata-Version: 2.1
Name: serenata-toolbox
Version: 15.1.5
Summary: Toolbox for Serenata de Amor project
Home-page: http://github.com/okfn-brasil/serenata-toolbox
Author: Serenata de Amor
Author-email: contato@serenata.ai
License: MIT
Keywords: serenata de amor,data science,brazil,corruption
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Topic :: Utilities
Requires-Python: >=3.6
Requires-Dist: aiofiles
Requires-Dist: aiohttp
Requires-Dist: beautifulsoup4 (>=4.4)
Requires-Dist: lxml (>=3.6)
Requires-Dist: pandas (>=0.18)
Requires-Dist: python-decouple (>=3.1)
Requires-Dist: tqdm

.. image:: https://travis-ci.org/okfn-brasil/serenata-toolbox.svg?branch=master
   :target: https://travis-ci.org/okfn-brasil/serenata-toolbox
   :alt: Travis CI build status (Linux)

.. image:: https://readthedocs.org/projects/serenata-toolbox/badge/?version=latest
   :target: http://serenata-toolbox.readthedocs.io/en/latest/?badge=latest
   :alt: Documentation Status

.. image:: https://landscape.io/github/okfn-brasil/serenata-toolbox/master/landscape.svg?style=flat
   :target: https://landscape.io/github/okfn-brasil/serenata-toolbox/master
   :alt: Code Health

.. image:: https://coveralls.io/repos/github/okfn-brasil/serenata-toolbox/badge.svg?branch=master
   :target: https://coveralls.io/github/okfn-brasil/serenata-toolbox?branch=master
   :alt: Coveralls

.. image:: https://badge.fury.io/py/serenata-toolbox.svg
   :alt: PyPI package version

.. image:: https://img.shields.io/badge/donate-apoia.se-EB4A3B.svg
   :target: https://apoia.se/serenata
   :alt: Donation Page

Serenata de Amor Toolbox
========================

A `pip <https://pip.pypa.io/en/stable/>`_ installable package to support `Serenata de Amor <https://github.com/okfn-brasil/serenata-de-amor>`_
and `Rosie <https://github.com/okfn-brasil/serenata-de-amor/blob/master/rosie/README.md>`_ development.

serenata-toolbox is compatible with Python 3.6+.

Installation
------------

.. code-block:: bash

    $ pip install -U serenata-toolbox

If you are a regular user you are ready to get started after `pip install`.

If you are a core developer who wants to upload datasets to the cloud, you need to set the `AMAZON_ACCESS_KEY` and `AMAZON_SECRET_KEY` environment variables before running the toolbox.
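
A quick way to confirm that both variables are visible to Python before attempting an upload is sketched below. Only the two variable names come from this README; the helper `missing_credentials` is hypothetical and not part of the toolbox.

```python
import os

# Variable names taken from this README; the helper itself is illustrative.
REQUIRED_VARIABLES = ("AMAZON_ACCESS_KEY", "AMAZON_SECRET_KEY")


def missing_credentials(environ=None):
    """Return the required variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Both credentials are set.")
```

Passing a plain dictionary instead of relying on `os.environ` keeps the check easy to test.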

Usage
-----

We have `plenty of datasets <https://github.com/okfn-brasil/serenata-de-amor/blob/master/research/CONTRIBUTING.md#datasets-data>`_ ready for you to download from our servers, and this toolbox helps you get them. Here are some examples:

Example 1: Using the command line wrapper
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

    # without any arguments, downloads our pre-processed datasets and stores them in the data/ folder
    $ serenata-toolbox

    # downloads these specific datasets and stores them in the /tmp/serenata-data folder
    $ serenata-toolbox /tmp/serenata-data --module federal_senate chamber_of_deputies

    # you can specify a dataset and a year
    $ serenata-toolbox --module chamber_of_deputies --year 2009

    # or specify all options simultaneously
    $ serenata-toolbox /tmp/serenata-data --module federal_senate --year 2017

    # getting help
    $ serenata-toolbox --help
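
The same invocations can be scripted from Python. The sketch below is a minimal, hypothetical helper (`build_command` and `run` are not part of the toolbox) that assembles the argument list using only the options shown above and hands it to `subprocess.run`. It assumes the `serenata-toolbox` executable is on your `PATH`.

```python
import subprocess


def build_command(destination=None, modules=(), year=None):
    """Build the argument list for the serenata-toolbox command line wrapper."""
    command = ["serenata-toolbox"]
    if destination:
        command.append(destination)
    if modules:
        command.append("--module")
        command.extend(modules)
    if year is not None:
        command.extend(["--year", str(year)])
    return command


def run(command):
    """Run the command, raising CalledProcessError on a non-zero exit code."""
    return subprocess.run(command, check=True)


# Only builds and prints the command; call run(...) to actually download.
print(build_command("/tmp/serenata-data", ["federal_senate"], 2017))
```

Building the list separately from running it lets you inspect the command before any download starts.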

Example 2: How do I download the datasets?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Another option is creating your own Python script:

.. code-block:: python

  from serenata_toolbox.datasets import Datasets
  datasets = Datasets('data/')

  # now let's see which are the latest datasets available
  for dataset in datasets.downloader.LATEST:
      print(dataset)  # and you'll see a long list of datasets!

  # and let's download one of them
  datasets.downloader.download('2018-01-05-reimbursements.xz')  # yay, you've just downloaded this dataset to data/

  # you can also get the most recent version of all datasets:
  latest = list(datasets.downloader.LATEST)
  datasets.downloader.download(latest)
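
The file name used above starts with a date (`2018-01-05-reimbursements.xz`). Assuming every dataset file follows that `YYYY-MM-DD-name` scheme, which is an inference from this one example, the most recent file per dataset can be picked from any list of names as sketched below. The toolbox already does this for you through `LATEST`; the helper `latest_by_dataset` is hypothetical and only illustrates the naming convention.

```python
import re

# Assumed naming scheme, inferred from '2018-01-05-reimbursements.xz'.
NAME_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2})-(.+)$")


def latest_by_dataset(file_names):
    """Map each dataset name to its most recent file name."""
    latest = {}
    for file_name in file_names:
        match = NAME_PATTERN.match(file_name)
        if not match:
            continue  # skip names that do not follow the assumed scheme
        date, dataset = match.groups()
        # ISO dates sort chronologically when compared as plain strings.
        if dataset not in latest or date > latest[dataset][0]:
            latest[dataset] = (date, file_name)
    return {dataset: file_name for dataset, (_, file_name) in latest.items()}


# Apart from the first name, these file names are made up for illustration.
names = [
    "2018-01-05-reimbursements.xz",
    "2016-12-06-reimbursements.xz",
    "2017-05-22-companies.xz",
]
print(latest_by_dataset(names))
```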

Example 3: Using shortcuts
^^^^^^^^^^^^^^^^^^^^^^^^^^

If the last example doesn't look that simple, there are some fancy shortcuts available:

.. code-block:: python

  from serenata_toolbox.datasets import fetch, fetch_latest_backup
  fetch('2018-01-05-reimbursements.xz', 'data/')
  fetch_latest_backup('data/')  # yep, we've just done exactly the same thing
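
Once a file is in `data/`, you can peek at it with the standard library alone. The sketch below assumes the `.xz` file is an xz-compressed CSV with a header row and UTF-8 encoding. The helper `read_rows` is hypothetical and not part of the toolbox.

```python
import csv
import lzma
from itertools import islice


def read_rows(path, limit=5):
    """Return up to `limit` rows from an xz-compressed CSV file as dicts."""
    with lzma.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        # DictReader uses the first line of the file as the column names.
        return list(islice(csv.DictReader(handle), limit))
```

For example, `read_rows('data/2018-01-05-reimbursements.xz')` would return the first five rows if the file matches these assumptions. For real analysis, `pandas` (already a dependency of this package) is usually the more convenient choice.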

Example 4: Generating datasets
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you ever wonder how we generated these datasets, this toolbox can help you too (at least with the most used ones; the others are generated `in our main repo <https://github.com/okfn-brasil/serenata-de-amor/blob/master/research/CONTRIBUTING.md#the-toolbox-and-our-the-source-files-src>`_):

.. code-block:: python

    from serenata_toolbox.federal_senate.dataset import Dataset as SenateDataset
    from serenata_toolbox.chamber_of_deputies.reimbursements import Reimbursements as ChamberDataset

    chamber = ChamberDataset('2018', 'data/')
    chamber()

    senate = SenateDataset('data/')
    senate.fetch()
    senate.translate()
    senate.clean()

Documentation (WIP)
-------------------

The `full documentation <https://serenata-toolbox.readthedocs.io>`_ is still a work in progress. If you wanna give us a hand you will need `Sphinx <http://www.sphinx-doc.org/>`_:

.. code-block:: bash

  $ cd docs
  $ make clean;make rst;rm source/modules.rst;make html

Contributing
------------

Firstly, you should create a development environment with Python's `venv <https://docs.python.org/3/library/venv.html#creating-virtual-environments>`_ module to isolate your development.
Then clone the repository and build the package by running:

.. code-block:: bash

  $ git clone https://github.com/okfn-brasil/serenata-toolbox.git
  $ cd serenata-toolbox
  $ python setup.py develop

Always add tests to your contribution. To run them locally before opening the PR:

.. code-block:: bash

  $ pip install tox
  $ tox

When the tests are passing, also check the coverage of the modules you edited or added. To check it locally before opening the PR:

.. code-block:: bash

  $ tox
  $ open htmlcov/index.html

Follow `PEP8 <https://www.python.org/dev/peps/pep-0008/>`_ and the best practices implemented by `Landscape <https://landscape.io>`_ at the `veryhigh` strictness level. To check them locally before opening the PR:

.. code-block:: bash

  $ pip install prospector
  $ prospector -s veryhigh serenata_toolbox

If this report includes issues related to the `import` section of your files, `isort <https://github.com/timothycrosley/isort>`_ can help you:

.. code-block:: bash

  $ pip install isort
  $ isort **/*.py --diff

Always suggest a version bump. We use `Semantic Versioning <http://semver.org>`_ or, in the `Elm community's words <https://github.com/elm-lang/elm-package#version-rules>`_:

* MICRO: the API is the same, no risk of breaking code
* MINOR: values have been added, existing values are unchanged
* MAJOR: existing values have been changed or removed
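
As an illustration of these rules, the sketch below bumps a `MAJOR.MINOR.MICRO` version string. The helper `bump` is hypothetical and not part of the toolbox. Resetting the lower parts to zero on a MINOR or MAJOR bump follows the usual Semantic Versioning convention.

```python
def bump(version, part):
    """Return `version` with the given part ('major', 'minor' or 'micro') increased."""
    major, minor, micro = (int(number) for number in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "micro":
        return f"{major}.{minor}.{micro + 1}"
    raise ValueError(f"unknown version part: {part!r}")


# A bug fix that leaves the API untouched is a MICRO bump.
print(bump("15.1.5", "micro"))
```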

This is really important because every change merged to `master` triggers the CI, and the CI then triggers a new release to PyPI. The attempt to roll out a new version of the toolbox will fail without a version bump. So we encourage you to add a version bump even if all you have changed is the `README.rst`: this is the way to keep the `README.rst` updated on PyPI.

If you are not changing the API or `README.rst` in any sense and you really do not want a version bump, you need to add `[skip ci]` to your commit message.

And finally take *The Zen of Python* into account:

.. code-block:: bash

  $ python -m this


