Metadata-Version: 2.1
Name: progress-tracker
Version: 1.0.0
Summary: A utility that wraps an Iterable and regularly prints out progress on the processing of that Iterable
Home-page: UNKNOWN
Author: exactEarth Ltd.
Author-email: open-source@exactearth.com
License: MIT
Platform: UNKNOWN
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.6

================
progress_tracker
================

``progress_tracker`` is an easy and flexible way to print custom progress messages while processing streams of events on the CLI.

It was originally developed at `exactEarth Ltd`_. See `this presentation`_ to `DevHouse Waterloo`_ for the original motivation.

.. _exactEarth Ltd: https://exactearth.com/

Built and tested with Python 3.6+.

.. contents:: Contents

Quick Start
-----------

.. code:: bash

  % pip install progress_tracker

.. code:: python

    >>> from progress_tracker import track_progress
    >>> for _ in track_progress(list(range(1000)), every_n_records=100):
    ...     continue
    ...
    100/1000 (10.0%) in 0:00:00.000114 (Time left: 0:00:00.001026)
    200/1000 (20.0%) in 0:00:00.000274 (Time left: 0:00:00.001096)
    300/1000 (30.0%) in 0:00:00.000374 (Time left: 0:00:00.000873)
    400/1000 (40.0%) in 0:00:00.000473 (Time left: 0:00:00.000710)
    500/1000 (50.0%) in 0:00:00.000572 (Time left: 0:00:00.000572)
    600/1000 (60.0%) in 0:00:00.000671 (Time left: 0:00:00.000447)
    700/1000 (70.0%) in 0:00:00.000770 (Time left: 0:00:00.000330)
    800/1000 (80.0%) in 0:00:00.000868 (Time left: 0:00:00.000217)
    900/1000 (90.0%) in 0:00:00.000979 (Time left: 0:00:00.000109)
    1000 in 0:00:00.001086

Usage
-----

``progress_tracker`` is highly customizable, but tries to provide sensible defaults.

The core of ``progress_tracker`` is a function called ``track_progress``.
By changing the parameters passed to ``track_progress``, you can customize how often the tracker reports and what its messages say.

.. code:: python

    def track_progress(
        iterable: Iterable[T],  # The iterable to iterate over
        total: Optional[int] = None,  # Override for the total record count; defaults to len(iterable)
        callback: Callable[[str], Any] = print,  # A function f(str) -> Any, called with each formatted report
        format_callback: Callable[[Dict[str, Any], Set[str]], str] = default_format_callback,  # A function f(report, reasons) -> str that formats the report values into a string
        every_n_percent: Optional[float] = None,  # Report every n percent
        every_n_records: Optional[int] = None,  # Report every n records
        every_n_seconds: Optional[float] = None,  # Report every n seconds
        every_n_seconds_idle: Optional[float] = None,  # Report if no record has been processed in the past n seconds; useful for infinite streams
        every_n_seconds_since_report: Optional[float] = None,  # Report if there has not been any report in the past n seconds
        report_first_record: bool = False,  # Report after the first record
        report_last_record: bool = False,  # Report after the last record
    ) -> ProgressTracker  # Iterate over it in place of the original iterable; also usable as a context manager
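
The ``default_format_callback`` named in the signature above is not shown in this README. Judging only from the report values documented below and the Quick Start output, a ``format_callback`` with the same signature could look roughly like the following sketch. The function name ``english_format`` and the reason string ``"every_n_records"`` are hypothetical, and this is not the library's actual implementation.

.. code:: python

    from datetime import timedelta
    from typing import Any, Dict, Set


    def english_format(report: Dict[str, Any], reasons: Set[str]) -> str:
        """Hypothetical format_callback mimicking the Quick Start output (ignores reasons)."""
        records_seen = report["records_seen"]
        total = report["total"]
        if total is None or records_seen == total:
            # Assumed: unknown total or final record -> count and elapsed time only.
            return "{} in {}".format(records_seen, report["time_taken"])
        return "{}/{} ({}%) in {} (Time left: {})".format(
            records_seen,
            total,
            report["percent_complete"],
            report["time_taken"],
            report["estimated_time_remaining"],
        )


    example = {
        "records_seen": 100,
        "total": 1000,
        "percent_complete": 10.0,
        "time_taken": timedelta(microseconds=114),
        "estimated_time_remaining": timedelta(microseconds=1026),
    }
    print(english_format(example, {"every_n_records"}))
    # 100/1000 (10.0%) in 0:00:00.000114 (Time left: 0:00:00.001026)

A function like this would be passed as ``format_callback=english_format``; the ``reasons`` argument is accepted only to match the documented signature.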

Examples
^^^^^^^^

Print after every n records are processed
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``every_n_records`` parameter triggers a report each time another ``n`` records have been processed.

.. code:: python

    >>> from progress_tracker import track_progress
    >>>
    >>> for _ in track_progress(list(range(1000)), every_n_records=100):
    ...     continue
    ...
    100/1000 (10.0%) in 0:00:00.000114 (Time left: 0:00:00.001026)
    200/1000 (20.0%) in 0:00:00.000274 (Time left: 0:00:00.001096)
    300/1000 (30.0%) in 0:00:00.000374 (Time left: 0:00:00.000873)
    400/1000 (40.0%) in 0:00:00.000473 (Time left: 0:00:00.000710)
    500/1000 (50.0%) in 0:00:00.000572 (Time left: 0:00:00.000572)
    600/1000 (60.0%) in 0:00:00.000671 (Time left: 0:00:00.000447)
    700/1000 (70.0%) in 0:00:00.000770 (Time left: 0:00:00.000330)
    800/1000 (80.0%) in 0:00:00.000868 (Time left: 0:00:00.000217)
    900/1000 (90.0%) in 0:00:00.000979 (Time left: 0:00:00.000109)
    1000 in 0:00:00.001086

Print after every x percent of records are processed
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``every_n_percent`` parameter triggers a report each time another ``n`` percent of the records has been processed.

.. code:: python

    >>> from progress_tracker import track_progress
    >>> for _ in track_progress(list(range(1000)), every_n_percent=10):
    ...     continue
    ...
    100/1000 (10.0%) in 0:00:00.000114 (Time left: 0:00:00.001026)
    200/1000 (20.0%) in 0:00:00.000274 (Time left: 0:00:00.001096)
    300/1000 (30.0%) in 0:00:00.000374 (Time left: 0:00:00.000873)
    400/1000 (40.0%) in 0:00:00.000473 (Time left: 0:00:00.000710)
    500/1000 (50.0%) in 0:00:00.000572 (Time left: 0:00:00.000572)
    600/1000 (60.0%) in 0:00:00.000671 (Time left: 0:00:00.000447)
    700/1000 (70.0%) in 0:00:00.000770 (Time left: 0:00:00.000330)
    800/1000 (80.0%) in 0:00:00.000868 (Time left: 0:00:00.000217)
    900/1000 (90.0%) in 0:00:00.000979 (Time left: 0:00:00.000109)
    1000 in 0:00:00.001086

``every_n_percent`` only works for bounded iterables, where the total number of records is known. For unbounded iterables (e.g. streams), using ``every_n_percent`` issues a ``RuntimeWarning``.

At most one report is generated per processed record. Even if processing a single record satisfies a condition several times over, only one report is created, containing the latest values.
For example, with ``every_n_percent=10`` and only 2 records, each record advances progress by 50% (five 10% steps), yet each record produces just one report.
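
The following sketch models that rule for ``every_n_percent`` alone. It is an illustration under the assumption that a report fires whenever a record pushes progress past at least one new multiple of ``every_n_percent``; the function ``percent_reports`` is hypothetical and not part of ``progress_tracker``.

.. code:: python

    import math
    from typing import List


    def percent_reports(total: int, every_n_percent: float) -> List[int]:
        """Hypothetical model: record counts at which a percent-based report fires."""
        fired = []
        last_step = 0  # number of every_n_percent steps already reported
        for records_seen in range(1, total + 1):
            percent = 100 * records_seen / total
            step = math.floor(percent / every_n_percent)
            if step > last_step:
                # However many steps this record crossed, it yields one report.
                fired.append(records_seen)
                last_step = step
        return fired


    print(percent_reports(1000, 10)[:3])  # [100, 200, 300]
    print(percent_reports(2, 10))         # [1, 2]

In the two-record case, the first record jumps from step 0 to step 5 and the second from step 5 to step 10, but each jump is recorded as a single report.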

Print every n records OR every n seconds during processing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is especially useful when processing times vary widely (e.g. most records take 2 seconds to process, but some take 20 seconds).
The ``every_n_seconds`` parameter keeps reports coming during a run of expensive records, when ``every_n_records`` alone would stay silent for a long time.

.. code:: python

    import time
    from progress_tracker import track_progress

    def simulated_processing(item):
        if item == 'hard':
            time.sleep(10)

    variable_stream_simulation = (['easy'] * 15) + (['hard'] * 5) + (['easy'] * 15)

    for item in track_progress(variable_stream_simulation, every_n_records=5, every_n_seconds=10):
        simulated_processing(item)

    # Output (exact timings will vary):
    # 5/35 (14.285714285714285%) in 0:00:00.000014 (Time left: 0:00:00.000084)
    # 10/35 (28.57142857142857%) in 0:00:00.000095 (Time left: 0:00:00.000238)
    # 15/35 (42.857142857142854%) in 0:00:00.000120 (Time left: 0:00:00.000160)
    # 16/35 (45.714285714285715%) in 0:00:10.011364 (Time left: 0:00:11.888495)
    # 17/35 (48.57142857142857%) in 0:00:20.022107 (Time left: 0:00:21.199878)
    # 18/35 (51.42857142857142%) in 0:00:30.031801 (Time left: 0:00:28.363368)
    # 19/35 (54.285714285714285%) in 0:00:40.041754 (Time left: 0:00:33.719372)
    # 20/35 (57.14285714285714%) in 0:00:50.073991 (Time left: 0:00:37.555493)
    # 25/35 (71.42857142857143%) in 0:00:50.074246 (Time left: 0:00:20.029698)
    # 30/35 (85.71428571428571%) in 0:00:50.074286 (Time left: 0:00:08.345714)
    # 35 in 0:00:50.074319

While the slow records were being processed, ``track_progress`` reported after every record, because each of them took about 10 seconds and therefore met ``every_n_seconds=10`` on its own.

Note: Because the default "Time left" calculation is a simple linear extrapolation, it can be misleading when processing times vary this much. At 20/35 it predicted about 37 more seconds, but the remaining easy records finished almost instantly.
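
For reference, the "Time left" values in the outputs above are reproduced exactly by a plain linear extrapolation: the average time per record so far, multiplied by the number of records remaining. The helper ``linear_time_left`` below is a hypothetical sketch of that formula, not the library's code. It uses ``timedelta`` arithmetic, which rounds to the nearest microsecond.

.. code:: python

    from datetime import timedelta
    from typing import Optional


    def linear_time_left(records_seen: int, total: Optional[int], time_taken: timedelta) -> Optional[timedelta]:
        """Estimate the remaining time, assuming every remaining record takes the average time so far."""
        if total is None or records_seen == 0:
            return None  # no known end point, or nothing to extrapolate from yet
        return time_taken * (total - records_seen) / records_seen


    print(linear_time_left(100, 1000, timedelta(microseconds=114)))           # 0:00:00.001026
    print(linear_time_left(20, 35, timedelta(seconds=50, microseconds=73991)))  # 0:00:37.555493

The second call shows the problem described in the note: the average is dominated by the five slow records, so the estimate assumes every remaining record will be slow too.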

Combining trigger conditions
----------------------------

As seen in the previous example, you can combine multiple conditions together to dictate when a report is created.

The conditions are combined with a logical OR: if any condition is met, a report is created.

Even if multiple conditions are met simultaneously, only a single report will be created.
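
Because ``format_callback`` receives a ``Set[str]`` as its second argument, the library presumably passes the set of conditions that fired. A minimal sketch of OR-combination under that assumption: collect every condition the current record satisfies, and create one report if the set is non-empty. The function ``trigger_reasons`` and its reason strings are hypothetical, not necessarily the values the library uses.

.. code:: python

    from typing import Optional, Set


    def trigger_reasons(
        records_seen: int,
        seconds_since_last_timed_report: float,
        every_n_records: Optional[int] = None,
        every_n_seconds: Optional[float] = None,
    ) -> Set[str]:
        """Collect every (hypothetically named) condition met by the record just processed."""
        reasons = set()
        if every_n_records is not None and records_seen % every_n_records == 0:
            reasons.add("every_n_records")
        if every_n_seconds is not None and seconds_since_last_timed_report >= every_n_seconds:
            reasons.add("every_n_seconds")
        return reasons


    reasons = trigger_reasons(20, 12.5, every_n_records=5, every_n_seconds=10)
    if reasons:  # OR semantics: any non-empty set means "create one report"
        print("one report, reasons={}".format(sorted(reasons)))
    # one report, reasons=['every_n_records', 'every_n_seconds']

Both conditions hold for the 20th record here, but they are merged into a single report rather than producing two.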

Report Creation Invariants
--------------------------

Report creation observes two invariants:

1. At most a single report is created per processed record.
2. Reports are only created in response to a record being processed.
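
One way to see why both invariants hold naturally for a wrapper of this kind is a generator that checks its conditions exactly once, right after the caller finishes with each record. The toy ``mini_track`` below is a hypothetical illustration, not the library's implementation. It also shows that such a tracker cannot report while it is waiting for the next record to arrive.

.. code:: python

    import time
    from typing import Callable, Iterable, Iterator, TypeVar

    T = TypeVar("T")


    def mini_track(
        iterable: Iterable[T],
        every_n_records: int,
        callback: Callable[[str], object] = print,
    ) -> Iterator[T]:
        """Toy tracker (not the library's code) illustrating both invariants."""
        start = time.monotonic()
        records_seen = 0
        for item in iterable:
            yield item
            # Execution resumes here only when the caller asks for the next item,
            # i.e. after it has finished processing `item` (invariant 2).
            records_seen += 1
            if records_seen % every_n_records == 0:
                # One check per record, so at most one report per record (invariant 1).
                callback("{} in {:.3f}s".format(records_seen, time.monotonic() - start))


    messages = []
    for _ in mini_track(range(10), every_n_records=5, callback=messages.append):
        pass
    print(len(messages))  # 2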

Customizing the report formatting / Internationalization
--------------------------------------------------------

By default, ``progress_tracker`` formats the report as an English-language string.
This can be overridden by passing a different function as the ``format_callback`` parameter to ``track_progress``.

This can be used to perform advanced formatting, or to add internationalization/localization.

.. code:: python

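    from typing import Any, Dict, Set

    from progress_tracker import track_progress

    def format_en_francais(report: Dict[str, Any], reasons: Set[str]) -> str:
        records_seen = report["records_seen"]
        total = report["total"]
        if total is None or records_seen == total:
            format_string = "{records_seen} messages traités en {time_taken}"
        else:
            format_string = "{records_seen}/{total} messages traités en {time_taken} (temps restant: {estimated_time_remaining})"
        return format_string.format(**report)

    # `postes` (the records) and `traité` (the per-record work) stand in for your own code.
    for poste in track_progress(postes, every_n_records=100, format_callback=format_en_francais):
        traité(poste)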

(Please excuse any errors in the French. It is the output of Google Translate.)

Simple cases can also be done using a lambda:

.. code:: python

    >>> from progress_tracker import track_progress
    >>>
    >>> for _ in track_progress(list(range(5)), every_n_records=1, format_callback=lambda report, reasons: "Got one!"):
    ...     continue
    ...
    Got one!
    Got one!
    Got one!
    Got one!
    Got one!

Report values available
^^^^^^^^^^^^^^^^^^^^^^^

The following values are available in every report for use in the ``format_callback``:

.. table::
   :widths: auto

   ============================== =================== =======================================================================================================================================
   Value                          Type                Meaning
   ============================== =================== =======================================================================================================================================
   ``{records_seen}``             int                 The number of records processed so far.
   ``{total}``                    Optional[int]       The total number of records in the iterable, if known; otherwise ``None``.
   ``{percent_complete}``         Optional[float]     The percentage of records processed so far. ``None`` if ``{total}`` is ``None`` or ``{records_seen}`` is 0.
   ``{time_taken}``               timedelta           The amount of time that processing has taken thus far.
   ``{estimated_time_remaining}`` Optional[timedelta] The estimated time needed to process the remaining records (simple linear extrapolation). ``None`` if ``{total}`` is ``None``.
   ``{items_per_second}``         Optional[float]     The number of records processed so far divided by the number of seconds elapsed. ``None`` if no time has elapsed.
   ``{idle_time}``                timedelta           The amount of idle time between the previous record's processing and this record's arrival.
   ============================== =================== =======================================================================================================================================
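
To show how these values relate to each other, here is a hypothetical ``build_report`` helper that assembles a dictionary with the same keys from a few raw measurements. The ``None`` rules follow the table; also returning ``None`` for the estimate when ``records_seen`` is 0 is an added assumption, and this is not how ``progress_tracker`` itself builds reports.

.. code:: python

    from datetime import timedelta
    from typing import Any, Dict, Optional


    def build_report(
        records_seen: int,
        total: Optional[int],
        time_taken: timedelta,
        idle_time: timedelta,
    ) -> Dict[str, Any]:
        """Assemble a dict with the keys listed in the table (illustrative only)."""
        seconds = time_taken.total_seconds()
        percent_complete = None
        estimated_time_remaining = None
        if total is not None and records_seen > 0:
            percent_complete = 100 * records_seen / total
            # Simple linear estimate: remaining records at the average rate so far.
            estimated_time_remaining = time_taken * (total - records_seen) / records_seen
        return {
            "records_seen": records_seen,
            "total": total,
            "percent_complete": percent_complete,
            "time_taken": time_taken,
            "estimated_time_remaining": estimated_time_remaining,
            "items_per_second": records_seen / seconds if seconds > 0 else None,
            "idle_time": idle_time,
        }


    report = build_report(100, 1000, timedelta(microseconds=114), timedelta(0))
    print(report["percent_complete"], report["estimated_time_remaining"])  # 10.0 0:00:00.001026

Since the keys match the placeholders in the table, such a dictionary can be passed straight to ``str.format(**report)``, as the French example does.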

Customizing the print behaviour
-------------------------------

By default, ``progress_tracker`` calls Python's `print`_ function with the formatted report.
This can be overridden by passing a different function as the ``callback`` parameter to ``track_progress``.
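
Any function that accepts a single string fits the ``Callable[[str], Any]`` type of ``callback``. A few standard-library-only options are sketched below; the names ``stderr_callback``, ``collect_callback``, ``log_callback`` and the logger name ``"progress"`` are hypothetical choices for illustration.

.. code:: python

    import logging
    import sys
    from typing import List


    def stderr_callback(message: str) -> None:
        """Print a report to standard error, keeping standard output free for data."""
        # sys.stderr is looked up on every call, so later redirection is respected.
        print(message, file=sys.stderr)


    # Collect reports in memory (handy in tests): list.append takes one argument.
    collected: List[str] = []
    collect_callback = collected.append

    # Send reports through the standard logging module at INFO level.
    log_callback = logging.getLogger("progress").info

Any of these could then be passed as, for example, ``track_progress(records, every_n_records=100, callback=stderr_callback)``, where ``records`` stands for your own iterable. A plain function is used for ``stderr_callback`` instead of ``functools.partial(print, file=sys.stderr)`` because a partial would capture whatever ``sys.stderr`` was at creation time.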

.. _`print`: https://docs.python.org/3/library/functions.html#print

``every_n_seconds_idle``
------------------------

``every_n_seconds_idle`` triggers a report when no record has been processed in the past ``n`` seconds. Because reports are only created in response to a record being processed, that report appears when the next record is processed after the idle period.

Note: If processing a single record takes longer than ``every_n_seconds_idle`` seconds, a report is triggered after every record.

Difference between ``every_n_seconds`` and ``every_n_seconds_idle``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* ``every_n_seconds`` triggers a report any time at least n seconds have passed since processing started or since ``every_n_seconds`` last triggered a report.
* ``every_n_seconds_idle`` triggers a report any time no record has been processed in the past n seconds (i.e. processing has been idle).

For example:

.. table::
   :widths: auto

   ========== ================================== ============================= ================================================================ ======================
   After      # of records processed in interval Cumulative records processed  every_n_seconds=3                                                every_n_seconds_idle=3
   ========== ================================== ============================= ================================================================ ======================
   0 seconds  0                                  0                                                                     
   1 second   1                                  1                                                                     
   2 seconds  1                                  2                                                                     
   3 seconds  1                                  3                             Triggered, since it is the first record T >= 3s (T >= 0s + 3s)
   4 seconds  1                                  4                                                                     
   5 seconds  1                                  5                                                                     
   6 seconds  1                                  6                             Triggered, since it is the first record T >= 6s (T >= 3s + 3s)                                         
   7 seconds  0                                  6
   8 seconds  0                                  6                                                                     
   9 seconds  0                                  6                                                                     
   10 seconds 0                                  6                                                                     
   11 seconds 1                                  7                             Triggered, since it is the first record T >= 9s (T >= 6s + 3s)   Triggered, since no record was processed in the previous 3 seconds (T >= 6s + 3s)
   12 seconds 1                                  8                                                                     
   13 seconds 1                                  9                                                                     
   14 seconds 1                                  10                            Triggered, since it is the first record T >= 14s (T >= 11s + 3s)                                        
   15 seconds 1                                  11                                                                    
   ========== ================================== ============================= ================================================================ ======================

Note that ``every_n_seconds`` reports at 3 seconds and 6 seconds, as one would expect. It then reports at 11 seconds, since that is the first time a record is processed after the 9-second mark.
Next, instead of reporting at 12 seconds (9s + 3s), it reports at 14 seconds (11s + 3s), because each interval is measured from the previous report rather than from a fixed 3-second schedule.

``every_n_seconds_idle`` only reported at 11 seconds, since that was the only time that a record was processed without other records being processed during the previous 3 seconds.
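
The two rules can be checked against the table with a small simulation. The sketch below is hypothetical. It takes the times at which records finish processing, read off the cumulative column (one record per second from 1s to 6s, then from 11s to 15s). It assumes both thresholds are inclusive (``>=``) and measures idle time from the previous record, or from the start for the first record. Under those assumptions it reproduces the trigger times in the table.

.. code:: python

    from typing import List, Sequence


    def every_n_seconds_triggers(record_times: Sequence[float], n: float, start: float = 0.0) -> List[float]:
        """Times at which an every_n_seconds-style rule fires (hypothetical model)."""
        fired = []
        last_trigger = start  # the first window is measured from the start of processing
        for t in record_times:
            if t - last_trigger >= n:
                fired.append(t)
                last_trigger = t  # the next window starts at this report, not on a fixed grid
        return fired


    def every_n_seconds_idle_triggers(record_times: Sequence[float], n: float, start: float = 0.0) -> List[float]:
        """Times at which an every_n_seconds_idle-style rule fires (hypothetical model)."""
        fired = []
        previous = start
        for t in record_times:
            if t - previous >= n:  # no record was processed during the last n seconds
                fired.append(t)
            previous = t
        return fired


    times = [1, 2, 3, 4, 5, 6, 11, 12, 13, 14, 15]
    print(every_n_seconds_triggers(times, 3))       # [3, 6, 11, 14]
    print(every_n_seconds_idle_triggers(times, 3))  # [11]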

Accessing tracker after processing
----------------------------------

By default, ``track_progress`` hides the internal ``ProgressTracker`` object. However, in some cases you might want to access the object's internals after iteration.
In these cases, you can use ``track_progress`` as an explicit context manager:

.. code:: python

    from progress_tracker import track_progress

    with track_progress(range(0, 101), every_n_percent=5) as tracker:
        for item in tracker:
            process(item)  # `process` is a placeholder for your own per-record work
        final_report = tracker.create_report()
        print(f"Processing took {final_report['time_taken']} and processed {final_report['records_seen']} records.")
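
The pattern behind this usage is an object that is iterable, is a context manager that returns itself from ``__enter__``, and can build a report on demand. The toy class ``MiniTracker`` below is a hypothetical, heavily simplified sketch of that pattern and is not the library's ``ProgressTracker``; it only tracks ``records_seen`` and ``time_taken``.

.. code:: python

    import time
    from datetime import timedelta
    from typing import Any, Dict, Generic, Iterable, Iterator, Optional, TypeVar

    T = TypeVar("T")


    class MiniTracker(Generic[T]):
        """Toy stand-in for a tracker object; not the library's ProgressTracker."""

        def __init__(self, iterable: Iterable[T]) -> None:
            self._iterable = iterable
            self._records_seen = 0
            self._start: Optional[float] = None

        def __enter__(self) -> "MiniTracker[T]":
            return self  # the object bound by `as` is the tracker itself

        def __exit__(self, *exc_info: Any) -> None:
            return None  # nothing to release; exceptions propagate unchanged

        def __iter__(self) -> Iterator[T]:
            self._start = time.monotonic()
            for item in self._iterable:
                yield item
                self._records_seen += 1  # counted once the caller has processed the item

        def create_report(self) -> Dict[str, Any]:
            if self._start is None:
                elapsed = timedelta(0)
            else:
                elapsed = timedelta(seconds=time.monotonic() - self._start)
            return {"records_seen": self._records_seen, "time_taken": elapsed}


    with MiniTracker(range(0, 101)) as tracker:
        for item in tracker:
            pass  # per-record work would go here
        final_report = tracker.create_report()
    print(final_report["records_seen"])  # 101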


Other Resources
---------------

- `This presentation`_ to `DevHouse Waterloo`_.

.. _This presentation: https://www.slideshare.net/MichaelOvermeyer/progress-tracker-a-handy-progress-printout-pattern
.. _DevHouse Waterloo: https://www.meetup.com/DevHouse-Waterloo/events/247071801/

