Metadata-Version: 2.1
Name: span-tree
Version: 0.0.1
Summary: Use trees, traces, and spans and never read a log message again!
Author: Espen Albert
License: MIT
Project-URL: Source, https://github.com/EspenAlbert/py-libs/tree/main/span_tree
Project-URL: Documentation, https://espenalbert.github.io/py-libs/span_tree
Classifier: Programming Language :: Python :: 3.11
Requires-Python: ==3.11.*
Description-Content-Type: text/markdown
Requires-Dist: rich (==13.5.2)
Requires-Dist: typing-extensions (>=4.3.0)
Requires-Dist: zero-3rdparty (==0.0.27)

# span-trace

## Goals

- Never have to look at log files again!
- Make error debugging easy by capturing context around errors and traces instead of only a traceback and locals
- Make health monitoring happen automatically
- Make performance monitoring easy by automatically tracking and reporting slow spans
- Support a see-once, annotate, and forget workflow (instead of re-labeling the same error over and over...)

## Features in library

- trace trees instead of flat log messages
    - a trace can have multiple "spans" as nodes
    - on each "node"
        - file_location
        - start and end timestamps
        - status: started|succeeded|failed
        - optional fields:
            - exit_error: if something fails without `except`
            - handled_errors: if we have errors caught with `except` and logged
            - level
    - each trace uses a [ksuid](https://github.com/segmentio/ksuid) to ensure it is unique
    - each span has a name which is explicitly set or based on function name/call-context
    - traces are linked together when a new thread/task is started from an existing trace
- Smart printing
    - to terminal when running on localhost
    - JSON only when running in the cloud (or only normal logging, with no stdout/stderr?)
    - smart at grouping together tasks and flushing before exit
- "test-mode": record all traces instead of just printing
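
The node fields listed above could be modeled roughly as follows. This is a minimal sketch, not the library's actual classes: `Span`, `SpanStatus`, `children`, and `finish` are hypothetical names, and a plain string stands in for the ksuid.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class SpanStatus(str, Enum):
    STARTED = "started"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class Span:
    """Hypothetical node in a trace tree; field names follow the list above."""

    name: str
    file_location: str
    trace_id: str  # a ksuid in the design notes; any unique string works here
    ts_start: float = field(default_factory=time.time)
    ts_end: Optional[float] = None
    status: SpanStatus = SpanStatus.STARTED
    exit_error: Optional[BaseException] = None  # failure that escaped without `except`
    handled_errors: List[BaseException] = field(default_factory=list)
    level: Optional[int] = None
    children: List[Span] = field(default_factory=list)

    def finish(self, error: Optional[BaseException] = None) -> None:
        """Close the span and derive its final status from the exit error."""
        self.ts_end = time.time()
        self.exit_error = error
        self.status = SpanStatus.SUCCEEDED if error is None else SpanStatus.FAILED


# Build a two-node tree: the child fails, the root still completes.
root = Span(name="handle_request", file_location="app.py:10", trace_id="trace-1")
child = Span(name="load_user", file_location="db.py:42", trace_id=root.trace_id)
root.children.append(child)
child.finish(ValueError("missing user"))
root.finish()
```

Keeping `exit_error` separate from `handled_errors` mirrors the distinction in the list: the first marks a span as failed, the second only records errors that were caught and logged.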

## How to use the library

Basically, an enhanced stdlib logger:

```python
import logging

from span_tree import get_logger

logger = get_logger(__name__)
# level=debug|info|warning|error|critical (stdlib constants assumed); `logger.info` also works.
# Logs are attached to the current span.
logger.log(logging.INFO, "msg")
# logger(name: str, force_new_trace: bool = False, **kwargs) -> ContextManager[Span]
with logger("my-span"):  # starts a new span/trace; the name is illustrative
    # logger.log_extra(msg: str = "", level: int = INFO, **kwargs) adds attributes to the span
    logger.log_extra(user_id=42)
```

## Features in "receiver"

- crash_reports/annotating errors
    - support marking error as crash/silenced in UI
    - support marking error with OK/WARN/ALERT
- find slow functions/threads
- distributed tracing
- support updating trace/spans (same published many times)

## Features in CLI app

- Find traces: {trace_id}
    - either a 32 bytes hex
    - or a full ksuid
- Select namespace (or selected directly if it already exists)
- Select traces based on tags
- See dashboard of Name|Status|Counts|LastTime
- Toggle error/crash/ok/all
- Toggle show slow/fast/all
- Query for filtering tasks
- Inside a span
    - Toggle for Debug/Info/Warning/Error/Critical
    - Arrow keys for choosing parents or scrolling down
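
The two accepted `{trace_id}` forms could be told apart with a small validator. This is a sketch under assumptions: "32 bytes hex" is read here as 32 hexadecimal characters, the ksuid form is the 27-character base62 string used by segmentio/ksuid, and `classify_trace_id` is a hypothetical helper name.

```python
import re

HEX_TRACE_ID = re.compile(r"[0-9a-fA-F]{32}")  # assumed: 32 hex characters
KSUID_TEXT = re.compile(r"[0-9A-Za-z]{27}")  # ksuid string form: 27 base62 characters


def classify_trace_id(value: str) -> str:
    """Return "hex" or "ksuid" depending on the shape of the given trace id."""
    if HEX_TRACE_ID.fullmatch(value):
        return "hex"
    if KSUID_TEXT.fullmatch(value):
        return "ksuid"
    raise ValueError(f"not a recognized trace id: {value!r}")
```

Because the two forms differ in length, the checks cannot both match the same input.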

## How to run and 3rdparty dependencies

- when running on localhost
    - depend on rich to see traces directly
- when running in a container/lambda service
    1. by default dumps the trace to stdout (need to configure the receiver to parse these logs somehow)
    2. use httpx/requests for forwarding directly
- when debugging
    - can run with database dependency directly and inject to local database to help localhost debugging

## DB Layer

- api-key in header
- payload(tags, list(span))
- Use a pydantic class for finding all ref_src and ref_dest
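
A request to the DB layer might be assembled as below. This is only a sketch of the "api-key in header, payload(tags, list(span))" shape: the header name `X-Api-Key`, the JSON keys, and `build_request` are assumptions, and plain dictionaries stand in for the pydantic class.

```python
import json
from typing import Any, Dict, List, Tuple

API_KEY_HEADER = "X-Api-Key"  # hypothetical header name


def build_request(
    api_key: str, tags: Dict[str, str], spans: List[Dict[str, Any]]
) -> Tuple[Dict[str, str], bytes]:
    """Return the headers and JSON body for publishing spans with their tags."""
    headers = {API_KEY_HEADER: api_key, "Content-Type": "application/json"}
    body = json.dumps({"tags": tags, "spans": spans}).encode("utf-8")
    return headers, body


headers, body = build_request(
    api_key="example-key",
    tags={"app": "demo", "env": "local"},
    spans=[{"name": "root", "status": "succeeded"}],
)
```

Returning headers and body instead of sending them keeps the sketch independent of the HTTP client.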

## Showcase (todo)

- Video of traditional/trace based logging
    - src stdout on one side
    - stdout on other side

## TBD & Uncertainties

- ref_src|ref_dest
    - uuid4
    - ref_src created on a trace when dumping a message and adding the ref to e.g., metadata
        - during injection an alternative index of ref_src -> trace
    - ref_dest used on a trace when parsing back the message then logging with log_ref(ref_dest)

    - RunSequenceGetter
        - unique sequence number per tags combination
    - API
        - small DB wrapper to support usage from the CLI or a future frontend app
    - UI?
        - using same python code as rich and returning html?
- rich based tracebacks & locals collection?
    - Long tracebacks might not be necessary if I have call location
- How to do sampling?
- Later
    - Support creating alarms, graphs, etc.
    - Support a search feature like Kibana's
    - Support pre-defined dashboards
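
The `ref_src`/`ref_dest` idea from the list above can be sketched as follows. This is one possible reading of an open design question, not a settled mechanism: `dump_message`, `parse_message`, `find_source_trace`, and the in-memory `ref_index` are hypothetical, and the index is filled when the message is dumped, whereas the notes build it during injection.

```python
import json
import uuid
from typing import Any, Dict, Optional, Tuple

# Alternative index: ref_src -> id of the trace that produced the message.
ref_index: Dict[str, str] = {}


def dump_message(payload: Dict[str, Any], trace_id: str) -> str:
    """Producer side: create a ref_src (uuid4) and add it to the message metadata."""
    ref_src = str(uuid.uuid4())
    ref_index[ref_src] = trace_id
    return json.dumps({"payload": payload, "metadata": {"ref_src": ref_src}})


def parse_message(raw: str) -> Tuple[Dict[str, Any], str]:
    """Consumer side: return the payload and the ref to log as ref_dest."""
    message = json.loads(raw)
    return message["payload"], message["metadata"]["ref_src"]


def find_source_trace(ref_dest: str) -> Optional[str]:
    """Link the consuming trace back to the producing trace, if it is known."""
    return ref_index.get(ref_dest)


raw_message = dump_message({"order_id": 7}, trace_id="producer-trace")
payload, ref_dest = parse_message(raw_message)
source_trace = find_source_trace(ref_dest)
```

The consumer would pass `ref_dest` to `log_ref(ref_dest)` on its own trace, so the two traces can be joined later through the index.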

## Implementation details

- principles
    - Never more than 1 span active per task; when the "root-span" finishes, ALL subtasks must finish
    - Errors are stored and tracked when the parent completes
        - only logged with a traceback if `logger.exception(error)` is called or `__exit__` of the root span receives the error
- DataModel
    - user
        - email
        - last_namespace
        - last_access
    - client
        - list(namespaces)
    - namespace
        - api_key
        - tags|labels
        - all apps
            - name
            - versions
            - counter
            - last_ts
    - trace
        - list(span)
    - spans
        - ts_start
        - ts_end
        - status=runs|OK|FAIL|CRASH?
        - kind? Producer/Consumer, Client/Server
- monkeypatches both
    - `Thread.__init__`
    - `ThreadPoolExecutor.submit`

## Roadmap

- Minimal library implementation, a 0.0.1 release, and a TestTrace as an "integration" test
    - Loop-slow ~10ms
    - Simple-math
    - url-get
    - different ways of dumping/parsing
        - yaml
        - toml
        - json
- A local "minimal-system" working
    - span-trace used in library with a publisher that writes to a DB
    - a textual CLI for viewing the traces
- A local "full-system" working
    - Support annotating traces
    - Support publishing status based on annotations
    - Support metrics publishing
    - Support health report
- Cloud receiver and storage
    - a receiver lambda working with an API token
    - library support to "post" messages to receiver
    - CLI configured to use online DB
- Cloud signup with openid and basic markdown UI

## Resources

- How does it integrate with Open Telemetry?
    - [Observability with InfluxDB and Open Telemetry](https://www.youtube.com/watch?v=LFAGFX_aCY8)
    - [Open Telemetry](https://www.youtube.com/watch?v=oe5YYh9mhzw)
    - [aws open telemetry for lambda](https://aws-otel.github.io/docs/getting-started/lambda)
