200. They’re blind to whether the work behind them finished — checkout sagas stuck retrying, webhooks dead-lettering, jobs failing halfway. Process Health instruments your processes so you can see completion rate, p95 duration, and exactly which step failed.
A run maps to a trace and each step to a span, so it rides the same pipeline as distributed tracing.
Install
Instrument a process
Wrap a unit of work with kodo.workflow() and instrument each step with wf.step(). Your control flow is unchanged: step() runs your function, times it, records success or failure, and rethrows on error.
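The step() contract described above can be illustrated with a small local stand-in. This is only a sketch: the WorkflowRun class, its steps array, and the checkout function are hypothetical names written for this example. They are not the Kōdo SDK, whose actual signatures may differ. The sketch shows the contract in isolation: run the function, time it, record the outcome, and rethrow.

```typescript
// Hypothetical stand-in for illustration; this is NOT the Kōdo SDK.
type StepRecord = { name: string; ok: boolean; durationMs: number };

class WorkflowRun {
  readonly processName: string;
  readonly steps: StepRecord[] = [];

  constructor(processName: string) {
    this.processName = processName;
  }

  // Runs fn, times it, records success or failure, and rethrows on error.
  async step<T>(name: string, fn: () => Promise<T>): Promise<T> {
    const start = Date.now();
    try {
      const result = await fn();
      this.steps.push({ name, ok: true, durationMs: Date.now() - start });
      return result;
    } catch (err) {
      this.steps.push({ name, ok: false, durationMs: Date.now() - start });
      throw err; // the caller's own error handling still sees the failure
    }
  }
}

// Example process: the control flow is ordinary async/await.
async function checkout(wf: WorkflowRun): Promise<void> {
  await wf.step("reserve-inventory", async () => "reserved");
  await wf.step("charge-card", async (): Promise<string> => {
    throw new Error("card declined");
  });
  await wf.step("send-receipt", async () => "sent"); // not reached
}
```

Because the error is rethrown, a surrounding try/catch or a job runner's retry logic behaves the same as it would without instrumentation, while the recorded steps identify charge-card as the step that failed.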
What you get
- Completion rate — % of runs that finished without an errored step, per process.
- p95 duration — how long runs take end-to-end.
- Failed runs + the failing step — when a run errors, the exact step that broke is surfaced.
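As a rough sketch of how these three figures relate to per-run data, the following computes them from an in-memory list of run summaries. The RunSummary shape and the function names are hypothetical, and the nearest-rank percentile is an assumption; this page does not say which percentile method Kōdo uses.

```typescript
// Hypothetical run summary shape, for illustration only.
type RunSummary = {
  status: "ok" | "failed";
  durationMs: number;
  failedStep?: string; // set only when status is "failed"
};

// Fraction of runs that finished without an errored step (0 when there are no runs).
function completionRate(runs: RunSummary[]): number {
  if (runs.length === 0) return 0;
  return runs.filter((r) => r.status === "ok").length / runs.length;
}

// p95 end-to-end duration using the nearest-rank method (an assumption).
function p95DurationMs(runs: RunSummary[]): number | undefined {
  if (runs.length === 0) return undefined;
  const sorted = runs.map((r) => r.durationMs).sort((a, b) => a - b);
  const rank = Math.ceil((95 * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}

// Counts failed runs by the step that broke.
function failuresByStep(runs: RunSummary[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const r of runs) {
    if (r.status === "failed" && r.failedStep !== undefined) {
      counts.set(r.failedStep, (counts.get(r.failedStep) ?? 0) + 1);
    }
  }
  return counts;
}

// Four sample runs of one process; one of them failed at "charge-card".
const sampleRuns: RunSummary[] = [
  { status: "ok", durationMs: 100 },
  { status: "ok", durationMs: 200 },
  { status: "failed", durationMs: 300, failedStep: "charge-card" },
  { status: "ok", durationMs: 400 },
];
```

For sampleRuns, the completion rate is 0.75, the nearest-rank p95 is the slowest run (400 ms, since the sample has only four runs), and failuresByStep attributes the single failure to charge-card.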
Telemetry is fire-and-forget: a failure to report never throws into your process. Kōdo only observes; retries, checkpointing, and durability remain your concern (use it alongside Temporal, BullMQ, Cloudflare Workflows, or plain queues).
Zero-code: point Kōdo at a table you already have
If your runs already live in a table (a jobs queue, a saga table, webhook_deliveries), you don’t have to wrap anything. Declare the table in kodo.yaml and run the collector. It reads the table in your own infrastructure (the database connection never leaves your network) and ships only normalized run summaries to Kōdo.
status_map is optional — common values (success/completed/delivered → ok, failed/error → failed, pending/running → running) are recognized by default. The collector tracks a per-source cursor in .kodo-collector-state.json, so each run ships exactly once.
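The default status recognition described above can be sketched as a simple lookup. This illustrates the idea and is not the collector's code: the function and constant names are hypothetical, and two behaviors are assumptions that this page does not confirm: custom status_map entries take precedence over the defaults, and values are matched exactly (case-sensitive).

```typescript
type Normalized = "ok" | "failed" | "running";

// Default recognitions, as listed in the text above.
const DEFAULT_STATUS_MAP = new Map<string, Normalized>([
  ["success", "ok"],
  ["completed", "ok"],
  ["delivered", "ok"],
  ["failed", "failed"],
  ["error", "failed"],
  ["pending", "running"],
  ["running", "running"],
]);

// Maps a raw table value to a normalized status.
// Assumption: custom entries win over defaults; unknown values yield undefined.
function normalizeStatus(
  raw: string,
  custom: Record<string, Normalized> = {},
): Normalized | undefined {
  if (Object.prototype.hasOwnProperty.call(custom, raw)) {
    return custom[raw];
  }
  return DEFAULT_STATUS_MAP.get(raw);
}
```

With no custom map, normalizeStatus("delivered") yields "ok" and normalizeStatus("archived") yields undefined. Passing { archived: "ok" } as the second argument mirrors what a status_map entry would do for a table that uses a nonstandard value.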
The postgres source records completed runs today (completion rate, p95, failures). In-flight / stuck detection for table sources is on the roadmap; use the SDK source for live stuck detection now.
Notes
- Server-side. kodo.workflow() is for backend processes (workers, jobs, route handlers) where your async work runs.
- Engine-agnostic. Observe completion across whatever you run: Temporal, Restate, DBOS, BullMQ, Cloudflare Workflows, or raw queues. Kōdo only observes; durability stays with your engine.
- Rides tracing. Runs are stored as workflow traces (op: "workflow"), so Process Health is available on plans that include distributed tracing.
Related
- Distributed Traces
- Heartbeat Monitoring — for “did this recurring job run on schedule”
- SDK Reference