Metrics Catalog

All Torvyn metrics are pre-allocated at flow creation time. Counters are AtomicU64 with Relaxed ordering on the hot path. Histograms use fixed-bucket boundaries with logarithmic distribution from 100 ns to 10 s.
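
The pre-allocated, lock-free design described above can be illustrated with a small histogram. This is a minimal sketch, not Torvyn's implementation: the type name `LogHistogram`, the choice of four buckets per decade, and the overflow bucket are all assumptions made for illustration. Only the 100 ns to 10 s range, the fixed logarithmic boundaries, and the use of `AtomicU64` with `Relaxed` ordering come from the text.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Assumption: 4 buckets per decade. 100 ns to 10 s spans 8 decades.
const BUCKETS_PER_DECADE: usize = 4;
const DECADES: usize = 8;
const NUM_BOUNDS: usize = BUCKETS_PER_DECADE * DECADES + 1;

/// Hypothetical fixed-bucket histogram; all storage is allocated up front.
pub struct LogHistogram {
    bounds_ns: [u64; NUM_BOUNDS],
    // One counter per upper bound, plus a final overflow bucket.
    counts: [AtomicU64; NUM_BOUNDS + 1],
    sum_ns: AtomicU64,
}

impl LogHistogram {
    pub fn new() -> Self {
        let mut bounds_ns = [0u64; NUM_BOUNDS];
        for (i, bound) in bounds_ns.iter_mut().enumerate() {
            // Upper bounds run from 10^2 ns to 10^10 ns in equal log steps.
            let exponent = 2.0 + i as f64 / BUCKETS_PER_DECADE as f64;
            *bound = 10f64.powf(exponent).round() as u64;
        }
        Self {
            bounds_ns,
            counts: std::array::from_fn(|_| AtomicU64::new(0)),
            sum_ns: AtomicU64::new(0),
        }
    }

    /// Hot path: one binary search and two relaxed atomic adds, no allocation.
    pub fn record(&self, value_ns: u64) {
        // Index of the first upper bound that is >= value_ns.
        let idx = self.bounds_ns.partition_point(|&b| b < value_ns);
        self.counts[idx].fetch_add(1, Ordering::Relaxed);
        self.sum_ns.fetch_add(value_ns, Ordering::Relaxed);
    }

    pub fn bounds(&self) -> &[u64] {
        &self.bounds_ns
    }

    pub fn sum_ns(&self) -> u64 {
        self.sum_ns.load(Ordering::Relaxed)
    }

    /// Export path: read every bucket counter.
    pub fn snapshot(&self) -> Vec<u64> {
        self.counts.iter().map(|c| c.load(Ordering::Relaxed)).collect()
    }
}
```

`Relaxed` ordering is sufficient here because each counter is an independent tally: the increments themselves are atomic, and no other memory needs to be synchronized with them. The trade-off is that a snapshot taken while recording is in progress may be slightly inconsistent across buckets.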

Per-Flow Metrics

| Metric Name | Type | Unit | Labels | Description |
|---|---|---|---|---|
| flow.elements.total | Counter | count | flow_id | Total stream elements processed. |
| flow.elements.errors | Counter | count | flow_id | Total elements that produced errors. |
| flow.latency | Histogram | ns | flow_id | End-to-end latency per element (source entry to sink exit). |
| flow.throughput | Derived | elements/s | flow_id | Computed from element count and wall time during export. |
| flow.copies.total | Counter | count | flow_id | Total buffer copy operations. |
| flow.copies.bytes | Counter | bytes | flow_id | Total bytes copied across all copy operations. |
| flow.active_duration | Gauge | ns | flow_id | Wall time since flow started. |
| flow.state | Gauge (enum) | | flow_id | Current flow lifecycle state. |

Per-Component Metrics

| Metric Name | Type | Unit | Labels | Description |
|---|---|---|---|---|
| component.invocations | Counter | count | flow_id, component_id | Total invocations of this component. |
| component.errors | Counter | count | flow_id, component_id | Total error returns. |
| component.processing_time | Histogram | ns | flow_id, component_id | Wall time per invocation (excludes queue wait). |
| component.fuel_consumed | Counter | units | flow_id, component_id | Wasm fuel consumed (if fuel metering enabled). |
| component.memory_current | Gauge | bytes | flow_id, component_id | Current Wasm linear memory size. |

Per-Stream Metrics

| Metric Name | Type | Unit | Labels | Description |
|---|---|---|---|---|
| stream.elements.transferred | Counter | count | flow_id, stream_id | Total elements transferred through this stream. |
| stream.backpressure.events | Counter | count | flow_id, stream_id | Total backpressure activation events. |
| stream.backpressure.duration_ns | Counter | ns | flow_id, stream_id | Total time spent in backpressure. |
| stream.queue.current_depth | Gauge | count | flow_id, stream_id | Current queue depth. |
| stream.queue.peak_depth | Gauge | count | flow_id, stream_id | Maximum queue depth observed. |
| stream.queue.wait_time | Histogram | ns | flow_id, stream_id | Time each element spent waiting in the queue. |

Resource Pool Metrics

| Metric Name | Type | Unit | Labels | Description |
|---|---|---|---|---|
| pool.capacity | Gauge | count | tier | Total slots in this pool tier. |
| pool.available | Gauge | count | tier | Free buffers currently available. |
| pool.allocated | Counter | count | tier | Total buffers allocated since startup. |
| pool.returned | Counter | count | tier | Total buffers returned since startup. |
| pool.fallback_count | Counter | count | tier | Allocations that fell back to system allocator. |
| pool.exhaustion_events | Counter | count | tier | Times the free list was empty when allocation was requested. |
| pool.reuse_rate | Derived | ratio | tier | returned / allocated (computed during export). |
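
As a worked illustration of the derived `pool.reuse_rate` ratio, the sketch below computes returned / allocated from two counter readings. The function name and the choice to report no value when nothing has been allocated are assumptions; the source only states the formula.

```rust
/// Hypothetical helper: pool.reuse_rate = pool.returned / pool.allocated.
/// Returns None when no buffers have been allocated, so the ratio is
/// never computed as a division by zero.
pub fn reuse_rate(returned: u64, allocated: u64) -> Option<f64> {
    if allocated == 0 {
        None
    } else {
        Some(returned as f64 / allocated as f64)
    }
}
```

For example, 75 returned buffers out of 100 allocated gives a reuse rate of 0.75.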

Per-Capability Metrics

| Metric Name | Type | Unit | Labels | Description |
|---|---|---|---|---|
| capability.exercises | Counter | count | component_id, capability | Times a capability was exercised. |
| capability.denials | Counter | count | component_id, capability | Times a capability was denied. |

System-Level Metrics

| Metric Name | Type | Unit | Description |
|---|---|---|---|
| system.flows.active | Gauge | count | Currently active flows. |
| system.components.active | Gauge | count | Currently instantiated components. |
| system.memory.total | Gauge | bytes | Total memory (host + all linear memories). |
| system.memory.host | Gauge | bytes | Host-side memory (tables, queues, metrics). |
| system.scheduler.wakeups | Counter | count | Total scheduler wakeup events. |
| system.scheduler.idle_ns | Counter | ns | Time spent idle (no work available). |
| system.spans_dropped | Counter | count | Trace spans dropped due to export backpressure. |

Querying Metrics

Prometheus: Scrape http://localhost:<port>/metrics (or the Unix domain socket at $TORVYN_STATE_DIR/torvyn.sock). All metrics are exported in Prometheus text exposition format.
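
Once the endpoint has been scraped, each sample line in the text exposition format can be split into a name, a label set, and a value. The sketch below is a simplified parser written for illustration only. It does not fetch anything over HTTP, it keeps the label set as a raw string, and the metric names used in the usage note are assumptions: the source does not say how dotted names such as `flow.elements.total` appear in the exported output.

```rust
/// Parse one sample line of Prometheus text exposition format into
/// (metric name, raw label string, value).
/// Blank lines and comment lines (# HELP, # TYPE) yield None.
/// An optional trailing timestamp is ignored.
pub fn parse_sample(line: &str) -> Option<(String, String, f64)> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    let (name, labels, rest) = match line.find('{') {
        // Labelled sample: name{label="value",...} value [timestamp]
        Some(open) => {
            let close = line.rfind('}')?;
            if close < open {
                return None;
            }
            (&line[..open], &line[open + 1..close], &line[close + 1..])
        }
        // Unlabelled sample: name value [timestamp]
        None => {
            let split = line.find(char::is_whitespace)?;
            (&line[..split], "", &line[split..])
        }
    };
    // The first whitespace-separated token after the series is the value.
    let value = rest.split_whitespace().next()?.parse::<f64>().ok()?;
    Some((name.trim().to_string(), labels.to_string(), value))
}
```

Under the assumed naming, a line such as `flow_elements_total{flow_id="f1"} 42` would parse to the name `flow_elements_total`, the label string `flow_id="f1"`, and the value 42.0.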

OTLP: In the [observability] configuration section, set otlp_metrics_enabled = true and choose an otlp_export_interval_s to push metrics to an OpenTelemetry Collector, Grafana Cloud, or any OTLP-compatible backend.

torvyn bench: Benchmark reports include all metrics as computed deltas over the benchmark window.
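
Reporting metrics as deltas over a window amounts to subtracting a snapshot taken at the start from one taken at the end. The sketch below shows this for a few per-flow counters and derives throughput from the element delta and the elapsed wall time. The struct and function names are hypothetical, and it is an assumption that the window length can be read from `flow.active_duration`; this is not the code `torvyn bench` runs.

```rust
use std::time::Duration;

/// Hypothetical point-in-time reading of a few per-flow metrics.
#[derive(Clone, Copy, Debug)]
pub struct FlowSnapshot {
    pub elements_total: u64,     // flow.elements.total
    pub elements_errors: u64,    // flow.elements.errors
    pub active_duration_ns: u64, // flow.active_duration
}

/// Change in those metrics between two snapshots.
#[derive(Debug, PartialEq)]
pub struct FlowDelta {
    pub elements: u64,
    pub errors: u64,
    pub window: Duration,
    /// Elements per second over the window; None for an empty window.
    pub throughput: Option<f64>,
}

pub fn window_delta(start: &FlowSnapshot, end: &FlowSnapshot) -> FlowDelta {
    // saturating_sub guards against snapshots passed in the wrong order.
    let elements = end.elements_total.saturating_sub(start.elements_total);
    let errors = end.elements_errors.saturating_sub(start.elements_errors);
    let window_ns = end.active_duration_ns.saturating_sub(start.active_duration_ns);
    let window = Duration::from_nanos(window_ns);
    let throughput = if window_ns > 0 {
        Some(elements as f64 / window.as_secs_f64())
    } else {
        None
    };
    FlowDelta { elements, errors, window, throughput }
}
```

Gauges such as `stream.queue.current_depth` are instantaneous values, so subtracting them in this way is only meaningful for counters.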

Alerting Recommendations

| Condition | Metric | Threshold | Meaning |
|---|---|---|---|
| Error rate spike | flow.elements.errors rate | > 1% of total | Components are failing. Investigate error logs. |
| Sustained backpressure | stream.backpressure.duration_ns rate | > 50% of wall time | Consumer cannot keep up. Scale or optimize downstream. |
| Pool exhaustion | pool.exhaustion_events rate | > 0 sustained | Buffer pool is undersized. Increase pool configuration. |
| High copy amplification | flow.copies.bytes / (flow.throughput * avg_element_size) | > 3.0 per stage | More copies than expected. Investigate component data access patterns. |
| Memory growth | component.memory_current | Sustained increase | Possible memory leak in component. Investigate component logic. |
| Tail latency | flow.latency p99 | > 10× p50 | Occasional slow processing. Check for backpressure, GC, or I/O pauses. |
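
Several of these thresholds can be checked mechanically against the metric deltas for one evaluation window. The sketch below covers the error rate, backpressure, pool exhaustion, and tail latency conditions. All type and field names are hypothetical, and a single window cannot establish that a condition is "sustained", so in practice the result would be combined across consecutive windows by whatever alerting system is in use.

```rust
/// Hypothetical per-window inputs, all expressed as deltas or quantiles
/// over the same evaluation window.
pub struct WindowStats {
    pub elements: u64,               // delta of flow.elements.total
    pub errors: u64,                 // delta of flow.elements.errors
    pub backpressure_ns: u64,        // delta of stream.backpressure.duration_ns
    pub wall_ns: u64,                // length of the window
    pub pool_exhaustion_events: u64, // delta of pool.exhaustion_events
    pub latency_p50_ns: u64,         // flow.latency p50
    pub latency_p99_ns: u64,         // flow.latency p99
}

#[derive(Debug, PartialEq, Eq)]
pub enum Alert {
    ErrorRateSpike,
    SustainedBackpressure,
    PoolExhaustion,
    TailLatency,
}

/// Return the conditions from the table that this one window violates.
pub fn evaluate(w: &WindowStats) -> Vec<Alert> {
    let mut alerts = Vec::new();
    // errors / elements > 1%, in integer arithmetic (u128 avoids overflow).
    if (w.errors as u128) * 100 > w.elements as u128 {
        alerts.push(Alert::ErrorRateSpike);
    }
    // backpressure time > 50% of wall time.
    if (w.backpressure_ns as u128) * 2 > w.wall_ns as u128 {
        alerts.push(Alert::SustainedBackpressure);
    }
    // Any exhaustion event in the window.
    if w.pool_exhaustion_events > 0 {
        alerts.push(Alert::PoolExhaustion);
    }
    // p99 > 10 x p50.
    if (w.latency_p99_ns as u128) > (w.latency_p50_ns as u128) * 10 {
        alerts.push(Alert::TailLatency);
    }
    alerts
}
```

The comparisons are strict, matching the ">" thresholds in the table: a window with exactly 1% errors or exactly 50% backpressure time does not raise an alert.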