Prometheus Metrics

Every collector Portunus exposes, with cardinality budget notes.

portunus-server exposes Prometheus metrics on metrics_listen (default 127.0.0.1:7081). The endpoint is loopback-pinned — scrape from a sibling Prometheus on the same host or a sidecar.

curl -s http://127.0.0.1:7081/metrics

The same payload is also available at /v1/metrics on the operator HTTP listener, gated by superadmin RBAC. The Web UI dashboard reads that path so it doesn't have to cross listeners.
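For ad-hoc inspection it can help to turn the scraped text into structured samples. The following is a minimal sketch, not part of Portunus: `parse_metrics` and `SAMPLE_PAYLOAD` are hypothetical names, the payload values and the `bad_token` reason are made up for illustration, and the parser only handles the common `name{labels} value` line shape of the Prometheus text format.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair inside the braces; escaped characters are kept as-is.
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

def parse_metrics(text):
    """Return a list of (name, labels_dict, float_value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples

# Hypothetical payload; real text comes from the /metrics endpoint.
SAMPLE_PAYLOAD = """\
# TYPE portunus_clients_connected gauge
portunus_clients_connected 3
portunus_auth_failures_total{reason="bad_token"} 7
"""

samples = parse_metrics(SAMPLE_PAYLOAD)
```

For anything beyond quick checks, a maintained exposition-format parser is a safer choice than a hand-rolled regex.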

Cardinality budget

Per-rule collectors emit one row per live rule (labels {client, rule, owner}). Per-port and per-target detail is exposed only on demand via ?per_port=true / ?per_target=true and never appears as a default /metrics series.

When a rule is removed, most per-rule rows are removed with it. The cumulative byte counters (portunus_rule_bytes_in_total / portunus_rule_bytes_out_total) are kept by Prometheus convention.

Server-level

portunus_clients_connected
portunus_auth_failures_total{reason}
portunus_operator_requests_total{outcome, reason}
portunus_audit_buffer_drops_total
portunus_audit_durable_writer_lag_seconds
portunus_store_busy_total

outcome is one of {allow, deny}; reason is "ok" on allow or the static RbacError::code() string on deny (bounded label set).

portunus_audit_durable_writer_lag_seconds is the age of the oldest entry in the durable-audit hand-off queue (0 when idle). portunus_store_busy_total counts SQLITE_BUSY events mapped to a transient store error; it should stay near zero.

Per-rule TCP

portunus_rule_bytes_in_total{client, rule, owner}
portunus_rule_bytes_out_total{client, rule, owner}
portunus_rule_active_connections{client, rule, owner}
portunus_rule_dns_failures_total{client, rule, owner}
portunus_rule_target_failovers_total{client, rule, owner}

portunus_rule_target_failovers_total emits one row per multi-target rule (counting Healthy↔Failed transitions); single-target rules never emit a row.

Per-rule UDP

portunus_rule_udp_datagrams_in_total{client, rule, owner}
portunus_rule_udp_datagrams_out_total{client, rule, owner}
portunus_rule_active_flows{client, rule, owner}
portunus_rule_flows_dropped_overflow_total{client, rule, owner}

TLS SNI routing (v0.9+)

portunus_tls_sni_route_total{client, rule, owner, result}
portunus_tls_sni_listener_miss_total{client, port}
portunus_tls_sni_listener_parse_failures_total{client, port}
portunus_tls_sni_routes_active

result is one of {exact, wildcard, fallback}. A connection whose SNI matches no rule (and has no fallback) is counted on portunus_tls_sni_listener_miss_total instead of portunus_tls_sni_route_total.

SNI peek histogram (v0.10+)

portunus_tls_client_hello_peek_duration_seconds_bucket{client, port, le}
portunus_tls_client_hello_peek_duration_seconds_sum{client, port}
portunus_tls_client_hello_peek_duration_seconds_count{client, port}

Finite buckets up to 3 s; observations above 3 s increment _count and le="+Inf" without bumping le="3". Only emitted for SNI-mode listeners.
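The bucket behaviour described above follows from Prometheus histograms being cumulative: each le bucket counts every observation at or below its bound. A small sketch of that arithmetic, assuming placeholder bounds (the source only states that the largest finite bucket is 3 s; the smaller bounds and the name `cumulative_buckets` are illustrative):

```python
import math

# Assumed bounds in seconds. Only the 3 s upper bound comes from the text;
# the smaller values are placeholders.
BOUNDS = [0.01, 0.1, 1.0, 3.0, math.inf]

def cumulative_buckets(observations, bounds=BOUNDS):
    """Return {le: count}, where each bucket counts observations <= le."""
    return {le: sum(1 for v in observations if v <= le) for le in bounds}

# Three peeks: 5 ms, 200 ms, and one slow 5 s peek.
buckets = cumulative_buckets([0.005, 0.2, 5.0])
total_count = buckets[math.inf]  # _count always equals the +Inf bucket
```

Here the 5 s observation shows up only in the +Inf bucket and in the count, so the le="3" bucket stays at 2 while the total is 3.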

Rate limiting (v0.11+)

portunus_rate_limit_reject_total{client, rule, owner, reason}
portunus_rate_limit_throttle_seconds_total{client, rule, owner, direction}
portunus_rate_limit_active_connections{client, rule, owner}

Reject reasons: conn_concurrent, conn_rate, udp_flow_rate, owner_concurrent, owner_conn_rate, owner_udp_flow_rate.

Per-rule rows carry the rule id in rule and the owner in owner. Owner-aggregated rows (cross-rule totals for an owner) set rule="" and keep owner populated. This applies to all three collectors, including portunus_rate_limit_active_connections, which also emits an owner-aggregate row with rule="". Slice with {rule!=""} for per-rule rows or {rule=""} for the owner aggregate.
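When post-processing scraped samples outside PromQL, the same split can be done on the rule label. This is a sketch under assumptions: `split_rate_limit_rows` is a hypothetical helper, and the sample rows (rule ids, owner names, values) are invented for illustration.

```python
def split_rate_limit_rows(samples):
    """Split (labels, value) pairs into per-rule and owner-aggregate rows."""
    per_rule, per_owner = [], []
    for labels, value in samples:
        # Owner-aggregate rows are the ones with an empty rule label.
        if labels.get("rule", "") == "":
            per_owner.append((labels, value))
        else:
            per_rule.append((labels, value))
    return per_rule, per_owner

# Hypothetical rows for portunus_rate_limit_active_connections.
rows = [
    ({"client": "edge-1", "rule": "r-1", "owner": "alice"}, 4.0),
    ({"client": "edge-1", "rule": "r-2", "owner": "alice"}, 1.0),
    ({"client": "edge-1", "rule": "", "owner": "alice"}, 5.0),
]
per_rule, per_owner = split_rate_limit_rows(rows)
```

Keeping the two groups separate avoids adding owner-aggregate rows on top of the per-rule rows when totalling a collector.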

Traffic quotas (v0.13+)

portunus_traffic_quota_bytes_used{user, client}
portunus_traffic_quota_bytes_limit{user, client}
portunus_traffic_quota_exhausted{user, client}
portunus_traffic_quota_period_resets_total{user, client}
portunus_traffic_quota_exhausted_total{user, client}

These are keyed by {user, client} — not owner — and track the per-(user, client) monthly byte budget. bytes_used is the cumulative bytes consumed in the current period and bytes_limit is the budget. portunus_traffic_quota_exhausted is a gauge (1 while the quota is currently exhausted, else 0). period_resets_total counts period boundary rollovers and exhausted_total counts first-time period exhaustions.
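A common derived value is the fraction of the budget consumed, bytes_used divided by bytes_limit for the same (user, client) pair. A minimal sketch of that arithmetic, assuming the two gauges have already been collected into dictionaries; `quota_usage` and the sample values are hypothetical, and how Portunus represents an unlimited quota is not stated here, so non-positive limits are simply skipped to avoid dividing by zero.

```python
def quota_usage(used, limit):
    """Map (user, client) -> fraction of the budget consumed this period."""
    usage = {}
    for key, limit_bytes in limit.items():
        # Skip pairs without a usage sample or with a non-positive limit.
        if limit_bytes <= 0 or key not in used:
            continue
        usage[key] = used[key] / limit_bytes
    return usage

# Hypothetical samples keyed by (user, client).
used = {("alice", "edge-1"): 750.0, ("bob", "edge-2"): 10.0}
limit = {("alice", "edge-1"): 1000.0, ("bob", "edge-2"): 0.0}
usage = quota_usage(used, limit)
```

With these inputs only the alice/edge-1 pair is reported, at 0.75 of its budget.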

Useful queries

# Top 5 rules by ingress bytes/sec over last 5m
topk(5, sum by (rule) (rate(portunus_rule_bytes_in_total[5m])))

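# Rejects/sec per rule, relative to average active connections
# (active_connections is a gauge, so use avg_over_time rather than rate)
sum by (client, rule) (rate(portunus_rate_limit_reject_total{rule!=""}[5m])) /
sum by (client, rule) (avg_over_time(portunus_rule_active_connections[5m]))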

# Throttle wall-clock per rule (per direction)
rate(portunus_rate_limit_throttle_seconds_total[5m])

# Auth failures by reason
sum by (reason) (rate(portunus_auth_failures_total[5m]))
