# Prometheus Metrics



`portunus-server` exposes Prometheus metrics on
`metrics_listen` (default `127.0.0.1:7081`). The endpoint is
loopback-pinned — scrape from a sibling Prometheus on the same host or
a sidecar.

```sh
curl -s http://127.0.0.1:7081/metrics
```

The same payload is also available at `/v1/metrics` on the operator
HTTP listener, **gated by superadmin RBAC**. The Web UI dashboard reads
that path so it doesn't have to cross listeners.
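For ad-hoc checks outside Prometheus, the loopback endpoint can be read with a few lines of Python. This is a minimal sketch using only the standard library; the helper names `sample_lines` and `fetch_samples` are illustrative and not part of portunus, and the URL assumes the default `metrics_listen` value.

```python
from urllib.request import urlopen

# Default loopback address from this page; adjust if metrics_listen is changed.
METRICS_URL = "http://127.0.0.1:7081/metrics"


def sample_lines(payload: str) -> list:
    """Return sample lines, skipping blank lines and # HELP / # TYPE comments."""
    return [
        line
        for line in payload.splitlines()
        if line.strip() and not line.startswith("#")
    ]


def fetch_samples(url: str = METRICS_URL, timeout: float = 5.0) -> list:
    """Fetch the text payload over HTTP and return its sample lines."""
    with urlopen(url, timeout=timeout) as resp:
        return sample_lines(resp.read().decode("utf-8"))


# Usage (run on the same host as portunus-server):
#   for line in fetch_samples():
#       print(line)
```

The fetch has to run on the same host (or in a sidecar) because the listener is loopback-pinned.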

## Cardinality budget [#cardinality-budget]

Per-rule collectors emit **one row per live rule** (labels
`{client, rule, owner}`). Per-port and per-target detail surface
**only on demand** via `?per_port=true` / `?per_target=true` — never as
default `/metrics` series.

When a rule is removed, most of its per-rule rows are removed with it.
The exceptions are the cumulative byte counters
(`portunus_rule_bytes_in_total` / `portunus_rule_bytes_out_total`),
which are kept by Prometheus convention.
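One way to check the budget in practice is to count rows per metric name in a scraped payload. The sketch below does that for a made-up payload with two live rules; the label values and the helper name `series_per_metric` are hypothetical.

```python
from collections import Counter


def series_per_metric(payload: str) -> Counter:
    """Count sample rows per metric name in a Prometheus text payload."""
    counts = Counter()
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first '{' (labels) or space (no labels).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        counts[name] += 1
    return counts


# Made-up payload: two live rules, so each per-rule collector has two rows.
SAMPLE = """\
portunus_clients_connected 1
portunus_rule_active_connections{client="c1",rule="r1",owner="alice"} 4
portunus_rule_active_connections{client="c1",rule="r2",owner="bob"} 0
portunus_rule_bytes_in_total{client="c1",rule="r1",owner="alice"} 1024
portunus_rule_bytes_in_total{client="c1",rule="r2",owner="bob"} 0
"""

counts = series_per_metric(SAMPLE)
for name, n in sorted(counts.items()):
    print(f"{name}: {n}")
```

With one row per live rule, the per-rule counts should track the number of live rules, apart from byte-counter rows retained after a rule is removed.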

## Server-level [#server-level]

```
portunus_clients_connected
portunus_auth_failures_total{reason}
portunus_operator_requests_total{outcome, reason}
portunus_audit_buffer_drops_total
portunus_audit_durable_writer_lag_seconds
portunus_store_busy_total
```

`outcome` ∈ `{allow, deny}`; `reason` is `"ok"` on allow or the static
`RbacError::code()` string on deny (bounded label set).

`portunus_audit_durable_writer_lag_seconds` is the age of the oldest
entry in the durable-audit hand-off queue (0 when idle).
`portunus_store_busy_total` counts `SQLITE_BUSY` events mapped to a
transient store error; it should stay near zero.

## Per-rule TCP [#per-rule-tcp]

```
portunus_rule_bytes_in_total{client, rule, owner}
portunus_rule_bytes_out_total{client, rule, owner}
portunus_rule_active_connections{client, rule, owner}
portunus_rule_dns_failures_total{client, rule, owner}
portunus_rule_target_failovers_total{client, rule, owner}
```

`portunus_rule_target_failovers_total` emits one row per multi-target
rule (counting Healthy↔Failed transitions); single-target rules never
emit a row.

## Per-rule UDP [#per-rule-udp]

```
portunus_rule_udp_datagrams_in_total{client, rule, owner}
portunus_rule_udp_datagrams_out_total{client, rule, owner}
portunus_rule_active_flows{client, rule, owner}
portunus_rule_flows_dropped_overflow_total{client, rule, owner}
```

## TLS SNI routing (v0.9+) [#tls-sni-routing-v09]

```
portunus_tls_sni_route_total{client, rule, owner, result}
portunus_tls_sni_listener_miss_total{client, port}
portunus_tls_sni_listener_parse_failures_total{client, port}
portunus_tls_sni_routes_active
```

`result` ∈ `{exact, wildcard, fallback}`. A connection whose SNI
matches no rule (and has no fallback) is counted on
`portunus_tls_sni_listener_miss_total`, not on
`portunus_tls_sni_route_total`.

## SNI peek histogram (v0.10+) [#sni-peek-histogram-v010]

```
portunus_tls_client_hello_peek_duration_seconds_bucket{client, port, le}
portunus_tls_client_hello_peek_duration_seconds_sum{client, port}
portunus_tls_client_hello_peek_duration_seconds_count{client, port}
```

Finite buckets up to 3 s; observations above 3 s increment `_count` and
`le="+Inf"` without bumping `le="3"`. Only emitted for SNI-mode listeners.
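The bucket behaviour described above follows from Prometheus histograms being cumulative. The sketch below simulates it; the bounds below 3 s are hypothetical (only the 3 s top bucket comes from this page), and `observe_all` is an illustrative helper, not portunus code.

```python
import math

# Hypothetical finite bounds in seconds; only the 3 s top bucket is documented.
BOUNDS = [0.01, 0.1, 1.0, 3.0]


def observe_all(durations):
    """Build cumulative bucket counts the way a Prometheus histogram does."""
    buckets = {bound: 0 for bound in BOUNDS}
    buckets[math.inf] = 0
    total = 0.0
    for d in durations:
        total += d
        for bound in buckets:
            if d <= bound:  # cumulative: every bucket with le >= d is bumped
                buckets[bound] += 1
    return buckets, total, len(durations)


# Two fast peeks and one that took longer than the largest finite bucket.
buckets, total, count = observe_all([0.005, 0.5, 4.2])
print(buckets[3.0], buckets[math.inf], count)
```

Here the 4.2 s observation raises `_count` and the `+Inf` bucket to 3 while `le="3"` stays at 2, so the gap between the two is the number of peeks slower than 3 s.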

## Rate limiting (v0.11+) [#rate-limiting-v011]

```
portunus_rate_limit_reject_total{client, rule, owner, reason}
portunus_rate_limit_throttle_seconds_total{client, rule, owner, direction}
portunus_rate_limit_active_connections{client, rule, owner}
```

Reject reasons: `conn_concurrent`, `conn_rate`, `udp_flow_rate`,
`owner_concurrent`, `owner_conn_rate`, `owner_udp_flow_rate`.

Per-rule rows carry the rule id in `rule` and the owner in `owner`.
Owner-aggregated rows (cross-rule totals for an owner) set `rule=""`
and keep `owner` populated. This applies to all three collectors,
including `portunus_rate_limit_active_connections`, which also emits an
owner-aggregate row with `rule=""`. Slice with `{rule!=""}` for per-rule
rows or `{rule=""}` for the owner aggregate.
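Because the owner aggregate shares a metric name with the per-rule rows, summing every row counts traffic twice. The sketch below shows the split on made-up rows for `portunus_rate_limit_active_connections`; the values are invented, and whether aggregate rows carry a `client` label is an assumption here.

```python
def split_rows(rows):
    """Separate per-rule rows from owner-aggregate rows (rule == "")."""
    per_rule = [r for r in rows if r["rule"] != ""]
    per_owner = [r for r in rows if r["rule"] == ""]
    return per_rule, per_owner


# Made-up samples: two rules owned by alice plus her aggregate row.
ROWS = [
    {"client": "c1", "rule": "r1", "owner": "alice", "value": 3},
    {"client": "c1", "rule": "r2", "owner": "alice", "value": 2},
    {"client": "c1", "rule": "", "owner": "alice", "value": 5},
]

per_rule, per_owner = split_rows(ROWS)
naive_total = sum(r["value"] for r in ROWS)         # double counts
rule_total = sum(r["value"] for r in per_rule)      # per-rule rows only
owner_total = sum(r["value"] for r in per_owner)    # aggregate rows only
print(naive_total, rule_total, owner_total)
```

The same applies in PromQL: pick one of the two selectors before aggregating rather than summing the unfiltered metric.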

## Traffic quotas (v0.13+) [#traffic-quotas-v013]

```
portunus_traffic_quota_bytes_used{user, client}
portunus_traffic_quota_bytes_limit{user, client}
portunus_traffic_quota_exhausted{user, client}
portunus_traffic_quota_period_resets_total{user, client}
portunus_traffic_quota_exhausted_total{user, client}
```

These are keyed by `{user, client}` — not `owner` — and track the
per-(user, client) monthly byte budget. `bytes_used` is the cumulative
bytes consumed in the current period and `bytes_limit` is the budget.
`portunus_traffic_quota_exhausted` is a gauge (`1` while the quota is
currently exhausted, else `0`). `period_resets_total` counts period
boundary rollovers and `exhausted_total` counts first-time period
exhaustions.
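A common derived value is the fraction of the period budget already consumed. The helper below is a minimal sketch; `quota_fraction` is a hypothetical name, and the zero-limit guard only avoids a division by zero, since this page does not say how an unlimited quota is reported.

```python
from typing import Optional


def quota_fraction(bytes_used: float, bytes_limit: float) -> Optional[float]:
    """Fraction of the period budget consumed, or None if the limit is not positive."""
    if bytes_limit <= 0:
        # Assumption: treat a non-positive limit as "no meaningful ratio".
        return None
    return bytes_used / bytes_limit


# Made-up values for one (user, client) pair.
fraction = quota_fraction(750_000_000, 1_000_000_000)
print(fraction)
```

The input values would come from `portunus_traffic_quota_bytes_used` and `portunus_traffic_quota_bytes_limit` for the same `{user, client}` pair.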

## Useful queries [#useful-queries]

```text
# Top 5 rules by ingress bytes/sec over last 5m
topk(5, sum by (rule) (rate(portunus_rule_bytes_in_total[5m])))

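# Rejects/sec relative to average active connections, per rule
# (active_connections is a gauge, so use avg_over_time, not rate)
sum by (client, rule, owner) (rate(portunus_rate_limit_reject_total{rule!=""}[5m]))
  / on (client, rule, owner)
avg_over_time(portunus_rule_active_connections[5m])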

# Throttle wall-clock per rule (per direction)
rate(portunus_rate_limit_throttle_seconds_total[5m])

# Auth failures by reason
sum by (reason) (rate(portunus_auth_failures_total[5m]))
```
