Performance Report
Benchmark methodology, current Portunus measurements, comparable forwarding stacks, and reproducible test flow.
This report captures the current v1.6.1 performance. Read the numbers as same-host evidence — measurements taken on one specific machine — rather than as universal throughput claims. Forwarding speed is dominated by the kernel version, the CPU frequency policy, the network path, socket buffer sizing, and whether traffic stays inside the kernel or has to cross into userspace (the application's own memory).
For TCP, the v1.6.1 data plane is unchanged from v1.3.0, the release that
introduced the splice fast path. This revision re-measures the same host
end-to-end and adds a direct portunus-standalone vs
portunus-client comparison, because both binaries ship the same
portunus-forwarder data plane. The earlier v1.3.0 and v0.11 baselines
are preserved on the
Performance History page.
Quick read. On a plain TCP rule with no bandwidth limit, Portunus keeps pace with the kernel's own
iptables forwarding all the way to 20 Gbit/s, and the choice between the standalone and client builds makes no measurable difference to throughput.
Bench host
All v1.6.1 numbers below were captured on:
Linux host 6.12.38+deb13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.12.38-1 x86_64
AMD EPYC 7B13 (4 vCPU), 7.8 GiB RAM
rustc 1.96.0 (ac68faa20 2026-05-25)
iperf 3.18 (cJSON 1.7.15)
This is the same 4-vCPU Debian 13 / kernel 6.12 machine that produced
the v0.11 / v1.3.0 baselines.
Read each result as a same-machine, back-to-back delta — iperf3 vs
iptables vs Portunus in one run — not as an absolute number to compare
across reports, since the toolchain and kernel point release move between
sessions.
All tests run over the loopback interface (lo — traffic that never
leaves the machine), so the network itself is never the bottleneck and we
measure the forwarder's own cost.
v1.6.1 Headline (Linux TCP splice fast path)
Single 30-second TCP runs with no bandwidth limit (--seconds 30 --omit-seconds 5, where --omit-seconds discards the first seconds so
TCP slow-start does not skew the average), splice ON (the default),
measured back-to-back against the same iperf3 target. Each forwarder is
its own column, so you can compare them side by side against the
iptables baseline:
| | iptables REDIRECT (baseline) | portunus-standalone | portunus-client | iperf3 |
|---|---|---|---|---|
| Throughput | 26,987 Mbit/s | 31,936 Mbit/s | 26,644 Mbit/s | 30,890 Mbit/s |
| vs iptables | 100.0% | 118.3% | 98.7% | 114.5% |
We take iptables REDIRECT as the 100% baseline: it is the Linux
kernel forwarding the connection entirely in-kernel (a static NAT
redirect — no userspace process touches the bytes), which is the fastest
a forwarder can realistically be on this host. The vs iptables row
is each column's throughput divided by that baseline — 100% means on par
with kernel forwarding, above 100% means faster than it. So
portunus-standalone runs at 118.3% of kernel iptables and
portunus-client at 98.7% (on par). The iperf3 column — no forwarder
at all — is the raw ceiling, at 114.5%.
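As a quick check of the arithmetic, the vs iptables row can be recomputed from the Throughput row. The snippet below is only an illustrative sketch: the dictionary keys and the helper name pct_of_baseline are made up for the example and are not identifiers from Portunus or its harness.

```python
# Throughput row of the headline table, in Mbit/s.
# Key names are illustrative only, not harness identifiers.
throughput_mbps = {
    "iptables REDIRECT": 26_987,
    "portunus-standalone": 31_936,
    "portunus-client": 26_644,
    "iperf3": 30_890,
}


def pct_of_baseline(mbps: float, baseline_mbps: float) -> float:
    """Return mbps as a percentage of the baseline, rounded to one decimal."""
    return round(100.0 * mbps / baseline_mbps, 1)


baseline = throughput_mbps["iptables REDIRECT"]
vs_iptables = {
    name: pct_of_baseline(value, baseline)
    for name, value in throughput_mbps.items()
}

for name, pct in vs_iptables.items():
    print(f"{name:22s} {pct:6.1f}%")
```

Swapping the baseline key for "iperf3" gives the same comparison against the direct path instead of against kernel forwarding.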
With the splice fast path, Portunus keeps pace with — and in some runs
exceeds — kernel iptables forwarding. splice is a Linux system call that moves bytes from one
socket to another without copying them into the application ("zero
copy"); Portunus pairs it with an internal kernel pipe so a forwarded
TCP connection's data never passes through userspace. That avoids the
read → memcpy → write round-trip that limits every ordinary userspace
TCP program on this host — iperf3 included. The proxied path also runs
as three processes (iperf client → Portunus → iperf server), which spread
across the 4 vCPUs better than the direct two-thread iperf3 baseline
can — which is why Portunus can come in slightly above 100%.
A rate-limited client rule (1,048,576 B/s cap) landed at 8.388 Mbit/s
against an 8.389 Mbit/s target — inside the +10% acceptance bound,
confirming the bandwidth cap is accurate.
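The target figure follows directly from the cap: bytes per second times eight, divided by 10^6. The sketch below redoes that conversion and applies a +10% check. Reading the bound as "measured throughput may exceed the target by at most 10%" is an assumption based on the wording here, and both function names are hypothetical.

```python
def cap_to_mbps(cap_bytes_per_sec: int) -> float:
    """Convert a byte-per-second cap into Mbit/s (decimal megabits)."""
    return cap_bytes_per_sec * 8 / 1_000_000


def within_plus_10pct(measured_mbps: float, target_mbps: float) -> bool:
    """Assumed acceptance rule: measured may exceed the target by at most 10%."""
    return measured_mbps <= target_mbps * 1.10


target = cap_to_mbps(1_048_576)  # 8.388608 Mbit/s, reported as 8.389
print(round(target, 3), within_plus_10pct(8.388, target))
```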
The single-run uncapped maximum at ~30 Gbit/s is dominated by scheduler noise: at these speeds the result depends on which core the OS happens to place each process on, so it varies run to run. Read the columns above as "all three are in the same range", not as a fixed ranking — see the standalone vs client repeats below, where the order flips from one run to the next.
standalone vs client
portunus-standalone (driven by a static TOML file, with no control
plane) and portunus-client (which adds a bidirectional gRPC control
stream that receives pushed rules) consume the same
portunus-forwarder data plane end-to-end — the same proxy.rs /
splice hot path. Once a rule is active and bytes are flowing, the
forwarding code is byte-for-byte identical.
To tell a real difference apart from loopback noise, we ran six
back-to-back uncapped A/B repeats (--seconds 10 --omit-seconds 2),
plus the 30-second headline run — seven pairs in total:
| Run | portunus-standalone (Mbit/s) | portunus-client (Mbit/s) | standalone / client |
|---|---|---|---|
| headline (30s) | 31,936 | 26,644 | 119.9% |
| rep 1 | 32,528 | 35,507 | 91.6% |
| rep 2 | 26,250 | 34,782 | 75.5% |
| rep 3 | 14,616 | 27,311 | 53.5% |
| rep 4 | 35,801 | 24,801 | 144.4% |
| rep 5 | 35,025 | 32,886 | 106.5% |
| rep 6 | 27,525 | 27,086 | 101.6% |
| median (reps 1-6) | ~30,027 | ~30,099 | ~100% |
The standalone-to-client ratio swings from 53.5% to 144.4% — sometimes standalone wins, sometimes client does — yet the two medians land within ~0.2% of each other. There is no systematic data-plane throughput difference between the two builds. The wide swing is simply the uncapped ceiling at ~30 Gbit/s being governed by scheduler and core-placement luck, not by which binary is under test.
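The median row can be reproduced from the table. Its values match the six 10-second repeats taken on their own, without the 30-second headline run, so the sketch below uses those six pairs. Variable names are illustrative.

```python
from statistics import median

# Six 10-second A/B repeats from the table above, in Mbit/s.
standalone = [32_528, 26_250, 14_616, 35_801, 35_025, 27_525]
client = [35_507, 34_782, 27_311, 24_801, 32_886, 27_086]

# Per-run standalone / client ratio, in percent.
ratios = [round(100 * s / c, 1) for s, c in zip(standalone, client)]

med_standalone = median(standalone)
med_client = median(client)
gap_pct = abs(med_client - med_standalone) / med_standalone * 100

print(f"ratio range: {min(ratios)}% to {max(ratios)}%")
print(f"medians: {med_standalone} vs {med_client} ({gap_pct:.2f}% apart)")
```

The per-run ratios span roughly a factor of three, while the medians differ by a fraction of a percent, which is the pattern expected from placement noise rather than from a real difference between the builds.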
The clean, low-noise proof is the offered-load sweep
below: at every fixed paced rate from 100 Mbit/s to 15 Gbit/s, standalone
and client deliver the requested rate within 0.1% of each other and of
the iperf3 / iptables baselines, and at 18 and 20 Gbit/s all four paths
stay within about 4% of one another.
The only structural difference is that portunus-client also runs the
gRPC control stream and reports stats periodically. On a CPU-saturated
host that is tiny background overhead — below the measurement noise of the
data plane. For raw forwarding throughput, pick the build
that fits your deployment model (central control plane vs static TOML);
it is not a performance decision.
Offered-load sweep
A raw maximum answers "how fast can it possibly go?" The more useful
operator question is "at my link speed, does the forwarder get in the
way?" To answer it we use iperf3 -b <rate>, which paces the sender to
a fixed rate — simulating a real WAN/VPS link of that speed — and check
whether each path delivers it (--seconds 5 --omit-seconds 1). Same
11-point sweep, all four paths back-to-back:
| Offered load | iperf3 (Mbit/s) | iptables REDIRECT (Mbit/s) | portunus-standalone (Mbit/s) | portunus-client (Mbit/s) |
|---|---|---|---|---|
| 100 Mbit/s | 100.02 | 100.02 | 100.02 | 100.02 |
| 500 Mbit/s | 499.82 | 499.92 | 499.90 | 500.05 |
| 1 Gbit/s | 999.94 | 999.83 | 999.82 | 999.78 |
| 2.5 Gbit/s | 2,499.94 | 2,499.75 | 2,499.89 | 2,499.75 |
| 5 Gbit/s | 4,999.57 | 4,999.71 | 4,999.48 | 4,999.68 |
| 7.5 Gbit/s | 7,499.46 | 7,499.55 | 7,499.48 | 7,499.30 |
| 10 Gbit/s | 9,998.99 | 9,999.29 | 9,999.61 | 9,999.33 |
| 12.5 Gbit/s | 12,499.27 | 12,499.22 | 12,499.17 | 12,501.03 |
| 15 Gbit/s | 14,999.09 | 14,997.84 | 14,999.13 | 14,999.24 |
| 18 Gbit/s | 17,846.25 | 18,001.85 | 18,013.34 | 18,030.70 |
| 20 Gbit/s | 19,998.82 | 19,924.28 | 19,481.14 | 20,180.43 |
Across the entire 100 Mbit/s → 20 Gbit/s range, all four paths hit the
offered rate within iperf3's short-run measurement noise. The splice
fast path keeps both Portunus builds level with the in-kernel iptables
REDIRECT baseline through the whole sweep — on this 4-vCPU host there is
no link speed at which the richer control plane costs measurable
throughput. The small variation in the 18–20 Gbit/s rows (e.g. standalone
19,481 vs client 20,180 at 20 Gbit/s) is short-run pacing noise, not a
ranking — the same noise that drives the uncapped-max swing above.
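To put a number on "within measurement noise", the gap between offered and delivered rate can be computed per row. The sketch below does this for four rows copied from the sweep table; the helper name worst_deviation_pct is hypothetical.

```python
# A few rows of the sweep table: offered Mbit/s -> delivered Mbit/s per path
# (order: iperf3, iptables REDIRECT, portunus-standalone, portunus-client).
sweep = {
    1_000: (999.94, 999.83, 999.82, 999.78),
    10_000: (9_998.99, 9_999.29, 9_999.61, 9_999.33),
    18_000: (17_846.25, 18_001.85, 18_013.34, 18_030.70),
    20_000: (19_998.82, 19_924.28, 19_481.14, 20_180.43),
}


def worst_deviation_pct(offered: float, delivered: tuple) -> float:
    """Largest absolute gap between offered and delivered rate, in percent."""
    return max(abs(d - offered) / offered * 100 for d in delivered)


deviation = {
    rate: round(worst_deviation_pct(rate, row), 2)
    for rate, row in sweep.items()
}

for rate, pct in deviation.items():
    print(f"{rate:>6} Mbit/s offered: worst path is {pct}% off")
```

In these rows the largest gap at 18 Gbit/s comes from the direct iperf3 run rather than from a forwarder, which fits the reading that the high-rate variation is pacing noise.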
This is the band the pre-splice v0.11 baseline could not hold: there, Portunus dropped to ~33–49% of iptables above 12.5 Gbit/s. The splice fast path (introduced in v1.3.0) closed that gap, and v1.6.1 re-confirms it on the same host.
What does NOT change with splice
- Per-connection setup latency, half-close semantics, byte counters,
Prometheus metrics, audit events, RBAC, and rate limiting are all
unaffected. Capped rules stay on the original userspace path,
byte-identical to v1.2.0. splice eligibility is decided per
connection: it requires
Linux && TCP && !PORTUNUS_DISABLE_SPLICE && no bandwidth cap on the rule or its owner.
The concurrent_connections and new_connections_per_sec limits are
checked when the connection is accepted and do not disable splice. See
Rate Limiting & QoS: Interaction with the splice fast path.
- Cross-platform behaviour is unchanged: macOS and Windows builds never
use splice (it is gated to Linux with
#[cfg(target_os = "linux")]; nm confirms zero splice symbols in the
macOS release binary).
- The operator surface is unchanged: no new rule field, no wire-protocol
field, no Web UI control, no CLI flag. The
PORTUNUS_DISABLE_SPLICE=1 environment variable exists only for
troubleshooting and bench A/B testing (see Disabling the Linux fast path
for triage).
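The eligibility condition in the first bullet reads naturally as a boolean predicate. The sketch below mirrors it in Python for illustration only; the real check lives in the Rust data plane and is not shown here. The ConnInfo type, its field names, and the choice to treat None as "no cap" are assumptions, as is treating only the value "1" as disabling splice.

```python
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnInfo:
    """Hypothetical per-connection view; field names are illustrative only."""
    protocol: str                        # "tcp" or "udp"
    rule_cap_bps: Optional[int] = None   # bandwidth cap on the rule, if any
    owner_cap_bps: Optional[int] = None  # bandwidth cap on the rule's owner, if any


def splice_eligible(conn: ConnInfo, env: dict, platform: str = sys.platform) -> bool:
    """Mirror the documented condition:
    Linux && TCP && !PORTUNUS_DISABLE_SPLICE && no bandwidth cap on rule or owner.
    """
    on_linux = platform.startswith("linux")
    is_tcp = conn.protocol == "tcp"
    # Assumption: only the documented value "1" disables the fast path.
    disabled = env.get("PORTUNUS_DISABLE_SPLICE") == "1"
    capped = conn.rule_cap_bps is not None or conn.owner_cap_bps is not None
    return on_linux and is_tcp and not disabled and not capped
```

The connection-count limits are deliberately absent from the predicate, since they are enforced at accept time and do not affect which copy path a connection takes.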
Method
Mature proxy performance reports separate methodology from results:
- Record the system under test: commit, build profile, CPU, OS, kernel, tool versions, and relevant sysctls.
- Use release binaries. Disable debug logging on the hot path.
- Measure a direct baseline first, then the proxy path on the same host.
- Use warm-up / omitted seconds so TCP slow-start and process startup do not dominate the sample.
- Report throughput, setup / RTT latency, connection behaviour, and any retransmits or rejects.
- Report the uncapped loopback maximum as a range, not a single number — at tens of Gbit/s over loopback it is dominated by scheduler noise.
- Keep regression gates separate from absolute marketing numbers. CI catches drift; dedicated hardware establishes product claims.
The repo uses three layers:
| Layer | Command | Purpose |
|---|---|---|
| Criterion TCP data plane | cargo bench -p portunus-client --bench data_plane -- --quick | Stable loopback proxy regression signal. |
| Criterion UDP data plane | cargo bench -p portunus-client --bench udp_data_plane -- --quick | UDP steady-state and RTT regression signal. |
| Real-process compare | scripts/perf_compare.py | Real portunus-server + portunus-client + portunus-standalone + iptables + iperf3 on one host. |
scripts/perf_compare.py is the v1.6.1 test harness. It drives the current
CLI end-to-end — server.toml operator_token bootstrap, the operator
HTTP POST /v1/client-enrollments enrollment flow, portunus-client enroll + bundle + push-rule, and a TOML-driven portunus-standalone
rule — measuring the iperf3, iptables REDIRECT, standalone, and client
paths back-to-back. (The older scripts/perf_loopback.py predates the
v1.6.1 one-time enrollment URI flow and the SQLite state.db file lock,
so its provision-client / bundle path no longer runs.)
Interpretation
Portunus is a userspace L4 forwarder: it accepts a TCP connection, dials the target, and moves bytes between the two sockets. That makes it far richer than a plain kernel NAT rule, but it still has to cross the userspace boundary that the kernel does not.
How much each connection costs depends on what the rule asks for:
- Plain TCP, no bandwidth cap, on Linux: the byte copy runs through
the kernel splice + pipe path, so the payload never enters userspace. On
this host the offered-load sweep shows both Portunus builds tracking
iperf3 and iptables REDIRECT within iperf3 noise all the way to
20 Gbit/s.
- Any bandwidth cap, or UDP, or macOS / Windows, or splice disabled:
the original userspace path runs. For these cases, kernel NAT and
nftables flowtables remain the performance ceiling: the userspace
read → memcpy → write cycle adds a real per-connection cost.
Either way, Portunus is the right tool when you need remote client enrollment, central rule push, RBAC, an audit trail, per-owner QoS, metrics, SNI routing, the PROXY protocol, and a managed rule lifecycle. A static DNAT rule (a fixed kernel destination rewrite) gives you none of these — so comparing raw throughput alone is the wrong question, unless raw throughput is genuinely all you need.
Comparable Forwarders
| Dimension | Portunus (v1.6.1) | iptables DNAT / MASQUERADE | nftables + flowtables | NGINX stream | HAProxy TCP mode | Envoy tcp_proxy | rinetd | socat | SSH -L / -R |
|---|---|---|---|---|---|---|---|---|---|
| Performance profile | Userspace L4 with a Linux TCP splice fast path for uncapped flows. On the bench host tracks iptables REDIRECT to 20 Gbit/s; capped / UDP / non-Linux paths stay on the standard userspace copy. | Highest. Kernel path, no userspace copy. | Highest for eligible flows; bypasses parts of the classic forwarding path. | High userspace proxy with mature event loop. | Very high userspace TCP proxy with excellent connection handling. | High but heavier userspace proxy, designed for service mesh / xDS. | Lightweight userspace redirector. | Flexible diagnostic pipe, not tuned as a managed production proxy. | Encrypted tunnel; throughput pays SSH crypto overhead. |
| TCP | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| UDP | Yes (first-packet enforced) | Yes | Yes | Yes | No generic UDP forwarding | Dedicated UDP filters / use cases | Yes | Yes | No native UDP |
| Dynamic remote rule push | First-class: central server pushes signed rule bundles to edge clients over pinned TLS; CLI, operator HTTP API, embedded Web UI, hot-reload. | No built-in control plane | No built-in control plane | Reload/API depends on edition/config | Runtime API supports many operations | xDS control plane | Config reload style | No | Per-session |
| RBAC / audit / metrics | Native per-user / per-client / per-protocol / per-port-range RBAC; structured audit trail; Prometheus metrics; embedded SQLite store. | External only | External only | Metrics via modules; no native tenant RBAC | Strong stats, ACLs, stick tables | Rich telemetry and policy | Minimal | Minimal | SSH auth/logs |
| QoS / rate limit | Per-rule and per-owner: bandwidth_in/out_bps, new_connections_per_sec, concurrent_connections. Token-bucket limiter; capped rules go through the standard userspace path. | Basic shaping via tc / nftables ecosystem | Via nftables / tc, not app-owner aware | limit_conn / limit_rate style controls | Rich connection/rate controls | Rich filters, overload manager | Minimal | Minimal | Minimal |
| Best fit | Centrally managed edge listeners with tenant-aware RBAC, per-owner QoS, observability, SNI dispatch, and PROXY protocol — TCP/UDP, single ports or ranges, IP or DNS targets. | Static local host/network NAT. | Modern Linux packet filtering and forwarding. | Generic TCP/UDP proxying. | L4/L7 load balancing where TCP is enough. | Service mesh or xDS-managed environments. | Simple port redirection baseline. | Experiments and debugging. | Operationally convenient encrypted tunnels. |
A fair benchmark is not "compare published numbers." It is:
- The same host, or the same two-host topology.
- The same target application (iperf3 TCP, and UDP where supported).
- The same duration, warm-up, socket buffers, CPU governor, MTU, and kernel.
- A direct baseline first.
- One forwarding implementation at a time.
Reproduction
Local / VPS compare
# Debian/Ubuntu
sudo apt-get update
sudo apt-get install -y build-essential cmake pkg-config protobuf-compiler \
git curl iperf3 iptables
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
source "$HOME/.cargo/env"
git clone https://github.com/ZingerLittleBee/Portunus.git
cd Portunus
PORTUNUS_SKIP_WEBUI=1 cargo build --release \
-p portunus-standalone -p portunus-server -p portunus-client
# /tmp may be tmpfs; portunus refuses a tmpfs-backed state.db. Use $HOME.
export TMPDIR="$HOME"
# Headline: direct / iptables / standalone / client + capped convergence.
# --with-iptables is Linux-only and needs root or passwordless sudo.
python3 scripts/perf_compare.py --seconds 30 --omit-seconds 5 \
--with-iptables --cap-bytes-per-sec 1048576 \
--json-out perf-headline.json
# Offered-load sweep across all four paths.
python3 scripts/perf_compare.py --seconds 5 --omit-seconds 1 \
--with-iptables --cap-bytes-per-sec 0 \
--offered-mbps 100,500,1000,2500,5000,7500,10000,12500,15000,18000,20000 \
--json-out perf-sweep.json
Useful flags: --skip-client (standalone only), --skip-standalone
(client only), and --server-bin / --client-bin / --standalone-bin
to point at prebuilt binaries. The headline JSON looks like:
{
  "direct": { "mbps": 30889.985, "retransmits": 147 },
  "iptables_redirect": { "mbps": 26987.079, "retransmits": 16 },
  "iptables_vs_direct_pct": 87.37,
  "standalone_uncapped": { "mbps": 31935.997, "retransmits": 9 },
  "standalone_vs_direct_pct": 103.39,
  "standalone_vs_iptables_pct": 118.34,
  "client_uncapped": { "mbps": 26644.098, "retransmits": 4 },
  "client_vs_direct_pct": 86.25,
  "client_vs_iptables_pct": 98.73,
  "standalone_vs_client_pct": 119.86,
  "client_capped": {
    "cap_bytes_per_sec": 1048576, "mbps": 8.388,
    "target_mbps": 8.389, "within_plus_10pct": true
  }
}
For a two-host test, run portunus-server on the control host, run
portunus-client (or portunus-standalone) and iperf3 -s on the
edge/target host, then drive iperf3 -c <edge-listen-ip> -p <listen_port> -t 30 -O 5 -J from a third host. Keep a direct iperf3 to the target as
the baseline.
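For the two-host flow, the client-side run and its result extraction could be scripted roughly as follows. This is a sketch, not part of the repo's harness: the function names are hypothetical, and the JSON field paths (end.sum_received.bits_per_second and end.sum_sent.retransmits) reflect the usual layout of iperf3's TCP report and should be checked against the iperf3 version in use.

```python
import json
import subprocess


def iperf3_command(host: str, port: int, seconds: int = 30, omit: int = 5) -> list:
    """Build the argv for the client run described above (-J requests JSON output)."""
    return ["iperf3", "-c", host, "-p", str(port),
            "-t", str(seconds), "-O", str(omit), "-J"]


def summarize(report: dict) -> dict:
    """Extract receiver-side throughput and sender retransmits from a TCP report.

    Assumes the usual iperf3 JSON layout; retransmits defaults to 0 when the
    field is missing.
    """
    end = report["end"]
    return {
        "mbps": round(end["sum_received"]["bits_per_second"] / 1e6, 3),
        "retransmits": end["sum_sent"].get("retransmits", 0),
    }


def run(host: str, port: int) -> dict:
    """Run iperf3 against the edge listener and summarize it (needs iperf3 installed)."""
    out = subprocess.run(iperf3_command(host, port), check=True,
                         capture_output=True, text=True).stdout
    return summarize(json.loads(out))
```

Calling run once against the target directly and once against the edge listener gives the baseline and proxied figures in the same shape as the "mbps" / "retransmits" pairs in the headline JSON.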
Criterion Regression
cargo bench -p portunus-client --bench data_plane
python3 scripts/bench_regression_gate.py --max-regression-pct 50
cargo bench -p portunus-client --bench udp_data_plane -- --quick
cargo bench -p portunus-server --bench operator_api -- --quick
Historical baselines
The pre-v1.6.1 reference numbers — the v1.3.0 splice-introduction tables (measured on the original Debian 13 host) and the v0.11 pre-splice Linux iptables comparison — live on their own page: Performance History. Consult it only if you need the prior reference for traceability.
Sources
- Envoy performance FAQ and benchmark guidance: envoyproxy.io docs
- HAProxy management and runtime/statistics documentation: HAProxy docs
- NGINX stream proxy module: nginx.org stream proxy docs
- nftables flowtables: nftables wiki
- iptables extensions and NAT targets: man7 iptables-extensions
- rinetd reference: Debian rinetd man page
- socat manual: Debian socat man page