Standard Ethernet simplicity vs. ultra-low-latency RDMA networks
Choose NVMe/TCP if you want NVMe performance on existing TCP/IP infrastructure without specialized hardware or RDMA expertise.
Choose NVMe/RDMA if every microsecond counts: high-frequency trading, real-time analytics, or latency-sensitive HPC workloads that justify the RDMA infrastructure investment.
| Feature | NVMe/TCP | NVMe/RDMA |
|---|---|---|
| Latency | 25–40 µs | 10–20 µs |
| Throughput | ~95% wire speed | ~98% wire speed |
| Random I/O (IOPS) | ~1.8M IOPS | ~2.1M IOPS |
| Hardware Required | Standard Ethernet NICs | RDMA-capable NICs (RoCE/iWARP) |
| Setup Complexity | Low — standard TCP/IP stack | High — RDMA fabric config, PFC, ECN tuning |
| Infrastructure Cost | Standard Ethernet cost | 2–4× higher (RDMA NICs + switches) |
| CPU Offload | Partial (some kernel bypass options) | Full kernel bypass |
| Operational Risk | Low — familiar TCP/IP ops | Higher — RDMA-specific failure modes |
NVMe/RDMA achieves its 10–20 µs latency through two mechanisms that TCP fundamentally cannot replicate: kernel bypass and zero-copy data transfer. In a standard TCP stack, every I/O crosses the kernel networking subsystem multiple times — data is copied from application buffers into kernel space, processed through the TCP/IP stack, handed to the NIC driver, and the reverse happens on receipt. RDMA eliminates this entirely. The NIC reads from and writes to application memory directly, without involving the CPU or kernel for the data path. The result is that RDMA latency is bounded primarily by hardware propagation delay and NVMe drive access time, while TCP latency includes software scheduling jitter on top of that baseline.
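The data-path argument can be made concrete with a back-of-the-envelope latency model. The component values below are illustrative assumptions chosen to land inside the ranges in the table above, not measurements; real numbers depend on the NICs, kernel version, and fabric.

```python
# Illustrative per-I/O latency model (all values in microseconds).
# Every component estimate here is an assumption for illustration only.

NVME_DRIVE_ACCESS = 8.0   # flash access + controller time (assumed)
WIRE_PROPAGATION = 2.0    # NIC + switch + cable, round trip (assumed)

# Per-I/O software cost that RDMA's kernel bypass and zero-copy remove:
TCP_SYSCALL_AND_COPIES = 10.0  # buffer copies in/out of kernel space (assumed)
TCP_STACK_PROCESSING = 6.0     # TCP/IP protocol handling (assumed)
TCP_SCHEDULING_JITTER = 6.0    # softirq / context-switch variance (assumed)

# RDMA latency is bounded by hardware propagation plus drive access;
# TCP stacks the software costs on top of that same baseline.
rdma_latency = NVME_DRIVE_ACCESS + WIRE_PROPAGATION
tcp_latency = (rdma_latency + TCP_SYSCALL_AND_COPIES
               + TCP_STACK_PROCESSING + TCP_SCHEDULING_JITTER)

print(f"NVMe/RDMA ≈ {rdma_latency:.0f} µs, NVMe/TCP ≈ {tcp_latency:.0f} µs")
```

With these assumed components the model lands at roughly 10 µs for RDMA and 32 µs for TCP, consistent with the 10–20 µs and 25–40 µs ranges above: the gap lives entirely in the software terms.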
That 15–20 µs difference is real and measurable. The question is whether it is meaningful for your specific workload. For high-frequency trading systems where decisions are made in sub-millisecond windows, 15 µs is significant. For a Kubernetes-hosted PostgreSQL replica responding to application queries that average 2–5 ms, the storage protocol's 15 µs contribution to that total is less than 1% — below any threshold that would change a business outcome. The trap is assuming that lower latency always translates to higher application throughput. It does at the extremes. For the vast majority of production workloads, other variables — CPU scheduling, query planning, application caching — dominate by orders of magnitude.
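The "less than 1%" claim is simple arithmetic; the sketch below works it through for the 2–5 ms query times quoted above, using 15 µs as the protocol gap.

```python
# Share of total query latency attributable to the TCP-vs-RDMA gap.
protocol_gap_us = 15.0

for query_ms in (2.0, 5.0):
    share = protocol_gap_us / (query_ms * 1000.0)
    print(f"{query_ms:.0f} ms query: RDMA saves {share:.2%} of total latency")
```

At 2 ms the protocol gap is 0.75% of the total; at 5 ms it is 0.30%. Neither would register in a p99 dashboard.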
There is also the operational cost of achieving that RDMA latency. RoCEv2, the most common RDMA transport, requires Priority Flow Control (PFC) and Explicit Congestion Notification (ECN) to be precisely configured on every switch in the fabric. A misconfigured PFC setting can trigger a pause-frame storm or fabric-wide deadlock that takes down an entire storage fabric. iWARP avoids some of these issues but introduces its own implementation complexity. An NVMe/RDMA administrator needs skills spanning storage, networking, and hardware firmware, a combination that is genuinely scarce. NVMe/TCP, by contrast, runs on the same TCP/IP stack your networking team already understands.
| Workload | Better Choice | Why |
|---|---|---|
| High-frequency trading | NVMe/RDMA | Every microsecond directly impacts trading strategy performance and P&L |
| AI/ML training (cloud) | NVMe/TCP | Standard infrastructure suffices; training throughput is GPU-bound, not storage-latency-bound |
| HPC clusters | NVMe/RDMA | Predictable ultra-low latency for tightly coupled parallel workloads like weather modeling |
| Cloud-native Kubernetes | NVMe/TCP | No RDMA fabric to provision; ops teams use familiar TCP/IP tooling |
| Enterprise block storage | NVMe/TCP | Pragmatic TCO; the 2–4× hardware premium rarely yields proportional application benefit |
For most workloads — general-purpose databases, analytics pipelines, object stores, content delivery, and the overwhelming majority of Kubernetes persistent volumes — the 15–20 µs gap between NVMe/TCP and NVMe/RDMA is dwarfed by other latency contributors. A typical PostgreSQL query involves connection overhead, parsing, planning, index traversal, and result serialization. The storage access component might represent 10–30% of total query latency on a well-tuned system. Shaving 15 µs from that fraction will not move the p99 latency your users actually experience. NVMe/TCP, at 25–40 µs, already delivers dramatic improvements over older protocols like iSCSI (100–200 µs) while running on infrastructure you already own. For 90%+ of production deployments, that trade-off — standard hardware, standard skills, NVMe performance — is straightforwardly the right one.
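The diminishing-returns argument can be sketched numerically. The sketch below assumes a hypothetical query with 4 storage I/Os and 2400 µs of non-storage work (parsing, planning, serialization); the per-I/O latencies are midpoints of the ranges quoted in this article. These workload parameters are assumptions for illustration, not benchmarks.

```python
# End-to-end query latency under three storage protocols (illustrative).
NON_STORAGE_US = 2400.0  # parsing, planning, serialization (assumed)
IOS_PER_QUERY = 4        # storage accesses per query (assumed)

# Per-I/O latency midpoints from the ranges in this article:
protocol_latency_us = {"iSCSI": 150.0, "NVMe/TCP": 32.0, "NVMe/RDMA": 15.0}

totals = {
    name: NON_STORAGE_US + IOS_PER_QUERY * lat
    for name, lat in protocol_latency_us.items()
}

baseline = totals["iSCSI"]
for name, total in totals.items():
    speedup = (baseline - total) / baseline
    print(f"{name:10s}: {total:.0f} µs total ({speedup:.1%} faster than iSCSI)")
```

Under these assumptions, moving from iSCSI to NVMe/TCP cuts total query time by about 16%, while the further step to NVMe/RDMA recovers only another 2 percentage points: the big win is leaving iSCSI, not leaving TCP.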
NVMe/RDMA is the right answer for a narrow, well-defined category of latency-critical workloads where the operational and financial premium is justified. NVMe/TCP is the right answer for nearly everything else — and as kernel implementations mature and smart NICs bring partial offload to commodity hardware, the latency gap is narrowing. The deployment complexity gap is not. For cloud-native teams who want NVMe performance without managing an RDMA fabric, simplyblock.io delivers NVMe/TCP-based storage that integrates directly with Kubernetes — no RDMA expertise required.
simplyblock.io provides native NVMe/TCP block storage with automatic CSI provisioning.
Explore simplyblock.io →