Performance

What is Latency?

Storage latency is the elapsed time from when an I/O request is submitted to when it completes — measured in microseconds (µs) for NVMe storage, where lower values mean faster application response times.

Technical Overview

Storage latency has multiple contributing components that stack end-to-end: host software overhead (kernel block layer, driver processing), network transmission time (propagation delay + queuing delay), target software processing, and device access time (NAND flash read latency, DRAM buffering). For locally attached NVMe devices, software and device latency dominate — modern NVMe SSDs have access latencies of 50–100 µs for reads. For networked storage, network latency adds to this baseline, making protocol efficiency critical.
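
The way these components stack can be sketched with toy numbers. The figures below are illustrative assumptions chosen to match the ranges above, not measurements from any real system:

```python
# Sketch of how latency components stack end-to-end for one networked read.
# All component values are illustrative assumptions, not measurements.

COMPONENTS_US = {
    "host_software": 5.0,    # kernel block layer + driver processing
    "network": 15.0,         # propagation + queuing delay (round trip)
    "target_software": 5.0,  # target-side protocol processing
    "device_access": 80.0,   # NAND flash read latency
}

def end_to_end_latency_us(components):
    """The components are serial, so total latency is their sum."""
    return sum(components.values())

total = end_to_end_latency_us(COMPONENTS_US)
print(f"end-to-end: {total:.0f} µs")
for name, us in COMPONENTS_US.items():
    # Shows that device access dominates for locally attached NVMe,
    # while the network term only matters for networked storage.
    print(f"  {name:16s} {us:5.1f} µs ({us / total:.0%})")
```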

Latency is commonly reported as average (mean), P50 (median), P99 (99th percentile), and P99.9 (tail latency). For database and transactional workloads, tail latency is often more important than average latency: a database query that involves thousands of storage operations will be bounded by the slowest I/O in the set. A protocol that has low average latency but poor tail latency (e.g., due to TCP retransmissions or congestion events) can cause unpredictable application slowdowns even when average performance looks acceptable.
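
These statistics are straightforward to compute from raw latency samples. The sketch below uses simulated data and a nearest-rank percentile (an illustration, not a benchmarking tool); note how a handful of slow I/Os barely moves the mean but dominates P99.9:

```python
import math
import random

def percentile(samples, pct):
    """Nearest-rank percentile: the value at rank ceil(pct/100 * n)."""
    ordered = sorted(samples)
    rank = min(len(ordered), math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Simulated distribution: mostly fast reads plus a small slow tail,
# mimicking occasional retransmission or congestion events.
random.seed(42)
samples = [random.gauss(30, 5) for _ in range(9990)] + \
          [random.uniform(200, 400) for _ in range(10)]

mean = sum(samples) / len(samples)
print(f"mean : {mean:6.1f} µs")
print(f"P50  : {percentile(samples, 50):6.1f} µs")
print(f"P99  : {percentile(samples, 99):6.1f} µs")
print(f"P99.9: {percentile(samples, 99.9):6.1f} µs")  # exposes the slow tail
```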

NVMe/TCP achieves its low latency through several mechanisms: the streamlined NVMe command set eliminates the SCSI CDB interpretation overhead that adds microseconds per operation in iSCSI; TCP offloads (TSO, LRO) reduce per-packet CPU processing; and the blk-mq multi-queue architecture minimizes lock contention in the kernel I/O path. NVMe/TCP's fabric-added latency of 25–40 µs over a local network represents a genuine improvement over iSCSI's typical 100–200 µs, making NVMe/TCP suitable for latency-sensitive workloads that iSCSI could not serve.
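
Numbers like these are easiest to trust when you can reproduce them. The sketch below (illustrative, not from the text) times individual reads; note that reads from an ordinary file are usually served from the page cache, so a real device benchmark would open the raw block device with O_DIRECT and aligned buffers, which this simplified version omits:

```python
import os
import time

def measure_read_latency_us(path, block_size=4096, iterations=100):
    """Time individual pread() calls and return per-read latencies in µs.

    Simplified sketch: reads of a regular file typically hit the page
    cache, so the numbers reflect software-path latency, not the device.
    """
    fd = os.open(path, os.O_RDONLY)
    size = os.fstat(fd).st_size
    latencies = []
    try:
        for i in range(iterations):
            # Walk offsets through the file, wrapping at the end
            offset = (i * block_size) % max(size - block_size, 1)
            start = time.perf_counter_ns()
            os.pread(fd, block_size, offset)
            latencies.append((time.perf_counter_ns() - start) / 1000)
    finally:
        os.close(fd)
    return latencies

# Example usage: lat = measure_read_latency_us("/tmp/testfile")
```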

How It Relates to NVMe/TCP

Latency reduction is one of the primary motivations for migrating from iSCSI to NVMe/TCP. The 3–5× latency improvement that NVMe/TCP provides over iSCSI on the same Ethernet hardware translates directly into faster database query times, lower transaction processing times, and improved application responsiveness. For RDMA-capable environments, NVMe/RDMA can go further, adding only 10–20 µs, but for the majority of deployments where standard Ethernet infrastructure is already in place, NVMe/TCP's 25–40 µs of added latency is a dramatic improvement that justifies migration without any hardware changes.

Key Characteristics

  • Unit: Microseconds (µs) for NVMe; milliseconds (ms) for HDD/legacy
  • Percentiles: P50, P99, P99.9 — tail latency critical for databases
  • Components: Host SW + Network + Target SW + Device access
  • NVMe local read: 50–100 µs (device-dominant)
  • NVMe/TCP overhead: Adds roughly 10–40 µs over local access, depending on network RTT and tuning
  • Impact of queue depth: Higher QD increases average latency (queueing theory)
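
The queue-depth bullet follows from Little's Law (L = λ·W): at a fixed device throughput, the number of I/Os in flight and the average latency are proportional. A quick sketch with illustrative numbers:

```python
def little_average_latency_us(queue_depth, iops):
    """Little's Law: L = λ·W, so W = L / λ.

    queue_depth: average number of I/Os in flight (L)
    iops: completed I/Os per second (λ, the throughput)
    Returns average latency W in microseconds.
    """
    return queue_depth / iops * 1_000_000

# A device sustaining 500k IOPS: raising QD raises average latency
# proportionally once throughput has saturated.
for qd in (1, 8, 32, 128):
    print(f"QD={qd:3d}: {little_average_latency_us(qd, 500_000):7.1f} µs")
```

This is why benchmark latency figures are only comparable at the same queue depth: a high-QD run trades latency for throughput.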

Latency Comparison by Protocol

Protocol / Medium          Typical Latency    Notes
NVMe local (PCIe)          50–100 µs          NAND flash access time
NVMe/TCP (local network)   25–40 µs added     Network RTT + SW overhead
NVMe/RDMA (RoCE)           10–20 µs added     Kernel bypass, lossless fabric
iSCSI                      100–200 µs         SCSI overhead + TCP stack
Fibre Channel              30–50 µs           Deterministic, lossless fabric