Storage latency is the elapsed time from when an I/O request is submitted to when it completes — measured in microseconds (µs) for NVMe storage, where lower values mean faster application response times.
Storage latency has multiple contributing components that stack end-to-end: host software overhead (kernel block layer, driver processing), network transmission time (propagation delay + queuing delay), target software processing, and device access time (NAND flash read latency, DRAM buffering). For locally attached NVMe devices, software and device latency dominate — modern NVMe SSDs have access latencies of 50–100 µs for reads. For networked storage, network latency adds to this baseline, making protocol efficiency critical.
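The stacking of components can be made concrete with a back-of-envelope model. The numbers below are illustrative assumptions chosen to fall within the ranges in the text (device access dominating at 75 µs, small software and network contributions), not measurements:

```python
# Back-of-envelope model of end-to-end latency for one networked NVMe read.
# All component values are illustrative assumptions, not measurements.
components_us = {
    "host software (block layer, driver)": 5,
    "network transmission (propagation + queuing)": 15,
    "target software processing": 5,
    "device access (NAND read)": 75,
}

total_us = sum(components_us.values())
for name, us in components_us.items():
    print(f"{name:<45} {us:>3} µs  ({us / total_us:.0%} of total)")
print(f"{'end-to-end':<45} {total_us:>3} µs")
```

Under these assumptions the device accounts for 75% of the total, which is why protocol efficiency matters most in networked setups where the network and software terms grow relative to the device term.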
Latency is commonly reported as average (mean), P50 (median), P99 (99th percentile), and P99.9 (tail latency). For database and transactional workloads, tail latency is often more important than average latency: a database query that involves thousands of storage operations will be bounded by the slowest I/O in the set. A protocol that has low average latency but poor tail latency (e.g., due to TCP retransmissions or congestion events) can cause unpredictable application slowdowns even when average performance looks acceptable.
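The gap between average and tail latency is easy to demonstrate with simulated data. The sketch below uses a hypothetical latency distribution (most completions near 30 µs, with 1% of samples pushed into the hundreds of microseconds by simulated congestion events) and a simple nearest-rank percentile; the specific numbers are illustrative, not taken from any real device:

```python
import random
import statistics

random.seed(7)

# Simulated per-I/O read latencies in microseconds (hypothetical):
# 99% cluster near 30 µs; 1% are congestion-event outliers.
samples = [random.gauss(30, 3) for _ in range(990)]
samples += [random.uniform(300, 1000) for _ in range(10)]

def percentile(data, p):
    """Nearest-rank percentile: value below which ~p% of samples fall."""
    ordered = sorted(data)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

mean = statistics.mean(samples)
p50 = percentile(samples, 50)
p99 = percentile(samples, 99)
p999 = percentile(samples, 99.9)

print(f"mean={mean:.1f} µs  P50={p50:.1f} µs  P99={p99:.1f} µs  P99.9={p999:.1f} µs")
```

Note how the 1% of outliers barely move the median but inflate the mean and dominate P99.9, which is exactly why transactional workloads are sized against tail percentiles rather than averages.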
NVMe/TCP achieves its low latency through several mechanisms: the NVMe command set eliminates the SCSI CDB interpretation overhead that adds microseconds per operation in iSCSI; TCP offloads (TSO, LRO) reduce per-packet CPU processing; and the blk-mq multi-queue architecture minimizes lock contention in the kernel I/O path. The 25–40 µs of latency that NVMe/TCP adds over a local network is a genuine improvement over iSCSI's typical 100–200 µs, making NVMe/TCP suitable for latency-sensitive workloads that iSCSI could not serve.
Latency reduction is one of the primary motivations for migrating from iSCSI to NVMe/TCP. The 3–5× latency improvement that NVMe/TCP provides over iSCSI on the same Ethernet hardware translates directly into faster database query times, lower transaction processing times, and improved application responsiveness. For RDMA-capable environments, NVMe/RDMA can go further, adding only 10–20 µs, but for the majority of deployments where standard Ethernet infrastructure is already in place, NVMe/TCP's 25–40 µs of added latency is a dramatic improvement that justifies migration without any hardware changes.
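A worked example shows how per-I/O latency compounds for a query issuing dependent (serial) I/Os. The protocol figures are midpoints of the ranges in the text; the device latency, the assumption that both protocols hit the same SSD, and the 2,000-I/O query size are all hypothetical. Note that the end-to-end speedup is smaller than the protocol-only improvement, because device access time is common to both paths:

```python
# Hypothetical comparison of a database query issuing serial I/Os.
# Protocol overheads are midpoints of the text's ranges; device latency
# and the I/O count are illustrative assumptions.
DEVICE_US = 75            # assumed NAND read latency (within 50-100 µs)
ISCSI_OVERHEAD_US = 150   # midpoint of 100-200 µs
NVME_TCP_OVERHEAD_US = 32 # midpoint of 25-40 µs (rounded)

def query_time_ms(n_serial_ios, protocol_overhead_us):
    """Total time for a query whose I/Os are dependent, issued one at a time."""
    return n_serial_ios * (DEVICE_US + protocol_overhead_us) / 1000

n = 2000  # e.g., an index walk touching 2,000 uncached pages
iscsi_ms = query_time_ms(n, ISCSI_OVERHEAD_US)
nvme_ms = query_time_ms(n, NVME_TCP_OVERHEAD_US)
print(f"iSCSI:    {iscsi_ms:.0f} ms")   # 450 ms
print(f"NVMe/TCP: {nvme_ms:.0f} ms")    # 214 ms
print(f"speedup:  {iscsi_ms / nvme_ms:.1f}x")
```

Even with the device term damping the ratio, the query completes in roughly half the time under these assumptions, and the advantage grows for faster devices or workloads with more network round trips per transaction.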
| Protocol / Medium | Typical Latency | Notes |
|---|---|---|
| NVMe local (PCIe) | 50–100 µs | NAND flash access time |
| NVMe/TCP (local network) | 25–40 µs added | Network RTT + SW overhead |
| NVMe/RDMA (RoCE) | 10–20 µs added | Kernel bypass, lossless fabric |
| iSCSI | 100–200 µs | SCSI overhead + TCP stack |
| Fibre Channel | 30–50 µs | Deterministic, lossless fabric |