What is SPDK?
A 10,000-foot view. What the Storage Performance Development Kit actually is, who runs it in production, where it sits in the storage stack, and — just as importantly — what it deliberately does not try to be.
- The 30-second answer
- What problems SPDK solves (and why they exist)
- What SPDK deliberately does not solve
- Who actually runs SPDK in production
- Where SPDK sits in the storage stack
- The framework, the libraries, and the apps — the three-layer shape
- A short history: Intel, the move to the Linux Foundation
- Edge cases — when SPDK is the wrong tool
- Why this matters for the rest of the curriculum
The 30-second answer
SPDK (the Storage Performance Development Kit) is an open-source, userspace, poll-mode C framework for building storage applications that talk directly to NVMe SSDs, NVMe-oF fabrics, virtio devices, and iSCSI — without going through the kernel's I/O stack. It started inside Intel in 2013, was donated to the Linux Foundation in 2018, and is currently developed as a multi-vendor project with broad industry participation.
That sentence is accurate but doesn't tell you why it exists or when you'd want it. The rest of this page unpacks both.
What problems SPDK solves (and why they exist)
The primer at Layer 0.1 covers the "why userspace" story in depth. Quick recap, because you'll need it for the rest of this page:
Kernel I/O has overhead. Every read() is a syscall. Every NVMe completion is an interrupt. Every buffered read is a copy out of the page cache. Individually these are cheap; together at a million IOPS they are not.
Modern NVMe SSDs are absurdly fast. A single consumer-grade NVMe drive can do ~1M random 4KB reads per second. A rack of them is orders of magnitude beyond that. The kernel is not the only bottleneck, but it accounts for a meaningful fraction of the per-I/O cost.
Storage workloads are predictable. A database engine or hypervisor knows exactly which devices it owns, exactly which memory it will DMA into, and exactly which threads will submit I/O. That knowledge can be turned into code that does almost no work per I/O.
SPDK's answer to all three is the same: move the storage stack out of the kernel. The trade — loss of kernel safety nets, loss of filesystem semantics, loss of portability across devices — is explicit and accepted, not hidden. (See 0.1 for the full bill.)
Concretely, SPDK commits to four design choices that propagate through every library:
| Choice | What it means | Why it speeds things up |
|---|---|---|
| Poll-mode drivers | SPDK polls the device's completion queues in host memory; no interrupts for I/O completion. | No interrupt latency, no scheduling, no handler dispatch. You pay CPU in exchange for deterministic latency. |
| Per-core reactors | One polling thread per CPU core, each owning a subset of devices. | No cross-core locking, no shared queue contention, perfect cache locality on the hot path. |
| Message passing, no locks | Cross-thread work goes through a per-thread mailbox, not a mutex. | Lock contention is the silent killer of high-IOPS designs. SPDK is designed so that the I/O path takes no locks. |
| Hugepages for DMA | I/O buffers come from 2 MB (or 1 GB) hugepages, pinned in RAM. | Fewer TLB misses on the DMA path, no page faults while the device is reading your buffer. |
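The poll-mode row can be made concrete with a toy completion queue. The sketch below is not SPDK code: the names (comp_queue, fake_device_complete, poll_completions) are hypothetical, and the "device" is just a function that writes into the ring. It assumes an NVMe-style phase bit that flips each time the ring wraps, which is how a poller can tell a fresh entry from a stale one without any interrupt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CQ_DEPTH 4u /* hypothetical ring size */

/* One completion entry. The phase bit flips every time the ring wraps. */
struct cq_entry {
    uint16_t cmd_id;
    uint8_t phase;
};

struct comp_queue {
    struct cq_entry ring[CQ_DEPTH];
    uint32_t head;      /* next entry the host inspects */
    uint8_t host_phase; /* phase value that marks a fresh entry */
    uint32_t tail;      /* next slot the simulated device writes */
    uint8_t dev_phase;
};

typedef void (*completion_cb)(uint16_t cmd_id, void *ctx);

void cq_init(struct comp_queue *cq)
{
    for (size_t i = 0; i < CQ_DEPTH; i++) {
        cq->ring[i].cmd_id = 0;
        cq->ring[i].phase = 0; /* stale: host expects phase 1 first */
    }
    cq->head = 0;
    cq->tail = 0;
    cq->host_phase = 1;
    cq->dev_phase = 1;
}

/* Stand-in for hardware: post one completion into host memory. */
void fake_device_complete(struct comp_queue *cq, uint16_t cmd_id)
{
    cq->ring[cq->tail].cmd_id = cmd_id;
    cq->ring[cq->tail].phase = cq->dev_phase;
    if (++cq->tail == CQ_DEPTH) {
        cq->tail = 0;
        cq->dev_phase ^= 1;
    }
}

/* Non-blocking poll: consume every fresh entry, return how many. */
int poll_completions(struct comp_queue *cq, completion_cb cb, void *ctx)
{
    int done = 0;
    assert(cb != NULL);
    while (cq->ring[cq->head].phase == cq->host_phase) {
        cb(cq->ring[cq->head].cmd_id, ctx);
        done++;
        if (++cq->head == CQ_DEPTH) {
            cq->head = 0;
            cq->host_phase ^= 1;
        }
    }
    return done;
}

/* Example callback: accumulate the completed command ids. */
void sum_ids_cb(uint16_t cmd_id, void *ctx)
{
    *(unsigned *)ctx += cmd_id;
}
```

A reactor would call poll_completions in a tight loop alongside its other pollers. When it returns 0 the core simply moves on and comes back later, which is where the CPU cost mentioned in the table comes from.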
What SPDK deliberately does not solve
A surprising number of people come to SPDK expecting a faster drop-in filesystem. It isn't, and it isn't trying to be. Here's the honest list of things outside its scope:
Filesystems. SPDK ships no ext4, no xfs, no zfs. You get raw block devices, period. If your app needs a filesystem, you put one on top (or settle for logical volumes via bdev_lvol on the blobstore, which are still block devices, not files).
POSIX file API. There is no open()/read()/write() equivalent. All I/O goes through SPDK's own submit/complete async model.
Multi-tenancy safety. One SPDK process is one trust boundary. A misbehaving bdev module can corrupt the whole process; there is no kernel to catch you.
Hot device migration. NVMe hotplug works (see module/bdev/nvme), but migrating a live bdev's backing device underneath it does not.
Cross-OS portability. SPDK is Linux + FreeBSD. There is no Windows or macOS production story. (WPDK exists for Windows; macOS is unsupported.)
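The "submit/complete async model" from the POSIX item above is easier to picture with a toy. The sketch below is a hypothetical RAM-backed device, not SPDK's API: submit_read only queues the request and returns, and the data and the callback arrive later, when the owner polls. All names (ram_dev, submit_read, dev_poll, mark_done) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_INFLIGHT 8 /* hypothetical queue depth */
#define DEV_SIZE 64    /* hypothetical device size in bytes */

typedef void (*io_done_cb)(int status, void *ctx);

struct io_req {
    uint32_t offset;
    uint32_t len;
    void *buf;
    io_done_cb cb;
    void *ctx;
};

struct ram_dev {
    uint8_t data[DEV_SIZE];
    struct io_req pending[MAX_INFLIGHT];
    int npending;
};

/* Submit: validate, enqueue, return at once. 0 = accepted, -1 = rejected. */
int submit_read(struct ram_dev *d, uint32_t off, void *buf, uint32_t len,
                io_done_cb cb, void *ctx)
{
    if (d->npending == MAX_INFLIGHT || off > DEV_SIZE || len > DEV_SIZE - off) {
        return -1;
    }
    d->pending[d->npending++] = (struct io_req){ off, len, buf, cb, ctx };
    return 0;
}

/* Poll: complete the requests queued so far and fire their callbacks. */
int dev_poll(struct ram_dev *d)
{
    struct io_req batch[MAX_INFLIGHT];
    int n = d->npending;

    /* Copy the batch out first so a callback may safely submit new I/O. */
    memcpy(batch, d->pending, (size_t)n * sizeof(batch[0]));
    d->npending = 0;

    for (int i = 0; i < n; i++) {
        memcpy(batch[i].buf, d->data + batch[i].offset, batch[i].len);
        batch[i].cb(0, batch[i].ctx);
    }
    return n;
}

/* Example callback: record that the I/O finished successfully. */
void mark_done(int status, void *ctx)
{
    *(int *)ctx = (status == 0) ? 1 : -1;
}
```

The point of the shape is that nothing blocks: the caller gets control back before the data is there, and the buffer is only valid to read once the callback has run.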
Use the kernel for: general-purpose file serving, anything that needs POSIX semantics, multi-tenant isolation, network file protocols (NFS, SMB).
Use SPDK for: the data path. Many real deployments run SPDK as the fast tier and the kernel as the slow tier.
Use SPDK + a userspace filesystem together for: high-performance databases. Layering a userspace FS (e.g. WickedFS, BlobFS) on top of SPDK is a known pattern.
Use a hypervisor's kernel module when: you need virtio with hundreds of VMs and you can't dedicate CPUs to an SPDK process per host.
Who actually runs SPDK in production
The honest answer: SPDK is used in production by an unknown but meaningful number of large-scale storage and cloud operators. The project keeps a public list of contributors and downstream consumers on its website; specific end-user names tend to be public only when the user wants them to be. Here is what is verifiable.
Hyperscalers and public cloud
The big public clouds have shipped storage products and primitives built on (or heavily inspired by) SPDK patterns. Microsoft Azure's Storage SPDK initiative is one of the most-cited end users; their contributions to the project (for example, around VHD/VHDX acceleration and the blobstore) have been upstreamed. AWS and Google Cloud's storage fleets use a mix of SPDK, custom kernel-bypass stacks, and purpose-built hardware; specific public claims vary by team and year. Intel's own Optane-based product work was a major early customer.
Storage vendors and appliance makers
All-flash array vendors have been among the earliest and steadiest adopters. The product pattern is: SPDK provides the fast NVMe-oF front-end (or local virtio front-end for hypervisor-attached deployments), a vendor-proprietary layer handles replication, dedup, compression, and erasure coding, and a custom JSON-RPC control plane ties it to the vendor's management UI. The exact list of vendors is intentionally not enumerated here, but MAINTAINERS in the repo and the public SPDK summit talk recordings surface most of the major players.
Hypervisor and unikernel projects
Cloud Hypervisor is the most prominent example. It can use SPDK as the backend for its vhost-user-blk devices, in which case the disk attached to the VM is, under the hood, an SPDK bdev. Firecracker has its own block device implementation (Rust, integrated with the kernel's virtio stack) and is not an SPDK user. The two projects make different trade-offs: SPDK is the right pick when you need userspace NVMe-oF or vhost-SCSI; Firecracker is the right pick when you want minimal host integration.
Database and analytics engines
Several high-performance database vendors and research projects have SPDK-based storage engines. The pattern is the same: bypass the kernel page cache, DMA straight from the SSD into the engine's buffer pool. RocksDB and some of the modern cloud-native databases have at least experimental SPDK integrations; specific names move around faster than the curriculum can keep up.
Where SPDK sits in the storage stack
The home page of this curriculum showed a five-band picture of the stack. Here it is again, expanded, with the band names aligned to the code you'll read in the rest of the curriculum.
flowchart TB
    subgraph S1["Band 5: Your application"]
        A1["Database / VM / custom"]
        A2["Go control plane like diskengine"]
    end
    subgraph S2["Band 4: SPDK front-ends (the protocol servers)"]
        B1["nvmf_tgt: NVMe-oF target"]
        B2["vhost: virtio target"]
        B3["iscsi_tgt: iSCSI target"]
        B4["spdk_tgt: generic combined"]
    end
    subgraph S3["Band 3: The bdev layer (the plugin bus)"]
        C1["lib/bdev + module/bdev/* backends"]
        C2["nvme, malloc, aio, lvol, raid, split, passthru, ..."]
    end
    subgraph S4["Band 2: The core libraries"]
        D1["lib/event: reactor + spdk_thread"]
        D2["lib/thread: threading abstraction"]
        D3["lib/jsonrpc: control plane"]
        D4["lib/nvme: raw NVMe driver"]
        D5["lib/blob + lib/lvol: blobstore"]
    end
    subgraph S5["Band 1: The environment (env)"]
        E1["lib/env_dpdk: default"]
        E2["alternative env implementation (optional)"]
    end
    subgraph S6["Band 0: Hardware"]
        F1["NVMe SSD"]
        F2["RDMA NIC"]
        F3["CPU cores + hugepages"]
    end
    S1 --> S2
    S2 --> S3
    S3 --> S4
    S4 --> S5
    S5 --> S6
fig. 1 Five software bands, top to bottom: your app, the protocol server, the bdev bus, the core libraries, and the environment, with the hardware (band 0) underneath. An SPDK process is one program that contains all five.
Three things worth noticing about this picture:
One process contains all five bands. There is no kernel call between bands 3 and 4, or between 4 and 5. Your nvmf_tgt process is itself the entire stack from the network to the flash.
The bdev layer is the bus. A protocol server in band 4 never talks to a device directly. It always talks to a bdev, and the bdev (in band 3) talks to a backend module (nvme, aio, lvol, etc.). This indirection is the whole reason the same nvmf_tgt can serve an NVMe SSD today and a logical volume over a Ceph cluster tomorrow with no recompilation (see Layer 4.1).
Band 1 is swappable. The default environment is lib/env_dpdk, which uses DPDK to mmap PCIe BARs and allocate hugepages. The rest of SPDK reaches it only through the env API, so a different implementation can be substituted at build time. (lib/env_ocf, despite its name, is not such a replacement: it is the glue that lets the OCF caching library run on top of SPDK.) Most users never touch this; it's a useful thing to know exists.
The framework, the libraries, and the apps — the three-layer shape
SPDK's design is unusually tidy. Everything is one of three things: part of the framework, a plugin, or an app.
The framework defines interfaces. A plugin implements an interface. An app wires framework libraries to a specific combination of plugins and exposes a JSON-RPC control plane. This is the same shape as a typical application framework (think: a web server with middleware), but applied to storage.
Concretely, this means:
lib/bdev/bdev.c defines the bdev framework: what a block device looks like, what operations it supports, how spdk_bdev_open_ext works, etc. See Layer 4.1.
module/bdev/nvme/ implements one specific bdev backend: "a bdev that is backed by an NVMe namespace." Many such backends exist: malloc (RAM-backed), aio (kernel AIO to a file or block device), lvol (blobstore-backed), raid (software RAID across other bdevs), passthru (a thin wrapper that proxies to another bdev), rbd (Ceph RBD), etc.
app/nvmf_tgt/ is one specific app that wires up the bdev framework, the NVMe-oF target library, the JSON-RPC server, and the reactor, and produces a single binary that you run.
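The framework / plugin / app split can be sketched in a few dozen lines of plain C. This is an illustration of the shape only, under simplifying assumptions: the blockdev names are hypothetical, the calls are synchronous, and SPDK's real bdev interface is asynchronous and considerably richer. What carries over is the function table: the framework owns the interface and the bounds check, each backend fills in the table, and a wrapper backend can stack on top of another one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* "Framework": the interface every backend must implement (hypothetical). */
struct blockdev;

struct blockdev_ops {
    int (*read)(struct blockdev *bd, uint64_t off, void *buf, size_t len);
    int (*write)(struct blockdev *bd, uint64_t off, const void *buf, size_t len);
};

struct blockdev {
    const char *name;
    const struct blockdev_ops *ops;
    void *backend_ctx; /* owned and interpreted by the backend */
    uint64_t size;     /* bytes */
};

/* Framework entry points: common checks, then dispatch to the backend. */
int blockdev_read(struct blockdev *bd, uint64_t off, void *buf, size_t len)
{
    if (off > bd->size || len > bd->size - off) {
        return -1;
    }
    return bd->ops->read(bd, off, buf, len);
}

int blockdev_write(struct blockdev *bd, uint64_t off, const void *buf, size_t len)
{
    if (off > bd->size || len > bd->size - off) {
        return -1;
    }
    return bd->ops->write(bd, off, buf, len);
}

/* "Plugin" 1: RAM-backed device; backend_ctx points at the byte array. */
static int ram_read(struct blockdev *bd, uint64_t off, void *buf, size_t len)
{
    memcpy(buf, (uint8_t *)bd->backend_ctx + off, len);
    return 0;
}

static int ram_write(struct blockdev *bd, uint64_t off, const void *buf, size_t len)
{
    memcpy((uint8_t *)bd->backend_ctx + off, buf, len);
    return 0;
}

const struct blockdev_ops ram_ops = { ram_read, ram_write };

/* "Plugin" 2: passthrough; backend_ctx points at another blockdev. */
static int pt_read(struct blockdev *bd, uint64_t off, void *buf, size_t len)
{
    return blockdev_read((struct blockdev *)bd->backend_ctx, off, buf, len);
}

static int pt_write(struct blockdev *bd, uint64_t off, const void *buf, size_t len)
{
    return blockdev_write((struct blockdev *)bd->backend_ctx, off, buf, len);
}

const struct blockdev_ops passthru_ops = { pt_read, pt_write };
```

The "app" in this picture is whatever code declares a ram-backed blockdev, wraps it in a passthrough one, and hands the result to a protocol server. The server only ever calls blockdev_read and blockdev_write, so swapping the backend needs no change on its side.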
The apps all build the same way: a Makefile names a high-level list of "things I want" and lets the build system resolve it. This pattern is everywhere in SPDK, and it is the single biggest reason a new contributor finds SPDK's build system easier than expected: you don't write build rules, you write a manifest.
A short history: Intel, the move to the Linux Foundation
SPDK began inside Intel around 2013, originally as a set of userspace NVMe and NVMe-oF prototypes meant to demonstrate the performance of Intel's own server platforms. The first public release landed in 2015.
In late 2018 Intel transferred the project to the Linux Foundation under a new top-level project called the Storage Performance Development Kit (SPDK) project. The move put SPDK under neutral governance, opened the door to broader vendor participation, and aligned it with adjacent open-source projects (DPDK, OCF, Ceph, fio) that SPDK already depended on or interacted with.
What that history means for you, in practice, today:
The repo you're reading is multi-vendor. Look at the copyright headers — they read "Intel Corporation," "Mellanox Technologies," "NVIDIA CORPORATION & AFFILIATES," "Samsung Electronics," "Dell Inc," "Oracle and/or its affiliates." That variety is the point. Decisions are made by the technical steering committee, not by any single company.
The release cadence is three releases a year. v23.x, v24.x, v25.x, and now v26.01. Major changes land in the "x.05" releases; bug-fix releases are "x.01," "x.09," etc. The repo you have locally pins to v26.01.0 (see VERSION).
The project's pace is fast and you should expect API churn. Public APIs are versioned via spdk_*.map files (linker version scripts); ABI breaks are real and frequent, but the project is committed to keeping the API stable within a given release branch. If you embed SPDK in your own product, track a release branch, not main.
Edge cases — when SPDK is the wrong tool
This is the section the SPDK website doesn't emphasize, and where real-world debugging usually starts. SPDK is a sharp tool, and it has sharp edges. The honest list:
You need POSIX file semantics. Open, close, seek, stat, mmap. SPDK gives you a bdev handle, not a file descriptor. If your app expects a filesystem, you'll spend more time building one than you save.
Your workload is mixed read/write with metadata. A 70/30 read/write workload with frequent fsync()s is the kernel's home turf. SPDK shines on big sequential or 100% random 4KB reads, not on metadata-heavy databases.
You need to share the device with other processes. The kernel arbitrates device access; SPDK owns its devices exclusively. If two processes need the same NVMe namespace, you put it on the kernel.
Your team doesn't have the C-fu. SPDK is C, with heavy use of macros, function pointers, and a memory model that assumes you know what you're doing. Plan for a steep on-ramp.
You need iSCSI initiator, not target. SPDK ships an iSCSI target, not an initiator. If you need to consume iSCSI LUNs, use the kernel or a userspace initiator.
You're on macOS or Windows. SPDK is Linux + FreeBSD. There is no first-class port to Apple silicon or Windows. WPDK exists but is its own thing.
Your bottleneck is the network, not the storage. SPDK's nvmf target can be limited by NIC performance. A pure-kernel NVMe-oF target with a tuned kernel may match it for many workloads.
You're building a control plane, not a data plane. The JSON-RPC server in lib/jsonrpc is great for ops glue, but it is not a general-purpose management framework. If your work is "manage thousands of volumes across many hosts," you almost certainly want a separate control-plane stack (e.g. etcd + a Go service) talking to SPDK over JSON-RPC.
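For the control-plane item, the boundary between your management stack and SPDK is just JSON-RPC 2.0 text. The helper below is a minimal sketch that formats a parameterless request; format_rpc_request is a hypothetical name, and actually sending the bytes (commonly over a Unix domain socket such as /var/tmp/spdk.sock, though that path is an assumption about your deployment) is left out. bdev_get_bdevs is used as the example method name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a JSON-RPC 2.0 request that takes no parameters.
 * Returns the number of bytes written (excluding the terminator),
 * or -1 if the method name is unusable or the buffer is too small.
 * Assumption: method names are plain identifiers, so instead of
 * escaping we simply reject quotes, backslashes and control bytes.
 */
int format_rpc_request(char *out, size_t cap, int id, const char *method)
{
    if (out == NULL || method == NULL || method[0] == '\0') {
        return -1;
    }
    for (const char *p = method; *p != '\0'; p++) {
        if (*p == '"' || *p == '\\' || (unsigned char)*p < 0x20) {
            return -1;
        }
    }

    int n = snprintf(out, cap,
                     "{\"jsonrpc\":\"2.0\",\"id\":%d,\"method\":\"%s\"}",
                     id, method);
    if (n < 0 || (size_t)n >= cap) {
        return -1; /* encoding error or truncated output */
    }
    return n;
}
```

A separate control-plane service would build requests like this for each host, match replies to requests by id, and keep its own cluster-wide state elsewhere. SPDK only ever sees one request at a time for one process.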
Why this matters for the rest of the curriculum
Everything in the rest of this curriculum is, in some sense, "SPDK, looked at more closely." Layer 2 zooms into the reactor; Layer 4 into the bdev framework; Layer 6 into the NVMe-oF target; Layer 8 into what it takes to write a bdev module. Knowing that SPDK is userspace, poll-mode, message-passing, and organized as framework + plugins + apps is the load-bearing context for every one of those layers.
Specifically, three things from this page that you should keep in your head as you go:
Poll-mode means "you do the waiting." Every "why is this spinning?" question you'll have, for the next several layers, is answered by the same thing: SPDK never sleeps waiting for an interrupt. It polls.
Per-core reactors mean "thread affinity is sacred." A bdev I/O must be submitted from the right thread, on the right core, or it goes through a mailbox hop. This is a Layer 2 topic, but you'll start seeing it as soon as Layer 3.
Framework + plugins means "you'll be writing a plugin, not a framework." You are not going to rewrite lib/bdev. You are going to add a directory under module/bdev/ and implement a known set of callbacks. The discipline of "match the interface exactly" is what makes SPDK composable.
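The "mailbox hop" in the thread-affinity point looks roughly like this. The sketch is single-threaded and purely illustrative: worker, worker_send, worker_poll and run_on_owner are hypothetical names, and a real implementation needs a ring that is safe for concurrent producers (in SPDK the corresponding call is spdk_thread_send_msg). It shows the one rule that matters: if you are already on the owning thread you run the work inline, otherwise you enqueue it and the owner runs it on its next poll.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAILBOX_CAP 8 /* hypothetical mailbox depth */

typedef void (*msg_fn)(void *arg);

struct msg {
    msg_fn fn;
    void *arg;
};

struct worker {
    int id;
    struct msg mailbox[MAILBOX_CAP];
    size_t head; /* next message to run */
    size_t tail; /* next free slot */
};

/* Enqueue a message for the worker; false if its mailbox is full. */
bool worker_send(struct worker *w, msg_fn fn, void *arg)
{
    if (w->tail - w->head == MAILBOX_CAP) {
        return false;
    }
    w->mailbox[w->tail % MAILBOX_CAP] = (struct msg){ fn, arg };
    w->tail++;
    return true;
}

/* One reactor iteration: run the messages that were queued on entry. */
int worker_poll(struct worker *w)
{
    int ran = 0;
    size_t end = w->tail; /* messages sent by handlers wait for the next poll */

    while (w->head != end) {
        struct msg m = w->mailbox[w->head % MAILBOX_CAP];
        w->head++;
        m.fn(m.arg);
        ran++;
    }
    return ran;
}

/*
 * Run fn on the owning worker.
 * Returns 1 if it ran inline, 0 if it was queued (the mailbox hop),
 * and -1 if the owner's mailbox was full.
 */
int run_on_owner(struct worker *current, struct worker *owner, msg_fn fn, void *arg)
{
    if (current == owner) {
        fn(arg);
        return 1;
    }
    return worker_send(owner, fn, arg) ? 0 : -1;
}

/* Example message handler: increment a counter. */
void bump(void *arg)
{
    (*(int *)arg)++;
}
```

The hop costs latency, not correctness: the work still runs, but only after the owner comes around to its mailbox. That is why submitting I/O from the thread that owns the channel is the fast path.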
The next page — 1.2 — How the repo is laid out — turns that abstraction into a directory tree. You'll see every band of the diagram from this page, materialized as a path.