A reactor is a thread that polls in a tight loop and dispatches
work to registered callbacks. There is one reactor per CPU core. The loop
never voluntarily gives up the CPU. This single design choice is the
main reason SPDK gets its performance numbers.
A reactor is a thread that loops forever, calling
spdk_thread_poll() for every lightweight thread currently
scheduled on its core. It's not a class, not a process, not a pthread
in the usual sense — it is a control flow. A reactor is the thing
that owns a core for as long as the application runs.
The minimum viable reactor in pseudocode:
    while (running) {
        for each spdk_thread in this reactor's thread list:
            spdk_thread_poll(thread, 0, now);
    }
That's it. The reactor does not block on user-space primitives. It does
not wait for I/O with a syscall. It does not sleep. It runs a tight loop,
checking every lightweight thread on its core for ready work — pending
messages, expired timers, busy pollers that need to fire.
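To make the shape of that loop concrete without pulling in SPDK headers, here is a minimal self-contained sketch. Every name in it (toy_thread, toy_poller_fn, toy_thread_poll, toy_reactor_run_once) is hypothetical and only stands in for the real spdk_thread machinery; the one thing taken from the text is the structure: walk the threads, poll each one, never block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical poller callback: returns >0 if it did work, 0 if idle. */
typedef int (*toy_poller_fn)(void *arg);

/* Hypothetical stand-in for an spdk_thread: one poller plus a counter. */
struct toy_thread {
    toy_poller_fn fn;
    void *arg;
    uint64_t polls;   /* how many times the reactor has polled this thread */
};

/* Stand-in for spdk_thread_poll(): run the thread's poller once. */
int toy_thread_poll(struct toy_thread *t)
{
    t->polls++;
    return t->fn(t->arg);
}

/* One pass of the reactor loop: poll every thread, never block or sleep.
 * Returns the number of threads that reported work. */
size_t toy_reactor_run_once(struct toy_thread *threads, size_t n)
{
    size_t busy = 0;

    assert(threads != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (toy_thread_poll(&threads[i]) > 0) {
            busy++;
        }
    }
    return busy;
}

/* Example poller: consumes one unit of work per call until none is left. */
int toy_countdown_poller(void *arg)
{
    int *remaining = arg;

    if (*remaining == 0) {
        return 0;
    }
    (*remaining)--;
    return 1;
}
```

A real reactor would call the equivalent of toy_reactor_run_once() in a while loop for the lifetime of the application. The sketch exposes a single pass so the behavior is easy to observe.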
One reactor per core
By default, SPDK runs exactly one reactor on each CPU core that's
in the application core mask. The mask is set at startup via
--cpumask or --lcores; for example, a mask of 0x5 selects cores 0 and 2.
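As a sketch of how a hex mask maps to cores: bit i of the mask selects core i, so 0x5 selects cores 0 and 2. The helper below is hypothetical (toy_mask_to_cores is not an SPDK function) and assumes the mask fits in 64 bits. SPDK's own option parsing also accepts list syntax and larger masks, which this does not attempt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand a core bitmask into a list of core IDs.
 * Bit i set means core i is part of the mask.
 * Writes at most `cap` entries and returns how many cores the mask selects. */
size_t toy_mask_to_cores(uint64_t mask, unsigned *cores, size_t cap)
{
    size_t count = 0;

    assert(cores != NULL || cap == 0);
    for (unsigned core = 0; core < 64; core++) {
        if (mask & (UINT64_C(1) << core)) {
            if (count < cap) {
                cores[count] = core;
            }
            count++;
        }
    }
    return count;
}
```

Each core ID returned here corresponds to one reactor that will spin for the lifetime of the application, so the mask doubles as a statement of how many cores you are willing to dedicate.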
Each reactor is a struct spdk_reactor allocated at
spdk_reactors_init() time, one slot per logical core on
the machine, even if the core isn't in the active mask. Inactive
cores get flags.is_valid = false and a no-op reactor
slot. Here's the actual struct:
The poll loop, line by line
Here is the entire reactor loop, verbatim. This is the single most
important function in the framework:
The actual work happens inside
_reactor_run():
How pollers get registered
A poller is a function you want called repeatedly. There are two
flavors: timed pollers (every N microseconds) and
active pollers (as fast as the reactor can fire them,
also called "busy" pollers). Both live on an
spdk_thread, not directly on a reactor. The reactor
just walks the thread list and asks each thread to run its
pollers.
From a user's perspective, you register a poller on the current
thread:
Internally, this calls into poller_register() at
lib/thread/thread.c:1707, which:
Pulls the current spdk_thread from
thread-local storage (tls_thread).
Allocates a struct spdk_poller with the
function, argument, and period.
Converts the period from microseconds to TSC ticks
(so the reactor can compare against its own
tsc_last without doing division in the
hot loop).
Calls thread_insert_poller() at
lib/thread/thread.c:955, which
either inserts the poller into the
active_pollers TAILQ (period = 0) or into
the timed_pollers red-black tree (period > 0).
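The period conversion and the active-versus-timed decision in the steps above can be sketched as follows. This is an illustration under stated assumptions, not the code in lib/thread/thread.c: the function names are hypothetical, and the tick rate is passed in as a parameter instead of being read from the hardware.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_USEC_PER_SEC UINT64_C(1000000)

/* Convert a period in microseconds to timestamp-counter ticks.
 * Splitting into whole seconds and a remainder keeps the intermediate
 * product small, so large periods do not overflow 64 bits. */
uint64_t toy_us_to_ticks(uint64_t period_us, uint64_t tsc_hz)
{
    assert(tsc_hz > 0);
    uint64_t whole_sec = period_us / TOY_USEC_PER_SEC;
    uint64_t rem_us = period_us % TOY_USEC_PER_SEC;

    return whole_sec * tsc_hz + (rem_us * tsc_hz) / TOY_USEC_PER_SEC;
}

enum toy_poller_list { TOY_ACTIVE_POLLERS, TOY_TIMED_POLLERS };

/* Period 0 means "run on every iteration"; anything else is timer driven. */
enum toy_poller_list toy_pick_list(uint64_t period_ticks)
{
    return period_ticks == 0 ? TOY_ACTIVE_POLLERS : TOY_TIMED_POLLERS;
}

/* Hot-path check: a single integer comparison, no division. */
int toy_poller_is_due(uint64_t now_ticks, uint64_t next_run_tick)
{
    return now_ticks >= next_run_tick;
}
```

The division happens once, at registration. After that the loop only ever compares two 64-bit integers, which is the point of storing the period in ticks.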
The reactor's role is then trivial: walk the thread, ask the
thread to fire any due pollers, return.
Cooperative scheduling: why you must behave
Now you have the picture: the reactor runs a tight loop, and on
every iteration it asks every thread to do whatever it wants to
do — fire a poller, drain a message, the lot. There is no
preemption. If a poller takes 10 milliseconds, the
reactor does not switch to a different thread; it sits there and
waits. Every other lightweight thread on the same core is
blocked behind that one slow poller.
fig. 1: one reactor, three threads, one bad poller. The reactor_2 loop on
CPU core 2 polls three threads: 'nvmf_tgt' (an nvmf target), 'bdev_poll'
(a bdev poller), and 'rpc' (an RPC handler). If slow_poller is busy-looping
for 10 ms, the reactor doesn't preempt it, so every other thread on the same
core waits 10 ms. That's why a single bad poller takes down the whole core.
This is why a single bdev module can starve a whole core. It is
also why the framework exposes per-core utilization in
spdk_top: you need to see this happening, because
it won't be obvious from the application's perspective. The
RPC will simply be slow.
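The head-of-line blocking described in this section is easy to model in virtual time. The sketch below is a toy model, not SPDK code: each thread's poller is reduced to a fixed cost in microseconds, and the hypothetical function toy_max_gap_us() reports the longest interval between two consecutive runs of one watched thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Virtual-time model of a cooperative reactor.
 * cost_us[i] is how long thread i's poller runs each time it is polled.
 * Returns the largest gap between consecutive runs of thread `watched`. */
uint64_t toy_max_gap_us(const uint64_t *cost_us, size_t n,
                        size_t watched, unsigned passes)
{
    uint64_t now = 0, last_run = 0, max_gap = 0;
    int seen = 0;

    assert(cost_us != NULL && watched < n);
    for (unsigned p = 0; p < passes; p++) {
        for (size_t i = 0; i < n; i++) {
            if (i == watched) {
                if (seen && now - last_run > max_gap) {
                    max_gap = now - last_run;
                }
                last_run = now;
                seen = 1;
            }
            now += cost_us[i];   /* nothing else runs while this poller does */
        }
    }
    return max_gap;
}
```

With costs of 1, 2, and 3 µs, the watched thread runs every 6 µs. Replace the first cost with 10,000 µs and the same thread runs every 10,005 µs, even though its own poller did not change.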
How reactors are started
Reactors don't start themselves. spdk_reactors_start()
at lib/event/reactor.c:1097 is the
orchestrator:
From the application's perspective, spdk_app_start()
blocks until shutdown. From the OS's perspective, every
reactor is a pinned pthread named reactor_N, and one of
them (the original main thread) is the one that's actually
inside spdk_app_start().
What happens at shutdown
Shutdown is initiated by spdk_app_stop() at
lib/event/app.c:1111. That function
sends a message to the app thread, which kicks off subsystem
finalization, which eventually calls
spdk_reactors_stop():
After the loop breaks, each reactor runs teardown:
Edge cases & what trips people up
This is the section that pays for the rest of the page. The
reactor model is simple in the abstract and full of sharp
edges in practice.
1. What happens if a reactor falls behind
The reactor is a tight loop with no catch-up. If a poller on
the reactor runs for 1 ms, the reactor is busy for that 1 ms,
and any timers that should have fired during it will
fire late. The reactor doesn't "burst" the missed
pollers on the next iteration; it just keeps going. A
periodic poller with a 100 µs period can drift to a 1 ms
effective period under load. You will not see a stack
trace; you will see tail latency. The fix is to profile
with spdk_top and find the slow poller.
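The drift is easy to reproduce in a virtual-time model. The sketch below assumes that a timed poller is rescheduled relative to the moment it actually ran, which is what "no catch-up" means here; the function name and the model itself are hypothetical and not taken from SPDK.

```c
#include <assert.h>
#include <stdint.h>

/* Count how often a timed poller fires when it shares a reactor with a
 * busy poller that costs busy_cost_us of virtual time per loop iteration. */
uint64_t toy_count_fires(uint64_t period_us, uint64_t busy_cost_us,
                         uint64_t duration_us)
{
    uint64_t now = 0, fires = 0;
    uint64_t next_run = period_us;

    assert(busy_cost_us > 0);   /* otherwise virtual time never advances */
    while (now < duration_us) {
        if (now >= next_run) {
            fires++;
            next_run = now + period_us;   /* rescheduled from "now": no catch-up */
        }
        now += busy_cost_us;              /* the other poller hogs the core */
    }
    return fires;
}
```

Over 10 ms of virtual time, a 100 µs poller next to a 10 µs neighbor fires 99 times. Next to a 1 ms neighbor it fires 9 times, which is the 100 µs to 1 ms drift described above.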
2. What happens if a poller never returns
A poller that does while (1) hangs the
reactor. The kernel will not preempt it. The whole core is
dead. Worse: if your application accepts incoming TCP
connections on that core, the kernel TCP state machine
will still work, but the SPDK-side handlers never run.
Clients will see "connection accepted but no response."
This is the failure mode of an infinite loop in a poller.
The only recovery is to SIGKILL the process
and start over. Test your pollers with timeouts.
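Alongside timeouts in tests, one way to rule out an unbounded loop by construction is to cap the work a poller does per call. The sketch below is a hypothetical illustration of that pattern, with made-up names and a made-up batch limit. It is not an SPDK API.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_BATCH 4   /* hypothetical per-call work limit */

struct toy_queue {
    size_t pending;     /* items waiting to be processed */
    size_t processed;   /* items handled so far */
};

/* Handle at most TOY_MAX_BATCH items, then return to the reactor.
 * Returns the number of items handled; 0 means the poller was idle. */
int toy_bounded_poller(void *arg)
{
    struct toy_queue *q = arg;
    int done = 0;

    assert(q != NULL);
    while (q->pending > 0 && done < TOY_MAX_BATCH) {
        q->pending--;
        q->processed++;
        done++;
    }
    return done;
}
```

Leftover work is simply picked up on the next iteration. The loop condition guarantees the poller returns after a bounded number of steps no matter how deep the queue is.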
3. What happens if a poller blocks on a foreign mutex
Imagine your poller calls into a Go runtime (via a CGo
shim) and needs a Go mutex that is currently held. The Go
scheduler multiplexes goroutines across OS threads, so if the
goroutine that owns the mutex is on a different OS thread,
your poller will spin or block.
In the best case, the kernel preempts the pthread and you
see involuntary context switches climbing in the rusage
log. In the worst case, you deadlock the reactor
permanently. Never call out of a poller to a system
that might block on something you don't own.
4. What happens at shutdown if a poller doesn't unregister
Look at the teardown loop again. The reactor keeps
polling a thread until the thread reports
SPDK_THREAD_STATE_EXITED. A thread can't
reach that state while it has registered pollers. The
teardown logs the poller name as a warning and
does free it, but this is a sign of a leaked
resource. If you have a long-running poller that you
conditionally register but forget to unregister, your
shutdown will hang. The error log you get is
"active_poller %s still registered at thread exit"
at lib/thread/thread.c:413.
Read that log line. It is telling you exactly what to fix.
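The register/unregister pairing can be modeled in a few lines. This is a toy model with hypothetical names, not the SPDK implementation: a thread counts its registered pollers and is only allowed to exit once the count is back to zero. Unregister takes a pointer to the caller's handle and clears it, so calling it twice is harmless.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_thread { size_t registered_pollers; };
struct toy_poller { struct toy_thread *owner; };

/* Register a poller on a thread; returns NULL if allocation fails. */
struct toy_poller *toy_poller_register(struct toy_thread *t)
{
    assert(t != NULL);
    struct toy_poller *p = malloc(sizeof(*p));

    if (p == NULL) {
        return NULL;
    }
    p->owner = t;
    t->registered_pollers++;
    return p;
}

/* Unregister and clear the caller's handle so a second call is a no-op. */
void toy_poller_unregister(struct toy_poller **pp)
{
    if (pp == NULL || *pp == NULL) {
        return;
    }
    (*pp)->owner->registered_pollers--;
    free(*pp);
    *pp = NULL;
}

/* A thread may only exit once every poller has been unregistered. */
bool toy_thread_can_exit(const struct toy_thread *t)
{
    return t->registered_pollers == 0;
}
```

The practical habit this encodes: every code path that registers a poller conditionally needs a matching unregister on the shutdown path, and clearing the handle makes it safe to call that unregister unconditionally.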
5. Why you can't safely allocate huge amounts of memory from a poller
A poller that calls malloc(1<<30) might
trigger a page fault, and the kernel might decide to swap.
The kernel will happily schedule another thread while your
swap is in progress. The reactor will see a context
switch. Your tail latency will spike. Pre-allocate
large buffers at startup, not in the hot path. This
is a big reason SPDK has mempool abstractions; see
Layer 0.1 for the
"hugepages for DMA memory" rationale.
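A minimal sketch of the pre-allocation idea, using plain malloc() at startup and a free-list stack afterwards. All names are hypothetical, and this is not SPDK's mempool. It uses ordinary heap memory instead of hugepages and is not thread-safe, which is acceptable only if a single reactor owns the pool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct toy_pool {
    void **free_list;   /* stack of available buffers */
    size_t free_count;
    size_t capacity;
    char *backing;      /* single allocation made at startup */
};

/* Startup path: allocate everything and touch every page now, so page
 * faults happen here and not inside a poller. Returns 0 on success. */
int toy_pool_init(struct toy_pool *pool, size_t count, size_t buf_size)
{
    assert(pool != NULL && count > 0 && buf_size > 0);
    pool->backing = malloc(count * buf_size);
    pool->free_list = malloc(count * sizeof(void *));
    if (pool->backing == NULL || pool->free_list == NULL) {
        free(pool->backing);
        free(pool->free_list);
        return -1;
    }
    memset(pool->backing, 0, count * buf_size);
    for (size_t i = 0; i < count; i++) {
        pool->free_list[i] = pool->backing + i * buf_size;
    }
    pool->free_count = count;
    pool->capacity = count;
    return 0;
}

/* Hot path: O(1), no allocator call. Returns NULL when the pool is empty. */
void *toy_pool_get(struct toy_pool *pool)
{
    return pool->free_count > 0 ? pool->free_list[--pool->free_count] : NULL;
}

/* Hot path: return a buffer previously obtained from toy_pool_get(). */
void toy_pool_put(struct toy_pool *pool, void *buf)
{
    assert(pool->free_count < pool->capacity);
    pool->free_list[pool->free_count++] = buf;
}

void toy_pool_destroy(struct toy_pool *pool)
{
    free(pool->free_list);
    free(pool->backing);
}
```

An exhausted pool returns NULL instead of growing. In a poller, that is the behavior you want: back off and retry on a later iteration, and never ask the allocator for more memory in the hot path.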
6. Why a reactor can't safely do file I/O
read() on a regular file descriptor can sleep
in the kernel, and when it does, your pthread sleeps with it.
When it wakes up, you have a context switch you didn't
budget for. Worse, the file I/O is a perfect storm of
preemption: your pthread is descheduled, another thread
(maybe even another reactor on a different core) is
scheduled, your cache line gets invalidated, your
spdk_io_channel shared state is in flight.
If you absolutely need to do file I/O, do it on a
dedicated thread that is allowed to block. SPDK gives you
the abstraction: spdk_thread (see
2.2).
7. The "I'm sure it's just one syscall" trap
Every well-intentioned "it's just a single syscall, no harm
done" is a potential thread-yielding call. Some calls are
safe: gettimeofday() is served from the vDSO and usually fine.
clock_gettime() with CLOCK_REALTIME_COARSE is fine. getpid()
is fine. sysinfo() is fine. Others are not: stat()
on a path is not fine. open() is not fine.
malloc() is sometimes fine and sometimes not.
The rule of thumb: if it can block, don't call it
from a poller. Use TSC ticks instead of clock
reads whenever possible.
8. The diskengine client never sees a reactor
Your Go code in
Client.Call:43 talks JSON-RPC over a
Unix socket. The C side
(spdk_event_call() at lib/event/reactor.c:558) routes the request to a
reactor, but the Go side has no concept of "which reactor
am I on" — it just sees a synchronous RPC reply. This
is by design. The threading model is internal to SPDK;
the JSON-RPC API is the abstraction boundary. The
consequence: you can never block the Go side waiting
for an SPDK resource that's only safe to use on a
specific thread. The RPC framework guarantees the
response is computed on a thread that's safe for the
RPC's handler.
What to take away
The reactor is the most important idea in the whole codebase,
and it's also the simplest. A thread per core. A loop. A list
of pollers. No preemption, no sleeping, no kernel scheduler
in your way. The price is a strict set of rules about what
you can do from a poller; the reward is that, when you follow
the rules, every microsecond of CPU time goes to your
application.
The next page — 2.2 —
spdk_thread — digs into the next layer
down: the lightweight thread that the reactor is actually
looping over. The reactor owns a core; the
spdk_thread is the unit of work that gets to
run on it.