Layer 2 · Threading & reactor

The reactor poll loop.

A reactor is a thread that loops forever, polling the lightweight threads scheduled on its core and dispatching work to their registered callbacks. There is one reactor per CPU core in the application's core mask. The loop never voluntarily gives up the CPU. This single design choice is the reason SPDK gets its numbers.

~15 min read · 1 diagram · prerequisite: Layer 0.1, Layer 0.3
On this page
  1. What a reactor is
  2. One reactor per core
  3. The poll loop, line by line
  4. How pollers get registered
  5. Cooperative scheduling: why you must behave
  6. How reactors are started
  7. What happens at shutdown
  8. Edge cases & what trips people up

What a reactor is

A reactor is a thread that loops forever, calling spdk_thread_poll() for every lightweight thread currently scheduled on its core. It's not a class and not a process. It runs on a pinned pthread, but it is best understood as a control flow rather than a thread in the usual sense. A reactor is the thing that owns a core for as long as the application runs.

The minimum viable reactor in pseudocode:

while (running) {
    for each spdk_thread in this reactor's thread list:
        spdk_thread_poll(thread, 0, now);
}

That's it. The reactor does not block on user-space primitives. It does not wait for I/O with a syscall. It does not sleep. It runs a tight loop, checking every lightweight thread on its core for ready work — pending messages, expired timers, busy pollers that need to fire.
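To make the shape of that loop concrete, here is a minimal, self-contained model in C. It is a sketch, not SPDK code: `toy_thread`, `toy_idle_poll`, and `toy_reactor_run` are hypothetical names, the "clock" is just the iteration counter, and the loop runs a fixed number of iterations where a real reactor would run until shutdown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an spdk_thread: one callback plus a run counter. */
struct toy_thread {
    int (*poll)(struct toy_thread *t, uint64_t now); /* returns amount of work done */
    uint64_t polls;                                   /* how many times poll() ran */
};

/* A trivial poll function: records that it ran and reports no work. */
static int toy_idle_poll(struct toy_thread *t, uint64_t now)
{
    (void)now;
    t->polls++;
    return 0;
}

/*
 * Poll every thread once per iteration, in order, with no blocking or
 * sleeping in between. The iteration counter doubles as a fake timestamp.
 */
static void toy_reactor_run(struct toy_thread *threads, size_t n, uint64_t iterations)
{
    for (uint64_t now = 0; now < iterations; now++) {
        for (size_t i = 0; i < n; i++) {
            threads[i].poll(&threads[i], now);
        }
    }
}
```

Calling `toy_reactor_run(threads, 3, 5)` on three threads polls each of them exactly five times, which is the whole contract: every thread gets a turn on every pass, and nothing else happens in between.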

One reactor per core

SPDK's default scheduler creates exactly one reactor per CPU core that's in the application core mask. The mask is set at startup via --cpumask or --lcores:

Each reactor is a struct spdk_reactor allocated at spdk_reactors_init() time, one slot per logical core on the machine, even if the core isn't in the active mask. Inactive cores get flags.is_valid = false and a no-op reactor slot. Here's the actual struct:
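To make the mask-to-slot relationship concrete, the following is a toy model in C. It is not the SPDK struct: `toy_reactor_slot`, `toy_slots_init`, and the 64-core limit are assumptions made for illustration, and real core masks can be wider than 64 bits.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_MAX_CORES 64  /* assumption: the mask fits in 64 bits */

/* Hypothetical per-core slot; only is_valid mirrors a field named in the text. */
struct toy_reactor_slot {
    uint32_t lcore;
    bool is_valid;   /* true only if the core is in the mask */
};

/*
 * Parse a hex mask such as "0x5" and fill one slot per logical core.
 * Returns the number of active cores, or -1 on a malformed mask.
 */
static int toy_slots_init(const char *hex_mask, struct toy_reactor_slot *slots,
                          uint32_t ncores)
{
    char *end = NULL;
    unsigned long long mask = strtoull(hex_mask, &end, 16);
    int active = 0;

    if (end == hex_mask || *end != '\0' || ncores > TOY_MAX_CORES) {
        return -1;
    }
    for (uint32_t i = 0; i < ncores; i++) {
        slots[i].lcore = i;
        slots[i].is_valid = ((mask >> i) & 1ULL) != 0;
        if (slots[i].is_valid) {
            active++;
        }
    }
    return active;
}
```

With a mask of `0x5` on a four-core machine, slots 0 and 2 are valid and slots 1 and 3 exist but stay inactive, matching the "one slot per logical core, even if the core isn't in the mask" description.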

The poll loop, line by line

Here is the entire reactor loop, verbatim. This is the single most important function in the framework:

The actual work happens inside _reactor_run():

How pollers get registered

A poller is a function you want called repeatedly. There are two flavors: timed pollers (every N microseconds) and active pollers (as fast as the reactor can fire them, also called "busy" pollers). Both live on an spdk_thread, not directly on a reactor. The reactor just walks the thread list and asks each thread to run its pollers.

From a user's perspective, you register a poller on the current thread:

  spdk_poller_register(my_periodic_fn, my_arg, 1000 /* µs */);

Internally, this calls into poller_register() at lib/thread/thread.c:1707, which:

  1. Pulls the current spdk_thread from thread-local storage (tls_thread).
  2. Allocates a struct spdk_poller with the function, argument, and period.
  3. Converts the period from microseconds to TSC ticks (so the reactor can compare against its own tsc_last without doing division in the hot loop).
  4. Calls thread_insert_poller() at lib/thread/thread.c:955, which either inserts the poller into the active_pollers TAILQ (period = 0) or into the timed_pollers red-black tree (period > 0).

The reactor's role is then trivial: walk the thread, ask the thread to fire any due pollers, return.
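The registration steps can be sketched in a few lines of standalone C. This is a simplified model under stated assumptions, not the code in lib/thread/thread.c: `toy_poller`, `toy_us_to_ticks`, and `toy_poller_init` are hypothetical names, the tick rate is passed in as a parameter, and the active/timed decision is recorded as an enum instead of inserting into a real TAILQ or red-black tree.

```c
#include <stdint.h>

#define TOY_USEC_PER_SEC 1000000ULL

enum toy_poller_kind { TOY_POLLER_ACTIVE, TOY_POLLER_TIMED };

/* Hypothetical poller record: function, argument, and period in ticks. */
struct toy_poller {
    void (*fn)(void *arg);
    void *arg;
    uint64_t period_ticks;   /* 0 means "run on every iteration" */
    uint64_t next_run_tick;  /* only meaningful for timed pollers */
    enum toy_poller_kind kind;
};

/*
 * Convert microseconds to ticks once, at registration time, so the hot
 * loop only compares integers. Splitting into whole seconds and a
 * remainder avoids overflow in period_us * ticks_hz for long periods.
 */
static uint64_t toy_us_to_ticks(uint64_t period_us, uint64_t ticks_hz)
{
    uint64_t secs = period_us / TOY_USEC_PER_SEC;
    uint64_t rem_us = period_us % TOY_USEC_PER_SEC;

    return secs * ticks_hz + (rem_us * ticks_hz) / TOY_USEC_PER_SEC;
}

/* Fill in a poller and decide which collection it would belong to. */
static void toy_poller_init(struct toy_poller *p, void (*fn)(void *), void *arg,
                            uint64_t period_us, uint64_t ticks_hz, uint64_t now)
{
    p->fn = fn;
    p->arg = arg;
    p->period_ticks = toy_us_to_ticks(period_us, ticks_hz);
    p->kind = (period_us == 0) ? TOY_POLLER_ACTIVE : TOY_POLLER_TIMED;
    p->next_run_tick = now + p->period_ticks;
}
```

On a hypothetical 2.4 GHz tick source, a 1000 µs period becomes 2,400,000 ticks. After that one-time conversion, deciding whether a timed poller is due is a single comparison against `next_run_tick`.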

Cooperative scheduling: why you must behave

Now you have the picture: the reactor runs a tight loop, and on every iteration it asks every thread to do whatever it wants to do — fire a poller, drain a message, the lot. There is no preemption. If a poller takes 10 milliseconds, the reactor does not switch to a different thread; it sits there and waits. Every other lightweight thread on the same core is blocked behind that one slow poller.

flowchart LR
subgraph Core2["CPU core 2 — reactor_2"]
  direction TB
  R2["reactor_2 loop"] --> T1["spdk_thread 'nvmf_tgt'"]
  R2 --> T2["spdk_thread 'bdev_poll'"]
  R2 --> T3["spdk_thread 'rpc'"]
  T1 -.->|blocked| Slow["slow_poller (10 ms)"]
  T2 -.->|blocked| Slow
  T3 -.->|blocked| Slow
end
fig. 1: one reactor, three threads, one bad poller. One reactor on core 2 has three threads: an nvmf target, a bdev poller, and an RPC handler. If slow_poller is busy-looping for 10 ms, the reactor doesn't preempt it, so every other thread on the same core waits 10 ms. That's why a single bad poller takes down the whole core.

This is why a single bdev module can starve a whole core. It is also why the framework exposes per-core utilization in spdk_top: you need to see this happening, because it won't be obvious from the application's perspective. The RPC will simply be slow.
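The arithmetic behind that starvation is simple enough to write down. The sketch below is a back-of-the-envelope model, not a measurement tool: the function names are hypothetical, and it assumes each lightweight thread has a fixed per-poll cost in microseconds.

```c
#include <stddef.h>
#include <stdint.h>

/* Time for one full pass of the reactor: the sum of every thread's poll cost. */
static uint64_t toy_iteration_time_us(const uint64_t *poll_cost_us, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++) {
        total += poll_cost_us[i];
    }
    return total;
}

/*
 * Time thread `idx` spends waiting on the other threads between two of its
 * own polls. With no preemption this is everyone else's cost, in full.
 */
static uint64_t toy_wait_between_polls_us(const uint64_t *poll_cost_us, size_t n,
                                          size_t idx)
{
    return toy_iteration_time_us(poll_cost_us, n) - poll_cost_us[idx];
}
```

With per-poll costs of 2 µs, 10,000 µs, and 3 µs, the two well-behaved threads each wait roughly 10 ms between polls, even though they need only a few microseconds themselves.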

How reactors are started

Reactors don't start themselves. spdk_reactors_start() at lib/event/reactor.c:1097 is the orchestrator:

From the application's perspective, spdk_app_start() blocks until shutdown. From the OS's perspective, every reactor is a pinned pthread, each with name reactor_N, and one of them (the original main thread) is the one that's actually inside spdk_app_start().

What happens at shutdown

Shutdown is initiated by spdk_app_stop() at lib/event/app.c:1111. That function sends a message to the app thread, which kicks off subsystem finalization, which eventually calls spdk_reactors_stop():

After the loop breaks, each reactor runs teardown:

Edge cases & what trips people up

This is the section that pays for the rest of the page. The reactor model is simple in the abstract and full of sharp edges in practice.

1. What happens if a reactor falls behind

The reactor is a tight loop with no catch-up. If a poller on the reactor runs for 1 ms, the reactor is busy for that 1 ms, and any timers that should have fired during it will fire late. The reactor doesn't "burst" the missed pollers on the next iteration; it just keeps going. A periodic poller with a 100 µs period can drift to a 1 ms effective period under load. You will not see a stack trace; you will see tail latency. The fix is to profile with spdk_top and find the slow poller.
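A small simulation makes the drift visible. It is a sketch on a virtual clock, under the assumption (taken from the description above) that a timed poller is re-armed relative to the time it actually ran, so missed periods are dropped and never replayed. `toy_count_fires` is a hypothetical helper, and all values are in abstract ticks.

```c
#include <stdint.h>

/*
 * Count how often a timed poller fires during `window` ticks when every
 * reactor iteration costs `iter_cost` ticks. Returns 0 if iter_cost is 0,
 * because the virtual clock would never advance.
 */
static uint64_t toy_count_fires(uint64_t period, uint64_t iter_cost, uint64_t window)
{
    uint64_t now = 0;
    uint64_t next_run = period;
    uint64_t fires = 0;

    if (iter_cost == 0) {
        return 0;
    }
    while (now < window) {
        if (now >= next_run) {
            fires++;
            next_run = now + period;  /* re-arm from "now": no burst, no catch-up */
        }
        now += iter_cost;             /* time consumed by the rest of the iteration */
    }
    return fires;
}
```

In this model, `toy_count_fires(100, 10, 10000)` returns 99, close to the nominal rate, while `toy_count_fires(100, 1000, 10000)` returns 9: the poller's effective period has stretched from 100 ticks to 1000 ticks without any error being raised.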

2. What happens if a poller never returns

A poller that does while (1) hangs the reactor. Nothing will preempt it on your behalf: the kernel may deschedule the pthread, but no other lightweight thread on that core will ever run again. The whole core is dead. Worse: if your application accepts incoming TCP connections on that core, the kernel TCP state machine will still work, but the SPDK-side handlers never run. Clients will see "connection accepted but no response." The only recovery is to SIGKILL the process and start over. Test your pollers with timeouts.
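One common defense is to give every poller a fixed work budget so it cannot loop without bound, whatever the queue depth. The following is a minimal sketch of that idea with hypothetical names (`toy_queue`, `toy_bounded_poller`, and the `TOY_POLLER_*` return codes are local to this example, not SPDK definitions).

```c
#include <stddef.h>

#define TOY_POLLER_IDLE 0   /* hypothetical return codes for this sketch */
#define TOY_POLLER_BUSY 1

/* Hypothetical work queue, reduced to two counters. */
struct toy_queue {
    size_t pending;    /* items still waiting */
    size_t processed;  /* items handled so far */
};

/*
 * Handle at most `budget` items, then return to the reactor even if more
 * work is pending. The remaining items are picked up on the next call.
 */
static int toy_bounded_poller(struct toy_queue *q, size_t budget)
{
    size_t done = 0;

    while (q->pending > 0 && done < budget) {
        q->pending--;
        q->processed++;
        done++;
    }
    return done > 0 ? TOY_POLLER_BUSY : TOY_POLLER_IDLE;
}
```

With 100 pending items and a budget of 32, the queue drains over four calls (32, 32, 32, 4) and the fifth call reports idle. The loop condition bounds every call, so a flood of work slows the poller down but cannot wedge the core.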

3. What happens if a poller blocks on a foreign mutex

Imagine your poller calls into a Go runtime (via a CGo shim) that holds a Go mutex. If the goroutine that owns the mutex is running on a different OS thread, your poller will spin or block until the mutex is released. In the best case, the kernel preempts the pthread and you see involuntary context switches climbing in the rusage log. In the worst case, you deadlock the reactor permanently. Never call out of a poller to a system that might block on something you don't own.

4. What happens at shutdown if a poller doesn't unregister

Look at the teardown loop again. The reactor keeps polling a thread until the thread reports SPDK_THREAD_STATE_EXITED, and a thread can't reach that state while it has registered pollers. If you have a long-running poller that you conditionally register but forget to unregister, your shutdown will hang. The teardown does free a leftover poller, but it logs the poller's name as a warning first, and that warning is a sign of a leaked resource. The log line you get is "active_poller %s still registered at thread exit" at lib/thread/thread.c:413. Read that log line. It is telling you exactly what to fix.

5. Why you can't safely allocate huge amounts of memory from a poller

A poller that calls malloc(1<<30) might trigger a page fault, and the kernel might decide to swap. The kernel will happily schedule another thread while your swap is in progress. The reactor will see a context switch. Your tail latency will spike. Pre-allocate large buffers at startup, not in the hot path. This is a big reason SPDK has mempool abstractions; see Layer 0.1 for the "hugepages for DMA memory" rationale.
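The pre-allocation pattern can be sketched as a fixed-size buffer pool. This is an illustration using plain malloc(), not SPDK's mempool and not hugepage-backed memory; `toy_pool` and its functions are hypothetical names. All allocation happens in `toy_pool_init()`, which also touches each buffer so its pages are likely to be resident before the hot path first uses them.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size buffer pool; free buffers are kept on a stack. */
struct toy_pool {
    void **free_list;
    size_t free_count;
    size_t capacity;
};

/* Release the pool. Every buffer must have been returned with toy_pool_put(). */
static void toy_pool_destroy(struct toy_pool *p)
{
    for (size_t i = 0; i < p->free_count; i++) {
        free(p->free_list[i]);
    }
    free(p->free_list);
    p->free_list = NULL;
    p->free_count = 0;
    p->capacity = 0;
}

/* Startup path: the only place that allocates. Returns 0 on success, -1 on failure. */
static int toy_pool_init(struct toy_pool *p, size_t count, size_t buf_size)
{
    p->free_list = calloc(count, sizeof(void *));
    p->free_count = 0;
    p->capacity = count;
    if (p->free_list == NULL) {
        return -1;
    }
    for (size_t i = 0; i < count; i++) {
        void *buf = malloc(buf_size);
        if (buf == NULL) {
            toy_pool_destroy(p);
            return -1;
        }
        memset(buf, 0, buf_size);  /* touch the pages now, not in the hot path */
        p->free_list[p->free_count++] = buf;
    }
    return 0;
}

/* Hot path: O(1), no allocation. Returns NULL when the pool is empty. */
static void *toy_pool_get(struct toy_pool *p)
{
    return p->free_count > 0 ? p->free_list[--p->free_count] : NULL;
}

/* Hot path: O(1). Only return buffers that came from this pool. */
static void toy_pool_put(struct toy_pool *p, void *buf)
{
    if (p->free_count < p->capacity) {
        p->free_list[p->free_count++] = buf;
    }
}
```

The design choice worth noting is that `toy_pool_get()` returns NULL on exhaustion and does not fall back to malloc(): a poller that runs out of buffers has to back off and retry on a later iteration, which keeps the allocator out of the hot path entirely.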

6. Why a reactor can't safely do file I/O

read() on a regular file descriptor can sleep in the kernel. The kernel will sleep your pthread. When it wakes up, you have a context switch you didn't budget for. Worse, the file I/O is a perfect storm of preemption: your pthread is descheduled, another thread (maybe even another reactor on a different core) is scheduled, your cache line gets invalidated, your spdk_io_channel shared state is in flight. If you absolutely need to do file I/O, do it on a dedicated thread that is allowed to block, and hand the result back to an spdk_thread as a message (see 2.2).

7. The "I'm sure it's just one syscall" trap

Every well-intentioned "it's just a single syscall, no harm done" is a potential thread-yielding call. gettimeofday() goes through the vDSO and is usually fine. clock_gettime(CLOCK_REALTIME_COARSE, ...) is fine. getpid() is fine. sysinfo() is fine. stat() on a path is not fine. open() is not fine. malloc() is sometimes fine and sometimes not. The rule of thumb: if it can block, don't call it from a poller. Use TSC ticks instead of clock reads whenever possible.

8. The diskengine client never sees a reactor

Your Go code in Client.Call:43 talks JSON-RPC over a Unix socket. The C side (lib/event/reactor.c:558 spdk_event_call()) routes the request to a reactor, but the Go side has no concept of "which reactor am I on"; it just sees a synchronous RPC reply. This is by design. The threading model is internal to SPDK; the JSON-RPC API is the abstraction boundary. The consequence: you can never block the Go side waiting for an SPDK resource that's only safe to use on a specific thread. The RPC framework guarantees the response is computed on a thread that's safe for the RPC's handler.

What to take away

The reactor is the most important idea in the whole codebase, and it's also the simplest. A thread per core. A loop. A list of pollers. No preemption, no sleeping, no kernel scheduler in your way. The price is a strict set of rules about what you can do from a poller; the reward is that, when you follow the rules, every microsecond of CPU time goes to your application.

The next page — 2.2 — spdk_thread — digs into the next layer down: the lightweight thread that the reactor is actually looping over. The reactor owns a core; the spdk_thread is the unit of work that gets to run on it.