An spdk_thread is a logical thread of execution that the
framework multiplexes onto a reactor. It's not a pthread. It's a
struct with a mailbox. When you want to do work, you send a message
to the thread; the thread's reactor will run it. When the thread
has nothing to do, it sits idle and other threads on the same core
get all the CPU.
In 2.1 you saw that a reactor is one
pthread per core, running a tight poll loop. That's a powerful
primitive, but it's also rigid: a reactor by itself is a single
execution context. If it were the only one, each core would have
one answer to spdk_get_thread(), one TLS variable, and one
"current io channel per subsystem" cache.
Real applications need many execution contexts. An
SPDK-based NVMe-oF target has:
- an RPC handler thread (for JSON-RPC requests)
- one nvmf target thread per core (for I/O submission)
- a poller thread per subsystem (bdev, copy, etc.)
- one thread per active TCP connection, in some transports
Each of these needs its own state. The bdev subsystem, for example,
caches an spdk_io_channel per execution context — and
an "execution context" here means "the thing that submitted the
I/O." If two threads on the same reactor shared a channel, they'd
contend on the same submission queue and the bdev's poller would
have no idea who was waiting for what completion.
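To make the per-context cache concrete, here is a toy model in C. It is not SPDK's implementation: model_thread, model_get_io_channel() and model_put_io_channel() are hypothetical names, and the fixed-size array stands in for the real per-thread io_channel tree. It shows the property the paragraph relies on: repeated lookups on one thread share a refcounted channel, while a second thread gets a channel of its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not SPDK code: each logical thread caches its own
 * channels, keyed by the io_device pointer. */
#define MAX_CHANNELS 8

struct model_channel {
    void *io_device; /* which device this channel talks to */
    int refcnt;      /* outstanding get() calls on this thread */
    int owner_id;    /* id of the logical thread that created it */
};

struct model_thread {
    int id;
    struct model_channel channels[MAX_CHANNELS];
    int nchannels;
};

/* Return this thread's channel for io_device, creating it on first use.
 * Two different threads never share a channel, so they never share a
 * submission queue. */
static struct model_channel *model_get_io_channel(struct model_thread *t,
                                                  void *io_device)
{
    for (int i = 0; i < t->nchannels; i++) {
        if (t->channels[i].io_device == io_device) {
            t->channels[i].refcnt++;
            return &t->channels[i];
        }
    }
    if (t->nchannels == MAX_CHANNELS) {
        return NULL; /* cache full in this toy model */
    }
    struct model_channel *ch = &t->channels[t->nchannels++];
    ch->io_device = io_device;
    ch->refcnt = 1;
    ch->owner_id = t->id;
    return ch;
}

/* Drop one reference. This toy model keeps the slot; the real framework
 * tears the channel down once the count reaches zero. */
static void model_put_io_channel(struct model_channel *ch)
{
    assert(ch->refcnt > 0);
    ch->refcnt--;
}
```

Because each thread reaches the device only through its own channel, the submission path is meant to need no lock between the two threads.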
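The metaphor that helps: think of an spdk_thread as a
goroutine and a reactor as an OS thread that goroutines are
multiplexed onto. Many spdk_threads share a handful of reactors,
the same M:N arrangement Go's runtime uses. The framework uses
this model because with one pthread per core, you can have many
"threads" of execution without paying the cost of a pthread for
each. Unlike goroutines, though, spdk_threads are scheduled
cooperatively: one runs until its poller or message returns and
is never preempted.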
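A minimal model of the M:N idea, assuming nothing from SPDK: one reactor (one pthread in the real framework) holds a list of logical threads and visits each once per iteration. The struct and function names are hypothetical. Each logical thread does at most one unit of work per visit and then returns, which is the cooperative contract; an idle thread costs one check, so busy threads on the same core get nearly all the CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one reactor multiplexing logical threads. */
#define MAX_THREADS 4

struct lthread {
    const char *name;
    int pending; /* units of queued work */
    int done;    /* units of work completed so far */
};

struct reactor {
    struct lthread *threads[MAX_THREADS];
    int nthreads;
};

/* Poll one logical thread: at most one unit of work, then return so the
 * reactor can move on. Returns 1 if work was done, 0 if idle. */
static int lthread_poll(struct lthread *t)
{
    if (t->pending == 0) {
        return 0;
    }
    t->pending--;
    t->done++;
    return 1;
}

/* One reactor iteration: visit every logical thread once and report how
 * many of them did work. */
static int reactor_run_once(struct reactor *r)
{
    int busy = 0;
    for (int i = 0; i < r->nthreads; i++) {
        busy += lthread_poll(r->threads[i]);
    }
    return busy;
}
```

With three threads holding 3, 0 and 1 units of work, successive iterations report 2, 1, 1 and then 0 busy threads.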
The struct, end to end
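The full definition of struct spdk_thread is not reproduced here, and it is private to the library rather than part of the public headers. As a stand-in, this hypothetical model carries only the pieces this page names: a name, a state, a list of pollers, a set of io channels, and a message ring. Every field and type name is illustrative, not SPDK's.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of what a logical thread carries. The real
 * struct spdk_thread has many more fields. */
#define MODEL_RING_SIZE 16

typedef void (*model_msg_fn)(void *ctx);

enum model_thread_state { MODEL_RUNNING, MODEL_EXITING, MODEL_EXITED };

struct model_msg {
    model_msg_fn fn;
    void *ctx;
};

struct model_poller {
    int (*fn)(void *arg);
    void *arg;
    struct model_poller *next;
};

struct model_io_channel {
    void *io_device;
    int refcnt;
    struct model_io_channel *next; /* SPDK keeps channels in a tree */
};

struct model_thread {
    char name[32];
    enum model_thread_state state;
    struct model_poller *pollers;           /* recurring work */
    struct model_io_channel *channels;      /* per-thread device state */
    struct model_msg ring[MODEL_RING_SIZE]; /* the mailbox */
    size_t head, tail;                      /* ring indices */
};

static void model_thread_init(struct model_thread *t, const char *name)
{
    memset(t, 0, sizeof(*t));
    strncpy(t->name, name, sizeof(t->name) - 1); /* stays NUL-terminated */
    t->state = MODEL_RUNNING;
}

/* True when the thread owns nothing: no pollers, channels, or messages. */
static int model_thread_is_idle(const struct model_thread *t)
{
    return t->pollers == NULL && t->channels == NULL && t->head == t->tail;
}
```

The idle check is the kind of condition the exit sequence in the lifetime rules waits for.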
Creating a thread — and the lifetime rules
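A thread is created with spdk_thread_create(name, cpumask),
which returns a struct spdk_thread * (or NULL on failure). The
optional cpumask tells the framework which cores the thread may
be scheduled on; NULL means any core.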
Once created, a thread is "live" — it is on some reactor's
threads list, the reactor is calling
spdk_thread_poll() on it, and any
spdk_thread_send_msg() targeted at it will be
delivered.
When you're done with a thread, you call
spdk_thread_exit() — but only from the thread
itself, and only after all of its I/O channels have been
spdk_put_io_channel()'d and all its pollers have
been spdk_poller_unregister()'d. The exit
sequence is asynchronous: spdk_thread_exit() just
flips the state to EXITING and starts a 5-second
timeout. Subsequent reactor iterations check
thread_exit() at
lib/thread/thread.c:672
to see whether all the cleanup has actually happened.
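The asynchronous exit can be sketched as a small state machine. This is a hypothetical model, not lib/thread/thread.c: the field names and the tick unit are made up, and only the RUNNING, EXITING, EXITED progression and the 5-second deadline come from the text above. lthread_exit() only flips the state and arms the deadline; lthread_check_exit(), called on each later poll, completes the exit once no channels or pollers remain, or gives up when the deadline passes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the exit sequence. */
enum state { RUNNING, EXITING, EXITED };

#define TICKS_PER_SEC 1000u

struct lthread {
    enum state state;
    int nchannels;          /* channels not yet put */
    int npollers;           /* pollers not yet unregistered */
    uint64_t exit_deadline; /* in ticks */
    int timed_out;          /* set if we stopped waiting */
};

/* Flip the state and arm the deadline; no cleanup happens here. */
static void lthread_exit(struct lthread *t, uint64_t now)
{
    assert(t->state == RUNNING);
    t->state = EXITING;
    t->exit_deadline = now + 5 * TICKS_PER_SEC;
}

/* Called on every later poll of the thread. */
static void lthread_check_exit(struct lthread *t, uint64_t now)
{
    if (t->state != EXITING) {
        return;
    }
    if (t->nchannels == 0 && t->npollers == 0) {
        t->state = EXITED;  /* clean exit */
    } else if (now >= t->exit_deadline) {
        t->timed_out = 1;   /* this sketch only records the timeout */
        t->state = EXITED;
    }
}
```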
The mailbox: spdk_thread_send_msg
This is the workhorse of the framework. Every cross-thread
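handoff that isn't a poller goes through
spdk_thread_send_msg(thread, fn, ctx), which queues fn(ctx) on
the target thread's message ring and returns immediately without
running it. It returns 0 on success and a negative errno if the
message cannot be allocated or enqueued, for example because the
target has exited or its ring is full.
There is one more variant: spdk_thread_send_critical_msg(thread,
fn). It takes no context argument, only one critical message can
be pending per thread at a time, and it is meant for contexts that
interrupt normal execution, such as signal handlers.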
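The ordering rules of the two send paths can be modeled in a few lines. This sketch is single-threaded and hypothetical: SPDK's real ring accepts concurrent producers without locks, and none of these names are SPDK's. It returns -EIO for a full ring to mirror the negated-errno convention; check include/spdk/thread.h for the exact codes your SPDK version documents. It shows that sending never runs the function inline, that only one critical message can be pending, and that, in this model, a poll runs the critical message before draining the ring.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, single-threaded model of one logical thread's mailbox. */
typedef void (*msg_fn)(void *ctx);

#define RING_SIZE 4

struct msg {
    msg_fn fn;
    void *ctx;
};

struct mailbox {
    struct msg ring[RING_SIZE];
    size_t head;              /* next message to run */
    size_t tail;              /* next free slot */
    _Atomic(msg_fn) critical; /* at most one pending critical message */
};

static void mailbox_init(struct mailbox *mb)
{
    mb->head = 0;
    mb->tail = 0;
    atomic_init(&mb->critical, (msg_fn)NULL);
}

/* Queue fn(ctx); never runs it inline, even when called by this thread. */
static int mailbox_send(struct mailbox *mb, msg_fn fn, void *ctx)
{
    if (mb->tail - mb->head == RING_SIZE) {
        return -EIO; /* ring full: the sender must cope */
    }
    mb->ring[mb->tail % RING_SIZE] = (struct msg){ fn, ctx };
    mb->tail++;
    return 0;
}

/* Claim the single critical slot; fails while one is already pending. */
static int mailbox_send_critical(struct mailbox *mb, msg_fn fn)
{
    msg_fn expected = NULL;
    return atomic_compare_exchange_strong(&mb->critical, &expected, fn) ? 0 : -EIO;
}

/* One poll: the critical message first (it gets no ctx), then the ring.
 * Returns the number of messages executed. */
static int mailbox_poll(struct mailbox *mb)
{
    int ran = 0;
    msg_fn crit = atomic_exchange(&mb->critical, (msg_fn)NULL);
    if (crit != NULL) {
        crit(NULL);
        ran++;
    }
    while (mb->head != mb->tail) {
        struct msg m = mb->ring[mb->head % RING_SIZE];
        mb->head++;
        m.fn(m.ctx);
        ran++;
    }
    return ran;
}

/* Example callbacks: count runs and record ordering. */
static int g_normal_runs, g_critical_runs, g_normal_before_critical = -1;
static void count_normal(void *ctx) { (void)ctx; g_normal_runs++; }
static void count_critical(void *ctx)
{
    (void)ctx;
    g_normal_before_critical = g_normal_runs;
    g_critical_runs++;
}
```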
How a Go JSON-RPC call ends up on an spdk_thread
The connection between the two layers is worth tracing once,
end to end, because it's the abstraction boundary your
diskengine code crosses constantly.
1. Go code: spdkClient.BdevLvolCreate(...) in diskengine.
2. JSON-RPC encode: Client.Call serializes the request to JSON and writes it to the Unix socket.
3. SPDK RPC server: the spdk_jsonrpc_server thread reads the socket.
4. Dispatch: the RPC handler runs on a poller thread (the framework routes it).
5. bdev_lvol_create: the handler submits bdev I/O via the bdev module's submit callback.
6. Completion: the bdev poller fires and completes the bdev_io.
7. RPC response: the handler sends the response, and the RPC server writes it to the socket.
8. Go code resumes: Client.Call unblocks with the result.
The detail that matters is step 4. The RPC framework doesn't run
the handler on a thread you chose — it runs the handler on
whatever thread the framework dispatches JSON-RPC work to
(typically the "app thread" or a dedicated RPC thread). The
handler may then send the work to yet
another thread (e.g. the nvmf target's submit thread)
via spdk_thread_send_msg(). This is what the
"every I/O channel is bound to one thread" rule looks like
in practice: the bdev module's submit callback
has to be called on the thread that owns the channel.
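The hop in step 4 can be modeled with two logical threads and per-thread queues. Everything here is hypothetical (lt_send(), lt_poll(), and g_current as a stand-in for spdk_get_thread()); the point is the shape. The handler runs on the app thread and forwards the submit to the thread that owns the channel; the owner then sends the completion back so the response is produced on the app thread. The assert in submit_io() encodes the bound-channel rule.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an RPC handler hopping to a channel's owner. */
typedef void (*fn_t)(void *ctx);

struct lthread {
    const char *name;
    fn_t q_fn[8];
    void *q_ctx[8];
    int n; /* queued messages */
};

static struct lthread *g_current; /* stand-in for spdk_get_thread() */

static void lt_send(struct lthread *t, fn_t fn, void *ctx)
{
    assert(t->n < 8);
    t->q_fn[t->n] = fn;
    t->q_ctx[t->n] = ctx;
    t->n++;
}

/* Run the messages queued before this poll, as thread t. */
static void lt_poll(struct lthread *t)
{
    int batch = t->n;
    g_current = t;
    for (int i = 0; i < batch; i++) {
        t->q_fn[i](t->q_ctx[i]);
    }
    /* Keep anything enqueued while draining for the next poll. */
    for (int i = batch; i < t->n; i++) {
        t->q_fn[i - batch] = t->q_fn[i];
        t->q_ctx[i - batch] = t->q_ctx[i];
    }
    t->n -= batch;
    g_current = NULL;
}

struct channel {
    struct lthread *owner;
    int submitted;
};

struct rpc_req {
    struct lthread *app; /* where the response must be produced */
    struct channel *ch;  /* owned by another thread */
    int result;
    int responded;
};

static void submit_io(struct channel *ch)
{
    assert(g_current == ch->owner); /* only the owner may submit */
    ch->submitted++;
}

static void rpc_respond(void *ctx) /* runs on the app thread */
{
    struct rpc_req *r = ctx;
    assert(g_current == r->app);
    r->responded = 1;
}

static void do_submit(void *ctx) /* runs on the channel's owner */
{
    struct rpc_req *r = ctx;
    submit_io(r->ch);
    r->result = 0;
    lt_send(r->app, rpc_respond, r);
}

static void rpc_handler(void *ctx) /* runs on the app thread */
{
    struct rpc_req *r = ctx;
    lt_send(r->ch->owner, do_submit, r);
}
```

Three polls are needed before the response exists, one per hop.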
The diskengine side is intentionally simple: the Go code in
Client.Call:43 does a synchronous request/response
and waits. It has no concept of "which reactor am I on"
because it's not on one. The threading model is entirely
internal to the SPDK process. From Go's perspective, SPDK is
a service that responds to JSON-RPC requests.
Pollers vs. messages vs. threads
Three abstractions, three use cases. Mixing them up is the
most common architectural mistake.
| Abstraction | What it does | When to use it |
|---|---|---|
| Poller | A function that runs repeatedly on the thread, on a period (or as fast as possible). | When you need to poll a state, complete I/O, recheck a queue, etc. Anything that needs to run on every reactor iteration or on a timer. |
| Message | A one-shot function delivered to the thread's mailbox, run on the next reactor iteration. | When you want a callback to run "soon, on this thread" without registering a recurring poller: RPC handlers, I/O completions, deferred cleanup, state transitions. |
| Thread | A logical unit of work that has its own state (pollers, channels, message ring). | When you have a subsystem with long-lived state. The bdev module's submit callback, the nvmf target's poller, the RPC server's request thread: each is its own spdk_thread. |
Rule of thumb: if you're tempted to "just register a 1 ms
poller to do this one thing," you're almost always better
off sending a message instead. Pollers are for recurring
work; messages are for one-shots.
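The difference behind the rule of thumb shows up in a toy iteration loop. In this hypothetical model, a message is consumed when it runs, so it runs exactly once; a poller stays registered and runs on every iteration until something unregisters it (spdk_poller_unregister() in SPDK). The struct and function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: one pending message slot and one poller slot. */
struct lthread {
    void (*msg)(void *);   /* at most one pending message, for brevity */
    void *msg_ctx;
    int (*poller)(void *); /* at most one registered poller */
    void *poller_ctx;
};

/* One iteration: run the pending message once, then the poller. */
static void iterate(struct lthread *t)
{
    if (t->msg != NULL) {
        void (*fn)(void *) = t->msg;
        t->msg = NULL; /* consumed: it will not run again */
        fn(t->msg_ctx);
    }
    if (t->poller != NULL) {
        t->poller(t->poller_ctx); /* still registered next iteration */
    }
}

static int g_msg_runs, g_poller_runs;
static void one_shot(void *ctx) { (void)ctx; g_msg_runs++; }
/* Returns nonzero for "did work", a convention of this sketch only. */
static int recurring(void *ctx) { (void)ctx; g_poller_runs++; return 1; }
```

After five iterations the message has run once and the poller five times; a "1 ms poller for one thing" keeps paying that recurring cost until it is unregistered.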
Migration: when a thread hops reactors
With the static scheduler, threads never migrate. With the
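dynamic scheduler (gpm), they can. In outline: the scheduler
decides that a thread should run on a different core, and the
reactor acts on that decision only between iterations. After the
thread's pollers return, reactor_post_process_lw_thread() (see
edge case 5) removes the thread from the current reactor's list
and hands it to the target reactor, which polls it from then on.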
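The timing guarantee can be sketched with two reactors. This is a hypothetical model, not lib/event/reactor.c: the scheduler's decision is reduced to a resched flag and a target index. The reactor runs the thread's work first and only then, in a post-processing step, moves the thread, so a poll never starts on one reactor and finishes on another.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of migration between reactor iterations. */
#define MAX_T 4

struct lthread {
    int runs_on[8]; /* which reactor ran each poll, for inspection */
    int nruns;
    bool resched;   /* set by the scheduler */
    int target;     /* reactor to move to */
};

struct reactor {
    int id;
    struct lthread *threads[MAX_T];
    int n;
};

static void reactor_add(struct reactor *r, struct lthread *t)
{
    assert(r->n < MAX_T);
    r->threads[r->n++] = t;
}

static void reactor_run_once(struct reactor *reactors, int id)
{
    struct reactor *r = &reactors[id];
    for (int i = 0; i < r->n;) {
        struct lthread *t = r->threads[i];
        if (t->nruns < 8) {
            t->runs_on[t->nruns++] = r->id; /* the thread's "pollers" */
        }
        /* Post-process: only now, after the poll returned, may t move. */
        if (t->resched) {
            t->resched = false;
            r->threads[i] = r->threads[--r->n]; /* remove by swapping in last */
            reactor_add(&reactors[t->target], t);
            continue; /* slot i now holds a thread that has not run yet */
        }
        i++;
    }
}
```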
Edge cases & what trips people up
1. spdk_thread_send_msg() from the target thread itself
The function does not check whether the target is the calling
thread; it cheerfully enqueues a message on the very thread that
just called it. The message sits in the ring until the next
reactor iteration, and only then runs. That is a recipe for
deadlock if your fn waits for the message to be processed: while
it waits, it never returns to the reactor, so the reactor never
drains the ring. The pattern "send a message to self, then wait
for it to be processed" is broken by construction, because you
cannot be both the blocked producer and the consumer. Use a
regular function call (or spdk_thread_exec_msg() at
include/spdk/thread.h:547, which detects the local case and runs
the function immediately).
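The local-versus-remote dispatch attributed to spdk_thread_exec_msg() above can be modeled as follows. The names are hypothetical, g_current stands in for spdk_get_thread(), and the queue is a plain array; the real function lives in include/spdk/thread.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of "run inline if local, else enqueue". */
typedef void (*fn_t)(void *);

struct lthread {
    fn_t pending[4];
    void *pending_ctx[4];
    int n;
};

static struct lthread *g_current; /* stand-in for spdk_get_thread() */

static void enqueue(struct lthread *t, fn_t fn, void *ctx)
{
    assert(t->n < 4);
    t->pending[t->n] = fn;
    t->pending_ctx[t->n] = ctx;
    t->n++;
}

static void exec_msg(struct lthread *t, fn_t fn, void *ctx)
{
    if (t == g_current) {
        fn(ctx);             /* local: run now, no ring round-trip */
    } else {
        enqueue(t, fn, ctx); /* remote: runs when t is next polled */
    }
}

static void set_flag(void *ctx) { *(int *)ctx = 1; }
```

The broken pattern is enqueue(&self, set_flag, &a) followed by a loop that waits for a to become 1: nothing drains self.pending while the caller is still running, so that loop never ends.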
2. Calling spdk_get_io_channel() on a thread that doesn't exist
The function at
lib/thread/thread.c:2376 does
thread = _get_thread(); if (!thread) ... abort().
If you're in a pthread that the framework didn't set
up — for example, a Go goroutine that crossed the
CGo boundary — tls_thread is NULL and
you abort. There is no implicit "current
thread" for non-SPDK threads. Everything
inside SPDK requires you to be on a known
spdk_thread.
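The mechanism behind this is ordinary thread-local storage, which a small pthread program can demonstrate. In this hypothetical model, model_tls_thread plays the role the text gives SPDK's tls_thread, and run_as() stands in for the framework setting it before running code on a logical thread. A pthread that never went through run_as(), like one servicing a CGo call, sees NULL.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model of a thread-local "current logical thread". */
struct lthread {
    const char *name;
};

static _Thread_local struct lthread *model_tls_thread;

static struct lthread *model_get_thread(void)
{
    return model_tls_thread;
}

/* Stand-in for the framework: set the pointer, run fn, restore it. */
static void run_as(struct lthread *t, void (*fn)(void))
{
    struct lthread *saved = model_tls_thread;
    model_tls_thread = t;
    fn();
    model_tls_thread = saved;
}

static struct lthread *g_seen_inside;
static void record_current(void) { g_seen_inside = model_get_thread(); }

/* Entry point of a pthread the framework never touched. */
static void *foreign_thread(void *out)
{
    *(struct lthread **)out = model_get_thread();
    return NULL;
}
```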
3. The first spdk_thread_create() sets the app thread
The atomic compare-and-exchange at
lib/thread/thread.c:632 means
"the first thread wins." If you create thread A, then
create thread B, then create thread C, all of A, B, and C
are normal threads, but A is the "app thread" because
it was first. Framework init and fini must
happen from the app thread.
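First-wins registration with a compare-and-exchange looks like this in C11 atomics. It is a hypothetical model of the behavior described for thread.c:632, not a copy of it; g_app_thread and on_thread_create() are illustrative names. The exchange succeeds only while the slot is still NULL, so even concurrent creators cannot both become the app thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical model of "the first thread created is the app thread". */
struct lthread {
    const char *name;
};

static _Atomic(struct lthread *) g_app_thread; /* starts out NULL */

static void on_thread_create(struct lthread *t)
{
    struct lthread *expected = NULL;
    /* Stores t only if no app thread is recorded yet; later callers see
     * a non-NULL value and leave it alone. */
    atomic_compare_exchange_strong(&g_app_thread, &expected, t);
}

static int is_app_thread(const struct lthread *t)
{
    return atomic_load(&g_app_thread) == t;
}
```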
4. What happens when the target's reactor is busy
Your spdk_thread_send_msg() succeeds: the message is in the
ring. The target thread is still inside a poller doing
something slow, so the message waits. Send-and-forget
therefore has unbounded latency. The framework gives you
spdk_thread_send_critical_msg() for "I really need this to
run now," but that message also waits for the current poller
to return; there is no preemption. The only built-in
back-pressure signal is a failed send once the ring is full.
spdk_ring_count() reports a ring's fill level, but the
thread's message ring is an internal field rather than part
of the public spdk_thread API, so in practice you either track
in-flight messages yourself or design your message handlers
to be fast.
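One way to build your own back-pressure is to count in-flight messages per target on the sending side. This is a hypothetical sketch: MAX_INFLIGHT, try_reserve() and release() are made-up names. The sender would call try_reserve() before spdk_thread_send_msg(), and the target's handler would call release() when it finishes. The counter is atomic because reservation and release happen on different threads.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical sender-side cap on messages in flight to one target. */
#define MAX_INFLIGHT 2

struct target_stats {
    atomic_int inflight; /* sent but not yet processed */
};

/* Reserve a slot before sending; -EAGAIN means "retry later". */
static int try_reserve(struct target_stats *s)
{
    int cur = atomic_load(&s->inflight);
    while (cur < MAX_INFLIGHT) {
        /* On failure cur is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&s->inflight, &cur, cur + 1)) {
            return 0;
        }
    }
    return -EAGAIN;
}

/* Called at the end of the message handler, on the target thread. */
static void release(struct target_stats *s)
{
    int prev = atomic_fetch_sub(&s->inflight, 1);
    assert(prev > 0);
    (void)prev;
}
```

When try_reserve() fails, a natural retry point is the sender's own next poll, which keeps the sender cooperative as well.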
5. Migration while a poller is running
The migration check is at
lib/event/reactor.c:922, in
reactor_post_process_lw_thread(). It runs
after the thread's pollers, not during. So a
poller is guaranteed to run to completion on the
current reactor. After it returns, the thread might
get moved. If your poller stashes a pointer
to reactor-local data and assumes the data is still
valid in the next iteration, you're wrong.
Each iteration is "fresh." Persist data on the
spdk_thread struct, not on the
reactor.
6. Foreign threads, foreign locks
If a Go goroutine calls into the C side via CGo and
that path tries to take an spdk_spinlock,
it will trip the
SPIN_ERR_NOT_SPDK_THREAD assertion at
lib/thread/thread.c:3273.
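The lock expects to be held by an
spdk_thread. If you need a lock that both a Go
goroutine (via CGo) and SPDK code can take, use a plain
pthread_mutex instead of an spdk_spinlock, and design the
SPDK side never to block waiting for it: try the lock,
and if it is busy, return and try again on the next poll.
The same is true for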
spdk_io_channel — the channel is bound
to a thread, and "the thread" is the
spdk_thread that acquired it, not
whatever pthread happens to be running.
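A sketch of the "never block on the SPDK side" rule with a plain pthread mutex. The names are hypothetical, and POLL_IDLE and POLL_BUSY are local constants that only loosely mirror the idle/busy convention of SPDK pollers. The poller uses pthread_mutex_trylock(), so when a foreign thread holds the lock it returns at once and tries again on its next run, instead of stalling every spdk_thread on its reactor.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical state shared between SPDK code and foreign threads. */
struct shared_state {
    pthread_mutex_t lock;
    int value;
};

enum { POLL_IDLE = 0, POLL_BUSY = 1 };

static int shared_state_poller(void *arg)
{
    struct shared_state *s = arg;
    if (pthread_mutex_trylock(&s->lock) != 0) {
        return POLL_IDLE; /* held elsewhere: try again next iteration */
    }
    s->value++;           /* keep the critical section short */
    pthread_mutex_unlock(&s->lock);
    return POLL_BUSY;
}
```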
7. Holding an spdk_thread * across reactor iterations
The spdk_thread pointer is stable for
the lifetime of the thread. The thread can be
destroyed (via spdk_thread_exit +
spdk_thread_destroy), and once it's
destroyed the pointer is dangling. If
you're tempted to "just keep the pointer in a
global and send a message to it later," ask
yourself: who guarantees it's still alive?
The answer in practice is the framework's
for_each_count / pending_unregister_count
machinery, which is why spdk_for_each_thread()
bumps those counts and refuses to unregister a
thread that's the target of an in-flight
iteration. Read
lib/thread/thread.c:2049
if you ever write an spdk_for_each_thread
of your own.
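The general technique behind "who guarantees it's still alive?" is a reference count that defers destruction. This hypothetical, single-threaded sketch is not SPDK's for_each_count machinery; it only shows the rule: a holder takes a reference before keeping the pointer, exit may be requested at any time, and the object is released only when the last reference is dropped. A real version would need atomics or messages to update the count from other threads.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcount-guarded lifetime for a logical thread. */
struct lthread {
    int refs;            /* outstanding holders */
    bool exit_requested;
    bool destroyed;      /* stand-in for the struct being freed */
};

static void lthread_get(struct lthread *t)
{
    assert(!t->destroyed && !t->exit_requested);
    t->refs++;
}

static void maybe_destroy(struct lthread *t)
{
    if (t->exit_requested && t->refs == 0) {
        t->destroyed = true;
    }
}

static void lthread_put(struct lthread *t)
{
    assert(t->refs > 0);
    t->refs--;
    maybe_destroy(t);
}

static void lthread_request_exit(struct lthread *t)
{
    t->exit_requested = true;
    maybe_destroy(t);
}
```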
8. The diskengine never knows which thread it talked to
Look at
BdevLvolCreate:97. The Go code
just gets back a UUID string. It has no idea which
spdk_thread the bdev module ran on,
which reactor processed the request, or how many
polls it took. This is the abstraction
working as designed. If you ever find
yourself wanting to "pass an spdk_thread pointer
back to Go and use it later," stop. The pointer
is meaningless outside the SPDK process.
What to take away
An spdk_thread is the unit of "where does
this I/O submission come from." It's a struct, a name,
a list of pollers, an io_channel tree, and a message
ring. The reactor loop walks the threads. The thread's
mailbox delivers cross-thread work. Pollers run on
schedule; messages run on demand. The combination
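gives you a goroutine-like model on top of a pthread-per-core
runtime, with the property that switching between spdk_threads
never needs a syscall: a thread yields only by returning from a
poller or message, so a blocking call stalls every spdk_thread on
that reactor.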
The next page — 2.3 —
spdk_io_channel + pollers — looks at the
per-thread state that actually caches the I/O submission
path. The spdk_thread is the thing; the
spdk_io_channel is what the thing owns.