The interface, distilled.
The bdev framework gives you a struct of function pointers and asks you to fill it in. Most bdev modules look complicated, but they all answer the same seven questions: how do I shut down, how do I handle I/O, what I/O types do I accept, how do I get per-thread state, how do I manage DMA buffers, how do I save my config, and how do I read it back. Get those seven right and you have a working bdev. This page is the distilled reference. The next two pages go deep on a real module and on the build glue.
- The shape of the contract
- 1. destruct — teardown
- 2. submit_request — the hot path
- 3. io_type_supported — capability check
- 4. get_io_channel/cleanup_io_channel — per-thread state
- 5. get_buf_ctx/put_buf_ctx — DMA buffer management
- 6. write_config_json/read_config_json — persistence
- 7. config_text/parse_param — runtime config
- The portability checklist
- Edge cases: what breaks in production
The shape of the contract
A bdev module is a struct spdk_bdev_module (one
per process) that carries lifecycle, plus a
struct spdk_bdev_fn_table (one per bdev type)
that carries the per-instance hot path. The seven things
below split across those two structs as follows:
| # | Concept | In the v26.01 source | Required? |
|---|---|---|---|
| 1 | Teardown | fn_table->destruct | Yes |
| 2 | Hot path | fn_table->submit_request | Yes |
| 3 | Capability check | fn_table->io_type_supported | Yes |
| 4 | Per-thread state | fn_table->get_io_channel + spdk_io_device_register | Yes |
| 5 | DMA buffer management | spdk_bdev_io_get_buf() (caller) / module's buf alloc (bdev) | Yes |
| 6 | Persistence | fn_table->write_config_json or module->config_json | Strongly recommended |
| 7 | Runtime config | JSON-RPC SPDK_RPC_REGISTER + examine_config | For bdevs that need runtime config |
Two of those seven don't have a function pointer at all — buffer management and runtime config are handled by the framework or by RPC handlers you write yourself. The other five are real function pointers. The framework does not crash if you leave some unset, but it will refuse to register your bdev (or, worse, silently misbehave) if the wrong one is missing.
flowchart TB
    Module[spdk_bdev_module struct] --> M1[1. module_init / module_fini]
    Module --> M2[2. config_json]
    Module --> M3[3. examine_config]
    Module --> M4[4. get_ctx_size]
    FnTable[spdk_bdev_fn_table struct] --> F1[1. destruct]
    FnTable --> F2[2. submit_request]
    FnTable --> F3[3. io_type_supported]
    FnTable --> F4[4. get_io_channel]
    FnTable --> F5[5. write_config_json]
    FnTable --> F6[6. dump_info_json]
    FnTable --> F7[7. get_memory_domains]
    RPC[vbdev_passthru_rpc.c] --> R1[SPDK_RPC_REGISTER 'bdev_passthru_create']
    RPC --> R2[SPDK_RPC_REGISTER 'bdev_passthru_delete']
    Module -.->|registered via| Ctor[SPDK_BDEV_MODULE_REGISTER macro]
    Ctor -.->|constructor| Linker[linker runs before main]
fig. 1 Five of the seven concerns are function pointers in
spdk_bdev_fn_table; the rest live in the
spdk_bdev_module struct or in your RPC file. All of
them get registered at link time through a constructor macro.
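The required/optional split can be checked on the module side before registering. The sketch below uses a hypothetical stand-in table (`struct demo_fn_table` and every `demo_` name are assumptions for illustration, not the SPDK definitions), with the required set taken from the table above. It is a module-side convention, not something the framework provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fn_table shape; not the SPDK definition. */
struct demo_fn_table {
	int   (*destruct)(void *ctx);
	void  (*submit_request)(void *ch, void *io);
	bool  (*io_type_supported)(void *ctx, int io_type);
	void *(*get_io_channel)(void *ctx);
	int   (*write_config_json)(void *bdev, void *w); /* optional */
};

/* Returns the name of the first missing required pointer, or NULL if complete. */
static const char *demo_fn_table_missing(const struct demo_fn_table *t)
{
	assert(t != NULL);
	if (t->destruct == NULL) { return "destruct"; }
	if (t->submit_request == NULL) { return "submit_request"; }
	if (t->io_type_supported == NULL) { return "io_type_supported"; }
	if (t->get_io_channel == NULL) { return "get_io_channel"; }
	return NULL; /* write_config_json may stay NULL */
}

/* Trivial stubs so a complete table can be built. */
static int demo_destruct(void *ctx) { (void)ctx; return 0; }
static void demo_submit(void *ch, void *io) { (void)ch; (void)io; }
static bool demo_supported(void *ctx, int t) { (void)ctx; (void)t; return false; }
static void *demo_get_channel(void *ctx) { return ctx; }
```

Calling `demo_fn_table_missing()` just before registration turns a crash on first use into a readable error at startup.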
1. destruct — teardown
What it is: the framework calls this when your
bdev is being torn down. The trigger is either
spdk_bdev_unregister() (you or your RPC handler
called it) or a hot-remove event from the base bdev (for a
vbdev). The framework has already told all open descriptors to
close; you're being called to release module-private state.
Minimal implementation: in the simplest case
you do free(ctx) and return 0. For a vbdev, you
follow the exact teardown sequence the passthru module uses
(covered on the next page).
/* minimal: leaf module */
static int my_destruct(void *ctx)
{
    struct my_bdev *b = ctx;

    free(b->name);
    free(b);
    return 0;
}

What if you forget it? The framework won't call it. Your bdev is still registered. Memory leaks. Worse, the framework considers the bdev in an inconsistent state and may panic on shutdown.
The most common bug: returning
1 (the async signal) and then never calling
spdk_bdev_destruct_done(). The framework
silently waits forever and your application hangs on
shutdown. This is one of the few "hang on exit" bugs in
SPDK, and the fix is one line. See
for the matching API.
2. submit_request — the hot path
What it is: every I/O submitted to your bdev lands here. This is the function pointer that makes your module a real bdev and not just a config entry. The framework has already done the heavy lifting (allocation, splitting, alignment checks). You dispatch.
Minimal implementation: a switch on
bdev_io->type that handles each I/O type the
bdev supports and fails the rest. For most I/O types, you
either complete immediately (reset, flush) or kick off async
work (read, write, unmap).
/* minimal: in-memory module. my_buf and BLOCK are the module's backing
 * store and block size; the copies assume one iov covers the whole request. */
static void my_submit_request(struct spdk_io_channel *ch,
                              struct spdk_bdev_io *bdev_io)
{
    switch (bdev_io->type) {
    case SPDK_BDEV_IO_TYPE_READ:
        /* memcpy into iovs, then complete */
        memcpy(bdev_io->u.bdev.iovs[0].iov_base,
               my_buf + bdev_io->u.bdev.offset_blocks * BLOCK,
               bdev_io->u.bdev.num_blocks * BLOCK);
        spdk_bdev_io_complete(bdev_io, SPDK_BDEV_IO_STATUS_SUCCESS);
        return;
    case SPDK_BDEV_IO_TYPE_WRITE:
        /* memcpy out of iovs, then complete */
        memcpy(my_buf + bdev_io->u.bdev.offset_blocks * BLOCK,
               bdev_io->u.bdev.iovs[0].iov_base,
               bdev_io->u.bdev.num_blocks * BLOCK);
        spdk_bdev_io_complete(bdev_io, SPDK_BDEV_IO_STATUS_SUCCESS);
        return;
    case SPDK_BDEV_IO_TYPE_FLUSH:
    case SPDK_BDEV_IO_TYPE_RESET:
        /* nothing to do for an in-memory store */
        spdk_bdev_io_complete(bdev_io, SPDK_BDEV_IO_STATUS_SUCCESS);
        return;
    default:
        /* unsupported type: fail it rather than drop it */
        spdk_bdev_io_complete(bdev_io, SPDK_BDEV_IO_STATUS_FAILED);
        return;
    }
}

What if you forget it? The framework dereferences a NULL pointer and segfaults. This is the hardest failure in the framework: the bdev appears to register fine, then the first I/O crashes the process.
The most common bug: completing the I/O with
the wrong status, completing it twice, or never completing
it. If submit_request returns without calling
spdk_bdev_io_complete() and without queuing
async work that will call it later, nothing completes the
I/O on your behalf: the caller's callback never fires and
the request hangs. If you
call spdk_bdev_io_complete() twice, you corrupt
the mempool and the next I/O uses freed memory. The fix is
discipline: exactly one call to complete per I/O, on
every code path.
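One way to enforce the one-completion rule is to compute the outcome first and complete at a single exit. The sketch below is a stand-in model, not SPDK code: `struct demo_io`, `demo_io_complete()` and the `DEMO_` constants are hypothetical, and the completion counter is a debugging aid added for illustration.

```c
#include <assert.h>

enum demo_status { DEMO_PENDING, DEMO_SUCCESS, DEMO_FAILED };
enum { DEMO_READ, DEMO_WRITE, DEMO_FLUSH, DEMO_UNMAP };

/* Hypothetical stand-in for a bdev_io; only what the sketch needs. */
struct demo_io {
	int type;
	enum demo_status status;
	int completions; /* debug counter: must end at exactly 1 */
};

/* Stand-in for the completion call: records the status and counts calls. */
static void demo_io_complete(struct demo_io *io, enum demo_status st)
{
	io->completions++;
	assert(io->completions == 1); /* a second completion is always a bug */
	io->status = st;
}

/* Decide the outcome in the switch, complete once at the single exit. */
static void demo_submit_request(struct demo_io *io)
{
	enum demo_status st;

	switch (io->type) {
	case DEMO_READ:
	case DEMO_WRITE:
	case DEMO_FLUSH:
		st = DEMO_SUCCESS;
		break;
	default:
		st = DEMO_FAILED; /* unsupported types are failed, never dropped */
		break;
	}
	demo_io_complete(io, st);
}
```

With this shape no branch can return early without completing, and the assertion catches a double completion in debug builds.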
3. io_type_supported — capability check
What it is: the framework calls this to populate a per-bdev bitmap of supported I/O types at register time, and again at submit time. The bitmap lets the upper layer know what it can do without trying and failing. For virtual bdevs, the typical pattern is to delegate to the base bdev's bitmap.
Minimal implementation: for a passthrough module, delegate. For a leaf module, hardcode the types you support.
/* minimal: leaf module supporting read/write/flush/reset */
static bool my_io_type_supported(void *ctx, enum spdk_bdev_io_type type)
{
    switch (type) {
    case SPDK_BDEV_IO_TYPE_READ:
    case SPDK_BDEV_IO_TYPE_WRITE:
    case SPDK_BDEV_IO_TYPE_FLUSH:
    case SPDK_BDEV_IO_TYPE_RESET:
        return true;
    default:
        return false;
    }
}

/* typical: virtual module, delegate to base */
static bool my_vbdev_io_type_supported(void *ctx,
                                       enum spdk_bdev_io_type type)
{
    struct my_vbdev *v = ctx;

    return spdk_bdev_io_type_supported(v->base_bdev, type);
}

What if you forget it? The framework
dereferences NULL. If you return the wrong answer (e.g.
true for everything), the upper layer will
submit I/O types your module can't handle, and the failure
shows up as a generic SPDK_BDEV_IO_STATUS_FAILED
with no clue as to why.
The most common bug: returning
true for SPDK_BDEV_IO_TYPE_WRITE_ZEROES
but never actually handling the case in
submit_request. The framework will pass it
through to your switch, which falls to the default and
completes it as FAILED. The two function pointers must
agree.
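A single handler table can keep the two answers from drifting apart: a type is reported as supported exactly when it has a handler. This is a sketch under assumptions, with hypothetical `demo_` names and type ids standing in for the real enum spdk_bdev_io_type; it is not how any in-tree module is written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical I/O type ids for illustration only. */
enum { DEMO_READ, DEMO_WRITE, DEMO_FLUSH, DEMO_WRITE_ZEROES, DEMO_NUM_TYPES };

typedef int (*demo_handler)(void *io);

static int demo_do_read(void *io)  { (void)io; return 0; }
static int demo_do_write(void *io) { (void)io; return 0; }
static int demo_do_flush(void *io) { (void)io; return 0; }

/* Single source of truth: entries left out stay NULL, meaning "unsupported". */
static const demo_handler g_demo_handlers[DEMO_NUM_TYPES] = {
	[DEMO_READ]  = demo_do_read,
	[DEMO_WRITE] = demo_do_write,
	[DEMO_FLUSH] = demo_do_flush,
};

/* Capability check derived from the table, never written by hand. */
static bool demo_io_type_supported(int type)
{
	return type >= 0 && type < DEMO_NUM_TYPES && g_demo_handlers[type] != NULL;
}

/* Dispatch through the same table; returns -1 for unsupported types. */
static int demo_submit(int type, void *io)
{
	if (!demo_io_type_supported(type)) {
		return -1;
	}
	return g_demo_handlers[type](io);
}
```

Adding a new I/O type then means adding one table entry, which updates the capability check and the dispatch together.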
4. get_io_channel / cleanup_io_channel — per-thread state
What it is: the framework calls this to get a
per-thread spdk_io_channel for your bdev. You
allocate the channel once per thread, and the framework hands
the same channel back to your submit_request on
every subsequent I/O from that thread. The cleanup
counterpart is the channel-destroy callback you register
alongside your io_device at module init time.
Minimal implementation: in almost every
module, this is one line. You registered an io_device in
module_init, and now you ask the framework for
a channel of it.
/* minimal: passthru to your io_device */
static struct spdk_io_channel *my_get_io_channel(void *ctx)
{
    return spdk_get_io_channel(&my_io_device);
}

The "cleanup" side is set up at module init time, not at
get_io_channel time:

/* at module init: tell the framework how to create and destroy
   your per-thread state */
spdk_io_device_register(&my_io_device,
                        my_ch_create_cb,   /* called per thread */
                        my_ch_destroy_cb,  /* called per thread */
                        sizeof(struct my_channel),
                        "my_module");

/* the create callback: per-thread setup */
static int my_ch_create_cb(void *io_device, void *ctx_buf)
{
    struct my_channel *ch = ctx_buf;

    ch->state = 0;
    return 0;
}

/* the destroy callback: per-thread teardown */
static void my_ch_destroy_cb(void *io_device, void *ctx_buf)
{
    struct my_channel *ch = ctx_buf;

    /* free anything you allocated in create_cb */
}

What if you forget it? NULL pointer in the
vtable. The framework's I/O submit path calls
bdev->fn_table->get_io_channel(ctx) at
thread setup time and segfaults.
The most common bug: putting a global mutex or a global list into the channel struct. Channels are per-thread by design; the whole point is that two threads submitting to the same bdev don't share state. If you put a global lock inside the channel, you've killed the bdev's scalability on a multi-core box. Channel state should be either thread-local data (poller, accel channel, a per-thread queue) or pointer-handles to per-bdev resources you set up at register time.
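The split between channel-local state and per-bdev handles can be sketched as follows. All names here (`struct demo_channel`, `struct demo_bdev_shared`, the `demo_` functions) are hypothetical stand-ins rather than SPDK types; the point is only that the hot path writes to the channel and reads from the shared part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-bdev resource, set up once at register time. */
struct demo_bdev_shared {
	uint32_t block_size;
};

/* Hypothetical per-thread channel: owns its counters, borrows the shared part. */
struct demo_channel {
	uint64_t ios_submitted;              /* touched only by the owning thread */
	const struct demo_bdev_shared *bdev; /* read-only handle, no lock needed */
};

static void demo_ch_init(struct demo_channel *ch, const struct demo_bdev_shared *bdev)
{
	assert(ch != NULL && bdev != NULL);
	ch->ios_submitted = 0;
	ch->bdev = bdev;
}

/* Hot path: only channel-local writes. Returns the request size in bytes. */
static uint64_t demo_ch_submit(struct demo_channel *ch, uint64_t num_blocks)
{
	ch->ios_submitted++;
	return num_blocks * ch->bdev->block_size;
}
```

Two channels built on the same shared struct keep independent counters, so threads never contend on a lock to submit.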
5. get_buf_ctx / put_buf_ctx — DMA buffer management
What it is: this is the conceptual name for
"how do you give the bdev_io a buffer to read into or write
from." In the v26.01 framework, this is handled by the
spdk_bdev_io_get_buf() API in
bdev_module.h:1340. The caller (or your own
module, if you're a leaf) calls it to lazily allocate a
buffer when one is needed. The framework figures out whether
to allocate from the data buffer pool, the small-buffer pool,
or to skip allocation entirely (because the iov is already
populated and aligned).
Minimal implementation: a leaf module doesn't usually call this — the framework handles reads into a buffer that was attached at submit time. A virtual bdev, however, often needs to grab a buffer before forwarding a read down to its base. The pattern is:
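/* typical: vbdev needs a buffer to read into */
case SPDK_BDEV_IO_TYPE_READ:
    /* length is the whole request in bytes */
    spdk_bdev_io_get_buf(bdev_io, my_get_buf_cb,
                         bdev_io->u.bdev.num_blocks * bdev_io->bdev->blocklen);
    break;

/* the callback when the buffer is ready */
static void my_get_buf_cb(struct spdk_io_channel *ch,
                          struct spdk_bdev_io *bdev_io, bool success)
{
    /* struct my_channel is assumed to hold the base bdev's channel */
    struct my_channel *my_ch = spdk_io_channel_get_ctx(ch);
    int rc;

    if (!success) {
        spdk_bdev_io_complete(bdev_io, SPDK_BDEV_IO_STATUS_FAILED);
        return;
    }
    /* now bdev_io->u.bdev.iovs points at a real buffer;
       forward to the base */
    rc = spdk_bdev_readv_blocks(my_base_desc, my_ch->base_ch,
                                bdev_io->u.bdev.iovs, bdev_io->u.bdev.iovcnt,
                                bdev_io->u.bdev.offset_blocks,
                                bdev_io->u.bdev.num_blocks,
                                my_complete_cb, bdev_io);
    if (rc != 0) {
        /* the base rejected the submission, so my_complete_cb will not run */
        spdk_bdev_io_complete(bdev_io, SPDK_BDEV_IO_STATUS_FAILED);
    }
}

The historical "put_buf_ctx" name maps to a buffer being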
freed: when you call spdk_bdev_io_complete()
on an I/O that had spdk_bdev_io_get_buf() called
for it, the framework returns the buffer to the pool
automatically. You don't explicitly put it.
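The get-then-release lifecycle described above can be modelled in a few lines. This is a stand-in sketch, not the framework's implementation: `demo_io_get_buf()`, `demo_io_complete()` and `struct demo_io` are hypothetical, and plain calloc/free stand in for the buffer pools.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bdev_io with a single data buffer. */
struct demo_io {
	void  *buf;         /* NULL until a buffer is attached */
	size_t len;
	bool   buf_is_ours; /* true when the stand-in framework allocated buf */
	bool   forwarded;
	bool   failed;
};

typedef void (*demo_get_buf_cb)(struct demo_io *io, bool success);

/* Lazy acquisition: allocate only when the I/O has no buffer yet. */
static void demo_io_get_buf(struct demo_io *io, demo_get_buf_cb cb, size_t len)
{
	if (io->buf == NULL) {
		io->buf = calloc(1, len);
		io->buf_is_ours = (io->buf != NULL);
		io->len = len;
	}
	cb(io, io->buf != NULL);
}

/* Completion releases a framework-owned buffer; caller buffers are untouched. */
static void demo_io_complete(struct demo_io *io)
{
	if (io->buf_is_ours) {
		free(io->buf);
		io->buf = NULL;
		io->buf_is_ours = false;
	}
}

/* Module callback: fail on allocation error, otherwise forward the read. */
static void demo_read_get_buf_cb(struct demo_io *io, bool success)
{
	if (!success) {
		io->failed = true;
		demo_io_complete(io);
		return;
	}
	io->forwarded = true; /* a real vbdev would submit to its base here */
}
```

The module never frees the buffer itself; ownership is tracked on the I/O and resolved at completion.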
What if you forget it? for a leaf module that handles reads synchronously from a fixed buffer, you don't need it at all. For a virtual bdev that needs to forward reads, forgetting it means the iov is NULL when the base bdev tries to write to it — the framework fails the I/O with a generic error.
The most common bug: calling
spdk_bdev_io_get_buf() on an I/O type that
doesn't need a buffer (e.g. WRITE, where the
caller supplied the buffer). The framework will
double-buffer it for you anyway, but you'll waste a
memcpy. The right discipline: only call
spdk_bdev_io_get_buf() on
READ.
6. write_config_json / read_config_json — persistence
What it is: the framework calls
write_config_json (or
module->config_json) to ask your bdev to
write the JSON-RPC that would recreate it. This is what
powers save_config in the SPDK CLI: the
framework walks every bdev, asks it to write its
"constructor" RPC, and the result is a JSON file you can
feed back to the framework later. The "read" side is just
the JSON-RPC handler you wrote for creation; the framework
calls that RPC during init.
You implement one of these, not both — for per-bdev
output, choose whichever fits your creation model. The
malloc module uses fn_table->write_config_json
because each bdev has its own construction RPC. The
passthru module uses module->config_json
because the module walks a global list of passthru bdevs
and emits a create RPC for each.
Minimal implementation: emit the same fields your create RPC accepts.
/* typical: passthru style, in module->config_json */
static int my_config_json(struct spdk_json_write_ctx *w)
{
    struct my_bdev *b;

    TAILQ_FOREACH(b, &g_my_bdevs, link) {
        spdk_json_write_object_begin(w);
        spdk_json_write_named_string(w, "method", "my_bdev_create");
        spdk_json_write_named_object_begin(w, "params");
        spdk_json_write_named_string(w, "base_bdev_name", b->base_name);
        spdk_json_write_named_string(w, "name", b->name);
        spdk_json_write_object_end(w); /* closes "params" */
        spdk_json_write_object_end(w); /* closes the RPC object */
    }
    return 0;
}

What if you forget it?
save_config will produce a JSON file that
doesn't include your bdev. When you restart the SPDK
process and replay that file, your bdev won't be
recreated. For one-off test setups this is fine; for
anything that needs to survive a reboot it's a bug.
The most common bug: writing fields the
create RPC doesn't accept, or vice versa. If you save
"num_blocks": 1024 but your create RPC
ignores it, your bdev comes back with the wrong size.
The discipline: the create RPC and
write_config_json are two views of the same
struct; keep them in sync.
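A round-trip check is one way to catch drift between the two views. The sketch below is self-contained and uses snprintf/sscanf in place of the SPDK JSON writer and decoder, so it only illustrates the idea; `struct demo_params` and both functions are hypothetical, and the parser accepts nothing but the exact format the writer emits.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical creation parameters: the one struct both sides share. */
struct demo_params {
	char name[32];
	char base_bdev_name[32];
};

/* "Save" side: emit exactly the fields the create side accepts. */
static int demo_write_config(const struct demo_params *p, char *out, size_t cap)
{
	int n = snprintf(out, cap,
			 "{\"method\":\"my_bdev_create\",\"params\":"
			 "{\"base_bdev_name\":\"%s\",\"name\":\"%s\"}}",
			 p->base_bdev_name, p->name);

	return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}

/* "Create" side: a deliberately tiny parser for the format above only. */
static int demo_parse_config(const char *in, struct demo_params *p)
{
	int n = sscanf(in,
		       "{\"method\":\"my_bdev_create\",\"params\":"
		       "{\"base_bdev_name\":\"%31[^\"]\",\"name\":\"%31[^\"]\"}}",
		       p->base_bdev_name, p->name);

	return n == 2 ? 0 : -1;
}
```

Writing a config, parsing it back, and comparing the structs field by field makes a cheap unit test that fails as soon as one side gains a field the other lacks.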
7. config_text / parse_param — runtime config
What it is: this is the conceptual name for "how does the user tell the bdev module how to configure itself at runtime." In v26.01, this is split across three mechanisms:
- JSON-RPC methods registered with SPDK_RPC_REGISTER in your vbdev_xxx_rpc.c file. The user invokes them with rpc.py or any JSON-RPC client.
- Config-file sections parsed by your module at startup. The passthru module uses spdk_conf_read via the SPDK app framework to read a [Passthru] section from spdk.conf.
- The examine_config callback that the framework calls for every bdev that gets registered. This is how a vbdev "discovers" a base bdev that was created in the same config.
Minimal implementation: a function that
takes a spdk_jsonrpc_request and a JSON
params object, decodes them, and calls into the bdev
create function.
/* minimal: RPC for "create my bdev" */
struct rpc_my_create {
    char *base_bdev_name;
    char *name;
};

/* maps JSON keys to struct fields */
static const struct spdk_json_object_decoder rpc_my_create_decoders[] = {
    {"base_bdev_name", offsetof(struct rpc_my_create, base_bdev_name), spdk_json_decode_string},
    {"name", offsetof(struct rpc_my_create, name), spdk_json_decode_string},
};

static void rpc_my_create(struct spdk_jsonrpc_request *request,
                          const struct spdk_json_val *params)
{
    struct rpc_my_create req = {NULL};
    struct spdk_json_write_ctx *w;
    int rc;

    if (spdk_json_decode_object(params, rpc_my_create_decoders,
                                SPDK_COUNTOF(rpc_my_create_decoders), &req)) {
        /* malformed input is the caller's error: report invalid params */
        spdk_jsonrpc_send_error_response(request,
                                         SPDK_JSONRPC_ERROR_INVALID_PARAMS,
                                         "spdk_json_decode_object failed");
        goto cleanup;
    }

    /* call into the bdev create function */
    rc = my_bdev_create(req.base_bdev_name, req.name);
    if (rc != 0) {
        spdk_jsonrpc_send_error_response(request, rc, spdk_strerror(-rc));
        goto cleanup;
    }

    /* success: reply with the name of the new bdev */
    w = spdk_jsonrpc_begin_result(request);
    spdk_json_write_string(w, req.name);
    spdk_jsonrpc_end_result(request, w);

cleanup:
    /* the decoder allocated these strings */
    free(req.base_bdev_name);
    free(req.name);
}
SPDK_RPC_REGISTER("my_bdev_create", rpc_my_create, SPDK_RPC_RUNTIME)

What if you forget it? If your bdevs are statically configured in a config file and you have no runtime API, you don't need RPC at all. But if the user expects to be able to create a bdev over JSON-RPC (the normal SPDK pattern), and you haven't registered an RPC, they'll get a "method not found" error.
The most common bug: the RPC handler
validates the input, but the actual create function
doesn't, so users get generic error codes from deep in
the framework instead of useful diagnostics. The fix is
to validate early and fail with descriptive
SPDK_JSONRPC_ERROR_INVALID_PARAMS.
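Early validation can live in one small function that both the RPC handler and the create function call. The sketch below is a hypothetical helper (`demo_validate_create()` and `struct demo_create_req` are not SPDK APIs), and the specific rules it checks are illustrative assumptions; a real module would add its own.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical decoded request, mirroring struct rpc_my_create above. */
struct demo_create_req {
	const char *base_bdev_name;
	const char *name;
};

/* Returns 0 if usable, else -EINVAL with *why set to a message for the client. */
static int demo_validate_create(const struct demo_create_req *req, const char **why)
{
	assert(req != NULL && why != NULL);
	if (req->name == NULL || req->name[0] == '\0') {
		*why = "name must be a non-empty string";
		return -EINVAL;
	}
	if (req->base_bdev_name == NULL || req->base_bdev_name[0] == '\0') {
		*why = "base_bdev_name must be a non-empty string";
		return -EINVAL;
	}
	if (strcmp(req->name, req->base_bdev_name) == 0) {
		*why = "name must differ from base_bdev_name";
		return -EINVAL;
	}
	*why = NULL;
	return 0;
}
```

In the RPC handler, a non-zero return would be sent back as an invalid-params error with `*why` as the message, so the client sees which field was wrong.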
The portability checklist
Take this checklist to any bdev module in the SPDK tree and you'll know what's there and what's missing in under a minute:
| Item | How to check | Smell if wrong |
|---|---|---|
| module_init | grep module_init module/bdev/<name>/*.c | Empty body, no spdk_io_device_register |
| module_fini | grep module_fini module/bdev/<name>/*.c | Missing or just returns 0 |
| get_ctx_size | Look in spdk_bdev_module struct | Returns 0 but uses driver_ctx |
| destruct | Look in spdk_bdev_fn_table | Leaks memory or returns 1 without calling spdk_bdev_destruct_done |
| submit_request | Look in spdk_bdev_fn_table | Default case does nothing |
| io_type_supported | Look in spdk_bdev_fn_table | Returns true for everything, or doesn't match the switch in submit_request |
| get_io_channel | Look in spdk_bdev_fn_table | NULL pointer; no io_device registered |
| Channel create/destroy | Look for spdk_io_device_register | Missing destroy_cb (leaks thread state) |
| Buffer management | Search for spdk_bdev_io_get_buf in submit path | NULL iov when forwarding reads |
| write_config_json / config_json | Look in fn_table or module struct | save_config drops this bdev |
| JSON-RPC | Look for SPDK_RPC_REGISTER in <name>_rpc.c | "method not found" errors |
| examine_config (vbdev only) | Look in spdk_bdev_module | Module never claims a base bdev |
Edge cases: what breaks in production
The async destruct hang
If your destruct returns 1 (the
async signal), the framework waits for
spdk_bdev_destruct_done() to be called. If you
never call it, the application hangs at shutdown. Worse,
the hang only shows up on clean shutdown — production
processes that get SIGKILL'd never notice. The discipline:
if you return 1, you must call
spdk_bdev_destruct_done() on every code path,
including the error paths.
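The contract can be modelled with a small stand-in: destruct returns 1 while work is outstanding, and the last piece of work to drain signals completion, whether it succeeded or failed. `struct demo_bdev`, `demo_destruct_done()` and `demo_on_io_drained()` are hypothetical names for illustration; in a real module the signal is spdk_bdev_destruct_done().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical module state; destruct_done models what the framework waits on. */
struct demo_bdev {
	bool destruct_done;   /* set once teardown has been signalled */
	int  destruct_status; /* status passed along with the signal */
	int  pending_io;      /* module-private work still in flight */
};

/* Stand-in for the framework's "async destruct finished" call. */
static void demo_destruct_done(struct demo_bdev *b, int status)
{
	assert(!b->destruct_done); /* must be signalled exactly once */
	b->destruct_done = true;
	b->destruct_status = status;
}

/* Returns 0 when teardown finished synchronously, 1 when it finishes later. */
static int demo_destruct(void *ctx)
{
	struct demo_bdev *b = ctx;

	if (b->pending_io == 0) {
		return 0;
	}
	return 1; /* async: demo_on_io_drained() must signal completion */
}

/* Called as each outstanding request finishes, on success and on error. */
static void demo_on_io_drained(struct demo_bdev *b, int status)
{
	assert(b->pending_io > 0);
	if (--b->pending_io == 0) {
		demo_destruct_done(b, status);
	}
}
```

Because the signal is sent from the drain path regardless of status, an error on the last request cannot leave the framework waiting.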
io_type_supported lying
Returning true for a type you don't actually
handle in submit_request looks fine at register
time (the bitmap fills in), but the I/O fails at submit
time with no diagnostic. The fix is to keep
io_type_supported and
submit_request in lockstep — if you add a new
I/O type, update both in the same commit.
The destruct order
For a virtual bdev, the destruct sequence is rigid: remove
from your global list, release the claim, close the base
desc, unregister the io_device. Get the order wrong and you
can deadlock, double-free, or leak. The passthru module's
vbdev_passthru_destruct() at
is the canonical example; the next page walks it line by line.
Missing write_config_json
The framework won't complain at runtime if
write_config_json is NULL. But
save_config produces a file that doesn't
include your bdev, and the next time you replay it the
bdev is gone. Always implement this; always.
Two bdevs with the same name
spdk_bdev_register() returns
-EEXIST. You have to clean up the
half-registered bdev on the failure path (free the name
string, unregister the io_device). The passthru module's
vbdev_passthru_register() does this carefully
at module/bdev/passthru/vbdev_passthru.c:589.
Channel create callback failing partway
Treat your create callback as all-or-nothing. If it
allocates state and then fails partway, returning an error
makes spdk_get_io_channel() return NULL, and the destroy
callback is not invoked for that channel, so create has to
release whatever it already acquired before it returns. It
is still worth making destroy tolerate partially-initialized
fields. A common bug is to unconditionally
spdk_put_io_channel() on a base channel that was never
obtained in the first place.
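A defensive pattern that holds up either way is to have create unwind its own partial work and to have destroy accept NULL fields. The sketch below uses hypothetical names (`struct demo_channel`, `demo_ch_create()`, `demo_ch_destroy()`) and plain malloc in place of real channel resources; `g_demo_fail_scratch` is a test hook added only to force the failure path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-thread channel state with two resources. */
struct demo_channel {
	void *queue;
	void *scratch;
};

static bool g_demo_fail_scratch; /* test hook: force the second allocation to fail */

/* All-or-nothing create: on failure, release what was already acquired. */
static int demo_ch_create(struct demo_channel *ch)
{
	ch->queue = NULL;
	ch->scratch = NULL;

	ch->queue = malloc(64);
	if (ch->queue == NULL) {
		goto err;
	}
	ch->scratch = g_demo_fail_scratch ? NULL : malloc(64);
	if (ch->scratch == NULL) {
		goto err;
	}
	return 0;
err:
	free(ch->scratch); /* free(NULL) is a no-op */
	free(ch->queue);
	ch->scratch = NULL;
	ch->queue = NULL;
	return -1;
}

/* Destroy tolerates a channel whose fields are NULL. */
static void demo_ch_destroy(struct demo_channel *ch)
{
	free(ch->scratch);
	free(ch->queue);
	ch->scratch = NULL;
	ch->queue = NULL;
}
```

Resetting the pointers to NULL after freeing means a destroy call following a failed create is harmless rather than a double free.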