Layer 8 · Write a bdev module

The interface, distilled.

The bdev framework gives you a struct of function pointers and asks you to fill it in. Most bdev modules look complicated, but they all answer the same seven questions: how do I shut down, how do I handle I/O, what I/O types do I accept, how do I get per-thread state, how do I manage DMA buffers, how do I save my config, and how do I read it back. Get those seven right and you have a working bdev. This page is the distilled reference. The next two pages go deep on a real module and on the build glue.

~15 min read · 1 diagram · prerequisites: 4.1 · 4.2
On this page
  The shape of the contract
  1. destruct — teardown
  2. submit_request — the hot path
  3. io_type_supported — capability check
  4. get_io_channel / cleanup_io_channel — per-thread state
  5. get_buf_ctx / put_buf_ctx — DMA buffer management
  6. write_config_json / read_config_json — persistence
  7. config_text / parse_param — runtime config
  The portability checklist
  Edge cases: what breaks in production

The shape of the contract

A bdev module is a struct spdk_bdev_module (a single static instance per module) that carries lifecycle, plus a struct spdk_bdev_fn_table (usually one per bdev type, shared by every bdev of that type) that carries the per-instance hot path. The seven concerns below split across those two structs, plus code you write yourself, as follows:

| # | Concept | In the v26.01 source | Required? |
|---|---------|----------------------|-----------|
| 1 | Teardown | fn_table->destruct | Yes |
| 2 | Hot path | fn_table->submit_request | Yes |
| 3 | Capability check | fn_table->io_type_supported | Yes |
| 4 | Per-thread state | fn_table->get_io_channel + spdk_io_device_register | Yes |
| 5 | DMA buffer management | spdk_bdev_io_get_buf() in your submit path; the framework owns the buffer pools | For reads |
| 6 | Persistence | fn_table->write_config_json or module->config_json | Strongly recommended |
| 7 | Runtime config | JSON-RPC handlers via SPDK_RPC_REGISTER + module->examine_config | For bdevs that need runtime config |

Two of those seven don't have a function pointer at all: buffer management and runtime config are handled by the framework or by RPC handlers you write yourself. The other five are real function pointers. Don't count on the framework to catch a missing one when you register the bdev; depending on which pointer is NULL, you get a crash the first time it is called or behavior that is silently wrong.
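A missing required pointer tends to show up only when it is first called, so it can be worth checking your table in a unit test. The sketch below is a standalone model, not SPDK code: struct model_fn_table mirrors five spdk_bdev_fn_table members with the SPDK types replaced by void *, and model_missing_required() is a hypothetical helper that reports which of the four required pointers are NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of five spdk_bdev_fn_table members. Opaque SPDK types are
   replaced by void * so this compiles on its own; not SPDK's definition. */
struct model_fn_table {
    int   (*destruct)(void *ctx);
    void  (*submit_request)(void *ch, void *bdev_io);
    bool  (*io_type_supported)(void *ctx, int io_type);
    void *(*get_io_channel)(void *ctx);
    void  (*write_config_json)(void *bdev, void *w);   /* recommended only */
};

enum {
    MISSING_DESTRUCT          = 1u << 0,
    MISSING_SUBMIT_REQUEST    = 1u << 1,
    MISSING_IO_TYPE_SUPPORTED = 1u << 2,
    MISSING_GET_IO_CHANNEL    = 1u << 3,
};

/* Hypothetical self-check: bitmask of required pointers that are NULL.
   write_config_json is only recommended, so it is not counted. */
static unsigned model_missing_required(const struct model_fn_table *t)
{
    unsigned missing = 0;

    assert(t != NULL);
    if (t->destruct == NULL)          { missing |= MISSING_DESTRUCT; }
    if (t->submit_request == NULL)    { missing |= MISSING_SUBMIT_REQUEST; }
    if (t->io_type_supported == NULL) { missing |= MISSING_IO_TYPE_SUPPORTED; }
    if (t->get_io_channel == NULL)    { missing |= MISSING_GET_IO_CHANNEL; }
    return missing;
}

/* Example: a table that forgot get_io_channel. */
static int  ex_destruct(void *ctx)            { (void)ctx; return 0; }
static void ex_submit(void *ch, void *io)     { (void)ch; (void)io; }
static bool ex_supported(void *ctx, int type) { (void)ctx; (void)type; return false; }

static const struct model_fn_table g_example_table = {
    .destruct          = ex_destruct,
    .submit_request    = ex_submit,
    .io_type_supported = ex_supported,
};
```

Against a real module, the same NULL checks work on your actual spdk_bdev_fn_table; only the member types differ.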

flowchart TB
Module[spdk_bdev_module struct] --> M1[1. module_init / module_fini]
Module --> M2[2. config_json]
Module --> M3[3. examine_config]
Module --> M4[4. get_ctx_size]

FnTable[spdk_bdev_fn_table struct] --> F1[1. destruct]
FnTable --> F2[2. submit_request]
FnTable --> F3[3. io_type_supported]
FnTable --> F4[4. get_io_channel]
FnTable --> F5[5. write_config_json]
FnTable --> F6[6. dump_info_json]
FnTable --> F7[7. get_memory_domains]

RPC[vbdev_passthru_rpc.c] --> R1[SPDK_RPC_REGISTER 'bdev_passthru_create']
RPC --> R2[SPDK_RPC_REGISTER 'bdev_passthru_delete']

Module -.->|registered via| Ctor[SPDK_BDEV_MODULE_REGISTER macro]
Ctor -.->|constructor| Init[runs before main]
fig. 1 — the seven concerns, mapped to the framework

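fig. 1   Five of the seven concerns are function pointers in spdk_bdev_fn_table; the rest live in the spdk_bdev_module struct or in your RPC file. The module struct and the RPC methods are registered before main() runs, through constructor-based macros (SPDK_BDEV_MODULE_REGISTER, SPDK_RPC_REGISTER); the fn_table reaches the framework when you call spdk_bdev_register().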

1. destruct — teardown

What it is: the framework calls this when your bdev is being torn down. The trigger is either spdk_bdev_unregister() (you or your RPC handler called it) or a hot-remove event from the base bdev (for a vbdev). By the time destruct runs, the framework has notified every open descriptor and all of them have been closed; you're being called to release module-private state.

Minimal implementation: in the simplest case you do free(ctx) and return 0. For a vbdev, you follow the exact teardown sequence the passthru module uses (covered on the next page).

/* minimal — leaf module */
static int my_destruct(void *ctx)
{
    struct my_bdev *b = ctx;
    free(b->name);
    free(b);
    return 0;
}

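What if you forget it? the framework calls fn_table->destruct during unregister without checking for NULL, so a missing pointer crashes the process the first time the bdev is unregistered, which usually means at shutdown. A stub that returns 0 without freeing anything avoids the crash but leaks your per-bdev state.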

The most common bug: returning 1 (the async signal) and then never calling spdk_bdev_destruct_done(). The framework silently waits forever and your application hangs on shutdown. This is one of the few "hang on exit" bugs in SPDK, and the fix is one line. See include/spdk/bdev_module.h:1242 for the matching API.
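The sync/async split can be modeled in a few lines. This is a standalone sketch, not SPDK's implementation: model_unregister() and model_destruct_done() are hypothetical stand-ins for the framework's unregister path and spdk_bdev_destruct_done(), and treating any return value <= 0 as "finished now" is an assumption about the framework's check (the text above only pins down that 1 means async).

```c
#include <assert.h>
#include <stdbool.h>

struct model_bdev {
    int  (*destruct)(struct model_bdev *bdev);
    bool unregister_cb_fired;   /* stands in for the caller's unregister callback */
    int  unregister_rc;
};

static void model_fire_unregister_cb(struct model_bdev *bdev, int rc)
{
    assert(!bdev->unregister_cb_fired);   /* firing twice would be a bug */
    bdev->unregister_cb_fired = true;
    bdev->unregister_rc = rc;
}

/* Stand-in for spdk_bdev_destruct_done(): finishes an async destruct. */
static void model_destruct_done(struct model_bdev *bdev, int bdeverrno)
{
    model_fire_unregister_cb(bdev, bdeverrno);
}

/* Stand-in for the framework's unregister path. */
static void model_unregister(struct model_bdev *bdev)
{
    int rc = bdev->destruct(bdev);

    if (rc <= 0) {
        model_fire_unregister_cb(bdev, rc);   /* synchronous: done now */
    }
    /* rc > 0: async; nothing fires until model_destruct_done() */
}

static int sync_destruct(struct model_bdev *bdev)  { (void)bdev; return 0; }
static int async_destruct(struct model_bdev *bdev) { (void)bdev; return 1; }
```

If async_destruct never arranges a later model_destruct_done() call, unregister_cb_fired stays false forever, which is the shutdown hang described above.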

2. submit_request — the hot path

What it is: every I/O submitted to your bdev lands here. This is the function pointer that makes your module a real bdev and not just a config entry. The framework has already done the heavy lifting (allocation, splitting, alignment checks). You dispatch.

Minimal implementation: a switch on bdev_io->type that handles each I/O type the bdev supports and fails the rest. For most I/O types, you either complete immediately (reset, flush) or kick off async work (read, write, unmap).

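/* minimal — in-memory module; my_buf (backing memory) and BLOCK
   (block size) are this module's own globals */
static void my_copy(struct spdk_bdev_io *bdev_io, bool to_iovs)
{
    char *p = my_buf + bdev_io->u.bdev.offset_blocks * BLOCK;

    /* the payload may span several iovecs: walk all of them */
    for (int i = 0; i < bdev_io->u.bdev.iovcnt; i++) {
        struct iovec *iov = &bdev_io->u.bdev.iovs[i];

        if (to_iovs) {
            memcpy(iov->iov_base, p, iov->iov_len);
        } else {
            memcpy(p, iov->iov_base, iov->iov_len);
        }
        p += iov->iov_len;
    }
}

static void my_submit_request(struct spdk_io_channel *ch,
                              struct spdk_bdev_io *bdev_io)
{
    switch (bdev_io->type) {
    case SPDK_BDEV_IO_TYPE_READ:
        if (bdev_io->u.bdev.iovs[0].iov_base == NULL) {
            /* no buffer attached: point the iov at our own memory,
               as the malloc module does */
            bdev_io->u.bdev.iovs[0].iov_base =
                my_buf + bdev_io->u.bdev.offset_blocks * BLOCK;
            bdev_io->u.bdev.iovs[0].iov_len =
                bdev_io->u.bdev.num_blocks * BLOCK;
        } else {
            my_copy(bdev_io, true);    /* memcpy into iovs */
        }
        spdk_bdev_io_complete(bdev_io, SPDK_BDEV_IO_STATUS_SUCCESS);
        return;
    case SPDK_BDEV_IO_TYPE_WRITE:
        my_copy(bdev_io, false);       /* memcpy out of iovs */
        spdk_bdev_io_complete(bdev_io, SPDK_BDEV_IO_STATUS_SUCCESS);
        return;
    case SPDK_BDEV_IO_TYPE_FLUSH:
    case SPDK_BDEV_IO_TYPE_RESET:
        spdk_bdev_io_complete(bdev_io, SPDK_BDEV_IO_STATUS_SUCCESS);
        return;
    default:
        spdk_bdev_io_complete(bdev_io, SPDK_BDEV_IO_STATUS_FAILED);
        return;
    }
}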

What if you forget it? the framework dereferences a NULL pointer and segfaults. This is the most abrupt failure mode in the framework: the bdev appears to register fine, then the first I/O crashes the process.

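The most common bug: completing the I/O with the wrong status, completing it twice, or not completing it at all. submit_request returns void, so the framework cannot tell when you forget: if you return without calling spdk_bdev_io_complete() and without queuing async work that will call it later, the I/O simply stays outstanding forever and the caller's completion callback never fires. If you call spdk_bdev_io_complete() twice, the caller's callback can run twice and the bdev_io can go back to its pool twice, so a later I/O ends up using memory that is still in use. The fix is the discipline: exactly one call to complete per I/O, on every code path.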
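Both rules (walk every iovec, complete exactly once) can be exercised without SPDK. The sketch below is a standalone model: struct model_io stands in for spdk_bdev_io, g_disk for the module's backing memory, and the completion counter asserts that no I/O is completed twice. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>   /* struct iovec */

#define MODEL_BLOCK  512u
#define MODEL_BLOCKS 8u

enum model_io_type   { MODEL_IO_READ, MODEL_IO_WRITE, MODEL_IO_FLUSH, MODEL_IO_UNMAP };
enum model_io_status { MODEL_IO_PENDING, MODEL_IO_SUCCESS, MODEL_IO_FAILED };

/* Stands in for spdk_bdev_io: only the fields this submit path reads. */
struct model_io {
    enum model_io_type   type;
    struct iovec        *iovs;
    int                  iovcnt;
    uint64_t             offset_blocks;
    enum model_io_status status;
    int                  completions;   /* must end up exactly 1 */
};

static uint8_t g_disk[MODEL_BLOCK * MODEL_BLOCKS];

static void model_complete(struct model_io *io, enum model_io_status st)
{
    io->completions++;
    assert(io->completions == 1);   /* a second completion is a bug */
    io->status = st;
}

/* Copy between the disk and every iovec, not just iovs[0]. */
static void model_copy(struct model_io *io, int to_iovs)
{
    size_t off = (size_t)io->offset_blocks * MODEL_BLOCK;

    for (int i = 0; i < io->iovcnt; i++) {
        size_t len = io->iovs[i].iov_len;

        /* the bdev layer range-checks before submit; the model asserts */
        assert(off + len <= sizeof(g_disk));
        if (to_iovs) {
            memcpy(io->iovs[i].iov_base, g_disk + off, len);
        } else {
            memcpy(g_disk + off, io->iovs[i].iov_base, len);
        }
        off += len;
    }
}

static void model_submit(struct model_io *io)
{
    switch (io->type) {
    case MODEL_IO_READ:
        model_copy(io, 1);
        model_complete(io, MODEL_IO_SUCCESS);
        return;
    case MODEL_IO_WRITE:
        model_copy(io, 0);
        model_complete(io, MODEL_IO_SUCCESS);
        return;
    case MODEL_IO_FLUSH:
        model_complete(io, MODEL_IO_SUCCESS);
        return;
    default:   /* unsupported types still complete, exactly once */
        model_complete(io, MODEL_IO_FAILED);
        return;
    }
}
```

The default branch matters as much as the others: it is the path an unsupported type takes, and it still completes the I/O exactly once.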

3. io_type_supported — capability check

What it is: the framework and upper layers call this whenever they need to know whether your bdev accepts a given I/O type, so they can avoid submitting (or emulate) the types you don't handle instead of trying and failing. For virtual bdevs, the typical pattern is to delegate to the base bdev.

Minimal implementation: for a passthrough module, delegate. For a leaf module, hardcode the types you support.

/* minimal — leaf module supporting read/write/flush/reset */
static bool my_io_type_supported(void *ctx, enum spdk_bdev_io_type type)
{
    switch (type) {
    case SPDK_BDEV_IO_TYPE_READ:
    case SPDK_BDEV_IO_TYPE_WRITE:
    case SPDK_BDEV_IO_TYPE_FLUSH:
    case SPDK_BDEV_IO_TYPE_RESET:
        return true;
    default:
        return false;
    }
}
/* typical — virtual module, delegate to base */
static bool my_vbdev_io_type_supported(void *ctx,
                                        enum spdk_bdev_io_type type)
{
    struct my_vbdev *v = ctx;
    return spdk_bdev_io_type_supported(v->base_bdev, type);
}

What if you forget it? the framework dereferences NULL. If you return the wrong answer (e.g. true for everything), the upper layer will submit I/O types your module can't handle, and the failure shows up as a generic SPDK_BDEV_IO_STATUS_FAILED with no clue as to why.

The most common bug: returning true for SPDK_BDEV_IO_TYPE_WRITE_ZEROES but never actually handling the case in submit_request. The framework will pass it through to your switch, which falls to the default and completes it as FAILED. If you return false instead, the bdev layer can emulate write-zeroes with ordinary writes, so an honest false is the better answer. The two function pointers must agree.
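One way to keep io_type_supported and submit_request from drifting apart is to drive both from a single handler table. This is a standalone sketch of that design choice, not how SPDK's in-tree modules are written (they typically use a switch in each function); the enum values and handlers are hypothetical stand-ins for enum spdk_bdev_io_type and your real I/O paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_io_type {
    MODEL_IO_READ, MODEL_IO_WRITE, MODEL_IO_FLUSH, MODEL_IO_WRITE_ZEROES,
    MODEL_IO_NUM_TYPES
};

typedef int (*model_handler)(void *io);   /* returns 0 on success */

static int handle_rw(void *io)    { (void)io; return 0; }
static int handle_flush(void *io) { (void)io; return 0; }

/* Add a type here and it becomes both "supported" and "handled". */
static const model_handler g_handlers[MODEL_IO_NUM_TYPES] = {
    [MODEL_IO_READ]  = handle_rw,
    [MODEL_IO_WRITE] = handle_rw,
    [MODEL_IO_FLUSH] = handle_flush,
    /* MODEL_IO_WRITE_ZEROES deliberately absent */
};

static bool model_io_type_supported(enum model_io_type type)
{
    return (unsigned)type < (unsigned)MODEL_IO_NUM_TYPES &&
           g_handlers[type] != NULL;
}

/* Returns the handler's result, or -1 for "completed as FAILED". */
static int model_submit(enum model_io_type type, void *io)
{
    if (!model_io_type_supported(type)) {
        return -1;
    }
    return g_handlers[type](io);
}
```

Leaving WRITE_ZEROES out of the table makes the capability check answer false, which is the honest answer the paragraph above asks for.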

4. get_io_channel / cleanup_io_channel — per-thread state

What it is: the framework calls this to get a per-thread spdk_io_channel for your bdev. The thread library creates that channel once per thread (reference-counted, so repeated requests on the same thread share it) and runs your create callback to initialize it; the framework then hands the same channel to your submit_request on every subsequent I/O from that thread. The cleanup counterpart is the channel-destroy callback you register alongside your io_device.

Minimal implementation: in almost every module, this is one line. You registered an io_device earlier (in module_init, or per bdev at creation time as the passthru module does), and now you ask the framework for a channel of it.

/* minimal — passthru to your io_device */
static struct spdk_io_channel *my_get_io_channel(void *ctx)
{
    return spdk_get_io_channel(&my_io_device);
}

The "cleanup" side is set up when you register the io_device, not at get_io_channel time:

/* the create callback: per-thread setup */
static int my_ch_create_cb(void *io_device, void *ctx_buf)
{
    struct my_channel *ch = ctx_buf;
    ch->state = 0;
    return 0;
}

/* the destroy callback: per-thread teardown */
static void my_ch_destroy_cb(void *io_device, void *ctx_buf)
{
    struct my_channel *ch = ctx_buf;
    (void)ch; /* free anything you allocated in create_cb */
}

/* at module init: tell the framework how to create and destroy
   your per-thread state (callbacks are defined above, so no
   forward declarations are needed) */
static int my_module_init(void)
{
    spdk_io_device_register(&my_io_device,
                            my_ch_create_cb,    /* called once per thread */
                            my_ch_destroy_cb,   /* called once per thread */
                            sizeof(struct my_channel),
                            "my_module");
    return 0;
}

What if you forget it? NULL pointer in the vtable. The bdev layer calls bdev->fn_table->get_io_channel(bdev->ctxt) the first time a thread asks for a channel to your bdev (via spdk_bdev_get_io_channel()), and segfaults there.

The most common bug: putting a global mutex or a global list into the channel struct. Channels are per-thread by design; the whole point is that two threads submitting to the same bdev don't share state. If you put a global lock inside the channel, you've killed the bdev's scalability on a multi-core box. Channel state should be either thread-local data (poller, accel channel, a per-thread queue) or pointer-handles to per-bdev resources you set up at register time.
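The get/put lifecycle described above can be modeled as a reference-counted channel per thread. This standalone sketch is not SPDK's thread library: thread ids are plain ints, there is a single io_device, and the final put runs destroy synchronously, whereas the real library may run it later on the owning thread. my_ch_create_cb and my_ch_destroy_cb use the same signatures as the callbacks above; everything prefixed model_ is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_MAX_THREADS 8

struct my_channel { int state; };    /* your per-thread context */

struct model_channel {
    int               ref;
    struct my_channel ctx;
};

static int g_my_io_device;           /* any unique address works */
static struct model_channel *g_chans[MODEL_MAX_THREADS];
static int g_creates, g_destroys;

static int my_ch_create_cb(void *io_device, void *ctx_buf)
{
    struct my_channel *ch = ctx_buf;

    (void)io_device;
    ch->state = 0;
    g_creates++;
    return 0;
}

static void my_ch_destroy_cb(void *io_device, void *ctx_buf)
{
    (void)io_device;
    (void)ctx_buf;
    g_destroys++;
}

/* First get on a thread creates the channel; later gets share it. */
static struct my_channel *model_get_io_channel(int thread)
{
    assert(thread >= 0 && thread < MODEL_MAX_THREADS);
    if (g_chans[thread] == NULL) {
        struct model_channel *c = calloc(1, sizeof(*c));

        if (c == NULL || my_ch_create_cb(&g_my_io_device, &c->ctx) != 0) {
            free(c);                 /* create failed: nothing to destroy */
            return NULL;
        }
        g_chans[thread] = c;
    }
    g_chans[thread]->ref++;
    return &g_chans[thread]->ctx;
}

/* Last put on a thread runs destroy and frees the channel. */
static void model_put_io_channel(int thread)
{
    struct model_channel *c;

    assert(thread >= 0 && thread < MODEL_MAX_THREADS);
    c = g_chans[thread];
    assert(c != NULL && c->ref > 0);
    if (--c->ref == 0) {
        my_ch_destroy_cb(&g_my_io_device, &c->ctx);
        free(c);
        g_chans[thread] = NULL;
    }
}
```

Because each thread gets its own my_channel, nothing in it needs a lock; anything that does need one belongs in per-bdev state, not in the channel.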

5. get_buf_ctx / put_buf_ctx — DMA buffer management

What it is: this is the conceptual name for "how do you give the bdev_io a buffer to read into or write from." In the v26.01 framework, this is handled by the spdk_bdev_io_get_buf() API in bdev_module.h:1340. Your module calls it to lazily attach a buffer when one is needed. The framework decides whether to take one from its small or large buffer pool, or to skip allocation entirely because the iov is already populated and aligned.

Minimal implementation: a leaf module still has to cope with reads that arrive without a buffer. Some leaf modules call spdk_bdev_io_get_buf() on READ (the aio module does); others, like malloc, point a NULL iov_base at their own memory. A virtual bdev almost always calls spdk_bdev_io_get_buf() before forwarding a read down to its base. The pattern is:

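/* typical — vbdev needs a buffer to read into (inside submit_request) */
case SPDK_BDEV_IO_TYPE_READ:
    spdk_bdev_io_get_buf(bdev_io, my_get_buf_cb,
                         bdev_io->u.bdev.num_blocks * bdev_io->bdev->blocklen);
    break;

/* the callback when the buffer is ready; it may run inline,
   before spdk_bdev_io_get_buf() returns */
static void my_get_buf_cb(struct spdk_io_channel *ch,
                          struct spdk_bdev_io *bdev_io, bool success)
{
    /* my_vbdev and my_channel are this module's own types */
    struct my_vbdev *v = SPDK_CONTAINEROF(bdev_io->bdev, struct my_vbdev, bdev);
    struct my_channel *my_ch = spdk_io_channel_get_ctx(ch);
    int rc;

    if (!success) {
        spdk_bdev_io_complete(bdev_io, SPDK_BDEV_IO_STATUS_FAILED);
        return;
    }
    /* now bdev_io->u.bdev.iovs points at a real buffer;
       forward to the base */
    rc = spdk_bdev_readv_blocks(v->base_desc, my_ch->base_ch,
                                bdev_io->u.bdev.iovs, bdev_io->u.bdev.iovcnt,
                                bdev_io->u.bdev.offset_blocks,
                                bdev_io->u.bdev.num_blocks,
                                my_complete_cb, bdev_io);
    if (rc != 0) {
        /* NOMEM tells the bdev layer to retry this I/O later */
        spdk_bdev_io_complete(bdev_io, rc == -ENOMEM ?
                              SPDK_BDEV_IO_STATUS_NOMEM :
                              SPDK_BDEV_IO_STATUS_FAILED);
    }
}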

The conceptual "put_buf_ctx" step is implicit: for an I/O that had spdk_bdev_io_get_buf() called for it, the framework returns the buffer to the pool for you once the I/O has completed and been released. You don't explicitly put it.

What if you forget it? for a leaf module that serves reads from its own memory and handles a NULL iov_base itself, you don't need it at all. For a virtual bdev that forwards reads, forgetting it means you can pass the base bdev an iovec whose iov_base is NULL, and the read either fails or never lands its data in a buffer your caller can see.

The most common bug: calling spdk_bdev_io_get_buf() on an I/O type that doesn't need it (e.g. WRITE, where the caller supplied the buffer). When the supplied iovs already meet the bdev's alignment requirement, the framework just invokes your callback immediately, so the call adds a hop and buys nothing. The right discipline: only call spdk_bdev_io_get_buf() on READ.
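The lazy-buffer contract can be modeled without SPDK. In this standalone sketch, model_get_buf() stands in for spdk_bdev_io_get_buf(): if the I/O already has a buffer it calls back immediately, inline, and otherwise it borrows one from a tiny pool that model_complete() gives back. The pool, the names, and the failure on an empty pool are simplifications; the real framework also checks alignment and waits for a free buffer instead of failing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/uio.h>

#define MODEL_POOL_SIZE 2
#define MODEL_BUF_LEN   4096

static char g_pool_mem[MODEL_POOL_SIZE][MODEL_BUF_LEN];
static bool g_pool_used[MODEL_POOL_SIZE];

struct model_io {
    struct iovec iov;        /* single iovec for simplicity */
    int          pool_slot;  /* -1 if the caller supplied the buffer */
};

typedef void (*model_get_buf_cb)(struct model_io *io, bool success);

static void model_get_buf(struct model_io *io, model_get_buf_cb cb, size_t len)
{
    assert(io != NULL && cb != NULL);
    if (io->iov.iov_base != NULL) {      /* caller already supplied a buffer */
        cb(io, true);                    /* runs inline, before we return */
        return;
    }
    if (len > MODEL_BUF_LEN) {           /* larger than any pool buffer */
        cb(io, false);
        return;
    }
    for (int i = 0; i < MODEL_POOL_SIZE; i++) {
        if (!g_pool_used[i]) {
            g_pool_used[i] = true;
            io->pool_slot = i;
            io->iov.iov_base = g_pool_mem[i];
            io->iov.iov_len = len;
            cb(io, true);
            return;
        }
    }
    cb(io, false);   /* pool exhausted: SPDK would wait instead */
}

/* Completing gives a borrowed buffer back to the pool automatically. */
static void model_complete(struct model_io *io)
{
    if (io->pool_slot >= 0) {
        g_pool_used[io->pool_slot] = false;
        io->pool_slot = -1;
        io->iov.iov_base = NULL;
    }
}

static int g_cb_calls;
static void count_cb(struct model_io *io, bool success)
{
    (void)io;
    if (success) {
        g_cb_calls++;
    }
}
```

The inline callback is worth internalizing: your get_buf callback can run before spdk_bdev_io_get_buf() returns, so code after that call must not assume the callback is still pending.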

6. write_config_json / read_config_json — persistence

What it is: the framework calls write_config_json (or module->config_json) to ask your bdev to write the JSON-RPC that would recreate it. This is what powers the save_config RPC (rpc.py save_config): the framework walks every bdev, asks it to write its "constructor" RPC, and the result is a JSON file you can feed back to the framework later. The "read" side is just the JSON-RPC handler you wrote for creation; the framework calls that RPC when it replays the file during init.

Emit the create RPC from one of these, not both (otherwise save_config records each bdev twice); choose whichever fits your creation model. The malloc module uses fn_table->write_config_json because each bdev has its own construction RPC. The passthru module uses module->config_json because the module walks a global list of passthru bdevs and emits a create RPC for each.

Minimal implementation: emit the same fields your create RPC accepts.

/* typical — passthru style, in module->config_json */
static int my_config_json(struct spdk_json_write_ctx *w)
{
    struct my_bdev *b;
    TAILQ_FOREACH(b, &g_my_bdevs, link) {
        spdk_json_write_object_begin(w);
        spdk_json_write_named_string(w, "method", "my_bdev_create");
        spdk_json_write_named_object_begin(w, "params");
        spdk_json_write_named_string(w, "base_bdev_name", b->base_name);
        spdk_json_write_named_string(w, "name", b->name);
        spdk_json_write_object_end(w);
        spdk_json_write_object_end(w);
    }
    return 0;
}

What if you forget it? save_config will produce a JSON file that doesn't include your bdev. When you restart the SPDK process and replay that file, your bdev won't be recreated. For one-off test setups this is fine; for anything that needs to survive a reboot it's a bug.

The most common bug: writing fields the create RPC doesn't accept, or vice versa. If you save "num_blocks": 1024 but your create RPC ignores it, your bdev comes back with the wrong size. The discipline: the create RPC and write_config_json are two views of the same struct; keep them in sync.
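Keeping the create RPC and the saved config in sync is easiest when both are generated from the same parameter struct. The sketch below is standalone C, not SPDK's JSON writer: my_emit_create_rpc() is a hypothetical helper that renders one save_config entry for the hypothetical my_bdev_create RPC from struct my_create_params, the same fields the create RPC decodes. It assumes the names need no JSON escaping.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One struct, two views: the create RPC decodes into it, and the
   saved-config entry is rendered from it. */
struct my_create_params {
    const char *base_bdev_name;
    const char *name;
};

/* Writes one save_config-style entry into buf; returns its length,
   or -1 if it did not fit. */
static int my_emit_create_rpc(const struct my_create_params *p,
                              char *buf, size_t len)
{
    int n;

    assert(p->base_bdev_name != NULL && p->name != NULL);
    n = snprintf(buf, len,
                 "{\"method\":\"my_bdev_create\",\"params\":"
                 "{\"base_bdev_name\":\"%s\",\"name\":\"%s\"}}",
                 p->base_bdev_name, p->name);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Adding a field then means touching one struct, one decoder entry, and one line here, which makes a missing half easy to spot in review.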

7. config_text / parse_param — runtime config

What it is: this is the conceptual name for "how does the user tell the bdev module how to configure itself at runtime." In v26.01, this is split across three mechanisms:

  1. JSON-RPC methods registered with SPDK_RPC_REGISTER in your vbdev_xxx_rpc.c file. The user invokes them with rpc.py or any JSON-RPC client.

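  2. JSON config files loaded at startup (for example with the --json option), which the app framework replays as the same RPC calls. Older SPDK releases also read INI-style sections such as [Passthru] from spdk.conf through spdk_conf_read, but that legacy config format has since been removed.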

  3. The examine_config callback that the framework calls for every bdev that gets registered. This is how a vbdev "discovers" a base bdev that was created in the same config.
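The examine flow can be modeled in a few lines. In SPDK the module's examine_config callback must call spdk_bdev_module_examine_done() whatever it decides, or the framework keeps waiting on that examine. This standalone sketch is not SPDK code: g_pending holds the base names of vbdevs requested before their base existed, my_examine_config() is offered each newly registered bdev, and model_examine_done() stands in for spdk_bdev_module_examine_done(). All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_MAX_PENDING 4

struct pending_vbdev {
    const char *base_name;
    bool        created;
};

static struct pending_vbdev g_pending[MODEL_MAX_PENDING];
static int g_npending;
static int g_examine_done_calls;   /* stands in for examine_done() */

static void model_examine_done(void) { g_examine_done_calls++; }

/* Remember a vbdev whose base bdev does not exist yet. */
static void my_add_pending(const char *base_name)
{
    assert(g_npending < MODEL_MAX_PENDING);
    g_pending[g_npending].base_name = base_name;
    g_pending[g_npending].created = false;
    g_npending++;
}

/* Called once per registered bdev: claim it if a pending vbdev wants
   it, and report examine done on every path either way. */
static void my_examine_config(const char *bdev_name)
{
    for (int i = 0; i < g_npending; i++) {
        if (!g_pending[i].created &&
            strcmp(g_pending[i].base_name, bdev_name) == 0) {
            g_pending[i].created = true;   /* claim + create the vbdev here */
            break;
        }
    }
    model_examine_done();
}
```

The single exit through model_examine_done() is the point: a bdev the module ignores still needs its examine reported as done.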

Minimal implementation: a function that takes a spdk_jsonrpc_request and a JSON params object, decodes them, and calls into the bdev create function.

#include "spdk/stdinc.h"
#include "spdk/rpc.h"
#include "spdk/string.h"   /* spdk_strerror */
#include "spdk/util.h"     /* SPDK_COUNTOF */

/* minimal — RPC for "create my bdev" */
struct rpc_my_create {
    char *base_bdev_name;
    char *name;
};

static const struct spdk_json_object_decoder rpc_my_create_decoders[] = {
    {"base_bdev_name", offsetof(struct rpc_my_create, base_bdev_name), spdk_json_decode_string},
    {"name",           offsetof(struct rpc_my_create, name),           spdk_json_decode_string},
};

static void rpc_my_create(struct spdk_jsonrpc_request *request,
                          const struct spdk_json_val *params)
{
    struct rpc_my_create req = {NULL};
    struct spdk_json_write_ctx *w;
    int rc;

    if (spdk_json_decode_object(params, rpc_my_create_decoders,
                                SPDK_COUNTOF(rpc_my_create_decoders), &req)) {
        spdk_jsonrpc_send_error_response(request,
            SPDK_JSONRPC_ERROR_INVALID_PARAMS,
            "spdk_json_decode_object failed");
        goto cleanup;
    }

    /* call into the bdev create function (0 or a negative errno) */
    rc = my_bdev_create(req.base_bdev_name, req.name);
    if (rc != 0) {
        spdk_jsonrpc_send_error_response(request, rc, spdk_strerror(-rc));
        goto cleanup;
    }

    w = spdk_jsonrpc_begin_result(request);
    spdk_json_write_string(w, req.name);
    spdk_jsonrpc_end_result(request, w);

cleanup:
    free(req.base_bdev_name);
    free(req.name);
}
SPDK_RPC_REGISTER("my_bdev_create", rpc_my_create, SPDK_RPC_RUNTIME)

What if you forget it? a module whose bdevs are only ever created from its own code (for example in module_init) doesn't need an RPC at all. But JSON config files are replayed as RPC calls, so if users expect to create your bdev over JSON-RPC (the normal SPDK pattern) or from a saved config, and you haven't registered the method, they'll get a "method not found" error.

The most common bug: the RPC handler only decodes the input and leaves validation to the create function, so users get generic error codes from deep in the framework instead of useful diagnostics. The fix is to validate right after decoding and fail with SPDK_JSONRPC_ERROR_INVALID_PARAMS and a message that says what is wrong.
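Validation is cheapest right after decoding. The sketch below is a hypothetical my_validate_create() for the my_bdev_create RPC; the specific rules (non-empty names, a 64-character limit, a name that differs from the base) are examples, not SPDK requirements. It returns -32602, the JSON-RPC 2.0 "Invalid params" code that SPDK exposes as SPDK_JSONRPC_ERROR_INVALID_PARAMS, together with a message you could pass straight to spdk_jsonrpc_send_error_response().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_INVALID_PARAMS (-32602)   /* JSON-RPC 2.0 "Invalid params" */
#define MY_MAX_NAME_LEN      64         /* example limit, not an SPDK rule */

struct my_create_params {
    const char *base_bdev_name;
    const char *name;
};

/* Returns 0 if valid; otherwise MODEL_INVALID_PARAMS, with *msg set to
   a message that says exactly what is wrong. */
static int my_validate_create(const struct my_create_params *p, const char **msg)
{
    assert(p != NULL && msg != NULL);
    if (p->name == NULL || p->name[0] == '\0') {
        *msg = "name must be a non-empty string";
    } else if (strlen(p->name) > MY_MAX_NAME_LEN) {
        *msg = "name is longer than 64 characters";
    } else if (p->base_bdev_name == NULL || p->base_bdev_name[0] == '\0') {
        *msg = "base_bdev_name must be a non-empty string";
    } else if (strcmp(p->name, p->base_bdev_name) == 0) {
        *msg = "name must differ from base_bdev_name";
    } else {
        *msg = NULL;
        return 0;
    }
    return MODEL_INVALID_PARAMS;
}
```

In the RPC handler above, this would run between the decode and the call to my_bdev_create(), so the create function only ever sees inputs that already passed these checks.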

The portability checklist

Take this checklist to any bdev module in the SPDK tree and you'll know what's there and what's missing in under a minute:

| Item | How to check | Smell if wrong |
|------|--------------|----------------|
| module_init | grep module_init module/bdev/<name>/*.c | Empty body, no spdk_io_device_register |
| module_fini | grep module_fini module/bdev/<name>/*.c | Missing or empty even though module_init allocated global state |
| get_ctx_size | Look in spdk_bdev_module struct | Returns 0 but uses driver_ctx |
| destruct | Look in spdk_bdev_fn_table | Leaks memory, or returns 1 without calling spdk_bdev_destruct_done |
| submit_request | Look in spdk_bdev_fn_table | Default case does nothing (the I/O never completes) |
| io_type_supported | Look in spdk_bdev_fn_table | Returns true for everything, or doesn't match the switch in submit_request |
| get_io_channel | Look in spdk_bdev_fn_table | NULL pointer; no io_device registered |
| Channel create/destroy | Look for spdk_io_device_register | Missing destroy_cb (leaks thread state) |
| Buffer management | Search for spdk_bdev_io_get_buf in submit path | NULL iov when forwarding reads |
| write_config_json / config_json | Look in fn_table or module struct | save_config drops this bdev |
| JSON-RPC | Look for SPDK_RPC_REGISTER in <name>_rpc.c | "method not found" errors |
| examine_config (vbdev only) | Look in spdk_bdev_module | Module never claims a base bdev |

Edge cases: what breaks in production

The async destruct hang

If your destruct returns 1 (the async signal), the framework waits for spdk_bdev_destruct_done() to be called. If you never call it, the application hangs at shutdown, and a delete RPC for that bdev never returns. Worse, the hang only shows up when the bdev is actually unregistered; production processes that get SIGKILL'd never hit it, so the bug can hide for a long time. The discipline: if you return 1, you must call spdk_bdev_destruct_done() on every code path, including the error paths.

io_type_supported lying

Returning true for a type you don't actually handle in submit_request goes unnoticed until someone submits that type; then the I/O fails with no diagnostic. The fix is to keep io_type_supported and submit_request in lockstep: if you add a new I/O type, update both in the same commit.

The destruct order

For a virtual bdev, the destruct sequence is rigid: remove from your global list, release the claim, close the base desc (on the thread that opened it), unregister the io_device. Get the order wrong and you can deadlock, double-free, or leak. The passthru module's vbdev_passthru_destruct() at module/bdev/passthru/vbdev_passthru.c:116 is the canonical example; the next page walks it line by line.

Missing write_config_json

The framework won't complain at runtime if write_config_json is NULL. But save_config produces a file that doesn't include your bdev, and the next time you replay it the bdev is gone. Always implement this; always.

Two bdevs with the same name

If the name is already taken, spdk_bdev_register() returns -EEXIST. You have to clean up the half-registered bdev on the failure path (free the name string, unregister the io_device). The passthru module's vbdev_passthru_register() does this carefully at module/bdev/passthru/vbdev_passthru.c:589.
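The unwind pattern on a failed register can be checked without SPDK. In this standalone sketch, model_register() stands in for spdk_bdev_register() and returns -EEXIST for a duplicate name, and my_bdev_create() frees everything it allocated on the failure path; g_live_allocs lets a test confirm nothing leaked. A real vbdev would also unregister its io_device, release the claim, and close the base descriptor on that path. All names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_MAX_BDEVS 4

struct my_bdev { char *name; };

static struct my_bdev *g_registered[MODEL_MAX_BDEVS];
static int g_nregistered;
static int g_live_allocs;   /* allocations not yet freed */

static void *tracked_alloc(size_t n)
{
    void *p = malloc(n);

    if (p != NULL) {
        g_live_allocs++;
    }
    return p;
}

static void tracked_free(void *p)
{
    if (p != NULL) {
        g_live_allocs--;
        free(p);
    }
}

/* Stand-in for spdk_bdev_register(): names must be unique. */
static int model_register(struct my_bdev *b)
{
    for (int i = 0; i < g_nregistered; i++) {
        if (strcmp(g_registered[i]->name, b->name) == 0) {
            return -EEXIST;
        }
    }
    if (g_nregistered == MODEL_MAX_BDEVS) {
        return -ENOMEM;
    }
    g_registered[g_nregistered++] = b;
    return 0;
}

static int my_bdev_create(const char *name)
{
    struct my_bdev *b;
    int rc;

    assert(name != NULL);
    b = tracked_alloc(sizeof(*b));
    if (b == NULL) {
        return -ENOMEM;
    }
    b->name = tracked_alloc(strlen(name) + 1);
    if (b->name == NULL) {
        rc = -ENOMEM;
        goto err_free_bdev;
    }
    strcpy(b->name, name);

    rc = model_register(b);
    if (rc != 0) {
        goto err_free_name;   /* e.g. -EEXIST: undo everything above */
    }
    return 0;

err_free_name:
    tracked_free(b->name);
err_free_bdev:
    tracked_free(b);
    return rc;
}
```

The labels unwind in reverse order of acquisition, so each new resource added to create gets exactly one new label.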

Channel create callback failing partway

Treat your create callback as all-or-nothing. If it returns non-zero, spdk_get_io_channel() frees the channel and returns NULL without calling your destroy callback, so anything create acquired before the failure is create's job to release. A common bug is to leave that cleanup to destroy (for example, putting a base-bdev channel there), which never runs for a failed create, so the partial state leaks.
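The safest pattern makes the create callback all-or-nothing: if a later step fails, it releases the earlier ones itself before returning, so correctness never depends on destroy running for a half-built channel. This standalone sketch uses the same signatures as SPDK's channel create and destroy callbacks, but acquire()/release() and the g_fail_base_ch hook are hypothetical stand-ins for real resources such as a base-bdev channel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct my_channel {
    void *queue;     /* first per-thread resource */
    void *base_ch;   /* second, e.g. a base-bdev channel in a vbdev */
};

static int  g_live;            /* resources currently held */
static bool g_fail_base_ch;    /* test hook: make the second step fail */

static void *acquire(bool fail)
{
    void *p = fail ? NULL : malloc(1);

    if (p != NULL) {
        g_live++;
    }
    return p;
}

static void release(void *p)
{
    if (p != NULL) {
        g_live--;
        free(p);
    }
}

/* Same signature as an SPDK channel create callback. */
static int my_ch_create_cb(void *io_device, void *ctx_buf)
{
    struct my_channel *ch = ctx_buf;

    (void)io_device;
    assert(ch != NULL);
    ch->base_ch = NULL;
    ch->queue = acquire(false);
    if (ch->queue == NULL) {
        return -ENOMEM;
    }
    ch->base_ch = acquire(g_fail_base_ch);
    if (ch->base_ch == NULL) {
        release(ch->queue);    /* unwind the first step ourselves */
        ch->queue = NULL;
        return -ENOMEM;
    }
    return 0;
}

/* Same signature as an SPDK channel destroy callback. */
static void my_ch_destroy_cb(void *io_device, void *ctx_buf)
{
    struct my_channel *ch = ctx_buf;

    (void)io_device;
    release(ch->base_ch);
    release(ch->queue);
}
```

With create unwinding its own partial work, destroy can assume a fully built channel and stays simple.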