DISK(9) BSD Kernel Developer's Manual DISK(9)
NAME
disk, disk_init, disk_attach, disk_begindetach, disk_detach,
disk_destroy, disk_busy, disk_unbusy, disk_isbusy, disk_find,
disk_blocksize — generic disk framework
SYNOPSIS
#include <sys/types.h>
#include <sys/disklabel.h>
#include <sys/disk.h>
void
disk_init(struct disk *, const char *name,
const struct dkdriver *driver);
void
disk_attach(struct disk *);
int
disk_begindetach(struct disk *, int (*lastclose)(device_t),
device_t self, int flags);
void
disk_detach(struct disk *);
void
disk_destroy(struct disk *);
void
disk_busy(struct disk *);
void
disk_unbusy(struct disk *, long bcount, int read);
bool
disk_isbusy(struct disk *);
struct disk *
disk_find(const char *);
void
disk_blocksize(struct disk *, int blocksize);
DESCRIPTION
The NetBSD generic disk framework is designed to provide flexible, scal‐
able, and consistent handling of disk state and metrics information. The
fundamental component of this framework is the disk structure, which is
defined as follows:
struct disk {
	TAILQ_ENTRY(disk) dk_link;	/* link in global disklist */
	const char	*dk_name;	/* disk name */
	prop_dictionary_t dk_info;	/* reference to disk-info dictionary */
	int		dk_bopenmask;	/* block devices open */
	int		dk_copenmask;	/* character devices open */
	int		dk_openmask;	/* composite (bopen|copen) */
	int		dk_state;	/* label state ### */
	int		dk_blkshift;	/* shift to convert DEV_BSIZE to blks */
	int		dk_byteshift;	/* shift to convert bytes to blks */

	/*
	 * Metrics data; note that some metrics may have no meaning
	 * on certain types of disks.
	 */
	struct io_stats	*dk_stats;

	const struct dkdriver *dk_driver;	/* pointer to driver */

	/*
	 * Information required to be the parent of a disk wedge.
	 */
	kmutex_t	dk_rawlock;	/* lock on these fields */
	u_int		dk_rawopens;	/* # of opens of rawvp */
	struct vnode	*dk_rawvp;	/* vnode for the RAW_PART bdev */

	kmutex_t	dk_openlock;	/* lock on these and openmask */
	u_int		dk_nwedges;	/* # of configured wedges */
					/* all wedges on this disk */
	LIST_HEAD(, dkwedge_softc) dk_wedges;

	/*
	 * Disk label information. Storage for the in-core disk label
	 * must be dynamically allocated, otherwise the size of this
	 * structure becomes machine-dependent.
	 */
	daddr_t		dk_labelsector;	/* sector containing label */
	struct disklabel *dk_label;	/* label */
	struct cpu_disklabel *dk_cpulabel;
};
The system maintains a global linked-list of all disks attached to the
system. This list, called disklist, may grow or shrink over time as
disks are dynamically added and removed from the system. Drivers which
currently make use of the detachment capability of the framework are the
ccd, dm, and vnd pseudo-device drivers.
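The list bookkeeping described above can be pictured with a small userspace model. This is only a sketch under stated assumptions: struct ex_disk, ex_attach(), ex_detach(), and ex_find() are hypothetical stand-ins, a plain singly linked list replaces the kernel's TAILQ so that the example has no dependencies, and insertion order is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct disk; only what the model needs. */
struct ex_disk {
	struct ex_disk *next;	/* link in the model's disk list */
	const char *name;	/* disk name, e.g. "vnd0" */
};

static struct ex_disk *ex_disklist;	/* head of the list */
static int ex_disk_count;		/* number of attached disks */

/* Insert a disk into the list and bump the count. */
static void
ex_attach(struct ex_disk *dk)
{
	dk->next = ex_disklist;
	ex_disklist = dk;
	ex_disk_count++;
}

/* Unlink a disk and drop the count; a negative count is a bug. */
static void
ex_detach(struct ex_disk *dk)
{
	struct ex_disk **pp;

	for (pp = &ex_disklist; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == dk) {
			*pp = dk->next;
			dk->next = NULL;
			ex_disk_count--;
			break;
		}
	}
	assert(ex_disk_count >= 0);	/* the kernel would panic here */
}

/* Return the disk with the given name, or NULL if there is none. */
static struct ex_disk *
ex_find(const char *name)
{
	struct ex_disk *dk;

	for (dk = ex_disklist; dk != NULL; dk = dk->next) {
		if (strcmp(dk->name, name) == 0)
			return dk;
	}
	return NULL;
}
```

In this model, a lookup by name succeeds only between the matching attach and detach calls, which is the behaviour a caller of disk_find() should expect for disks that can come and go.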
The following is a brief description of each function in the framework:
disk_init() Initialize the disk structure.
disk_attach() Attach a disk; allocate storage for the disklabel, set
the “attached time” timestamp, insert the disk into the
disklist, and increment the system disk count.
disk_begindetach()
Check whether the disk is open, and if not, return 0.
If the disk is open, and DETACH_FORCE is not set in
flags, return EBUSY. Otherwise, call the provided
lastclose routine (if not NULL) and return its exit
code.
disk_detach() Detach a disk; free storage for the disklabel, remove
the disk from the disklist, and decrement the system
disk count. If the count drops below zero, panic.
disk_destroy() Release resources used by the disk structure when it is
no longer required.
disk_busy() Increment the disk's “busy counter”. If this counter
goes from 0 to 1, set the timestamp corresponding to
this transfer.
disk_unbusy() Decrement a disk's busy counter. If the count drops
below zero, panic. Get the current time, subtract the
disk's timestamp from it, and add the difference to
the disk's running total of busy time. Set the disk's
timestamp to the current time. If the provided byte
count is greater than 0, add it to the disk's running
byte total and increment the number of transfers
performed by the disk. The third argument read
specifies the direction of I/O; if non-zero it means
reading from the disk, otherwise it means writing to
the disk.
disk_isbusy() Returns true if disk is marked as busy and false if it
is not.
disk_find() Return a pointer to the disk structure corresponding to
the name provided, or NULL if the disk does not exist.
disk_blocksize() Initialize dk_blkshift and dk_byteshift members of
struct disk with suitable values derived from the sup‐
plied physical blocksize. It is only necessary to call
this function if the device's physical blocksize is not
DEV_BSIZE.
The functions typically called by device drivers are disk_init(),
disk_attach(), disk_begindetach(), disk_detach(), disk_destroy(),
disk_busy(), disk_unbusy(), and disk_blocksize(). The function
disk_find() is provided as a utility function.
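The decision made by disk_begindetach() can be sketched as a userspace model. Everything named ex_* or EX_* here is hypothetical: the flag value is a placeholder, the callback takes the model's disk rather than a device_t, and returning 0 when no lastclose routine is supplied is an assumption of this sketch rather than something stated above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define EX_DETACH_FORCE 0x01	/* placeholder value, not the kernel's */

/* Hypothetical stand-in for struct disk. */
struct ex_dtdisk {
	int openmask;	/* non-zero while any partition is open */
};

/* Hypothetical last-close callback: closes everything, reports success. */
static int
ex_lastclose(struct ex_dtdisk *dk)
{
	dk->openmask = 0;
	return 0;
}

static int
ex_begindetach(struct ex_dtdisk *dk,
    int (*lastclose)(struct ex_dtdisk *), int flags)
{
	if (dk->openmask == 0)
		return 0;		/* not open: nothing to do */
	if ((flags & EX_DETACH_FORCE) == 0)
		return EBUSY;		/* open, and not forced */
	if (lastclose == NULL)
		return 0;		/* assumption: no callback, no error */
	return (*lastclose)(dk);	/* forced: pass on the callback's result */
}
```

A driver's detach routine would typically stop and return the error when this step fails, and go on to disk_detach() and disk_destroy() only when it returns 0.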
DISK IOCTLS
The following ioctls should be implemented by disk drivers:
DIOCGDINFO struct disklabel
Get disklabel.
DIOCSDINFO struct disklabel
Set in-memory disklabel.
DIOCWDINFO struct disklabel
Set in-memory disklabel and write on-disk disklabel.
DIOCGPART struct partinfo
Get partition information. This is used internally.
DIOCRFORMAT struct format_op
Read format.
DIOCWFORMAT struct format_op
Write format.
DIOCSSTEP int
Set step rate.
DIOCSRETRIES int
Set number of retries.
DIOCKLABEL int
Specify whether to keep or drop the in-memory disklabel when the
device is closed.
DIOCWLABEL int
Enable or disable writing to the part of the disk that contains
the label.
DIOCSBAD struct dkbad
Set kernel dkbad.
DIOCEJECT int
Eject removable disk.
DIOCLOCK int
Lock or unlock disk pack. For devices with removable media,
locking is intended to prevent the operator from removing the
media.
DIOCGDEFLABEL struct disklabel
Get default label.
DIOCCLRLABEL
Clear disk label.
DIOCGCACHE int
Get status of disk read and write caches. The result is a bit‐
mask containing the following values:
DKCACHE_READ Read cache enabled.
DKCACHE_WRITE Write(back) cache enabled.
DKCACHE_RCHANGE Read cache enable is changeable.
DKCACHE_WCHANGE Write cache enable is changeable.
DKCACHE_SAVE Cache parameters may be saved, so that they per‐
sist across reboots or device detach/attach
cycles.
DIOCSCACHE int
Set status of disk read and write caches. The input is a bitmask
in the same format as used for DIOCGCACHE.
DIOCCACHESYNC int
Synchronise the disk cache. This causes information in the
disk's write cache (if any) to be flushed to stable storage. The
argument specifies whether or not to force a flush even if the
kernel believes that there is no outstanding data.
DIOCBSLIST struct disk_badsecinfo
Get bad sector list.
DIOCBSFLUSH
Flush bad sector list.
DIOCAWEDGE struct dkwedge_info
Add wedge.
DIOCGWEDGEINFO struct dkwedge_info
Get wedge information.
DIOCDWEDGE struct dkwedge_info
Delete wedge.
DIOCLWEDGES struct dkwedge_list
List wedges.
DIOCGSTRATEGY struct disk_strategy
Get disk buffer queue strategy.
DIOCSSTRATEGY struct disk_strategy
Set disk buffer queue strategy.
DIOCGDISKINFO struct plistref
Get disk-info dictionary.
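As an illustration of the DIOCGCACHE bitmask described above, the following sketch decodes such a mask into text. The EX_DKCACHE_* values are placeholders chosen for the example; real code should use the DKCACHE_* definitions from the system headers and obtain the mask with ioctl(2). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Placeholder bit values for illustration only; real code must use
 * the DKCACHE_* definitions from the system headers.
 */
#define EX_DKCACHE_READ		0x01
#define EX_DKCACHE_WRITE	0x02
#define EX_DKCACHE_RCHANGE	0x04
#define EX_DKCACHE_WCHANGE	0x08
#define EX_DKCACHE_SAVE		0x10

/* Render a cache-status bitmask as a short human-readable string. */
static const char *
ex_cache_describe(int bits, char *buf, size_t len)
{
	snprintf(buf, len, "read %s%s, write %s%s%s",
	    (bits & EX_DKCACHE_READ) ? "on" : "off",
	    (bits & EX_DKCACHE_RCHANGE) ? " (changeable)" : "",
	    (bits & EX_DKCACHE_WRITE) ? "on" : "off",
	    (bits & EX_DKCACHE_WCHANGE) ? " (changeable)" : "",
	    (bits & EX_DKCACHE_SAVE) ? ", savable" : "");
	return buf;
}

/* True if the write cache is enabled and the device allows changing it. */
static int
ex_cache_can_disable_write(int bits)
{
	return (bits & EX_DKCACHE_WRITE) != 0 &&
	    (bits & EX_DKCACHE_WCHANGE) != 0;
}
```

A caller that wants to turn the write cache off would check the "changeable" bit first, as ex_cache_can_disable_write() does, before issuing DIOCSCACHE with the write bit cleared.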
USING THE FRAMEWORK
This section includes a description of basic use of the framework and
example usage of its functions. The actual implementation of a device
driver which uses the framework may vary.
Each device in the system uses a “softc” structure which contains auto‐
configuration and state information for that device. In the case of
disks, the softc should also contain one instance of the disk structure,
e.g.:
struct foo_softc {
	device_t	sc_dev;		/* generic device information */
	struct disk	sc_dk;		/* generic disk information */
	[ . . . more . . . ]
};
In order for the system to gather metrics data about a disk, the disk
must be registered with the system. The disk_attach() routine performs
all of the functions currently required to register a disk with the sys‐
tem including allocation of disklabel storage space, recording of the
time since boot that the disk was attached, and insertion into the
disklist. Note that since this function allocates storage space for the
disklabel, it must be called before the disklabel is read from the media
or used in any other way. Before disk_attach() is called, portions of
the disk structure must be initialized with data specific to that disk.
For example, in the “foo” disk driver, the following would be performed
in the autoconfiguration “attach” routine:
void
fooattach(device_t parent, device_t self, void *aux)
{
	struct foo_softc *sc = device_private(self);

	[ . . . ]

	/* Initialize and attach the disk structure. */
	disk_init(&sc->sc_dk, device_xname(self), &foodkdriver);
	disk_attach(&sc->sc_dk);

	/* Read geometry and fill in pertinent parts of disklabel. */
	[ . . . ]
	disk_blocksize(&sc->sc_dk, bytes_per_sector);
}
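The effect of the disk_blocksize() call can be sketched in userspace as follows. This is a model rather than the kernel's code: ex_blocksize() and struct ex_shifts are hypothetical names, DEV_BSIZE is assumed to be 512, and only power-of-two block sizes of at least DEV_BSIZE are handled.

```c
#include <assert.h>

#define EX_DEV_BSIZE 512	/* assumed value of DEV_BSIZE */

/* Return log2(v) for a power-of-two v >= 1. */
static int
ex_log2(unsigned int v)
{
	int shift = 0;

	while (v > 1) {
		v >>= 1;
		shift++;
	}
	return shift;
}

/* Model of the two shift members of struct disk. */
struct ex_shifts {
	int blkshift;	/* DEV_BSIZE units -> device blocks */
	int byteshift;	/* bytes -> device blocks */
};

static struct ex_shifts
ex_blocksize(unsigned int blocksize)
{
	struct ex_shifts s;

	/* The sketch only handles power-of-two sizes >= DEV_BSIZE. */
	assert(blocksize >= EX_DEV_BSIZE &&
	    (blocksize & (blocksize - 1)) == 0);

	s.byteshift = ex_log2(blocksize);
	s.blkshift = ex_log2(blocksize / EX_DEV_BSIZE);
	return s;
}
```

With a 2048-byte sector, for instance, byte offset 8192 and DEV_BSIZE block 16 name the same place on the medium, and shifting each right by its respective shift yields device block 4.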
The foodkdriver above is the disk's “driver” switch. This switch cur‐
rently includes a pointer to the disk's “strategy” routine. This switch
needs to have global scope and should be initialized as follows:
void foostrategy(struct buf *);

const struct dkdriver foodkdriver = {
	.d_strategy = foostrategy,
};
Once the disk is attached, metrics may be gathered on that disk. In
order to gather metrics data, the driver must tell the framework when the
disk starts and stops operations. This functionality is provided by the
disk_busy() and disk_unbusy() routines. Because struct disk is part of
the device driver's private data, it needs to be guarded. Mutual
exclusion must be provided by the driver, because disk_busy() and
disk_unbusy() are not thread safe. The disk_busy() routine should be
called immediately before a command to the disk is sent, e.g.:
void
foostart(struct foo_softc *sc)
{
	[ . . . ]

	/* Get buffer from drive's transfer queue. */
	[ . . . ]

	/* Build command to send to drive. */
	[ . . . ]

	/* Tell the disk framework we're going busy. */
	mutex_enter(&sc->sc_dk_mtx);
	disk_busy(&sc->sc_dk);
	mutex_exit(&sc->sc_dk_mtx);

	/* Send command to the drive. */
	[ . . . ]
}
When disk_busy() is called, a timestamp is taken if the disk's busy
counter moves from 0 to 1, indicating the disk has gone from an idle to
non-idle state. At the end of a transaction, the disk_unbusy() routine
should be called. This routine performs some consistency checks, such as
ensuring that the calls to disk_busy() and disk_unbusy() are balanced.
This routine also performs the actual metrics calculation. A timestamp
is taken and the difference from the timestamp taken in disk_busy() is
added to the disk's total running time. The disk's timestamp is then
updated in case there is more than one pending transfer on the disk. A
byte count is also added to the disk's running total, and if greater than
zero, the number of transfers the disk has performed is incremented. The
third argument read specifies the direction of I/O; if non-zero it means
reading from the disk, otherwise it means writing to the disk.
void
foodone(struct foo_xfer *xfer)
{
	struct foo_softc *sc = (struct foo_softc *)xfer->xf_softc;
	struct buf *bp = xfer->xf_buf;
	long nbytes;

	[ . . . ]

	/*
	 * Get number of bytes transferred. If there is no buf
	 * associated with the xfer, we are being called at the
	 * end of a non-I/O command.
	 */
	if (bp == NULL)
		nbytes = 0;
	else
		nbytes = bp->b_bcount - bp->b_resid;

	[ . . . ]

	mutex_enter(&sc->sc_dk_mtx);
	/* Notify the disk framework that we've completed the transfer. */
	disk_unbusy(&sc->sc_dk, nbytes,
	    bp != NULL ? bp->b_flags & B_READ : 0);
	mutex_exit(&sc->sc_dk_mtx);

	[ . . . ]
}
disk_isbusy() reports the status of a disk device: it returns true if
the device is currently busy and false if it is not. Like disk_busy() and
disk_unbusy(), it requires explicit locking by the caller.
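The accounting performed by disk_busy(), disk_unbusy(), and disk_isbusy() can be followed in a small userspace model. This is a sketch under stated assumptions: struct ex_stats and the ex_* functions are hypothetical, time is passed in explicitly as microseconds instead of being read from a clock, and keeping separate read and write totals is an assumption of the model. It is single-threaded; a driver would hold its own mutex around each call, as in the examples above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-disk statistics. */
struct ex_stats {
	int busy;		/* outstanding transfers */
	uint64_t timestamp;	/* time of last accounting event, in us */
	uint64_t busy_time;	/* total time spent non-idle, in us */
	uint64_t rbytes;	/* bytes read */
	uint64_t wbytes;	/* bytes written */
	uint64_t rxfer;		/* completed read transfers */
	uint64_t wxfer;		/* completed write transfers */
};

static void
ex_busy(struct ex_stats *st, uint64_t now)
{
	/* Only the idle -> busy transition records a timestamp. */
	if (st->busy++ == 0)
		st->timestamp = now;
}

static void
ex_unbusy(struct ex_stats *st, uint64_t now, long bcount, int read)
{
	assert(st->busy > 0);	/* unbalanced call; the kernel would panic */
	st->busy--;

	/* Charge the elapsed time, then restart the clock for pending work. */
	st->busy_time += now - st->timestamp;
	st->timestamp = now;

	/* Non-I/O commands pass a zero byte count and are not counted. */
	if (bcount > 0) {
		if (read) {
			st->rbytes += (uint64_t)bcount;
			st->rxfer++;
		} else {
			st->wbytes += (uint64_t)bcount;
			st->wxfer++;
		}
	}
}

static bool
ex_isbusy(const struct ex_stats *st)
{
	return st->busy != 0;
}
```

Because the timestamp is reset on every ex_unbusy() call, two overlapping transfers that together span 200 microseconds are charged 200 microseconds of busy time in total, not the sum of their individual durations.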
CODE REFERENCES
The disk framework itself is implemented within the file
sys/kern/subr_disk.c. Data structures and function prototypes for the
framework are located in sys/sys/disk.h.
The NetBSD machine-independent SCSI disk and CD-ROM drivers use the disk
framework. They are located in sys/dev/scsipi/sd.c and
sys/dev/scsipi/cd.c.
The NetBSD ccd, dm, and vnd drivers use the detachment capability of the
framework. They are located in sys/dev/ccd.c,
sys/dev/dm/device-mapper.c, and sys/dev/vnd.c, respectively.
SEE ALSO
ccd(4), dm(4), vnd(4)
HISTORY
The NetBSD generic disk framework appeared in NetBSD 1.2.
AUTHORS
The NetBSD generic disk framework was architected and implemented by
Jason R. Thorpe ⟨thorpej@NetBSD.org⟩.
BSD December 30, 2009 BSD