NAME
     grio2 - Guaranteed Rate I/O Version 2

DESCRIPTION
     Guaranteed-Rate I/O (GRIO) refers to a guarantee made by the system to a
     user process that it will deliver data from a storage device at a
     predefined rate regardless of any other I/O activity on the system or on
     other nodes within its cluster. If a process attempts to issue I/O at a
     rate above its requested rate, GRIO throttles that I/O as necessary so
     that the reservation is not exceeded.

   Terminology
     The term reservation is used to refer to the set of Quality-of-Service
     (QoS) parameters (bandwidth, reservation interval) requested by an
     application. Reservation requests are forwarded to the GRIO bandwidth
     management daemon ggd2(1M). If the request is granted then the
     application is said to have received a guarantee from the system that its
     QoS requirements will be met. Within the kernel an object is instantiated
     that encodes the requested QoS parameters and maintains the necessary
     scheduling and monitoring state. This object is referred to as a GRIO
     stream. The stream ID is returned to the user application.	 Stream IDs
     are unique across reservations and across the cluster.

     This manual page describes the second version of the GRIO product. Where
     it is necessary to distinguish between this release and the previous
     release, the terms GRIOv2 and GRIOv1 are used. Where the term GRIO is
     used without qualification, it refers to the second version of the
     product.

BACKGROUND
     GRIOv1 was designed for use with tightly controlled, locally attached
     storage devices. It depends on detailed performance data for every piece
     of hardware in the I/O path, including the storage devices themselves,
     the SCSI and Fibre Channel buses, system interconnects and bridges. It
     works only with the XLV volume manager and does not support shared CXFS
     filesystems.

     Modern storage systems are moving towards large interconnected Storage
     Area Networks (SANs) in which heterogeneous systems and storage devices
     are connected via a dedicated high-speed network. In this model, large
     storage resources, such as multi-terabyte RAID devices, are shared
     amongst a number of clients using a shared filesystem such as CXFS.

     GRIOv2 has been created to broaden the GRIO QoS framework to this next
     generation of storage architectures.

     Its key features are as follows:

     1.	  Support for shared filesystems and clustered heterogeneous
	  operation.

	  GRIOv2 has been designed from the outset to work with the XVM volume
	  manager and fully supports guaranteed-rate I/O to both local XFS and
	  shared CXFS filesystems. It is designed to manage I/O from multiple
	  heterogeneous nodes and to ensure that a GRIO reservation on one
	  node is not affected by I/O elsewhere in a cluster.

     2.	  A new filesystem-level performance qualification model.

	  GRIOv1 uses a complicated per-device qualification model, in which
	  the maximum sustainable bandwidth for each component in the I/O
	  path, from disk device to memory, is qualified separately. A
	  synthetic benchmark grio_bandwidth(1M) is used to profile individual
	  storage devices.

	  GRIOv1 depends on this information being complete and accurate. This
	  approach is appropriate for the tightly controlled environment of a
	  locally attached filesystem. However, as storage networks become
	  increasingly heterogeneous and topologies increasingly complex, this
	  approach becomes impractical.

	  As a result, GRIOv2 has moved to a filesystem-level qualification
	  model in which the maximum sustainable bandwidth is measured across
	  the entire filesystem under a realistic application workload.
	  Empirical measurement of actual filesystem performance is used to
	  determine the QoS parameters that can be delivered in practice by a
	  particular configuration. This is referred to as the qualified
	  bandwidth for the filesystem (and the XVM volume on which it
	  resides).

          For local volumes the qualified bandwidth is stored in /etc/griotab;
          for shared volumes it is stored in the cluster configuration
	  database (CDB). Refer to the GRIO Version 2 Guide, ggd2(1M) and
	  griotab(4) for more information on measuring and setting the
	  qualified bandwidth for a filesystem.

     3.	  Comprehensive QoS Monitoring.

	  GRIOv2 provides comprehensive tools for measuring and monitoring
	  delivered QoS levels. This includes in-kernel collection of per-
	  stream performance metrics. Refer to grioqos(1M) for further
	  information.

	  The information provided by the QoS facilities can be used to help
	  choose the tradeoff between resource utilisation and delivered I/O
	  performance that is most appropriate for a given application mix,
	  workload, and production environment.

     4.	  Cluster-wide encapsulation and control of non-GRIO I/O.

	  When GRIOv2 begins managing an XVM volume, every node with access to
	  that volume is notified. From that point on, all user and system I/O
          that does not have an explicit GRIO reservation is encapsulated.
          This means that all non-GRIO I/O is automatically associated with a
          system-managed nongrio kernel stream.

          The central bandwidth management component of GRIOv2, ggd2,
          allocates otherwise unused filesystem bandwidth to these streams,
          allowing non-GRIO I/O to be processed even when there are active reservations
	  in the system. ggd2 dynamically adjusts the amount of bandwidth
	  allocated for this purpose based on monitoring of filesystem demand
	  and utilisation. In addition to this Dynamic Bandwidth Allocation,
          an administrator can reserve bandwidth at the node level for use by
          all non-GRIO applications running on that node; this is referred to
          as a Static Bandwidth Allocation. Refer to ggd2(1M) and
	  grioadmin(1M) for more information.

USAGE RESTRICTIONS
     In order to utilize a GRIO reservation a file must be read or written
     using direct I/O. The open(2) manual page describes the use and buffer
     alignment restrictions of the direct I/O interface. A GRIO reservation
     can be made for any file within an XFS or CXFS filesystem created on an
     XVM volume.

     In some applications more deterministic performance can be achieved by
     creating files on a dedicated real-time subvolume. To allocate a file on
     the real-time subvolume of an XFS or CXFS filesystem the fcntl(2)
     F_FSSETXATTR command must be used to set the XFS_XFLAG_REALTIME flag.
     This can only be issued on a newly created file. It is not possible to
     mark a file as real-time once non-real-time data blocks have been
     allocated to it.
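
     As an illustration of these restrictions, the following sketch creates a
     new file, marks it real-time with F_FSSETXATTR before any data blocks
     are allocated, and then queries the direct I/O constraints with the
     F_DIOINFO fcntl. The pathname and the header locations are assumptions
     made for the example only; refer to open(2) and fcntl(2) for the
     authoritative interface definitions.

          /* Hedged sketch: pathname and header locations are assumed. */
          #include <stdio.h>
          #include <unistd.h>
          #include <fcntl.h>
          #include <sys/fcntl.h>    /* assumed home of fsxattr/dioattr */

          int
          main(void)
          {
              struct fsxattr fsx;
              struct dioattr dio;
              int fd;

              /* The file must be newly created - the real-time flag cannot
                 be set once data blocks have been allocated to it. */
              fd = open("/griofs/stream.dat",
                        O_RDWR | O_CREAT | O_EXCL | O_DIRECT, 0644);
              if (fd < 0) {
                  perror("open");
                  return 1;
              }

              /* Mark the file as real-time before writing any data. */
              if (fcntl(fd, F_FSGETXATTR, &fsx) < 0) {
                  perror("F_FSGETXATTR");
                  return 1;
              }
              fsx.fsx_xflags |= XFS_XFLAG_REALTIME;
              if (fcntl(fd, F_FSSETXATTR, &fsx) < 0) {
                  perror("F_FSSETXATTR");
                  return 1;
              }

              /* Report the direct I/O buffer alignment and transfer size
                 limits that reads and writes must observe. */
              if (fcntl(fd, F_DIOINFO, &dio) == 0)
                  printf("align=%u min=%u max=%u\n",
                         dio.d_mem, dio.d_miniosz, dio.d_maxiosz);

              close(fd);
              return 0;
          }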

SOFTWARE COMPONENTS
     GRIOv2 functionality is distributed among three main components: the
     new guarantee-granting daemon ggd2; the userspace library libgrio2 and
     command line utilities; and the kernel.

     ggd2(1M) is a user level process started at system boot. It is
     responsible for activating and deactivating the GRIOv2 kernel scheduler,
     processing client requests to reserve and release bandwidth, tracking
     bandwidth utilisation, managing unallocated bandwidth, and enforcing the
     GRIOv2 software licenses.

     grioadmin(1M) is used to perform node-level administration tasks for XFS
     and CXFS filesystems including: querying available bandwidth, listing
     active GRIO reservations, and creating, modifying and releasing node-
     level static bandwidth allocations.

     grioqos(1M) is used to extract and report the QoS metrics that GRIO
     maintains for each stream.

     libgrio2 implements the GRIOv2 userspace API. User processes communicate
     with the daemon using the following core API calls:

	  grio_avail()	    - get available bandwidth for a filesystem
	  grio_reserve()    - reserve bandwidth from a filesystem
	  grio_reserve_fd() - reserve bandwidth and bind a file descriptor
	  grio_bind()	    - bind a file descriptor to a stream
	  grio_unbind()	    - unbind a file descriptor
	  grio_modify()	    - modify an existing stream
          grio_get_stream() - map a bound file descriptor to its stream ID
          grio_release()    - signal that a stream should be reclaimed

     The process that initially reserves bandwidth with a call to grio_reserve
     or grio_reserve_fd is referred to as the owning process. Any streams not
     already released when their owning process exits will be automatically
     released. Streams can be shared between processes. The ownership of a
     GRIOv2 stream is non-transferable.
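
     The exact prototypes and data types are documented in the grio_*(3X)
     manual pages. The sketch below is illustrative only: the header name,
     the grio_resv_t field names and the argument lists shown are
     assumptions, not the definitive interface.

          /* Illustrative sketch only.  <grio2.h>, the grio_resv_t fields
           * and the exact prototypes are assumptions; see
           * grio_reserve_fd(3X) and grio_release(3X) for the real API. */
          #include <stdio.h>
          #include <string.h>
          #include <unistd.h>
          #include <fcntl.h>
          #include <grio2.h>        /* assumed libgrio2 header */

          int
          main(void)
          {
              grio_resv_t resv;     /* QoS parameters (assumed type) */
              int fd;

              fd = open("/griofs/stream.dat", O_RDWR | O_DIRECT);
              if (fd < 0) {
                  perror("open");
                  return 1;
              }

              /* Request 10 MB/s over a one second reservation interval
                 (field names are illustrative, not definitive). */
              memset(&resv, 0, sizeof resv);
              resv.gr_bytes = 10 * 1024 * 1024;
              resv.gr_usecs = 1000000;

              /* Reserve bandwidth and bind the stream to the descriptor;
                 this process becomes the owning process. */
              if (grio_reserve_fd(fd, &resv) < 0) {
                  perror("grio_reserve_fd");
                  return 1;
              }

              /* ... guaranteed-rate direct I/O on fd ... */

              /* Explicitly release the stream (argument form assumed); it
                 would otherwise be released when the owning process exits. */
              grio_release(&resv);

              close(fd);
              return 0;
          }

     Such a program is linked against the libgrio2 library described above.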

     GRIOv2 functionality in the kernel includes stream management, the I/O
     scheduler, cluster integration and messaging.

DEPLOYMENT CONSIDERATIONS
     There are two important constraints that must be observed when setting up
     GRIOv2 filesystems:

     1.	  If any of the luns on a particular device will be managed as GRIO
	  volumes, then all of the luns should be managed as GRIO volumes.
	  Typically there will be hardware contention between separate luns,
	  both in the SAN and within the storage device. If only a subset of
	  the luns are managed, I/O to the unmanaged luns could still cause
	  oversubscription of the device and in turn violate GRIO rate
	  guarantees on the managed volumes.

     2.	  For a similar reason, a storage device containing GRIO managed
	  volumes should not be shared between clusters. The GRIO daemons
	  running within different clusters are not coordinated, and unmanaged
	  I/O from one cluster can cause GRIO rate guarantees in the other
	  cluster to be violated.

     It may be appropriate to relax these constraints if a storage device can
     be configured such that there is no internal or external contention
     between independent luns.

DATA LAYOUT & EXAMPLE
     This section provides some tips on how to set up a filesystem on a RAID
     device so that the filesystem is correctly aligned with the underlying
     device layout and I/O performance is maximised. Three steps are
     essential to ensuring correct filesystem alignment:

     1.	  Ensuring that each data partition is correctly aligned with the
	  internal disk layout of its lun.

     2.	  Setting XVM stripe parameters correctly.

     3.	  Passing correct volume geometry (stripe unit and width) to
	  mkfs_xfs(1M).

     These three issues are demonstrated with an example.

     Consider a RAID device with 28 disks arranged as 4 volume groups of 7
     disks each, with each volume group configured as 6+1 RAID 5 (6 data
     disks, 1 parity disk). These are mapped directly to 4 luns, one lun per
     volume group.

     If the back-end transfer size of the RAID device is 128 KB (i.e. the
     size of transfers between the RAID controllers and individual disks),
     then each lun will have an aligned transfer size of 6*128 KB, which is
     768 KB or 1536 filesystem blocks (512 bytes each).

     The first step is to ensure that the raw data partitions are correctly
     aligned with the start of their corresponding luns (i.e.  the first disk
     in the volume group). In this case the aligned transfer size of each lun
     is 1536 blocks, so the start of the data partition should be a multiple
     of this number. As space is already required at the start of the lun for
     the volume header (e.g. 4096 blocks by default for XLV/XVM), a good
     choice is to move the start of the data partition to 4*1536 or 6144
     blocks.

     GRIOv2 can be used only with XVM volumes, so xvm(1M) is used to
     partition each lun. The location of the data partition is controlled by
     adjusting the size of the volume header and the size of the XVM volume
     label. In this case the following options are passed to the label
     command:

          xvm> label -volhdrblks 5120 -xvmlabelblks 1024 <devname>

     With these values the data partition begins at block 5120 + 1024 = 6144,
     as required.

     The luns are then arranged into a stripe. The stripe unit must match the
     aligned transfer size of the luns (or a multiple thereof). This is
     specified in the stripe subcommand as follows:

	  xvm> stripe -unit 1536 <slices ...>

     Now a filesystem is created on the XVM volume. If the stripe is used as
     the data subvolume the following command creates a filesystem with the
     correct alignment:

	  mkfs_xfs -d sunit=1536,swidth=6144 <xvm_devname>

     As there are four luns in total, the stripe width swidth is four times the
     aligned transfer size of the individual luns. Specifying the stripe unit
     and width to mkfs_xfs allows it to ensure that key internal regions of
     the filesystem are correctly aligned with the underlying volume
     structure.

     If the stripe is used as the realtime subvolume then the realtime extent
     size should be set to a multiple of the volume stripe width. This extent
     size also becomes the optimal I/O size that should be used by
     applications doing I/O to the filesystem. The following command sets the
     extent size to the stripe width (note that the 'b' suffix is required to
     specify filesystem blocks):

	  mkfs_xfs -r extsize=6144b <xvm_devname>

     This will optimize the filesystem for I/Os spanning the entire disk
     array.
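
     As a hedged illustration of this point, an application reading such a
     filesystem might issue direct reads in whole stripe-width units (6144
     blocks, i.e. 3145728 bytes) from a suitably aligned buffer. The
     pathname, the use of memalign(3C) and the choice of page alignment below
     are assumptions; the real alignment and transfer-size constraints should
     be queried with the F_DIOINFO fcntl.

          /* Sketch: read in stripe-width sized chunks (6144 blocks of 512
           * bytes = 3145728 bytes).  Pathname and alignment are assumed. */
          #include <stdio.h>
          #include <stdlib.h>       /* assumed home of memalign(3C) */
          #include <unistd.h>
          #include <fcntl.h>
          #include <sys/types.h>

          #define CHUNK (6144 * 512)    /* one full stripe width in bytes */

          int
          main(void)
          {
              ssize_t n;
              char *buf;
              int fd;

              /* Page-aligned buffer for direct I/O; the exact requirement
                 (d_mem) should be obtained with fcntl(F_DIOINFO). */
              buf = memalign((size_t)getpagesize(), CHUNK);
              fd = open("/griofs/stream.dat", O_RDONLY | O_DIRECT);
              if (buf == NULL || fd < 0) {
                  perror("setup");
                  return 1;
              }

              /* Each read spans the entire 4-lun stripe. */
              while ((n = read(fd, buf, CHUNK)) > 0)
                  ;                 /* process n bytes here */

              close(fd);
              free(buf);
              return 0;
          }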

     Note that if a non-GRIO XFS filesystem were created directly on one of
     these luns, the fx(1M) command would be used to partition the disk and
     move the start of the data partition. For example, the following
     sequence of commands:

	  fx -x -d <devname>
	  fx> repartition
	  fx/repartition> optiondrive
	  ...
	  fx/repartition> expert -b
	  ...

     will partition a drive as an option drive and then allow the layout of
     the partitions to be adjusted interactively (-b specifies that input
     values are given in filesystem blocks). The data partition should be
     selected and its first block moved to 6144, placing the start of the
     data partition on the first disk of the lun.

     Remember, however, that an XFS filesystem must be made on an XVM volume
     if it is to be managed by GRIO.

FILES
     /etc/griotab

SEE ALSO
     ggd2(1M), grioadmin(1M), grioqos(1M), griotab(4), grio_avail(3X),
     grio_bind(3X), grio_modify(3X), grio_release(3X), grio_reserve(3X),
     grio_reserve_fd(3X), grio_unbind(3X)
