pdsh(1)
NAME
pdsh - issue commands to groups of hosts in parallel
SYNOPSIS
pdsh [options]... command
DESCRIPTION
pdsh is a variant of the rsh(1) command. Unlike rsh(1), which runs
commands on a single remote host, pdsh can run multiple remote commands
in parallel. pdsh uses a "sliding window" (or fanout) of threads to
conserve resources on the initiating host while allowing some
connections to time out.
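The fanout idea can be sketched in a few lines of Python. This is only an illustration of a bounded pool of worker threads, not how pdsh itself is implemented; the names run_with_fanout and run_one are hypothetical, and the default of 64 simply mirrors the -f default documented under OPTIONS.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_fanout(hosts, run_one, fanout=64):
    """Call run_one(host) for every host, at most `fanout` at a time."""
    # The pool never runs more than `fanout` workers at once; when one
    # host finishes, the next pending host takes over the free slot.
    with ThreadPoolExecutor(max_workers=fanout) as pool:
        results = list(pool.map(run_one, hosts))
    return dict(zip(hosts, results))


# Hypothetical stand-in for a remote command: return a string per host.
print(run_with_fanout(["foo0", "foo1", "foo2"], lambda h: "ok " + h, fanout=2))
```

A small fanout keeps the number of open connections on the initiating host low; a slow or timing-out host occupies only one slot while the others keep advancing.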
When pdsh receives SIGINT (ctrl-C), it lists the status of current
threads. A second SIGINT within one second terminates the program.
Pending threads may be canceled by issuing ctrl-Z within one second of
ctrl-C. Pending threads are those that have not yet been initiated, or
are still in the process of connecting to the remote host.
If a remote command is not specified on the command line, pdsh runs
interactively, prompting for commands and executing them when termi‐
nated with a carriage return. In interactive mode, target nodes that
time out on the first command are not contacted for subsequent com‐
mands, and commands prefixed with an exclamation point will be executed
on the local system.
The core functionality of pdsh may be supplemented by dynamically load‐
able modules. The modules may provide a new connection protocol
(replacing the standard rcmd(3) protocol used by rsh(1)), filtering
options (e.g. removing hosts that are "down" from the target list),
and/or host selection options (e.g., -a selects all hosts from a
configuration file). By default, pdsh must have at least one "rcmd"
module loaded. See the RCMD MODULES section for more information.
RCMD MODULES
The method by which pdsh runs commands on remote hosts may be selected
at runtime using the -R option (See OPTIONS below). This functionality
is ultimately implemented via dynamically loadable modules, and so the
list of available options may be different from installation to instal‐
lation. A list of currently available rcmd modules is printed when
using any of the -h, -V, or -L options. The default rcmd module will
also be displayed with the -h and -V options.
A list of rcmd modules currently distributed with pdsh follows.
rsh Uses an internal, thread-safe implementation of BSD rcmd(3) to
run commands using the standard rsh(1) protocol.
exec Executes an arbitrary command for each target host. The first
of the pdsh remote arguments is the local command to execute,
followed by any further arguments. Some simple parameters are
substituted on the command line, including %h for the target
hostname, %u for the remote username, and %n for the remote
rank [0-n] (To get a literal % use %%). For example, the fol‐
lowing would duplicate using the ssh module to run hostname(1)
across the hosts foo[0-10]:
pdsh -R exec -w foo[0-10] ssh -x -l %u %h hostname
and this command line would run grep(1) in parallel across the
files console.foo[0-10]:
pdsh -R exec -w foo[0-10] grep BUG console.%h
ssh Uses a variant of popen(3) to run multiple copies of the ssh(1)
command.
mrsh This module uses the mrsh(1) protocol to execute jobs on remote
hosts. The mrsh protocol uses a credential based authentica‐
tion, forgoing the need to allocate reserved ports. In other
aspects, it acts just like rsh. Remote nodes must be running
mrshd(8) in order for the mrsh module to work.
qsh Allows pdsh to execute MPI jobs over QsNet. Qshell propagates
the current working directory, pdsh environment, and Elan
capabilities to the remote process. The following environment
variables are also appended to the environment: RMS_RANK,
RMS_NODEID, RMS_PROCID, RMS_NNODES, and RMS_NPROCS. Since pdsh
needs to run setuid root for qshell support, qshell does not
directly support propagation of LD_LIBRARY_PATH and LD_PREOPEN.
Instead, the QSHELL_REMOTE_LD_LIBRARY_PATH and
QSHELL_REMOTE_LD_PREOPEN environment variables may be used; if
set, the qshell daemon remaps them to LD_LIBRARY_PATH and
LD_PREOPEN.
mqsh Similar to qshell, but uses the mrsh protocol instead of the
rsh protocol.
krb4 The krb4 module allows users to execute remote commands after
authenticating with Kerberos. The remote rshd daemons must be
kerberized.
xcpu The xcpu module uses the xcpu service to execute remote com‐
mands.
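The parameter substitution described for the exec module above can be illustrated with a short Python sketch. It is not pdsh's own code: the function name substitute is hypothetical, and leaving unrecognized %-sequences untouched is an assumption made for this example.

```python
def substitute(template, host, user, rank):
    """Expand %h, %u, %n and %% in one argument of an exec command line."""
    values = {"h": host, "u": user, "n": str(rank), "%": "%"}
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template) and template[i + 1] in values:
            out.append(values[template[i + 1]])
            i += 2  # skip the '%' and the parameter letter
        else:
            # Assumption: anything else, including an unknown %x, is kept.
            out.append(ch)
            i += 1
    return "".join(out)


args = ["ssh", "-x", "-l", "%u", "%h", "hostname"]
print([substitute(a, host="foo3", user="alice", rank=3) for a in args])
```

For the target foo3 and the (made-up) user alice, the argument list becomes ssh -x -l alice foo3 hostname.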
OPTIONS
The list of available options is determined at runtime by supplementing
the list of standard pdsh options with any options provided by loaded
rcmd and misc modules. In some cases, options provided by modules may
conflict with each other. In these cases, the modules are incompatible
and the first module loaded wins.
Standard target nodelist options
-w [rcmd_type:][user@]host,host,...
Target the specified list of hosts. Do not use with any other
node selection options (e.g. -a, -g if they are available). No
spaces are allowed in the comma-separated list. A list consisting
of a single "-" character causes the target hosts to be read
from stdin, one per line. The host list may contain hostlist
expressions of the form "host[1-5,7]". For more information
about the hostlist format, see the HOSTLIST EXPRESSIONS section
below. A list of hosts may also be preceded by "user@" to spec‐
ify a remote username other than the default, or "rcmd_type:" to
specify an alternate rcmd connection type for these hosts. When
used together, the rcmd type must be specified first, e.g.
"ssh:user1@host0" would use ssh to connect to host0 as user
"user1."
-x host,host,...
Exclude the specified hosts. May be specified in conjunction
with other target node list options such as -a and -g (when
available). Hostlists may also be specified to the -x option
(see the HOSTLIST EXPRESSIONS section below).
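The "[rcmd_type:][user@]host,host,..." form accepted by -w can be taken apart as in the following Python sketch. The function name parse_target is hypothetical, and the sketch assumes that the prefixes apply to the whole argument and that host names contain neither ':' nor '@'.

```python
def parse_target(spec):
    """Split '[rcmd_type:][user@]hosts' into (rcmd_type, user, hosts)."""
    rcmd_type = user = None
    # The rcmd type, when present, comes first and ends at the first ':'.
    if ":" in spec:
        rcmd_type, spec = spec.split(":", 1)
    # The remote user name, when present, ends at the first '@'.
    if "@" in spec:
        user, spec = spec.split("@", 1)
    return rcmd_type, user, spec


print(parse_target("ssh:user1@host0"))
print(parse_target("foo[1-5,7]"))
```

A value of None for the rcmd type or user stands for "use the default", as described above.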
Standard pdsh options
-S Return the largest of the remote command return values.
-h Output usage menu and quit. A list of available rcmd modules
will also be printed at the end of the usage message.
-s Only on AIX, separate remote command stderr and stdout into two
sockets.
-q List option values and the target nodelist and exit without
action.
-b Disable ctrl-C status feature so that a single ctrl-C kills par‐
allel job. (Batch Mode)
-l user
This option may be used to run remote commands as another user,
subject to authorization. For BSD rcmd, this means the invoking
user and system must be listed in the user's .rhosts file (even
for root).
-t seconds
Set the connect timeout. Default is 30 seconds.
-u seconds
Set a limit on the amount of time a remote command is allowed to
execute. Default is no limit. See note in LIMITATIONS if using
-u with ssh.
-f number
Set the maximum number of simultaneous remote commands to num‐
ber. The default is 64.
-R name
Set rcmd module to name. This option may also be set via the
PDSH_RCMD_TYPE environment variable. A list of available rcmd
modules may be obtained via the -h, -V, or -L options. The
default will be listed with -h or -V.
-L List info on all loaded pdsh modules and quit.
-N Disable hostname: prefix on lines of output.
-d Include more complete thread status when SIGINT is received, and
display connect and command time statistics on stderr when done.
-V Output pdsh version information, along with list of currently
loaded modules, and exit.
qsh/mqsh module options
-n tasks_per_node
Set the number of tasks spawned per node. Default is 1.
-m block | cyclic
Set block versus cyclic allocation of processes to nodes.
Default is block.
-r railmask
Set the rail bitmask for a job on a multirail system. The
default railmask is 1, which corresponds to rail 0 only. Each
bit set in the argument to -r corresponds to a rail on the
system, so a value of 2 selects rail 1 only, and a value of 3
selects both rail 0 and rail 1.
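The mapping from a railmask to rail numbers is plain bit arithmetic, as this Python sketch shows. The function name rails_from_mask is hypothetical and is not part of pdsh.

```python
def rails_from_mask(railmask):
    """Return the rail numbers selected by a -r style bitmask."""
    if railmask < 0:
        raise ValueError("railmask must be non-negative")
    rails = []
    bit = 0
    # Walk the mask from the least significant bit; bit k stands for rail k.
    while railmask >> bit:
        if (railmask >> bit) & 1:
            rails.append(bit)
        bit += 1
    return rails


for mask in (1, 2, 3):
    print(mask, rails_from_mask(mask))
```

The loop prints rail 0 for mask 1, rail 1 for mask 2, and rails 0 and 1 for mask 3, matching the description of -r.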
machines module options
-a Target all nodes from machines file.
genders module options
In addition to the genders options presented below, the genders
attribute pdsh_rcmd_type may also be used in the genders database to
specify an rcmd connect type other than the pdsh default for hosts
with this attribute. For example, the following line in the genders
file
host0 pdsh_rcmd_type=ssh
would cause pdsh to use ssh to connect to host0, even if rsh were the
default. This can be overridden on the command line with the
"rcmd_type:host0" syntax.
-A Target all nodes in genders database. The -A option will target
every host listed in genders -- if you want to omit some hosts
by default, see the -a option below.
-a Target all nodes in genders database except those with the
"pdsh_all_skip" attribute. This is shorthand for running "pdsh
-A -X pdsh_all_skip ..."
-g attr[=val][,attr[=val],...]
Target nodes that match any of the specified genders attributes
(with optional values). Conflicts with -a and -w options. This
option targets the alternate hostnames in the genders database
by default. The -i option provided by the genders module may be
used to translate these to the canonical genders hostnames. If
the installed version of genders supports it, attributes sup‐
plied to -g may also take the form of genders queries. Genders
queries will query the genders database for the union, intersec‐
tion, difference, or complement of genders attributes and val‐
ues. The set operation union is represented by two pipe symbols
('||'), intersection by two ampersand symbols ('&&'), difference
by two minus symbols ('--'), and complement by a tilde ('~').
Parentheses may be used to change the order of operations. See
the nodeattr(1) manpage for examples of genders queries.
-X attr[=val][,attr[=val],...]
Exclude nodes that match any of the specified genders attributes
(optionally with values). This option may be used in combination
with any other of the node selection options (e.g. -w, -g, -a).
Arguments to -X may also take the form of genders queries. See
the description of the genders -g option above for more
information about genders queries.
-i Request translation between canonical and alternate hostnames.
-F filename
Read genders information from filename instead of the system
default genders file.
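The set operations behind genders queries can be pictured with Python sets. The database below is a made-up toy, and the sketch shows only the meaning of the four operators; it does not parse query strings or use the genders library.

```python
# Hypothetical toy database: attribute -> set of nodes carrying it.
GENDERS = {
    "compute": {"foo1", "foo2", "foo3"},
    "login": {"foo0"},
    "gpu": {"foo2", "foo3"},
}
ALL_NODES = set().union(*GENDERS.values())


def nodes(attr):
    """Return the nodes that carry the given attribute."""
    return GENDERS.get(attr, set())


union = nodes("login") | nodes("gpu")            # login||gpu
intersection = nodes("compute") & nodes("gpu")   # compute&&gpu
difference = nodes("compute") - nodes("gpu")     # compute--gpu
complement = ALL_NODES - nodes("compute")        # ~compute

print(sorted(union), sorted(intersection), sorted(difference), sorted(complement))
```

With -g the resulting set would be targeted; with -X it would be removed from the target list.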
nodeupdown module options
-v Eliminate target nodes that are considered "down" by
libnodeupdown.
slurm module options
The slurm module allows pdsh to target nodes based on currently running
SLURM jobs. The slurm module is typically called after all other node
selection options have been processed, and if no nodes have been
selected, the module will attempt to read a running jobid from the
SLURM_JOBID environment variable (which is set when running under a
SLURM allocation). If SLURM_JOBID references an invalid job, it will be
silently ignored.
-j jobid[,jobid,...]
Target list of nodes allocated to the SLURM job jobid. This
option may be used multiple times to target multiple SLURM jobs.
The special argument "all" can be used to target all nodes run‐
ning SLURM jobs, e.g. -j all.
rms module options
The rms module allows pdsh to target nodes based on an RMS resource.
The rms module is typically called after all other node selection
options, and if no nodes have been selected, the module will examine
the RMS_RESOURCEID environment variable and attempt to set the target
list of hosts to the nodes in the RMS resource. If an invalid resource
is denoted, the variable is silently ignored.
SDR module options
The SDR module supports targeting hosts via the System Data Repository
on IBM SPs.
-a Target all nodes in the SDR. The list is generated from the
"reliable hostname" in the SDR by default.
-i Translate hostnames between reliable and initial in the SDR,
when applicable. If a target hostname matches either the
initial or reliable hostname in the SDR, the alternate name will
be substituted. Thus a list composed of initial hostnames will
instead be replaced with a list of reliable hostnames. For
example, when used with -a above, all initial hostnames in the
SDR are targeted.
-v Do not target nodes that are marked as not responding in the SDR
on the targeted interface. (If a hostname does not appear in the
SDR, then that name will remain in the target hostlist.)
-G In combination with -a, include all partitions.
nodeattr module options
The nodeattr module supports access to the genders database via the
nodeattr(1) command. See the genders section above for a list of
supported options with this module. The option usage with the nodeattr
module is the same as genders, above, with the exception that the -i
option may only be used with -a or -g.
dshgroup module options
The dshgroup module allows pdsh to use dsh (or Dancer's shell) style
group files from /etc/dsh/group/ or ~/.dsh/group/.
-g groupname,...
Target nodes in dsh group file "groupname" found in either
~/.dsh/group/groupname or /etc/dsh/group/groupname.
-X groupname,...
Exclude nodes in dsh group file "groupname."
netgroup module options
The netgroup module allows pdsh to use standard netgroup entries to
build lists of target hosts. (/etc/netgroup or NIS)
-g groupname,...
Target nodes in netgroup "groupname."
-X groupname,...
Exclude nodes in netgroup "groupname."
ENVIRONMENT VARIABLES
PDSH_RCMD_TYPE
Equivalent to the -R option, the value of this environment vari‐
able will be used to set the default rcmd module for pdsh to use
(e.g. ssh, rsh).
PDSH_SSH_ARGS
Override the standard arguments that pdsh passes to the ssh(1)
command ("-2 -a -x").
PDSH_SSH_ARGS_APPEND
Append additional options to the ssh(1) command invoked by pdsh.
For example, PDSH_SSH_ARGS_APPEND="-q" would run ssh in quiet
mode, or "-v" would increase the verbosity of ssh.
WCOLL If no other node selection option is used, the WCOLL environment
variable may be set to a filename from which a list of target
hosts will be read. The file should contain a list of hosts, one
per line (though each line may contain a hostlist expression;
see the HOSTLIST EXPRESSIONS section below).
DSHPATH
If set, the path in DSHPATH will be used as the PATH for the
remote processes.
FANOUT Set the pdsh fanout (See description of -f above).
HOSTLIST EXPRESSIONS
As noted in the sections above, pdsh accepts lists of hosts in the
general form prefix[n-m,l-k,...], where n < m and l < k, etc., as an
alternative to explicit lists of hosts. This form should not be
confused with regular expression character classes (also denoted by
"[]"). For example, foo[19] does not represent an expression matching
foo1 or foo9, but rather represents the degenerate hostlist: foo19.
The hostlist syntax is meant only as a convenience on clusters with a
"prefixNNN" naming convention, and specification of ranges should not
be considered necessary -- thus foo1,foo9 could be specified as such,
or by the hostlist foo[1,9].
Some examples of usage follow:
Run command on foo01,foo02,...,foo05
pdsh -w foo[01-05] command
Run command on foo7,foo9,foo10
pdsh -w foo[7,9-10] command
Run command on foo0,foo4,foo5
pdsh -w foo[0-5] -x foo[1-3] command
A suffix on the hostname is also supported:
Run command on foo0-eth0,foo1-eth0,foo2-eth0,foo3-eth0
pdsh -w foo[0-3]-eth0 command
As a reminder to the reader, some shells will interpret brackets ('['
and ']') for pattern matching. Depending on your shell, it may be nec‐
essary to enclose ranged lists within quotes. For example, in tcsh,
the first example above should be executed as:
pdsh -w "foo[01-05]" command
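To make the expansion rules of this section concrete, here is a minimal Python sketch of hostlist expansion. It is not pdsh's hostlist code: the function name expand_hostlist is hypothetical, and the sketch handles only a single expression with one bracket group (no comma-separated list of expressions).

```python
import re


def expand_hostlist(expr):
    """Expand 'prefix[n-m,l-k,...]suffix' into a list of host names."""
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", expr)
    if match is None:
        return [expr]  # no bracket group: treat as one plain host name
    prefix, ranges, suffix = match.groups()
    hosts = []
    for part in ranges.split(","):
        low, _, high = part.partition("-")
        high = high or low   # a lone number is a one-element range
        width = len(low)     # keep leading zeros, e.g. 01..05
        for n in range(int(low), int(high) + 1):
            hosts.append(f"{prefix}{n:0{width}d}{suffix}")
    return hosts


# Equivalent of "-w foo[0-5] -x foo[1-3]": expand both, then subtract.
excluded = set(expand_hostlist("foo[1-3]"))
targets = [h for h in expand_hostlist("foo[0-5]") if h not in excluded]
print(targets)
print(expand_hostlist("foo[19]"))
```

The last call shows the degenerate case discussed above: foo[19] expands to the single host foo19, not to foo1 and foo9.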
ORIGIN
Originally a rewrite of IBM dsh(1) by Jim Garlick <garlick@llnl.gov> on
LLNL's ASCI Blue-Pacific IBM SP system. It is now used on Linux clus‐
ters at LLNL.
LIMITATIONS
When using ssh for remote execution, expect the stderr of ssh to be
folded in with that of the remote command. When invoked by pdsh, it is
not possible for ssh to prompt for passwords if RSA/DSA keys are not
configured properly, etc. For ssh implementations that support a
connect timeout option, pdsh attempts to use that option to enforce the
timeout (e.g. -oConnectTimeout=T for OpenSSH); otherwise connect
timeouts are not supported when using ssh. Finally, there is no
reliable way for pdsh to ensure that remote commands are actually
terminated when using a command timeout. Thus, if -u is used with ssh,
commands may be left running on remote hosts even after the timeout has
killed the local ssh processes.
Output from multiple processes per node may be interspersed when using
qshell or mqshell rcmd modules.
The number of nodes that pdsh can simultaneously execute remote jobs on
is limited by the maximum number of threads that can be created concur‐
rently, as well as the availability of reserved ports in the rsh and
qshell rcmd modules. On systems that implement POSIX threads, the limit
is typically defined by the constant PTHREAD_THREADS_MAX.
FILES
SEE ALSO
rsh(1), ssh(1), dshbak(1), pdcp(1)
pdsh-2.16 hpux11.31 pdsh(1)