lamssi_rpi man page on YellowDog

lamssi_rpi(7)		     LAM SSI RPI OVERVIEW		 lamssi_rpi(7)

NAME
       LAM SSI RPI - overview of LAM's RPI SSI modules

DESCRIPTION
       The  "kind"  for	 RPI  SSI  modules is "rpi".  Specifically, the string
       "rpi" (without the quotes) should be used to specify which  RPI	should
       be used on the mpirun command line with the -ssi switch.	 For example:

       mpirun -ssi rpi tcp C my_mpi_program
	   Specifies to use the tcp RPI (and to launch a single copy of the
	   executable my_mpi_program on each node).

       The "rpi" string is also used as a prefix to send parameters to spe-
       cific RPI modules.  For example:

       mpirun -ssi rpi tcp -ssi rpi_tcp_short 131072 C my_mpi_program
	   Specifies  to  use  the tcp RPI, and to pass in the value of 131072
	   (128K) as the short message length for TCP messages.	 See each  RPI
	   section  below  for	a  full	 description  of  parameters  that are
	   accepted by each RPI.

       LAM currently supports six different RPI SSI modules: crtcp, gm, lamd,
       tcp, sysv, and usysv.

SELECTING AN RPI MODULE
       Only one RPI module may be selected per command execution.  The selec-
       tion of which module to use occurs during MPI_INIT, and it remains in
       effect for the duration of the MPI process.  It is erroneous to select
       different RPI modules for different processes.

       The kind for selecting an RPI is "rpi".	For example:

       mpirun -ssi rpi tcp C my_mpi_program
	   Selects the tcp RPI and runs a single copy of the my_mpi_program
	   executable on each node.

AVAILABLE MODULES
       As with all SSI modules, it is possible to pass parameters at run time.
       This section discusses the built-in LAM RPI modules,  as	 well  as  the
       run-time parameters that they accept.

       In  the discussion below, the parameters are discussed in terms of kind
       and name.  The kind and name may be specified as command line arguments
       to the mpirun command with the -ssi switch, or they may be set in envi‐
       ronment variables of the form LAM_MPI_SSI_name=value.  Note that	 using
       the -ssi command line switch will take precedence over any environment
       variables.
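       As an illustrative sketch (assuming a Bourne-style shell, and reusing
       the rpi_tcp_short value from the earlier example), the same settings
       can be made through the environment instead of on the command line:

```shell
# Select the tcp RPI and raise its short-message cutoff via environment
# variables of the form LAM_MPI_SSI_name=value.  An explicit -ssi switch
# on the mpirun command line would override either of these settings.
export LAM_MPI_SSI_rpi=tcp
export LAM_MPI_SSI_rpi_tcp_short=131072
# mpirun C my_mpi_program    # would now pick up both settings
```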

       If the RPI that is selected is unable to run (e.g., attempting  to  use
       the gm RPI when gm support was not compiled into LAM, or if no gm hard‐
       ware is available on the nodes), an appropriate error message  will  be
       printed and execution will abort.

   crtcp RPI
       The  crtcp RPI is a checkpoint/restart-able version of the tcp RPI (see
       below).	It is separate from the tcp RPI because the current  implemen‐
       tation  imposes	a  slight performance penalty to enable the ability to
       checkpoint and restart MPI jobs.	 Its tunable parameters are  the  same
       as the tcp RPI.	This RPI probably only needs to be used when the abil‐
       ity to checkpoint and restart MPI jobs is required.

       See the LAM/MPI User's Guide for more details on the crtcp RPI as  well
       as  the	checkpoint/restart  capabilities of LAM/MPI.  The lamssi_cr(7)
       manual page also contains additional information.

   gm RPI
       The gm RPI is used with native Myrinet networks.	 Please note that the
       gm RPI has not yet been properly tuned and instrumented in LAM; even
       so, it gives significantly better performance than TCP over Myrinet
       networks.

       That being said, there are several tunable parameters in the gm RPI:

       rpi_gm_maxport N
	   If rpi_gm_port is not specified, LAM will attempt to find an open
	   GM port to use for MPI communications, starting with port 1 and
	   ending with the N value specified by the rpi_gm_maxport parameter.
	   If unspecified, LAM will try all existing GM ports.

       rpi_gm_port N
	   LAM will attempt to use gm port N for MPI communications.

       rpi_gm_tinymsglen N
	   Specifies the maximum message size (in bytes) for  "tiny"  messages
	   (i.e.,  messages  that  are sent entirely in one gm message).  Tiny
	   messages are memcpy'ed into the header before being sent to the
	   destination, and memcpy'ed out of the header into the destination
	   buffer on the receiver.  Hence, it is not advisable	to  make  this
	   value too large.

       rpi_gm_fast 1
	   Specifies to use the "fast" protocol for sending short gm messages.
	   Unreliable in the presence of GM errors or timeouts; this parame-
	   ter is not recommended for MPI applications that do not make con-
	   tinual progress within MPI.

       rpi_gm_cr 1
	   Enable checkpoint/restart  behavior	for  gm.   This	 can  only  be
	   enabled  if	the  gm	 rpi  module was compiled with support for the
	   gm_get() function, which is	disabled  by  default.	 See  the  LAM
	   Installation and User's Guides for more information on this parame‐
	   ter before you use it.
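       A hedged sketch of how the gm parameters above might be combined
       through the environment (the values are illustrative, not recommenda-
       tions; my_mpi_program is a placeholder):

```shell
# Select the gm RPI, search GM ports 1 through 8 for an open port, and
# treat messages of up to 1024 bytes as "tiny" (sent inline with the
# header).  Requires gm support compiled into LAM and GM hardware.
export LAM_MPI_SSI_rpi=gm
export LAM_MPI_SSI_rpi_gm_maxport=8
export LAM_MPI_SSI_rpi_gm_tinymsglen=1024
# mpirun C my_mpi_program
```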

   lamd RPI
       The lamd RPI uses LAM's "out-of-band" communication mechanism for pass‐
       ing  MPI	 messages.   Specifically, MPI messages are sent from the user
       process to the local LAM daemon, then to the remote LAM daemon (if  the
       destination  process  is on a different node), and then to the destina‐
       tion process.

       While this adds latency to message passing because of  the  extra  hops
       that  each message must travel, it allows for true asynchronous message
       passing.	 Since the LAM daemon is running in its own  execution	space,
       it  can make progress on message passing regardless of the state / sta‐
       tus of the user's program.  This can be an overall net savings in  per‐
       formance and execution time for some classes of MPI programs.

       It  is  expected	 that  this  RPI will someday become obsolete when LAM
       becomes multi-threaded and allows progress to be made on message	 pass‐
       ing in separate threads rather than in separate processes.

       The lamd RPI has no tunable parameters.

   tcp RPI
       The tcp RPI uses pure TCP for all MPI message passing.  TCP sockets are
       opened between MPI processes and are used for all MPI traffic.

       The tcp RPI has one tunable parameter:

       rpi_tcp_short <bytes>
	   Tells the tcp RPI the smallest size (in bytes) for a message to  be
	   considered  "long".	 Short	messages are sent eagerly (even if the
	   receiving side is not expecting them).  Long messages use a
	   rendezvous protocol (i.e., a three-way handshake) such that the
	   message is not actually sent until the receiver is expecting it.
	   This value defaults to 64 KB.

   sysv RPI
       The sysv RPI uses shared memory for communication between MPI processes
       on the same node, and TCP sockets for communication  between  MPI  pro‐
       cesses  on  different  nodes.  System V semaphores are used to lock the
       shared memory pools.  This RPI is best used when running	 multiple  MPI
       processes  on  uniprocessors  (or  oversubscribed  SMPs) because of the
       blocking / yielding nature of semaphores.

       The sysv RPI has the following tunable parameters:

       rpi_tcp_short <bytes>
	   Since the sysv RPI uses parts of the tcp RPI for off-node  communi‐
	   cation,  this  parameter  also  has relevance to the sysv RPI.  The
	   meaning of this parameter is discussed in the tcp RPI section.

       rpi_sysv_short <bytes>
	   Tells the sysv RPI the smallest size (in bytes) for a message to be
	   considered  "long".	 Short shared memory messages are sent using a
	   small "postbox" protocol; long messages use a more  general	shared
	   memory pool method.	This value defaults to 8k.

       rpi_sysv_pollyield <bool>
	   If set to a nonzero number, force the use of a system call to yield
	   the processor.  The system call will be yield(), sched_yield(),  or
	   select() (with a 1ms timeout), depending on what LAM's configure
	   script finds at configuration time.	This value defaults to 1.

       rpi_sysv_shmpoolsize <bytes>
	   The size of the shared memory pool that is used  for	 long  message
	   transfers.  It is allocated once on each node for each MPI parallel
	   job.	 Specifically, if multiple MPI processes from the same	paral‐
	   lel	job are spawned on a single node, this pool will only be allo‐
	   cated once.

	   The configure script will try to determine a default size  for  the
	   pool	 if none is explicitly specified (you should always check this
	   to see if it is reasonable).	 Larger values should improve  perfor‐
	   mance  especially  when  an	application passes large messages, but
	   will also increase the system resources used by each task.

       rpi_sysv_shmmaxalloc <bytes>
	   To prevent a single large message transfer  from  monopolizing  the
	   global pool, allocations from the pool are actually restricted to a
	   maximum  of	rpi_sysv_shmmaxalloc  bytes  each.   Even  with	  this
	   restriction,	 it  is	 possible  for	the global pool to temporarily
	   become exhausted. In this case, the transport  will	fall  back  to
	   using the postbox area to transfer the message. Performance will be
	   degraded, but the application will progress.

	   The configure script will try to determine a default size  for  the
	   maximum  atomic  transfer size if none is explicitly specified (you
	   should always check this to see if it is reasonable).  Larger  val‐
	   ues	should	improve	 performance  especially  when	an application
	   passes large messages, but will also increase the system  resources
	   used by each task.
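       As a sketch, the pool-sizing parameters above can be set together
       (the values are illustrative only; check the configure-time defaults
       before overriding them):

```shell
# Give the sysv shared memory pool 16 MB and cap any single allocation
# at 1 MB so one large transfer cannot monopolize the pool.  One pool of
# this size is created per node for each MPI parallel job.
export LAM_MPI_SSI_rpi=sysv
export LAM_MPI_SSI_rpi_sysv_shmpoolsize=16777216
export LAM_MPI_SSI_rpi_sysv_shmmaxalloc=1048576
# mpirun C my_mpi_program
```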

   usysv RPI
       The  usysv  RPI	uses  shared memory for communication between MPI pro‐
       cesses on the same node, and TCP sockets for communication between  MPI
       processes  on  different nodes.	Spin locks are used to lock the shared
       memory pools.  This RPI is best used when the number of MPI processes
       on a single node is less than or equal to the number of processors,
       because it allows LAM to fully occupy the processor while waiting for
       a message and never be swapped out.

       The usysv RPI has many of the same tunable parameters as the sysv RPI:

       rpi_tcp_short <bytes>
	   Same meaning as in the sysv RPI.

       rpi_usysv_short <bytes>
	   Same meaning as rpi_sysv_short in the sysv RPI.

       rpi_usysv_pollyield <bool>
	   Same meaning as rpi_sysv_pollyield in the sysv RPI.

       rpi_usysv_shmpoolsize <bytes>
	   Same meaning as rpi_sysv_shmpoolsize in the sysv RPI.

       rpi_usysv_shmmaxalloc <bytes>
	   Same meaning as rpi_sysv_shmmaxalloc in the sysv RPI.

       rpi_usysv_readlockpoll <iterations>
	   Number  of  iterations  to spin before yielding the processor while
	   waiting to read.  This value defaults to 10,000.

       rpi_usysv_writelockpoll <iterations>
	   Number of iterations to spin before yielding	 the  processor	 while
	   waiting to write.  This value defaults to 10.
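       As a sketch, the spin/yield trade-off above might be tuned like this
       on a node that is slightly oversubscribed (the values are illustra-
       tive, not recommendations):

```shell
# Spin fewer iterations before yielding, so waiting usysv processes give
# up the processor sooner on a busy node (defaults are 10,000 and 10).
export LAM_MPI_SSI_rpi=usysv
export LAM_MPI_SSI_rpi_usysv_readlockpoll=1000
export LAM_MPI_SSI_rpi_usysv_writelockpoll=5
# mpirun C my_mpi_program
```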

SEE ALSO
       lamssi(7), lamssi_cr(7), mpirun(1), LAM User's Guide

LAM 7.1.2			  March, 2006			 lamssi_rpi(7)