MPI(1)									MPI(1)

NAME
     MPI - Introduction to the Message Passing Interface (MPI)

DESCRIPTION
     The Message Passing Interface (MPI) is a component of the Message Passing
     Toolkit (MPT), which is a software package that supports parallel
     programming across a network of computer systems through a technique
     known as message passing.	The goal of MPI, simply stated, is to develop
     a widely used standard for writing message-passing programs. As such, the
     interface establishes a practical, portable, efficient, and flexible
     standard for message passing.

     This MPI implementation supports the MPI 1.2 standard, as documented by
     the MPI Forum in the spring 1997 release of MPI: A Message Passing
     Interface Standard.  In addition, certain MPI-2 features are also
     supported.	 In designing MPI, the MPI Forum sought to make use of the
     most attractive features of a number of existing message passing systems,
     rather than selecting one of them and adopting it as the standard.	 Thus,
     MPI has been strongly influenced by work at the IBM T. J. Watson Research
     Center, Intel's NX/2, Express, nCUBE's Vertex, p4, and PARMACS. Other
     important contributions have come from Zipcode, Chimp, PVM, Chameleon,
     and PICL.

     MPI requires the presence of an Array Services daemon (arrayd) on each
     host that is to run MPI processes. In a single-host environment, no
     system administration effort should be required beyond installing and
     activating arrayd. However, users wishing to run MPI applications across
     multiple hosts will need to ensure that those hosts are properly
     configured into an array.	For more information about Array Services, see
     the arrayd(1M), arrayd.conf(4), and array_services(5) man pages.

     When running across multiple hosts, users must set up their .rhosts files
     to enable remote logins. Note that MPI does not use rsh, so it is not
     necessary that rshd be running on security-sensitive systems; the .rhosts
     file was simply chosen to eliminate the need to learn yet another
     mechanism for enabling remote logins.
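
     For example, a user who launches jobs from hostA and wants MPI processes
     started on hostB as user alice might add a line such as the following to
     the .rhosts file in alice's home directory on hostB (the host and user
     names here are placeholders):

          hostA alice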

     Other sources of MPI information are as follows:

     *	 Man pages for MPI library functions

     *	 A copy of the MPI standard as PostScript or hypertext on the World
	 Wide Web at the following URL:

	      http://www.mpi-forum.org/

     *	 Other MPI resources on the World Wide Web, such as the following:

	      http://www.mcs.anl.gov/mpi/index.html
	      http://www.erc.msstate.edu/mpi/index.html
	      http://www.mpi.nd.edu/lam/

   Getting Started
     For IRIX systems, the Modules software package is available to support
     one or more installations of MPT.	To use the MPT software, load the
     desired mpt module.

     After you have initialized modules, enter the following command:

	  module load mpt

     To unload the mpt module, enter the following command:

	  module unload mpt

     MPT software can be installed in an alternate location for use with the
     modules software package.	If MPT software has been installed on your
     system for use with modules, you can access the software with the module
     command shown in the previous example.  If MPT has not been installed for
     use with modules, the software resides in default locations on your
     system (/usr/include, /usr/lib, /usr/array/PVM, and so on), as in
     previous releases.	 For further information, see Installing MPT for Use
     with Modules, in the Modules relnotes.

   Using MPI
     Compile and link your MPI program as shown in the following examples.

     IRIX systems:

     To use the 64-bit MPI library, choose one of the following commands:

	  cc -64 compute.c -lmpi
	  f77 -64 -LANG:recursive=on compute.f -lmpi
	  f90 -64 -LANG:recursive=on compute.f -lmpi
	  CC -64 compute.C -lmpi++ -lmpi

     To use the 32-bit MPI library, choose one of the following commands:

	  cc -n32 compute.c -lmpi
	  f77 -n32 -LANG:recursive=on compute.f -lmpi
	  f90 -n32 -LANG:recursive=on compute.f -lmpi
	  CC -n32 compute.C -lmpi++ -lmpi

     Linux systems:

     To use the 64-bit MPI library on Linux IA64 systems, choose one of the
     following commands:

	  g++ -o myprog myproc.C -lmpi++ -lmpi
	  gcc -o myprog myprog.c -lmpi

     For Altix, the libmpi++.so library is not binary compatible with code
     generated by g++ 3.0 compilers.  For this reason, an additional library,
     libg++3mpi++.so, is provided for g++ 3.0 users as well as Intel C++ 8.0
     users; it can be linked in by using -lg++3mpi++ instead of -lmpi++.

     For IRIX systems, if version 7.2.1 or higher of the Fortran 90 compiler
     is installed, you can add the -auto_use option as follows to get
     compile-time checking of MPI subroutine calls:

	  f90 -auto_use mpi_interface -64 compute.f -lmpi
	  f90 -auto_use mpi_interface -n32 compute.f -lmpi

     For IRIX with MPT version 1.4 or higher, and Altix with MPT 1.9 or
     higher, the Fortran 90 USE MPI feature is supported.  You can replace the
     include 'mpif.h' statement in your Fortran 90 source code with USE MPI.
     This facility includes MPI type and parameter definitions, and performs
     compile-time checking of MPI function and subroutine calls.

     For Altix users, if you USE MPI you must supply a -I option with the efc
     command line to specify the directory in which the MPI.mod file resides.
     efc will fail to find MPI.mod unless you supply a -I option; there is no
     default search path for Fortran module files.  For default-location
     installations, -I/usr/include is correct; replace /usr/include with the
     corresponding directory in your non-default-location installation if
     necessary.
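
     For example, assuming a default-location installation and a source file
     named compute.f (a placeholder name), a compile line might look like the
     following:

          efc -I/usr/include compute.f -lmpi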

     The Intel efc compiler does not support the notion of "allow any type"
     formal arguments, so definitions for such routines as MPI_Send and
     MPI_Recv which have buffer or other arguments which may be of any type
     are omitted from USE MPI on Altix.	 Compile-time checking of these
     functions is therefore not available on Altix.

     NOTE:  Do not use the IRIX Fortran 90 -auto_use mpi_interface option to
     compile IRIX Fortran 90 source code that contains the USE MPI statement.
     They are incompatible with each other.

     For IRIX systems, applications compiled under a previous release of MPI
     should not require recompilation to run under this new (3.3) release.
     However, it is not possible for executable files running under the 3.2
     release to interoperate with others running under the 3.3 release.

     The C version of the MPI_Init(3) routine ignores the arguments that are
     passed to it and does not modify them.

     Stdin is enabled only for those MPI processes with rank 0 in the first
     MPI_COMM_WORLD (which does not need to be located on the same host as
     mpirun).  Stdout and stderr results are enabled for all MPI processes in
     the job, whether launched via mpirun, or one of the MPI-2 spawn
     functions.

     This version of the IRIX MPI implementation is compatible with the sproc
     system call and can therefore coexist with doacross loops.	 SGI MPI can
     likewise coexist with OpenMP on Linux systems. By default MPI is not
     threadsafe.  Therefore, calls to MPI routines in a multithreaded
     application will require some form of mutual exclusion.  The
     MPI_Init_thread call can be used to request thread safety.	 In this case,
     MPI calls can be made within parallel regions.  MPI_Init_thread is
     available on IRIX only.

     For IRIX and Linux systems, this implementation of MPI requires that all
     MPI processes call MPI_Finalize eventually.

   Buffering
     The current implementation buffers messages unless the MPI_BUFFER_MAX
     environment variable is set, or unless the message size is large enough
     and certain safe MPI functions are used.

     Buffered messages are grouped into two classes based on length: short
     (messages with lengths of 64 bytes or less) and long (messages with
     lengths greater than 64 bytes).

     When MPI_BUFFER_MAX is set, messages greater than this value are
     candidates for single-copy transfers.  For IRIX systems, the data from
     the sending process must reside in the symmetric data, symmetric heap, or
     global heap segment and be a contiguous type.  For Linux systems, the
     data from the sending process can reside in the static region, stack, or
     private heap and must be a contiguous type.

     For more information on single-copy transfers, see the MPI_BUFFER_MAX and
     MPI_DEFAULT_SINGLE_COPY_OFF environment variables.
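
     For example, to make messages larger than 32768 bytes candidates for
     single-copy transfer (the threshold shown is an arbitrary choice), you
     might set the following before launching the job:

          setenv MPI_BUFFER_MAX 32768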

   Myrinet (GM) Support
     This release provides support for use of the GM protocol over Myrinet
     interconnects on IRIX systems. Support is currently limited to 64-bit
     applications.

   Using MPI with cpusets
     You can use cpusets to run MPI applications (see cpuset(4)).  However, it
     is highly recommended that the cpuset have the MEMORY_LOCAL attribute.
     On Origin systems, if this attribute is not used, you should disable NUMA
     optimizations (see the MPI_DSM_OFF environment variable description in
     the following section).

   Default Interconnect Selection
     Beginning with the MPT 1.6 release, the search algorithm for selecting a
     multi-host interconnect has been significantly modified.  By default, if
     MPI is being run across multiple hosts, or if multiple binaries are
     specified on the mpirun command, the software now searches for
     interconnects in the following order (for IRIX systems):

	  1) XPMEM (NUMAlink - only available on partitioned systems)
	  2) GSN
	  3) MYRINET
	  4) TCP/IP

     The only supported interconnects on Linux systems are XPMEM and TCP/IP.

     MPI uses the first interconnect it can detect and configure correctly.
     There will only be one interconnect configured for the entire MPI job,
     with the exception of XPMEM.  If XPMEM is found on some hosts, but not on
     others, one additional interconnect is selected.

     The user can specify a mandatory interconnect to use by setting one of
     the following new environment variables.  These variables will be
     assessed in the following order:

	  1) MPI_USE_XPMEM
	  2) MPI_USE_GSN
	  3) MPI_USE_GM
	  4) MPI_USE_TCP

     For a mandatory interconnect to be used, all of the hosts on the mpirun
     command line must be connected via the device, and the interconnect must
     be configured properly.  If this is not the case, an error message is
     printed to stdout and the job is terminated.  XPMEM is an exception to
     this rule, however.

     If MPI_USE_XPMEM is set, one additional interconnect can be selected via
     the MPI_USE variables.  Messaging between the partitioned hosts will use
     the XPMEM driver while messaging between non-partitioned hosts will use
     the second interconnect.  If a second interconnect is required but not
     selected by the user, MPI will choose the interconnect to use, based on
     the default hierarchy.

     If the global -v verbose option is used on the mpirun command line, a
     message is printed to stdout, indicating which multi-host interconnect is
     being used for the job.
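
     For example, to require the GSN interconnect for a two-host job and
     report the selection at launch time (hostA and hostB are placeholder
     host names), you might enter the following:

          setenv MPI_USE_GSN
          mpirun -v hostA,hostB 8 a.out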

     The following interconnect selection environment variables have been
     deprecated in the MPT 1.6 release: MPI_GSN_ON, MPI_GM_ON, and
     MPI_BYPASS_OFF.  If any of these variables are set, MPI prints a warning
     message to stdout and ignores them.

   Using MPI-2 Process Creation and Management Routines
     This release provides support for MPI_Comm_spawn and
     MPI_Comm_spawn_multiple.  However, options must be specified as an
     argument on the mpirun command line or as an environment variable to
     enable this feature.  On IRIX, this feature is only supported for MPI
     jobs running within a single host running IRIX 6.5.2 or later.  Support
     on Linux is restricted to Altix numalinked systems.  Consult the mpirun
     man page for details on how to enable spawn support.
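
     For example, a spawn capable job might be launched as follows (a sketch
     only; the -up option and the universe size of 16 are illustrative, and
     the mpirun(1) man page is the authoritative reference for enabling spawn
     support):

          mpirun -up 16 -np 4 a.out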

ENVIRONMENT VARIABLES
     This section describes the variables that specify the environment under
     which your MPI programs will run. Unless otherwise specified, these
     variables are available for both Linux and IRIX systems.  Environment
     variables have predefined values.	You can change some variables to
     achieve particular performance objectives; others are required values for
     standard-compliant programs.

     MPI_ARRAY
	  Sets an alternative array name to be used for communicating with
	  Array Services when a job is being launched.

	  Default:  The default name set in the arrayd.conf file

     MPI_BAR_COUNTER (IRIX systems only)
	  Specifies the use of a simple counter barrier algorithm within the
	  MPI_Barrier(3) and MPI_Win_fence(3) functions.

	  Default:  Enabled for jobs using fewer than 64 MPI processes.

     MPI_BAR_DISSEM
          Specifies the use of a dissemination/butterfly algorithm within
	  the MPI_Barrier(3) and MPI_Win_fence(3) functions. This algorithm
	  has generally been found to provide the best performance.  By
	  default on IRIX systems this algorithm is used for MPI_COMM_WORLD
	  and congruent communicators.	Explicitly specifying this environment
	  variable also enables the use of this algorithm for other
	  communicators on both IRIX and Linux systems.

	  Default:  On IRIX systems enabled for MPI_COMM_WORLD for jobs using
	  more than 64 processes.  On Altix systems enabled by default for all
	  MPI communicators for all process counts.

     MPI_BAR_TREE
	  Specifies the use of a tree barrier within the MPI_Barrier(3) and
	  MPI_Win_fence(3) functions.	This variable can also be used to
          change the default arity (fan-in) of the tree barrier algorithm.
	  Typically this barrier is slower than the butterfly/dissemination
	  barrier.

	  Default:  Not enabled.  Default arity is 8 when enabled.

     MPI_BUFFER_MAX
	  Specifies a minimum message size, in bytes, for which the message
	  will be considered a candidate for single-copy transfer.

	  On IRIX, this mechanism is available only for communication between
	  MPI processes on the same host. The sender data must reside in
	  either the symmetric data, symmetric heap, or global heap. The MPI
	  data type on the send side must also be a contiguous type.

	  On IRIX, if the XPMEM driver is enabled (for single host jobs, see
	  MPI_XPMEM_ON and for multihost jobs, see MPI_USE_XPMEM), MPI allows
	  single-copy transfers for basic predefined MPI data types from any
	  sender data location, including the stack and private heap.  The
	  XPMEM driver also allows single-copy transfers across partitions.

	  On IRIX, if cross mapping of data segments is enabled at job
	  startup, data in common blocks will reside in the symmetric data
	  segment.  On systems running IRIX 6.5.2 or higher, this feature is
	  enabled by default. You can employ the symmetric heap by using the
	  shmalloc(shpalloc) functions available in LIBSMA.

	  On Linux, this feature is supported for both single host MPI jobs
	  and MPI jobs running across partitions. MPI uses the xpmem module to
	  map memory from one MPI process onto another during job startup.
	  The mapped areas include the static region, private heap, and stack
	  region.  Single-copy is supported for contiguous data types from any
	  of the mapped regions.

	  Memory mapping is enabled by default on Linux.  To disable it, set
	  the MPI_MEMMAP_OFF environment variable.  In addition, the xpmem
	  kernel module must be installed on your system for single-copy
	  transfers. The xpmem module is released with the OS.

	  Testing of this feature has indicated that most MPI applications
	  benefit more from buffering of medium-sized messages than from
	  buffering of large size messages, even though buffering of medium-
	  sized messages requires an extra copy of data.  However, highly
	  synchronized applications that perform large message transfers can
	  benefit from the single-copy pathway.

	  Single-copy can occur by default for certain MPI functions that
	  transfer large size messages.	 See MPI_DEFAULT_SINGLE_COPY_OFF for
	  more information and how to disable it.

	  Default:  Not enabled

     MPI_BUFS_PER_HOST
	  Determines the number of shared message buffers (16 KB each) that
	  MPI is to allocate for each host.  These buffers are used to send
	  long messages and interhost messages.

	  Default:  32 pages (1 page = 16KB)

     MPI_BUFS_PER_PROC
	  Determines the number of private message buffers (16 KB each) that
	  MPI is to allocate for each process.	These buffers are used to send
	  long messages and intrahost messages.

	  Default:  32 pages (1 page = 16KB)

     MPI_CHECK_ARGS
	  Enables checking of MPI function arguments. Segmentation faults
	  might occur if bad arguments are passed to MPI, so this is useful
	  for debugging purposes.  Using argument checking adds several
	  microseconds to latency.

	  Default:  Not enabled

     MPI_COMM_MAX
	  Sets the maximum number of communicators that can be used in an MPI
	  program.  Use this variable to increase internal default limits.
	  (Might be required by standard-compliant programs.)  MPI generates
	  an error message if this limit (or the default, if not set) is
	  exceeded.

	  Default:  256

     MPI_COREDUMP
	  Controls which ranks of an MPI job can dump core on receipt of a
	  core-dumping signal.	Valid values are NONE, FIRST, ALL, or INHIBIT.
	  NONE means that no rank should dump core.  FIRST means that the
	  first rank on each host to receive a core-dumping signal should dump
	  core. ALL means that all ranks should dump core if they receive a
	  core-dumping signal.	 INHIBIT disables MPI signal-handler
	  registration for core-dumping signals.
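
          For example, to have every rank that receives a core-dumping signal
          produce a core file (ALL is one of the values listed above), you
          might set the following:

               setenv MPI_COREDUMP ALL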

	  When MPI_Init() is called, the MPI library attempts to register a
	  signal handler for each signal for which reception causes a core
	  dump. If a signal handler was previously registered, MPI removes the
	  MPI registration and restores the other signal handler for that
	  signal. If no previously-registered handler is present, the MPI
	  handler is invoked if and when the rank receives a core-dumping
	  signal.

	  When the MPI signal handler is invoked, it displays a stack
	  traceback for the first rank entering the handler on each host, and
	  then consults MPI_COREDUMP to determine if a core dump should be
	  produced.

	  Note that process limits on core dump size interact with this
	  setting.  First a process decides to dump core or is inhibited from
	  dumping core based on the MPI_COREDUMP setting. Then "limit
	  coredump" applies to the resulting core dump file(s), if any.

	  Default: FIRST

     MPI_COREDUMP_DEBUGGER (Linux only)
	  This variable lets you optionally specify which debugger should be
	  used by MPT to display the stack traceback when your program
	  receives a core-dumping signal.  Set MPI_VERBOSE to have MPT display
	  the debugger command just before it executes it. If the environment
	  variable is not defined, MPT uses the idb debugger.

	  You can specify this variable in any of the following formats:

	       Format			Meaning

	       Basename of a debugger	If you specify idb or gdb, MPT uses
					that debugger, customizing the command
					line argument and debugger commands
					sent to the debugger, as appropriate.

					Note that the program you specify must
					be located in one of the directories
					specified by the PATH environment
					variable in the MPT job. This might be
					different from the PATH variable in
					your interactive sessions.  If you
					receive a message similar to sh: idb:
					command not found in the stack
					traceback, you can use the pathname to
					the debugger (described in the
					following format) to supply a full
					pathname instead.

	       Pathname to a debugger	If you specify a value that contains a
					/, but no spaces, MPT takes the value
					as the pathname to the debugger you
					wish to use. The final four characters
					of the value must be /idb or /gdb.
					Command-line arguments are not
					supplied to the debugger, but debugger
					commands are customized according to
					the debugger specified.	 If you need
					to specify command-line arguments to
					the debugger, use a complete command
					line (described in the following
					format).

	       Complete command line	If the value contains a space, it is
					taken as the complete command line to
                                        be passed to system(3).  Up to four
					occurrences of %d in the command line
					are replaced by the process ID of the
					process upon which the debugger should
					be run.	 You will need to arrange for
					debugger commands to be sent to the
                                        debugger.  The third and fourth
                                        examples below show samples of this.

	  Examples:  (There are four examples here, each of which must be
	  typed all on one line)

	  setenv MPI_COREDUMP_DEBUGGER gdb
	  setenv MPI_COREDUMP_DEBUGGER /my/test/version/of/idb
	  setenv MPI_COREDUMP_DEBUGGER "(echo print my_favorite_variable; echo where; echo quit) | gdb -p %d"
	  setenv MPI_COREDUMP_DEBUGGER '(echo set \$stoponattach = 1; echo attach %d /proc/%d/exe; echo where; echo quit) | /sw/com/intel-compilers/7.1.013/compiler70/ia64/bin/idb | sed -e "s/^/coredump: /"'

	  Default: idb

     MPI_COREDUMP_VERBOSE
	  Instructs mpirun(1) to print information about coredump control and
	  traceback handling.	Notably, a message will be printed if a user-
	  or library-registered signal handler overrides a signal handler
	  which the MPT library would otherwise have installed.	 Output is
	  sent to stderr.

	  Default: Not enabled

     MPI_DEFAULT_SINGLE_COPY_OFF
          Disables the single-copy optimization that is enabled by default.
          This optimization causes transfers of more than 2000 bytes that use
          MPI_Isend, MPI_Sendrecv, MPI_Alltoall, MPI_Bcast, MPI_Allreduce, and
          MPI_Reduce to use the single-copy mode.  Users of MPI_Send should
          continue to use the MPI_BUFFER_MAX environment variable to enable
          single-copy.

	  Default:  Not enabled

     MPI_DIR
	  Sets the working directory on a host. When an mpirun(1) command is
	  issued, the Array Services daemon on the local or distributed node
	  responds by creating a user session and starting the required MPI
	  processes. The user ID for the session is that of the user who
	  invokes mpirun, so this user must be listed in the .rhosts file on
	  the corresponding nodes. By default, the working directory for the
	  session is the user's $HOME directory on each node. You can direct
	  all nodes to a different directory (an NFS directory that is
	  available to all nodes, for example) by setting the MPI_DIR variable
	  to a different directory.

	  Default:  $HOME on the node. If using the -np option of mpirun(1),
	  the default is the current directory.

     MPI_DPLACE_INTEROP_OFF (IRIX systems only)
	  Disables an MPI/dplace interoperability feature available beginning
	  with IRIX 6.5.13.  By setting this variable, you can obtain the
	  behavior of MPI with dplace on older releases of IRIX.

	  Default:  Not enabled

     MPI_DSM_CPULIST
	  Specifies a list of CPUs on which to run an MPI application.	To
	  ensure that processes are linked to CPUs, this variable should be
	  used in conjunction with the MPI_DSM_MUSTRUN variable.

	  For an explanation of the syntax for this environment variable, see
	  the section titled "Using a CPU List."

     MPI_DSM_CPULIST_TYPE
	  Specifies the way in which MPI should interpret the CPU values given
	  by the MPI_DSM_CPULIST variable.  This variable can be set to the
	  following values:

	       Value	      Action

	       hwgraph	      This tells MPI to interpret the CPU numbers
			      designated by the MPI_DSM_CPULIST variable as
			      cpunum values as defined in the hardware
                              graph (see hwgraph(4)). This is the default
                              interpretation when running MPI outside of a
                              cpuset (see cpuset(4)).

	       cpuset	      This tells MPI to interpret the CPU numbers
			      designated by the MPI_DSM_CPULIST variable as
			      relative processors within a cpuset.  This is
			      the default interpretation of this list when MPI
			      is running within a cpuset.  Setting
			      MPI_DSM_CPULIST_TYPE to this value when not
			      running within a cpuset has no effect.

     MPI_DSM_DISTRIBUTE (Linux systems only)
	  Ensures that each MPI process gets a unique CPU and physical memory
	  on the node with which that CPU is associated.  Currently, the CPUs
	  are chosen by simply starting at relative CPU 0 and incrementing
	  until all MPI processes have been forked.  To choose specific CPUs,
	  use the MPI_DSM_CPULIST environment variable.	 This feature is most
	  useful if running on a dedicated system or running within a cpuset.
	  Some batch schedulers including LSF 5.1 will cause
	  MPI_DSM_DISTRIBUTE to be set automatically when using dynamic
	  cpusets.

	  Default:  Not enabled

     MPI_DSM_MUSTRUN
	  Enforces memory locality for MPI processes.  Use of this feature
	  ensures that each MPI process will get a CPU and physical memory on
	  the node to which it was originally assigned.	 This variable has
	  been observed to improve program performance on IRIX systems running
	  release 6.5.7 and earlier, when running a program on a quiet system.
          With later IRIX releases, under certain circumstances, setting this
          variable is not necessary. Internally, this feature directs the
	  library to use the process_cpulink(3) function instead of
	  process_mldlink(3) to control memory placement.

	  MPI_DSM_MUSTRUN should not be used when the job is submitted to
	  miser (see miser_submit(1)) because program hangs may result.

	  The process_cpulink(3) function is inherited across process fork(2)
	  or sproc(2).	For this reason, when using mixed MPI/OpenMP
	  applications, it is recommended either that this variable not be
	  set, or that _DSM_MUSTRUN also be set (see pe_environ(5)).

	  On Linux systems, this environment variable has been deprecated and
	  will be removed in a future release. Use the MPI_DSM_DISTRIBUTE
	  environment variable instead.

	  Default:  Not enabled

     MPI_DSM_OFF
	  Turns off nonuniform memory access (NUMA) optimization in the MPI
	  library.

	  Default:  Not enabled

     MPI_DSM_PLACEMENT (IRIX systems only)
	  Specifies the default placement policy to be used for the stack and
	  data segments of an MPI process.  Set this variable to one of the
	  following values:

	       Value	      Action

	       firsttouch     With this policy, IRIX attempts to satisfy
			      requests for new memory pages for stack, data,
			      and heap memory on the node where the requesting
			      process is currently scheduled.

	       fixed	      With this policy, IRIX attempts to satisfy
			      requests for new memory pages for stack, data,
			      and heap memory on the node associated with the
			      memory locality domain (mld) with which an MPI
			      process was linked at job startup. This is the
			      default policy for MPI processes.

	       roundrobin     With this policy, IRIX attempts to satisfy
			      requests for new memory pages in a round robin
			      fashion across all of the nodes associated with
			      the MPI job. It is generally not recommended to
			      use this setting.

	       threadroundrobin
			      This policy is intended for use with hybrid
                              MPI/OpenMP applications only. With this policy,
                              IRIX attempts to satisfy requests for new memory
			      pages for the MPI process stack, data, and heap
			      memory in a roundrobin fashion across the nodes
			      allocated to its OpenMP threads. This placement
			      option might be helpful for large OpenMP/MPI
			      process ratios.  For non-OpenMP applications,
			      this value is ignored.

	  Default:  fixed

     MPI_DSM_PPM
	  Sets the number of MPI processes per memory locality domain (mld).
	  For Origin 2000 systems, values of 1 or 2 are allowed. For Origin
	  3000 and Origin 300 systems, values of 1, 2, or 4 are allowed. On
	  Altix systems, values of 1 or 2 are allowed.

	  Default:  Origin 2000 systems, 2; Origin 3000 and Origin 300
	  systems, 4; Altix systems, 2.

     MPI_DSM_TOPOLOGY (IRIX systems only)
	  Specifies the shape of the set of hardware nodes on which the PE
	  memories are allocated.  Set this variable to one of the following
	  values:

	       Value	      Action

	       cube	      A group of memory nodes that form a perfect
			      hypercube.  The number of processes per host
			      must be a power of 2.  If a perfect hypercube is
			      unavailable, a less restrictive placement will
			      be used.

	       cube_fixed     A group of memory nodes that form a perfect
			      hypercube.  The number of processes per host
			      must be a power of 2.  If a perfect hypercube is
			      unavailable, the placement will fail, disabling
			      NUMA placement.

	       cpucluster     Any group of memory nodes.  The operating system
			      attempts to place the group numbers close to one
			      another, taking into account nodes with disabled
                              processors.  (Default for IRIX 6.5.11 and
                              higher).

	       free	      Any group of memory nodes.  The operating system
			      attempts to place the group numbers close to one
                              another.  (Default for IRIX 6.5.10 and earlier
                              releases).

     MPI_DSM_VERBOSE
	  Instructs mpirun(1) to print information about process placement for
          jobs running on nonuniform memory access (NUMA) machines (unless
          MPI_DSM_OFF is also set). Output is sent to stderr.

	  Default:  Not enabled

     MPI_DSM_VERIFY (IRIX systems only)
	  Instructs mpirun(1) to run some diagnostic checks on proper memory
	  placement of MPI data structures at job startup. If errors are
	  found, a diagnostic message is printed to stderr.

	  Default:  Not enabled

     MPI_GM_DEVS (IRIX systems only)
          Sets the order for opening GM (Myrinet) adapters. The list of
          devices does not need to be space-delimited (0321 is valid).  In
          this release, a maximum of 8 adapters is supported on a single host.

          Default:  MPI will use all available GM (Myrinet) devices.

     MPI_GM_VERBOSE
	  Setting this variable allows some diagnostic information concerning
	  messaging between processes using GM (Myrinet) to be displayed on
	  stderr.

	  Default:  Not enabled

     MPI_GROUP_MAX
	  Determines the maximum number of groups that can simultaneously
	  exist for any single MPI process.  Use this variable to increase
	  internal default limits. (This variable might be required by
	  standard-compliant programs.)	 MPI generates an error message if
	  this limit (or the default, if not set) is exceeded.

	  Default:  32

     MPI_GSN_DEVS (IRIX 6.5.12 systems or later)
	  Sets the order for opening GSN adapters. The list of devices does
	  not need to be quoted or space-delimited (0123 is valid).

	  Default:  MPI will use all available GSN devices

     MPI_GSN_VERBOSE (IRIX 6.5.12 systems or later)
	  Allows additional MPI initialization information to be printed in
	  the standard output stream. This information contains details about
	  the GSN (ST protocol) OS bypass connections and the GSN adapters
	  that are detected on each of the hosts.

	  Default:  Not enabled

     MPI_MAPPED_HEAP_SIZE (Linux systems only)
	  Sets the new size (in bytes) for the amount of heap that is memory
	  mapped per MPI process.  The default size of the mapped heap is the
          physical memory available per CPU less the static region size. For
          more information regarding memory mapping, see MPI_MEMMAP_OFF.

	  Default:  The physical memory available per CPU less the static
	  region size

     MPI_MAPPED_STACK_SIZE (Linux systems only)
	  Sets the new size (in bytes) for the amount of stack that is memory
	  mapped per MPI process.  The default size of the mapped stack is the
	  stack limit size. If the stack is unlimited, the mapped region is
	  set to the physical memory available per CPU. For more information
	  regarding memory mapping, see MPI_MEMMAP_OFF.

	  Default:  The stack limit size

     MPI_MEMMAP_OFF (Linux systems only)
	  Turns off the memory mapping feature.

	  The memory mapping feature provides support for single-copy
	  transfers and MPI-2 one-sided communication on Linux.	 These
	  features are supported for single host MPI jobs and MPI jobs that
	  span partitions. At job startup, MPI uses the xpmem module to map
	  memory from one MPI process onto another.  The mapped areas include
	  the static region, private heap, and stack.

	  Memory mapping is enabled by default on Linux.  To disable it, set
	  the MPI_MEMMAP_OFF environment variable.

	  For memory mapping, the xpmem kernel module must be installed on
	  your system.	The xpmem module is released with the OS.

	  Default: Not enabled

     MPI_MEMMAP_VERBOSE (Linux systems only)
	  Allows MPI to display additional information regarding the memory
	  mapping initialization sequence.  Output is sent to stderr.

	  Default: Not enabled

     MPI_MSG_RETRIES
	  Specifies the number of times the MPI library will try to get a
	  message header, if none are available.  Each MPI message that is
	  sent requires an initial message header.  If one is not available
	  after MPI_MSG_RETRIES, the job will abort.

	  Note that this variable no longer applies to processes on the same
	  host, or when using the GM (Myrinet) protocol. In these cases,
	  message headers are allocated dynamically on an as-needed basis.

	  Default:  500

     MPI_MSGS_MAX
	  This variable can be set to control the total number of message
	  headers that can be allocated.  This allocation applies to messages
	  exchanged between processes on a single host, or between processes
          on different hosts when using the GM (Myrinet) OS bypass protocol.
	  Note that the initial allocation of memory for message headers is
	  128 Kbytes.

	  Default:  Allow up to 64 Mbytes to be allocated for message headers.
	  If you set this variable, specify the maximum number of message
	  headers.

     MPI_MSGS_PER_HOST
	  Sets the number of message headers to allocate for MPI messages on
	  each MPI host. Space for messages that are destined for a process on
	  a different host is allocated as shared memory on the host on which
	  the sending processes are located. MPI locks these pages in memory.
	  Use the MPI_MSGS_PER_HOST variable to allocate buffer space for
	  interhost messages.

	  Caution:  If you set the memory pool for interhost packets to a
	  large value, you can cause allocation of so much locked memory that
	  total system performance is degraded.

	  The previous description does not apply to processes that use the
          GM (Myrinet) OS bypass protocol. In this case, message headers are
	  allocated dynamically as needed. See the MPI_MSGS_MAX variable
	  description.

	  Default:  1024 messages

     MPI_MSGS_PER_PROC
	  This variable is effectively obsolete. Message headers are now
	  allocated on an as needed basis for messaging either between
	  processes on the same host, or between processes on different hosts
	  when using the GM (Myrinet) OS bypass protocol.  The new
	  MPI_MSGS_MAX variable can be used to control the total number of
	  message headers that can be allocated.

	  Default:  1024

     MPI_NAP
	  This variable affects the way in which ranks wait for events to
	  occur. For example, when a receive is issued for which there are as
	  yet no matching sends, the receiving rank awaits the matching send
	  issued event.

	  When MPI_NAP is not defined (that is, unsetenv MPI_NAP), the library
	  spins in a tight loop when awaiting events. While this provides the
	  best possible response time when the event occurs, each waiting rank
	  uses CPU time at wall-clock rates until then.	 Leaving MPI_NAP
          undefined is best if sends and matching receives occur nearly
          simultaneously.

	  If defined with no value (that is, setenv MPI_NAP), the library
	  makes a system call while waiting, which might yield the CPU to
	  another eligible process that can use it.  If no such process
	  exists, the rank receives control back nearly immediately, and CPU
	  time accrues at near wall-clock rates.  If another process does
	  exist, it is given some CPU time, after which the MPI rank is again
	  given the CPU to test for the event.	This is best if the system is
	  oversubscribed (there are more processes ready to run than there are
	  CPUs).  This option was previously available in MPT, but was not
	  documented.

	  If defined with a positive integer value (for example, setenv
	  MPI_NAP 10), the rank sleeps for that many milliseconds before again
	  testing to determine if an event has occurred.  This dramatically
	  reduces the CPU time that is charged against the rank, and might
	  increase the system's "idle" time.  This setting is best if there is
	  usually a significant time difference between the times that sends
	  and matching receives are posted.

	  Default:  Not applicable - one of the cases above always applies.

     MPI_OPENMP_INTEROP
	  Setting this variable modifies the placement of MPI processes to
          better accommodate the OpenMP threads associated with each process.
	  For more information, see the section titled Using MPI with OpenMP.

	  NOTE: This option is available only on Origin 300 and Origin 3000
	  servers and Altix systems.

	  Default:  Not enabled

     MPI_REQUEST_MAX
	  Determines the maximum number of nonblocking sends and receives that
	  can simultaneously exist for any single MPI process.	Use this
	  variable to increase internal default limits.	 (This variable might
	  be required by standard-compliant programs.)	MPI generates an error
	  message if this limit (or the default, if not set) is exceeded.

	  Default:  16384

     MPI_SHARED_VERBOSE
	  Setting this variable allows for some diagnostic information
	  concerning messaging within a host to be displayed on stderr.

	  Default:  Not enabled

     MPI_SIGTRAP  (Linux systems only)
	  Specifies if MPT's signal handler should override any existing
	  signal handlers for signals SIGSEGV, SIGQUIT, SIGILL, SIGABRT,
          SIGBUS, and SIGFPE.  If set to ON, the MPT signal handler will
          override any pre-existing signal handler for these signals.  If
	  OFF, then the existing signal handlers will remain in effect.

	  These signals are sometimes handled by compiler-language-specific
	  runtime libraries.  In some cases, the signal handler in the runtime
	  library makes inappropriate references to memory-mapped fetchop
	  areas, which may result in a system panic.  This has been observed
	  with Intel's efc 7.x compilers.

	  Default:  ON	(This may change in future releases.)

     MPI_SIGTRAP_VERBOSE (Linux systems only)
	  If set, MPT will display the value of the MPI_SIGTRAP environment
	  variable, and messages about the actions taken if MPT overrides a
	  pre-existing signal handler.	See also MPI_COREDUMP_VERBOSE.

	  Default:  Not enabled

     MPI_SLAVE_DEBUG_ATTACH
	  Specifies the MPI process to be debugged. If you set
	  MPI_SLAVE_DEBUG_ATTACH to N, the MPI process with rank N prints a
	  message during program startup, describing how to attach to it from
	  another window using the dbx debugger on IRIX or the gdb or idb
	  debugger on Linux.  The message includes the number of seconds you
	  have to attach the debugger to process N.  If you fail to attach
	  before the time expires, the process continues.
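
          For example, to have the rank 0 process pause at startup and print
          attach instructions (rank 0 is an arbitrary choice), you might enter
          the following:

               setenv MPI_SLAVE_DEBUG_ATTACH 0
               mpirun -np 4 a.out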

     MPI_STATIC_NO_MAP (IRIX systems only)
	  Disables cross mapping of static memory between MPI processes.  This
	  variable can be set to reduce the significant MPI job startup and
	  shutdown time that can be observed for jobs involving more than 512
	  processors on a single IRIX host.  Note that setting this shell
	  variable disables certain internal MPI optimizations and also
	  restricts the usage of MPI-2 one-sided functions.  For more
	  information, see the MPI_Win man page.

	  Default:  Not enabled

     MPI_STATS
	  Enables printing of MPI internal statistics.	Each MPI process
	  prints statistics about the amount of data sent with MPI calls
	  during the MPI_Finalize process.  Data is sent to stderr.  To prefix
	  the statistics messages with the MPI rank, use the -p option on the
	  mpirun command. For additional information, see the MPI_SGI_stats
	  man page.

	  NOTE: Because the statistics-collection code is not thread-safe,
	  this variable should not be set if the program uses threads.

	  Default:  Not enabled

     MPI_TYPE_DEPTH
	  Sets the maximum number of nesting levels for derived data types.
	  (Might be required by standard-compliant programs.) The
	  MPI_TYPE_DEPTH variable limits the maximum depth of derived data
	  types that an application can create.	 MPI generates an error
	  message if this limit (or the default, if not set) is exceeded.

	  Default:  8 levels

     MPI_TYPE_MAX
	  Determines the maximum number of data types that can simultaneously
	  exist for any single MPI process. Use this variable to increase
	  internal default limits.  (This variable might be required by
	  standard-compliant programs.)	 MPI generates an error message if
	  this limit (or the default, if not set) is exceeded.

	  Default:  1024

     MPI_UNBUFFERED_STDIO
	  Normally, mpirun line-buffers output received from the MPI processes
	  on both the stdout and stderr standard IO streams.  This prevents
	  lines of text from different processes from possibly being merged
	  into one line, and allows use of the mpirun -prefix option.

	  Of course, there is a limit to the amount of buffer space that
	  mpirun has available (currently, about 8,100 characters can appear
	  between new line characters per stream per process).	If more
	  characters are emitted before a new line character, the MPI program
	  will abort with an error message.

	  Setting the MPI_UNBUFFERED_STDIO environment variable disables this
	  buffering.  This is useful, for example, when a program's rank 0
	  emits a series of periods over time to indicate progress of the
	  program. With buffering, the entire line of periods will be output
	  only when the new line character is seen.  Without buffering, each
	  period will be immediately displayed as soon as mpirun receives it
	  from the MPI program.	  (Note that the MPI program still needs to
	  call fflush(3) or FLUSH(101) to flush the stdout buffer from the
	  application code.)

	  Additionally, setting MPI_UNBUFFERED_STDIO allows an MPI program
	  that emits very long output lines to execute correctly.

	  NOTE: If MPI_UNBUFFERED_STDIO is set, the mpirun -prefix option is
	  ignored.

	  Default:  Not set

     MPI_UNIVERSE  (Linux systems only)
	  When running MPI applications on partitioned Altix systems which use
	  the MPI_Comm_spawn and MPI_Comm_spawn_multiple functions, it may be
          necessary to explicitly specify the partitions on which additional
          MPI processes may be launched.  The MPI_UNIVERSE environment
	  variable may be used for this purpose.

	  For more information, see the section titled "Launching Spawn
	  Capable Jobs on Altix Partitioned Systems" from the mpirun man page.

	  Default:  Not set

     MPI_UNIVERSE_SIZE	(Linux systems only)
	  When running MPI applications on partitioned Altix systems which use
          the MPI_Comm_spawn and MPI_Comm_spawn_multiple functions, users can
	  now specify MPI_UNIVERSE_SIZE instead of using the -up option on the
	  mpirun command.

	  For more information, see the section titled "Launching Spawn
	  Capable Jobs on Altix Partitioned Systems" from the mpirun man page.

	  Default:  Not set

     MPI_USE_GM	 (IRIX systems only)
	  Requires the MPI library to use the Myrinet (GM protocol) OS bypass
	  driver as the interconnect when running across multiple hosts or
	  running with multiple binaries.  If a GM connection cannot be
	  established among all hosts in the MPI job, the job is terminated.

	  For more information, see the section titled "Default Interconnect
	  Selection."

	  Default:  Not set

     MPI_USE_GSN (IRIX 6.5.12 systems or later)
	  Requires the MPI library to use the GSN (ST protocol) OS bypass
	  driver as the interconnect when running across multiple hosts or
	  running with multiple binaries.  If a GSN connection cannot be
	  established among all hosts in the MPI job, the job is terminated.

	  GSN imposes a limit of one MPI process using GSN per CPU on a
	  system. For example, on a 128-CPU system, you can run multiple MPI
	  jobs, as long as the total number of MPI processes using the GSN
	  bypass does not exceed 128.

	  Once the maximum allowed MPI processes using GSN is reached,
	  subsequent MPI jobs return an error to the user output, as in the
	  following example:

		 MPI: Could not connect all processes to GSN adapters. The maximum
		      number of GSN adapter connections per system is normally equal
		      to the number of CPUs on the system.

	  If there are a few CPUs still available, but not enough to satisfy
          the entire MPI job, the error will still be issued and the MPI job
          terminated.

	  For more information, see the section titled "Default Interconnect
	  Selection."

	  Default:  Not set

     MPI_USE_TCP
	  Requires the MPI library to use the TCP/IP driver as the
	  interconnect when running across multiple hosts or running with
	  multiple binaries.

	  For more information, see the section titled "Default Interconnect
	  Selection."

	  Default:  Not set

     MPI_USE_XPMEM  (IRIX 6.5.13 systems or later and Linux systems)
	  Requires the MPI library to use the XPMEM driver as the interconnect
	  when running across multiple hosts or running with multiple
	  binaries.  This driver allows MPI processes running on one partition
	  to communicate with MPI processes on a different partition via the
	  NUMAlink network.  The NUMAlink network is powered by block transfer
	  engines (BTEs).  BTE data transfers do not require processor
	  resources.

	  For IRIX, the XPMEM (cross partition) device driver is available
	  only on Origin 3000 and Origin 300 systems running IRIX 6.5.13 or
	  greater.

	  NOTE: Due to possible MPI program hangs, you should not run MPI
	  across partitions using the XPMEM driver on IRIX versions 6.5.13,
	  6.5.14, or 6.5.15.  This problem has been resolved in IRIX version
	  6.5.16.

	  For Linux, the XPMEM device driver requires the xpmem kernel module
	  to be installed.  The xpmem module is released with the OS.

          If not all of the hosts specified on the mpirun command reside in
	  the same partitioned system, you can select one additional
	  interconnect via the MPI_USE variables.  MPI communication between
	  partitions will go through the XPMEM driver, and communication
	  between non-partitioned hosts will go through the second
	  interconnect.

	  For more information, see the section titled "Default Interconnect
	  Selection."

	  Default:  Not set

     MPI_XPMEM_ON (IRIX 6.5.15 systems or later)
	  Enables the XPMEM single-copy enhancements for processes residing on
	  the same host.

	  The XPMEM enhancements allow single-copy transfers for basic
	  predefined MPI data types from any sender data location, including
	  the stack and private heap.  Without enabling XPMEM, single-copy is
	  allowed only from data residing in the symmetric data, symmetric
	  heap, or global heap.

	  Both the MPI_XPMEM_ON and MPI_BUFFER_MAX variables must be set to
	  enable these enhancements.  Both are disabled by default.
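
          For example, the following settings enable the enhancements and make
          messages larger than 16384 bytes candidates for single-copy transfer
          (the threshold shown is an arbitrary choice):

               setenv MPI_XPMEM_ON
               setenv MPI_BUFFER_MAX 16384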

	  If the following additional conditions are met, the block transfer
	  engine (BTE) is invoked instead of bcopy, to provide increased
	  bandwidth:

	       *   Send and receive buffers are cache-aligned.

	       *   Amount of data to transfer is greater than or equal to the
		   MPI_XPMEM_THRESHOLD value.

	  NOTE:	 The XPMEM driver does not support checkpoint/restart at this
	  time. If you enable these XPMEM enhancements, you will not be able
	  to checkpoint and restart your MPI job.

          The XPMEM single-copy enhancements require Origin 3000 or Origin 300
          servers running IRIX release 6.5.15 or greater.

	  Default: Not set

     MPI_XPMEM_THRESHOLD (IRIX 6.5.15 systems or later)
	  Specifies a minimum message size, in bytes, for which single-copy
	  messages between processes residing on the same host will be
	  transferred via the BTE, instead of bcopy.  The following conditions
	  must exist before the BTE transfer is invoked:

	       *   Single-copy mode is enabled (MPI_BUFFER_MAX).

	       *   XPMEM single-copy enhancements are enabled (MPI_XPMEM_ON).

	       *   Send and receive buffers are cache-aligned.

	       *   Amount of data to transfer is greater than or equal to the
		   MPI_XPMEM_THRESHOLD value.

	  Default: 8192

     MPI_XPMEM_VERBOSE
	  Setting this variable allows additional MPI diagnostic information
	  to be printed in the standard output stream. This information
	  contains details about the XPMEM connections.

	  Default:  Not enabled

     PAGESIZE_DATA (IRIX systems only)
	  Specifies the desired page size in kilobytes for program data areas.
	  On Origin series systems, supported values include 16, 64, 256,
          1024, and 4096.  Specified values must be integers.

          NOTE:  Setting MPI_DSM_OFF disables the ability to set the data page
          size via this shell variable.

	  Default:  Not enabled

     PAGESIZE_STACK (IRIX systems only)
	  Specifies the desired page size in kilobytes for program stack
	  areas.  On Origin series systems, supported values include 16, 64,
          256, 1024, and 4096.  Specified values must be integers.

          NOTE:  Setting MPI_DSM_OFF disables the ability to set the stack
          page size via this shell variable.

	  Default:  Not enabled

     SMA_GLOBAL_ALLOC (IRIX systems only)
	  Activates the LIBSMA based global heap facility.  This variable is
	  used by 64-bit MPI applications for certain internal optimizations,
	  as well as support for the MPI_Alloc_mem function. For additional
	  details, see the intro_shmem(3) man page.

	  Default: Not enabled

     SMA_GLOBAL_HEAP_SIZE (IRIX systems only)
	  For 64-bit applications, specifies the per process size of the
	  LIBSMA global heap in bytes.

	  Default: 33554432 bytes

   Using a CPU List
     You can manually select CPUs to use for an MPI application by setting the
     MPI_DSM_CPULIST shell variable.  This setting is treated as a comma
     and/or hyphen delineated ordered list, specifying a mapping of MPI
     processes to CPUs.	 If running across multiple hosts or when using
     multiple executables, the per host and per executable components of the
     CPU list are delineated by colons. The shepherd process(es) and mpirun
     are not included in this list.  This feature is not compatible with job
     migration features available in IRIX.
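
     For example, the 8,16,32 entry in the first table below corresponds to a
     job launched as follows:

          setenv MPI_DSM_CPULIST 8,16,32
          mpirun -np 3 a.out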

     Examples when launching an MPI job with the following syntax:

		    mpirun -np 3 a.out

	       Value	      CPU Assignment

	       8,16,32	      Place three MPI processes on CPUs 8, 16, and 32.

	       32,16,8	      Place the MPI process rank zero on CPU 32, one
			      on 16, and two on CPU 8.

     Examples when launching an MPI job with the following syntax:

		    mpirun -np 16 a.out

	       Value	      CPU Assignment

	       8-15,32-39     Place the MPI processes 0 through 7 on CPUs 8 to
			      15.  Place the MPI processes 8 through 15 on
			      CPUs 32 to 39.

	       39-32,8-15     Place the MPI processes 0 through 7 on CPUs 39
			      to 32.  Place the MPI processes 8 through 15 on
			      CPUs 8 to 15.

     Example when launching an MPI job with the following syntax:

		    mpirun host1,host2 8 a.out

	       Value	      CPU Assignment

	       8-15:16-23     Place the MPI processes 0 through 7 on the first
			      host on CPUs 8 through 15.  Place MPI processes
			      8 through 15 on CPUs 16 to 23 on the second
			      host.

     Example when launching an MPI job with the following syntax:

		    mpirun host1,host2 8 a.out : host2 8 b.out

	       Value	      CPU Assignment

	       8-15:16-23:28-35
			      Place the MPI processes 0 through 7 running
			      application a.out on the first host on CPUs 8
			      through 15.  Place MPI processes 8 through 15
			      running a.out on CPUs 16 to 23 on the second
			      host.   Place MPI processes 16 to 23 running
			      b.out on CPUS 28 to 35 on the second host.

     Note that the process rank is the MPI_COMM_WORLD rank.  The
     interpretation of the CPU values specified in the MPI_DSM_CPULIST depends
     on whether the MPI job is being run within a cpuset.  If the job is run
     outside of a cpuset, the CPUs specify cpunum values given in the hardware
     graph (hwgraph(4)).  When running within a cpuset, the default behavior
     is to interpret the CPU values as relative processor numbers within the
     cpuset.  To specify cpunum values instead, you can use the
     MPI_DSM_CPULIST_TYPE shell variable.

     On Linux systems, the CPU values are always treated as relative processor
     numbers within the cpuset.	 It is assumed that the system will always
     have a default (unnamed) cpuset consisting of the entire system of
     available processors and nodes.

     The number of processors specified should equal the number of MPI
     processes (excluding the shepherd process) that will be used.  The number
     of colon delineated parts of the list must equal the number of hosts or
     executables used for the MPI job. If an error occurs in processing the
     CPU list, the default placement policy is used.  If the number of
     specified processors is smaller than the total number of MPI processes,
     only a subset of the MPI processes will be placed on the specified
     processors.  For example, if four processors are specified using the
     MPI_DSM_CPULIST variable, but five MPI processes are started, the last
     MPI process will not be attached to a processor.

     This feature should not be used with MPI jobs running in spawn capable
     mode.

   Using MPI with OpenMP
     Hybrid MPI/OpenMP applications might require special memory placement
     features to operate efficiently on ccNUMA Origin and Altix servers.  A
     method for realizing this memory placement is available.  The basic idea
     is to space out the MPI processes to accommodate the OpenMP threads
     associated with each MPI process.  In addition, assuming a particular
     ordering of library init code (see the DSO(5) man page), procedures are
     employed to ensure that the OpenMP threads remain close to the parent MPI
     process. This type of placement has been found to improve the performance
     of some hybrid applications significantly when more than four OpenMP
     threads are used by each MPI process.

     To take partial advantage of this placement option, the following
     requirements must be met:

	       *   The user must set the MPI_OPENMP_INTEROP shell variable
		   when running the application.

	       *   On IRIX systems, the user must use a MIPSpro compiler and
		   the -mp option to compile the application.  This placement
		   option is not available with other compilers.

	       *   The user must run the application on an Origin 300, Origin
		   3000, or Altix series server.

     To take full advantage of this placement option on IRIX systems, the user
     must be able to link the application such that the libmpi.so init code is
     run before the libmp.so init code.	 This is done by linking the
     MPI/OpenMP application as follows:

	  cc -64 -mp compute_mp.c -lmp -lmpi
	  f77 -64 -mp compute_mp.f -lmp -lmpi
	  f90 -64 -mp compute_mp.f -lmp -lmpi
	  CC -64 -mp compute_mp.C -lmp -lmpi++ -lmpi

     This linkage order ensures that the libmpi.so init runs procedures for
     restricting the placement of OpenMP threads before the libmp.so init is
     run.  Note that this is not the default linkage if only the -mp option is
     specified on the link line.

     On IRIX systems, you can use an additional memory placement feature for
     hybrid MPI/OpenMP applications by using the MPI_DSM_PLACEMENT shell
     variable. Specification of a threadroundrobin policy results in the
     parent MPI process stack, data, and heap memory segments being spread
     across the nodes on which the child OpenMP threads are running.  For more
     information, see the ENVIRONMENT VARIABLES section of this man page.
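
     For example, a hybrid job might request this policy as follows:

          setenv MPI_DSM_PLACEMENT threadroundrobin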

     MPI reserves nodes for this hybrid placement model based on the number of
     MPI processes and the number of OpenMP threads per process, rounded up to
     the nearest multiple of 4 on IRIX systems and 2 on Altix systems.	For
     instance, on IRIX systems, if 6 OpenMP threads per MPI process are going
     to be used for a 4 MPI process job, MPI will request a placement for 32
     (4 X 8) CPUs on the host machine.	You should take this into account when
     requesting resources in a batch environment or when using cpusets.	 In
     this implementation, it is assumed that all MPI processes start with the
     same number of OpenMP threads, as specified by the OMP_NUM_THREADS or
     equivalent shell variable at job startup.
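
     For example, the 4-process, 6-thread job described above might be
     launched as follows (a.out is a placeholder for an application built with
     the -mp option):

          setenv MPI_OPENMP_INTEROP
          setenv OMP_NUM_THREADS 6
          mpirun -np 4 a.out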

     NOTE:  This placement is not recommended when setting _DSM_PPM to a
     non-default value (for more information, see pe_environ(5)).  This
     placement is also not recommended when running on a host with partially
     populated nodes.  Also, on IRIX systems, if you are using
     MPI_DSM_MUSTRUN, it is important to also set _DSM_MUSTRUN to properly
     schedule the OpenMP threads.

     On Linux systems, the OpenMP threads are not actually pinned to specific
     CPUs but are limited to the set of CPUs near the MPI rank.	 Actual
     pinning of the threads will be supported in a future release.

SEE ALSO
     mpirun(1), shmem_intro(1)

     arrayd(1M)

     MPI_Buffer_attach(3), MPI_Buffer_detach(3), MPI_Init(3), MPI_IO(3)

     arrayd.conf(4)

     array_services(5)

     For more information about using MPI, including optimization, see the
     Message Passing Toolkit: MPI Programmer's Manual. You can access this
     manual online at http://techpubs.sgi.com.

     Man pages exist for every MPI subroutine and function, as well as for the
     mpirun(1) command.	 Additional online information is available at
     http://www.mcs.anl.gov/mpi, including a hypertext version of the
     standard, information on other libraries that use MPI, and pointers to
     other MPI resources.
