prof_intro man page on OSF1

prof_intro(1)							 prof_intro(1)

NAME
       prof_intro  -  Introduction  to application profilers, profiling, opti‐
       mization, and performance analysis

DESCRIPTION
       Tru64 UNIX supports four approaches to performance improvement:

       Automatic and profile-directed optimizations. For example:

           pixie -update a.out data/*
           cc -non_shared -O3 -spike -feedback a.out *.c

       Manual design and code optimizations. For example:

           hiprof -all -display program data/* | more
           hiprof -flat -all -display program data/* | more
           uprofile -heavy program data/* | more

       Minimizing system-resource usage. For example:

           third -display program data/* | more

       Verifying significance of test cases. For example:

           pixie -testcoverage program data/* | more

       One approach might be enough, but combining approaches can help when
       no single approach addresses all aspects of a program's performance.
       The following sections describe each approach and the tools that
       Tru64 UNIX provides to support it.

AUTOMATIC AND PROFILE-DIRECTED OPTIMIZATIONS
   Techniques
       Automatic   and	 profile-directed   optimizations   are	 the  simplest
       approaches to improving application performance.

       Some degree of automatic optimization can be achieved by using the com‐
       piler's and linker's optimization options. These can help in the gener‐
       ation of minimal instruction sequences that make best use  of  the  CPU
       architecture and cache memory.

       However,	 the  compiler	and  linker can improve their optimizations if
       they are given information on  which  instructions  are	executed  most
       often  when  the program is run with its normal input data and environ‐
       ment. While the default optimizations  give  improved  performance  for
       most  common  situations, the optimizers can do even better if they can
       tune the program in favor of the heavily used instruction sequences  as
       determined from a sample run.

       Tru64  UNIX  helps  you provide the optimizers with this information on
       processing hot-spots by allowing a profiler's results to	 be  fed  back
       into  a	recompilation.	This customized, profile-directed optimization
       can be used in conjunction with automatic optimization.

   Tools and Examples
       The cc compiler command's automatic optimization options	 are  selected
       with  -O,  -fast, -inline, -spike, and other related options. See cc(1)
       for details and Chapter 10 of the Programmer's Guide for more  informa‐
       tion on the many options and tradeoffs available.

       For example, this command selects a high degree of optimization in both
       the compiler and the linker:

           cc -non_shared -O3 -spike *.c

       The pixie profiler provides profile information that the	 cc  command's
       -feedback  and -spike options can use to tune the generated instruction
       sequences to the demands placed on the program by  particular  sets  of
       input data.

       The steps, shown in the following example, consist of (1) preparing the
       program for profile-directed optimization, (2) creating an instrumented
       version of the program and running it to collect profiling statistics,
       and (3) feeding that information back to the compiler and linker to
       help them optimize the executable code:

           rm -f program
           cc -non_shared -feedback program -o program -O3 *.c
           pixie -update program
           cc -non_shared -feedback program -o program -O3 -spike *.c

       To apply profile-directed optimizations to shared libraries, generate
       profile data with an exerciser program, and store it in the shared
       library prior to recompiling with that feedback. For example:

           rm -f libexample.so
           cc -feedback libexample.so -o libexample.so -shared -O3 lib*.c
           cc -o exerciser exerciser.c -L. -lexample
           pixie -L. -incobj libexample.so -run exerciser
           prof -pixie -update libexample.so exerciser.Counts
           cc -spike -feedback libexample.so -o libexample.so -shared -O3 lib*.c

MANUAL DESIGN AND CODE OPTIMIZATIONS
   Techniques
       The effectiveness of the automatic optimizations described previously
       is limited by the efficiency of the algorithms that the program uses. A
       program's performance can be further improved by manually optimizing
       its algorithms and data structures. Such optimizations may include
       reducing complexity from N-squared to log-N, avoiding copying of data,
       and reducing the amount of data used. They may also extend to tuning
       the algorithm to the architecture of the particular machine it will
       run on. For example, a program can process large arrays in small
       blocks so that each block remains in the data cache for all
       processing, instead of reading the whole array into the cache for each
       processing phase.
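
       The blocking idea can be sketched in C as follows. This is only an
       illustration: the two processing phases (scale, then offset), the
       function names, and the block parameter are all assumptions made for
       the example, and a suitable block size depends on the data cache of
       the machine in question.

```c
#include <assert.h>
#include <stddef.h>

/* Phase-at-a-time version: each phase sweeps the whole array, so a large
 * array is pulled through the data cache once per phase. */
void process_by_phase(double *a, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        a[i] = a[i] * 2.0;          /* phase 1: scale */
    for (i = 0; i < n; i++)
        a[i] = a[i] + 1.0;          /* phase 2: offset */
}

/* Blocked version: both phases run on one small block before moving on,
 * so a block that fits in the cache is fetched from memory only once.
 * block is a hypothetical tuning parameter (elements per block). */
void process_by_block(double *a, size_t n, size_t block)
{
    size_t start, end, i;

    assert(block > 0);
    for (start = 0; start < n; start = end) {
        end = (n - start < block) ? n : start + block;
        for (i = start; i < end; i++)
            a[i] = a[i] * 2.0;      /* phase 1 on this block */
        for (i = start; i < end; i++)
            a[i] = a[i] + 1.0;      /* phase 2 on this block */
    }
}
```

       Both functions compute the same result; only the order of memory
       accesses differs. Whether the blocked form is actually faster for a
       given array size is something a cache-miss profile can confirm.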

       Tru64 UNIX supports manual optimization with its profiling tools, which
       identify the parts of the application that use the most CPU resources
       (CPU cycles, cache misses, and so on). By evaluating different
       profiles of a program, you can find those parts and then redesign or
       recode their algorithms to use fewer resources. The profiles also make
       this exercise more cost-effective by helping you to focus on the most
       demanding code instead of the least demanding code.

   Tools and Examples
   (a) CPU-Time Profiling with Call-Graph

       A call-graph profile shows how much CPU time is used by each procedure,
       and how much is used by all of the other procedures that it calls. This
       can show which phases or subsystems in a	 program  spend	 most  of  the
       total  CPU  time,  which can help in gaining a general understanding of
       the program's performance.
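
       The two figures a call-graph profile reports for each procedure (its
       own time, and its own time plus that of everything it calls) can be
       illustrated with a small C model. The struct, the field names, and the
       assumption of a tree-shaped call graph (no recursion, no procedure
       shared between callers) are simplifications made for this sketch; they
       do not describe how hiprof or gprof store their data.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CALLEES 4

/* One node of a hypothetical, tree-shaped call graph. */
struct proc {
    const char *name;
    double self_ms;                          /* time in the procedure itself */
    const struct proc *callees[MAX_CALLEES]; /* procedures it calls */
    size_t ncallees;
};

/* Time charged to a procedure plus the time of everything it calls. */
double total_ms(const struct proc *p)
{
    double sum;
    size_t i;

    assert(p != NULL && p->ncallees <= MAX_CALLEES);
    sum = p->self_ms;
    for (i = 0; i < p->ncallees; i++)
        sum += total_ms(p->callees[i]);
    return sum;
}
```

       A procedure with a small self time but a large total is a phase or
       subsystem worth examining further down the call graph.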

       The hiprof profiler instruments the program and records a call graph
       while the instrumented program executes. The hiprof profiler does not
       require that the program be compiled in any particular way, but the
       names of local (for example, static) procedures will be hidden if the
       cc command's default -g0 option was used, and procedures will be hidden
       if they are inlined. For example:

           cc -g1 -O2 -o program *.c
           hiprof -all -display program data/* | more

       By default, hiprof uses a low-frequency sampling technique. It can pro‐
       file  all  of  the code executed by the program, including all selected
       libraries, though its call graph excludes procedures in threads-related
       system libraries. It can also provide detailed profiles at the level of
       source lines or machine instructions.

       For non-threaded programs, hiprof can alternatively count the number of
       machine cycles used or page faults that occur during program execution.
       In these modes, the CPU time or	page-faults  count  reported  for  the
       instrumented  routines  includes	 that  for the uninstrumented routines
       that they call. This can summarize the costs and	 reduce	 the  run-time
       overhead,  but  note that the machine-cycle counter wraps if no instru‐
       mented procedure is called at least every few seconds.
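
       The wrap-around caveat can be made concrete with a short C sketch. It
       assumes a 32-bit cycle counter and, in the usage note, a 500 MHz
       clock; neither figure is stated on this page, so treat both as
       illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles elapsed between two readings of a wrapping 32-bit counter.
 * Unsigned subtraction is modulo 2^32, so a single wrap between the two
 * readings is handled; two or more wraps are silently lost. */
uint32_t elapsed_cycles(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Seconds until a 32-bit counter wraps at the given clock rate in Hz. */
double wrap_seconds(double hz)
{
    assert(hz > 0.0);
    return 4294967296.0 / hz;       /* 2^32 cycles */
}
```

       Under these assumptions, wrap_seconds(500e6) is about 8.6 seconds, so
       an interval longer than that between two readings would be
       under-counted.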

       The cc compiler's -pg option uses the same sampling technique as
       hiprof. This technique is supported in a very similar way on different
       vendors' UNIX systems. For example:

           cc -g1 -O2 -pg -o program *.c
           ./program data/*
           gprof program gmon.out | more

       However, hiprof may be preferred because the -pg option has some
       disadvantages:

           The program needs to be specially compiled with the -pg option.

           Only a few of the archive libraries that are provided with the
           operating system were compiled to generate a call-graph profile.

           Only the executable is profiled. Shared libraries are not.

       The  optional  dxprof  command  provides a graphical display of various
       call-graph profiles.

   (b) CPU-Time/Event Profiles for Sourcelines/Instructions

       A good performance-improvement strategy may  start  with	 a  procedure-
       level  profile  of the whole program (perhaps with a call graph too, to
       give the big picture), but it will often progress to detailed profiling
       of individual source-lines and instructions.

       The  uprofile  profiler uses a sampling technique to generate a profile
       of the CPU time or events such as cache	misses	associated  with  each
       procedure or source-line or instruction. The sampling frequency depends
       on the processor type and the statistic being sampled, but for CPU time
       it  is on the order of a millisecond.  The profiler achieves this with‐
       out modifying the target program at all by using hardware counters that
       are  built  into	 the  Alpha CPU.  Running the uprofile command with no
       arguments yields a list of all the kinds of events  that	 a  particular
       machine	can  profile, depending on the nature of its architecture. The
       default is to profile machine cycles, resulting in a CPU-time  profile.
       The following example shows how to display a profile of the source
       lines that experienced the top 90% of data cache misses on an EV56
       Alpha:

           cc -g1 -O2 -o program *.c
           uprofile -h -q 90cum% dcacheldmisses program data/* | more

       This technique has the advantage of very low run-time  overhead.	 Also,
       the detailed information it can provide on the costs of executing indi‐
       vidual instructions or source lines is essential in identifying exactly
       which operation in a procedure is slowing down the program.

       The disadvantages of uprofile are that only executables can be
       profiled, the results can be skewed unless all processors have the
       same cycle speed, only one program can be profiled with the hardware
       counters at one time, threads cannot be profiled individually, and the
       Alpha EV6 architecture's execution of instructions out of sequence can
       significantly reduce the accuracy of fine-grained profiles.

       If hiprof's -flat option is used, its default sampling technique can
       provide the same fine-grained profiles (CPU time only) and low
       intrusiveness as uprofile. Also, it is accurate even with mixed
       processor cycle speeds, and it can profile all of a program's shared
       libraries as well as its individual threads. For example:

           hiprof -flat -h -all program data/* | more

       The cc compiler's -p option uses the same low-frequency sampling
       technique as hiprof. It is common to many UNIX systems, and (on Tru64
       UNIX) it is able to profile all the shared libraries used by a program.
       The program needs to be relinked with the -p option, but it does not
       need to be recompiled from source, so long as the original compilation
       used an acceptable debug level, such as the -g1 compiler option. For
       example, to profile individual instructions of a program:

           cc -p -o program *.o
           setenv PROFFLAGS '-all -stride 1'
           ./program data/*
           prof -all -asm -quit 5% program mon.out | more

       The pixie tool can also profile source lines and instructions
       (including shared libraries), but note that when it displays counts of
       “Cycles”, it is actually reporting counts of instructions executed, not
       machine cycles. For example:

           cc -g1 -O2 -o program *.c
           pixie -all -lines -quit 20 program data/* | more

       The  optional  dxprof  command provides a graphical display of profiles
       collected by either pixie or the cc command's -p option.

MINIMIZING SYSTEM RESOURCE USAGE
   Techniques
       The preceding techniques can improve an application's use of  just  the
       CPU.  Further  performance  improvements	 can  be made by improving the
       efficiency with which the application uses the other components of  the
       computer	 system:  heap memory, disk files, network connections, and so
       on.

       As with CPU profiling, the first phase of a resource usage improvement
       process is to monitor how much memory, data I/O and disk space, elapsed
       time, and so on, is used. Then the throughput of the computer can be
       increased or tuned in ways that help the program, or the program's
       design can be tuned to make better use of the computer resources that
       are available. For example:

           Reduce the size of the data files that the program reads and
           writes.

           Use memory-mapped files instead of regular I/O.

           Allocate memory incrementally on demand instead of allocating at
           startup the maximum that could be required.

           Fix heap leaks, and do not leave allocated memory unused.

       See the System Configuration and Tuning manual for a broader
       discussion of analyzing and tuning a Tru64 UNIX system.
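
       As one illustration of allocating memory incrementally on demand, the
       following C sketch grows an array only when it fills up, rather than
       reserving a worst-case size at startup. The type and function names
       and the initial capacity of 16 elements are assumptions made for the
       example, not part of any Tru64 UNIX interface.

```c
#include <assert.h>
#include <stdlib.h>

/* A growable array of ints; all names here are illustrative. */
struct int_vec {
    int *data;
    size_t len;     /* elements in use */
    size_t cap;     /* elements allocated */
};

void vec_init(struct int_vec *v)
{
    v->data = NULL;
    v->len = 0;
    v->cap = 0;
}

/* Append one value, doubling the allocation only when it is full.
 * Returns 0 on success, or -1 if memory could not be obtained. */
int vec_push(struct int_vec *v, int value)
{
    assert(v != NULL);
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 16;
        int *p = realloc(v->data, new_cap * sizeof *p);

        if (p == NULL)
            return -1;              /* the old block is still valid */
        v->data = p;
        v->cap = new_cap;
    }
    v->data[v->len++] = value;
    return 0;
}

/* Release the storage so the vector does not become a heap leak. */
void vec_free(struct int_vec *v)
{
    free(v->data);
    vec_init(v);
}
```

       Doubling keeps the number of reallocations small while the heap
       footprint stays within a factor of two of what is actually in use.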

   Tools and Examples
   (a) System Monitors

       The  Tru64  UNIX base system commands ps u, swapon -s, and vmstat 3 can
       show the currently active processes' usage of system resources such  as
       CPU  time, physical and virtual memory, swap space, page faults, and so
       on.

       The optional pview command provides  a  graphical  display  of  similar
       information for the processes that comprise an application.

       The  time commands provided by the Tru64 UNIX system and command shells
       provide an easy way to measure the total elapsed time and CPU time  for
       a program and its descendants.

       The collect tool is an optional, low overhead, system performance moni‐
       tor.

       Many other related commands are described in the	 System	 Configuration
       and Tuning manual.

   (b) Heap Memory Analyzers

       The third command reports heap memory leaks in a program, by
       instrumenting it with the Third Degree memory-usage checker, running
       it, and displaying a log of leaks detected at program exit. For
       example:

           third -display program data/* | more
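
       For orientation, the following C sketch shows the kind of defect a
       heap leak checker is meant to find, next to a corrected version. The
       function names are hypothetical, and the sketch makes no claim about
       the format or wording of the report that third produces.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a heap-allocated copy of s; the caller owns the copy. */
static char *copy_string(const char *s)
{
    size_t n;
    char *p;

    assert(s != NULL);
    n = strlen(s) + 1;
    p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Leaky: each copy becomes unreachable at the end of its iteration. */
size_t total_length_leaky(const char *const *words, size_t n)
{
    size_t i, total = 0;

    for (i = 0; i < n; i++) {
        char *p = copy_string(words[i]);

        if (p != NULL)
            total += strlen(p);
        /* missing free(p): up to n blocks are leaked */
    }
    return total;
}

/* Fixed: every copy is released before the pointer goes out of scope. */
size_t total_length(const char *const *words, size_t n)
{
    size_t i, total = 0;

    for (i = 0; i < n; i++) {
        char *p = copy_string(words[i]);

        if (p != NULL)
            total += strlen(p);
        free(p);                    /* free(NULL) is a harmless no-op */
    }
    return total;
}
```

       Both functions return the same value, which is why leaks of this kind
       tend to go unnoticed until the heap is examined.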

       If you are interested only in leaks occurring during the normal
       operation of the program, not during startup or shutdown, you can
       specify additional places to check for previously unreported leaks.
       For example, the pre-shutdown leak report will give this information:

           third -display -after startup -before shutdown program data/* | more

       Third Degree can also detect various kinds of bugs that may be  affect‐
       ing  the	 correctness or performance of a program. See the Programmer's
       Guide for further details on debugging and leak-detection.

       The optional dxheap command  provides  a	 graphical  display  of	 Third
       Degree's heap and bug reports.

       The optional mview command provides a graphical analysis of heap usage
       over time. This view of a program's heap can clearly show the presence
       (if not the cause) of significant leaks or other undesirable trends
       such as wasted memory.

VERIFYING SIGNIFICANCE OF TEST CASES
   Techniques
       Most of the preceding profiling techniques are effective	 only  if  you
       profile and optimize or tune the parts of the program that are executed
       in the scenarios whose performance is important. Careful	 selection  of
       the  data  used for the profiled test runs is often sufficient, but you
       may want a quantitative analysis of which code was and was not executed
       in a given set of tests.

   Tools and Examples
       The pixie command's -t[estcoverage] option reports lines of code that
       were not executed in a given test run. For example:

           pixie -t program data/* | more
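
       What a coverage report conveys can be imitated by hand in a few lines
       of C. The sketch below keeps its own execution counts for the three
       branches of a hypothetical classify() function; it is a stand-in for
       illustration only, since a coverage tool gathers equivalent counts by
       instrumenting the program, with no changes to the source.

```c
#include <assert.h>
#include <stddef.h>

enum { N_BRANCHES = 3 };

/* Hand-kept execution counts, one per branch of classify(). */
static unsigned long branch_hits[N_BRANCHES];

static void hit(size_t branch)
{
    assert(branch < N_BRANCHES);
    branch_hits[branch]++;
}

/* Classify a reading as low (0), normal (1), or high (2). */
int classify(int reading)
{
    if (reading < 10) {
        hit(0);
        return 0;
    } else if (reading <= 100) {
        hit(1);
        return 1;
    }
    hit(2);
    return 2;
}

/* Number of branches that no test input has reached so far. */
size_t unexecuted_branches(void)
{
    size_t i, missed = 0;

    for (i = 0; i < N_BRANCHES; i++)
        if (branch_hits[i] == 0)
            missed++;
    return missed;
}
```

       If the test data contains only readings up to 100, the third branch
       is never reached, and any profile collected from those tests says
       nothing about its cost.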

       Conversely,  pixie's  -p[rocedure],  -h[eavy],  and -a[sm] options show
       which procedures, source lines, and instructions were executed.

       If multiple test runs are needed to build up a typical scenario, the
       prof command can be run separately on a set of profile data files:

           pixie -pids program
           ./program.pixie data1/*
           ./program.pixie data2/*
           prof -pixie -t program program.Counts.*
SEE ALSO
       Optimizing:   cc(1), spike(1)

       Profiling:   hiprof(1), pixie(1), third(1), uprofile(1)

       System Monitoring:   collect(8), ps(1), swapon(1), vmstat(1)

       Graphical  tools,  available from the Graphical Program Analysis subset
       of the Tru64 UNIX Associated Products installation media, or as part of
       the  Enterprise Toolkit for Windows/NT desktops with Microsoft's Visual
       Studio 97: dxheap(1), dxprof(1), mview(1), pview(1)

       Programmer's Guide

       System Configuration and Tuning

								 prof_intro(1)
