GPTL - General Purpose Timing Library

(with optional PAPI interface)

Download the latest source code here


Description

GPTL is a library to instrument C, C++, and Fortran codes for performance analysis and profiling. The instrumentation can be inserted manually by the user wherever they wish, and/or it can be done automatically by the compiler at function entry and exit points if the application being profiled is built with GNU, Clang, Intel, PGI, or AIX compilers (Note: AIX has not been tested in quite some time). To auto-instrument an application, add -finstrument-functions (GNU, Intel, Clang) or -Minstrument:functions (PGI) or -qdebug=function_trace (AIX) to the compile and link flags of the source files to be profiled. In order to get correct behavior from the auto-profiling feature, often it is necessary to add the -rdynamic link flag to the application being profiled. Otherwise profiled function names may be reported only as addresses, which is not very useful.
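For context, GCC documents that -finstrument-functions makes the compiler call two hooks, __cyg_profile_func_enter() and __cyg_profile_func_exit(), around every instrumented function. The sketch below is a minimal stand-alone illustration of that mechanism and is not GPTL's implementation: the counters entries, depth, and max_depth are hypothetical names chosen for this example. A profiling library supplies its own versions of these hooks and must translate the addresses it receives into function names, which is why exporting symbols with -rdynamic often matters.

```c
/* Hypothetical bookkeeping for this sketch only; a real profiler keeps
   per-function timers rather than three global counters. */
static long entries = 0;   /* instrumented function entries seen so far */
static int depth = 0;      /* current call depth */
static int max_depth = 0;  /* deepest nesting observed */

/* The hooks themselves must not be instrumented, or they would recurse. */
void __cyg_profile_func_enter(void *this_fn, void *call_site)
    __attribute__((no_instrument_function));
void __cyg_profile_func_exit(void *this_fn, void *call_site)
    __attribute__((no_instrument_function));

/* Called by compiler-generated code on entry to each instrumented function. */
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void)this_fn;    /* address of the function being entered */
    (void)call_site;  /* address the call came from */
    ++entries;
    if (++depth > max_depth)
        max_depth = depth;
}

/* Called by compiler-generated code just before each instrumented function returns. */
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
    if (depth > 0)
        --depth;
}
```

When a program is built with -finstrument-functions and linked against hooks like these, the counters are updated without any change to the application source. This is the same property that lets an auto-instrumented build produce a call tree.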

Automatic instrumentation of a number of MPI routines is also possible, utilizing the PMPI profiling layer provided by most MPI distributions. In this case no special compiler flags are necessary, and profiles are obtained with zero changes to application source files. See Example 6 for further details.

Here is a portion of GPTL printout after running the HPCC benchmark with compiler-based automatic instrumentation enabled:

    Stats for thread 0:
                               Called Recurse Wallclock     max     min   FP_OPS e6_/_sec       CI
      total                     1       -    64.021  64.021  64.021 3.50e+08     5.47 7.20e-02
      HPCC_Init                11      10     0.157   0.157   0.000    95799     0.61 8.90e-02
    * HPL_pdinfo              120     118     0.019   0.018   0.000    96996     4.99 8.56e-02
    * HPL_all_reduce            7       -     0.043   0.036   0.000      448     0.01 1.03e-02
    * HPL_broadcast            21       -     0.041   0.036   0.000      126     0.00 6.72e-03
      HPL_pdlamch               2       -     0.004   0.004   0.000    94248    21.21 1.13e-01
    * HPL_fprintf             240     120     0.001   0.000   0.000     1200     0.93 6.67e-03
      HPCC_InputFileInit       41      40     0.001   0.001   0.000      194     0.27 8.45e-03
      ReadInts                  2       -     0.000   0.000   0.000       12     3.00 1.61e-02
      PTRANS                   21      20    22.667  22.667   0.000 4.19e+07     1.85 3.19e-02
      MaxMem                    5       4     0.000   0.000   0.000      796     2.70 1.79e-02
    * iceil_                  132       -     0.000   0.000   0.000      792     2.88 1.75e-02
    * ilcm_                    14       -     0.000   0.000   0.000       84     2.71 1.71e-02
      param_dump               18      12     0.000   0.000   0.000       84     0.82 7.05e-03
      Cblacs_get                5       -     0.000   0.000   0.000       30     1.43 1.67e-02
      Cblacs_gridmap           35      30     0.005   0.001   0.000      225     0.05 1.79e-03
    * Cblacs_pinfo              7       1     0.000   0.000   0.000       40     3.08 1.54e-02
    * Cblacs_gridinfo          60      50     0.000   0.000   0.000      260     2.28 2.10e-02
      Cigsum2d                  5       -     0.088   0.047   0.000      165     0.00 6.37e-03
      pdmatgen                 20       -    21.497   1.213   0.942 4.00e+07     1.86 3.08e-02
    * numroc_                  96       -     0.000   0.000   0.000      576     2.87 1.69e-02
    * setran_                  25       -     0.000   0.000   0.000      150     2.94 1.72e-02
    * pdrand              3.7e+06   2e+06    15.509   0.041   0.000 1.72e+07     1.11 2.24e-02
      xjumpm_               57506   57326     0.219   0.030   0.000   230384     1.05 2.66e-02
      jumpit_               60180   40120     0.214   0.021   0.000   280840     1.32 2.18e-02
      slboot_                   5       -     0.000   0.000   0.000       30     1.30 1.01e-02
      Cblacs_barrier           10       5     0.481   0.167   0.000       50     0.00 3.26e-03
      sltimer_                 10       -     0.000   0.000   0.000      614     3.05 1.90e-02
    * dwalltime00              15       -     0.000   0.000   0.000      150     2.54 2.57e-02
    * dcputime00               15       -     0.000   0.000   0.000      373     3.06 1.91e-02
    * HPL_ptimer_cputime       17       -     0.000   0.000   0.000      170     2.66 2.29e-02
      pdtrans                  14       9     0.124   0.045   0.000   573505     4.61 1.36e-01
      Cblacs_dSendrecv         12       8     0.115   0.042   0.000       56     0.00 2.24e-03
      pdmatcmp                  5       -     0.448   0.295   0.003 1.29e+06     2.87 2.94e-01
    * HPL_daxpy              2596       -     0.008   0.000   0.000 1.34e+06   177.06 4.40e-01
    * HPL_idamax             2966       -     0.007   0.000   0.000   767291   104.75 4.15e-01
    ...

Function names on the left of the output are indented to indicate their parent and their depth in the call tree. An asterisk next to an entry means it has more than one parent (see Example 2 for further details). The other columns show the number of invocations, the number of recursive invocations, wallclock timing statistics, and PAPI-based information. In this example, HPL_daxpy performed 1.34e6 floating point operations at 177.06 MFlops/sec, with a computational intensity (floating point operations per memory reference) of 0.440.
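The rate column can be cross-checked against the other columns. The sketch below assumes that e6_/_sec is the event count divided by the wallclock time, expressed in millions per second. The printed numbers are consistent with that reading: for "total", 3.50e+08 operations over 64.021 seconds gives about 5.47. The function name rate_e6_per_sec() is hypothetical and is not part of the GPTL API.

```c
/* Events per second, in millions, from a raw event count and the elapsed
   wallclock time in seconds. Returns 0.0 for a non-positive time so that
   untimed regions do not cause a division by zero. */
double rate_e6_per_sec(double event_count, double wallclock_sec)
{
    if (wallclock_sec <= 0.0)
        return 0.0;
    return event_count / wallclock_sec / 1.0e6;
}
```

For very short regions such as HPL_daxpy the printed wallclock time is rounded to three decimals, so recomputing the rate from the printed 0.008 seconds will not reproduce 177.06 exactly.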

If the PAPI library is installed on the target platform, GPTL can be used to access all available PAPI events. To count single-precision floating point operations for example, one need only add a call that looks like:

    ret = GPTLsetoption (PAPI_SP_OPS, 1);
    
The second argument "1" in the above call means "enable". Any non-zero integer means "enable", and a zero means "disable". Multiple GPTL or PAPI options can be specified with additional calls to GPTLsetoption(). The man pages provided with the distribution describe the full API specification. The interface is identical for both Fortran and C/C++ codes, except for the case-insensitivity of Fortran.

Calls to GPTLstart() and GPTLstop() can be nested to an arbitrary depth. As shown above, GPTL handles nested regions by presenting output in an indented fashion. The example also shows how auto-instrumentation can be used to easily produce a dynamic call tree of the application being profiled, where region names correspond to function entry and exit points.
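Manual instrumentation of nested regions might look like the sketch below. GPTLinitialize(), GPTLstart(), GPTLstop(), GPTLpr(), and GPTLfinalize() are GPTL C entry points. The region names, the functions sum_to() and run_profiled_work(), and the fallback stubs are assumptions made for illustration; the stubs exist only so that the sketch compiles where gptl.h is not installed. Consult the man pages for the authoritative signatures and return values.

```c
#if defined(__has_include)
#  if __has_include(<gptl.h>)
#    include <gptl.h>
#    define HAVE_GPTL 1
#  endif
#endif

#ifndef HAVE_GPTL
/* Fallback stubs, NOT part of GPTL: they do no timing and only mimic the
   "0 means success" return convention so the sketch builds without GPTL. */
static int GPTLinitialize(void)        { return 0; }
static int GPTLstart(const char *name) { (void)name; return 0; }
static int GPTLstop(const char *name)  { (void)name; return 0; }
static int GPTLpr(int id)              { (void)id; return 0; }
static int GPTLfinalize(void)          { return 0; }
#endif

/* Hypothetical workload: sums the integers 1..n. */
static long sum_to(long n)
{
    long total = 0;
    for (long i = 1; i <= n; ++i)
        total += i;
    return total;
}

/* Times an outer region that contains an inner region, then requests the
   report. Returns 0 on success, -1 if any GPTL call reports an error. */
int run_profiled_work(long *result)
{
    if (GPTLinitialize() != 0) return -1;

    if (GPTLstart("outer") != 0) return -1;
    if (GPTLstart("inner") != 0) return -1;  /* nested inside "outer" */
    *result = sum_to(1000);
    if (GPTLstop("inner") != 0) return -1;   /* stop in reverse order of start */
    if (GPTLstop("outer") != 0) return -1;

    if (GPTLpr(0) != 0) return -1;           /* 0 is an id used to label the report */
    return GPTLfinalize() == 0 ? 0 : -1;
}
```

With the real library linked in, "inner" would be expected to appear indented beneath "outer" in the report, in the same way that child functions appear beneath their parents in the auto-instrumented output above.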


Download and Installation


Examples

These pages contain simple codes which illustrate the use of some features of GPTL. Most examples were run on a Linux x86 system using GNU compilers. The examples also assume that the environment variable $GPTL contains the path where the GPTL library was installed. Depending on how the library was configured and built, the compilation and linking commands in the examples may require modification. For instance, auto-profiled codes need to link with -lunwind if GPTL was built without --disable-libunwind, and it may be necessary to compile with MPI wrappers and/or link with -lmpi if GPTL was built with --disable-shared. In most cases the causes of compilation problems or unsatisfied externals in building the tests should be obvious.

Example 1 is a manually-instrumented threaded Fortran code.

Example 2 is a C code compiled with gcc's auto-instrumentation hooks to print a dynamic call tree.

Example 3 demonstrates the use of GPTLpr_summary() to obtain a statistical summary of timing statistics across OpenMP threads and MPI tasks.

Example 4 is an auto-instrumented C++ code. Issues related to in-line constructors are illustrated.

Example 5 is a Fortran code which uses gptlprocess_namelist() and an associated namelist file to set GPTL options.

Example 6 is a Fortran code which utilizes the ENABLE_PMPI option to automatically time various MPI calls and print the average number of bytes transferred.

Example 7 is a Fortran code which utilizes the functions GPTLstart_handle() and GPTLstop_handle(), which avoid much of the table lookup overhead of their siblings GPTLstart() and GPTLstop().

Example 8 is a C code which employs GPTL's capability to report the memory usage of the code being profiled. Memory usage is checked on both manually inserted and auto-instrumented calls to the start and stop routines, so the name of the routine responsible for memory growth is included in the printout.


Bugs


Bug Reports

Please email me bug reports and/or feature requests (jmrosinski AT gmail DOT com).

Author

GPTL was written by Jim Rosinski. Previous work was done on the library while employed at UCAR, NOAA/ESRL, ORNL, and SiCortex (now defunct). Thanks to Ed Hartnett (currently at NOAA) for his initial work autoconf-izing GPTL. Thanks also to library contributors Pat Worley, Jim Edwards, John Dennis, Chuck Bardeen, and others.

Copyright

This software is Open Source. See the file COPYING in the main directory for restrictions on its use.
Example 1