Example 2 Example 4 Back to GPTL home page

Example 3: GPTLsummary

This hybrid OpenMP/MPI code demonstrates the use of the summary routine GPTLpr_summary(). It simulates a variable work load by sleeping for an amount of time that depends on rank and thread number. Only the output is shown here; the code is available in tests/global.c. Because only 2 threads were used, the code was modified from the committed version to sleep for some number of seconds rather than milliseconds.

Compile and link, then run with 2 threads and 8 MPI tasks. Sleep times here are in seconds rather than milliseconds, which makes the output easier to read:

% cd tests
% make global  
% env OMP_NUM_THREADS=2 mpiexec -n 8 ./global
Output file timing.summary was created by a call to GPTLpr_summary(MPI_COMM_WORLD).

timing.summary:

Total ranks in communicator=8
nthreads on rank 0=2
'N' used for mean, std. dev. calcs.: 'ncalls'/'nthreads'
'ncalls': number of times the region was invoked across tasks and threads.
'nranks': number of ranks which invoked the region.
mean, std. dev: computed using per-rank max time across all threads on each rank
wallmax and wallmin: max, min time across tasks and threads.

name                ncalls nranks mean_time std_dev wallmax (rank thread) wallmin (rank thread)
total                    8      8     7.376   3.021   9.001 (   1      0)   2.001 (   7      0)
nranks-iam+mythread     16      8     5.500   2.449   9.000 (   0      1)   1.000 (   7      0)
1-5_iam                  5      5     3.000   1.581   5.000 (   5      0)   1.000 (   1      0)

In this example iam is the MPI rank and mythread is the thread number. The output shows that the region nranks-iam+mythread, which sleeps for nranks-iam+mythread seconds, has a max time of 9 seconds on rank 0, thread 1, and a min time of 1 second on rank 7, thread 0. Mean and standard deviation statistics are also printed. The other region, 1-5_iam, is not threaded, and only MPI ranks 1 through 5 participate. Its max time is on the highest participating rank (5 seconds on rank 5), and its min time is on the lowest participating rank (1 second on rank 1).