Compile and link, then run with 2 threads and 8 MPI tasks. For this example the sleep times were changed from milliseconds to seconds to make the output easier to read:
% cd tests
% make global
% env OMP_NUM_THREADS=2 mpiexec -n 8 ./global

Output file timing.summary was created by a call to GPTLpr_summary(MPI_COMM_WORLD).
In this example iam is the MPI rank and mythread is the thread number. The output shows that the region nranks-iam+mythread, which sleeps for that many seconds, has a max time of 9 seconds on rank 0, thread 1, and a min time of 1 second on rank 7, thread 0. Mean and standard deviation stats are also printed. The other region, 1-5_iam, is not threaded and only MPI ranks 1 through 5 participate. Max time is on the highest rank participating (5 seconds on rank 5), and min time is on the lowest rank participating (1 second on rank 1).

timing.summary:

Total ranks in communicator=8
nthreads on rank 0=2
'N' used for mean, std. dev. calcs.: 'ncalls'/'nthreads'
'ncalls': number of times the region was invoked across tasks and threads.
'nranks': number of ranks which invoked the region.
mean, std. dev: computed using per-rank max time across all threads on each rank
wallmax and wallmin: max, min time across tasks and threads.

name                 ncalls nranks mean_time std_dev wallmax (rank thread) wallmin (rank thread)
total                     8      8     7.376   3.021   9.001 (   1      0)   2.001 (   7      0)
nranks-iam+mythread      16      8     5.500   2.449   9.000 (   0      1)   1.000 (   7      0)
1-5_iam                   5      5     3.000   1.581   5.000 (   5      0)   1.000 (   1      0)
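
The mean_time and std_dev values for the two named regions can be checked by hand from the sleep times described above. The following Python sketch is not part of GPTL. It assumes ideal sleep times with no timer overhead, and it assumes the standard deviation is the sample form with an N-1 divisor, which is inferred from the printed numbers rather than taken from the GPTL source. The helper sample_stats and the constants NRANKS and NTHREADS are names invented for this illustration.

```python
import math

NRANKS = 8     # MPI tasks, from "mpiexec -n 8"
NTHREADS = 2   # from OMP_NUM_THREADS=2


def sample_stats(values):
    """Return (mean, sample standard deviation) of a list of numbers.

    The N-1 divisor is an assumption that happens to match the summary.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance)


# Region "nranks-iam+mythread": each thread sleeps nranks - iam + mythread
# seconds. The summary uses the per-rank max across threads.
threaded = [
    max(NRANKS - iam + mythread for mythread in range(NTHREADS))
    for iam in range(NRANKS)
]

# Region "1-5_iam": unthreaded, only ranks 1 through 5 take part,
# and each sleeps iam seconds.
unthreaded = [iam for iam in range(1, 6)]

threaded_mean, threaded_std = sample_stats(threaded)
unthreaded_mean, unthreaded_std = sample_stats(unthreaded)

print(f"nranks-iam+mythread {threaded_mean:.3f} {threaded_std:.3f}")
# prints: nranks-iam+mythread 5.500 2.449
print(f"1-5_iam {unthreaded_mean:.3f} {unthreaded_std:.3f}")
# prints: 1-5_iam 3.000 1.581
```

Both lines agree with the mean_time and std_dev columns in timing.summary, which is consistent with the statement that the statistics are computed from the per-rank max time across threads.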