
Example 7: Using GPTLstart_handle() and GPTLstop_handle()

This is a threaded Fortran code that uses the functions GPTLstart_handle() and GPTLstop_handle(). These functions lower GPTL overhead by keeping the hash value for the region of interest in user space, so that the hash index does not have to be regenerated every time the start or stop function is called. On the first invocation, a zero value of the "handle" argument is a flag that tells GPTL to compute the hash value and store it in the handle for later use by the library.

The hash value for any given GPTL region is invariant across threads, so per-thread copies of the handle are not needed. These functions can also be freely mixed with their GPTLstart() and GPTLstop() counterparts, as shown in the example below.
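The idea of caching the hash value in a caller-owned handle can be illustrated with a small, language-neutral toy model. This is only a sketch and not GPTL's implementation: the hash function, the table size, and the names `toy_hash`, `index_plain`, and `index_with_handle` are all hypothetical. It shows that the handle path hashes the region name once, that the plain path hashes it on every call, and that both paths arrive at the same index (which is why the two flavors can be mixed for one region).

```python
TABLE_SIZE = 1023  # hypothetical table size, not taken from GPTL

hash_computations = 0  # how many times the hash function actually ran


def toy_hash(name):
    """Toy string hash returning 1..TABLE_SIZE, so 0 can mean 'handle not set'."""
    global hash_computations
    hash_computations += 1
    h = 0
    for ch in name:
        h = (h * 31 + ord(ch)) % TABLE_SIZE
    return h + 1


def index_plain(name):
    """Model of a plain start/stop call: hash the name on every call."""
    return toy_hash(name)


def index_with_handle(name, handle):
    """Model of a '_handle' call: hash only if the caller's handle is still zero.

    Returns (index, handle) because Python integers cannot be updated in place.
    """
    if handle == 0:
        handle = toy_hash(name)
    return handle, handle


# Handle path: the hash is computed on the first call only.
handle = 0
for _ in range(1000):
    idx_h, handle = index_with_handle("loop", handle)
handle_hashes = hash_computations

# Plain path: the hash is computed on every call.
hash_computations = 0
for _ in range(1000):
    idx_p = index_plain("loop")
plain_hashes = hash_computations

print(handle_hashes, plain_hashes, idx_h == idx_p)  # prints: 1 1000 True
```

In this model the saving is exactly the repeated hash computation; the other per-call costs, such as finding the thread number or reading the clock, are outside the sketch.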

Though not done in the example below, GPTLinit_handle() can be called before the first call to GPTLstart_handle() or GPTLstop_handle() to obtain the handle in advance. This approach has the advantage that the overhead of generating the handle is removed even from the first call to GPTLstart_handle().

handle.F90:

program handle
  use gptl

  implicit none

  integer :: handle1    ! Hash index
  integer :: n
  integer :: ret

  ret = gptlinitialize ()
  ret = gptlstart ('total')  ! Time the entire code

! IMPORTANT: Start with a zero handle value so GPTLstart_handle knows to initialize
! Instead of setting handle1=0 here we could also do:
! ret=gptlinit_handle('loop', handle1)
! This latter approach is actually preferable to avoid one-time multiple threads
! computing the handle value inside the threaded loop.
  handle1 = 0

!$OMP PARALLEL DO SHARED (handle1)
  do n=1,1000000
! First call the "_handle" versions of start and stop for the region
    ret = gptlstart_handle ('loop', handle1)
    ret = gptlstop_handle ('loop', handle1)
! Now call the standard start and stop functions for the same region
    ret = gptlstart ('loop')
    ret = gptlstop ('loop')
  end do

  ret = gptlstop ('total')  ! Time the entire code
  ret = gptlpr (0)
  stop
end program handle

Now compile and run:
% gfortran -fopenmp -o handle handle.F90 -I${GPTL}/include -L${GPTL}/lib -lgptlf -lgptl
% ./handle

Here's the important output from the timing.0 file that got created by the call to gptlpr(0):
Total overhead of 1 GPTL start or GPTLstop call=1.08e-07 seconds
Components are as follows:
Fortran layer:             2.0e-09 =   1.9% of total
Get thread number:         2.0e-08 =  18.5% of total
Generate hash index:       3.1e-08 =  28.7% of total
Find hashtable entry:      2.2e-08 =  20.4% of total
Underlying timing routine: 3.3e-08 =  30.6% of total
...
Stats for thread 0:
          Called  Recurse  Wallclock       max       min   self_OH  parent_OH
  total        1        -      0.159     0.159     0.159     0.000      0.000
  loop    500000        -      0.045  1.40e-05  0.00e+00     0.018      0.091
Overhead sum =     0.108 wallclock seconds
Total calls  = 500001
...
Same stats sorted by timer for threaded regions:
Thd        Called  Recurse  Wallclock       max       min   self_OH  parent_OH
000 loop   500000        -      0.045  1.40e-05  0.00e+00     0.018      0.091
001 loop   500000        -      0.046  2.50e-05  0.00e+00     0.018      0.091
002 loop   500000        -      0.048  8.60e-05  0.00e+00     0.018      0.091
003 loop   500000        -      0.049  3.30e-03  0.00e+00     0.018      0.091
SUM loop  2.0e+06        -      0.189  3.30e-03  0.00e+00     0.070      0.362

Explanation of the above output

Only a single region, named "loop", was timed. It was called a total of 2 million times across 4 threads: one million loop iterations, times two for the two flavors of start and stop calls (GPTLstart_handle()/GPTLstop_handle() and GPTLstart()/GPTLstop()). Both flavors accumulated time into the same reported timer ("loop").

Note that the reported overhead assumes that only GPTLstart() and GPTLstop() were called. In this example the estimate can be refined: take the reported 28.7% of overhead that is due to generating the hash index, multiply it by 0.5 (since half of the start/stop calls used the "_handle" GPTL routines, which do not need to generate hash indices), and subtract that fraction of the 0.108 seconds of reported overhead. This gives a new overhead estimate of about 0.092 seconds.
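Written out step by step, using only the figures from the timing output above (28.7% for hash index generation and the 0.108 second overhead sum for thread 0), the refinement is roughly:

```latex
\begin{align*}
\text{hash share removed} &= 0.287 \times 0.5 = 0.1435 \\
\text{overhead saved}     &= 0.1435 \times 0.108\ \text{s} \approx 0.0155\ \text{s} \\
\text{refined overhead}   &= 0.108\ \text{s} - 0.0155\ \text{s} \approx 0.0925\ \text{s}
\end{align*}
```

This agrees with the estimate of 0.092 seconds to within rounding. It treats the one-time handle initialization as negligible, which is an assumption of this sketch rather than something the output reports.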

Note that the reported overhead was very high relative to the cost of the work being timed. This is understandable considering that no real work is being done between GPTL "start" and "stop" calls.

