In this second article of the series on best practices for parallel programming in C with the OpenMP standard, we focus on source code rewriting rules that expose more clearly the parallelism available in sequential source code. The idea is simple: match the programming pattern in your source code and rewrite it following the instructions given below. We present two source code transformations, addressing I/O instructions and piecewise-defined functions.
1. Separate I/O instructions into different loops within OpenMP parallel regions
In general, I/O degrades the performance of OpenMP-enabled parallel codes because it increases the time that the processor sits idle. Thus, this rewriting rule proposes separating compute-intensive instructions and I/O instructions into different loops within an OpenMP parallel region. This code transformation is usually called loop fission in the literature on optimizing and parallelizing compilers.
Consider the following example (from now on, Example 1), which consists of a loop that contains both arithmetic instructions to compute A[i] and I/O instructions to print the values of A[i] to standard output.
void parallware__root(int *A, int *B) {
  for (int i = 0; i < SIZE; i++) {
    A[i] = B[i] + f();
    printf("%d", A[i]);
  }
}
The transformation is simple. The rewritten source code shown below (from now on, Example 2) has two loops: the first computes the values of A[i], and the second prints those values using printf.
void parallware__root(int *A, int *B) {
  for (int i = 0; i < SIZE; i++) {
    A[i] = B[i] + f();
  }
  for (int i = 0; i < SIZE; i++) {
    printf("%d", A[i]);
  }
}
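As a minimal sketch of how Example 2 could be annotated with OpenMP, assuming that SIZE is a compile-time constant and that f() is thread-safe (the definitions of both below are hypothetical stand-ins, not part of the original example):

#include <stdio.h>

#define SIZE 1000000              /* hypothetical problem size */

static int f(void) { return 1; }  /* hypothetical thread-safe stand-in for f() */

void parallware__root(int *A, int *B) {
  #pragma omp parallel
  {
    /* Compute loop: iterations are independent and are shared
       among the threads of the parallel region. */
    #pragma omp for
    for (int i = 0; i < SIZE; i++) {
      A[i] = B[i] + f();
    }
    /* I/O loop: executed by a single thread so that the output
       is produced in order, once per element. */
    #pragma omp single
    for (int i = 0; i < SIZE; i++) {
      printf("%d", A[i]);
    }
  }
}

Note that the speedup comes entirely from the compute loop; isolating the printf calls in their own loop is precisely what makes the first loop safe to parallelize.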
In this graph, we compare the performance of Example 1 and Example 2 in terms of speedup. The workload of the loop has been adjusted to be high enough to provide significant speedups for this simple example source code.
The graph shows higher speedups for the source code with the I/O isolated in a separate loop, confirming experimentally that this rewriting rule can provide significant speedups.
2. Store the results of piecewise-defined functions in scalar variables within OpenMP parallel regions
The second rewriting rule focuses on simulation codes that compute the result of a mathematical function whose value is given by a different formula in different ranges of the input variables. In mathematics, these are typically called piecewise-defined functions.
In simulation programs, piecewise-defined functions are typically evaluated at all the points of a mesh, and the source code is typically written as shown below. The piecewise-defined function returns B[i]+1 for values of i less than SIZE/2, and B[i]-1 otherwise.
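In the usual mathematical notation, the function computed by the loop below is the piecewise-defined function

\[
A[i] =
\begin{cases}
B[i] + 1 & \text{if } i < \mathrm{SIZE}/2 \\
B[i] - 1 & \text{otherwise.}
\end{cases}
\]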
int parallware_root(int *A, int *B) {
  for (int i = 0; i < SIZE; i++) {
    if (i < SIZE/2) {
      A[i] = B[i] + 1;
    } else {
      A[i] = B[i] - 1;
    }
  }
  return 0;
}
The big challenge for software that assists programmers is that these programs present complex control flow at execution time, and the range of array elements that may be modified along each flow path is potentially different. This situation is extremely difficult to manage automatically and is, in particular, a challenge for parallelizing compilers.
We propose an alternative coding style that reduces this complexity from the point of view of parallelizing compilers. In the source code shown below, we have created an auxiliary temporary scalar variable aux that stores the value returned in each piece of the piecewise-defined function.
int parallware_root(int *A, int *B) {
  for (int i = 0; i < SIZE; i++) {
    int aux;
    if (i < SIZE/2) {
      aux = B[i] + 1;
    } else {
      aux = B[i] - 1;
    }
    A[i] = aux;
  }
  return 0;
}
From the point of view of the programmer, the transformation is again simple: define a new temporary scalar variable aux at the beginning of the loop body, set the value of aux in each branch of the if statement, and, at the end of the body, assign the value of aux to the corresponding element A[i] so that the loop computes the same results.
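With this coding style the loop becomes straightforward to annotate: because aux is declared inside the loop body, it is automatically private to each thread, so no private clause is needed. As a minimal sketch, assuming the same hypothetical SIZE constant as above:

#define SIZE 1000000   /* hypothetical problem size */

int parallware_root(int *A, int *B) {
  #pragma omp parallel for
  for (int i = 0; i < SIZE; i++) {
    int aux;           /* declared in the loop body, hence private to each iteration */
    if (i < SIZE/2) {
      aux = B[i] + 1;
    } else {
      aux = B[i] - 1;
    }
    A[i] = aux;
  }
  return 0;
}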
Conclusions
It’s good practice to follow these recommendations from the beginning when doing parallel programming in C with OpenMP. In addition, following these suggestions will naturally increase the effectiveness of Parallelware at discovering parallelism in your sequential code and generating equivalent OpenMP-enabled parallel code.
Have you already seen how Parallelware Trainer works? If not, just remember that Parallelware guarantees correctness and performance in your OpenMP-enabled parallel code. When you start developing your code, it can be beneficial to compile it with Parallelware, as it checks many of the best programming practices in C presented in these articles.
If you like this post, please share!