1. Separate I/O instructions in different loops within OpenMP parallel regions

In general, I/O degrades the performance of OpenMP-enabled parallel codes because it increases the time the processor sits idle. Thus, this rewriting rule proposes separating compute-intensive instructions and I/O instructions into different loops within an OpenMP parallel region. This code transformation is usually called loop fission in the literature on optimizing and parallelizing compilers. Consider the following example (from now on, Example 1), which consists of a loop that contains both arithmetic instructions to calculate A[i] and I/O instructions to output the values of A[i] on standard output. The transformation is simple. The rewritten source code (from now on, Example 2) has two loops: the first loop computes the values of A[i], and the second loop outputs those values using the printf instruction. The graph compares the performance of Example 1 and Example 2 in terms of speedup. The workload of the loop was adjusted to be high enough to provide significant speedups for this simple example source code. The graph shows higher speedups for the source code with the I/O isolated in a separate loop, which confirms experimentally that this rewriting rule can provide significant speedups.
2. Store the results of piecewise-defined functions in scalar variables within OpenMP parallel regions

The second rule focuses on simulation codes that compute the result of a mathematical function whose value is given by a different formula in different ranges of the input variables. In mathematics, these are typically called piecewise-defined functions. In simulation programs, piecewise-defined functions are typically evaluated for all the points of a mesh, and the source code is typically written as shown below. The piecewise-defined function returns B[i]+1 for values of i less than SIZE/2, and returns B[i]-1 otherwise. The big challenge for software that assists programmers is that these programs present complex control flows at execution time, and the range of array elements that can be modified in each flow path is potentially different. This situation is extremely difficult to manage automatically with such software tools; in particular, it is a challenge for parallelizing compilers. We propose an alternative coding style that reduces this complexity from the point of view of parallelizing compilers: create an auxiliary temporary scalar variable aux that stores the value returned in each piece of the piecewise-defined function. From the point of view of the programmer, the transformation is again simple. Define a new temporary scalar variable aux at the beginning of the loop body, set the value of aux in each branch of the if instruction, and at the end assign the value of aux to the corresponding element A[i] so that the loop computes the same results.
Conclusions

It is good practice to follow these recommendations from the beginning when doing parallel programming in C with OpenMP. In addition, following these programming suggestions will naturally increase the effectiveness of Parallelware at automatically discovering parallelism in your sequential code and auto-generating equivalent OpenMP-enabled parallel code. Have you already seen the live demo of Parallelware? If not, just remember that Parallelware guarantees correctness and performance in your OpenMP-enabled parallel code. When you start developing your code, it may be beneficial to compile it with Parallelware, as it automatically checks many of the best programming practices in C presented in this article. If you liked this post, please share!