In this first article of our series on current best practices for parallel programming in C, we look at features and limitations of compilers that support the OpenMP standard. We present two case studies. First, we suggest declaring program variables at the beginning of the smallest loop scope, instead of at the beginning of the program or function/procedure. Second, we study how to work around the fact that struct fields cannot be passed as arguments to OpenMP directives and clauses.
1. Declare program variables at the innermost loop scope
Modern programming languages allow program variables to be declared not only at the beginning of a function/procedure, but also in the middle of the source code, for instance at the beginning of a loop body. From the point of view of parallelization, we are most interested in loops because they are typically the most time-consuming parts of a program. Thus, we address the following questions:
- When is it valid to declare a program variable within a loop scope, i.e., at the beginning of a loop body?
- What are the advantages of this programming practice for the parallelization of sequential codes with OpenMP?
For illustrative purposes, consider the source code of the function parallware__root() shown below. It mainly consists of a loop, for_i, that computes the sum of a set of values given by the auxiliary variables aux and matrix. The OpenMP-enabled parallel implementation computes a parallel reduction on the variable reduction, privatizing the auxiliary variables aux and matrix.
```c
int parallware__root() {
    int aux, reduction;
    int matrix[SIZE];
    int i, j, k;

    reduction = 0;
    #pragma omp parallel for private(aux,matrix,i,j,k) reduction(+:reduction)
    for (i = 0; i < N_ITER; i++) {
        aux = initialize();
        for (j = 0; j < SIZE; j++) {
            matrix[j] = aux;
        }
        for (k = 0; k < SIZE; k++) {
            aux += matrix[k];
        }
        reduction += aux;
    }
    return reduction;
}
```
In large real-world programs, the list of privatizable variables can be long and difficult to maintain. So how can we simplify the OpenMP pragmas and avoid programming errors when deciding which variables have to be declared private within the parallel region?
The answer is: declare loop-level temporary variables within loop scopes, i.e., at the beginning of the loop body. A variable (scalar, array, …) is said to be temporary within a loop if it is fully initialized at the beginning of each loop iteration, and there is no use of the variable between the loop entry and its initialization. In the example source code shown above, the variables j, k, aux and matrix are temporary in the loop for_i. The equivalent OpenMP-enabled source code shown below declares the temporaries within the loop body, which leads to simpler OpenMP pragmas because loop-level temporaries are automatically thread-private.
```c
int parallware__root() {
    int reduction = 0;

    #pragma omp parallel for reduction(+:reduction)
    for (int i = 0; i < N_ITER; i++) {
        int matrix[SIZE];
        int aux = initialize();
        for (int j = 0; j < SIZE; j++) {
            matrix[j] = aux;
        }
        for (int k = 0; k < SIZE; k++) {
            aux += matrix[k];
        }
        reduction += aux;
    }
    return reduction;
}
```
Finally, note that in the last example the OpenMP clause private(i,j,k,aux,matrix) is not only unnecessary but also invalid: the variables do not exist yet at the point of the pragma, so an OpenMP compiler would reject the source code.
2. Avoid the use of struct variables in OpenMP clauses
OpenMP does not allow a field of a struct to appear in a clause. Thus, if the struct variable is a loop-level temporary, you can declare it inside the corresponding loop scope. Otherwise, it is good programming practice to declare each struct field as an independent variable.
In the following example source code, a struct variable named a is declared inside the outer loop for_i. This OpenMP code compiles successfully.
```c
typedef struct {
    int x;
    int y;
} point;

int parallware__root() {
    int sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < SIZE; i++) {
        point a;
        a.x = initializex();
        a.y = initializey();
        sum = sum + a.x + a.y;
    }
    return sum;
}
```
In the next example, the struct variable a is declared at the beginning of the procedure, outside the loop for_i. Within the loop, the field a.x is private to the scope of the loop. However, there is also a reduction on the field a.y, which is not a loop-level temporary and thus cannot be privatized. This source code raises a compilation error due to the lack of support for struct fields in OpenMP clauses.
```c
typedef struct {
    int x;
    int y;
} point;

point parallware__root() {
    point a;
    /* invalid: struct fields are not allowed in OpenMP clauses */
    #pragma omp parallel for reduction(+:a.y) private(a.x)
    for (int i = 0; i < SIZE; i++) {
        a.x = initializex();
        a.y = a.y + a.x;
    }
    return a;
}
```
A valid solution is to declare an independent variable for each struct field and use these variables within the loop body. Such an implementation is shown below; it compiles successfully because the OpenMP clauses are free of struct fields.
```c
typedef struct {
    int x;
    int y;
} point;

point parallware__root() {
    point a;
    int a_x;
    int a_y = 0;   /* the reduction variable needs a defined start value */

    #pragma omp parallel for reduction(+:a_y) private(a_x)
    for (int i = 0; i < SIZE; i++) {
        a_x = initializex();
        a_y = a_y + a_x;
    }
    a.x = a_x;
    a.y = a_y;
    return a;
}
```
Conclusions
It is good practice to follow these recommendations from the beginning when doing parallel programming in C with OpenMP. In addition, following these suggestions will naturally increase the effectiveness of Parallelware at discovering parallelism in your sequential code and generating equivalent OpenMP-enabled parallel code.
Have you already seen the live demo of Parallelware? If not, remember that Parallelware guarantees correctness and performance in your OpenMP-enabled parallel code. When you start developing your code, it can be beneficial to compile it with Parallelware, as it checks many of the best C programming practices presented in this article.