Parallelware Analyzer provides a set of tools covering the most challenging steps of your parallel development workflow. Join the Parallelware Analyzer Early Access Program to get access to the tools for free.

In this post, we present pwcheck, a static code analyzer that helps ensure your code is defect-free and compliant with best practices for the development of parallel code. To do so, it analyzes your code looking for the defects and recommendations detailed on the following website: https://www.appentra.com/knowledge/checks/
We will walk through the detection of the defect PWD001: Invalid OpenMP multithreading datascoping and the related recommendations that help prevent it, using Parallelware Analyzer 0.10. A basic understanding of OpenMP should suffice. Linux commands are used, but there should be no problem reproducing the steps on other operating systems.
An initial parallelization of PI
The following code computes an approximation of the number π:
#include <math.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    const unsigned long iters = 300000000;
    const double realPiValue = 3.141592653589793238;
    double start = omp_get_wtime();
    double x, sum = 0.0;

    for (int i = 0; i < iters; i++) {
        x = (i + 0.5) / iters;
        sum += sqrt(1 - x * x);
    }
    double result = 4.0 / iters * sum;

    printf("pi = %.10f\n", realPiValue);
    printf("result = %.10f\n", result);
    printf("error = %.1e\n", fabs(result - realPiValue));
    printf("time = %.6f secs\n", omp_get_wtime() - start);
    return 0;
}
Insert the following OpenMP directive right above the loop to parallelize the algorithm using a scalar reduction:
#pragma omp parallel for reduction(+: sum)
for (int i = 0; i < iters; i++)
Build with OpenMP support. For example, using gcc through the following command:
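A minimal build line, assuming the source file is named pi.c as elsewhere in this post (the -lm flag links the math library needed by sqrt and fabs):

gcc -fopenmp pi.c -lm -o pi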

If you run the program for different numbers of threads you will see that the error increases notably along with the number of threads:
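For example, you can set the thread count with the standard OMP_NUM_THREADS environment variable (a sketch; the exact error values will vary between runs and machines):

OMP_NUM_THREADS=1 ./pi
OMP_NUM_THREADS=2 ./pi
OMP_NUM_THREADS=4 ./pi
OMP_NUM_THREADS=8 ./pi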

Note that the error goes from 7.7e-13 for a single thread up to 1.5e-04 for 8 threads, yielding a result of 3.1417, which is a pretty bad approximation of π. Something must be wrong with the parallelization, and indeed there is: as we will see in the following sections, the code hides a race condition.
Looking for defects and recommendations in the parallel code
Let’s run pwcheck over the source file to see if it has something to report:
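Assuming the same pi.c file, the basic invocation is the tool name followed by the path to analyze (as also noted in the "More examples" section below):

pwcheck pi.c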

As you can see, pwcheck issues three recommendations (PWR002, PWR004 and PWR005) and reports one defect (PWD001). Since it constitutes a bug, let’s start by looking into the reported defect.
PWD001: Invalid OpenMP multithreading datascoping
The report of this defect states that variable x has an incorrect OpenMP data scoping: it is being shared by all threads when each thread should have a private copy.
Since in each iteration the value of x is first written and then read, one thread can overwrite x right after another thread has written it but before that thread has read the value back, yielding incorrect results. The following table illustrates how two loop iterations executing in parallel on two threads produce an erroneous result; in this case, it is as if the same iteration had been executed twice.
Iteration | Thread A                | Thread B                | x value | Comment
1         | x = (i + 0.5) / iters;  |                         | X1      | x is updated with value X1
2         |                         | x = (i + 0.5) / iters;  | X2      | x is updated with value X2
1         | sum += sqrt(1 - x * x); |                         | X2      | X2 is read from x instead of X1!
2         |                         | sum += sqrt(1 - x * x); | X2      | X2 is read again from x
This is a race condition. In this case, it can be removed by privatizing x so that instead of writing to the same variable, each thread uses a private one.
As instructed by the pwcheck tool, to fix the defect you need to change the data scoping of x from shared (by default) to private. You can do so by adding a private(x) clause to the directive.
#pragma omp parallel for reduction(+: sum) private(x)
Run the fixed code again
Once you have updated the code, compile it and experiment again for different numbers of threads:

Although the runs do not produce exactly the same result, the error now stays within the same order of magnitude regardless of the number of threads.
Why does the error still vary?
Floating-point addition is not associative because of its limited precision, which leads to round-off errors. When the loop is executed in parallel, its iterations are partitioned across the available threads. Although the exact distribution depends on the OpenMP scheduler, each thread accumulates its assigned iterations into its own copy of the sum variable. Once all threads have finished their loop iterations, the private copies are added into the shared sum variable. Given that floating-point addition is not associative, the exact result depends both on the order of the iterations within each thread and on the order in which the per-thread results are added.
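A small standalone snippet, unrelated to the PI code itself, makes this non-associativity visible:

#include <stdio.h>

int main(void) {
    double a = 1.0e16, b = -1.0e16, c = 1.0;
    // The same three operands, grouped differently:
    printf("(a + b) + c = %.1f\n", (a + b) + c); // prints 1.0
    printf("a + (b + c) = %.1f\n", a + (b + c)); // prints 0.0: b + c rounds back to -1.0e16
    return 0;
}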
How could this defect have been prevented?
We have fixed the code, but we haven't yet looked into why the variable x had an erroneous data scoping in the first place. To see why this happened, let's take a look at the reported recommendations PWR002, PWR004 and PWR005.
You will see that by following these recommendations you could have avoided the problem in two ways: either by explicitly managing the OpenMP data scoping and privatizing x, or by refactoring the code to move the declaration of x into the loop.
PWR004: Declare OpenMP scoping for all variables
The previous PWD001 defect reported that the data scoping for x was 'shared'. Where was this defined?
The PWR004 report states that the variable x has been given an implicit data scoping. That implicit scope is determined by the default data-sharing rules of the OpenMP specification, and in this case it is shared. This is the cause of the PWD001 defect.
By following this recommendation, you make the data scoping explicit instead of relying on the implicit rules of the OpenMP specification, which can be difficult to interpret in some situations. Thus, you should apply the suggestion and add a private(x) clause to the directive, which is exactly what we just did to fix the defect in the previous section.
PWR005: Disable default OpenMP scoping
So it looks like the root of the problem was an incorrect implicit data scoping for variable x. How can this be prevented? PWR005 recommends that you enforce explicit data scoping by adding a default(none) clause.
This is a good practice since it makes the compiler raise an error for each variable that isn't listed in a data scoping clause. Thus, had we added default(none), the compiler would have asked us to assign a data scoping for x instead of implicitly (and wrongly) sharing it:
#pragma omp parallel for reduction(+: sum) default(none)
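With gcc, the build now fails with a diagnostic along these lines (the exact wording varies across compilers and versions):

error: 'x' not specified in enclosing 'parallel'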

PWR002: Declare scalar variables in the smallest possible scope
If you examine the code carefully, you will notice that x is declared outside the loop but only used inside it. PWR002 suggests that you move the declaration of x to its innermost possible scope, which is the loop body.
Moving variable declarations to the innermost possible scope may prevent errors. That was the case here: since the variable was declared outside the loop and implicit data scoping was in effect, OpenMP assigned it a shared scope. That scope was wrong, resulting in a software defect that caused erroneous results.
Follow the suggestion and move the declaration of x into the loop body. You will need to remove the private(x) clause if you have it.
#pragma omp parallel for reduction(+: sum)
for (int i = 0; i < iters; i++) {
    double x = (i + 0.5) / iters;
    sum += sqrt(1 - x * x);
}
Is there something else that Parallelware Analyzer can help me with?
This post introduced the pwcheck tool, but you could have saved yourself some time and trouble by using Parallelware Analyzer's parallelization tool pwdirectives to create the parallel version. Give it a try using the following command:
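Based on the -o variant described below, the interactive invocation should look like this, where pi.c:13 points pwdirectives at the loop to parallelize:

pwdirectives -i pi.c:13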

Note that the -i argument stands for interactive editing, which means that the changes are applied directly to the file. If you prefer to keep the original file and generate a new one, use -o instead (e.g. pwdirectives -o pi_omp.c pi.c:13).
The parallelized version contains the following code:
#include <math.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    const unsigned long iters = 300000000;
    const double realPiValue = 3.141592653589793238;
    double start = omp_get_wtime();
    double x, sum = 0.0;

    #pragma omp parallel default(none) shared(sum)
    {
        #pragma omp for reduction(+: sum) private(x) schedule(auto)
        for (int i = 0; i < iters; i++) {
            x = (i + 0.5) / iters;
            sum += sqrt(1 - x * x);
        }
    } // end parallel

    double result = 4.0 / iters * sum;
    printf("pi = %.10f\n", realPiValue);
    printf("result = %.10f\n", result);
    printf("error = %.1e\n", fabs(result - realPiValue));
    printf("time = %.6f secs\n", omp_get_wtime() - start);
    return 0;
}
As you can see, it has not only properly privatized the variable x but also added the default(none) clause to prevent errors due to implicit data scoping.
More examples
Parallelware Analyzer comes with examples for each supported defect and recommendation. You can find them in the examples/checks directory inside the Parallelware Analyzer installation. Pass that directory to pwcheck to view reports of all the defects and recommendations. The example codes are also available online in our check-examples GitHub repository.
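For instance, assuming you run pwcheck from the installation directory:

pwcheck examples/checks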
Join the Early Access Program to try out Parallelware Analyzer!