The experiments show that Parallware provides performance speedups between 2x and 40x on the Titan supercomputer at ORNL.
LAPLACE: Solution of the Laplace equation using an iterative method.
SAXPY: Scaled vector addition (y = a*x + y) in single precision.
DAXPY: Scaled vector addition (y = a*x + y) in double precision.
SPMV: Sparse matrix-vector multiplication using the Compressed Row Storage (CRS) format.
ATMUX: Sparse transposed matrix-vector multiplication using the CRS format.
MATVEC: Multiplication of a 2D matrix by a vector.
MANDEL: Computation of the Mandelbrot fractal.
PI: Approximation of the value of pi by numerical integration.
PRIME: Sum of prime numbers in a given range (unbalanced computation).
MATMUL: Multiplication of two 2D matrices.
COULOMB: Computation of the electric potential generated by a set of charged particles.
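The difference between SPMV and ATMUX, two of the benchmarks listed above, can be sketched in C++ with OpenMP. This is an illustrative sketch, not the benchmark source. The `CrsMatrix` struct and the names `spmv` and `atmux` are hypothetical. SPMV writes one output element per row, so its rows can be split across threads directly. ATMUX scatters its results to `y[col_idx[j]]`, an index known only at run time, so the sketch assumes an OpenMP 4.5 (or later) array-section reduction to avoid races. Without OpenMP, the pragmas are ignored and both functions run serially.

```cpp
#include <cstddef>
#include <vector>

// Sparse matrix in CRS (Compressed Row Storage) format:
// the nonzeros of row i are val[row_ptr[i] .. row_ptr[i+1]-1],
// and their column indices are stored at the same positions in col_idx.
struct CrsMatrix {
    std::size_t n_rows;
    std::size_t n_cols;
    std::vector<std::size_t> row_ptr;  // n_rows + 1 entries
    std::vector<std::size_t> col_idx;
    std::vector<double> val;
};

// SPMV: y = A * x. Each iteration writes only y[i], so no synchronization
// is needed between threads.
std::vector<double> spmv(const CrsMatrix& a, const std::vector<double>& x) {
    std::vector<double> y(a.n_rows, 0.0);
    const long n = static_cast<long>(a.n_rows);
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        double sum = 0.0;
        for (std::size_t j = a.row_ptr[i]; j < a.row_ptr[i + 1]; ++j) {
            sum += a.val[j] * x[a.col_idx[j]];
        }
        y[i] = sum;
    }
    return y;
}

// ATMUX: y = A^T * x using the same CRS arrays. The write target
// y[col_idx[j]] is a subscripted subscript: rows handled by different
// threads may update the same element, so y is reduced across threads.
std::vector<double> atmux(const CrsMatrix& a, const std::vector<double>& x) {
    std::vector<double> y(a.n_cols, 0.0);
    double* yp = y.data();
    const long n = static_cast<long>(a.n_rows);
    const std::size_t m = a.n_cols;
    #pragma omp parallel for reduction(+ : yp[0:m])
    for (long i = 0; i < n; ++i) {
        for (std::size_t j = a.row_ptr[i]; j < a.row_ptr[i + 1]; ++j) {
            yp[a.col_idx[j]] += a.val[j] * x[i];
        }
    }
    return y;
}
```

For the 2x3 matrix with rows (1, 2, 0) and (0, 3, 4), `spmv` with x = (1, 1, 1) yields the row sums (3, 7), and `atmux` with x = (1, 1) yields the column sums (1, 5, 4).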
Success Story: NAS Parallel Benchmarks (NPB)
The NAS Parallel Benchmarks (NPB) are a set of benchmarks targeting the performance evaluation of highly parallel supercomputers. They are developed and maintained by the NASA Advanced Supercomputing (NAS) Division (formerly the NASA Numerical Aerodynamic Simulation Program) at the NASA Ames Research Center. The NPB are valuable because they are rigorous, offer a wide range of test sizes and, unlike other synthetic benchmarks, simulate computational algorithms that are close to real-life applications. The NPB EP benchmark is an embarrassingly parallel program in which two-dimensional statistics are accumulated from a large number of Gaussian pseudorandom numbers, generated using the Marsaglia polar method.
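The EP computation described above can be sketched serially in C++. Pairs of uniform numbers are filtered with the Marsaglia polar test, converted into Gaussian deviates (X, Y), summed, and tallied into square annuli indexed by the integer part of max(|X|, |Y|). This is a simplified illustration, not the NPB source. The `Lcg` generator is a hypothetical stand-in for the generator NPB actually uses. The names `run_ep` and `EpStats` and the bin count `kNumAnnuli` are likewise assumptions.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in generator (not NPB's): a 64-bit linear congruential
// generator whose top 53 bits are mapped to a uniform double in [0, 1).
struct Lcg {
    std::uint64_t state;
    double next() {
        state = state * 6364136223846793005ULL + 1442695040888963407ULL;
        return static_cast<double>(state >> 11) / 9007199254740992.0;  // 2^53
    }
};

constexpr int kNumAnnuli = 10;  // illustrative number of square annuli

struct EpStats {
    double sx = 0.0;                   // sum of the X deviates
    double sy = 0.0;                   // sum of the Y deviates
    std::array<long, kNumAnnuli> q{};  // pair counts per annulus
    long accepted = 0;                 // pairs that passed the polar test
};

// Marsaglia polar method: draw (x, y) uniformly in (-1, 1)^2, keep the pair
// only if it lies inside the unit disk, then scale it into two independent
// standard Gaussian deviates.
EpStats run_ep(long num_pairs, std::uint64_t seed) {
    Lcg rng{seed};
    EpStats s;
    for (long k = 0; k < num_pairs; ++k) {
        double x = 2.0 * rng.next() - 1.0;
        double y = 2.0 * rng.next() - 1.0;
        double t = x * x + y * y;
        if (t > 0.0 && t <= 1.0) {
            double f = std::sqrt(-2.0 * std::log(t) / t);
            double gx = x * f;
            double gy = y * f;
            // The bin index is computed from the data, so the element of q
            // that is updated is only known at run time.
            int l = static_cast<int>(std::max(std::fabs(gx), std::fabs(gy)));
            if (l < kNumAnnuli) s.q[l] += 1;
            s.sx += gx;
            s.sy += gy;
            s.accepted += 1;
        }
    }
    return s;
}
```

About pi/4 of the candidate pairs pass the polar test, and most accepted pairs land in the first two annuli. The data-dependent update of `q[l]` is what makes a naive parallelization of the loop unsafe.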
Extracting parallelism from the source code of NPB EP is a significant technical challenge, mainly because the code performs irregular computations through subscripted subscripts, whose target elements are only known at run time. In addition, it has complex control flow arising from branches and procedure calls, which may lead to unpredictable race conditions at run time. NPB provides a reference OpenMP implementation that handles the race conditions in NPB EP through array privatization and array reduction. Each thread receives a private copy of the array and computes partial results on it; at the end, the private copies of all threads are reduced into a single array. Parallware supports several parallelization strategies that apply to the source code of NPB EP, including the strategy used in the reference OpenMP implementation.
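The privatization-plus-reduction strategy described above can be sketched with OpenMP in C++. The sketch uses a simpler kernel with the same hazard: a histogram whose update index comes from another array. It is an illustrative sketch under that assumption, not the NPB reference code or Parallware output. The function `histogram_privatized` and its parameters are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Histogram with a subscripted subscript: hist[bin[i]] += 1.
// Assumes every bin[i] lies in [0, num_bins). Because the write target is
// only known at run time, a plain "parallel for" over i would race on hist.
std::vector<long> histogram_privatized(const std::vector<int>& bin, int num_bins) {
    std::vector<long> hist(num_bins, 0);
    const long n = static_cast<long>(bin.size());
    #pragma omp parallel
    {
        // Array privatization: each thread accumulates into its own copy.
        std::vector<long> local(num_bins, 0);
        #pragma omp for nowait
        for (long i = 0; i < n; ++i) {
            local[bin[i]] += 1;
        }
        // Array reduction: merge the private copies into the shared array,
        // one thread at a time.
        #pragma omp critical
        for (int b = 0; b < num_bins; ++b) {
            hist[b] += local[b];
        }
    }
    return hist;
}
```

For example, the bins (0, 2, 2, 1, 2, 0) with three bins produce the counts (2, 1, 3). Since OpenMP 4.5, the clause `reduction(+ : h[0:num_bins])` on an array section expresses the same privatize-then-combine pattern in a single directive; the explicit version above shows the steps that clause performs.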