Parallware: Novel LLVM-Based Software Technology to Assist in Parallelization of Scientific Codes with OpenMP and OpenACC.
Parallware is a new technology for the static analysis of programs that overcomes the limitations of classical dependence analysis, the method that current tools use to extract parallelism from scientific codes. The classical approach builds systems of mathematical equations whose solutions identify pairs of memory references that might lead to race conditions during parallel execution. Solving these systems is time-consuming and requires user intervention in the form of many hints about the code.
Based on the production-grade LLVM compiler infrastructure, Parallware uses a fast, extensible hierarchical classification engine to address dependence analysis and thus discover parallelism. It splits the code into small domain-independent kernels, and each kernel is classified with respect to a taxonomy of parallel design patterns (e.g., scalar reduction, sparse reduction, parallel prefix) that has been defined over 13+ years of work on the parallelization of numerical simulation programs. Parallel design patterns specify how to parallelize the program, preserving correctness and maximizing performance (i.e., minimizing parallel overhead due to synchronization, privatization, etc.). Finally, Parallware implements the best parallelization strategy by annotating the source code with the most appropriate OpenMP & OpenACC directives.
This technology is based on more than ten years of R+D in the area of advanced compilation techniques for detection of parallelism and generation of parallel-equivalent code for multi-core and many-core computing systems.
The key technical features are:
_New hierarchical classification engine that overcomes the technical limitations of classical dependence analysis. To detect parallel design patterns, the hierarchical classification engine is combined with several static compiler analyses, such as static single assignment, array access patterns, array access ranges, privatization, data scoping, aliasing, and symbolic computation.
_Discovery of coarse-grain parallelism in the largest loops that appear in the program.
_Support for programs with sparse computations that involve indirect array accesses and control flow not known at compile time. Thus, Parallware technology covers well-known fields such as finite elements, computational electromagnetics, and sparse algebra microkernels.
_Parallware technology learns from experience: the effectiveness of the hierarchical classification engine is improved by training on examples from well-known benchmark suites (e.g., NAS Parallel Benchmarks, CORAL benchmarks, SpecACCEL). Parallware then applies this knowledge to discover parallelism in new codes that have not been analyzed before.
Maturity of the Parallware technology
Parallware technology learns from experience, and succeeds in applying this knowledge to discover parallelism in new codes that have not been analyzed before. The development of Parallware is driven by codes available in the well-known benchmark suites NAS Parallel Benchmarks, CORAL benchmarks, and SpecACCEL benchmarks. The reasons for using these benchmark suites are:
_These benchmarks are created by the HPC community, and contain codes that are representative of real-world HPC applications running in large HPC facilities.
_These benchmarks provide a reference for the HPC community to measure the effectiveness and performance of the developer tools available in the market.
_The benchmarks provide sequential source codes as well as performant parallel implementations using OpenMP & OpenACC. Thus, each benchmark provides both the input and the reference output needed to test the effectiveness and performance of the Parallware technology.
_The benchmarks are organized by increasing level of complexity:
- Microbenchmarks, small codes that use basic programming language features (e.g., arrays, pointers).
- Mini-Apps, mid-size codes that use advanced programming language features (e.g., structs, functions).
- Real-world Apps, large codes whose implementation approaches the complexity of real-world programs running in large HPC facilities.
As of November 2016, the Parallware technology has learnt from microbenchmark codes in dense/sparse algebra (e.g., MATMUL, LAPLACE) and from CORAL benchmarks (e.g., HACCmk). More recently, some Mini-Apps from the NAS Parallel Benchmarks (e.g., EP, BT, CG) and the SpecACCEL benchmarks (e.g., CloverLeaf) have also been used for Parallware's learning.
Published success stories have already shown the potential of Parallware technology. We have established collaborations that have validated the Parallware technology through the publication of R+D papers (SC15 WACCPD, OpenMPCon'15) and technical reports (ORNL CSC193 project).
This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725.