Multi-core hardware is now ubiquitous: from the fastest supercomputers in the world to mobile phones, it is available for everyone to use. The difficulty lies in making full use of that hardware. The Parallel Computing Laboratory at UC Berkeley describes the challenge as “Writing programs that scale with increasing numbers of cores should be as easy as writing programs for sequential computers”, but unfortunately this is not yet the case. The Parallel Bridge (Fig. 1) illustrates Software as the bridge between two towers: Applications and the parallel Hardware industry. The Parallelware team believes the problem can be broken down further. Introducing a third tower, Code, lets us improve programmer productivity through guided parallelization that teaches best practices in parallel programming.
Fig. 1. An extension of the Parallel Bridge from Berkeley’s view of the parallel computing landscape. The new tower, Code, inserted between Applications and parallel Hardware, highlights the impact of the program code on the productivity of the parallelization process.
The Parallelware Engine
Underneath the Parallelware Trainer is a hierarchical classification engine that discovers parallel patterns in sequential code. The training tool then presents a ranking of the available parallelization strategies, and generates pragma-based parallel source code using the OpenMP 4.5 and OpenACC 2.5 standards.
The functionality of the Parallelware Engine goes beyond traditional compiler optimizations. Compilers use a mathematical approach to discover potential parallelism in software and do not explain the choices they make. Nor are their optimizations fully controllable by the programmer: you cannot choose which ones to apply. In contrast, the Parallelware Trainer tool lets the programmer choose between different OpenMP-enabled and OpenACC-enabled implementations of their code.
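To make the idea of choosing between implementations concrete, here is a minimal hand-written sketch of one loop annotated three ways: OpenMP for a multi-core CPU, OpenMP offload, and OpenACC. It is not output from Parallelware Trainer; the function names and the saxpy kernel are assumptions chosen for illustration, and the directives are one plausible choice among several.

```c
#include <assert.h>

/* Hypothetical example kernel: y = a * x + y. Function names are illustrative. */

/* OpenMP, loop paradigm: iterations are shared among CPU threads. */
void saxpy_omp_loop(int n, float a, const float *x, float *y)
{
    assert(n >= 0);
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}

/* OpenMP, offload paradigm: the map clauses describe the data movement
   to and from the device. Falls back to the host if no device is present. */
void saxpy_omp_offload(int n, float a, const float *x, float *y)
{
    assert(n >= 0);
    #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}

/* OpenACC: the same loop, with copyin/copy playing the role of the map clauses. */
void saxpy_acc(int n, float a, const float *x, float *y)
{
    assert(n >= 0);
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

The loop body is identical in all three versions; only the directive changes. A compiler that is not asked to honour the directives (for example, GCC without -fopenmp or -fopenacc) ignores them and runs the loops sequentially, so the results stay the same.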
Parallelware uses a new computational approach: it looks for algorithmic features in terms of parallel patterns, such as parallel loops, parallel scalar reductions and parallel sparse reductions. The tool then ranks the strategies applicable to the code under review, so that multiple strategies can be generated and compared.
Crucially, Parallelware reports the useful parallel patterns it identifies, but hides the complexity of the dependencies that a compiler would find. This provides a much simpler interface for training, education and everyday assisted parallelization.
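As a rough illustration of the three patterns named above, the sketch below shows what each might look like in plain sequential C, which is the kind of input in which such patterns are looked for. The function names are hypothetical and the loops are deliberately minimal; real code would rarely be this clean.

```c
#include <assert.h>
#include <stddef.h>

/* Parallel forall: every iteration writes a different element y[i],
   so iterations do not depend on each other. */
void forall_square(const double *x, double *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; i++) {
        y[i] = x[i] * x[i];
    }
}

/* Parallel scalar reduction: every iteration updates the same scalar
   with an associative operator (+). */
double scalar_sum(const double *x, size_t n)
{
    assert(n == 0 || x != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += x[i];
    }
    return sum;
}

/* Parallel sparse reduction: iterations update array elements selected
   through an index array, so two iterations may hit the same out[idx[i]].
   The caller must ensure every idx[i] is a valid index into out. */
void sparse_accumulate(const int *idx, const double *val, size_t n, double *out)
{
    assert(n == 0 || (idx != NULL && val != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) {
        out[idx[i]] += val[i];
    }
}
```

The forall loop can be split across threads as it stands. The two reductions need extra care when parallelized, because several iterations update the same memory location.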
Developing Parallel Support
The technological roadmap of Parallelware is guided by best practices for parallel programming with OpenMP and OpenACC. By analysing the OpenMP and OpenACC implementations of well-known benchmark suites, including CORAL, NAS and ORNL’s XRayTrac miniapp, we have identified the key parallel functionality in HPC software, which we now support:
- The most popular parallel patterns, namely, Parallel Forall, Parallel Scalar Reduction and Parallel Sparse Reduction.
- The parallel programming paradigms Loop and Offload for modern CPU devices (e.g., Intel Xeon, IBM Power) and NVIDIA GPUs.
- OpenMP and OpenACC implementations of parallel scalar/sparse reductions using the approaches Atomic access, Built-in reduction and Variable privatization.
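The three reduction approaches in the last item can be sketched in OpenMP for a sparse reduction, using a histogram as a stand-in. This is a hand-written illustration under assumptions, not code generated by Parallelware: the function names are hypothetical, every idx[i] is assumed to lie in [0, nbins), and the array-section reduction relies on OpenMP 4.5 support in the compiler.

```c
#include <assert.h>
#include <stdlib.h>

/* Approach 1, atomic access: all threads share hist, and each update
   is made atomic so concurrent increments of one bin are not lost. */
void histogram_atomic(const int *idx, int n, int *hist, int nbins)
{
    assert(n >= 0 && nbins > 0);
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp atomic
        hist[idx[i]] += 1;
    }
}

/* Approach 2, built-in reduction: the runtime gives each thread its own
   copy of hist[0:nbins] and combines the copies at the end of the loop. */
void histogram_reduction(const int *idx, int n, int *hist, int nbins)
{
    assert(n >= 0 && nbins > 0);
    #pragma omp parallel for reduction(+ : hist[:nbins])
    for (int i = 0; i < n; i++) {
        hist[idx[i]] += 1;
    }
}

/* Approach 3, variable privatization: the same idea written by hand.
   Each thread fills a private array, then merges it under a critical section. */
void histogram_private(const int *idx, int n, int *hist, int nbins)
{
    assert(n >= 0 && nbins > 0);
    #pragma omp parallel
    {
        int *local = calloc((size_t)nbins, sizeof *local);
        if (local == NULL) {
            abort(); /* simplest possible handling for a sketch */
        }
        #pragma omp for
        for (int i = 0; i < n; i++) {
            local[idx[i]] += 1;
        }
        #pragma omp critical
        for (int b = 0; b < nbins; b++) {
            hist[b] += local[b];
        }
        free(local);
    }
}
```

Which variant performs best depends on the code and the hardware. Atomics add no memory overhead but can serialize under contention, while the two privatized forms trade extra memory per thread for contention-free updates inside the loop.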
The full details on the research that led to this development are available in this paper.
A domain-independent tool for providing assisted parallelism
The analysis done by Parallelware enables the identification of the most common parallel programming patterns irrespective of the application domain. By focusing on the programming language rather than the domain or mathematics of the software under consideration, users of the Parallelware Trainer tool can identify and understand areas of potential parallelism in their code in a way previously unachievable. Parallelware is about democratizing access to HPC: we don’t all have time to become experts in our domain as well as experts in parallel programming!
Get involved and try out Parallelware Trainer
If you would like to use Parallelware Trainer yourself, join us at the next GPU Hackathon, which we will be running in March: the CesgaHack18. The hackathon is an opportunity to build collaborations with the Appentra team similar to EDANYA’s, to increase your software’s performance, and to improve your productivity, enabling you to do more science and increase the impact of your work.
This year’s Hackathon will be truly international and presented in English. The deadline to register is February 11, 2018. Do not miss the opportunity to participate and accelerate the execution of your simulation application. Register here!