Computational hardware is rapidly changing, with an increasing movement to multi-core processors available everywhere from your mobile phone to the fastest supercomputers. To take advantage of these technologies, software needs to be able to execute in parallel: something that isn’t straightforward to achieve. In addition, for optimal performance and ease of use, software ideally needs to achieve a basic level of platform-independent parallelism, i.e. parallelism that works across multiple architectures.
As a result of the difficulties of developing parallel software and parallel algorithms, the parallel software workflow is now a circular process:
- performance analysis and profiling;
- debugging; and
- parallelization.
Parallelware supports the parallelization stage, helping you to identify the best parallelization strategies for your code when it is incorporated into the development cycle.
Parallelware: a fast, extensible hierarchical classification scheme
Parallelware performs dependence and data-flow analysis by splitting the target code into small, domain-independent computational kernels. Parallelware classifies each kernel into one type of parallel pattern: (1) parallel for, a loop with no dependencies between iterations; (2) parallel reduction, where the cross-iteration dependencies correspond to an associative, commutative operator; and (3) parallel recurrence, where iterations of a loop depend on adjacent iterations. For this purpose, it combines information from multiple compiler static analyses, including array access patterns, array access ranges, privatization, data scoping, and aliasing through interprocedural function calls.
Finally, Parallelware checks dependences between the kernels in order to discover coarse-grain parallelism, determines the best parallelization strategy and implements it by annotating the source code with the most appropriate OpenMP and OpenACC directives.
A great challenge for the automatic discovery of parallelism is handling all the syntactic variations of a program: the same algorithm can often be coded in many superficially similar but subtly different ways. Discovering opportunities for parallelism independently of the implementation is not possible with the traditional mathematical approach of classical dependence analysis, which is not a classification scheme. This is where Parallelware truly succeeds: it is able to classify domain-independent kernels as parallel patterns and apply a parallelization strategy to each kernel. The result is a performant code analysis tool.
LLVM is the de-facto standard compiler infrastructure, providing frontends, backends, optimizers, assemblers, linkers, and more to create a complete compiler. LLVM was originally developed in C++ as a research project by the Adve and Lattner group at the University of Illinois at Urbana-Champaign, where it was designed as a replacement for the existing code generator in the GCC stack. LLVM has since grown in popularity and now supports the compilation of a variety of languages, including Ada, C, C++, D, Delphi, Fortran, Haskell, Objective-C and Swift.
The widespread adoption of LLVM, including by NVIDIA/PGI, Intel, IBM, ARM, Apple and Sony, has resulted in a wide variety of compilers built on it. The most widely used, and the one used by Parallelware, is Clang, a compiler supporting C, C++ and Objective-C. Unlike more traditional compilers, LLVM is designed to provide reusable components. For example, Clang is the C/C++ frontend of LLVM: it converts C and C++ code into LLVM bitcode, which can then be translated into assembly by a backend.
Parallelware uses LLVM to perform its analysis and identify parallelization opportunities in a way not previously possible. It is the core Parallelware technology that enables the identification of areas for parallel optimization and comparison of different possible parallelization targets.
What is an intermediate representation?
Within compiler theory, an intermediate representation (IR) is software infrastructure that captures the semantics of source code written in a high-level programming language such as C, C++ or Fortran, and provides an API to analyze and manipulate those semantics. The IR acts as a common language between the software programming languages and the executable code that runs on different hardware devices. Being common to multiple languages and across multiple architectures (each of which has its own binary code) means that only one version of the Parallelware technology needs to be built for all programming languages.

As LLVM is now the de-facto standard compiler infrastructure, Parallelware uses the LLVM IR. The LLVM IR has a modern design that follows object-oriented software engineering best practices, providing, for instance, a strongly typed reduced instruction set and a scalable, highly extensible hierarchy of objects that represent the building blocks of a programming language: code concepts (e.g., Instruction, Loop, Function, Module) and code semantics (e.g., conditional control flow, data dependencies).
LLVM allows the Parallelware team to use the information supplied by the IR to identify key functionality in any piece of C or C++ software, and also to exploit the information the LLVM IR carries about instruction sets to understand how particular parallelization strategies will perform on different architectures.
What about Fortran?
At the moment an LLVM frontend for Fortran is still in development, but the Fortran counterpart of Clang, Flang, is on its way. We presented the first prototype of Fortran support at the SC17 conference, and we are currently working with the Flang initiative to ensure that Flang meets the needs of Parallelware.
More information on Parallelware Technologies and the Parallelware technological roadmap is available on the Appentra technology page.
Make Code Parallel
Parallelware Trainer is an interactive, real-time code editor with features that facilitate the learning, use, and implementation of parallel programming by helping you understand how and why sections of code can be parallelized.