This is the second video of our OpenACC programming course, based on Appentra's unique, productivity-oriented approach to learning best practices for parallel programming with OpenACC. Learning outcomes include how to decompose codes into parallel patterns and a practical, pattern-based, step-by-step process for parallelizing any code. In this video, we start with an introduction to OpenACC and an overview of the components of GPGPU technology.
Subscribe to our newsletter to receive the new videos, which will be published every week over the coming months.
- Course overview
- What is OpenACC?
- How OpenACC works.
- Building and running an OpenACC code.
- UPCOMING. The OpenACC parallelization process.
- UPCOMING. Using your first OpenACC directives.
- UPCOMING. Using patterns to parallelize.
- UPCOMING. Identifying a forall pattern.
- UPCOMING. Implementing a parallel forall.
What is OpenACC?
Using GPUs, whether in your laptop or in a complex hybrid supercomputer, is difficult: it takes considerable effort to program for them. OpenACC is one method of programming GPUs for general-purpose computation (GPGPU).
Why accelerators and GPUs?
But why should we use GPUs?
GPUs allow you to run many more floating-point operations per second than would be typical on a standard CPU.
CPUs are power-intensive for the relative performance they provide, whereas GPUs deliver far more floating-point throughput at a lower wattage. So as the race for more and more compute power continues, GPUs are increasingly prevalent in the architectures being made available. But they are not a complete replacement for the CPU, and in reality not all software is suitable for running on a GPU.
Architecture still matters with GPUs: they are very slow when used serially, and you also need to understand the cost of transferring data between CPU and GPU. But modern GPUs behave similarly enough that, except in certain situations, your OpenACC code will work fairly well on most GPU architectures.
OpenACC is one of several methods aimed at making it easier to use GPUs. You can take code that runs on a CPU and incrementally change it with OpenACC to port it over to the GPU. The other extreme, of course, is to program at the lowest level to get optimal performance. But in reality we all want portable solutions relatively quickly, which is definitely not what you get with lower-level programming solutions for parallelism.
What is OpenACC?
With OpenACC you avoid completely rewriting your code: by using directives you can get up and running quickly, and your code will be highly portable by the time you finish.
The beauty of OpenACC is that you don't need to do a lot to get running on a GPU, and porting is a lot simpler, if not trivial. There are many options, but to get up and running you don't need to know much. You won't get optimal performance straight away, but you can port to a GPU relatively quickly by adding directives that offload hotspot loops, that is, computationally intensive work, to the GPU.
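To make this concrete, here is a minimal sketch of offloading a hotspot loop with a single directive (the function name and values are illustrative, not from the course). A key property is that a compiler without OpenACC support simply ignores the pragma, so the same source still runs serially on the CPU:

```c
#include <stddef.h>

/* A classic hotspot loop: y = a*x + y over n elements.
 * The single pragma below asks an OpenACC compiler to run the
 * iterations in parallel on the accelerator; any other C compiler
 * ignores it and runs the loop serially, so the code stays portable. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    #pragma acc parallel loop
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

To actually target a GPU you would build with an OpenACC-capable compiler and its accelerator flag (for example `nvc -acc` from the NVIDIA HPC SDK); with a plain C compiler the same file builds and runs serially.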
One of the emphases of this course is how to work with real applications. We don’t want you finishing this course with just the knowledge to port a simple HelloWorld or matrix multiplication. You probably have a real, complex application, and by the end of this course we want you to feel comfortable with getting started with your code.
OpenACC follows the host/accelerator model of programming: your CPU (also known as the host) controls the GPU (the accelerator, or device).
This means that until you start adding directives, the CPU has control of the entire program and its data; only when you add directives do the work and the relevant data move onto the GPU (more on that later).
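The host/accelerator split can be sketched as follows (a hedged illustration; the function name is made up for this example). The `data` directive marks where arrays move from the host to the device and back; as before, a non-OpenACC compiler ignores the pragmas and the function runs as plain serial C:

```c
#include <stddef.h>

/* Host/accelerator model in miniature: the CPU owns `in` and `out`;
 * the data region copies `in` onto the device on entry (copyin) and
 * copies `out` back to the host on exit (copyout). The inner pragma
 * runs the loop itself in parallel on the device. */
void scale_on_device(size_t n, float factor, const float *in, float *out)
{
    #pragma acc data copyin(in[0:n]) copyout(out[0:n])
    {
        #pragma acc parallel loop
        for (size_t i = 0; i < n; ++i)
            out[i] = factor * in[i];
    }
}
```

Keeping data on the device across several loops like this is also how you control the transfer costs mentioned earlier: the arrays cross the CPU/GPU boundary once, not once per loop.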
What is a GPU?
But what is a GPU?
The first thing to note is that a GPU contains many more floating-point units than a traditional CPU. This is consistent with the trend in hardware (all hardware, including your laptop and even your car and mobile phone) towards more and more parallelism. Rather than just adding more and more CPUs, many systems now add GPUs as well because of their lower relative power usage for floating-point operations.
To make good use of this technology, your code needs to be capable of doing the same thing simultaneously thousands or even millions of times.
Which finally brings us to the complexity of the memory that goes with these systems.
Not all the floating-point units have access to all the memory, but the wonderful thing about using OpenACC is that most of this is taken care of for you.
Benefits and Limitations of OpenACC
The final thing to note about OpenACC is whether it is right for you and your code.
OpenACC is a wonderful solution if you need a quick implementation that can use GPUs and is platform-independent (provided you have a GPU and a compiler that supports OpenACC).
However, you will never match the 'best' performance achievable with the lowest-level programming tools. This is true of almost every high-level tool, and in reality very few people benefit from much lower-level tools, simply because the time spent on them never pays off. As a programmer, you should always ask yourself whether the additional optimisation effort is worth the potential benefit, given how many times you will actually run the code.