Issue
Data that is never used should not be copied to or from the GPU: unnecessary data movement between the CPU and the GPU hurts performance.
Relevance
One of the key challenges when offloading work to the GPU is minimizing the data transfers between CPU memory and GPU memory. These transfers can greatly affect performance and should be kept to a minimum. Thus, only the strictly required data should be copied to or from GPU memory.
Actions
Restrict the array range copied to or from the GPU to the elements that are strictly required.
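As a minimal sketch of how such a restriction can be expressed, the snippet below maps only a sub-range of an array using an OpenMP array section. The function name `scale_range` and its parameters are hypothetical and chosen for illustration only. Note that an array section is written as `[lower-bound : length]`, so the second value is an element count, not an end index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: doubles data[lo .. lo+len-1] on the device.
 * Only that sub-range is mapped; the rest of the array stays on the host. */
void scale_range(int *data, size_t total, size_t lo, size_t len) {
  /* The mapped section must lie entirely inside the array. */
  assert(lo <= total && len <= total - lo);

  /* data[lo:len] covers exactly len elements starting at index lo. */
  #pragma omp target map(tofrom: data[lo:len])
  #pragma omp parallel for
  for (size_t i = lo; i < lo + len; i++) {
    data[i] *= 2;
  }
}
```

If the compiler is invoked without OpenMP support, the pragmas are ignored and the loop runs on the host, so the result is the same; only the data movement differs.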
Code example
The following code adds two arrays element by element:
    void foo() {
      int A[100], B[100], sum[100];
      #pragma omp target map(to: A[0:100], B[0:100]) map(from: sum[0:100])
      #pragma omp parallel for
      for (int i = 0; i < 50; i++) {
        sum[i] = A[i] + B[i];
      }
    }
However, the loop only accesses the first 50 of the 100 elements in each array, so there is no need to transfer the entire arrays:
    void foo() {
      int A[100], B[100], sum[100];
      #pragma omp target map(to: A[0:50], B[0:50]) map(from: sum[0:50])
      #pragma omp parallel for
      for (int i = 0; i < 50; i++) {
        sum[i] = A[i] + B[i];
      }
    }
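The restricted mapping can be checked with a small self-contained variant, sketched below. The names `partial_sum` and `check_partial_sum`, the macro `N`, and the sentinel value `-1` are assumptions introduced for this illustration and are not part of the original example. The check fills the inputs with known values, runs the offloaded loop on the first half only, and confirms that the untouched half of `sum` keeps its sentinel.

```c
#include <assert.h>

#define N 100 /* full array length, as in the example above */

/* Hypothetical variant of foo(): adds only the first `used` elements
 * and maps exactly that range to and from the device. */
void partial_sum(const int *A, const int *B, int *sum, int used) {
  assert(used >= 0 && used <= N);

  #pragma omp target map(to: A[0:used], B[0:used]) map(from: sum[0:used])
  #pragma omp parallel for
  for (int i = 0; i < used; i++) {
    sum[i] = A[i] + B[i];
  }
}

/* Returns the number of elements of sum that differ from the expected value. */
int check_partial_sum(void) {
  int A[N], B[N], sum[N];
  for (int i = 0; i < N; i++) {
    A[i] = i;
    B[i] = 2 * i;
    sum[i] = -1; /* sentinel: must survive in the unused half */
  }

  partial_sum(A, B, sum, N / 2);

  int errors = 0;
  for (int i = 0; i < N; i++) {
    int expected = (i < N / 2) ? 3 * i : -1;
    if (sum[i] != expected) errors++;
  }
  return errors;
}
```

`check_partial_sum()` should return 0 whether the loop runs on a GPU or falls back to the host. Assuming a 4-byte `int`, the restricted clauses map 3 × 50 × 4 = 600 bytes instead of the 1200 bytes mapped by the full-array version.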