OpenCL Parallel Programming Development Cookbook
Format: PDF / Kindle (mobi) / ePub
This cookbook is the perfect way to learn parallel programming in OpenCL because it offers a mix of enlightening theory and hands-on recipes. Ideal for experienced developers.
- Learn about parallel programming development in OpenCL, as well as the techniques involved in writing high-performance code
- Find out more about data-parallel and task-parallel development, and how to combine the two
- Understand and exploit underlying hardware features, such as processor registers and caches, on devices that can run potentially tens of thousands of threads across their processors
OpenCL (Open Computing Language) is the first royalty-free standard for cross-platform, parallel programming of modern processors found in personal computers, servers, mobile devices, and embedded devices. OpenCL greatly improves speed and responsiveness for a wide spectrum of applications in numerous market categories, from gaming and entertainment to scientific and medical software. OpenCL has proved itself to be versatile: it now runs not only on operating systems such as Windows and Linux powered by Intel and AMD processors, but also on low-power ARM chips, and it has been adopted by processor manufacturers such as ARM, Vivante, and Altera, among others.
OpenCL Parallel Programming Development Cookbook was designed to be practical, striking a good balance between theory and application. Learning to program in a parallel way is relatively easy, but using all of the resources available to you efficiently is quite another matter. You need to be shown not only the application, but also the theory behind it.
This book is organized roughly into two parts: the first covers the fundamentals of OpenCL parallel development, and the second explores a set of algorithms with you. Each part is packed with code samples and illustrations that demonstrate the various concepts. The first part is essential for a beginner, not only to program in parallel but also to think in parallel and acquire the mental model needed to tackle parallel programming. The second part consists of seven different algorithms the author has identified; through them you will learn parallel programming techniques that experts have developed over the past 60 years and that are applicable to OpenCL.
This book will demonstrate how to think in parallel by illustrating programming techniques such as data partitioning, thread coarsening, register tiling, data pre-fetching, and algorithm transformation. These techniques are demonstrated in the seven algorithms you'll be shown, ranging from image processing and solving sparse linear systems to in-memory sorting.
OpenCL Parallel Programming Development Cookbook combines recipes, illustrations, code, and explanations to teach you the essentials of parallel programming in OpenCL. The author has also added enough math for readers to understand the motivation behind each technique and to lay the foundation for their own exploration.
What you will learn from this book
- How to use OpenCL
- Understand data partitioning and transfers in OpenCL
- Understand OpenCL data types
- Learn about OpenCL features including math and atomic functions, the threading model, data transfer, and so on
- Develop a histogram in OpenCL
- Learn how to develop Sobel edge detection in OpenCL for image processing
- Develop matrix multiplication and sparse matrix-vector multiplication in OpenCL
- Learn to develop bitonic sort and radix sort in OpenCL
- Develop an n-body simulation with OpenCL