OpenCL is a framework that lets computations run on the graphics processing unit (GPU) as well as on the central processing unit (CPU). It was initially developed by Apple, and the first specification was published by the Khronos Group in 2008. Modern GPUs have many more "cores" than modern CPUs, so the main advantage of OpenCL is that computations that can be parallelised can be executed much faster. However, the "cores" in the GPU are not fully fledged independent processing units: the cores in a group all execute the same instruction at the same time, each on its own piece of data. All sorts of computations on matrices can be executed in parallel, but most parts of an "ordinary" program cannot. So, if you have a lot of matrices or vectors to add or multiply, then OpenCL is for you.
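To make the "same instruction, different data" idea concrete, here is a minimal sketch in plain Python. It is not OpenCL code: it only emulates the execution model sequentially, and the names `saxpy_kernel`, `run_kernel` and `gid` are made up for this illustration.

```python
def saxpy_kernel(gid, a, x, y, out):
    """Body executed by one work-item; gid is that work-item's index."""
    out[gid] = a * x[gid] + y[gid]


def run_kernel(kernel, global_size, *args):
    """Sequential stand-in for a GPU launch: every index runs the same code."""
    for gid in range(global_size):
        kernel(gid, *args)


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(x)

# One "work-item" per vector element; none depends on another's result.
run_kernel(saxpy_kernel, len(x), 2.0, x, y, out)
print(out)  # [12.0, 24.0, 36.0, 48.0]
```

Because no iteration reads what another iteration wrote, a GPU is free to run all of them at once. Code with branches that differ per element, or with steps that depend on earlier results, does not fit this pattern nearly as well.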
In addition to the limitation of dependent "cores", the RAM available to these cores is limited. On my computer, a laptop with 8 GB of RAM, I have the following memory-related limits (in bytes):
Max memory allocation: 1610612736
Global memory size: 2147483648
Local memory size: 65536
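To get a feeling for what these byte counts mean in practice, the following sketch converts them to more readable units and works out the largest square matrix of single-precision floats that fits within each limit. The helper name `largest_square_matrix` is made up for this example, and the calculation assumes 4-byte floats and ignores any other buffers the program needs.

```python
import math

MAX_ALLOC_BYTES = 1610612736    # "Max memory allocation" above
GLOBAL_MEM_BYTES = 2147483648   # "Global memory size" above
LOCAL_MEM_BYTES = 65536         # "Local memory size" above
FLOAT_BYTES = 4                 # assumed size of a single-precision float


def largest_square_matrix(limit_bytes, elem_bytes=FLOAT_BYTES):
    """Largest n such that an n x n matrix of elem_bytes elements fits."""
    return math.isqrt(limit_bytes // elem_bytes)


print(MAX_ALLOC_BYTES / 2**30)    # 1.5  (GiB)
print(GLOBAL_MEM_BYTES / 2**30)   # 2.0  (GiB)
print(LOCAL_MEM_BYTES // 1024)    # 64   (KiB)
print(largest_square_matrix(MAX_ALLOC_BYTES))  # 20066
print(largest_square_matrix(LOCAL_MEM_BYTES))  # 128
```

So a single buffer can hold a float matrix of roughly 20000 x 20000 elements, but a product A*B needs three such buffers, and together they must fit in the 2 GiB of global memory.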
Lastly, moving data to and from the GPU is slow, so ideally all computations are done entirely on the GPU and only the results are read back to the CPU.
The documentation that Google finds for "opencl debian intel" is outdated. In particular, "OpenCLHowTo - Andreas Klöckner's wiki" incorrectly states that "the Intel CPU ICD is not packaged. Install it manually as above."
To get a working installation of OpenCL, simply install the Debian packages beignet-opencl-icd and ocl-icd-libopencl1. To confirm that you have a working installation, also install clinfo:
sudo apt-get install beignet-opencl-icd ocl-icd-libopencl1 clinfo
Running clinfo gives this output on my laptop:
Number of platforms: 1
  Platform Profile: FULL_PROFILE
  Platform Version: OpenCL 1.2 beignet 1.3
  Platform Name: Intel Gen OCL Driver
  Platform Vendor: Intel
  Platform Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short

  Platform Name: Intel Gen OCL Driver
Number of devices: 1
  Device Type: CL_DEVICE_TYPE_GPU
  Device ID: 32902
  Max compute units: 16
  Max work items dimensions: 3
    Max work items[0]: 512
    Max work items[1]: 512
    Max work items[2]: 512
  Max work group size: 512
  Preferred vector width char: 16
  Preferred vector width short: 8
  Preferred vector width int: 4
  Preferred vector width long: 2
  Preferred vector width float: 4
  Preferred vector width double: 0
  Native vector width char: 8
  Native vector width short: 8
  Native vector width int: 4
  Native vector width long: 2
  Native vector width float: 4
  Native vector width double: 2
  Max clock frequency: 1000Mhz
  Address bits: 32
  Max memory allocation: 1610612736
  Image support: Yes
  Max number of images read arguments: 128
  Max number of images write arguments: 8
  Max image 2D width: 8192
  Max image 2D height: 8192
  Max image 3D width: 8192
  Max image 3D height: 8192
  Max image 3D depth: 2048
  Max samplers within kernel: 16
  Max size of kernel argument: 1024
  Alignment (bits) of base address: 1024
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: No
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: No
    Round to +ve and infinity: No
    IEEE754-2008 fused multiply-add: No
  Cache type: Read/Write
  Cache line size: 64
  Cache size: 8192
  Global memory size: 2147483648
  Constant buffer size: 134217728
  Max number of constant args: 8
  Local memory type: Global
  Local memory size: 65536
  Error correction support: 0
  Unified memory for Host and Device: 1
  Profiling timer resolution: 80
  Device endianness: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: Yes
  Queue properties:
    Out-of-Order: No
    Profiling : Yes
  Platform ID: 0x7fa6f3050840
  Name: Intel(R) HD Graphics IvyBridge M GT2
  Vendor: Intel
  Device OpenCL C version: OpenCL C 1.2 beignet 1.3
  Driver version: 1.3
  Profile: FULL_PROFILE
  Version: OpenCL 1.2 beignet 1.3
  Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_intel_motion_estimation
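The three memory limits quoted earlier can be picked out of this output with a few lines of Python. This is only a sketch: it assumes that clinfo prints one "key: value" pair per line, as in the listing above, and the names `parse_memory_limits`, `read_clinfo` and `SAMPLE` are made up for the example. Other clinfo versions may format their output differently.

```python
import subprocess

WANTED = ("Max memory allocation", "Global memory size", "Local memory size")


def parse_memory_limits(clinfo_text):
    """Return the wanted limits (in bytes) for the first device listed."""
    limits = {}
    for line in clinfo_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in WANTED and key not in limits:
            limits[key] = int(value.strip())
    return limits


def read_clinfo():
    """Run clinfo and return its text output (requires clinfo to be installed)."""
    result = subprocess.run(["clinfo"], capture_output=True, text=True, check=True)
    return result.stdout


# A few lines copied from the output above, so the sketch runs without clinfo.
SAMPLE = """\
  Max memory allocation: 1610612736
  Global memory size: 2147483648
  Local memory type: Global
  Local memory size: 65536
"""

limits = parse_memory_limits(SAMPLE)
print(limits)
```

On a machine with a working installation, `parse_memory_limits(read_clinfo())` should give the corresponding values for that machine, provided the output format matches the assumption above.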
In order to compile your own OpenCL programs, you need an LLVM-based compiler, and the default one in Debian is clang:
sudo apt-get install clang
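/* V is a 16-byte vector, i.e. four packed floats (GCC/clang vector extension). */
typedef float V __attribute__((vector_size(16)));

/* Element-wise a + b*a on all four lanes at once. */
V foo(V a, V b) { return a + b * a; }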
To test the speed of matrix multiplication, I use a high-level implementation in two levels: the Debian package libviennacl-dev and the R packages RViennaCL and gpuR. gpuR outperforms the i3 CPU on matrix multiplication.
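For readers who want a number to compare against, here is a minimal CPU-only timing sketch in plain Python. It is not the gpuR benchmark used above and will be far slower than any optimised library. It only shows the usual way of turning a timing into a throughput figure, using the conventional count of 2*n^3 floating-point operations for an n x n product. The names `matmul`, `matmul_flops` and `benchmark` are made up for this example.

```python
import random
import time


def matmul(a, b):
    """Naive matrix product of two lists of rows."""
    columns = list(zip(*b))  # transpose b once so columns are easy to walk
    return [[sum(x * y for x, y in zip(row, col)) for col in columns]
            for row in a]


def matmul_flops(n):
    """Conventional operation count for an n x n by n x n product."""
    return 2 * n ** 3


def benchmark(n, seed=0):
    """Time one n x n product; return (seconds, GFLOP/s)."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n)] for _ in range(n)]
    b = [[rng.random() for _ in range(n)] for _ in range(n)]
    start = time.perf_counter()
    matmul(a, b)
    elapsed = time.perf_counter() - start
    return elapsed, matmul_flops(n) / elapsed / 1e9


elapsed, gflops = benchmark(64)
print(f"64x64: {elapsed:.4f} s, {gflops:.3f} GFLOP/s")
```

The same formula applies to timings taken from the GPU. A fair comparison should also include the time spent copying the matrices to the GPU and the result back.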