OpenCL is a framework that lets computations run on the graphics processing unit (GPU) as well as on the central processing unit (CPU). It was originally developed by Apple and handed over to the Khronos Group, which published version 1.0 in 2008. Modern GPUs have many more "cores" than modern CPUs, so the main advantage of OpenCL is that computations that can be parallelised can be executed much faster. However, the "cores" in the GPU are not fully fledged independent processing units: the cores in a group all execute the same instruction at the same time, each on its own piece of data. All sorts of computations on matrices can be executed in parallel, but most parts of an "ordinary" program cannot. So, if you have a lot of matrices or vectors to add or multiply, then OpenCL is for you.
In addition to the limitation of dependent "cores", the RAM available to these cores is limited. On my computer, a laptop with 8 GB of RAM, I have the following memory-related limits (in bytes):
Max memory allocation: 1610612736
Global memory size: 2147483648
Local memory size: 65536
Lastly, moving data to and from the GPU is slow, so ideally all computations should be done entirely on the GPU, with only the results read back to the CPU.
The documentation that Google turns up for "opencl debian intel" is outdated. In particular, "OpenCLHowTo - Andreas Klöckner's wiki" incorrectly states that "the Intel CPU ICD is not packaged. Install it manually as above."
To get a working installation of OpenCL, simply install the Debian packages beignet-opencl-icd and ocl-icd-libopencl1. To confirm that the installation works, also install clinfo:
sudo apt-get install beignet-opencl-icd ocl-icd-libopencl1 clinfo
Running clinfo gives this output on my laptop:
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.2 beignet 1.3
Platform Name: Intel Gen OCL Driver
Platform Vendor: Intel
Platform Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short
Platform Name: Intel Gen OCL Driver
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 32902
Max compute units: 16
Max work items dimensions: 3
Max work items[0]: 512
Max work items[1]: 512
Max work items[2]: 512
Max work group size: 512
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Native vector width char: 8
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 4
Native vector width double: 2
Max clock frequency: 1000Mhz
Address bits: 32
Max memory allocation: 1610612736
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 8192
Max image 3D height: 8192
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: No
Round to +ve and infinity: No
IEEE754-2008 fused multiply-add: No
Cache type: Read/Write
Cache line size: 64
Cache size: 8192
Global memory size: 2147483648
Constant buffer size: 134217728
Max number of constant args: 8
Local memory type: Global
Local memory size: 65536
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 80
Device endianness: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: Yes
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 0x7fa6f3050840
Name: Intel(R) HD Graphics IvyBridge M GT2
Vendor: Intel
Device OpenCL C version: OpenCL C 1.2 beignet 1.3
Driver version: 1.3
Profile: FULL_PROFILE
Version: OpenCL 1.2 beignet 1.3
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_intel_motion_estimation
To compile your own OpenCL programs, you need an LLVM-based compiler; the default one in Debian is clang:
sudo apt-get install clang
typedef float V __attribute__((vector_size(16)));
V foo(V a, V b) { return a+b*a; }
To test the speed of matrix multiplications, I use a high-level implementation in two layers:
libviennacl-dev (the Debian package for the ViennaCL C++ library)
RViennaCL and gpuR (R packages)
The gpuR package outperforms the i3 CPU on matrix multiplication.