OpenCL on Intel HD Graphics on Debian

About OpenCL

OpenCL is a framework that lets computations run on the graphics processing unit (GPU) as well as on the central processing unit (CPU). It was originally developed by Apple and handed over to the Khronos Group, which published the first specification in 2008. Modern GPUs have many more "cores" than modern CPUs, so the main advantage of OpenCL is that computations that can be parallelised can be executed much faster. However, the "cores" in a GPU are not fully fledged independent processing units: the cores in a group all execute the same instruction at the same time, each on its own piece of data. All sorts of computations on matrices can be executed in parallel this way, but most parts of an "ordinary" program cannot. So if you have a lot of matrices or vectors to add or multiply, OpenCL is for you.
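
The "same instruction, different data" model can be sketched in plain C, without any OpenCL headers. In the sketch below, the function add_work_item plays the role of a kernel body and the loop in run_all_work_items stands in for the GPU launching one work item per element. Both names are made up for illustration and are not part of the OpenCL API; in a real OpenCL C kernel the index would come from the built-in get_global_id(0).

```c
#include <assert.h>
#include <stddef.h>

/* Body of one "work item": every index runs the same instruction
   on a different element. Hypothetical name, not an OpenCL call. */
void add_work_item(size_t gid, const float *a, const float *b, float *out)
{
    out[gid] = a[gid] + b[gid];
}

/* Stand-in for the GPU scheduler: on a real device these iterations
   would run in parallel rather than one after another. */
void run_all_work_items(size_t n, const float *a, const float *b, float *out)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t gid = 0; gid < n; ++gid)
        add_work_item(gid, a, b, out);
}
```

Because no work item depends on the result of another, the loop iterations can be handed to separate cores, which is exactly the property that vector and matrix arithmetic has and most "ordinary" code lacks.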

In addition to the limitation of dependent "cores", the RAM available to these cores is limited. On my computer, a laptop with 8 GB of RAM, I have the following memory-related limits (in bytes):

Max memory allocation:                         1610612736
Global memory size:                            2147483648
Local memory size:                             65536
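
To get a feeling for what these byte counts mean in practice, here is a small sketch in plain C. The helper names bytes_to_mib and max_square_dim are my own, and the calculation assumes a dense square matrix stored in one single allocation.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a byte count to whole mebibytes (1 MiB = 1024 * 1024 bytes). */
uint64_t bytes_to_mib(uint64_t bytes)
{
    return bytes / (1024 * 1024);
}

/* Largest n such that an n x n matrix of elem_size-byte elements
   fits in max_alloc bytes. Integer binary search avoids needing libm. */
uint64_t max_square_dim(uint64_t max_alloc, uint64_t elem_size)
{
    assert(elem_size > 0);
    uint64_t elems = max_alloc / elem_size;
    uint64_t lo = 0, hi = UINT32_MAX;  /* hi * hi still fits in 64 bits */
    while (lo < hi) {
        uint64_t mid = lo + (hi - lo + 1) / 2;
        if (mid * mid <= elems)
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}
```

With the limits above, a single allocation can be at most 1536 MiB, the global memory is 2048 MiB, and the local memory is 64 KiB. Assuming 4-byte floats, max_square_dim(1610612736, 4) gives 20066, so the largest single-precision square matrix that fits in one allocation is 20066 x 20066.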

Lastly, moving data to and from the GPU is slow, so ideally all computations are done entirely on the GPU and only the results are read back to the CPU.
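
A rough way to judge whether a transfer is worth it is to compare the arithmetic done with the bytes moved. The sketch below does this for n x n matrices in plain C. The function names are hypothetical, and it assumes a naive matrix product and that both inputs are copied to the GPU and one result is copied back.

```c
#include <assert.h>
#include <stdint.h>

/* Floating-point operations in a naive n x n matrix product:
   n * n outputs, each needing n multiplications and n - 1 additions. */
uint64_t matmul_flops(uint64_t n)
{
    assert(n > 0);
    return n * n * (2 * n - 1);
}

/* Floating-point operations in an element-wise n x n matrix addition. */
uint64_t matadd_flops(uint64_t n)
{
    return n * n;
}

/* Bytes crossing the host/device boundary: two input matrices in,
   one result matrix out, elem_size bytes per element. */
uint64_t transfer_bytes(uint64_t n, uint64_t elem_size)
{
    return 3 * n * n * elem_size;
}
```

For n = 1000 and 4-byte floats, the product does 1999000000 operations for 12000000 bytes moved, roughly 166 operations per byte, while the addition does only 1000000 operations for the same traffic. A single addition therefore does very little work per byte transferred, which is one reason to keep intermediate results on the GPU.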

Installing OpenCL for an Intel GPU on Debian

The documentation that a Google search for "opencl debian intel" turns up is outdated. In particular, "OpenCLHowTo - Andreas Klöckner's wiki" incorrectly states that "the Intel CPU ICD is not packaged. Install it manually as above."

To get a working installation of OpenCL, simply install the Debian packages beignet-opencl-icd and ocl-icd-libopencl1. To confirm that the installation works, also install clinfo:

sudo apt-get install beignet-opencl-icd ocl-icd-libopencl1 clinfo

Running clinfo gives this output on my laptop:

Number of platforms:                             1
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 1.2 beignet 1.3
  Platform Name:                                 Intel Gen OCL Driver
  Platform Vendor:                               Intel
  Platform Extensions:                           cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short
  Platform Name:                                 Intel Gen OCL Driver
Number of devices:                               1
  Device Type:                                   CL_DEVICE_TYPE_GPU
  Device ID:                                     32902
  Max compute units:                             16
  Max work items dimensions:                     3
    Max work items[0]:                           512
    Max work items[1]:                           512
    Max work items[2]:                           512
  Max work group size:                           512
  Preferred vector width char:                   16
  Preferred vector width short:                  8
  Preferred vector width int:                    4
  Preferred vector width long:                   2
  Preferred vector width float:                  4
  Preferred vector width double:                 0
  Native vector width char:                      8
  Native vector width short:                     8
  Native vector width int:                       4
  Native vector width long:                      2
  Native vector width float:                     4
  Native vector width double:                    2
  Max clock frequency:                           1000Mhz
  Address bits:                                  32
  Max memory allocation:                         1610612736
  Image support:                                 Yes
  Max number of images read arguments:           128
  Max number of images write arguments:          8
  Max image 2D width:                            8192
  Max image 2D height:                           8192
  Max image 3D width:                            8192
  Max image 3D height:                           8192
  Max image 3D depth:                            2048
  Max samplers within kernel:                    16
  Max size of kernel argument:                   1024
  Alignment (bits) of base address:              1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     No
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               No
    Round to +ve and infinity:                   No
    IEEE754-2008 fused multiply-add:             No
  Cache type:                                    Read/Write
  Cache line size:                               64
  Cache size:                                    8192
  Global memory size:                            2147483648
  Constant buffer size:                          134217728
  Max number of constant args:                   8
  Local memory type:                             Global
  Local memory size:                             65536
  Error correction support:                      0
  Unified memory for Host and Device:            1
  Profiling timer resolution:                    80
  Device endianness:                             Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:
    Execute OpenCL kernels:                      Yes
    Execute native function:                     Yes
  Queue properties:
    Out-of-Order:                                No
    Profiling :                                  Yes
  Platform ID:                                   0x7fa6f3050840
  Name:                                          Intel(R) HD Graphics IvyBridge M GT2
  Vendor:                                        Intel
  Device OpenCL C version:                       OpenCL C 1.2 beignet 1.3
  Driver version:                                1.3
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 1.2 beignet 1.3
  Extensions:                                    cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_intel_motion_estimation

To compile your own OpenCL programs, you need an LLVM-based compiler; the default one in Debian is clang.

sudo apt-get install clang

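/* Four packed floats (16 bytes), using the GCC/Clang vector_size extension. */
typedef float V __attribute__((vector_size(16)));
/* Computes a + b*a element-wise on all four lanes. */
V foo(V a, V b) { return a+b*a; }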
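
The snippet above is ordinary C using the vector_size extension that GCC and Clang share; it is not an OpenCL kernel. As a quick sanity check that the compiler handles vector types, the sketch below repeats the definition and compares each lane with a scalar version of the same expression. The helpers foo_scalar and foo_matches_scalar are names I made up for this check.

```c
#include <assert.h>

typedef float V __attribute__((vector_size(16)));  /* four packed floats */

V foo(V a, V b) { return a + b * a; }

/* Scalar reference: the same expression applied to a single lane. */
float foo_scalar(float a, float b) { return a + b * a; }

/* Returns 1 if every lane of foo(a, b) equals the scalar reference.
   Exact comparison is intended for small integer-valued inputs only. */
int foo_matches_scalar(V a, V b)
{
    assert(sizeof(V) == 4 * sizeof(float));
    V r = foo(a, b);
    for (int i = 0; i < 4; ++i)
        if (r[i] != foo_scalar(a[i], b[i]))
            return 0;
    return 1;
}
```

For example, with a = {1, 2, 3, 4} and b = {2, 2, 2, 2}, foo returns {3, 6, 9, 12}: each lane is computed from the same expression, which mirrors on a small scale how a GPU applies one instruction to many data elements.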

To test the speed of matrix multiplication, I use a high-level stack in two layers:

The Debian package libviennacl-dev
and the R packages RViennaCL and gpuR

gpuR outperforms the i3 CPU on matrix multiplication.

Last modified: October 12, 2017