MCMC on the GPU

Recent developments in the implementation of probabilistic programming languages have made it easy to fit regression models of the type often used in the social sciences in parallel on the graphics card (GPU) instead of the main processor (CPU). The more of the computation that can be dispatched to the GPU, the faster the model fitting. Of the two implementations described here, Bambi and BRMS, the former dispatches more to the GPU and is therefore the faster one.

BRMS is a mature R package (it has existed since at least 2018) that uses Stan as its backend. Stan rather recently acquired support for dispatching computations to the GPU via OpenCL. Compared to Bambi, BRMS has a more powerful way of specifying formulas, and it has built-in support for estimating the effects of the independent variables, including interactions. But Stan dispatches less of the computation to the GPU (specifically, the sampling is performed on the CPU), so BRMS is slower because the GPU has to wait for the CPU. Bambi can also make use of multiple GPUs, while BRMS is restricted to one device. BRMS is used from R and Bambi is used from Python.

R: BRMS

Requirements

In general, it is best to start with the bottom of the backend stack, which for BRMS is OpenCL. OpenCL is a vendor-neutral standard that makes it possible to write a single program that runs on GPUs from different vendors, such as NVIDIA and AMD. The first step is to install a runtime environment for your specific GPU vendor. Here we show how to do that for NVIDIA, which is the leading GPU vendor. NVIDIA provides OpenCL via its CUDA Toolkit. The CUDA Toolkit works in tandem with the graphics card driver, so their versions must match. Installing the latest released version of each at the same time is an easy way to get compatibility. (On Windows, the CUDA Toolkit installer appears to install the graphics driver too, so it may not be necessary to install the driver separately.)
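Since the driver and the toolkit have to match, it can help to see what the driver itself reports before going further. The following is only a sketch in Python and not part of the setup described here. It assumes that the `nvidia-smi` tool shipped with the driver is on the PATH and that its banner contains `Driver Version:` and `CUDA Version:` fields; the helper names `parse_nvidia_smi` and `query_nvidia_smi` are made up for this example.

```python
import re
import shutil
import subprocess


def parse_nvidia_smi(banner):
    """Extract (driver_version, cuda_version) from nvidia-smi output.

    Either element is None when the corresponding field is not found.
    """
    driver = re.search(r"Driver Version:\s*([\d.]+)", banner)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", banner)
    return (
        driver.group(1) if driver else None,
        cuda.group(1) if cuda else None,
    )


def query_nvidia_smi():
    """Run nvidia-smi if it can be found; return None if it is missing or fails."""
    exe = shutil.which("nvidia-smi")  # assumption: the driver put it on the PATH
    if exe is None:
        return None
    result = subprocess.run([exe], capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return parse_nvidia_smi(result.stdout)
```

Calling `query_nvidia_smi()` returns a pair of version strings, or `None` when the tool is not available. Note that the CUDA version in that banner is the newest CUDA version the driver supports, not necessarily the toolkit version that is installed.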

The full chain of programs for brms (with an NVIDIA GPU) is:

brms - cmdstanr - cmdstan - opencl - nvidia toolkit

Installation windows

At the time of writing, the authoritative source was https://mc-stan.org/docs/2_26/cmdstan-guide/parallelization.html#opencl, which gives the following instructions for installing an OpenCL runtime environment on Windows:

Install the NVIDIA GPU Driver and CUDA Toolkit.

Also install RTools for your version of R; it is needed to compile programs. At the time of writing I used R version 4.2 with https://cran.r-project.org/bin/windows/Rtools/rtools42/rtools.html, specifically the link "Rtools42 installer". To find other versions of RTools, see https://cran.r-project.org/bin/windows/Rtools.

Once these are installed, the next step is to install cmdstanr and cmdstan.

cmdstan does not require OpenCL, so making it use OpenCL on Windows takes some extra configuration. cmdstan can be installed automatically by cmdstanr. To install cmdstanr:

install.packages("cmdstanr", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))

(source: https://mc-stan.org/cmdstanr/)

Verify that RTools is properly installed:

library(cmdstanr)
check_cmdstan_toolchain(fix = TRUE, quiet = TRUE)

At the time of writing, cmdstan would fail to install with the default settings. To get it working, it must be told the location of the OpenCL files and instructed not to use precompiled headers.

To automatically install cmdstan via cmdstanr:

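library(cmdstanr)

# Download and build cmdstan (this takes a while).
install_cmdstan(cores = 2)

# Point cmdstanr at the installation; adjust the path to your own machine.
set_cmdstan_path("D:/Hans/.cmdstan/cmdstan-2.29.2")

# Folder with the OpenCL library from the CUDA Toolkit; adjust the version (e.g. v11.5) to the one installed.
path_to_opencl_lib <- "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.7/lib/x64"

# Compiler settings that are written to cmdstan's make/local file.
cpp_options <- list(
  "CXXFLAGS += -fpermissive",
  "PRECOMPILED_HEADERS" = FALSE,  # do not use precompiled headers
  paste0("LDFLAGS+= -L\"", path_to_opencl_lib, "\" -lOpenCL")  # where the linker finds OpenCL
)
cmdstan_make_local(cpp_options = cpp_options)

# Rebuild cmdstan so that the new settings take effect.
rebuild_cmdstan(cores = 4)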

Installation linux

Python: Bambi

Bambi and its backends are currently under development, which means that version mismatches can cause problems when new versions of packages appear. To mitigate such problems, combine versions that are known to play nicely together; see below.

Installation windows

Python / miniconda

There are different ways to install Python. I prefer miniconda, which only installs the packages I actually need.

Find a miniconda installer for Windows here: https://docs.conda.io/en/latest/miniconda.html. I used the one that installs Python 3.9.

installing packages for python

Start Anaconda Prompt (miniconda3) via the Windows start menu.

Install a CUDA-supported jaxlib and a CUDA-supported numpyro.
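After installing, it is worth checking that JAX actually sees the GPU, since a CPU-only jaxlib installs and runs without complaint. The snippet below is a minimal sketch; the function name `describe_jax_backend` is made up for this example, and it only relies on `jax.default_backend()` and `jax.devices()`.

```python
def describe_jax_backend():
    """Report whether JAX can be imported and which devices it would run on."""
    try:
        import jax
    except ImportError:
        # JAX is not installed in this environment.
        return {"installed": False, "backend": None, "devices": []}
    return {
        "installed": True,
        "backend": jax.default_backend(),
        "devices": [str(device) for device in jax.devices()],
    }


print(describe_jax_backend())
```

If the reported backend is `cpu`, the CUDA build of jaxlib is probably not the one that got installed, or it does not match the installed CUDA version.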

working versions

The most important fact here is that aesara is newer than the latest release, and newer than the version that pymc 4.0.0 requires!

Package             Version
------------------- ---------------------
absl-py             1.1.0
aeppl               0.0.31
aesara              2.7.3
arviz               0.12.1
attrs               21.4.0
bambi               0.9.0
blackjax            0.7.0
cachetools          5.2.0
cftime              1.6.0
cloudpickle         2.1.0
cons                0.4.5
cycler              0.11.0
etuples             0.3.5
fastprogress        1.0.2
filelock            3.7.1
flatbuffers         2.0
fonttools           4.33.3
formulae            0.3.4
iniconfig           1.1.1
jax                 0.3.13
jaxlib              0.3.10+cuda11.cudnn82
kiwisolver          1.4.3
logical-unification 0.4.5
matplotlib          3.5.2
miniKanren          1.0.3
multipledispatch    0.6.0
netCDF4             1.6.0
numpy               1.21.6
numpyro             0.9.2
opt-einsum          3.3.0
packaging           21.3
pandas              1.4.3
Pillow              9.1.1
pip                 22.1.2
pluggy              1.0.0
py                  1.11.0
pymc                4.0.0
pyparsing           3.0.9
pytest              7.1.2
python-dateutil     2.8.2
pytz                2022.1
scipy               1.7.3
setuptools          62.6.0
six                 1.16.0
tomli               2.0.1
toolz               0.11.2
tqdm                4.64.0
typing_extensions   4.2.0
wheel               0.37.1
xarray              2022.3.0
xarray-einstats     0.3.0

On master: jax 0.3.0, jaxlib 0.3.0+cuda11.cudnn82, numpy 1.21.5, numpyro 0.8.0

On pc27: jax 0.3.14, jaxlib 0.3.14+cuda11.cudnn82, numpy 1.21.6, numpyro 0.9.2
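Because version mismatches are the main source of trouble, a quick comparison of what is installed against a known-good set can save some debugging. The sketch below uses only the standard library; `KNOWN_GOOD` copies a handful of the versions listed above, and the function names are made up for this example.

```python
from importlib import metadata

# A subset of the versions listed above; extend it as needed.
KNOWN_GOOD = {
    "aesara": "2.7.3",
    "bambi": "0.9.0",
    "jax": "0.3.13",
    "jaxlib": "0.3.10+cuda11.cudnn82",
    "numpyro": "0.9.2",
    "pymc": "4.0.0",
}


def installed_version(name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Return {package: (expected, found)} for every package that differs."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

Running `find_mismatches(KNOWN_GOOD)` in the environment used for Bambi returns an empty dictionary when everything matches. A mismatch is not necessarily a problem, as the other working combinations above show, but it is the first thing to look at when sampling fails.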

Good settings on gpu-master

With method="nuts_numpyro", chain_method="sequential" gives lower RAM usage and lower GPU utilization than chain_method="vectorized", at the same speed.

nuts_blackjax overflows the GPU RAM
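The settings above can be collected in a small helper so that they are passed to the fit consistently. This is a sketch under assumptions, not a definitive recipe: the keyword names `method` and `chain_method` are taken from the notes above and are assumed to be accepted by `Model.fit()` in the Bambi version listed (other releases may name the sampler argument differently), and `numpyro_fit_kwargs` and `fit_on_gpu` are hypothetical names.

```python
VALID_CHAIN_METHODS = ("sequential", "vectorized", "parallel")


def numpyro_fit_kwargs(chain_method="sequential", **extra):
    """Build keyword arguments for a NumPyro NUTS fit.

    "sequential" is the default because it used less RAM at the same speed
    in the runs described above.
    """
    if chain_method not in VALID_CHAIN_METHODS:
        raise ValueError(f"unknown chain_method: {chain_method!r}")
    kwargs = {"method": "nuts_numpyro", "chain_method": chain_method}
    kwargs.update(extra)  # e.g. draws, tune, chains
    return kwargs


def fit_on_gpu(formula, data, chain_method="sequential", **extra):
    """Fit a Bambi model with the settings above (assumes bambi is installed)."""
    import bambi as bmb  # imported here so the helper above works without bambi

    model = bmb.Model(formula, data)
    return model.fit(**numpyro_fit_kwargs(chain_method, **extra))
```

A call could look like `fit_on_gpu("y ~ x", df, draws=1000)`, where `df` is a pandas data frame with columns `y` and `x`.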


Last modified: September 24, 2022