One of my friends is a PhD student in mathematics, and he told me about a computation that Maple had difficulties finishing. The computation involved nested loops, but the number of evaluations sounded reasonably small for modern computers (the loops run over lists of approx. 10.000 elements, which for two-level loops means 100.000.000 evaluations).
Having used R for a while, I have learnt to avoid loops and have started to think more in terms of vectors instead. This was an opportunity to test whether the vectorized way of doing things could in fact speed up real-world mathematical computations.
For really simple tasks, vectorized functions are superfast. To really appreciate the speed, I use an old laptop from 1997 for the tests. Here is a simple test of the speed of vectorized functions. First, two vectors of 1.000.000 elements each (integers between 1 and 20.000) are created and named a and b. Then, a third vector c is created from the first two. Lastly, 5 randomly chosen elements from c are printed out (so we know that some evaluations have actually taken place).
a <- sample(20000, 1000000, replace = T)
b <- sample(20000, 1000000, replace = T)
c <- a * a / (b + 1) - (a + b)
c[sample(1000000, 5)]
[1]  39099.4466  -9809.0971 -19094.8394    601.1394 -12350.6934
What is of interest here is the time for the evaluation of c, which can be measured with the function system.time():
system.time(c <- a * a / (b + 1) - (a + b))
   user  system elapsed
  4.740   0.944   5.789
So, in 4.7 seconds, this laptop from 1997 managed to apply the formula 1.000.000 times. However, saying that the formula "was applied" 1.000.000 times is inaccurate; it was only applied once, to whole vectors. To understand the difference, look at this way of creating c from a and b:
c <- rep(0, 100000)
system.time(for(i in 1:100000) {
  c[i] <- a[i] * a[i] / (b[i] + 1) - (a[i] + b[i])
})
   user  system elapsed
100.131   0.308 100.580
In this example the formula is applied once for each pair of elements in a and b to create a specific element in c. And yes, this is 100.000 elements, not 1.000.000 as in the former example. I didn't want to waste CPU time on my old laptop looping over 1.000.000 elements just to show you the figures. But the loop solution still spends over 20 times more CPU time on a problem that is one tenth the size.
The loop solution is simply far slower. So, unless it has other advantages, it should be replaced with a vectorized solution.
In the example above there is, in a way, only one calculation: the creation of vector c. The mathematical problem my friend struggled with was not that simple, of course. His problem involved recursion, which he had tackled with loops. A main advantage of loops is that the computer does not need to hold a complete representation of the data in memory while evaluating them: evaluation is done one element at a time, so only the information relevant to the current element is needed. In contrast, vectorized solutions operate on whole vectors, and for a recursive problem a large array must be kept in memory for the vectorized functions to work on. Each level of recursion adds a new dimension to the array.
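To get a feel for what that memory cost means, here is a small illustration (my addition, not part of the original problem); the dimensions are the ones used in the timings further down:

x <- array(0, dim = c(100, 100, 100))  # 1.000.000 doubles, 8 bytes each
object.size(x)                         # about 8 MB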
Let's see how well vectorization works for recursive problems. To test this, I have made up a problem with three levels of recursion, where calculations at later levels use the results of calculations at earlier levels.
First, look at a loop solution:
my.timings.loop <- function(nr.of.outer.loops = 6, nr.of.middle.loops = 5,
                            nr.of.inner.loops = 4, a) {
  my.results <- vector()
  for(i in 1:nr.of.outer.loops) {
    b <- log(a[i] + 17)
    for(j in 1:nr.of.middle.loops) {
      c <- log(b + j)
      for(k in 1:nr.of.inner.loops) {
        d <- log(c + k)
        # If all results were interesting, we would save every evaluation of d:
        # my.results <- c(my.results, d)
      }
    }
    # Only store the last innermost evaluation for every outermost loop
    my.results[i] <- d
  }
  # dim(my.results) <- c(nr.of.inner.loops, nr.of.middle.loops, nr.of.outer.loops)
  my.results
}
This function takes the first 6 values of a (which is passed in) and, for each of these values, adds 17 and stores the log of the sum in a variable b, which in turn is used as the base for the calculations at the next level, where c is created. In the innermost loop, d is created much like c was at the second level. We only store the last d for each b, so the result of the function is a vector of nr.of.outer.loops elements. The comments show how all values could be saved, if they were of interest.
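For concreteness, a minimal usage sketch, assuming my.sample (the vector used in the timings below) is built like the a from the first example:

my.sample <- sample(20000, 1000000, replace = T)
my.timings.loop(a = my.sample)   # a vector of 6 values with the default arguments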
Below, the same problem is solved using matrices instead of loops. For each level of recursion, a new dimension is added to the array that holds the data.
my.timings.vectorized <- function(nr.of.outer.loops = 6, nr.of.middle.loops = 5,
                                  nr.of.inner.loops = 4, a) {
  x <- a[1:nr.of.outer.loops]
  b <- log(x + 17)
  # Middle level: one row per value of j, one column per element of b
  temp.c.1 <- matrix(rep(b, nr.of.middle.loops), byrow = T, nrow = nr.of.middle.loops)
  temp.c.2 <- 1:nr.of.middle.loops
  temp.c.3 <- temp.c.1 + temp.c.2
  c <- log(temp.c.3)
  # Inner level: one row per value of k, one column per element of c
  temp.d.1 <- matrix(rep(c, nr.of.inner.loops), byrow = T, nrow = nr.of.inner.loops)
  temp.d.2 <- 1:nr.of.inner.loops
  temp.d.3 <- temp.d.1 + temp.d.2
  my.results <- log(temp.d.3)
  dim(my.results) <- c(nr.of.inner.loops, nr.of.middle.loops, nr.of.outer.loops)
  my.results[nr.of.inner.loops, nr.of.middle.loops, 1:nr.of.outer.loops]
}
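As a quick sanity check, the two functions should return the same values:

all.equal(my.timings.loop(a = my.sample),
          my.timings.vectorized(a = my.sample))   # should return TRUE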
So, what about evaluation times for the two approaches? Below are the times for an evaluation of 100 * 100 * 100 = 1.000.000 elements:
system.time(my.timings.loop(nr.of.outer.loops = 100, nr.of.middle.loops = 100,
                            nr.of.inner.loops = 100, a = my.sample))
   user  system elapsed
139.892   0.388 140.484
system.time(my.timings.vectorized(nr.of.outer.loops = 100, nr.of.middle.loops = 100,
                                  nr.of.inner.loops = 100, a = my.sample))
   user  system elapsed
  5.325   1.320   6.688
Conclusion: the vectorized version is about 25 times faster.
So, as long as you have the RAM required, replace loops with a gigantic matrix.
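As an aside (a sketch of my own, not part of the original solution), base R's outer() expresses the same one-new-dimension-per-level idea more compactly, although the dimensions come out in the opposite order to my.timings.vectorized():

b <- log(my.sample[1:6] + 17)                   # level 1: a vector of 6 values
c <- outer(b, 1:5, function(b, j) log(b + j))   # level 2: a 6 x 5 matrix
d <- outer(c, 1:4, function(c, k) log(c + k))   # level 3: a 6 x 5 x 4 array
d[, 5, 4]                                       # matches my.timings.vectorized(a = my.sample)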
Edit 2010-07-22: For a real-world example, see Vectors and loops in R continued.