for-and-sapply-in-R

When you want to repeatedly modify an existing object in the global environment, for is the right tool.

    foo <- 1:50000
    the.result <- vector()
    system.time(for(i in foo){ the.result[i] <- i+1 })
       user  system elapsed
      8.084   0.004   8.510
    head(the.result)
    [1] 2 3 4 5 6 7

The *apply functions do not modify the global environment, because the assignment happens inside a function, which has its own local scope.

    the.result <- vector()  # start from an empty vector again
    system.time(sapply(foo, function(x) {the.result[x] <- x+1} ))
       user  system elapsed
      6.841   0.000   6.969
    head(the.result)
    logical(0)
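A minimal sketch of why the global object stays empty. The helper name modify and the variable local.copy are made up for illustration; the point is only that an assignment inside a function creates a local copy.

```r
the.result <- vector()

# modify() is a hypothetical helper. The assignment below creates a
# local variable called the.result; the global one is left alone.
modify <- function(x) {
  the.result[x] <- x + 1
  the.result            # return the local copy
}

local.copy <- modify(3)

local.copy              # NA NA 4
length(the.result)      # 0, the global vector is still empty
```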

If you want to apply a function to each element of a list and collect the results, sapply is the better choice. The results can be returned to the global environment as a list (or as a vector, when sapply can simplify them), or they can be saved to disk by calling save() inside the function passed to *apply().
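A small sketch of both options, on a short vector so it runs instantly. The names vec.result, list.result and out.dir, the file name pattern, and the use of a temporary directory are assumptions for illustration only.

```r
foo <- 1:5

# sapply simplifies to a vector when every result has length one.
vec.result <- sapply(foo, function(x) x + 1)

# lapply always returns a list.
list.result <- lapply(foo, function(x) x + 1)

# Saving from inside the function: one file per element,
# written to a temporary directory (an assumption for this sketch).
out.dir <- tempfile("apply-demo")
dir.create(out.dir)
invisible(lapply(foo, function(x) {
  value <- x + 1
  save(value, file = file.path(out.dir, paste0("result-", x, ".RData")))
}))

vec.result                  # 2 3 4 5 6
length(list.files(out.dir)) # 5
```

The key difference from the previous example is that the value of sapply is assigned to a name, instead of trying to assign from inside the function.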

However, for simple transformations like addition, both for and *apply are unnecessary, because arithmetic in R already operates on whole vectors.

    system.time(the.result <- foo+1)
       user  system elapsed
      0.000   0.000   0.001
    head(the.result)
    [1] 2 3 4 5 6 7

A more elaborate example: changing the date format from DD-MM-YYYY to YYYY-MM-DD

The first chunk creates the sample data vector to operate on: 50,000 random dates in the form DD-MM-YYYY.

    n <- 50000
    my.string <- paste(c(paste("0", sample(1:9, n*9/30, replace = TRUE), sep = ""),
                         sample(10:28, n*21/30, replace = TRUE)),
                       "-",
                       c(paste("0", sample(1:9, n*9/12, replace = TRUE), sep = ""),
                         sample(10:12, n*3/12, replace = TRUE)),
                       "-", 2010, sep = "")
    head(my.string)
    [1] "07-07-2010" "06-04-2010" "04-07-2010" "07-09-2010" "05-04-2010" "03-08-2010"
    tail(my.string)
    [1] "19-12-2010" "23-10-2010" "25-12-2010" "17-12-2010" "27-10-2010" "24-10-2010"
    length(my.string)
    [1] 50000

The smart way of doing it: substr() (or the equivalent substring()) is invoked only three times in total, once per date component on the whole vector, and paste() is invoked only once.

    system.time(paste(substr(my.string, 7, 10), substr(my.string, 4,5), substring(my.string, 1,2), sep = "-"))
       user  system elapsed
        0.5     0.0     0.5

The naive way of doing it: substr() (or substring()) is invoked 150,000 times and paste() is invoked 50,000 times.

    system.time(for(i in 1:n){my.string[i] <- paste(substr(my.string[i], 7, 10), substr(my.string[i], 4,5), substring(my.string[i], 1,2), sep = "-")})
       user  system elapsed
     24.852   0.088  25.595
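To check that the two approaches really produce the same output, here is a small sketch on three of the sample dates shown earlier. The names dates, vectorized and looped are made up for this illustration.

```r
dates <- c("07-07-2010", "19-12-2010", "25-12-2010")

# Vectorized: each function is called once on the whole vector.
vectorized <- paste(substr(dates, 7, 10), substr(dates, 4, 5),
                    substr(dates, 1, 2), sep = "-")

# Loop: the same calls, repeated once per element.
looped <- character(length(dates))
for (i in seq_along(dates)) {
  looped[i] <- paste(substr(dates[i], 7, 10), substr(dates[i], 4, 5),
                     substr(dates[i], 1, 2), sep = "-")
}

identical(vectorized, looped)  # TRUE
vectorized                     # "2010-07-07" "2010-12-19" "2010-12-25"
```

The only difference between the two is how often the functions are called, which is where the difference in run time comes from.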

So use for only when no suitable function is available that accepts vectors as arguments.
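One situation where a loop is a reasonable fit, offered here as an assumption rather than something measured above, is a computation where each value depends on the previous one, so it is not a single operation on a whole vector. The recurrence and the names r, steps and x below are purely illustrative.

```r
# Illustrative recurrence: x[i] depends on x[i - 1].
r <- 3.5
steps <- 10

x <- numeric(steps)   # allocate the full vector up front
x[1] <- 0.5

for (i in 2:steps) {
  x[i] <- r * x[i - 1] * (1 - x[i - 1])
}

x[2]        # 0.875
length(x)   # 10
```

Allocating x with its final length before the loop avoids growing the vector one element at a time.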

Last modified: October 17, 2019