When you repeatedly want to manipulate an existing object in the global environment, a for loop is what you want.
foo <- 1:50000
the.result <- vector()
system.time(for(i in foo){ the.result[i] <- i+1 })
user system elapsed
8.084 0.004 8.510
head(the.result)
[1] 2 3 4 5 6 7
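One likely contributor to the loop's run time is that the.result starts empty and grows by one element per iteration. A minimal sketch of the same loop with the result vector allocated up front (numeric(length(foo)) is my choice here, not part of the timing above):

```r
foo <- 1:50000

# Allocate every slot before the loop instead of growing the vector
the.result <- numeric(length(foo))

for (i in seq_along(foo)) {
  the.result[i] <- foo[i] + 1
}

head(the.result)
```

The loop still modifies the object in the global environment, as before. How much time preallocation saves depends on the R version, so it is worth timing on your own machine.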
*apply does not affect the global environment, because the assignment happens inside a function and therefore only changes a local copy.
system.time(sapply(foo, function(x) {the.result[x] <- x+1} ))
user system elapsed
6.841 0.000 6.969
head(the.result)
logical(0)
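To make the scoping behaviour concrete, here is a small sketch on a shortened input (10 elements rather than 50000, purely for illustration). The assignment inside the anonymous function leaves the global object untouched; capturing the return value of sapply is what brings the results back:

```r
foo <- 1:10
the.result <- vector()

# The assignment inside the anonymous function only modifies a local copy
invisible(sapply(foo, function(x) { the.result[x] <- x + 1 }))
untouched <- length(the.result)   # the global vector is still empty

# Capturing the return value stores the results in the global environment
the.result <- sapply(foo, function(x) x + 1)
head(the.result)
```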
If you want to apply a function over a list and collect the results, sapply is a good choice (or lapply, if you always want a list back). The results can be returned to the global environment, but they can also be saved to disk by calling save() inside the function passed to *apply().
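A minimal sketch of saving results to disk from within the applied function. The input list, the doubling step, the file names and the use of tempdir() are all hypothetical choices for illustration:

```r
# Hypothetical input: a named list of small vectors
inputs <- list(a = 1:3, b = 4:6)
out.dir <- tempdir()   # assumption: write to the session's temp directory

# Each call computes a result, saves it to its own file, and returns the path
paths <- sapply(names(inputs), function(nm) {
  result <- inputs[[nm]] * 2
  path <- file.path(out.dir, paste0(nm, ".RData"))
  save(result, file = path)
  path
})

# Load one file back into a separate environment to inspect it
loaded <- new.env()
load(paths[["a"]], envir = loaded)
loaded$result
```

Returning only the file paths keeps the global environment small when the individual results are large.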
However, for simple transformations like addition, both for and *apply are unnecessary.
system.time(the.result <- foo+1)
user system elapsed
0.000 0.000 0.001
head(the.result)
[1] 2 3 4 5 6 7
A more elaborate example: change the format from DD-MM-YYYY to YYYY-MM-DD
The first chunk creates the sample data vector to operate on: 50,000 random dates in the form DD-MM-YYYY.
n <- 50000
# Days 01-28 and months 01-12, zero-padded where needed
days <- c(paste("0", sample(1:9, n*9/30, replace = TRUE), sep = ""), sample(10:28, n*21/30, replace = TRUE))
months <- c(paste("0", sample(1:9, n*9/12, replace = TRUE), sep = ""), sample(10:12, n*3/12, replace = TRUE))
my.string <- paste(days, "-", months, "-", 2010, sep = "")
head(my.string)
[1] "07-07-2010" "06-04-2010" "04-07-2010" "07-09-2010" "05-04-2010" "03-08-2010"
tail(my.string)
[1] "19-12-2010" "23-10-2010" "25-12-2010" "17-12-2010" "27-10-2010" "24-10-2010"
length(my.string)
[1] 50000
The smart way of doing it: substr() is invoked only three times and paste() only once.
system.time(paste(substr(my.string, 7, 10), substr(my.string, 4, 5), substr(my.string, 1, 2), sep = "-"))
user system elapsed
0.5 0.0 0.5
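For this particular task, base R's date handling offers another vectorized route. This is an alternative swapped in for illustration, not the approach timed above, and the three sample dates are hand-picked values:

```r
# Small hand-written sample in DD-MM-YYYY form (illustrative values)
dates <- c("07-07-2010", "19-12-2010", "23-10-2010")

# Parse the whole vector at once, then format it as YYYY-MM-DD
parsed <- as.Date(dates, format = "%d-%m-%Y")
iso <- format(parsed, "%Y-%m-%d")
iso
```

Like substr() and paste(), both as.Date() and format() accept the full vector in a single call, so no explicit loop is needed.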
The naive way of doing it: substr() is invoked 150,000 times and paste() 50,000 times.
system.time(for(i in 1:n){my.string[i] <- paste(substr(my.string[i], 7, 10), substr(my.string[i], 4, 5), substr(my.string[i], 1, 2), sep = "-")})
user system elapsed
24.852 0.088 25.595
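Note that the loop overwrites my.string in place, whereas the vectorized call only computes a result without storing it. A small sketch, on a hand-written sample and with hypothetical variable names, that stores both results separately and checks that the two approaches agree:

```r
# Illustrative sample in DD-MM-YYYY form
dates <- c("07-07-2010", "19-12-2010", "23-10-2010")

# Vectorized: one pass over the whole vector
vectorized <- paste(substr(dates, 7, 10), substr(dates, 4, 5),
                    substr(dates, 1, 2), sep = "-")

# Loop: one element at a time, written to a separate, preallocated vector
looped <- character(length(dates))
for (i in seq_along(dates)) {
  looped[i] <- paste(substr(dates[i], 7, 10), substr(dates[i], 4, 5),
                     substr(dates[i], 1, 2), sep = "-")
}

identical(vectorized, looped)
```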
So, only use a for loop when no relevant function is available that accepts vectors as arguments.