Pitfall in the R package data.table

data.table is an efficient package for data manipulation and extraction in R. It implements fast, i.e. zero-copy, methods for operating on columns in objects of class "data.table", which extends the native "data.frame" class. But these methods do not affect only objects of class "data.table"! A plain vector that was extracted from a column without copying can be changed as well. Look at this:

foo <- data.table(some.var = c("BF.9", "BF.4", "BF.2"))
bar <- foo$some.var
class(bar)
[1] "character"
print(bar)
[1] "BF.9" "BF.4" "BF.2"
my.original.order <- order(bar)
setkey(foo, some.var) ## this changes bar !
print(bar)
[1] "BF.2" "BF.4" "BF.9"
my.current.order <- order(bar)
identical(my.original.order, my.current.order)
[1] FALSE
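
The reason can be made visible with a small check. This is only a sketch: it relies on address(), a helper exported by data.table that reports where an object lives in memory, and the name same.vector is made up for this illustration.

```r
library(data.table)

foo <- data.table(some.var = c("BF.9", "BF.4", "BF.2"))
bar <- foo$some.var   # plain assignment; no copy is made here

# Compare the memory address of bar with that of the column inside foo.
same.vector <- identical(address(bar), address(foo$some.var))
print(same.vector)
```

If the two addresses match, bar and the column are one and the same vector, so reordering the column in place reorders bar as well.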

If this surprises you, wrap the right-hand side in copy() instead of assigning it directly with the arrow operator <- (or with =).

foo <- data.table(some.var = c("BF.9", "BF.4", "BF.2"))
bal <- copy(foo$some.var)
class(bal)
[1] "character"
print(bal)
[1] "BF.9" "BF.4" "BF.2"
my.original.order <- order(bal)
setkey(foo, some.var) ## this does NOT change bal!
print(bal)
[1] "BF.9" "BF.4" "BF.2"
my.current.order <- order(bal)
identical(my.original.order, my.current.order)
[1] TRUE
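
A different way around the problem, sketched here as an alternative rather than as a recommendation from the data.table authors, is to avoid the in-place reordering altogether. Sorting with order() inside the square brackets returns a new data.table and leaves the original columns alone. The name foo.sorted is made up for this illustration.

```r
library(data.table)

foo <- data.table(some.var = c("BF.9", "BF.4", "BF.2"))
bar <- foo$some.var   # no copy(), on purpose

# order() inside [ ] builds a new, sorted data.table; foo itself is not touched.
foo.sorted <- foo[order(some.var)]

print(bar)                  # still in the original order
print(foo.sorted$some.var)  # sorted
```

The price is that foo.sorted is a full copy and carries no key, so this trades the speed of setkey() for safety.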

Last modified: July 31, 2020