For large datasets with large number of variables, of whom you just want to analyse a few, it makes sense to store the dataset in a RDBMS such as postgreSQL and let R fetch the relevant variables as needed.
On a debian-based system here is what you need to do to get R and postgreSQL to cooperate
Install the following packages:
r-cran-rodbc odbc-postgresql
As root execute:
odbcinst -i -d -f /usr/share/psqlodbc/odbcinst.ini.template
As your normal user, create a database, I used the name "foo" for this tutorial. PostgreSQL has its own user/priviledges system, which means that any user who is to use postgresql must be "blessed" to do so. The procedure for giving users access to postgresql is spelled out in the debian documentation of postgresql file:///usr/share/doc/postgresql-common/README.Debian.gz.
createdb foo
create a file ~/.odbc.ini
with the following contents:
[the foo] Driver = PostgreSQL Unicode Database = foo ReadOnly = 0
The phrase "PostgreSQL Unicode" refers to, and must be exactly the same as, a name in /etc/odbcinst.ini
Now, in R, load the RODBC module
library(RODBC)
The installation is now complete. To use it, create a connection
, also know as a channel
, with the function odbcConnect()
ch <- odbcConnect("foo")
Make a toy data.frame and save it as a table:
d <- data.frame(c("a","b","c","delta","b","a"), c(1,4,20,3,4,1), c(7,6,2,4,15,33), c("The", "quick", "fox", "jumps", "jumps", "quick"), c(2,4,3,2,5,3)) names(d) <- c("foo","pengar", "harlangd", "hopp", "japp") sqlSave(ch, d, tablename = "d", rownames = F, colnames = T)
To see the that the table really exists within postgreSQL
, in a terminal issue:
echo "select * from d" | isql -m10 foo
From within R, drob the toy table:
sqlDrop(ch, "d", errors = F)
When you are done, you should close the connection:
close(ch)
For more info, read the vignette of RODBC, which in Debian is located here: file:///usr/lib/R/site-library/RODBC/doc/RODBC.pdf (note: the link will only work after you have installed the r-cran-rodbc package on you own computer).