For a long time I have wanted to start using R for statistical computation and visualisation. Here I will document the steps needed to install an R environment with extra functionality (functions not included in the precompiled R binaries that Debian provides).
In the example below, I avoid installing binaries in /home and give write permissions to "hans" in /usr/local/lib/R.
The extra functionality is in the package ca, which I wanted for correspondence analysis.
# apt-get install r-base r-base-dev r-cran-rgl
# chown -R hans:staff /usr/local/lib/R
$ R
> install.packages("ca","/usr/local/lib/R/site-library")
> install.packages("FactoMineR","/usr/local/lib/R/site-library")
> install.packages("pgfSweave","/usr/local/lib/R/site-library/", repos = "http://cran.r-project.org")
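Since install.packages() needs write access to the target library, it can be worth confirming that the chown took effect before starting R. The following is only a minimal shell sketch: the function name check_lib_writable is hypothetical (it is not part of R or Debian), and it merely tests that the directory exists and is writable by the current user.

```shell
# check_lib_writable is a hypothetical helper, not part of R or Debian.
# It reports whether the given directory exists and is writable by the
# current user, and returns a non-zero status if it is not.
check_lib_writable() {
    dir="$1"
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        echo "writable: $dir"
    else
        echo "not writable: $dir"
        return 1
    fi
}

# Example call, using the path from the install commands above
# (commented out so the block has no side effects when sourced):
# check_lib_writable /usr/local/lib/R/site-library
```

If the check reports "not writable", the chown step probably needs to be repeated as root before calling install.packages().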
Packages installed by apt-get install r-cran-rcmdr:
r-cran-abind r-cran-car r-cran-effects r-cran-lmtest r-cran-multcomp r-cran-mvtnorm r-cran-rcmdr r-cran-relimp r-cran-sandwich r-cran-sm r-cran-strucchange r-cran-zoo
However, a package required by Rcmdr is not installed automatically: r-cran-rodbc. Having the data in an RDBMS is cool, and I already have PostgreSQL installed on this machine, so the package odbc-postgresql looks promising (I haven't installed it yet, though).
Packages installed by apt-get install r-base-dev, which can be removed once the extra packages have been installed:
build-essential gfortran gcc g++ libncurses5-dev libreadline5-dev libjpeg62-dev libpcre3-dev libpng12-dev zlib1g-dev libbz2-dev refblas3-dev atlas3-base-dev
And for the package rgl:
libglu1-mesa-dev
Updating the installed packages:
update.packages(lib.loc = "/usr/local/lib/R/site-library/", repos = "http://cran.r-project.org")
Removing a package:
remove.packages("ca")
local({
  # add pgfSweave to the default packages, set a CRAN mirror,
  # set a directory into which local packages will be installed
  old <- getOption("defaultPackages")
  r <- getOption("repos")
  r["CRAN"] <- "http://ftp.sunet.se/pub/lang/CRAN/"
  options(defaultPackages = c(old, "pgfSweave"), repos = r)
  ## options(repos = r)
  ## set the target dir for installation of local packages
  ## (note: this variable is local to this block, so the directory must
  ## still be passed to install.packages() as shown earlier)
  lib.loc = "/usr/local/lib/R/site-library/"
  ## set the width
  cols <- 145
  if(nzchar(cols)) options(width = as.integer(cols))
})
If your system's default compiler cannot be used (e.g. the current version of it has bugs), you can define another one with the CC variable in a Makevars file.
(From http://stat.ethz.ch/R-manual/R-devel/doc/manual/R-admin.html#Customizing-package-compilation) The R system and package-specific compilation flags can be overridden or added to by setting the appropriate Make variables in the personal file HOME/.R/Makevars-R_PLATFORM (but HOME/.R/Makevars.win or HOME/.R/Makevars.win64 on Windows), or if that does not exist, HOME/.R/Makevars, where ‘R_PLATFORM’ is the platform for which R was built, as available in the platform component of the R variable R.version.
> R.version
               _
platform       i486-pc-linux-gnu
[...]
version.string R version 2.13.0 (2011-04-13)
$ cat .R/Makevars-i486-pc-linux-gnu
CC=gcc-4.4
Now gcc-4.4, rather than the default gcc-4.6, will be used:
install.packages("lme4","/usr/local/lib/R/site-library/", repos = "http://cran.r-project.org")
...
gcc-4.4 -I/usr/share/R/include -I"/usr/lib/R/library/Matrix/include" -I"/usr/lib/R/library/stats/include" -fpic -std=gnu99 -O3 -pipe -g -c init.c -o init.o
gcc-4.4 -I/usr/share/R/include -I"/usr/lib/R/library/Matrix/include" -I"/usr/lib/R/library/stats/include" -fpic -std=gnu99 -O3 -pipe -g -c lmer.c -o lmer.o
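The Makevars setup can also be scripted. The sketch below writes a platform-specific Makevars file and then mimics the lookup order quoted from the R manual (the platform-specific file first, otherwise the plain Makevars). The function names write_makevars and pick_makevars are hypothetical, the platform string is the one reported by R.version on this machine, and the base directory is passed in explicitly so the functions can be tried in a scratch directory before touching the real home directory. The Windows file names are ignored, and this is only an illustration of the quoted order, not a description of everything R itself consults.

```shell
# Hypothetical helpers; the names are illustrative only.

# write_makevars BASE PLATFORM COMPILER
# Creates BASE/.R/Makevars-PLATFORM containing a single CC= line.
write_makevars() {
    base="$1"; platform="$2"; compiler="$3"
    mkdir -p "$base/.R"
    printf 'CC=%s\n' "$compiler" > "$base/.R/Makevars-$platform"
}

# pick_makevars BASE PLATFORM
# Prints the file chosen under the order quoted from the R manual:
# the platform-specific file if it exists, otherwise the plain Makevars.
# Returns a non-zero status if neither file exists.
pick_makevars() {
    base="$1"; platform="$2"
    if [ -f "$base/.R/Makevars-$platform" ]; then
        echo "$base/.R/Makevars-$platform"
    elif [ -f "$base/.R/Makevars" ]; then
        echo "$base/.R/Makevars"
    else
        return 1
    fi
}

# Example (commented out so nothing is written by accident):
# write_makevars "$HOME" i486-pc-linux-gnu gcc-4.4
# pick_makevars "$HOME" i486-pc-linux-gnu
```

Passing the base directory as an argument is a deliberate choice here: it keeps the example from overwriting an existing Makevars file while experimenting.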
A sample session in R:
$ R
> library(ca)
> data(smoke)
> plot(ca(smoke))
Here is how to read data from an SPSS file:
library(foreign)
myobj <- read.spss("yrken.sav", to.data.frame=TRUE)
Starting a graphical user interface:
library(Rcmdr)
Basic statistical functions:
.Table <- table(myobj$KONSANDE)
Using only a subset of a dataframe:
mysmalltable <- data.frame(myobj[1:2], myobj[5:5])
Simple correspondence analysis on a subset of a dataframe:
plot(ca(na.omit(data.frame(myobj[1:1], myobj[56:58], row.names = "YRKE"))), what = c("all", "all"), labels = c('2','2'))
Add supplementary points to the graph:
plot(ca(na.omit(data.frame(myobj[1:1], myobj[5:5], myobj[11:11], myobj[13:13], myobj[56:58], row.names = "YRKE")), supcol = 1:3), what = c("all", "all"), labels = c('2','2'))
Save the graph to a file by encapsulating the plot command within a pair of device setting commands:
png(file="politik-enkel.png", width=1600, height=1200)
plot(ca(na.omit(data.frame(myobj[1:1], myobj[56:58], row.names = "YRKE"))), what = c("all", "all"), labels = c('2','2'))
dev.off()
png() opens the file for writing, but the content of the file is written by dev.off().
A better way of importing: this names the rows with the variable YRKE, which can then be used in correspondence analysis (if the rows are named with a number and "YRKE" is a variable, it cannot be used in simple CA since it would appear as a factor).
myobj <- data.frame(read.spss("yrken.sav", to.data.frame=TRUE), row.names = "YRKE")
On the other hand, when using multiple or joint correspondence analysis, only factors are allowed, so there is no need for row.names.