Extracting info from HTML into R

My employer provides information on the participants in the courses I teach through a web platform. I wanted to divide the students into randomized groups, so I needed their names as data in R. This is a very specific task, but I expect to need it again, and a similar approach might be useful in other cases.

In the saved page, the names sit in td cells marked dynamic-data, so grep can pick out those lines and unhtml can strip the markup:

  grep td names.html  | grep dynamic-data | unhtml > names.txt
  
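# names.txt holds one entry per line, alternating personal name and surname.
# quote = "" keeps apostrophes in names from being read as quote characters.
bar <- read.table(file = "names.txt", sep = "\n", quote = "", stringsAsFactors = FALSE)
personal.name <- bar[seq(from = 1, to = nrow(bar), by = 2), ]  # odd lines
surname <- bar[seq(from = 2, to = nrow(bar), by = 2), ]        # even lines
my.names <- paste(personal.name, surname)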
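
The same extraction can probably be done without leaving R. What follows is only a minimal sketch, not the method used above: it assumes, as the grep pipeline does, that each name part sits on its own line in a td cell mentioning dynamic-data, and it strips tags with a crude regular expression instead of unhtml. The sample markup and the helper name extract.cells are made up for illustration.

```r
# Hypothetical sample standing in for names.html; the real markup may differ.
html <- c(
  "<table>",
  "<tr><td class=\"dynamic-data\">Anna</td>",
  "<td class=\"dynamic-data\">Andersson</td></tr>",
  "<tr><td class=\"static\">ignored</td></tr>",
  "<tr><td class=\"dynamic-data\">Bo</td>",
  "<td class=\"dynamic-data\">Berg</td></tr>",
  "</table>"
)

# Hypothetical helper: keep the lines the two greps would keep, then drop the tags.
extract.cells <- function(lines) {
  hits <- grep("dynamic-data", grep("td", lines, value = TRUE), value = TRUE)
  trimws(gsub("<[^>]*>", "", hits))  # crude tag stripping, no entity decoding
}

cells <- extract.cells(html)
personal.name <- cells[seq(from = 1, to = length(cells), by = 2)]  # odd entries
surname <- cells[seq(from = 2, to = length(cells), by = 2)]        # even entries
my.names <- paste(personal.name, surname)
print(my.names)
```

For a real page one would pass readLines("names.html") to the helper instead of the sample vector. HTML entities such as &amp; are left untouched here, which unhtml would presumably handle better.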

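Note to self

With the names in R, the code below shuffles them and assigns each participant to one of five groups and a lettered subgroup within that group.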

namn <- my.names                      # the vector built above; "namn" is Swedish for "names"
my.index <- sample.int(length(namn))  # random permutation of the participants
# Five major groups of near-equal size.
my.major.groups <- as.numeric(cut(seq_along(namn), breaks = 5))
# Within each major group, up to five minor groups (fewer if the group is small).
my.minor.groups <- ave(seq_along(namn), my.major.groups,
                       FUN = function(x) as.numeric(cut(seq_along(x), breaks = 5)))
# Column names are Swedish: name, group, subgroup.
my.matrix <- data.frame(Namn = namn[my.index],
                        Grupp = my.major.groups,
                        Undergrupp = LETTERS[my.minor.groups])
write.csv2(my.matrix, file = "~/groups.csv", row.names = FALSE)
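
Because sample.int draws a new permutation on every run, the groups change each time the script is sourced. Here is a small sketch of how one might pin the result and check the group sizes before handing out the list. The toy names and the seed value are assumptions for illustration only.

```r
# Toy stand-in for the real names vector (12 hypothetical participants).
namn <- paste("Student", 1:12)

set.seed(2019)                        # arbitrary seed so the shuffle is repeatable
my.index <- sample.int(length(namn))

# Same cut() idea as above: five major groups of near-equal size.
my.major.groups <- as.numeric(cut(seq_along(namn), breaks = 5))
groups <- data.frame(Namn = namn[my.index], Grupp = my.major.groups)

# Inspect the group sizes and confirm every name appears exactly once.
print(table(groups$Grupp))
stopifnot(setequal(groups$Namn, namn), !anyDuplicated(groups$Namn))
```

Dropping the set.seed line restores the original behaviour of a fresh grouping on each run.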

Last modified: October 17, 2019