How can I find the "corresponding" items in two vectors when both vectors contain duplicated values, while ignoring any duplicate in vector b that is not also duplicated in vector a?
Think of the elements as identifiers that describe cases, where vector b is part of some larger structure holding other information that should be merged with a.
a <- c("foo", "bar", "bar", "bal")
b <- c("faa", "foo", "boo", "bar", "bar", "bad", "bal", "bal", "baz")
The correct vector of corresponding positions is 2, 4, 5, 7 or 2, 4, 5, 8. Note that "bal" only appears once in a, but two times in b (at positions 7 and 8), which makes two solutions correct.
match() and %in% are insufficient here:
> match(a, b)
[1] 2 4 4 7
> which(b %in% a)
[1] 2 4 5 7 8
a <- c("foo", "bar", "bar", "bal")
b <- c("faa", "foo", "boo", "bar", "bar", "bad", "bal", "bal", "baz")
tmp.a <- a
tmp.b <- b
my.result <- rep(NA, times=length(tmp.a))
while(length(which(is.na(my.result) == FALSE)) < length(tmp.a)){
  ## This condition assumes that all elements in a have at least one
  ## corresponding element in b, but that might be perfectly fine,
  ## e.g. if a is derived from b in the first place.
  remove.these.from.b <- unique(match(na.omit(tmp.a), tmp.b))
  remove.these.from.a <- match(tmp.b[remove.these.from.b], tmp.a)
  tmp.b[remove.these.from.b] <- NA
  tmp.a[remove.these.from.a] <- NA
  my.result[remove.these.from.a] <- remove.these.from.b ## store the matches
}
> my.result
[1] 2 4 5 7
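A loop-free alternative, offered only as a sketch: number every value by its running occurrence count, then match on the (value, count) pair, so the second "bar" in a can only pair with the second "bar" in b. The helper names occurrence() and match.dups() are made up for this illustration and are not part of base R; the sketch also assumes neither vector contains NA.

```r
a <- c("foo", "bar", "bar", "bal")
b <- c("faa", "foo", "boo", "bar", "bar", "bad", "bal", "bal", "baz")

## Hypothetical helper: running occurrence count of each value,
## e.g. c("x", "y", "x") gives 1 1 2.
occurrence <- function(x) {
  ave(seq_along(x), x, FUN = seq_along)
}

## Hypothetical helper: match a against b, pairing the k-th occurrence
## of a value in a with the k-th occurrence of the same value in b.
match.dups <- function(a, b) {
  key.a <- paste(a, occurrence(a), sep = "\t")
  key.b <- paste(b, occurrence(b), sep = "\t")
  match(key.a, key.b)
}

match.dups(a, b)
## [1] 2 4 5 7
```

Unlike the while loop, this does not require every element of a to have a partner in b: an element without one simply gets NA (for example, a third "bar" in a when b holds only two). It always picks the earliest unused occurrence in b, so it returns 2, 4, 5, 7 and never the equally valid 2, 4, 5, 8.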