This text explains two things: first, how to find items in a vector that differ by less than a threshold; second, how to find coordinates, given as two vectors, that refer to locations closer to each other than a threshold.
Consider the vector N, which holds 10 items:
N
 [1] 6410272 6404200 6407008 6393624 6401948 6403341 6397321 6412530 6385294
[10] 6407825
Which items are closer than a threshold x to another item in the vector? Let's set x to 1500.
x <- 1500
order(N)[which(diff(N[order(N)]) < x)]
[1] 5 6 3
So items 5, 6 and 3 each differ by less than 1500 from at least one other item in the vector N.
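The one-liner above packs several steps into one expression. A minimal sketch of the same computation, unpacked, may make it easier to follow. It assumes N is entered by hand with c(); the intermediate names ord, gaps and hits are introduced here for illustration only.

```r
# The example vector, entered by hand
N <- c(6410272, 6404200, 6407008, 6393624, 6401948,
       6403341, 6397321, 6412530, 6385294, 6407825)
x <- 1500

ord  <- order(N)         # indexes of N, from the smallest value to the largest
gaps <- diff(N[ord])     # gap between each sorted value and the next one
hits <- which(gaps < x)  # positions in the sorted vector where the gap is below x

ord[hits]                # original indexes of the lower item of each close pair
ord[hits + 1]            # original indexes of the upper item of each close pair
```

Note that order(N)[which(...)] returns only the lower-valued member of each pair. Items 2 and 10 are also close to another item, but they only show up when the upper members are extracted as well.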
Which items make up the pairs that 5, 6 and 3 belong to?
sapply(which(diff(N[order(N)]) < x), function (y) {order(N)[c(y, y+1)]})
     [,1] [,2] [,3]
[1,]    5    6    3
[2,]    6    2   10
So item 5 forms a pair with item 6, which also forms a pair with item 2. Finally, item 3 forms a pair with item 10.
If you need the items within each pair ordered by index:
d <- sapply(which(diff(N[order(N)]) < x), function (y) {order(N)[c(y, y+1)]})
e <- sapply(seq(1, length(d), 2), function (y) {sort(d[c(y,y+1)])})
e
     [,1] [,2] [,3]
[1,]    5    2    3
[2,]    6    6   10
For each pair, what is the difference?
d <- sapply(which(diff(N[order(N)]) < x), function (y) {order(N)[c(y, y+1)]})
abs(diff(matrix(N[sapply(seq(1, length(d), 2), function (y) {sort(d[c(y,y+1)])})], 2)))
     [,1] [,2] [,3]
[1,] 1393  859  817
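Because diff() only compares neighbours in the sorted vector, the approach above reports adjacent pairs only. If three or more values cluster within x, a pair such as the first and third value of the cluster is not listed. As a cross-check, here is a sketch of a brute-force alternative (a different technique, not the method used in this text) that compares every item with every other item using outer(). It needs a full length(N) by length(N) matrix, so it is only practical for fairly short vectors. The names delta, is_close and all_pairs are made up for this example.

```r
N <- c(6410272, 6404200, 6407008, 6393624, 6401948,
       6403341, 6397321, 6412530, 6385294, 6407825)
x <- 1500

# Absolute difference between every pair of items (a 10 x 10 matrix)
delta <- abs(outer(N, N, "-"))

# Keep each pair once (upper triangle only) and apply the threshold
is_close  <- delta < x & upper.tri(delta)
all_pairs <- which(is_close, arr.ind = TRUE)  # one row per pair, row index < col index

# Pair indexes together with their difference
cbind(all_pairs, diff = delta[all_pairs])
```

For the example vector this finds the same three pairs as the sorted approach, since no cluster of three values falls within 1500.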
Suppose you have two unordered vectors of coordinates: N for values on the north-south axis and E for values on the east-west axis. How do you find pairs of points that are closer than x to each other? I ran into a real-world application of this problem when I investigated arson, where geographical distance was very important.
E
 [1] 1268851 1268200 1267224 1274877 1271267 1270656 1269688 1275535 1278333
[10] 1259585
We will use the matrix of indexes, e, to look up the corresponding differences in the other vector. Pairs that might exist in E other than the ones found in N are irrelevant, since those pairs differ by more than the threshold x in N.
Here are the values in E of the pairs from N
matrix(E[e], 2)
        [,1]    [,2]    [,3]
[1,] 1271267 1268200 1267224
[2,] 1270656 1270656 1259585
Calculate the difference in E
abs(diff(matrix(E[e], 2)))
The same thing for N
abs(diff(matrix(N[e], 2)))
Add the resulting vectors
abs(diff(matrix(E[e], 2))) + abs(diff(matrix(N[e], 2)))
Which pairs have a summed difference (N plus E) of less than x = 3000?
x <- 3000
which(abs(diff(matrix(E[e], 2))) + abs(diff(matrix(N[e], 2))) < x)
[1] 1
Only the first pair.
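The sum of the absolute differences in N and E is the Manhattan (city-block) distance between two points. To see the numbers behind the comparison, the steps above can be collected into a small table. This is only a sketch: the data frame pair_table and its column names are introduced here for illustration, e is typed in by hand from the result shown earlier, and the straight-line (Euclidean) distance is added purely for comparison, as it is not used elsewhere in this text.

```r
N <- c(6410272, 6404200, 6407008, 6393624, 6401948,
       6403341, 6397321, 6412530, 6385294, 6407825)
E <- c(1268851, 1268200, 1267224, 1274877, 1271267,
       1270656, 1269688, 1275535, 1278333, 1259585)

# Index pairs found in N with threshold 1500 (the matrix e from above)
e <- matrix(c(5, 6, 2, 6, 3, 10), nrow = 2)

dN <- abs(N[e[1, ]] - N[e[2, ]])  # north-south difference per pair
dE <- abs(E[e[1, ]] - E[e[2, ]])  # east-west difference per pair

pair_table <- data.frame(
  first     = e[1, ],
  second    = e[2, ],
  dN        = dN,
  dE        = dE,
  manhattan = dN + dE,            # the measure used in this text
  euclidean = sqrt(dN^2 + dE^2)   # straight-line distance, for comparison only
)

pair_table[pair_table$manhattan < 3000, ]  # pairs below the threshold
```

One thing to keep in mind: the pairs in e were selected with a threshold of 1500 on N, while the sum is compared with 3000. A pair whose difference in N lies between 1500 and 3000 is therefore never considered, even if its summed difference is below 3000. Using the same threshold in both steps avoids this.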
Everything in one block
x <- 1
d <- sapply(which(diff(N[order(N)]) < x), function (y) {order(N)[c(y, y+1)]})
e <- sapply(seq(1, length(d), 2), function (y) {sort(d[c(y,y+1)])})
f <- which(abs(diff(matrix(E[e], 2))) + abs(diff(matrix(N[e], 2))) < x)
matrix(bränder$Adress[e[,f]], ncol = 2, byrow = TRUE)
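The block above stops with an error when no gap in N is below x, because seq(1, 0, 2) is not a valid sequence. This is what happens with x = 1 and the example vectors N and E. Below is a sketch of the same steps wrapped in a function that returns an empty matrix in that case. The function name close_pairs and its arguments are hypothetical, and like the original it only considers items that are neighbours in the sorted N.

```r
# Hypothetical helper: index pairs whose summed difference in N and E is below x
close_pairs <- function(N, E, x) {
  ord  <- order(N)
  hits <- which(diff(N[ord]) < x)   # neighbours in sorted N closer than x
  if (length(hits) == 0) {
    return(matrix(integer(0), nrow = 0, ncol = 2))  # nothing found
  }
  # One row per candidate pair, smaller index first
  cand <- cbind(pmin(ord[hits], ord[hits + 1]),
                pmax(ord[hits], ord[hits + 1]))
  # Sum of the absolute differences in both directions
  total <- abs(N[cand[, 1]] - N[cand[, 2]]) +
           abs(E[cand[, 1]] - E[cand[, 2]])
  cand[total < x, , drop = FALSE]
}

N <- c(6410272, 6404200, 6407008, 6393624, 6401948,
       6403341, 6397321, 6412530, 6385294, 6407825)
E <- c(1268851, 1268200, 1267224, 1274877, 1271267,
       1270656, 1269688, 1275535, 1278333, 1259585)

close_pairs(N, E, 3000)  # one row per pair of indexes
close_pairs(N, E, 1)     # no pairs: a matrix with zero rows
```

Each row of the result holds the indexes of one pair, so it can be used to look up values in any other vector of the same length as N, such as an address column.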