Match a list with matrix rows in R

Problem description:

"a" is a list and "b" is a matrix.

a<-list(matrix(c(0,2,0,1,0,2,0,0,1,0,0,0,0,0,2,2),4), 
        matrix(c(0,1,0,0,0,1,1,0,0,0,0,0),3),
        matrix(c(0,0,0,0,2,0,1,0,0,0,0,0,2,0,2,1,0,1,1,0),5))
b<-matrix(c(2,2,1,1,1,2,1,2,1,1,2,1,1,1,1,1,1,2,2,2,1,2,1,1),6) 

> a
[[1]]
     [,1] [,2] [,3] [,4]
[1,]    0    0    1    0
[2,]    2    2    0    0
[3,]    0    0    0    2
[4,]    1    0    0    2

[[2]]
     [,1] [,2] [,3] [,4]
[1,]    0    0    1    0
[2,]    1    0    0    0
[3,]    0    1    0    0

[[3]]
     [,1] [,2] [,3] [,4]
[1,]    0    0    0    1
[2,]    0    1    0    0
[3,]    0    0    2    1
[4,]    0    0    0    1
[5,]    2    0    2    0

> b
     [,1] [,2] [,3] [,4]
[1,]    2    1    1    2
[2,]    2    2    1    2
[3,]    1    1    1    1
[4,]    1    1    1    2
[5,]    1    2    1    1
[6,]    2    1    2    1

There are 3 objects in list "a". For each object, I want to test whether all the non-zero elements in every one of its rows match the corresponding positions of a single row of matrix "b". If they do, output the matching row numbers of "b".

For example, the second object is

[[2]]
     [,1] [,2] [,3] [,4]
[1,]    0    0    1    0
[2,]    1    0    0    0
[3,]    0    1    0    0

The non-zero number in the 1st row is 1, in the third position, so it matches rows 1-5 of matrix "b". The non-zero number in the 2nd row is 1, in the first position, so it matches rows 3-5 of "b". The non-zero number in the 3rd row is 1, in the second position, so it matches rows 1, 3, 4 and 6 of "b". Only the 3rd and 4th rows of "b" match all the rows of this object, so the output is "3 4".
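The reasoning above can be reproduced step by step. The following is only an illustrative sketch: `a2nd` and `rows_matching` are hypothetical names that do not appear in the question, and the data are copied from the definitions of "a" and "b" at the top.

```r
# Second object of "a" and matrix "b", copied from the question.
a2nd <- matrix(c(0,1,0,0,0,1,1,0,0,0,0,0), 3)
b <- matrix(c(2,2,1,1,1,2,1,2,1,1,2,1,1,1,1,1,1,2,2,2,1,2,1,1), 6)

# Hypothetical helper: rows of `m` that agree with `x`
# at every non-zero position of `x`.
rows_matching <- function(x, m) {
  nz <- which(x != 0)
  which(apply(m[, nz, drop = FALSE], 1, function(r) all(r == x[nz])))
}

# Candidate rows of b for each row of the object ...
per_row <- lapply(seq_len(nrow(a2nd)), function(k) rows_matching(a2nd[k, ], b))
per_row

# ... and the rows of b common to all of them.
Reduce(intersect, per_row)
```

The three elements of `per_row` are the per-row candidate sets described above, and their intersection is the final answer for this object.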

My attempt is as follows:

temp<-Map(function(y) t(y), Map(function(a) 
           apply(a,1,function(x){
                 apply(b,1, function(y) identical(x[x!=0],y[x!=0]))}),a))
lapply(temp, function(a) which(apply(a,2,prod)==1))

The result is:

[[1]]
integer(0)

[[2]]
[1] 3 4

[[3]]
[1] 6

The result is right, but I wonder whether there is faster code for this problem?

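With only a few columns, it pays to loop over columns and exploit two shortcuts: a column with no non-zero values puts no constraint on "b" and can be skipped, and a column with more than one unique non-zero value can never match a single row of "b", so the result is empty: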

ff = function(a, b)
{
    i = seq_len(nrow(b))  #starting candidate matches
    for(j in seq_len(ncol(a))) {
        aj = a[, j]
        nzaj = aj[aj != 0L]
        if(!length(nzaj)) next  #if all(a[, j] == 0) save some operations
        if(sum(tabulate(nzaj) > 0L) > 1L) return(integer())  #more than one unique non-zero value in a column: no row of b can match, stop early
        i = i[b[i, j] == nzaj[[1L]]]  #update candidate matches
    }

    return(i)
}
lapply(a, function(x) ff(x, b))
#[[1]]
#integer(0)
#
#[[2]]
#[1] 3 4
#
#[[3]]
#[1] 6
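
The same idea can also be written without the explicit loop. This is only a sketch and is not benchmarked here: `collapse_cols` and `match_rows` are hypothetical names, not part of the answer above. Each object is first collapsed to one required value per column, then all rows of "b" are compared against that pattern at once.

```r
# Data copied from the question.
a <- list(matrix(c(0,2,0,1,0,2,0,0,1,0,0,0,0,0,2,2), 4),
          matrix(c(0,1,0,0,0,1,1,0,0,0,0,0), 3),
          matrix(c(0,0,0,0,2,0,1,0,0,0,0,0,2,0,2,1,0,1,1,0), 5))
b <- matrix(c(2,2,1,1,1,2,1,2,1,1,2,1,1,1,1,1,1,2,2,2,1,2,1,1), 6)

# Hypothetical helper: one value per column of `m`
# (0 = column is all zero, NA = conflicting non-zero values).
collapse_cols <- function(m) {
  apply(m, 2, function(col) {
    u <- unique(col[col != 0])
    if (length(u) == 0L) 0 else if (length(u) == 1L) u else NA
  })
}

# Hypothetical helper: rows of `b` compatible with every row of `m`.
match_rows <- function(m, b) {
  p <- collapse_cols(m)
  if (anyNA(p)) return(integer())           # conflicting column: nothing can match
  keep <- which(p != 0)                     # columns that actually constrain b
  if (!length(keep)) return(seq_len(nrow(b)))
  # Compare every row of b with the pattern on the constrained columns.
  mismatch <- b[, keep, drop = FALSE] != rep(p[keep], each = nrow(b))
  which(rowSums(mismatch) == 0)
}

res <- lapply(a, match_rows, b = b)
res
```

Unlike `ff`, this version always scans every column of the object before touching "b", so it gives up the early exit in return for a single vectorized comparison.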

With data of your actual size:

set.seed(911)
a2 = replicate(300L, matrix(sample(0:3, 20 * 5, TRUE, c(0.97, 0.01, 0.01, 0.01)), 20, 5), simplify = FALSE)
b2 = matrix(sample(1:3, 15 * 5, TRUE), 15, 5)
identical(OP(a2, b2), lapply(a2, function(x) ff(x, b2)))
#[1] TRUE
microbenchmark::microbenchmark(OP(a2, b2), lapply(a2, function(x) ff(x, b2)), times = 50)
#Unit: milliseconds
#                              expr        min         lq       mean     median         uq       max neval cld
#                        OP(a2, b2) 686.961815 730.840732 760.029859 753.790094 785.310056 863.04577    50   b
# lapply(a2, function(x) ff(x, b2))   8.110542   8.450888   9.381802   8.949924   9.872826  15.51568    50  a

where OP is the code from the question wrapped in a function:

OP = function (a, b) 
{
    temp = Map(function(y) t(y), Map(function(a) apply(a, 1, 
        function(x) {
            apply(b, 1, function(y) identical(x[x != 0], y[x != 
                0]))
        }), a))
    lapply(temp, function(x) which(apply(x, 2, prod) == 1))
}