


"a" is a list and "b" is a matrix.


> a
[[1]]
     [,1] [,2] [,3] [,4]
[1,]    0    0    1    0
[2,]    2    2    0    0
[3,]    0    0    0    2
[4,]    1    0    0    2

[[2]]
     [,1] [,2] [,3] [,4]
[1,]    0    0    1    0
[2,]    1    0    0    0
[3,]    0    1    0    0

[[3]]
     [,1] [,2] [,3] [,4]
[1,]    0    0    0    1
[2,]    0    1    0    0
[3,]    0    0    2    1
[4,]    0    0    0    1
[5,]    2    0    2    0

> b
     [,1] [,2] [,3] [,4]
[1,]    2    1    1    2
[2,]    2    2    1    2
[3,]    1    1    1    1
[4,]    1    1    1    2
[5,]    1    2    1    1
[6,]    2    1    2    1
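
To make the example reproducible, the two objects can be rebuilt roughly as follows. The values are copied from the printout above; building them with `matrix(..., byrow = TRUE)` and storing them as doubles is an assumption, since the original construction code is not shown.

```r
# Values copied from the printed output; byrow = TRUE keeps the printed layout.
a <- list(
    matrix(c(0, 0, 1, 0,
             2, 2, 0, 0,
             0, 0, 0, 2,
             1, 0, 0, 2), ncol = 4, byrow = TRUE),
    matrix(c(0, 0, 1, 0,
             1, 0, 0, 0,
             0, 1, 0, 0), ncol = 4, byrow = TRUE),
    matrix(c(0, 0, 0, 1,
             0, 1, 0, 0,
             0, 0, 2, 1,
             0, 0, 0, 1,
             2, 0, 2, 0), ncol = 4, byrow = TRUE)
)
b <- matrix(c(2, 1, 1, 2,
              2, 2, 1, 2,
              1, 1, 1, 1,
              1, 1, 1, 2,
              1, 2, 1, 1,
              2, 1, 2, 1), ncol = 4, byrow = TRUE)
```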


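There are 3 objects in list "a". For each object, I want to find the rows of matrix "b" that match every row of the object. A row of "b" matches a row of the object when it has the same values at all positions where the object's row is non-zero (zeros are ignored). The output should be the matching row numbers of "b". Take the second object as an example: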


     [,1] [,2] [,3] [,4]
[1,]    0    0    1    0
[2,]    1    0    0    0
[3,]    0    1    0    0

The non-zero number in the 1st row is 1, in the third position, so this row matches rows 1-5 of matrix "b". The non-zero number in the 2nd row is 1, in the first position, so this row matches rows 3-5 of "b". The non-zero number in the 3rd row is 1, in the second position, so this row matches rows 1, 3, 4 and 6 of "b", of which only rows 3 and 4 are still candidates. So only the 3rd and 4th rows of matrix "b" match all the rows in this object, and the output is "3 4".
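
The reasoning above can be sketched in code like this. It is only an illustration: `obj` and `match_rows` are names made up here, and the values are copied from the printouts in the question.

```r
b <- matrix(c(2, 1, 1, 2,
              2, 2, 1, 2,
              1, 1, 1, 1,
              1, 1, 1, 2,
              1, 2, 1, 1,
              2, 1, 2, 1), ncol = 4, byrow = TRUE)
# the second object of the list "a"
obj <- matrix(c(0, 0, 1, 0,
                1, 0, 0, 0,
                0, 1, 0, 0), ncol = 4, byrow = TRUE)

# Rows of b that agree with x at every non-zero position of x.
match_rows <- function(x, b) {
    nz <- which(x != 0)
    # compare each selected column of b with the corresponding value of x
    hits <- b[, nz, drop = FALSE] == rep(x[nz], each = nrow(b))
    which(rowSums(hits) == length(nz))
}

# candidate rows of b for each row of the object
per_row <- lapply(seq_len(nrow(obj)), function(r) match_rows(obj[r, ], b))
per_row

# rows of b that match every row of the object
res <- Reduce(intersect, per_row)
res
```

The per-row candidates are rows 1-5, rows 3-5, and rows 1, 3, 4, 6, and their intersection is rows 3 and 4.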


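# for every row x of each object, test every row y of b at the non-zero positions of x
temp<-Map(function(y) t(y), Map(function(a) 
                 apply(a,1, function(x) {
                 apply(b,1, function(y) identical(x[x!=0],y[x!=0]))}),a))
# keep the rows of b (columns of temp) that matched all rows of the object
lapply(temp, function(a) which(apply(a,2,prod)==1))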



[1] 3 4

[1] 6


This gives the right result, but I wonder whether there is a faster way to do this?


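Since there are only a few columns, we can work column by column and cut the computation short in two cases: a column of the object with no non-zero values imposes no constraint and can be skipped, and a column with more than one distinct non-zero value means that no row of "b" can match: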

ff = function(a, b)
{
    i = seq_len(nrow(b))  #starting candidate matches
    for(j in seq_len(ncol(a))) {
        aj = a[, j]
        nzaj = aj[aj != 0L]
        if(!length(nzaj)) next  #if all(a[, j] == 0) save some operations
        if(sum(tabulate(nzaj) > 0L) > 1L) return(integer())  #if > 1 distinct non-zero values in a column, nothing can match
        i = i[b[i, j] == nzaj[[1L]]]  #update candidate matches
    }

    return(i)
}

lapply(a, function(x) ff(x, b))
#[1] 3 4
#[1] 6
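
To see why working by column is enough: a row of "b" has to agree with every row of the object, so all non-zero values in a given column of the object must be the same value. Each object therefore collapses to a single pattern row. Below is a vectorised sketch of that idea. It is not the `ff` function above, `collapse_match` is a made-up name, and it assumes the matrices contain no `NA`.

```r
a <- list(
    matrix(c(0, 0, 1, 0,  2, 2, 0, 0,  0, 0, 0, 2,  1, 0, 0, 2),
           ncol = 4, byrow = TRUE),
    matrix(c(0, 0, 1, 0,  1, 0, 0, 0,  0, 1, 0, 0),
           ncol = 4, byrow = TRUE),
    matrix(c(0, 0, 0, 1,  0, 1, 0, 0,  0, 0, 2, 1,  0, 0, 0, 1,  2, 0, 2, 0),
           ncol = 4, byrow = TRUE)
)
b <- matrix(c(2, 1, 1, 2,  2, 2, 1, 2,  1, 1, 1, 1,
              1, 1, 1, 2,  1, 2, 1, 1,  2, 1, 2, 1),
            ncol = 4, byrow = TRUE)

collapse_match <- function(x, b) {
    # per column: 0 = no constraint, a value = required value, NA = conflict
    pattern <- apply(x, 2, function(col) {
        u <- unique(col[col != 0])
        if (length(u) == 0) 0 else if (length(u) == 1) u else NA
    })
    if (anyNA(pattern)) return(integer())  # conflicting values, nothing can match
    cols <- which(pattern != 0)
    # compare the constrained columns of b with the pattern
    hits <- b[, cols, drop = FALSE] == rep(pattern[cols], each = nrow(b))
    which(rowSums(hits) == length(cols))
}

res <- lapply(a, collapse_match, b = b)
res
```

For the example data the first object has both 2 and 1 in its first column, so it returns `integer(0)`; the second and third objects return rows 3, 4 and row 6 respectively.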


With data of your actual size:

a2 = replicate(300L, matrix(sample(0:3, 20 * 5, TRUE, c(0.97, 0.01, 0.01, 0.01)), 20, 5), simplify = FALSE)
b2 = matrix(sample(1:3, 15 * 5, TRUE), 15, 5)
identical(OP(a2, b2), lapply(a2, function(x) ff(x, b2)))
#[1] TRUE
microbenchmark::microbenchmark(OP(a2, b2), lapply(a2, function(x) ff(x, b2)), times = 50)
#Unit: milliseconds
#                              expr        min         lq       mean     median         uq       max neval cld
#                        OP(a2, b2) 686.961815 730.840732 760.029859 753.790094 785.310056 863.04577    50   b
# lapply(a2, function(x) ff(x, b2))   8.110542   8.450888   9.381802   8.949924   9.872826  15.51568    50  a


# the code from the question wrapped in a function, as used in the benchmark above
OP = function (a, b) 
{
    temp = Map(function(y) t(y), Map(function(a) apply(a, 1, 
        function(x) {
            apply(b, 1, function(y) identical(x[x != 0], y[x != 
                0]))
        }), a))
    lapply(temp, function(x) which(apply(x, 2, prod) == 1))
}