How do I get the column names of the variables with the top 10 highest values?

Question:

I have a data.frame (sum_clus) with 600 columns (variables) and 10 rows. It has no NAs and all values are numeric. How can I create 5 new variables that give, for each row, the column names of the 5 variables with the highest values in that row?

For example:

max <- apply(sum_clus, 1, max)
for (ii in 1:10) sum_clus$max[ii] <- colnames(sum_clus)[which(sum_clus[ii, ] == sum_clus[ii, sum_clus[ii, ] == max[ii]])]

The code above creates a variable sum_clus$max that holds the column name of the maximum value in each row. How can I get 5 such variables that hold the column names of the top 5 values in each row: sum_clus$max, sum_clus$second_but_max, and so on?

Thanks in advance.

Answer:

Here's a similar solution, using (i) a loop instead of apply, and (ii) rank instead of order.

set.seed(1)
n_i   = 10
n_ii  = 600
n_top = 5
df <- data.frame(matrix(runif(n_ii*n_i), ncol = n_ii))

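# One column of `out` per row of df, holding that row's top n_top column names
out <- matrix("", n_top, n_i)
for (i in 1:n_i) {
    colranks <- rank(df[i, ])   # rank 1 = smallest value, rank n_ii = largest
    # sort the ranks ascending, then read the last n_top names in reverse (largest first)
    out[, i] <- names(sort(colranks)[n_ii:(n_ii - (n_top - 1))])
}
out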
#      [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]   [,9]   [,10] 
# [1,] "X369" "X321" "X348" "X415" "X169" "X258" "X55"  "X182" "X99"  "X78" 
# [2,] "X42"  "X295" "X563" "X173" "X377" "X31"  "X246" "X353" "X259" "X384"
# [3,] "X98"  "X440" "X371" "X207" "X429" "X292" "X433" "X437" "X123" "X558"
# [4,] "X13"  "X193" "X396" "X78"  "X543" "X228" "X211" "X2"   "X583" "X508"
# [5,] "X35"  "X364" "X249" "X33"  "X388" "X405" "X458" "X252" "X569" "X456"

The one-line analogue with apply is

apply(df,1,function(x)names(sort(rank(x))))[600:596,]
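
Both versions above return a matrix with one column per row of the data, whereas the question asks for 5 new variables on the data frame itself. A minimal sketch of that last step, using order instead of rank, might look like the following. The stand-in data and the column names top_1 to top_5 are assumptions made for illustration, not part of the original question.

```r
# Stand-in for the asker's data: 10 rows, 600 numeric columns, no NAs
set.seed(1)
n_top <- 5
sum_clus <- data.frame(matrix(runif(600 * 10), ncol = 600))

# Rank on a numeric matrix copy, so the character columns added
# below can never leak into the comparison
vals <- as.matrix(sum_clus)

# For each row, order(decreasing = TRUE) lists column positions from
# largest to smallest value; keep the first n_top and map them to names.
# apply() returns an n_top x nrow matrix, so transpose it to nrow x n_top.
top_names <- t(apply(vals, 1, function(x) {
    colnames(vals)[order(x, decreasing = TRUE)[seq_len(n_top)]]
}))

# Hypothetical names for the new variables: top_1 (the max) ... top_5
colnames(top_names) <- paste0("top_", seq_len(n_top))
sum_clus <- cbind(sum_clus, top_names, stringsAsFactors = FALSE)
```

After this, sum_clus$top_1 plays the role of sum_clus$max from the question, sum_clus$top_2 the role of sum_clus$second_but_max, and so on. If values are tied, order keeps the earlier column first.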
