Is there a package like bigmemory in R that can handle large list objects?
I know that the R package bigmemory works great for dealing with large matrices and data frames. However, I was wondering if there is any package, or any way, to work efficiently with a large list.
Specifically, I created a list whose elements are vectors. I have a for loop, and during each iteration multiple values are appended to a selected element (a vector) of that list. At first it runs fast, but after maybe 10,000 iterations it gradually slows down (one iteration takes about a second). I'm going to go through about 70,000 to 80,000 iterations, and the list will be very large by then.
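In code, the pattern looks roughly like this (the names and sizes are made up for illustration):

# illustrative sketch of the loop described above
results <- vector("list", length = 100)   # list whose elements are vectors
for (i in 1:80000) {
  k <- sample(length(results), 1)          # pick an element of the list
  new_vals <- rnorm(5)                     # values produced in this iteration
  # appending copies the whole vector, so iterations get slower as it grows
  results[[k]] <- c(results[[k]], new_vals)
}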
So I was just wondering if there is something like a big.list, analogous to big.matrix in the bigmemory package, that could speed up this whole process.
Thanks!
I'm not really sure if this is a helpful answer, but you can interactively work with lists on disk using the filehash package.
For example, here's some code that creates a disk database, assigns a preallocated empty list to the database, and then runs a function (getting the current time) that fills the list in the database.
library(filehash)
# how many items in the list?
n <- 100000
# set up the database on disk
dbCreate("testDB")
db <- dbInit("testDB")
# preallocate an empty list in the database
db$time <- vector("list", length = n)
# fill the list element by element through the disk object
for (i in 1:n) db$time[[i]] <- Sys.time()
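Once the loop has run, the values can be read back through the same interface, e.g.:

# read back via the $ accessor, or fetch the whole object by key
head(db$time, 3)
x <- dbFetch(db, "time")
length(x)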
There is hardly any use of RAM during this process; however, it is VERY slow (two orders of magnitude slower than doing it in RAM in some of my tests) due to the constant disk I/O. So I'm not sure this method is a good answer to the question of how to speed up working with big objects.
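One workaround I'd sketch (this is just batching on top of filehash, not a feature of the package itself) is to avoid rewriting the entire list on every assignment: build each chunk in RAM and store it under its own key, so the loop does one disk write per chunk instead of per element.

# sketch: fill chunks in RAM, then write each chunk once under its own key
n <- 100000          # same n as above
chunk_size <- 1000
dbCreate("testDB2")
db2 <- dbInit("testDB2")
for (start in seq(1, n, by = chunk_size)) {
  idx <- start:min(start + chunk_size - 1, n)
  chunk <- lapply(idx, function(i) Sys.time())   # built entirely in RAM
  dbInsert(db2, paste0("time_", start), chunk)   # one disk write per chunk
}
keys <- dbList(db2)   # chunk keys, to reassemble the list on demand

This trades the per-element I/O for one write per chunk, at the cost of a slightly more awkward read path.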