Plotting a very large data set in R
Question:
How can I plot a very large data set in R?
I'd like to use a boxplot, a violin plot, or something similar. The data cannot all fit in memory. Can I read it in incrementally and calculate the summaries needed to make these plots? If so, how?
Answer:
To supplement my comment on Dmitri's answer, here is a function that calculates quantiles using the ff big-data handling package:
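library(ff)  # provides ffsort()

ffquantile <- function(ffv, qs = c(0, 0.25, 0.5, 0.75, 1), ...) {
  stopifnot(all(qs <= 1 & qs >= 0))
  ffvs <- ffsort(ffv, ...)           # exact: sorts the whole ff vector
  j <- qs * (length(ffv) - 1) + 1    # fractional 1-based positions
  jf <- floor(j)
  jc <- ceiling(j)
  # average the two neighbouring order statistics for each quantile
  rowSums(matrix(ffvs[c(jf, jc)], length(qs), 2)) / 2
}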
This is an exact algorithm, so it relies on sorting the whole vector and may therefore take a long time on large inputs.
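Once the five quantiles are available, the raw data is no longer needed to draw a boxplot, because base R's bxp() plots from precomputed statistics. The following is only a sketch under stated assumptions: mid_quantile() is a hypothetical in-memory stand-in that repeats the same index arithmetic on an ordinary vector (so it runs without ff), the data is simulated, and the whiskers are simply set to the minimum and maximum, so no outliers are drawn.

```r
# Hypothetical in-memory stand-in for the ff-based function:
# same position arithmetic, but on an ordinary numeric vector.
mid_quantile <- function(v, qs = c(0, 0.25, 0.5, 0.75, 1)) {
  stopifnot(all(qs >= 0 & qs <= 1))
  vs <- sort(v)
  j <- qs * (length(vs) - 1) + 1        # fractional 1-based positions
  (vs[floor(j)] + vs[ceiling(j)]) / 2   # average the two neighbours
}

set.seed(1)          # simulated data, for illustration only
x <- rnorm(1001)
n <- length(x)
five <- mid_quantile(x)   # min, lower quartile, median, upper quartile, max

# bxp() expects the same list structure that boxplot() returns.
# The notch interval follows the convention used by boxplot.stats().
box_summary <- list(
  stats = matrix(five, ncol = 1),
  n     = n,
  conf  = matrix(five[3] + c(-1.58, 1.58) * (five[4] - five[2]) / sqrt(n),
                 ncol = 1),
  out   = numeric(0),     # no outliers recorded: whiskers span min to max
  group = numeric(0),
  names = "x"
)

bxp(box_summary)
```

With a real ff vector, the `five` values would come from the ff-based quantile function instead, and `n` from `length()` of the ff vector; the rest of the list stays the same. A violin plot needs a density estimate rather than five numbers, so this sketch does not cover it.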