How can I measure file read speed without caching?

Problem description:

My Java program spends most of its time reading files, and I want to optimize it, e.g., by using concurrency, prefetching, memory-mapped files, or whatever else helps.

Optimizing without benchmarking is nonsense, so I benchmark. However, during the benchmark the whole file content gets cached in RAM, unlike in the real run. As a result, the benchmark run times are much shorter and most probably unrelated to reality.

I need to somehow tell the OS (Linux) not to cache the file content or, better, to wipe the cache before each benchmark run. Alternatively, I could consume most of the available RAM (32 GB) so that only a tiny fraction of the file content fits in the cache. How can I do this?

I'm using Caliper for benchmarking, but in this case I don't think it's necessary (this is by no means a microbenchmark), and I'm not sure it's a good idea.

Clear the Linux file cache (as root):

sync && echo 1 > /proc/sys/vm/drop_caches
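To use this from a Java benchmark, one option is to drop the cache before every timed run. The following is only a minimal sketch under stated assumptions: the class `ColdReadBenchmark` and its methods are hypothetical names, it assumes Linux, and `dropCaches()` fails with an exception unless the JVM runs as root.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration; not part of Caliper or any library.
public class ColdReadBenchmark {

    // Flushes dirty pages with sync, then asks the kernel to drop the page cache.
    // Assumption: Linux and root privileges; otherwise the write throws.
    public static void dropCaches() throws IOException, InterruptedException {
        int exit = new ProcessBuilder("sync").inheritIO().start().waitFor();
        if (exit != 0) {
            throw new IOException("sync exited with status " + exit);
        }
        Files.write(Paths.get("/proc/sys/vm/drop_caches"),
                "1\n".getBytes(StandardCharsets.US_ASCII),
                StandardOpenOption.WRITE);
    }

    // Reads the whole file sequentially and returns the number of bytes read.
    public static long readAll(Path file) throws IOException {
        byte[] buffer = new byte[1 << 16];
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        Path file = Paths.get(args[0]);
        for (int run = 0; run < 5; run++) {
            dropCaches(); // every run starts with a cold cache
            long start = System.nanoTime();
            long bytes = readAll(file);
            double seconds = (System.nanoTime() - start) / 1e9;
            System.out.printf("run %d: %d bytes in %.3f s%n", run, bytes, seconds);
        }
    }
}
```

Replace `readAll` with the reading strategy under test (memory-mapped, concurrent, and so on) and keep the `dropCaches()` call outside the timed region.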

Alternatively, create a large file that fills all your RAM:

dd if=/dev/zero of=dummyfile bs=1024 count=LARGE_NUMBER

(Don't forget to remove dummyfile when done.)
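If the benchmark is driven from Java, the dd approach can be approximated in code so that the cleanup cannot be forgotten. This is a rough sketch, not a tested recipe: `CacheFiller` and `fillAndRemove` are hypothetical names, and how many bytes are enough to push other files out of the cache is an assumption you have to tune for your machine.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper mirroring the dd command above.
public class CacheFiller {

    // Writes `bytes` zero bytes to `dummy` so that its pages crowd the page cache,
    // then deletes the file. Returns the number of bytes written.
    public static long fillAndRemove(Path dummy, long bytes) throws IOException {
        byte[] zeros = new byte[1 << 20]; // 1 MiB chunks rather than dd's bs=1024
        long written = 0;
        try (OutputStream out = Files.newOutputStream(dummy)) {
            while (written < bytes) {
                int chunk = (int) Math.min(zeros.length, bytes - written);
                out.write(zeros, 0, chunk);
                written += chunk;
            }
        } finally {
            // The stream is already closed here; remove the dummy file even on failure.
            Files.deleteIfExists(dummy);
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java CacheFiller <dummy-file-path> <size-in-bytes>
        long written = fillAndRemove(Paths.get(args[0]), Long.parseLong(args[1]));
        System.out.println("wrote and removed " + written + " bytes");
    }
}
```

Deleting the dummy file frees its own cached pages, but the pages of other files that were evicted while it was written stay evicted until they are read again.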