How to open and read an LZMA file in memory

Question:

I have a giant file, let's call it one-csv-file.xz. It is an XZ-compressed CSV file.

How can I open and parse the file without first decompressing it to disk? What if the file is, for example, 100 GB? Python cannot read all of that into memory at once, of course. Will it page or run out of memory?

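You can iterate over an LZMAFile object. It decompresses the data incrementally as you read, so the whole file never has to fit in memory: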

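import lzma  # Python 3.3+; on Python 2, try the lzmaffi package

# Open the compressed file in binary mode; LZMAFile needs raw bytes.
with open('one-csv-file.xz', 'rb') as compressed:
    with lzma.LZMAFile(compressed) as uncompressed:
        for line in uncompressed:  # each line is a bytes object
            do_stuff_with(line)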
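
Since the file is a CSV, the same streaming idea can be combined with the standard csv module. The following is only a minimal sketch, assuming the file is UTF-8 encoded; iter_csv_rows is a hypothetical helper name, not something from the question. It uses lzma.open in text mode so that the csv reader receives decoded strings instead of bytes.

```python
import csv
import lzma


def iter_csv_rows(path, encoding="utf-8"):
    """Yield parsed rows from an XZ-compressed CSV file, one at a time.

    Hypothetical helper: the name and the UTF-8 default are assumptions.
    """
    # Text mode ("rt") decompresses and decodes incrementally as data is read.
    # newline="" is what the csv module recommends for files it reads.
    with lzma.open(path, mode="rt", encoding=encoding, newline="") as handle:
        for row in csv.reader(handle):
            yield row
```

Looping with `for row in iter_csv_rows('one-csv-file.xz'):` then gives each row as a list of strings, and only a small buffer of decompressed data should be held in memory at any moment.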