Getting the filenames of an archived folder's contents in Python
I have a compressed folder called gziptest.tar.gz which contains several plaintext files.
I'd like to be able to get the filenames and corresponding contents of the files, but the examples of usage for the gzip library don't cover this.
The following code:
import gzip
in_f = gzip.open('/home/cholloway/gziptest.tar.gz')
print in_f.read()
produces the output:
gzip test/file2000664 001750 001750 00000000016 12621163624 015761 0ustar00chollowaycholloway000000 000000 I like apples
gzip test/file1000664 001750 001750 00000000025 12621164026 015755 0ustar00chollowaycholloway000000 000000 hello world
line two
gzip test/000775 001750 001750 00000000000 12621164026 015035 5ustar00chollowaycholloway000000 000000
I could use some regular expressions to detect the start of a new file and extract the filename, but I'm wondering if this functionality already exists within gzip or another standard python library.
For that file, don't use the gzip library. Use the tarfile library.
The file you are working with is a gzip-compressed tar archive of the files test/*.
If you only want to recover the tar archive, use gzip to decompress the file. The resulting file is (as you discovered) an archive of the files you want.
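As a minimal sketch of that gzip-only step, assuming you want the plain tar archive written to disk: the helper name gunzip_to_tar and the output path are hypothetical, chosen only for illustration.

```python
import gzip
import shutil


def gunzip_to_tar(gz_path, tar_path):
    """Strip only the gzip layer, writing the plain tar archive to tar_path."""
    # gzip.open yields the decompressed byte stream; copy it out in chunks
    # so large archives are not loaded into memory all at once.
    with gzip.open(gz_path, "rb") as src, open(tar_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


# Hypothetical usage (the output path is an assumption):
# gunzip_to_tar("/home/cholloway/gziptest.tar.gz", "/home/cholloway/gziptest.tar")
```

The result is still a single archive file, so the individual files remain packed inside it.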
Logically, to access the files inside the tar archive, you must first use the gzip library to recover the tar archive and then use the tarfile library to recover the files.
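The two logical steps can be written out explicitly, which may help show how the libraries relate. This is only a sketch: the function name list_names_two_step is hypothetical, and mode "r:" tells tarfile to expect an uncompressed stream because gzip has already done the decompression.

```python
import gzip
import tarfile


def list_names_two_step(gz_path):
    """List member names by undoing the gzip layer first, then reading the tar."""
    # Step 1: gzip exposes the decompressed tar bytes as a file-like object.
    with gzip.open(gz_path, "rb") as raw_tar:
        # Step 2: tarfile reads that stream; "r:" means "no compression here".
        with tarfile.open(fileobj=raw_tar, mode="r:") as tar:
            return tar.getnames()


# Hypothetical usage:
# print(list_names_two_step("/home/cholloway/gziptest.tar.gz"))
```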
In practice, you only need the tarfile library: it automatically invokes the gzip library on your behalf.
I've copied this example from the examples section of the tarfile documentation:
import tarfile
tar = tarfile.open("sample.tar.gz")
tar.extractall()
tar.close()
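Since the question asks for the filenames and their contents rather than extraction to disk, here is a minimal sketch that reads both directly from the archive. The function name read_archive is hypothetical; getmembers, isfile and extractfile are standard tarfile calls.

```python
import tarfile


def read_archive(path):
    """Return a dict mapping each regular file's name to its contents (bytes)."""
    contents = {}
    with tarfile.open(path, "r:gz") as tar:
        for member in tar.getmembers():
            # Skip directories and other non-file entries,
            # for which extractfile() would return None.
            if member.isfile():
                contents[member.name] = tar.extractfile(member).read()
    return contents


# Hypothetical usage:
# for name, data in read_archive("/home/cholloway/gziptest.tar.gz").items():
#     print(name)
#     print(data.decode())
```

The contents come back as bytes, so decode them if you need text.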