How to use numpy.save in append mode

Question:

I use numpy.save and numpy.load to read and write large datasets in my project. I realized that numpy.save does not support an append mode. For instance (Python 3):

import numpy as np

n = 5
dim = 5
for _ in range(3):
    Matrix = np.random.choice(np.arange(10, 40, dim), size=(n, dim))
    # Each call overwrites myfile.npy, so only the last matrix survives
    np.save('myfile', Matrix)

M1 = np.load('myfile.npy', mmap_mode='r')[1:7].copy()
print(M1)

Loading a specific portion of the data with the slice [1:7] does not give the right result, because np.save does not append. I found this answer, but it looks strange (file(filename, 'a'): what is file?). Is there a clever workaround to achieve this without using additional lists?

Answer:

The npy file format doesn't work that way. An npy file encodes a single array, with a header specifying shape, dtype, and other metadata. You can see the npy file format spec in the NumPy docs.
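To make the header concrete, here is a minimal sketch that saves a small array to an in-memory buffer and reads the header back with the helpers in numpy.lib.format. The array contents and variable names are illustrative assumptions, not part of the original question.

```python
import io

import numpy as np
from numpy.lib import format as npy_format

# Write one array to an in-memory buffer instead of a file on disk.
buf = io.BytesIO()
np.save(buf, np.zeros((5, 5), dtype=np.int64))
buf.seek(0)

# The file starts with a magic string and a format version...
version = npy_format.read_magic(buf)

# ...followed by a header that records the shape, memory order, and dtype
# of the single array stored in the file.
shape, fortran_order, dtype = npy_format.read_array_header_1_0(buf)

print(version, shape, fortran_order, dtype)
```

The shape is stored once, up front, which is why the format has no room for rows added later.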

Support for appending data was not a design goal of the npy format. Even if you managed to get numpy.save to append to an existing file instead of overwriting the contents, the result wouldn't be a valid npy file. Producing a valid npy file with additional data would require rewriting the header, and since this could require resizing the header, it could shift the data and require the whole file to be rewritten.
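The following sketch illustrates the point by forcing an append through a file handle opened in 'ab' mode. The temporary path and the two small arrays are assumptions made for the demonstration. Loading the file by name only sees the first array, because the first header still describes a 2x5 array; reading twice from one open handle recovers both arrays, which shows that the file is now two npy streams back to back rather than one larger array.

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "myfile.npy")
first = np.arange(10).reshape(2, 5)
second = np.arange(10, 20).reshape(2, 5)

np.save(path, first)

# Force an "append": this writes a second, complete npy stream
# (magic string, header, data) after the first one.
with open(path, "ab") as f:
    np.save(f, second)

# Loading by name stops after the first array; the trailing bytes are ignored.
loaded = np.load(path)

# Reading sequentially from one handle returns the streams one at a time.
with open(path, "rb") as f:
    part1 = np.load(f)
    part2 = np.load(f)

print(loaded.shape, part1.shape, part2.shape)
```

Slicing or memory-mapping such a file as if it held one 4x5 array would not work, since no header describes that shape.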

NumPy comes with no tools to append data to existing npy files, beyond reading the data into memory, building a new array, and writing the new array to a file. If you want to save more data, consider writing a new file, or pick a different file format.
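The read, rebuild, and rewrite approach described above can be sketched as follows. The helper name append_rows is hypothetical (it is not a NumPy function), and the sketch assumes the existing array fits in memory and that the new rows have the same number of columns.

```python
import os
import tempfile

import numpy as np


def append_rows(path, new_rows):
    """Hypothetical helper: rewrite the npy file at `path` with new_rows added."""
    existing = np.load(path)                      # read the old data into memory
    combined = np.concatenate([existing, new_rows], axis=0)
    np.save(path, combined)                       # overwrite with one valid array
    return combined.shape


path = os.path.join(tempfile.mkdtemp(), "myfile.npy")
np.save(path, np.zeros((5, 5)))
for _ in range(2):
    append_rows(path, np.ones((5, 5)))

# The file now holds a single 15x5 array, so slicing across the
# original chunk boundary works as the question intended.
result = np.load(path, mmap_mode="r")[1:7].copy()
print(result.shape)
```

Each call rewrites the whole file, so the cost grows with the total size of the data. For many small appends, writing one file per chunk or using a format designed for appending is likely the better fit.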
