How to use numpy.save in append mode
I use numpy.save and numpy.load to read and write large datasets in my project. I realized that numpy.save does not support append mode. For instance (Python 3):
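import numpy as np

n = 5
dim = 5
for _ in range(3):
    # Each iteration builds a new (n, dim) matrix and saves it under the same name
    Matrix = np.random.choice(np.arange(10, 40, dim), size=(n, dim))
    np.save('myfile', Matrix)  # overwrites myfile.npy instead of appending

# Only the last (5, 5) matrix is on disk, so [1:7] yields just rows 1 to 4
M1 = np.load('myfile.npy', mmap_mode='r')[1:7].copy()
print(M1)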
Loading a specific portion of the data with the slice [1:7] does not give the expected rows, because np.save does not append. I found this answer, but it looks strange (file(filename, 'a'): what is file?). Is there a clever workaround to achieve this without using additional lists?
The npy file format doesn't work that way. An npy file encodes a single array, with a header specifying shape, dtype, and other metadata. You can see the npy file format spec in the NumPy docs.
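To make the header concrete, here is a minimal sketch that reads it back with the helpers in numpy.lib.format. The function name read_npy_header and the file name demo.npy are made up for illustration, and the sketch only handles format versions 1.0 and 2.0.

```python
import os
import tempfile

import numpy as np
from numpy.lib import format as npy_format


def read_npy_header(path):
    """Return (version, shape, fortran_order, dtype) without loading the data.

    Hypothetical helper for illustration; it handles format versions 1.0 and 2.0 only.
    """
    with open(path, "rb") as fp:
        version = npy_format.read_magic(fp)  # (major, minor)
        if version == (1, 0):
            shape, fortran_order, dtype = npy_format.read_array_header_1_0(fp)
        elif version == (2, 0):
            shape, fortran_order, dtype = npy_format.read_array_header_2_0(fp)
        else:
            raise ValueError(f"unhandled npy format version: {version}")
    return version, shape, fortran_order, dtype


# Write a small array to a temporary location and inspect its header.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.npy")
np.save(path, np.zeros((5, 5), dtype=np.int64))

version, shape, fortran_order, dtype = read_npy_header(path)
print(version, shape, fortran_order, dtype)
```

The shape stored here is what np.load trusts when it reads the file, which is why extra bytes after the data are not picked up.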
Support for appending data was not a design goal of the npy format. Even if you managed to get numpy.save to append to an existing file instead of overwriting its contents, the result wouldn't be a valid npy file. Producing a valid npy file with additional data would require rewriting the header, and since the new header could be a different size, the data might have to shift, which means rewriting the whole file.
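As a rough illustration of this point, the sketch below forces an append by passing np.save a file object opened in binary append mode (open(path, 'ab') being the Python 3 spelling of the Python 2 file(filename, 'a') idiom). The file name appended.npy is arbitrary. Loading the file by name afterwards still returns only the first array, because the header at the start of the file was never updated.

```python
import os
import tempfile

import numpy as np

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "appended.npy")

first = np.arange(10).reshape(2, 5)
second = np.arange(10, 20).reshape(2, 5)

# Normal save: header plus data for `first`.
np.save(path, first)
size_after_first = os.path.getsize(path)

# Forced append: a second header and data block are written after the first array.
with open(path, "ab") as fp:
    np.save(fp, second)
size_after_second = os.path.getsize(path)

# The leading header still says shape (2, 5), so only `first` comes back.
loaded = np.load(path)
print(loaded.shape)
print(size_after_first, size_after_second)
```

The file has grown, but the extra bytes are invisible to a plain np.load(path), and slicing a memory-mapped load would likewise see only the first array.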
NumPy comes with no tools to append data to existing npy files, beyond reading the data into memory, building a new array, and writing the new array to a file. If you want to save more data, consider writing a new file, or pick a different file format.
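The first two options can be sketched as follows. The helper names append_rows, save_chunk, and load_chunks, as well as the chunk_00000.npy naming scheme, are assumptions made for this example and are not part of NumPy. Note that append_rows rereads and rewrites the whole file on every call, so it gets slower as the file grows.

```python
import glob
import os
import tempfile

import numpy as np


def append_rows(path, new_rows):
    """Option 1: load the old array, concatenate, and rewrite the whole file."""
    if os.path.exists(path):
        combined = np.concatenate([np.load(path), new_rows], axis=0)
    else:
        combined = np.asarray(new_rows)
    np.save(path, combined)


def save_chunk(directory, index, chunk):
    """Option 2: write each batch to its own numbered npy file."""
    np.save(os.path.join(directory, f"chunk_{index:05d}.npy"), chunk)


def load_chunks(directory):
    """Load all numbered chunk files in order and stack them row-wise."""
    paths = sorted(glob.glob(os.path.join(directory, "chunk_*.npy")))
    return np.concatenate([np.load(p) for p in paths], axis=0)


tmp_dir = tempfile.mkdtemp()
single_path = os.path.join(tmp_dir, "myfile.npy")
n, dim = 5, 5

for i in range(3):
    matrix = np.random.choice(np.arange(10, 40, dim), size=(n, dim))
    append_rows(single_path, matrix)  # one growing file, rewritten each time
    save_chunk(tmp_dir, i, matrix)    # one new file per batch

# Both routes now hold 15 rows, so the [1:7] slice spans two batches.
m1 = np.load(single_path, mmap_mode="r")[1:7].copy()
m2 = load_chunks(tmp_dir)[1:7]
print(m1.shape, m2.shape)
```

Writing one file per batch avoids the repeated rewrite, at the cost of having to assemble the pieces when loading.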