Loading arrays saved using numpy.save in append mode

Question:

I save arrays using numpy.save() in append mode:

import numpy as np

# Append two arrays to the same file, one save call each
with open("try.npy", "ab") as f:
    np.save(f, [1, 2, 3, 4, 5])
    np.save(f, [6, 7, 8, 9, 10])

Can I then load the data in LIFO mode? Namely, if I now wish to load the 6-10 array, do I need to load twice (and use b):

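import numpy as np

# Binary mode ("rb") is needed to read .npy data
with open("try.npy", "rb") as f:
    a = np.load(f)  # first saved array, 1-5
    b = np.load(f)  # second saved array, 6-10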

Or can I load the second appended save directly?

Answer:

I'm a little surprised that this sequential save and load works. I don't think it is documented (please correct me if I'm wrong). But evidently each save is a self-contained unit, and load reads to the end of that unit rather than to the end of the file.

Think of each load as a readline. You can't read just the last line of a file; you have to read all the ones before it.
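The readline analogy can be sketched as a loop that keeps calling load until the file position reaches the end of the file. This is a minimal sketch, not part of the original answer: the helper name load_all and the temporary file path are made up for illustration, and the loop relies on the sequential-load behaviour described above.

```python
import os
import tempfile
import numpy as np

def load_all(path):
    """Load every array that was appended to `path` with np.save, in order."""
    arrays = []
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        # Each np.load consumes exactly one saved unit, like one readline call.
        while f.tell() < size:
            arrays.append(np.load(f))
    return arrays

# Usage: write two arrays in append mode, then read them all back.
path = os.path.join(tempfile.mkdtemp(), "try.npy")
with open(path, "ab") as f:
    np.save(f, np.array([1, 2, 3, 4, 5]))
    np.save(f, np.array([6, 7, 8, 9, 10]))

arrays = load_all(path)
last = arrays[-1]  # the most recently appended array
```

Comparing f.tell() with the file size avoids depending on which exception load raises at end of file.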

Well, there is a way of reading the last one: use seek to move the file position to a specific point. But to do that you have to know exactly where the desired block starts.

np.savez is the intended way of saving multiple arrays to a file, or rather to a zip archive.
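As a rough illustration of the np.savez route, the sketch below stores both arrays in one archive and then pulls out only the second one by name. The keyword names first and second and the file name arrays.npz are arbitrary choices for this example.

```python
import os
import tempfile
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "arrays.npz")

# The keyword names (first, second) are arbitrary labels chosen for this example.
np.savez(path, first=np.array([1, 2, 3, 4, 5]), second=np.array([6, 7, 8, 9, 10]))

with np.load(path) as data:
    names = sorted(data.files)  # names of the arrays stored in the archive
    second = data["second"]     # fetch just this array by name
```

With an archive, getting at the 6-10 array becomes a lookup by name rather than a matter of file position.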

save writes two parts: a header that contains information such as the dtype, the shape and the memory order (fortran_order), and a copy of the array's data buffer. The nbytes attribute gives the size of the data buffer. At least this is the case for numeric and string dtypes.

The save documentation has an example that uses an opened file, with seek(0) to rewind the file for use by load.

np.lib.npyio.format (documented as np.lib.format) has more information on the saving format. It looks like it is possible to determine the length of the header by reading its first few bytes. You could probably use functions in the module to perform all these reads and calculations.
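To make the "first few bytes" point concrete, here is a small sketch that parses the prefix by hand with struct. It follows the documented .npy layout (a 6-byte magic string, two version bytes, then a little-endian header length), and it uses an in-memory buffer in place of a real file purely for convenience.

```python
import io
import struct
import numpy as np

buf = io.BytesIO()
np.save(buf, np.array([1, 2, 3, 4, 5], dtype=np.int32))
buf.seek(0)

magic = buf.read(6)         # b"\x93NUMPY"
major, minor = buf.read(2)  # format version bytes
# Version 1.0 stores the header length as a 2-byte little-endian unsigned
# integer; later versions use 4 bytes.
if major == 1:
    (header_len,) = struct.unpack("<H", buf.read(2))
else:
    (header_len,) = struct.unpack("<I", buf.read(4))

data_start = buf.tell() + header_len  # offset of the first data byte
```

The exact header length depends on the NumPy version, because the header is padded for alignment, so it is better to compute it this way than to hard-code it.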

If I read the whole file from the example, I get:

In [696]: f.read()
Out[696]: 
b"\x93NUMPY\x01\x00F\x00
{'descr': '<i4', 'fortran_order': False, 'shape': (5,), }\n
 \x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00\x05\x00\x00\x00
\x93NUMPY\x01\x00F\x00
{'descr': '<i4', 'fortran_order': False, 'shape': (5,), }\n
 \x06\x00\x00\x00\x07\x00\x00\x00\x08\x00\x00\x00\t\x00\x00\x00\n\x00\x00\x00"

I added line breaks to highlight the distinct pieces of this file. Notice that each save starts with \x93NUMPY.

With an open file f, I can read the header (of the first array) with:

In [707]: np.lib.npyio.format.read_magic(f)
Out[707]: (1, 0)
In [708]: np.lib.npyio.format.read_array_header_1_0(f)
Out[708]: ((5,), False, dtype('int32'))

And I can load the data with:

In [722]: np.fromfile(f, dtype=np.int32, count=5)
Out[722]: array([1, 2, 3, 4, 5])

I deduced this from the code of the np.lib.npyio.format.read_array function.

Now the file is at:

In [724]: f.tell()
Out[724]: 100

which is the head of the next array:

In [725]: np.lib.npyio.format.read_magic(f)
Out[725]: (1, 0)
In [726]: np.lib.npyio.format.read_array_header_1_0(f)
Out[726]: ((5,), False, dtype('int32'))
In [727]: np.fromfile(f, dtype=np.int32, count=5)
Out[727]: array([ 6,  7,  8,  9, 10])

And we are at EOF.

Knowing that int32 has 4 bytes, we can calculate that the data occupies 20 bytes (5 elements of 4 bytes each). So we could skip over an array by reading the header, calculating the size of the data block, and seeking past it to get to the next array. For small arrays that work isn't worth it, but for very large ones it may be useful.
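The skip-ahead idea could look something like the sketch below. It is only a sketch under stated assumptions: the helper name load_last is hypothetical, it handles version 1.0 and 2.0 headers only, and it assumes fixed-size (non-object) dtypes so that the data size is the item size times the number of elements.

```python
import os
import tempfile
import numpy as np
from numpy.lib import format as npy_format

def load_last(path):
    """Return the last array in a file of concatenated np.save blocks.

    Sketch only: assumes version 1.0 or 2.0 headers and fixed-size dtypes.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        while True:
            start = f.tell()
            version = npy_format.read_magic(f)
            if version == (1, 0):
                shape, _, dtype = npy_format.read_array_header_1_0(f)
            elif version == (2, 0):
                shape, _, dtype = npy_format.read_array_header_2_0(f)
            else:
                raise ValueError(f"unsupported .npy version {version}")
            # Size of the data block that follows the header.
            nbytes = dtype.itemsize * int(np.prod(shape, dtype=np.int64))
            if f.tell() + nbytes >= size:
                # Final block: rewind to its start and load it normally.
                f.seek(start)
                return np.load(f)
            f.seek(nbytes, os.SEEK_CUR)  # skip the data without reading it

# Usage: append two arrays, then fetch only the second one.
path = os.path.join(tempfile.mkdtemp(), "try.npy")
with open(path, "ab") as f:
    np.save(f, np.array([1, 2, 3, 4, 5], dtype=np.int32))
    np.save(f, np.array([6, 7, 8, 9, 10], dtype=np.int32))

last = load_last(path)
```

Only the headers of the earlier blocks are read; their data is skipped with seek, which is where the saving for very large arrays would come from.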
