Loading arrays saved using numpy.save in append mode
I save arrays using numpy.save() in append mode:
import numpy as np

f = open("try.npy", "ab")  # append in binary mode, so each save lands after the previous one
np.save(f, [1, 2, 3, 4, 5])
np.save(f, [6, 7, 8, 9, 10])
f.close()
Can I then load the data in LIFO order? Namely, if I now want to load the 6-10 array, do I need to load twice (and use b):
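f = open("try.npy", "rb")  # the file holds binary data, so open it in binary mode
a = np.load(f)  # first saved array, 1-5
b = np.load(f)  # second saved array, 6-10
f.close()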
Or can I load the second appended save directly?
I'm a little surprised that this sequential save and load works. I don't think it is documented (please correct me). But evidently each save is a self-contained unit, and load reads to the end of that unit, as opposed to the end of the file.
Think of each load as a readline. You can't read just the last line of a file; you have to read all the ones before it.
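To make the readline analogy concrete, here is a minimal sketch that keeps calling load until the file position reaches the end of the file. The helper name load_all and the temporary path are mine, not part of NumPy, and the stop condition (comparing f.tell() with the file size) is just one way of detecting the end.

```python
import os
import tempfile

import numpy as np


def load_all(path):
    """Load every array appended to `path` with np.save, in saved order."""
    arrays = []
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        # Each np.load consumes exactly one saved unit (header + data).
        while f.tell() < size:
            arrays.append(np.load(f))
    return arrays


# Recreate the file from the question in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "try.npy")
    with open(path, "ab") as f:
        np.save(f, [1, 2, 3, 4, 5])
        np.save(f, [6, 7, 8, 9, 10])
    arrays = load_all(path)

last = arrays[-1]  # the most recently appended array
```

This still reads every array before the last one, which is exactly the cost the analogy describes.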
Well, there is a way of reading just the last one: use seek to move the file position to a specific point. But to do that you have to know exactly where the desired block starts.
np.savez is the intended way of saving multiple arrays to a file, or rather to a zip archive.
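For comparison, a small sketch of the np.savez route. The keyword names first and second and the temporary path are arbitrary choices for illustration. Each array is stored under its own name, so the second one can be requested without going through the first.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "try.npz")
    # Each keyword becomes the name of one array inside the zip archive.
    np.savez(path,
             first=np.array([1, 2, 3, 4, 5]),
             second=np.array([6, 7, 8, 9, 10]))

    with np.load(path) as archive:
        names = sorted(archive.files)
        second = archive["second"]  # only this member is read from the archive
```

The trade-off is that np.savez writes the archive in one call, so it does not cover the "keep appending to the same file" use case by itself.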
save writes two parts: a header that contains information such as the dtype, shape, and memory order (fortran_order), and a copy of the array's data buffer. The nbytes attribute gives the size of the data buffer. At least this is the case for numeric and string dtypes.
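A quick way to check the two-part layout is to save into an in-memory buffer and compare sizes. This is only a sketch for a small int32 array; the variable names are mine.

```python
import io

import numpy as np

arr = np.array([1, 2, 3, 4, 5], dtype=np.int32)

buf = io.BytesIO()
np.save(buf, arr)
raw = buf.getvalue()

# Everything that is not array data must be the header part.
header_size = len(raw) - arr.nbytes

# The tail of the saved bytes is a plain copy of the data buffer.
data_matches = raw[header_size:] == arr.tobytes()
```

The exact header_size depends on the NumPy version, since the header is padded, so it is safer to compute it than to hard-code it.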
The save documentation has an example that uses an open file, with seek(0) to rewind the file for use by load.
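That documentation example looks roughly like this (reproduced from memory, so treat it as a sketch rather than a verbatim quote):

```python
from tempfile import TemporaryFile

import numpy as np

with TemporaryFile() as outfile:
    np.save(outfile, np.arange(10))
    outfile.seek(0)  # rewind, otherwise load would start at the end of the file
    loaded = np.load(outfile)
```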
np.lib.npyio.format (the same module as np.lib.format) has more information on the save format. It looks like the length of the header can be determined by reading its first few bytes. You could probably use functions in that module to perform all these reads and calculations.
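As a sketch of what "reading its first few bytes" means: the format starts with the 6-byte magic string, 2 version bytes, and then a little-endian header length (2 bytes for version 1.0, 4 bytes for later versions). The function name header_length is hypothetical; it is not something the module provides.

```python
import io
import struct

import numpy as np


def header_length(f):
    """Return the total number of bytes that precede the array data."""
    magic = f.read(6)
    if magic != b"\x93NUMPY":
        raise ValueError("not positioned at the start of a saved array")
    major, minor = f.read(2)  # two single-byte version numbers
    if major == 1:
        (hlen,) = struct.unpack("<H", f.read(2))  # unsigned 16-bit length
        prefix = 10
    else:
        (hlen,) = struct.unpack("<I", f.read(4))  # unsigned 32-bit length
        prefix = 12
    return prefix + hlen


arr = np.array([1, 2, 3, 4, 5], dtype=np.int32)
buf = io.BytesIO()
np.save(buf, arr)
buf.seek(0)
total_header = header_length(buf)
```

For this array, total_header plus arr.nbytes should equal the full size of the saved bytes.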
If I read the whole file from the example, I get:
In [696]: f.read()
Out[696]:
b"\x93NUMPY\x01\x00F\x00
{'descr': '<i4', 'fortran_order': False, 'shape': (5,), }\n
\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00\x05\x00\x00\x00
\x93NUMPY\x01\x00F\x00
{'descr': '<i4', 'fortran_order': False, 'shape': (5,), }\n
\x06\x00\x00\x00\x07\x00\x00\x00\x08\x00\x00\x00\t\x00\x00\x00\n\x00\x00\x00"
I added line breaks to highlight the distinct pieces of this file. Notice that each save starts with \x93NUMPY.
With an open file f, positioned at the start, I can read the header of the first array with:
In [707]: np.lib.npyio.format.read_magic(f)
Out[707]: (1, 0)
In [708]: np.lib.npyio.format.read_array_header_1_0(f)
Out[708]: ((5,), False, dtype('int32'))
and I can load the data:
In [722]: np.fromfile(f, dtype=np.int32, count=5)
Out[722]: array([1, 2, 3, 4, 5])
I deduced this from the code of the np.lib.npyio.format.read_array function.
Now the file is at:
In [724]: f.tell()
Out[724]: 100
which is the start of the next array:
In [725]: np.lib.npyio.format.read_magic(f)
Out[725]: (1, 0)
In [726]: np.lib.npyio.format.read_array_header_1_0(f)
Out[726]: ((5,), False, dtype('int32'))
In [727]: np.fromfile(f, dtype=np.int32, count=5)
Out[727]: array([ 6, 7, 8, 9, 10])
We are now at EOF.
And knowing that int32 has 4 bytes, we can calculate that the data occupies 20 bytes. So we could skip over an array by reading the header, calculating the size of the data block, and seeking past it to get to the next array. For small arrays that work isn't worth it, but for very large ones it may be useful.
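Putting the skipping idea together, here is a rough sketch. The helpers skip_array and load_last are hypothetical names of mine. It relies on read_magic, read_array_header_1_0, and read_array_header_2_0 from np.lib.format, assumes plain numeric or string dtypes (no object arrays), and handles only format versions 1.0 and 2.0.

```python
import math
import os
import tempfile

import numpy as np
from numpy.lib import format as npy_format


def skip_array(f):
    """Advance `f` past one saved array without reading its data."""
    version = npy_format.read_magic(f)
    if version == (1, 0):
        shape, fortran_order, dtype = npy_format.read_array_header_1_0(f)
    elif version == (2, 0):
        shape, fortran_order, dtype = npy_format.read_array_header_2_0(f)
    else:
        raise ValueError(f"unhandled format version {version}")
    if dtype.hasobject:
        raise ValueError("object arrays are pickled, so their size is unknown")
    # Data size = number of elements * bytes per element (5 * 4 = 20 above).
    f.seek(math.prod(shape) * dtype.itemsize, os.SEEK_CUR)


def load_last(path):
    """Load only the last array appended to `path`."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        while True:
            start = f.tell()  # remember where this array begins
            skip_array(f)
            if f.tell() >= size:
                break
        f.seek(start)
        return np.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "try.npy")
    with open(path, "ab") as f:
        np.save(f, [1, 2, 3, 4, 5])
        np.save(f, [6, 7, 8, 9, 10])
    last = load_last(path)
```

Every header still has to be read, so the saving comes only from not reading the data blocks of the earlier arrays.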