How do I change a NumPy array's dtype and reshape it?
I have an array that I read from an HDF5 file; it is a 1D array of tuples. Its dtype is:
[('cycle', '<u2'), ('dxn', 'i1'), ('i (mA)', '<f4'), ('V', '<f4'), ('R(Ohm)', '<f4')]
I would like to convert this from an n x 1 array into a (n/5) x 5 array of type np.float.
I tried np.astype, but that does not work: it returns only n elements. Is there an easy way to do this?
The mix of dtypes makes this conversion trickier than usual. The approach at the end of this answer, copying fields into a target array, offers the best combination of speed and generality.
"Convert structured array to regular NumPy array" was suggested as a duplicate, but in that case all the fields are floats.
Let's construct an example:
In [850]: dt
Out[850]: dtype([('cycle', '<u2'), ('dxn', 'i1'), ('i (mA)', '<f4'), ('V', '<f4'), ('R(Ohm)', '<f4')])
In [851]: x=np.zeros((3,),dt)
In [852]: x['cycle']=[0,10,23]
In [853]: x['dxn']=[3,2,2]
In [854]: x['V']=[1,1,1]
In [855]: x
Out[855]:
array([(0, 3, 0.0, 1.0, 0.0), (10, 2, 0.0, 1.0, 0.0),
(23, 2, 0.0, 1.0, 0.0)],
dtype=[('cycle', '<u2'), ('dxn', 'i1'), ('i (mA)', '<f4'), ('V', '<f4'), ('R(Ohm)', '<f4')])
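The session above starts with dt already defined. For anyone who wants to reproduce it as a script, here is a minimal sketch that builds the same dtype and sample array; the field names and types are taken from the question, and nothing else is assumed.

```python
import numpy as np

# Same field names and types as the dtype shown in the question.
dt = np.dtype([('cycle', '<u2'), ('dxn', 'i1'), ('i (mA)', '<f4'),
               ('V', '<f4'), ('R(Ohm)', '<f4')])

# Three records, all fields zero-initialised, then filled field by field.
x = np.zeros((3,), dt)
x['cycle'] = [0, 10, 23]
x['dxn'] = [3, 2, 2]
x['V'] = [1, 1, 1]

print(x.shape)        # one element per record, not per field
print(x.dtype.names)  # the five field names
```

Note that x is 1D with one element per record, so a 2D float version of it has shape (len(x), 5).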
We can view the 3 float fields in the ways suggested in that link:
In [856]: dt1=np.dtype([('f0','float32',(3))])
In [857]: y=x[list(x.dtype.names[2:])].view(dt1)
# or x[list(x.dtype.names[2:])].view((np.float32, 3))
In [858]: y
Out[858]:
array([([0.0, 1.0, 0.0],), ([0.0, 1.0, 0.0],), ([0.0, 1.0, 0.0],)],
dtype=[('f0', '<f4', (3,))])
In [859]: y['f0']
Out[859]:
array([[ 0., 1., 0.],
[ 0., 1., 0.],
[ 0., 1., 0.]], dtype=float32)
But I need to make y a copy if I want to change all the values; writing to multiple fields at a time is not allowed.
In [863]: y=x[list(x.dtype.names[2:])].view(dt1).copy()
In [864]: y['f0']=np.arange(9.).reshape(3,3)
A view with a single scalar dtype does not capture the row structure; we have to add that back with reshape. dt1, with its (3,) shape, gets around that issue.
In [867]: x[list(x.dtype.names[2:])].view(np.float32)
Out[867]: array([ 0., 1., 0., 0., 1., 0., 0., 1., 0.], dtype=float32)
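The flat result above still needs a reshape to recover the rows. Also, as far as I know, newer NumPy releases (1.16 and later) return a view that keeps the original record layout when you index with a list of fields, so calling .view directly on it, as in the session above, may raise an error about mismatched item sizes. The following is a sketch of one way around both points, assuming numpy.lib.recfunctions.repack_fields is available in your NumPy version; the names float_names, packed and y are just illustrative.

```python
import numpy as np
from numpy.lib import recfunctions as rf

dt = np.dtype([('cycle', '<u2'), ('dxn', 'i1'), ('i (mA)', '<f4'),
               ('V', '<f4'), ('R(Ohm)', '<f4')])
x = np.zeros(3, dt)
x['V'] = [1, 1, 1]

# The three float32 fields.
float_names = list(x.dtype.names[2:])

# repack_fields copies the selected fields into a packed layout (no gaps),
# so each record is exactly 3 * 4 bytes and can be viewed as plain float32.
packed = rf.repack_fields(x[float_names])

# The float32 view is flat; reshape restores one row per record.
y = packed.view(np.float32).reshape(len(x), -1)
print(y.shape)
```

Because repack_fields makes a copy here, writing to y does not change x.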
https://stackoverflow.com/a/5957455/901925 suggests going through a list:
In [868]: x.tolist()
Out[868]: [(0, 3, 0.0, 1.0, 0.0), (10, 2, 0.0, 1.0, 0.0), (23, 2, 0.0, 1.0, 0.0)]
In [869]: np.array(x.tolist())
Out[869]:
array([[ 0., 3., 0., 1., 0.],
[ 10., 2., 0., 1., 0.],
[ 23., 2., 0., 1., 0.]])
Individual fields can be converted with astype:
In [878]: x['cycle'].astype(np.float32)
Out[878]: array([ 0., 10., 23.], dtype=float32)
In [879]: x['dxn'].astype(np.float32)
Out[879]: array([ 3., 2., 2.], dtype=float32)
But not multiple fields at once:
In [880]: x.astype(np.float32)
Out[880]: array([ 0., 10., 23.], dtype=float32)
The recfunctions module helps manipulate structured arrays (and recarrays):
from numpy.lib import recfunctions
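Among these helpers, newer NumPy versions (1.16 and later, to my knowledge) include structured_to_unstructured, which does this kind of conversion in one call. This is a sketch assuming that function is present in your installation; it was not part of the session shown in this answer.

```python
import numpy as np
from numpy.lib import recfunctions as rf

dt = np.dtype([('cycle', '<u2'), ('dxn', 'i1'), ('i (mA)', '<f4'),
               ('V', '<f4'), ('R(Ohm)', '<f4')])
x = np.zeros(3, dt)
x['cycle'] = [0, 10, 23]
x['dxn'] = [3, 2, 2]
x['V'] = [1, 1, 1]

# Cast every field to float32 and stack the fields along a new last axis,
# giving one row per record and one column per field.
z = rf.structured_to_unstructured(x, dtype=np.float32)
print(z.shape, z.dtype)
```

Passing dtype explicitly avoids relying on NumPy's type promotion across the mixed integer and float fields.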
Many of these functions construct a new empty array and copy values field by field. The equivalent in this case:
In [890]: z=np.zeros((3,5),np.float32)
In [891]: for i in range(5):
   .....:     z[:,i] = x[x.dtype.names[i]]
In [892]: z
Out[892]:
array([[ 0., 3., 0., 1., 0.],
[ 10., 2., 0., 1., 0.],
[ 23., 2., 0., 1., 0.]], dtype=float32)
For this small case it is a bit slower than np.array(x.tolist()). But for 30000 records it is much faster.
Usually a structured array has many more records than fields, so iterating over the fields is not slow.
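To check the timing claim on your own machine, here is a rough sketch that runs both approaches on 30000 records. The helper names via_list and via_fields and the sample values are made up for illustration, and the timings will vary with hardware and NumPy version, so none are quoted here.

```python
import timeit
import numpy as np

dt = np.dtype([('cycle', '<u2'), ('dxn', 'i1'), ('i (mA)', '<f4'),
               ('V', '<f4'), ('R(Ohm)', '<f4')])
n = 30000
x = np.zeros(n, dt)
x['cycle'] = np.arange(n) % 1000   # arbitrary sample values
x['V'] = 1

def via_list(arr):
    # Go through a Python list of tuples: one Python object per value.
    return np.array(arr.tolist(), dtype=np.float32)

def via_fields(arr):
    # Copy one field (column) at a time: only len(names) iterations.
    names = arr.dtype.names
    out = np.empty((len(arr), len(names)), dtype=np.float32)
    for i, name in enumerate(names):
        out[:, i] = arr[name]
    return out

a = via_list(x)
b = via_fields(x)
print(np.array_equal(a, b))   # both routes should produce the same array

print('tolist:', timeit.timeit(lambda: via_list(x), number=5))
print('fields:', timeit.timeit(lambda: via_fields(x), number=5))
```

The field loop runs only five times regardless of n, while the list route creates a Python object for every value, which is where the difference at larger sizes comes from.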