Converting a slice of a structured array to a regular NumPy array in NumPy 1.14

Problem description:

Note 1: None of the answers given to this question work in my case.

Note 2: The solution must work in NumPy 1.14.

Assume I have the following structured array:

import numpy as np
arr = np.array([(105.0, 34.0, 145.0, 217.0)], dtype=[('a', 'f4'), ('b', 'f4'), ('c', 'f4'), ('d', 'f4')])

Now I'm slicing into the structured data type like so:

arr2 = arr[['a', 'b']]

And now I'm trying to convert that slice into a regular array:

out = arr2[0].view((np.float32, 2))

This results in:

ValueError: Changing the dtype of a 0d array is only supported if the itemsize is unchanged

What I would like to get is just a regular array like so:

[105.0, 34.0]

Note that this example is simplified in order to be minimal. In my real use case I'm obviously not dealing with an array that holds one element.

I know that this solution works:

out = np.asarray(list(arr2[0]))

But I thought there must be a more efficient solution than copying data that is already in a NumPy array into a list and then back into an array. I assume there is a way to stay in NumPy, and maybe not copy any data at all; I just don't know how.

Answer

The 1d array does convert with view:

In [270]: arr = np.array([(105.0, 34.0, 145.0, 217.0)], dtype=[('a', 'f4'), ('b','f4'), ('c', 'f4'), ('d', 'f4')])
In [271]: arr
Out[271]: 
array([(105., 34., 145., 217.)],
      dtype=[('a', '<f4'), ('b', '<f4'), ('c', '<f4'), ('d', '<f4')])
In [272]: arr.view('<f4')
Out[272]: array([105.,  34., 145., 217.], dtype=float32)

It's when we try to convert a single element that we get this error:

In [273]: arr[0].view('<f4')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-273-70fbab8f61ba> in <module>()
----> 1 arr[0].view('<f4')

ValueError: Changing the dtype of a 0d array is only supported if the itemsize is unchanged

In earlier versions, view often required a tweak to the dimensions. I suspect this error is a result, intentional or not, of recent changes to the handling of structured arrays (most evident when indexing several fields at once).

In the whole-array case, view turned the 1d, 4-field array into a 1d, 4-element array: the shape went from (1,) to (4,). Viewing a single element, however, would have to go from shape () to (4,).
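A small script that reproduces this shape argument, assuming a recent NumPy (the exact wording of the error message may differ between versions). `np.asarray(arr[0])` is used to make the 0d structured array explicit:

```python
import numpy as np

arr = np.array([(105.0, 34.0, 145.0, 217.0)],
               dtype=[('a', 'f4'), ('b', 'f4'), ('c', 'f4'), ('d', 'f4')])

# Whole array: one 16-byte record on a 1d axis becomes four float32 values on that axis.
whole = arr.view(np.float32)
print(arr.shape, '->', whole.shape)   # (1,) -> (4,)

# A single element is 0d: there is no existing axis for view() to stretch.
elem0d = np.asarray(arr[0])
print(elem0d.shape)                    # ()

try:
    elem0d.view(np.float32)            # itemsize would change from 16 to 4
    raised = False
except ValueError as exc:              # the same ValueError as in the question
    raised = True
    print(exc)
```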

In the past I have recommended tolist as the surest way around problems with view (and astype):

In [274]: arr[0].tolist()
Out[274]: (105.0, 34.0, 145.0, 217.0)
In [279]: list(arr[0].tolist())
Out[279]: [105.0, 34.0, 145.0, 217.0]
In [280]: np.array(arr[0].tolist())
Out[280]: array([105.,  34., 145., 217.])

item is also a good way of pulling an element out of its numpy structure:

In [281]: arr[0].item()
Out[281]: (105.0, 34.0, 145.0, 217.0)

The result from tolist and item is a tuple.
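One detail worth noting when turning that tuple back into an array: as Out[280] shows, `np.array(tuple)` on its own produces float64. A minimal sketch that passes `dtype` explicitly to keep the fields' float32 type:

```python
import numpy as np

arr = np.array([(105.0, 34.0, 145.0, 217.0)],
               dtype=[('a', 'f4'), ('b', 'f4'), ('c', 'f4'), ('d', 'f4')])

# item() returns a plain tuple of Python floats.
t = arr[0].item()

# Without dtype=..., NumPy would infer float64 from the Python floats.
row = np.array(t, dtype=np.float32)

# Same idea for the whole array: tolist() gives a list of tuples, i.e. a 2d table.
table = np.array(arr.tolist(), dtype=np.float32)
print(row, table.shape)
```

This still copies through Python objects, so it is a convenience route, not a zero-copy one.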

You're worried about speed, but you are only converting one element. Worrying about the speed of tolist on a 1000-item array is one thing; worrying about it for a single element is quite another.

In [283]: timeit arr[0]
131 ns ± 1.31 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [284]: timeit arr[0].tolist()
1.25 µs ± 11.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [285]: timeit arr[0].item()
1.27 µs ± 2.39 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [286]: timeit arr.tolist()
493 ns ± 17.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [287]: timeit arr.view('f4')
1.74 µs ± 18.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
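To repeat these measurements outside IPython, here is a sketch using the standard `timeit` module. Absolute numbers depend on the machine and NumPy version, so only the relative ordering is of interest:

```python
import timeit
import numpy as np

arr = np.array([(105.0, 34.0, 145.0, 217.0)],
               dtype=[('a', 'f4'), ('b', 'f4'), ('c', 'f4'), ('d', 'f4')])

candidates = {
    "arr[0]": lambda: arr[0],
    "arr[0].tolist()": lambda: arr[0].tolist(),
    "arr[0].item()": lambda: arr[0].item(),
    "arr.tolist()": lambda: arr.tolist(),
    "arr.view('f4')": lambda: arr.view('f4'),
}

def per_call_seconds(fn, number=2000, repeat=5):
    # Best of `repeat` runs, divided by the number of calls in each run.
    return min(timeit.repeat(fn, number=number, repeat=repeat)) / number

results = {name: per_call_seconds(fn) for name, fn in candidates.items()}
for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:18s} {seconds * 1e6:8.3f} µs")
```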

You could index the element in a way that doesn't reduce the dimension to 0 (not that it helps much with speed):

In [288]: arr[[0]].view('f4')
Out[288]: array([105.,  34., 145., 217.], dtype=float32)
In [289]: timeit arr[[0]].view('f4')
6.54 µs ± 15.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [290]: timeit arr[0:1].view('f4')
2.63 µs ± 105 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [298]: timeit arr[0][None].view('f4')
4.28 µs ± 160 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
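The slice form can be wrapped in a small helper. `record_as_floats` is a hypothetical name, and the sketch assumes a packed structured dtype whose fields are all float32, as in the question. Because a slice is a view, the result shares memory with the original array:

```python
import numpy as np

arr = np.array([(105.0, 34.0, 145.0, 217.0), (1.0, 2.0, 3.0, 4.0)],
               dtype=[('a', 'f4'), ('b', 'f4'), ('c', 'f4'), ('d', 'f4')])

def record_as_floats(a, i):
    """View record i of a packed, all-float32 structured array as a 1d float32 array.

    a[i:i+1] keeps a length-1 axis, so view() can expand that axis instead of
    failing on a 0d element. (Hypothetical helper.)
    """
    i = range(len(a))[i]          # normalize negative indices, IndexError if out of range
    return a[i:i + 1].view(np.float32)

r = record_as_floats(arr, 1)
print(r)   # [1. 2. 3. 4.]
```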

view still requires a change in shape; consider a big array:

In [299]: arrs = np.repeat(arr, 10000)
In [301]: arrs.view('f4')
Out[301]: array([105.,  34., 145., ...,  34., 145., 217.], dtype=float32)
In [303]: arrs.shape
Out[303]: (10000,)
In [304]: arrs.view('f4').shape
Out[304]: (40000,)

The view is still 1d, whereas we'd probably want a (10000,4) shaped 2d array.
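One way to get there from the flat view is to reshape it, using the number of fields as the column count. This is a sketch assuming the records are contiguous (as they are after `np.repeat`), in which case the reshape is still a view of the original data:

```python
import numpy as np

arr = np.array([(105.0, 34.0, 145.0, 217.0)],
               dtype=[('a', 'f4'), ('b', 'f4'), ('c', 'f4'), ('d', 'f4')])
arrs = np.repeat(arr, 10000)

flat = arrs.view(np.float32)                      # shape (40000,)
table = flat.reshape(-1, len(arrs.dtype.names))   # shape (10000, 4), no copy
print(flat.shape, table.shape)
```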

A better view change:

In [306]: arrs.view(('f4',4))
Out[306]: 
array([[105.,  34., 145., 217.],
       [105.,  34., 145., 217.],
       [105.,  34., 145., 217.],
       ...,
       [105.,  34., 145., 217.],
       [105.,  34., 145., 217.],
       [105.,  34., 145., 217.]], dtype=float32)
In [307]: _.shape
Out[307]: (10000, 4)

This works with the one-element array too, whether it is 1d or 0d:

In [308]: arr.view(('f4',4))
Out[308]: array([[105.,  34., 145., 217.]], dtype=float32)
In [309]: _.shape
Out[309]: (1, 4)
In [310]: arr[0].view(('f4',4))
Out[310]: array([105.,  34., 145., 217.], dtype=float32)
In [311]: _.shape
Out[311]: (4,)

This was suggested in one of the answers in your link: https://stackoverflow.com/a/10171321/901925

Contrary to your comment there, it works for me:

In [312]: arr[0].view((np.float32, len(arr.dtype.names)))
Out[312]: array([105.,  34., 145., 217.], dtype=float32)
In [313]: np.__version__
Out[313]: '1.14.0'
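The `len(arr.dtype.names)` trick can be generalized so the element dtype is also read from the array rather than hard-coded. A minimal sketch; `fields_as_columns` is a hypothetical helper, and it assumes the fields are packed back to back with no padding or offsets, as in the question's original dtype:

```python
import numpy as np

def fields_as_columns(a):
    """View a structured array whose fields all share one dtype as a plain
    array with one extra trailing axis (one column per field).

    Hypothetical helper; assumes a packed dtype (no gaps between fields).
    """
    names = a.dtype.names
    base = a.dtype[names[0]]
    if any(a.dtype[n] != base for n in names):
        raise TypeError("all fields must have the same dtype")
    return a.view((base, len(names)))

arr = np.array([(105.0, 34.0, 145.0, 217.0)],
               dtype=[('a', 'f4'), ('b', 'f4'), ('c', 'f4'), ('d', 'f4')])
print(fields_as_columns(arr).shape)                 # (1, 4)
print(fields_as_columns(np.repeat(arr, 3)).shape)   # (3, 4)
```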

With the edit to the question:

In [84]: arr = np.array([(105.0, 34.0, 145.0, 217.0)], dtype=[('a', 'f4'), ('b','f4'), ('c', 'f4'), ('d', 'f4')])
In [85]: arr2 = arr[['a', 'b']]
In [86]: arr2
Out[86]: 
array([(105., 34.)],
      dtype={'names':['a','b'], 'formats':['<f4','<f4'], 'offsets':[0,4], 'itemsize':16})

In [87]: arr2.view(('f4',2))
...
ValueError: Changing the dtype to a subarray type is only supported if the total itemsize is unchanged

Note that the arr2 dtype includes an offsets value. In a recent numpy version, multiple field selection has changed. It is now a true view, preserving the original data - all of it, not just the selected fields. The itemsize is unchanged:

In [93]: arr.itemsize
Out[93]: 16
In [94]: arr2.itemsize
Out[94]: 16
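This explains the error at In [87]: a `('f4',2)` subarray is 8 bytes, but each `arr2` item is still 16 bytes. A sketch of a check for whether a structured dtype is "packed" (its fields fill the itemsize with no gaps). `is_packed` is a hypothetical helper that ignores overlapping fields. On NumPy 1.14.1 to 1.15, where multi-field indexing returned a copy again, `arr2` would be packed:

```python
import numpy as np

arr = np.array([(105.0, 34.0, 145.0, 217.0)],
               dtype=[('a', 'f4'), ('b', 'f4'), ('c', 'f4'), ('d', 'f4')])
arr2 = arr[['a', 'b']]

def is_packed(dt):
    """True if the fields of structured dtype `dt` exactly fill dt.itemsize.

    Hypothetical helper; dt.fields[name] is a (dtype, offset) tuple.
    """
    return dt.itemsize == sum(dt.fields[n][0].itemsize for n in dt.names)

print(arr.dtype.itemsize, is_packed(arr.dtype))     # 16 True
print(arr2.dtype.itemsize, is_packed(arr2.dtype))   # 16 False on 1.14.0 and 1.16+
```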

arr.view(('f4',4)) and arr2.view(('f4',4)) produce the same result.

So you can't view (change dtype) a partial set of the fields. You have to first take the view of the whole array, and then select rows/columns, or work with tolist.
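A sketch of the "view the whole array, then select columns" route. `select_fields` is a hypothetical helper that assumes the original dtype is packed, uses a single element type, and declares its fields in memory order, so that a field's position in `dtype.names` is also its column index. The view step is zero-copy. Picking columns with a list is fancy indexing and therefore copies, while a plain slice such as `[:, :2]` stays a view:

```python
import numpy as np

arr = np.array([(105.0, 34.0, 145.0, 217.0)],
               dtype=[('a', 'f4'), ('b', 'f4'), ('c', 'f4'), ('d', 'f4')])

def select_fields(a, fields):
    """Hypothetical helper: view all fields as a 2d table, then pick columns."""
    names = a.dtype.names
    table = a.view((a.dtype[names[0]], len(names)))   # zero-copy view of all fields
    cols = [names.index(f) for f in fields]            # assumes column order == field order
    return table[..., cols]                            # list index -> copy

out = select_fields(arr, ['a', 'b'])

# When the wanted fields are adjacent, a slice keeps it a view:
out_view = arr.view((np.float32, 4))[:, :2]
```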

I'm using 1.14.0. Release notes for 1.14.1 says:

The change in 1.14.0 that multi-field indexing of structured arrays returns a view instead of a copy has been reverted but remains on track for NumPy 1.15. Affected users should read the 1.14.1 Numpy User Guide section "basics/structured arrays/accessing multiple fields" for advice on how to manage this transition.

https://docs.scipy.org/doc/numpy-1.14.2/user/basics.rec.html#accessing-multiple-fields

This is still under development. That doc mentions a repack_fields function, but that doesn't exist yet.
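For readers on newer NumPy: as far as I know, `repack_fields` and `structured_to_unstructured` were added to `numpy.lib.recfunctions` in NumPy 1.16. They are not available on 1.14, so this sketch does not meet the question's constraint. It only shows the later route:

```python
import numpy as np
from numpy.lib import recfunctions as rfn   # NumPy >= 1.16 only

arr = np.array([(105.0, 34.0, 145.0, 217.0)],
               dtype=[('a', 'f4'), ('b', 'f4'), ('c', 'f4'), ('d', 'f4')])
arr2 = arr[['a', 'b']]   # a view with offsets and itemsize 16 on 1.16+

# Option 1: turn the selected fields directly into a plain float32 array.
out1 = rfn.structured_to_unstructured(arr2)

# Option 2: repack into a packed copy (itemsize 8), then the ('f4', 2) view works.
out2 = rfn.repack_fields(arr2).view((np.float32, 2))
```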
