Python: Split a NumPy array based on values in the array

Question:

I have a large array:

[(1.0, 3.0, 1, 427338.4297000002, 4848489.4332)
 (1.0, 3.0, 2, 427344.7937000003, 4848482.0692)
 (1.0, 3.0, 3, 427346.4297000002, 4848472.7469) ...,
 (1.0, 1.0, 7084, 427345.2709999997, 4848796.592)
 (1.0, 1.0, 7085, 427352.9277999997, 4848790.9351)
 (1.0, 1.0, 7086, 427359.16060000006, 4848787.4332)]

I want to split this array into multiple arrays based on the second value in each row (3.0, 3.0, 3.0, ..., 1.0, 1.0, 1.0).

Every time the 2nd value changes, I want a new array, so basically each new array has the same 2nd value. I've looked this up on Stack Overflow and know of the command

np.split(array, number)

but I'm not trying to split the array into a certain number of arrays, but rather by a value. How would I be able to split the array in the way specified above? Any help would be appreciated!

You can find the indices where the value in the second column (index 1) changes by using numpy.diff and numpy.where, and then pass those indices to np.split:

>>> import numpy as np
>>> arr = np.array([(1.0, 3.0, 1, 427338.4297000002, 4848489.4332),
 (1.0, 3.0, 2, 427344.7937000003, 4848482.0692),
 (1.0, 3.0, 3, 427346.4297000002, 4848472.7469),
 (1.0, 1.0, 7084, 427345.2709999997, 4848796.592),
 (1.0, 1.0, 7085, 427352.9277999997, 4848790.9351),
 (1.0, 1.0, 7086, 427359.16060000006, 4848787.4332)])
>>> np.split(arr, np.where(np.diff(arr[:,1]))[0]+1)
[array([[  1.00000000e+00,   3.00000000e+00,   1.00000000e+00,
          4.27338430e+05,   4.84848943e+06],
       [  1.00000000e+00,   3.00000000e+00,   2.00000000e+00,
          4.27344794e+05,   4.84848207e+06],
       [  1.00000000e+00,   3.00000000e+00,   3.00000000e+00,
          4.27346430e+05,   4.84847275e+06]]),
 array([[  1.00000000e+00,   1.00000000e+00,   7.08400000e+03,
          4.27345271e+05,   4.84879659e+06],
       [  1.00000000e+00,   1.00000000e+00,   7.08500000e+03,
          4.27352928e+05,   4.84879094e+06],
       [  1.00000000e+00,   1.00000000e+00,   7.08600000e+03,
          4.27359161e+05,   4.84878743e+06]])]
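The one-liner works because the second argument of np.split can be either an integer (split into that many equal pieces, the usage mentioned in the question) or a sequence of indices at which to cut. A toy illustration with arbitrary values:

```python
import numpy as np

a = np.arange(6)

# An integer N asks for N equal-sized pieces
# (np.split raises ValueError if len(a) is not divisible by N).
equal_parts = np.split(a, 3)

# A sequence of indices gives cut points instead: a[:1], a[1:4], a[4:].
by_index = np.split(a, [1, 4])

print(equal_parts)  # [array([0, 1]), array([2, 3]), array([4, 5])]
print(by_index)     # [array([0]), array([1, 2, 3]), array([4, 5])]
```

The answer builds exactly such an index list from the data, so the pieces follow the runs of equal values rather than a fixed count.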

Explanation:

First, select the items in the second column (index 1):

>>> arr[:,1]
array([ 3.,  3.,  3.,  1.,  1.,  1.])
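The arr[:,1] syntax assumes a regular 2-D array, as built in the answer. If the data is instead a structured array (one record per row, which the tuple-style printout in the question might indicate), arr[:,1] raises an IndexError and the second field has to be selected by name. A sketch under that assumption, using a few shortened rows from the question and NumPy's default field names f0 to f4:

```python
import numpy as np

# Assumed structured array: five unnamed fields, so NumPy names them f0..f4.
rec = np.array(
    [(1.0, 3.0, 1, 427338.4297, 4848489.4332),
     (1.0, 3.0, 2, 427344.7937, 4848482.0692),
     (1.0, 1.0, 7084, 427345.2710, 4848796.592)],
    dtype="f8,f8,i8,f8,f8",
)

# Select the second field by name instead of by column index.
second = rec[rec.dtype.names[1]]  # the field 'f1'
print(second)  # [3. 3. 1.]

# Same split recipe, applied to the 1-D array of records.
pieces = np.split(rec, np.where(np.diff(second))[0] + 1)
print([len(p) for p in pieces])  # [2, 1]
```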

Now, to find where the values actually change, we can use numpy.diff, which returns the difference between each pair of consecutive items:

>>> np.diff(arr[:,1])
array([ 0.,  0., -2.,  0.,  0.])
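To make the diff step concrete: entry i of np.diff(col) is col[i+1] - col[i], so it is non-zero exactly where a value differs from the one before it. A direct comparison of neighbouring items gives the same change positions without subtracting; the small col array below just mirrors the second column shown above:

```python
import numpy as np

col = np.array([3.0, 3.0, 3.0, 1.0, 1.0, 1.0])

# np.diff(col)[i] == col[i + 1] - col[i]
d = np.diff(col)
print(d)  # [ 0.  0. -2.  0.  0.]

# Equivalent "did the value change?" mask, built by comparing neighbours.
changed = col[1:] != col[:-1]
print(changed)  # [False False  True False False]

# np.where treats any non-zero float as True, so both give the same positions.
assert (np.where(d)[0] == np.where(changed)[0]).all()
```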

Any non-zero entry means that the item after it differs from it. We can use numpy.where to find the indices of the non-zero entries and then add 1, because entry i of the diff compares items i and i+1, so the new group actually starts at index i+1. These are the split points passed to np.split:

>>> np.where(np.diff(arr[:,1]))[0]+1
array([3])
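The three steps can be wrapped in a small helper. The name split_on_change is hypothetical (it is not a NumPy function), and the sketch assumes a regular 2-D array like the one in the answer:

```python
import numpy as np

def split_on_change(arr, col):
    """Hypothetical helper: split a 2-D array into consecutive row blocks
    that share the same value in column `col`."""
    # Index of the first row of each new run (diff index + 1).
    starts = np.where(np.diff(arr[:, col]))[0] + 1
    return np.split(arr, starts)

arr = np.array([[1.0, 3.0, 1],
                [1.0, 3.0, 2],
                [1.0, 1.0, 3],
                [1.0, 3.0, 4]])
groups = split_on_change(arr, 1)
print([g[:, 2].tolist() for g in groups])  # [[1.0, 2.0], [3.0], [4.0]]
```

Note that only consecutive runs are grouped: the last row has 3.0 again but becomes its own array, which matches the question's "new array every time the second value changes".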