Reshape a list of unequal-length lists into a numpy array
I have an array with dtype = object whose elements represent pairs of coordinates at different times, and I want to reshape it into an easier format.
I managed to do this for "one time", but I can't get it to work for all time observations.
The length of each observation is different, so perhaps I need to use masked values to do that. Below is an example that I hope explains better what I want.
# My "input" is:
a = np.array([[], [(2, 0), (2, 2)], [(2, 2), (2, 0), (2, 1), (2, 2)]], dtype=object)

# And my "output" (holding_array_VBPnegl) is:
array([[2, 0],
       [2, 2],
       [2, 1]])

# It doesn't take my for loop over a.shape[0] into account, so the expected result is:
test = np.array([[[True, True],
                  [True, True],
                  [True, True]],
                 [[2, 0],
                  [2, 2],
                  [True, True]],
                 [[2, 0],
                  [2, 2],
                  [2, 1]]])
# where "True" marks the masked values
I have tried using code I found on StackOverflow:
import numpy as np

holding_list_VBPnegl = []
for i in range(a.shape[0]):
    for x in a[i]:
        if x in holding_list_VBPnegl:
            pass
        else:
            holding_list_VBPnegl.append(x)

print(holding_list_VBPnegl)
holding_array_VBPnegl = np.asarray(holding_list_VBPnegl)
NumPy arrays are ideally used for blocks of contiguous memory, so you'll first need to preallocate the required amount of memory. You can get its size from two numbers: the length of your array a (which I'll gladly convert to a list; don't abuse NumPy arrays for storing lists of unequal length), which I take to be the number of timesteps (you refer to the observations as a sequence of timesteps, yes?), and the length of the longest observation (in this case 4, a's last element).
import numpy as np
a = np.array([[], [(2, 0), (2, 2)], [(2, 2), (2, 0), (2, 1), (2, 2)]], dtype=object)
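s = a.tolist()  # lists are a better container type for your data...
cols = len(s)  # number of observations (timesteps)
rows = max(len(obs) for obs in s)  # length of the longest observation
m = np.full((cols, rows, 2), np.nan)  # NaN marks the slots to mask later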
Now you've preallocated what you need and the array is ready for masking. All that's left is to fill it with the data you already have:
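for rowind, row in enumerate(s):
    try:
        m[rowind, :len(row), :] = np.array(row)
    except ValueError:
        pass  # broadcasting error: row is empty

# np.int was removed in NumPy 1.24; the builtin int is equivalent.
# NaNs are zeroed before the cast (they stay masked) to avoid an invalid-cast warning.
result = np.ma.masked_array(np.nan_to_num(m).astype(int), mask=np.isnan(m))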
result
masked_array(data =
 [[[-- --]
   [-- --]
   [-- --]
   [-- --]]

  [[2 0]
   [2 2]
   [-- --]
   [-- --]]

  [[2 2]
   [2 0]
   [2 1]
   [2 2]]],
             mask =
 [[[ True  True]
   [ True  True]
   [ True  True]
   [ True  True]]

  [[False False]
   [False False]
   [ True  True]
   [ True  True]]

  [[False False]
   [False False]
   [False False]
   [False False]]],
       fill_value = 999999)
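For reuse, the same preallocate-then-fill idea could be wrapped in a small helper. This is only a sketch: the name pad_to_masked and its width parameter are hypothetical, not part of NumPy. It also assumes every non-empty observation is a list of pairs, and it builds the mask directly instead of going through NaN, which is a small variation on the approach above.

```python
import numpy as np


def pad_to_masked(observations, width=2):
    """Pad ragged observations into one masked integer array.

    Hypothetical helper: the name and the `width` parameter are
    illustrative assumptions, not an existing NumPy function.
    """
    rows = [list(obs) for obs in observations]
    longest = max((len(r) for r in rows), default=0)

    # Preallocate the data and start with every slot masked.
    data = np.zeros((len(rows), longest, width), dtype=int)
    mask = np.ones(data.shape, dtype=bool)

    for i, row in enumerate(rows):
        if row:  # empty observations stay fully masked
            data[i, :len(row)] = row
            mask[i, :len(row)] = False

    return np.ma.masked_array(data, mask=mask)


observations = [[], [(2, 0), (2, 2)], [(2, 2), (2, 0), (2, 1), (2, 2)]]
result = pad_to_masked(observations)

print(result.shape)            # (3, 4, 2)
print(result.count(axis=1))    # valid entries per observation and coordinate
print(result[1].compressed())  # unmasked values of the second observation
```

Because the mask is built explicitly, the data never has to hold NaN, so it can stay an integer array from the start.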