Saving a dictionary of numpy arrays

Question:

I have a DB with a couple of years' worth of site data. I am now trying to use that data for analytics: plotting and sorting advertising costs by keyword, etc.

One of the data grabs from the DB takes minutes to complete. While I could spend some time optimizing the SQL statements I use to get the data, I'd prefer to leave that class and its SQL alone, grab the data, and save the results to a data file for faster retrieval later. Most of this DB data isn't going to change, so I could write a separate Python script to update the file every 24 hours and then use that file for this long-running task.

The data is returned as a dictionary of numpy arrays. When I use numpy.save('data', data) the file is saved just fine. When I use data2 = numpy.load('data.npy') it loads the file without error. However, the output data2 does not equal the original data.

Specifically, the line data == data2 returns False. Additionally, if I use the following:

for key, key_data in data.items():
    print(key)

it works. But when I replace data.items() with data2.items(), I get an error:

AttributeError: 'numpy.ndarray' object has no attribute 'items'

Using type(data) I get dict. Using type(data2) I get numpy.ndarray.

So how do I fix this? I want the loaded data to equal the data I passed in for saving. Is there an argument to numpy.save that fixes this, or do I need some simple function to reformat the loaded data into the proper structure?

Attempts to get into the ndarray via for loops or indexing all lead to errors about indexing a 0-d array. Casting with dict(data2) also fails, with an error about iterating over a 0-d array. However, Spyder shows the value of the array, and it includes the data I saved. I just can't figure out how to get to it.

If I need to reformat the loaded data, I'd appreciate some example code showing how to do it.

Answer:

Let's look at a small example:

In [819]: N
Out[819]: 
array([[  0.,   1.,   2.,   3.],
       [  4.,   5.,   6.,   7.],
       [  8.,   9.,  10.,  11.]])

In [820]: data={'N':N}

In [821]: np.save('temp.npy',data)

In [822]: data2=np.load('temp.npy')

In [823]: data2
Out[823]: 
array({'N': array([[  0.,   1.,   2.,   3.],
       [  4.,   5.,   6.,   7.],
       [  8.,   9.,  10.,  11.]])}, dtype=object)

np.save is designed to save numpy arrays, but data is a dictionary. So np.save wrapped it in a 0-d object array and used pickle to save that object. Your data2 probably has the same structure.
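To see that wrapping directly, the following sketch saves a small dictionary and inspects what comes back. The temporary path and the name `wrapped` are chosen here for illustration. Note one assumption about your environment: recent NumPy releases refuse to unpickle on load unless allow_pickle=True is passed, which the session above (run on an older version) did not need.

```python
import os
import tempfile

import numpy as np

# A small dictionary of arrays, similar in shape to the example above.
data = {'N': np.arange(12.0).reshape(3, 4)}

# Save into a temporary directory so the sketch does not clutter the cwd.
path = os.path.join(tempfile.mkdtemp(), 'temp.npy')
np.save(path, data)  # the dict is converted to an object array and pickled

# Recent NumPy versions require allow_pickle=True to load pickled objects.
wrapped = np.load(path, allow_pickle=True)

print(type(wrapped))  # an ndarray, not a dict
print(wrapped.dtype)  # object
print(wrapped.shape)  # () -- zero dimensions, hence the "0-d array" errors
```

Because the array has no dimensions, there is nothing to iterate over or index by position, which matches the errors described in the question.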

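You can get at the array by indexing the 0-d wrapper with an empty tuple, which returns the original dictionary: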

In [826]: data2[()]['N']
Out[826]: 
array([[  0.,   1.,   2.,   3.],
       [  4.,   5.,   6.,   7.],
       [  8.,   9.,  10.,  11.]])
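Putting this together for the caching use case in the question, here is a minimal sketch of two possible round trips. It is not the only way to do this. Option 1 keeps np.save and unwraps the result with .item(), which is equivalent to indexing with [()]. Option 2 swaps in np.savez, which stores each array under its key and avoids pickle, assuming the dictionary keys are strings that are valid keyword names. The file names and the helper same_arrays are hypothetical names introduced for this example.

```python
import os
import tempfile

import numpy as np

data = {'N': np.arange(12.0).reshape(3, 4), 'M': np.ones(3)}
tmpdir = tempfile.mkdtemp()

# Option 1: keep np.save, then unwrap the 0-d object array after loading.
npy_path = os.path.join(tmpdir, 'data.npy')
np.save(npy_path, data)
restored = np.load(npy_path, allow_pickle=True).item()  # same as [()]

# Option 2: store each array under its own key in an .npz archive.
# Assumes every key is a string usable as a keyword argument.
npz_path = os.path.join(tmpdir, 'data.npz')
np.savez(npz_path, **data)
with np.load(npz_path) as archive:
    restored_npz = {key: archive[key] for key in archive.files}


def same_arrays(a, b):
    """Hypothetical helper: compare two dicts of arrays key by key."""
    return a.keys() == b.keys() and all(np.array_equal(a[k], b[k]) for k in a)


print(type(restored))                    # a plain dict again
print(same_arrays(data, restored))
print(same_arrays(data, restored_npz))
```

The helper compares arrays one key at a time with np.array_equal because a plain == between two dictionaries of arrays is not a reliable equality check.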
