Unpacking a pandas DataFrame column containing lists of dictionaries into new columns
I have a dataframe new_df that has one column. Some rows contain a list of dictionaries, and the other rows are NaN.
new_df
0
0 NaN
1 NaN
2 [{'start_time': '09:16:44', 'e...
3 [{'start_time': '09:36:44', 'e...
4 [{'start_time': '09:46:44', 'e...
5 [{'start_time': '09:48:44', 'e...
6 [{'start_time': '09:55:44', 'e...
7 [{'start_time': '09:59:44', 'e...
8 [{'start_time': '10:50:22', 'e...
9 [{'start_time': '11:30:22', 'e...
10 [{'start_time': '11:35:22', 'e...
11 [{'start_time': '12:50:22', 'e...
12 NaN
13 NaN
When a row contains a list, the list holds a single dictionary in this format:
[{'start_time': '09:16:44', 'end_time': '9:36:44', 'job_id': '123456'}]
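For anyone who wants to reproduce this, a small frame of the same shape can be built as in the sketch below. The values are illustrative sample data rather than the real dataset, and the frame is shortened to five rows.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two leading NaN rows, two rows that each hold
# a one-element list of dictionaries, and one trailing NaN row.
new_df = pd.DataFrame({0: [
    np.nan,
    np.nan,
    [{'start_time': '09:16:44', 'end_time': '09:36:44', 'job_id': '123456'}],
    [{'start_time': '09:36:44', 'end_time': '09:46:44', 'job_id': '123457'}],
    np.nan,
]})

# The rows that hold a list are exactly the non-null rows.
has_list = new_df[0].notna()
print(new_df.index[has_list].tolist())  # prints [2, 3]
```

The index labels printed here (2 and 3) are the ones the unpacked columns need to keep.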
I need to unpack the dictionary in each list/row of new_df into new columns and add these new columns to another dataframe.
The problem I am having is preserving the index of new_df, which is needed to line the new column data up correctly with the other dataframe.
I can unpack the lists and create new columns from the dictionary values, but when I apply the new columns, they start at row[0] instead of row[2] in this case. I lose the rows at the beginning and end where the values are NaN.
add_df = pd.DataFrame(list(new_df[0]))
This produces:
start_time end_time job_id
0 09:16:44 09:36:44 123456
1 09:36:44 09:46:44 123457
2 09:46:44 09:48:44 123458
3 09:48:44 09:59:59 123459
... ... ...
8 11:35:22 12:45:00 123460
9 12:50:22 13:00:00 123461
What I need is to preserve the index of new_df, the dataframe that holds the lists of dictionaries, as shown below:
start_time end_time job_id
0 NaN NaN NaN
1 NaN NaN NaN
2 09:16:44 09:36:44 123456
3 09:36:44 09:46:44 123457
4 09:46:44 09:48:44 123458
5 09:48:44 09:59:59 123459
... ... ...
10 11:35:22 12:45:00 123460
11 12:50:22 13:00:00 123461
12 NaN NaN NaN
13 NaN NaN NaN
How can I preserve the index and keep the leading and trailing NaN rows?
The comment made by @Ben.T made me think about what I was actually trying to accomplish.
I was creating a dataframe from a Series whose elements are lists of dictionaries. Why was I peeling off this new dataframe column by column, when I could attach the whole new dataframe to the existing dataframe along the column axis?
My solution:
import pandas as pd

# Drop the NaN rows, then take the single dict out of each one-element list
non_null = orig_df[0].dropna()
new_df = pd.DataFrame([row[0] for row in non_null])
# Get the orig_df index labels of the non-NaN rows to apply to the new df
new_ndx = orig_df.index[orig_df[0].notna()]
# Give new_df the index labels that line up with orig_df
new_df = new_df.set_index(new_ndx)
# Now attach new_df to orig_df along the column axis; rows missing from new_df become NaN
orig_df = pd.concat([orig_df, new_df], axis=1)
Is there maybe a more Pythonic way to accomplish this?
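One possibly more compact variant, offered as a sketch rather than a definitive answer: pass the original index labels when the new dataframe is constructed, then let join align on the index. It assumes every non-NaN row holds a list with exactly one dictionary, as in the format shown in the question. The sample frame and the names non_null, unpacked and result are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for orig_df: NaN rows around two one-dict lists.
orig_df = pd.DataFrame({0: [
    np.nan,
    [{'start_time': '09:16:44', 'end_time': '09:36:44', 'job_id': '123456'}],
    [{'start_time': '09:36:44', 'end_time': '09:46:44', 'job_id': '123457'}],
    np.nan,
]})

# Keep only the rows that hold a list; dropna() preserves their index labels.
non_null = orig_df[0].dropna()

# Build the new columns from the single dict in each list, reusing those labels.
unpacked = pd.DataFrame([row[0] for row in non_null], index=non_null.index)

# join aligns on the index, so rows absent from `unpacked` are filled with NaN.
result = orig_df.join(unpacked)
print(result[['start_time', 'end_time', 'job_id']])
```

Because the index is set at construction time, no separate reset_index/set_index step is needed. If a list could hold more than one dictionary, this sketch would silently keep only the first one.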