How to delete specific rows with null values
This is a subset of my data frame. For each row where the sentence column has a value, columns A B C D are repeated in the next two rows, and those two rows have no value in the sentence column. How can I remove the second row with a null sentence? I need to keep the first row with a null value in the sentence column.
A B C D R sentence ADR
112 135 21 EffexorXR.21 1 lack of good feeling. good
113 135 21 EffexorXR.21 1 1
114 135 21 EffexorXR.21 1
115 136 21 EffexorXR.21 2 Feel disconnected disconnected
116 136 21 EffexorXR.21 2
117 136 21 EffexorXR.21 2
118 142 22 EffexorXR.22 1 Weight gain gain
119 142 22 EffexorXR.22 1 1
120 142 22 EffexorXR.22 1
The desired output looks like this:
A B C D R sentence ADR
112 135 21 EffexorXR.21 1 lack of good feeling. good
113 135 21 EffexorXR.21 1 1
115 136 21 EffexorXR.21 2 Feel disconnected disconnected
116 136 21 EffexorXR.21 2
118 142 22 EffexorXR.22 1 Weight gain gain
119 142 22 EffexorXR.22 1 1
If I use the following code:
df = df[pd.notnull(df['sentences'])]
it removes both rows with null values. Any suggestions?
The following solution does not work either:
df.set_index('A').drop_duplicates().reset_index()
Maybe you can look for duplicates across the combined columns and use the result as a mask on the original dataframe:
new_df = df[~df[['B','C','D', 'R', 'sentence']].duplicated()]
print(new_df)
Output:
A B C D R sentence ADR
0 112 135 21 EffexorXR.21 1 lack of good feeling. good
1 113 135 21 EffexorXR.21 1 1
3 115 136 21 EffexorXR.21 2 Feel disconnected disconnected
4 116 136 21 EffexorXR.21 2
6 118 142 22 EffexorXR.22 1 Weight gain gain
7 119 142 22 EffexorXR.22 1 1
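
For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same masking idea. The frame below is an assumption: it rebuilds only two groups from the sample and treats the empty sentence cells as NaN (the ADR column is left out because its exact contents are not clear from the post). It relies on `duplicated()` treating NaN values as equal to each other, so the second null row in each group is flagged as a repeat of the first.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of part of the sample; empty sentences are assumed to be NaN.
df = pd.DataFrame({
    "A": [112, 113, 114, 115, 116, 117],
    "B": [135, 135, 135, 136, 136, 136],
    "C": [21] * 6,
    "D": ["EffexorXR.21"] * 6,
    "R": [1, 1, 1, 2, 2, 2],
    "sentence": ["lack of good feeling.", np.nan, np.nan,
                 "Feel disconnected", np.nan, np.nan],
})

# Columns that define "the same row"; A is excluded because it is unique per row.
key_cols = ["B", "C", "D", "R", "sentence"]

# True for every row that repeats an earlier row on key_cols (NaN == NaN here).
is_repeat = df[key_cols].duplicated()

# Keep the first occurrence of each combination, i.e. the sentence row
# and the first null row of every group.
new_df = df[~is_repeat]
print(new_df)
```

Note that this only works if the empty cells are all the same kind of "empty" (all NaN, or all empty strings). If some are NaN and others are blank strings, they would not count as duplicates of each other, so it may be worth normalizing them first.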