Replace empty lists with NaN in a pandas DataFrame
I'm trying to replace some empty lists in my data with NaN values. But how do I represent an empty list in the expression?
import numpy as np
import pandas as pd
d = pd.DataFrame({'x' : [[1,2,3], [1,2], ["text"], []], 'y' : [1,2,3,4]})
d
x y
0 [1, 2, 3] 1
1 [1, 2] 2
2 [text] 3
3 [] 4
d.loc[d['x'] == [],['x']] = d.loc[d['x'] == [],'x'].apply(lambda x: np.nan)
d
ValueError: Arrays were different lengths: 4 vs 0
I also want to select the row containing [text] by using d[d['x'] == ["text"]], but that fails with ValueError: Arrays were different lengths: 4 vs 1, whereas selecting 3 by using d[d['y'] == 3] works correctly. Why?
If you wish to replace the empty lists in column x with NumPy NaNs, you can do the following:
d.x = d.x.apply(lambda y: np.nan if len(y)==0 else y)
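A variation on the same idea, as a minimal sketch: build a boolean mask one cell at a time and assign NaN through .loc, which is closer to what the question was attempting. The name is_empty is only illustrative, and the isinstance check is an added assumption so the code can be re-run after some cells have already become NaN (len() fails on a float).

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({'x': [[1, 2, 3], [1, 2], ["text"], []], 'y': [1, 2, 3, 4]})

# Test each cell individually instead of comparing the whole column to [].
# The isinstance guard skips cells that are not lists (e.g. NaN from an earlier run).
is_empty = d['x'].apply(lambda v: isinstance(v, list) and len(v) == 0)

# Assign NaN only where the mask is True; the other cells keep their lists.
d.loc[is_empty, 'x'] = np.nan
```

After this, d['x'].isna() should be True only for the row that held the empty list.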
If you want to subset the DataFrame to the rows where x equals ['text'], try the following:
d[[y==['text'] for y in d.x]]
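As for the "why": d['y'] == 3 compares a scalar against every element, whereas a list on the right-hand side is treated as a sequence to be matched position by position against the column, so its length has to equal the column's length. That is where the "4 vs 1" and "4 vs 0" messages come from. The sketch below does the same selection as the list comprehension above, but with Series.apply; the names mask and selected are only illustrative.

```python
import pandas as pd

d = pd.DataFrame({'x': [[1, 2, 3], [1, 2], ["text"], []], 'y': [1, 2, 3, 4]})

# Compare each cell to the target list on its own, giving one boolean per row.
mask = d['x'].apply(lambda v: v == ['text'])

# Boolean indexing keeps only the rows where the mask is True.
selected = d[mask]
```

Because the comparison happens inside the lambda, Python's ordinary list equality is used and pandas never tries to align the target list with the column.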
I hope this helps.