Replace empty lists with NaN in a pandas DataFrame

Question:

I'm trying to replace some empty lists in my data with NaN values. But how do I represent an empty list in the expression?

import numpy as np
import pandas as pd
d = pd.DataFrame({'x' : [[1,2,3], [1,2], ["text"], []], 'y' : [1,2,3,4]})
d

    x           y
0   [1, 2, 3]   1
1   [1, 2]      2
2   [text]      3
3   []          4



d.loc[d['x'] == [],['x']] = d.loc[d['x'] == [],'x'].apply(lambda x: np.nan)
d

ValueError: Arrays were different lengths: 4 vs 0

Also, when I try to select the [text] row with d[d['x'] == ["text"]], I get ValueError: Arrays were different lengths: 4 vs 1, yet selecting the row with 3 via d[d['y'] == 3] works. Why?

Answer:

If you want to replace the empty lists in column x with NumPy NaN, you can do the following:

d['x'] = d['x'].apply(lambda v: np.nan if len(v) == 0 else v)
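A variant of the same idea, as a minimal sketch: build a boolean mask of the empty lists and assign NaN only to those cells with .loc. The mask name is_empty_list is just illustrative, and the isinstance check assumes the column might also hold non-list values (for example NaN left over from an earlier run), where len() would otherwise raise.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({'x': [[1, 2, 3], [1, 2], ["text"], []], 'y': [1, 2, 3, 4]})

# True only for cells that are lists of length 0; non-list cells
# (e.g. an existing NaN float) evaluate to False instead of raising TypeError.
is_empty_list = d['x'].map(lambda v: isinstance(v, list) and len(v) == 0)

# Overwrite only the masked cells; the non-empty lists are left untouched.
d.loc[is_empty_list, 'x'] = np.nan
print(d)
```

Because the mask is computed per cell, this never compares the whole column against a list, so the length error from the question does not arise.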

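As for the error: pandas treats a list on the right-hand side of == as a sequence of values to compare element-wise, so its length must equal the number of rows (4 here, while ["text"] has length 1 and [] has length 0, hence the errors). A scalar such as 3 is instead broadcast to every row, which is why d[d['y'] == 3] works. If you want to subset the DataFrame to rows equal to ['text'], compare each cell in Python instead: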

d[[y==['text'] for y in d.x]]
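To see the length rule directly, here is a small sketch against the same example frame. The exact error message depends on the pandas version (recent versions say "Lengths must match to compare" rather than the message in the question), but the exception is a ValueError either way. The variable names are illustrative.

```python
import pandas as pd

d = pd.DataFrame({'x': [[1, 2, 3], [1, 2], ["text"], []], 'y': [1, 2, 3, 4]})

# A scalar on the right-hand side is broadcast to every row.
by_scalar = d[d['y'] == 3]

# A list on the right-hand side is treated as values to compare element-wise,
# so its length (1) must match the column length (4); it does not, so it raises.
try:
    d['x'] == ["text"]
    list_compare_error = None
except ValueError as exc:
    list_compare_error = exc

# Comparing cell by cell avoids the length check: each cell is a list,
# and Python's list == list compares contents.
mask = d['x'].apply(lambda v: v == ["text"])
by_list = d[mask]

print(by_scalar.index.tolist())
print(by_list.index.tolist())
print(type(list_compare_error).__name__)
```

Both selections pick out row 2; only the list-on-the-right comparison fails.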

I hope this helps.