Comparing logical values with NaN in pandas/numpy
I want to do an element-wise OR operation on two pandas Series of boolean values. Some of the values are np.nan.
I have tried three approaches and realized that, depending on the approach, the expression "np.nan or False" can evaluate to True, False, or np.nan.
These are my example series:
import numpy as np
import pandas as pd
series_1 = pd.Series([True, False, np.nan])  # mixed values, so dtype is object
series_2 = pd.Series([False, False, False])
Approach #1
Using the pandas | operator:
In [5]: series_1 | series_2
Out[5]:
0 True
1 False
2 False
dtype: bool
Approach #2
Using numpy's logical_or function:
In [6]: np.logical_or(series_1, series_2)
Out[6]:
0 True
1 False
2 NaN
dtype: object
Approach #3
I define a vectorized version of logical_or, which is supposed to be evaluated row by row over the arrays:
@np.vectorize
def vectorized_or(a, b):
    return np.logical_or(a, b)
I use vectorized_or on the two series and convert its output (a numpy array) into a pandas Series:
In [8]: pd.Series(vectorized_or(series_1, series_2))
Out[8]:
0 True
1 False
2 True
dtype: bool
Question
I am wondering about the reasons for these results.
This answer explains np.logical_or and says that np.logical_or(np.nan, False) is True, but why does this only work when vectorized and not in Approach #2? And how can the results of Approach #1 be explained?
First difference: | is np.bitwise_or, whereas Approach #2 uses np.logical_or. That explains the difference between #1 and #2.
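A small sketch of the bitwise side, using plain numpy arrays as an assumed stand-in for the Series' values: on boolean arrays `|` and np.bitwise_or agree, but on an object array a bitwise OR calls Python's `|` per element, and `float | bool` is undefined, so the NaN row cannot be OR-ed at all. Out[5] shows pandas returning False for that row, i.e. it ends up treating the missing value as False; this sketch does not reproduce pandas' own handling.

```python
import numpy as np

# Homogeneous boolean arrays: `|` dispatches to np.bitwise_or.
a = np.array([True, False, False])
b = np.array([False, False, False])
same_as_bitwise = bool(np.array_equal(a | b, np.bitwise_or(a, b)))

# Object arrays holding NaN: bitwise OR applies Python's `|` element by
# element, and `nan | False` raises TypeError (float has no `|`).
left = np.array([True, False, np.nan], dtype=object)
right = np.array([False, False, False], dtype=object)
try:
    _ = left | right
    bitwise_raised = False
except TypeError:
    bitwise_raised = True

print(same_as_bitwise, bitwise_raised)  # True True
```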
Second difference: since series_1.dtype is object (non-homogeneous data), the operations in the first two cases are done row by row.
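The effect of object dtype on Approach #2 can be sketched with numpy alone (object arrays are an assumed stand-in for the Series). As far as we can tell, numpy's object loop for logical_or returns whichever operand decides the result, like Python's `or`, so the NaN comes back unchanged, matching Out[6]. For contrast, letting numpy build a homogeneous float array from the same list makes NaN a nonzero float, and the result becomes True.

```python
import math

import numpy as np

# Object dtype: row-by-row OR that returns an operand, so NaN survives.
left_obj = np.array([True, False, np.nan], dtype=object)
right_obj = np.array([False, False, False], dtype=object)
obj_result = np.logical_or(left_obj, right_obj)
print(obj_result.dtype)  # object

# Without dtype=object, numpy converts the list to float64 (1.0, 0.0, nan);
# the float loop treats NaN as nonzero, so the last row is True.
left_float = np.array([True, False, np.nan])
float_result = np.logical_or(left_float, [False, False, False])
print(left_float.dtype, float_result.tolist())  # float64 [True, False, True]
```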
When using vectorize (#3):
"The data type of the output of vectorized is determined by calling the function with the first element of the input. This can be avoided by specifying the otypes argument."
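A minimal sketch of that rule, using a hypothetical helper py_or (plain Python `or`, which returns an operand) instead of the question's function so that the output-type conversion is visible. Without otypes, the first call py_or(True, False) returns a bool, so every result is cast to bool and the NaN from the third row becomes True; with otypes=[object] the raw NaN is kept.

```python
import math

import numpy as np


def py_or(a, b):
    # Hypothetical helper: Python's `or`, which returns one of its operands.
    return a or b


left = np.array([True, False, np.nan], dtype=object)
right = np.array([False, False, False], dtype=object)

# No otypes: output type inferred from py_or(True, False) -> bool,
# so the NaN returned for row 2 is cast to bool, i.e. True.
inferred = np.vectorize(py_or)(left, right)

# otypes=[object]: results are kept as returned, so the NaN survives.
explicit = np.vectorize(py_or, otypes=[object])(left, right)

print(inferred.dtype, inferred.tolist())  # bool [True, False, True]
print(explicit.dtype)  # object
```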
With vectorize you leave object mode: the data are converted according to the type of the first element (bool here, and bool(nan) is True), and only then is the operation applied.
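The "convert first, then operate" reading can be modelled in a few lines. This is a sketch of the outcome only, not numpy's internal code path, and again uses object arrays as an assumed stand-in for the Series: casting the object values to bool turns NaN into True, after which an ordinary boolean OR gives the same values as Out[8].

```python
import numpy as np

# Object-dtype values as in the question's series_1 / series_2.
left = np.array([True, False, np.nan], dtype=object)
right = np.array([False, False, False], dtype=object)

# Step 1: convert to bool; bool(nan) is True, so row 2 becomes True.
left_bool = left.astype(bool)
right_bool = right.astype(bool)

# Step 2: a plain boolean OR on homogeneous data.
result = np.logical_or(left_bool, right_bool)
print(result.tolist())  # [True, False, True]
```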