Pandas: select DataFrame columns using a boolean
I want to use a boolean to select the columns with more than 4000 entries from a DataFrame, comb,
which has over 1,000 columns. This expression gives me a boolean (True/False) result:
criteria = comb.ix[:,'c_0327':].count()>4000
I want to use it to select only the True columns into a new DataFrame.
The following just gives me "Unalignable boolean Series key provided":
comb.loc[criteria,]
I also tried:
comb.ix[:, comb.ix[:,'c_0327':].count()>4000]
This is similar to the answer to the question "dataframe boolean selection along columns instead of row", but it gives me the same error: "Unalignable boolean Series key provided"
comb.ix[:,'c_0327':].count()>4000
yields:
c_0327 False
c_0328 False
c_0329 False
c_0330 False
c_0331 False
c_0332 False
c_0333 False
c_0334 False
c_0335 False
c_0336 False
c_0337 True
c_0338 False
.....
What is returned is a Series with the column names as the index and the boolean values as the values.
I think what you actually want is the following, which should work:
comb[criteria.index[criteria]]
Basically, this masks the index values of criteria with its own boolean values, which returns an array of the column names where the condition is True. We can then use that array to select the columns of interest from the original df.
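To see the masking step on something small, here is a minimal sketch. The toy frame, its values, the extra "id" column, and the threshold of 2 (standing in for 4000) are all made up for illustration. It uses .loc rather than .ix, since .ix has been removed from pandas as of version 1.0.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `comb`: one leading column plus three data columns.
comb = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "c_0327": [1.0, 2.0, np.nan, 4.0],        # 3 non-null entries
    "c_0328": [1.0, 2.0, 3.0, 4.0],           # 4 non-null entries
    "c_0329": [np.nan, np.nan, np.nan, 4.0],  # 1 non-null entry
})

# Count non-null entries per column from 'c_0327' onward.
# The threshold of 2 is an illustrative substitute for 4000.
criteria = comb.loc[:, "c_0327":].count() > 2

# Masking the index of `criteria` with its own boolean values
# leaves only the column labels where the condition is True.
selected_columns = criteria.index[criteria]

# Plain label-based selection; no boolean alignment against rows or columns.
result = comb[selected_columns]
print(result)
```

Because criteria is indexed by column names and, here, covers only some of the columns, selecting by the resulting labels avoids having to align the boolean Series against the frame's axes, which appears to be what triggers the "Unalignable" error in the attempts above.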