Filter a DataFrame using a dictionary with multiple values per key

Question:

I've spent a few hours searching for an answer here, but I can't get any of them to work in my particular case. The closest I could find was this: Apply multiple string containment filters to pandas dataframe using dictionary

I have a pd.DataFrame of deal prices with the following columns:

df1 = database[['DealID',
         'Price',
         'Attribute A',
         'Attribute B',
         'Attribute C']]

The attributes are categorised into the following:

filter_options = {
    'Attribute A': ["A1","A2","A3","A4"],
    'Attribute B': ["B1","B2","B3","B4"],
    'Attribute C': ["C1","C2","C3"],
}

I want to filter df1 using a subset of filter_options which has multiple values per key:

filter = {
    'Attribute A': ["A1","A2"],
    'Attribute B': ["B1"],
    'Attribute C': ["C1","C3"],
}

The below works fine when there is only one value per key in the dictionary.

df_filtered = df1.loc[(df1[list(filter)] == pd.Series(filter)).all(axis=1)]
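For reference, here is a minimal runnable sketch of that single-value approach. The sample rows and the dictionary name single are hypothetical, chosen only to illustrate; the comparison aligns the Series index with the selected columns, so each key's value is checked against its own column.

```python
import pandas as pd

# Hypothetical sample data (not from the question) with the same column layout
df1 = pd.DataFrame({
    "DealID": [1, 2, 3],
    "Price": [140, 250, 180],
    "Attribute A": ["A1", "A2", "A1"],
    "Attribute B": ["B1", "B2", "B1"],
    "Attribute C": ["C1", "C2", "C3"],
})

# Hypothetical filter with exactly one value per key
single = {"Attribute A": "A1", "Attribute B": "B1"}

# The Series index matches the selected columns, so each column is compared
# with its own scalar; .all(axis=1) keeps rows where every comparison is True
df_filtered = df1.loc[(df1[list(single)] == pd.Series(single)).all(axis=1)]
print(df_filtered)
```

This only works because each key maps to a single scalar; with a list per key, the equality test no longer expresses "value is one of these".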

However, can I get the same outcome with multiple values per key?

Thanks!

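I believe you need to rename the variable filter, because it shadows Python's built-in filter() function (it is not a reserved word, but reusing the name hides the built-in). Then use a list comprehension with isin and concat to build a boolean mask: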

import pandas as pd

df1 = pd.DataFrame({'Attribute A':["A1","A2"],
                    'Attribute B':["B1","B2"],
                    'Attribute C':["C1","C2"],
                    'Price':[140,250]})

filt = {
    'Attribute A': ["A1","A2"],
    'Attribute B': ["B1"],
    'Attribute C': ["C1","C3"],
}

print (df1[list(filt)])
  Attribute A Attribute B Attribute C
0          A1          B1          C1
1          A2          B2          C2

mask = pd.concat([df1[k].isin(v) for k, v in filt.items()], axis=1).all(axis=1)
print (mask)
0     True
1    False
dtype: bool

df_filtered = df1[mask]
print (df_filtered)
  Attribute A Attribute B Attribute C  Price
0          A1          B1          C1    140
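As a possibly shorter alternative (not part of the original answer), DataFrame.isin also accepts a dict that maps column names to allowed values, which can replace the concat step. Columns of the frame that are missing from the dict come back all False, so the sketch below selects only the filtered columns first. It reuses the same sample data:

```python
import pandas as pd

df1 = pd.DataFrame({'Attribute A': ["A1", "A2"],
                    'Attribute B': ["B1", "B2"],
                    'Attribute C': ["C1", "C2"],
                    'Price': [140, 250]})

filt = {
    'Attribute A': ["A1", "A2"],
    'Attribute B': ["B1"],
    'Attribute C': ["C1", "C3"],
}

# isin(dict) checks each column against its own list of allowed values;
# restricting to list(filt) avoids all-False columns for unfiltered fields
mask = df1[list(filt)].isin(filt).all(axis=1)
df_filtered = df1[mask]
print(df_filtered)
```

With either version, a key mapped to an empty list matches no rows, since isin([]) is False everywhere.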