How to filter a DataFrame with a dynamic string using Python Pandas
Problem description:
DataFrame
  PROJECT CLUSTER_x MARKET_x CLUSTER_y MARKET_y Exist
0     P17         A    CHINA         C    CHINA  both
1     P18         P    INDIA         P    INDIA  both
2     P16         P  AMERICA         P  AMERICA  both
3     P19         P    INDIA         P    JAPAN  both
The code below works as expected and returns the rows at index 0 and 3:
df_mismatched = df_common[ (df_common['MARKET_x'] != df_common['MARKET_y']) | (df_common['CLUSTER_x'] != df_common['CLUSTER_y']) ]
How can we build such filter criteria dynamically? Something like the code below, so that the condition does not have to be hardcoded next time:
str_common = '(df_common["MARKET_x"] != df_common["MARKET_y"]) | (df_common["CLUSTER_x"] != df_common["CLUSTER_y"])'
df_mismatched = df_common[str_common]
Answer
To build the filter dynamically, you can pass the condition as a string to DataFrame.query, like this:
con = "(MARKET_x!=MARKET_y)|(CLUSTER_x!=CLUSTER_y)"
print(df_common.query(con))
  PROJECT CLUSTER_x MARKET_x CLUSTER_y MARKET_y Exist
0     P17         A    CHINA         C    CHINA  both
3     P19         P    INDIA         P    JAPAN  both
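If the condition string itself should not be hardcoded either, one option is to generate it from a list of column prefixes. The following is a minimal sketch, assuming the compared columns follow the _x / _y suffix pattern shown in the question. The names compare_cols and build_mismatch_query are illustrative helpers, not part of pandas.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df_common = pd.DataFrame({
    "PROJECT":   ["P17", "P18", "P16", "P19"],
    "CLUSTER_x": ["A", "P", "P", "P"],
    "MARKET_x":  ["CHINA", "INDIA", "AMERICA", "INDIA"],
    "CLUSTER_y": ["C", "P", "P", "P"],
    "MARKET_y":  ["CHINA", "INDIA", "AMERICA", "JAPAN"],
    "Exist":     ["both"] * 4,
})


def build_mismatch_query(compare_cols):
    """Return a query string that is true where any _x/_y pair differs.

    Hypothetical helper: assumes every prefix has matching
    '<prefix>_x' and '<prefix>_y' columns with identifier-safe names.
    """
    return " | ".join(f"({col}_x != {col}_y)" for col in compare_cols)


# Illustrative list; in practice it could come from a config or be derived
# from the column names.
compare_cols = ["MARKET", "CLUSTER"]

con = build_mismatch_query(compare_cols)
df_mismatched = df_common.query(con)

print(con)
print(df_mismatched)
```

Adding another pair of columns to the comparison then only requires adding its prefix to the list.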
Remember that if the column names contain spaces or special characters, a query string written this way fails to produce the right results.
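For column names that are not valid Python identifiers, one way around the limitation is to skip the string and build the boolean mask directly from a list of column pairs. The sketch below assumes hypothetical column names containing spaces. It also shows backtick quoting inside the query string, which recent pandas versions accept (0.25 and later, as far as I know), so treat that second part as version dependent.

```python
import pandas as pd

# Hypothetical frame whose column names contain spaces.
df = pd.DataFrame({
    "MARKET x":  ["CHINA", "INDIA", "AMERICA", "INDIA"],
    "MARKET y":  ["CHINA", "INDIA", "AMERICA", "JAPAN"],
    "CLUSTER x": ["A", "P", "P", "P"],
    "CLUSTER y": ["C", "P", "P", "P"],
})

# Pairs of columns to compare; illustrative, not taken from the question.
pairs = [("MARKET x", "MARKET y"), ("CLUSTER x", "CLUSTER y")]

# Option 1: start with an all-False mask and OR in each pairwise comparison.
mask = pd.Series(False, index=df.index)
for left, right in pairs:
    mask |= df[left] != df[right]
df_mismatched = df[mask]

# Option 2: keep using query, quoting each column name in backticks.
con = " | ".join(f"(`{left}` != `{right}`)" for left, right in pairs)
df_via_query = df.query(con)

print(df_mismatched)
print(df_via_query)
```

The mask version does not depend on how the columns are named, which makes it the safer choice when the names are not under your control.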