Filter a dataframe using a dictionary with multiple values
I have a dataframe that consists of three columns. When the user selects the columns and values to filter on, the system creates a dictionary whose keys are column names and whose values are cell values. I want to display the filtered dataframe as a table.
I want to compare the dictionary against the dataframe and display the filtered data.
import pandas as pd
df = pd.DataFrame({
    "source_number": [11199, 11328, 11287, 32345, 12342, 1232, 13456, 123244, 13456],
    "location": ["loc2", "loc1", "loc3", "loc1", "loc2", "loc2", "loc3", "loc2", "loc1"],
    "category": ["cat1", "cat2", "cat1", "cat3", "cat3", "cat3", "cat2", "cat3", "cat2"],
})
# suppose the created dictionary has the value below
sidebars = {"location": ["loc1", "loc2"], "category": ["cat1", "cat3"]}
Expected result:
source_number location category
32345 loc1 cat3
11199 loc2 cat1
12342 loc2 cat3
1232 loc2 cat3
123244 loc2 cat3
Streamlit code:
import numpy as np
import pandas as pd
import streamlit as st
df = pd.DataFrame({
    "source_number": [11199, 11328, 11287, 32345, 12342, 1232, 13456, 123244, 13456],
    "location": ["loc2", "loc1", "loc3", "loc1", "loc2", "loc2", "loc3", "loc2", "loc1"],
    "category": ["cat1", "cat2", "cat1", "cat3", "cat3", "cat3", "cat2", "cat3", "cat2"],
})
is_check = st.checkbox("Display Data")
if is_check:
    st.table(df)
# let the user pick the columns to filter on
columns = st.sidebar.multiselect("Enter the variables", df.columns)
sidebars = {}
for y in columns:
    # offer the unique values of each selected column
    ucolumns = list(df[y].unique())
    sidebars[y] = st.sidebar.multiselect('Filter ' + y, ucolumns)
L = [df[k].isin(v) if isinstance(v, list)
     else df[k].eq(v)
     for k, v in sidebars.items() if k in df.columns]
df = df[np.logical_and.reduce(L)]
st.table(df)
How can I get the expected result? I know that I need to iterate over the dictionary and compare.
After I resolved the comparison between the dictionary and the dataframe based on @jezrael's answer, it still displays the error below at first:
KeyError: True
Traceback:
File "f:\aienv\lib\site-packages\streamlit\script_runner.py", line 333, in _run_script
exec(code, module.__dict__)
File "F:\AIenv\streamlit\app.py", line 326, in <module>
df = df[np.logical_and.reduce(L)]
File "f:\aienv\lib\site-packages\pandas\core\frame.py", line 2902, in __getitem__
indexer = self.columns.get_loc(key)
File "f:\aienv\lib\site-packages\pandas\core\indexes\base.py", line 2893, in get_loc
raise KeyError(key) from err
Use a list comprehension over the dictionary items to select dynamically by column name, testing each column against its list of values with Series.isin, then combine the resulting masks with the np.logical_and.reduce trick:
Note: if isin is used, all values in the dict have to be lists.
df = df[np.logical_and.reduce([df[k].isin(v) for k, v in sidebars.items()])]
print(df)
source_number location category
0 11199 loc2 cat1
3 32345 loc1 cat3
4 12342 loc2 cat3
5 1232 loc2 cat3
7 123244 loc2 cat3
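To make the reduce step less opaque, here is a minimal sketch that keeps the per-column masks in a list before combining them. It reuses the sample data from the question; the names masks, combined and filtered are only illustrative and are not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "source_number": [11199, 11328, 11287, 32345, 12342, 1232, 13456, 123244, 13456],
    "location": ["loc2", "loc1", "loc3", "loc1", "loc2", "loc2", "loc3", "loc2", "loc1"],
    "category": ["cat1", "cat2", "cat1", "cat3", "cat3", "cat3", "cat2", "cat3", "cat2"],
})
sidebars = {"location": ["loc1", "loc2"], "category": ["cat1", "cat3"]}

# One boolean Series per dictionary key: True where the cell is in the allowed list.
masks = [df[k].isin(v) for k, v in sidebars.items()]

# Element-wise AND across all masks; a row survives only if every mask is True.
combined = np.logical_and.reduce(masks)

filtered = df[combined]
print(combined.tolist())
print(filtered)
```

Each extra key in the dictionary adds one more mask, so the same two lines work for any number of filtered columns.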
If the dict may contain either scalars or lists, use if-else in the list comprehension and test the scalars with Series.eq:
# suppose the created dictionary has the value below
sidebars = {"location": ["loc1", "loc2"], "category": "cat3"}
L = [df[k].isin(v) if isinstance(v, list) else df[k].eq(v)
     for k, v in sidebars.items()]
df = df[np.logical_and.reduce(L)]
print(df)
source_number location category
3 32345 loc1 cat3
4 12342 loc2 cat3
5 1232 loc2 cat3
7 123244 loc2 cat3
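An alternative to the if-else branch, sketched here as an assumption rather than as part of the original answer, is to wrap scalars in a list first so that Series.isin can be used for every key. The helper as_list is a hypothetical name introduced only for this example.

```python
import numpy as np
import pandas as pd

def as_list(value):
    """Hypothetical helper: wrap a scalar in a list, pass list-likes through."""
    if isinstance(value, (list, tuple, set)):
        return list(value)
    return [value]

df = pd.DataFrame({
    "source_number": [11199, 11328, 11287, 32345, 12342, 1232, 13456, 123244, 13456],
    "location": ["loc2", "loc1", "loc3", "loc1", "loc2", "loc2", "loc3", "loc2", "loc1"],
    "category": ["cat1", "cat2", "cat1", "cat3", "cat3", "cat3", "cat2", "cat3", "cat2"],
})
# "category" is a scalar here, "location" is a list
sidebars = {"location": ["loc1", "loc2"], "category": "cat3"}

# Normalise every value to a list, then isin works uniformly.
normalised = {k: as_list(v) for k, v in sidebars.items()}
masks = [df[k].isin(v) for k, v in normalised.items()]
filtered = df[np.logical_and.reduce(masks)]
print(filtered)
```

Normalising up front keeps the filtering line free of type checks, at the cost of one extra pass over the dictionary.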
If some keys of the dict may not match any column, they can be skipped (the dataframe is then simply not filtered by the unmatched keys):
# if all values are lists
L = [df[k].isin(v) for k, v in sidebars.items() if k in df.columns]
# if values may be lists or scalars
L = [df[k].isin(v) if isinstance(v, list)
     else df[k].eq(v)
     for k, v in sidebars.items() if k in df.columns]
df = df[np.logical_and.reduce(L)]
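As a small illustration of the key check, the sketch below filters with a dictionary that contains one key with no matching column. The key "region" and the list skipped are assumptions made up for this example; they do not appear in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "source_number": [11199, 11328, 11287, 32345, 12342, 1232, 13456, 123244, 13456],
    "location": ["loc2", "loc1", "loc3", "loc1", "loc2", "loc2", "loc3", "loc2", "loc1"],
    "category": ["cat1", "cat2", "cat1", "cat3", "cat3", "cat3", "cat2", "cat3", "cat2"],
})
# "region" is a hypothetical key that is not a column of df
sidebars = {"location": ["loc1"], "region": ["north"]}

# Record which keys are ignored, so a silent skip is at least visible.
skipped = [k for k in sidebars if k not in df.columns]

# Build masks only for keys that exist as columns.
masks = [df[k].isin(v) for k, v in sidebars.items() if k in df.columns]
filtered = df[np.logical_and.reduce(masks)]

print(skipped)
print(filtered)
```

Only the location filter is applied here; without the `if k in df.columns` guard, `df["region"]` would raise a KeyError.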
This is my first time writing Streamlit code, so there may be better solutions. The problem here is an empty dictionary: np.logical_and.reduce of an empty list returns the scalar True, so df[True] is looked up as a column name and raises KeyError: True.
So check for it with if bool(sidebars):
is_check = st.checkbox("Display Data")
if is_check:
    st.table(df)
columns = st.sidebar.multiselect("Enter the variables", df.columns)
sidebars = {}
for y in columns:
    ucolumns = list(df[y].unique())
    print(ucolumns)
    sidebars[y] = st.sidebar.multiselect('Filter ' + y, ucolumns)
# filter only once at least one column has been selected
if bool(sidebars):
    L = [df[k].isin(v) if isinstance(v, list)
         else df[k].eq(v)
         for k, v in sidebars.items() if k in df.columns]
    df1 = df[np.logical_and.reduce(L)]
    st.table(df1)
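The empty-dictionary case can be reproduced without Streamlit. The following is a minimal sketch, assuming the filtering is moved into a function; filter_by_dict is a hypothetical name, not something from the question or from pandas. It returns the dataframe unchanged when there is nothing to filter on.

```python
import numpy as np
import pandas as pd

def filter_by_dict(df, filters):
    """Hypothetical helper: keep rows matching every key in `filters`."""
    masks = [df[k].isin(v) if isinstance(v, list) else df[k].eq(v)
             for k, v in filters.items() if k in df.columns]
    if not masks:
        # Nothing to filter on; reducing an empty list would give a scalar.
        return df
    return df[np.logical_and.reduce(masks)]

df = pd.DataFrame({
    "source_number": [11199, 11328, 11287, 32345],
    "location": ["loc2", "loc1", "loc3", "loc1"],
    "category": ["cat1", "cat2", "cat1", "cat3"],
})

# With no masks the reduction yields the identity of logical AND: a scalar.
empty_reduce = np.logical_and.reduce([])
print(empty_reduce)

unfiltered = filter_by_dict(df, {})
filtered = filter_by_dict(df, {"location": ["loc1"], "category": "cat3"})
print(filtered)
```

Checking the list of masks rather than the dictionary itself also covers a dictionary whose keys all fail the column check.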