Python pandas COUNTIFS using multiple criteria and multiple data frames

Question:

I'm trying to create, in Python with multiple data frames, the equivalent of an Excel COUNTIFS that spans multiple sheets.

I need a new column holding a count of records in another data frame, based on criteria from the current data frame.

See the Excel impression of what I want to do in Python, also here.

My goal?

Count exams on students data frame
by Student ID
with exam date >= enroll date
with exam date <= detail date
with exam grade >= 70

Basically the Excel equivalent would be...

=COUNTIFS(Summary!$B$1:$B$11, ">="&Detail!B2, Summary!$B$1:$B$11, "<="&Detail!C2, Summary!$C$1:$C$11, ">="&70, Summary!$A$1:$A$11, "="&Detail!A2)

...where Summary is the primary data frame and Detail is the secondary data frame where I want to count records.

Found these answers in my research:

sumifs function in python
What is a good way to do countif in Python
Python Pandas counting and summing specific conditions

These aren't quite what I'm looking for, because they don't span multiple data frames. I was able to create a basic countifs for a single data frame:

sum(1 for x in students['Student ID'] if x == 1)
sum(1 for x in exams['Exam Grade'] if x >= 70)
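
For comparison, the same single-frame counts can be written with pandas boolean masks, which avoids the Python-level loop. This is a minimal sketch, assuming small made-up `students` and `exams` frames; the data is hypothetical and only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
students = pd.DataFrame({"Student ID": [1, 2, 1, 3]})
exams = pd.DataFrame({"Exam Grade": [65, 70, 88, 92]})

# A comparison yields a boolean Series; summing it counts the True values.
n_student_1 = int((students["Student ID"] == 1).sum())
n_passing = int((exams["Exam Grade"] >= 70).sum())
```

Several conditions on the same frame can be combined with `&`, wrapping each comparison in parentheses.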

Answer:

Basically, what you'll want to do is set up two dataframes: say, df1 for the "exams passed" information and df2 for the marks on each exam.

To get started, you can read in your Excel files like this:

import pandas as pd

df1 = pd.read_excel('filename1.xlsx')
df2 = pd.read_excel('filename2.xlsx')

Then, for each row in df1, you want to filter df2 down to the matching rows and get the length of the filtered dataframe.

First, though, you might want to make a list of the information for each row in df1, which could be done like this:

student_info = df1[['Student ID', 'Enrollment Date', 'Qualification Date']].values

Then you can iterate through the rows like this:

N_exams_passed = []  # Store counts for each student in a list

for s_id, s_enroll, s_qual in student_info:
    # Boolean mask of df2 rows meeting all four criteria for this student
    passed = ((df2['Student ID'] == s_id) &
              (df2['Exam Date'] >= s_enroll) &
              (df2['Exam Date'] <= s_qual) &
              (df2['Grade'] >= 70))
    N_exams_passed.append(int(passed.sum()))

Then add/replace the column in df1:

df1['Exams Passed'] = N_exams_passed
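
If the row-by-row loop becomes slow on larger data, one alternative (not the approach above, but a different technique) is to merge the two frames on Student ID and apply all the criteria in one vectorized filter. The sketch below assumes Student ID is unique in df1 and that the date columns are already datetimes; the sample data is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; column names follow the code above.
df1 = pd.DataFrame({
    "Student ID": [1, 2, 3],
    "Enrollment Date": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
    "Qualification Date": pd.to_datetime(["2020-06-30", "2020-06-30", "2020-06-30"]),
})
df2 = pd.DataFrame({
    "Student ID": [1, 1, 1, 2, 2],
    "Exam Date": pd.to_datetime(
        ["2019-12-15", "2020-03-01", "2020-05-01", "2020-03-15", "2020-07-15"]
    ),
    "Grade": [90, 75, 60, 80, 95],
})

# Pair every exam with its student's date window, then apply all criteria at once.
merged = df2.merge(df1, on="Student ID", how="inner")
passed = merged[
    (merged["Exam Date"] >= merged["Enrollment Date"])
    & (merged["Exam Date"] <= merged["Qualification Date"])
    & (merged["Grade"] >= 70)
]

# Count qualifying exams per student; students with none get 0.
counts = passed.groupby("Student ID").size()
df1["Exams Passed"] = df1["Student ID"].map(counts).fillna(0).astype(int)
```

With this toy data, students 1 and 2 each have one exam inside their window with a grade of at least 70, and student 3 has none. The merge creates one row per exam, so memory use grows with the size of df2 rather than with the number of students.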

To compare the dates properly, you will need to convert them to datetime objects in each pandas dataframe; I will leave this up to you. Hint: you can use the pd.to_datetime() function.
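
As a rough illustration of that hint, the conversion might look like the sketch below. The sample frame and its values are hypothetical, and `errors="coerce"` is one possible choice for handling bad input, not something the answer prescribes.

```python
import pandas as pd

# Hypothetical raw data where the dates arrived as strings.
df2 = pd.DataFrame({
    "Student ID": [1, 2],
    "Exam Date": ["2020-03-01", "not a date"],
    "Grade": [75, 80],
})

# errors="coerce" turns unparseable values into NaT instead of raising.
df2["Exam Date"] = pd.to_datetime(df2["Exam Date"], errors="coerce")
```

Comparisons such as `>=` and `<=` against NaT evaluate to False, so rows with unparseable dates are simply never counted. The same call would be applied to the date columns of df1.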
