How to group by multiple columns
Problem description:
I want to group my DataFrame by UserId, Date and category. For each group I want the frequency of use per day (the number of rows), the maximum duration, and the part of the day when the category is used most. Finally, I want to store the result in a .csv file.
Sample data:
name       duration  UserId  category       part_of_day  Date
Settings      3.436       1  System tool    evening      2020-09-10
Calendar      2.167       1  Calendar       night        2020-09-11
Calendar      5.705       1  Calendar       night        2020-09-11
Messages      7.907       1  Phone_and_SMS  night        2020-09-11
Instagram    50.285       9  Social         night        2020-09-28
Drive        30.260       9  Productivity   night        2020-09-28
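To make the question reproducible, the six sample rows can be loaded into a DataFrame like this. A minimal sketch that keeps Date as plain strings, as shown in the table (an assumption; convert with pd.to_datetime if real dates are needed):

```python
import pandas as pd

# The six sample rows from the question, one list per column
df = pd.DataFrame({
    "name": ["Settings", "Calendar", "Calendar", "Messages", "Instagram", "Drive"],
    "duration": [3.436, 2.167, 5.705, 7.907, 50.285, 30.260],
    "UserId": [1, 1, 1, 1, 9, 9],
    "category": ["System tool", "Calendar", "Calendar", "Phone_and_SMS",
                 "Social", "Productivity"],
    "part_of_day": ["evening", "night", "night", "night", "night", "night"],
    # Dates kept as strings (assumption); use pd.to_datetime for real dates
    "Date": ["2020-09-10", "2020-09-11", "2020-09-11", "2020-09-11",
             "2020-09-28", "2020-09-28"],
})

print(df)
```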
My current code:
df.groupby(["UserId", "Date", "category"])["category"].count()
On my full data, it returns:
UserId  Date        category
1       2020-09-10  System tool          1
        2020-09-11  Calendar             8
                    Clock                2
                    Communication       86
                    Health & Fitness     5
But I want this result:
UserId  Date        category      count(category)  max-duration
1       2020-09-10  System tool                 1             3
        2020-09-11  Calendar                    2             5
9       2020-09-28  Social                      1            50
                    Productivity                1            30
How can I do that? None of the solutions I found produce the result I want.
Answer
It looks like you want to calculate several statistics for each group at once. Passing a dict to agg lets you apply a different aggregation to each column:
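# Group by user, day and category
grouped = df.groupby(["UserId", "Date", "category"])
# Count the rows in each group and take the longest single duration
result = grouped.agg({'category': 'count', 'duration': 'max'})
# Rename the aggregated columns (same order as the dict above); this also
# avoids a clash with the 'category' index level when reset_index() runs
result.columns = ['group_count', 'duration_max']
# Turn the group keys back into ordinary columns
result = result.reset_index()
result  # displays the table in Jupyter/IPython; use print(result) in a script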
   UserId        Date       category  group_count  duration_max
0       1  2020-09-10    System tool            1         3.436
1       1  2020-09-11       Calendar            2         5.705
2       1  2020-09-11  Phone_and_SMS            1         7.907
3       9  2020-09-28   Productivity            1        30.260
4       9  2020-09-28         Social            1        50.285
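The answer above covers the count and the maximum duration, but the question also asks for the part of the day when each category is used most and for saving the result to a .csv file. A minimal sketch of both, assuming pandas 0.25 or later (for named aggregation). "Most used" is interpreted here as the most frequent part_of_day value in each group, with ties going to the value that sorts first; the helper most_common and the filename usage_summary.csv are hypothetical:

```python
import pandas as pd

# The six sample rows from the question
df = pd.DataFrame({
    "name": ["Settings", "Calendar", "Calendar", "Messages", "Instagram", "Drive"],
    "duration": [3.436, 2.167, 5.705, 7.907, 50.285, 30.260],
    "UserId": [1, 1, 1, 1, 9, 9],
    "category": ["System tool", "Calendar", "Calendar", "Phone_and_SMS",
                 "Social", "Productivity"],
    "part_of_day": ["evening", "night", "night", "night", "night", "night"],
    "Date": ["2020-09-10", "2020-09-11", "2020-09-11", "2020-09-11",
             "2020-09-28", "2020-09-28"],
})


def most_common(values: pd.Series) -> str:
    """Hypothetical helper: most frequent value; mode() sorts ties, first wins."""
    return values.mode().iloc[0]


result = (
    df.groupby(["UserId", "Date", "category"])
    .agg(
        group_count=("duration", "size"),              # rows per group
        duration_max=("duration", "max"),              # longest single usage
        top_part_of_day=("part_of_day", most_common),  # most frequent part of day
    )
    .reset_index()  # group keys become ordinary columns again
)
print(result)

# Save without the integer index (hypothetical filename)
result.to_csv("usage_summary.csv", index=False)
```

Because group_count uses "size", it counts rows even when duration is missing; switch it to "count" if only non-null durations should be counted.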