Find a specific string in a column and find the max value corresponding to that string
I am wondering:
1.) How do I find a specific string in a column?
2.) Given that string, how would I find its corresponding max?
3.) How do I count the number of strings for each row in that column?
I have a csv file called sports.csv
import pandas as pd
import numpy as np
#loading the data into data frame
X = pd.read_csv('sports.csv')
The two columns of interest are the Total and Gym columns:
Total Gym
40 Football|Baseball|Hockey|Running|Basketball|Swimming|Cycling|Volleyball|Tennis|Ballet
37 Baseball|Tennis
61 Basketball|Baseball|Ballet
12 Swimming|Ballet|Cycling|Basketball|Volleyball|Hockey|Running|Tennis|Baseball|Football
78 Swimming|Basketball
29 Baseball|Tennis|Ballet|Cycling|Basketball|Football|Volleyball|Swimming
31 Tennis
54 Tennis|Football|Ballet|Cycling|Running|Swimming|Baseball|Basketball|Volleyball
33 Baseball|Hockey|Swimming|Cycling
17 Football|Hockey|Volleyball
Notice that the Gym column holds several sports per row, separated by |. I'm trying to find all of the gyms that have Baseball and then find the one with the max Total. However, I'm only interested in gyms that have at least two other sports, i.e. I wouldn't want to consider:
Total Gym
37 Baseball|Tennis
You can do this easily with pandas.
First, split the strings into lists on the pipe (|) delimiter, then keep only the lists with more than 2 entries, since the criterion is Baseball plus two other sports. Rows that fail the test become empty strings.
In [4]: df['Gym'] = df['Gym'].str.split('|').apply(lambda x: ' '.join([i for i in x if len(x)>2]))
In [5]: df
Out[5]:
Total Gym
0 40 Football Baseball Hockey Running Basketball Sw...
1 37
2 61 Basketball Baseball Ballet
3 12 Swimming Ballet Cycling Basketball Volleyball ...
4 78
5 29 Baseball Tennis Ballet Cycling Basketball Foot...
6 31
7 54 Tennis Football Ballet Cycling Running Swimmin...
8 33 Baseball Hockey Swimming Cycling
9 17 Football Hockey Volleyball
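The step above overwrites the Gym column. As an alternative, here is a minimal sketch that leaves Gym untouched and stores the number of sports in a separate column, which also covers the third question. The column name n_sports and the inline sample rows (copied from the question) are assumptions for illustration; the real data would come from sports.csv.

```python
import pandas as pd

# A few sample rows copied from the question, used in place of sports.csv.
df = pd.DataFrame({
    "Total": [40, 37, 61, 78, 31],
    "Gym": [
        "Football|Baseball|Hockey|Running|Basketball|Swimming|Cycling|Volleyball|Tennis|Ballet",
        "Baseball|Tennis",
        "Basketball|Baseball|Ballet",
        "Swimming|Basketball",
        "Tennis",
    ],
})

# Split each string on the pipe and take the length of the resulting list.
# n_sports is a hypothetical column name chosen for this sketch.
df["n_sports"] = df["Gym"].str.split("|").str.len()

# Keep only gyms with at least three sports (one sport plus two others).
enough = df[df["n_sports"] >= 3]
print(enough[["Total", "n_sports"]])
```

Keeping the count in its own column means the original sport names stay available for later filtering.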
Use str.contains to search for the string Baseball in the Gym column.
In [6]: df = df.loc[df['Gym'].str.contains('Baseball')]
In [7]: df
Out[7]:
Total Gym
0 40 Football Baseball Hockey Running Basketball Sw...
2 61 Basketball Baseball Ballet
3 12 Swimming Ballet Cycling Basketball Volleyball ...
5 29 Baseball Tennis Ballet Cycling Basketball Foot...
7 54 Tennis Football Ballet Cycling Running Swimmin...
8 33 Baseball Hockey Swimming Cycling
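One thing to keep in mind is that str.contains does a substring match, so a short search term can match inside a longer sport name. The sketch below contrasts it with an exact match on the split names. It assumes the original pipe-delimited strings, and the sample Series gyms is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data in the question's pipe-delimited format.
gyms = pd.Series(["Basketball|Ballet", "Baseball|Tennis", "Tennis"])

# Substring match: "Ball" is found inside "Ballet", so row 0 matches
# even though no sport is literally named "Ball".
substring_mask = gyms.str.contains("Ball", regex=False)

# Exact match: split into sport names and test membership in each list.
sports = gyms.str.split("|")
exact_ball = sports.apply(lambda names: "Ball" in names)
exact_baseball = sports.apply(lambda names: "Baseball" in names)

print(substring_mask.tolist())
print(exact_ball.tolist())
print(exact_baseball.tolist())
```

For the sport names in the question this makes no difference when searching for Baseball, but the exact form may be safer if the names can overlap.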
Compute the number of sports in each remaining row.
In [8]: df['Count'] = df['Gym'].str.split().str.len()
Finally, select the row of the DataFrame corresponding to the maximum value in the Total column.
In [9]: df.loc[df['Total'].idxmax()]
Out[9]:
Total 61
Gym Basketball Baseball Ballet
Count 3
Name: 2, dtype: object
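The steps above can also be combined into one function that does not modify the DataFrame. This is only a sketch under stated assumptions: best_gym_with and its min_sports parameter are hypothetical names, and the data is typed inline from the question rather than read from sports.csv.

```python
import pandas as pd

# Data copied from the question, used in place of pd.read_csv('sports.csv').
df = pd.DataFrame({
    "Total": [40, 37, 61, 12, 78, 29, 31, 54, 33, 17],
    "Gym": [
        "Football|Baseball|Hockey|Running|Basketball|Swimming|Cycling|Volleyball|Tennis|Ballet",
        "Baseball|Tennis",
        "Basketball|Baseball|Ballet",
        "Swimming|Ballet|Cycling|Basketball|Volleyball|Hockey|Running|Tennis|Baseball|Football",
        "Swimming|Basketball",
        "Baseball|Tennis|Ballet|Cycling|Basketball|Football|Volleyball|Swimming",
        "Tennis",
        "Tennis|Football|Ballet|Cycling|Running|Swimming|Baseball|Basketball|Volleyball",
        "Baseball|Hockey|Swimming|Cycling",
        "Football|Hockey|Volleyball",
    ],
})


def best_gym_with(frame, sport, min_sports=3):
    """Return the row with the highest Total among gyms that offer `sport`
    and list at least `min_sports` sports, or None if no gym qualifies."""
    sports = frame["Gym"].str.split("|")
    has_sport = sports.apply(lambda names: sport in names)
    mask = has_sport & (sports.str.len() >= min_sports)
    candidates = frame[mask]
    if candidates.empty:
        return None
    # idxmax returns the index label of the largest Total.
    return candidates.loc[candidates["Total"].idxmax()]


best = best_gym_with(df, "Baseball")
print(best)
```

The default min_sports=3 encodes "Baseball plus two other sports", so the Baseball|Tennis gym with Total 37 is excluded.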