Find row indexes based on a sequence of values in a pandas DataFrame column
I have a DataFrame with a column that contains three unique strings. I need to generate a list of the indexes of rows where 'very bad' comes directly after 'good', but not where 'very bad' comes after 'bad'.
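import random

import pandas as pd

df = pd.DataFrame({
    'measure': [random.randint(0, 10) for _ in range(20)],
})
# Label each row from its measure: > 4 is 'good', < 2 is 'very bad', otherwise 'bad'.
df['status'] = df.apply(
    lambda x: 'good' if x['measure'] > 4 else 'very bad' if x['measure'] < 2 else 'bad',
    axis=1)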
    measure    status
0         8      good
1         8      good
2         0  very bad
3         5      good
4         2       bad
5         3       bad
6         9      good
7         9      good
8        10      good
9         5      good
10        1  very bad
11        7      good
12        7      good
13        6      good
14        5      good
15       10      good
16        3       bad
17        0  very bad
18        3       bad
19        5      good
I would like to get this list:
[2,10]
Is there a one-line solution to this?
I don't want to use the numeric values, since they are here only to generate the DataFrame, and I don't want to loop over all rows, since that is computationally expensive for my use case.
If your DataFrame has the default range index, you can use this:
import numpy as np
np.where((df['status'] == 'very bad') & (df['status'].shift() == 'good'))[0]
Output:
array([ 2, 10], dtype=int64)
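As a check, the same mask can be applied to the statuses from the sample output in the question, hard-coded here so the result is reproducible. This is a minimal sketch assuming the default range index; the `.tolist()` call is an addition that turns the array into the plain list the question asks for.

```python
import numpy as np
import pandas as pd

# Statuses copied from the sample output in the question (rows 0..19).
status = ['good', 'good', 'very bad', 'good', 'bad', 'bad', 'good', 'good',
          'good', 'good', 'very bad', 'good', 'good', 'good', 'good', 'good',
          'bad', 'very bad', 'bad', 'good']
df = pd.DataFrame({'status': status})

# True where the row is 'very bad' and the previous row is 'good'.
# shift() moves values down one row, so row 0 is compared against NaN (False).
mask = (df['status'] == 'very bad') & (df['status'].shift() == 'good')

# np.where returns a tuple of arrays; [0] takes the row positions.
positions = np.where(mask)[0].tolist()
print(positions)  # [2, 10]
```

Row 17 is 'very bad' as well, but it follows 'bad', so the shifted comparison excludes it.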
Otherwise, you can use the following:
irow = np.where((df['status'] == 'very bad') & (df['status'].shift() == 'good'))[0]
df.index[irow]
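With a non-default index, it may also be possible to skip the positional step and select labels with the boolean mask directly. The sketch below is an illustration under assumptions: the string labels and the short status column are made up for the example and do not come from the question.

```python
import pandas as pd

df = pd.DataFrame(
    {'status': ['good', 'very bad', 'bad', 'very bad', 'good', 'very bad']},
    index=['a', 'b', 'c', 'd', 'e', 'f'],  # hypothetical non-default labels
)

# Same condition as above: 'very bad' directly preceded by 'good'.
mask = (df['status'] == 'very bad') & (df['status'].shift() == 'good')

# Index the labels with the boolean array and convert to a plain list.
labels = df.index[mask.to_numpy()].tolist()
print(labels)  # ['b', 'f']
```

Label 'd' is left out because its 'very bad' follows 'bad' rather than 'good'.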