pandas DataFrame conditional change

Problem description:

I'm working with CSV time-series data that records a step count per time frame. The counter wraps around: once it passes 65535, it starts again from 0. However, not every dataset actually reaches 65535 before wrapping (some go from 65530 straight to 5 if several steps were taken within one time frame), so I can't find a good way to handle it such that a 0 following 6553x becomes 65536, and so on.

step    realstep
65531     65531
65533     65533
65534     65534
2         65538
4         65540

I'm trying to compute the real step count so that I can take differences between rows (e.g. steps per minute).

Answer

Find where the counter resets, i.e. where diff is negative, and add the counter size (65536, since counting starts from 0) to every row from that point on. This also works if the counter resets multiple times (I added some extra data to show this):

df['real_step'] = df.step + df.step.diff(1).lt(0).cumsum()*65536

    step  real_step
0  65531      65531
1  65533      65533
2  65534      65534
3      2      65538
4      4      65540
5  65434     130970
6      2     131074
7      4     131076
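
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The helper name unwrap_counter, the COUNTER_SIZE constant and the step_delta column are made up for illustration and are not part of the original answer. It assumes the counter wraps at most once between consecutive rows and that the true count never decreases.

```python
import pandas as pd

# Assumption: a 16-bit counter that holds values 0..65535.
COUNTER_SIZE = 65536


def unwrap_counter(step: pd.Series, size: int = COUNTER_SIZE) -> pd.Series:
    """Rebuild an ever-increasing count from a counter that wraps to 0."""
    # A negative difference marks a row where the counter wrapped around.
    # The first row has no predecessor (NaN diff), which compares as False.
    wrapped = step.diff().lt(0)
    # The running number of wraps, times the counter size, is the offset
    # to add to each row.
    return step + wrapped.cumsum() * size


# Same sample data as in the answer above.
df = pd.DataFrame({"step": [65531, 65533, 65534, 2, 4, 65434, 2, 4]})
df["real_step"] = unwrap_counter(df["step"])

# Steps taken between consecutive rows; the first row is NaN.
df["step_delta"] = df["real_step"].diff()

print(df)
```

If more than 65535 steps can occur between two rows, the wrap can no longer be detected from the sign of the difference alone, so this approach would undercount.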
