Read the previous and next observations

Problem description:

I have a dataset like this (sp is an indicator):

datetime        sp
ddmmyy:10:30:00 N
ddmmyy:10:31:00 N
ddmmyy:10:32:00 Y
ddmmyy:10:33:00 N
ddmmyy:10:34:00 N

I would like to extract the observations with "Y", together with the observation immediately before and immediately after each one:

ID              sp
ddmmyy:10:31:00 N
ddmmyy:10:32:00 Y
ddmmyy:10:33:00 N

I tried to use "lag" and successfully extracted the observations with "Y" and the next one, but I still have no idea how to extract the previous one.

Here is my attempt:

data surprise_6_step3;
  set surprise_6_step2;
  length lag_sp $1;
  lag_sp=lag(sp);
  if sp='N' and lag(sp)='N' then delete;
run;

The result is:

ID              sp
ddmmyy:10:32:00 Y
ddmmyy:10:33:00 N

Is there a way to also extract the previous observation? Thanks for any help.

Try using the POINT= option on the SET statement in a DATA step, like this:

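data extract;
  /* Sequential read; nobs holds the total number of observations. */
  set surprise_6_step2 nobs=nobs;
  if sp = 'Y' then do;
    current = _N_;        /* row number of the matching observation */
    prev = current - 1;
    next = current + 1;

    /* Previous row, unless the match is the first observation. */
    if prev > 0 then do;
      set surprise_6_step2 point=prev;
      output;
    end;

    /* Re-read the matching row, since the variables may now hold the previous row. */
    set surprise_6_step2 point=current;
    output;

    /* Next row, unless the match is the last observation. */
    if next <= nobs then do;
      set surprise_6_step2 point=next;
      output;
    end;
  end;
run;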

There is an implicit loop over the dataset when you use it in a SET statement. _N_ is an automatic variable that counts the iterations of that implicit loop (starting from 1), which here matches the number of the row being read. When you find your value, you store _N_ in the variable current so you know which row it was found on. nobs is the total number of observations in the dataset.
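To see the implicit loop at work, here is a minimal sketch that writes _N_ and the observation count to the log for each row. The dataset name demo, the variable name total, and the time values are made up for illustration; they are not from the original question.

```sas
/* Hypothetical sample data; the time values are invented for illustration. */
data demo;
  input time :$8. sp :$1.;
  datalines;
10:30:00 N
10:31:00 N
10:32:00 Y
10:33:00 N
10:34:00 N
;
run;

data _null_;
  /* NOBS= assigns the total row count to total before the first row is read. */
  set demo nobs=total;
  /* One log line per iteration of the implicit loop. */
  put _N_= total= sp=;
run;
```

The log should show _N_ increasing by one per row while total stays constant, which is what makes the prev > 0 and next <= nobs checks possible.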

Checking that prev is greater than 0 and that next is less than or equal to nobs avoids an error when the matching row is the first in the dataset (there is no previous row) or the last in the dataset (there is no next row).
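As an alternative to the POINT= approach above, a look-ahead can be sketched by merging the dataset with a copy of itself that starts at the second observation, and combining that with LAG for the look-back. This is only a sketch, not the method from the answer: the names extract_lookahead, prev_sp, and next_sp are hypothetical, and it assumes sp is a character variable as in the question.

```sas
data extract_lookahead;
  /* One-to-one merge (no BY statement) of the dataset with itself shifted */
  /* up by one row, so next_sp holds sp from the following observation.    */
  /* On the last row the shifted copy is exhausted and next_sp is missing. */
  merge surprise_6_step2
        surprise_6_step2(firstobs=2 keep=sp rename=(sp=next_sp));

  /* sp from the previous observation; LAG is called on every iteration. */
  prev_sp = lag(sp);

  /* Subsetting IF: keep a row if it, its predecessor, or its successor is 'Y'. */
  if sp = 'Y' or prev_sp = 'Y' or next_sp = 'Y';

  drop prev_sp next_sp;
run;
```

One difference in behavior: because each row is tested once, a row that neighbors two different "Y" rows is written only once here, whereas the POINT= version writes it once per matching "Y".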