Duration of a state in SAS

Problem description:

I have a question concerning SAS and the analysis of the duration of a certain state of a variable. I want to find how long each individual in my dataset stays in state a continuously until state b occurs. If state c occurs after state a, the duration should be set to zero. Note that I would also set the duration to zero if pre_period is in state a and the run of a simply continues from it, but if state a occurs again later on, that run should be counted.

The data looks like this:

    pre_period    week1 week2 week3 week4 week5 week6 week7 ...
id1 b             b     a     a     a     b     c     c     ...
id2 a             a     a     a     b     a     b     b     ...
id3 b             b     a     a     b     a     a     b     ...
id4 c             c     c     a     a     a     a     a     ...
id5 a             b     a     b     b     a     a     b     ...
id6 b             a     a     a     a     a     a     a     ...

The sample data set as SAS code:

data work.sample_data;
input id $ pre_period $  (week1-week7) ($);
datalines;
id1 b b a a a b c c
id2 a a a a b a b b
id3 b b a a b a a b
id4 c c c a a a a a
id5 a b a b b a a b
id6 b a a a a a a a
;

So for id1 that should give me a duration of 3, for id2 1, for id3 3 and 1, for id4 5, for id5 1 and 2, and for id6 7.

So the output should look like this:

    dur1 dur2 dur3 dur4 ...
id1 3    .    .    .    ...
id2 1    .    .    .    ...
id3 3    1    .    .    ...
id4 5    .    .    .    ...
id5 1    2    .    .    ...
id6 7    .    .    .    ...

I am a beginner in SAS and have not found a way to solve this problem. Note that the dataset contains several thousand rows and roughly a thousand columns, so one individual may have several intervals of state a, all of which I want to capture (hence the several duration variables in the output).

I am grateful for any advice. Thanks!

In cases like this, it can be wise to think in terms of a finite state machine. That way, it is quite easy to extend the state machine later on if your requirements change.

The duration is valid in three cases (including the implicit one given by your result set):


  • The continuous duration of state a should be counted if
    • it ends with state b,
    • it is still in state a when the data set ends,
    • and as long as it does not start in the first week while the pre-period state is a.

      First of all, we have to take care of the pre-period requirement. We can call this state pre_period_locked_state:

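          do week = 1 to last_week;
              week_state = weeks{week};
              if current_state = pre_period_locked_state then do;
                  if 'a' not = pre_period and 'a' = week_state then do;
                      current_state = duration_state;
                  end;
                  else if 'a' = pre_period and 'a' not = week_state then do;
                      current_state = no_duration_state;
                  end;
              end;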
      

      The next thing to dissect is when the state is not a, here called no_duration_state:

              if current_state = no_duration_state then do;
                  if 'a' = week_state then do;
                      current_state = duration_state;
                  end;
              end;
      

      This is our idle state, and it only changes when a new duration starts. The next state is named duration_state and is defined as:

              if current_state = duration_state then do;
                  if 'a' = week_state then do;
                      duration_count = duration_count + 1;
                  end;
                  if ('a' not = week_state or week = last_week) and 0 < duration_count then do;
                      current_state = dispatch_state;
                  end;
              end;
      

      The first part, the duration counter, is probably self-explanatory. The second part detects when a duration ends, either because the state is no longer a or because the last week has been reached.

      Now on to dispatch_state:

              if current_state = dispatch_state then do;
                  if 'b' = week_state or ('a' = week_state and week = last_week) then do;
                      duration{duration_index} = duration_count;
                      duration_index = duration_index + 1;
                  end;
                  duration_count = 0; 
                  current_state = no_duration_state;
              end;
      

      This takes care of the indexing of the output table and also makes sure that only valid durations are stored.

      I added id7 below, since the sample data did not contain any duration that ended with a state other than b.

      data work.sample_data;
      input id $ pre_period $  (week1-week7) ($);
      datalines;
      id1 b b a a a b c c
      id2 a a a a b a b b
      id3 b b a a b a a b
      id4 c c c a a a a a
      id5 a b a b b a a b
      id6 b a a a a a a a
      id7 b a a c a a a a
      ;
      

      The full SAS code for the state machine:

      data work.duration_fsm;
          set work.sample_data;
          array weeks{*} week1-week7;
          array duration{*} dur1-dur7;

          * state names;
          initial_reset_state = 'initial_reset_state';
          pre_period_locked_state = 'pre_period_locked_state';
          duration_state = 'duration_state';
          no_duration_state = 'no_duration_state';
          dispatch_state = 'dispatch_state';
          length current_state $ 50;

          * initial values;
          current_state = initial_reset_state;
          last_week = dim(weeks);

          keep id dur1-dur7;

          do week = 1 to last_week;
              * reset the counters once per row;
              if current_state = initial_reset_state then do;
                  duration_count = 0;
                  duration_index = 1;
                  current_state = pre_period_locked_state;
              end;
              week_state = weeks{week};
              * hold back a run of a that continues from the pre period;
              if current_state = pre_period_locked_state then do;
                  if 'a' not = pre_period and 'a' = week_state then do;
                      current_state = duration_state;
                  end;
                  else if 'a' = pre_period and 'a' not = week_state then do;
                      current_state = no_duration_state;
                  end;
              end;
              * idle until a new run of a starts;
              if current_state = no_duration_state then do;
                  if 'a' = week_state then do;
                      current_state = duration_state;
                  end;
              end;
              * count the weeks in state a and detect the end of the run;
              if current_state = duration_state then do;
                  if 'a' = week_state then do;
                      duration_count = duration_count + 1;
                  end;
                  if ('a' not = week_state or week = last_week) and 0 < duration_count then do;
                      current_state = dispatch_state;
                  end;
              end;
              * store the run only if it ended with b or was still a in the last week;
              if current_state = dispatch_state then do;
                  if 'b' = week_state or ('a' = week_state and week = last_week) then do;
                      duration{duration_index} = duration_count;
                      duration_index = duration_index + 1;
                  end;
                  duration_count = 0;
                  current_state = no_duration_state;
              end;
          end;
      run;
      

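      This outputs work.duration_fsm. Note that id3 gives 2 and 2 rather than the 3 and 1 listed in the question, because its sample row contains two runs of a that are each two weeks long: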

      +-----+------+------+------+------+------+------+------+
      | id  | dur1 | dur2 | dur3 | dur4 | dur5 | dur6 | dur7 |
      +-----+------+------+------+------+------+------+------+
      | id1 |    3 |      |      |      |      |      |      |
      | id2 |    1 |      |      |      |      |      |      |
      | id3 |    2 |    2 |      |      |      |      |      |
      | id4 |    5 |      |      |      |      |      |      |
      | id5 |    1 |    2 |      |      |      |      |      |
      | id6 |    7 |      |      |      |      |      |      |
      | id7 |    4 |      |      |      |      |      |      |
      +-----+------+------+------+------+------+------+------+
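
As a cross-check on the table above, the same three rules can also be written as a plain run counter instead of a state machine. The following is only a minimal sketch under stated assumptions, not part of the original answer: the macro variable n_weeks, the output data set work.duration_check, and the variables run_len, blocked and n_dur are names introduced here for illustration. Setting the week count in one macro variable is one possible way to cope with the roughly one thousand week columns mentioned in the question.

```sas
/* Sketch only: n_weeks, duration_check, run_len, blocked and n_dur are
   illustrative names that do not appear in the original answer. */
%let n_weeks = 7;   /* number of weekN columns, set by hand here */

data work.sample_data;
    input id $ pre_period $ (week1-week&n_weeks) ($);
    datalines;
id1 b b a a a b c c
id2 a a a a b a b b
id3 b b a a b a a b
id4 c c c a a a a a
id5 a b a b b a a b
id6 b a a a a a a a
id7 b a a c a a a a
;

data work.duration_check;
    set work.sample_data;
    array weeks{*} week1-week&n_weeks;
    array dur{*} dur1-dur&n_weeks;
    keep id dur1-dur&n_weeks;

    run_len = 0;   /* length of the current run of a */
    blocked = 0;   /* 1 while the run continues from a pre period in state a */
    n_dur = 0;     /* number of durations stored so far for this row */

    do week = 1 to dim(weeks);
        if weeks{week} = 'a' then do;
            /* a run starting in week 1 after an a pre period is not counted */
            if week = 1 and pre_period = 'a' then blocked = 1;
            run_len = run_len + 1;
        end;
        /* a run closes on the first week that is not a, or in the last week */
        if run_len > 0 and (weeks{week} ne 'a' or week = dim(weeks)) then do;
            /* keep it only if it ended with b or is still a at the end */
            if blocked = 0 and (weeks{week} = 'a' or weeks{week} = 'b') then do;
                n_dur = n_dur + 1;
                dur{n_dur} = run_len;
            end;
            run_len = 0;
            blocked = 0;
        end;
    end;
run;

proc print data=work.duration_check noobs;
run;
```

If the rules have been read correctly, the printed durations should agree with work.duration_fsm for all seven ids. The state machine remains the easier version to extend when further states or rules are added; the counter is only shorter for the rules as they stand.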