Duration of a state in SAS
I have a question concerning SAS and the analysis of how long a variable stays in a certain state. I want to find out how long each individual in my data set stays in state a continuously until state b occurs. If state c occurs after state a, the duration should be set to zero. Note that I would also set the duration to zero if pre_period is in state a, but if another run of state a occurs afterwards, that run should be counted.
The data looks like this:
pre_period week1 week2 week3 week4 week5 week6 week7 ...
id1 b b a a a b c c ...
id2 a a a a b a b b ...
id3 b b a a b a a b ...
id4 c c c a a a a a ...
id5 a b a b b a a b ...
id6 b a a a a a a a ...
The sample data set in SAS code:
data work.sample_data;
input id $ pre_period $ (week1-week7) ($);
datalines;
id1 b b a a a b c c
id2 a a a a b a b b
id3 b b a a b a a b
id4 c c c a a a a a
id5 a b a b b a a b
id6 b a a a a a a a
;
So for id1 that should give me a duration of 3, for id2 1, for id3 3 and 1, for id4 5, for id5 1 and 2, and for id6 7.
So the output should look like this:
dur1 dur2 dur3 dur4 ...
id1 3 . . . ...
id2 1 . . . ...
id3 3 1 . . ...
id4 5 . . . ...
id5 1 2 . . ...
id6 7 . . . ...
I am a beginner in SAS and did not find a way to solve this problem. Note that the data set contains several thousand rows and roughly a thousand columns, so for one individual I might have several intervals of state a, all of which I want to capture (hence several duration variables in the output).
I am grateful for any advice. Thanks!
In cases like this it can be wise to think in terms of a finite state machine. That way, the state machine is easy to extend later on if your requirements change.
The duration is valid in three cases (including the implicit one given by your expected result set). The continuous duration of state a should be counted if
- it ends with state b,
- it is still in state a when the data set ends,
- and only as long as it is not a run that is already in progress in the pre-period (pre_period is a and the run continues into week1).
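If you want to cross-check the expected durations outside SAS, the same three rules can be applied directly to the runs of a without any state machine. Below is a minimal Python sketch; the function name durations_by_runs and the representation of a row as a pre_period string plus a list of weekly states are assumptions made for illustration only.

```python
def durations_by_runs(pre_period, weeks):
    """Return the lengths of the runs of 'a' that satisfy the three rules.

    pre_period: state of the pre-period, e.g. "b"
    weeks: list of weekly states, e.g. ["b", "a", "a", "a", "b", "c", "c"]
    """
    durations = []
    n = len(weeks)
    i = 0
    while i < n:
        if weeks[i] != "a":
            i += 1
            continue
        # Find the end of the current run of 'a' (i stops at the first non-'a').
        start = i
        while i < n and weeks[i] == "a":
            i += 1
        length = i - start
        # A run that continues an 'a' pre-period is never counted.
        if start == 0 and pre_period == "a":
            continue
        if i == n:
            # Still in state 'a' when the data ends.
            durations.append(length)
        elif weeks[i] == "b":
            # Run ends with state 'b'; runs ending in 'c' (or anything else) are dropped.
            durations.append(length)
    return durations
```

For example, durations_by_runs("b", list("baaabcc")), which is the row for id1, returns [3].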
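First of all, we have to take care of the pre-period requirement; we can call this state pre_period_locked_state. While pre_period is a, the machine stays locked until the first week that is not a; otherwise the first week in state a starts a new duration:
* this runs inside the loop do week = 1 to last_week;
if current_state = pre_period_locked_state then do;
    if 'a' not = pre_period and 'a' = week_state then do;
        current_state = duration_state;
    end;
    else if 'a' = pre_period and 'a' not = week_state then do;
        current_state = no_duration_state;
    end;
end;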
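To make the four combinations of pre_period and week_state explicit, here is the same transition as a small Python function. This is a sketch only: the function name step_pre_period_locked and the string constants are hypothetical and simply mirror the SAS state names.

```python
# Hypothetical constants mirroring the SAS character state names.
PRE_PERIOD_LOCKED = "pre_period_locked_state"
NO_DURATION = "no_duration_state"
DURATION = "duration_state"


def step_pre_period_locked(pre_period, week_state):
    """Next state when the machine is in the pre-period locked state."""
    if pre_period != "a" and week_state == "a":
        # Nothing is carried over from the pre-period, so a new run starts.
        return DURATION
    if pre_period == "a" and week_state != "a":
        # The carried-over run has ended; later runs will be counted.
        return NO_DURATION
    # Either the carried-over run of 'a' continues, or no run has started yet.
    return PRE_PERIOD_LOCKED
```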
The next thing to dissect is the case when the state is not a, here called no_duration_state:
if current_state = no_duration_state then do;
    if 'a' = week_state then do;
        current_state = duration_state;
    end;
end;
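The idle state has just one way out. As a Python sketch, with a hypothetical function name step_no_duration and states represented as plain strings:

```python
NO_DURATION = "no_duration_state"
DURATION = "duration_state"


def step_no_duration(week_state):
    """Idle state: leave it only when a new run of 'a' begins."""
    return DURATION if week_state == "a" else NO_DURATION
```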
This is our idle state; it only changes when a new duration starts. The next state is named duration_state and is defined as:
if current_state = duration_state then do;
    if 'a' = week_state then do;
        duration_count = duration_count + 1;
    end;
    if ('a' not = week_state or week = last_week) and 0 < duration_count then do;
        current_state = dispatch_state;
    end;
end;
The first part, the duration counter, is probably self-explanatory. The second part detects when a duration ends: either the week is no longer in state a, or the last week has been reached.
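The counting step and the end-of-run check can be sketched in Python as a function that returns both the next state and the updated counter. The name step_duration and its parameters are hypothetical; week is 1-based to match the SAS loop.

```python
DURATION = "duration_state"
DISPATCH = "dispatch_state"


def step_duration(week_state, week, last_week, duration_count):
    """Count weeks in state 'a' and decide whether the current run has ended.

    Returns (next_state, updated_duration_count).
    """
    if week_state == "a":
        duration_count += 1
    # A run ends when the state changes away from 'a' or the data runs out.
    if (week_state != "a" or week == last_week) and duration_count > 0:
        return DISPATCH, duration_count
    return DURATION, duration_count
```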
Now on to dispatch_state:
if current_state = dispatch_state then do;
    if 'b' = week_state or ('a' = week_state and week = last_week) then do;
        duration{duration_index} = duration_count;
        duration_index = duration_index + 1;
    end;
    duration_count = 0;
    current_state = no_duration_state;
end;
This takes care of indexing into the output array and makes sure that only valid durations are stored.
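A Python sketch of the dispatch step, assuming a plain list collects the valid durations; appending to the list plays the role of duration{duration_index} = duration_count followed by duration_index = duration_index + 1. The function name dispatch is hypothetical.

```python
NO_DURATION = "no_duration_state"


def dispatch(week_state, week, last_week, duration_count, durations):
    """Store a finished run of 'a' if it is valid, then reset.

    durations: list collecting the valid durations (dur1, dur2, ...).
    Returns (next_state, reset_duration_count).
    """
    ended_in_b = week_state == "b"
    # A run that is still 'a' only reaches dispatch at the last week.
    still_a_at_end = week_state == "a" and week == last_week
    if ended_in_b or still_a_at_end:
        durations.append(duration_count)
    # Runs that ended with any other state (for example 'c') are discarded.
    return NO_DURATION, 0
```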
I added id7 below, since the sample data did not contain any duration that ended with a state other than b:
data work.sample_data;
input id $ pre_period $ (week1-week7) ($);
datalines;
id1 b b a a a b c c
id2 a a a a b a b b
id3 b b a a b a a b
id4 c c c a a a a a
id5 a b a b b a a b
id6 b a a a a a a a
id7 b a a c a a a a
;
The full SAS code for the state machine:
data work.duration_fsm;
    set work.sample_data;
    array weeks{*} week1-week7;
    array duration{*} dur1-dur7;

    * state names;
    initial_reset_state     = 'initial_reset_state';
    pre_period_locked_state = 'pre_period_locked_state';
    duration_state          = 'duration_state';
    no_duration_state       = 'no_duration_state';
    dispatch_state          = 'dispatch_state';
    length current_state $ 50;

    * initial values;
    current_state = initial_reset_state;
    last_week = dim(weeks);
    keep id dur1-dur7;

    do week = 1 to last_week;
        * reset the counters once per individual;
        if current_state = initial_reset_state then do;
            duration_count = 0;
            duration_index = 1;
            current_state = pre_period_locked_state;
        end;

        week_state = weeks{week};

        * a run of a that started in the pre-period is not counted;
        if current_state = pre_period_locked_state then do;
            if 'a' not = pre_period and 'a' = week_state then do;
                current_state = duration_state;
            end;
            else if 'a' = pre_period and 'a' not = week_state then do;
                current_state = no_duration_state;
            end;
        end;

        * idle state: wait for a new run of a;
        if current_state = no_duration_state then do;
            if 'a' = week_state then do;
                current_state = duration_state;
            end;
        end;

        * count the run and detect its end;
        if current_state = duration_state then do;
            if 'a' = week_state then do;
                duration_count = duration_count + 1;
            end;
            if ('a' not = week_state or week = last_week) and 0 < duration_count then do;
                current_state = dispatch_state;
            end;
        end;

        * store the run only if it ended in b or is still a in the last week;
        if current_state = dispatch_state then do;
            if 'b' = week_state or ('a' = week_state and week = last_week) then do;
                duration{duration_index} = duration_count;
                duration_index = duration_index + 1;
            end;
            duration_count = 0;
            current_state = no_duration_state;
        end;
    end;
run;
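To check the logic without a SAS session, here is a line-by-line Python port of the data step above. It is a sketch, not a replacement for the SAS code: duration_fsm is a hypothetical name, each row is passed as pre_period plus a list of weekly states, and a Python list stands in for the dur1-dur7 array.

```python
INITIAL_RESET = "initial_reset_state"
PRE_PERIOD_LOCKED = "pre_period_locked_state"
NO_DURATION = "no_duration_state"
DURATION = "duration_state"
DISPATCH = "dispatch_state"


def duration_fsm(pre_period, weeks):
    """Port of the SAS state machine for one row; returns [dur1, dur2, ...]."""
    current_state = INITIAL_RESET
    last_week = len(weeks)
    durations = []
    duration_count = 0
    for week, week_state in enumerate(weeks, start=1):
        if current_state == INITIAL_RESET:
            duration_count = 0
            durations = []  # mirrors duration_index = 1
            current_state = PRE_PERIOD_LOCKED
        if current_state == PRE_PERIOD_LOCKED:
            if pre_period != "a" and week_state == "a":
                current_state = DURATION
            elif pre_period == "a" and week_state != "a":
                current_state = NO_DURATION
        if current_state == NO_DURATION:
            if week_state == "a":
                current_state = DURATION
        if current_state == DURATION:
            if week_state == "a":
                duration_count += 1
            if (week_state != "a" or week == last_week) and duration_count > 0:
                current_state = DISPATCH
        if current_state == DISPATCH:
            if week_state == "b" or (week_state == "a" and week == last_week):
                durations.append(duration_count)
            duration_count = 0
            current_state = NO_DURATION
    return durations


# Same rows as work.sample_data: id, pre_period, week1-week7.
SAMPLE = """\
id1 b b a a a b c c
id2 a a a a b a b b
id3 b b a a b a a b
id4 c c c a a a a a
id5 a b a b b a a b
id6 b a a a a a a a
id7 b a a c a a a a"""

if __name__ == "__main__":
    for line in SAMPLE.splitlines():
        row_id, pre_period, *weeks = line.split()
        print(row_id, duration_fsm(pre_period, weeks))
```

Running it prints one list of durations per id; for the seven sample rows these agree with the dur1 and dur2 values in the work.duration_fsm output table.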
This outputs work.duration_fsm:
+-----+------+------+------+------+------+------+------+
| id  | dur1 | dur2 | dur3 | dur4 | dur5 | dur6 | dur7 |
+-----+------+------+------+------+------+------+------+
| id1 |    3 |      |      |      |      |      |      |
| id2 |    1 |      |      |      |      |      |      |
| id3 |    2 |    2 |      |      |      |      |      |
| id4 |    5 |      |      |      |      |      |      |
| id5 |    1 |    2 |      |      |      |      |      |
| id6 |    7 |      |      |      |      |      |      |
| id7 |    4 |      |      |      |      |      |      |
+-----+------+------+------+------+------+------+------+