How can I force R to use a specified factor level as the reference in a regression?
Question:
How can I tell R to use a certain level as the reference if I use binary (dummy) explanatory variables in a regression? By default it just uses the first level.
lm(x ~ y + as.factor(b))
with b ∈ {0, 1, 2, 3, 4}. Let's say I want to use 3 instead of the 0 that R uses.
Answer:
See the relevel() function. Here is an example:
set.seed(123)
x <- rnorm(100)
DF <- data.frame(x = x,
                 y = 4 + (1.5*x) + rnorm(100, sd = 2),
                 b = gl(5, 20))  # factor with levels "1" to "5", 20 observations each
head(DF)
str(DF)
m1 <- lm(y ~ x + b, data = DF)   # level "1" of b is the reference
summary(m1)
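To see which level R treats as the baseline, you can inspect the factor's levels and its treatment contrast matrix: under R's default treatment contrasts the first level is the reference, and its row in the contrast matrix is all zeros. A minimal sketch that rebuilds only the factor from the example:

```r
# Rebuild the example factor: 5 levels ("1" to "5"), 20 observations each
b <- gl(5, 20)

levels(b)     # "1" "2" "3" "4" "5"; the first level is the default reference
contrasts(b)  # treatment contrasts: the all-zero row marks the reference level,
              # and the columns ("2" to "5") become the dummies b2 ... b5 in lm()
```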
Now change the reference level of the factor b in DF with the relevel() function:
DF <- within(DF, b <- relevel(b, ref = 3))  # ref = 3 is the 3rd level, which here is labelled "3"
m2 <- lm(y ~ x + b, data = DF)
summary(m2)
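Applied to the setup in the question, where b takes the values 0 to 4, the same idea can be written directly inside the model formula. This is a minimal sketch on simulated data (the values of x and y are made up for illustration only). One caveat: a numeric ref is a position in levels(), not a label, so with levels "0" to "4", ref = 3 would pick "2"; passing the label as a string avoids that.

```r
set.seed(1)
b <- rep(0:4, each = 20)          # hypothetical data: b takes the values 0 to 4
y <- rnorm(100)
x <- 2 + 0.5 * b + rnorm(100)     # simulated response, for illustration only

fb <- as.factor(b)
levels(fb)                        # "0" "1" "2" "3" "4"; "0" is the default reference
levels(relevel(fb, ref = 3))[1]   # "2": a numeric ref is a position in levels()
levels(relevel(fb, ref = "3"))[1] # "3": a character ref matches the level label

# Use level "3" as the reference directly inside the model formula
fit <- lm(x ~ y + relevel(as.factor(b), ref = "3"))
coef(fit)                         # dummies for levels 0, 1, 2 and 4; none for 3
```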
The two models use different reference levels for b: m1 compares each level with "1", m2 compares each level with "3". The slope of x is the same in both.
> coef(m1)
(Intercept) x b2 b3 b4 b5
3.2903239 1.4358520 0.6296896 0.3698343 1.0357633 0.4666219
> coef(m2)
(Intercept) x b1 b2 b4 b5
3.66015826 1.43585196 -0.36983433 0.25985529 0.66592898 0.09678759
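Releveling changes only the parameterization, not the fit. You can check this against the numbers above: the intercept of m2 is the intercept of m1 plus the b3 coefficient of m1 (3.2903239 + 0.3698343 = 3.6601582), and every b coefficient of m2 is the corresponding m1 coefficient minus b3 (for example 0.6296896 - 0.3698343 = 0.2598553 for b2). A short sketch that rebuilds the example and verifies this programmatically:

```r
set.seed(123)
x <- rnorm(100)
DF <- data.frame(x = x,
                 y = 4 + (1.5*x) + rnorm(100, sd = 2),
                 b = gl(5, 20))
m1 <- lm(y ~ x + b, data = DF)                  # reference level "1"
DF2 <- within(DF, b <- relevel(b, ref = "3"))
m2 <- lm(y ~ x + b, data = DF2)                 # reference level "3"

# Identical fitted values: only the baseline of the dummy coding moved
all.equal(fitted(m1), fitted(m2))
# Intercept of m2 = intercept of m1 + effect of level "3" relative to level "1"
all.equal(unname(coef(m2)["(Intercept)"]),
          unname(coef(m1)["(Intercept)"] + coef(m1)["b3"]))
# b1 in m2 is minus b3 in m1; b2 in m2 is b2 - b3 from m1
all.equal(unname(coef(m2)["b1"]), unname(-coef(m1)["b3"]))
all.equal(unname(coef(m2)["b2"]), unname(coef(m1)["b2"] - coef(m1)["b3"]))
```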