Conditionally create new columns based on specific numeric values (keys) in an existing column
I have a data.frame df where the column x is populated with integers (1-9). I would like to update columns y and z based on the value of x as follows:
if x is 1,2, or 3 | y = 1 ## if x is 1,4, or 7 | z = 1
if x is 4,5, or 6 | y = 2 ## if x is 2,5, or 8 | z = 2
if x is 7,8, or 9 | y = 3 ## if x is 3,6, or 9 | z = 3
Below is a data.frame with the desired output for y and z:
df <- structure(list(x = c(1L, 2L, 3L, 3L, 4L, 2L, 1L, 2L, 5L, 2L,
1L, 6L, 3L, 7L, 3L, 2L, 1L, 4L, 3L, 2L), y = c(1L, 1L, 1L, 1L,
2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 3L, 1L, 1L, 1L, 2L, 1L, 1L
), z = c(1L, 2L, 3L, 3L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 3L, 3L,
1L, 3L, 2L, 1L, 1L, 3L, 2L)), .Names = c("x", "y", "z"), class = "data.frame", row.names = c(NA,
-20L))
I can write a for-loop with multiple if statements to fill y and z row by row, but that doesn't seem very R-like: it is not vectorized. Is there a way to specify which numeric values correspond to which new numeric values? Something like a map or key that indicates what each existing value should become.
Solution #1: Lookup Vector
Assuming the mismatches I pointed out in my comment are mistakes in the data, and not in the rules, you can accomplish this as follows:
x2y <- rep(1:3,each=3);
x2z <- rep(1:3,3);
df$y <- x2y[df$x];
df$z <- x2z[df$x];
df1 <- df; ## for identical() calls later
df;
## x y z
## 1 1 1 1
## 2 2 1 2
## 3 3 1 3
## 4 3 1 3
## 5 4 2 1
## 6 2 1 2
## 7 1 1 1
## 8 2 1 2
## 9 5 2 2
## 10 2 1 2
## 11 1 1 1
## 12 6 2 3
## 13 3 1 3
## 14 7 3 1
## 15 3 1 3
## 16 2 1 2
## 17 1 1 1
## 18 4 2 1
## 19 3 1 3
## 20 2 1 2
The above solution depends on the fact that the domain of x consists of contiguous integer values beginning from 1, so a direct index into a "lookup vector" suffices. If x began at a very high number but was still contiguous, you could make this solution work by subtracting min(x) - 1 (one less than the minimum of x) from x before indexing.
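As a minimal sketch of that offset idea, assuming a hypothetical vector x_high whose keys run over the contiguous range 101 to 109 (the name and the values are made up for illustration):

```r
# Hypothetical data: keys span 101..109 instead of 1..9.
x_high <- c(101L, 105L, 109L, 103L, 107L)

# Same lookup vectors as in the answer, rebuilt so the sketch is self-contained.
x2y <- rep(1:3, each = 3)
x2z <- rep(1:3, 3)

# Shift x so that the smallest key lands on index 1.
# Assumption: the smallest key of the domain actually occurs in the data;
# otherwise hard-code the known lower bound instead of calling min().
offset <- min(x_high) - 1L
idx <- x_high - offset

y_high <- x2y[idx]
z_high <- x2z[idx]
```

If the data might not contain the lowest key, min() would compute the wrong offset, so a fixed lower bound is probably the safer choice in that case.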
Solution #2: Lookup Table
If you don't like this assumption, then you can accomplish the task with a lookup table:
library('data.table');
lookup <- data.table(x=1:9,y=x2y,z=x2z,key='x');
lookup;
## x y z
## 1: 1 1 1
## 2: 2 1 2
## 3: 3 1 3
## 4: 4 2 1
## 5: 5 2 2
## 6: 6 2 3
## 7: 7 3 1
## 8: 8 3 2
## 9: 9 3 3
df[c('y','z')] <- lookup[df['x'],.(y,z)];
identical(df,df1);
## [1] TRUE
Or a base R approach:
lookup <- data.frame(x=1:9,y=x2y,z=x2z);
lookup;
## x y z
## 1 1 1 1
## 2 2 1 2
## 3 3 1 3
## 4 4 2 1
## 5 5 2 2
## 6 6 2 3
## 7 7 3 1
## 8 8 3 2
## 9 9 3 3
df[c('y','z')] <- lookup[match(df$x,lookup$x),c('y','z')];
identical(df,df1);
## [1] TRUE
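To illustrate why the table approach does not need contiguous keys, here is a small sketch with made-up, non-contiguous keys. The names lookup_sparse and dat and all values in them are hypothetical. Note that match() returns NA for a key that is missing from the lookup, so such rows end up with NA in y and z:

```r
# Hypothetical lookup with non-contiguous keys.
lookup_sparse <- data.frame(x = c(10L, 25L, 40L),
                            y = c(1L, 2L, 3L),
                            z = c(3L, 1L, 2L))

# Hypothetical data; 99 has no entry in the lookup.
dat <- data.frame(x = c(25L, 10L, 40L, 99L))

# Row position of each key in the lookup; NA where the key is absent.
rows <- match(dat$x, lookup_sparse$x)

# Indexing with an NA row yields an all-NA row, so unmatched keys stay NA.
dat[c('y', 'z')] <- lookup_sparse[rows, c('y', 'z')]
dat
```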
Solution #3: Arithmetic Expression
Yet another alternative is to devise arithmetic expressions equivalent to the mapping:
df$y <- (df$x-1L)%/%3L+1L;
df$z <- 3L--df$x%%3L;
identical(df,df1);
## [1] TRUE
This particular solution depends on the fact that your mapping happens to possess a regularity that lends itself to arithmetic description.
With regard to implementation, it also takes advantage of a somewhat non-obvious property of R's precedence rules (the same ordering holds in other languages as well, such as C/C++ and Java): unary minus binds more tightly than modulus, which in turn binds more tightly than binary subtraction. Thus the calculation for df$z is equivalent to 3L-((-df$x)%%3L).
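A quick way to convince yourself of that grouping is to compare the compact form against the fully parenthesized one. This is only a sketch; the variable names implicit, explicit, and other are made up for illustration:

```r
x <- 1:9

# Compact form used in the answer: binary minus followed by unary minus.
implicit <- 3L--x%%3L

# Fully parenthesized form: negate first, then take the modulus, then subtract.
explicit <- 3L - ((-x) %% 3L)

# A different grouping (modulus first, then negate) for contrast.
other <- 3L - (-(x %% 3L))

identical(implicit, explicit)  # the two forms agree
identical(implicit, other)     # the alternative grouping does not
```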
To go into more detail regarding the z calculation: it is not possible to describe the mapping with a plain modulus, df$x%%3, because the inputs 3, 6, and 9 would mod to zero. That could be solved with a simple index-assign operation, but I wanted a simpler and purely arithmetic solution. To get from zero to 3 we can subtract df$x%%3 from 3, but that would mess up (invert) the remaining values. I realized that by taking the mod of the negative of the input values, we "pre-invert" them; subtracting the results from 3 then "rights" them and also converts the zeroes into 3, as desired.