Problem with NA values in R

Problem description:

I feel this should be something easy. I have looked across the internet, but I keep getting error messages. I have done plenty of analytics in the past but am new to R and programming.

I have a pretty basic function to calculate means across columns of data:

columnmean <- function(y) {
  nc <- ncol(y)
  means <- numeric(nc)
  for (i in 1:nc) {
    means[i] <- mean(y[, i])
  }
  means
}

I'm in RStudio and testing it with the built-in 'airquality' dataset. When I load the airquality dataset and run my function:

data("airquality")
columnmean(airquality)

I get back:

NA NA 9.957516 77.882353 6.993464 15.803922

That's because the first two variables in airquality have NAs in them. OK, cool. I want to suppress the NAs so that R ignores them and runs the function anyway.

I have read that I can specify this with na.rm = TRUE, like:

columnmean(airquality, na.rm = TRUE)

But when I do this, I get an error message saying:

"Error in columnmean(airquality, na.rm = TRUE) : unused argument (na.rm = TRUE)"

I'm reading all over the place that I simply need to include na.rm = TRUE and the function will run and ignore the NA values, but I keep getting this error. I have also tried use = "complete" and anything else I could find.

Two caveats:

I know I can create a vector with is.na and then subset the data, but I don't want that extra step. I just want it to run the function and ignore the missing data.

I also know I can specify inside the function whether to ignore NAs, but I'd like a way to choose to ignore or not ignore them on the fly, on a call-by-call basis, rather than having it be part of the function itself.

Help is appreciated. Thank you, everyone.

We can use na.rm = TRUE in the call to mean:

columnmean <- function(y) {
  nc <- ncol(y)
  means <- numeric(nc)
  for (i in 1:nc) {
    means[i] <- mean(y[, i], na.rm = TRUE)
  }
  means
}
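
As a side note on why the original call failed: R only accepts arguments that a function declares, so a columnmean whose single parameter is y rejects na.rm before mean is ever reached. A minimal sketch of this, where columnmean_orig is just an illustrative name for the version in the question:

```r
# Illustrative copy of the question's function. The name columnmean_orig is
# an assumption made here to keep it apart from the fixed versions.
columnmean_orig <- function(y) {
  nc <- ncol(y)
  means <- numeric(nc)
  for (i in seq_len(nc)) {
    means[i] <- mean(y[, i])
  }
  means
}

# The only declared parameter is y, so there is nowhere for na.rm to go.
names(formals(columnmean_orig))

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  columnmean_orig(airquality, na.rm = TRUE),
  error = function(e) conditionMessage(e)
)
msg
```

The captured message should mention the unused argument, matching the error quoted in the question.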


If we need to use the na.rm argument sometimes as FALSE and other times as TRUE, then accept it in the arguments of 'columnmean' and pass it on to mean. Using ... forwards any extra arguments unchanged:

columnmean <- function(y, ...) {
  nc <- ncol(y)
  means <- numeric(nc)
  for (i in 1:nc) {
    means[i] <- mean(y[, i], ...)
  }
  means
}

columnmean(df1, na.rm = TRUE)
#[1] 1.5000000 0.3333333
columnmean(df1, na.rm = FALSE)
#[1] 1.5  NA
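
An alternative to ... is to declare na.rm explicitly with a default value and hand it to mean. This is only a sketch of that variant, not something from the answer above; the name columnmean2 and the small data frame toy are made up for illustration:

```r
# Hypothetical variant: na.rm is a named parameter with a default of FALSE,
# mirroring the default used by mean() itself.
columnmean2 <- function(y, na.rm = FALSE) {
  nc <- ncol(y)
  means <- numeric(nc)
  for (i in seq_len(nc)) {
    means[i] <- mean(y[, i], na.rm = na.rm)
  }
  means
}

# Made-up data with one missing value in the second column.
toy <- data.frame(a = c(1, 2, 3), b = c(4, NA, 6))

res_keep <- columnmean2(toy)                # NA kept: second mean is NA
res_drop <- columnmean2(toy, na.rm = TRUE)  # NA dropped: means are 2 and 5
res_keep
res_drop
```

The named argument shows up in the function signature, which makes the option easier to discover, while ... passes along anything else mean accepts (trim, for instance) without further changes.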

Data

df1 <- structure(list(num = c(1L, 1L, 2L, 2L), x1 = c(1L, NA, 0L, 0L)),
  .Names = c("num", "x1"), row.names = c(NA, -4L), class = "data.frame")
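
For comparison, base R's colMeans already takes na.rm, so for all-numeric data such as airquality the loop is not strictly needed. A short sketch using the dataset from the question (the variable names are illustrative):

```r
# colMeans is part of base R and accepts na.rm directly.
with_na    <- colMeans(airquality)
without_na <- colMeans(airquality, na.rm = TRUE)

# sapply forwards extra arguments to mean, much like ... does in columnmean.
via_sapply <- sapply(airquality, mean, na.rm = TRUE)

with_na
without_na
all.equal(without_na, via_sapply)
```

Both approaches return a named vector, so each mean is labelled with its column name.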