Finding non-numeric data in a data frame or vector

Question:

I read in some lengthy data with read.csv(), and to my surprise the data came out as factors rather than numbers, so I'm guessing there must be at least one non-numeric item in the data. How can I find where these items are?

For example, if I have the following data frame:

df <- data.frame(c(1,2,3,4,"five",6,7,8,"nine",10))

I would like to know that rows 5 and 9 contain non-numeric data. How would I do that?

Answer:

df <- data.frame(c(1,2,3,4,"five",6,7,8,"nine",10))

The trick is knowing that converting to numeric via as.numeric(as.character(.)) turns anything that is not a number into NA (with a coercion warning).

which(is.na(as.numeric(as.character(df[[1]]))))
## 5 9

(Just using as.numeric(df[[1]]) doesn't work: on a factor it drops the levels and returns the underlying integer codes, so no NA is ever produced.)
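To make the difference concrete, here is a small sketch comparing the two conversions. It assumes the column really is a factor (the read.csv() default before R 4.0.0); the names x, codes, and vals are only illustrative.

```r
# Assumption: the column was read in as a factor.
x <- factor(c(1, 2, 3, 4, "five", 6, 7, 8, "nine", 10))

# Direct conversion returns the internal level codes, not the values.
# It never yields NA, so the bad entries go unnoticed.
codes <- as.numeric(x)

# Going through character first recovers the printed values;
# entries that are not numbers become NA.
vals <- suppressWarnings(as.numeric(as.character(x)))

which(is.na(codes))  # nothing flagged
which(is.na(vals))   # positions of "five" and "nine"
```

Note that codes also silently misrepresents the valid entries, since the level codes follow the sorted order of the level labels rather than their numeric value.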

You can optionally suppress the warnings:

which.nonnum <- function(x) {
   which(is.na(suppressWarnings(as.numeric(as.character(x)))))
}
which.nonnum(df[[1]])

To be more careful, you should also check that the values weren't already NA before the conversion, so that genuine missing values are not reported as non-numeric:

which.nonnum <- function(x) {
   badNum <- is.na(suppressWarnings(as.numeric(as.character(x))))
   which(badNum & !is.na(x))
}
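
Since the question is about a whole data frame, one possible way to use this is to apply the function to every column with lapply(). This is only a sketch: the data frame dat and its columns a and b are made-up example data, and which.nonnum is repeated so the snippet runs on its own.

```r
which.nonnum <- function(x) {
  badNum <- is.na(suppressWarnings(as.numeric(as.character(x))))
  which(badNum & !is.na(x))
}

# Hypothetical example data: column b has a genuine missing value in row 2.
dat <- data.frame(
  a = c("1", "2", "three", "4"),
  b = c("1.5", NA, "2.5", "n/a"),
  stringsAsFactors = TRUE
)

# One integer vector of offending row numbers per column.
bad_rows <- lapply(dat, which.nonnum)

# Look at the offending values themselves, e.g. for column a.
as.character(dat$a[bad_rows$a])
```

Here row 2 of column b is not reported, because it was already NA before the conversion; only the entries that fail to parse as numbers are flagged.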