Error when reading a data set in R

Question:

When reading in my data set in R as follows:

Dataset.df <- read.table("C:\\dataset.txt", header = TRUE)

I get the following error message:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
   line 1 did not have 145 elements

What does this mean and can somebody tell me how to fix it?

Answer:

This error is pretty self-explanatory. There seems to be data missing in the first line of your data file (or the second line, as the case may be, since you're using header = TRUE).

Here's a mini example:

## Create a small dataset to play with
cat("V1 V2\nFirst 1 2\nSecond 2\nThird 3 8\n", file="test.txt")

R automatically detects that it should expect rownames plus two columns (3 elements), but it doesn't find 3 elements on line 2, so you get an error:

read.table("test.txt", header = TRUE)
# Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
#   line 2 did not have 3 elements
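To see the row-name rule in isolation, here is a minimal sketch that reads a complete version of the same file. The temporary path and the value 4 that fills the gap are made up for illustration. When the header has one fewer field than the data lines, read.table takes the first column as row names:

```r
# Hypothetical complete file: header has 2 fields, every data line has 3
path <- tempfile(fileext = ".txt")
cat("V1 V2\nFirst 1 2\nSecond 2 4\nThird 3 8\n", file = path)

df <- read.table(path, header = TRUE)

rownames(df)  # "First" "Second" "Third"
ncol(df)      # 2
```

So "3 elements" in the error message means one row name plus the two data columns.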

Look at the data file and see if there is indeed a problem:

cat(readLines("test.txt"), sep = "\n")
# V1 V2
# First 1 2
# Second 2
# Third 3 8
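Eyeballing the file works for four lines, but probably not for a file with 145 columns. One possible approach, sketched here on the same toy file, is to let count.fields() report the number of fields on each line. Treating the largest count among the data lines as the expected width is an assumption made for this sketch, and the temporary path is only for illustration:

```r
# Recreate the toy file in a temporary location
path <- tempfile(fileext = ".txt")
cat("V1 V2\nFirst 1 2\nSecond 2\nThird 3 8\n", file = path)

# Number of whitespace-separated fields on each line of the file
n_fields <- count.fields(path)
n_fields  # 2 3 2 3

# Drop the header, then assume the widest data line has the correct width
data_fields <- n_fields[-1]
expected <- max(data_fields)

# Line numbers within the file (+1 accounts for the header line)
bad_file_lines <- which(data_fields != expected) + 1
bad_file_lines  # 3
```

Note that the error message counts data lines after the header, so its "line 2" is line 3 of the file.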

Manual correction might be needed, or we can assume that the first value in the "Second" row belongs in the first column and that the remaining values should be NA. If this is the case, fill = TRUE is enough to solve your problem.

read.table("test.txt", header = TRUE, fill = TRUE)
#        V1 V2
# First   1  2
# Second  2 NA
# Third   3  8
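Because fill = TRUE pads short lines silently, it may be worth listing the padded rows afterwards to confirm that the assumption holds. A minimal sketch on the same toy file (the temporary path and the variable names are only for illustration):

```r
# Recreate the toy file in a temporary location
path <- tempfile(fileext = ".txt")
cat("V1 V2\nFirst 1 2\nSecond 2\nThird 3 8\n", file = path)

filled <- read.table(path, header = TRUE, fill = TRUE)

# Rows that contain at least one NA after reading
incomplete <- filled[!complete.cases(filled), ]
rownames(incomplete)  # "Second"
```

This only shows which rows ended up with NA values. Whether each value landed in the right column still has to be checked against the source data.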

R is also smart enough to figure out how many elements it needs even if row names are missing:

cat("V1 V2\n1\n2 5\n3 8\n", file="test2.txt")
cat(readLines("test2.txt"), sep = "\n")
# V1 V2
# 1
# 2 5
# 3 8
read.table("test2.txt", header = TRUE)
# Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
#   line 1 did not have 2 elements
read.table("test2.txt", header = TRUE, fill = TRUE)
#   V1 V2
# 1  1 NA
# 2  2  5
# 3  3  8
