How to identify the labels of outliers in an R boxplot?
The R boxplot function is a very useful way to look at data: it quickly gives you a visual summary of the approximate location and spread of your data, and of the number of outliers. In addition, I'd like to identify the outliers in order to quickly find problems in the dataset.
The values of these outliers can be accessed using myplot$out. Unfortunately, the labels of these outliers seem to be unavailable. There are some packages aimed at displaying the labels on the plot itself: http://www.r-statistics.com/2011/01/how-to-label-all-the-outliers-in-a-boxplot/, but they don't work well. I just want to list these outliers; I don't need them to be on the plot itself.
Any ideas?
You've done most of the hard work yourself. All that remains is a comparison:
## First create some data
## (You should include this in your question)
set.seed(2)
dd = data.frame(x = rlnorm(26), y=LETTERS)
Get the outliers
outliers = boxplot(dd$x, plot=FALSE)$out
Extract the outliers from the original data frame
dd[dd$x %in% outliers,]
Further explanation:
The variable dd$x is the vector of 26 numbers. The variable outliers contains the values of the outliers (just type dd$x and outliers in your R console). The command
dd$x %in% outliers
checks each value of dd$x against the values in outliers and returns one logical per element, viz:
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE <snip>
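To see what %in% does in isolation, here is a minimal sketch on a toy vector. The names x, flagged and is_flagged and their values are made up for illustration and are not part of the data above.

```r
# Toy vectors, invented for illustration (not the dd data from the answer)
x <- c(1.2, 0.5, 9.7, 2.1, 14.3)
flagged <- c(9.7, 14.3)

# %in% returns one logical per element of x:
# TRUE where that value also occurs in flagged
is_flagged <- x %in% flagged
is_flagged
# [1] FALSE FALSE  TRUE FALSE  TRUE

# which() turns the logical vector into positions
which(is_flagged)
# [1] 3 5
```

The same logical vector can be used to subset any other vector of the same length, which is how the labels are recovered.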
The square bracket notation, dd[dd$x %in% outliers,] returns the rows of the data frame dd where dd$x %in% outliers returns TRUE.
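If you need this often, the two steps could be wrapped in a small function. The sketch below is only one way to do it: list_outliers is a hypothetical helper name, not something from base R, and it calls boxplot.stats() directly (the function boxplot() uses to compute $out) so that no plot is involved.

```r
# Hypothetical helper (not part of base R): return the rows of `data`
# whose values in `column` are flagged as outliers by boxplot.stats().
# `coef` is the whisker multiplier; 1.5 matches boxplot()'s default range.
list_outliers <- function(data, column, coef = 1.5) {
  values <- data[[column]]
  out <- boxplot.stats(values, coef = coef)$out
  # drop = FALSE keeps the result a data frame even for single-column data
  data[values %in% out, , drop = FALSE]
}

# Same example data as above
set.seed(2)
dd <- data.frame(x = rlnorm(26), y = LETTERS)

flagged <- list_outliers(dd, "x")
flagged      # full rows: values together with their labels
flagged$y    # just the labels
```

Because the rows come from the original data frame, any label column (here y) or the row names travel along with the outlying values.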