[Linux Ops -- Hardware] Using smartctl

1. What it is

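smartctl is a commonly used disk-checking tool, shipped in the smartmontools package. It reads the drive's SMART (Self-Monitoring, Analysis and Reporting Technology) data.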

2. Installation

(1) Ubuntu

$ sudo apt-get install smartmontools

(2) Red Hat & CentOS

$ sudo yum install smartmontools

3. Usage

(1) Check whether the disk supports SMART

$ sudo smartctl -i /dev/sda1 
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-2.6.18-164.11.1.el5] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Constellation ES (SATA 6Gb/s)
Device Model:     ST1000NM0011
Serial Number:    Z1N0EVRZ
LU WWN Device Id: 5 000c50 03f123968
Firmware Version: SN02
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sun Aug 23 23:27:54 2015 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

The last two lines show whether the drive supports SMART and whether SMART is currently enabled.
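
For scripting, those two lines can be picked out of the output automatically. The following is only a minimal sketch: the function name smart_support is made up for illustration, and it assumes the "SMART support is:" wording shown in the output above (plus the "Unavailable" and "Disabled" variants), which may differ between smartctl versions.

```shell
# smart_support (hypothetical helper): read `smartctl -i` output on stdin
# and print one of: enabled, disabled, unavailable, available, unknown.
# Assumption: the report uses lines starting with "SMART support is:".
smart_support() {
    awk '
        /^SMART support is:/ {
            if ($0 ~ /Unavailable/)    unavailable = 1
            else if ($0 ~ /Available/) available = 1
            else if ($0 ~ /Enabled/)   enabled = 1
            else if ($0 ~ /Disabled/)  disabled = 1
        }
        END {
            if (enabled)          print "enabled"
            else if (disabled)    print "disabled"
            else if (unavailable) print "unavailable"
            else if (available)   print "available"
            else                  print "unknown"
        }
    '
}
```

Typical use would be: sudo smartctl -i /dev/sda | smart_support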

(2) Enable SMART manually

$ sudo smartctl --smart=on --offlineauto=on --saveauto=on /dev/sda1

The options mean the following:

-s VALUE, --smart=VALUE
Enable/disable SMART on device (on/off)

-o VALUE, --offlineauto=VALUE (ATA)
Enable/disable automatic offline testing on device (on/off)

-S VALUE, --saveauto=VALUE (ATA)
Enable/disable Attribute autosave on device (on/off)

(3) Check the disk's overall health

$ sudo smartctl -H /dev/sda1 
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-2.6.18-164.11.1.el5] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
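
Besides the PASSED/FAILED text, smartctl also reports problems through its exit status, which is a bit mask (see the EXIT STATUS section of the smartctl man page). The sketch below decodes that mask; the function name decode_smartctl_status and the shortened messages are made up for illustration, and the bit meanings are paraphrased from the man page.

```shell
# decode_smartctl_status STATUS (hypothetical helper)
# Print one line for every bit that is set in smartctl's exit status.
# The messages are shortened paraphrases of the man page, bit 0 first.
decode_smartctl_status() {
    status=$1
    bit=0
    for msg in \
        "command line did not parse" \
        "device open failed" \
        "a SMART or ATA command failed" \
        "SMART status: DISK FAILING" \
        "prefail attribute at or below threshold" \
        "attribute at or below threshold in the past" \
        "device error log contains errors" \
        "self-test log contains errors"
    do
        # Test bit number $bit of the status value.
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            printf 'bit %d: %s\n' "$bit" "$msg"
        fi
        bit=$((bit + 1))
    done
}
```

It could be used right after a health check, for example: sudo smartctl -H /dev/sda >/dev/null; decode_smartctl_status $?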

(4) Display the disk's attribute values

$ sudo smartctl -A /dev/sdl1
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-2.6.18-164.11.1.el5] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   084   063   044    Pre-fail  Always       -       238687534
  3 Spin_Up_Time            0x0003   099   099   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       3
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   087   060   030    Pre-fail  Always       -       573183052
  9 Power_On_Hours          0x0032   063   063   000    Old_age   Always       -       33120
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       3
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   075   049   045    Old_age   Always       -       25 (Min/Max 20/30)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       567
194 Temperature_Celsius     0x0022   025   051   000    Old_age   Always       -       25 (0 20 0 0 0)
195 Hardware_ECC_Recovered  0x001a   120   099   000    Old_age   Always       -       238687534
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       2
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

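Essentially, the SMART attribute table lists the attributes that the manufacturer has defined for the drive, together with the failure threshold for each of them. The table is generated and updated automatically by the drive firmware.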

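  • ID: the attribute ID, usually a decimal number between 1 and 255.
  • ATTRIBUTE_NAME: the attribute name defined by the manufacturer.
  • FLAG: the attribute handling flags.
  • VALUE: one of the most important columns in the table. It is the normalized value of the attribute, between 1 and 253, where 253 is the best case and 1 the worst. Depending on the attribute and the manufacturer, the initial VALUE may be set to 100 or 200.
  • WORST: the lowest VALUE recorded so far.
  • THRESH: the lowest value that WORST may reach before the drive is reported as FAILED.
  • TYPE: the attribute type (Pre-fail or Old_age). A Pre-fail attribute is treated as critical: it takes part in the drive's overall SMART health assessment (PASSED/FAILED), and if any Pre-fail attribute fails, the drive should be regarded as about to fail. An Old_age attribute is non-critical (for example, normal wear) and does not by itself mean that the drive is failing.
  • UPDATED: how often the attribute is updated. Always means it is updated continuously; Offline means it is updated only while an offline test runs on the drive.
  • WHEN_FAILED: set to "FAILING_NOW" if VALUE is less than or equal to THRESH, to "In_the_past" if WORST is less than or equal to THRESH, and to "-" otherwise. With "FAILING_NOW", back up important files as soon as possible, especially when the attribute is of type Pre-fail. "In_the_past" means the attribute failed at some point but was fine when the check ran. "-" means the attribute has never failed.
  • RAW_VALUE: the manufacturer-defined raw value, from which VALUE is derived.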
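
The WHEN_FAILED rule above can be applied to the table mechanically. The following is a rough sketch, not part of smartctl: the function name failing_attributes is made up, it assumes the ten-column layout shown in the output above, and it treats a THRESH of 000 as "no threshold", which is an assumption of this example.

```shell
# failing_attributes (hypothetical helper): read `smartctl -A` output on
# stdin and print "ID NAME TYPE STATE" for every attribute that has failed
# or is failing. Assumes the column order shown in the table above.
failing_attributes() {
    awk '
        # Attribute rows start with a numeric ID and have at least 10 fields.
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            value = $4 + 0; worst = $5 + 0; thresh = $6 + 0
            state = $9                       # WHEN_FAILED column
            # Recompute the state from VALUE/WORST/THRESH when the column
            # is "-" (assumption: THRESH 0 means no threshold is defined).
            if (state == "-" && thresh > 0) {
                if (value <= thresh)      state = "FAILING_NOW"
                else if (worst <= thresh) state = "In_the_past"
            }
            if (state != "-") print $1, $2, $7, state
        }
    '
}
```

For the table shown above it prints nothing, since every attribute with a non-zero threshold is still above it. Typical use would be: sudo smartctl -A /dev/sda | failing_attributes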

(5) Run disk self-tests

  • Short test
$ sudo smartctl -t short /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-2.6.18-164.11.1.el5] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 1 minutes for test to complete.
Test will complete after Mon Aug 24 00:01:22 2015

Use smartctl -X to abort test.
  • Long test
$ sudo smartctl -t long /dev/sda
  • Check test progress and results
$ sudo smartctl -l selftest /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-2.6.18-164.11.1.el5] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     33120         -
  • Abort a test
$ sudo smartctl -X /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-2.6.18-164.11.1.el5] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Abort SMART off-line mode self-test routine".
Self-testing aborted!
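
When a test is started from a script, the self-test log can be checked for the outcome of the most recent entry. The sketch below is only an illustration: the function name last_selftest_state is made up, and the status phrases it looks for ("Completed without error", "Self-test routine in progress", "Aborted by host", "Interrupted") are assumed to match what the installed smartctl prints. Some drives may not list a running test in this log at all.

```shell
# last_selftest_state (hypothetical helper): read `smartctl -l selftest`
# output on stdin and classify entry "# 1", the most recent self-test.
# Prints one of: passed, running, aborted, failed-or-unknown, no-entries.
last_selftest_state() {
    awk '
        /^# *1 / {
            if      ($0 ~ /Completed without error/)       print "passed"
            else if ($0 ~ /Self-test routine in progress/) print "running"
            else if ($0 ~ /Aborted by host|Interrupted/)   print "aborted"
            else                                           print "failed-or-unknown"
            found = 1
            exit                    # END still runs after exit
        }
        END { if (!found) print "no-entries" }
    '
}
```

Typical use would be: sudo smartctl -l selftest /dev/sda | last_selftest_state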

References:

(1) http://linux.cn/article-4682-1.html
(2) http://xmodulo.com/check-hard-disk-health-linux-smartmontools.html
(3) http://chaorenyong.blog.51cto.com/2163445/1051859
(4) http://bbs.chinaunix.net/thread-4132241-1-1.html