Insufficient YARN Temp Directory Space Causes Hive Task Failure
Creating a new partitioned table from an existing Hive table ran into the following problem:
- The existing Hive table holds 160 GB of data (three months of user access records for all applications and servers)
- The new table keeps only the required fields and is partitioned by application / server IP / access date
-- create the table
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

CREATE TABLE IF NOT EXISTS app_trace(
    trace_id string,
    client_ip string,
    user_device string,
    user_id string,
    user_account string,
    org_id string,
    org_name string,
    org_path string,
    org_parent_id string,
    url string,
    completed boolean,
    cost int,
    create_time bigint,
    parameters map<string,string>,
    subtrace array<string>
)
PARTITIONED BY (app_id int, server_ip string, create_date string)
ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '|'
    COLLECTION ITEMS TERMINATED BY '$'
    MAP KEYS TERMINATED BY ':'
STORED AS SEQUENCEFILE;

-- load the data with dynamic partitioning
INSERT OVERWRITE TABLE app_trace PARTITION(app_id, server_ip, create_date)
SELECT trace_id, client_ip, user_device, user_id, user_account,
       org_id, org_name, org_path, org_parent_id, url,
       completed, cost, create_time, parameters, subtrace,
       app_id, server_ip, create_date
FROM user_trace;
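One caveat worth noting with this kind of load (unrelated to the disk failure below, just a guard rail): Hive caps how many dynamic partitions a single statement may create, and partitioning by app_id, server_ip and create_date over three months of data can exceed the defaults. A sketch of the relevant settings; the values shown are illustrative, not taken from the original job:

-- raise the dynamic partition caps if the combination count demands it
set hive.exec.max.dynamic.partitions=5000;         -- total per statement (default 1000)
set hive.exec.max.dynamic.partitions.pernode=500;  -- per mapper/reducer (default 100)

-- after the load, verify what was actually created
SHOW PARTITIONS app_trace;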
Hive error message:

Task with the most failures(4):
-----
Task ID:
task_1418272031284_0203_r_000071
URL:
http://HADOOP-5-101:8088/taskdetails.jsp?jobid=job_1418272031284_0203&tipid=task_1418272031284_0203_r_000071
-----
Diagnostic Messages for this Task:
Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in InMemoryMerger - Thread to merge in-memory shuffled map-outputs
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:221)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.hadoop.mapred.IFileOutputStream.write(IFileOutputStream.java:88)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:250)
at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:208)
at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl$InMemoryMerger.merge(MergeManagerImpl.java:476)
at org.apache.hadoop.mapreduce.task.reduce.MergeThread.run(MergeThread.java:94)
Caused by: java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:345)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:219)
... 11 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 282 Reduce: 80 Cumulative CPU: 12030.1 sec HDFS Read: 79178863622 HDFS Write: 15785449373 FAIL
Total MapReduce CPU Time Spent: 0 days 3 hours 20 minutes 30 seconds 100 msec
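The URL above points at the failing task attempt. If log aggregation is enabled, the same diagnostics can also be pulled from the shell; the application id is the job id with the job_ prefix swapped for application_:

yarn logs -applicationId application_1418272031284_0203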
Investigation turned up the following. The key clue in the stack trace is RawLocalFileSystem: the shuffle merge spills map outputs to the NodeManager's local disk, so "No space left on device" refers to local storage rather than HDFS.
- HDFS storage is healthy:

[jyzx@HADOOP-5-101 main_disk]$ hdfs dfs -df -h
Filesystem                 Size   Used     Available  Use%
hdfs://HADOOP-5-101:8020   8.9 T  625.9 G  7.8 T      7%
- The DataNode's local storage is almost full:

[jyzx@HADOOP-5-101 main_disk]$ df -h
Filesystem                    Size  Used  Avail  Use%  Mounted on
/dev/mapper/VolGroup-lv_root   50G   46G   837M   99%  /
tmpfs                         7.8G   56K   7.8G    1%  /dev/shm
/dev/cciss/c0d0p1             485M   32M   428M    7%  /boot
- The directory that actually filled the disk is /hadoop/yarn/local/usercache:
[root@HADOOP-6-199 local]# du -h --max-depth=1
4.0K ./usercache_DEL_1411698127772
4.0K ./usercache_DEL_1411700964513
4.0K ./usercache_DEL_1411713191383
4.0K ./usercache_DEL_1418272057670
4.0K ./usercache_DEL_1411699568217
628K ./filecache
4.0K ./usercache_DEL_1411713338641
7.2G ./usercache
4.0K ./usercache_DEL_1411698079868
4.0K ./usercache_DEL_1411713240205
104K ./nmPrivate
7.2G .
- /hadoop/yarn/local/usercache is the YARN NodeManager's local working directory:
yarn.nodemanager.local-dirs=/hadoop/yarn/local/usercache
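To double-check what the NodeManager is actually using and how full that mount is, something like the following works; the config path /etc/hadoop/conf is an assumption, adjust it to your install:

# inspect the configured value (config path is an assumption)
grep -A1 'yarn.nodemanager.local-dirs' /etc/hadoop/conf/yarn-site.xml

# see which mount backs the directory and how much space is left on it
df -h /hadoop/yarn/local/usercache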
Solution
- Point the YARN setting yarn.nodemanager.local-dirs at a volume with enough free space, e.g.:
- yarn.nodemanager.local-dirs=/mnt/disk1/hadoop/yarn/local/usercache
- Restart the YARN cluster so the NodeManagers pick up the new setting
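In yarn-site.xml the fix looks roughly like the sketch below. Note that yarn.nodemanager.local-dirs also accepts a comma-separated list of directories, so shuffle spill can be spread over several disks; the second path in the comment is purely illustrative:

<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/mnt/disk1/hadoop/yarn/local/usercache</value>
  <!-- multiple disks are also possible (second path illustrative):
       <value>/mnt/disk1/hadoop/yarn/local,/mnt/disk2/hadoop/yarn/local</value> -->
</property>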