Insufficient YARN Temp Directory Space Causes Hive Task Failure
Creating a new partitioned table from an existing Hive table ran into the following problem:
- The existing Hive table holds 160 GB of data (three months of user access records for all applications and servers)
- The new table keeps only the required fields and is partitioned by application / server IP / access date
-- create the table
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

CREATE TABLE IF NOT EXISTS app_trace(
    trace_id string,
    client_ip string,
    user_device string,
    user_id string,
    user_account string,
    org_id string,
    org_name string,
    org_path string,
    org_parent_id string,
    url string,
    completed boolean,
    cost int,
    create_time bigint,
    parameters map<string,string>,
    subtrace array<string>
)
PARTITIONED BY (app_id int, server_ip string, create_date string)
ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '|'
    COLLECTION ITEMS TERMINATED BY '$'
    MAP KEYS TERMINATED BY ':'
STORED AS SEQUENCEFILE;

-- load the data with dynamic partitioning
INSERT OVERWRITE TABLE app_trace PARTITION(app_id, server_ip, create_date)
SELECT trace_id, client_ip, user_device, user_id, user_account,
       org_id, org_name, org_path, org_parent_id, url,
       completed, cost, create_time, parameters, subtrace,
       app_id, server_ip, create_date
FROM user_trace;
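One caveat worth noting with this kind of load (unrelated to the disk failure below, just a guard rail): Hive caps how many dynamic partitions a single statement may create, and partitioning by app_id, server_ip and create_date over three months of data can exceed the defaults. A sketch of the relevant settings; the values shown are illustrative, not taken from the original job:

-- raise the dynamic partition caps if the combination count demands it
set hive.exec.max.dynamic.partitions=5000;         -- total per statement (default 1000)
set hive.exec.max.dynamic.partitions.pernode=500;  -- per mapper/reducer (default 100)

-- after the load, verify what was actually created
SHOW PARTITIONS app_trace;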
Hive error message:

Task with the most failures(4):
-----
Task ID:
task_1418272031284_0203_r_000071
URL:
http://HADOOP-5-101:8088/taskdetails.jsp?jobid=job_1418272031284_0203&tipid=task_1418272031284_0203_r_000071
-----
Diagnostic Messages for this Task:
Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in InMemoryMerger - Thread to merge in-memory shuffled map-outputs
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:221)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.hadoop.mapred.IFileOutputStream.write(IFileOutputStream.java:88)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:250)
at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:208)
at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl$InMemoryMerger.merge(MergeManagerImpl.java:476)
at org.apache.hadoop.mapreduce.task.reduce.MergeThread.run(MergeThread.java:94)
Caused by: java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:345)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:219)
... 11 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 282 Reduce: 80 Cumulative CPU: 12030.1 sec HDFS Read: 79178863622 HDFS Write: 15785449373 FAIL
Total MapReduce CPU Time Spent: 0 days 3 hours 20 minutes 30 seconds 100 msec
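The URL above points at the failing task attempt. If log aggregation is enabled, the same diagnostics can also be pulled from the shell; the application id is the job id with the job_ prefix swapped for application_:

yarn logs -applicationId application_1418272031284_0203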
Investigation turned up the following. The key clue in the stack trace is RawLocalFileSystem: the shuffle merge spills map outputs to the NodeManager's local disk, so "No space left on device" refers to local storage rather than HDFS.
- HDFS storage is healthy:

[jyzx@HADOOP-5-101 main_disk]$ hdfs dfs -df -h
Filesystem                 Size   Used     Available  Use%
hdfs://HADOOP-5-101:8020   8.9 T  625.9 G  7.8 T      7%
- The DataNode's local storage is almost full:

[jyzx@HADOOP-5-101 main_disk]$ df -h
Filesystem                    Size  Used  Avail  Use%  Mounted on
/dev/mapper/VolGroup-lv_root   50G   46G   837M   99%  /
tmpfs                         7.8G   56K   7.8G    1%  /dev/shm
/dev/cciss/c0d0p1             485M   32M   428M    7%  /boot
- The directory that actually filled the disk is /hadoop/yarn/local/usercache:
[root@HADOOP-6-199 local]# du -h --max-depth=1
4.0K ./usercache_DEL_1411698127772
4.0K ./usercache_DEL_1411700964513
4.0K ./usercache_DEL_1411713191383
4.0K ./usercache_DEL_1418272057670
4.0K ./usercache_DEL_1411699568217
628K ./filecache
4.0K ./usercache_DEL_1411713338641
7.2G ./usercache
4.0K ./usercache_DEL_1411698079868
4.0K ./usercache_DEL_1411713240205
104K ./nmPrivate
7.2G .
- /hadoop/yarn/local/usercache is the YARN NodeManager's local working directory:
yarn.nodemanager.local-dirs=/hadoop/yarn/local/usercache
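To double-check what the NodeManager is actually using and how full that mount is, something like the following works; the config path /etc/hadoop/conf is an assumption, adjust it to your install:

# inspect the configured value (config path is an assumption)
grep -A1 'yarn.nodemanager.local-dirs' /etc/hadoop/conf/yarn-site.xml

# see which mount backs the directory and how much space is left on it
df -h /hadoop/yarn/local/usercache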
Solution
- Point the YARN setting yarn.nodemanager.local-dirs at a volume with enough free space, e.g.:
- yarn.nodemanager.local-dirs=/mnt/disk1/hadoop/yarn/local/usercache
- Restart the YARN cluster so the NodeManagers pick up the new setting
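In yarn-site.xml the fix looks roughly like the sketch below. Note that yarn.nodemanager.local-dirs also accepts a comma-separated list of directories, so shuffle spill can be spread over several disks; the second path in the comment is purely illustrative:

<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/mnt/disk1/hadoop/yarn/local/usercache</value>
  <!-- multiple disks are also possible (second path illustrative):
       <value>/mnt/disk1/hadoop/yarn/local,/mnt/disk2/hadoop/yarn/local</value> -->
</property>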