How do I fix a MapReduce job where map succeeds but reduce fails?
I'm running hadoop-3.3.0 on Windows 10, launched from a cmd window.
The Python files themselves are definitely fine: I copied them from an online tutorial and they run successfully locally with this command: hadoop fs -cat /input dir | /mapper.py | sort | /reducer.py
It's a wordcount written in Python. I found that only the mapper runs; when map and reduce run together, the job hangs while executing reduce. The error record is below.
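For context, here is a minimal sketch of what such a streaming wordcount pair typically looks like. The asker's actual mapper.py/reducer.py were not posted, so the function names and structure below are assumptions, written as plain functions so the pipeline can be tested locally without Hadoop:

```python
#!/usr/bin/env python
# Minimal streaming-wordcount sketch. In Hadoop Streaming, the mapper and
# reducer each read text lines on stdin and emit "key<TAB>value" on stdout;
# here they are plain functions for local testing.

def map_lines(lines):
    """Mapper: emit 'word\t1' for every word in the input."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word

def reduce_lines(sorted_lines):
    """Reducer: sum counts of consecutive identical keys.
    Relies on the input being sorted by key, which the shuffle phase
    (or `sort` in the local pipe test) guarantees."""
    current, total = None, 0
    for line in sorted_lines:
        word, count = line.rsplit("\t", 1)
        if word == current:
            total += int(count)
        else:
            if current is not None:
                yield "%s\t%d" % (current, total)
            current, total = word, int(count)
    if current is not None:
        yield "%s\t%d" % (current, total)

if __name__ == "__main__":
    # Local equivalent of: cat input | mapper.py | sort | reducer.py
    demo = ["hello hadoop", "hello world"]
    for out in reduce_lines(sorted(map_lines(demo))):
        print(out)
```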
My yarn-site.xml configuration
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.auxservices.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>40960</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>2048</value>
</property>
</configuration>
My mapred-site.xml configuration
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>%HADOOP_HOME%/share/hadoop/mapreduce/*,%HADOOP_HOME%/share/hadoop/mapreduce/lib/*,%HADOOP_HOME%/share/hadoop/common/*,%HADOOP_HOME%/share/hadoop/common/lib/*,%HADOOP_HOME%/share/hadoop/yarn/*,%HADOOP_HOME%/share/hadoop/yarn/lib/*,%HADOOP_HOME%/share/hadoop/hdfs/*,%HADOOP_HOME%/share/hadoop/hdfs/lib/*</value>
</property>
</configuration>
My core-site.xml configuration
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://0.0.0.0:19000</value>
</property>
</configuration>
My hdfs-site.xml configuration
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///C:/hadoop/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/C:/hadoop/data/datanode</value>
</property>
</configuration>
1. Add #!/usr/bin/env python (or #!/usr/bin/python) at the top of your .py files.
2. Make sure the scripts are executable, e.g. chmod 744 mapper.py reducer.py
3. Check whether the scripts use any dependencies that are missing on the cluster.
4. Try putting the scripts under /usr/bin.
5. The spaces, tabs, or line endings in your reducer.py may be corrupted; try opening a new file in vim and retyping it.
Note: after a failed run, remove the output with hadoop fs -rm -r <your output path> (the older hadoop fs -rmr form is deprecated) before rerunning.
If that still doesn't work, see https://blog.csdn.net/leo_weile/article/details/79792837 and try the suggestions there.
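Point 5 above (bad spaces/tabs/newlines in reducer.py) very often comes down to Windows CRLF line endings: a stray '\r' left on each key means consecutive keys never compare equal, and a CRLF after the shebang line breaks #!/usr/bin/env python entirely. A defensive parsing helper (a sketch, not the asker's code) strips line endings before splitting:

```python
def parse_kv(line):
    """Split one streaming record into (key, value), tolerating CRLF.
    rstrip removes both '\r' and '\n' so keys compare equal regardless
    of whether the file was edited on Windows or Linux."""
    line = line.rstrip("\r\n")
    key, _, value = line.partition("\t")
    return key, value
```

For example, parse_kv("word\t1\r\n") returns ("word", "1"), whereas a reducer that only strips "\n" would be left comparing "1\r" against "1".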
1. Turn on debug logging first.
2. Did you really allocate 40 GB to the NodeManager? Does the machine have that much physical memory?
3. Or post your code, and we can look at the logic.
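For point 1, client-side debug output can be enabled with the standard HADOOP_ROOT_LOGGER variable before resubmitting the job. The commands below are a sketch for the asker's Windows cmd setup: the streaming jar path matches the hadoop-3.3.0 layout, and /input and /output are placeholder paths.

```shell
:: cmd syntax (on Linux use `export` instead of `set`)
set HADOOP_ROOT_LOGGER=DEBUG,console

:: On Windows there is no shebang handling, so invoke python explicitly;
:: -files ships both scripts to the cluster nodes.
hadoop jar %HADOOP_HOME%\share\hadoop\tools\lib\hadoop-streaming-3.3.0.jar ^
  -files mapper.py,reducer.py ^
  -input /input ^
  -output /output ^
  -mapper "python mapper.py" ^
  -reducer "python reducer.py"
```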