Setting up Hadoop2-YARN in Pseudo-Distributed Mode
There are already plenty of articles online about setting up a YARN development environment. I consulted many of them while building mine and found that quite a few have problems; in particular, none gives a detailed account of how to run WordCount from Eclipse. This article is a summary of my own attempt at building a YARN pseudo-distributed development environment. Questions and discussion are welcome, thanks!
1. System environment
Memory: 3G
CentOS6.3 x86-64
jdk-6u37-linux-x64.bin
hadoop-2.0.2-alpha.tar.gz
The Java environment variables should already be configured.
2. Configure hosts, IP, and SSH authentication
[kevin@linux-fdc ~]$ cat /etc/hosts
127.0.0.1 localhost localhost.localdomain
::1 localhost6 localhost6.localdomain6
192.168.81.251 linux-fdc.tibco.com linux-fdc
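The section title mentions SSH authentication, but the original does not show the commands. The standard passwordless-login setup for the local host looks like this (a sketch; key type and paths are the usual defaults):

```shell
# Generate an RSA key pair for the hadoop account, with an empty passphrase
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
# Authorize the public key for logins to this same host
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# Verify: this should log in and exit without prompting for a password
ssh localhost exit
```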
3. Create a Hadoop account
(1) Create the group and account (the group must exist before `useradd -g` can use it)
groupadd kevin
useradd -g kevin -d /home/kevin -m kevin
(2) Set the password
passwd kevin
(3) Delete an account (if needed; see the built-in help)
userdel --help
groupdel --help
(4) View accounts and groups
cat /etc/group
cat /etc/passwd
4. Extract hadoop-2.0.2-alpha.tar.gz
Extract hadoop-2.0.2-alpha.tar.gz to /usr/custom/hadoop-2.0.2-alpha.
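A possible way to do this (assuming the tarball is in the current directory and you have root privileges; adjust the owner to your own account):

```shell
# Create the target directory and extract the distribution into it
mkdir -p /usr/custom
tar -zxvf hadoop-2.0.2-alpha.tar.gz -C /usr/custom
# Make the Hadoop account the owner so the daemons can write logs etc.
chown -R kevin:kevin /usr/custom/hadoop-2.0.2-alpha
```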
5. Configure the Hadoop environment variables
export HADOOP_HOME=/usr/custom/hadoop-2.0.2-alpha
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_LIB=$HADOOP_HOME/lib
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
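These exports can go in, for example, ~/.bashrc (an assumption; any login profile works). After sourcing it, a quick sanity check:

```shell
# Reload the profile and confirm the Hadoop binaries are on PATH
source ~/.bashrc
hadoop version          # should report Hadoop 2.0.2-alpha
echo $HADOOP_CONF_DIR   # should point at /usr/custom/hadoop-2.0.2-alpha/etc/hadoop
```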
6. Configure Hadoop
(1) core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
    <final>true</final>
    <description>The name of the default file system. A URI whose scheme and
    authority determine the FileSystem implementation. The uri's scheme
    determines the config property (fs.SCHEME.impl) naming the FileSystem
    implementation class. The uri's authority is used to determine the host,
    port, etc. for a filesystem.</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/kevin/workspace-yarn/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>io.native.lib.available</name>
    <value>true</value>
    <description>Should native hadoop libraries, if present, be used.</description>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
    <final>true</final>
    <description>The size of buffer for use in sequence files. The size of
    this buffer should probably be a multiple of hardware page size (4096 on
    Intel x86), and it determines how much data is buffered during read and
    write operations.</description>
  </property>
</configuration>
(2) hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/kevin/workspace-yarn/dfs/name</value>
    <final>true</final>
    <description>Determines where on the local filesystem the DFS name node
    should store the name table (fsimage). If this is a comma-delimited list
    of directories then the name table is replicated in all of the
    directories, for redundancy.</description>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/kevin/workspace-yarn/dfs/data</value>
    <final>true</final>
    <description>Determines where on the local filesystem a DFS data node
    should store its blocks. If this is a comma-delimited list of directories,
    then data will be stored in all named directories, typically on different
    devices. Directories that do not exist are ignored.</description>
  </property>
  <property>
    <name>dfs.namenode.edits.dir</name>
    <value>/home/kevin/workspace-yarn/dfs/edits</value>
    <description>Determines where on the local filesystem the DFS name node
    should store the transaction (edits) file. If this is a comma-delimited
    list of directories then the transaction file is replicated in all of the
    directories, for redundancy. Default value is the same as dfs.name.dir.</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication. The actual number of replications
    can be specified when the file is created. The default is used if
    replication is not specified at create time.</description>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
    <description>If "true", enable permission checking in HDFS. If "false",
    permission checking is turned off, but all other behavior is unchanged.
    Switching from one parameter value to the other does not change the mode,
    owner or group of files or directories.</description>
  </property>
</configuration>
(3) mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>The runtime framework for executing MapReduce jobs. Can be
    one of local, classic or yarn.</description>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/home/kevin/workspace-yarn/history/stagingdir</value>
    <description>YARN requires a staging directory for temporary files created
    by running jobs. By default it creates /tmp/hadoop-yarn/staging with
    restrictive permissions that may prevent your users from running jobs. To
    forestall this, you should configure and create the staging directory
    yourself.</description>
  </property>
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>100</value>
    <description>The total amount of buffer memory to use while sorting files,
    in megabytes. By default, gives each merge stream 1MB, which should
    minimize seeks.</description>
  </property>
  <property>
    <name>mapreduce.task.io.sort.factor</name>
    <value>10</value>
    <description>More streams merged at once while sorting files.</description>
  </property>
  <property>
    <name>mapreduce.reduce.shuffle.parallelcopies</name>
    <value>5</value>
    <description>Higher number of parallel copies run by reduces to fetch
    outputs from very large number of maps.</description>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>1024</value>
    <description>The amount of memory available on the NodeManager, in MB
    (default: 8192).</description>
  </property>
</configuration>
(4) yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
    <description>Shuffle service that needs to be set for Map Reduce
    applications.</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    <description>The exact name of the class for shuffle service.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>linux-fdc.tibco.com:8030</value>
    <description>ResourceManager host:port for ApplicationMasters to talk to
    the Scheduler to obtain resources. Host is the hostname of the
    resourcemanager and port is the port on which the Applications in the
    cluster talk to the Resource Manager.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>linux-fdc.tibco.com:8031</value>
    <description>ResourceManager host:port for NodeManagers. Host is the
    hostname of the resource manager and port is the port on which the
    NodeManagers contact the Resource Manager.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>linux-fdc.tibco.com:8032</value>
    <description>The address of the applications manager interface in the
    RM.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>linux-fdc.tibco.com:8033</value>
    <description>ResourceManager host:port for administrative commands. The
    address of the RM admin interface.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>linux-fdc.tibco.com:8088</value>
    <description>The address of the RM web application.</description>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/home/kevin/workspace-yarn/nm/local</value>
    <description>Specifies the directories where the NodeManager stores its
    localized files. All of the files required for running a particular YARN
    application will be put here for the duration of the application run.
    This must be configured: otherwise the NodeManager stays in the Unhealthy
    state and cannot serve requests; the symptom is that submitted jobs hang
    in the pending state and never make progress.</description>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/home/kevin/workspace-yarn/nm/log</value>
    <description>Specifies the directories where the NodeManager stores
    container log files. This must be configured: otherwise the NodeManager
    stays in the Unhealthy state and cannot serve requests; the symptom is
    that submitted jobs hang in the pending state and never make
    progress.</description>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/home/kevin/workspace-yarn/aggrelog</value>
    <description>Specifies the directory where logs are aggregated.</description>
  </property>
</configuration>
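All of the directories referenced in the configs above live under /home/kevin/workspace-yarn. Creating them up front (a precaution, not from the original; HDFS creates some of them itself) avoids the Unhealthy-NodeManager problem mentioned in the descriptions:

```shell
# Pre-create every local directory the four config files point at
mkdir -p /home/kevin/workspace-yarn/tmp
mkdir -p /home/kevin/workspace-yarn/dfs/name /home/kevin/workspace-yarn/dfs/data /home/kevin/workspace-yarn/dfs/edits
mkdir -p /home/kevin/workspace-yarn/nm/local /home/kevin/workspace-yarn/nm/log
mkdir -p /home/kevin/workspace-yarn/aggrelog /home/kevin/workspace-yarn/history/stagingdir
```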
7. Add JAVA_HOME to hadoop-env.sh
# The java implementation to use.
export JAVA_HOME=/usr/custom/jdk1.6.0_37
8. Format HDFS
bin/hdfs namenode -format
9. Start HDFS
sbin/start-dfs.sh
or
sbin/hadoop-daemon.sh start namenode
sbin/hadoop-daemon.sh start datanode
10. Start YARN
sbin/start-yarn.sh
or
sbin/yarn-daemon.sh start resourcemanager
sbin/yarn-daemon.sh start nodemanager
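With both HDFS and YARN up, the JDK's jps tool should list all five daemons (PIDs and ordering will differ on your machine):

```shell
jps
# Expected daemons in a healthy pseudo-distributed setup:
#   NameNode
#   DataNode
#   SecondaryNameNode
#   ResourceManager
#   NodeManager
```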
11. Check the cluster
(1) ResourceManager web UI: http://192.168.81.251:8088/
(2) NameNode: http://localhost:50070/dfshealth.jsp
(3) SecondaryNameNode: http://192.168.81.251:50090/status.jsp
12. Running the example WordCount.java in Eclipse
Because my machine is short on memory, running the wordcount example from hadoop-mapreduce-examples-2.0.2-alpha.jar directly with the hadoop jar command threw a Java heap space error during the Map phase. After importing the WordCount code into Eclipse, running the example the Hadoop v1 way also hit quite a few problems. The steps that finally worked for me are recorded below, for reference:
(1) Start the RM, NM, NN, DN, and SNN
(2) Upload the test file student.txt to HDFS
Create the /input directory: hadoop fs -mkdir /input
Upload the file: hadoop fs -put /home/kevin/Documents/student.txt /input/student.txt
Check the result:
[kevin@linux-fdc ~]$ hadoop fs -ls -d -R /input/student.txt
Found 1 items
-rw-r--r-- 1 kevin supergroup 131 2013-01-19 10:30 /input/student.txt
(3) In Eclipse, open Run Configurations..., switch to the Arguments tab, and enter the following under Program arguments:
hdfs://localhost:9000/input hdfs://localhost:9000/output
(4) Run log
2013-01-19 10:43:01,088 WARN conf.Configuration (Configuration.java:warnOnceIfDeprecated(816)) - session.id is deprecated. Instead, use dfs.metrics.session-id
2013-01-19 10:43:01,095 INFO jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
2013-01-19 10:43:01,599 WARN util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2013-01-19 10:43:01,682 WARN mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(247)) - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2013-01-19 10:43:01,734 INFO input.FileInputFormat (FileInputFormat.java:listStatus(245)) - Total input paths to process : 1
2013-01-19 10:43:01,817 WARN snappy.LoadSnappy (LoadSnappy.java:<clinit>(46)) - Snappy native library not loaded
2013-01-19 10:43:02,155 INFO mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(368)) - number of splits:1
2013-01-19 10:43:02,256 WARN conf.Configuration (Configuration.java:warnOnceIfDeprecated(816)) - mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
2013-01-19 10:43:02,257 WARN conf.Configuration (Configuration.java:warnOnceIfDeprecated(816)) - mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
2013-01-19 10:43:02,257 WARN conf.Configuration (Configuration.java:warnOnceIfDeprecated(816)) - mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
2013-01-19 10:43:02,257 WARN conf.Configuration (Configuration.java:warnOnceIfDeprecated(816)) - mapred.job.name is deprecated. Instead, use mapreduce.job.name
2013-01-19 10:43:02,258 WARN conf.Configuration (Configuration.java:warnOnceIfDeprecated(816)) - mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
2013-01-19 10:43:02,258 WARN conf.Configuration (Configuration.java:warnOnceIfDeprecated(816)) - mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
2013-01-19 10:43:02,258 WARN conf.Configuration (Configuration.java:warnOnceIfDeprecated(816)) - mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
2013-01-19 10:43:02,258 WARN conf.Configuration (Configuration.java:warnOnceIfDeprecated(816)) - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2013-01-19 10:43:02,259 WARN conf.Configuration (Configuration.java:warnOnceIfDeprecated(816)) - mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
2013-01-19 10:43:02,264 WARN conf.Configuration (Configuration.java:warnOnceIfDeprecated(816)) - mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
2013-01-19 10:43:02,545 INFO mapreduce.JobSubmitter (JobSubmitter.java:printTokens(438)) - Submitting tokens for job: job_local_0001
2013-01-19 10:43:02,678 WARN conf.Configuration (Configuration.java:loadProperty(2028)) - file:/home/kevin/workspace-eclipse/example-hadoop/build/test/mapred/staging/kevin-1414338785/.staging/job_local_0001/job.xml:an attempt to override final parameter: hadoop.tmp.dir; Ignoring.
2013-01-19 10:43:02,941 WARN conf.Configuration (Configuration.java:loadProperty(2028)) - file:/home/kevin/workspace-eclipse/example-hadoop/build/test/mapred/local/localRunner/job_local_0001.xml:an attempt to override final parameter: hadoop.tmp.dir; Ignoring.
2013-01-19 10:43:02,948 INFO mapreduce.Job (Job.java:submit(1222)) - The url to track the job: http://localhost:8080/
2013-01-19 10:43:02,950 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1267)) - Running job: job_local_0001
2013-01-19 10:43:02,951 INFO mapred.LocalJobRunner (LocalJobRunner.java:createOutputCommitter(320)) - OutputCommitter set in config null
2013-01-19 10:43:02,986 INFO mapred.LocalJobRunner (LocalJobRunner.java:createOutputCommitter(338)) - OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2013-01-19 10:43:03,173 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(386)) - Waiting for map tasks
2013-01-19 10:43:03,173 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(213)) - Starting task: attempt_local_0001_m_000000_0
2013-01-19 10:43:03,278 INFO mapred.Task (Task.java:initialize(565)) - Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@40be76c7
2013-01-19 10:43:03,955 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1288)) - Job job_local_0001 running in uber mode : false
2013-01-19 10:43:03,974 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1295)) - map 0% reduce 0%
2013-01-19 10:43:03,975 INFO mapred.MapTask (MapTask.java:setEquator(1130)) - (EQUATOR) 0 kvi 26214396(104857584)
2013-01-19 10:43:03,979 INFO mapred.MapTask (MapTask.java:<init>(926)) - mapreduce.task.io.sort.mb: 100
2013-01-19 10:43:03,979 INFO mapred.MapTask (MapTask.java:<init>(927)) - soft limit at 83886080
2013-01-19 10:43:03,979 INFO mapred.MapTask (MapTask.java:<init>(928)) - bufstart = 0; bufvoid = 104857600
2013-01-19 10:43:03,979 INFO mapred.MapTask (MapTask.java:<init>(929)) - kvstart = 26214396; length = 6553600
2013-01-19 10:43:04,528 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) -
2013-01-19 10:43:04,569 INFO mapred.MapTask (MapTask.java:flush(1392)) - Starting flush of map output
2013-01-19 10:43:04,569 INFO mapred.MapTask (MapTask.java:flush(1411)) - Spilling map output
2013-01-19 10:43:04,570 INFO mapred.MapTask (MapTask.java:flush(1412)) - bufstart = 0; bufend = 195; bufvoid = 104857600
2013-01-19 10:43:04,570 INFO mapred.MapTask (MapTask.java:flush(1414)) - kvstart = 26214396(104857584); kvend = 26214336(104857344); length = 61/6553600
2013-01-19 10:43:04,729 INFO mapred.MapTask (MapTask.java:sortAndSpill(1600)) - Finished spill 0
2013-01-19 10:43:04,734 INFO mapred.Task (Task.java:done(979)) - Task:attempt_local_0001_m_000000_0 is done. And is in the process of committing
2013-01-19 10:43:05,077 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) - map
2013-01-19 10:43:05,078 INFO mapred.Task (Task.java:sendDone(1099)) - Task 'attempt_local_0001_m_000000_0' done.
2013-01-19 10:43:05,078 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(238)) - Finishing task: attempt_local_0001_m_000000_0
2013-01-19 10:43:05,078 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(394)) - Map task executor complete.
2013-01-19 10:43:05,155 INFO mapred.Task (Task.java:initialize(565)) - Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@63f8247d
2013-01-19 10:43:05,182 INFO mapred.Merger (Merger.java:merge(549)) - Merging 1 sorted segments
2013-01-19 10:43:05,206 INFO mapred.Merger (Merger.java:merge(648)) - Down to the last merge-pass, with 1 segments left of total size: 143 bytes
2013-01-19 10:43:05,206 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) -
2013-01-19 10:43:05,487 WARN conf.Configuration (Configuration.java:warnOnceIfDeprecated(816)) - mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2013-01-19 10:43:05,789 INFO mapred.Task (Task.java:done(979)) - Task:attempt_local_0001_r_000000_0 is done. And is in the process of committing
2013-01-19 10:43:05,792 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) -
2013-01-19 10:43:05,792 INFO mapred.Task (Task.java:commit(1140)) - Task attempt_local_0001_r_000000_0 is allowed to commit now
2013-01-19 10:43:05,840 INFO output.FileOutputCommitter (FileOutputCommitter.java:commitTask(432)) - Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://localhost:9000/output/_temporary/0/task_local_0001_r_000000
2013-01-19 10:43:05,840 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) - reduce > reduce
2013-01-19 10:43:05,840 INFO mapred.Task (Task.java:sendDone(1099)) - Task 'attempt_local_0001_r_000000_0' done.
2013-01-19 10:43:06,001 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1295)) - map 100% reduce 100%
2013-01-19 10:43:07,002 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1306)) - Job job_local_0001 completed successfully
2013-01-19 10:43:07,063 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1313)) - Counters: 32
	File System Counters
		FILE: Number of bytes read=496
		FILE: Number of bytes written=315196
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=262
		HDFS: Number of bytes written=108
		HDFS: Number of read operations=15
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=4
	Map-Reduce Framework
		Map input records=8
		Map output records=16
		Map output bytes=195
		Map output materialized bytes=158
		Input split bytes=104
		Combine input records=16
		Combine output records=11
		Reduce input groups=11
		Reduce shuffle bytes=0
		Reduce input records=11
		Reduce output records=11
		Spilled Records=22
		Shuffled Maps =0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=3
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0
		Total committed heap usage (bytes)=327155712
	File Input Format Counters
		Bytes Read=131
	File Output Format Counters
		Bytes Written=108
(5) Run result
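To inspect the job output on HDFS, one would typically run the following (the /output path comes from the program arguments above; part-r-00000 is the standard name of a single reducer's output file):

```shell
# List the output directory, then print the word counts
hadoop fs -ls /output
hadoop fs -cat /output/part-r-00000
```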