Learning Hadoop: Environment Setup

Reference:
http://hadoop.apache.org/common/docs/stable/single_node_setup.html

Prerequisites:
1) A Java runtime environment, with JAVA_HOME set.
2) The ssh tools installed.

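1. Download a Hadoop release from http://www.apache.org/dyn/closer.cgi/hadoop/common/ and unpack it.
2. Set the HADOOP_HOME environment variable to the unpacked directory and add $HADOOP_HOME/bin to PATH.
3. Edit $HADOOP_HOME/conf/hadoop-env.sh and set JAVA_HOME in that file.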
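
Step 2 could look roughly like this in a shell profile. This is only a sketch: /opt/hadoop and the JAVA_HOME path are placeholder locations (assumptions, not values from these notes), so substitute your own.

```shell
# Hypothetical locations; replace with where Hadoop was unpacked and Java is installed.
export HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop}"
export JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/default-java}"

# Put the hadoop launcher scripts on PATH, but only if they are not there yet.
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) ;;                      # already present, nothing to do
  *) export PATH="$PATH:$HADOOP_HOME/bin" ;;
esac
```

The same kind of `export JAVA_HOME=...` line is what step 3 puts into hadoop-env.sh.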

Run the hadoop command in a terminal. If the installation and settings are correct, it prints Hadoop's usage help.
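
A small check along these lines can tell a PATH problem apart from a broken install. The function name check_hadoop is made up for this sketch; `hadoop version` is the launcher's own subcommand.

```shell
# Report whether the hadoop launcher is reachable through PATH.
check_hadoop() {
  if command -v hadoop >/dev/null 2>&1; then
    hadoop version            # prints the installed release
  else
    echo "hadoop not found on PATH; check HADOOP_HOME and PATH" >&2
    return 1
  fi
}
```

Call it as `check_hadoop` after opening a new terminal, so that the profile changes are picked up.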

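Hadoop is configured through XML files. core-site.xml sets properties of the Common component, hdfs-site.xml sets HDFS properties, and mapred-site.xml sets MapReduce properties.
Hadoop has three run modes:
1. Standalone (local) mode
This is the default. It is a non-distributed mode that needs no daemons; everything runs in a single JVM, which makes it suitable for development and debugging.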
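
To try standalone mode, the grep example from the single-node setup guide can be wrapped as below. Treat it as a sketch: the function name is invented here, HADOOP_HOME is assumed to point at the unpacked release, and the examples jar name differs between releases, so the glob may need adjusting.

```shell
# Runs the bundled grep example on local files; no daemons are required.
# The body runs in a subshell, so the caller's working directory is untouched.
run_standalone_grep() (
  command -v hadoop >/dev/null 2>&1 || { echo "hadoop not on PATH" >&2; exit 1; }
  cd "$HADOOP_HOME" || exit 1
  mkdir -p input
  cp conf/*.xml input/                  # use the config files as sample input
  # Assumption: exactly one examples jar matches this glob in the release directory.
  hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
  cat output/*                          # results are ordinary local files
)
```

The job fails if the output directory already exists, so remove it before running the example a second time.
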
2. Pseudo-distributed mode
In this mode, Hadoop runs on a cluster simulated on the local machine.
1) Configuration:

core-site.xml:
<configuration>
     <property>
         <name>fs.default.name</name>
         <value>hdfs://localhost</value>
     </property>
</configuration>

hdfs-site.xml:
<configuration>
     <property>
         <name>dfs.replication</name>
         <value>1</value>
     </property>
</configuration>

mapred-site.xml:
<configuration>
     <property>
         <name>mapred.job.tracker</name>
         <value>localhost:8021</value>
     </property>
</configuration>
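
Since all three files have the same shape, they can also be generated from a script. The helper below is a sketch, not part of Hadoop: write_site_xml and CONF_DIR are names invented here, and by default it writes into a scratch directory so that a real conf directory is not overwritten.

```shell
# write_site_xml FILE NAME VALUE
# Writes a Hadoop configuration file that contains a single property.
write_site_xml() {
  cat > "$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$2</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}

# Assumption: set CONF_DIR="$HADOOP_HOME/conf" to write the real files;
# otherwise a temporary directory is used.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
write_site_xml "$CONF_DIR/core-site.xml"   fs.default.name    hdfs://localhost
write_site_xml "$CONF_DIR/hdfs-site.xml"   dfs.replication    1
write_site_xml "$CONF_DIR/mapred-site.xml" mapred.job.tracker localhost:8021
```

The helper only fits files that hold one property each; anything larger is easier to edit by hand.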

2) Run $ ssh localhost. If it asks for a password, run the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
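
If ssh still asks for a password after this, file permissions are a common cause: with its default StrictModes setting, sshd ignores an authorized_keys file that other users can write to. A possible fix, with fix_ssh_perms being a helper name made up for this sketch:

```shell
# fix_ssh_perms [DIR]
# Tightens permissions on the ssh directory (default: ~/.ssh) and its authorized_keys.
fix_ssh_perms() {
  dir="${1:-$HOME/.ssh}"
  [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
  chmod 700 "$dir"
  [ -f "$dir/authorized_keys" ] && chmod 600 "$dir/authorized_keys"
  return 0
}

# Usage (commented out so that loading this file changes nothing):
# fix_ssh_perms
# ssh -o BatchMode=yes localhost true && echo "passwordless ssh works"
```

BatchMode=yes makes ssh fail instead of prompting, which makes it usable as a yes/no check.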

3) Run
Format HDFS:  $ hadoop namenode -format
Start Hadoop: $ start-all.sh
Stop Hadoop:  $ stop-all.sh
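
After start-all.sh, the JDK's jps tool lists the running Java processes, which gives a quick way to see whether every daemon came up. The filter below is a sketch; missing_daemons is a name invented here, and the daemon list assumes a Hadoop 1.x single-node setup.

```shell
# missing_daemons: reads `jps` output on stdin and prints each expected daemon
# that is not listed. Prints nothing when all five are running.
missing_daemons() {
  seen=$(cat)
  for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    # jps prints "<pid> <ClassName>"; match the whole class name at end of line,
    # so that SecondaryNameNode does not count as NameNode.
    printf '%s\n' "$seen" | grep -q " $d\$" || echo "$d"
  done
}

# Usage: jps | missing_daemons
```

When a daemon is reported as missing, its log file under $HADOOP_HOME/logs is the place to look.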

3. Fully distributed cluster mode (Fully-Distributed Operation)
See http://hadoop.apache.org/common/docs/stable/cluster_setup.html

