Setting up Hadoop 0.22 on Fedora 8
This is a rough walkthrough of configuring Hadoop on Fedora 8, done as a simulated cluster on three VMware virtual machines, all running Fedora 8:
http://download.****.net/detail/fzxy002763/5064976
Step 1: the JDK. My Java was installed as part of the Fedora 8 installation. Because the later configuration needs its path, we first have to check where the default Java lives; on my machine the default path is /usr/lib/jvm/java-1.7.0-icedtea-1.7.0.0.
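If you are not sure where Java landed, Fedora's alternatives system can tell you. A minimal sketch (the last command follows the symlink chain to the real JDK directory):

which java                    # usually /usr/bin/java
alternatives --display java   # lists the registered Java installations
readlink -f /usr/bin/java     # resolves the symlinks to the actual install path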
Step 2: Edit /etc/hosts; vi /etc/hosts will do. Below is the master's configuration:
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1      master localhost.localdomain localhost
::1            localhost6.localdomain6 localhost6
192.168.1.200  master
192.168.1.201  slave1
192.168.1.202  slave2
In my experiment I set up three machines, one master and two slaves. The names at the end of each line are the hostnames, which we will need when configuring SSH. Each slave only needs entries for itself and the master; for example, slave1's configuration:
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1      master localhost.localdomain localhost
::1            localhost6.localdomain6 localhost6
192.168.1.200  master
192.168.1.201  slave1
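To confirm the names resolve, a quick check (a sketch using the hostnames above; the slaves only carry entries for themselves and the master, so only test the directions that were configured):

# from the master
ping -c 1 slave1
ping -c 1 slave2
# from each slave
ping -c 1 master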
It is also best to change each machine's hostname; changing the hostname on Fedora 8 is a bit fiddly, see http://blog.****.net/fzxy002763/article/details/8581667
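For reference, on Fedora of that era this boiled down to roughly the following (a sketch for the master; the linked post covers the quirks):

hostname master     # takes effect for the current session
# and for it to persist across reboots, /etc/sysconfig/network should contain:
HOSTNAME=master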
Step 3: Configure SSH. The main task is adding SSH keys so the machines can connect to each other over SSH without passwords. First generate an SSH key pair; everything below is done on the master.
ssh-keygen -t rsa
PS: where the keys end up depends on the user you run this as: for a user abc, look under /home/abc; if you run it as root, look under /root.
Then open the .ssh folder at that path, /home/abc/.ssh in my case, where you can see id_rsa.pub (the public key) and id_rsa (the private key). Next:
cat /home/abc/.ssh/id_rsa.pub >> /home/abc/.ssh/authorized_keys
chmod 644 /home/abc/.ssh/authorized_keys
I have written the paths out absolutely here; adjust them to your own setup. Then cat authorized_keys shows something like the following:
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA4LbQI+NkkoQx0ARrtKj0Rz2LadwyzHgp64PHJQmf/m457EMl5nSlTGYz3ZtlnZ0cMLwN3d05/yBjl7s3kUtZQP2lnjKdhqPoMowcTSZmaLkdD3qMnZ3UP7Ckryd2IulQ4VJm7X2Or14sDJmVl9P4V6OzsJOVDQV8G5SpJGtAMHD64jfd4nGlUgyMzPXOlYf1Tlf53DoCfq1Mk+x/S9pDz96onWN9TEOGutkcSAyzG3i8Glf/kIsMB9Mo5Jh+nhhiMG1u00LKHiIj566BpYeo/0lqlJ2I9gzHAUqg5mUjEXbbhOv6ER3wQC0J3aWXQNMW09R61t3DGzFqmiJMixhtjw== abc@master
Now try logging into master over ssh, as follows:
ssh master
If you can log in without being asked for a password, the configuration works.
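If you are still prompted for a password at this point, one common culprit (my own note, not part of the original steps) is that sshd refuses keys when the .ssh directory itself is too permissive; tightening it may help:

chmod 700 /home/abc/.ssh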
Step 4: Distribute the SSH keys. We want to configure SSH so that each slave can log directly into itself and the master, and the master can log into every slave without a password. Configure it as follows.
The master distributes its key to slave1 and slave2:
scp /home/abc/.ssh/id_rsa.pub abc@slave1:/home/abc/.ssh/id_rsa.pub.master
scp /home/abc/.ssh/id_rsa.pub abc@slave2:/home/abc/.ssh/id_rsa.pub.master
Once slave1 has received the key, it appends it to its own authorized_keys; slave2 is configured the same way as slave1:
cat /home/abc/.ssh/id_rsa.pub.master >> /home/abc/.ssh/authorized_keys
chmod 644 /home/abc/.ssh/authorized_keys
Then test whether the master can log into slave1 and slave2:
ssh slave1
ssh slave2
PS: if the connection fails here, the configuration is probably incomplete. The keys must be distributed in every required direction: master to slave1 and slave2, slave1 to master, and slave2 to master (the reverse direction mirrors the steps above; see the sketch right after this note). The final result can be checked with cat authorized_keys on each machine, as below.
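For the reverse direction, a sketch of the slave1 side (assuming the same abc user everywhere and that ssh-keygen has already been run on slave1; slave2 is analogous):

# on slave1: send slave1's public key to the master
scp /home/abc/.ssh/id_rsa.pub abc@master:/home/abc/.ssh/id_rsa.pub.slave1
# then on the master: append it
cat /home/abc/.ssh/id_rsa.pub.slave1 >> /home/abc/.ssh/authorized_keys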
master
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA4LbQI+NkkoQx0ARrtKj0Rz2LadwyzHgp64PHJQmf/m457EMl5nSlTGYz3ZtlnZ0cMLwN3d05/yBjl7s3kUtZQP2lnjKdhqPoMowcTSZmaLkdD3qMnZ3UP7Ckryd2IulQ4VJm7X2Or14sDJmVl9P4V6OzsJOVDQV8G5SpJGtAMHD64jfd4nGlUgyMzPXOlYf1Tlf53DoCfq1Mk+x/S9pDz96onWN9TEOGutkcSAyzG3i8Glf/kIsMB9Mo5Jh+nhhiMG1u00LKHiIj566BpYeo/0lqlJ2I9gzHAUqg5mUjEXbbhOv6ER3wQC0J3aWXQNMW09R61t3DGzFqmiJMixhtjw== abc@master
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA11KrRCyloT7KqIlvny01vsRXtyXfQnCkGcxgDsfy2VZBadyiNGqIwUeRO7dNKfdZDCOduz9Mik5smKgHPCz4iYGcZNeMGfOtboLFXw4xUzonoYELYWr8EALnPUY48il2gJNkK9uX4wvCtQ26h468SILWPNpr+iCAJLccfcIbjMypW9zU5ecxolSrX7tCENSOAUXFEEXX5e34LMH6woJM4aypdJkK4CoNY2DVsXJuYipOK6rQMTLbeK0qg5J+qBRYbPLw0gIFPYhkU5O47/5ojPK69s4Xf2nTvrIN6aTcnWVsvBe9gvdNxFQVHOb4/THma4afiUxCYI2CHm0Srzgvhw== abc@slave1
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAoHC1W5TAeo4jknChbDCAine5xUv9rtuGIOVHHWT8amKMwLl0hsg5Zzrco/8xbqAz45rs//3TA37YrXJ50ucs7CM9Mk7CbebfNjwN5JFYaauTp/U9wf21kEo8h7NFHOMxjamt5HasVUyshloUCguqd4M/9OqjTSR29XauqwQEw9e9FAhBb0sjtwX00eG3WcFl84KE5qOlsql1Apz6G9DYTJauok00iKgvR/UCh0miPvIifTzZmRUtP/FLx/98PHJ5F/uhr+0ICiXcDMC+qPdJub74n7ufrjIQhAM164+khxNwR8rxj1X/vIIT4LWFFpENARyIOZGWcoMMxxR0Qn7k7w== abc@slave2
slave1
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA11KrRCyloT7KqIlvny01vsRXtyXfQnCkGcxgDsfy2VZBadyiNGqIwUeRO7dNKfdZDCOduz9Mik5smKgHPCz4iYGcZNeMGfOtboLFXw4xUzonoYELYWr8EALnPUY48il2gJNkK9uX4wvCtQ26h468SILWPNpr+iCAJLccfcIbjMypW9zU5ecxolSrX7tCENSOAUXFEEXX5e34LMH6woJM4aypdJkK4CoNY2DVsXJuYipOK6rQMTLbeK0qg5J+qBRYbPLw0gIFPYhkU5O47/5ojPK69s4Xf2nTvrIN6aTcnWVsvBe9gvdNxFQVHOb4/THma4afiUxCYI2CHm0Srzgvhw== abc@slave1
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA4LbQI+NkkoQx0ARrtKj0Rz2LadwyzHgp64PHJQmf/m457EMl5nSlTGYz3ZtlnZ0cMLwN3d05/yBjl7s3kUtZQP2lnjKdhqPoMowcTSZmaLkdD3qMnZ3UP7Ckryd2IulQ4VJm7X2Or14sDJmVl9P4V6OzsJOVDQV8G5SpJGtAMHD64jfd4nGlUgyMzPXOlYf1Tlf53DoCfq1Mk+x/S9pDz96onWN9TEOGutkcSAyzG3i8Glf/kIsMB9Mo5Jh+nhhiMG1u00LKHiIj566BpYeo/0lqlJ2I9gzHAUqg5mUjEXbbhOv6ER3wQC0J3aWXQNMW09R61t3DGzFqmiJMixhtjw== abc@master
slave2
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAoHC1W5TAeo4jknChbDCAine5xUv9rtuGIOVHHWT8amKMwLl0hsg5Zzrco/8xbqAz45rs//3TA37YrXJ50ucs7CM9Mk7CbebfNjwN5JFYaauTp/U9wf21kEo8h7NFHOMxjamt5HasVUyshloUCguqd4M/9OqjTSR29XauqwQEw9e9FAhBb0sjtwX00eG3WcFl84KE5qOlsql1Apz6G9DYTJauok00iKgvR/UCh0miPvIifTzZmRUtP/FLx/98PHJ5F/uhr+0ICiXcDMC+qPdJub74n7ufrjIQhAM164+khxNwR8rxj1X/vIIT4LWFFpENARyIOZGWcoMMxxR0Qn7k7w== abc@slave2
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA4LbQI+NkkoQx0ARrtKj0Rz2LadwyzHgp64PHJQmf/m457EMl5nSlTGYz3ZtlnZ0cMLwN3d05/yBjl7s3kUtZQP2lnjKdhqPoMowcTSZmaLkdD3qMnZ3UP7Ckryd2IulQ4VJm7X2Or14sDJmVl9P4V6OzsJOVDQV8G5SpJGtAMHD64jfd4nGlUgyMzPXOlYf1Tlf53DoCfq1Mk+x/S9pDz96onWN9TEOGutkcSAyzG3i8Glf/kIsMB9Mo5Jh+nhhiMG1u00LKHiIj566BpYeo/0lqlJ2I9gzHAUqg5mUjEXbbhOv6ER3wQC0J3aWXQNMW09R61t3DGzFqmiJMixhtjw== abc@master
Once the machines can ssh into one another without passwords, this step is complete.
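A quick way to verify all the required directions (a sketch, assuming the hostnames configured above):

# on the master: should print each remote hostname without a password prompt
for h in master slave1 slave2; do ssh $h hostname; done
# on each slave: should reach the master the same way
ssh master hostname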
Step 5: Configure Hadoop. Unpack the Hadoop tarball and add it to your shell environment by editing vi /home/abc/.bashrc and adding:
export HADOOP_HOME=/home/hadoop/hadoop-0.22.0
export PATH=$PATH:$HADOOP_HOME/bin
Then edit the masters and slaves files under /home/hadoop/hadoop-0.22.0/conf, as follows.
Edit masters:
master
Edit slaves:
slave1
slave2
Then edit hadoop-env.sh with vi /home/hadoop/hadoop-0.22.0/conf/hadoop-env.sh and add JAVA_HOME; what I add below is the default Java:
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-icedtea-1.7.0.0
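To confirm the path you put in JAVA_HOME is valid, a quick check (a sketch using my path; substitute your own):

/usr/lib/jvm/java-1.7.0-icedtea-1.7.0.0/bin/java -version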
Edit conf/core-site.xml, as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000/</value>
    <description></description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>fs.inmemory.size.mb</name>
    <value>10</value>
    <description>Larger amount of memory allocated for the in-memory file-system used to merge map-outputs at the reduces.</description>
  </property>
  <property>
    <name>io.sort.factor</name>
    <value>10</value>
    <description>More streams merged at once while sorting files.</description>
  </property>
  <property>
    <name>io.sort.mb</name>
    <value>10</value>
    <description>Higher memory-limit while sorting data.</description>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
    <description>Size of read/write buffer used in SequenceFiles.</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/storage/tmp/hadoop-${user.name}</value>
    <description></description>
  </property>
</configuration>
Then edit conf/hdfs-site.xml, as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/storage/name/a,/home/hadoop/storage/name/b</value>
    <description>Path on the local filesystem where the NameNode stores the namespace and transactions logs persistently.</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/storage/data/a,/home/hadoop/storage/data/b,/home/hadoop/storage/data/c</value>
    <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
  </property>
  <property>
    <name>dfs.block.size</name>
    <value>67108864</value>
    <description>HDFS blocksize of 64MB for large file-systems.</description>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>10</value>
    <description>More NameNode server threads to handle RPCs from large number of DataNodes.</description>
  </property>
</configuration>
Next, edit conf/mapred-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://master:19830/</value>
    <description>Host or IP and port of JobTracker.</description>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/home/hadoop/storage/mapred/system</value>
    <description>Path on the HDFS where the MapReduce framework stores system files. Note: This is in the default filesystem (HDFS) and must be accessible from both the server and client machines.</description>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/home/hadoop/storage/mapred/local</value>
    <description>Comma-separated list of paths on the local filesystem where temporary MapReduce data is written. Note: Multiple paths help spread disk i/o.</description>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>10</value>
    <description>The maximum number of Map tasks, which are run simultaneously on a given TaskTracker, individually. Note: Defaults to 2 maps, but vary it depending on your hardware.</description>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
    <description>The maximum number of Reduce tasks, which are run simultaneously on a given TaskTracker, individually. Note: Defaults to 2 reduces, but vary it depending on your hardware.</description>
  </property>
  <property>
    <name>mapred.reduce.parallel.copies</name>
    <value>5</value>
    <description>Higher number of parallel copies run by reduces to fetch outputs from very large number of maps.</description>
  </property>
  <property>
    <name>mapred.map.child.java.opts</name>
    <value>-Xmx128M</value>
    <description>Larger heap-size for child jvms of maps.</description>
  </property>
  <property>
    <name>mapred.reduce.child.java.opts</name>
    <value>-Xms64M</value>
    <description>Larger heap-size for child jvms of reduces.</description>
  </property>
  <property>
    <name>tasktracker.http.threads</name>
    <value>5</value>
    <description>More worker threads for the TaskTracker's http server. The http server is used by reduces to fetch intermediate map-outputs.</description>
  </property>
  <property>
    <name>mapred.queue.names</name>
    <value>default</value>
    <description>Comma separated list of queues to which jobs can be submitted. Note: The MapReduce system always supports at least one queue with the name as default. Hence, this parameter's value should always contain the string default. Some job schedulers supported in Hadoop, like the Capacity Scheduler (http://hadoop.apache.org/common/docs/stable/capacity_scheduler.html), support multiple queues. If such a scheduler is being used, the list of configured queue names must be specified here. Once queues are defined, users can submit jobs to a queue using the property name mapred.job.queue.name in the job configuration. There could be a separate configuration file for configuring properties of these queues that is managed by the scheduler. Refer to the documentation of the scheduler for information on the same.</description>
  </property>
  <property>
    <name>mapred.acls.enabled</name>
    <value>false</value>
    <description>Boolean, specifying whether checks for queue ACLs and job ACLs are to be done for authorizing users for doing queue operations and job operations. Note: If true, queue ACLs are checked while submitting and administering jobs and job ACLs are checked for authorizing view and modification of jobs. Queue ACLs are specified using the configuration parameters of the form mapred.queue.queue-name.acl-name, defined below under mapred-queue-acls.xml. Job ACLs are described at Job Authorization (http://hadoop.apache.org/common/docs/stable/mapred_tutorial.html#Job+Authorization).</description>
  </property>
</configuration>
Then distribute Hadoop to the slaves, as follows:
scp -r /home/hadoop/hadoop-0.22.0 abc@slave1:/home/hadoop
scp -r /home/hadoop/hadoop-0.22.0 abc@slave2:/home/hadoop
Then create the storage directory on the master and each slave, and the basic configuration is done:
mkdir /home/hadoop/storage
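The slave directories can also be created from the master without logging into each box, and the install given a quick sanity check (a sketch; it assumes the paths and user from the steps above):

# create the storage directory on both slaves from the master
for h in slave1 slave2; do ssh $h mkdir -p /home/hadoop/storage; done
# reload the environment and confirm the hadoop script is on the PATH
source /home/abc/.bashrc
hadoop version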
Most of this setup follows this article: http://blog.****.net/shirdrn/article/details/7166513