This blog gives you a straightforward, step-by-step way to configure Hadoop on your CentOS cluster.
My cluster consists of 6 nodes (1 master and 5 slaves).
The master node runs three daemons: JobTracker, NameNode and Secondary NameNode.
Each slave node runs a TaskTracker and a DataNode.
Steps are:
-creating a separate hadoop user on all nodes.
useradd hadoop
passwd *****
Log in as the hadoop user and follow the steps below on all the nodes.
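The user-creation step above can be sketched as one loop over all the nodes (node names are the ones used in this post; this is a dry run that only prints the commands, so drop the leading `echo` and run as root to actually execute):

```shell
# Dry run: print the user-creation command for every node in the cluster.
# Remove the leading 'echo' (and run as root) to actually execute.
nodes="HadoopMaster HadoopSlave1 HadoopSlave2 HadoopSlave3 HadoopSlave4 HadoopSlave5"
for n in $nodes; do
  echo ssh root@$n "useradd hadoop"
done
```

Set the password interactively on each node afterwards, as shown above.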
-change the hostname if you want.
in /etc/sysconfig/network set HOSTNAME=HadoopMaster (or HadoopSlave1..5)
-configure the /etc/hosts file on all nodes.
172.29.100.191 hadoopmaster.company.local HadoopMaster
172.29.100.126 hadoopslave1.company.local HadoopSlave1
172.29.100.106 hadoopslave2.company.local HadoopSlave2
172.29.100.178 hadoopslave3.company.local HadoopSlave3
172.29.100.199 hadoopslave4.company.local HadoopSlave4
172.29.100.140 hadoopslave5.company.local HadoopSlave5
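The entries above can be staged in one block and appended on every node; here (a sketch) they are written to a temp file first so you can review them before appending to /etc/hosts as root:

```shell
# Stage the cluster host entries in a temp file for review before
# appending them to /etc/hosts on each node (as root).
hosts_add=$(mktemp)
cat > "$hosts_add" <<'EOF'
172.29.100.191 hadoopmaster.company.local HadoopMaster
172.29.100.126 hadoopslave1.company.local HadoopSlave1
172.29.100.106 hadoopslave2.company.local HadoopSlave2
172.29.100.178 hadoopslave3.company.local HadoopSlave3
172.29.100.199 hadoopslave4.company.local HadoopSlave4
172.29.100.140 hadoopslave5.company.local HadoopSlave5
EOF
# then, on each node: cat "$hosts_add" >> /etc/hosts
```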
-configuring ssh access between all nodes
ssh-keygen -t dsa (run at the master as the hadoop user; accept the default file location)
ssh-copy-id -i /home/hadoop/.ssh/id_dsa.pub hadoop@HadoopSlave1 (copies the master's public key to the slave)
ssh-copy-id -i /home/hadoop/.ssh/id_dsa.pub hadoop@HadoopSlave2 (repeat for the remaining slave nodes)
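The per-slave commands generalize to a loop; this sketch only echoes them (a dry run) so you can review before running, since each `ssh-copy-id` will prompt for the slave's password:

```shell
# Dry run: print the ssh-copy-id command for every slave.
# Remove 'echo' to actually copy the master's public key.
cmds=$(for i in 1 2 3 4 5; do
  echo "ssh-copy-id -i /home/hadoop/.ssh/id_dsa.pub hadoop@HadoopSlave$i"
done)
echo "$cmds"
```

Passwordless ssh from master to every slave is what bin/start-all.sh relies on later, so verify it works (e.g. `ssh hadoop@HadoopSlave1 hostname` should not prompt).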
-installing hadoop (on all the nodes)
cd /home/hadoop
wget http://mirror.cloudera.com/apache/hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz
tar -xvzf hadoop-0.20.2.tar.gz (extract tar)
mv hadoop-0.20.2 hadoop
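An optional convenience (a sketch; the paths follow this post's layout): put Hadoop on the hadoop user's PATH, e.g. in ~/.bashrc on every node, so the bin/ scripts can be run from anywhere:

```shell
# Point HADOOP_HOME at the extracted directory and add its bin/ to PATH.
export HADOOP_HOME=/home/hadoop/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
echo "$HADOOP_HOME"
```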
-files to be modified in hadoop/conf directory (on all the nodes)
conf/core-site.xml
<property>
<name>fs.default.name</name>
<value>hdfs://HadoopMaster:9000/</value>
</property>
conf/hdfs-site.xml
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/hdfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
conf/mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>HadoopMaster:9001</value>
</property>
conf/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.6.0_18
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
conf/masters
HadoopMaster
conf/slaves
HadoopSlave1
HadoopSlave2
HadoopSlave3
HadoopSlave4
HadoopSlave5
Starting hadoop:
format the namenode if required by
-bin/hadoop namenode -format (on the master node only; this wipes any existing HDFS metadata)
-Note: if an error occurs, check the Hadoop logs (the log location is configured in hadoop-env.sh) and debug from there
-bin/start-all.sh (only on master node)
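After start-all.sh, `jps` on the master should show the three master daemons. A sketch of checking for them; `jps_out` below is illustrative sample output, on a real node substitute `jps_out=$(jps)`:

```shell
# Sample jps output for illustration; on a real node use: jps_out=$(jps)
jps_out="2101 NameNode
2245 SecondaryNameNode
2310 JobTracker
2450 Jps"
missing=""
for d in NameNode SecondaryNameNode JobTracker; do
  echo "$jps_out" | grep -q "$d" || missing="$missing $d"
done
[ -z "$missing" ] && echo "master daemons OK" || echo "missing:$missing"
```

On the slaves, the daemons to look for are TaskTracker and DataNode instead.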
After Hadoop, if you like, you can go through a simple HBase configuration too.
This is an HBase-managed ZooKeeper setup (the default configuration).
Changes made in the files are:
1) /home/hadoop/hbase-0.92.1/conf/hbase-env.sh
++ export HBASE_HOME=/home/hadoop/hbase-0.92.1
++ export HBASE_PID_DIR=/home/hadoop/var/hbase/pids (storing pids of hbase daemons)
++ export JAVA_HOME=/usr/java/jdk1.7.0_05/
#export HBASE_MANAGES_ZK=false (left commented out, because the default is true and we want HBase to manage ZooKeeper)
2) /home/hadoop/hbase-0.92.1/conf/hbase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://HadoopMaster:9000/hbase</value>
<description>The directory shared by RegionServers.</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>The mode the cluster will be in. Possible values are
false: standalone and pseudo-distributed setups with managed Zookeeper
true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)</description>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2222</value>
<description>Property from ZooKeeper's config zoo.cfg.
The port at which the clients will connect.
</description>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoopmaster.company.local</value>
<description>Comma separated list of servers in the ZooKeeper Quorum.
For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
By default this is set to localhost for local and pseudo-distributed modes
of operation. For a fully-distributed setup, this should be set to a full
list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
this is the list of servers which we will start/stop ZooKeeper on.
</description>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/var/zookeeper</value>
<description>Property from ZooKeeper's config zoo.cfg.
The directory where the snapshot is stored.
</description>
</property>
</configuration>
3) /home/hadoop/hbase-0.92.1/conf/regionservers
Added the name of region servers:
HadoopSlave1
HadoopSlave2
HadoopSlave3
HadoopSlave4
HadoopSlave5
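Since the regionservers file mirrors Hadoop's conf/slaves, it can be generated from the same slave numbering used throughout this post (a sketch; written to a temp file here, the real path is /home/hadoop/hbase-0.92.1/conf/regionservers):

```shell
# Generate the regionservers list (one slave hostname per line).
rs_file=$(mktemp)   # stand-in for hbase-0.92.1/conf/regionservers
for i in 1 2 3 4 5; do echo "HadoopSlave$i"; done > "$rs_file"
cat "$rs_file"
```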
Note: from HBase version 0.90 onwards, optional SASL authentication is available for communication, but I have skipped that functionality here.
Starting hbase:
MASTER--
./start-hbase.sh (starts the managed ZooKeeper, the HMaster and the regionservers)
Stopping hbase:
MASTER--
./stop-hbase.sh
REGIONSERVERS--
./hbase-daemon.sh stop regionserver
./hbase-daemon.sh stop zookeeper