Here I will give you step-by-step procedures to install and configure Hadoop (version 1.1.0) on Linux (a Debian-based distro) as a single-node cluster. This guide is for beginners, and you need to be logged into your Linux machine as the root user.
With the commands above, you have moved the Hadoop archive to /usr/local and uncompressed it in /usr/local/.
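(For reference, those commands typically look something like the sketch below, assuming the Hadoop 1.1.0 tarball was downloaded to your current directory; adjust the filename if yours differs.)
# mv hadoop-1.1.0.tar.gz /usr/local/
# cd /usr/local/
# tar -xzf hadoop-1.1.0.tar.gz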
Hadoop is a Java-based application, so it requires Java 1.6 as a dependency, which you need to install yourself (if it is not already installed).
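On a Debian-based distro, Java 6 can usually be installed from the package manager; the package name below is the common one, but it may vary with your release:
# apt-get update
# apt-get install openjdk-6-jdk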
Step 4: Change the configuration files
Before we configure anything, type the following to identify your Java home:
# which java
If, for example, the output is
/usr/bin/java
Then
your JAVA_HOME is
/usr
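If you want to double-check where that java binary actually lives (it is often a symlink into /usr/lib/jvm), you can resolve it with the command below. Hadoop only needs $JAVA_HOME/bin/java to exist, so /usr works for the layout above.
# readlink -f $(which java)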
Now,
# cd /usr/local/hadoop-1.1.0/
# cd conf/
# vi hadoop-env.sh
Find the following line
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
and replace it with
export JAVA_HOME=/usr/
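To confirm the change was saved, a quick check:
# grep JAVA_HOME hadoop-env.sh
It should print the export line you just set.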
Next paste the following content into the file core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
Next paste the following content into the file hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Next paste the following content into the file mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>
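If you would rather write these files from the shell than paste into an editor, here is one possible sketch for core-site.xml (run from /usr/local/hadoop-1.1.0/conf; it simply overwrites the file with the contents shown above). Repeat the same pattern for hdfs-site.xml and mapred-site.xml.
# cat > core-site.xml <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
EOF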
Next, check whether the following content exists as the first line of /etc/hosts; if it does not, add it:
127.0.0.1 localhost <your host name>
Where,
<your host name> is the hostname of your machine.
You can find the hostname by running
# hostname
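One possible way to check and add the entry from the shell (GNU sed; it inserts the line at the top only when your hostname is not already mentioned anywhere in the file):
# grep "$(hostname)" /etc/hosts || sed -i "1i 127.0.0.1 localhost $(hostname)" /etc/hosts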
Step 5: Make the hadoop user the owner of the Hadoop folder
# cd /usr/local/
# chown -R hadoop hadoop-1.1.0
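This assumes the hadoop user was already created in an earlier step. You can verify the new ownership with:
# ls -ld /usr/local/hadoop-1.1.0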
Step 6: Format the HDFS filesystem (NameNode)
# cd /usr/local/hadoop-1.1.0/bin
# su hadoop
# ./hadoop namenode -format
It prints output like the following:
12/10/19 12:00:20 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = java.net.UnknownHostException: vignesh: vignesh
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.1.0
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.1 -r 1394289; compiled by 'hortonfo' on Thu Oct 4 22:06:49 UTC 2012
************************************************************/
12/10/19 12:00:20 INFO util.GSet: VM type = 64-bit
12/10/19 12:00:20 INFO util.GSet: 2% max memory = 17.77875 MB
12/10/19 12:00:20 INFO util.GSet: capacity = 2^21 = 2097152 entries
12/10/19 12:00:20 INFO util.GSet: recommended=2097152, actual=2097152
12/10/19 12:00:21 INFO namenode.FSNamesystem: fsOwner=hadoop
12/10/19 12:00:21 INFO namenode.FSNamesystem: supergroup=supergroup
12/10/19 12:00:21 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/10/19 12:00:21 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
12/10/19 12:00:21 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/10/19 12:00:21 INFO namenode.NameNode: Caching file names occuring more than 10 times
12/10/19 12:00:21 INFO common.Storage: Image file of size 112 saved in 0 seconds.
12/10/19 12:00:21 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/tmp/hadoop-hadoop/dfs/name/current/edits
12/10/19 12:00:21 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/tmp/hadoop-hadoop/dfs/name/current/edits
12/10/19 12:00:21 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
12/10/19 12:00:21 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at java.net.UnknownHostException: vignesh: vignesh
************************************************************/
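The log above shows the storage directory that was formatted; you can confirm it exists (it should contain files such as fsimage, edits and VERSION):
# ls /tmp/hadoop-hadoop/dfs/name/current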
Only the NameNode needs to be formatted; there is no separate format step for the DataNode, as it initializes its own storage directory automatically the first time it starts.
Step 7: Set up passwordless SSH for the hadoop user
# ssh-keygen -t rsa -P ""
Press Enter when it prompts
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
and it generates the key pair:
Created directory '/home/hadoop/.ssh'.
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
f7:e3:1d:e6:2d:7d:23:2f:64:ea:1c:77:99:26:af:e0 hadoop@vignesh
The key's randomart image is:
+--[ RSA 2048]----+
| |
| |
| |
| |
| S . |
| . . o o|
| o*oo* |
| oo+B*+o|
| .E..B++|
+-----------------+
# cat /home/hadoop/.ssh/id_rsa.pub > /home/hadoop/.ssh/authorized_keys
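SSH refuses key-based logins when the key files are too open, so it is worth tightening the permissions before testing (a standard OpenSSH requirement, not something Hadoop-specific):
# chmod 700 /home/hadoop/.ssh
# chmod 600 /home/hadoop/.ssh/authorized_keys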
# ssh hadoop@localhost
type "yes" if it prompts as below
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is 7e:4a:40:b5:57:06:0d:83:34:58:80:80:c3:e7:18:20.
Are you sure you want to continue connecting (yes/no)?
After this it logs you in as the hadoop user, and you have successfully configured passwordless SSH.
Now type
# exit
Type exit only once, because the ssh command above opened a second shell; after one exit you are still the hadoop user.
Step 8: Start Hadoop services
# ./start-all.sh
It should report that it is starting the NameNode, DataNode, Secondary NameNode, JobTracker, and TaskTracker; if not, you are facing some errors.
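Once the daemons are up, a quick way to verify them is jps, which ships with the JDK; on a working single-node setup it should list NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker (plus Jps itself):
# jps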
The next few commands exercise the cluster (a sketch of them follows below):
The first copies the file (test_input) that we just created into the HDFS filesystem (inside the test folder).
The second lists all files in the "test" folder of the HDFS filesystem.
The third runs a MapReduce program (word count) on your input and generates output in "test/output" of the HDFS filesystem.
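The commands themselves are not reproduced above, but they typically look like the sketch below (run from /usr/local/hadoop-1.1.0/bin as the hadoop user; test_input is assumed to be a small text file in the current directory, and the examples jar name may differ slightly in your copy):
# ./hadoop fs -mkdir test
# ./hadoop fs -put test_input test
# ./hadoop fs -ls test
# ./hadoop jar ../hadoop-examples-1.1.0.jar wordcount test/test_input test/output
# ./hadoop fs -cat test/output/part-r-00000
The last line prints the word counts produced by the job.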
Here ends our step-by-step guide to working with Hadoop (for beginners).