Well folks,
Here I will give you step-by-step procedures to install and configure Hadoop (version 1.1.0) on Linux (a Debian-based distro) as a single-node cluster. This guide is for beginners, and you need to be logged in to your Linux machine as the root user.
Step 1: First, download the Hadoop release tarball from the following URL
http://apache.techartifact.com/mirror/hadoop/common/hadoop-1.1.1/hadoop-1.1.1.tar.gz
Open a terminal
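If you prefer to fetch the tarball from the terminal instead of a browser, wget works (assuming it is installed on your system):
# wget http://apache.techartifact.com/mirror/hadoop/common/hadoop-1.1.1/hadoop-1.1.1.tar.gz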
# cd <to directory where you downloaded hadoop>
# mv hadoop-1.1.0.tar.gz /usr/local/
# cd /usr/local/
# tar zxvf hadoop-1.1.0.tar.gz
With the above commands you have moved the Hadoop tarball to /usr/local and uncompressed it there, giving /usr/local/hadoop-1.1.0. (If the mirror gives you a different release, e.g. 1.1.1, substitute that version number in the file and directory names throughout this guide.)
Step 2: Hadoop is a standalone Java-based application, so it requires Java 1.6 as a dependency, which you need to install yourself (if not already installed).
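On a Debian-based distro, installing OpenJDK 6 as root is usually enough; the exact package name can vary by release, so treat this as an example rather than the only option:
# apt-get install openjdk-6-jdk
You can then verify the installation with
# java -version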
Step 3: Next, add a dedicated user for Hadoop
# adduser hadoop
It prompts you to enter a password and a few other details:
Adding user `hadoop' ...
Adding new group `hadoop' (1001) ...
Adding new user `hadoop' (1001) with group `hadoop' ...
Creating home directory `/home/hadoop' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for hadoop
Enter the new value, or press ENTER for the default
Full Name []:
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n] Y
Step 4: Change the configuration files
Before we configure, type the following to identify your Java home:
# which java
If, for example, the output is
/usr/bin/java
then your JAVA_HOME is /usr
Now,
# cd /usr/local/hadoop-1.1.0/
# cd conf/
# vi hadoop-env.sh
Find the following line
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
and replace it with
export JAVA_HOME=/usr/
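If you prefer not to edit the file in vi, a sed one-liner along these lines should make the same change (this assumes the default commented-out JAVA_HOME line shown above is still present, and that your JAVA_HOME is /usr as in the example):
# sed -i 's|^# export JAVA_HOME=.*|export JAVA_HOME=/usr|' hadoop-env.sh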
Next paste the following content into the file core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:8020</value>
</property>
</configuration>
Next paste the following content into the file hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Next paste the following content into the file mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:8021</value>
</property>
</configuration>
Next, check whether the following entry exists as the first line of /etc/hosts; if not, add it:
127.0.0.1 localhost <your host name>
where <your host name> is the hostname of your machine. You can find the hostname by running
# hostname
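For example, if the hostname command returns vignesh (the hostname that appears in the sample output later in this guide), the line would be:
127.0.0.1 localhost vignesh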
Step 5: Give the hadoop user ownership of the Hadoop folder
# cd /usr/local/
# chown -R hadoop hadoop-1.1.0
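To confirm the change, you can list the directory; the third column of the output should now show hadoop as the owner:
# ls -ld hadoop-1.1.0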
Step 6: Format the HDFS file system (NameNode and DataNode)
# cd /usr/local/hadoop-1.1.0/bin
# su hadoop
# ./hadoop namenode -format
It provides information like
12/10/19 12:00:20 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = java.net.UnknownHostException: vignesh: vignesh
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.1.0
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.1 -r 1394289; compiled by 'hortonfo' on Thu Oct 4 22:06:49 UTC 2012
************************************************************/
12/10/19 12:00:20 INFO util.GSet: VM type = 64-bit
12/10/19 12:00:20 INFO util.GSet: 2% max memory = 17.77875 MB
12/10/19 12:00:20 INFO util.GSet: capacity = 2^21 = 2097152 entries
12/10/19 12:00:20 INFO util.GSet: recommended=2097152, actual=2097152
12/10/19 12:00:21 INFO namenode.FSNamesystem: fsOwner=hadoop
12/10/19 12:00:21 INFO namenode.FSNamesystem: supergroup=supergroup
12/10/19 12:00:21 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/10/19 12:00:21 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
12/10/19 12:00:21 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/10/19 12:00:21 INFO namenode.NameNode: Caching file names occuring more than 10 times
12/10/19 12:00:21 INFO common.Storage: Image file of size 112 saved in 0 seconds.
12/10/19 12:00:21 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/tmp/hadoop-hadoop/dfs/name/current/edits
12/10/19 12:00:21 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/tmp/hadoop-hadoop/dfs/name/current/edits
12/10/19 12:00:21 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
12/10/19 12:00:21 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at java.net.UnknownHostException: vignesh: vignesh
************************************************************/
Similarly, format the DataNode:
# ./hadoop datanode -format
(The NameNode format above is the essential step; DataNode storage is initialised automatically the first time the daemon starts, so if this command only prints usage information it can safely be skipped.)
Step 7: Set up passwordless SSH for the hadoop user
# ssh-keygen -t rsa -P ""
Press Enter when it prompts
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
and it generates the key as
Created directory '/home/hadoop/.ssh'.
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
f7:e3:1d:e6:2d:7d:23:2f:64:ea:1c:77:99:26:af:e0 hadoop@vignesh
The key's randomart image is:
+--[ RSA 2048]----+
| |
| |
| |
| |
| S . |
| . . o o|
| o*oo* |
| oo+B*+o|
| .E..B++|
+-----------------+
# cat /home/hadoop/.ssh/id_rsa.pub > /home/hadoop/.ssh/authorized_keys
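If the passwordless login below still asks for a password, loose permissions on the key files are a common culprit; tightening them usually helps:
# chmod 700 /home/hadoop/.ssh
# chmod 600 /home/hadoop/.ssh/authorized_keys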
# ssh hadoop@localhost
Type "yes" if it prompts as below:
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is 7e:4a:40:b5:57:06:0d:83:34:58:80:80:c3:e7:18:20.
Are you sure you want to continue connecting (yes/no)?
After this it logs you in as the hadoop user, and you have successfully configured passwordless SSH.
Now type
# exit
Use the above command only once, so that you leave the SSH session but are still logged in as the hadoop user.
Step 8: Start the Hadoop services (you should still be in /usr/local/hadoop-1.1.0/bin)
# ./start-all.sh
It starts five services:
NameNode
SecondaryNameNode
DataNode
JobTracker
TaskTracker
You can check whether the services are running with
# jps
You should see something like this; if not, something has gone wrong:
26207 TaskTracker
26427 Jps
25847 DataNode
25986 SecondaryNameNode
26089 JobTracker
25738 NameNode
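If one of the daemons is missing from the jps output, its log file under the logs directory of the Hadoop installation usually explains why. The log file names follow the pattern hadoop-<user>-<daemon>-<hostname>.log; for example (vignesh here is just the hostname from the sample output above):
# less /usr/local/hadoop-1.1.0/logs/hadoop-hadoop-datanode-vignesh.log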
Open
http://localhost:50030
for the Hadoop Map/Reduce administration page (optional)
Open
http://localhost:50070
for browsing the HDFS file system (optional)
Step 9: Try the following commands
# ./hadoop dfsadmin -report
This command gives you a status report on your HDFS system.
# ./hadoop fs -mkdir test
This command creates a directory "test" in your HDFS file system.
# vi test_input
In the text editor, type
"hi all hello all"
then save and exit the file.
# ./hadoop fs -put test_input test/input
This command copies the file (test_input) that we just created into the HDFS file system (as test/input inside the test folder).
# ./hadoop fs -ls test
This command lists all files in the folder "test" of the HDFS file system.
# ./hadoop jar ../hadoop-examples-1.1.0.jar wordcount test/input test/output
This command runs a MapReduce program (word count) on your input and generates output in "test/output" of the HDFS file system. (The examples jar is named after your Hadoop version, so adjust the file name if you installed a different release.)
You can check the output at the following URL
http://localhost:50070
Browse the filesystem -> user -> hadoop -> test -> output -> part-r-00000
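Alternatively, you can print the result straight from the terminal. For the sample input above ("hi all hello all"), the word counts should look something like this:
# ./hadoop fs -cat test/output/part-r-00000
all     2
hello   1
hi      1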
Step 10: To stop Hadoop (optional)
# ./stop-all.sh
Here ends our step-by-step guide to getting started with Hadoop (for beginners).