1) Summary
2) Prerequisites
   A. Supported Operating System
   B. Software
   C. Unique Host Name
3) Introduction to Cloudera Manager Installation
4) Preparation for Installation
   A. Networking
   B. Firewall and Security
      I. Configure SSH
      II. Check ssh_config File Settings
      III. Turn Off iptables
   C. Disable SELinux
   D. Proxy Setting
      1.1.1 In Terminal
   E. Host Entry
   F. Update OS Setup
   G. Download Cloudera Manager
5) Setting Up Cloudera Manager
   Step 1: Registration Document
   Step 2: Specify Hosts for Installation
   Step 3: Connecting to the Specified Hosts with SSH
   Step 4: Choose CDH Version
   Step 5: Provide SSH Login Credentials
      Figure 5: SSH/Login Credentials
   Step 6: Installation Done
      Figure 6: Installation on Nodes
   Step 7: Inspect Hosts for Correctness
      Figure 7: Check Hosts for Correctness
   Step 8: Choose Services to Install on the Cluster
      Figure 8: Choose Services to Install
   Step 9: Inspect Role Assignments
      Figure 9: Role Assignments
   Step 10: Review the Configuration
      Figure 10: Configuration for Your Cluster
   Step 11: Change the Default Administrator Password
   Step 12: Test the Installation
      Figure 11: All Services
1) Summary
The CDH4 Cluster Installation Guide is for Hadoop developers and system administrators interested in Hadoop cluster installation. The following sections describe how to install and configure version 4 of Cloudera's Hadoop distribution, CDH4.
This document helps you configure a multi-node Hadoop cluster and tune it for better performance. It also contains best practices for HBase and MapReduce configuration.
2) Prerequisites
A. Supported Operating System
CDH4 supports the following operating systems:
- Red Hat-compatible systems
  · Red Hat Enterprise Linux 5.7 and CentOS 5.7
  · Red Hat Enterprise Linux 6.2 and CentOS 6.2
  · Oracle Enterprise Linux 5.6 with Unbreakable Enterprise Kernel
- SLES systems
  · SUSE Linux Enterprise Server 11 (Service Pack 1 or later is required)
- Debian systems
  · Debian 6.0 (Squeeze)
- Ubuntu systems
  · Ubuntu 10.04
  · Ubuntu 12.04
Cloudera Manager supports only 64-bit operating systems.
B. Software
o Perl
o SSH:
  o openssh-server
  o openssh-clients
o Cloudera Manager
C. Unique Host Name
o Each host name must be unique on your network, for example node1.example.com.
3) Introduction to Cloudera Manager Installation
Cloudera Manager automates the installation
and configuration of CDH on an entire cluster, requiring only that you have
root SSH access to your cluster's machines, and access to the internet or a
local repository with installation files for all these machines.
Cloudera Manager consists of the following components:
· Cloudera Manager Server
· Cloudera Manager Agent
· PostgreSQL database
How Cloudera Manager works: during installation it does the following:
o Uses SSH to discover the cluster hosts you specify by IP address ranges or host names
o Configures the package repositories for Cloudera Manager, CDH, and the Oracle JDK
o Installs the Cloudera Manager Agent and CDH (including Hue) on the cluster hosts
o Installs the Oracle JDK if it is not already installed on the cluster hosts
o Determines the mapping of services to hosts
o Suggests a Hadoop configuration and starts the Hadoop services
You can also use Cloudera Manager to add nodes to and remove nodes from the cluster.
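4) Preparation for Installation
A. Networking
o Check the network and internet settings on every host.
o All cluster hosts must use the same DNS setup, with forward and reverse DNS properly configured: each host name must resolve to the host's IP address, and that address must resolve back to the same host name.
o Check that each host name follows a standard fully qualified format, for example node1.example.com.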
B. Firewall and Security
o The Cloudera Manager Server must have
SSH access to the cluster hosts when you run the installation wizard.
Note
You must log in using a root account or an account that has
password-less sudo permission. For authentication during the installation and
upgrade procedures, you will need to either enter the password or upload a
public and private key pair for the root or sudo user account.
Cloudera Manager uses SSH only during the initial install or
upgrade. Once your cluster is set up, you can safely disable root SSH access or
change the root password. Cloudera Manager does not save SSH credentials and
all credential information is discarded once the installation is complete.
I. Configure SSH
Steps:
a) ssh-keygen
b) cd /root/.ssh/
c) ls
d) cp id_rsa.pub authorized_keys
e) Concatenate the id_rsa.pub files of all hosts into authorized_keys and copy that file to the same location (/root/.ssh/) on every host, so that every machine can SSH to every other machine.
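Step e) can be scripted. The following is a minimal Bash sketch, assuming you have already collected each host's id_rsa.pub on one machine (for example with scp); the function name merge_authorized_keys is hypothetical and not part of Cloudera Manager. It concatenates the keys, drops duplicate lines, and restricts the result to mode 600, because sshd (with its default StrictModes setting) ignores an authorized_keys file that group or others can write.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Cloudera Manager): merge public-key files
# into a single authorized_keys file.
# Usage: merge_authorized_keys OUTPUT_FILE KEY_FILE...
merge_authorized_keys() {
  local out="$1"
  shift
  if [ "$#" -eq 0 ]; then
    echo "usage: merge_authorized_keys OUTPUT_FILE KEY_FILE..." >&2
    return 1
  fi
  # Concatenate every key file, drop blank lines, and remove duplicate keys.
  cat "$@" | grep -v '^[[:space:]]*$' | sort -u > "${out}.tmp" || return 1
  mv "${out}.tmp" "$out"
  # With sshd's default StrictModes, an authorized_keys file that group or
  # others can write is ignored, so keep it private to the owner.
  chmod 600 "$out"
}
```

For example, with the collected keys in a hypothetical /tmp/keys/ directory, run merge_authorized_keys /tmp/keys/authorized_keys /tmp/keys/*.pub and then copy the resulting file to /root/.ssh/authorized_keys on every host.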
II. Check ssh_config File Settings
When you work with multiple systems, ssh is the indispensable tool. With ssh you can log in to other (remote) systems and work with them as if you were sitting in front of them. Even if some of your systems are behind firewalls, you can still reach them with ssh, but getting there can require a number of command-line options, and the more systems you have, the harder they are to remember. You do not have to remember them, though, at least not more than once: you can enter them in ssh's config file and be done with it. For this installation, the setting to change is StrictHostKeyChecking, so that ssh does not stop and ask you to confirm the key of each new host.
Steps
1) vi /etc/ssh/ssh_config
2) Change StrictHostKeyChecking from ask to no (remove the leading # if the line is commented out):
StrictHostKeyChecking no
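Instead of turning off host-key checking for every destination, you can limit the setting to the cluster's domain. A minimal Bash sketch, assuming the cluster hosts are addressed by fully qualified names in the example.com domain used elsewhere in this guide; the function name add_cluster_ssh_config is hypothetical. It appends a Host block only if one is not already present, so running it twice is harmless. Keep in mind that StrictHostKeyChecking no weakens protection against spoofed hosts, and that ssh uses the first value it finds for each option, so the block only takes effect if no earlier block in the file already sets StrictHostKeyChecking.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn off strict host-key checking only for cluster hosts.
# Usage: add_cluster_ssh_config [CONFIG_FILE] [DOMAIN]
add_cluster_ssh_config() {
  local cfg="${1:-/etc/ssh/ssh_config}"
  local domain="${2:-example.com}"
  local header="Host *.${domain}"
  # Do nothing if a block for this domain already exists (idempotent).
  if [ -f "$cfg" ] && grep -qxF "$header" "$cfg"; then
    return 0
  fi
  # ssh applies the first value it finds for an option, so this only takes
  # effect if no earlier block in the file sets StrictHostKeyChecking.
  printf '\n%s\n    StrictHostKeyChecking no\n' "$header" >> "$cfg"
}
```

Try it on a copy first, for example cp /etc/ssh/ssh_config /tmp/ssh_config.test && add_cluster_ssh_config /tmp/ssh_config.test example.com, and inspect the result before changing the real file.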
III. Turn Off iptables
iptables is the administration tool for IPv4 packet filtering and NAT. You need the following tools:
[a] service runs a System V init script. Here it is used to save, stop, and start the firewall service.
[b] chkconfig updates and queries run-level information for system services. It is a system tool for maintaining the /etc/rc*.d hierarchy. Use it to disable the firewall service at boot time.
Steps
a) service iptables save
b) service iptables stop
c) chkconfig iptables off
d) /etc/init.d/network restart
C. Disable SELinux
Steps to disable SELinux:
o Edit the SELinux configuration file:
vi /etc/selinux/config
o Set:
SELINUX=disabled
The new setting takes effect after the host is rebooted.
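The same edit can be made non-interactively, which is convenient when preparing many hosts. A minimal Bash sketch; disable_selinux_config is a hypothetical helper that takes the file path as an argument so you can try it on a copy first, and it relies on GNU sed's -i option. It rewrites the SELINUX= line in place and leaves SELINUXTYPE= untouched.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set SELINUX=disabled in an SELinux config file.
# Usage: disable_selinux_config [CONFIG_FILE]
disable_selinux_config() {
  local cfg="${1:-/etc/selinux/config}"
  [ -f "$cfg" ] || { echo "no such file: $cfg" >&2; return 1; }
  if grep -q '^SELINUX=' "$cfg"; then
    # Replace the current value (enforcing, permissive, ...) in place.
    # The pattern does not match SELINUXTYPE= because "=" must follow SELINUX.
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
  else
    echo 'SELINUX=disabled' >> "$cfg"
  fi
}
```

Because the file is only read at boot, running setenforce 0 as root switches a running system to permissive mode until the reboot.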
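D. Proxy Setting
If the hosts reach the internet through a proxy server, check "Use this proxy server for all protocols" in the system proxy settings, and configure yum to use the same proxy.
1.1.1 In Terminal
sudo gedit /etc/yum.conf
Add the following lines to the [main] section:
# The proxy server - proxy server:port number
proxy=http://proxy1.xx.com:8080
# The account details for yum connections
proxy_username=<username>
proxy_password=<your password>
Then save the file and clean the yum cache:
sudo yum clean all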
E. Host Entry
This section describes the format of the /etc/hosts file. This file is a simple text file that associates IP addresses with host names, one line per IP address. You should have some subset of all host names in /etc/hosts, so that some name resolution is available even when no network interfaces are running, for example during boot. This is not only a matter of convenience: it also allows you to use symbolic host names in your network RC scripts. Thus, when IP addresses change, you only have to copy an updated hosts file to all machines and reboot, rather than edit a large number of RC files separately. Usually you put all local host names and addresses in hosts, adding those of any gateways and NIS servers used. For each host, a single line should be present with the IP address, the fully qualified host name, and the short host name. Edit the file and add the entries:
sudo gedit /etc/hosts
127.0.0.1 localhost localhost
10.xxx.xxx.xxx master.example.com master
10.xxx.xxx.xxx node01.example.com node01
10.xxx.xxx.xxx node02.example.com node02
10.xxx.xxx.xxx node03.example.com node03
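Because /etc/hosts mistakes are a common reason for the Host Inspector to complain later, it can help to check the file on each host before installing. A minimal Bash and awk sketch, assuming the layout shown above (IP address first, then one or more names); check_hosts_file is a hypothetical helper, not a Cloudera tool. It reports any host name that appears on more than one line, warns if no 127.0.0.1 line lists localhost, and returns a non-zero status when it finds a problem.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sanity-check an /etc/hosts-style file.
# Prints each problem found and returns 1 if there are any.
check_hosts_file() {
  local file="${1:-/etc/hosts}"
  awk '
    /^[ \t]*(#|$)/ { next }                    # skip comments and blank lines
    {
      sub(/#.*/, "")                           # drop trailing comments
      if (NF < 2) next                         # need an address and a name
      if ($1 == "127.0.0.1") {
        for (i = 2; i <= NF; i++) if ($i == "localhost") has_localhost = 1
      }
      for (i = 2; i <= NF; i++) {
        if (!($i in seen)) {
          seen[$i] = NR                        # first line that uses this name
        } else if (seen[$i] != NR) {
          printf "duplicate host name %s (lines %d and %d)\n", $i, seen[$i], NR
          bad = 1
        }
      }
    }
    END {
      if (!has_localhost) { print "missing 127.0.0.1 localhost line"; bad = 1 }
      exit bad
    }
  ' "$file"
}
```

Run check_hosts_file (or check_hosts_file /path/to/copy) on every host; no output and a zero exit status mean neither problem was found.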
F. Update OS Setup
Updating the OS is not required, but it is better to run the latest stable version.
To use the fastest mirror for updates, edit the fastestmirror plugin configuration:
vi /etc/yum/pluginconf.d/fastestmirror.conf
and set:
enabled=1
Then update the system:
yum -y update
G. Download Cloudera Manager
If you have curl installed, download the installer with curl, or click the download link. If the link does not work, download the latest stable free version from Cloudera.
Make the installer executable:
chmod +x cloudera-manager-installer.bin
Run it:
sudo ./cloudera-manager-installer.bin
Accept the license agreements and click OK.
After some time, Cloudera Manager will be installed.
The Cloudera Manager Admin Console is then available at:
http://master.example.com:7180
To start the Cloudera Manager Admin Console:
1. In a web browser, enter the URL, including the port, of the Cloudera Manager Server. The Cloudera Manager login screen appears.
2. Log in to Cloudera Manager.
The default credentials are:
Username: admin
Password: admin
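Because the installer takes some time, you may want to wait until the Admin Console is actually listening before opening the browser. A minimal Bash sketch; wait_for_port is a hypothetical helper that uses Bash's built-in /dev/tcp redirection (so it must run under bash, not sh) and the coreutils timeout command. It tries a TCP connection up to a given number of times, one second apart, and succeeds as soon as one connection is accepted. A successful connection only shows that the port is open, not that login works.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait until HOST:PORT accepts TCP connections.
# Usage: wait_for_port HOST PORT [MAX_TRIES]
wait_for_port() {
  local host="$1" port="$2" max="${3:-300}"
  local tries=0
  while [ "$tries" -lt "$max" ]; do
    # Bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT;
    # timeout guards against hosts that silently drop packets.
    if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
      return 0
    fi
    sleep 1
    tries=$((tries + 1))
  done
  echo "${host}:${port} not reachable after ${max} attempts" >&2
  return 1
}
```

For example: wait_for_port master.example.com 7180 300 && echo "Admin Console is listening on port 7180".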
5) Setting Up Cloudera Manager
Follow these steps:
Step 1: Registration Document
· You can register with Cloudera and click Submit Registration.
· Alternatively, just click Proceed.
Figure 1: Registration Document
Step 2: Specify Hosts for Installation
To enable Cloudera Manager to automatically discover the cluster hosts where you want to install CDH, enter the cluster host names or IP addresses and click Search.
You can also specify host name and IP address ranges:
Use This Expansion Range | To Specify These Hosts
10.1.1.[1-4]             | 10.1.1.1, 10.1.1.2, 10.1.1.3, 10.1.1.4
host[1-3].network.com    | host1.network.com, host2.network.com, host3.network.com
host[07-10].network.com  | host07.network.com, host08.network.com, host09.network.com, host10.network.com
You
can specify multiple addresses and address ranges by separating them by commas,
semicolons, tabs, or blank spaces, or by placing them on separate lines. Use
this technique to make more specific searches instead of searching overly wide
ranges.
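To preview which hosts a pattern covers before starting a scan, the expansion rule in the table above can be reproduced in a few lines. A minimal Bash sketch; expand_range is a hypothetical helper that handles a single [start-end] group per pattern and, as in the host[07-10] example, keeps the zero padding of the start value. It only illustrates the rule in the table; it is not the wizard's own parser.

```bash
#!/usr/bin/env bash
# Hypothetical helper: expand one "[start-end]" group in a host pattern.
# Usage: expand_range PATTERN   e.g. expand_range 'host[07-10].network.com'
expand_range() {
  local pattern="$1"
  local re='^(.*)\[([0-9]+)-([0-9]+)\](.*)$'
  if [[ ! "$pattern" =~ $re ]]; then
    echo "$pattern"            # no range: the pattern is a single host
    return 0
  fi
  local prefix="${BASH_REMATCH[1]}" start="${BASH_REMATCH[2]}"
  local end="${BASH_REMATCH[3]}" suffix="${BASH_REMATCH[4]}"
  # Zero-pad only if the start value has a leading zero (07 -> width 2).
  local width=1
  [[ "$start" == 0?* ]] && width=${#start}
  local i
  # 10# forces base 10 so values such as 08 and 09 are not read as octal.
  for ((i = 10#$start; i <= 10#$end; i++)); do
    printf '%s%0*d%s\n' "$prefix" "$width" "$i" "$suffix"
  done
}
```

Quote the patterns when calling the function so the shell does not treat the brackets as a file-name glob, for example: for p in '10.1.1.[1-4]' 'host[07-10].network.com'; do expand_range "$p"; done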
The
scan results will include all addresses scanned, but only scans that reach
hosts running SSH will be selected for inclusion in your cluster by default.
Note:
If you don't know the IP addresses of all of the hosts, you can enter an
address range that spans over unused addresses. Note that larger ranges will
require more time to scan.
You
can abort any actively running scan by clicking Abort Scan. To find
additional hosts after scanning completes, add or modify the hostname or IP
address ranges and click Search again.
Figure 2: Specify Hosts for Installation
Step 3: Connecting to the Specified Hosts with SSH
This step checks whether each specified host is reachable over SSH from the node on which Cloudera Manager is running. If a host cannot be reached from that node, it is shown as not reachable. If SSH is not running on the host, the message "Could not connect to host" is shown. The response time of each host is also displayed.
Figure 3: Connecting to the Specified Hosts with SSH
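A similar reachability check can be run by hand from the Cloudera Manager Server host before starting the wizard. A minimal Bash sketch; check_ssh_hosts is a hypothetical helper, and the SSH_PORT variable (default 22) is an assumption of this sketch. It only tests whether the SSH port accepts a TCP connection within five seconds and reports the connection time; it does not log in, so authentication problems only show up in Step 5. It needs bash (for /dev/tcp), coreutils timeout, and GNU date (for %N).

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether the SSH port on each host accepts connections.
# Usage: check_ssh_hosts HOST...   (set SSH_PORT to use a port other than 22)
check_ssh_hosts() {
  local port="${SSH_PORT:-22}" host start elapsed failed=0
  for host in "$@"; do
    start=$(date +%s%N)
    if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
      elapsed=$(( ($(date +%s%N) - start) / 1000000 ))
      echo "${host}: reachable (${elapsed} ms)"
    else
      echo "${host}: not reachable"
      failed=1
    fi
  done
  return "$failed"
}
```

For example: check_ssh_hosts master.example.com node01.example.com node02.example.com node03.example.com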
Step 4: Choose CDH Version
This screen shows the available CDH versions. It also supports offline installation from a custom repository.
Figure 4: Choose CDH Version
Step 5: Provide SSH Login Credentials
To authenticate with the hosts, you must either use a root account that exists on all of your cluster hosts, or use an account that has password-less sudo permissions. Select root, or enter the user name of an account that has password-less sudo permissions. You can use either a shared password for the account or a public and private key pair.
To enter a password, select All hosts accept same password and enter the account password. To use a public and private key pair, select All hosts accept same public key, then specify or browse for the location of the public and private keys. If your keys are protected by a passphrase, enter it.
Step 6: Installation Done
The
wizard runs a maximum of 10 installations in parallel to avoid excessive
network load. The status of installation on each host is displayed on the page
that appears after you click Start
Installation. You can also click the Details
link for individual hosts to view detailed information about the installation
and error messages if installation fails on any hosts.
If you
click the Abort Installation button
while installation is in progress, it will halt any pending or in-progress
installations and roll back any in-progress installations to a clean state. The
Abort Installation button does not
affect host installations that have already completed successfully or already
failed.
If
installation fails on a host, you can click the Retry link next to the failed host to try installation on that host
again. To retry installation on all failed hosts, click Retry Failed Hosts at the bottom of the screen.
When
the Continue button appears at the
bottom of the screen, the installation process is complete.
If the
installation has completed successfully on some hosts but failed on others, you
can click Continue if you want to
skip installation on the failed hosts and continue to the next screen to start
installing the Cloudera Management services on the successful hosts.
Step 7: Inspect Hosts for Correctness
When you continue, the Host Inspector runs to validate the installation and provides a summary of what it finds, including the versions of all installed components. If the validation is successful, click Continue.
Errors you may see:
Error 1: Clock is not synchronized. Synchronize the clock on each host with the Cloudera Manager Server node (for example, by running NTP on all hosts).
Error 2: /etc/hosts file error. Correct the entries in the /etc/hosts file.
Error 3: Localhost error. The localhost line is missing from the /etc/hosts file; add it.
Step 8: Choose Services to Install on the Cluster
Choose the services you want to start on your cluster.
· Choose which version of CDH to use.
· Choose the combination of services to install: Core Hadoop, HBase Services, All Services, or Custom Services.
Some services depend on others; for example, HBase requires HDFS and ZooKeeper. Most of the combinations install MapReduce v1. Choose the custom option to install MapReduce v2 (YARN), or use the Add Service functionality to add YARN after installation completes.
Step 9: Inspect Role Assignments
Click Inspect Role Assignments to see how the wizard will assign roles for the services you have chosen, and change them if you need to. These assignments are typically acceptable, but you can reassign services to nodes of your choosing. The wizard evaluates the hardware configurations of the cluster hosts to determine the best machines for each role. For example, it assigns the NameNode role to the machine that best meets the NameNode requirements. The wizard also configures other options, such as the number of map and reduce slots for each TaskTracker, based on the size of the cluster and the physical characteristics of each machine, such as the number of CPUs, amount of RAM, and disk space.
Click Continue when you are satisfied with the assignments.
Step 10: Review the Configuration
Review the configuration changes to be applied.
· Confirm the settings entered for file system paths. The file paths required vary based on the services to be installed. For example, you might confirm the NameNode Data Directory and the DataNode Data Directory for HDFS, or the TaskTracker Local Data Directory List or JobTracker Local Data Directory for MapReduce.
· Click Continue.
· The wizard starts the services on your cluster.
· When all of the services are started, click Continue.
· On the next screen, click Continue.
Step 11: Change the Default Administrator Password
· As soon as possible after running the wizard and beginning to use Cloudera Manager, change the default administrator password.
· To change the administrator password:
  · Click the gear icon (top-right corner) to display the Administration page.
  · Click the Users tab.
  · Click the Change Password button next to the admin account.
  · Enter a new password twice and then click Submit.
Step 12: Test the Installation