Hadoop and Spark Installation on Raspberry Pi-3 Cluster – Part-2
February 15, 2017
Cluster Architecture
- The master node connects to the home router via WiFi
- The master node connects to the slave nodes through an Ethernet switch via Ethernet cables
- From my Mac (on the same home network), I can SSH to the master node and control the whole cluster
For a Spark/Hadoop cluster, there are a few more TODOs we need to take care of:
- Update /etc/hosts on every node (master and slave) with the hostname and IP address of every other node (see the sample after this list)
- Use the same superuser and group for all installations on every node
- Enable SSH on every node and establish passwordless SSH from the master to every slave node (see the commands after this list)
- Install zip/unzip and Java on every node
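For example, assuming the master rpi3-0 sits at 192.168.2.1 (the SPARK_MASTER_IP used later in this post) and the slaves follow on the same subnet (the slave entries here are illustrative), /etc/hosts on every node would look like:
192.168.2.1 rpi3-0
192.168.2.2 rpi3-1
192.168.2.3 rpi3-2
Passwordless SSH from the master to each slave can be set up with the standard OpenSSH tools (run as hduser on the master; repeat ssh-copy-id for every slave):
ssh-keygen -t rsa -b 4096
ssh-copy-id hduser@rpi3-1
ssh-copy-id hduser@rpi3-2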
In this part we will cover the SINGLE NODE SETUP. I will be using my Mac to perform all these steps.
Step #1: Load the Raspbian image onto the MicroSD card
- Download SD Formatter from https://www.sdcard.org/downloads/formatter_4/eula_mac/index.html
- Format the SD card with SD Formatter
- Download Raspbian Lite from https://www.raspberrypi.org/ and flash it with the following commands
- diskutil list (this lists all disks; in my case the newly added SD card was /dev/disk4)
- diskutil unmountDisk /dev/disk4
- sudo dd bs=1m if=~/Downloads/2017-01-11-raspbian-jessie-lite.img of=/dev/rdisk4
Step #2: Configure the Pi, connect to WiFi, and apply the latest patches.
- sudo raspi-config
- Change the password
- Localization options (change the locale to en_US, the timezone to US-Eastern, and the WiFi country to US)
- Advanced Options – change the Mem Split from 64 to 16 (Raspbian Lite has no UI and a very small footprint, so the GPU needs little memory)
- Interfacing Options – Enable SSH
- sudo vi /etc/network/interfaces
- eth0 is the wired connection, which we will use for the cluster network
- wlan0 is the WiFi adapter on the board; configure WiFi so we can pull updates onto Raspbian Lite
- Change "manual" to "dhcp" so the interface gets its settings via DHCP, and add the SSID and password of the home router, i.e. change the line "iface wlan0 inet manual" to
iface wlan0 inet dhcp
    wpa-ssid "SSID/NETWORK_NAME"
    wpa-psk "PASSWORD"
- Restart the interface
- sudo ifdown wlan0
- sudo ifup wlan0
- The Raspberry Pi is now connected to WiFi
- Update and upgrade the Raspberry Pi
- sudo apt-get update
- sudo apt-get upgrade
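At this point the Pi can be run headless. As a quick check (using Raspbian's default pi user; the address placeholder below is whatever your router assigned), find the Pi's WiFi address and SSH in from the Mac:
ifconfig wlan0    (note the inet address)
ssh pi@<pi-ip-address>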
Step #3: Create a separate superuser and group
- We will use this user and group for all our core installations and configuration changes on all nodes
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
sudo adduser hduser sudo    (adds hduser to the sudo group)
su hduser
Step #4: Download and install the required software
- Download and install the zip utilities
- sudo apt-get install zip unzip
- Download and install java
- sudo apt-get install oracle-java7-jdk
- Download, install and configure Spark
wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz
sudo tar -xvzf spark-2.1.0-bin-hadoop2.7.tgz -C /opt/
sudo chown -R hduser /opt/spark-2.1.0-bin-hadoop2.7
source ~/.bashrc    (reloads SPARK_HOME; the .bashrc entries are added at the end of this step)
cp $SPARK_HOME/conf/spark-env.sh.template $SPARK_HOME/conf/spark-env.sh
Then add the following to $SPARK_HOME/conf/spark-env.sh:
SPARK_MASTER_IP=192.168.2.1
SPARK_WORKER_MEMORY=512m
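As a quick sanity check, you can run one of the examples bundled with the Spark distribution in local mode (run-example ships in $SPARK_HOME/bin):
$SPARK_HOME/bin/run-example SparkPi 10
It should print a rough estimate ("Pi is roughly 3.14...") once the job completes.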
- Download, install and configure Hadoop
wget http://apache.osuosl.org/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
sudo mkdir -p /opt
cd ~
sudo tar -xvzf hadoop-2.7.3.tar.gz -C /opt/
cd /opt
sudo chown -R hduser:hadoop hadoop-2.7.3/
Set JAVA_HOME in hadoop-env.sh:
sudo nano /opt/hadoop-2.7.3/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/jdk-7-oracle-arm-vfp-hflt/jre
- sudo nano /opt/hadoop-2.7.3/etc/hadoop/hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
- sudo nano /opt/hadoop-2.7.3/etc/hadoop/core-site.xml
<property>
  <name>fs.default.name</name>
  <value>hdfs://rpi3-0:54310</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/hdfs/tmp</value>
</property>
- sudo nano /opt/hadoop-2.7.3/etc/hadoop/mapred-site.xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>256</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx204m</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>102</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx102m</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>128</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx102m</value>
</property>
- sudo nano /opt/hadoop-2.7.3/etc/hadoop/yarn-site.xml
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>rpi3-0:8025</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>rpi3-0:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>rpi3-0:8050</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>4</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>128</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-vcores</name>
  <value>1</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>4</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
</property>
- Add environment variables to the bashrc file
sudo nano ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/jdk-7-oracle-arm-vfp-hflt/jre
export HADOOP_HOME=/opt/hadoop-2.7.3
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export SPARK_HOME=/opt/spark-2.1.0-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin
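Before starting anything, the /hdfs/tmp directory referenced by hadoop.tmp.dir in core-site.xml has to exist. As a minimal single-node sanity check, using the standard Hadoop 2.7 scripts now on the PATH, you can format HDFS, start the daemons, and confirm they are running:
sudo mkdir -p /hdfs/tmp
sudo chown hduser:hadoop /hdfs/tmp
hdfs namenode -format
start-dfs.sh
start-yarn.sh
jps    (should list NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager)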
Step #5: Create image from SD Card and clone it to all other PIs on the cluster
- Switch off the Raspberry Pi-3, remove the SD card, and plug it into the Mac
- Run the following commands
- diskutil list
- sudo dd if=/dev/disk4 of=~/Downloads/raspberrypi_base_with_hdp.dmg
- To clone the image onto the SD cards of the other nodes
- diskutil unmountDisk /dev/disk4
- sudo dd bs=1m if=~/Downloads/raspberrypi_base_with_hdp.dmg of=/dev/rdisk4
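Note that every clone boots with the same hostname as the master, so give each Pi a unique one before wiring up the cluster (the rpi3-1 name below is illustrative, following the rpi3-0 convention used in the Hadoop configs):
sudo nano /etc/hostname    (e.g. change rpi3-0 to rpi3-1)
sudo nano /etc/hosts    (update the 127.0.1.1 entry to match)
sudo reboot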
References:
- https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html
- https://dqydj.com/raspberry-pi-hadoop-cluster-apache-spark-yarn/