Hadoop and Spark Installation on Raspberry Pi-3 Cluster – Part-2

Cluster Architecture

  • The master node will be connected to the home router via WiFi
  • Master-to-slave connections will be established through an Ethernet switch via Ethernet cables
  • From my Mac (which will be on my home network), I will be able to SSH to the master node and from there control the whole cluster

For the Spark/Hadoop cluster, there are a few more TODOs that we need to take care of:

  • Update /etc/hosts on every node (master and slaves) with the hostname and IP address of every other node (see the sketch after this list)
  • Use the same superuser and group for all installations on every node
  • Enable SSH on every node and establish passwordless SSH communication from the master to every slave node
  • Install zip/unzip and Java on every node
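
As a concrete example, here is roughly what the /etc/hosts and passwordless-SSH TODOs look like from the master. Only rpi3-0 and 192.168.2.1 appear in the configs later in this post; the other hostnames and addresses are my assumptions for a four-node cluster.

    # /etc/hosts entries, identical on every node (slave names/IPs assumed)
    192.168.2.1    rpi3-0    # master
    192.168.2.2    rpi3-1    # slave 1
    192.168.2.3    rpi3-2    # slave 2
    192.168.2.4    rpi3-3    # slave 3

    # passwordless SSH, run on the master as hduser
    ssh-keygen -t rsa                 # accept the defaults, empty passphrase
    ssh-copy-id hduser@localhost      # start-dfs.sh also SSHes locally
    ssh-copy-id hduser@rpi3-1
    ssh-copy-id hduser@rpi3-2
    ssh-copy-id hduser@rpi3-3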

In this part we will cover the SINGLE NODE SETUP. I will be using my Mac to perform all these steps.

Step #1: Load the Raspbian image onto the MicroSD card

  • Download SD Formatter from https://www.sdcard.org/downloads/formatter_4/eula_mac/index.html
  • Format the card with it
  • Download the Raspbian Lite OS from https://www.raspberrypi.org/ and flash it as follows:
    • diskutil list (this lists the newly added disk; in my case it was /dev/disk4)
    • diskutil unmountDisk /dev/disk4
    • sudo dd bs=1m if=~/Downloads/2017-01-11-raspbian-jessie-lite.img of=/dev/rdisk4
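
Once dd finishes, it is worth flushing the write cache and ejecting the card cleanly before moving it to the Pi:

    sync
    diskutil eject /dev/disk4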

Step #2: Configure the Pi, connect to WiFi and apply the latest updates.

  • sudo raspi-config
    • Change the password
    • Localisation Options (change the locale to en_US, the timezone to US/Eastern and the WiFi country to US)
    • Advanced Options – reduce the Memory Split from 64 to 16 (Raspbian Lite has no UI, so the GPU needs only a tiny share of memory)
    • Interfacing Options – Enable SSH
  • sudo vi /etc/network/interfaces
    • eth0 is the wired connection, which we will use for the cluster switch
    • wlan0 is the on-board WiFi adapter; configure it so that we can pull updates for Raspbian Lite over WiFi
    • Change manual to dhcp (this tells the interface to get its settings via DHCP) and add the SSID and password of the home router; a sketch of the whole file follows below
      • change the line “iface wlan0 inet manual” to
 
iface wlan0 inet dhcp
    wpa-ssid "SSID/NETWORK_NAME"
    wpa-psk "PASSWORD"
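
For reference, a minimal /etc/network/interfaces for the master might look like the sketch below. The static eth0 address matches the SPARK_MASTER_IP used later in this post, but the netmask and the choice to configure eth0 statically at this point are my assumptions.

    auto lo
    iface lo inet loopback

    # wired cluster link (static address assumed, matches SPARK_MASTER_IP)
    auto eth0
    iface eth0 inet static
        address 192.168.2.1
        netmask 255.255.255.0

    # on-board WiFi, settings via DHCP from the home router
    allow-hotplug wlan0
    iface wlan0 inet dhcp
        wpa-ssid "SSID/NETWORK_NAME"
        wpa-psk "PASSWORD"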
 
  • Restart the interface
    • sudo ifdown wlan0
    • sudo ifup wlan0
  • The Raspberry Pi is now connected to WiFi
  • Update and upgrade the Raspberry Pi
    • sudo apt-get update
    • sudo apt-get upgrade

Step #3: Create separate superuser and group

  • We will use this user and group for all core installations and configuration changes on all nodes (a quick verification follows the commands)
 
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
sudo adduser hduser sudo
su hduser
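
A quick check that the account came out as intended:

    id hduser    # primary group should be hadoop, with sudo in the group list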
 

Step #4: Download and install the required software

  • Download and install the zip utilities
    • sudo apt-get install zip unzip
  • Download and install java
    • sudo apt-get install oracle-java7-jdk
  • Download, install and configure Spark (a quick smoke test follows)
    wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz
    sudo tar -xvzf spark-2.1.0-bin-hadoop2.7.tgz -C /opt/
    sudo chown -R hduser /opt/spark-2.1.0-bin-hadoop2.7
    source ~/.bashrc    # picks up SPARK_HOME once the exports at the end of this step are in .bashrc
    cp $SPARK_HOME/conf/spark-env.sh.template $SPARK_HOME/conf/spark-env.sh
    Add the following to $SPARK_HOME/conf/spark-env.sh:
        SPARK_MASTER_IP=192.168.2.1
        SPARK_WORKER_MEMORY=512m
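
To sanity-check the Spark install before touching Hadoop, the bundled SparkPi example can be run in local mode (using the full path, since PATH may not include Spark yet):

    /opt/spark-2.1.0-bin-hadoop2.7/bin/run-example SparkPi 10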
  • Download, install and configure Hadoop
    wget http://apache.osuosl.org/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
    sudo tar -xvzf hadoop-2.7.3.tar.gz -C /opt/
    cd /opt
    sudo chown -R hduser:hadoop hadoop-2.7.3/
    sudo nano /opt/hadoop-2.7.3/etc/hadoop/hadoop-env.sh and set:
    export JAVA_HOME=/usr/lib/jvm/jdk-7-oracle-arm-vfp-hflt/jre
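
At this point the unpacked build can be verified (again via the full path, since the environment variables come later in this step):

    /opt/hadoop-2.7.3/bin/hadoop version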
    
  • sudo nano /opt/hadoop-2.7.3/etc/hadoop/hdfs-site.xml and add the following inside the existing <configuration> element (replication is 1 because this is a single node):
     
            <property>
                    <name>dfs.replication</name>
                    <value>1</value>
            </property>
    
  • sudo nano /opt/hadoop-2.7.3/etc/hadoop/core-site.xml and add inside <configuration>:
     
            <property>
                    <name>fs.default.name</name>
                    <value>hdfs://rpi3-0:54310</value>
            </property>
            <property>
                    <name>hadoop.tmp.dir</name>
                    <value>/hdfs/tmp</value>
            </property>
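
hadoop.tmp.dir points at /hdfs/tmp, which does not exist yet; create it and hand it to hduser before HDFS is formatted:

    sudo mkdir -p /hdfs/tmp
    sudo chown hduser:hadoop /hdfs/tmp
    sudo chmod 750 /hdfs/tmp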
    
  • Hadoop 2.7.3 ships only mapred-site.xml.template, so copy it first; each -Xmx heap below is kept at roughly 80% of its container size so that JVM overhead stays within YARN's memory checks
    cp /opt/hadoop-2.7.3/etc/hadoop/mapred-site.xml.template /opt/hadoop-2.7.3/etc/hadoop/mapred-site.xml
    sudo nano /opt/hadoop-2.7.3/etc/hadoop/mapred-site.xml and add inside <configuration>:
     
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
      <property>
        <name>mapreduce.map.memory.mb</name>
        <value>256</value>
      </property>
      <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx204m</value>
      </property>
      <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>128</value>
      </property>
      <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx102m</value>
      </property>
      <property>
        <name>yarn.app.mapreduce.am.resource.mb</name>
        <value>128</value>
      </property>
      <property>
        <name>yarn.app.mapreduce.am.command-opts</name>
        <value>-Xmx102m</value>
      </property>
    
  • sudo nano /opt/hadoop-2.7.3/etc/hadoop/yarn-site.xml and add inside <configuration>:
     
            <property>
                    <name>yarn.resourcemanager.resource-tracker.address</name>
                    <value>rpi3-0:8025</value>
            </property>
            <property>
                    <name>yarn.resourcemanager.scheduler.address</name>
                    <value>rpi3-0:8030</value>
            </property>
            <property>
                    <name>yarn.resourcemanager.address</name>
                    <value>rpi3-0:8050</value>
            </property>
            <property>
                    <name>yarn.nodemanager.aux-services</name>
                    <value>mapreduce_shuffle</value>
            </property>
            <property>
                    <name>yarn.nodemanager.resource.cpu-vcores</name>
                    <value>4</value>
            </property>
            <property>
                    <name>yarn.nodemanager.resource.memory-mb</name>
                    <value>1024</value>
            </property>
            <property>
                    <name>yarn.scheduler.minimum-allocation-mb</name>
                    <value>128</value>
            </property>
            <property>
                    <name>yarn.scheduler.maximum-allocation-mb</name>
                    <value>1024</value>
            </property>
            <property>
                    <name>yarn.scheduler.minimum-allocation-vcores</name>
                    <value>1</value>
            </property>
            <property>
                    <name>yarn.scheduler.maximum-allocation-vcores</name>
                    <value>4</value>
            </property>
            <property>
                    <name>yarn.nodemanager.vmem-check-enabled</name>
                    <value>false</value>
            </property>
            <property>
                    <name>yarn.nodemanager.pmem-check-enabled</name>
                    <value>true</value>
            </property>
            <property>
                    <name>yarn.nodemanager.vmem-pmem-ratio</name>
                    <value>4</value>
            </property>
    
  • Add the environment variables to hduser's .bashrc file
    nano ~/.bashrc

    export JAVA_HOME=/usr/lib/jvm/jdk-7-oracle-arm-vfp-hflt/jre
    export HADOOP_HOME=/opt/hadoop-2.7.3
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    export HADOOP_COMMON_HOME=$HADOOP_HOME
    export HADOOP_HDFS_HOME=$HADOOP_HOME
    export YARN_HOME=$HADOOP_HOME
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    
    export SPARK_HOME=/opt/spark-2.1.0-bin-hadoop2.7 
    export PATH=$PATH:$SPARK_HOME/bin  
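
With everything configured, the single node can be brought up end to end. This is a minimal smoke test assuming the configuration above; the namenode format is a one-time step that initializes HDFS under /hdfs/tmp. Note that fs.default.name above assumes this node's hostname is rpi3-0.

    source ~/.bashrc
    hdfs namenode -format    # one time only
    start-dfs.sh             # starts NameNode, DataNode and SecondaryNameNode
    start-yarn.sh            # starts ResourceManager and NodeManager
    jps                      # should list the daemons above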
    

Step #5: Create an image from the SD card and clone it to all the other Pis in the cluster

  • Switch off the Raspberry Pi-3, take the SD card out and plug it into the Mac
  • Run the commands below
    • diskutil list
    • sudo dd if=/dev/disk4 of=~/Downloads/raspberrypi_base_with_hdp.dmg
  • To flash the other nodes' cards
    • diskutil unmountDisk /dev/disk4
    • sudo dd bs=1m if=~/Downloads/raspberrypi_base_with_hdp.dmg of=/dev/rdisk4
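
Each cloned node then needs its own identity before it joins the cluster. A minimal sketch, assuming the rpi3-N naming scheme from earlier (run on each freshly booted clone):

    sudo nano /etc/hostname    # e.g. rpi3-1 for the first slave
    sudo nano /etc/hosts       # update the 127.0.1.1 entry to the same name
    sudo reboot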

References:
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html
https://dqydj.com/raspberry-pi-hadoop-cluster-apache-spark-yarn/
