Hadoop and Spark Installation on Raspberry Pi-3 Cluster – Part-2

Cluster Architecture

  • The master node will be connected to the home router via WiFi
  • The master node will be connected to the slave nodes through an Ethernet switch via Ethernet cables
  • From my Mac (which will be on my home network), I will be able to SSH to the master node and then control the whole cluster

For the Spark/Hadoop cluster, there are a few more TODOs that we need to take care of:

  • Update /etc/hosts on every node (master and slave) with the hostname and IP address of every other node (see the example after this list)
  • Use the same superuser and group for all installations on every node
  • Enable SSH on every node and establish passwordless SSH from the master to every slave node
  • Install zip/unzip and Java on every node
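
As an example of the /etc/hosts entries mentioned above: the master rpi3-0 at 192.168.2.1 matches the values used later in this post, while the slave names rpi3-1 to rpi3-3 and their addresses are my own naming assumptions. Every node would carry the same four lines:

    192.168.2.1    rpi3-0
    192.168.2.2    rpi3-1
    192.168.2.3    rpi3-2
    192.168.2.4    rpi3-3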

In this part we will walk through the single-node setup. I will be using my Mac to perform all these steps.

Step #1: Load the Raspbian image onto the MicroSD card

  • Download SD Formatter from https://www.sdcard.org/downloads/formatter_4/eula_mac/index.html
  • Format the SD card using SD Formatter
  • Download Raspbian Lite from https://www.raspberrypi.org/ and write it to the card:
    • diskutil list (this lists the attached disks; in my case the SD card was /dev/disk4)
    • diskutil unmountDisk /dev/disk4
    • sudo dd bs=1m if=~/Downloads/2017-01-11-raspbian-jessie-lite.img of=/dev/rdisk4 (see the note just below)
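
The dd write takes several minutes and prints nothing while it runs; on macOS you can press Ctrl+T to see its progress. Once it finishes, a safe way to flush the write and eject the card (still assuming the card is /dev/disk4) is:

    sync
    diskutil eject /dev/disk4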

Step #2: Configure the Pi, connect to WiFi and apply the latest updates.

  • sudo raspi-config
    • Change the password
    • Localization Options – change the locale to en_US, the timezone to US/Eastern and the WiFi country to US
    • Advanced Options – change the memory split from 64 to 16 (Raspbian Lite has no UI, so the GPU needs only a tiny footprint)
    • Interfacing Options – Enable SSH
  • sudo vi /etc/network/interfaces
    • Ethernet eth0 is the wired connection which we will be using
    • wlan0 is the WiFi adapter on the board: configure WiFi so that we can use it to pull updates for the Raspbian Lite OS
    • Change manual to dhcp (so the interface gets its settings via DHCP) and add the SSID and password of the home router
      • i.e., change the line “iface wlan0 inet manual” to
 
iface wlan0 inet dhcp
    wpa-ssid "SSID/NETWORK_NAME"
    wpa-psk "PASSWORD"
 
  • Restart the interface
    • sudo ifdown wlan0
    • sudo ifup wlan0
  • The Raspberry Pi should now be connected to WiFi (a quick check is shown after this list)
  • Update and upgrade the Raspberry Pi
    • sudo apt-get update
    • sudo apt-get upgrade
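
A quick way to confirm that wlan0 came up and got an address (any reachable host works for the ping):

    ip addr show wlan0
    ping -c 3 www.raspberrypi.org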

Step #3: Create a separate superuser and group

  • We will use this user and group for all our core installations and configuration changes on all nodes
 
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
sudo adduser hduser sudo
su hduser
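
The passwordless SSH from the master to the slaves mentioned in the TODO list can be set up from this hduser account once the slave nodes are up; a minimal sketch, assuming a slave named rpi3-1 (repeat for each slave):

    ssh-keygen -t rsa -b 2048      # accept the default path, empty passphrase
    ssh-copy-id hduser@rpi3-1      # copies the public key to the slave
    ssh hduser@rpi3-1              # should now log in without a password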
 

Step #4: Download and install the required software

  • Download and install zip utility
    • sudo apt-get install zip unzip
  • Download and install Java
    • sudo apt-get install oracle-java7-jdk
  • Download, install and configure Spark
    wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz
    sudo tar -xvzf spark-2.1.0-bin-hadoop2.7.tgz -C /opt/
    sudo chown -R hduser /opt/spark-2.1.0-bin-hadoop2.7
    source ~/.bashrc   (after adding SPARK_HOME to ~/.bashrc; see the environment variables at the end of this step)
    cp $SPARK_HOME/conf/spark-env.sh.template $SPARK_HOME/conf/spark-env.sh
    Add the following to $SPARK_HOME/conf/spark-env.sh:
        SPARK_MASTER_IP=192.168.2.1
        SPARK_WORKER_MEMORY=512m
  • Download, install and configure Hadoop
    cd ~
    wget http://apache.osuosl.org/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
    sudo mkdir -p /opt
    sudo tar -xvzf hadoop-2.7.3.tar.gz -C /opt/
    cd /opt
    sudo chown -R hduser:hadoop hadoop-2.7.3/
    Set JAVA_HOME in hadoop-env.sh:
    sudo nano /opt/hadoop-2.7.3/etc/hadoop/hadoop-env.sh
    export JAVA_HOME=/usr/lib/jvm/jdk-7-oracle-arm-vfp-hflt/jre
    
  • sudo nano /opt/hadoop-2.7.3/etc/hadoop/hdfs-site.xml
     
            <property>
                    <name>dfs.replication</name>
                    <value>1</value>
            </property>
    
  • sudo nano /opt/hadoop-2.7.3/etc/hadoop/core-site.xml
     
            <property>
                    <name>fs.default.name</name>
                    <value>hdfs://rpi3-0:54310</value>
            </property>
            <property>
                    <name>hadoop.tmp.dir</name>
                    <value>/hdfs/tmp</value>
            </property>
    
  • sudo nano /opt/hadoop-2.7.3/etc/hadoop/mapred-site.xml
     
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
      <property>
        <name>mapreduce.map.memory.mb</name>
        <value>256</value>
      </property>
      <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx204m</value>
      </property>
      <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>102</value>
      </property>
      <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx102m</value>
      </property>
      <property>
        <name>yarn.app.mapreduce.am.resource.mb</name>
        <value>128</value>
      </property>
      <property>
        <name>yarn.app.mapreduce.am.command-opts</name>
        <value>-Xmx102m</value>
      </property>
    
  • sudo nano /opt/hadoop-2.7.3/etc/hadoop/yarn-site.xml
     
            <property>
                    <name>yarn.resourcemanager.resource-tracker.address</name>
                    <value>rpi3-0:8025</value>
            </property>
            <property>
                    <name>yarn.resourcemanager.scheduler.address</name>
                    <value>rpi3-0:8030</value>
            </property>
            <property>
                    <name>yarn.resourcemanager.address</name>
                    <value>rpi3-0:8050</value>
            </property>
            <property>
                    <name>yarn.nodemanager.aux-services</name>
                    <value>mapreduce_shuffle</value>
            </property>
            <property>
                    <name>yarn.nodemanager.resource.cpu-vcores</name>
                    <value>4</value>
            </property>
            <property>
                    <name>yarn.nodemanager.resource.memory-mb</name>
                    <value>1024</value>
            </property>
            <property>
                    <name>yarn.scheduler.minimum-allocation-mb</name>
                    <value>128</value>
            </property>
            <property>
                    <name>yarn.scheduler.maximum-allocation-mb</name>
                    <value>1024</value>
            </property>
            <property>
                    <name>yarn.scheduler.minimum-allocation-vcores</name>
                    <value>1</value>
            </property>
            <property>
                    <name>yarn.scheduler.maximum-allocation-vcores</name>
                    <value>4</value>
            </property>
            <property>
                    <name>yarn.nodemanager.vmem-check-enabled</name>
                    <value>false</value>
            </property>
            <property>
                    <name>yarn.nodemanager.pmem-check-enabled</name>
                    <value>true</value>
            </property>
            <property>
                    <name>yarn.nodemanager.vmem-pmem-ratio</name>
                    <value>4</value>
            </property>
    
  • Add environment variables to the bashrc file
    sudo nano ~/.bashrc

    export JAVA_HOME=/usr/lib/jvm/jdk-7-oracle-arm-vfp-hflt/jre
    export HADOOP_HOME=/opt/hadoop-2.7.3
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    export HADOOP_COMMON_HOME=$HADOOP_HOME
    export HADOOP_HDFS_HOME=$HADOOP_HOME
    export YARN_HOME=$HADOOP_HOME
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    
    export SPARK_HOME=/opt/spark-2.1.0-bin-hadoop2.7 
    export PATH=$PATH:$SPARK_HOME/bin  
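
  • After sourcing ~/.bashrc, a reasonable smoke test of the single-node setup before cloning the card (these are the standard Hadoop/Spark commands rather than anything specific to this post; /hdfs/tmp matches the hadoop.tmp.dir set in core-site.xml):

    source ~/.bashrc
    sudo mkdir -p /hdfs/tmp
    sudo chown -R hduser:hadoop /hdfs
    hdfs namenode -format
    start-dfs.sh
    start-yarn.sh
    jps    # should list NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager

    # optional: run the SparkPi example on YARN (the examples jar ships with the Spark 2.1.0 download)
    spark-submit --class org.apache.spark.examples.SparkPi --master yarn \
        $SPARK_HOME/examples/jars/spark-examples_2.11-2.1.0.jar 10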
    

Step #5: Create an image from the SD card and clone it to all other Pis in the cluster

  • Switch off the Raspberry Pi-3, take the SD card out and plug it into the Mac
  • Run the below commands
    • diskutil list
    • sudo dd if=/dev/disk4 of=~/Downloads/raspberrypi_base_with_hdp.dmg
  • For each of the other nodes' SD cards (still on the Mac)
    • diskutil unmountDisk /dev/disk4
    • sudo dd bs=1m if=~/Downloads/raspberrypi_base_with_hdp.dmg of=/dev/rdisk4
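
One thing to remember after booting a clone: every Pi starts out with the master's hostname, so give each node its own name (matching whatever you put in /etc/hosts; rpi3-1, rpi3-2, … in my naming) before wiring up the cluster. A minimal sketch on each slave:

    sudo nano /etc/hostname    # replace the old name, e.g. with rpi3-1
    sudo nano /etc/hosts       # update the 127.0.1.1 entry to match
    sudo reboot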

References:
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html
https://dqydj.com/raspberry-pi-hadoop-cluster-apache-spark-yarn/

Hadoop and Spark Installation on Raspberry Pi-3 Cluster – Part-1

In this part we will go through the hardware configuration of the Raspberry Pi-3 4-node cluster setup.

Hardware Configuration


  1. Raspberry Pi 3 Model B 2016 Single Board Computer – 4 = $120 (bought from Micro Center)
  2. Samsung EVO 32GB Class 10 Micro SDHC Card with Adapter (MB-MP32DA/AM) – 4 × $12 = $48
  3. Cable Matters 8-Pack, Cat5E Snagless Ethernet Patch Cable in Blue, 3 Feet – 4 = $10
  4. TRENDnet 8-Port Unmanaged 10/100 Mbps GREENnet Ethernet Desktop Plastic Housing Switch, TE100-S8 – 1 = $15
  5. Anker [4-Pack] PowerLine Micro USB (1ft) – 1 = $10
  6. StarTech.com USB2TYPEM 3 Feet USB to Type M Barrel 5V DC Power Cable – 1 = $4
  7. 100 Pcs M2.5 x 10mm + 6mm PC Board Hexagonal Hex Threaded Spacer – 1 = $14
  8. ORICO DUB-10P-WH 96W 10 Ports Family-Sized Smart Super USB Charger with 10 × 5V 2.4A Ports, White – 1 = $20
  9. Machine Screws (bought from Home Depot) – 2 = under $2
  10. Thin card board from Home Depot – $4

Total Budget under $250

Except for the machine screws, card board and Raspberry Pis, all of the above items were bought on Amazon and Newegg.

Additionally, you can buy heat sinks and fans depending on how heavily you plan to use this cluster.

My personal cluster assembly steps:

  • Drill 4 holes into the card board matching the mounting holes on the Raspberry Pi
  • Stack two nylon standoffs together to form 20 such sets
  • Insert screws from below the cardboard and 4 stacked nylon standoffs from above the cardboard
  • Keep stacking Raspberry Pis (RPIs) one by one and tighten the standoffs.
  • Connect 4 micro USB cables from RPIs to the USB switch
  • Connect 4 ethernet cables from RPIs to the Ethernet switch
  • Connect Ethernet switch to the USB switch via USB2TYPEM cable.
  • Insert a 32 GB Samsung SD card into each of the RPIs
  • Switch on the USB switch and the entire cluster will be powered.

(Photos of the assembled cluster.)

 
