Variables in Javascript

Javascript is dynamically typed: the type of a variable is inferred from the value assigned to it, so we don't have to specify the type explicitly.

There are three primitive data types (Number, String, Boolean), plus undefined and null:

  • Number: double-precision 64-bit floating point values (there are no separate Integer, Short or Float types)
  • String: sequences of Unicode characters (there is no char type)
  • Boolean: type 'boolean' with two possible values, true and false
  • undefined: type 'undefined' with a single value, undefined. It is the value of every declared variable until it is assigned.
    • Ex: var value; value = 10; // between these two statements, value holds 'undefined' of type 'undefined'
  • null: a single value, null (note that typeof null returns 'object'; see below)
  • In ES6 (ECMAScript 2015) a new primitive type called Symbol was introduced; Symbols are unique values, often used for enum-like constants (a short example follows this list)
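
A minimal sketch of the Symbol type (requires an ES6-capable runtime):

// every Symbol value is unique
var RED = Symbol("red");
console.log(typeof RED);            // "symbol"
console.log(RED === Symbol("red")); // false: each call creates a new, unique Symbol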

Note: Variables declared outside any function (or assigned without var) become global by default; var declarations inside a function are scoped to that function, not to blocks.

Variables and Values can be interrogated using ‘typeof’

typeof <variable>
typeof <value>
var a;
console.log(typeof a); // "undefined"
a = 10;
console.log(typeof a); // "number"
a = "hello";
console.log(typeof a); // "string"
a = true;
console.log(typeof a); // "boolean"
a = null;
console.log(typeof a); // "object" (see note below)

typeof null returns 'object'
* a = null; typeof a returns 'object' instead of 'null'. This was a bug in early versions of JS, and it has not been fixed in newer versions because fixing it would break backward compatibility and many existing web applications.

Type Coercion: JS was designed to be a developer-friendly language, so it automatically converts the types of values used in an expression. This led to a lot of confusion; the behavior was later tightened, but because of backward compatibility we still live with the old rules. One such issue is '=='.

  • double equals (==, loose equality with type coercion) and triple equals (===, strict equality without coercion)
  • 12 + "4" results in "124": JS sees that one operand is a string and the other a number, coerces the number into a string, and performs string concatenation
    * JS does a lot of type coercion, so its behavior is often surprising; beware of it
var a = 10;
var b = "10";
a == b -> returns true, whereas
a === b -> returns false

Values of all types have an associated boolean value (they are either truthy or falsy):

  • Non-zero numbers evaluate to true in an if condition
  • Non-empty strings evaluate to true in an if condition
  • undefined and null always evaluate to false
var a = 10;
if(a) {
console.log("a is true");
} else {
console.log("a is false");
}

a = 0;
if(a) {
console.log("a is true");
} else {
console.log("a is false");
}

a = "Hello";
if(a) {
console.log("a is true");
} else {
console.log("a is false");
}

a = "";
if(a) {
console.log("a is true");
} else {
console.log("a is false");
}


Objects in Javascript

Objects in Javascript are free form, which means that we can add or remove fields and methods whenever we want. They are not bound to a particular class (unlike Java; in fact there is no class concept in JS).

The easiest way to create an object is with an object literal, e.g. "var obj = {};". Properties can also be listed inside the object literal at creation time.

 
// create an empty object with an object literal
var myObj = {};
// FREE FORM: dynamically attach a property to the object
myObj.prop1 = "Hello";
console.log(myObj);
// Prints the property attached above
console.log(myObj.prop1);
// Accessing a property that was never defined gives 'undefined'
console.log(myObj.prop2);
delete myObj.prop1; // FREE FORM: delete a property from an object

Properties and Access Specifiers
All properties of an object are public; there is no access-specifier concept. Properties of an object can be accessed in 2 ways
– Using dot notation
– Using square brackets
When to use dot notation vs square brackets?

  • Use [] notation
    • when the property name is a reserved keyword or an invalid identifier
    • when the property name is dynamic (held in a variable)
    •  var myObj1 = {
      "prop1": "Hello",
      "prop2": 10,
      "1": "one"
      }
      console.log(myObj1.1);        // SyntaxError: 1 is not a valid identifier
      console.log(myObj1["1"]);     // "one"
      var propertyName = "prop2";
      console.log(myObj1.propertyName);  // undefined: looks for a property literally named "propertyName"
      console.log(myObj1[propertyName]); // 10: [] notation evaluates the variable
      
  • Prefer DOT NOTATION over [] NOTATION because the JS engine can do some optimizations upfront with dot notation, and dot notation is also faster than the [] approach.

Two object references can be compared with the '===' operator; it checks whether both point to the same object.
var myObj2 = myObj1;
(myObj1 === myObj2) -> returns true
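
A quick sketch of what '===' means for objects (it compares references, not contents):

var o1 = { x: 1 };
var o2 = { x: 1 };
var o3 = o1;
console.log(o1 === o2); // false: two different objects, even with identical contents
console.log(o1 === o3); // true: both variables point to the same object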

Objects with undefined and null
If we never define a property, reading it gives undefined; if we want a property to exist but deliberately hold no value yet, we initialize it with null.
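
A minimal sketch (the property names are just for illustration):

var person = {
    name: "Alice",
    manager: null        // declared, intentionally empty
};
console.log(person.manager); // null: the property exists but has no value yet
console.log(person.salary);  // undefined: the property was never defined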

Introduction to javascript

Javascript was created in the early 90s by Brendan Eich at Netscape and was later standardized by the ECMA committee. The version currently supported by most browsers is ES5, and the newly released version is ECMAScript 2015, aka ES6.

Javascript is a lightweight, interpreted or JIT-compiled programming language with first-class functions.
It is a prototype-based, multi-paradigm, dynamic scripting language supporting object-oriented, imperative, declarative and functional programming styles. That is a lot of buzzwords; we will look at each of them:

  • Lightweight: it has a very small footprint on the machine it runs on
  • Interpreted: we do not explicitly compile JS programs (as we do in Java); they are compiled on the fly
  • First Class Functions: functions are first-class citizens in JS, which means (see the sketch after this list)
    • we can assign functions to variables
    • we can pass functions as method params
    • we can return a function as a return type from a method
  • Multi Paradigm: it supports all of the programming paradigms listed below
  • Object Oriented: model state and behavior around objects
  • Imperative: step-by-step instructions on HOW to do something (like C)
  • Declarative: we state WHAT to do rather than HOW to do it (like Scala)
  • Functional: a subset of the declarative style (e.g. Scala)
  • Dynamic Language: method binding to an object happens at runtime rather than at compile time, so the compiler will not report a missing method the way the Java compiler does
  • Scripting Language: instructions written to run on a runtime environment (much like Unix shell scripts automate deployment of web applications); JS is used to modify the DOM in the browser at runtime.
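
A minimal sketch of first-class functions (the function names are just for illustration):

// assign a function to a variable
var greet = function(name) {
    return "Hello " + name;
};

// pass a function as an argument
function callTwice(fn, arg) {
    console.log(fn(arg));
    console.log(fn(arg));
}
callTwice(greet, "JS");

// return a function from a function
function multiplier(factor) {
    return function(x) {
        return x * factor;
    };
}
var doubleIt = multiplier(2);
console.log(doubleIt(5)); // 10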

Why are so many programmers uncomfortable with JS, tagging it as a mere front-end technology?
– Partly because of the traits of the language listed above.
– Because of backward compatibility, many bugs remain in JS forever (== vs ===, typeof null being 'object', and so on).
– When it was first introduced it was meant to be a friendly language, and hence it internally did a lot of type coercion.

Why learn JS? With NodeJS, Javascript has evolved a lot and is now being used widely across different layers

  • Client side web development
    • Native JS
    • JQuery
    • Angular, React
  • Server side
    • NodeJS
    • Express
  • Browser Extensions, and so on

The Javascript runtime is usually a browser, but while learning we can use either NodeJS or Mozilla Firefox's Scratchpad to write and execute JS programs.

References:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference
https://www.youtube.com/user/koushks

Java8 Streams

Streams are a functional-programming-style API for processing the elements of a data structure sequentially or in parallel.

Example:

 
List<Order> orders = getOrders();
int qtySum = orders.stream()
                .filter(o -> o.getType().equals("ONLINE"))
                .mapToInt(o -> o.getQuantity())
                .sum();

The above code can be read as follows:

  • Create a stream from source java collection
  • Add a filter operation to the stream “intermediate operations pipeline”
  • Add a map operation to the stream “intermediate operations pipeline”
  • Add a terminal operation that kicks off the stream processing

A Stream has

  • Source: that the stream can pull objects from
  • Pipeline: of operations that execute on the elements of the stream
  • Terminal: operation that pulls values down the stream

Streams are lazily evaluated and the stream lifecycle is

  • Creation: from source
  • Configuration: from collection of pipeline operations
  • Execution: (terminal operation is invoked)
  • Cleanup

A few Java stream sources

 
// Number Stream
LongStream.range(0, 5).forEach(System.out::println);

// Collection Streams
List<String> cities = Arrays.asList("nyc", "edison", "livingston");
cities.stream().forEach(System.out::println);

// Character Stream (count() returns a long)
long cnt = "ABC".chars().count();

// File Streams
Path filePath = Paths.get("/user/ntallapa/test");
Files.list(filePath).forEach(System.out::println);

Stream Terminal Operations

  • Reduction Terminal Operations: produce a single result
    • reduce(f(x)), sum(f(x)), min(f(x)), max(f(x)), etc
  • Mutable Reduction Terminal Operations: collect multiple results into a container data structure
    • List<Integer> integers = Arrays.asList(1,2,3,3);
    • Set<Integer> integersSet = integers.stream().collect(Collectors.toSet()); // returns 1,2,3
  • Search Terminal Operations: return a result as soon as a match is found
    • findFirst(), findAny(), anyMatch(f(x))
  • Generic Terminal Operations: do any kind of processing on every element, e.g. forEach(f(x)) (an example of each kind follows this list)
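
A minimal sketch showing one terminal operation of each kind (class and variable names are just for illustration):

import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

public class TerminalOpsDemo {
    public static void main(String[] args) {
        List<Integer> integers = Arrays.asList(1, 2, 3, 3);

        // Reduction: a single result
        int sum = integers.stream().mapToInt(Integer::intValue).sum();

        // Mutable reduction: results collected into a container
        Set<Integer> unique = integers.stream().collect(Collectors.toSet());

        // Search: short-circuits as soon as a match is found
        Optional<Integer> firstEven = integers.stream().filter(x -> x % 2 == 0).findFirst();

        // Generic: arbitrary processing of every element
        integers.stream().forEach(System.out::println);

        System.out.println(sum + " " + unique + " " + firstEven.orElse(-1));
    }
}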

Stream Pipeline Rule: since streams may process elements sequentially or in parallel, the pipeline operations are not allowed to modify the stream source (non-interference).

Intermediate Pipeline operations can be of two types

  • Stateless: filter(f(x)), map(f(x)); each element is processed independently
  • Stateful: distinct(), limit(n), skip(n); processing an element may depend on elements seen earlier (see the sketch below)
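
A short sketch contrasting the two kinds (the numbers are just example data):

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PipelineOpsDemo {
    public static void main(String[] args) {
        List<Integer> nums = Arrays.asList(5, 1, 2, 2, 4, 3);

        // Stateless: filter/map look at one element at a time
        List<Integer> evensTimesTen = nums.stream()
                .filter(x -> x % 2 == 0)
                .map(x -> x * 10)
                .collect(Collectors.toList());

        // Stateful: distinct/limit must remember earlier elements
        List<Integer> firstTwoDistinct = nums.stream()
                .distinct()
                .limit(2)
                .collect(Collectors.toList());

        System.out.println(evensTimesTen);    // [20, 20, 40]
        System.out.println(firstTwoDistinct); // [5, 1]
    }
}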

References:
https://www.youtube.com/watch?v=MLksirK9nnE
https://www.youtube.com/watch?v=8pDm_kH4YKY

Java8 Lambdas

Lambdas enable functional programming in Java. The concept of lambdas exists in many languages, but the beauty of the Java implementation is its BACKWARD COMPATIBILITY. In this post we will discuss the areas below:

  • Concept
  • Syntax
  • Functional Interfaces
  • Variable Capture
  • Method References
  • Default Methods

Concept


A lambda can be defined as (see the sketch after this list):

  • a way of defining anonymous functions
  • it can be assigned to variables
  • it can be passed to functions
  • it can be returned from functions
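
A minimal sketch of these three uses, built on the standard java.util.function types (class and method names are just for illustration):

import java.util.function.Function;
import java.util.function.Supplier;

public class LambdaConceptDemo {
    // a lambda passed to a function
    static int applyTwice(Function<Integer, Integer> f, int x) {
        return f.apply(f.apply(x));
    }

    // a lambda returned from a function
    static Supplier<String> greeter(String name) {
        return () -> "Hello " + name;
    }

    public static void main(String[] args) {
        // a lambda assigned to a variable
        Function<Integer, Integer> addTen = x -> x + 10;

        System.out.println(applyTwice(addTen, 1));   // 21
        System.out.println(greeter("Lambda").get()); // Hello Lambda
    }
}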

What are lambdas good for?

  • They are the basis for the functional programming model
  • They make parallel programming easier: if we want to keep hundreds of cores busy, that is easier to do with functional code than with object-oriented code
  • Write compact code (Hadoop1 in Java: 200k lines of code VS Spark1 in Scala: 25k Lines of Code)
  • Richer Data Structure Collections
  • Develop Cleaner APIs

Syntax

 
List<Integer> integers = Arrays.asList(1,2,3);
integers.forEach(x -> System.out.println(x));
OR
integers.forEach((x) -> {
    x = x+10;
    System.out.println(x);
});
OR
integers.forEach((Integer x) -> {
    x = x+10;
    System.out.println(x);
});

We can mention the types explicitly, but the Java 8 compiler is able to infer them (type inference).

A lambda expression in Java (from the talk "Lambda: A Peek Under the Hood" by Brian Goetz) is compiled into a synthetic method, and at runtime it is turned into an instance of the target functional interface whose single method invokes that generated method.

Functional Interfaces (FI)


An interface with exactly one abstract method (one NON-DEFAULT method) is called a Functional Interface.

Prior to Java 8 we had to write the function signature, open a curly brace, write the body and close it, all in one place (for example, as an anonymous inner class). In Java 8:

 
// define the FI
// @FunctionalInterface enforces that the interface is an FI (compilation fails
// if the interface has more than one abstract method). It is optional and can
// be applied only to interfaces.
@FunctionalInterface
public interface Consumer<T> {
    void accept(T t);
}

// give the definition
Consumer<Integer> consumer = x -> System.out.println(x);

// use it
List<Integer> integers = Arrays.asList(1,2,3);
integers.forEach(consumer);

A few things to notice here:

  • We separate the body of the function (the lambda assigned to 'consumer') from its signature (the accept method of the FI).
  • The method generated from the lambda expression must have the same signature as the FI method (here the lambda takes one argument 'x', throws no exception and returns nothing).
  • In Java 8, the type of a lambda expression is the FI type it is assigned to (Consumer<Integer> in this example).

Variable Capture (VC)


Lambdas can interact with variables (local, instance and static) defined outside the body of the lambda; this is known as variable capture (VC).

 
List<Integer> integers = Arrays.asList(1,2,3);
int vc=10;
integers.forEach(x -> System.out.println(x+vc));

Note: Local variables accessed inside a lambda must be final or effectively final; they cannot be modified.
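
A tiny sketch of what the compiler enforces (the class name is just for illustration):

import java.util.Arrays;
import java.util.List;

public class VariableCaptureDemo {
    public static void main(String[] args) {
        List<Integer> integers = Arrays.asList(1, 2, 3);
        int vc = 10; // effectively final: never reassigned
        integers.forEach(x -> System.out.println(x + vc));
        // vc = 20; // uncommenting this breaks compilation: variables referenced
        //          // from a lambda must be final or effectively final
    }
}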

Lambda vs Anonymous Inner Classes

  • Inner classes can have state in the form of class level instance variables whereas lambdas cannot.
  • Inner classes can have multiple methods whereas lambdas cannot
  • 'this' points to the object instance of the anonymous inner class, whereas inside a lambda it points to the enclosing object (see the sketch below)
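
A minimal sketch of the 'this' difference (the class name is just for illustration):

public class ThisDemo {
    private final String name = "enclosing instance";

    public void run() {
        // In a lambda, 'this' refers to the enclosing ThisDemo instance
        Runnable lambda = () -> System.out.println("lambda sees: " + this.name);

        // In an anonymous inner class, 'this' refers to the anonymous instance itself
        Runnable anon = new Runnable() {
            @Override
            public void run() {
                System.out.println("anonymous class sees: " + this.getClass().getName());
            }
        };

        lambda.run();
        anon.run();
    }

    public static void main(String[] args) {
        new ThisDemo().run();
    }
}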

java.util.function.* contains the 43 most commonly used functional interfaces, for example (a short example follows this list):

  • Consumer<T>: takes an argument of type T and returns void
  • Supplier<T>: takes no argument and returns a result of type T
  • Predicate<T>: takes an argument of type T and returns a boolean
  • Function<T, R>: takes an argument of type T and returns a result of type R
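
A minimal sketch exercising these four interfaces (the class name is just for illustration):

import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.Supplier;

public class FunctionalInterfacesDemo {
    public static void main(String[] args) {
        Consumer<String> printer = s -> System.out.println(s);   // T -> void
        Supplier<Double> random = () -> Math.random();           // () -> T
        Predicate<Integer> isEven = n -> n % 2 == 0;             // T -> boolean
        Function<String, Integer> length = s -> s.length();      // T -> R

        printer.accept("hello");
        System.out.println(random.get());
        System.out.println(isEven.test(4));          // true
        System.out.println(length.apply("lambda"));  // 6
    }
}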

Method References (MRs)


Since a lambda is just a way of defining an anonymous function, there is a good chance that the function we want already exists. In those cases a method reference can be used to pass the existing function wherever a lambda is expected.

 
@FunctionalInterface 
public interface Consumer<T> {
    void accept(T t);
}

public static void doSomething(Integer x) {
    System.out.println(x);
}

Consumer<Integer> cons1 = x -> doSomething(x);
cons1.accept(1);
// Reuse with an MR (Example is the enclosing class; doSomething is static,
// so its signature matches Consumer<Integer>'s accept method)
Consumer<Integer> cons2 = Example::doSomething;
cons2.accept(2);

Note: The signature of the referenced method must match the signature of FI method.

The referenced method must match the FI's single abstract method; here Consumer requires a method with exactly one argument and no return value.

Referencing a Constructor: constructor method references are quite handy when working with Streams

 
// Create a function that takes a 'String' parameter (LHS of the arrow) and returns an 'Integer' (the RHS of the arrow is the method body)
Function<String, Integer> mapper1 = x -> new Integer(x);
System.out.println(mapper1.apply("11"));

// Refer to the Integer(String) constructor directly
Function<String, Integer> mapper2 = Integer::new;
System.out.println(mapper2.apply("22"));

References to a specific object instance method:

 
Consumer<Integer> cons1 = x -> System.out.println(x);
cons1.accept(1);
// can also be written as: this invokes the println() method on System.out object by passing param '2'
Consumer<Integer> cons2 = System.out::println;
cons2.accept(2);

Default Methods ***


This is a very important feature because it addresses the interface evolution problem: how can a published interface (like List, Iterable, etc.) evolve without breaking existing implementations (i.e. remain backward compatible)?

Default Method: a default method of a Java interface has an implementation provided in the interface itself and is inherited by the classes that implement it (see the sketch after the code below).

 
public interface Iterable<T> {
    Iterator<T> iterator();

    default void forEach(Consumer<? super T> action) {
        for(T t: this) {
            action.accept(t);
        }
    }
}
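
A small sketch against the real java.lang.Iterable, which gained forEach() as a default method in Java 8 (the Box class below is hypothetical): an implementation written long before Java 8 keeps compiling unchanged and simply inherits forEach().

import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class Box implements Iterable<String> {
    private final List<String> items = Arrays.asList("a", "b", "c");

    @Override
    public Iterator<String> iterator() {
        return items.iterator();
    }

    public static void main(String[] args) {
        // Box never defines forEach(); it inherits the default implementation from Iterable
        new Box().forEach(System.out::println);
    }
}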

References:
https://www.youtube.com/watch?v=MLksirK9nnE
https://www.youtube.com/watch?v=8pDm_kH4YKY

Spark Architecture

Spark can be launched in different modes, and each mode has a different architecture.

  1. Local: Single JVM
  2. Standalone: Spark's own built-in cluster manager (Static Allocation)
  3. YARN: from Hadoop (Dynamic Allocation)
  4. Mesos: the Apache Mesos cluster manager (Dynamic Allocation)

Of these, 2, 3 and 4 are distributed architectures. The Standalone and Mesos architectures are similar to the YARN one.

https://tekmarathon.com/2017/02/13/hadoop-2-x-architecture/

In YARN, there are two different modes

  • Spark YARN Client Mode Architecture: used for the Spark Scala/Python shells (aka interactive mode). Here the Spark Driver runs on the edge node; if the driver program is killed or the edge node crashes, the application is killed.
  • Spark YARN Cluster Mode Architecture: used when the user submits a Spark application with spark-submit. Here the Spark Driver is started inside the Application Master.

Unlike the Hadoop driver program, the Spark Driver is also responsible for

  • DAG Scheduler and Task Scheduler: once executors are launched inside containers, they communicate directly with these schedulers, which play a far more important role in Spark applications than the YARN scheduler does.
  • Spark UI: the UI with the application DAG, jobs and stages is served by the Spark Driver.

Spark Terminology – Nodes, Containers, Executors, Cores/Slots, Tasks, Partitions, Jobs, Stages

  • A Spark cluster can be formed with 'n' nodes.
  • Each node can have one or more containers. The number of containers is decided based on the min and max container memory limits in yarn-site.xml.
  • Each container runs exactly one Executor JVM.
  • Each executor can have one or more slots (aka cores). The minimum required for a Spark application is 2 slots; the recommended range is 8-32, and we can go up to a maximum of roughly 2-3x the actual physical cores on a node.
  • Tasks run inside the slots. A task is a unit of work assigned to an executor core/slot by the Task Scheduler.
  • A partition is a block of data (like a block of an HDFS file). A Spark RDD is split into one or more partitions. Each partition requires one thread of computation (a task), so an RDD with 'n' partitions requires 'n' tasks to perform any transformation.
  • Jobs: a Spark application is split into 'n' jobs based on the number of actions inside it; for every action a job is launched.
  • Stages: a job is divided into 'm' stages. A stage groups operations that can be executed together, for example map() and filter(); each stage is finally split into 'n' tasks.

Dynamic Allocation in Spark

Dynamic Allocation is a Spark feature that adds or removes the executors launched by an application dynamically to match its workload.

Unlike static allocation of resources (prior to 1.6.0), where Spark reserves a fixed amount of CPU and memory for the whole run, with Dynamic Allocation the allocation is based purely on the workload.

Note: This is the main difference between Spark Standalone Architecture (static allocation) and Spark YARN/Mesos Architecture.

 
// flag to enable/disable the DA feature
spark.dynamicAllocation.enabled: true/false

// Application starts with at least this many executors
spark.dynamicAllocation.minExecutors: m

// Application can grow to at most this many executors
spark.dynamicAllocation.maxExecutors: n

// The FIRST time the pending-task backlog lasts longer than this,
// the number of executors is increased
spark.dynamicAllocation.schedulerBacklogTimeout: x secs

// From then on, every time this timeout is hit, the number of
// executors is increased again until maxExecutors is reached
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout: y secs

// An executor is released when no task has been scheduled
// on it for this long
spark.dynamicAllocation.executorIdleTimeout: z secs

There were a few issues with dynamic allocation in streaming applications because

  • Executors may never be idle, as they run a micro-batch every N seconds
  • The receiver occupies a slot/core inside an executor and never finishes, so the idle timeout is never hit

https://issues.apache.org/jira/browse/SPARK-12133

 
// For streaming applications, disable above switch and enable below one
spark.streaming.dynamicAllocation.enabled

Tuning Spark Applications

Tuning the performance of Spark applications can be done at various levels

  • OS Level
  • JVM Level
  • YARN Level
  • Spark Level

OS Level
In yarn-site.xml, we can configure the physical and virtual memory allocated to the containers initialized on a node.
https://tekmarathon.com/2017/02/13/important-yarn-configuration-properties/

JVM Level
We can look at the performance of the JVM Garbage Collection and then fine tune GC parameters
./bin/spark-submit --name "My app" --master yarn --conf spark.eventLog.enabled=false --conf "spark.executor.extraJavaOptions=-XX:OldSize=100M -XX:MaxNewSize=100M -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" myApp.jar
http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html

YARN Level
While submitting the job, we can control
Number of executors (Executor is run inside container and 1 Executor per Container)
Memory for each executor
Number of cores for each executor (this value can be raised to a maximum of about 2x the actual cores, but be aware that it also raises the memory requirements)
Memory Overhead
./bin/spark-submit --name "My app" --master yarn --num-executors 8 --executor-memory 4G --executor-cores 16 --conf "spark.yarn.executor.memoryOverhead=1024M" myApp.jar
https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

Spark Level
Prior to Spark 1.6.0, executor memory (spark.executor.memory) was split into two separate pools
Storage Memory: where it caches RDDs
Execution Memory: where it holds execution objects
From 1.6.0 onwards they are combined into a unified pool, and there is no hard split between the two; the ratio of memory allocated to each pool is decided dynamically at runtime.

[Figure: Spark memory allocation]
https://0x0fff.com/spark-memory-management/

Based on all the above factors, we should target tuning the memory settings based on

  • Objectives (EFFICIENCY vs RELIABILITY) and
  • Workloads (whether it is BATCH or STREAMING)


[Figure: Spark memory management]

Some TIPS:

    • The cost of garbage collection is directly proportional to the number of objects, so try to reduce the number of objects (for example, use an int[] array instead of a List<Integer>)
    • For batch applications use the default GC (ParallelGC); for streaming applications use ConcMarkSweepGC
// BATCH Apps: default GC
-XX:+UseParallelGC -XX:ParallelGCThreads=<#>

// Streaming Apps
-XX:+UseConcMarkSweepGC -XX:ParallelCMSThreads=<#>
OR
// G1 GC, available from Java 7, is considered a good replacement for CMS
-XX:+UseG1GC
    • KRYO Serialization: this is up to 10x faster than Java serialization. In general, a 1G file on disk takes 2-3G to store in memory, which is the cost of Java serialization.
conf.set("spark.serializer", "org.apache.spark.serializer.KyroSer");
// We need to register our custom classes with KYRO Serializer
  • TACHYON: use Tachyon for off-heap storage. The advantage is that even if the executor JVM crashes, the cached data stays in OFF_HEAP storage.
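
A hedged sketch of that registration step (MyEvent and MyKey are hypothetical placeholders for application classes):

import org.apache.spark.SparkConf;

public class KryoConfigDemo {
    // placeholder application classes, just for illustration
    static class MyEvent {}
    static class MyKey {}

    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("KryoDemo")
                .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                // register application classes so Kryo can serialize them efficiently
                .registerKryoClasses(new Class<?>[]{ MyEvent.class, MyKey.class });

        System.out.println(conf.get("spark.serializer"));
    }
}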
 

References:
https://www.youtube.com/watch?v=dTR30Fy02Yo&t=19s

Hadoop and Spark Installation on Raspberry Pi-3 Cluster – Part-4

In this part we will see the configuration of Slave Node. Here are the steps

  1. Mount second Raspberry Pi-3 device on the nylon standoffs (on top of Master Node)
  2. Load the image from part2 into a sd_card
  3. Insert the sd_card into one Raspberry Pi-3 (RPI) device
  4. Connect RPI to the keyboard via USB port
  5. Connect to monitor via HDMI cable
  6. Connect to Ethernet switch via ethernet port
  7. Connect to USB switch via micro usb slot
  8. Hadoop related changes on Slave node

Here steps 1-7 are all physical, so I am skipping them.

Once the device is powered on, log in via the external keyboard and monitor and change the hostname from rpi3-0 (which comes from the base image) to rpi3-1.

Step #8: Hadoop Related Configuration


  • Setup HDFS
 
sudo mkdir -p /hdfs/tmp  
sudo chown hduser:hadoop /hdfs/tmp  
chmod 750 /hdfs/tmp  
hdfs namenode -format
  • Update /etc/hosts file
 
127.0.0.1	localhost
192.168.2.1	rpi3-0
192.168.2.101	rpi3-1
192.168.2.102	rpi3-2
192.168.2.103	rpi3-3
  • Repeat the above steps for each slave node. For every slave node added, ensure
  • ssh is setup from master node to slave node
  • slaves file on master is updated
  • /etc/hosts file on both master and slave is updated

Start the hadoop/spark cluster


    • Start dfs and yarn services
 
cd /opt/hadoop-2.7.3/sbin 
start-dfs.sh 
start-yarn.sh 
    • On master node “jps” should show following
 
hduser@rpi3-0:~ $ jps
20421 ResourceManager
20526 NodeManager
19947 NameNode
20219 SecondaryNameNode
24555 Jps
20050 DataNode
    • On Slave Node “jps” should show following processes
 
hduser@rpi3-3:/opt/hadoop-2.7.3/logs $ jps
2294 NodeManager
2159 DataNode
2411 Jps
    • To verify the successful installation, run a hadoop and spark job in cluster mode and you will see the Application Master tracking URL.
    • Run spark Job
      • spark-submit --class com.learning.spark.SparkWordCount --master yarn --executor-memory 512m ~/word_count-0.0.1-SNAPSHOT.jar /ntallapa/word_count/text 2
    • Run example mapreduce job
      • hadoop jar /opt/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /ntallapa/word_count/text /ntallapa/word_count/output

Hadoop and Spark Installation on Raspberry Pi-3 Cluster – Part-3

In this part we will see the configuration of Master Node. Here are the steps

  1. Mount first Raspberry Pi-3 device on the nylon standoffs
  2. Load the image from part2 into a sd_card
  3. Insert the sd_card into one Raspberry Pi-3 (RPI) device
  4. Connect RPI to the keyboard via USB port
  5. Connect to monitor via HDMI cable
  6. Connect to Ethernet switch via ethernet port
  7. Connect to USB switch via micro usb slot
  8. DHCPD Configuration
  9. NAT Configuration
  10. DHCPD Verification
  11. Hadoop related changes on Master node

Here steps 1-7 are all physical, so I am skipping them.
[Figure: master node]

Step #8: dhcpd configuration


This node will serve as the DHCP server, NAT gateway and overall controller of the cluster

    • Go to "sudo raspi-config" -> Advanced Options -> Hostname -> "rpi3-0" (make sure it is rpi3-0 as it is our first node)
    • sudo apt-get install isc-dhcp-server
    • sudo nano /etc/dhcp/dhcpd.conf
    • Define subnet which will be the network that all the RPI-3 nodes connect to.
 
subnet 192.168.2.0 netmask 255.255.255.0 {
        range 192.168.2.100 192.168.2.200;
        option broadcast-address 192.168.2.255;
        option routers 192.168.2.1;
        max-lease-time 7200;
        option domain-name "rpi3";
        option domain-name-servers 8.8.8.8;
}
  • Adjust server configuration
  • sudo nano /etc/default/isc-dhcp-server
  • Tell it which interface to use on the last line ("eth0")
  •  
    # On what interfaces should the DHCP server (dhcpd) serve DHCP requests?
    #       Separate multiple interfaces with spaces, e.g. "eth0 eth1".
    INTERFACES="eth0"
    
    
  • Configure the interfaces file of rpi3-0 so that it can serve as the DHCP server and NAT gateway for the rest of the Pi cluster
    • sudo nano /etc/network/interfaces
    • Make the below changes and reboot the PI
    •  
      auto eth0
      iface eth0 inet static
      	address 192.168.2.1
      	netmask 255.255.255.0
      

Step #9: NAT configuration


    • Now we will configure IP tables to provide Network Address Translation services on our master node rpi3-0
    • sudo nano /etc/sysctl.conf
    • uncomment “net.ipv4.ip_forward=1”
    • sudo sh -c "echo 1 > /proc/sys/net/ipv4/ip_forward"
    • Now that it has been activated, run the 3 commands below to configure iptables correctly
 
sudo iptables -t nat -A POSTROUTING -o wlan0 -j MASQUERADE
sudo iptables -A FORWARD -i wlan0 -o eth0 -m state --state RELATED,ESTABLISHED -j ACCEPT
sudo iptables -A FORWARD -i eth0 -o wlan0 -j ACCEPT
  • Make sure we have this setup correct
  • sudo iptables -t nat -S
  • sudo iptables -S
  • In order to avoid losing this config upon reboot, do
    • sudo sh -c "iptables-save > /etc/iptables.ipv4.nat" (save the iptables configuration to a file)
    • sudo nano /etc/network/interfaces (add below line to interfaces file)
      post-up iptables-restore < /etc/iptables.ipv4.nat
 
auto eth0
iface eth0 inet static
	address 192.168.2.1
	netmask 255.255.255.0
	post-up iptables-restore  < /etc/iptables.ipv4.nat

Step #10: Verify dhcpd

  • To see the address that has been assigned to the new PI
  • cat /var/lib/dhcp/dhcpd.leases
  • This would also give us the MAC address of the newly added node
  • It is always handy to have the DHCP server assign a fixed address to each node in the cluster so that it is easy to remember a node by its IP address. For instance, the next node in the cluster is rpi3-1 and it would be helpful for it to have the IP 192.168.2.101. To do this, modify the DHCP server config file
      • sudo nano /etc/dhcp/dhcpd.conf
     
    host rpi3-1 {
        hardware ethernet MAC_ADDRESS;
        fixed-address 192.168.2.101;
    }
    
    • Eventually this file will have an entry for every node in the cluster
    • Now we can ssh into the new node via IP Address

Step #11: Hadoop Related Configuration


    • Setup SSH
 
su hduser 
cd ~  
mkdir .ssh  
ssh-keygen  
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys  
chmod 0750 ~/.ssh/authorized_keys  
// Let's say we added a new slave node rpi3-1; copy the ssh id to the slave node to enable passwordless login
ssh-copy-id hduser@rpi3-1 (Repeat for each slave node)  
ssh hduser@rpi3-1
    • Setup HDFS
 
sudo mkdir -p /hdfs/tmp  
sudo chown hduser:hadoop /hdfs/tmp  
chmod 750 /hdfs/tmp  
hdfs namenode -format
    • Edit master and slave config files
        • sudo nano /opt/hadoop-2.7.3/etc/hadoop/masters
       
      rpi3-0
      
        • sudo nano /opt/hadoop-2.7.3/etc/hadoop/slaves
       
      rpi3-0
      rpi3-1
      rpi3-2
      rpi3-3
      
    • Update /etc/hosts file
 
127.0.0.1	localhost
192.168.2.1	rpi3-0
192.168.2.101	rpi3-1
192.168.2.102	rpi3-2
192.168.2.103	rpi3-3

References:
https://learn.adafruit.com/setting-up-a-raspberry-pi-as-a-wifi-access-point/install-software
