Set up Apache Spark on a Multi-Node Cluster

Rahul Nayak
Mar 8, 2018 · 4 min read

This blog explains how to install Apache Spark on a multi-node cluster. It provides step-by-step instructions to deploy and configure Spark on a real multi-node cluster.

Spark is a fast and powerful framework that provides an API for performing massive distributed processing over resilient distributed datasets (RDDs).

Recommended Platform

Installing Apache Spark on a multi-node cluster requires multiple nodes; these can be physical machines or AWS instances.

Spark Architecture

  • Master Daemon (Master/Driver Process)
  • Worker Daemon (Slave Process)
  • Cluster Manager

A Spark cluster has a single master and any number of slaves/workers. The driver and the executors run as individual Java processes; users can run them on the same machine (a horizontal Spark cluster), on separate machines (a vertical Spark cluster), or in a mixed machine configuration.

Prerequisites

Create a user with the same name on the master and all slaves to make SSH access easier, and switch to that user on the master.
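
For example, to create and switch to a user named spark (the name is only an illustration; any name works as long as it is identical on every machine):

$ sudo adduser spark
$ su - spark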

Add entries in hosts file (master and slaves)

$ sudo vim /etc/hosts

Now add the entries for the master and the slaves to the hosts file.

<MASTER-IP> master
<SLAVE01-IP> slave01
<SLAVE02-IP> slave02
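
For example, with three machines on a private network (the addresses below are illustrative; substitute your own):

192.168.1.10 master
192.168.1.11 slave01
192.168.1.12 slave02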

Install Java 8 (master and slaves)

Spark 2.3.0 requires Java 8 or later, so install Java 8 rather than Java 7.

$ sudo apt-get install software-properties-common
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer

To check whether Java is installed, run the following command.

$ java -version

Install Scala (master and slaves)

$ sudo apt-get install scala

To check whether Scala is installed, run the following command.

$ scala -version

Configure SSH (only master)

$ sudo apt-get install openssh-server openssh-client

Generate key pairs

$ ssh-keygen -t rsa -P ""

Configure passwordless SSH

Copy the contents of .ssh/id_rsa.pub (on the master) to .ssh/authorized_keys (on all the slaves, as well as on the master).
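
A convenient way to do this, assuming password authentication is still enabled on each host, is the ssh-copy-id utility:

$ ssh-copy-id -i ~/.ssh/id_rsa.pub master
$ ssh-copy-id -i ~/.ssh/id_rsa.pub slave01
$ ssh-copy-id -i ~/.ssh/id_rsa.pub slave02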

Verify SSH access to all the slaves

$ ssh slave01
$ ssh slave02

Install Spark

Download latest version of Spark

$ wget http://www-us.apache.org/dist/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz

Extract Spark tar

$ tar xvf spark-2.3.0-bin-hadoop2.7.tgz

Move Spark software files

$ sudo mv spark-2.3.0-bin-hadoop2.7 /usr/local/spark

Set up the environment for Spark

$ vim ~/.bashrc

Add the following line to the ~/.bashrc file. This adds the location of the Spark binaries to the PATH variable. Note that there must be no spaces around the = sign.

export PATH=$PATH:/usr/local/spark/bin

Use the following command to source the ~/.bashrc file.

$ source ~/.bashrc

Note: the whole Spark installation procedure must be carried out on the master as well as on all slaves.
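
As a shortcut, you can push the tarball from the master to the slaves over SSH instead of repeating the download on each machine. The loop below is only a sketch: it assumes passwordless SSH is already set up and that your user can run sudo on the slaves (you still need to update ~/.bashrc on each slave afterwards):

$ for host in slave01 slave02; do
>   scp spark-2.3.0-bin-hadoop2.7.tgz $host:~/
>   ssh -t $host 'tar xf spark-2.3.0-bin-hadoop2.7.tgz && sudo mv spark-2.3.0-bin-hadoop2.7 /usr/local/spark'
> done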

Spark Master Configuration

Edit spark-env.sh

$ cd /usr/local/spark/conf
$ cp spark-env.sh.template spark-env.sh

Now edit the configuration file spark-env.sh.

$ sudo vim spark-env.sh

And set the following parameters.

export SPARK_MASTER_HOST='<MASTER-IP>'
export JAVA_HOME=<Path_of_JAVA_installation>
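
For example, if the master's address is 192.168.1.10 (matching the illustrative hosts file above) and Java 8 was installed via the webupd8 PPA, the two lines might look like this; adjust both values to your own setup:

export SPARK_MASTER_HOST='192.168.1.10'
export JAVA_HOME=/usr/lib/jvm/java-8-oracle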

Add Workers

$ sudo vim slaves

And add the following entries.

master
slave01
slave02

Start Spark Cluster

$ cd /usr/local/spark
$ ./sbin/start-all.sh
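
start-all.sh starts a master daemon and one worker per entry in the slaves file. If you prefer, the same daemons can also be started individually using scripts from the same sbin directory:

$ ./sbin/start-master.sh
$ ./sbin/start-slaves.sh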

To stop the Spark cluster, run the following command on the master.

$ cd /usr/local/spark
$ ./sbin/stop-all.sh

Check whether services have been started

$ jps
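
On the master you should see a Master process (and a Worker, since the master is also listed in the slaves file); on each slave you should see a Worker. The PIDs below are placeholders:

$ jps
2345 Master
2456 Worker
2567 Jps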

Spark Web UI

Spark Master UI

http://<MASTER-IP>:8080/

Spark Application UI

http://<MASTER_IP>:4040/

You can proceed further with Spark shell commands to play with Spark.
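
For example, you can attach an interactive shell to the cluster, or submit the SparkPi example application that ships with the distribution (the jar path below assumes the Spark 2.3.0 / Hadoop 2.7 build used above):

$ /usr/local/spark/bin/spark-shell --master spark://master:7077

$ /usr/local/spark/bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master spark://master:7077 \
    /usr/local/spark/examples/jars/spark-examples_2.11-2.3.0.jar 100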
