Setting up a Load-Balanced Cassandra Cluster v3 (Ubuntu Server 14.04 LTS VMs) on Microsoft Azure

Introduction

Apache Cassandra is a highly available and very scalable NoSQL database. In this article, I'll demonstrate how to deploy a Cassandra cluster on the Microsoft Azure platform.

At the time of writing, I could not find any articles online about setting up a Cassandra v3 cluster, so I decided to write this one. It is also up to date with Azure's new CLI tools and provides a full solution.

This article assumes you are familiar with Azure, the Azure CLI tools and Ubuntu.

This article covers deploying a small-scale Cassandra cluster of 2-8 nodes. To deploy a bigger cluster, you should use Chef or SaltStack to automate the same process.

System Architecture

The architecture consists of a cluster of 4 Cassandra nodes that are load balanced, as shown in the figure below:

[Figure: load-balanced Cassandra virtual machines. This is just a demonstration, not a real flowchart.]

The subnet used for this example will be, at most, 10.0.0.0/24.
The nodes will have the following static IP addresses:

  • 10.0.0.4 and 10.0.0.6 for seeds
  • 10.0.0.8 and 10.0.0.10 for nodes

The seeds are the contact points that tell the other nodes which nodes are available in the cluster.

Creating the first template VM

  • Create a new resource group. I named it "cassandra-group".
  • Using the resource manager, add a new Ubuntu Server 14.04 LTS VM. I named it "cass-tmp".
  • Set it up under subnet 10.0.0.0/24. I gave it the static IP address 10.0.0.12.
  • Allow inbound and outbound TCP traffic on port 9042.
  • Deploy it.
  • SSH to it.

Installing Cassandra 3

Oracle Java 8 and JNA are prerequisites for Cassandra v3.

  • Install Oracle Java 8 using:
sudo apt-add-repository ppa:webupd8team/java  
sudo apt-get update  
sudo apt-get install oracle-java8-installer

# Check that java is properly installed
java -version  

If java -version fails, you might need to set the JAVA_HOME environment variable: export JAVA_HOME=/usr/lib/jvm/java-8-oracle

  • Install JNA using:
sudo apt-get install libjna-java -y  
  • Install Cassandra:

The following commands add the Apache Cassandra Debian repository and the keys used to verify its packages.

echo "deb http://www.apache.org/dist/cassandra/debian 30x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list  
echo "deb-src http://www.apache.org/dist/cassandra/debian 30x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list

gpg --keyserver pgp.mit.edu --recv-keys F758CE318D77295D  
gpg --export --armor F758CE318D77295D | sudo apt-key add -  
gpg --keyserver pgp.mit.edu --recv-keys 2B5C1B00  
gpg --export --armor 2B5C1B00 | sudo apt-key add -  
gpg --keyserver pgp.mit.edu --recv-keys 0353B12C  
gpg --export --armor 0353B12C | sudo apt-key add -

sudo apt-get update  
sudo apt-get install cassandra  
  • Check if Cassandra is running and discover nodes:
sudo service cassandra status  
sudo nodetool status  

You should see only one node, connected at localhost, with status UN. U is for Up and N is for Normal.

Configuring Cassandra for clustering

  • Stop Cassandra using sudo service cassandra stop.
  • Find your Ethernet interface name using ifconfig. It should be eth(x), for example eth0.
  • Edit Cassandra's configuration cassandra.yaml:

    • Change the cluster name.
    • Add the IP addresses of the seed nodes.
    • Comment out the listen_address.
    • Add the listen interface.
    • Start the RPC service.
    • Set the RPC interface.
    • Set the broadcast RPC address.
    • Set the endpoint snitch.

    Open the file for editing with sudo vim /etc/cassandra/cassandra.yaml and set the following:

cluster_name: 'My Cluster'  
# seeds is nested under seed_provider -> parameters; edit the existing entry there
seeds: "10.0.0.4,10.0.0.6"

# listen_address:     
listen_interface: eth0 

start_rpc: true  
# rpc_address: 
rpc_interface: eth0  
broadcast_rpc_address: 10.0.0.12

endpoint_snitch: GossipingPropertyFileSnitch  
  • Delete Cassandra's system data with sudo rm -rf /var/lib/cassandra/data/system/ (the old cluster name is stored there, so Cassandra will refuse to start under the new name otherwise).
  • Start Cassandra with sudo service cassandra start, then check the nodes using sudo nodetool status. You should see your own node listed under your interface IP rather than localhost.

    Cassandra takes time to bootstrap and find the nodes, so nodetool may not respond right away. If you receive a Java error, restart Cassandra.
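
Because startup takes a while, one option is to retry the status check a few times before resorting to a restart. The following is a minimal sketch, not part of the original setup; the retry helper, its name and its arguments are hypothetical.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or ATTEMPTS run out.
retry() {
  local attempts=$1
  local delay=$2
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $i/$attempts failed: $*" >&2
    # Sleep only if another attempt is coming.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, retry 10 6 sudo nodetool status would try for up to about a minute. The attempt count and delay are arbitrary and should be adjusted to how long your VM takes to start Cassandra.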

Cassandra is now installed and ready to operate in a cluster.

Deploying duplicate VMs on Azure

Deploying first VM

To deploy a duplicate of the VM, the VM must be generalized, captured into a JSON template, and redeployed as many times as needed.
  • To generalize the VM:
sudo waagent -deprovision+user  

Accept when prompted. This removes all user-related configuration from the machine.

  • Stop the VM.

  • Capture the VM into a JSON template on your local machine.

azure vm capture "cassandra-group" -n "cass-tmp" -p "cass-vhd" -t "cass-template.json"  

This will create a JSON file containing your VM settings.

You should have Azure CLI tools installed and must be logged in.

  • Create four new public IP addresses and four new NICs.
azure network public-ip create "cassandra-group" "cass-ip-1"  
azure network public-ip create "cassandra-group" "cass-ip-2"  
azure network public-ip create "cassandra-group" "cass-ip-3"  
azure network public-ip create "cassandra-group" "cass-ip-4"  
azure network nic create "cassandra-group" "cass-nic-1" -k "cass-subnet" -m "cass-vnet" -p "cass-ip-1" -a "10.0.0.4"  
azure network nic create "cassandra-group" "cass-nic-2" -k "cass-subnet" -m "cass-vnet" -p "cass-ip-2" -a "10.0.0.6"  
azure network nic create "cassandra-group" "cass-nic-3" -k "cass-subnet" -m "cass-vnet" -p "cass-ip-3" -a "10.0.0.8"  
azure network nic create "cassandra-group" "cass-nic-4" -k "cass-subnet" -m "cass-vnet" -p "cass-ip-4" -a "10.0.0.10"  
azure network nic list --json | grep "cass-nic"  
  • Deploy your first VM.
azure group deployment create "cassandra-group" -n "cass01" -f "cass-template.json"  

You will be asked for:

  1. VM name "cass01"
  2. Admin username and password
  3. NIC ID for "cass-nic-1", taken from the output of the azure network nic list command above
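
The public IP and NIC creation step in this section repeats the same two commands for each node, so it can be generated in a loop. The sketch below only prints the commands for review rather than running them; the function name is hypothetical, while the resource names, flags and addresses are the ones used in this article.

```shell
# Sketch: print (not run) the public IP and NIC commands for the four nodes.
print_network_commands() {
  local group="cassandra-group"
  local i=1
  local ip
  # Static private IPs in node order: two seeds, then two regular nodes.
  for ip in 10.0.0.4 10.0.0.6 10.0.0.8 10.0.0.10; do
    printf 'azure network public-ip create "%s" "cass-ip-%s"\n' "$group" "$i"
    printf 'azure network nic create "%s" "cass-nic-%s" -k "cass-subnet" -m "cass-vnet" -p "cass-ip-%s" -a "%s"\n' \
      "$group" "$i" "$i" "$ip"
    i=$((i + 1))
  done
}
```

Calling print_network_commands lists the eight commands. Each public IP is printed before the NIC that refers to it, so the output can be run top to bottom once it looks right.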

Deploying three more VMs

The template created earlier cannot be reused to deploy more VMs from that image, because its osDisk URI has already been used. We'll create more templates with different osDisk URIs and deploy again.

  • Copy the template three times:
cp cass-template.json cass-template2.json  
cp cass-template.json cass-template3.json  
cp cass-template.json cass-template4.json  
  • Edit each new template's osDisk VHD URI, which looks like https://clixxxxxxxxxxxxx.blob.core.windows.net/vmcontainer1cd54367-xxxx-xxxx-xxxx-xxxxxxxxx/osDisk.xxxxxx-xxxx-xxxx-xxxx-xxxxx..... Just change the word osDisk to osDisk2, osDisk3 or osDisk4 respectively.

  • Deploy three VMs.

azure group deployment create "cassandra-group" -n "cass02" -f "cass-template2.json"  
azure group deployment create "cassandra-group" -n "cass03" -f "cass-template3.json"  
azure group deployment create "cassandra-group" -n "cass04" -f "cass-template4.json"  

Enter the information you are asked for, as before, using each VM's own name and NIC ID (for example, "cass02" with the ID of "cass-nic-2").
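
The copy-and-edit step for the templates could also be scripted. This is a sketch under one assumption: that the VHD URI in your template contains the text "/osDisk." as in the sample URI above. It deliberately matches the leading slash and trailing dot so that the JSON key named osDisk is left alone. The function name is hypothetical.

```shell
# make_template_copy SOURCE INDEX
# Hypothetical helper: writes <SOURCE without .json><INDEX>.json, with
# "/osDisk." in the VHD URI renamed to "/osDisk<INDEX>.".
make_template_copy() {
  local src=$1
  local index=$2
  local dest="${src%.json}${index}.json"
  # Only the URI path segment is rewritten; the "osDisk" JSON key is untouched.
  sed "s|/osDisk\.|/osDisk${index}.|" "$src" > "$dest"
  echo "$dest"
}
```

Running make_template_copy cass-template.json 2 would produce cass-template2.json. It is worth opening the result and checking the URI by hand before deploying, since a captured template may differ from the sample shown here.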

Configuring the nodes

SSH to every node and edit /etc/cassandra/cassandra.yaml, as described in Configuring Cassandra for clustering, so that broadcast_rpc_address matches the IP address given to that node. Each node gets one of the following lines:

broadcast_rpc_address: 10.0.0.4  
broadcast_rpc_address: 10.0.0.6  
broadcast_rpc_address: 10.0.0.8  
broadcast_rpc_address: 10.0.0.10  
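
If you prefer not to edit the file by hand on every node, the change can be made with sed. This is a minimal sketch that assumes GNU sed (as shipped with Ubuntu) and an existing uncommented broadcast_rpc_address line, which the template VM already has; the function name is hypothetical.

```shell
# set_broadcast_rpc_address FILE IP
# Hypothetical helper: rewrites the existing broadcast_rpc_address line in
# place, then prints the resulting line so it can be checked.
set_broadcast_rpc_address() {
  local file=$1
  local ip=$2
  sed -i "s/^broadcast_rpc_address:.*/broadcast_rpc_address: ${ip}/" "$file"
  grep '^broadcast_rpc_address:' "$file"
}
```

On the node with IP 10.0.0.4, for example, you would call set_broadcast_rpc_address /etc/cassandra/cassandra.yaml 10.0.0.4 from a root shell or a script run with sudo, since the file is not writable by a normal user.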

Make sure to restart Cassandra.

sudo service cassandra restart  

Using any node, check the status of the cluster:

sudo nodetool status  

The output should be similar to the following:

Datacenter: datacenter1  
=======================
Status=Up/Down  
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns    Host ID                               Rack
UN  10.0.0.4   223.37 KB  256          ?       211af4a7-eb3b-45de-91d3-225ec2c55ba6  rack1  
UN  10.0.0.6   232.37 KB  256          ?       2663c92f-7b82-4633-b481-fa52023ecdc7  rack1  
UN  10.0.0.8   260.06 KB  256          ?       cb20c567-66e9-402b-9776-90d52d706f76  rack1  
UN  10.0.0.10  180.22 KB  256          ?       e0ffd0b6-11e4-4870-9185-a683018555e5  rack1  
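
To check this output in a script rather than by eye, you can count the rows whose status is UN. The sketch below is an illustration only; the function name is hypothetical, and it relies on the status code being the first column, as in the output above.

```shell
# Hypothetical helper: reads "nodetool status" output on stdin and prints
# how many nodes are Up and Normal.
count_up_normal() {
  # Data rows start with a two-letter status code; "UN" means Up and Normal.
  awk '$1 == "UN" { n++ } END { print n + 0 }'
}
```

For this cluster, sudo nodetool status | count_up_normal should print 4 once every node has joined.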


Testing the Cassandra Cluster

To test our Cassandra cluster, we will create a keyspace, a table and a row on one of the nodes, and then expect to see them from the other nodes. Start Cassandra's command-line client:

cqlsh 10.0.0.4  

Note: You may enter the IP address of any of the nodes.

CREATE KEYSPACE test WITH replication = {  
    'class': 'SimpleStrategy',
    'replication_factor': '1'
   };

USE test;

CREATE TABLE users (  
 name text, 
 PRIMARY KEY (name));

INSERT INTO users (name) VALUES ('John');  
SELECT * FROM users;

 name
------
 John

(1 rows)

Now check the other nodes:

cqlsh 10.0.0.8  
USE test;

SELECT * FROM users;

 name
------
 John

(1 rows)

The user exists, so the cluster is working!
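
Rather than opening an interactive session on each node, the same query can be run against every node in turn using cqlsh's -e option, which executes a statement and exits. This is a sketch; the function name is hypothetical, and the table is referenced as test.users so that no USE statement is needed.

```shell
# query_all_nodes QUERY NODE [NODE...]
# Hypothetical helper: runs one CQL statement against each node in turn.
query_all_nodes() {
  local query=$1
  shift
  local node
  for node in "$@"; do
    echo "== $node =="
    # Keep going if one node fails, but report it on stderr.
    cqlsh -e "$query" "$node" || echo "query failed on $node" >&2
  done
}
```

For example: query_all_nodes "SELECT * FROM test.users;" 10.0.0.4 10.0.0.6 10.0.0.8 10.0.0.10. Every node should return the row for John.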

Load-Balancing the Cassandra Cluster

We will create an internal load balancer that can only be accessed from your other VMs.

The following steps can also be done through the Azure CLI tools, but the portal is less cumbersome.

  1. Using Browse, look for Load Balancers.
  2. Click on Add. Add a name, choose the Internal scheme, and make sure to choose the right Resource Group.

    You might need to refresh the list to see the new load balancer.

  3. Click on the new load balancer to browse it.

  4. Select Probes. Click on Add. Choose the settings you prefer according to the level of service you wish to attain. Make sure to choose TCP on port 9042.
  5. Choose Backend Pools. Click on Add. If your virtual machines lie in an availability set, choose it and skip step 6. If not, just press Save and perform step 6.
  6. Using the Azure CLI, we will add the VMs' NICs to the load balancer backend pool. Find the backend pool ID using azure network lb address-pool list --json, then run the following:

    azure network nic address-pool add -g "cassandra-group" -n "cass-nic-1" -i "<backend-pool-id>" -l "cassandra-ilb"
    azure network nic address-pool add -g "cassandra-group" -n "cass-nic-2" -i "<backend-pool-id>" -l "cassandra-ilb"
    azure network nic address-pool add -g "cassandra-group" -n "cass-nic-3" -i "<backend-pool-id>" -l "cassandra-ilb"
    azure network nic address-pool add -g "cassandra-group" -n "cass-nic-4" -i "<backend-pool-id>" -l "cassandra-ilb"
    

    The VMs are now added to the internal load balancer backend pool.

  7. Select Load Balancing Rules. Click on Add. Choose TCP as the protocol and 9042 as both the port and the backend port. Choose the probe you created and the backend pool you created.

Congratulations. You may now use the internal load balancer IP address at port 9042 to load balance traffic to your Cassandra cluster.
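
A quick way to confirm that the load balancer forwards traffic is to try opening a TCP connection to it from another VM in the same virtual network. The sketch below uses bash's /dev/tcp feature with a timeout; the function name is hypothetical, and the load balancer's IP address is whatever Azure assigned to yours.

```shell
# port_open HOST PORT
# Hypothetical helper: succeeds if a TCP connection to HOST:PORT can be
# opened within 3 seconds.
port_open() {
  local host=$1
  local port=$2
  # /dev/tcp is a bash feature, so the connection attempt runs in bash
  # explicitly; timeout bounds how long we wait for an answer.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

For example, port_open <load-balancer-ip> 9042 && echo reachable. A successful connection only shows that something behind the load balancer accepts connections on that port; running cqlsh against the same address is the fuller test.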

Zaid Daba'een
