Contents:
- Introduction
- System Architecture
- Create your first template VM
- Deploying duplicate VMs on Azure
- Configuring the nodes
- Testing the Cassandra Cluster
- Load-Balancing the Cassandra Cluster
Introduction
Apache Cassandra is a highly available and a very scalable NoSQL database. In this article, I'll demonstrate how to deploy a Cassandra Cluster on Microsoft Azure platform.
At the time of writing, due to the fact that there are no articles online about setting up a cluster node for Cassandra v3, I have decided to write this. Also, this article is up to date with Azure new CLI tools and provides a full solution.
This articles assumes you are familiar with Azure, Azure CLI tools and Ubuntu OS.
This article is for deploying a small scale Cassandra Cluster of 2-8 nodes. If it is required to deploy a bigger scale Cassandra Cluster, you should use Chef or Salt Stack to automate the same process.
System Architecture
The architecture consists of a cluster of 4 Cassandra nodes that are load balanced as shown in the figure below:
The subnet used for this example will be maximally 10.0.0.0/24
.
The nodes will have the following static IP addresses:
10.0.0.4
and10.0.0.6
for seeds10.0.0.8
and10.0.0.10
for nodes
The seeds are responsible for broadcasting the available nodes to the other nodes.
Creating the first template VM
- Create a new resource group, I named it "cassandra-group".
- Using the resource manager, add a new Ubuntu Server 14.04 LTS VM. I named it "cass-tmp".
- Set it up under subnet
10.0.0.0/24
. I gave it the static IP address of10.0.0.12
. - Allow inbound traffic and outbound traffic at port
tcp:9042
. - Deploy it.
- SSH to it.
Installing Cassandra 3
Oracle Java 8 and JNI are prerequisites for Cassandra v3.
- Install Oracle Java 8 using :
sudo apt-add-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
# Check that java is properly installed
java -version
You might need to setup JAVA_HOME path variable export
JAVA_HOME=/usr/lib/jvm/java-8-oracle
ifjava -version
failed.
- Install JNA using:
sudo apt-get install libjna-java -y
- Install Cassandra:
This will set up the PPAs for Cassandra and the keys for verification.
echo "deb http://www.apache.org/dist/cassandra/debian 30x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
echo "deb-src http://www.apache.org/dist/cassandra/debian 30x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
gpg --keyserver pgp.mit.edu --recv-keys F758CE318D77295D
gpg --export --armor F758CE318D77295D | sudo apt-key add -
gpg --keyserver pgp.mit.edu --recv-keys 2B5C1B00
gpg --export --armor 2B5C1B00 | sudo apt-key add -
gpg --keyserver pgp.mit.edu --recv-keys 0353B12C
gpg --export --armor 0353B12C | sudo apt-key add -
sudo apt-get update
sudo apt-get install cassandra
- Check if Cassandra is running and discover nodes:
sudo service cassandra status
sudo nodetool status
You should only see one node connected at localhost with status UN. U is for Up and N is for Normal
Configuring Cassandra for clustering
- Stop Cassandra using
sudo service cassandra stop
. - Find your ethernet card interface ID using
ifconfig
, it should beeth(x)
. Edit Cassandra's configuration
cassandra.yaml
:- Change the cluster name.
- Add the IP addresses of the seed nodes.
- Comment out the listen_address.
- Add the listen interface.
- Start the RPC service.
- Set the RPC interface.
- Set the broadcast RPC address.
- Set the endpoint snitch.
By editing the file:
sudo vim /etc/cassandra/cassandra.yaml
cluster_name: 'My Cluster'
seeds: "10.0.0.4,10.0.0.6"
# listen_address:
listen_interface: eth0
start_rpc: true
# rpc_address:
rpc_interface: eth0
broadcast_rpc_address: 10.0.0.12
endpoint_snitch: GossipingPropertyFileSnitch
- Delete all Cassandra system configurations
sudo rm -rf /var/lib/cassandra/data/system/
. - Start Cassandra
sudo service cassandra start
, check the nodes usingsudo nodetool status
. You should see your own node listed under your interface IP rather than localhost.Nodetool takes time to bootstrap and find the nodes. If you receive a Java error, then restart Cassandra.
Deploying duplicate VMs on Azure
Deploying first VM
To deploy a duplicate of the VM, the VM must be generalized, captured into a JSON template, and redeployed as many times as needed.- To generalize the VM:
sudo waagent -deprovision+user
And accept when prompted. This will remove all user related configurations on the machine.
Stop the VM.
Capture VM into JSON template onto your local machine.
azure vm capture "cassandra-group" -n "cass-tmp" -p "cass-vhd" -t "cass-template.json"
This will create a JSON file containing your VM settings.
You should have Azure CLI tools installed and must be logged in.
- Create four new public IP addresses and four new NICs.
azure network public-ip create "cassandra-group" "cass-ip-1"
azure network public-ip create "cassandra-group" "cass-ip-2"
azure network public-ip create "cassandra-group" "cass-ip-3"
azure network public-ip create "cassandra-group" "cass-ip-4"
azure network nic create "cassandra-group" "cass-nic-1" -k "cass-subnet" -m "cass-vnet" -p "cass-ip-1" -a "10.0.0.4"
azure network nic create "cassandra-group" "cass-nic-2" -k "cass-subnet" -m "cass-vnet" -p "cass-ip-2" -a "10.0.0.6"
azure network nic create "cassandra-group" "cass-nic-3" -k "cass-subnet" -m "cass-vnet" -p "cass-ip-3" -a "10.0.0.8"
azure network nic create "cassandra-group" "cass-nic-4" -k "cass-subnet" -m "cass-vnet" -p "cass-ip-4" -a "10.0.0.10"
azure network nic list --json | grep "cass-nic"
- Deploy your first VM.
azure group deployment create "cassandra-group" -n "cass01" -f "cass-template.json"
- VM name "cass01"
- Admin username and password
- NIC ID for "cass-nic-1" as mentioned in List all your NICs IDs
Deploying three more VMs
The template image created before cannot be used anymore to deploy more of that image due to the osDisk URI, it has already been used. We'll create more templates with different osDisk URIs, and deploy again.
- Copy the template thrice
cp cass-template.json cass-template2.json
cp cass-template.json cass-template3.json
cp cass-template.json cass-template4.json
Edit each new template's osDisk VHD URI
https://clixxxxxxxxxxxxx.blob.core.windows.net/vmcontainer1cd54367-xxxx-xxxx-xxxx-xxxxxxxxx/osDisk.xxxxxx-xxxx-xxxx-xxxx-xxxxx....
. Just change the wordosDisk
toosDisk2
orosDisk3
orosDisk4
.Deploy three VMs.
azure group deployment create "cassandra-group" -n "cass02" -f "cass-template2.json"
azure group deployment create "cassandra-group" -n "cass03" -f "cass-template3.json"
azure group deployment create "cassandra-group" -n "cass04" -f "cass-template4.json"
Enter the information you are asked for like mentioned before.
Configuring the nodes
SSH to every node, and edit the etc/cassandra/cassandara.yaml
as mentioned in Configuring Cassandra for clustering to edit the broadcast_rpc_address
to match the IP address given to every node.
broadcast_rpc_address: 10.0.0.4
broadcast_rpc_address: 10.0.0.6
broadcast_rpc_address: 10.0.0.8
broadcast_rpc_address: 10.0.0.10
Make sure to restart cassandra.
sudo service cassandra restart
Using any node, check the status of cluster:
sudo nodetool status
The output should be similar to the following:
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.0.0.4 223.37 KB 256 ? 211af4a7-eb3b-45de-91d3-225ec2c55ba6 rack1
UN 10.0.0.6 232.37 KB 256 ? 2663c92f-7b82-4633-b481-fa52023ecdc7 rack1
UN 10.0.0.8 260.06 KB 256 ? cb20c567-66e9-402b-9776-90d52d706f76 rack1
UN 10.0.0.10 180.22 KB 256 ? e0ffd0b6-11e4-4870-9185-a683018555e5 rack1
Testing the Cassandra Cluster
To test our Cassandra cluster, we will create a keyspace, a table and a row to one of the nodes, and then we will expect it to be added on the other nodes. Enter Cassandra's command line client:
cqlsh 10.0.0.4
Note: You may enter the IP address of any of the nodes.
CREATE KEYSPACE test WITH replication = {
'class': 'SimpleStrategy',
'replication_factor': '1'
};
USE test;
CREATE TABLE users (
name text,
PRIMARY KEY (name));
INSERT INTO users (name) VALUES ('John');
SELECT * FROM users;
name
------
John
(1 rows)
Now check the others nodes:
cqlsh 10.0.0.8
USE test;
SELECT * FROM users;
name
------
John
(1 rows)
The user exists, the cluster is working!
Load-Balancing the Cassandra Cluster
An internal load balancer that can be only accessed through your other VMs will be created.
The following steps can be done through Azure CLI tools too, but it is less cumbersome using the portal.
- Using Browse, look for Load Balancers.
Click on Add. Add a name, choose Internal schema, and make sure to choose the right Resource Group.
You might need to refresh the list to see the new load balancer.
Click on the new load balancer to browse it.
- Select Probes. Click on Add. Choose the settings you prefer according to the level of service you wish to attain. Make sure to choose TCP on port 9042.
- Choose Backend Pools. Click on Add. If your virtual machines lie in an availability set, choose it and skip step 6. If not, just press Save and perform step 6.
Using Azure CLI we will add the VMs to the load balancer backend pool. Find the load balancer backend pool ID using
azure network lb address-pool list --json
, then run the following:azure network nic address-pool add -g "cassandra-group" -n "cass-nic-1" -i "<backend-pool-id>" -l "cassandra-ilb" azure network nic address-pool add -g "cassandra-group" -n "cass-nic-2" -i "<backend-pool-id>" -l "cassandra-ilb" azure network nic address-pool add -g "cassandra-group" -n "cass-nic-3" -i "<backend-pool-id>" -l "cassandra-ilb" azure network nic address-pool add -g "cassandra-group" -n "cass-nic-4" -i "<backend-pool-id>" -l "cassandra-ilb"
The VMs are now added to the internal load balancer backend pool.
- Select Load Balancing Rules. Click on Add. Choose the protocol TCP for port and backend ports of 9042. Choose the probe you created and the backend pool you created.
Congratulations. You may now use the internal load balancer IP address at port 9042 to load balance traffic to your Cassandra cluster.