Introducing Cassandra


Lately I’ve been trying out Cassandra at work. I recently attended a webinar hosted by DataStax (the commercial company behind Cassandra) and presented by Tim Berglund (@tlberglund). The webinars introduce Cassandra to developers and operations staff, and they are a great way to start understanding what Cassandra does and doesn’t do. (Tim’s training videos on O’Reilly are excellent, by the way.)

Cassandra is a schema-less, scalable, distributed database. There’s actually more to it than that, but the list of its capabilities is rather long :-). Surprisingly, it’s also relatively easy to set up. I found the setup process simpler than for earlier versions of MySQL, yet you get far more from a brief configuration.

Single node setup

Setting up a single node of Cassandra is straightforward, but it is handy to have the notes in one place. You may need to download some additional files if you want the features provided by OpsCenter, the package DataStax makes available for monitoring your Cassandra cluster. More about that later.

First, decide if you want the Apache version (no OpsCenter available with this version) or the DataStax Community version.

Cassandra runs on the JVM, so make sure you have a Java runtime environment installed. I’ve tested with both Oracle Java 6 and 7. The OpenJDK is not recommended. There is one caveat about Java 7, covered below.

I’ll be using the DataStax Community edition, which at this time is v1.1.0.

Once you’ve downloaded the version for your OS, go ahead and install it. I’m using the tarball: dsc-cassandra-1.1.0-bin.tar.gz

Configuration
Edit your conf/cassandra.yaml file.

initial_token
You can set this to 0 for a single node, but read the notes below if you set up a cluster.

directories
Decide where you want your data files, commit log, and caches to be saved. Each must be a path that the user running Cassandra has permission to read and write.

seeds, listen_address, rpc_address
You can leave these as the defaults, but you’ll want to change these for cluster configurations.

Java 7
Depending on the version of Cassandra you are running with Java 7, the initial amount of memory set for the stack space appears to be too small (v1.1.0 works fine as shipped). If you run into an error from the JVM about memory, find the following line in conf/cassandra-env.sh (near line 153):
JVM_OPTS="$JVM_OPTS -Xss128k"
and change it to
JVM_OPTS="$JVM_OPTS -Xss160k"

At this point, you can go ahead and try running Cassandra:
bin/cassandra -f

This will run it in the foreground, allowing you to see any errors.
If you scroll through the output, you will see that two optional features are not available:
…
JNA not found. Native methods will be disabled.
…
Will not load MX4J, mx4j-tools.jar is not in the classpath

These features (JNA and mx4j) can be downloaded and installed to the cassandra/lib path by getting them from:
http://sourceforge.net/projects/mx4j/files/
https://github.com/twall/jna

You need the mx4j-tools.jar from the mx4j project.
You need the jna.jar and platform.jar for JNA support.
Once you have the jar files copied, stop and then restart Cassandra.
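
To confirm after the restart that both jars were picked up, one option is to save the startup output to a file and check that neither message appears any more. The sketch below is only an illustration: the two message strings are copied from the output quoted above, while the function name and the sample text are hypothetical.

```python
# Hypothetical helper. The message fragments come from the startup output
# quoted in this post; the names used here are made up for illustration.
MISSING_FEATURE_MESSAGES = {
    "JNA": "JNA not found",
    "MX4J": "Will not load MX4J",
}


def missing_features(log_text):
    """Return the sorted names of optional features the log reports as missing."""
    return sorted(
        name
        for name, message in MISSING_FEATURE_MESSAGES.items()
        if message in log_text
    )


# Sample text taken from the two lines shown above.
sample = (
    "JNA not found. Native methods will be disabled.\n"
    "Will not load MX4J, mx4j-tools.jar is not in the classpath\n"
)
print(missing_features(sample))  # prints ['JNA', 'MX4J']
```

An empty list means neither message was found in the text you passed in.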

Assuming no errors, at this point you have a working Cassandra node. You can go ahead and create a keyspace (database), and column families (tables).

Cluster setup

This is mostly a repeat of the single node setup. Install and configure Cassandra on your other nodes, but this time you will be filling in the config section for ‘seeds’ by adding a few of the IPs from the other nodes. This allows the nodes to start talking to each other, and learn the topology of the network. You don’t need to include all the other nodes, just enough for the cluster to start talking to itself.

initial_token
You really want to set the initial_token for each node you are installing. As noted in the conf file, poorly chosen tokens will lead to hotspots in your data. There are online tools that generate tokens for the number of nodes you have.
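
If you would rather compute the tokens yourself, the arithmetic is short. This is a minimal sketch that assumes the RandomPartitioner (whose tokens run from 0 to 2**127) and a single data center where every node should own an equal share of the ring. The function name is my own and is not part of Cassandra.

```python
def balanced_tokens(node_count):
    """Evenly spaced initial_token values for a ring of node_count nodes.

    Assumes the RandomPartitioner, whose token space is 0 .. 2**127.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    ring_size = 2 ** 127
    # Node i gets the token at i/node_count of the way around the ring.
    return [i * ring_size // node_count for i in range(node_count)]


# Print one token per node for a three-node cluster.
for node, token in enumerate(balanced_tokens(3)):
    print("node %d: initial_token: %d" % (node, token))
```

The first node always gets 0, which matches the single node setting mentioned earlier.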

seeds
As mentioned above, you will want to add some of the IP addresses of the other nodes (even if those nodes are simply running in a virtual machine). Modify this line, and make sure the list of IPs is within the quotes:
seeds: "192.168.10.100, 192.168.10.101, 192.168.10.103"
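
A typo in one of these addresses is easy to miss, so when the list is long it can help to build the line from a validated list. The following is just a sketch using Python's standard ipaddress module. The function name is hypothetical, and it only produces the text of the line; it does not edit cassandra.yaml for you.

```python
import ipaddress


def seeds_line(addresses):
    """Build the seeds line for cassandra.yaml from a list of IP strings.

    ipaddress.ip_address raises ValueError for anything that is not a valid
    IPv4 or IPv6 address, so a mistyped entry is caught here.
    """
    validated = [str(ipaddress.ip_address(a.strip())) for a in addresses]
    if not validated:
        raise ValueError("at least one seed address is required")
    return 'seeds: "{}"'.format(", ".join(validated))


print(seeds_line(["192.168.10.100", "192.168.10.101", "192.168.10.103"]))
```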

listen_address:
Set this to this node’s own IP address (the address that you will be configuring some of the other nodes to talk to).

rpc_address:
I set this to the same IP address that I’m using for listen_address, this node’s own IP.

Ready
At this point, the node is ready to become part of a cluster. You will need to perform all of the single node and cluster setup as described above on each node that you want as part of the cluster. Go ahead and start up your Cassandra instances.

OpsCenter

Take a look at OpsCenter, and download it. This is pretty cool. OpsCenter is your dashboard, allowing you to set up, modify, observe and maintain your Cassandra cluster.

OpsCenter setup

There are two parts to the OpsCenter, as far as configuration goes:
The OpsCenter itself
The agent that sends data to the OpsCenter

You only need the OpsCenter running on one server, but you need the agent running on each node, so that it can feed information to the OpsCenter.

conf/opscenterd.conf
Set the interface value to the IP address of the host that runs OpsCenter.

Since I’m just setting this up as a test cluster on several virtual machines, I also turned off SSL by adding this under [agents]:

[agents]
use_ssl = false

agent/conf/address.yaml
You can create this file by running the bin/setup program, but for a simple entry you can just create it yourself. The one setting to watch is ‘stomp_interface’: this is the IP address of the server where you want to run OpsCenter. The agents on all nodes should use the same OpsCenter IP address. Note that I’ve turned off SSL here as well.

stomp_interface: "192.168.10.100"
use_ssl: 0

You will need to set up the agent configuration on each node.

Then run the agent:
agent/bin/opscenter-agent -f

To turn on the OpsCenter:
bin/opscenter -f

Then use your browser to connect to the IP address that you configured OpsCenter to use, via port 8888.
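
If the page does not load, it is worth checking whether anything is listening on that port before digging into the OpsCenter configuration. Here is a minimal sketch using only Python's standard socket module. The function name is my own, and the address in the usage note is simply the example OpsCenter address used earlier in this post.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when the connection cannot be made.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, port_is_open("192.168.10.100", 8888) should return True once OpsCenter is up on that host. A False result only tells you that the connection failed, not why, so a firewall between the machines is also worth ruling out.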

If all has gone well, at this point you have Cassandra and OpsCenter up and running, and you can see your cluster. Time to start creating keyspaces (databases) and column families (tables). Then look into CQL :-).

I also recommend the #cassandra channel on freenode for questions, and the documentation on the DataStax site is extensive.

Hope this helps.

