SHR's Occasional Braindump

Crafting Code and Business in a Beautiful Place

Getting up and running with Cassandra 1.0 and Ruby on OS X

I’ve been looking at the possibility of using Cassandra for an internal project, so I wanted to get a test system up and running. The latest Cassandra release at the time of writing is 1.0.6, but most of the resources I found referenced earlier versions. So here’s a quick guide to getting set up for Cassandra development with Ruby on OS X (Lion specifically, though the OS X version shouldn’t matter much).

Download and install Cassandra

The latest tarball is linked from the project homepage. I decided to run my local install as my login user rather than root, so I installed into a new directory in my home directory and set up a convenient symlink:

$ mkdir ~/cassandra && cd ~/cassandra
$ tar xzf apache-cassandra-1.0.6-bin.tar.gz
$ ln -s apache-cassandra-1.0.6/bin/ bin

By default, Cassandra stores data in /var/lib/cassandra/. That won’t do when running as my own user, but a few changes to the config fix that problem:

# in ~/cassandra/apache-cassandra-1.0.6/conf/cassandra.yaml
# directories where Cassandra should store data on disk.
data_file_directories:
    - /Users/MYUSER/cassandra/var/data
# commit log
commitlog_directory: /Users/MYUSER/cassandra/var/commitlog
# saved caches
saved_caches_directory: /Users/MYUSER/cassandra/var/saved_caches
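A typo in one of these keys, or a list item left without its `data_file_directories:` key, is easier to catch before the node starts than from a Java stack trace. The following is a small stdlib-only Ruby sketch that reads cassandra.yaml, checks that `data_file_directories`, `commitlog_directory` and `saved_caches_directory` are all set, and creates the directories with `FileUtils.mkdir_p`. The helper name `prepare_cassandra_dirs` is hypothetical. Cassandra may well create missing directories itself, so treat this as a convenience check rather than a required step.

```ruby
require 'yaml'
require 'fileutils'

# Keys in cassandra.yaml that hold on-disk locations (data_file_directories is a list).
DIR_KEYS = %w[data_file_directories commitlog_directory saved_caches_directory]

# Hypothetical helper: read cassandra.yaml, make sure every location key is set,
# and create the configured directories so a non-root user can run the node.
def prepare_cassandra_dirs(config_path)
  config = YAML.load_file(config_path)
  missing = DIR_KEYS.reject { |key| config[key] }
  raise ArgumentError, "missing keys: #{missing.join(', ')}" unless missing.empty?

  # Array() wraps the single-string keys and leaves the data directory list as-is.
  dirs = DIR_KEYS.flat_map { |key| Array(config[key]) }
  dirs.each { |dir| FileUtils.mkdir_p(dir) }
  dirs
end
```

Called as `prepare_cassandra_dirs(File.expand_path('~/cassandra/apache-cassandra-1.0.6/conf/cassandra.yaml'))`, it returns the list of directories it ensured exist.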

The Log4J configuration needs a similar change:

# in ~/cassandra/apache-cassandra-1.0.6/conf/
# Edit the next line to point to your logs directory

At this point I was able to fire up Cassandra without errors (note: the -f flag keeps the process in the foreground):

$ ~/cassandra/bin/cassandra -f
INFO 16:43:00,979 Node localhost/ state jump to normal
INFO 16:43:00,981 Bootstrap/Replace/Move completed! Now serving reads.
INFO 16:43:00,982 Will not load MX4J, mx4j-tools.jar is not in the classpath
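Before moving on to the Ruby side, it can be handy to confirm from Ruby that the node is accepting connections on its Thrift port, 9160, which is the port cassandra-cli connects to when creating the schema. This is a minimal stdlib-only sketch; `port_open?` is a hypothetical helper name, and it only checks that something is listening, not that it is Cassandra.

```ruby
require 'socket'
require 'timeout'

# Returns true if a TCP connection to host:port succeeds within +seconds+.
# Cassandra 1.0 serves Thrift clients (cassandra-cli, the cassandra gem) on 9160 by default.
def port_open?(host, port, seconds = 1)
  Timeout.timeout(seconds) do
    TCPSocket.new(host, port).close
    true
  end
rescue SystemCallError, SocketError, Timeout::Error
  # connection refused, host unreachable, name lookup failure, or too slow
  false
end
```

`port_open?('localhost', 9160)` should return true once the node has finished starting up.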

Install gems

To keep things isolated, I set up a clean gemset with rvm before installing any gems. At the time of writing, the latest release of the cassandra gem didn’t support Cassandra 1.0, so I installed HEAD from GitHub via Bundler.

$ mkdir cassandratest && cd cassandratest
$ echo rvm --create use 1.9.2@cassandratest > .rvmrc
$ source .rvmrc
Using ~/.rvm/gems/ruby-1.9.2-p290 with gemset cassandratest
$ gem install bundler
Fetching: bundler-1.0.21.gem (100%)
Successfully installed bundler-1.0.21
1 gem installed

## add to Gemfile:
# source ''
# gem "cassandra", :git => "git://"

$ bundle install
Your bundle is complete! Use `bundle show [gemname]` to see where a bundled gem is installed.

Create a schema

I’m not yet sure what the best long-term strategy (and tooling) for schema management is, so for now I went with setting up a test schema via the CLI. I followed the blog schema described here, with a couple of changes for recent deprecations.

# in cassandratest/schema/schema.txt
create keyspace demo
	with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
	and strategy_options = [{replication_factor:1}];
use demo;

create column family users
	with comparator = UTF8Type
	and key_validation_class=UTF8Type
	and column_metadata = [
		{column_name: full_name, validation_class: UTF8Type},
		{column_name: email, validation_class: UTF8Type},
		{column_name: state, validation_class: UTF8Type},
		{column_name: birth_year, validation_class: LongType}
	];

create column family blog_entry
	with comparator = TimeUUIDType
	and key_validation_class=UTF8Type
	and default_validation_class = UTF8Type;

Running this through the CLI got the schema set up:

$ ~/cassandra/bin/cassandra-cli  -host localhost -port 9160 -f schema/schema.txt
Connected to: "Test Cluster" on localhost/9160
Waiting for schema agreement...
... schemas agree across the cluster
Authenticated to keyspace: demo
... schemas agree across the cluster
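The blog_entry column family uses TimeUUIDType as its comparator, so each column name is a version 1 (time-based) UUID and Cassandra orders a row's columns by the timestamp embedded in it. A plain string sort of UUIDs does not in general give chronological order, because the low 32 bits of the timestamp come first in the textual form. In the scripts in this post the simple_uuid gem generates these UUIDs; the pure-Ruby sketch below, with hypothetical helpers `time_uuid` and `uuid_time`, only illustrates the RFC 4122 layout and how the timestamp is read back out.

```ruby
# 100-nanosecond intervals between the UUID epoch (1582-10-15) and the Unix epoch
GREGORIAN_OFFSET = 0x01B21DD213814000

# Build an RFC 4122 version 1 UUID string for +time+. clock_seq is random and node is
# a random 48-bit value with the multicast bit set (the RFC's advice when no MAC is used).
# A real generator would also guard against two UUIDs sharing one timestamp.
def time_uuid(time, clock_seq = rand(0x4000), node = rand(1 << 48) | (1 << 40))
  ts = (time.to_r * 10_000_000).to_i + GREGORIAN_OFFSET
  time_low = ts & 0xFFFF_FFFF
  time_mid = (ts >> 32) & 0xFFFF
  time_hi  = ((ts >> 48) & 0x0FFF) | 0x1000   # version 1 in the top nibble
  clock_hi = ((clock_seq >> 8) & 0x3F) | 0x80 # RFC 4122 variant bits 10xxxxxx
  format('%08x-%04x-%04x-%02x%02x-%012x',
         time_low, time_mid, time_hi, clock_hi, clock_seq & 0xFF, node)
end

# Recover the Time embedded in a version 1 UUID string.
def uuid_time(uuid)
  hex = uuid.delete('-')
  time_low = hex[0, 8].to_i(16)
  time_mid = hex[8, 4].to_i(16)
  time_hi  = hex[12, 4].to_i(16) & 0x0FFF # drop the version nibble
  ts = (time_hi << 48) | (time_mid << 32) | time_low
  Time.at(Rational(ts - GREGORIAN_OFFSET, 10_000_000))
end
```

Sorting a list of such UUIDs with `sort_by { |u| uuid_time(u) }` puts them in the order the posts were written, which is the ordering TimeUUIDType gives the blog_entry columns.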

Insert some data using the Cassandra Ruby library

And so we’re ready to write some data. Here’s a quick script to connect from Ruby and set up some sample data along the lines referenced in the CLI wiki page above:

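# in cassandratest/data1.rb
require 'rubygems'
require 'bundler/setup' # the cassandra gem comes from git, so load it through Bundler
require 'cassandra/1.0'
include SimpleUUID

# connect to the "demo" keyspace on the local node's Thrift port
client = Cassandra.new('demo', 'localhost:9160')

# birth_year is a LongType column, so its value is packed as an 8-byte big-endian long
client.insert("users", 'johnsmith', {
  'full_name' => 'John Smith',
  'email' => '',
  'state' => 'CA',
  'birth_year' => [0, 1969].pack('NN')
})

client.insert("users", 'janesmith', {
  'full_name' => 'Jane Smith',
  'email' => '',
  'state' => 'CA',
  'birth_year' => [0, 1970].pack('NN')
})

# 20 posts per user; blog_entry column names are TimeUUIDs, so posts sort by time
client.each_key("users") do |user|
  1.upto(20) do |x|
    client.insert("blog_entry", user, { UUID.new => "Blog post #{x}" })
  end
end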
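The users column family declares birth_year with LongType, and Cassandra validates such values as 8-byte big-endian integers, so a Thrift-based client has to send packed bytes rather than a string like "1969". Ruby 1.9.3 and later can do this with `[n].pack('q>')`, but the size and endianness modifiers aren't available on Ruby 1.9.2 (the version in the gemset above). A sketch that works there splits the value into two 32-bit words; `encode_long` and `decode_long` are hypothetical helper names.

```ruby
# Encode a signed 64-bit integer as 8 big-endian bytes (the form LongType expects)
# using only 32-bit 'N' directives, which Ruby 1.9.2 supports.
def encode_long(n)
  raise RangeError, "#{n} does not fit in 64 bits" unless n >= -(1 << 63) && n < (1 << 63)
  [(n >> 32) & 0xFFFF_FFFF, n & 0xFFFF_FFFF].pack('NN')
end

# Decode 8 big-endian bytes back into a signed integer.
def decode_long(bytes)
  hi, lo = bytes.unpack('NN')
  hi -= 1 << 32 if hi >= 1 << 31 # restore the sign from the top bit
  (hi << 32) | lo
end
```

With these helpers the inserts could use `'birth_year' => encode_long(1969)`, which also handles negative values that the inline `[0, n].pack('NN')` would not.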

Similar thing with CQL

The cassandra gem (and therefore the data insertion code above) talks to Cassandra through its Thrift API. Recently, though, there has been a move towards a new, more SQL-like API called CQL (Cassandra Query Language), which started shipping with version 0.8.

To try out this alternative API I installed the cassandra-cql gem:

## add to Gemfile:
# gem "cassandra-cql"
$ bundle update
Your bundle is updated!

Note: in the course of trying out CQL I fired up cqlsh (the CQL shell), which ships with Cassandra. That program requires a Python cql module to run (more details here). I installed it using the following command (for Python 2.7):

$ sudo easy_install-2.7 cql

Then I put together the following script, which closely follows the example above. Note that it creates a new keyspace before inserting data; I didn’t see a way to do that using the cassandra gem.

require 'cassandra-cql'
require 'simple_uuid'

# connect to the local node's Thrift port (no keyspace selected yet)
client = CassandraCQL::Database.new('localhost:9160')

# schema creation: drop demo2 if it survived a previous run
begin
  client.execute("drop keyspace demo2")
rescue CassandraCQL::Error::InvalidRequestException
  # keyspace didn't exist yet, so there is nothing to drop
end
client.execute("create keyspace demo2
                with strategy_class='org.apache.cassandra.locator.SimpleStrategy'
                and strategy_options:replication_factor=1")
client.execute("use demo2")

client.execute("create columnfamily users (id text primary key, full_name text, email text,
                  state text, birth_year 'LongType')
                  with default_validation = 'BytesType'")

client.execute("create columnfamily blog_entry (id text primary key) with comparator = 'TimeUUIDType'")

# data insertion
client.execute("insert into users(id, full_name, email, state, birth_year)
                  values('johnsmith', 'John Smith', '', 'CA', 1969)")
client.execute("insert into users(id, full_name, email, state, birth_year)
                  values('janesmith', 'Jane Smith', '', 'CA', 1970)")

# 20 posts per user, each stored under a fresh TimeUUID column name
client.execute("select id from users").fetch do |row|
  userid = row.to_hash['id']
  1.upto(20) do |x|
    stamp = SimpleUUID::UUID.new.to_guid
    client.execute("insert into blog_entry(id, #{stamp}) values(?, ?)", userid, "Blog post #{x}")
  end
end
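The `?` placeholders in the last insert appear to be filled in by cassandra-cql on the client before the statement is sent, and the gem's own escaping rules are the ones that actually apply. As a minimal sketch of the idea, assuming CQL's convention of writing a single quote inside a string literal as two single quotes, hypothetical helpers `bind_cql` and `quote_cql` might look like this:

```ruby
# Quote a Ruby value as a CQL literal: numbers as-is, everything else as a
# single-quoted string with embedded single quotes doubled.
def quote_cql(value)
  case value
  when Integer, Float then value.to_s
  else "'" + value.to_s.gsub("'", "''") + "'"
  end
end

# Replace each ? in +statement+ with the next quoted value, left to right.
# Naive: a ? inside a string literal in the template would be replaced too.
def bind_cql(statement, *values)
  remaining = values.dup
  result = statement.gsub('?') do
    raise ArgumentError, 'not enough bind values' if remaining.empty?
    quote_cql(remaining.shift)
  end
  raise ArgumentError, 'too many bind values' unless remaining.empty?
  result
end
```

For example, `bind_cql("insert into blog_entry(id, x) values(?, ?)", 'johnsmith', 'Blog post 1')` yields `insert into blog_entry(id, x) values('johnsmith', 'Blog post 1')`, which is why the script can pass user-supplied strings without quoting them by hand.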


The following posts were useful in piecing together the details for installing and using Cassandra and the cassandra gem:

Overview of Cassandra gem syntax
Charles Max Wood’s intro video

For reaching an initial understanding on the Cassandra data model:

WTF is a Supercolumn?
Up and running with Cassandra
DataStax Cassandra documentation

On the move from Thrift to CQL:

Mailing list thread
CQL Getting Started
