SHR's Occasional Braindump

Crafting Code and Business in a Beautiful Place

Getting up and running with Cassandra 1.0 and Ruby on OS X

I’ve been looking at the possibility of using Cassandra for an internal project, so I wanted to get a test system up and running. The latest Cassandra release at the time of writing is 1.0.6, but most of the resources I found referenced earlier versions. So here’s a quick guide to getting set up for Cassandra development with Ruby on OS X (Lion specifically, though it shouldn’t matter too much).

Download and install Cassandra

The latest tarball is linked from the project homepage. I decided to run my local install as my login user rather than root, so I installed into a new directory in my home directory and set up a convenient symlink:

$ mkdir ~/cassandra && cd ~/cassandra
$ tar xzf apache-cassandra-1.0.6-bin.tar.gz
$ ln -s apache-cassandra-1.0.6/bin/ bin

By default, Cassandra stores data in /var/lib/cassandra/. That won’t do when running as my own user, but a few changes to the config fix that problem:

# in ~/cassandra/apache-cassandra-1.0.6/conf/cassandra.yaml
#
# directories where Cassandra should store data on disk.
data_file_directories:
    - /Users/MYUSER/cassandra/var/data
# commit log
commitlog_directory: /Users/MYUSER/cassandra/var/commitlog
# saved caches
saved_caches_directory: /Users/MYUSER/cassandra/var/saved_caches
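
As far as I can tell, Cassandra creates these directories itself on startup as long as it can write to the parent directory, but a quick check up front can save a confusing stack trace. Here is a rough sketch in Ruby, using only the standard library, that reads the three settings shown above and makes sure each directory exists and is writable. The helper names `storage_dirs` and `ensure_writable` are made up for illustration, and the sketch assumes the config parses as plain YAML.

```ruby
require 'yaml'
require 'fileutils'

# Collect the storage directories named in a cassandra.yaml document.
# Only the three keys edited above are considered.
def storage_dirs(yaml_text)
  config = YAML.load(yaml_text)
  dirs = Array(config['data_file_directories']).dup
  dirs << config['commitlog_directory']
  dirs << config['saved_caches_directory']
  dirs.compact
end

# Create each directory if needed and fail loudly if it is not writable.
def ensure_writable(dirs)
  dirs.each do |dir|
    FileUtils.mkdir_p(dir)
    raise "#{dir} is not writable" unless File.writable?(dir)
  end
end

# Example (the path is an assumption based on the layout used in this post):
#   conf = File.expand_path('~/cassandra/apache-cassandra-1.0.6/conf/cassandra.yaml')
#   ensure_writable(storage_dirs(File.read(conf)))
```

The log directory from log4j-server.properties is not covered here, since that file is not YAML; it would need the same treatment by hand.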

The Log4J configuration needs a similar change:

# in ~/cassandra/apache-cassandra-1.0.6/conf/log4j-server.properties
# Edit the next line to point to your logs directory
log4j.appender.R.File=/Users/MYUSER/cassandra/var/log/system.log

At this point I was able to fire up cassandra without errors (note: the -f flag keeps the process in the foreground).

$ ~/cassandra/bin/cassandra -f
...
INFO 16:43:00,979 Node localhost/127.0.0.1 state jump to normal
INFO 16:43:00,981 Bootstrap/Replace/Move completed! Now serving reads.
INFO 16:43:00,982 Will not load MX4J, mx4j-tools.jar is not in the classpath
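
Before moving on to the Ruby side, it can be handy to confirm that the Thrift port is actually accepting connections. The following is a minimal sketch, assuming the default port 9160 used later in this post and a reasonably recent Ruby (the `connect_timeout` option is not available on 1.9.2). The method name `port_open?` is my own, not part of any Cassandra tooling.

```ruby
require 'socket'

# Return true if something accepts TCP connections on host:port,
# false if the connection is refused, unreachable, or times out.
def port_open?(host, port, timeout = 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

if port_open?('127.0.0.1', 9160)
  puts 'Cassandra Thrift port is reachable'
else
  puts 'Nothing is listening on 127.0.0.1:9160'
end
```

An open port only shows that a process is listening; it does not prove that the node has finished starting up, so the log lines above remain the better signal.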

Install gems

To keep things isolated, I set up a clean gemset with RVM before installing any gems. At the time of writing, the latest release of the cassandra gem didn’t support the 1.0 release of Cassandra, so I installed HEAD from GitHub via Bundler.

$ mkdir cassandratest && cd cassandratest
$ echo rvm --create use 1.9.2@cassandratest > .rvmrc
$ source .rvmrc
Using ~/.rvm/gems/ruby-1.9.2-p290 with gemset cassandratest
$ gem install bundler
Fetching: bundler-1.0.21.gem (100%)
Successfully installed bundler-1.0.21
1 gem installed

## add to Gemfile:
# source 'http://rubygems.org'
# gem "cassandra", :git => "git://github.com/twitter/cassandra.git"

$ bundle install
...
Your bundle is complete! Use `bundle show [gemname]` to see where a bundled gem is installed.

Create a schema

I’m not yet sure what the best long-term strategy (and tooling) for schema management is, so for now I went with setting up a test schema via the CLI. I followed the blog schema described here, with a couple of changes for recent deprecations.

# in cassandratest/schema/schema.txt
create keyspace demo
	with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
	and strategy_options = [{replication_factor:1}];
use demo;

create column family users
	with comparator = UTF8Type
	and key_validation_class=UTF8Type
	and column_metadata = [
		{column_name: full_name, validation_class: UTF8Type},
		{column_name: email, validation_class: UTF8Type},
		{column_name: state, validation_class: UTF8Type},
		{column_name: birth_year, validation_class: LongType}
	];

create column family blog_entry
	with comparator = TimeUUIDType
	and key_validation_class=UTF8Type
	and default_validation_class = UTF8Type;

Running this through the CLI got the schema set up:

$ ~/cassandra/bin/cassandra-cli -host localhost -port 9160 -f schema/schema.txt
Connected to: "Test Cluster" on localhost/9160
4d47e380-23e7-11e1-0000-242d50cf1ff6
Waiting for schema agreement...
... schemas agree across the cluster
Authenticated to keyspace: demo
...
...
... schemas agree across the cluster

Insert some data using the Cassandra Ruby library

And so we’re ready to write some data. Here’s a quick script to connect from Ruby and set up some sample data along the lines referenced in the CLI wiki page above:

# in cassandratest/data1.rb
require 'rubygems'
require 'cassandra/1.0'
include SimpleUUID

client = Cassandra.new('demo', '127.0.0.1:9160')

client.insert("users", 'johnsmith', {
  'full_name' => 'John Smith',
  'email' => 'jsmith@test.com',
  'state' => 'CA',
  'birth_year' => Cassandra::Long.new(1968).to_s
})

client.insert("users", 'janesmith', {
  'full_name' => 'Jane Smith',
  'email' => 'janey@test.com',
  'state' => 'CA',
  'birth_year' => Cassandra::Long.new(1970).to_s
})

client.each_key("users") do |user|
  1.upto(20) do |x|
    client.insert("blog_entry", user, {
      UUID.new => "Blog post #{x}"
    })
  end
end
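
A note on the `birth_year` values in the script above: the column is declared as LongType, which stores an 8-byte big-endian integer, so the value has to be sent as those raw bytes rather than as the string "1968". My understanding is that `Cassandra::Long.new(1968).to_s` produces exactly that byte string. The sketch below shows the encoding with plain Ruby so it can be tried without a running cluster; `encode_long` and `decode_long` are illustrative names, not part of the cassandra gem.

```ruby
# Encode an integer as the 8-byte big-endian value that LongType stores.
def encode_long(value)
  [(value >> 32) & 0xFFFFFFFF, value & 0xFFFFFFFF].pack('NN')
end

# Decode 8 big-endian bytes back into a signed integer.
def decode_long(bytes)
  high, low = bytes.unpack('NN')
  unsigned = (high << 32) | low
  unsigned >= 2**63 ? unsigned - 2**64 : unsigned
end

encoded = encode_long(1968)
puts encoded.unpack('C*').inspect # the eight raw bytes
puts decode_long(encoded)         # the original integer
```

Sending the plain string "1968" instead would most likely be rejected by the validator, since it is four bytes long rather than eight.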

Similar thing with CQL

The cassandra gem and the data insertion code above both talk to Cassandra via its Thrift API. Recently, though, there has been a move towards a new, more SQL-like API called CQL (syntax), which started shipping with version 0.8.

To try out this alternative API I installed the cassandra-cql gem:

## add to Gemfile:
# gem "cassandra-cql"
$ bundle update
...
Your bundle is updated!

Note: in the course of trying out CQL I fired up the cqlsh (CQL shell) which ships with Cassandra. That program required a python cql module to run (more details here). I installed that using the following command (for python 2.7):

$ sudo easy_install-2.7 cql

Then I put together the following script, which closely follows the example above. Note that this creates a new keyspace before inserting data; I didn’t see a way to do that using the cassandra gem.

require 'cassandra-cql'

client = CassandraCQL::Database.new('127.0.0.1:9160')

#schema creation
begin
  client.execute("drop keyspace demo2")
rescue CassandraCQL::Error::InvalidRequestException
  #noop
end
client.execute("create keyspace demo2
                with strategy_class='org.apache.cassandra.locator.SimpleStrategy'
                and strategy_options:replication_factor=1")
client.execute("use demo2")

client.execute("create columnfamily users (id text primary key, full_name text, email text,
                  state text, birth_year 'LongType')
                  with default_validation = 'BytesType'")

client.execute("create columnfamily blog_entry (id text primary key)  with comparator = 'TimeUUIDType'")

#data insertion
client.execute("insert into users(id, full_name, email, state, birth_year)
                  values('johnsmith', 'John Smith', 'jsmith@test.com', 'CA', 1969)")
client.execute("insert into users(id, full_name, email, state, birth_year)
                  values('janesmith', 'Jane Smith', 'janey@test.com', 'CA', 1970)")

client.execute("select id from users").fetch do |row|
  userid = row.to_hash['id']
  1.upto(20) do |x|
    stamp =  CassandraCQL::UUID.new.to_guid
    client.execute("insert into blog_entry(id, #{stamp}) values(?, ?)", userid, "Blog post #{x}")
  end
end
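
Both scripts name each blog_entry column with a time-based (version 1) UUID, which is what the TimeUUIDType comparator expects; Cassandra then orders the columns in a row by the timestamp embedded in the UUID. In the scripts the gems generate these values (`UUID.new` and `CassandraCQL::UUID.new`). The following is only a standard-library sketch of what such a value contains, to make the idea concrete. The names `time_uuid` and `uuid_time` are hypothetical, and the random node and clock sequence are a simplification of what a real generator does.

```ruby
require 'securerandom'

# Number of 100-nanosecond ticks between 1582-10-15 (the UUID epoch)
# and 1970-01-01 (the Unix epoch).
GREGORIAN_OFFSET = 0x01B21DD213814000

# Build a version 1 UUID string for the given time.
def time_uuid(time = Time.now)
  ticks = time.to_i * 10_000_000 + time.nsec / 100 + GREGORIAN_OFFSET
  time_low  = ticks & 0xFFFFFFFF
  time_mid  = (ticks >> 32) & 0xFFFF
  time_high = ((ticks >> 48) & 0x0FFF) | 0x1000            # version 1
  clock_seq = SecureRandom.random_number(1 << 14) | 0x8000 # RFC 4122 variant
  node      = SecureRandom.random_number(1 << 48) | (1 << 40) # random node, multicast bit set
  format('%08x-%04x-%04x-%04x-%012x', time_low, time_mid, time_high, clock_seq, node)
end

# Recover the timestamp embedded in a version 1 UUID string.
def uuid_time(uuid)
  low, mid, high = uuid.split('-').first(3).map { |part| part.to_i(16) }
  ticks = ((high & 0x0FFF) << 48) | (mid << 32) | low
  unix_ticks = ticks - GREGORIAN_OFFSET
  Time.at(unix_ticks / 10_000_000, (unix_ticks % 10_000_000) / 10.0)
end

stamp = time_uuid
puts stamp
puts uuid_time(stamp)
```

Note that the low bits of the timestamp come first in the string form, so sorting these strings alphabetically does not give chronological order; the time-based ordering comes from the comparator, not from the text.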

References

The following posts were useful in piecing together the details for installing and using cassandra and the cassandra gem:

Overview of Cassandra gem syntax
Charles Max Wood’s intro video

For reaching an initial understanding on the Cassandra data model:

WTF is a Supercolumn?
Up and running with Cassandra
DataStax Cassandra documentation

On the move from Thrift to CQL:

Benchmark
Benchmark
Mailing list thread
CQL Getting Started
