Getting up and running with Cassandra 1.0 and Ruby on OS X
I’ve been looking at the possibility of using Cassandra for an internal project, so I wanted to get a test system up and running. The latest Cassandra release at the time of writing is 1.0.6, but most of the resources I found referenced earlier versions. So here’s a quick guide to getting set up for Cassandra development with Ruby on OS X (Lion specifically, though the version shouldn’t matter much).
Download and install Cassandra
The latest tarball is linked from the project homepage. I decided to run my local install as my login user rather than root, so I installed into a new directory in my home directory and set up a convenient symlink:
$ mkdir ~/cassandra && cd ~/cassandra
$ tar xzf apache-cassandra-1.0.6-bin.tar.gz
$ ln -s apache-cassandra-1.0.6/bin/ bin
By default, Cassandra stores data in /var/lib/cassandra/. That won’t do when running as my own user, but a few changes to the config fix that problem:
# in ~/cassandra/apache-cassandra-1.0.6/conf/cassandra.yaml
#
# directories where Cassandra should store data on disk.
data_file_directories:
- /Users/MYUSER/cassandra/var/data
# commit log
commitlog_directory: /Users/MYUSER/cassandra/var/commitlog
# saved caches
saved_caches_directory: /Users/MYUSER/cassandra/var/saved_caches
The Log4J configuration needs a similar change:
# in ~/cassandra/apache-cassandra-1.0.6/conf/log4j-server.properties
# Edit the next line to point to your logs directory
log4j.appender.R.File=/Users/MYUSER/cassandra/var/log/system.log
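The config above points Cassandra at four locations under ~/cassandra/var. A minimal sketch for creating them up front, assuming that layout (data, commitlog, saved_caches and log all under one var directory); the helper name ensure_cassandra_dirs is hypothetical and not part of Cassandra:

```ruby
require 'fileutils'

# Hypothetical helper: creates the data, commitlog, saved_caches and log
# directories under <base>/var and returns their paths.
def ensure_cassandra_dirs(base)
  dirs = %w[data commitlog saved_caches log].map do |name|
    File.join(base, 'var', name)
  end
  FileUtils.mkdir_p(dirs) # no error if a directory already exists
  dirs
end

# Example usage (assumes the install lives in ~/cassandra):
#   ensure_cassandra_dirs(File.expand_path('~/cassandra'))
```

Because everything lives under my home directory, no sudo is needed for any of these paths.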
At this point I was able to fire up Cassandra without errors (note: the -f flag keeps the process in the foreground).
$ ~/cassandra/bin/cassandra -f
...
INFO 16:43:00,979 Node localhost/127.0.0.1 state jump to normal
INFO 16:43:00,981 Bootstrap/Replace/Move completed! Now serving reads.
INFO 16:43:00,982 Will not load MX4J, mx4j-tools.jar is not in the classpath
Install gems
To keep things isolated, I set up a clean gemset with rvm before installing any gems. At the time of writing, the latest release of the cassandra gem didn’t support the 1.0 release of Cassandra, so I installed HEAD from GitHub via Bundler.
$ mkdir cassandratest && cd cassandratest
$ echo rvm --create use 1.9.2@cassandratest > .rvmrc
$ source .rvmrc
Using ~/.rvm/gems/ruby-1.9.2-p290 with gemset cassandratest
$ gem install bundler
Fetching: bundler-1.0.21.gem (100%)
Successfully installed bundler-1.0.21
1 gem installed
## add to Gemfile:
# source 'http://rubygems.org'
# gem "cassandra", :git => "git://github.com/twitter/cassandra.git"
$ bundle install
...
Your bundle is complete! Use `bundle show [gemname]` to see where a bundled gem is installed.
Create a schema
I’m not yet sure what the best long-term strategy (and tooling) for schema management is, so for now I went with setting up a test schema via the CLI. I followed the blog schema described here, with a couple of changes for recent deprecations.
# in cassandratest/schema/schema.txt
create keyspace demo
with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
and strategy_options = [{replication_factor:1}];
use demo;
create column family users
with comparator = UTF8Type
and key_validation_class=UTF8Type
and column_metadata = [
{column_name: full_name, validation_class: UTF8Type},
{column_name: email, validation_class: UTF8Type},
{column_name: state, validation_class: UTF8Type},
{column_name: birth_year, validation_class: LongType}
];
create column family blog_entry
with comparator = TimeUUIDType
and key_validation_class=UTF8Type
and default_validation_class = UTF8Type;
Running this through the CLI got the schema set up:
$ ~/cassandra/bin/cassandra-cli -host localhost -port 9160 -f schema/schema.txt
Connected to: "Test Cluster" on localhost/9160
4d47e380-23e7-11e1-0000-242d50cf1ff6
Waiting for schema agreement...
... schemas agree across the cluster
Authenticated to keyspace: demo
...
...
... schemas agree across the cluster
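The blog_entry column family uses TimeUUIDType as its comparator, so its column names are version 1 (time-based) UUIDs and columns within a row sort by the timestamp embedded in each UUID. As a rough illustration of what that timestamp is, here is a sketch in plain Ruby that decodes it. The helper name time_from_uuid_v1 is hypothetical, and treating the schema id printed by the CLI above as a version 1 UUID is my inference from its format:

```ruby
# Number of 100-nanosecond intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000

# Hypothetical helper: extracts the embedded timestamp from a version 1
# UUID given in the usual 8-4-4-4-12 hex form.
def time_from_uuid_v1(guid)
  time_low, time_mid, time_hi = guid.split('-').first(3).map { |part| part.to_i(16) }
  # The top four bits of the third field hold the version, not the time.
  ticks = ((time_hi & 0x0FFF) << 48) | (time_mid << 32) | time_low
  Time.at((ticks - UUID_EPOCH_OFFSET) / 10_000_000.0).utc
end

# The schema id from the CLI output above:
puts time_from_uuid_v1('4d47e380-23e7-11e1-0000-242d50cf1ff6')
```

This is only for poking at values by hand; the gems used below generate and compare these UUIDs themselves.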
Insert some data using the Cassandra Ruby library
And so we’re ready to write some data. Here’s a quick script to connect from Ruby and set up some sample data along the lines referenced in the CLI wiki page above:
# in cassandratest/data1.rb
require 'rubygems'
require 'bundler/setup' # the cassandra gem was installed from git via Bundler
require 'cassandra/1.0'
include SimpleUUID

client = Cassandra.new('demo', '127.0.0.1:9160')

# birth_year is a LongType column, so the value is packed with Cassandra::Long
client.insert("users", 'johnsmith', {
  'full_name' => 'John Smith',
  'email' => 'jsmith@test.com',
  'state' => 'CA',
  'birth_year' => Cassandra::Long.new(1968).to_s
})
client.insert("users", 'janesmith', {
  'full_name' => 'Jane Smith',
  'email' => 'janey@test.com',
  'state' => 'CA',
  'birth_year' => Cassandra::Long.new(1970).to_s
})

# add 20 blog posts per user; each column name is a fresh time-based UUID
client.each_key("users") do |user|
  1.upto(20) do |x|
    client.insert("blog_entry", user, {
      UUID.new => "Blog post #{x}"
    })
  end
end
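The birth_year column is declared as LongType in the schema, which is why the script wraps the value in Cassandra::Long instead of passing a plain string. A LongType value is an 8-byte big-endian integer. As a rough sketch of the bytes involved, using plain Ruby rather than the gem's own code (encode_long and decode_long are hypothetical names):

```ruby
# Packs an integer as an 8-byte big-endian signed value, the layout
# that Cassandra's LongType expects.
def encode_long(value)
  [value].pack('q>')
end

# Reverses encode_long.
def decode_long(bytes)
  bytes.unpack('q>').first
end

packed = encode_long(1968)
puts packed.bytesize              # 8
puts packed.unpack('C*').inspect  # [0, 0, 0, 0, 0, 0, 7, 176]
puts decode_long(packed)          # 1968
```

Passing the string '1968' instead would be three bytes short of what the validator expects, so I would expect the insert to be rejected.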
Similar thing with CQL
The cassandra gem, and the data insertion code above, interface with Cassandra via its Thrift API. Recently, though, there has been a move towards a new, more SQL-like API called CQL (syntax), which started shipping with version 0.8.
To try out this alternative API I installed the cassandra-cql gem:
## add to Gemfile:
# gem "cassandra-cql"
$ bundle update
...
Your bundle is updated!
Note: in the course of trying out CQL I fired up cqlsh (the CQL shell), which ships with Cassandra. That program requires the Python cql module to run (more details here). I installed it using the following command (for Python 2.7):
$ sudo easy_install-2.7 cql
Then I put together the following script, which closely follows the example above. Note that it creates a new keyspace before inserting data; I didn’t see a way to do that using the cassandra gem.
require 'cassandra-cql'

client = CassandraCQL::Database.new('127.0.0.1:9160')

# schema creation
begin
  client.execute("drop keyspace demo2")
rescue CassandraCQL::Error::InvalidRequestException
  # keyspace did not exist yet, so there is nothing to drop
end
client.execute("create keyspace demo2
  with strategy_class='org.apache.cassandra.locator.SimpleStrategy'
  and strategy_options:replication_factor=1")
client.execute("use demo2")
client.execute("create columnfamily users (id text primary key, full_name text, email text,
  state text, birth_year 'LongType')
  with default_validation = 'BytesType'")
client.execute("create columnfamily blog_entry (id text primary key) with comparator = 'TimeUUIDType'")

# data insertion
client.execute("insert into users(id, full_name, email, state, birth_year)
  values('johnsmith', 'John Smith', 'jsmith@test.com', 'CA', 1968)")
client.execute("insert into users(id, full_name, email, state, birth_year)
  values('janesmith', 'Jane Smith', 'janey@test.com', 'CA', 1970)")

# add 20 blog posts per user; each column name is a fresh time-based UUID
client.execute("select id from users").fetch do |row|
  userid = row.to_hash['id']
  1.upto(20) do |x|
    stamp = CassandraCQL::UUID.new.to_guid
    client.execute("insert into blog_entry(id, #{stamp}) values(?, ?)", userid, "Blog post #{x}")
  end
end
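The ? placeholders in the blog_entry insert are filled in by the gem, which takes care of quoting the bound values. To illustrate why that matters when building CQL by hand: string literals are single-quoted, and an embedded single quote is escaped by doubling it. The helper below is a hypothetical sketch of that rule, not the gem's implementation:

```ruby
# Hypothetical helper: renders a Ruby value as a CQL string literal,
# doubling any embedded single quotes.
def cql_quote(value)
  "'" + value.to_s.gsub("'", "''") + "'"
end

puts cql_quote("Blog post 1")        # 'Blog post 1'
puts cql_quote("Jane's first post")  # 'Jane''s first post'
```

For anything beyond a throwaway test script I would stick with the placeholders rather than interpolating values into the statement.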
References
The following posts were useful in piecing together the details for installing and using Cassandra and the cassandra gem:
Overview of Cassandra gem syntax
Charles Max Wood’s intro video
For reaching an initial understanding of the Cassandra data model:
WTF is a Supercolumn?
Up and running with Cassandra
DataStax Cassandra documentation
On the move from Thrift to CQL:
Benchmark
Benchmark
Mailing list thread
CQL Getting Started