up and running with Cassandra

Posted on March 21, 2010. Filed under: Java, Linux, MySQL, PHP, Services

Cassandra is a hybrid non-relational database in the same class as Google’s BigTable. It is more featureful than a key/value store like Dynomite, but supports fewer query types than a document store like MongoDB.

Cassandra was started by Facebook and later transferred to the open-source community. It is an ideal runtime database for web-scale domains like social networks.

This post is both a tutorial and a “getting started” overview. You will learn about Cassandra’s features, data model, API, and operational requirements—everything you need to know to deploy a Cassandra-backed service.

Jan 8, 2010: post updated for Cassandra gem 0.7 and Cassandra version 0.5.


There are a number of reasons to choose Cassandra for your website. Compared to other databases, three big features stand out:

  • Flexible schema: with Cassandra, like a document store, you don’t have to decide what fields you need in your records ahead of time. You can add and remove arbitrary fields on the fly. This is an incredible productivity boost, especially in large deployments.
  • True scalability: Cassandra scales horizontally in the purest sense. To add more capacity to a cluster, turn on another machine. You don’t have to restart any processes, change your application queries, or manually relocate any data.
  • Multi-datacenter awareness: you can adjust your node layout to ensure that if one datacenter burns in a fire, an alternative datacenter will have at least one full copy of every record.

Some other features that help put Cassandra above the competition:

  • Range queries: unlike most key/value stores, you can query for ordered ranges of keys.
  • List datastructures: super columns add a 5th dimension to the hybrid model, turning columns into lists. This is very handy for things like per-user indexes.
  • Distributed writes: you can read and write any data to anywhere in the cluster at any time. There is never any single point of failure.


You need a Unix system. If you are using Mac OS 10.5, all you need is Git. Otherwise, you need to install Java 1.6, Git 1.6, Ruby, and Rubygems in some reasonable way.

Start a terminal and run:

sudo gem install cassandra

If you are using Mac OS, you need to export the following environment variables:

export JAVA_HOME="/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home"
export PATH="/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/bin:$PATH"

Now you can build and start a test server with cassandra_helper:

cassandra_helper cassandra

It runs!

live demo

The above script boots the server with a schema that we can interact with. Open another terminal window and start irb, the Ruby shell:

irb

In the irb prompt, require the library:

require 'rubygems'
require 'cassandra'
include Cassandra::Constants

Now instantiate a client object:

twitter = Cassandra.new('Twitter')

Let’s insert a few things:

user = {'screen_name' => 'buttonscat'}
twitter.insert(:Users, '5', user)  

tweet1 = {'text' => 'Nom nom nom nom nom.', 'user_id' => '5'}
twitter.insert(:Statuses, '1', tweet1)

tweet2 = {'text' => '@evan Zzzz....', 'user_id' => '5', 'reply_to_id' => '8'}
twitter.insert(:Statuses, '2', tweet2)

Notice that the two status records do not have all the same columns. Let’s go ahead and connect them to our user record:

twitter.insert(:UserRelationships, '5', {'user_timeline' => {UUID.new => '1'}})
twitter.insert(:UserRelationships, '5', {'user_timeline' => {UUID.new => '2'}})

The UUID.new call creates a collation key based on the current time; our tweet ids are stored in the values.

Now we can query our user’s tweets:

timeline = twitter.get(:UserRelationships, '5', 'user_timeline', :reversed => true)
timeline.map { |time, id| twitter.get(:Statuses, id, 'text') }
# => ["@evan Zzzz....", "Nom nom nom nom nom."]

Two tweet bodies, returned in recency order—not bad at all. In a similar fashion, each time a user tweets, we could loop through their followers and insert the status key into their follower’s home_timeline relationship, for handling general status delivery.
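That delivery loop can be sketched with plain Ruby hashes standing in for the Cassandra calls (the follower index and the deliver helper are hypothetical, for illustration only):

```ruby
# Simulate the UserRelationships super column family with nested hashes:
# key => { 'home_timeline' => { sort_key => status_id } }.
relationships = Hash.new { |h, k| h[k] = { 'home_timeline' => {} } }

# Hypothetical follower index: author id => follower ids.
followers = { '5' => ['7', '9'] }

# On each new tweet, fan the status id out to every follower's timeline,
# keyed by a time-sortable value (Cassandra would use a TimeUUID).
def deliver(relationships, followers, author_id, status_id, at)
  followers.fetch(author_id, []).each do |follower_id|
    relationships[follower_id]['home_timeline'][at] = status_id
  end
end

deliver(relationships, followers, '5', '1', 1)
deliver(relationships, followers, '5', '2', 2)

# Each follower now sees the author's statuses in time order:
relationships['7']['home_timeline'].sort.map { |_, id| id }
# => ["1", "2"]
```

In production the sort keys would be TimeUUIDs, exactly as in the user_timeline example above.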

the data model

Cassandra is best thought of as a 4 or 5 dimensional hash. The usual way to refer to a piece of data is as follows: a keyspace, a column family, a key, an optional super column, and a column. At the end of that chain lies a single, lonely value.

Let’s break down what these layers mean.

  • Keyspace (also confusingly called “table”): the outer-most level of organization. This is usually the name of the application. For example, 'Twitter' and 'Wordpress' are both good keyspaces. Keyspaces must be defined at startup in the storage-conf.xml file.
  • Column family: a slice of data corresponding to a particular key. Each column family is stored in a separate file on disk, so it can be useful to put frequently accessed data in one column family, and rarely accessed data in another. Some good column family names might be :Posts, :Users and :UserAudits. Column families must be defined at startup.
  • Key: the permanent name of the record. You can query over ranges of keys in a column family, like :start => '10050', :finish => '10070'—this is the only index Cassandra provides for free. Keys are defined on the fly.

After the column family level, the organization can diverge—this is a feature unique to Cassandra. You can choose either:

  • A column: this is a tuple with a name and a value. Good columns might be 'screen_name' => 'lisa4718' or 'Google' => ''. It is common to not specify a particular column name when requesting a key; the response will then be an ordered hash of all columns. For example, querying for (:Users, '174927') might return:
    {'name' => 'Lisa Jones',
     'gender' => 'f',
     'screen_name' => 'lisa4718'}

    In this case, name, gender, and screen_name are all column names. Columns are defined on the fly, and different records can have different sets of column names, even in the same keyspace and column family. This lets you use the column name itself as either structure or data. Columns can be stored in recency order, or alphabetical by name, and all columns keep a timestamp.

  • A super column: this is a named list. It contains standard columns, stored in recency order. Say Lisa Jones has bookmarks in several categories. Querying (:UserBookmarks, '174927') might return:
    {'work' => {
        'Google' => '',
        'IBM' => ''},
     'todo' => {...},
     'cooking' => {...}}

    Here, work, todo, and cooking are all super column names. They are defined on the fly, and there can be any number of them per row. :UserBookmarks is the name of the super column family. Super columns are stored in alphabetical order, with their sub columns physically adjacent on the disk.

Super columns and standard columns cannot be mixed at the same (4th) level of dimensionality. You must define at startup which column families contain standard columns, and which contain super columns with standard columns inside them.
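The two layouts can be pictured as nested Ruby hashes. This is only a sketch of the addressing scheme, not of how Cassandra stores data on disk:

```ruby
# Standard model: keyspace -> column family -> key -> column -> value.
standard = {
  'Twitter' => {
    'Users' => {
      '5' => { 'screen_name' => 'buttonscat' }
    }
  }
}

# Super column model adds one level:
# keyspace -> column family -> key -> super column -> column -> value.
super_model = {
  'Twitter' => {
    'UserRelationships' => {
      '5' => { 'user_timeline' => { 'uuid-1' => '1' } }
    }
  }
}

standard['Twitter']['Users']['5']['screen_name']
# => "buttonscat"
super_model['Twitter']['UserRelationships']['5']['user_timeline']['uuid-1']
# => "1"
```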

Super columns are a great way to store one-to-many indexes to other records: make the sub column names TimeUUIDs (or whatever you’d like to use to sort the index), and have the values be the foreign key. We saw an example of this strategy in the demo, above.

If this is confusing, don’t worry. We’ll now look at two example schemas in depth.

twitter schema

Here is the schema definition we used for the demo, above. It is based on Eric Florenzano’s Twissandra:

<Keyspace Name="Twitter">
  <ColumnFamily CompareWith="UTF8Type" Name="Statuses" />
  <ColumnFamily CompareWith="UTF8Type" Name="StatusAudits" />
  <ColumnFamily CompareWith="UTF8Type" Name="StatusRelationships"
    CompareSubcolumnsWith="TimeUUIDType" ColumnType="Super" />
  <ColumnFamily CompareWith="UTF8Type" Name="Users" />
  <ColumnFamily CompareWith="UTF8Type" Name="UserRelationships"
    CompareSubcolumnsWith="TimeUUIDType" ColumnType="Super" />
</Keyspace>

What could be in StatusRelationships? Maybe a list of users who favorited the tweet? Having a super column family for both record types lets us index each direction of whatever many-to-many relationships we come up with.

Here’s how the data is organized:


Cassandra lets you distribute the keys across the cluster either randomly, or in order, via the Partitioner option in the storage-conf.xml file.

For the Twitter application, if we were using the order-preserving partitioner, all recent statuses would be stored on the same node. This would cause hotspots. Instead, we should use the random partitioner.

Alternatively, we could preface the status keys with the user key, which has less temporal locality. If we used user_id:status_id as the status key, we could do range queries on the user fragment to get tweets-by-user, avoiding the need for a user_timeline super column.
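Here is a sketch of that key scheme, with a sorted Ruby hash standing in for an order-preserving partitioner (the data and the tweets_by_user helper are illustrative):

```ruby
# Statuses keyed by "user_id:status_id", in sorted key order, as an
# order-preserving partitioner would store them.
statuses = {
  '5:1' => 'Nom nom nom nom nom.',
  '5:2' => '@evan Zzzz....',
  '8:1' => 'Some other user.'
}

# Range query on the user fragment: everything from "5:" up to, but not
# including, "5;" covers user 5's statuses, because ";" sorts just after ":".
def tweets_by_user(statuses, user_id)
  start, finish = "#{user_id}:", "#{user_id};"
  statuses.keys.sort.select { |k| k >= start && k < finish }.map { |k| statuses[k] }
end

tweets_by_user(statuses, '5')
# => ["Nom nom nom nom nom.", "@evan Zzzz...."]
```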

multi-blog schema

Here’s another schema, suggested to me by Jonathan Ellis, the primary Cassandra maintainer. It’s for a multi-tenancy blog platform:

<Keyspace Name="Multiblog">
  <ColumnFamily CompareWith="TimeUUIDType" Name="Blogs" />
  <ColumnFamily CompareWith="TimeUUIDType" Name="Comments"/>
</Keyspace>

Imagine we have a blog named ‘The Cutest Kittens’. We will insert a row when the first post is made as follows:

require 'rubygems'
require 'cassandra'
include Cassandra::Constants

multiblog ='Multiblog')

multiblog.insert(:Blogs, 'The Cutest Kittens',
  { UUID.new =>
    '{"title":"Say Hello to Buttons Cat","body":"Buttons is a cute cat."}' })

UUID.new generates a unique, sortable column name, and the JSON hash contains the post details. Let’s insert another:

multiblog.insert(:Blogs, 'The Cutest Kittens',
  { UUID.new =>
    '{"title":"Introducing Commie Cat","body":"Commie is also a cute cat"}' })

Now we can find the latest post with the following query:

post = multiblog.get(:Blogs, 'The Cutest Kittens', :reversed => true).to_a.first

On our website, we can build links based on the readable representation of the UUID:

guid = post.first.to_guid
# => "b06e80b0-8c61-11de-8287-c1fa647fd821"

If the user clicks this string in a permalink, our app can find the post directly via:

multiblog.get(:Blogs, 'The Cutest Kittens', :start => UUID.new(guid), :count => 1)

For comments, we’ll use the post UUID as the outermost key:

multiblog.insert(:Comments, guid,
  { UUID.new => 'I like this cat. - Evan'})
multiblog.insert(:Comments, guid,
  { UUID.new => 'I am cuter. - Buttons'})

Now we can get all comments (oldest first) for a post by calling:

multiblog.get(:Comments, guid)

We could paginate them by passing :start with a UUID. See this presentation to learn more about token-based pagination.
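The pagination idea can be sketched in plain Ruby, with integers standing in for TimeUUID column names (page is a hypothetical helper, not a method of the gem):

```ruby
# Comments for one post, keyed by time-sortable column names
# (integers here; Cassandra would use TimeUUIDs).
comments = { 1 => 'first', 2 => 'second', 3 => 'third', 4 => 'fourth' }

# Fetch up to count columns whose names are at or after start, like
# passing :start and :count to multiblog.get(:Comments, guid, ...).
def page(columns, start, count)
  columns.sort.select { |name, _| name >= start }.first(count)
end

first_page = page(comments, 0, 2)      # => [[1, "first"], [2, "second"]]
next_start = first_page.last.first + 1 # resume just past the last column seen
page(comments, next_start, 2)          # => [[3, "third"], [4, "fourth"]]
```

With real TimeUUIDs you would pass the last UUID you saw as :start and drop the first, repeated result, since :start is inclusive.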

We have sidestepped two problems with this data model: we don’t have to maintain separate indexes for any lookups, and the posts and comments are stored in separate files, where they don’t cause as much write contention. Note that we didn’t need to use any super columns, either.

storage layout and api comparison

The storage strategy for Cassandra’s standard model is the same as BigTable’s. Here’s a comparison chart:

                               multi-file  per-file  intra-file
Relational                     server      database  table*         primary key  column value
BigTable                       cluster     table     column family  key          column name        column value
Cassandra, standard model      cluster     keyspace  column family  key          column name        column value
Cassandra, super column model  cluster     keyspace  column family  key          super column name  column name  column value

* With fixed column names.

Column families are stored in column-major order, which is why people call BigTable a column-oriented database. This is not the same as a column-oriented OLAP database like Sybase IQ—it depends on what you use the column names for.


In row-orientation, the column names are the structure, and you think of the column families as containing keys. This is the convention in relational databases.


In column-orientation, the column names are the data, and the column families are the structure. You think of the key as containing the column family, which is the convention in BigTable. (In Cassandra, super columns are also stored in column-major order—all the sub columns are together.)
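A small illustration of the two orientations, again with plain hashes and made-up data:

```ruby
# Row-orientation: the column names are the structure (a fixed schema),
# and the keys are the data.
users = {
  '174927' => { 'name' => 'Lisa Jones', 'gender' => 'f' }
}

# Column-orientation: the column names themselves are data; here each
# name is the id of a user Lisa follows, and the values are unused.
following = {
  '174927' => { '5' => '', '8' => '', '9' => '' }
}

users['174927'].keys      # => ["name", "gender"]  (schema)
following['174927'].keys  # => ["5", "8", "9"]     (data)
```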

In Cassandra’s Ruby API, parameters are expressed in storage order, for clarity:

Relational                     SELECT `column` FROM `database`.`table` WHERE `id` = key;
BigTable                       table.get(key, "column_family:column")
Cassandra, standard model      keyspace.get("column_family", key, "column")
Cassandra, super column model  keyspace.get("column_family", key, "super_column", "column")

Note that Cassandra’s internal Thrift interface mimics BigTable in some ways, but this is being changed.

going to production

Cassandra is an alpha product and could, theoretically, lose your data. In particular, if you change the schema specified in the storage-conf.xml file, you must follow these instructions carefully, or corruption will occur (this is going to be fixed). Also, the on-disk storage format is expected to change in version 0.4.0. After that things will be a bit more stable.

The biggest deployment is at Facebook, where hundreds of terabytes of token indexes are kept in about a hundred Cassandra nodes. However, their use case allows the data to be rebuilt if something goes wrong. Currently there are no known deployments of non-transient data. Proceed carefully, keep a backup in an unrelated storage engine…and submit patches if things go wrong.

That aside, here is a guide for deploying a production cluster:

  • Hardware: get a handful of commodity Linux servers. 16GB memory is good; Cassandra likes a big filesystem buffer. You don’t need RAID. If you put the commitlog file and the data files on separate physical disks, things will go faster. Don’t use EC2 or friends except for testing; the virtualized I/O is too slow.
  • Configuration: in the storage-conf.xml schema file, set the replication factor to 3. List the IP address of one of the nodes as the seed. Set the listen address to the empty string, so the hosts will resolve their own IPs. Now, adjust the contents of the include script for your various paths and JVM options—for a 16GB node, set the JVM heap to 4GB.
  • Deployment: build a package of Cassandra itself and your configuration files, and deliver it to all your servers (I use Capistrano for this). Start the servers by setting CASSANDRA_INCLUDE in the environment to point to your include script, and run bin/cassandra. At this point, you should see join notices in the Cassandra logs:
    Cassandra starting up...
    Node has now joined.
    Node has now joined.

    Congratulations! You have a cluster. Don’t forget to turn off debug logging in the logging configuration file.

  • Visibility: you can get a little more information about your cluster via the included tool bin/nodeprobe:
    $ bin/nodeprobe --host ring
    Token(124007023942663924846758258675932114665)  3  |<--|
    Token(106858063638814585506848525974047690568)  3  |   ^
    Token(141130545721235451315477340120224986045)  3  |-->|

    Cassandra also exposes various statistics over JMX.

Note that your client machines (not servers!) must have accurate clocks for Cassandra to resolve write conflicts properly. Use NTP.
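The clocks matter because Cassandra resolves concurrent writes to the same column by comparing client-supplied timestamps: last write wins. A minimal sketch of that rule (not Cassandra's actual code):

```ruby
# Each column version carries a client-supplied timestamp.
Column = Struct.new(:value, :timestamp)

# Last-write-wins: the version with the higher timestamp survives.
def resolve(a, b)
  a.timestamp >= b.timestamp ? a : b
end

ours   = Column.new('buttonscat', 1_000)
theirs = Column.new('buttonsdog', 2_000)

resolve(ours, theirs).value
# => "buttonsdog"
# A client with a slow clock loses such races even for a "later" write,
# which is why client machines need NTP.
```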


There is a misperception that if someone advocates a non-relational database, they either don’t understand SQL optimization, or they are generally a hater. This is not the case.

It is reasonable to seek a new tool for a new problem, and database problems have changed with the rise of web-scale distributed systems. This does not mean that SQL as a general-purpose runtime and reporting tool is going away. However, at web-scale, it is more flexible to separate the concerns. Runtime object lookups can be handled by a low-latency, strict, self-managed system like Cassandra. Asynchronous analytics and reporting can be handled by a high-latency, flexible, un-managed system like Hadoop. And in neither case does SQL lend itself to sharding.

I think that Cassandra is the most promising current implementation of a runtime distributed database, but much work remains to be done. We’re beginning to use Cassandra at Twitter, and here’s what I would like to happen real-soon-now:

  • Interface cleanup: the Thrift API for Cassandra is incomplete and inconsistent, which makes writing clients very irritating.
  • Online migrations: restarting the cluster 3 times to add a column family is silly.
  • ActiveModel or DataMapper adapter: for interaction with business objects in Ruby.
    Done! Michael Koziarski on the Rails core team wrote an ActiveModel adapter.
  • Scala client: for interoperability with JVM middleware.

Go ahead and jump on any of those projects—it’s a chance to get in on the ground floor.

Cassandra has excellent performance. There are some benchmark results for version 0.5 at the end of the Yahoo performance study.


Useful Linux commands: Screen, ttyload, mytop, mtop

Posted on October 23, 2009. Filed under: Linux, MySQL, Shell

1. Screen

Screen is a full-screen window manager that multiplexes a physical terminal between several processes (typically interactive shells). The same way tabbed browsing revolutionized the web experience, GNU Screen can do the same for your experience in the command line. Instead of opening up several terminal instances on your desktop or using those ugly GNOME/KDE-based tabs, Screen can do it better and simpler. Not only that: with GNU Screen, you can share sessions with others and detach/attach terminal sessions. It is a great tool for people who have to share working environments between work and home.


2. ttyload

ttyload is a little *NIX utility that gives a color-coded graph of load averages over time.


3. mtop/mkill

mtop (MySQL top) monitors a MySQL server showing the queries which are taking the most amount of time to complete. Features include ‘zooming’ in on a process to show the complete query, ‘explaining’ the query optimizer information for a query and ‘killing’ queries. In addition, server performance statistics, configuration information, and tuning tips are provided.

mkill (MySQL kill) monitors a MySQL server for long running queries and kills them after a specified time interval. Queries can be selected based on regexes on the user, host, command, database, state and query.


4. mytop

mytop is a console-based (non-gui) tool for monitoring the threads and overall performance of a MySQL 3.22.x, 3.23.x, and 4.x server. It runs on most Unix systems (including Mac OS X) which have Perl, DBI, and Term::ReadKey installed. And with Term::ANSIColor installed you even get color.



Python Troubleshooting

Posted on August 10, 2008. Filed under: MySQL, Python, Windows

1. If you get data from MySQL

After mysql_connection.execute("some sql"), fetch a row with line = mysql_connection.fetchone(). The results are then accessed positionally as line[0], line[1], line[2], ... (not as "key" -> "value" pairs).

2. In wxPython, call toolbar.Realize() after building the toolbar

If there is no Realize() at the end of the toolbar setup, the buttons will not show:

toolbar = self.CreateToolBar()
toolbar.AddTool(ID_ABC, wx.Bitmap('icons/icon.png'))
toolbar.Realize()
self.Bind(wx.EVT_TOOL, self.OnChangeDepth, id=ID_ABC)


3. pytz.UnknownTimeZoneError: 'US/Central' after py2exe in Python 2.6

The old version of pytz compiled each timezone into a .pyc, and these were included in the py2exe output. The new pytz no longer compiles the timezone files to .pyc, so the zoneinfo directory ends up missing from the build.

The fix is to list pytz as a package in the py2exe options of setup.py:

import py2exe

'py2exe': {
    'packages': ['matplotlib', 'pytz'],

The "packages" entry in "options" is very important.



4. DeprecationWarning: the sets module is deprecated

There are two workarounds:

4.1. Run python with -W ignore::DeprecationWarning

4.2. In Python26\Lib\sets.py, comment out lines 83-85:

#import warnings
#warnings.warn("the sets module is deprecated", DeprecationWarning, stacklevel=2)

5. Socket timeout in xmlrpclib

import xmlrpclib
import socket
socket.setdefaulttimeout(10)        #set the timeout to 10 seconds
x = xmlrpclib.ServerProxy('')

x.func_name(args)                   #times out after 10 seconds
socket.setdefaulttimeout(None)      #sets the default back



PHP troubleshooting

Posted on August 10, 2008. Filed under: MySQL, PHP |

1. In PHP MySQL-related scripts:

After calling mysql_query, if you want the actual result rows, call mysql_fetch_assoc or mysql_fetch_row. With mysql_fetch_assoc, the results are "key" => "value" pairs, where the "key" is the column name selected in the mysql_query statement.

Useful Links:

1. $_SERVER parameter

2. Getting the current full URL in PHP

3. PHP tip: How to convert a relative URL to an absolute URL

4. mb_strrchr()

5. Introduction to PHP Regex

6. urlencode and rawurlencode


Administrating a MySQL server

Posted on August 5, 2005. Filed under: Linux, MySQL |



Setting the password:

1. From Unix:
       shell> mysql -u username -h hostname -p
       mysql> SET PASSWORD FOR username@localhost=PASSWORD('new_password');

2. Directly manipulate the privilege tables:
       shell> mysql -u username -h host -p
       mysql> UPDATE user SET Password=PASSWORD('new_password') WHERE user='root';
       mysql> FLUSH PRIVILEGES;

3. Using the mysqladmin command:
       shell> mysqladmin -u username password new_password

In our case we were able to change the password by specifying the host name along with the user name:
       shell> bin/mysqladmin -u username -h localhost password new_password

MySQL Permissions & Grant Tables

In order to add a new user or update a user’s privileges in the MySQL grant tables, log in to MySQL as the root user.

There are two options: use the GRANT/REVOKE commands, or manipulate the MySQL grant tables directly.
The preferred method is to use GRANT statements, which are more concise and less error-prone.

If you modify the grant tables manually (using INSERT, UPDATE, etc.), you should execute
a FLUSH PRIVILEGES statement to tell the server to reload the grant tables.

To remove a user: mysql> DELETE FROM user WHERE user='username';
        mysql> FLUSH PRIVILEGES;

Examples adding a new user with different level of privileges:
  dummy: A user who can connect without a password, but only from the local host.

       mysql> GRANT USAGE ON *.* TO dummy@localhost;

myUser: A full superuser who can connect to the server from anywhere,
but who must use a password 'pass' to do so.
GRANT statements should be issued for both myUser@localhost and myUser@"%",
to prevent the anonymous-user entry for localhost from taking precedence.

        mysql> GRANT ALL PRIVILEGES ON *.* TO myUser@localhost
                        IDENTIFIED BY 'pass' WITH GRANT OPTION;
        mysql> GRANT ALL PRIVILEGES ON *.* TO myUser@"%"
                        IDENTIFIED BY 'pass' WITH GRANT OPTION;

      "%" is a wildcard in MySQL. If you enter '%' in the 'host' field when
defining access for a database, any host can access that database (of course, that host
must also have a valid db user).

       admin: A user who can connect from localhost without a password and who is granted
the RELOAD and PROCESS administrative privileges.
No database-related privileges are granted.

          mysql> GRANT RELOAD,PROCESS ON *.* TO admin@localhost;

Add a user that has full rights to his own database only, but cannot see other databases:
           mysql> GRANT USAGE ON *.* TO 'user'@'host';
           mysql> GRANT SELECT, INSERT, UPDATE, DELETE,
                          CREATE, DROP ON `database`.* TO 'user'@'host';
           mysql> FLUSH PRIVILEGES;

The FILE privilege and WITH GRANT OPTION are usually best left out; include them
only when creating another superuser with the full set of privileges, or when
granting the ability to load data with the MySQL LOAD DATA command.

TABLE USER: Everything after "password" is a privilege granted with values 'Y' or 'N'.
This table controls individual users’ global access rights.


TABLE DB: This controls access of USERS to databases.


TABLE HOST: This controls which HOSTS are allowed what global access rights.


HOST, USER, and DB table are very closely connected – if an authorized USER
attempts an SQL request from an unauthorized HOST, it is denied.
If a request from an authorized HOST is not an authorized USER, it is denied.
If a globally authorized USER does not have rights to a certain DB, it is denied.

Backups in MySQL

Full backup of MySql databases:
     1. shell> mysqldump --tab=/path/to/some/dir --opt --full
     2. shell> mysqlhotcopy database /path/to/some/dir
     3. simply copy all table files (`*.frm’, `*.MYD’, and `*.MYI’ files)

For a SQL level backup of a table use SELECT INTO OUTFILE or BACKUP TABLE.

           mysql> BACKUP TABLE tbl_name[,tbl_name...] TO '/path/to/backup/directory'
Copies to the backup directory the minimum number of table files needed to
restore the table, after flushing any buffered changes to disk.

RESTORE TABLE tbl_name[,tbl_name...] FROM '/path/to/backup/directory'
Restores the table(s) from the backup that was made with BACKUP TABLE.
Existing tables will not be overwritten; if you try to restore over an existing
table, you will get an error. Restoring will take longer than backing up due to
the need to rebuild the index. The more keys you have, the longer it will take.
Just as BACKUP TABLE, RESTORE TABLE currently works only for MyISAM tables.

Selective backups can be done with:
              SELECT * INTO OUTFILE 'file_name' FROM tbl_name
and restore with:
              LOAD DATA INFILE 'file_name' REPLACE ...
To avoid duplicate records, you need a PRIMARY KEY or a UNIQUE key in the table.
The REPLACE keyword causes old records to be replaced with new ones when a new
record duplicates an old record on a unique key value.

Monitoring tools

The myisamchk utility is used to get information, check, repair or optimise mysql database tables:
              shell> myisamchk [options] tbl_name

With no options, myisamchk simply checks the table.

Some useful options for the myisamchk utility:

1. Print informational statistics about the table that is checked: -i or --information
2. Check only tables that have changed since the last check: -C or --check-only-changed
3. The recommended way to quickly check all tables:
              myisamchk --silent --fast /path/to/datadir/*/*.MYI

To Start the server automatically at system startup time

The mysql.server and safe_mysqld scripts can be used to start/stop the server automatically.
        shell> mysql.server start
        shell> mysql.server stop

See mysql.server in the `share/mysql’ directory or in the `support-files’ directory of the MySQL source tree.
The mysql.server script understands the following options: datadir, basedir, and pid-file.

If your system uses `/etc/rc.local’ to start external scripts, you should append the following to it:

     /bin/sh -c 'cd /usr/local/mysql ; ./bin/safe_mysqld --user=mysql &'



