What every programmer should know about memory

Posted on January 3, 2014. Filed under: C/C++, Performance | Tags: , , , |

This is a big document about how to improve program based on memory hardware implementation and software design/compile. (Arthur is Ulrich Drepper who was from Redhat)

The link on is

The outline is:

  • Part 2 (CPU caches)
  • Part 3 (Virtual memory)
  • Part 4 (NUMA systems)
  • Part 5 (What programmers can do – cache optimization)
  • Part 6 (What programmers can do – multi-threaded optimizations)
  • Part 7 (Memory performance tools)
  • Part 8 (Future technologies)
  • Part 9 (Appendices and bibliography)
Read Full Post | Make a Comment ( None so far )

Make ipmitool working for ESXi 5.1

Posted on September 17, 2013. Filed under: Linux, Programming | Tags: , , |

ESXi 5.1 has no ipmitool command built-in, and has limited IPMI capability, we can build a static ipmitool binary for ESXi 5.1.

1. Login to ESXi 5.1, check it’s version by:
~# /lib/

Compiled by GNU CC version 4.1.2 20070626 (Red Hat 4.1.2-14).
Compiled on a Linux 2.6.9 system on 2012-03-21.

This means the is built on linux with kernel 2.6.9 and gcc 4.1.2, so we’d better setup a linux environment with the same linux kernel and gcc version, such as CentOS 4.8.

2. login to the linux environment, build static linked ipmitool binary
2.1 download the ipmitool source code from sourceforge
2.2 Compile. And adding “-static” linker option. The resulting binary I got back was not statically compiled, but worked. Omitting the -static flag caused ipmitool to throw compile errors, so I left it.
~/src/ipmitool-1.8.11$ ./configure CFLAGS=-m32 LDFLAGS=-static

~/src/ipmitool-1.8.11/src/$ ldd ipmitool => (0x009f0000) => /lib/ (0x00c80000) => /lib/ (0x00ac4000)
/lib/ (0x00a9e000)

2.3 copy the built binary to ESXi 5.1 server, it works fine


Read Full Post | Make a Comment ( 4 so far )

Linux Page Faults

Posted on September 16, 2013. Filed under: Linux, Programming | Tags: , , , |

Page Fault

4 Linux Commands to View Page Faults Stats: ps, top, sar, time

  1. major fault occurs when disk access required. For example, start an app called Firefox. The Linux kernel will search in the physical memory and CPU cache. If data do not exist, the Linux issues a major page fault.
  2. A minor fault occurs due to page allocation.


Read Full Post | Make a Comment ( None so far )

Cross compiling environment setup for ARM Architecture pidora OS

Posted on September 2, 2013. Filed under: C/C++, Linux | Tags: , , , , , , , |

In this article, I’m using raspberry pi hardware, and using pedora 32-bit target OS, Scientific Linux 6.1 64-bit as host OS, crosstool-ng as cross compiling tool. I will give a brief introduction steps about how to build binaries and shared/static libraries, and in the process, some building depends on third party libraries

Question: Why we need setup the cross compiling environment?
The ARM architecture is not powerful enough to build the binary as quick as we want, so we need setup a toolchain build environment on a powerful host OS which is using Intel or AMD cpu to build binaries for target OS running on ARM cpu.

    1. After install fedora in raspberry pi, need following information from it:

kernel version by uname -a

gcc version by gcc --version

glibc version by run /lib/ directly

    1. download and install crosstool-ng to host OS

login as root to the host OS, and then:
tar xjvf crosstool-ng-1.18.0.tar.bz2
yum install gperf.x86_64 texinfo.x86_64 gmp-devel.x86_64

./configure –prefix=/usr/local/
make install

unset LD_LIBRARY_PATH (or “export -n LD_LIBRARY_PATH”, this step is because of crosstool-ng may use this environment, so if we leave the host one, it may screw up the target compiling)
and add the destination address to PATH:
export PATH=/usr/local/x-tools/crosstool-ng/bin:$PATH (then you can use “ct-ng”)
mkdir ~/pi-staging
cd ~/pi-staging
ct-ng menuconfig

In the configuration, select related configuration of Host and Target OS, such as:
In target option, select target architecture as “ARM”, and set “Endianness set” as “Little endian” and “Bitness” as “32-bit”

      • enable “EXPERIMENTAL” features in “Paths and misc”, leave your prefix directory as “${HOME}/x-tools/${CT_TARGET}”
      • in “Operating System”, set “Target OS” as “linux”, and “Linux Kernel” to that ARM pedora’s kernel version
      • Set “binutils” version to “2.21.1a”
      • Set “C Compiler” version to what ARM pedora is using, and enable “C++” if needed
      • Set “glibc” or “eglibc” or “ulibc” version to what ARM pedora is using (get this information from /lib/

ct-ng build

this will build the basic environment of target os, and it may take 1 to 2 hours, after this building, you can find the binaries under directory “~/x-tools/arm-unknown-linux-gnueabi/bin/”, and we need add this directory into PATH, and following binary provided:

    1. After build crosstool-ng, there is one “sysroot” directory under “x-tools/arm-unknown-linux-gnueabi/arm-unknown-linux-gnueabi/sysroot”, this information can be found in cross tool-ng’s document “crosstool-ng/docs/”When we build a big project which has multiple libraries (static or dynamic) and binaries, the project may depend on more third party libraries, so we need make the build environment supports third party libraries deployment capability, the straightforward way is include all source code of third party libraries into the project, but that costs a lot of debugging and waste building time, the simple way is to setup a sysroot (like chroot) to “install”/”unpack” the third party libraries into the sysroot directory
        1. we’re not going to use built-in sysroot directory, we will setup a new sysroot directory that can benefit multiple projects use their own sysroot and screw one won’t affect other projects
        2. run following commands for a new sysroot

      cp -r ~/x-tools/arm-unknown-linux-gnueabi/arm-unknown-linux-gnueabi/sysroot ~/my-sysroot

        1. once you need a third party library to build a project, you can download the rpm from

      such as I need libcurl-devel for my project, I can download rpm and unpack the package to my-sysroot:
      cd ~/my-sysroot/
      rpm2cpio libcurl-devel-7.27.0-6.fc18.armv6hl.rpm | cpio -idmv

    2. start cross compiling your project

Lots of our projects are using autoconf or automake, so we may use “./configure” and “make” during the project compilation
Set CC to the cross-compiler for the target you configured the library for; it is important to use this same CC value when running configure, like this: ‘CC=target-gcc configure target’. Set BUILD_CC to the compiler to use for programs run on the build system as part of compiling the library. You may need to set AR to cross-compiling versions of ar if the native tools are not configured to work with object files for the target you configured for. such as:

CC=arm-unknown-linux-gnueabi-gcc AR=arm-unknown-linux-gnueabi-ar ./configure

4.1 once the Makefile is generated, go check the Makefile is using the right gcc and ar version as you want
and if there is architecture parameter in the Makefile, please check its using “armv6” instead of “i386” or “x86_64”

4.2 If there is header files needed, please check the CFLAGS that it’s pointing to the header file in your ~/my-sysroot directory, not host system header file

4.3 Some binary or library needs third party library to link, so add lib directory of your sysroot to CFLAGS and LDFLAGS as well, like:


Sometimes, the make process can not find some library even you give the path it exist, then you may need following adding to Makefile

  1. Troubleshootings
    5.1 if you want to see the process of buildadding “-v” to “CFLAGS“5.2 cannot find crti.o: No such file or directory

    Adding sysroot lib directory “-B/root/my-sysroot/usr/lib” to “CFLAGS

    5.3 arm-unknown-linux-gnueabi/bin/ld: cannot find /lib/
    arm-unknown-linux-gnueabi/bin/ld: skipping incompatible /usr/lib/libpthread_nonshared.a when searching for /usr/lib/libpthread_nonshared.a
    arm-unknown-linux-gnueabi/bin/ld: cannot find /usr/lib/libpthread_nonshared.a

    This need edit file in my-sysroot to delete absolute path:
    5.3.1 cd ~/my-sysroot/usr/lib/

    change “GROUP ( /lib/ /usr/lib/libpthread_nonshared.a )
    GROUP ( libpthread_nonshared.a )

    5.3.2 this solution apply to file for cross-compiling as well

    5.4 arm-unknown-linux-gnueabi/bin/ld: cannot find -lboost_regex
    download and install boost-static from pedora to sysroot which includes static boost libraries
    And adding the sysroot lib directory to LDFLAGS such as:

    This method applies to,, not found issues

    5.5 arm-unknown-linux-gnueabi/bin/ld: warning:, needed by my-sysroot//usr/lib/, not found (try using -rpath or -rpath-link)

    This is because of the in your sysroot points to the Host OS one, we need relink it to target sysroot lib file:
    make sure you unpack the pedora glibc library into your sysroot (rpm2cpio glibc-2.16-28.fc18.a6.armv6hl.rpm | cpio -idmv),
    cd ~/my-sysroot/lib/
    ls -l
    ln -s

    5.6 wired link problems
    5.6.1 sometimes if you provide all static or dynamic libraries directories from your sysroot for the binary or library to link, it still fail, try to switch the third party libraries sequence in LDFLAGS, such as change “-lz -lcrypto” to “-lcrypto -lz
    5.6.2 In function `boost::iostreams::detail::bzip2_base::compress(int)': undefined reference to `BZ2_bzCompress'
    In this case, your Makefile may want to link static libraries into your binary, in this case you can try to link dynamic linked library instead of static linked library.

  • References:

Read Full Post | Make a Comment ( None so far )

Python based web servers

Posted on August 28, 2013. Filed under: Python | Tags: , |

1. tornado web server: Tornado is a Python web framework and asynchronous networking library, originally developed at FriendFeed. By using non-blocking network I/O, Tornado can scale to tens of thousands of open connections, making it ideal for long polling, WebSockets, and other applications that require a long-lived connection to each user.

Read Full Post | Make a Comment ( None so far )

Active and updated JavaScript Chart library

Posted on August 28, 2013. Filed under: Javascript | Tags: , |

1. Highcharts: Highcharts is a charting library written in pure HTML5/JavaScript, offering intuitive, interactive charts to your web site or web application. Highcharts currently supports line, spline, area, areaspline, column, bar, pie, scatter, angular gauges, arearange, areasplinerange, columnrange, bubble, box plot, error bars, funnel, waterfall and polar chart types.

2. D3.js: D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG and CSS. D3’s emphasis on web standards gives you the full capabilities of modern browsers without tying yourself to a proprietary framework, combining powerful visualization components and a data-driven approach to DOM manipulation.

3. Googlecharts: Google Charts provides a perfect way to visualize data on your website. From simple line charts to complex hierarchical tree maps, the chart gallery provides a large number of ready-to-use chart types.

The most common way to use Google Charts is with simple JavaScript that you embed in your web page. You load some Google Chart libraries, list the data to be charted, select options to customize your chart, and finally create a chart object with an id that you choose. Then, later in the web page, you create a

with that id to display the Google Chart.

4.Datatables: Datatables is a jQuery JavaScript library for HTML table.

Read Full Post | Make a Comment ( None so far )

How to create Patch on Liunx

Posted on June 7, 2012. Filed under: Linux, Programming |

In this post I will explain how to create a patch file and how to apply patch.
I only know the basic stuff. Therefore, if you want to know more then please use command man diff at command prompt to explore other possible options.

How to Create a Patch File?
Patch file is a readable text file which is created using diff command.
To create a patch of entire source folder including its sub directories, do as follows:

$ diff -rauB  old_dir   new_dir   >  patchfile_name

options -rauB mean:
r  -> recursive (multiple levels dir)
a ->  treat all files as text
u -> Output NUM (default 3) lines of unified context
B -> B is to ignore Blank Lines

Blank lines are usually useless in the patch file.  Therefore I used B option, however you are free to tune your patch file according to your choice.
old_dir is the original source code directory,  new_dir is the new version of source code and patchfile_name is the name of the patch file.  You can give any name to your patch file.

How to Apply Patch to your Source Code?
Suppose you have got a patch file and you want to apply that patch to the version of your source code.
It is not necessary but I recommend to do a dry-run test first.

To do a dry run:
$ patch –dry-run -p1 -i patchfile_name

i : Read the patch from patchfile.
pnum : Strip  the  smallest prefix containing num leading slashes from each
file name found in the patch file.  A sequence of one or more  adjacent
slashes  is counted as a single slash.  This controls how file
names found in the patch file are treated, in  case  you  keep  your
files  in  a  different  directory  than the person who sent out the

If there is no failure then you can apply patch to your source code:
$ patch -p1 -i patchfile_name

I hope this post will help you to generate patch from your source code and apply patch to your source code.




Read Full Post | Make a Comment ( None so far )

How to Use C’s volatile Keyword

Posted on April 15, 2010. Filed under: C/C++ | Tags: , , , , |

by Nigel Jones

The proper use of C’s volatile keyword is poorly understood by many programmers. This is not surprising, as most C texts dismiss it in a sentence or two. This article will teach you the proper way to do it.

Have you experienced any of the following in your C or C++ embedded code?

  • Code that works fine–until you enable compiler optimizations
  • Code that works fine–until interrupts are enabled
  • Flaky hardware drivers
  • RTOS tasks that work fine in isolation–until some other task is spawned

If you answered yes to any of the above, it’s likely that you didn’t use the C keyword volatile. You aren’t alone. The use of volatile is poorly understood by many programmers. Unfortunately, most books about the C programming language dismiss volatile in a sentence or two.

C’s volatile keyword is a qualifier that is applied to a variable when it is declared. It tells the compiler that the value of the variable may change at any time–without any action being taken by the code the compiler finds nearby. The implications of this are quite serious. However, before we examine them, let’s take a look at the syntax.

volatile keyword syntax

To declare a variable volatile, include the keyword volatile before or after the data type in the variable definition. For instance both of these declarations will declare foo to be a volatile integer:

volatile int foo; 
int volatile foo;

Now, it turns out that pointers to volatile variables are very common, especially with memory-mapped I/O registers. Both of these declarations declare pReg to be a pointer to a volatile unsigned 8-bit integer:

volatile uint8_t * pReg; 
uint8_t volatile * pReg;

Volatile pointers to non-volatile data are very rare (I think I’ve used them once), but I’d better go ahead and give you the syntax:

int * volatile p;

And just for completeness, if you really must have a volatile pointer to a volatile variable, you’d write:

int volatile * volatile p;

Incidentally, for a great explanation of why you have a choice of where to place volatile and why you should place it after the data type (for example, int volatile * foo), read Dan Sak’s column “Top-Level cv-Qualifiers in Function Parameters” (Embedded Systems Programming, February 2000, p. 63).

Finally, if you apply volatile to a struct or union, the entire contents of the struct/union are volatile. If you don’t want this behavior, you can apply the volatile qualifier to the individual members of the struct/union.

Proper use of volatile

A variable should be declared volatile whenever its value could change unexpectedly. In practice, only three types of variables could change:

1. Memory-mapped peripheral registers

2. Global variables modified by an interrupt service routine

3. Global variables accessed by multiple tasks within a multi-threaded application

We’ll talk about each of these cases in the sections that follow.

Peripheral registers

Embedded systems contain real hardware, usually with sophisticated peripherals. These peripherals contain registers whose values may change asynchronously to the program flow. As a very simple example, consider an 8-bit status register that is memory mapped at address 0x1234. It is required that you poll the status register until it becomes non-zero. The naive and incorrect implementation is as follows:

uint8_t * pReg = (uint8_t *) 0x1234;

// Wait for register to become non-zero 
while (*pReg == 0) { } // Do something else

This will almost certainly fail as soon as you turn compiler optimization on, since the compiler will generate assembly language that looks something like this:

  mov ptr, #0x1234 mov a, @ptr

  bz loop

The rationale of the optimizer is quite simple: having already read the variable’s value into the accumulator (on the second line of assembly), there is no need to reread it, since the value will always be the same. Thus, in the third line, we end up with an infinite loop. To force the compiler to do what we want, we modify the declaration to:

uint8_t volatile * pReg = (uint8_t volatile *) 0x1234;

The assembly language now looks like this:

  mov ptr, #0x1234

  mov a, @ptr
  bz loop

The desired behavior is achieved.

Subtler problems tend to arise with registers that have special properties. For instance, a lot of peripherals contain registers that are cleared simply by reading them. Extra (or fewer) reads than you are intending can cause quite unexpected results in these cases.

Interrupt service routines

Interrupt service routines often set variables that are tested in mainline code. For example, a serial port interrupt may test each received character to see if it is an ETX character (presumably signifying the end of a message). If the character is an ETX, the ISR might set a global flag. An incorrect implementation of this might be:

int etx_rcvd = FALSE;

void main() 
    while (!ext_rcvd) 
        // Wait

interrupt void rx_isr(void) 
    if (ETX == rx_char) 
        etx_rcvd = TRUE;

With compiler optimization turned off, this code might work. However, any half decent optimizer will “break” the code. The problem is that the compiler has no idea that etx_rcvd can be changed within an ISR. As far as the compiler is concerned, the expression !ext_rcvd is always true, and, therefore, you can never exit the while loop. Consequently, all the code after the while loop may simply be removed by the optimizer. If you are lucky, your compiler will warn you about this. If you are unlucky (or you haven’t yet learned to take compiler warnings seriously), your code will fail miserably. Naturally, the blame will be placed on a “lousy optimizer.”

The solution is to declare the variable etx_rcvd to be volatile. Then all of your problems (well, some of them anyway) will disappear.

Multi-threaded applications

Despite the presence of queues, pipes, and other scheduler-aware communications mechanisms in real-time operating systems, it is still fairly common for two tasks to exchange information via a shared memory location (that is, a global). Even as you add a preemptive scheduler to your code, your compiler has no idea what a context switch is or when one might occur. Thus, another task modifying a shared global is conceptually identical to the problem of interrupt service routines discussed previously. So all shared global variables should be declared volatile. For example, this is asking for trouble:

int cntr;

void task1(void) 
    cntr = 0; 
    while (cntr == 0) 

void task2(void) 

This code will likely fail once the compiler’s optimizer is enabled. Declaring cntr to be volatile is the proper way to solve the problem.

Final thoughts

Some compilers allow you to implicitly declare all variables as volatile. Resist this temptation, since it is essentially a substitute for thought. It also leads to potentially less efficient code.

Also, resist the temptation to blame the optimizer or turn it off. Modern optimizers are so good that I cannot remember the last time I came across an optimization bug. In contrast, I come across failures by programmers to use volatile with depressing frequency.

If you are given a piece of flaky code to “fix,” perform a grep for volatile. If grep comes up empty, the examples given here are probably good places to start looking for problems.

This article was published in the July 2001 issue of Embedded Systems Programming. If you wish to cite the article in your own work, you may find the following MLA-style information helpful:

Jones, Nigel. “Introduction to the Volatile Keyword” Embedded Systems Programming, July 2001

Read Full Post | Make a Comment ( None so far )

Thrift vs. Protocol Buffers

Posted on March 21, 2010. Filed under: Computer Science, Programming | Tags: , , , |

Google recently released its Protocol Buffers as open source. About a year ago, Facebook released a similar product called Thrift. I’ve been comparing them; here’s what I’ve found:

Thrift Protocol Buffers
Backers Facebook, Apache (accepted for incubation) Google
Bindings C++, Java, Python, PHP, XSD, Ruby, C#, Perl, Objective C, Erlang, Smalltalk, OCaml, and Haskell C++, Java, Python
(Perl, Ruby, and C# under discussion)
Output Formats Binary, JSON Binary
Primitive Types bool
16/32/64-bit integersdouble
byte sequence
bool32/64-bit integers
byte sequence

“repeated” properties act like lists

Enumerations Yes Yes
Constants Yes No
Composite Type struct message
Exception Type Yes No
Documentation So-so Good
License Apache BSD-style
Compiler Language C++ C++
RPC Interfaces Yes Yes
RPC Implementation Yes No
Composite Type Extensions No Yes

Overall, I think Thrift wins on features and Protocol Buffers win on
documentation. Implementation-wise, they’re quite similar. Both use
integer tags to identify fields, so you can add and remove fields
without breaking existing code. Protocol Buffers support
variable-width encoding of integers, which saves a few bytes. (Thrift
has an experimental output format with variable-width ints.)

The major difference is that Thrift provides a full client/server RPC
implementation, whereas Protocol Buffers only generate stubs to use in
your own RPC system.

Update July 12, 2008: I haven’t tested for speed, but from a cursory examination it seems that, at the binary level, Thrift and Protocol Buffers are very similar. I think Thrift will develop a more coherent community now that it’s under Apache incubation. It just moved to a new web site and mailing list, and the issue tracker is active.

Reference: (Original Site)

Read Full Post | Make a Comment ( None so far )

up and running with Cassandra

Posted on March 21, 2010. Filed under: Java, Linux, MySQL, PHP, Services | Tags: , , |

Cassandra is a hybrid non-relational database in the same class as Google’s BigTable. It is more featureful than a key/value store like Dynomite, but supports fewer query types than a document store like MongoDB.

Cassandra was started by Facebook and later transferred to the open-source community. It is an ideal runtime database for web-scale domains like social networks.

This post is both a tutorial and a “getting started” overview. You will learn about Cassandra’s features, data model, API, and operational requirements—everything you need to know to deploy a Cassandra-backed service.

Jan 8, 2010: post updated for Cassandra gem 0.7 and Cassandra version 0.5.


There are a number of reasons to choose Cassandra for your website. Compared to other databases, three big features stand out:

  • Flexible schema: with Cassandra, like a document store, you don’t have to decide what fields you need in your records ahead of time. You can add and remove arbitrary fields on the fly. This is an incredible productivity boost, especially in large deployments.
  • True scalability: Cassandra scales horizontally in the purest sense. To add more capacity to a cluster, turn on another machine. You don’t have restart any processes, change your application queries, or manually relocate any data.
  • Multi-datacenter awareness: you can adjust your node layout to ensure that if one datacenter burns in a fire, an alternative datacenter will have at least one full copy of every record.

Some other features that help put Cassandra above the competition :

  • Range queries: unlike most key/value stores, you can query for ordered ranges of keys.
  • List datastructures: super columns add a 5th dimension to the hybrid model, turning columns into lists. This is very handy for things like per-user indexes.
  • Distributed writes: you can read and write any data to anywhere in the cluster at any time. There is never any single point of failure.


You need a Unix system. If you are using Mac OS 10.5, all you need is Git. Otherwise, you need to install Java 1.6, Git 1.6, Ruby, and Rubygems in some reasonable way.

Start a terminal and run:

sudo gem install cassandra

If you are using Mac OS, you need to export the following environment variables:

export JAVA_HOME="/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home"
export PATH="/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/bin:$PATH"

Now you can build and start a test server with cassandra_helper:

cassandra_helper cassandra

It runs!

live demo

The above script boots the server with a schema that we can interact with. Open another terminal window and start irb, the Ruby shell:


In the irb prompt, require the library:

require 'rubygems'
require 'cassandra'
include Cassandra::Constants

Now instantiate a client object:

twitter ='Twitter')

Let’s insert a few things:

user = {'screen_name' => 'buttonscat'}
twitter.insert(:Users, '5', user)  

tweet1 = {'text' => 'Nom nom nom nom nom.', 'user_id' => '5'}
twitter.insert(:Statuses, '1', tweet1)

tweet2 = {'text' => '@evan Zzzz....', 'user_id' => '5', 'reply_to_id' => '8'}
twitter.insert(:Statuses, '2', tweet2)

Notice that the two status records do not have all the same columns. Let’s go ahead and connect them to our user record:

twitter.insert(:UserRelationships, '5', {'user_timeline' => { => '1'}})
twitter.insert(:UserRelationships, '5', {'user_timeline' => { => '2'}})

The call creates a collation key based on the current time; our tweet ids are stored in the values.

Now we can query our user’s tweets:

timeline = twitter.get(:UserRelationships, '5', 'user_timeline', :reversed => true) { |time, id| twitter.get(:Statuses, id, 'text') }
# => ["@evan Zzzz....", "Nom nom nom nom nom."]

Two tweet bodies, returned in recency order—not bad at all. In a similar fashion, each time a user tweets, we could loop through their followers and insert the status key into their follower’s home_timeline relationship, for handling general status delivery.

the data model

Cassandra is best thought of as a 4 or 5 dimensional hash. The usual way to refer to a piece of data is as follows: a keyspace, a column family, a key, an optional super column, and a column. At the end of that chain lies a single, lonely value.

Let’s break down what these layers mean.

  • Keyspace (also confusingly called “table”): the outer-most level of organization. This is usually the name of the application. For example, 'Twitter' and 'Wordpress' are both good keyspaces. Keyspaces must be defined at startup in the storage-conf.xml file.
  • Column family: a slice of data corresponding to a particular key. Each column family is stored in a separate file on disk, so it can be useful to put frequently accessed data in one column family, and rarely accessed data in another. Some good column family names might be :Posts, :Users and :UserAudits. Column families must be defined at startup.
  • Key: the permanent name of the record. You can query over ranges of keys in a column family, like :start => '10050', :finish => '10070'—this is the only index Cassandra provides for free. Keys are defined on the fly.

After the column family level, the organization can diverge—this is a feature unique to Cassandra. You can choose either:

  • A column: this is a tuple with a name and a value. Good columns might be 'screen_name' => 'lisa4718' or 'Google' => ''.It is common to not specify a particular column name when requesting a key; the response will then be an ordered hash of all columns. For example, querying for (:Users, '174927') might return:
    {'name' => 'Lisa Jones',
     'gender' => 'f',
     'screen_name' => 'lisa4718'}

    In this case, name, gender, and screen_name are all column names. Columns are defined on the fly, and different records can have different sets of column names, even in the same keyspace and column family. This lets you use the column name itself as either structure or data. Columns can be stored in recency order, or alphabetical by name, and all columns keep a timestamp.

  • A super column: this is a named list. It contains standard columns, stored in recency order.Say Lisa Jones has bookmarks in several categories. Querying (:UserBookmarks, '174927') might return:
    {'work' => {
        'Google' => '',
        'IBM' => ''},
     'todo': {...},
     'cooking': {...}}

    Here, work, todo, and cooking are all super column names. They are defined on the fly, and there can be any number of them per row. :UserBookmarks is the name of the super column family. Super columns are stored in alphabetical order, with their sub columns physically adjacent on the disk.

Super columns and standard columns cannot be mixed at the same (4th) level of dimensionality. You must define at startup which column families contain standard columns, and which contain super columns with standard columns inside them.

Super columns are a great way to store one-to-many indexes to other records: make the sub column names TimeUUIDs (or whatever you’d like to use to sort the index), and have the values be the foreign key. We saw an example of this strategy in the demo, above.

If this is confusing, don’t worry. We’ll now look at two example schemas in depth.

twitter schema

Here is the schema definition we used for the demo, above. It is based on Eric Florenzano’s Twissandra:

<Keyspace Name="Twitter">
  <ColumnFamily CompareWith="UTF8Type" Name="Statuses" />
  <ColumnFamily CompareWith="UTF8Type" Name="StatusAudits" />
  <ColumnFamily CompareWith="UTF8Type" Name="StatusRelationships"
    CompareSubcolumnsWith="TimeUUIDType" ColumnType="Super" />
  <ColumnFamily CompareWith="UTF8Type" Name="Users" />
  <ColumnFamily CompareWith="UTF8Type" Name="UserRelationships"
    CompareSubcolumnsWith="TimeUUIDType" ColumnType="Super" />

What could be in StatusRelationships? Maybe a list of users who favorited the tweet? Having a super column family for both record types lets us index each direction of whatever many-to-many relationships we come up with.

Here’s how the data is organized:

Click to enlarge

Cassandra lets you distribute the keys across the cluster either randomly, or in order, via the Partitioner option in the storage-conf.xml file.

For the Twitter application, if we were using the order-preserving partitioner, all recent statuses would be stored on the same node. This would cause hotspots. Instead, we should use the random partitioner.

Alternatively, we could preface the status keys with the user key, which has less temporal locality. If we used user_id:status_id as the status key, we could do range queries on the user fragment to get tweets-by-user, avoiding the need for a user_timeline super column.

multi-blog schema

Here’s a another schema, suggested to me by Jonathan Ellis, the primary Cassandra maintainer. It’s for a multi-tenancy blog platform:

<Keyspace Name="Multiblog">
  <ColumnFamily CompareWith="TimeUUIDType" Name="Blogs" />
  <ColumnFamily CompareWith="TimeUUIDType" Name="Comments"/>

Imagine we have a blog named ‘The Cutest Kittens’. We will insert a row when the first post is made as follows:

require 'rubygems'
require 'cassandra'
include Cassandra::Constants

multiblog ='Multiblog')

multiblog.insert(:Blogs, 'The Cutest Kittens',
  { =>
    '{"title":"Say Hello to Buttons Cat","body":"Buttons is a cute cat."}' }) generates a unique, sortable column name, and the JSON hash contains the post details. Let’s insert another:

multiblog.insert(:Blogs, 'The Cutest Kittens',
  { =>
    '{"title":"Introducing Commie Cat","body":"Commie is also a cute cat"}' })

Now we can find the latest post with the following query:

post = multiblog.get(:Blogs, 'The Cutest Kittens', :reversed => true).to_a.first

On our website, we can build links based on the readable representation of the UUID:

guid = post.first.to_guid
# => "b06e80b0-8c61-11de-8287-c1fa647fd821"

If the user clicks this string in a permalink, our app can find the post directly via:

multiblog.get(:Blogs, 'The Cutest Kittens', :start =>, :count => 1)

For comments, we’ll use the post UUID as the outermost key:

multiblog.insert(:Comments, guid,
  { => 'I like this cat. - Evan'})
multiblog.insert(:Comments, guid,
  { => 'I am cuter. - Buttons'})

Now we can get all comments (oldest first) for a post by calling:

multiblog.get(:Comments, guid)

We could paginate them by passing :start with a UUID. See this presentation to learn more about token-based pagination.

We have sidestepped two problems with this data model: we don’t have to maintain separate indexes for any lookups, and the posts and comments are stored in separate files, where they don’t cause as much write contention. Note that we didn’t need to use any super columns, either.

storage layout and api comparison

The storage strategy for Cassandra’s standard model is the same as BigTable’s. Here’s a comparison chart:

multi-file per-file intra-file
Relational server database table* primary key column value
BigTable cluster table column family key column name column value
Cassandra, standard model cluster keyspace column family key column name column value
Cassandra, super column model cluster keyspace column family key super column name column name column value

* With fixed column names.

Column families are stored in column-major order, which is why people call BigTable a column-oriented database. This is not the same as a column-oriented OLAP database like Sybase IQ—it depends on what you use the column names for.

Click to enlarge

In row-orientation, the column names are the structure, and you think of the column families as containing keys. This is the convention in relational databases.

Click to enlarge

In column-orientation, the column names are the data, and the column families are the structure. You think of the key as containing the column family, which is the convention in BigTable. (In Cassandra, super columns are also stored in column-major order—all the sub columns are together.)

In Cassandra’s Ruby API, parameters are expressed in storage order, for clarity:

Relational SELECT `column` FROM `database`.`table` WHERE `id` = key;
BigTable table.get(key, "column_family:column")
Cassandra: standard model keyspace.get("column_family", key, "column")
Cassandra: super column model keyspace.get("column_family", key, "super_column", "column")

Note that Cassandra’s internal Thrift interface mimics BigTable in some ways, but this is being changed.

going to production

Cassandra is an alpha product and could, theoretically, lose your data. In particular, if you change the schema specified in the storage-conf.xml file, you must follow these instructions carefully, or corruption will occur (this is going to be fixed). Also, the on-disk storage format is expected to change in version 0.4.0. After that things will be a bit more stable.

The biggest deployment is at Facebook, where hundreds of terabytes of token indexes are kept in about a hundred Cassandra nodes. However, their use case allows the data to be rebuilt if something goes wrong. Currently there are no known deployments of non-transient data. Proceed carefully, keep a backup in an unrelated storage engine…and submit patches if things go wrong.

That aside, here is a guide for deploying a production cluster:

  • Hardware: get a handful of commodity Linux servers. 16GB memory is good; Cassandra likes a big filesystem buffer. You don’t need RAID. If you put the commitlog file and the data files on separate physical disks, things will go faster. Don’t use EC2 or friends except for testing; the virtualized I/O is too slow.
  • Configuration: in the storage-conf.xml schema file, set the replication factor to 3. List the IP address of one of the nodes as the seed. Set the listen address to the empty string, so the hosts will resolve their own IPs. Now, adjust the contents of for your various paths and JVM options—for a 16GB node, set the JVM heap to 4GB.
  • Deployment: build a package of Cassandra itself and your configuration files, and deliver it to all your servers (I use Capistrano for this). Start the servers by setting CASSANDRA_INCLUDE in the environment to point to your file, and run bin/cassandra. At this point, you should see join notices in the Cassandra logs:
    Cassandra starting up...
    Node has now joined.
    Node has now joined.

    Congratulations! You have a cluster. Don’t forget to turn off debug logging in the file.

  • Visibility: you can get a little more information about your cluster via the tool bin/nodeprobe, included:
    $ bin/nodeprobe --host ring
    Token(124007023942663924846758258675932114665)  3  |<--|
    Token(106858063638814585506848525974047690568)  3  |   ^
    Token(141130545721235451315477340120224986045)  3  |-->|

    Cassandra also exposes various statistics over JMX.

Note that your client machines (not servers!) must have accurate clocks for Cassandra to resolve write conflicts properly. Use NTP.


There is a misperception that if someone advocates a non-relational database, they either don’t understand SQL optimization, or they are generally a hater. This is not the case.

It is reasonable to seek a new tool for a new problem, and database problems have changed with the rise of web-scale distributed systems. This does not mean that SQL as a general-purpose runtime and reporting tool is going away. However, at web-scale, it is more flexible to separate the concerns. Runtime object lookups can be handled by a low-latency, strict, self-managed system like Cassandra. Asynchronous analytics and reporting can be handled by a high-latency, flexible, un-managed system like Hadoop. And in neither case does SQL lend itself to sharding.

I think that Cassandra is the most promising current implementation of a runtime distributed database, but much work remains to be done. We’re beginning to use Cassandra at Twitter, and here’s what I would like to happen real-soon-now:

  • Interface cleanup: the Thrift API for Cassandra is incomplete and inconsistent, which makes writing clients very irritating.
  • Online migrations: restarting the cluster 3 times to add a column family is silly.
  • ActiveModel or DataMapper adapter: for interaction with business objects in Ruby.
    Done! Michael Koziarski on the Rails core team wrote an ActiveModel adapter.
  • Scala client: for interoperability with JVM middleware.

Go ahead and jump on any of those projects—it’s a chance to get in on the ground floor.

Cassandra has excellent performance. There some benchmark results for version 0.5 at the end of the Yahoo performance study.

further resources

Read Full Post | Make a Comment ( 1 so far )

Google Protocol Buffers and other data interchange formats

Posted on March 20, 2010. Filed under: Computer Science, Programming, Services | Tags: , , , , , |

We’ve been planning on moving to a new messaging protocol for a while. We’ve looked at a lot of different solutions but had enough issues with every proposed solution to date that we haven’t made a decision. JR Boyens pointed us to Google’s announcement Protocol Buffers: Google’s Data Interchange Format in July. Glanced at it but then it got lost in the everyday noise. Recent work on a project caused it to get more attention. I like what I see.

As part of a new offering we decided to add in our new messaging direction. We’re processing realtime voice conversations. Some of our major considerations are:

  1. Latency and Performance – Latency matters to us. A LOT. I’m including not only network transport but also memory and CPU. The total time it takes for a message to get from it’s native format in the sender to it’s native format in the receiver. We’re dealing with real time voice communications, too much latency and best case is the callers experience suffers. Our labor model is also sensitive to even small changes in latency. The smaller the latency the more efficient we are, the happier our client’s customers are and the more money we make. As greedy capitalist we see that as a good thing.
  2. Versioning – Our current system has no versioning. Yeah, short sighted on my part. We have to fix it so it’s required for any new message protocol. Protobuf fits our needs on this nicely. Different versions have to coexist and interoperate. We could do this on a different layer than the messaging but it makes sense to me to keep it at this level.
  3. Java and C++ – Language independence is cool and all but in practice if the protocol support Java and C++ we’re good to go. Maybe I’m being a bit myopic but my feeling is the likely hood that whatever we choose will expand to support more languages in the future is very high if it supports several today.
  4. Internal – We control the end points. I don’t care if the schema is external to the data package. In fact, for our use case that’s a plus. For any external services we’ll still expose those using the usual standards. Internally our applications will be using PB for their messaging format.
In short, we’re all about high volume low latency messages.

Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages. You can even update your data structure without breaking deployed programs that are compiled against the “old” format.

Protocol buffers have many advantages over XML for serializing structured data. Protocol buffers:

  • are simpler
  • are 3 to 10 times smaller
  • are 20 to 100 times faster
  • are less ambiguous
  • generate data access classes that are easier to use programmatically
Seems like a decent fit. OK, actually an awesome fit. One of our developers has been doing some testing. It’s impressive.



To me protobuf feels like compiled JSON. They are very similar.  The main difference being JSON sends data over the wire in text format verses protobuf’s binary format. The latter has the advantage of a smaller size and being faster for a computer to parse.


Why not ASN.1? Seems like one of the best choices. Well understood and widely used. Sure the full ASN.1 specification is complex but we’d only need a small subset. I’m still struggling with this one a bit. Tool support seems a bit better in protobuf and it’s definitely simpler.


Facebook’s Thrift is very similar to protobuf. Not surprising since the main author interned at Google. It’s a strong offering and recently became an Apache project. Nice stuff. Stuart Sierra has a nice comparison on his blog, Thrift vs. Protocol Buffers. Another worthy contender but not a big enough advantage to stop the internal momentum protobuf already has.


The HDF wiki has an entry Google Protocol Buffers and HDF5 that concludes:

In summary, Protocol Buffers and HDF5 were designed to serve different kinds of data intensive applications: a network based transient message system, and a high performance data storage system for very large datasets such as multi-dimensional images, respectively. That said, both 1) offer open source technologies that can reduce data management headaches for individual developers and projects, 2) increase the ability to share data through the use of well-defined binary formats and supporting libraries that run on a variety of platforms, and 3) provide the ability to access data stored with “older” versions of the data structures.

Different design goals. HDF5 doesn’t fit our needs as well.


The Hessian protocol has the following design goals:

  • It must not require external IDL or schema definitions, i.e. the protocol should be invisible to application code.
  • It must be language-independent.
  • It must be simple so it can be effectively tested and implemented.
  • It must be as fast as possible.
  • It must be as compact as possible.
  • It must support Unicode strings.
  • It must support 8-bit binary data (i.e. without encoding or using attachments.)
  • It must support encryption, compression, signature, and transaction context envelopes.

I still haven’t figured out how/if you can version your messages. Can you add and remove fields and still have compatibility (backwards and forwards)?  Cool effort, still feels very rough in places. Another worthy effort to consider.


At it’s core it’s

The Ice core library. Among many other features, the Ice core library manages all the communication tasks using a highly efficient protocol (including protocol compression and support for both TCP and UDP), provides a flexible thread pool for multi-threaded servers, and offers additional functionality that supports extreme scalability with potentially millions of Ice objects.

ICE is a comprehensive middleware system. It can even use PB as it’s messaging layer. It’s messaging layer doesn’t handle adding or removing fields as well as PB. We don’t need the RPC side of ICE. Just not a good fit for us.


Service Data Objects provides a rather ambitious messaging architecture. It’s concerns aren’t speed and efficiency. The SDO V2.1 White Paper states

SDO is intended to create a uniform data access layer that provides a data access solution for heterogeneous data sources in an easy-to-use manner that is amenable to tooling and frameworks.

Interesting, not a fit.

Cisco Etch

Primary focus is an RPC implementation, not a messaging protocol. Steve Vinoski summarized it nicely in Just What We Need: Another RPC Package. In fairness Steve had some negative thoughts on PB also in Protocol Buffers: Leaky RPC. However, his concerns are around the undefined RPC features Google put in PB, not the IDL type aspects of PB.

Some other XML based protocol

Yeah, I know the problem with XML isn’t XML it’s with the parsers. Cute argument. Getting my message from native format on one system to native format on another as fast as possible is what matters to me. So oddly enough parsers are part of the equation. Yeah, jaxb is fast but just how fast?

Remember, we’re all about high volume low latency messages. It’s not a focus for XML. Yep, no one will take issue with that statement!

Binary XML

Enough said. Next.


Well defined IDL, a bit complicated (because it addresses a wide range of issues).  Built in to the JDK! Not designed for speed or efficiency. Bad fit.


Several good choices. I’m sure there’s others I missed. We’re going with protobuf. Early tests by our developers have been very impressive. Google fan bois can rejoice and the Google haters gripe. In the meantime we’ve got a job to do.

Reference: (Original)

Read Full Post | Make a Comment ( 1 so far )

python thread.error: can’t start new thread

Posted on March 5, 2010. Filed under: Python | Tags: , , , |

The python has thread.error such as:

File "/usr/lib/python2.5/", line 440, in start
_start_new_thread(self.__bootstrap, ())
thread.error: can't start new thread

The “can’t start new thread” error almost certainly due to the fact that you have already have too many threads running within your python process, and due to a resource limit of some kind the request to create a new thread is refused.

You should probably look at the number of threads you’re creating(maybe in the /proc/pid/); the maximum number you will be able to create will be determined by your environment, but it should be in the order of hundreds at least. (Can try ulimit to solve this issue)

It would probably be a good idea to re-think your architecture here; seeing as this is running asynchronously anyhow, perhaps you could use a pool of threads to fetch resources from another site instead of always starting up a thread for every request.

Another improvement to consider is your use of Thread.join and Thread.stop; this would probably be better accomplished by providing a timeout value to the constructor.


Read Full Post | Make a Comment ( 2 so far )

Interesting videos

Posted on February 27, 2010. Filed under: Python | Tags: , |

1. Google I/O 2008 – Painless Python by Alex Martelli (Google)

Read Full Post | Make a Comment ( None so far )


Posted on February 26, 2010. Filed under: Javascript, Services | Tags: , , , |

CKEditor (formerly FCKeditor) is an open source WYSIWYG text editor from CKSource that can be used in web pages. It aims to be lightweight and requires no client-side installation.

Its core code is written in JavaScript, having server side interfaces with Active-FoxPro, ASP, ASP.NET, ColdFusion, Java, JavaScript, Lasso, Perl, PHP and Python.[3]

CKEditor is compatible with most Internet browsers, including: Internet Explorer 6.0+ (Windows), Firefox 2.0+, Safari 3.0+, Google Chrome (Windows), Opera 9.50+, and Camino 1.0+ (Apple).[3]


Read Full Post | Make a Comment ( None so far )


Posted on February 25, 2010. Filed under: C/C++, Linux, Mac, Windows | Tags: , , |

Netscape Plugin Application Programming Interface (NPAPI) is a cross-platform plugin architecture used by many web browsers.

It was first developed for the Netscape family of browsers starting with Netscape Navigator 2.0 but has subsequently been implemented in other browsers including Mozilla Application Suite, Mozilla Firefox, Safari, Google Chrome, Opera, Konqueror, and some older versions of Microsoft Internet Explorer.

Its success can be partly attributed to its simplicity. A plugin declares that it handles certain content types (e.g. “audio/mp3”) through exposed file information. When the browser encounters such content type it loads the associated plugin, sets aside the space within the browser content for the plugin to render itself and then streams data to it. The plugin is then responsible for rendering the data as it sees fit, be it visual, audio or otherwise. So a plugin runs in-place within the page, as opposed to older browsers that had to launch an external application to handle unknown content types.

The API requires each plugin to implement and expose a comparatively small number of functions. There are approximately 15 functions in total for initializing, creating, destroying, and positioning plugins. The NPAPI also supports scripting, printing, full screen plugins, windowless plugins and content streaming.


Read Full Post | Make a Comment ( 1 so far )

staticmethod vs classmethod in Python

Posted on January 19, 2010. Filed under: Python | Tags: , , |

Python’s static methods have a similar implementation as Java & C++. Static methods were not introduced into Python until version 2.2

Example of version 2.2 and higher implementation:

>>> class Foo:
...     def bar(arg):
...         Foo.arg = arg
...     bar = staticmethod(bar)
>>>'Hello World')
>>> Foo.arg
'Hello World'
>>> Foo().bar('Hello')
>>> Foo.arg

Static methods can be called either on the class (such as or on an instance (such as Foo().bar()). The instance is ignored except for its class.

In version 2.4, function decorator syntax was added, which allows another way to define a static method. If you are using 2.4 or above, this is the recommended way of creating a static method.

Example of version 2.4 and higher implementation:

>>> class Foo:
...     @staticmethod
...     def bar(arg):
...         Foo.arg = arg
>>>'Hello World')
>>> Foo.arg
'Hello World'
>>> Foo().bar('Hello')
>>> Foo.arg

If you are looking to do more advanced static methods, look into using classmethod instead of staticmethod. One of the differences between the two is that class method receives the class as implicit first argument, just like an instance method receives the instance. For further reading on class method, refer to the built in functions page.


Return a class method for function.

A class method receives the class as implicit first argument, just like an instance method receives the instance. To declare a class method, use this idiom:

class C:
    def f(cls, arg1, arg2, ...): ...

The @classmethod form is a function decorator – see the description of function definitions in Function definitions for details.

It can be called either on the class (such as C.f()) or on an instance (such as C().f()). The instance is ignored except for its class. If a class method is called for a derived class, the derived class object is passed as the implied first argument.

Class methods are different than C++ or Java static methods. If you want those, see staticmethod() in this section.

For more information on class methods, consult the documentation on the standard type hierarchy in The standard type hierarchy.

New in version 2.2.

Changed in version 2.4: Function decorator syntax added.

Return a static method for function.

A static method does not receive an implicit first argument. To declare a static method, use this idiom:

class C:
    def f(arg1, arg2, ...): ...

The @staticmethod form is a function decorator – see the description of function definitions in Function definitions for details.

It can be called either on the class (such as C.f()) or on an instance (such as C().f()). The instance is ignored except for its class.

Static methods in Python are similar to those found in Java or C++. For a more advanced concept, see classmethod() in this section.

For more information on static methods, consult the documentation on the standard type hierarchy in The standard type hierarchy.

New in version 2.2.

Changed in version 2.4: Function decorator syntax added.


Being educated under Java background, static method and class method are the same thing.

But not so in Python, there is subtle difference:

Say function a() is defined in Parent Class, while Sub Class extends Parent Class

If function a() has @staticmethod decorator, Sub.a() still refers to definition inside Parent Class. Whereas,

If function a() has @classmethod decorator, Sub.a() will points definition inside Sub Class.

Let’s talk about some definitions here:

@staticmethod function is nothing more than a function defined inside a class. It is callable without instantiating the class first. It’s definition is immutable via inheritance.

@classmethod function also callable without instantiating the class, but its definition follows Sub class, not Parent class, via inheritance. That’s because the first argument for @classmethod function must always be cls (class).


Usually, static methods would be used for Singleton or Factory Patterns (, but if you review this page, you will see Python doesn’t require a static method to be used to implement the Factory pattern.

Coming up with an example for static methods can be difficult. I usually use static methods in python for organization purposes. For instance, if I made a DB abstraction layer and had a few functions that were related to the DB layer, but weren’t necessary used in the class, I would added them as static methods. Keeps those functions organized in once place, plus I find it easier to remember.

For example:

>>> data = db_convertAsciiToHtml(”Foo & Bar”)


>>> data = DB.convertAsciiToHtml(”Foo & Bar”)


Read Full Post | Make a Comment ( 1 so far )

« Previous Entries

Liked it here?
Why not try sites on the blogroll...