What every programmer should know about memory

Posted on January 3, 2014. Filed under: C/C++, Performance

This is a long document about how to improve program performance, based on how memory hardware is implemented and on software design and compilation. (The author is Ulrich Drepper, formerly of Red Hat.)

The link on lwn.net is http://lwn.net/Articles/250967/

The outline is:

  • Part 2 (CPU caches)
  • Part 3 (Virtual memory)
  • Part 4 (NUMA systems)
  • Part 5 (What programmers can do – cache optimization)
  • Part 6 (What programmers can do – multi-threaded optimizations)
  • Part 7 (Memory performance tools)
  • Part 8 (Future technologies)
  • Part 9 (Appendices and bibliography)

Cross compiling environment setup for ARM Architecture pidora OS

Posted on September 2, 2013. Filed under: C/C++, Linux

In this article I use Raspberry Pi hardware with 32-bit Pidora as the target OS, 64-bit Scientific Linux 6.1 as the host OS, and crosstool-ng as the cross compiling tool. I give a brief introduction to the steps for building binaries and shared/static libraries, including the case where the build depends on third party libraries.

Question: Why do we need to set up a cross compiling environment?
The ARM board is not powerful enough to build binaries as quickly as we want, so we set up a toolchain on a more powerful host with an Intel or AMD CPU and use it to build binaries for the target OS running on the ARM CPU.

    1. After installing Pidora (the Fedora remix) on the Raspberry Pi, collect the following information from it:

kernel version, from uname -a

gcc version, from gcc --version

glibc version, from running /lib/libc.so.6 directly
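The same three facts can also be collected from a small C program built natively on the Pi. The following is only a sketch: the function names describe_kernel and describe_toolchain are made up for this example, and the compiler and glibc versions come from predefined macros, so they describe the toolchain the file was compiled with, not a runtime probe.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

/* Fills buf with the running kernel release and machine type,
 * the same fields "uname -a" prints. Returns 0 on success.
 * (Hypothetical helper, named for this example only.) */
int describe_kernel(char *buf, size_t len)
{
    struct utsname u;

    assert(buf != NULL && len > 0);
    if (uname(&u) != 0)
        return -1;
    snprintf(buf, len, "kernel=%s machine=%s", u.release, u.machine);
    return 0;
}

/* Fills buf with the compiler and C library versions this file was
 * compiled against (compile-time values from predefined macros). */
void describe_toolchain(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
#if defined(__GLIBC__)
    snprintf(buf, len, "cc=%s glibc=%d.%d",
             __VERSION__, __GLIBC__, __GLIBC_MINOR__);
#else
    snprintf(buf, len, "cc=%s glibc=unknown", __VERSION__);
#endif
}
```

Printing both strings on the Pi gives the values to enter later in the crosstool-ng configuration. Building the same file with the finished cross compiler and comparing the output is a quick check that the versions match.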

    2. Download and install crosstool-ng on the host OS

Log in as root on the host OS, and then:
wget http://crosstool-ng.org/download/crosstool-ng/crosstool-ng-1.18.0.tar.bz2
tar xjvf crosstool-ng-1.18.0.tar.bz2
yum install gperf.x86_64 texinfo.x86_64 gmp-devel.x86_64

cd crosstool-ng-1.18.0
./configure --prefix=/usr/local/
make install

unset LD_LIBRARY_PATH (or “export -n LD_LIBRARY_PATH”; crosstool-ng may read this variable, so leaving the host value set can break the target build)
Then add the install location to PATH:
export PATH=/usr/local/x-tools/crosstool-ng/bin:$PATH (then you can use “ct-ng”)
mkdir ~/pi-staging
cd ~/pi-staging
ct-ng menuconfig

In the configuration, select the settings that match the host and target OS, such as:
In the target options, set the target architecture to “ARM”, “Endianness set” to “Little endian”, and “Bitness” to “32-bit”

      • enable “EXPERIMENTAL” features in “Paths and misc”, and leave the prefix directory as “${HOME}/x-tools/${CT_TARGET}”
      • in “Operating System”, set “Target OS” to “linux”, and “Linux Kernel” to the kernel version that Pidora runs on the ARM board
      • set the “binutils” version to “2.21.1a”
      • set the “C Compiler” version to the one Pidora uses, and enable “C++” if needed
      • set the “glibc”, “eglibc” or “uClibc” version to the one Pidora uses (get this information from /lib/libc.so.6)

ct-ng build

This builds the basic environment for the target OS and may take 1 to 2 hours. After the build, the binaries are under the directory “~/x-tools/arm-unknown-linux-gnueabi/bin/”; add this directory to PATH. It provides the cross toolchain binaries, such as the arm-unknown-linux-gnueabi-gcc and arm-unknown-linux-gnueabi-ar used later in this article.

    3. Set up a sysroot. After crosstool-ng is built, there is a “sysroot” directory at “x-tools/arm-unknown-linux-gnueabi/arm-unknown-linux-gnueabi/sysroot”; it is described in the crosstool-ng documentation under “crosstool-ng/docs/”. A big project with multiple libraries (static or dynamic) and binaries often depends on third party libraries, so the build environment needs a way to deploy them. The straightforward way is to include the source code of every third party library in the project, but that costs a lot of debugging and build time. The simpler way is to set up a sysroot (like a chroot) and “install”/“unpack” the third party libraries into it.
        1. We are not going to use the built-in sysroot directory. We set up a new one instead, so that each project can have its own sysroot and breaking one does not affect the other projects.
        2. Run the following command to create the new sysroot:

      cp -r ~/x-tools/arm-unknown-linux-gnueabi/arm-unknown-linux-gnueabi/sysroot ~/my-sysroot

        3. When a project needs a third party library, download its rpm from http://downloads.raspberrypi.org/pidora/releases/18/packages/armv6hl/os/Packages

      For example, if I need libcurl-devel for my project, I download the rpm and unpack the package into my-sysroot:
      cd ~/my-sysroot/
      rpm2cpio libcurl-devel-7.27.0-6.fc18.armv6hl.rpm | cpio -idmv

    4. Start cross compiling your project

Many projects use autoconf or automake, so the build usually runs “./configure” and “make”.
Set CC to the cross-compiler for the target you configured the library for; it is important to use this same CC value when running configure, like this: “CC=target-gcc configure target”. Set BUILD_CC to the compiler to use for programs that run on the build system as part of compiling the library. You may need to set AR to the cross-compiling version of ar if the native tools are not configured to work with object files for the target. For example:

CC=arm-unknown-linux-gnueabi-gcc AR=arm-unknown-linux-gnueabi-ar ./configure

4.1 Once the Makefile is generated, check that it uses the gcc and ar versions you want.
If there is an architecture parameter in the Makefile, check that it uses “armv6” instead of “i386” or “x86_64”.

4.2 If header files are needed, check that CFLAGS points to the header files in your ~/my-sysroot directory, not to the host system's header files.

4.3 Some binaries or libraries need third party libraries to link, so add the lib directory of your sysroot to CFLAGS and LDFLAGS as well, like:


Sometimes the make process cannot find a library even when you give the path where it exists; then you may need to add the following to the Makefile

  5. Troubleshooting
    5.1 If you want to see the details of the build process, add “-v” to “CFLAGS”

    5.2 cannot find crti.o: No such file or directory

    Add the sysroot lib directory “-B/root/my-sysroot/usr/lib” to “CFLAGS”

    5.3 arm-unknown-linux-gnueabi/bin/ld: cannot find /lib/libpthread.so.0
    arm-unknown-linux-gnueabi/bin/ld: skipping incompatible /usr/lib/libpthread_nonshared.a when searching for /usr/lib/libpthread_nonshared.a
    arm-unknown-linux-gnueabi/bin/ld: cannot find /usr/lib/libpthread_nonshared.a

    This requires editing a file in my-sysroot to delete the absolute paths:
    5.3.1 cd ~/my-sysroot/usr/lib/
    vi libpthread.so

    change “GROUP ( /lib/libpthread.so.0 /usr/lib/libpthread_nonshared.a )” to
    “GROUP ( libpthread.so.0 libpthread_nonshared.a )”

    5.3.2 The same fix applies to the libc.so file when cross compiling

    5.4 arm-unknown-linux-gnueabi/bin/ld: cannot find -lboost_regex
    Download boost-static, which includes the static boost libraries, from Pidora and install it into the sysroot.
    Then add the sysroot lib directory to LDFLAGS, such as:

    This method also applies when libpthread.so.0, librt.so.1 or libresolv.so.2 is not found

    5.5 arm-unknown-linux-gnueabi/bin/ld: warning: ld-linux.so.3, needed by my-sysroot//usr/lib/libz.so, not found (try using -rpath or -rpath-link)

    This is because the ld-linux.so.3 in your sysroot points to the host OS one; we need to relink it to the lib file in the target sysroot.
    Make sure you unpack the Pidora glibc package into your sysroot (rpm2cpio glibc-2.16-28.fc18.a6.armv6hl.rpm | cpio -idmv), then:
    cd ~/my-sysroot/lib/
    ls -l ld-linux.so.3
    rm ld-linux.so.3
    ln -s ld-linux-armhf.so.3 ld-linux.so.3

    5.6 Weird link problems
    5.6.1 Sometimes linking still fails even though you provide all the static or dynamic library directories from your sysroot. Try switching the order of the third party libraries in LDFLAGS, for example change “-lz -lcrypto” to “-lcrypto -lz”
    5.6.2 In function `boost::iostreams::detail::bzip2_base::compress(int)': undefined reference to `BZ2_bzCompress'
    In this case your Makefile may be trying to link static libraries into your binary; try linking the dynamic library instead of the static one.


How to Use C’s volatile Keyword

Posted on April 15, 2010. Filed under: C/C++

by Nigel Jones

The proper use of C’s volatile keyword is poorly understood by many programmers. This is not surprising, as most C texts dismiss it in a sentence or two. This article will teach you the proper way to do it.

Have you experienced any of the following in your C or C++ embedded code?

  • Code that works fine–until you enable compiler optimizations
  • Code that works fine–until interrupts are enabled
  • Flaky hardware drivers
  • RTOS tasks that work fine in isolation–until some other task is spawned

If you answered yes to any of the above, it’s likely that you didn’t use the C keyword volatile. You aren’t alone. The use of volatile is poorly understood by many programmers. Unfortunately, most books about the C programming language dismiss volatile in a sentence or two.

C’s volatile keyword is a qualifier that is applied to a variable when it is declared. It tells the compiler that the value of the variable may change at any time–without any action being taken by the code the compiler finds nearby. The implications of this are quite serious. However, before we examine them, let’s take a look at the syntax.

volatile keyword syntax

To declare a variable volatile, include the keyword volatile before or after the data type in the variable definition. For instance both of these declarations will declare foo to be a volatile integer:

volatile int foo; 
int volatile foo;

Now, it turns out that pointers to volatile variables are very common, especially with memory-mapped I/O registers. Both of these declarations declare pReg to be a pointer to a volatile unsigned 8-bit integer:

volatile uint8_t * pReg; 
uint8_t volatile * pReg;

Volatile pointers to non-volatile data are very rare (I think I’ve used them once), but I’d better go ahead and give you the syntax:

int * volatile p;

And just for completeness, if you really must have a volatile pointer to a volatile variable, you’d write:

int volatile * volatile p;

Incidentally, for a great explanation of why you have a choice of where to place volatile and why you should place it after the data type (for example, int volatile * foo), read Dan Saks’s column “Top-Level cv-Qualifiers in Function Parameters” (Embedded Systems Programming, February 2000, p. 63).

Finally, if you apply volatile to a struct or union, the entire contents of the struct/union are volatile. If you don’t want this behavior, you can apply the volatile qualifier to the individual members of the struct/union.
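The two options for structs can be sketched as follows. The type and member names (dev_regs_t, mixed_t, scratch) are invented for illustration and do not describe any real device.

```c
#include <assert.h>   /* assert() is used by the checks in the usage note */
#include <stdint.h>

/* Plain layout; volatility is added where the struct is accessed. */
typedef struct {
    uint8_t status;
    uint8_t data;
} dev_regs_t;

/* Only one member is volatile; "scratch" stays an ordinary variable
 * that the compiler may keep in a register or combine accesses to. */
typedef struct {
    uint8_t volatile status;
    uint8_t          scratch;
} mixed_t;

/* The pointer target is volatile, so every member reached through
 * regs is treated as volatile: this read cannot be optimized away. */
uint8_t read_status(dev_regs_t volatile *regs)
{
    return regs->status;
}

/* Both increments touch a non-volatile member, so the compiler is
 * free to merge them into a single "+= 2". */
void bump_scratch(mixed_t *m)
{
    m->scratch++;
    m->scratch++;
}
```

A quick check on a host machine: with dev_regs_t r = { 0x80, 0 }, read_status(&r) returns 0x80; with mixed_t m = { 0, 1 }, bump_scratch(&m) leaves m.scratch at 3.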

Proper use of volatile

A variable should be declared volatile whenever its value could change unexpectedly. In practice, only three types of variables could change:

1. Memory-mapped peripheral registers

2. Global variables modified by an interrupt service routine

3. Global variables accessed by multiple tasks within a multi-threaded application

We’ll talk about each of these cases in the sections that follow.

Peripheral registers

Embedded systems contain real hardware, usually with sophisticated peripherals. These peripherals contain registers whose values may change asynchronously to the program flow. As a very simple example, consider an 8-bit status register that is memory mapped at address 0x1234. It is required that you poll the status register until it becomes non-zero. The naive and incorrect implementation is as follows:

uint8_t * pReg = (uint8_t *) 0x1234;

// Wait for register to become non-zero
while (*pReg == 0) { }
// Do something else

This will almost certainly fail as soon as you turn compiler optimization on, since the compiler will generate assembly language that looks something like this:

  mov ptr, #0x1234
  mov a, @ptr
loop: bz loop

The rationale of the optimizer is quite simple: having already read the variable’s value into the accumulator (on the second line of assembly), there is no need to reread it, since the value will always be the same. Thus, in the third line, we end up with an infinite loop. To force the compiler to do what we want, we modify the declaration to:

uint8_t volatile * pReg = (uint8_t volatile *) 0x1234;

The assembly language now looks like this:

  mov ptr, #0x1234
loop: mov a, @ptr
  bz loop

The desired behavior is achieved.
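A version of the polling loop that can be tried on a desktop machine might look like the sketch below. The function name wait_nonzero and the max_polls limit are additions made for this example so that the loop always terminates; they are not part of the original code.

```c
#include <assert.h>
#include <stdint.h>

/* Polls *reg until it reads non-zero or max_polls reads have been
 * made. Returns the last value read (0 means the poll timed out).
 * Because reg points to volatile data, the compiler must emit a
 * real load on every iteration instead of reusing an earlier one. */
uint8_t wait_nonzero(uint8_t volatile *reg, unsigned long max_polls)
{
    uint8_t value = 0;

    assert(reg != 0);
    while (max_polls-- > 0) {
        value = *reg;
        if (value != 0)
            break;
    }
    return value;
}
```

On the target from the example above, the call would be wait_nonzero((uint8_t volatile *) 0x1234, limit). On a host, an ordinary volatile variable can stand in for the register.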

Subtler problems tend to arise with registers that have special properties. For instance, a lot of peripherals contain registers that are cleared simply by reading them. Extra (or fewer) reads than you are intending can cause quite unexpected results in these cases.

Interrupt service routines

Interrupt service routines often set variables that are tested in mainline code. For example, a serial port interrupt may test each received character to see if it is an ETX character (presumably signifying the end of a message). If the character is an ETX, the ISR might set a global flag. An incorrect implementation of this might be:

int etx_rcvd = FALSE;

void main() {
    while (!etx_rcvd) {
        // Wait
    }
}

interrupt void rx_isr(void) {
    if (ETX == rx_char) {
        etx_rcvd = TRUE;
    }
}

With compiler optimization turned off, this code might work. However, any half decent optimizer will “break” the code. The problem is that the compiler has no idea that etx_rcvd can be changed within an ISR. As far as the compiler is concerned, the expression !etx_rcvd is always true, and, therefore, you can never exit the while loop. Consequently, all the code after the while loop may simply be removed by the optimizer. If you are lucky, your compiler will warn you about this. If you are unlucky (or you haven’t yet learned to take compiler warnings seriously), your code will fail miserably. Naturally, the blame will be placed on a “lousy optimizer.”

The solution is to declare the variable etx_rcvd to be volatile. Then all of your problems (well, some of them anyway) will disappear.
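The corrected pattern can be tried on a hosted system by letting a standard C signal handler play the role of the ISR. This is an analogy, not embedded code: receive_char and rx_handler are names invented for this sketch, SIGINT stands in for the serial interrupt, and sig_atomic_t is used because it is the type the C standard guarantees can be written safely from a handler.

```c
#include <assert.h>
#include <signal.h>

#define ETX 0x03 /* ASCII end-of-text character */

static volatile sig_atomic_t etx_rcvd = 0; /* set by the handler */
static volatile sig_atomic_t rx_char  = 0; /* the "received" character */

/* Stand-in for rx_isr(): runs outside the normal program flow. */
static void rx_handler(int signo)
{
    (void)signo;
    if (ETX == rx_char)
        etx_rcvd = 1;
}

/* Simulates the arrival of one character and returns 1 if the
 * handler flagged it as ETX, 0 otherwise. */
int receive_char(int c)
{
    void (*old)(int) = signal(SIGINT, rx_handler);

    assert(old != SIG_ERR);
    etx_rcvd = 0;
    rx_char = c;
    raise(SIGINT);        /* the "interrupt"; the handler has run when this returns */
    signal(SIGINT, old);  /* restore the previous disposition */
    return etx_rcvd != 0;
}
```

Calling receive_char('A') returns 0 and receive_char(ETX) returns 1. Because both flags are volatile, the mainline code rereads them after the handler has run.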

Multi-threaded applications

Despite the presence of queues, pipes, and other scheduler-aware communications mechanisms in real-time operating systems, it is still fairly common for two tasks to exchange information via a shared memory location (that is, a global). Even as you add a preemptive scheduler to your code, your compiler has no idea what a context switch is or when one might occur. Thus, another task modifying a shared global is conceptually identical to the problem of interrupt service routines discussed previously. So all shared global variables should be declared volatile. For example, this is asking for trouble:

int cntr;

void task1(void) {
    cntr = 0;
    while (cntr == 0) {
        // Wait for another task to change cntr
    }
}

void task2(void) {
    cntr++;
}

This code will likely fail once the compiler’s optimizer is enabled. Declaring cntr to be volatile is the proper way to solve the problem.

Final thoughts

Some compilers allow you to implicitly declare all variables as volatile. Resist this temptation, since it is essentially a substitute for thought. It also leads to potentially less efficient code.

Also, resist the temptation to blame the optimizer or turn it off. Modern optimizers are so good that I cannot remember the last time I came across an optimization bug. In contrast, I come across failures by programmers to use volatile with depressing frequency.

If you are given a piece of flaky code to “fix,” perform a grep for volatile. If grep comes up empty, the examples given here are probably good places to start looking for problems.

This article was published in the July 2001 issue of Embedded Systems Programming. If you wish to cite the article in your own work, you may find the following MLA-style information helpful:

Jones, Nigel. “Introduction to the Volatile Keyword” Embedded Systems Programming, July 2001



Posted on February 25, 2010. Filed under: C/C++, Linux, Mac, Windows

Netscape Plugin Application Programming Interface (NPAPI) is a cross-platform plugin architecture used by many web browsers.

It was first developed for the Netscape family of browsers starting with Netscape Navigator 2.0 but has subsequently been implemented in other browsers including Mozilla Application Suite, Mozilla Firefox, Safari, Google Chrome, Opera, Konqueror, and some older versions of Microsoft Internet Explorer.

Its success can be partly attributed to its simplicity. A plugin declares that it handles certain content types (e.g. “audio/mp3”) through exposed file information. When the browser encounters such content type it loads the associated plugin, sets aside the space within the browser content for the plugin to render itself and then streams data to it. The plugin is then responsible for rendering the data as it sees fit, be it visual, audio or otherwise. So a plugin runs in-place within the page, as opposed to older browsers that had to launch an external application to handle unknown content types.

The API requires each plugin to implement and expose a comparatively small number of functions. There are approximately 15 functions in total for initializing, creating, destroying, and positioning plugins. The NPAPI also supports scripting, printing, full screen plugins, windowless plugins and content streaming.









Posted on January 8, 2010. Filed under: C/C++, Linux, Services

Xen Intro- version 1.0:


  1. Introduction
  2. Xen and IA32 Protection Modes
  3. The Xend daemon:
  4. The Xen Store:
  5. VT-x (virtual technology) processors – support in Xen
  6. Vmxloader
  7. VT-i (virtual technology) processors – support in Xen
  8. AMD SVM
  9. Xen On Solaris
  10. Step by step example of creating guest OS with Virtual Machine Manager in Fedora Core 6
  11. Physical Interrupts
  12. Backend Drivers:
  13. Migration and Live Migration:
  14. Creating of a domain – behind the scenes:
  15. HyperCalls Mapping to code Xen 3.0.2
  16. Virtualization and the Linux Kernel
  17. Pre-Virtualization
  18. Xen Storage
  19. kvm – Kernel-based Virtualization Driver
  20. Tip: How to build Xen with your own tar ball
  21. Xen in the Linux Kernel
  22. VMI : Virtual Machine Interface
  23. Links
  24. Adding new device and triggering the probe() functions
    1. deviceback.c
    2. xenbus.c
    3. common.h
    4. Makefile
  25. Adding a frontend device
    1. Makefile
    2. devicefront.c
  26. Discussion


All of the following text refers to the x86 platform of Xen-unstable, unless explicitly stated otherwise. We deal only with Xen on Linux 2.6; we do not deal at all with Xen on Linux 2.4 (as far as we know, domain 0 is intended to be based ONLY on 2.6 in the future). Moreover, the 2.4 Linux tree is currently removed from the Xen tree (changeset 7263:f1abe953e401 from 8.10.05), but it may come back once some problems are fixed.

This document deals only with Xen version 3.0 unless explicitly stated otherwise.

This is not intended to be a full and detailed documentation of the Xen project but we hope it will be a starting point to anyone who is interested in Xen and wants to learn more.

The Xen Project team is permitted to take part or all of this document and integrate it with the official Xen documentation or put it as a standalone document in the Xen Web Site if they wish, without any further notice.

This is neither a complete, detailed document nor a full walkthrough, and many important issues are omitted. Any feedback is welcome: Rami Rosen, ramirose@gmail.com

Xen and IA32 Protection Modes

In the classical protection model of IA-32 there are 4 privilege levels. The most privileged is ring 0, where the kernel runs (this level is also called Supervisor Mode). The least privileged is ring 3, where user applications run (this level is also called User Mode). Issuing certain instructions, called “privileged instructions”, from any ring other than ring 0 causes a General Protection Fault.

Rings 1 and 2 went unused through the years (except in the case of OS/2). When running Xen, we run the hypervisor in ring 0 and the guest OS in ring 1. The applications run unmodified in ring 3.

BTW, there are of course architectures which have different privilege models; for example, in PPC both domain 0 and the unprivileged domains run in supervisor mode.

Diagram: Xen and IA32 Protection Modes


The Xend daemon:

The Xend Daemon handles requests issued from Domain 0; requests can be, for example, creating a new domain (“xm create”) or listing the domains (“xm list”), shutting down a domain (“xm destroy”). Running “xm help” will show all possibilities.

You start the Xend daemon by running “xend start” after booting into domain 0. “xend start” creates two daemons: xenstored and xenconsoled (see tools/misc/xend). It also creates an instance of the python SrvDaemon class and calls its start() method (see tools/python/xen/xend/server/SrvDaemon.py).

The SrvDaemon start() method is in fact the xend main program.

In the past, the start() method of SrvDaemon eventually opened an http socket (port 8000) on which it listened for http requests. It no longer opens an http socket on port 8000.

Note: There is an alternative to the management layer of Xen which is called libvirt; see http://libvir.org. This is a free API (LGPL).

The Xen Store:

The Xen Store Daemon provides a simple tree-like database which we can read values from and write values to. The Xen Store code is mainly under tools/xenstore.

It replaces the XCS, which was a daemon handling control messages.

The physical xenstore resides in one file: /var/lib/xenstored/tdb (previously it was scattered across several files; the change to one file, named “tdb”, was probably made to increase performance).

Both user space (“tools” in Xen terminology) and kernel code can write to the XenStore. The kernel code writes to the XenStore by using XenBus.

The python scripts (under tools/python) use lowlevel/xs.c to read from and write to the XenStore.

The Xen Store Daemon is started in xenstored_core.c. It creates a device file (“/dev/xen/evtchn”) if such a device file does not exist, and opens it (see domain_init() in tools/xenstore/xenstored_domain.c).

It opens 2 UNIX domain sockets. One of these sockets is a read-only socket, and it resides under /var/run/xenstored/socket_ro. The second is /var/run/xenstored/socket.

Connections on these sockets are represented by the connection struct.

A connection can be in one of three states:

        BLOCKED (blocked by a transaction)
        BUSY    (doing some action)
        OK      (completed its transaction)

struct connection is declared in xenstore/xenstored_core.h. When a socket is read-only, the “can_write” member of its connection is false.
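As a rough picture of the data involved, a connection could be modeled as below. This is a simplified sketch and not the declaration from xenstored_core.h: only the three state names and the can_write member come from the text above, while the type names, the fd member and the helper functions are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The three connection states listed above. */
enum conn_state { CONN_BLOCKED, CONN_BUSY, CONN_OK };

/* Hypothetical, cut-down stand-in for struct connection. */
struct connection_sketch {
    int             fd;        /* socket descriptor (assumed member) */
    enum conn_state state;
    bool            can_write; /* false for the read-only socket */
};

/* A write request is only acceptable on a writable connection. */
bool conn_may_write(const struct connection_sketch *c)
{
    assert(c != NULL);
    return c->can_write;
}

/* Human-readable name of a state, handy for debug output. */
const char *conn_state_name(enum conn_state s)
{
    switch (s) {
    case CONN_BLOCKED: return "BLOCKED";
    case CONN_BUSY:    return "BUSY";
    case CONN_OK:      return "OK";
    }
    return "UNKNOWN";
}
```

A connection accepted on socket_ro would be created with can_write set to false, so conn_may_write() rejects its write requests regardless of the state it is in.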

Then we start an endless loop in which we can get input/output from three sources: the two sockets and the event channel, mentioned above.

Events which are received on the event channel are handled by the handle_event() method (file xenstored_domain.c).

There are six executables under tools/xenstore, five of which are in fact made from the same module, which is xenstore_client.c, each time built with a different DEFINE passed. (See the Makefile). The sixth tool is built from xsls.c

These executables are : xenstore-exists, xenstore-list, xenstore-read, xenstore-rm, xenstore-write and xsls.

You can use these executables to access the xenstore. For example, to view the list of fields of domain 0, which has the path “local/domain/0”, you run:

xenstore-list /local/domain/0

and a typical result can be the following list:


The xsls command is very useful and recursively shows the contents of a specified XenStore path. Essentially it does a xenstore-list and then a xenstore-read for each returned field, displaying the fields and their values and then repeating this recursively on each sub-path. For example: to view information about all VIFs backends hosted in domain 0 you may use the following command.

xsls /local/domain/0/backend/vif

and a typical result may be:

14 = ""
 0 = ""
  bridge = "xenbr0"
  domain = "vm1"
  handle = "0"
  script = "/etc/xen/scripts/vif-bridge"
  state = "4"
  frontend = "/local/domain/14/device/vif/0"
  mac = "aa:00:00:22:fe:9f"
  frontend-id = "14"
  hotplug-status = "connected"
15 = ""
 0 = ""
  mac = "aa:00:00:6e:d8:46"
  state = "4"
  handle = "0"
  script = "/etc/xen/scripts/vif-bridge"
  frontend-id = "15"
  domain = "vm2"
  frontend = "/local/domain/15/device/vif/0"
  hotplug-status = "connected"

(xenstored must be running for these six executables to work; if it is not running, these executables will usually hang. The Xend daemon itself can be stopped.)

An instance of struct node is the elementary unit of the XenStore. (struct node is defined in xenstored_core.h). The actual writing to the XenStore is done by write_node() method of xenstored_core.c.

xen_start_info structure has a member named :store_evtchn. (declared in public/xen.h as u16). This is the event channel for store communication.

VT-x (virtual technology) processors – support in Xen

Note: following text refers only to IA-32 unless explicitly said otherwise.

Intel had announced Pentium® 4 672 and 662 processors in November 2005 with virtualization support. (see, for example: http://www.physorg.com/news8160.html).

How does Xen support the Intel Virtualization Technology ?

The VT extensions support in the Xen 3 code is mostly in xen/arch/x86/hvm/vmx*.c, xen/include/asm-x86/vmx*.h and xen/arch/x86/x86_32/entry.S.

arch_vcpu structure (file xen/include/asm-x86/domain.h) contains a member which is called arch_vmx and is an instance of arch_vmx_struct. This member is also important to understand the VT-x mechanism.

But the most important structure for VT-x is the VMCS( vmcs_struct in the code) which represents the VMCS region.

The definition (file include/asm-x86/vmx_vmcs.h) is short:

struct vmcs_struct {
    u32 vmcs_revision_id;
    unsigned char data [0]; /* vmcs size is read from MSR */
};

The VMCS region contains six logical regions; most relevant to our discussions are Guest-state area and Host-state area. We will also deal with the other four regions: VM-execution control fields,VM-exit control fields, VM-entry control fields and VM-exit information fields.

Intel added 10 new opcodes in VT-x to support Intel Virtualization Technology. They are detailed at the end of this section.

When using this technology, Xen runs in “VMX root operation mode” while the guest domains (which are unmodified OSs) run in “VMX non-root operation mode”. Since the guest domains run in “non-root operation” mode, they are more restricted, meaning that certain actions will cause a “VM exit” to the VMM.

Xen enters VMX operation in the start_vmx() method (file xen/arch/x86/vmx.c).

This method is called from the init_intel() method (file xen/arch/x86/cpu/intel.c); CONFIG_VMX should be defined.

First we check the X86_FEATURE_VMXE bit in the ecx register to see whether cpuid reports support for VMX in the processor. In IA-32, Intel added a bit to the CR4 control register that specifies whether we want to enable VMX, so we must set this bit to enable VMX on the processor (by calling set_in_cr4(X86_CR4_VMXE)). This is bit 13 in CR4 (VMXE).
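The two bit tests can be written out as plain bit arithmetic. In this sketch the macro and function names are made up, and the position of the feature flag (bit 5 of ECX for CPUID leaf 1) is taken from the Intel manuals rather than from the Xen source; the functions operate on register values passed in as arguments, so nothing privileged is executed.

```c
#include <assert.h>   /* assert() is used by the checks in the usage note */
#include <stdint.h>

#define CPUID1_ECX_VMX (1u << 5)   /* CPUID leaf 1, ECX bit 5: VMX supported */
#define CR4_VMXE       (1u << 13)  /* CR4 bit 13: VMX enable */

/* Returns 1 if the ECX value returned by CPUID leaf 1 reports VMX. */
int cpu_has_vmx(uint32_t cpuid1_ecx)
{
    return (cpuid1_ecx & CPUID1_ECX_VMX) != 0;
}

/* Returns the CR4 value with the VMXE bit set, leaving the other
 * bits untouched (the effect described for set_in_cr4 above). */
uint32_t cr4_with_vmxe(uint32_t cr4)
{
    return cr4 | CR4_VMXE;
}
```

For instance, cpu_has_vmx(0x20) is 1, cpu_has_vmx(0) is 0, and cr4_with_vmxe(0) is 0x2000.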

Then we call _vmxon to start VMX operation. If we try to start VMX operation with _vmxon when the VMXE bit in CR4 is not set, we get an exception (#UD, for undefined opcode).

In IA-64, things are a little different due to the different architecture: Intel added a new bit in IA-64 to the Processor Status Register (PSR). This is bit 46 and it is called VM. It should be set to 1 in guest OSs; when its value is 1, certain instructions cause a virtualization fault.

VM exit:

Some instructions can cause unconditionally VM exit and some can cause VM exit under certain VM-execution control fields. (see the discussion about VMX-region above)

The following instructions will cause VM exit unconditionally: CPUID, INVD, MOV from CR3, RDMSR, WRMSR, and all the new VT-x instructions (which are listed below).

There are other instructions, like HLT, INVLPG (the Invalidate TLB Entry instruction), MWAIT and others, which cause a VM exit if a corresponding VM-execution control is set.

Apart from the VM-execution control fields, there are 2 bitmaps which are used for determining whether to perform a VM exit. The first is the exception bitmap (see EXCEPTION_BITMAP in the vmcs_field enum, file xen/include/asm-x86/vmx_vmcs.h). This bitmap is a 32-bit field; when a bit is set in this bitmap, a VM exit occurs if the corresponding exception occurs. By default, the entries which are set are EXCEPTION_BITMAP_PG (for page fault) and EXCEPTION_BITMAP_GP (for General Protection); see MONITOR_DEFAULT_EXCEPTION_BITMAP in vmx.h.
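How such a bitmap is built and tested can be shown in a few lines. The names below are invented for this sketch; the vector numbers (13 for General Protection, 14 for page fault) are the standard x86 exception vectors. The sketch ignores the extra page-fault error-code matching that the hardware can apply.

```c
#include <assert.h>   /* assert() is used by the checks in the usage note */
#include <stdint.h>

#define VEC_GP 13u  /* #GP, General Protection */
#define VEC_PF 14u  /* #PF, page fault */

/* Bitmap with only the page-fault and general-protection bits set,
 * mirroring the default described above. */
uint32_t default_exception_bitmap(void)
{
    return (1u << VEC_PF) | (1u << VEC_GP);
}

/* Returns 1 if an exception with this vector is marked in the
 * bitmap, 0 otherwise. Vectors above 31 have no bit in the field. */
int exception_causes_vmexit(uint32_t bitmap, unsigned vector)
{
    return vector < 32 && ((bitmap >> vector) & 1u) != 0;
}
```

The default works out to 0x6000, so a page fault (vector 14) is intercepted while a breakpoint (vector 3) is not.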

The second bitmap is the I/O bitmap (in fact there are 2 I/O bitmaps, A and B, each 4KB in size), which controls I/O instructions on ports. I/O bitmap A covers the ports in the range 0000-7FFF and I/O bitmap B covers the ports in the range 8000-FFFF (one bit for each I/O port). See IO_BITMAP_A and IO_BITMAP_B in the vmcs_field enum (VMCS encodings).
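The arithmetic behind the two bitmaps is simple: 4KB is 32768 bits, one per port, so each bitmap covers 0x8000 ports. The helper below is a sketch with invented names, and it assumes the usual layout in which port n within a bitmap is bit (n mod 8) of byte (n / 8).

```c
#include <assert.h>   /* assert() is used by the checks in the usage note */
#include <stdint.h>

/* Where the control bit for one I/O port lives. */
typedef struct {
    char     bitmap;      /* 'A' for ports 0000-7FFF, 'B' for 8000-FFFF */
    unsigned byte_index;  /* 0..4095 within the 4KB bitmap */
    unsigned bit;         /* 0..7 within that byte */
} io_bitmap_pos;

io_bitmap_pos locate_port(uint16_t port)
{
    io_bitmap_pos pos;
    unsigned rel = port & 0x7FFFu;  /* port number relative to its bitmap */

    pos.bitmap     = (port < 0x8000u) ? 'A' : 'B';
    pos.byte_index = rel / 8;
    pos.bit        = rel % 8;
    return pos;
}
```

Port 0x3F8, for example, maps to bitmap A, byte 127, bit 0, and port 0xFFFF maps to the very last bit of bitmap B (byte 4095, bit 7).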

When there is a “VM exit” we reach vmx_vmexit_handler(struct cpu_user_regs regs) in vmx.c. We handle the VM exit according to the exit reason, which we read from the VMCS region. We read the VMCS by calling vmread(); the return value of vmread is 0 in case of success.

We sometimes also need to read some additional data (VM_EXIT_INTR_INFO) from the vmcs.

We get additional data by getting the “VM-exit interruption information” which is a 32 bit field and the “Exit qualification” (64 bit value).

For example, if the exception was NMI, we check if it is valid by checking bit 31 (valid bit) of the VM-exit interruption field. In case it is not valid we call _hvm_bug() to print some statistics and crash the domain.
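Decoding that 32-bit field comes down to a few masks. The sketch below uses invented names; the field layout (vector in bits 7:0, interruption type in bits 10:8 with type 2 meaning NMI, valid flag in bit 31) follows the Intel manuals and is not copied from the Xen source.

```c
#include <assert.h>   /* assert() is used by the checks in the usage note */
#include <stdint.h>

#define INTR_INFO_VECTOR_MASK 0xFFu        /* bits 7:0 */
#define INTR_INFO_TYPE_SHIFT  8            /* bits 10:8 */
#define INTR_INFO_TYPE_MASK   0x7u
#define INTR_INFO_VALID       (1u << 31)   /* bit 31 */
#define INTR_TYPE_NMI         2u

/* Returns 1 if the valid bit (bit 31) is set. */
int intr_info_valid(uint32_t info)
{
    return (info & INTR_INFO_VALID) != 0;
}

unsigned intr_info_vector(uint32_t info)
{
    return info & INTR_INFO_VECTOR_MASK;
}

unsigned intr_info_type(uint32_t info)
{
    return (info >> INTR_INFO_TYPE_SHIFT) & INTR_INFO_TYPE_MASK;
}
```

A value of 0x80000202 therefore decodes as valid, type 2 (NMI), vector 2, while 0x00000202 has the same type and vector but fails the validity check.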

An example of reading the “Exit qualification” field is the case where the VM exit was caused by issuing the INVLPG instruction.

When we work with VT-x, the guest OSs work in shadow mode, meaning they use shadow page tables; this is because the guest kernel in a VMX guest does not know that it is being virtualized. There is no software-visible bit which indicates that the processor is in VMX non-root operation. We set shadow mode by calling shadow_mode_enable() in the vmx_final_setup_guest() method (file vmx.c).

There are 43 basic exit reasons – you can see part of them in vmx.h (fields starting with EXIT_REASON_ like EXIT_REASON_EXCEPTION_NMI, which is exit reason number 0, and so on).

In VT-x, Xen will probably use an emulated devices layer which will send virtual interrupts to the VMM. We can prevent the OS from receiving interrupts by clearing the IF flag of EFLAGS.

The ten new opcodes which Intel added in VT-x are detailed below:

1) VMCALL

  • simply calls the VM monitor, causing a VM exit.

2) VMCLEAR

  • copies VMCS data to memory in case it is not written there.
  • wrapper : _vmpclear (u64 addr) in vmx.h.

3) VMLAUNCH

  • launches a virtual machine; changes the launch state of the VMCS to launched (if it is clear).

4) VMPTRLD

  • loads a pointer to the VMCS.
  • wrapper : _vmptrld (u64 addr) (file vmx.h)

5) VMPTRST

  • stores a pointer to the VMCS.
  • wrapper : _vmptrst (u64 addr) (file vmx.h)

6) VMREAD

  • reads a specified field from the VMCS.
  • wrapper : _vmread(x, ptr) (file vmx.h)

7) VMRESUME

  • resumes a virtual machine; in order to resume the VM, the launch state of the VMCS should be “launched”.

8) VMWRITE

  • writes a specified field in the VMCS.
  • wrapper : _vmwrite (field, value).

9) VMXOFF

  • terminates VMX operation.
  • wrapper : _vmxoff (void) (file vmx.h)

10) VMXON (VMXON_OPCODE in vmx.h)

  • starts VMX operation.
  • wrapper : _vmxon (u64 addr) (file vmx.h)

QEMU and VT-D

The I/O in VT-x is performed by using QEMU. The QEMU code which Xen uses is under tools/ioemu. It is based on version 0.6.1 of QEMU, patched according to Xen's needs. AMD SVM also uses QEMU emulation.

The default network card which QEMU uses in Vt-x is AMD PCnet-PCI II Ethernet Controller. (file tools/ioemu/hw/pcnet.c). The reason to prefer this nic emulation to the other alternative, ne2000, is that pcnet uses DMA whereas ne2000 does not.

There is of course a performance cost for using QEMU, so there is a chance that QEMU will be replaced in the future with different solutions which have lower performance costs.

Intel announced in March 2006 its VT-d technology (Intel Virtualization Technology for Directed I/O). This technology makes it possible to assign devices to virtual machines. It also enables DMA remapping, which can be configured for each device. There is a cache called IOTLB which improves performance.


There are some restrictions on VMX operation. Guest OSes in VMX cannot operate in Real Mode. If bit PE (Protection Enabled) of CR0 is 0 or bit PG (“Enable Paging”) of CR0 is 0, then trying to start the VMX operation (VMXON instruction) fails. If, after entering VMX operation, you try to clear these bits, you get an exception (General Protection Exception). A Linux loader, however, starts in real mode. As a result, a vmxloader was written for vmx images (file tools/firmware/vmxassist/vmxloader.c).
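The CR0 restriction above can be written as a small predicate. This is only a sketch of the two bits named in the text (PE is bit 0 of CR0, PG is bit 31); real hardware checks further conditions before VMXON succeeds, and vmxon_cr0_ok() is a hypothetical name.

```python
# Sketch of the CR0 precondition described in the text.
# vmxon_cr0_ok() is a hypothetical helper; other VMXON requirements are ignored.

CR0_PE = 1 << 0    # Protection Enable
CR0_PG = 1 << 31   # Paging


def vmxon_cr0_ok(cr0):
    """Return True when both PE and PG are set in the given CR0 value."""
    return bool(cr0 & CR0_PE) and bool(cr0 & CR0_PG)
```

A real-mode CR0 value (PE clear) fails the check, which is why a loader that starts in real mode cannot enter VMX operation directly.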

(In order to build vmxloader you must have the dev86 package installed; dev86 is a real mode 80x86 assembler and linker.)

After installing Xen, vmxloader is under /usr/lib/xen/boot. In order to use it, you should specify kernel = "/usr/lib/xen/boot/vmxloader" in the config file (which is an input to your “xm create” command).

The vmxloader loads ROMBIOS at 0xF0000, then VGABIOS at 0xC0000, and then VMXAssist at D000:0000.

What is VMXAssist? The VMXAssist is an emulator for real mode which uses the Virtual-8086 mode of IA32. After setting Virtual-8086 mode, it executes in a 16-bit environment.

There are certain instructions which are not recognized in virtual-8086 mode, for example LIDT (Load Interrupt Descriptor Table Register) or LGDT (Load Global Descriptor Table Register).

These instructions cause #GP(0) when trying to run them in protected mode.

So VMXAssist checks the opcode of the instructions which are being executed, and handles them so that they will not cause a General Protection Exception (as would have happened without its intervention).

VT-i (virtual technology) processors – support in Xen

(Note: the files mentioned in this section are from the unstable xen version.)

In the VT-i extension for IA64 processors, Intel added a bit to the PSR (processor status register). This bit is bit 46 of the PSR and is called PSR.vm. When this bit is set, some instructions will cause a fault.

A new instruction called vmsw (Virtual Machine Switch) was added. This instruction sets the PSR.vm to 1 or 0. This instruction can be used to cause transition to or from a VM without causing an interruption.

Also a descriptor named VPD was added; this descriptor represents the resources of a virtual processor. Its size is 64 K. (It must be 32 aligned).

VPD stands for “Virtual Processor Descriptor”. A structure named vpd_t represents the VPD descriptor (file include/public/arch-ia64.h).

Two vectors were added to the ivt: One is the External Interrupt vector (0x3400) and the other is the Virtualization vector (0x6100).

The virtualization vector handler is called when an instruction which needs virtualization is executed. This handler cannot be raised by IA-32 instructions.

Also nine PAL services were added. PAL stands for Processor Abstraction Layer.



AMD will hopefully release PACIFICA processors with virtualization support in Q2 2006 (probably in June 2006). The IOMMU virtualization support is to be out in 2007.

The xen-unstable tree now includes both Intel VT and SVM support, using a common API which is called HVM.

HVM has been included in the unstable tree since changeset 8708 from 31/1/06, which is a “Big merge the HVM full-virtualisation abstractions.”

You can download the code by: hg clone http://xenbits.xensource.com/xen-unstable.hg

The code for AMD SVM is mostly under xen/arch/x86/hvm/svm.

The code is developed by an AMD team: Tom Woller, Mats Petersson, Travis Betak, Nagib Gulam, Leo Duran, Rosilmildo Dasilva and Wei Huang.

SVM stands for “Secure Virtual Machine”.

One major difference between VT-x and AMD SVM is that the AMD SVM virtualization extensions include a tagged TLB (whereas the Intel virtualization extensions for IA-32 do not). The benefit of a tagged TLB is a significant reduction in the number of TLB flushes; this is achieved by using an ASID (Address Space Identifier) in the TLB. Using a tagged TLB is common in RISC processors.

In AMD SVM, the most important struct (which is parallel to the VT-x vmcs_struct) is the vmcb_struct. (file xen/include/asm-x86/hvm/svm/vmcb.h). VMCB stands for Virtual Machine Control Block.

AMD added the following eight instructions to the SVM processor:

VMLOAD loads the processor state from the VMCB.

VMMCALL enables the guest to communicate with the VMM.

VMRUN starts the operation of a guest OS.

VMSAVE stores the processor state to the VMCB.

CLGI clears the global interrupt flag (GIF).

STGI sets the global interrupt flag (GIF).

INVLPGA invalidates the TLB mapping of a specified virtual page and a specified ASID.

SKINIT reinitializes the CPU.

To issue these instructions SVM must be enabled. Enabling SVM is done by setting bit 12 of the EFER MSR register.

In VT-x, the vmx_vmexit_handler() method handles VM exits. In AMD SVM, the svm_vmexit_handler() method is the one which handles VM exits (file xen/arch/x86/hvm/svm/svm.c). When a VM exit occurs, the processor saves the reason for this exit in the exit_code member of the VMCB. The svm_vmexit_handler() handles the VM exit according to the exit_code of the VMCB.

Xen On Solaris

On 13 Feb 2006, Sun released the Xen sources for Solaris x86. See: http://opensolaris.org/os/community/xen/opening-day.

This version currently supports 32 bit only; it enables openSolaris to be a guest OS where dom0 is a modified Linux kernel running Xen. Also this version is currently only for x86 (porting to the SPARC processor is much more difficult). The members of the Solaris Xen project are Tim Marsland, John Levon, Mark Johnson, Stu Maybee, Joe Bonasera, Ryan Scott, Dave Edmondson and others. Todd Clayton is leading the 64-bit Solaris Xen project. In order to boot the Solaris Xen guest many changes were made; you can see more details in http://blogs.sun.com/roller/page/JoeBonasera.

You can download the Xen Solaris sources from : http://dlc.sun.com/osol/xen/downloads/osox-src-12-02-2006.tar.gz

Frontend net virtual device sources are in uts/common/io/xennet/xennetf.c (xennet is the net front virtual driver).

Frontend block virtual device sources are in uts/i86xen/io/xvbd (xvbd is the block front virtual driver).

Currently the front block device does not work. There are many things which are similar between Xen on Solaris and Xen on Linux.

In Xen Solaris, hypercalls are also made by calling int 0x82 (see #define TRAP_INSTR int $0x82 in file /uts/i86xen/ml/hypersubr.s).

Sun also released in February 2006 the specs for the T1 processor, which supports virtualization: see http://opensparc.sunsource.net/nonav/opensparct1.html

see: http://www.prnewswire.com/cgi-bin/stories.pl?ACCT=104&STORY=/www/story/02-14-2006/0004281587&EDATE=

Also the UltraSPARC T1 Hypervisor API Specification was released: http://opensparc.sunsource.net/specs/Hypervisor_api-26-v6.pdf

T1 virtualization:

The Hyperprivileged edition of the UltraSPARC Architecture 2005 Specification describes the Nonprivileged, Privileged, and Hyperprivileged (hypervisor/virtual machine firmware) spec.

The virtual processor on Sun supports three privilege modes: nonprivileged, privileged and hyperprivileged.

Two bits determine the privilege mode of the processor: HPSTATE.hpriv and PSTATE.priv.

When both are 0, we are in nonprivileged mode.

When HPSTATE.hpriv is 0 and PSTATE.priv is 1, we are in privileged mode.

When HPSTATE.hpriv is 1, we are in hyperprivileged mode (regardless of the value of PSTATE.priv).

PSTATE is the Processor State register. HPSTATE is the Hyperprivileged State register (64 bit). Each virtual processor has only one instance of the PSTATE and HPSTATE registers. The HPSTATE is one of the HPR state registers, and it is also called HPR 0. It can be read by the RDHPR instruction, and it can be written by the WRHPR instruction.
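The way the two bits select a mode can be sketched as a small function. It relies only on the rule stated above that HPSTATE.hpriv takes precedence regardless of PSTATE.priv; privilege_mode() is a hypothetical name used for illustration.

```python
# Sketch: derive the privilege mode from the two state bits described above.
# privilege_mode() is a hypothetical helper, not part of any Sun API.

def privilege_mode(hpriv, priv):
    """hpriv is HPSTATE.hpriv, priv is PSTATE.priv (each 0 or 1)."""
    if hpriv:
        # hpriv wins regardless of the value of priv
        return "hyperprivileged"
    if priv:
        return "privileged"
    return "nonprivileged"
```

Checking hpriv first is what makes the value of PSTATE.priv irrelevant in hyperprivileged mode.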

Step by step example of creating guest OS with Virtual Machine Manager in Fedora Core 6

This section describes a step by step example of creating a guest OS based on FC6 i386 with Virtual Machine Manager on a Fedora Core 6 machine, installing from a web URL:

Go to: Application->System Tools->Virtual Machine Manager.

Choose: Local Xen Host.

Press New and enter a name for the guest.

You now reach the “Locating installation media” dialog. In “install media URL” you should enter a URL of a Fedora Core 6 i386 download, for example “http://download.fedora.redhat.com/pub/fedora/linux/core/6/i386/os/”; then press forward.

Choose simple file, and give a path to a non-existing file in some existing folder.

File size: choose 3.5 GB for example; if you assign less space, you will not be able to finish the installation, assuming it is a typical, non-custom installation.

Then press forward; accept the defaults for memory/cpu and then press forward again. Then press finish. That’s it! When installing from the web like this it can take 2-4 hours, depending on your bandwidth. You will get to the text mode installation of Fedora Core 6, and have to enter parameters for the installation.

After the installation is finished, you can restart the guest OS simply by running “xm create /etc/xen/NameOfGuest”, where NameOfGuest is of course the name of the guest you chose in the installation.

Physical Interrupts

In Xen, only the hypervisor has access to the hardware, in order to achieve isolation (it is dangerous to share the hardware and let other domains access hardware devices directly and simultaneously).

Let’s take a little walkthrough dealing with Xen interrupts:

Handling interrupts in Xen is done by using event channels. Each domain can hold up to 1024 events. An event channel can have 2 flags associated with it: pending and mask. The mask flag can be updated only by guests; the hypervisor cannot update it. These flags are not part of the event channel structure itself (struct evtchn is defined in xen/include/xen/sched.h). There are 2 arrays in struct shared_info which contain these flags: evtchn_pending[] and evtchn_mask[]; each holds 32 elements, with one bit per event channel (32 elements of 32 bits each on a 32-bit build give the 1024 events). (file xen/include/public/xen.h)

(The shared_info is a member in domain struct; it is the domain shared data area).
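The two bitmaps can be modelled in a few lines of Python to show how an event channel number maps to a word and a bit. This is a sketch under the assumption of 32-bit words (so 32 words cover 1024 channels); the class and method names are hypothetical and only mirror the array names from the text.

```python
# Model of the evtchn_pending[] / evtchn_mask[] bitmaps in shared_info.
# SharedInfoModel and its methods are hypothetical; 32-bit words are assumed.

BITS_PER_WORD = 32
NR_WORDS = 32
NR_EVENT_CHANNELS = BITS_PER_WORD * NR_WORDS  # 1024


class SharedInfoModel:
    def __init__(self):
        self.evtchn_pending = [0] * NR_WORDS
        self.evtchn_mask = [0] * NR_WORDS

    @staticmethod
    def _locate(port):
        """Map an event channel number to (word index, bit mask)."""
        if not 0 <= port < NR_EVENT_CHANNELS:
            raise ValueError("event channel out of range: %d" % port)
        word, bit = divmod(port, BITS_PER_WORD)
        return word, 1 << bit

    def set_pending(self, port):
        word, bit = self._locate(port)
        self.evtchn_pending[word] |= bit

    def set_mask(self, port, masked):
        word, bit = self._locate(port)
        if masked:
            self.evtchn_mask[word] |= bit
        else:
            self.evtchn_mask[word] &= ~bit

    def is_deliverable(self, port):
        """An event is delivered only if it is pending and not masked."""
        word, bit = self._locate(port)
        pending = bool(self.evtchn_pending[word] & bit)
        masked = bool(self.evtchn_mask[word] & bit)
        return pending and not masked
```

For example, event channel 33 lives in word 1, bit 1; setting it pending makes it deliverable until the guest masks it.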

TBD: add info about event selectors (evtchn_pending_sel in vcpu_info).

Registration (or binding) of irqs in guest domains:

The guest OS calls init_IRQ() when it boots (start_kernel() method calls init_IRQ() ; file init/main.c).

(init_IRQ() is in file sparse/arch/xen/kernel/evtchn.c)

There can be 256 physical irqs; so there is an array called irq_desc with 256 entries. (file sparse/include/linux/irq.h)

All elements in this array are initialized in init_IRQ() so that their status is disabled (IRQ_DISABLED).

Now, when a physical driver starts it usually calls request_irq().

This method eventually calls setup_irq() (both in sparse/kernel/irq/manage.c), which calls startup_pirq().

startup_pirq() sends a hypercall to the hypervisor (HYPERVISOR_event_channel_op) in order to bind the physical irq (pirq). The hypercall is of type EVTCHNOP_bind_pirq. See: startup_pirq() (file sparse/arch/xen/kernel/evtchn.c)

On the hypervisor side, this hypercall is handled in the evtchn_bind_pirq() method (file /common/event_channel.c), which calls pirq_guest_bind() (file arch/x86/irq.c). The pirq_guest_bind() changes the status of the corresponding irq_desc array element to be enabled (~IRQ_DISABLED). It also calls the startup() method.

Now when an interrupt arrives from the controller (the APIC), we arrive at the do_IRQ() method, as in the usual Linux kernel (also in arch/x86/irq.c). The hypervisor handles only timer and serial interrupts. Other interrupts are passed to the domains by calling _do_IRQ_guest() (in fact, the IRQ_GUEST flag is set for all interrupts except for timer and serial interrupts). _do_IRQ_guest() sends the interrupt by calling send_guest_pirq() to all guests who are registered on this IRQ. The send_guest_pirq() creates an event channel (an instance of evtchn) and sets the pending flag of this event channel (by calling evtchn_set_pending()). Then, asynchronously, Xen will notify this domain regarding this interrupt (unless it is masked).

TBD: shared interrupts; avoiding problems with shared interrupts when using PCI express.

Backend Drivers:

The Backend Drivers are started from domain 0. We will deal mainly with the network and block drivers. The network backend drivers reside under sparse/drivers/xen/netback, and the block backend drivers reside under sparse/drivers/xen/blkback.

There are many things in common between the netback and blkback device drivers. There are some differences, though. The blkback device driver runs a kernel daemon thread (named xenblkd) whereas the netback device driver does not run any kernel thread.

The netback and blkback register themselves with XenBus by calling xenbus_register_backend().

This method simply calls xenbus_register_driver_common(); both are in sparse/drivers/xen/xenbus/xenbus_probe.c.

(The xenbus_register_driver_common() method calls the generic kernel method for registering drivers, driver_register()).

Both netback (network backend driver) and blkback (block backend driver) have a module named xenbus.c. There are drivers which are not split into backend/frontend drivers; for example, the balloon driver. The balloon driver calls register_xenstore_notifier() in its initialization (balloon_init() method). The register_xenstore_notifier() uses a generic Linux callback mechanism for passing status changes (notifier_block in include/linux/notifier.h).

The USB driver also has backend and frontend drivers; currently it has no support for the xenbus/xenstore API, so it does not have a module named xenbus.c, but it will probably be adjusted in the future. As of the writing of this document, the USB backend/frontend code was removed temporarily from the sparse tree.

Each of the backend drivers registers two watches: one for the backend and one for the frontend. The registration of the watches is done in the probe method:

* In netback it is in netback_probe() method (file netback/xenbus.c).

* In blkback it is in blkback_probe() method (file blkback/xenbus.c).

Registration of a watch is done by calling the xenbus_watch_path2() method. This method is implemented in sparse/drivers/xen/xenbus/xenbus_client.c. Eventually the watch registration is done by calling register_xenbus_watch(), which is implemented in sparse/drivers/xen/xenbus/xenbus_xs.c.

In both cases, netback and blkback, the callback for the backend watch is called backend_changed, and the callback for the frontend watch is called frontend_changed.

xenbus_watch is a simple struct consisting of 3 elements:

A reference to a list of watches (list_head)

A pointer to a node (char*)

A callback function pointer.

The xenbus.c in both netback and blkback defines a struct called backend_info. These structs have much in common; there are minor differences between them. One difference is that in netback the communications channel is an instance of netif_t whereas in blkback it is an instance of blkif_t. In the case of blkback, the struct also includes the major/minor numbers of the device and the mode (these members don’t exist in the backend_info struct of netback).

In the case of netback, there is also a XenbusState member. The state machine for XenBus includes seven states: Unknown, Initialising, InitWait (early initialisation was finished, and xenbus is waiting for information from the peer or hotplug scripts), Initialised (waiting for a connection from the peer), Connected, Closing (due to an error or an unplug event) and Closed.
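The seven states listed above can be written down as an enumeration. The names come from the text; the numeric values simply follow the order in which the states are listed and should be treated as an assumption to verify against the Xen headers. STATE_NOTES is a hypothetical table that restates the explanations given above.

```python
# Sketch of the XenBus state machine states named in the text.
# Numeric values follow the listed order and are an assumption.
from enum import IntEnum


class XenbusState(IntEnum):
    Unknown = 0
    Initialising = 1
    InitWait = 2
    Initialised = 3
    Connected = 4
    Closing = 5
    Closed = 6


# Hypothetical lookup table repeating the notes from the text.
STATE_NOTES = {
    XenbusState.InitWait: "early initialisation finished; waiting for the "
                          "peer or hotplug scripts",
    XenbusState.Initialised: "waiting for a connection from the peer",
    XenbusState.Closing: "closing due to an error or an unplug event",
}


def describe(state):
    """Return 'Name' or 'Name (note)' for a XenbusState value."""
    note = STATE_NOTES.get(state)
    return state.name if note is None else "%s (%s)" % (state.name, note)
```

A backend's frontend_changed callback would typically branch on such a value to decide whether to connect, wait or tear down.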

One of the members of this struct (backend_info) is an instance of xenbus_device (xenbus_device is declared in sparse/include/asm-xen/xenbus.h). The nodename looks like a directory path; for example, dev->nodename in the blkback case may look like:


and dev->nodename in the netback may look like:


We create an event channel for communication between the two domains by calling a bind_interdomain Hypervisor call. (HYPERVISOR_event_channel_op).

For the networking, this is done in netif_map() in netback/interface.c. For the block device, this is done in blkif_map() in blkback/interface.c.

We use the grant tables to create shared memory between the frontend and backend domains. In the case of network drivers, this is done by calling gnttab_grant_foreign_transfer_ref() (called in network_alloc_rx_buffers(), file netfront.c).

gnttab_grant_foreign_transfer_ref() sets a flag named GTF_accept_transfer in the grant entry.

In the case of block drivers, this is done by calling gnttab_grant_foreign_access_ref() in blkif_queue_request() (file blkfront.c).

gnttab_grant_foreign_access_ref() sets a flag named GTF_permit_access in the grant entry. A grant entry (grant_entry_t) represents a page frame which is shared between domains.
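To illustrate how the two flags distinguish a shared page from a transferred one, here is a minimal model of the flags field of a grant entry. The numeric values (a two-bit type field holding GTF_permit_access = 1 or GTF_accept_transfer = 2, plus a separate read-only bit) are my recollection of the public grant table header and should be treated as an assumption; make_flags() and grant_type() are hypothetical helpers.

```python
# Minimal model of the flags field in a grant entry.
# Constant values are assumed from memory of Xen's public grant_table.h;
# make_flags() and grant_type() are hypothetical helpers.

GTF_invalid = 0          # entry grants nothing
GTF_permit_access = 1    # grantee may map/access the frame (block driver case)
GTF_accept_transfer = 2  # grantee may transfer a frame in (network driver case)
GTF_type_mask = 3        # low two bits select the entry type
GTF_readonly = 1 << 2    # restrict a permit_access grant to read-only


def make_flags(entry_type, readonly=False):
    """Build a flags value from an entry type and the read-only option."""
    flags = entry_type & GTF_type_mask
    if readonly:
        flags |= GTF_readonly
    return flags


def grant_type(flags):
    """Extract the entry type from a flags value."""
    return flags & GTF_type_mask
```

Keeping the type in a small field rather than in independent bits means an entry is either an access grant or a transfer grant, never both.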

Diagram: Virtual Split Devices


Migration and Live Migration:

Xend must be configured so that migration (which is also termed relocation) will be enabled. In /etc/xen/xend-config.sxp, there is the definition of the relocation-port, so the following line should be uncommented:

(xend-relocation-port 8002)

The line “(xend-address localhost)” restricts xend to connections from the local host and so prevents remote connections; this line must therefore be commented out.

Notice: if this line is not commented out on the side to which you want to migrate your domain, you will most likely get the following error after issuing the migrate command:

"Error: can't connect: Connection refused"

This error can be traced to the domain_migrate() method in /tools/python/xen/xend/XendDomain.py, which starts a TCP connection on the relocation port (8002 by default):

def domain_migrate(self, domid, dst, live=False, resource=0):
    """Start domain migration."""
    dominfo = self.domain_lookup(domid)
    port = xroot.get_xend_relocation_port()
    try:
        # Open a TCP connection to xend on the target machine.
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.connect((dst, port))
    except socket.error, err:
        # This is where the "can't connect" message comes from.
        raise XendError("can't connect: %s" % err[1])

See more details on the relocation protocol (implemented in relocate.py) below.

The line “(xend-relocation-server yes)” should be uncommented so the migration server will be running.
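Before issuing a migrate command it can be handy to check whether the relocation port on the target is reachable at all. The following is a minimal sketch in present-day Python, not part of the Xen tools; relocation_port_reachable() is a hypothetical name, and 8002 is the default relocation port mentioned above.

```python
# Sketch: test whether the xend relocation port on a target host accepts
# TCP connections. Not part of Xen; the helper name is hypothetical.
import socket


def relocation_port_reachable(host, port=8002, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts and name resolution errors.
        return False
```

A False result for the target machine suggests revisiting the xend-relocation-server, xend-relocation-port and xend-address settings described above.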

When we issue a migration command like “xm migrate #numOfDomain ipAddress” or, if we want to use live migration, add the --live flag thus: “xm migrate #numOfDomain --live ipAddress”, we call server.xend_domain_migrate() after validating the arguments. (file /tools/python/xen/xm/migrate.py)

We enter the domain_migrate() method of XendDomain, which first performs a domain lookup and then creates a TCP connection for the migration with the target machine; then it sends a “receive” packet (a packet which contains the string “receive”) on port 8002. The other side gets this message in the dataReceived() method (see web/connection.py) and delegates it to dataReceived() of RelocationProtocol (file /tools/python/xen/xend/server/relocate.py). Eventually it calls the save() method of XendCheckpoint to start the migration process.

We end up with a call to op_receive() which ultimately sends back a “ready receive” message (also on port 8002). See the op_receive() method in relocate.py.

The save() method of XendCheckpoint opens a thread which calls the xc_save executable (which is in /usr/lib/xen/bin/xc_save). (file /tools/python/xen/xend/XendCheckpoint.py).

The xc_save executable is built from tools/xcutils/xc_save.c.

The xc_save executable calls xc_linux_save(), which in fact performs most of the migration process. (see xc_linux_save() in /tools/libxc/xc_linux_save.c)

The xc_linux_save() returns 0 on success, and 1 in case of failure.

Live migration is currently supported for 32-bit and 64-bit architectures. TBD: find out if there is support for live-migration for pae architectures. Jacob Gorm Hansen is doing interesting work with migration in Xen; see http://www.diku.dk/~jacobg/self-migration.

He wrote code which performs Xen “self-migration”, which is a migration done without the involvement of the hypervisor. The migration is done by opening a user space socket and reading from a special file (/dev/checkpoint).

His code works with linux-2.6 based xen-unstable.

You can get it by: hg clone http://www.distlab.dk/hg/index.cgi/xen-gfx.hg

His work was inspired by his own ‘NomadBIOS’ for L4 microkernel, which also uses this approach of self-migration.

It might be that this new alternative to managed migration will be adopted in future versions of Xen (time will tell).

Migration of operating systems has much in common with migration of single processes. See the Master’s thesis “Nomadic Operating Systems”, Jacob G. Hansen and Asger kahl Henriksen, 2002: http://nomadbios.sunsite.dk/NomadicOS.ps

You can find there a discussion about migration of single processes in Mosix (Barak and La’adan).

Creating of a domain – behind the scenes:

We create a domain by:

xm create configFile -c

A simple config file may look like (using ttylinux, as in the user manual):

kernel = "/boot/vmlinuz-2.6.12-xenU"
memory = 64
name = "ttylinux"
nics = 1
ip = ""
disk = ['file:/home/work/downloads/tmp/ttylinux-xen,hda3,w']
root = "/dev/hda3 ro"
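Since the config file uses Python assignment syntax, its contents can be inspected with a few lines of Python. The sketch below is only an illustration and makes no claim about how xm itself reads the file; load_domain_config() and parse_disk() are hypothetical helpers, and the meaning given to the three comma-separated disk fields (backing store, device name in the guest, access mode) is inferred from the example entry.

```python
# Sketch: read a domain config written in Python assignment syntax.
# load_domain_config() and parse_disk() are hypothetical helpers.

EXAMPLE_CONFIG = '''
kernel = "/boot/vmlinuz-2.6.12-xenU"
memory = 64
name = "ttylinux"
nics = 1
ip = ""
disk = ['file:/home/work/downloads/tmp/ttylinux-xen,hda3,w']
root = "/dev/hda3 ro"
'''


def load_domain_config(text):
    """Evaluate the assignments and return them as a dict.

    exec() runs arbitrary code, so only use this on files you trust.
    """
    settings = {}
    exec(text, {"__builtins__": {}}, settings)
    return settings


def parse_disk(spec):
    """Split a disk entry into its three comma-separated fields."""
    backing, device, mode = spec.split(",")
    return {"backing": backing, "device": device, "mode": mode}
```

For the example above, the single disk entry maps the file ttylinux-xen to hda3 in the guest with write access, matching the root = "/dev/hda3 ro" line.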

The create() method of XendDomainInfo handles creation of domains. When a domain is created it is assigned a unique id (uuid) which is created using the uuidgen command line utility of e2fsprogs (file python/xen/xend/uuid.py).

If the memory parameter specifies more memory than the hypervisor can allocate, we end up with the following message: “Error: Error creating domain: The privileged domain did not balloon!”

The devices of a domain are created using the createDevice() method, which delegates the call to the createDevice() method of the Device Controller (see XendDomainInfo.py). The createDevice() in turn calls the writeDetails() method (also in DevController). This writeDetails() method writes the details to XenStore to trigger the creation of the device. The getDeviceDetails() is an abstract method which each subclass of DevController implements. Writing to the store is done by calling the Write() method of xstransact (file tools/python/xen/xend/xenstore/xstransact.py), which returns the id of the newly created device.

By using a transaction you can batch together some actions to perform against xenstored (the common use is some read actions). You can also create a domain without Xend and without Python bindings; Jacob Gorm Hansen demonstrated it in 2 little programs (businit.c and buscrate.c) (see http://lists.xensource.com/archives/html/xen-devel/2005-10/msg00432.html).

However, as of now these programs need to be adjusted because there were some API changes; in particular, creation of an interdomain event channel is now done by sending an ioctl to event_channel (IOCTL_EVTCHN_BIND_INTERDOMAIN).

HyperCalls Mapping to code Xen 3.0.2

Mapping of HyperCalls to code :

Following is the location of all hypercalls. The hypercalls appear according to their order in xen.h.

The hypercall table itself is in xen/arch/x86/x86_32/entry.S (ENTRY(hypercall_table)).

HYPERVISOR_set_trap_table => do_set_trap_table() (file xen/arch/x86/traps.c)

HYPERVISOR_mmu_update => do_mmu_update() (file xen/arch/x86/mm.c)

HYPERVISOR_set_gdt => do_set_gdt() (file xen/arch/x86/mm.c)

HYPERVISOR_stack_switch => do_stack_switch() (file xen/arch/x86/x86_32/mm.c)

HYPERVISOR_set_callbacks => do_set_callbacks() (file xen/arch/x86/x86_32/traps.c)

HYPERVISOR_fpu_taskswitch => do_fpu_taskswitch(int set) (file xen/arch/x86/traps.c)

HYPERVISOR_sched_op_compat => do_sched_op_compat() (file xen/common/schedule.c)

HYPERVISOR_dom0_op => do_dom0_op() (file xen/common/dom0_ops.c)

HYPERVISOR_set_debugreg => do_set_debugreg() (file xen/arch/x86/traps.c)

HYPERVISOR_get_debugreg => do_get_debugreg() (file xen/arch/x86/traps.c)

HYPERVISOR_update_descriptor => do_update_descriptor() (file xen/arch/x86/mm.c)

HYPERVISOR_memory_op => do_memory_op() (file xen/common/memory.c)

HYPERVISOR_multicall => do_multicall() (file xen/common/multicall.c)

HYPERVISOR_update_va_mapping => do_update_va_mapping() (file /xen/arch/x86/mm.c)

HYPERVISOR_set_timer_op => do_set_timer_op() (file xen/common/schedule.c)

HYPERVISOR_event_channel_op => do_event_channel_op() (file xen/common/event_channel.c)

HYPERVISOR_xen_version => do_xen_version() (file xen/common/kernel.c)

HYPERVISOR_console_io => do_console_io() (file xen/drivers/char/console.c)

HYPERVISOR_physdev_op => do_physdev_op() (file xen/arch/x86/physdev.c)

HYPERVISOR_grant_table_op => do_grant_table_op() (file xen/common/grant_table.c)

HYPERVISOR_vm_assist => do_vm_assist() (file xen/common/kernel.c)

HYPERVISOR_update_va_mapping_otherdomain => do_update_va_mapping_otherdomain() (file xen/arch/x86/mm.c)

HYPERVISOR_iret => do_iret() (file xen/arch/x86/x86_32/traps.c) /* x86/32 only */

HYPERVISOR_vcpu_op => do_vcpu_op() (file xen/common/domain.c)

HYPERVISOR_set_segment_base => do_set_segment_base() (file xen/arch/x86/x86_64/mm.c) /* x86/64 only */

HYPERVISOR_mmuext_op => do_mmuext_op() (file xen/arch/x86/mm.c)

HYPERVISOR_acm_op => do_acm_op() (file xen/common/acm_ops.c)

HYPERVISOR_nmi_op => do_nmi_op() (file xen/common/kernel.c)

HYPERVISOR_sched_op => do_sched_op() (file xen/common/schedule.c)

(Note: sometimes hypercalls are also called hcalls.)

Virtualization and the Linux Kernel

Virtualization in a computer context can be thought of as extending the abilities of a computer beyond what a straight, non-virtual implementation allows.

In this category we can also include virtual memory, which allows a process to access a 4GB virtual address space even though the physical RAM is usually much smaller.

We can also think of the Linux IP Virtual Server (which is now a part of the Linux kernel) as a kind of virtualization. By using the Linux IP Virtual Server you can configure a router to redirect service requests from a virtual server address to other machines (called real servers).

The IP Virtual Server is part of the kernel starting with 2.6.10 (in the 2.4.* kernels it is also available as a patch; the code for 2.6.10 and above kernels is under net/ipv4/ipvs in the kernel tree; there is still no implementation for ipv6).

The Linux Virtual Server (LVS) was started quite some time ago, in 1998; see http://www.linuxvirtualserver.org.

The idea of virtualization in the sense of running more than one operating system on a single platform is not new and has been researched for many years. However, it seems that the Xen project is the first to produce performance benchmark metrics for such a feature which make this idea more practical and more attractive.

Origins of the Xen project: Xen was originally built as part of the XenoServer project; see http://www.cl.cam.ac.uk/Research/SRG/netos/xeno.

Also the arsenic project has some ideas which were used in Xen. (see http://www.cl.cam.ac.uk/Research/SRG/netos/arsenic)

In the arsenic project, written by Ian Pratt and Keir Fraser, a big part of the Linux kernel TCP/IP stack was ported to user space. The arsenic project is based on Linux 2.3.29. After a short look at the Arsenic project code you can find some data structures which are reminiscent of parallel data structures in Xen, like the event rings (for example, the ring_control_block struct in arsenic-1.0/acenic-fw-12.3.10/nic/common/nic_api.h).

Meiosys is a French Company which was purchased by IBM. It deals with another different type of virtualization – Application Virtualization.

see http://www.virtual-strategy.com/article/articleview/680/1/2/ and http://www.infoworld.com/article/05/06/23/HNibmbuy_1.html

In the context of the Meiosys project, it is worth mentioning that a patch was recently sent to the Linux Kernel Mailing List by Serge E. Hallyn (IBM): see http://lwn.net/Articles/160015

This patch deals with process IDs (the pid should stay the same after restarting the application in Meiosys).

Another article on PID virtualization is “PID virtualization: a wealth of choices”, http://lwn.net/Articles/171017/?format=printable. This article deals with PID virtualization in the context of a different project (openVZ).

There is also the colinux open source project (see http://colinux.sourceforge.net for more details) and the openvz project, which is based on Virtuozzo™ (Virtuozzo is a commercial solution).

The openvz project offers a Linux-based server virtualization solution: see http://openvz.org.

There are other projects which probably inspired virtualization; to name a few:

Denali Project (uses paravirtualization). http://denali.cs.washington.edu

A paper: Denali: Lightweight Virtual Machines for Distributed and Networked Applications By Andrew Whitaker et al. http://denali.cs.washington.edu/pubs/distpubs/papers/denali_usenix2002.pdf

Nemesis Operating System. http://www.cl.cam.ac.uk/Research/SRG/netos/old-projects/nemesis/index.html

Exokernel: see “Application Performance and Flexibility on Exokernel Systems” by M. Frans Kaashoek et al http://www.cl.cam.ac.uk/~smh22/docs/kaashoek97application.ps.gz

TBD: more details.


Another interesting virtualization technique is Pre-Virtualization; in this method, we rewrite sensitive instructions using the assembler files (whether generated by the compiler, as is the usual case, or created manually). There is a problem with this method because there are instructions which are sensitive only when they are performed in a certain context. A solution for this is to generate profiling data of a guest OS and then recompile the OS using the profiling data.



and an article: Pre-Virtualization: Slashing the Cost of Virtualization Joshua LeVasseur, Volkmar Uhlig, Matthew Chapman et al. http://l4ka.org/publications/2005/previrtualization-techreport.pdf

This technique is based on a paper by Hideki Eiraku and Yasushi Shinjo, “Running BSD Kernels as User Processes by Partial Emulation and Rewriting of Machine Instructions” http://www.usenix.org/publications/library/proceedings/bsdcon03/tech/eiraku/eiraku_html/index.html

Xen Storage

You can use iSCSI for Xen Storage. The xen-tools package of OpenSuse has an example of using iSCSI, called xmexample.iscsi. The disk entry for iSCSI in the configuration file may look like: disk = [ 'iscsi:iqn.2006-09.de.suse@0ac47ee2-216e-452a-a341-a12624cd0225,hda,w' ]

TBD: more on iSCSI in Xen.

Solutions for using CoW in Xen: blktap (part of the xen project).

UnionFS: a stackable filesystem (used also in Knoppix Live-CD and other Live-CDs)

dm-userspace (A tool which uses device-mapper and a daemon called cowd; written by Dan Smith) You may download dm-userspace by:

To build as a module out-of-tree, copy dm-userspace.h to /lib/modules/`uname -r`/build/include/linux and then run “make”.

Home of dm-userspace:

Copy-on-write NFS server: see http://www.russross.com/CoWNFS.html

kvm – Kernel-based Virtualization Driver

KVM is an open source virtualization project, written by Avi Kivity and Yaniv Kamay from Qumranet. See: http://kvm.sourceforge.net.

It has been included in the Linux kernel tree since 2.6.20-rc1; see: http://lkml.org/lkml/2006/12/13/361 (“kvm driver for all those crazy virtualization people to play with”)

Currently it supports Intel processors with the virtualization extensions (VT-x) and AMD SVM processors. You can check whether your processor has these extensions by running from the command line: egrep '^flags.*(vmx|svm)' /proc/cpuinfo
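The same check can be done from a program. The following is only a sketch, not part of KVM: the function names line_has_virt_flag and cpu_has_virt_extensions are made up for illustration. Unlike the egrep pattern, which matches "vmx" or "svm" anywhere after "flags", the sketch only accepts them as whole words.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if a /proc/cpuinfo "flags" line lists vmx or svm as a whole word. */
int line_has_virt_flag(const char *line)
{
	const char *p;
	size_t len;

	if (strncmp(line, "flags", 5) != 0) return 0; /* not the flags line */
	p = strchr(line, ':');
	if (!p) return 0;
	p++;
	while (*p) {
		p += strspn(p, " \t\n");      /* skip separators */
		len = strcspn(p, " \t\n");    /* length of the next word */
		if (len == 3 && (strncmp(p, "vmx", 3) == 0 || strncmp(p, "svm", 3) == 0))
			return 1;
		p += len;
	}
	return 0;
}

/* Scans a cpuinfo-style file: 1 = extension found, 0 = not found, -1 = can not open. */
int cpu_has_virt_extensions(const char *path)
{
	char line[4096];
	int found = 0;
	FILE *f = fopen(path, "r");

	if (!f) return -1;
	while (!found && fgets(line, sizeof line, f))
		found = line_has_virt_flag(line);
	fclose(f);
	return found;
}
```

Calling cpu_has_virt_extensions("/proc/cpuinfo") on the host gives the same yes/no answer as the egrep command, assuming the flags line fits into the 4096-byte buffer.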

kvm.ko is a kernel module which handles userspace requests through ioctls on a character device (/dev/kvm). The userspace part is built from a patched QEMU. One advantage of KVM is that it uses Linux kernel mechanisms as they are, without change (such as the Linux scheduler). The Xen project, by contrast, made many changes to parts of the kernel to enable para-virtualization. Another advantage is the simplicity of the project: there is a kernel part and a userspace part. A further advantage is that future versions of the Linux kernel will not entail changes in the kvm module code (and of course not in the userspace part). The project currently supports SMP hosts and will support SMP guests in the future.

Currently there is no support for live migration in KVM (but there is support for ordinary migration, where the migrated OS is stopped, then transferred to the target, and then resumed).

On Intel VT-x, VM exits are handled in the kvm module by the kvm_handle_exit() function in kvm_main.c, according to the reason which caused them (which is recorded in, and read from, the VMCS). On AMD SVM, exits are handled by handle_exit() in svm.c.

There is an interesting usage of memory slots . There is already an rpm for openSUSE by Gerd Hoffman.

Tip: How to build Xen with your own tar ball

If you want to run “make world” without downloading the kernel (because you want to use your own tar ball, which differs slightly from the original one because you made a few changes inside the kernel), then do the following:

1) Let’s say that the kernel tar ball is named: my_linux-2.6.18.tar.bz2.

  • First, move my_linux-2.6.18.tar.bz2 to the folder from which you build Xen

2) Run from bash: XEN_LINUX_SOURCE=tarball make world

That’s it; the build will use the my_linux-2.6.18.tar.bz2 tar ball that you copied to that folder.

Xen in the Linux Kernel

According to the following thread from xen-devel: http://lists.xensource.com/archives/html/xen-devel/2005-10/msg00436.html, there is a mercurial repository in which Xen is a subarch of i386 and x86_64 in the Linux kernel, and there is an intention to send the relevant parts to Andrew/Linus for the upcoming 2.6.15 kernel. On 22/3/2006, a patchset of 35 parts was sent to the Linux Kernel Mailing List (lkml) for Xen i386 paravirtualization support in the Linux kernel: see http://www.ussg.iu.edu/hypermail/linux/kernel/0603.2/2313.html

VMI : Virtual Machine Interface

On 13/3/06, a patchset titled “VMI i386 Linux virtualization interface proposal” was sent to the LKML (Linux Kernel Mailing List) by Zachary Amsden and others (see http://lkml.org/lkml/2006/3/13/140). It proposes a common interface which abstracts the specifics of each hypervisor and thus can be used by many hypervisors. According to the vmi_spec.txt of this patchset, when an OS is ported to a paravirtualizable x86 processor, it should access the hypervisor through the VMI layer.

The VMI layer interface:

The VMI is divided into the following 10 types of calls:


PROCESSOR STATE CALLS (like VMI_DisableInterrupts, VMI_EnableInterrupts,VMI_GetInterruptMask)







TIMER CALLS (VMI_GetWallclockTime)

MMU CALLS (like VMI_SetLinearMapping)


1) Xen Project HomePage http://www.cl.cam.ac.uk/Research/SRG/netos/xen

2) Xen Mailing Lists Page: http://lists.xensource.com

(don’t forget to read the XenUsersNetiquette before posting on the lists)

3) Article: Analysis of the Intel Pentium’s Ability to Support a Secure Virtual Machine Monitor http://www.usenix.org/publications/library/proceedings/sec2000/full_papers/robin/robin_html/index.html

4) Xen Summits: 2005: http://xen.xensource.com/xensummit/xensummit_2005.html 2006 fall: http://xen.xensource.com/xensummit/xensummit_fall_2006.html 2006 winter: http://xen.xensource.com/xensummit/xensummit_winter_2006.html 2007 spring: http://xen.xensource.com/xensummit/xensummit_spring_2007.html

5) Intel Virtualization technology: http://www.intel.com/technology/virtualization/index.htm

6) Article by Ryan Mauer in Linux Journal, in 2 parts:

6-1) Xen Virtualization and Linux Clustering, Part 1 http://www.linuxjournal.com/article/8812

6-2) Xen Virtualization and Linux Clustering, Part 2 http://www.linuxjournal.com/article/8816

Commercial Companies: 7)XenSource:


8) Enomalism: http://www.enomalism.com/

9) Thoughtcrime is a brand new company specialising in opensource virtualisation solutions see http://debian.thoughtcrime.co.nz/ubuntu/README.txt


10)IA64 Master Thesis HPC Virtualization with Xen on Itanium by Havard K. F. Bjerke http://openlab-mu-internal.web.cern.ch/openlab%2Dmu%2Dinternal/Documents/2_Technical_Documents/Master_thesis/Thesis_HarvardBjerke.pdf

11) vBlades: Optimized Paravirtualization for the Itanium Processor Family Daniel J. Magenheimer and Thomas W. Christian http://www.usenix.org/publications/library/proceedings/vm04/tech/full_papers/magenheimer/magenheimer_html/index.html

12) Now and Xen Feature Story Article Written by Andrew Warfield and Keir Fraser http://www.linux-mag.com/2004-10/xen_01.html

13) Self Migration: http://www.diku.dk/~jacobg/self-migration

14) online magazine: http://www.virtual-strategy.com

15) Denali Project http://denali.cs.washington.edu

general links about virtualization:

16) http://www.virtualization.info

17) http://www.kernelthread.com/publications/virtualization

18) A Survey on Virtualization Technologies Susanta Nanda Tzi-cker Chiueh http://www.ecsl.cs.sunysb.edu/tr/TR179.pdf


19) AMD I/O virtualization technology (IOMMU) specification Rev 1.00 – February 03, 2006 http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/34434.pdf

20) AMD64 Architecture Programmer’s Manual: Vol 2 System Programming : Revision 3.11 added chapter 15 on virtualization (“Secure Virtual Machine”).(december 2005) http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf

21) AMD64 Architecture Programmer’s Manual: Vol 3: General-Purpose and System Instructions Revision 3.11 added SVM instructions (december 2005) http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24594.pdf

22) AMD virtualization on the Xen Summit: http://www.xensource.com/files/xs0106_amd_virtualization.pdf

23) AMD Press Release: SUNNYVALE, CALIF. — May 23, 2006: availability of AMD processors with virtualization extensions: http://www.amd.com/us-en/Corporate/VirtualPressRoom/0,,51_104_543~108605,00.html

Open Solaris:

24) Open Solaris Xen Forum: http://www.opensolaris.org/jive/category.jspa?categoryID=32

25) Update to opensolaris Xen: adding OpenSolaris-based dom0 capabilities, as well as 32-bit and 64-bit MP guest. 14/07/2006 http://www.opensolaris.org/os/community/xen/announcements/?monthYear=July+2006

26) Open Sparc Hypervisor Spec: http://opensparc.sunsource.net/nonav/Hypervisor_api-26-v6.pdf

27) Open Sparc T1 page: http://opensparc.sunsource.net/nonav/opensparct1.html extension to the Solaris Zones: http://opensolaris.org/os/community/brandz

28) OSDL Development Wiki Homepage (Virtualization) http://www.osdl.org/cgi-bin/osdl_development_wiki.pl?Virtualization

29) fedora xen mailing list archive: https://www.redhat.com/archives/fedora-xen

30) Xen Quick Start for FC4 (Fedora Core 4). http://www.fedoraproject.org/wiki/FedoraXenQuickstart?highlight=%28xen%29

31) Xen Quick Start for FC5 (Fedora Core 5): http://www.fedoraproject.org/wiki/FedoraXenQuickstartFC5

32) Xen Quick Start for FC6 (Fedora Core 6): http://fedoraproject.org/wiki/FedoraXenQuickstartFC6

Fedora 7 quick start: http://fedoraproject.org/wiki/Docs/Fedora7VirtQuickStart Fedora 8 quick start: http://fedoraproject.org/wiki/Docs/Fedora8VirtQuickStart

33) The Xen repository is handled by the mercurial version system. mercurial download: http://www.selenic.com/mercurial/release

34) Measuring CPU Overhead for I/O Processing in the Xen Virtual Machine Monitor Cherkasova Ludmila and Gardner, Rob http://www.hpl.hp.com/techreports/2005/HPL-2005-62.html

35) XenMon: QoS Monitoring and Performance Profiling Tool Gupta Diwaker and Gardner Rob; Cherkasova, Ludmila http://www.hpl.hp.com/techreports/2005/HPL-2005-187.html

36) Potemkin VMM: A virtual machine based on xen-unstable ; used in a honeypot By Michael Vrable, Justin Ma, Jay Chen, David Moore, Erik Vandekieft, Alex C. Snoeren, Geoffrey M. Voelker, and Stefan Savage. http://www.cs.ucsd.edu/~savage/papers/Sosp05.pdf

37) Memory Resource Management in VMware ESX Server http://www.usenix.org/events/osdi02/tech/waldspurger.html


38) Virtualization: From the Desktop to the Enterprise By Erick M. Halter , Chris Wolf Published: May 2005 http://www.apress.com/book/bookDisplay.html?bID=449

39) Virtualization with VMware ESX Server Publisher: Syngress; 2005 by Al Muller, Seburn Wilson, Don Happe, Gary J. Humphrey http://www.syngress.com/catalog/?pid=3315

40) VMware ESX Server: Advanced Technical Design Guide by Ron Oglesby, Scott Herold http://www.amazon.com/exec/obidos/ASIN/0971151067/virtualizatio-20/002-5634453-4543251?creative=327641&camp=14573&link_code=as1

41) PPC: Hollis Blanchard, IBM Linux Technology Center Jimi Xenidis, IBM Research http://wiki.xensource.com/xenwiki/Xen/PPC

42) http://about-virtualization.com/mambo

43) PID virtualization: a wealth of choices http://lwn.net/Articles/171017/?format=printable

44) The Xen Hypervisor and its IO Subsystem: http://www.mulix.org/lectures/xen-iommu/xen-io.pdf

45) G. J. Popek and R. P. Goldberg, Formal requirements for virtualizable third generation architectures, Commun. ACM, vol. 17, no. 7, pp. 412 421, 1974. http://www.cis.upenn.edu/~cis700-6/04f/papers/popek-goldberg-requirements.pdf

46) “Running multiple operating systems concurrently on an IA32 PC using virtualization techniques” by Kevin Lawton (1999). http://www.floobydust.com/virtualization/lawton_1999.txt

47) Automating Xen Virtual Machine Deployment (talks about integrating SystemImager with Xen and more) by Kris Buytaert http://howto.x-tend.be/AutomatingVirtualMachineDeployment/

48)Virtualizing servers with Xen Evaldo Gardenali VI International Conference of Unix at UNINET http://umeet.uninet.edu/umeet2005/talks/evaldoa/xen.pdf

49)Survey of System Virtualization Techniques Robert Rose March 8, 2004 http://www.robertwrose.com/vita/rose-virtualization.pdf



51) Interview on Xen with NetBSD developer Manuel Bouyer http://ezine.daemonnews.org/200602/xen.html

52) netbsd xen mailing list: http://www.netbsd.org/MailingLists/#port-xen

53) NetBSD/xen Howto http://www.netbsd.org/Ports/xen/howto.html

54) “C” API for Xen (LGPL) By Daniel Veillard and others


55) Fraser Campbell page: http://goxen.org/

56) Another page from Fraser Campbell : http://linuxvirtualization.com/

57) Virtualization blog

58)Hardware emulation with QEMU (article)

59) http://linuxemu.retrofaction.com

60) Linux Virtualization with Xen http://www.linuxdevcenter.com/pub/a/linux/2006/01/26/xen.html

61) The virtues of Xen by Alex Maier http://www.redhat.com/magazine/014dec05/features/xen/

62) Deploying VirtualMachines as Sandboxes for the Grid Sriya Santhanam, Pradheep Elango, Andrea Arpaci Dusseau, Miron Livny http://www.cs.wisc.edu/~pradheep/SandboxingWorlds05.pdf

63) article: Xen and the new processors:


64) Infiniband (Smart IO) wiki page http://wiki.xensource.com/xenwiki/XenSmartIO hg repository: http://xenbits.xensource.com/ext/xen-smartio.hg

65) Novell Infiniband and virtualization, Patrick Mullaney , may 1, 2007: http://www.openfabrics.org/archives/spring2007sonoma/Monday%20April%2030/Novell%20xen-ib-presentation-sonoma.ppt

66) A Case for High Performance Computing with Virtual Machines Wei Huangy, Jiuxing Liuz, Bulent Abaliz et al. http://nowlab.cse.ohio-state.edu/publications/conf-papers/2006/huangwei-ics06.pdf

67) High Performance VMM-Bypass I/O in Virtual Machines Wei Huangy, Jiuxing Liuz, Bulent Abaliz et al. (usenix 06) http://nowlab.cse.ohio-state.edu/publications/conf-papers/2006/usenix06.pdf

68) User Mode Linux , a book By Jeff Dike. Bruce Perens’ Open Source Series. Published: Apr 12, 2006; http://www.phptr.com/title/0131865056

69) Xen 3.0.3 features,schedule : http://lists.xensource.com/archives/html/xen-devel/2006-06/msg00390.html

70) Practical Taint-Based Protection using Demand Emulation Alex Ho, Michael Fetterman, Christopher Clark et al. http://www.cs.kuleuven.ac.be/conference/EuroSys2006/papers/p29-ho.pdf

71) http://stateless.geek.nz/2006/09/11/current-virtualisation-hardware/ Current Virtualisation Hardware by Nicholas Lee

72) RAID: Installing Xen with RAID 1 on a Fedora Core 4 x86 64 SMP machine: http://www.freax.be/wiki/index.php/Installing_Xen_with_RAID_1_on_a_Fedora_Core_4_x86_64_SMP_machine

73) RAID 1 and Xen (dom0) : (On Debian) http://wiki.kartbuilding.net/index.php/RAID_1_and_Xen_(dom0)

74) OpenVZ Virtualization Software Available for Power Processors http://lwn.net/Articles/204275/

75) Kernel-based Virtual Machine patchset (Avi Kivity) adding /dev/kvm which exposes the virtualization capabilities to userspace. http://lwn.net/Articles/205580

76) Intel Technology Journal: Intel Virtualization Technology: articles by Intel Staff (96 pages) http://download.intel.com/technology/itj/2006/v10i3/v10_iss03.pdf

77) kvm site: (Avi Kivity and others) Includes a howto and a white paper, download , faq sections. http://kvm.sourceforge.net/

78) kvm on debian: http://people.debian.org/~baruch/kvm http://baruch.ev-en.org/blog/Debian/kvm-in-debian

79) Linux Virtualization Wiki

80) ” New virtualisation system beats Xen to Linux kernel” (about kvm) http://www.techworld.com/opsys/news/index.cfm?newsID=7586&pagtype=all

81) article about kvm: http://linux.inet.hr/finally-user-friendly-virtualization-for-linux.html

82) Virtual Linux: An overview of virtualization methods, architectures, and implementations. An article by M. Tim Jones (author of “GNU/Linux Application Programming”, “AI Application Programming”, and “BSD Sockets Programming from a Multilanguage Perspective”). http://www-128.ibm.com/developerworks/library/l-linuxvirt/index.html

83) Lguest: The Simple x86 Hypervisor by Rusty Russell (formerly lhype) http://lguest.ozlabs.org/

84) “Infrastructure virtualisation with Xen advisory”, a wiki article: using iSCSI for Xen clustering; shared storage http://docs.solstice.nl/index.php/Infrastructure_virtualisation_with_Xen_advisory

85) Xen with DRBD, GNBD and OCFS2 HOWTO http://xenamo.sourceforge.net/

86) Virtualization with Xen(tm): Including Xenenterprise, Xenserver, and Xenexpress (Paperback) by David E Williams Syngress Publishing (April 1, 2007) # ISBN-10: 1597491675 # ISBN-13: 978-1597491679 (Author)http://www.amazon.com/Virtualization-Xen-Including-Xenenterprise-Xenexpress/dp/1597491675/ref=pd_bbs_sr_2/103-3574046-3264617?ie=UTF8&s=books&qid=1173899913&sr=8-2

Paperback: 512 pages

87) Professional XEN Virtualization (Paperback) by William von Hagen (Author)

# Paperback: 500 pages # Publisher: Wrox (August 27, 2007) # Language: English # ISBN-10: 0470138114 # ISBN-13: 978-0470138113

88) Xen and the Art of Consolidation Tom Eastep Linuxfest NW. April 29, 2007. http://www.shorewall.net/Linuxfest-2007.pdf

89) Optimizing Network Virtualization in Xen

By Willy Zwaenepoel, Alan L. Cox, Aravind Menon, usenix 2006 http://www.usenix.org/events/usenix06/tech/menon/menon_html/paper.html

90) Virtual Machine Checkpointing

Brendan Cully,University of British Columbia with Andrew Warfield, University of Cambridge


Adding new device and triggering the probe() functions

The following is a simple example which shows how to add a new device and trigger the probe() function of a backend driver using xenstore-write tool. This is relevant for Xen 3.1

Currently in Xen, triggering of the probe() method in a backend driver or a frontend driver is done by writing some values to the xenstore into directories where the xenbus poses watches. This writing to the xenstore is currently done in Xen from the python code, and it is wrapped deep inside the xend and/or xm commands. Eventually it is done in the writeDetails method of the DevController class. (And both blkif and netif use it).

For those who want to be able to trigger the probe() function without diving too deeply into the python code, this should suffice.

For the purposes of this little tutorial, let’s assume that you have built and installed Xen 3.1 from source and have used it to fire up a guest domain at least once. After you’ve done that, let’s say we want to add a new device. We will add a device named “mydevice”. Let’s begin with the backend. For this purpose, we will add a directory named “deviceback” to linux-2.6-sparse/drivers/xen. This directory will store the backend portion of our driver.

First, create linux-2.6-sparse/drivers/xen/deviceback. Next, add the following four files to that directory: deviceback.c, xenbus.c, common.h and Makefile.

Here is a minimal skeleton implementation of these files:


// deviceback.c
#include <linux/module.h>
#include "common.h"

static int __init deviceback_init(void)
{
        /* register the backend driver with xenbus */
        device_xenbus_init();
        return 0;
}

static void deviceback_cleanup(void)
{
}

module_init(deviceback_init);
module_exit(deviceback_cleanup);


// xenbus.c
#include <xen/xenbus.h>
#include <linux/module.h>
#include <linux/slab.h>
#include "common.h"

struct backendinfo
{
        struct xenbus_device* dev;
        long int frontend_id;
        struct xenbus_watch backend_watch;
        struct xenbus_watch watch;
        char* frontpath;
};

static int device_probe(struct xenbus_device* dev,
                        const struct xenbus_device_id* id)
{
        struct backendinfo* be;

        be = kmalloc(sizeof(*be),GFP_KERNEL);
        if (!be)
                return -ENOMEM;
        be->dev = dev;
        printk("Probe fired!\n");
        return 0;
}

static int device_uevent(struct xenbus_device* xdev,
                          char** envp, int num_envp,
                          char* buffer, int buffer_size)
{
        return 0;
}

static int device_remove(struct xenbus_device* dev)
{
        return 0;
}

static struct xenbus_device_id device_ids[] =
{
        { "mydevice" },
        { "" }
};

static struct xenbus_driver deviceback =
{
        .name    = "mydevice",
        .owner   = THIS_MODULE,
        .ids     = device_ids,
        .probe   = device_probe,
        .remove  = device_remove,
        .uevent  = device_uevent,
};

void device_xenbus_init(void)
{
        /* sets the xenbus watch that later triggers device_probe() */
        xenbus_register_backend(&deviceback);
}


// common.h
#ifndef COMMON_H
#define COMMON_H
void device_xenbus_init(void);
#endif


# Makefile
obj-y += xenbus.o deviceback.o

Next, we should add our new backend device to the Makefile in linux-2.6-sparse/drivers/xen/Makefile. Add the following line to the bottom of that file:

obj-y += deviceback/

This will make sure that it will be included in the build.

Next, we need to add symlinks from linux-2.6-sparse/drivers/xen/deviceback into linux-2.6.18-xen/drivers/xen/deviceback:

  1. Create linux-2.6.18-xen/drivers/xen/deviceback
  2. Change into that directory
  3. Add the symlinks: ‘ln -s ../../../../linux-2.6-xen-sparse/./drivers/xen/deviceback/./* .’

Now we should build the new drivers and reboot with the new Xen image. You can do this by going back to the root directory of the source tree (the place where you typed ‘make world’ when doing your normal build before) and do ‘make install-kernels’. (Note: This will overwrite the previous Xen kernel!) Finally, reboot.

After the machine boots back up, go ahead and start a guest domain and you should notice that device_probe() does not get executed. (Check /var/log/syslog on dom0 to look for the printk() to show up.)

How can we trigger the probe() function of our backend driver? We just need to write the correct key/value pairs into the xenstore.

The call to xenbus_register_backend() in xenbus.c causes xenbus to set a watch on local/domain/0/backend/mydevice in the xenstore. Specifically, anytime anything is written into that location in the store the watch fires and checks for a specific set of key/value pairs that indicate the probe should be fired.

So performing the following 4 calls using xenstore-write will trigger our probe() function. Change the X with the ID of a running guest domain. (Check ‘xm list’ for this. If you’ve only started one guest, this number is probably 1.)

xenstore-write /local/domain/X/device/mydevice/0/state 1
xenstore-write /local/domain/0/backend/mydevice/X/0/frontend-id X
xenstore-write /local/domain/0/backend/mydevice/X/0/frontend /local/domain/X/device/mydevice/0
xenstore-write /local/domain/0/backend/mydevice/X/0/state 1

You should see the probe message appear in Dom0’s /var/log/syslog. What happened behind the scenes, without going too deep, is that xenbus_register_backend() put a watch on the backend/mydevice directory under /local/domain/0 in the xenstore. Once frontend, frontend-id, and state have all been written to the watched location, the xenbus driver gathers all of that information, as well as the state of the frontend driver (written in that first line), and uses it to set up the appropriate data structures. From there, the probe() function is finally fired.
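The layout of the four keys is easier to see when the two directory paths are built in one place. The sketch below is only an illustration of the path scheme used in the commands above; the helper names backend_path, frontend_path and print_probe_commands are invented here and are not part of the Xen tools.

```c
#include <stdio.h>

/* Backend directory in dom0's part of the store for device 'dev' of guest 'domid'. */
int backend_path(char *buf, size_t len, const char *dev, int domid)
{
	return snprintf(buf, len, "/local/domain/0/backend/%s/%d/0", dev, domid);
}

/* Frontend directory in the guest's part of the store. */
int frontend_path(char *buf, size_t len, const char *dev, int domid)
{
	return snprintf(buf, len, "/local/domain/%d/device/%s/0", domid, dev);
}

/* Prints the four xenstore-write commands in the order used above. */
void print_probe_commands(const char *dev, int domid)
{
	char back[256], front[256];

	backend_path(back, sizeof back, dev, domid);
	frontend_path(front, sizeof front, dev, domid);
	printf("xenstore-write %s/state 1\n", front);
	printf("xenstore-write %s/frontend-id %d\n", back, domid);
	printf("xenstore-write %s/frontend %s\n", back, front);
	printf("xenstore-write %s/state 1\n", back);  /* backend state goes last */
}
```

For example, print_probe_commands("mydevice", 1) prints the same four commands as above with X replaced by 1.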

Adding a frontend device

For this purpose,we will add a directory named “devicefront” to linux-2.6-sparse/drivers/xen.

We will create 2 files there: devicefront.c and Makefile.

We will also add directories and symlinks as we did in the deviceback case.


The Makefile will be:

obj-y := devicefront.o


The devicefront.c will be (a minimalist implementation):

// devicefront.c
#include <xen/xenbus.h>
#include <linux/module.h>
#include <linux/list.h>

struct device_info
{
        struct list_head list;
        struct xenbus_device* xbdev;
};

static int devicefront_probe(struct xenbus_device* dev,
                             const struct xenbus_device_id* id)
{
        printk("Frontend Probe Fired!\n");
        return 0;
}

static struct xenbus_device_id devicefront_ids[] =
{
        { "mydevice" },
        { "" }
};

static struct xenbus_driver devicefront =
{
        .name  = "mydevice",
        .owner = THIS_MODULE,
        .ids   = devicefront_ids,
        .probe = devicefront_probe,
};

static int devicefront_init(void)
{
        return xenbus_register_frontend(&devicefront);
}
module_init(devicefront_init);

We should also remember to add the following to the Makefile under linux-2.6-sparse/drivers/xen:

obj-y   += devicefront/

Getting the frontend driver to fire is a bit more complicated. The following bash script should help you:

#!/bin/bash
if [ $# != 2 ]
then
        echo "Usage: $0 <device name> <frontend-id>"
else
        # Write backend information into the location the frontend will look
        # for it.
        xenstore-write /local/domain/${2}/device/${1}/0/backend-id 0
        xenstore-write /local/domain/${2}/device/${1}/0/backend \
                       /local/domain/0/backend/${1}/${2}/0
        # Write frontend information into the location the backend will look
        # for it.
        xenstore-write /local/domain/0/backend/${1}/${2}/0/frontend-id ${2}
        xenstore-write /local/domain/0/backend/${1}/${2}/0/frontend \
                       /local/domain/${2}/device/${1}/0
        # Set the permissions on the backend so that the frontend can
        # actually read it.
        xenstore-chmod /local/domain/0/backend/${1}/${2}/0 r
        # Write the states.  Note that the backend state must be written
        # last because it requires a valid frontend state to already be
        # written.
        xenstore-write /local/domain/${2}/device/${1}/0/state 1
        xenstore-write /local/domain/0/backend/${1}/${2}/0/state 1
fi

Here’s how to use it:

  • Startup a Xen guest that contains your frontend driver, and be sure dom0 contains the backend driver.
  • Figure out the frontend-id for the guest. This is the ID field when running xm list. Let’s say that number is 3.
  • Run the script as so: ./probetest.sh mydevice 3

That should fire the probe() functions of both the backend and the frontend driver. (You’ll have to check /var/log/messages in the guest to verify that the frontend probe was fired.)






Memory Leak Detection in C++

Posted on April 10, 2009. Filed under: C/C++, Programming |


Memory Leak Detection in C++ under linux
dmalloc, ccmalloc, NJAMD, YAMD, Valgrind, mpatrol, Insure ++


C/C++ Memory Corruption And Memory Leaks
This tutorial will discuss examples of memory leaks and code constructs which lead to memory corruption


Memory Leak Detection and Isolation in Windows


Some multithreading tutorial URLs

Posted on June 6, 2007. Filed under: C/C++, Linux, Programming, Windows | Tags: , , , , , |


POSIX Threads Tutorial


POSIX thread (pthread) libraries



Multithreading Tutorial(Mutex, deadlock, livelock, scoped lock, Read/Write lock, Monitor Object, Active Object, boost) — Good!


Multithreading Tutorial (mainly on Win32)


C++/CLI Threading

Part I:


Part II:


Threads in C++



Unix Daemon Server Programming

Posted on November 2, 2006. Filed under: C/C++, Linux, Services |


Unix processes work either in the foreground or in the background. A process running in the foreground interacts with the user at the terminal (it performs I/O), whereas a background process runs by itself. The user can check its status but doesn’t (need to) know what it is doing. The term ‘daemon’ is used for processes that perform a service in the background. A server is a process that begins execution at startup (not necessarily), runs forever, usually does not die or get restarted, operates in the background, waits for requests to arrive, responds to them, and frequently spawns other processes to handle these requests.

Readers are supposed to know Unix fundamentals and the C language. For further description of any topic, use the “man” command (I write useful keywords in brackets); it has always been very useful, trust me :)) Keep in mind that this document does not contain everything; it is just a guide.

1) Daemonizing (programming to operate in background) [fork]

First, the fork() system call is used to create a copy of our process (the child); the parent then exits. The orphaned child becomes a child of the init process (the initial system process, in other words the parent of all processes). As a result, our process is completely detached from its parent and starts operating in the background.

	i=fork();
	if (i<0) exit(1); /* fork error */
	if (i>0) exit(0); /* parent exits */
	/* child (daemon) continues */

2) Process Independency [setsid]

A process receives signals from the terminal that it is connected to, and each process inherits its parent’s controlling tty. A server should not receive signals from the process that started it, so it must detach itself from its controlling tty.

In Unix systems, processes operate within a process group, so that all processes within a group are treated as a single entity. The process group and session are also inherited. A server should operate independently of other processes.

	setsid(); /* obtain a new process group */

This call places the server in a new process group and session and detaches it from its controlling terminal. (setpgrp() is an alternative for this.)
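The effect of fork() followed by setsid() can be checked with a small experiment. This is only a sketch and not part of the sample daemon; the function name child_becomes_session_leader is made up for illustration. The child calls setsid() and reports back through a pipe whether it is now the leader of its own session and process group.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a forked child leads its own session and process group
   after setsid(), 0 if it does not, -1 on error. */
int child_becomes_session_leader(void)
{
	int fds[2];
	char ok = 0;
	ssize_t n;
	pid_t pid;

	if (pipe(fds) < 0) return -1;
	pid = fork();
	if (pid < 0) { close(fds[0]); close(fds[1]); return -1; }
	if (pid == 0) {                       /* child */
		close(fds[0]);
		if (setsid() != (pid_t)-1 &&
		    getsid(0) == getpid() && getpgrp() == getpid())
			ok = 1;
		if (write(fds[1], &ok, 1) != 1) _exit(1);
		_exit(0);
	}
	close(fds[1]);                        /* parent */
	n = read(fds[0], &ok, 1);
	close(fds[0]);
	waitpid(pid, NULL, 0);
	return n == 1 ? ok : -1;
}
```

The fork() matters here: setsid() fails if the caller is already a process group leader, which a freshly forked child never is.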

3) Inherited Descriptors and Standard I/O Descriptors [getdtablesize,fork,open,close,dup,stdio.h]

Open descriptors are inherited by the child process, which may tie up resources unnecessarily. Unnecessary descriptors should either be closed before the fork() system call (so that they are not inherited), or all open descriptors should be closed as soon as the child process starts running.

	for (i=getdtablesize();i>=0;--i) close(i); /* close all descriptors */

There are three standard I/O descriptors: standard input ‘stdin’ (0), standard output ‘stdout’ (1), and standard error ‘stderr’ (2). A standard library routine may read from or write to standard I/O, and that I/O may go to a terminal or a file. For safety, these descriptors should be opened and connected to a harmless I/O device (such as /dev/null).

	i=open("/dev/null",O_RDWR); /* open stdin */
	dup(i); /* stdout */
	dup(i); /* stderr */

As Unix assigns descriptors sequentially, the open call will open stdin (all descriptors were closed, so it gets descriptor 0) and the dup calls will provide copies for stdout and stderr.
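The "lowest free number" rule that the open/dup/dup trick relies on can be observed directly. The sketch below is an illustration only, with invented function names. It also shows dup2(), a different technique from the one used in this text, which points a chosen descriptor at /dev/null without depending on numbering.

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if open() hands out the same (lowest free) number again
   after it has been closed, 0 if not, -1 on error. */
int lowest_descriptor_is_reused(void)
{
	int first, second;

	first = open("/dev/null", O_RDWR);
	if (first < 0) return -1;
	close(first);                 /* 'first' is the lowest free number again */
	second = open("/dev/null", O_RDWR);
	if (second < 0) return -1;
	close(second);
	return first == second;
}

/* Points an already open descriptor at /dev/null, whatever its number. */
int redirect_to_devnull(int fd)
{
	int nullfd = open("/dev/null", O_RDWR);

	if (nullfd < 0) return -1;
	if (dup2(nullfd, fd) < 0) { close(nullfd); return -1; }
	if (nullfd != fd) close(nullfd);
	return 0;
}
```

A daemon could call redirect_to_devnull(0), redirect_to_devnull(1) and redirect_to_devnull(2) in place of the open/dup/dup sequence; the result is the same when all descriptors were closed beforehand.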

4) File Creation Mask [umask]

Most servers run as super-user, so for security reasons they should protect the files they create. Setting the user mask prevents insecure file privileges from being granted on file creation.

	umask(027);

This restricts the permissions of newly created files to at most 750 (the complement of 027).
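The mask only removes permission bits; it never adds any. So a file requested with mode 666 under umask 027 ends up as 640, not 750. The following sketch (function names and the temporary file path are made up for illustration) shows both the arithmetic and the effect on a real file.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Permission bits a new file gets: the requested bits minus the masked bits. */
mode_t effective_mode(mode_t requested, mode_t mask)
{
	return requested & ~mask & 0777;
}

/* Creates 'path' under the given umask and returns its permission bits,
   or (mode_t)-1 on error.  The file is removed again afterwards. */
mode_t created_mode(const char *path, mode_t requested, mode_t mask)
{
	struct stat st;
	mode_t old, result = (mode_t)-1;
	int fd;

	unlink(path);                 /* start from a clean slate */
	old = umask(mask);
	fd = open(path, O_WRONLY | O_CREAT | O_EXCL, requested);
	umask(old);                   /* restore the caller's mask */
	if (fd < 0) return result;
	if (fstat(fd, &st) == 0) result = st.st_mode & 0777;
	close(fd);
	unlink(path);
	return result;
}
```

For example, effective_mode(0777, 027) is 0750 and effective_mode(0666, 027) is 0640.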

5) Running Directory [chdir]

A server should run in a known directory. There are many advantages to this; in fact, the opposite has many disadvantages: suppose that our server is started in a user’s home directory, then it may not be able to find some of its input and output files.

	chdir("/");

The root “/” directory may not be appropriate for every server; the directory should be chosen carefully depending on the type of server.

6) Mutual Exclusion and Running a Single Copy [open,lockf,getpid]

Most services require that only one copy of a server runs at a time. File locking is a good solution for mutual exclusion. The first instance of the server locks the file so that other instances can tell that an instance is already running. If the server terminates, the lock is released automatically so that a new instance can run. Recording the pid of the running instance in the lock file is a good idea: it is surely more efficient to run ‘cat mydaemon.lock’ than ‘ps -ef|grep mydaemon’.

	lfp=open("mydaemon.lock",O_RDWR|O_CREAT,0640);
	if (lfp<0) exit(1); /* can not open */
	if (lockf(lfp,F_TLOCK,0)<0) exit(0); /* can not lock */
	/* only first instance continues */

	sprintf(str,"%d\n",getpid());
	write(lfp,str,strlen(str)); /* record pid to lockfile */

7) Catching Signals [signal,sys/signal.h]

A process may receive signals from a user or another process; it’s best to catch those signals and behave accordingly. Child processes send the SIGCHLD signal when they terminate, and the server process must either ignore or handle these signals. Some servers also use the hang-up signal to restart the server, and it is a good idea to rehash with a signal. Note that the ‘kill’ command sends SIGTERM (15) by default, and that the SIGKILL (9) signal can not be caught.

	signal(SIGCHLD,SIG_IGN); /* child terminate signal */

The above code ignores the child terminate signal (on BSD systems parents should wait for their children, so this signal should be caught to avoid zombie processes), and the code below demonstrates how to catch signals.

	void Signal_Handler(int sig) /* signal handler function */
	{
		switch(sig){
			case SIGHUP:
				/* rehash the server */
				break;
			case SIGTERM:
				/* finalize the server */
				exit(0);
				break;
		}
	}

	signal(SIGHUP,Signal_Handler); /* hangup signal */
	signal(SIGTERM,Signal_Handler); /* software termination signal from kill */

First we construct a signal handling function and then tie the signals to that function.

8) Logging [syslogd,syslog.conf,openlog,syslog,closelog]

A running server produces messages; naturally, some are important and should be logged. A programmer wants to see debug messages, while a system operator wants to see error messages. There are several ways to handle those messages.

Redirecting all output to standard I/O: This is what ancient servers do. They use stdout and stderr so that messages are written to the console, a terminal, or a file, or printed on paper. I/O is redirected when starting the server (to change the destination, the server must be restarted). In fact, this kind of server is a program running in the foreground (not a daemon).

	# mydaemon 2> error.log

This example is a program that prints output (stdout) messages to the console and error (stderr) messages to a file named “error.log”. Note that this is not a daemon but a normal program.

Log file method: All messages are logged to files (to different files as needed). There is a sample logging function below.

	void log_message(char *filename, char *message)
	{
	FILE *logfile;
		logfile=fopen(filename,"a"); /* append to the log */
		if(!logfile) return;
		fprintf(logfile,"%s\n",message);
		fclose(logfile);
	}

	log_message("conn.log","connection accepted");
	log_message("error.log","can not open file");

Log server method : A more flexible logging technique is using log servers. Unix distributions have system log daemon named “syslogd”. This daemon groups messages into classes (known as facility) and these classes can be redirected to different places. Syslog uses a configuration file (/etc/syslog.conf) that those redirection rules reside in.

	openlog("mydaemon",LOG_PID,LOG_DAEMON);
	syslog(LOG_INFO, "Connection from host %s", callinghostname);
	syslog(LOG_ALERT, "Database Error !");
	closelog();

In the openlog call, “mydaemon” is a string that identifies our daemon, LOG_PID makes syslogd log the process id with each message, and LOG_DAEMON is the message class. In the syslog call, the first parameter is the priority and the rest works like printf/sprintf. There are several message classes (or facility names), log options and priority levels. Here are some examples:
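A small sketch of how facility, option and priority fit together, assuming the daemon only wants messages of priority LOG_NOTICE or more severe to reach syslogd. The function names open_daemon_log and log_examples are invented for this example; openlog, setlogmask, syslog and closelog are the standard calls.

```cpp
#include <syslog.h>

// Open the log for "mydaemon" and keep only LOG_NOTICE and more severe messages.
// Returns the priority mask that is in effect afterwards.
int open_daemon_log()
{
    openlog("mydaemon", LOG_PID, LOG_DAEMON);  // identity, option, facility
    setlogmask(LOG_UPTO(LOG_NOTICE));          // drop LOG_INFO and LOG_DEBUG
    return setlogmask(0);                      // a mask of 0 only queries the current mask
}

// Send two messages with different priorities, then close the log.
void log_examples(const char *host)
{
    syslog(LOG_INFO, "Connection from host %s", host);  // filtered out by the mask
    syslog(LOG_ALERT, "Database Error !");              // passed on to syslogd
    closelog();
}
```

Where the LOG_DAEMON messages finally end up is still decided by the rules in /etc/syslog.conf, not by the program.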



This text is written by Levent Karakas <levent at mektup dot at >. Several books, sources and manual pages are used. This text includes a sample daemon program (compiles on Linux 2.4.2, OpenBSD 2.7, SunOS 5.8, SCO-Unix 3.2 and probably on your flavor of Unix). You can also download plain source file : exampled.c. Hope you find this document useful. We do love Unix.

UNIX Daemon Server Programming Sample Program
Levent Karakas <levent at mektup dot at> May 2001

To compile:	cc -o exampled exampled.c
To run:		./exampled
To test daemon:	ps -ef|grep exampled (or ps -aux on BSD systems)
To test log:	tail -f /tmp/exampled.log
To test signal:	kill -HUP `cat /tmp/exampled.lock`
To terminate:	kill `cat /tmp/exampled.lock`

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>
#include <sys/stat.h>

#define RUNNING_DIR	"/tmp"
#define LOCK_FILE	"exampled.lock"
#define LOG_FILE	"exampled.log"

/* append one line to the given log file */
void log_message(const char *filename, const char *message)
{
FILE *logfile;
	logfile=fopen(filename,"a");
	if(!logfile) return;
	fprintf(logfile,"%s\n",message);
	fclose(logfile);
}

void signal_handler(int sig)
{
	switch(sig) {
	case SIGHUP:
		log_message(LOG_FILE,"hangup signal catched");
		break;
	case SIGTERM:
		log_message(LOG_FILE,"terminate signal catched");
		exit(0);
		break;
	}
}

void daemonize(void)
{
int i,lfp;
char str[16];
	if(getppid()==1) return; /* already a daemon */
	i=fork();
	if (i<0) exit(1); /* fork error */
	if (i>0) exit(0); /* parent exits */
	/* child (daemon) continues */
	setsid(); /* obtain a new process group */
	for (i=getdtablesize();i>=0;--i) close(i); /* close all descriptors */
	i=open("/dev/null",O_RDWR); dup(i); dup(i); /* handle standard I/O */
	umask(027); /* set newly created file permissions */
	chdir(RUNNING_DIR); /* change running directory */
	lfp=open(LOCK_FILE,O_RDWR|O_CREAT,0640);
	if (lfp<0) exit(1); /* can not open */
	if (lockf(lfp,F_TLOCK,0)<0) exit(0); /* can not lock */
	/* first instance continues */
	snprintf(str,sizeof(str),"%d\n",(int)getpid());
	write(lfp,str,strlen(str)); /* record pid to lockfile */
	signal(SIGCHLD,SIG_IGN); /* ignore child */
	signal(SIGTSTP,SIG_IGN); /* ignore tty signals */
	signal(SIGTTOU,SIG_IGN);
	signal(SIGTTIN,SIG_IGN);
	signal(SIGHUP,signal_handler); /* catch hangup signal */
	signal(SIGTERM,signal_handler); /* catch kill signal */
}

int main(void)
{
	daemonize();
	while(1) sleep(1); /* run */
}

/* EOF */
1. Compile
cc -o exampled exampled.c
2. Run
./exampled
3. Testing daemon process
ps -ef|grep exampled (or ps -aux on BSD systems)
4. Testing log
tail -f /tmp/exampled.log
5. Testing signal
kill -HUP `cat /tmp/exampled.lock`
6. Kill
kill `cat /tmp/exampled.lock`

Slicing in C++

Posted on August 8, 2005. Filed under: C/C++

Suppose that class D is derived from class C. We can think of D as class C with some extra data and methods. In terms of data, D has all the data that C has, and possibly more. In terms of methods, D cannot remove any methods of C, and may have additional methods. In terms of existing methods of C, the only thing that D can do is to override them with its own versions.

If x is an object of class D, then we can slice x with respect to C, by throwing away all of the extensions that made x a D, and keeping only the C part. The result of the slicing is always an object of class C.

Design Principle: Slicing an object with respect to a parent class C should still produce a well-formed object of class C.

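Usage Warning: Even though D is-a C, you must be careful. If a parameter has type C and you supply a D, call by value slices the object: the function receives a copy that is only a C. With call by pointer or reference the object itself stays a complete D, but non-virtual member functions are still chosen by the static type C, so for those calls the object behaves as if it were sliced. See the example below.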

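Note on virtual functions. Their signatures are used to identify which one to execute: a function in D overrides a virtual function of C only if its signature matches exactly. In the example below, D::Assign(const D&) takes a different parameter type than C::Assign(const C&), so it does not override it.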
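To make the signature rule concrete, here is a small sketch with made-up classes Base, Hider and Overrider (they are not part of slice.h). It assumes a C++11 compiler, so that the override keyword can be used to let the compiler check the signature.

```cpp
#include <string>

struct Base {
    virtual ~Base() {}
    virtual std::string Assign(const Base&) { return "Base::Assign"; }
};

// Different parameter type: this hides Base::Assign instead of overriding it.
struct Hider : Base {
    std::string Assign(const Hider&) { return "Hider::Assign"; }
};

// Same signature: a true override. The override keyword makes the compiler verify that.
struct Overrider : Base {
    std::string Assign(const Base&) override { return "Overrider::Assign"; }
};

// Calls Assign through a Base reference, the way a sliced operator= would.
std::string assign_through_base(Base& lhs, const Base& rhs)
{
    return lhs.Assign(rhs);  // virtual dispatch on the dynamic type of lhs
}
```

Called with a Hider as lhs, assign_through_base runs Base::Assign, because Hider never overrode it. Called with an Overrider, it runs Overrider::Assign.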

Watch out for the sliced = operator; it can make the lhs inconsistent. Also, operator= is normally not virtual: the derived version takes a different parameter type, so it would not override the base version anyway. For example, suppose classes A, B are both subclasses of class C. Just because an A is a C, and a B is a C, it doesn’t mean you can assign a B object to an A object. Without run-time type information you cannot make a safe assignment.

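Usage Warning: If you ever change the size of a class, it, all its derived classes and all code that uses them by value must be recompiled! Code that only reaches the object through pointers or references and virtual functions is less exposed, because it does not depend on the object's size.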

13.2 slice.h


#ifndef SLICE_H
#define SLICE_H
#include <iostream>
using std::cout;
class C {
public:
    explicit C(int initial = 0): id(initial), d1(100+initial) {}
    C(const C& arg): id(1000+arg.id), d1(arg.d1) {}
    virtual ~C()    {
        cout << "object " << id << " dying as a C\n";
    }
    void operator=(const C& rhs) {
        cout << "Object " << id << " doing C =\n";
        Assign(rhs);
    }
    virtual void Assign(const C& rhs) {
        cout << "Assign C\n";
        d1 = rhs.d1;
    }
    void Ident() {
        cout << "Object " << id 
            << " thinks it is a C object, d1="
            << d1 << "\n";
    }
    virtual void VirtIdent() {
        cout << "Object " << id << 
            " virtually thinks it is a C object, d1="
            << d1 << "\n";
    }
protected:
    int id;
    int d1;
};
class D: public C {
public:
    explicit D(int initial = 0): C(initial), d2(200+initial) {}
    D(const D& arg): C(arg.id), d2(arg.d2) {}
    ~D() {
        cout << "object " << id << " dying as a D\n";
    }
    void operator=(const D& rhs) {
        cout << "Object " << id << " doing D =\n";
        Assign(rhs);
    }
    // Different parameter type: does not override C::Assign
    void Assign(const D& rhs) {
        cout << "Assign in D\n";
        d2 = rhs.d2;
    }
    // This version of Ident hides the parent's, 
    // if we are not sliced!
    void Ident() {
        cout << "Object " << id << 
            " thinks it is a D object, d1=" 
            << d1 << " d2=" << d2 << "\n";
    }
    void VirtIdent() {
        cout << "Object " << id << 
            " virtually thinks it is a D object, d1=" 
            << d1 << " d2=" << d2 << "\n";
    }
protected:
    int d2;
};
#endif


13.3 demo.cpp


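#include <iostream>
#include "slice.h"
void SliceIt1(C arg) {          // by value: arg is a sliced copy
    cout << "Inside SliceIt1: ";
    arg.Ident(); arg.VirtIdent();
}
void SliceIt2(C* arg) {         // by pointer
    cout << "Inside SliceIt2: ";
    arg->Ident(); arg->VirtIdent();
}
void SliceIt3(C& arg) {         // by reference
    cout << "Inside SliceIt3: ";
    arg.Ident(); arg.VirtIdent();
}
void CopyIt1(C& x, C& y) {
    cout << "Inside CopyIt1:\n";
    x.Ident(); x.VirtIdent();
    y.Ident(); y.VirtIdent();
    x = y;                      // uses C::operator=, even if x is really a D
    x.Ident(); x.VirtIdent();
}
int main(int argc, char* argv[])
{
    D x(1);
    C y(2);
    D z(3);
    x.Ident(); x.VirtIdent();
    y.Ident(); y.VirtIdent();
    z.Ident(); z.VirtIdent();
    cout << "\n";
    SliceIt1(x); SliceIt1(y);
    cout << "\n";
    SliceIt2(&x); SliceIt2(&y);
    cout << "\n";
    SliceIt3(x); SliceIt3(y);
    cout << "\n";
    CopyIt1(x, y);  
    // state of x wrong now
    x.VirtIdent(); x.Ident();
    cout << "\n";
    CopyIt1(x, z);
    z.VirtIdent(); z.Ident();
    cout << "\n";
    return 0;
}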


13.4 output.txt


Object 1 thinks it is a D object, d1=101 d2=201
Object 1 virtually thinks it is a D object, d1=101 d2=201
Object 2 thinks it is a C object, d1=102
Object 2 virtually thinks it is a C object, d1=102
Object 3 thinks it is a D object, d1=103 d2=203
Object 3 virtually thinks it is a D object, d1=103 d2=203
Inside SliceIt1: Object 1001 thinks it is a C object, d1=101
Object 1001 virtually thinks it is a C object, d1=101
object 1001 dying as a C
Inside SliceIt1: Object 1002 thinks it is a C object, d1=102
Object 1002 virtually thinks it is a C object, d1=102
object 1002 dying as a C
Inside SliceIt2: Object 1 thinks it is a C object, d1=101
Object 1 virtually thinks it is a D object, d1=101 d2=201
Inside SliceIt2: Object 2 thinks it is a C object, d1=102
Object 2 virtually thinks it is a C object, d1=102
Inside SliceIt3: Object 1 thinks it is a C object, d1=101
Object 1 virtually thinks it is a D object, d1=101 d2=201
Inside SliceIt3: Object 2 thinks it is a C object, d1=102
Object 2 virtually thinks it is a C object, d1=102
Inside CopyIt1:
Object 1 thinks it is a C object, d1=101
Object 1 virtually thinks it is a D object, d1=101 d2=201
Object 2 thinks it is a C object, d1=102
Object 2 virtually thinks it is a C object, d1=102
Object 1 doing C =
Assign C
Object 1 thinks it is a C object, d1=102
Object 1 virtually thinks it is a D object, d1=102 d2=201
Object 1 virtually thinks it is a D object, d1=102 d2=201
Object 1 thinks it is a D object, d1=102 d2=201
Inside CopyIt1:
Object 1 thinks it is a C object, d1=102
Object 1 virtually thinks it is a D object, d1=102 d2=201
Object 3 thinks it is a C object, d1=103
Object 3 virtually thinks it is a D object, d1=103 d2=203
Object 1 doing C =
Assign C
Object 1 thinks it is a C object, d1=103
Object 1 virtually thinks it is a D object, d1=103 d2=201
Object 3 virtually thinks it is a D object, d1=103 d2=203
Object 3 thinks it is a D object, d1=103 d2=203
object 3 dying as a D
object 3 dying as a C
object 2 dying as a C
object 1 dying as a D
object 1 dying as a C


Here is another example:

class Shape {
public:
	Shape() { };
	virtual void draw() {};
};

class Circle : public Shape {
private:
	int m_iRadius;
public:
	Circle(int iRadius){ m_iRadius = iRadius; }
	virtual void draw() { /*do drawing*/}
	int GetRadius(){ return m_iRadius; }
};

void funcx(Shape shape) { /*do something*/}

Now, what if we try this:

Circle circle(2);
funcx(circle); //Pass a Circle as parameter, when function expects a Shape

Will this result in an error? No. Because Circle is derived from Shape, the compiler-generated copy constructor of Shape is used to build the parameter, and it copies only the members that belong to Shape. The m_iRadius field is lost and there is no way it can be used in funcx.
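The effect is easy to observe. The sketch below reuses the Shape and Circle names, but replaces draw() with a hypothetical name() function that returns a string, purely so the difference can be checked; by_value and by_reference are also invented for this illustration.

```cpp
#include <string>

class Shape {
public:
    virtual ~Shape() {}
    virtual std::string name() const { return "Shape"; }
};

class Circle : public Shape {
public:
    explicit Circle(int iRadius) : m_iRadius(iRadius) {}
    std::string name() const override { return "Circle"; }
    int GetRadius() const { return m_iRadius; }
private:
    int m_iRadius;
};

// By value: the parameter is copy-constructed as a plain Shape (sliced).
std::string by_value(Shape shape) { return shape.name(); }

// By reference: the original Circle is seen through the Shape interface.
std::string by_reference(const Shape& shape) { return shape.name(); }
```

With Circle circle(2), by_value(circle) reports "Shape" while by_reference(circle) reports "Circle".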

Another problem is the loss of type information, which may result in undesirable behavior by virtual functions. This phenomenon is called slicing. It is common in exception handling, when exceptions are caught by value using a base class, either to avoid multiple catch blocks or because the types of the exceptions that may be thrown are unknown.

class CException {};
class CMemoryException : public CException {};
class CFileException: public CException{};

try{ /*do something silly*/ }
catch(CException exception) {/*handle exception*/}
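For exceptions the usual remedy is to catch by reference. A minimal sketch, assuming the exception classes get a virtual what() function (an addition made for this illustration; the classes above are empty):

```cpp
#include <string>

class CException {
public:
    virtual ~CException() {}
    virtual std::string what() const { return "CException"; }
};

class CFileException : public CException {
public:
    std::string what() const override { return "CFileException"; }
};

// Catching by value copies the thrown object into a CException: it is sliced.
std::string catch_by_value()
{
    std::string result;
    try { throw CFileException(); }
    catch (CException exception) { result = exception.what(); }
    return result;
}

// Catching by reference keeps the original object and its dynamic type.
std::string catch_by_reference()
{
    std::string result;
    try { throw CFileException(); }
    catch (const CException& exception) { result = exception.what(); }
    return result;
}
```

Some compilers warn about the catch-by-value handler, which is a fair hint that it is rarely what you want.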

To avoid slicing, change the operations so they use a pointer or reference to the base class object rather than the base class object itself. In the given sample code, funcx can be changed so that it takes a pointer/reference to Shape as a parameter rather than Shape:

void funcx(Shape *shape) { /*do something*/}

Inside the function body, this pointer can be type-cast to a Circle pointer to access Circle-specific information. The cast is only safe if the object really is a Circle, so it should be checked at run time:
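One way to do the checked cast is dynamic_cast, sketched below. The return value of -1 for "not a Circle" is a convention invented for this example, and funcx returns int here only so the result can be inspected.

```cpp
class Shape {
public:
    virtual ~Shape() {}
    virtual void draw() {}
};

class Circle : public Shape {
public:
    explicit Circle(int iRadius) : m_iRadius(iRadius) {}
    void draw() override { /* do drawing */ }
    int GetRadius() const { return m_iRadius; }
private:
    int m_iRadius;
};

// Returns the radius if shape points to a Circle, or -1 otherwise.
int funcx(Shape *shape)
{
    // dynamic_cast yields a null pointer when the object is not a Circle
    Circle *circle = dynamic_cast<Circle *>(shape);
    if (circle == nullptr) return -1;
    return circle->GetRadius();
}
```

dynamic_cast needs at least one virtual function in Shape, which the class already has.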



Some useful perl + XS + C URLs

Posted on February 11, 2005. Filed under: C/C++, Linux, Programming |

The Towers of Hanoi as a Perl Extension (XS) written in C



This module tests the perl C API. Currently tests that printf works correctly



Gluing C++ And Perl Together

Posted on February 10, 2005. Filed under: C/C++, Linux, Programming |

Gluing C++ And Perl Together


August 27, 2001


Perl XS (the Perl native glue) and C++ hookup is not clearly covered in any one reference that I can find. So here’s some coverage.

1. Preparing the Installation

XS totally supports C++. XS is not the problem. Getting MakeMaker happy is the problem. And even that is just a few words. Here are the steps to create a working C++ project from scratch:

  1. Run h2xs -A -n MyPackage
  2. Edit MyPackage/MyPackage.xs (the XS/C++ source) to add extern "C" around the headers:
    #ifdef __cplusplus
    extern "C" {
    #endif
    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"
    #ifdef __cplusplus
    }
    #endif
  3. Put the file perlobject.map (found in Dean Roehrich’s directory) in the MyPackage directory.
  4. Create a MyPackage/typemap file with the following text:
    MyClass *         O_OBJECT
  5. Add C++ directives to MyPackage/Makefile.PL, like so:
    use ExtUtils::MakeMaker;
    $CC = 'g++';
    # See lib/ExtUtils/MakeMaker.pm for details of how to influence
    # the contents of the Makefile that is written.
    WriteMakefile(
        'NAME'              => 'MyPackage',
        'VERSION_FROM'      => 'MyPackage.pm', # finds $VERSION
        'PREREQ_PM'         => {}, # e.g., Module::Name => 1.1
        ($] >= 5.005 ?    ## Add these new keywords supported since 5.005
          (ABSTRACT_FROM => 'MyPackage.pm', # retrieve abstract from module
           AUTHOR        => 'John Keiser <john@johnkeiser.com>') : ()),
        'LIBS'              => [], # e.g., '-lm'
        'DEFINE'            => '', # e.g., '-DHAVE_SOMETHING'
        'CC'                => $CC,
        'LD'                => '$(CC)',
            # Insert -I. if you add *.h files later:
        'INC'               => "", # e.g., '-I/usr/include/other'
            # Un-comment this if you add C files to link with later:
        # 'OBJECT'          => '$(O_FILES)', # link all the C files too
        'XSOPT'             => '-C++',
        'TYPEMAPS'          => ['perlobject.map' ],
    );

That’s all you have to do, basically. You might have a problem if you use the C++ standard library “list” class, so see below if you have that.

What this procedure does: Step 1 creates the actual package. Step 2 makes the Perl headers compile and link correctly. Steps 3 and 4 add the type mapping necessary to get object pointers to map to Perl objects. Step 5 causes MakeMaker (and consequently make) to use C++ to compile and link the program.

1.1 Running the Program

To run the program, just do:

perl Makefile.PL
make test

The examples in the next section are best placed into test.pl, which will automatically run when you do make test. It sets the variables and paths and such correctly. You will have to do make install before you can properly do use MyPackage; in Perl.

2. Programming the C++/Perl Interface

Once you have the files set up (the hard part) you get to do the programming, which is pretty easy. I am not going to go over XS programming here; it’s covered acceptably (though not as thoroughly as one would like) in the references below.

The typemap you copied into the directory handles mapping between Perl scalars and MyClass *. In other words, the Perl object now represents a pointer to your object, without you having to do anything special to make it happen except declare new()! (And you don’t have to do any work in new either; XS handles that.)

2.1 Straight C++

First, you should know that anything you put before that MODULE=… line will be compiled as straight C++. XS will not do any special processing to it, it will go straight up to the C++ compiler the way it is.

You might put a custom class definition here (or you can use a C++ class definition already in a header file). You might put some custom functions here. You might put some global variables here. You might ignore me and skip to the next section on constructors.

2.2 Define Your Class

XS won’t define your class for you. It is expecting a class to already be there. Either include the header file with the class in it or make a class definition in the straight C++ section.

2.3 Constructors/Destructors

This seems like a scary subject until you actually just put MyClass::new() and MyClass::DESTROY() functions into the file. XS does the mapping automatically. It even supports constructor args. Yay!



#ifdef __cplusplus
extern "C" {
#endif
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
#ifdef __cplusplus
}
#endif

#include <iostream>
using std::cout;

class MyClass {
public:
	MyClass(char * my_favorite_argument) {
		cout << "I'm constructin' my bad self ... " << my_favorite_argument << "\n";
	}
	~MyClass() { cout << "Destruction is a way of life for me.\n"; }
};

MODULE = MyPackage          PACKAGE = MyPackage

MyClass *
MyClass::new(char * my_favorite_argument)

void
MyClass::DESTROY()


Note here how MyClass and MyPackage don’t have to have the same name. You can name the Perl package anything you want. XS will map the C++ functions to Perl functions based on the names you give the functions and the most recent PACKAGE directive.


Put this into test.pl at the bottom:

my $x = new MyPackage("Hi.");

This test will output:

I'm constructin' my bad self ... Hi.
Destruction is a way of life for me.

2.4 Member Functions

If you have simple non-list, non-hash return values and arguments, you can write the code for your member function as a member of the class, and just add <ret> MyClass::myFunction(<args>) after the MODULE= line. XS will automagically hook the Perl function up to the class member function and return its return value.



#ifdef __cplusplus
extern "C" {
#endif
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
#ifdef __cplusplus
}
#endif

#include <iostream>
using std::cout;

class MyClass {
public:
	MyClass(char * my_favorite_argument) {
		cout << "I'm constructin' my bad self ... " << my_favorite_argument << "\n";
	}
	~MyClass() { cout << "Destruction is a way of life for me.\n"; }
	int wow() { return 12 / 3; }
};

MODULE = MyPackage          PACKAGE = MyPackage

MyClass *
MyClass::new(char * my_favorite_argument)

void
MyClass::DESTROY()

int
MyClass::wow()




Put this into test.pl at the bottom (remove the previous test):

my $x = new MyPackage("Hi.");
print $x->wow, " is the magic number.\n";

This test will output:

I'm constructin' my bad self ... Hi.
4 is the magic number.
Destruction is a way of life for me.

2.5 XS-Laden Member Functions

If you have to return a list or hash or accept a list or hash argument, or do other funky Perl stuff, you should use PPCODE. See the references below to learn how to code these. The only special thing that happens when you create a C++ member function is you get a MyClass * named THIS. So you can call functions like this: THIS->memberFunc().

Note that when you create a member function with PPCODE you do not have to have a corresponding class member function (though you can if you want, it just won’t be called automatically). The PPCODE is the function.



#ifdef __cplusplus
extern "C" {
#endif
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
#ifdef __cplusplus
}
#endif

#include <iostream>
using std::cout;

class MyClass {
public:
	MyClass(char * my_favorite_argument) {
		cout << "I'm constructin' my bad self ... " << my_favorite_argument << "\n";
	}
	~MyClass() { cout << "Destruction is a way of life for me.\n"; }
	int wow() { return 12 / 3; }
};

MODULE = MyPackage          PACKAGE = MyPackage

MyClass *
MyClass::new(char * my_favorite_argument)

void
MyClass::DESTROY()

int
MyClass::wow()

void
MyClass::wow2()
	PPCODE:
	for(int i=1;i<=10;i++) {
		XPUSHs(sv_2mortal(newSVnv(i * THIS->wow())));
	}


Put this into test.pl at the bottom (remove the previous test):

my $x = new MyPackage("Hi.");
print "Multiples of 4: ", join(", ", $x->wow2()), "\n";

This test will output:

I'm constructin' my bad self ... Hi.
Multiples of 4: 4, 8, 12, 16, 20, 24, 28, 32, 36, 40
Destruction is a way of life for me.

2.6 Member Variables

You’d think you could access member variables from Perl like a hash, right? Not recommended. Only functions are supported, though if you wanted you could create a hash and tie it to the class. Use SWIG and shadow classing if you want to access fields in a hash.
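Since only functions are supported, the simplest route is probably to give the class accessor functions and expose those. A sketch of the C++ side, with a hypothetical m_count member and get_count/set_count accessors that are not part of the original example:

```cpp
class MyClass {
public:
    MyClass() : m_count(0) {}
    // Accessors stand in for direct member access from Perl.
    int  get_count() const    { return m_count; }
    void set_count(int count) { m_count = count; }
private:
    int m_count;  // not reachable from Perl directly
};
```

The accessors would then be declared after the MODULE= line in the same way as wow() in section 2.4, so that Perl code calls $x->get_count instead of reading a hash field.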

3. C++ stdlib and list.h

Perl has a conflict with the C++ standard library: they both define the “list” structure. There are two ways of solving this, depending on whether you are actually interested in using the list structure or not. If you don’t need the list (like if just the headers need it) or you need Perl’s list structure for some reason, you should just include your header files (and/or stdlib) before the standard Perl include files. For example, if you have a header file <myfile.h> that includes <list>, do this:

#include <myfile.h>

#ifdef __cplusplus
extern "C" {
#endif
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
#ifdef __cplusplus
}
#endif

If you want to use the C++ stdlib list object in the Perl file itself, you need to #undef list and then include your files after the standard Perl headers (including <list> if your header files do not include list).

#ifdef __cplusplus
extern "C" {
#endif
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
#ifdef __cplusplus
}
#endif

#undef list
#include <myfile.h>
#include <list>

If one method doesn’t work, try the other. If you need both in one file, you may be able to use the first method, put your functions that require the C++ list last in the file, and then do #undef list and #include <list> just before those functions. This is untested though, and really not recommended because it’s ugly. Thanks to Tye McQueen for the info that led to this section.

4. General XS Pointers

XS is pretty cool, and actually fairly simple once you understand it. The tutorial and manpage for XS give you some decent basic information, but they won’t tell you everything you need to know to get an XS package off the ground. Here are all the pointers I found (in order of their usefulness to me in figuring stuff out):

  1. The Perl XS Tutorial man page (“man perlxstut”) is the first place you should go when you know nothing at all about XS. Not that it will give you enough information to do what you want to do, mind you, but it will get you started in the basics rather admirably.
  2. Perl Guts and Perl API man pages (“man perlguts” and “man perlapi”) are very useful in figuring out how to actually talk to Perl directly when you need something a little more than the C/C++ hookup. Like when you want to return arrays, for example. Guts is a more structured explanation, API is a more complete reference.
  3. The XS Mailing List is the place where at least a few fairly advanced developers talk about XS. Answers to newbie questions are sporadic but very useful. Strange URL: develooper.com?
  4. Dean Roehrich’s Examples are great for getting moving. CookBookA and CookBookB in this directory are what you want. CookBookB in particular has a C++ example and many structure examples.
  5. The Perl XS man page (“man perlxs”) is where you’d expect you’d want to go, right? Well, I am not sure what the heck is wrong with it (or me) but I couldn’t extract much of any useful information from it. Maybe you’ll have better luck, though.
  6. MakeMaker is the tool you use to actually compile and install your stuff. Unfortunately this documentation by itself is inadequate for figuring out how to actually perform what you want to do–this is more of a reference than an introduction. I have not found a sufficient MakeMaker introduction to date, even though it is the foundation of CPAN and almost all Perl packaging and a damn cool tool.
  7. SWIG is supposed to be a wonderful (and simpler) alternative to Perl/XS. I couldn’t get it to work, though. 😦
  8. Inline is another package that does this stuff. I have not looked into it. Time is the enemy of us all …


Copyright (C) 2001 John Keiser (john@johnkeiser.com). Redistribute freely, just keep this copyright with file.

The “Clockwise/Spiral Rule” of C declaration

Posted on January 9, 2005. Filed under: C/C++

The “Clockwise/Spiral Rule”

By David Anderson

There is a technique known as the “Clockwise/Spiral Rule” which enables any C programmer to parse in their head any C declaration!

There are three simple steps to follow:

  1. Starting with the unknown element, move in a spiral/clockwise direction; when encountering the following elements replace them with the corresponding English statements: 
    [X] or []
    => Array X size of… or Array undefined size of…
    (type1, type2)
    => function passing type1 and type2 returning…
    *
    => pointer(s) to…


  2. Keep doing this in a spiral/clockwise direction until all tokens have been covered. 
  3. Always resolve anything in parenthesis first!

Example #1: Simple declaration

                     +-------+
                     | +-+   |
                     | ^ |   |
                char *str[10];
                 ^   ^   |   |
                 |   +---+   |
                 +-----------+

Question we ask ourselves: What is str?

“str is an…

  • We move in a spiral clockwise direction starting with `str’ and the first character we see is a `[‘ so, that means we have an array, so…

    “str is an array 10 of…

  • Continue in a spiral clockwise direction, and the next thing we encounter is the `*’ so, that means we have pointers, so…

    “str is an array 10 of pointers to…

  • Continue in a spiral direction and we see the end of the line (the `;’), so keep going and we get to the type `char’, so…

    “str is an array 10 of pointers to char”

  • We have now “visited” every token; therefore we are done!
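The reading of Example #1 can be double-checked by letting a compiler compare types. This is only a sketch: it uses C++11 type traits rather than plain C, and the alias names and the str_length helper are invented for the illustration.

```cpp
#include <cstddef>
#include <type_traits>

char *str[10];

// Build the type from the end of the English sentence backwards.
using PointerToChar = char *;                 // "...pointers to char"
using ArrayOf10Pointers = PointerToChar[10];  // "array 10 of..."

static_assert(std::is_same<decltype(str), ArrayOf10Pointers>::value,
              "str is an array 10 of pointers to char");

// Number of elements in str, taken from its type at compile time.
constexpr std::size_t str_length()
{
    return std::extent<decltype(str)>::value;
}
```

If the spiral reading were wrong, for example "pointer to an array 10 of char", the static_assert would fail to compile.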

Example #2: Pointer to Function declaration

                     +--------------------+
                     | +---+              |
                     | |+-+|              |
                     | |^ ||              |
                char *(*fp)( int, float *);
                 ^   ^ ^  ||              |
                 |   | +--+|              |
                 |   +-----+              |
                 +------------------------+

Question we ask ourselves: What is fp?

“fp is a…

  • Moving in a spiral clockwise direction, the first thing we see is a `)’; therefore, fp is inside parenthesis, so we continue the spiral inside the parenthesis and the next character seen is the `*’, so…

    “fp is a pointer to…

  • We are now out of the parenthesis and continuing in a spiral clockwise direction, we see the `(‘; therefore, we have a function, so…

    “fp is a pointer to a function passing an int and a pointer to float returning…

  • Continuing in a spiral fashion, we then see the `*’ character, so…

    “fp is a pointer to a function passing an int and a pointer to float returning a pointer to…

  • Continuing in a spiral fashion we see the `;’, but we haven’t visited all tokens, so we continue and finally get to the type `char’, so…

    “fp is a pointer to a function passing an int and a pointer to float returning a pointer to a char”
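The same kind of compiler check works for Example #2. In this sketch the aliases and the function pick are hypothetical; pick exists only to have something with a matching signature to point fp at.

```cpp
#include <type_traits>

char *(*fp)(int, float *);

// "function passing an int and a pointer to float returning a pointer to char"
using Function = char *(int, float *);
// "pointer to" such a function
using PointerToFunction = Function *;

static_assert(std::is_same<decltype(fp), PointerToFunction>::value,
              "fp is a pointer to a function (int, float *) returning char *");

// A function with a matching signature; index must be in the range 0..3.
char *pick(int index, float *)
{
    static char letters[] = "ABCD";
    return &letters[index];
}
```

After fp = pick, a call such as fp(2, nullptr) goes through the pointer and returns a pointer to the third letter.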

Example #3: The “Ultimate”

                      +-----------------------------+
                      |                  +---+      |
                      |  +---+           |+-+|      |
                      |  ^   |           |^ ||      |
                void (*signal(int, void (*fp)(int)))(int);
                 ^    ^      |      ^    ^  ||      |
                 |    +------+      |    +--+|      |
                 |                  +--------+      |
                 +----------------------------------+

Question we ask ourselves: What is `signal’?

Notice that signal is inside parenthesis, so we must resolve this first!

  • Moving in a clockwise direction we see `(‘ so we have…

    “signal is a function passing an int and a…

  • Hmmm, we can use this same rule on `fp’, so… What is fp? fp is also inside parenthesis so continuing we see an `*’, so…

    fp is a pointer to…

  • Continue in a spiral clockwise direction and we get to `(‘, so…

    “fp is a pointer to a function passing int returning…”

  • Now we continue out of the function parenthesis and we see void, so…

    “fp is a pointer to a function passing int returning nothing (void)”

  • We have finished with fp so let’s catch up with `signal’, we now have…

    “signal is a function passing an int and a pointer to a function passing an int returning nothing (void) returning…

  • We are still inside parenthesis so the next character seen is a `*’, so…

    “signal is a function passing an int and a pointer to a function passing an int returning nothing (void) returning a pointer to…

  • We have now resolved the items within parenthesis, so continuing clockwise, we then see another `(‘, so…

    “signal is a function passing an int and a pointer to a function passing an int returning nothing (void) returning a pointer to a function passing an int returning…

  • Finally we continue and the only thing left is the word `void’, so the final complete definition for signal is:

    “signal is a function passing an int and a pointer to a function passing an int returning nothing (void) returning a pointer to a function passing an int returning nothing (void)”
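A type alias makes the "Ultimate" example much easier on the eyes, and the compiler can confirm that both spellings mean the same thing. In this sketch the function is renamed my_signal so it does not clash with the real signal(); Handler, current_handler and noop_handler are likewise invented names, and the body is a toy that just remembers the last handler.

```cpp
#include <type_traits>

// The declaration from Example #3 under a different name.
void (*my_signal(int, void (*fp)(int)))(int);

// "pointer to a function passing an int returning nothing (void)"
using Handler = void (*)(int);

// my_signal is a function passing an int and a Handler, returning a Handler.
static_assert(std::is_same<decltype(my_signal), Handler(int, Handler)>::value,
              "both spellings declare the same function type");

static Handler current_handler = nullptr;

// Toy definition: store the new handler and return the previous one.
void (*my_signal(int, void (*fp)(int)))(int)
{
    Handler previous = current_handler;
    current_handler = fp;
    return previous;
}

// A handler that does nothing, used only to exercise my_signal.
void noop_handler(int) {}
```

Written with the alias, the whole declaration shrinks to Handler my_signal(int, Handler); which reads left to right without any spiral.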

The same rule is applied for const and volatile. For Example:

	const char *chptr;
  • Now, what is chptr??

    “chptr is a pointer to a char constant”

How about this one:

	char * const chptr;
  • Now, what is chptr??

    “chptr is a constant pointer to char”


	volatile char * const chptr;
  • Now, what is chptr??

    “chptr is a constant pointer to a char volatile.”
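The three qualifier examples can also be checked mechanically. The sketch below uses C++11 type traits; the names buffer, p1, p2 and p3 are made up so that all three declarations can live in one file.

```cpp
#include <type_traits>

char buffer[] = "ABCD";

const char *p1 = buffer;           // pointer to a char constant
char *const p2 = buffer;           // constant pointer to char
volatile char *const p3 = buffer;  // constant pointer to a char volatile

// p1: the pointer itself may change, the char it points to may not.
static_assert(!std::is_const<decltype(p1)>::value, "p1 is not const");
static_assert(std::is_const<std::remove_pointer<decltype(p1)>::type>::value,
              "p1 points to const char");

// p2: the pointer is fixed, the char it points to may change.
static_assert(std::is_const<decltype(p2)>::value, "p2 is const");
static_assert(!std::is_const<std::remove_pointer<decltype(p2)>::type>::value,
              "p2 points to plain char");

// p3: the pointer is fixed and the char it points to is volatile.
static_assert(std::is_const<decltype(p3)>::value, "p3 is const");
static_assert(std::is_volatile<std::remove_pointer<decltype(p3)>::type>::value,
              "p3 points to volatile char");
```

In practice this means p1 = buffer + 1 and *p2 = 'Z' both compile, while *p1 = 'Z' and p2 = buffer + 1 do not.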

Practice this rule with the examples found in K&R II on page 122.




The Open Compression Toolkit for C++

Posted on November 11, 2004. Filed under: C/C++, Programming |

The Open Compression Toolkit is a set of modular C++ classes and utilities for implementing and testing compression algorithms.

  • Simple interface and skeleton code for creating new compression algorithms.
  • Complete testing framework for validating and comparing new algorithms.
  • Support for algorithms that use external dictionaries/headers.
  • Utility classes and sample code for bitio, frequency counting, etc.



What is the output of these simple C programs?

Posted on March 25, 2004. Filed under: C/C++ |

Do you know what the output of these simple C programs is?

Check your answers with a real compiler.

#include <stdio.h>
int main(int argc, char **argv)
{
    char *ptr = "ABCD";
    printf( "%c\n",*(ptr++) );
    return 0;
}


#include <stdio.h>
int main(int argc, char **argv)
{
    char *ptr = "ABCD";
    printf( "%c\n",*(ptr++) );
    printf("%s\n", ptr);

    return 0;
}


#include <stdio.h>
int main(int argc, char **argv)
{
    char *ptr = "ABCD";
    printf( "%c\n",*ptr++ );
    printf("%s\n", ptr);

    return 0;
}

