sysfs, procfs, sysctl, debugfs and other similar kernel interfaces

Posted on November 20, 2013. Filed under: Kernel | Tags: |

2. Procfs, Sysfs, and Similar Kernel Interfaces

2.1 Introduction

These file systems are optional to the Linux kernel, and may not be enabled on your system. The file “/lib/modules/`uname -r`/build/.config” will tell you how your kernel is configured.

In order to exchange data between user space and kernel space the Linux kernel provides a couple of RAM based file systems. These interfaces are, themselves, based on files. Usually a file represents a single value, but it may also represent a set of values. The user space can access these values by means of the standard read(2) and write(2) functions. For most file systems the read and write function results in a callback function in the Linux kernel which has access to the corresponding value.

Despite offering similar functionality, the different RAM based file systems are all designed for separate purposes. However it is easy to use these file systems for other purposes as well. Questions such as “Which file system should be used?” or “Why is there a need for the different file systems?” often arise on the Linux kernel mailing list. The arguments are controversial and each developer seems to have a unique view.

The benefit of using the read and write function in comparison to, for example, socket based approaches, is that the user space has a lot of tools available to send data to the kernel space (e.g. cat(1), echo (1)). These programs are well known to users and they can be used in scripts.

2.2 Procfs

Description

The procfs, located in /proc, is the best known interface of this class. It was originally designed to export all kind of process information such as the current status of the process, or all open file descriptors to the user space. Despite its initial purposes, the procfs has been used for a lot of other purposes:

  • provide information about the running system such as cpu information, information about interrupts, about the available memory or the version of the kernel.
  • information about “ide devices”, “scsi devices” and “tty’s”.
  • networking information such as the arp table, network statistics or lists of used sockets

There is a special subdirectory: /proc/sys.
It allows to configure a lot of parameters of the running system. Usually each file consists of a single value. This value may correspond to:

  • a limit (e.g. maximum buffer size)
  • turn on or off a given functionality (for example routing)
  • or represent some other kernel variable

All directories and files below /proc/sys/ are not implemented with the procfs interface. Instead they use a mechanism called sysctl. See section sysctl for further details about sysctl.

Note, despite the wide use of the procfs, it is deprecated and should only be used to export information related to a process itself.

Implementation

In order to use the procfs it needs to be compiled with the Linux kernel source code. This is done by setting the parameter CONFIG_PROC_FS=y. In most standard configurations this is enabled by default

Procfs supports two different APIs for kernel modules:
The legacy procfs API: It is easy to use as long as the amount of data to be handled is small. In this context small means smaller than one page size (PAGE_SIZE), which is in i386 systems 4096 bytes.
The seq_file API: Seq_file was designed to facilitate the handling of read requests. It supports read requests for more than PAGE_SIZE bytes and it provides mechanism to traverse a list, collect the elements of the list, and send all elements to user space.

1. Legacy procfs API

procfs.c legacy procfs API

The legacy procfs API allows for the creation of files and directories. For each file you have to specify two callback functions: One which is executed when a user reads the file and the other when a user writes to the file. The use of this API is well described in the “Linux Kernel Procfs Guide” distributed with the Linux kernel source code. Therefore we give here only a very basic example: A module which creates a directory as well as a file. If your file provides more than PAGE_SIZE bytes of data it is easy to get things wrong. This is due to the API of the read function:
read(char *page, char **start, off_t off, int count, int *eof, void *data)
The first parameter of this function is a buffer with the size corresponding to one page. Hence, if there is more data, the read has to be split in multiple pieces.

2. Seq_file API

The seq_file API is concerned with read requests solely – no writes. It hides the PAGE_SIZE boundary from the developer and it provides an API to step through a series of objects, collect the data from each of them and put all those data in the file. An example module can be found at http://lwn.net/Articles/22359/.

Further Reading and Resources

  • http://lwn.net/Articles/22355/ lwn article “Driver porting: The seq_file interface”.
  • http://lwn.net/Articles/22359/ Example module that uses seq_file in relation with the lwn article.
  • http://www.linux-mag.com/id/2739?r=s Linux magazine article “Manipulating “Seq” Files” covers legacy API as well as seq_file.
  • http://kernelnewbies.org/Documents/SeqFileHowTo Documents/SeqFileHowTo seqfile how-to on kernelnewbies
  • Linux Kernel Procfs Guide available in the Linux kernel source code Documentation/DocBook/procfs-guide.tmpl. Description of the legacy procfs API
  • T H E /proc F I L E S Y S T E M Description of entries in /proc, available in the Linux kernel source code Documentation/filesystems/proc.txt

2.3 Sysfs

Description

Sysfs was designed to represent the whole device model as seen from the Linux kernel. It contains information about devices, drivers and buses and their interconnections. In order to represent the hierarchy and the interconnections sysfs is heavily structured and contains a lot of links between the individual directories. As for kernel 2.6.23 it contains the following 9 top-level directories:

  • sys/block/ all known block devices such as hda/ ram/ sda/
  • sys/bus/ all registered buses. Each directory below bus/ holds by default two subdirectories:
    • device/ for all devices attached to that bus
    • driver/ for all drivers assigned with that bus.
  • sys/class/ for each device type there is a subdirectory: for example /printer or /sound
  • sys/device/ all devices known by the kernel, organised by the bus they are connected to
  • sys/firmware/ files in this directory handle the firmware of some hardware devices
  • sys/fs/ files to control a file system, currently used by FUSE, a user space file system implementation
  • sys/kernel/ holds directories (mount points) for other filesystems such as debugfs, securityfs.
  • sys/module/ each kernel module loaded is represented with a directory.
  • sys/power/ files to handle the power state of some hardware

Implementation

In order to use sysfs it needs to be compiled with the Linux kernel source code. This is done by setting the parameter CONFIG_SYSFS=y.

The philosophy behind sysfs is to represent each value with a dedicated file. In addition each file has a maximum size of PAGE_SIZE bytes.

For a kernel module there are three possibilities to use a file below /sys:

  1. module parameter
  2. register new subsystem
  3. debugfs: debugfs, mounted in /sys/kernel/debug. More information about debugfs.

Module Parameter API

Similar to command line arguments for applications, Linux kernel modules may allow a set of parameters. These parameters can not only be specified upon module insertion but also during module run time. A module parameter can be defined with the following macro:
module_param_named(name, value, type, perm)
This macro creates a parameter called "name" which corresponds to the variable with name "value" of type "type". There are many predefined types such as byte (for a single character), int (for an integer) or charp (for a string). It is also possible to add new types. The file include/linux/stat.h provides all predefined types as well as an introduction how to define new types.

The module_param macro creates a file called /sys/modules/module_name/name with the access rights specified by perm. Depending on the specified access rights, the file – and thereby the parameter value – can be read or written. If perm is set to 0 the file is not created, and therefore the parameter cannot be accessed during run time.

The module does not receive a notification when a user reads or writes a given parameter, but the value is silently changed. Therefore it is not possible to do some additional stuff when a parameter changes its value. This may be acceptable in some circumstances as for changing a debug level, but in most circumstances the module wants to do some additional stuff such as sanity checks or manipulating a data structure.

Standard Sysfs API

The standard sysfs API uses a dedicated terminology: A file is called an attribute, the function executed upon reading an attribute is called show and the one for writing an attribute store.

Before starting with the implementation of a module which uses sysfs you have to figure out which subdirectory it belongs to. If you deal with a bus, it belongs to bus/, with a file system it belongs to fs/ or with a block device it belongs to block/. The API to use depends on the given subdirectory. We first show an example which uses the low level sysfs functions to add a new directory to fs/, and in a second example we show how to add a new entry to the bus/ directory.

1. new fs/ entry

sysfs_ex.c creates the directory /sys/fs/myfs/ along with two files first and second. Both containing one single integer value.

The first step is to declare our subsystem. This can be done with the use of the decl_subsys macro (on top of the file). This macro creates a struct kset with the name myfs_subsys.

The module_init() function performs the proper registration of our subsystem: The macro kobj_set_kset_s initializes myfs_subsys so that it will be part of the fs_subsys. The field myfs_subsys.kobj.ktype points to a structure which holds all the attributes as well as the functions to read and write the attributes. And finally a call to register_subsystem() registers our subsystem.

Files are generally represented by a struct attribute. This struct holds the name as well as the access permission for the corresponding file, but no data. Therefore you have to create your own attribute type which consists of at least the struct attribute and the value corresponding to that file.

By design all attributes share the same show and store functions. Each time one of these two functions is invoked it gets the corresponding struct attribute as an argument. Therefore in the show and store functions you can obtain the value corresponding to the file being read/written and you can manipulate it accordingly. For this purpose you need the macro container_of(ptr, type, member)ptr is a pointer to the member of the struct.type is the type of the struct this member is emeded in and member is the name of the member within the struct.

2. new bus/ entry

sysfs_ex2.c use of sysfs in combination with a bus. It provides the possibility to read and write one value with the help of “my_pseude_bus”.

First of all we define our bus my_pseudo_bus. Then we create our attribute with the help of the BUS_ATTR macro. In the init function we register our pseudo bus and we create a file (attribute). If we would like more than one attribute we would have to use BUS_ATTR several times and provide for each attribute its own store and show function. This example is similar to the debugging facility of the scsi bus, which is implemented indrivers/scsi/scsi_debug.c

Resources and Further Reading

2.4 Configfs

Description

The configfs is somewhat the counterpart of the sysfs. It can be seen as a filesystem based manager of kernel objects. An important difference between configfs and sysfs is that in configfs all objects are created from user space with a call to mkdir(2). The kernel responds with creating the attributes (files) and then they can be read and written by the user. If the user no longer needs the files, he calls rmdir(2) and everything gets deleted. Therefore the life cycle of a configfs object is fully controlled by user space.

Each time mkdir is invoked a new “config_item” is created by the kernel implementation. This config_item represents the files (attributes), the show and store callback functions as well as the associated value. Therefore each mkdir creates a new directory along with new files which represent new values.

Configfs has the same limitations than sysfs: each file should represent only one value and it should be smaller than PAGE_SIZE bytes.

Implementation

In order to use configfs it needs to be compiled with the Linux kernel source code. This is done by setting the parameter CONFIG_CONFIGFS_FS=y.

In order to access configfs it has to be mounted with the following command:
mount -t configfs none /config

The Linux kernel documentation provides a good manual for configfs along with an example module. Therefore we do not describe the configfs implementation aspects.

Resources and Further Reading

  • Linux kernel source code: Documentation/filesystems/configfs

2.5 Debugfs

Description

Debugfs is a simple to use RAM based file system especially designed for debugging purposes. Developers are encouraged to use debugfs instead of procfs in order to obtain some debugging information from their kernel code. Debugfs is quite flexible: it provides the possibility to set or get a single value with the help of just one line of code but the developer is also allowed to write its own read/write functions, and he can use the seq_file interface described in the procfs section.

Implementation

In order to use debugfs it needs to be compiled with the Linux kernel source code. This is done by setting the parameter CONFIG_DEBUG_FS=y.

Before having access to the debugfs it has to be mounted with the following command.
mount -t debugfs none /sys/kernel/debug

debugfs.c kernel module that implements the “one line” API for a variable of type u8 as well as the API with which you can specify your own read and write functions.

All the “one line” APIs start with debugfs_create_ and are listed in include/linux/debugfs.h

The API with which you can provide your own read and write functions is similar to the one of procfs. In contrast to sysfs, you may create directories and files without having to care about a given hierarchy.

Resources and Further Reading

2.6 Sysctl

Description

The sysctl infrastructure is designed to configure kernel parameters at run time. The sysctl interface is heavily used by the Linux networking subsystem. It can be used to configure some core kernel parameters; represented as files in /proc/sys/*. The values can be accessed by using cat(1)echo(1) or the sysctl(8) commands. If a value is set by the echo command it only persists as long as the kernel is running, but gets lost as soon as the machine is rebooted. In order to change the values permanently they have to be written to the file /etc/sysctl.conf. Upon restarting the machine all values specified in this file are written to the corresponding files in/proc/sys/.

Implementation

sysctl.c sysctl example module: write an integer to /proc/sys/net/test/value1 and value2 respectively

Each entry in the /proc/sys directory is represented by an entry in a table maintained by the Linux kernel, arranged in a hierarchy. A directory is represented by an entry pointing to a subtable. A file is represented by an entry of type struct ctl_table. This entry consists of the data represented by this file along with some access rules.

New files and directories can be added by expanding one of the subtables. In this example we add a new directory called test below the /proc/sys/net/ directory. Our directory has got two files: value1 and value2. Each of these files hold an integer variable which can have a value between 10 and 20. The user root is allowed to change the entries whereas normal user are allowed to read the entries.

Each file is represented with an entry in the test_table[] array:

static ctl_table test_table[] = {
        {
        .ctl_name       = CTL_UNNUMBERED,
        .procname       = "value1",
        .data           = &value1,
        .maxlen         = sizeof(int),
        .mode           = 0644,
        .proc_handler   = &proc_dointvec_minmax,
        .strategy       = &sysctl_intvec,
        .extra1         = &min,
        .extra2         = &max
        },
        ...
}

The struct ctl_table entries are:

  • .ctl_name: For new entries this has to be CTL_UNNUMBERED (according to Documentation/sysctl/ctl_unnumbered.txt).
  • .procname: The name of the file.
  • .data: A reference to the data we want to be shown in the file.
  • maxlen: The size of the data.
  • mode: Access permissions (read, write, execute for user, group, others)
  • proc_handler: The routine which handles read and write requests. There is a set of default routines declared near the end of include/linux/sysctl.h
  • strategy: Some routine that enforces additional access control. In this example it checks that the value to be written is between min and max.
static ctl_table test_net_table[] = {
        {
                .ctl_name       = CTL_UNNUMBERED,
                .procname       = "test",
                .mode           = 0555,
                .child          = test_table
        },
        { .ctl_name = 0 }
};

This table represents our test directory. The entry .child says that the elements below this directory are represented by the table test_table, discussed above.

static ctl_table test_root_table[] = {
        {
                .ctl_name       = CTL_UNNUMBERED,
                .procname       = "net",
                .mode           = 0555,
                .child          = test_net_table
        },
        { .ctl_name = 0 }
};

This table represents the directory to which we want to attach our new directory. In this example, this is the net directory. In the module_init() function we have to register this root table with a call to
register_sysctl_table(test_root_table);

Resources and Further Reading

  • Linux kernel source code: net/core/sysctl_net_core.c
  • Linux kernel Documentation: Documentation/sysctl/ctl_unnumbered.txt

2.7 Character Devices

Description

As the name suggests, this interface was designed for character device drivers, and is commonly used for communication between uer and kernel space. (For example, users with sufficient privileges my write directly to the virtual terminal 1 with echo "hi there" > /dev/tty1).

Each module can register itself as a character device and provide some read and write functions which handle the data. Files representing character devices are located within the /dev directory (where you will also find block devices, but we will not be describing them further). Usually these files correspond to a hardware device.

Implementation

cdev.c kernel module that prints its majorNumber to the system log. The minorNumber can be chosen to be 0.

As with all file system based approaches the module has to specify a read and a write callback function. Therefore, we have to register ourself with the function register_chrdev(unsigned int major, const char *name, struct file_operations *ops);major is the major number of this device. We can set it to 0 to let the kernel choose an appropriate number. name is the name of this character device, as it will be shown below the/dev directory. ops is a pointer to the read and write functions.

In contrast to most file system based approaches seen so far, the user has to create the device file explicitly with a call to:
mknod /dev/arbitrary_name c majorNumber minorNumber

Resources and Further Reading

Links:

 

Read Full Post | Make a Comment ( None so far )

Liked it here?
Why not try sites on the blogroll...