Linux Kernel & Device Driver Programming

Ch 2 - Building and Running Modules

 

Kernel Modules vs. User-Space Facilities

Examples of User-space solution:

Advantages to user-space solution:

Drawbacks/limitations of user-space solution:


These all boil down primarily to performance and secondarily to security. That is, you can create kernel services to intermediate or otherwise export the required memory and device access to user space, but these mechanisms add overhead (time delay and code size). They also must be protected carefully, since they can be abused (just as kernel modules can abuse direct access to kernel internals).

Kernel Modules

See example "Hello, World" module hello.c. Review features of this module.


Initializer function


Finalizer, a.k.a. cleanup function


Header Files, etc.


See how preprocessor symbols are defined in Makefile.

With earlier kernel versions I would have suggested reading this Makefile to see and understand the make features used, the style, and the techniques. This still may be a good idea, but it has grown so complicated that to fully understand it would take more time and effort than can be expected in this course.


Linkage


Do you understand what happens with namespace pollution?

What if two modules define and export the same symbol?


Defending Against Namespace Problems


Kernel Symbol Table


The following older-style module macros are deprecated, but you are still likely to see them used:


Kernel Faults


Kernel Space versus User Space


Beware that the terms "kernel space" and "user space" are used sometimes broadly and sometimes narrowly.

The broad use corresponds to hardware execution modes. Kernel code is executed in "kernel mode" (a.k.a. "supervisor mode" or "privileged mode"), which means it has access to a larger set of hardware instructions and a larger range of addresses than user code.

The narrow use corresponds to virtual address spaces. The Unix/Linux convention is to divide the mapped portion of a process's virtual address space into two parts: "kernel memory" and "user memory". In kernel mode both parts are accessible, but in user mode only the user memory is accessible.

Because Unix/Linux links access to kernel memory with kernel execution mode, the terms "kernel space", "kernel mode", and "kernel memory" are interchangeable in many contexts.
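The narrow sense can be made concrete with a small user-space sketch. It assumes the classic 32-bit x86 layout in which the kernel part of the map starts at 0xC0000000 (a 3 GB / 1 GB split); that boundary and the names ASSUMED_PAGE_OFFSET, is_kernel_address, and access_allowed are illustrative assumptions, not taken from these notes, and other architectures and configurations differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: classic 32-bit x86 split, kernel part at 3 GB and above. */
#define ASSUMED_PAGE_OFFSET 0xC0000000u

/* True if a 32-bit virtual address lies in the kernel part of the map. */
bool is_kernel_address(uint32_t vaddr)
{
    return vaddr >= ASSUMED_PAGE_OFFSET;
}

/* Model of the access rule: kernel mode may touch both parts,
 * user mode may touch only the user part. */
bool access_allowed(bool in_kernel_mode, uint32_t vaddr)
{
    return in_kernel_mode || !is_kernel_address(vaddr);
}
```

Under these assumptions, access_allowed(false, 0xC0100000u) is false while access_allowed(true, 0xC0100000u) is true, which is the link between execution mode and address range described above.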


Transfer Between User and Kernel Space/Mode


Are you clear on synchronous versus asynchronous transfers into the OS kernel?


Concurrency in Kernel

Are you clear on what reentrancy means?

Are you clear on the relationships and limitations of the different forms of concurrency control: nonpreemption, interrupt masking, and spin locks?


The Current Process

printk ("The process is \"%s\" (pid %i)\n", current->comm, current->pid);

Why is current a macro?

See one definition of this macro in linux/include/asm-i386/current.h.

The actual implementation has evolved, and is likely to continue to evolve. It started out as a simple global variable, but that did not work when Linux was extended to SMP systems, since each processor has its own current process. In the SMP version of Kernel 2.4 it was implemented via a hardware instruction that identified the current CPU, whose number was then used as an index to find the right current process/task in an array. In Kernel 2.6.11 it seems the implementation has evolved further, based on a convention that the kernel stack space of each thread (the term that has replaced process and task) has a fixed (small) size. The task descriptor of each thread is laid out contiguous with the kernel stack of the thread, so it can be found by looking at the high-order bits of the stack pointer register value. At least this is what is done on the i386 architecture. It will vary on other architectures. For example, on the SPARC there is a dedicated register for this purpose.
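The "high-order bits of the stack pointer" trick is just a mask operation. The following user-space sketch shows only the arithmetic; it assumes 8 KB kernel stacks aligned on an 8 KB boundary, and the names ASSUMED_THREAD_SIZE and stack_area_base are hypothetical. The real kernel reads the stack pointer register with inline assembly and the stack size depends on architecture and configuration.

```c
#include <stdint.h>

/* Assumption: each kernel stack is 8 KB and aligned on an 8 KB boundary. */
#define ASSUMED_THREAD_SIZE 8192u

/* Clear the low-order bits of a stack pointer value. What remains is the
 * base address of the stack area, where the per-thread descriptor
 * (or a pointer to it) can be found. */
uintptr_t stack_area_base(uintptr_t sp)
{
    return sp & ~((uintptr_t)ASSUMED_THREAD_SIZE - 1);
}
```

Any stack pointer value inside the same 8 KB area maps to the same base, so no global variable or per-CPU array lookup is needed.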

For information on how to read (and write) gcc inline assembly code, see the GCC Inline Assembly Howto or the gcc info pages.


Compiling a Kernel Module

In Kernel 2.4 it was fairly easy to compile kernel modules independently from the kernel source tree.

In Kernel 2.6 module compilation is done relative to a kernel source tree, and the kernel Makefile takes care of these details for you.

The misc-modules Makefile shows how you can invoke that makefile from a location outside the kernel source tree to compile code outside the source tree.

Observe that this makefile is called recursively. That is, one first calls make with this makefile; inside, there is a second call to make, using the kernel makefile; the kernel makefile then calls/includes this makefile back again to find out the set of modules to be compiled in the current directory.


If you try to compile a kernel module independently, you need to do the set-up work that is done by the kernel Makefile, including the following:

See also the file Documentation/CodingStyle for Linus' recommendations on coding style. Not everyone agrees with all of it. (I happen to agree with all but the rule about always indenting 8 spaces and the rule about putting start-function braces on a new line, and would add to it the rule to *never* use tabs.) In any case, when you are maintaining somebody else's code you need to preserve the established coding conventions. If you ever want your kernel module to be included in the baseline Linux distribution, you would be wise to follow Linus' style guidelines.


insmod


Version Dependency


The introduction of a new ".ko" format for kernel object files came after kernel 2.4. With the 2.4 kernel the convention was to use the ".o" format for kernel modules.


Versioning Macros


These are usually used with preprocessor conditionals (#if, #ifdef) to write module code that will compile and run with both older and newer kernel versions.
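A minimal user-space sketch of how such a conditional works is given below. The macro SKETCH_KERNEL_VERSION uses the same arithmetic as KERNEL_VERSION in <linux/version.h>, but it is defined locally so the example builds without kernel headers. SKETCH_LINUX_VERSION_CODE, module_file_suffix, and the choice of 2.6.0 as the cutoff are assumptions made for illustration; in a real module LINUX_VERSION_CODE comes from the kernel headers.

```c
/* Same arithmetic as KERNEL_VERSION; local name so this builds in user space. */
#define SKETCH_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Assumption: pretend we are building against kernel 2.6.11. */
#define SKETCH_LINUX_VERSION_CODE SKETCH_KERNEL_VERSION(2, 6, 11)

/* Select version-dependent behavior at compile time. Here the choice is
 * the file suffix used for loadable modules. */
const char *module_file_suffix(void)
{
#if SKETCH_LINUX_VERSION_CODE >= SKETCH_KERNEL_VERSION(2, 6, 0)
    return ".ko";
#else
    return ".o";
#endif
}
```

Because the three version components are packed into one integer, versions can be compared with ordinary relational operators inside #if.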


Normal Modules Installation

VERSIONFILE = $(INCLUDEDIR)/linux/version.h
VERSION = $(shell awk -F\" '/REL/ {print $$2}' $(VERSIONFILE))
INSTALLDIR = /lib/modules/$(VERSION)/misc
...
install:
	install -d $(INSTALLDIR)
	install -c $(OBJS) $(INSTALLDIR)

The modules go into a subdirectory of /lib/modules whose name matches the kernel version.


Module Stacking


modprobe Utility


Module Initialization Error Handling with goto

int __init my_init (void) {
  int err;

  /* registration takes a pointer and a name */
  err = register_this (ptr1, "skull");
  if (err) goto fail_this;
  err = register_that (ptr2, "skull");
  if (err) goto fail_that;
  err = register_those (ptr3, "skull");
  if (err) goto fail_those;
  
  return 0; /* success */ 

fail_those: 
  unregister_that (ptr2, "skull");
fail_that:
  unregister_this (ptr1, "skull");
fail_this:
  return err; /* propagate the error */
}

"__init" tells gcc to put the code into a special section of the load module, which the kernel may unload after the code executes.

Standard error codes are defined in <linux/errno.h>.
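To see the unwinding order without loading a module, the pattern can be exercised in user space. The sketch below is not kernel code: register_step, unregister_step, fail_mask, and unwind_log are hypothetical stand-ins that let each registration be forced to fail, and the log records which undo steps ran.

```c
#include <errno.h>

int fail_mask;        /* bit 0: "this" fails, bit 1: "that", bit 2: "those" */
char unwind_log[4];   /* tags of the undo steps, in the order they ran */
int unwind_len;

/* Stand-in for a registration call: fails if its bit is set in fail_mask. */
static int register_step(int bit)
{
    return (fail_mask & (1 << bit)) ? -EBUSY : 0;
}

/* Stand-in for an unregistration call: just records that it ran. */
static void unregister_step(char tag)
{
    unwind_log[unwind_len++] = tag;
}

int sketch_init(void)
{
    int err;

    unwind_len = 0;
    err = register_step(0);          /* "this" */
    if (err) goto fail_this;
    err = register_step(1);          /* "that" */
    if (err) goto fail_that;
    err = register_step(2);          /* "those" */
    if (err) goto fail_those;
    return 0;                        /* success */

fail_those:
    unregister_step('2');            /* undo "that" */
fail_that:
    unregister_step('1');            /* undo "this" */
fail_this:
    return err;                      /* propagate the error */
}
```

With fail_mask set to 4 (the third step fails), sketch_init returns -EBUSY and the log holds '2' then '1': only the steps that succeeded are undone, in reverse order. With fail_mask set to 1 nothing is undone, because nothing was registered.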


Module Cleanup Unregistration

void __exit my_cleanup (void) {
  unregister_those (ptr3, "skull");
  unregister_that (ptr2, "skull");
  unregister_this (ptr1, "skull");
}

It is customary, but not required, to unregister in reverse order of registration.

The "__exit" in the example above tells gcc to put the code into a special section of the load module, which does not need to be loaded if the module is statically linked into the kernel. In the example below there is no "__exit" because the function my_cleanup may be called during module initialization.


Error Handling with cleanup_module

struct something *item1;
struct somethingelse *item2;
int stuff_ok;

void my_cleanup (void) {
  if (item1) release_thing (item1);
  if (item2) release_thing2 (item2);
  if (stuff_ok) unregister_stuff ();
  return;
} 

int __init my_init (void) {
  int err = -ENOMEM;
  item1 = allocate_thing (arguments);
  item2 = allocate_thing2 (arguments2);
  if (!item1 || !item2) goto fail;
  err = register_stuff (item1, item2);
  if (!err) stuff_ok = 1; else goto fail;
  return 0; /* success */
fail:
  my_cleanup ();
  return err;
}

Module Loading/Unloading Races


Also, look out if you find yourself wanting to write co-dependent modules. "Therein lies madness".


Usage Count


What happens with double decrement?

Take a look at the /proc/modules file while some modules are in the kernel.


Example of /proc/modules

autofs                 13700   0 (autoclean) (unused)
3c59x                  31312   1
iptable_filter          2412   0 (autoclean) (unused)
ip_tables              15864   1 [iptable_filter]
mousedev                5688   0 (unused)
keybdev                 2976   0 (unused)
hid                    22404   0 (unused)
input                   6240   0 [mousedev keybdev hid]
usb-ohci               22088   0 (unused)
usbcore                80512   1 [hid usb-ohci]
ext3                   72960   3
jbd                    56752   3 [ext3]
raid1                  16300   3

Using Resources


I/O Ports

0000-001f : dma1
0020-003f : pic1
0040-005f : timer
0060-006f : keyboard
0070-007f : rtc
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : ide1
01f0-01f7 : ide0
02f8-02ff : serial(auto)
0376-0376 : ide1
03c0-03df : vga+
03f6-03f6 : ide0
03f8-03ff : serial(auto)
0cf8-0cff : PCI conf1
1000-103f : 3Com Corporation 3c905 100BaseTX [Boomerang]
  1000-103f : 00:0c.0
1050-1053 : Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P] System Controller
2000-2fff : PCI Bus #01
  2000-20ff : ATI Technologies Inc 3D Rage Pro AGP 1X/2X
f000-f00f : Advanced Micro Devices [AMD] AMD-766 [ViperPlus] IDE
  f000-f007 : ide0
  f008-f00f : ide1

I/O Port Registry API

int check_region(unsigned long start, unsigned long len);
struct resource *request_region(unsigned long start, unsigned long len, char *name);
void release_region(unsigned long start, unsigned long len);

This is a portable simplification. In kernel version 2.6.11 linux/ioport.h these are actually macros:

#define request_region(start,n,name)	__request_region(&ioport_resource, (start), (n), (name))
extern struct resource * __request_region(struct resource *, unsigned long start, unsigned long n, const char *name);

#define check_region(start,n)	__check_region(&ioport_resource, (start), (n))
extern int __check_region(struct resource *, unsigned long, unsigned long);

#define release_region(start,n)	__release_region(&ioport_resource, (start), (n))
extern void __release_region(struct resource *, unsigned long, unsigned long);

I/O Port Registry Usage

#include <linux/ioport.h>
#include <linux/errno.h>
static int skull_detect (unsigned int port, unsigned int range)
{
  int err;
  if ((err = check_region (port, range)) < 0) return err; /* busy */
  if (skull_probe_hw (port, range) != 0) return -ENODEV; /* not found */
  request_region (port, range, "skull"); /* "can't fail" */
  return 0;
}

static void skull_release (unsigned int port, unsigned int range)
{
  release_region (port, range);
}

I/O Memory Registry

00000000-0009f7ff : System RAM
0009f800-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000dc000-000dcfff : Advanced Micro Devices [AMD] AMD-766 [ViperPlus] USB
000dc000-000dcfff : usb-ohci
000e0000-000effff : Extension ROM
000f0000-000fffff : System ROM
00100000-3ffeffff : System RAM
00100000-0026b019 : Kernel code
0026b01a-0037b9c3 : Kernel data
3fff0000-3ffffbff : ACPI Tables
3ffffc00-3fffffff : ACPI Non-volatile Storage
f4001000-f4001fff : Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P] System Controller
f4100000-f41fffff : PCI Bus #01
f4100000-f4100fff : ATI Technologies Inc 3D Rage Pro AGP 1X/2X
f5000000-f5ffffff : PCI Bus #01
f5000000-f5ffffff : ATI Technologies Inc 3D Rage Pro AGP 1X/2X
f8000000-fbffffff : Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P] System Controller
fec00000-fec0ffff : reserved
fee00000-fee00fff : reserved
fff80000-ffffffff : reserved

I/O Memory Registry API

int check_mem_region (unsigned long start, unsigned long len);
int request_mem_region (unsigned long start, unsigned long len, char * name);
int release_mem_region (unsigned long start, unsigned long len);

I/O Memory Registry API Usage

if (check_mem_region (mem_addr, mem_size)) {
   printk ("drivername: memory already in use\n"); return -EBUSY;
}
request_mem_region (mem_addr, mem_size, "drivername");

Is there a dangerous race condition here?


Resource Structure

declared in linux/ioport.h:

struct resource {
  const char *name;
  unsigned long start, end;
  unsigned long flags;
  struct resource *parent, *sibling, *child;
};
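To illustrate how a registry built from such structures can refuse overlapping claims, here is a simplified user-space model. It keeps one flat, address-ordered list linked through a sibling pointer; the field names follow struct resource, but sketch_resource, busy_list, sketch_request, and sketch_release are hypothetical and the parent/child tree and flags are left out. The real kernel code also takes a lock, which this single-threaded sketch omits.

```c
#include <stddef.h>

struct sketch_resource {
    const char *name;
    unsigned long start, end;          /* inclusive bounds */
    struct sketch_resource *sibling;   /* next busy range, higher addresses */
};

struct sketch_resource *busy_list;     /* head of the address-ordered list */

/* Claim a range. The conflict check and the insertion happen in the same
 * call, so a caller never acts on a stale "free" answer.
 * Returns 0 on success, -1 if the range overlaps a busy one. */
int sketch_request(struct sketch_resource *res)
{
    struct sketch_resource **p = &busy_list;

    while (*p && (*p)->end < res->start)
        p = &(*p)->sibling;            /* skip ranges entirely below res */
    if (*p && (*p)->start <= res->end)
        return -1;                     /* overlap: range is busy */
    res->sibling = *p;                 /* link res in before *p */
    *p = res;
    return 0;
}

/* Give a previously claimed range back. Unknown ranges are ignored. */
void sketch_release(struct sketch_resource *res)
{
    struct sketch_resource **p = &busy_list;

    while (*p && *p != res)
        p = &(*p)->sibling;
    if (*p)
        *p = res->sibling;
}
```

In this model a request for 0x288-0x297 fails while 0x280-0x28f is held, but a request for the adjacent range 0x290-0x29f succeeds. Doing the check inside the claiming call, rather than as a separate earlier call, is the design choice worth noticing.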

skull Autoconfiguration

/*
 * port range: the device can reside between 0x280 and 0x300, in steps of 0x10.
 * It uses 0x10 ports.
 */
#define SKULL_PORT_FLOOR 0x280
#define SKULL_PORT_CEIL 0x300
#define SKULL_PORT_RANGE 0x010

/*
 * the following function performs autodetection, unless a specific
 * value was assigned by insmod to "skull_port_base"
 */

static int skull_port_base = 0; /* 0 forces autodetection */
MODULE_PARM (skull_port_base, "i");
MODULE_PARM_DESC (skull_port_base, "Base I/O port for skull");

static int skull_find_hw (void) /* returns the # of devices */
{
  /* base is either the load-time value or the first trial */
  int base = skull_port_base ? skull_port_base : SKULL_PORT_FLOOR;
  int result = 0;
  /* loop one time if value assigned; try them all if autodetecting */
  do {
    if (skull_detect (base, SKULL_PORT_RANGE) == 0) {
      skull_init_board (base); result++;
    }
    base += SKULL_PORT_RANGE; /* prepare for next trial */
  } while (skull_port_base == 0 && base < SKULL_PORT_CEIL);
  return result;
}

skull Example

The above concepts are illustrated in the example module skull.

© 2003, 2004, 2005 T. P. Baker. ($Id: ch2.html,v 1.1 2010/06/07 14:29:15 baker Exp baker $)