Linux Kernel & Device Driver Programming

Ch 6 - Advanced Char Driver Operations

 

Topics


ioctl


The text says POSIX specifies -ENOTTY for an invalid ioctl command, though returning -EINVAL is still common in practice. I believe the book is wrong on that. If you look at the online Single Unix Specification (which is a superset of the POSIX spec), you will see that different errno values are specified for different errors, and EINVAL is specified for a request argument that is not valid for the device. I'm sorry to say that this has been pretty typical of the Linux development community: they don't pay very careful attention to the Unix standards, though that may be improving.


Managing ioctl Command Numbers


The book uses the term "magic number" for the first field of the ioctl command. Don't be confused later when you see that term used for other things, such as a number at the beginning of a file.


Example of ioctl Command Set

/* Use 'k' as magic number */
#define SCULL_IOC_MAGIC  'k'
/* Please use a different 8-bit number in your code */
#define SCULL_IOCRESET    _IO(SCULL_IOC_MAGIC, 0)
/*
 * S means "Set" through a ptr,
 * T means "Tell" directly with the argument value
 * G means "Get": reply by setting through a pointer
 * Q means "Query": response is on the return value
 * X means "eXchange": switch G and S atomically
 * H means "sHift": switch T and Q atomically
 */
#define SCULL_IOCSQUANTUM _IOW(SCULL_IOC_MAGIC,  1, int)
#define SCULL_IOCSQSET    _IOW(SCULL_IOC_MAGIC,  2, int)
#define SCULL_IOCTQUANTUM _IO(SCULL_IOC_MAGIC,   3)
#define SCULL_IOCTQSET    _IO(SCULL_IOC_MAGIC,   4)
#define SCULL_IOCGQUANTUM _IOR(SCULL_IOC_MAGIC,  5, int)
#define SCULL_IOCGQSET    _IOR(SCULL_IOC_MAGIC,  6, int)
#define SCULL_IOCQQUANTUM _IO(SCULL_IOC_MAGIC,   7)
#define SCULL_IOCQQSET    _IO(SCULL_IOC_MAGIC,   8)
#define SCULL_IOCXQUANTUM _IOWR(SCULL_IOC_MAGIC, 9, int)
#define SCULL_IOCXQSET    _IOWR(SCULL_IOC_MAGIC,10, int)
#define SCULL_IOCHQUANTUM _IO(SCULL_IOC_MAGIC,  11)
#define SCULL_IOCHQSET    _IO(SCULL_IOC_MAGIC,  12)
/*
 * The other entities only have "Tell" and "Query", because they're
 * not printed in the book, and there's no need to have all six.
 * (The previous stuff was only there to show different ways to do it.)
 */
#define SCULL_P_IOCTSIZE _IO(SCULL_IOC_MAGIC,   13)
#define SCULL_P_IOCQSIZE _IO(SCULL_IOC_MAGIC,   14)
/* ... more to come */

#define SCULL_IOC_MAXNR 14

The above are from scull.h.

See include/asm/ioctl.h for definitions of macros _IO(), _IOC_SIZE(), etc.
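To see how the four fields are packed into a command number, it can help to decode a few commands in user space. The following is only a sketch: it assumes the kernel headers are installed so that <linux/ioctl.h> is available, it redefines a handful of the scull commands locally, and the helpers dir_name and describe_cmd are hypothetical names of my own, not part of scull.

```c
#include <linux/ioctl.h>  /* _IO/_IOR/_IOW/_IOWR and the _IOC_* decoders */
#include <stdio.h>

/* A few of the scull commands from above, redefined for user space. */
#define SCULL_IOC_MAGIC   'k'
#define SCULL_IOCRESET    _IO(SCULL_IOC_MAGIC, 0)
#define SCULL_IOCSQUANTUM _IOW(SCULL_IOC_MAGIC, 1, int)
#define SCULL_IOCGQUANTUM _IOR(SCULL_IOC_MAGIC, 5, int)
#define SCULL_IOCXQUANTUM _IOWR(SCULL_IOC_MAGIC, 9, int)

/* Hypothetical helper: name the direction field of a command. */
const char *dir_name(unsigned int dir)
{
    if (dir == _IOC_NONE)                 return "none";
    if (dir == _IOC_READ)                 return "read";
    if (dir == _IOC_WRITE)                return "write";
    if (dir == (_IOC_READ | _IOC_WRITE))  return "read/write";
    return "unknown";
}

/* Hypothetical helper: print the four fields packed into a command. */
void describe_cmd(const char *name, unsigned int cmd)
{
    printf("%-18s type='%c' nr=%u size=%u dir=%s\n", name,
           (char)_IOC_TYPE(cmd), (unsigned)_IOC_NR(cmd),
           (unsigned)_IOC_SIZE(cmd), dir_name(_IOC_DIR(cmd)));
}
```

Calling describe_cmd("SCULL_IOCSQUANTUM", SCULL_IOCSQUANTUM) should report the magic 'k', number 1, a size equal to sizeof(int), and direction "write". The direction is named from the user's point of view, which matters in the validation code later.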


Avoid Using Predefined Commands in a Driver


The function fcntl is much like ioctl in usage. Some functions overlap. Historically, ioctl was for devices and fcntl was for files, but this distinction has blurred over time. If you look at the Single Unix Spec reference above, you will see that ioctl is actually described as an operation on streams.
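One place where the overlap is easy to see is nonblocking mode. FIONBIO is not named in these notes; it is one of the standard predefined ioctl commands on Linux, and it changes the same file status flag that fcntl changes with F_SETFL and O_NONBLOCK. The sketch below is a user-space illustration with hypothetical helper names.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Set or clear nonblocking mode through ioctl; returns 0 or -1. */
int set_nonblock_ioctl(int fd, int on)
{
    return ioctl(fd, FIONBIO, &on);
}

/* Set or clear nonblocking mode through fcntl; returns 0 or -1. */
int set_nonblock_fcntl(int fd, int on)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    flags = on ? (flags | O_NONBLOCK) : (flags & ~O_NONBLOCK);
    return fcntl(fd, F_SETFL, flags);
}

/* Returns 1 if fd is nonblocking, 0 if it is not, -1 on error. */
int is_nonblock(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    return flags == -1 ? -1 : (flags & O_NONBLOCK) != 0;
}
```

Setting the mode with one call and reading it back with the other shows that both interfaces reach the same flag. Because the kernel handles such predefined commands before the driver's ioctl method is called, a driver that reuses one of those command numbers would never see it.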


Checking Validity of User-Space Addresses


Example of Command and Argument Validation

int scull_ioctl(struct inode *inode, struct file *filp,
                unsigned int cmd, unsigned long arg)
{
   int err = 0, tmp;
   int retval = 0;

   /*
    * extract the type and number bitfields, and don't decode
    * wrong cmds: return ENOTTY (inappropriate ioctl) before access_ok()
    */
   if (_IOC_TYPE(cmd) != SCULL_IOC_MAGIC) return -ENOTTY;
   if (_IOC_NR(cmd) > SCULL_IOC_MAXNR) return -ENOTTY;
   /*
    * the direction is a bitmask, and VERIFY_WRITE catches R/W
    * transfers. `Type' is user-oriented, while
    * access_ok is kernel-oriented, so the concept of "read" and
    * "write" is reversed
    */
   if (_IOC_DIR(cmd) & _IOC_READ)
      err = !access_ok(VERIFY_WRITE, (void __user *)arg, _IOC_SIZE(cmd));
   else if (_IOC_DIR(cmd) & _IOC_WRITE)
      err =  !access_ok(VERIFY_READ, (void __user *)arg, _IOC_SIZE(cmd));
   if (err) return -EFAULT;
   switch(cmd) { ...

The above is from file scull/main.c.
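The reversal of "read" and "write" in that code is easy to get backwards, so here is the same decision logic pulled out into a user-space function that can be tested on its own. This is a sketch, not the driver's code: the enum access_kind and the function scull_cmd_precheck are hypothetical stand-ins for VERIFY_READ/VERIFY_WRITE and the access_ok calls, and it assumes <linux/ioctl.h> is installed.

```c
#include <errno.h>
#include <linux/ioctl.h>

#define SCULL_IOC_MAGIC 'k'
#define SCULL_IOC_MAXNR 14

/* Hypothetical stand-ins for the kernel's VERIFY_READ / VERIFY_WRITE. */
enum access_kind { ACCESS_NONE, ACCESS_READ_USER, ACCESS_WRITE_USER };

/*
 * Returns 0 and stores the kind of user-memory access the kernel would
 * have to verify, or returns -ENOTTY for a command that is not ours.
 */
int scull_cmd_precheck(unsigned int cmd, enum access_kind *kind)
{
    if (_IOC_TYPE(cmd) != SCULL_IOC_MAGIC)
        return -ENOTTY;
    if (_IOC_NR(cmd) > SCULL_IOC_MAXNR)
        return -ENOTTY;

    if (_IOC_DIR(cmd) & _IOC_READ)
        *kind = ACCESS_WRITE_USER;  /* user reads, so the kernel writes user memory */
    else if (_IOC_DIR(cmd) & _IOC_WRITE)
        *kind = ACCESS_READ_USER;   /* user writes, so the kernel reads user memory */
    else
        *kind = ACCESS_NONE;        /* _IO commands carry no pointer to check */
    return 0;
}
```

Note that an _IOWR command takes the first branch, which is why the original comment says VERIFY_WRITE catches read/write transfers.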


Checking Capabilities



You should already be familiar with the concept of "capability" in the general sense in which the term is used for operating systems. This is a special case, for the Linux kernel.

From my own reading of these capabilities, I'm skeptical about how far they go toward the general goal of capabilities as originally defined by OS security researchers. In particular, it seems to me that it would not be hard to use some of these capabilities to achieve (indirectly) the effect of others, and that several of them could pretty easily be used to escalate to a "superuser" shell, so long as the system has a superuser (one with all capabilities).


Implementation of ioctl Commands

   switch(cmd) {
   case SCULL_IOCRESET:
      scull_quantum = SCULL_QUANTUM;
      scull_qset = SCULL_QSET;
      break;
   case SCULL_IOCSQUANTUM: /* Set: arg points to the value */
      if (!capable(CAP_SYS_ADMIN))
         return -EPERM;   /* only a privileged caller may change the quantum */
      /* access_ok() was already checked above, so __get_user is safe here */
      retval = __get_user(scull_quantum, (int __user *)arg);
      break;
   ...
   }

What is the effect of resetting scull_quantum on an existing device? (Read the code to find out.)


Blocking I/O

For an example of the right way to block and unblock a client process, look at the read method of the scull pipe device in scull/pipe.c.


That example uses wait_event_interruptible, one of the wait_event family of macros. We will not talk much about the operations with the word "sleep" in their name, because they are very prone to misuse. However, you may see them used in older kernel code.


Warnings for Use of Sleep/Wakeup


Beware that most situations in which a process goes to sleep involve a potential race condition between the wait and the wakeup. That is why you should probably always use the event versions.

The correct style of using the event wait and wakeup operations is somewhat like the Pthread condition variable operations, in needing a loop around the wait call to check for the wakeup condition.
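For comparison, here is the Pthread idiom the analogy refers to, as a minimal user-space sketch. The struct slot and the functions slot_put and slot_get are hypothetical names of my own. The important difference from the kernel's wait_event operations is that pthread_cond_wait releases the mutex while waiting and reacquires it before returning, whereas wait_event does not touch any lock for you.

```c
#include <pthread.h>

/* Hypothetical one-slot mailbox shared between threads. */
struct slot {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
    int             full;    /* the wakeup condition */
    int             value;
};

#define SLOT_INITIALIZER \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 }

/* Store a value (overwriting any unread one) and wake one waiter. */
void slot_put(struct slot *s, int v)
{
    pthread_mutex_lock(&s->lock);
    s->value = v;
    s->full = 1;
    pthread_cond_signal(&s->nonempty);
    pthread_mutex_unlock(&s->lock);
}

/* Block until a value is present, then take it. */
int slot_get(struct slot *s)
{
    int v;

    pthread_mutex_lock(&s->lock);
    while (!s->full)                    /* recheck after every wakeup */
        pthread_cond_wait(&s->nonempty, &s->lock);
    v = s->value;
    s->full = 0;
    pthread_mutex_unlock(&s->lock);
    return v;
}
```

The while loop is the point: a wakeup only means the condition may have become true, so it must be tested again while holding the lock. One thread would call slot_get while another calls slot_put.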

Take a look at the implementation of wait_event_interruptible to see that there is a loop to check the condition, but that the check is subject to races.

That is, while the check is made there is no mutual exclusion, i.e., the wait operation does not unlock and relock any lock. If the check on the wakeup condition requires mutual exclusion (as usual), you need to do the semaphore operations explicitly in an outer loop, as in the scull/pipe.c example.

If you did not see this the first time, go back and look at the loop around the use of wait_event_interruptible in the scull pipe read method.


Wait Queue Implementation


Processes are linked through temporary pointers stored near the top of the stack of a blocked process. See the definition of the macro DEFINE_WAIT, and how it is used in __wait_event_interruptible. This relies on the invariant that a process cannot be on a wait queue while it is running.


Avoiding the Thundering Herd

  1. one process per queue
  2. do-it-yourself wait queue manipulations
  3. use of add_wait_queue_exclusive

Reentrant Code

"Using local variables for large items is not good practice, because the data may not fit the single page of memory allocated for the stack space"
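The usual remedy is to allocate large scratch areas dynamically instead of declaring them as local arrays. In a driver that would be kmalloc and kfree; the sketch below is only a user-space analogy in which malloc and free stand in for them, and SCRATCH_SIZE and checksum_padded are hypothetical names chosen for illustration.

```c
#include <stdlib.h>
#include <string.h>

#define SCRATCH_SIZE (64 * 1024)   /* too big to put on a small stack */

/*
 * Copies msg into a zero-filled scratch buffer and returns the sum of
 * its bytes, or -1 if msg does not fit or the allocation fails.
 */
long checksum_padded(const char *msg)
{
    size_t len = strlen(msg);
    unsigned char *scratch;
    long sum = 0;

    if (len > SCRATCH_SIZE)
        return -1;

    /* Heap allocation instead of "unsigned char scratch[SCRATCH_SIZE];" */
    scratch = calloc(1, SCRATCH_SIZE);
    if (scratch == NULL)
        return -1;

    memcpy(scratch, msg, len);
    for (size_t i = 0; i < SCRATCH_SIZE; i++)
        sum += scratch[i];

    free(scratch);
    return sum;
}
```

Because each call gets its own buffer, the function stays reentrant, which a single static buffer would not allow.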

Blocking vs. Nonblocking Option
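From the user's side, the nonblocking option shows up as the O_NONBLOCK file status flag: a read that would otherwise sleep fails at once with EAGAIN. A driver that honors the flag checks filp->f_flags and returns -EAGAIN instead of waiting. The following user-space sketch uses an ordinary pipe to show the visible behavior; nonblocking_pipe and read_nowait are hypothetical helper names.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Create a pipe and mark both ends O_NONBLOCK; returns 0 or -1. */
int nonblocking_pipe(int fds[2])
{
    if (pipe(fds) == -1)
        return -1;
    for (int i = 0; i < 2; i++) {
        int flags = fcntl(fds[i], F_GETFL);
        if (flags == -1 ||
            fcntl(fds[i], F_SETFL, flags | O_NONBLOCK) == -1) {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
    }
    return 0;
}

/*
 * Returns the byte count on success, 0 at end of file, -EAGAIN when
 * the call would have blocked, or another negative errno on failure.
 */
long read_nowait(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    return n >= 0 ? (long)n : -(long)errno;
}
```

Reading from the empty pipe yields -EAGAIN rather than putting the caller to sleep; once a byte has been written, the same call returns 1.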


Asynchronous I/O: poll and select


poll Data Structures


Example

static unsigned int scull_p_poll(struct file *filp, poll_table *wait)
{
	struct scull_pipe *dev = filp->private_data;
	unsigned int mask = 0;

	/*
	 * The buffer is circular; it is considered full
	 * if "wp" is right behind "rp" and empty if the
	 * two are equal.
	 */
	down(&dev->sem);
	poll_wait(filp, &dev->inq,  wait);
	poll_wait(filp, &dev->outq, wait);
	if (dev->rp != dev->wp)
		mask |= POLLIN | POLLRDNORM;	/* readable */
	if (spacefree(dev))
		mask |= POLLOUT | POLLWRNORM;	/* writable */
	up(&dev->sem);
	return mask;
}

The above is from scull/pipe.c.

Normally, the poll method should return POLLHUP if no more data is (or may become) available. The example above is an exception to this rule.
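To see what a poll method's return mask looks like to the caller, it is instructive to poll an ordinary kernel pipe from user space. This sketch uses a hypothetical helper, poll_now, which asks for the current state with a zero timeout; it exercises the kernel's own pipe, not the scull pipe driver.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Polls a single descriptor without waiting.
 * Returns the revents mask, or -1 if poll itself fails.
 */
int poll_now(int fd, short events)
{
    struct pollfd pfd = { .fd = fd, .events = events, .revents = 0 };

    if (poll(&pfd, 1, 0) < 0)
        return -1;
    return pfd.revents;
}
```

On a freshly created pipe the read end does not report POLLIN and the write end reports POLLOUT. After a byte is written the read end reports POLLIN. Once the data has been consumed and the write end closed, the read end reports POLLHUP, which is the "no more data may become available" case described above.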


Poll Codes that the Driver May Return


Asynchronous Notification (via signals)


The signal API is simpler, but "deprecated". The sigaction API is more general and more portable.

See also file asynctest.c.
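The user-space half of asynchronous notification has two steps: install a SIGIO handler, then name the process as owner of the descriptor and turn on the O_ASYNC flag. The following is a minimal sketch using sigaction; the names sigio_count, on_sigio, install_sigio_handler and enable_async are my own, and whether a given descriptor actually delivers SIGIO depends on its driver implementing the fasync method.

```c
#define _GNU_SOURCE          /* for O_ASYNC and F_SETOWN on glibc */
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

volatile sig_atomic_t sigio_count = 0;

/* Handler: only touch a sig_atomic_t, nothing else is safe here. */
void on_sigio(int signo)
{
    (void)signo;
    sigio_count++;
}

/* Install the handler with sigaction; returns 0 or -1. */
int install_sigio_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigio;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* restart interrupted system calls */
    return sigaction(SIGIO, &sa, NULL);
}

/* Ask the kernel to send SIGIO to this process for fd; returns 0 or -1. */
int enable_async(int fd)
{
    int flags;

    if (fcntl(fd, F_SETOWN, getpid()) == -1)
        return -1;
    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_ASYNC);
}
```

A program would call install_sigio_handler() once, then enable_async(fd) for each descriptor of interest, and do its reads when sigio_count changes.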


Seeking on a Device

See example in scull pipe implementation.
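A device that is a pure data stream, as a pipe is, has no meaningful position and should refuse to seek. The user sees that as lseek failing with ESPIPE. The sketch below checks this from user space on an ordinary kernel pipe; fd_is_seekable is a hypothetical helper name.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if fd supports seeking, 0 if it is a non-seekable stream
 * (lseek fails with ESPIPE), or -1 on any other error.
 */
int fd_is_seekable(int fd)
{
    if (lseek(fd, 0, SEEK_CUR) != (off_t)-1)
        return 1;
    return errno == ESPIPE ? 0 : -1;
}
```

Asking for the current offset with SEEK_CUR leaves the file position alone, so the probe is harmless on descriptors that do support seeking.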


Spin Locks

See examples of use in access.c.

See also the separate explanation of implementation for i386 architecture.
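The general idea behind a spin lock is an atomic test-and-set performed in a busy loop. The sketch below expresses that idea with C11 atomics in user space. It is a toy with hypothetical names (toy_spinlock and friends), not the kernel's implementation, which also has to deal with preemption, interrupts and fairness.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy spin lock: the flag is set while some thread holds the lock. */
struct toy_spinlock { atomic_flag held; };

#define TOY_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once; returns true if the lock was acquired. */
bool toy_spin_trylock(struct toy_spinlock *l)
{
    /* test_and_set returns the previous value: false means it was free */
    return !atomic_flag_test_and_set_explicit(&l->held,
                                              memory_order_acquire);
}

/* Busy-wait until the lock is acquired. */
void toy_spin_lock(struct toy_spinlock *l)
{
    while (!toy_spin_trylock(l))
        ;   /* spin: never sleeps, so hold times must be short */
}

/* Release the lock so that a spinning thread can take it. */
void toy_spin_unlock(struct toy_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Because the waiter never gives up the processor, code holding such a lock must not block, which is the same rule that applies to kernel spin locks.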


Implementing a Blocking open

See example code in access.c.

© 2003, 2005 T. P. Baker.($Id: ch6.html,v 1.1 2010/06/07 14:29:15 baker Exp baker $)