Linux Kernel & Device Driver Programming

Ch 14 - The Linux Device Model

Linux Grand Unified Device Model

This slide has additional "handout" notes. You need to use the 'a' key to toggle into handout mode, in order to see them.

The /sys filesystem is large. Executing "ls -R /sys" on the CS Dept. Web server resulted in 2414 lines of output.

USB Mouse Driver Device Model Entities (LDD3)

usb device entities

This has changed somewhat, as shown on the next slide.

Example: USB Mouse Driver Device Model Entities

usb device entities

Even this updated diagram is not complete.

Kobjects, Ksets, and Subsystems

struct kobject is used for:

  • reference counting
  • sysfs representation
  • "data structure glue" - representing relationships between devices
  • hotplug event handling

Example: struct cdev

  • Objects of type struct kobject are embedded within other types of objects, to provide capabilities
  • For example, see type struct cdev

Observe that container_of is not checked. It will not work correctly if kp does not point to a field, with the specified field name, inside an object of the specified type.
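The pointer arithmetic behind container_of can be tried out in user space. The following is a minimal sketch, not the kernel's definition: the struct layouts are cut-down stand-ins, the dev field and the helper kobj_to_cdev are hypothetical, and the macro omits the type check that the kernel version performs on the member pointer.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */

/* Userspace stand-ins for the kernel types; only the fields needed here. */
struct kobject {
    const char *name;
};

struct cdev {
    int dev;               /* hypothetical payload field */
    struct kobject kobj;   /* embedded in the cdev, not pointed to */
};

/* Simplified container_of: subtract the member's offset from its address.
 * Nothing here can verify that ptr really lies inside an object of the
 * given type; passing any other pointer yields a bogus result. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the enclosing cdev from a pointer to its embedded kobject. */
static struct cdev *kobj_to_cdev(struct kobject *kp)
{
    assert(kp != NULL);
    return container_of(kp, struct cdev, kobj);
}
```

Calling kobj_to_cdev(&c.kobj) on a struct cdev c gives back &c. Calling it on a free-standing struct kobject compiles just as well, which is exactly the unchecked case described above.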

Kobject initialization

Reference Count Manipulation

Go back to the example above to see how this fits in.

The function cdev_alloc initializes the reference count to 1 by calling kobject_init, which calls kref_init.

Therefore, the error-recovery code inside register_chrdev must decrement the reference count before freeing the struct cdev object.

kobject_put calls kref_put and passes along the kobject_release function, which kref_put calls if the reference count goes to zero.

The kobject_release function calls the release method of the ktype of the specific kobject.

Go back and look at cdev_alloc to see where the ktype is set to ktype_cdev_dynamic.
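The chain kobject_put, kref_put, kobject_release, ktype release can be modeled in a few lines of user-space C. This is a sketch under stated assumptions: the types are simplified stand-ins that reuse the kernel names, the counter is a plain int rather than an atomic, and demo_ktype, demo_release, and the released counter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */

struct kref { int refcount; };          /* the kernel uses an atomic counter */

static void kref_init(struct kref *k) { k->refcount = 1; }
static void kref_get(struct kref *k)  { k->refcount++; }

/* Drop one reference and run release() when the count reaches zero.
 * Returns 1 if release() ran, 0 otherwise. */
static int kref_put(struct kref *k, void (*release)(struct kref *))
{
    assert(k->refcount > 0);
    if (--k->refcount == 0) {
        release(k);
        return 1;
    }
    return 0;
}

struct kobject;
struct kobj_type { void (*release)(struct kobject *kobj); };
struct kobject {
    struct kref kref;
    struct kobj_type *ktype;
};

/* Passed to kref_put(): maps the kref back to its kobject, then calls the
 * release method of that kobject's ktype. */
static void kobject_release(struct kref *k)
{
    struct kobject *kobj =
        (struct kobject *)((char *)k - offsetof(struct kobject, kref));
    if (kobj->ktype && kobj->ktype->release)
        kobj->ktype->release(kobj);
}

static void kobject_init(struct kobject *kobj, struct kobj_type *ktype)
{
    kobj->ktype = ktype;
    kref_init(&kobj->kref);             /* count starts at 1 */
}

static struct kobject *kobject_get(struct kobject *kobj)
{
    kref_get(&kobj->kref);
    return kobj;
}

static void kobject_put(struct kobject *kobj)
{
    kref_put(&kobj->kref, kobject_release);
}

/* Hypothetical ktype whose release method only counts its invocations. */
static int released;
static void demo_release(struct kobject *kobj) { (void)kobj; released++; }
static struct kobj_type demo_ktype = { .release = demo_release };
```

After kobject_init the count is 1, so one kobject_get followed by two kobject_put calls triggers demo_release exactly once, on the second put. A real release method would free the enclosing object at that point, as cdev_dynamic_release does.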

struct kobj_type

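In the case of struct cdev, only the release method is specified:

/* Release method: called when the last reference to the kobject is dropped. */
static void cdev_dynamic_release(struct kobject *kobj)
{
        /* Recover the enclosing struct cdev from its embedded kobject. */
        struct cdev *p = container_of(kobj, struct cdev, kobj);
        cdev_purge(p);
        kfree(p);       /* the cdev was allocated dynamically by cdev_alloc */
}

static struct kobj_type ktype_cdev_dynamic = {
        .release        = cdev_dynamic_release,
};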

Kobject Hierarchies

kobject hierarchy diagram

It is possible (though not shown in the diagram) that the parent and kset pointers of a kobject could point to different objects.

It also seems possible (though not shown in the diagram) for an object to belong to more than one kset, in which case the kset pointer of the kobject could point to only one of them.
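One way to see why the parent and kset pointers are separate fields: as far as I can tell, it is the parent chain, not set membership, that determines where an object appears under /sys. The sketch below is a user-space model under that assumption. The struct is a cut-down stand-in, and kobj_path is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdio.h>    /* snprintf */

struct kset;                      /* opaque here; only the pointer matters */

struct kobject {
    const char *name;
    struct kobject *parent;       /* position in the hierarchy */
    struct kset *kset;            /* set membership; may be unrelated to parent */
};

/* Write the path of kobj (e.g. "/a/b/c") into buf by recursing up to the
 * root first, so that ancestors are emitted before descendants.
 * Returns the path length, or -1 if buf is too small. */
static int kobj_path(const struct kobject *kobj, char *buf, size_t size)
{
    int used = 0, n;

    assert(kobj != NULL && buf != NULL);
    if (kobj->parent) {
        used = kobj_path(kobj->parent, buf, size);
        if (used < 0)
            return -1;
    }
    n = snprintf(buf + used, size - (size_t)used, "/%s", kobj->name);
    if (n < 0 || (size_t)n >= size - (size_t)used)
        return -1;
    return used + n;
}
```

The kset field never enters the computation, so two objects in the same kset can sit in unrelated directories if their parents differ.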

Ksets

struct kset {
        struct list_head list;
        spinlock_t list_lock;
        struct kobject kobj;
        struct kset_uevent_ops * uevent_ops;
};

kobject_register combines initialization and adding to a set, and kobject_unregister combines deletion from set and "put".

There have been many changes to this structure between kernels 2.6.16 and 2.6.25, and between 2.6.25 and 2.6.31. For example, there was once a type struct subsystem, and a field subsys in struct kset. In its place there is now the field list.

Operations on Ksets

Operations are extensions of those on struct kobject:

Kobjects and Sysfs

Attributes

struct attribute {
         const char     *name;  /* as it appears in a sysfs directory */
         struct module  *owner; /* no longer used */
         mode_t          mode;  /* file protection bits, e.g., S_IRUGO */
};

The kobj_types for subsystem cdev, including ktype_cdev_dynamic, do not specify any default attributes or provide any sysfs_ops to show or store them.

We can (finally) find an example of default attributes via a struct driver_attribute in the declaration of driver_attribute_serio_driver_attrs. This defines "description" and "bind_mode" attributes, and an example of a show method in serio_driver_show_description.
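The pattern behind a show method such as serio_driver_show_description is that a bare struct attribute is embedded in a larger, type-specific attribute structure that carries the method pointers. The sketch below models that dispatch in user space. Everything prefixed demo_ is hypothetical, the PAGE_SIZE value is an assumption, and returning -1 for a missing method is a simplification of the kernel's error handling.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */
#include <stdio.h>    /* snprintf */

#define PAGE_SIZE 4096   /* assumed; show methods fill a one-page buffer */

struct attribute {
    const char *name;    /* as it appears in a sysfs directory */
    unsigned short mode; /* file protection bits */
};

struct demo_driver { const char *description; };   /* hypothetical object */

/* Type-specific wrapper: the generic attribute plus its show method. */
struct demo_attribute {
    struct attribute attr;
    int (*show)(struct demo_driver *drv, char *buf);
};

/* Show method: format the value as text and return the byte count. */
static int description_show(struct demo_driver *drv, char *buf)
{
    return snprintf(buf, PAGE_SIZE, "%s\n",
                    drv->description ? drv->description : "(none)");
}

static struct demo_attribute demo_attr_description = {
    .attr = { .name = "description", .mode = 0444 },
    .show = description_show,
};

/* Generic entry point, in the role of a sysfs_ops show function: recover
 * the wrapper from the bare attribute, then dispatch to its show method. */
static int demo_attr_show(struct demo_driver *drv, struct attribute *attr,
                          char *buf)
{
    struct demo_attribute *da;

    assert(attr != NULL);
    da = (struct demo_attribute *)
             ((char *)attr - offsetof(struct demo_attribute, attr));
    return da->show ? da->show(drv, buf) : -1;
}
```

Reading the attribute for a driver whose description is "PS/2 mouse" fills the buffer with that string plus a newline and returns 11, the number of bytes written.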

Non-Default Attributes

For example, see fs_add_slot in linux/drivers/pci/hotplug/pci_hotplug_core.c.

Binary Attributes

struct bin_attribute {
        struct attribute        attr;
        size_t                  size;
        void                    *private;
        ssize_t (*read)(struct kobject *, struct bin_attribute *,
                        char *, loff_t, size_t);
        ssize_t (*write)(struct kobject *, struct bin_attribute *,
                         char *, loff_t, size_t);
        int (*mmap)(struct kobject *, struct bin_attribute *attr,
                    struct vm_area_struct *vma);
};
int sysfs_create_bin_file(struct kobject * kobj, struct bin_attribute * attr);
int sysfs_remove_bin_file(struct kobject * kobj, struct bin_attribute * attr);

Symbolic Links

int sysfs_create_link(struct kobject * kobj, struct kobject * target, char * name);
void sysfs_remove_link(struct kobject * kobj, char * name);

Uevent Generation

struct kset_uevent_ops {
        int (*filter)(struct kset *kset, struct kobject *kobj);
        char *(*name)(struct kset *kset, struct kobject *kobj);
        int (*uevent)(struct kset *kset, struct kobject *kobj,
                      struct kobj_uevent_env *env);
};

It is interesting to read through the code of kobject_uevent_env to see how this all works.
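As a reading aid, the order in which the three kset_uevent_ops methods are consulted can be modeled in user space: filter first, then name to pick the subsystem string, then the standard variables, and finally uevent to append extra ones. This is a rough sketch of that flow, not the kernel code. All demo_ names, the size limits, and the "unknown" fallback are assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>    /* snprintf */

#define ENV_MAX 8     /* illustrative limits, not the kernel's */
#define ENV_LEN 64

struct demo_env {
    char vars[ENV_MAX][ENV_LEN];  /* "KEY=value" strings */
    int count;
};

struct demo_obj {                 /* hypothetical stand-in for a kobject */
    const char *path;
    int hidden;
};

struct demo_uevent_ops {
    int (*filter)(const struct demo_obj *obj);        /* 0 = suppress event */
    const char *(*name)(const struct demo_obj *obj);  /* subsystem name */
    int (*uevent)(const struct demo_obj *obj, struct demo_env *env);
};

/* Append one KEY=value pair; returns 0 on success, -1 if it does not fit. */
static int env_add(struct demo_env *env, const char *key, const char *val)
{
    int n;

    if (env->count >= ENV_MAX)
        return -1;
    n = snprintf(env->vars[env->count], ENV_LEN, "%s=%s", key, val);
    if (n < 0 || n >= ENV_LEN)
        return -1;
    env->count++;
    return 0;
}

/* Build the environment for one event. Returns the number of variables,
 * 0 if the event was filtered out, or -1 on error. */
static int demo_emit(const struct demo_uevent_ops *ops,
                     const struct demo_obj *obj, const char *action,
                     struct demo_env *env)
{
    const char *subsys;

    assert(ops && obj && action && env);
    env->count = 0;
    if (ops->filter && !ops->filter(obj))
        return 0;
    subsys = ops->name ? ops->name(obj) : "unknown";
    if (env_add(env, "ACTION", action) ||
        env_add(env, "DEVPATH", obj->path) ||
        env_add(env, "SUBSYSTEM", subsys))
        return -1;
    if (ops->uevent && ops->uevent(obj, env) != 0)
        return -1;
    return env->count;
}

/* Hypothetical callbacks. */
static int demo_filter(const struct demo_obj *obj) { return !obj->hidden; }
static const char *demo_name(const struct demo_obj *obj)
{
    (void)obj;
    return "demo";
}
static int demo_uevent(const struct demo_obj *obj, struct demo_env *env)
{
    (void)obj;
    return env_add(env, "DEMO_KEY", "1");
}
static const struct demo_uevent_ops demo_ops = {
    demo_filter, demo_name, demo_uevent
};
```

With these callbacks, an "add" event on a visible object produces four variables (ACTION, DEVPATH, SUBSYSTEM, DEMO_KEY), while an object with hidden set produces none because the filter rejects it before anything is built.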

We will abbreviate the treatment of the following topics, due to in-class time limits:

Some of these will be encountered in examples, as we walk through bits of the Linux source tree on other topics.

Bus Kobjects

struct bus_type {
        const char              *name;
        struct bus_attribute    *bus_attrs;
        struct device_attribute *dev_attrs;
        struct driver_attribute *drv_attrs;
        int (*match)(struct device *dev, struct device_driver *drv);
        int (*uevent)(struct device *dev, struct kobj_uevent_env *env);
        int (*probe)(struct device *dev);
        int (*remove)(struct device *dev);
        void (*shutdown)(struct device *dev);
        int (*suspend)(struct device *dev, pm_message_t state);
        int (*resume)(struct device *dev);
        struct dev_pm_ops *pm;
        struct bus_type_private *p;
};

See the example of use in pci-driver.c.

struct bus_type pci_bus_type = {
        .name           = "pci",
        .match          = pci_bus_match,
        .uevent         = pci_uevent,
        .probe          = pci_device_probe,
        .remove         = pci_device_remove,
        .shutdown       = pci_device_shutdown,
        .dev_attrs      = pci_dev_attrs,
        .bus_attrs      = pci_bus_attrs,
        .pm             = PCI_PM_OPS_PTR,
};
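The match method is where a bus decides whether a given driver can handle a given device. For PCI this comes down to comparing the device's IDs against the driver's id_table. The following user-space sketch shows the general shape of such a table walk. It is a simplification under stated assumptions: only vendor and device IDs are compared, all demo_ names and ID values are hypothetical, and DEMO_ANY_ID is merely modeled on the PCI wildcard.

```c
#include <assert.h>
#include <stddef.h>   /* NULL */
#include <stdint.h>

#define DEMO_ANY_ID 0xffffffffu   /* wildcard entry, modeled on PCI_ANY_ID */

struct demo_device_id { uint32_t vendor, device; };
struct demo_dev       { uint32_t vendor, device; };

struct demo_driver {
    const char *name;
    const struct demo_device_id *id_table;  /* ends with an all-zero entry */
};

/* Return the first table entry that matches dev, or NULL if the driver
 * does not claim this device. */
static const struct demo_device_id *
demo_match(const struct demo_driver *drv, const struct demo_dev *dev)
{
    const struct demo_device_id *id;

    assert(drv != NULL && dev != NULL);
    if (!drv->id_table)
        return NULL;
    for (id = drv->id_table; id->vendor || id->device; id++) {
        if ((id->vendor == DEMO_ANY_ID || id->vendor == dev->vendor) &&
            (id->device == DEMO_ANY_ID || id->device == dev->device))
            return id;
    }
    return NULL;
}

/* Hypothetical driver: one exact entry, one wildcard-device entry. */
static const struct demo_device_id demo_ids[] = {
    { 0x1234, 0x0001 },
    { 0x5678, DEMO_ANY_ID },
    { 0, 0 }
};
static const struct demo_driver demo_drv = { "demo", demo_ids };
```

A device with IDs 0x1234/0x0001 matches the first entry, any device from vendor 0x5678 matches the second, and everything else is rejected. On a successful match the bus would go on to call the probe method.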

Device, Bus, Class, etc.

struct device {
        struct device           *parent;
        struct device_private   *p;
        struct kobject kobj;
        const char              *init_name; /* initial name of the device */
        struct device_type      *type;
        struct semaphore        sem; /* semaphore to synchronize calls to its driver */
        struct bus_type *bus; /* type of bus device is on */
        struct device_driver *driver; /* which driver has allocated this */
        void            *driver_data;   /* data private to the driver */
        void            *platform_data; /* platform specific data, device
                                           core doesn't touch it */
        struct dev_pm_info      power;
#ifdef CONFIG_NUMA
        int             numa_node;      /* NUMA node this device is close to */
#endif
        u64             *dma_mask;      /* dma mask (if dma'able device) */
        u64             coherent_dma_mask;
        struct device_dma_parameters *dma_parms;
        struct list_head        dma_pools;      /* dma pools (if dma'ble) */
        struct dma_coherent_mem *dma_mem; /* internal for coherent mem override */
        /* arch specific additions */
        struct dev_archdata     archdata;
        dev_t                   devt;   /* dev_t, creates the sysfs "dev" */
        spinlock_t              devres_lock;
        struct list_head        devres_head;
        struct klist_node       knode_class;
        struct class            *class;
        struct attribute_group  **groups;       /* optional groups */
        void    (*release)(struct device *dev);
};

struct device_driver {
        const char              *name;   
        struct bus_type         *bus;
        struct module           *owner;
        const char              *mod_name;      /* used for built-in modules */
        int (*probe) (struct device *dev);
        int (*remove) (struct device *dev);
        void (*shutdown) (struct device *dev);
        int (*suspend) (struct device *dev, pm_message_t state);
        int (*resume) (struct device *dev);
        struct attribute_group **groups;
        struct dev_pm_ops *pm;
        struct driver_private *p;
};

struct pci_driver {
        struct list_head node;
        char *name;
        const struct pci_device_id *id_table;   /* must be non-NULL for probe to be called */
        int  (*probe)  (struct pci_dev *dev, const struct pci_device_id *id);   /* New device inserted */
        void (*remove) (struct pci_dev *dev);   /* Device removed (NULL if not a hot-plug capable driver) */
        int  (*suspend) (struct pci_dev *dev, pm_message_t state);      /* Device suspended */
        int  (*suspend_late) (struct pci_dev *dev, pm_message_t state);
        int  (*resume_early) (struct pci_dev *dev);
        int  (*resume) (struct pci_dev *dev);                   /* Device woken up */
        void (*shutdown) (struct pci_dev *dev);
        struct pci_error_handlers *err_handler;
        struct device_driver    driver;
        struct pci_dynids dynids;
};

struct pci_dev {
        struct list_head global_list;   /* node in list of all PCI devices */
        struct pci_bus  *bus;  /* bus this device is on */
        struct pci_bus  *subordinate; /* bus this device bridges to */
        void            *sysdata;       /* hook for sys-specific extension */
        struct proc_dir_entry *procent; /* device entry in /proc/bus/pci */
        unsigned int    devfn;          /* encoded device & function index */
        unsigned short  vendor;
        unsigned short  device;
        unsigned short  subsystem_vendor;
        unsigned short  subsystem_device;
        unsigned int    class;          /* 3 bytes: (base,sub,prog-if) */
        u8              revision;       /* PCI revision, low byte of class word */
        u8              hdr_type;       /* PCI header type (`multi' flag masked out) */
        u8              pcie_type;      /* PCI-E device/port type */
        u8              rom_base_reg;   /* which config register controls the ROM */
        u8              pin;            /* which interrupt pin this device uses */
        struct pci_driver *driver; /* which driver has allocated this device */
        u64             dma_mask;       /* mask of the bits of bus address this implements */
        struct device_dma_parameters dma_parms;
        ...
};

struct sysdev_class {
        const char *name;
        struct list_head        drivers;
        /* Default operations for these types of devices */
        int     (*shutdown)(struct sys_device *);
        int     (*suspend)(struct sys_device *, pm_message_t state);
        int     (*resume)(struct sys_device *);
        struct kset kset;
};

struct sys_device {
        u32 id;
        struct sysdev_class   * cls;
        struct kobject          kobj;
};

struct bus_type_private {
        struct kset subsys;
        struct kset *drivers_kset;
        struct kset *devices_kset;
        struct klist klist_devices;
        struct klist klist_drivers;
        struct blocking_notifier_head bus_notifier;
        unsigned int drivers_autoprobe:1;
        struct bus_type *bus;
};

struct driver_private {
        struct kobject kobj;
        struct klist klist_devices;
        struct klist_node knode_bus;
        struct module_kobject *mkobj;
        struct device_driver *driver;
};

Kobject Relationships as of summer 2007

2007 kobject hierarchy diagram

Kobject Relationships as of summer 2008

2008 kobject hierarchy diagram

Kobject Relationships as of summer 2010

2010 kobject hierarchy diagram

This changed considerably from 2007 to 2008.

In the 2007 figure the black lines indicate containment of an instance of the lower structure within the upper structure. In the 2008 figure this relationship is shown only by nesting.

The dashed blue lines indicate fields of the source object that point to an object of the target class.

The dashed red lines in the 2007 figure indicate fields of the source object that are a kset of the target class.

The diagram is not complete. Left out are many other relationships, including fields that are sets of pointers to another class, fields that are lists of other classes of objects with pointers to another class, etc., etc.

This is an exceedingly (and, to me, probably unnecessarily) complex structure, with apparent redundancies and many apparent opportunities for inconsistencies and errors. It seems to be evolving, but it is unclear whether this evolution is converging and whether it is improving or just reflecting a change in personal styles of whoever currently has most influence over the code base. Specifically, from 2007 to 2008 we see the introduction of several new stylistic elements, including for example:

Are these trends? That is, will other structures be split into two parts, and will the remaining ksets be replaced by klists?

One should probably expect continued evolution in these details with future Linux kernel releases.

How It Plays Together: Adding a PCI Device

adding a pci device diagram

See kobject_uevent for more of the story.

The above diagram is an attempt to bring the LDD3 diagram up to date. I hope it is accurate. In any case it is likely to change again. If nothing else, this is another example of how volatile the kernel internals are.

You may want to also take a look at pci_scan_bus, which appears to be called from architecture-dependent code, and eventually calls the mechanism described above.

Hotplug

Two views:

When a PCI device is removed

/sbin/hotplug Utility

See default.hotplug and usb.agent, as well as kobject_uevent for more of the story.

/sbin/hotplug Environment Variables

Note that the earlier Linux "hotplug" utilities have been subsumed by a more general "userspace event" (see kobject_uevent) mechanism.

Note also that kobject_uevent is called in many places other than at kobject registration, and that call_usermodehelper() is called in still more places, including for purposes besides uevents.

Hotplug Scripts

udev

See Linux symposium paper for more details.

Loading Firmware (accessing user-space files)

Kernel Firmware Interface

#include <linux/firmware.h>
int request_firmware
   (const struct firmware **fw, char *name, struct device *device);

struct firmware {
   size_t size;
   u8 *data;
};
void release_firmware(struct firmware *fw);
int request_firmware_nowait
   (struct module *module,
    char *name,
    struct device *device,
    void *context,
    void (*cont)(const struct firmware *fw, void *context));