Linux Kernel & Device Driver Programming

Ch 14 - The Linux Device Model

 

Linux Grand Unified Device Model


The /sys filesystem is large. Executing "ls -R /sys" on the CS Dept. Web server resulted in 2414 lines of output.


Example: Device Model Entities for a USB Mouse Driver

[Figure: device model entities for a USB mouse driver]

Kobjects, Ksets, and Subsystems

struct kobject is used for:


Objects of type struct kobject are embedded within other types of objects to give those objects common capabilities, such as reference counting and a representation in sysfs. For example, see the declaration of type struct cdev, which contains a field kobj of this type.

If we know that kp is a pointer to a kobject that is embedded within a struct cdev object we can obtain a pointer to the containing object as follows:

struct cdev *device = container_of(kp, struct cdev, kobj);

Observe that container_of is not checked at run time. It will silently compute a bad pointer if kp does not actually point to the kobj field of an object of type struct cdev. In general, the pointer passed to container_of must point to the named field inside an object of the specified type.
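To see why container_of can recover the enclosing object, here is a minimal user-space sketch. The macro below is a simplified version: the kernel's own definition also uses typeof for a compile-time type check, but like this one it performs no run-time check. The types toy_kobject and toy_cdev are hypothetical stand-ins for struct kobject and struct cdev, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of: subtract the member's offset from the member's
 * address to get the address of the enclosing structure. No run-time check. */
#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct kobject. */
struct toy_kobject {
        int refcount;
};

/* Hypothetical stand-in for struct cdev, with the kobject embedded mid-struct. */
struct toy_cdev {
        int major;
        struct toy_kobject kobj;
        int minor;
};

/* Given only a pointer to the embedded kobject, recover the enclosing cdev. */
static struct toy_cdev *to_toy_cdev(struct toy_kobject *kp)
{
        assert(kp != NULL);
        return container_of(kp, struct toy_cdev, kobj);
}
```

For example, with struct toy_cdev c, to_toy_cdev(&c.kobj) yields &c. Passing the address of a toy_kobject that is not embedded in a toy_cdev compiles just as happily and returns a pointer to garbage, which is exactly the hazard described above.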


Kobject initialization


Reference Count Manipulation


Go back to the example above to see how this fits in.

The function cdev_alloc initializes the reference count to 1 by calling kobject_init, which calls kref_init.

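Therefore, the error-recovery code inside register_chrdev must release the struct cdev object by decrementing its reference count (with kobject_put), rather than by freeing it directly; the object is then freed when the count reaches zero.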

kobject_put calls kref_put and passes along the kobject_release function, which kref_put calls if the reference count goes to zero.

The kobject_release function calls the release method of the ktype of the specific kobject.

Go back and look at cdev_alloc to see where the ktype is set to ktype_cdev_dynamic.
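The reference-counting protocol described above (initialize to 1, get to add a user, put to drop one, release on the last put) can be sketched in user space as follows. This is a single-threaded model under stated assumptions: the kernel's struct kref uses an atomic counter, and toy_kref, toy_obj, and the toy_* functions are hypothetical names. Like the kernel's kref_put, toy_kref_put returns 1 when it released the object and 0 otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

/* Single-threaded model of struct kref (the real one is atomic). */
struct toy_kref {
        int refcount;
};

static void toy_kref_init(struct toy_kref *kref)
{
        kref->refcount = 1;            /* the creator holds the first reference */
}

static void toy_kref_get(struct toy_kref *kref)
{
        assert(kref->refcount > 0);    /* never revive an already-released object */
        kref->refcount++;
}

/* Drop one reference; on the last one, call release() and return 1. */
static int toy_kref_put(struct toy_kref *kref, void (*release)(struct toy_kref *))
{
        assert(kref->refcount > 0);
        if (--kref->refcount == 0) {
                release(kref);
                return 1;
        }
        return 0;
}

/* Hypothetical object embedding a refcount, standing in for struct cdev. */
struct toy_obj {
        int id;
        struct toy_kref ref;
};

static int toy_releases;               /* makes the release call observable */

static void toy_obj_release(struct toy_kref *kref)
{
        struct toy_obj *obj = container_of(kref, struct toy_obj, ref);
        toy_releases++;
        free(obj);
}
```

The kernel adds one more level of indirection: kobject_put passes kobject_release to kref_put, and kobject_release in turn calls the release method of the object's ktype.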


struct kobj_type


In the case of struct cdev, only the release method is specified.

static struct kobj_type ktype_cdev_dynamic = {
        .release        = cdev_dynamic_release,
};

A release method must always be specified. In this case it is:

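static void cdev_dynamic_release(struct kobject *kobj)
{
        /* recover the enclosing cdev from its embedded kobject */
        struct cdev *p = container_of(kobj, struct cdev, kobj);
        cdev_purge(p);          /* detach any inodes still pointing at this cdev */
        kfree(p);               /* free the object allocated by cdev_alloc */
}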
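The path from kobject_put to a type-specific release method can be modeled in user space as below. This is a sketch, not kernel code: toy_kobject, toy_kobj_type, toy_cdev_alloc, and the other toy_* names are hypothetical, the count is a plain int, and the toy release function omits the cdev_purge step that the real cdev_dynamic_release performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_kobject;

/* Stand-in for struct kobj_type: only the mandatory release method. */
struct toy_kobj_type {
        void (*release)(struct toy_kobject *kobj);
};

struct toy_kobject {
        int refcount;
        struct toy_kobj_type *ktype;
};

/* Models kobject_put -> kref_put -> kobject_release -> ktype->release. */
static void toy_kobject_put(struct toy_kobject *kobj)
{
        assert(kobj->ktype && kobj->ktype->release);   /* release is required */
        if (--kobj->refcount == 0)
                kobj->ktype->release(kobj);
}

struct toy_cdev {
        int count;
        struct toy_kobject kobj;
};

static int toy_cdev_releases;          /* makes the release call observable */

static void toy_cdev_dynamic_release(struct toy_kobject *kobj)
{
        struct toy_cdev *p = container_of(kobj, struct toy_cdev, kobj);
        toy_cdev_releases++;
        free(p);
}

static struct toy_kobj_type toy_ktype_cdev_dynamic = {
        .release = toy_cdev_dynamic_release,
};

/* Models what cdev_alloc does here: allocate, count = 1, attach the ktype. */
static struct toy_cdev *toy_cdev_alloc(void)
{
        struct toy_cdev *p = calloc(1, sizeof *p);
        if (p) {
                p->kobj.refcount = 1;
                p->kobj.ktype = &toy_ktype_cdev_dynamic;
        }
        return p;
}
```

The generic put code never needs to know that the object is a cdev; the ktype pointer set at allocation time is what routes the final put to the right destructor.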

Kobject Hierarchies

[Figure: kobject hierarchy]

It is possible (though not shown in the diagram) that the parent and kset pointers of a kobject could point to different objects.

It also seems possible (though not shown in the diagram) that an object could belong to more than one kset, in which case the kset pointer of its kobject could point to only one of them.
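The parent pointer is what places a kobject in the sysfs directory tree. The sketch below models two behaviors in user space: when a kobject is added with no explicit parent but with a kset, the kset's own kobject becomes its parent (as kobject_add does), and the object's path is built by walking parent pointers to the root (in the spirit of kobject_get_path). The toy_* types and functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_kset;

struct toy_kobject {
        const char *name;
        struct toy_kobject *parent;   /* determines the directory in sysfs */
        struct toy_kset *kset;        /* collection this object belongs to */
};

struct toy_kset {
        struct toy_kobject kobj;      /* a kset is itself a kobject */
};

/* With no explicit parent, the kset's own kobject becomes the parent. */
static void toy_kobject_add(struct toy_kobject *kobj, struct toy_kobject *parent)
{
        assert(kobj->name != NULL);
        if (!parent && kobj->kset)
                parent = &kobj->kset->kobj;
        kobj->parent = parent;
}

/* Build "/a/b/c" by walking parent pointers up to the root.
 * Returns 0 on success or -1 if buf is too small. */
static int toy_get_path(const struct toy_kobject *kobj, char *buf, size_t len)
{
        const struct toy_kobject *k;
        size_t need = 1;                          /* terminating NUL */
        size_t pos;

        for (k = kobj; k; k = k->parent)
                need += strlen(k->name) + 1;      /* "/" plus the name */
        if (need > len)
                return -1;

        pos = need - 1;
        buf[pos] = '\0';
        for (k = kobj; k; k = k->parent) {        /* fill from the right */
                size_t n = strlen(k->name);
                pos -= n;
                memcpy(buf + pos, k->name, n);
                buf[--pos] = '/';
        }
        return 0;
}
```

Because the parent and kset fields are set independently, nothing in this model (or, as noted above, in the real structure) forces them to refer to the same object.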


Ksets

struct kset {
        struct list_head list;
        spinlock_t list_lock;
        struct kobject kobj;
        struct kset_uevent_ops * uevent_ops;
};

kobject_register combines initialization and adding to a set, and kobject_unregister combines deletion from set and "put".

There have been many changes to this structure between kernels 2.6.16 and 2.6.25. For example, there was once a type struct subsystem, and a field subsys in struct kset; both have since been removed, leaving only the fields shown above.
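A kset keeps its members on an intrusive list: each kobject carries a list node (its entry field), and the kset holds the list head. The user-space sketch below models joining a kset and looking a member up by name, similar in spirit to kset_find_obj. It is an illustration under assumptions: the toy_* names are hypothetical, and the locking that the real kset does with list_lock is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

/* Minimal circular, intrusive list in the style of struct list_head. */
struct toy_list_head {
        struct toy_list_head *next, *prev;
};

static void toy_list_init(struct toy_list_head *head)
{
        head->next = head->prev = head;
}

static void toy_list_add_tail(struct toy_list_head *node, struct toy_list_head *head)
{
        node->prev = head->prev;
        node->next = head;
        head->prev->next = node;
        head->prev = node;
}

struct toy_kset;

struct toy_kobject {
        const char *name;
        struct toy_list_head entry;   /* link in the owning kset's list */
        struct toy_kset *kset;
};

struct toy_kset {
        struct toy_list_head list;    /* members; the real kset guards this with list_lock */
        struct toy_kobject kobj;      /* the kset's own kobject */
};

static void toy_kset_join(struct toy_kset *kset, struct toy_kobject *kobj)
{
        assert(kobj->kset == NULL);   /* this model allows one kset per kobject */
        kobj->kset = kset;
        toy_list_add_tail(&kobj->entry, &kset->list);
}

/* Linear search by name over the kset's member list. */
static struct toy_kobject *toy_kset_find(struct toy_kset *kset, const char *name)
{
        struct toy_list_head *p;

        for (p = kset->list.next; p != &kset->list; p = p->next) {
                struct toy_kobject *k = container_of(p, struct toy_kobject, entry);
                if (strcmp(k->name, name) == 0)
                        return k;
        }
        return NULL;
}
```

The list stores only list nodes; container_of converts each node back into the kobject that embeds it.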


Operations on Ksets

Operations are extensions of those on struct kobject:



Kobjects and Sysfs


Attributes

struct attribute {
         char           *name;  /* as it appears in a sysfs directory */
         struct module  *owner; /* module that implements the attribute */
         mode_t          mode;  /* file protection bits, e.g., S_IRUGO */
};

The kobj_types for subsystem cdev, including ktype_cdev_dynamic, do not specify any default attributes or provide any sysfs_ops to show or store them.

We can (finally) find an example of default attributes via a struct driver_attribute in the declaration of driver_attribute_serio_driver_attrs. This defines "description" and "bind_mode" attributes, and an example of a show method in serio_driver_show_description.
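The pattern behind attributes such as "description" is a typed wrapper around the generic struct attribute: sysfs hands the generic attribute to a show dispatcher, which uses container_of to recover the typed wrapper and calls its show method. The following user-space sketch models that dispatch. It simplifies under stated assumptions: in the kernel the dispatcher receives a struct kobject and recovers the driver from it as well, and show methods return ssize_t and write into a page-sized buffer. The toy_* names and the "Demo driver" string are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_attribute {
        const char *name;     /* file name in the sysfs directory */
        unsigned int mode;    /* permission bits, e.g. 0444 */
};

struct toy_driver {
        const char *name;
        const char *description;
};

/* Typed wrapper around the generic attribute, like struct driver_attribute. */
struct toy_driver_attribute {
        struct toy_attribute attr;
        int (*show)(struct toy_driver *drv, char *buf, size_t len);
};

/* Generic dispatcher in the role of a sysfs_ops show method: it sees only
 * the generic attribute and recovers the typed wrapper with container_of. */
static int toy_attr_show(struct toy_driver *drv, struct toy_attribute *attr,
                         char *buf, size_t len)
{
        struct toy_driver_attribute *da;

        assert(attr != NULL);
        da = container_of(attr, struct toy_driver_attribute, attr);
        return da->show ? da->show(drv, buf, len) : -1;
}

/* A show method that formats one value followed by a newline. */
static int toy_show_description(struct toy_driver *drv, char *buf, size_t len)
{
        return snprintf(buf, len, "%s\n",
                        drv->description ? drv->description : "(none)");
}

static struct toy_driver_attribute toy_driver_attrs[] = {
        { .attr = { .name = "description", .mode = 0444 },
          .show = toy_show_description },
};
```

Reading the file then amounts to toy_attr_show(&drv, &toy_driver_attrs[0].attr, buf, sizeof buf), which fills buf with the description and a trailing newline.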


Non-Default Attributes


For example, see fs_add_slot in linux/drivers/pci/hotplug/pci_hotplug_core.c.


Binary Attributes

struct bin_attribute {
        struct attribute        attr;
        size_t                  size;
        void                    *private;
        ssize_t (*read)(struct kobject *, char *, loff_t, size_t);
        ssize_t (*write)(struct kobject *, char *, loff_t, size_t);
        int (*mmap)(struct kobject *, struct bin_attribute *attr,
                    struct vm_area_struct *vma);
};
int sysfs_create_bin_file(struct kobject * kobj, struct bin_attribute * attr);
int sysfs_remove_bin_file(struct kobject * kobj, struct bin_attribute * attr);
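A binary attribute's read method is called with a file offset and a byte count, and it must respect the attribute's size. The following user-space sketch shows one common way to do that: return 0 at or past the end, clamp the count, and copy from a backing buffer. It is an illustrative model with a hypothetical name (toy_bin_read) that takes the backing data directly instead of a struct kobject; it is not the kernel's code for any particular attribute.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t, off_t */

/* Copy at most count bytes starting at off from a backing buffer of size
 * bytes; returns the number of bytes copied, or 0 at end of "file". */
static ssize_t toy_bin_read(const unsigned char *data, size_t size,
                            char *buf, off_t off, size_t count)
{
        assert(off >= 0);
        if ((size_t)off >= size)
                return 0;                     /* at or past the end: EOF */
        if (count > size - (size_t)off)
                count = size - (size_t)off;   /* clamp to the attribute size */
        memcpy(buf, data + off, count);
        return (ssize_t)count;
}
```

Returning a short count near the end and 0 at the end lets ordinary user-space tools such as cat read the file to completion.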

Symbolic Links

int sysfs_create_link(struct kobject * kobj, struct kobject * target, char * name);
void sysfs_remove_link(struct kobject * kobj, char * name);

Hotplug Event Generation

struct kset_hotplug_ops {
        int (*filter)(struct kset *kset, struct kobject *kobj);
        char *(*name)(struct kset *kset, struct kobject *kobj);
        int (*hotplug)(struct kset *kset, struct kobject *kobj, char **envp,
                       int num_envp, char *buffer, int buffer_size);
};

It is interesting to read through the code of kobject_hotplug to see how this all works.
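In outline, event generation asks the kset's filter whether to report the event at all, builds an environment of KEY=value strings (including ACTION, DEVPATH, and SUBSYSTEM), and then lets the kset's hook add more variables. In later kernels the structure above became struct kset_uevent_ops (the type referenced by struct kset earlier), and the hotplug method became uevent. The user-space sketch below models only that control flow. The toy_* names, buffer sizes, and the /devices/virtual/ filter are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_NUM_ENVP 8      /* max variables, including the NULL terminator */
#define TOY_ENV_BUF  256    /* storage for all "KEY=value" strings */

struct toy_uevent_env {
        char *envp[TOY_NUM_ENVP];
        int envp_idx;
        char buf[TOY_ENV_BUF];
        size_t buflen;
};

/* Append one "KEY=value" string, keeping envp NULL-terminated. */
static int toy_add_var(struct toy_uevent_env *env, const char *key, const char *val)
{
        size_t room = sizeof env->buf - env->buflen;
        int len;

        if (env->envp_idx >= TOY_NUM_ENVP - 1)
                return -1;                        /* no slot left for the terminator */
        len = snprintf(env->buf + env->buflen, room, "%s=%s", key, val);
        if (len < 0 || (size_t)len >= room)
                return -1;                        /* string would not fit */
        env->envp[env->envp_idx++] = env->buf + env->buflen;
        env->envp[env->envp_idx] = NULL;
        env->buflen += (size_t)len + 1;           /* step past the NUL */
        return 0;
}

/* Hypothetical per-kset hooks in the roles of filter and hotplug/uevent. */
struct toy_uevent_ops {
        int (*filter)(const char *devpath);        /* return 0 to suppress */
        int (*uevent)(struct toy_uevent_env *env); /* add extra variables */
};

/* Returns the number of variables built, 0 if filtered, -1 on overflow. */
static int toy_emit(const struct toy_uevent_ops *ops, const char *action,
                    const char *devpath, const char *subsystem,
                    struct toy_uevent_env *env)
{
        assert(ops != NULL && env != NULL);
        memset(env, 0, sizeof *env);
        if (ops->filter && !ops->filter(devpath))
                return 0;
        if (toy_add_var(env, "ACTION", action) ||
            toy_add_var(env, "DEVPATH", devpath) ||
            toy_add_var(env, "SUBSYSTEM", subsystem))
                return -1;
        if (ops->uevent && ops->uevent(env))
                return -1;
        return env->envp_idx;
}

/* Example filter: stay silent for objects under /devices/virtual/. */
static int toy_filter_no_virtual(const char *devpath)
{
        return strncmp(devpath, "/devices/virtual/", 17) != 0;
}

static const struct toy_uevent_ops toy_ops = { .filter = toy_filter_no_virtual };
```

In the kernel the resulting environment is what user space (the hotplug helper, or later udev) receives along with the event.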


We will abbreviate the treatment of the following topics, due to in-class time limits:

Some of these will be covered via examples, as we walk through bits of the Linux source tree.


Bus Kobjects

struct bus_type {
        char                    *name;
        struct bus_attribute    *bus_attrs;
        struct device_attribute *dev_attrs;
        struct driver_attribute *drv_attrs;

        int (*match)(struct device *dev, struct device_driver *drv);
        int (*uevent)(struct device *dev, struct kobj_uevent_env *env);
        int (*probe)(struct device *dev);
        int (*remove)(struct device *dev);
        void (*shutdown)(struct device *dev);
        int (*suspend)(struct device *dev, pm_message_t state);
        int (*suspend_late)(struct device *dev, pm_message_t state);
        int (*resume_early)(struct device *dev);
        int (*resume)(struct device *dev);
        struct bus_type_private *p;
};

See example of use in pci-driver.c:

struct bus_type pci_bus_type = {
        .name           = "pci",
        .uevent         = pci_uevent,
        .probe          = pci_device_probe,
        .remove         = pci_device_remove,
        .suspend        = pci_device_suspend,
        .suspend_late   = pci_device_suspend_late,
        .resume_early   = pci_device_resume_early,
        .resume         = pci_device_resume,
        .shutdown       = pci_device_shutdown,
        .dev_attrs      = pci_dev_attrs
};
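A bus's job when a device appears is to find a driver for it: test each registered driver with the match logic and, on a match, call probe; the first driver whose probe succeeds is bound to the device. The user-space sketch below models that loop with a PCI-style ID table and a wildcard in the spirit of PCI_ANY_ID. The toy_* types, the zero-terminated table convention, and the vendor/device numbers are hypothetical, and the real driver core does considerably more (locking, sysfs links, deferred cases).

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ANY_ID (~0u)   /* wildcard, in the spirit of PCI_ANY_ID */

struct toy_device_id {
        unsigned int vendor, device;
};

struct toy_device {
        unsigned int vendor, device;
        struct toy_driver *driver;              /* bound driver, or NULL */
};

struct toy_driver {
        const char *name;
        const struct toy_device_id *id_table;   /* ends with an all-zero entry */
        int (*probe)(struct toy_device *dev, const struct toy_device_id *id);
};

/* The "match" step: find the first table entry that fits the device. */
static const struct toy_device_id *toy_match_id(const struct toy_device_id *ids,
                                                const struct toy_device *dev)
{
        for (; ids->vendor || ids->device; ids++)
                if ((ids->vendor == TOY_ANY_ID || ids->vendor == dev->vendor) &&
                    (ids->device == TOY_ANY_ID || ids->device == dev->device))
                        return ids;
        return NULL;
}

/* Try each driver: match, then probe; bind the first one that succeeds. */
static int toy_bus_attach(struct toy_device *dev, struct toy_driver **drivers, size_t n)
{
        size_t i;

        assert(dev != NULL);
        for (i = 0; i < n; i++) {
                const struct toy_device_id *id = toy_match_id(drivers[i]->id_table, dev);
                if (id && drivers[i]->probe(dev, id) == 0) {
                        dev->driver = drivers[i];
                        return 0;
                }
        }
        return -1;                              /* no driver claimed the device */
}

/* A hypothetical driver that accepts every device it matches. */
static int toy_probe_ok(struct toy_device *dev, const struct toy_device_id *id)
{
        (void)dev;
        (void)id;
        return 0;
}

static const struct toy_device_id toy_nic_ids[] = { { 0x1234, 0x0001 }, { 0, 0 } };
static struct toy_driver toy_nic_driver = { "toy_nic", toy_nic_ids, toy_probe_ok };
```

Splitting match from probe lets the bus reject unsuitable drivers cheaply before any driver code touches the hardware.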

Device, Bus, Class, etc.

struct device {
        struct klist            klist_children;
        struct klist_node       knode_parent;  /* node in sibling list */
        struct klist_node       knode_driver;
        struct klist_node       knode_bus;
        struct device           *parent;
        struct kobject kobj;
        char bus_id[BUS_ID_SIZE];    /* position on parent bus */
        struct device_type      *type;
        unsigned                is_registered:1;
        unsigned                uevent_suppress:1;
        struct semaphore        sem;    /* semaphore to synchronize calls to
                                         * its driver.
                                         */
        struct bus_type *bus;           /* type of bus device is on */
        struct device_driver *driver;   /* which driver has allocated this
                                           device */
        void            *driver_data;   /* data private to the driver */
        void            *platform_data; /* Platform specific data, device
                                           core doesn't touch it */
        struct dev_pm_info      power;
#ifdef CONFIG_NUMA
        int             numa_node;      /* NUMA node this device is close to */
#endif
        u64             *dma_mask;      /* dma mask (if dma'able device) */
        u64             coherent_dma_mask;/* Like dma_mask, but for
                                             alloc_coherent mappings as
                                             not all hardware supports
                                             64 bit addresses for consistent
                                             allocations such descriptors. */
        struct device_dma_parameters *dma_parms;
        struct list_head        dma_pools;      /* dma pools (if dma'ble) */
        struct dma_coherent_mem *dma_mem; /* internal for coherent mem
                                             override */
        /* arch specific additions */
        struct dev_archdata     archdata;
        spinlock_t              devres_lock;
        struct list_head        devres_head;
        /* class_device migration path */
        struct list_head        node;
        struct class            *class;
        dev_t                   devt;   /* dev_t, creates the sysfs "dev" */
        struct attribute_group  **groups;       /* optional groups */
        void    (*release)(struct device *dev);
};

struct device_driver {
        struct bus_type         *bus;
        struct module           *owner;
        const char              *mod_name;      /* used for built-in modules */
        int (*probe) (struct device *dev);
        int (*remove) (struct device *dev);
        void (*shutdown) (struct device *dev);
        int (*suspend) (struct device *dev, pm_message_t state);
        int (*resume) (struct device *dev);
        struct attribute_group **groups;
        struct driver_private *p;
};

struct pci_driver {
        struct list_head node;
        char *name;
        const struct pci_device_id *id_table;   /* must be non-NULL for probe to be called */
        int  (*probe)  (struct pci_dev *dev, const struct pci_device_id *id);   /* New device inserted */
        void (*remove) (struct pci_dev *dev);   /* Device removed (NULL if not a hot-plug capable driver) */
        int  (*suspend) (struct pci_dev *dev, pm_message_t state);      /* Device suspended */
        int  (*suspend_late) (struct pci_dev *dev, pm_message_t state);
        int  (*resume_early) (struct pci_dev *dev);
        int  (*resume) (struct pci_dev *dev);                   /* Device woken up */
        void (*shutdown) (struct pci_dev *dev);
        struct pci_error_handlers *err_handler;
        struct device_driver    driver;
        struct pci_dynids dynids;
};

struct pci_dev {
        struct list_head global_list;   /* node in list of all PCI devices */
        struct list_head bus_list;      /* node in per-bus list */
        struct pci_bus  *bus;           /* bus this device is on */
        struct pci_bus  *subordinate;   /* bus this device bridges to */

        void            *sysdata;       /* hook for sys-specific extension */
        struct proc_dir_entry *procent; /* device entry in /proc/bus/pci */

        unsigned int    devfn;          /* encoded device & function index */
        unsigned short  vendor;
        unsigned short  device;
        unsigned short  subsystem_vendor;
        unsigned short  subsystem_device;
        unsigned int    class;          /* 3 bytes: (base,sub,prog-if) */
        u8              revision;       /* PCI revision, low byte of class word */
        u8              hdr_type;       /* PCI header type (`multi' flag masked out) */
        u8              pcie_type;      /* PCI-E device/port type */
        u8              rom_base_reg;   /* which config register controls the ROM */
        u8              pin;            /* which interrupt pin this device uses */

        struct pci_driver *driver;      /* which driver has allocated this device */
        u64             dma_mask;       /* Mask of the bits of bus address this
                                           device implements.  Normally this is
                                           0xffffffff.  You only need to change
                                           this if your device has broken DMA
                                           or supports 64-bit transfers.  */

        struct device_dma_parameters dma_parms;

        pci_power_t     current_state;  /* Current operating state. In ACPI-speak,
                                           this is D0-D3, D0 being fully functional,
                                           and D3 being off. */

        pci_channel_state_t error_state;        /* current connectivity state */
        struct  device  dev;            /* Generic device interface */

        int             cfg_size;       /* Size of configuration space */

        /*
         * Instead of touching interrupt line and base address registers
         * directly, use the values stored here. They might be different!
         */
        unsigned int    irq;
        struct resource resource[DEVICE_COUNT_RESOURCE]; /* I/O and memory regions + expansion ROMs */

        /* These fields are used by common fixups */
        unsigned int    transparent:1;  /* Transparent PCI bridge */
        unsigned int    multifunction:1;/* Part of multi-function device */
        /* keep track of device state */
        unsigned int    is_busmaster:1; /* device is busmaster */
        unsigned int    no_msi:1;       /* device may not use msi */
        unsigned int    no_d1d2:1;      /* only allow d0 or d3 */
        unsigned int    block_ucfg_access:1;    /* userspace config space access is blocked */
        unsigned int    broken_parity_status:1; /* Device generates false positive parity */
        unsigned int    msi_enabled:1;
        unsigned int    msix_enabled:1;
        unsigned int    is_managed:1;
        unsigned int    is_pcie:1;
        pci_dev_flags_t dev_flags;
        atomic_t        enable_cnt;     /* pci_enable_device has been called */

        u32             saved_config_space[16]; /* config space saved at suspend time */
        struct hlist_head saved_cap_space;
        struct bin_attribute *rom_attr; /* attribute descriptor for sysfs ROM entry */
        int rom_attr_enabled;           /* has display of the rom attribute been enabled? */
        struct bin_attribute *res_attr[DEVICE_COUNT_RESOURCE]; /* sysfs file for resources */
#ifdef CONFIG_PCI_MSI
        struct list_head msi_list;
#endif
};
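Note that struct pci_dev embeds a struct device (field dev) and struct pci_driver embeds a struct device_driver (field driver). The generic driver core handles only the embedded generic parts, and PCI code converts back with container_of, via the kernel's to_pci_dev and to_pci_driver macros. The user-space sketch below models a bus-level probe loosely inspired by pci_device_probe: it receives a generic device and calls the PCI driver's own probe. All toy_* types are hypothetical cut-down stand-ins, and the real function also performs ID matching and other work omitted here.

```c
#include <assert.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_device_driver;

struct toy_device {                      /* stand-in for struct device */
        struct toy_device_driver *driver;
};

struct toy_device_driver {               /* stand-in for struct device_driver */
        const char *name;
};

struct toy_pci_dev {                     /* stand-in for struct pci_dev */
        unsigned short vendor, device;
        struct toy_device dev;           /* embedded generic device */
};

struct toy_pci_driver {                  /* stand-in for struct pci_driver */
        int (*probe)(struct toy_pci_dev *pdev);
        struct toy_device_driver driver; /* embedded generic driver */
};

/* Same idea as the kernel's to_pci_dev() and to_pci_driver() macros. */
#define toy_to_pci_dev(d)    container_of(d, struct toy_pci_dev, dev)
#define toy_to_pci_driver(d) container_of(d, struct toy_pci_driver, driver)

/* The generic core knows only struct toy_device; the bus-level probe
 * converts to the PCI types and calls the PCI driver's own probe. */
static int toy_pci_device_probe(struct toy_device *dev)
{
        struct toy_pci_dev *pdev = toy_to_pci_dev(dev);
        struct toy_pci_driver *pdrv;

        assert(dev->driver != NULL);
        pdrv = toy_to_pci_driver(dev->driver);
        return pdrv->probe ? pdrv->probe(pdev) : -1;
}

static unsigned short toy_last_probed_vendor;

/* A hypothetical PCI driver probe that records which device it saw. */
static int toy_demo_probe(struct toy_pci_dev *pdev)
{
        toy_last_probed_vendor = pdev->vendor;
        return 0;
}
```

The same two-level embedding repeats throughout the structures listed here, which is one reason the relationships between them are hard to follow.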

struct class_device {
        struct list_head        node;
        struct kobject          kobj;
        struct class            *class;
        dev_t                   devt;
        struct device           *dev;
        void                    *class_data;
        struct class_device     *parent;
        struct attribute_group  **groups;

        void (*release)(struct class_device *dev);
        int (*uevent)(struct class_device *dev, struct kobj_uevent_env *env);
        char class_id[BUS_ID_SIZE];
};

struct sysdev_class {
        const char *name;
        struct list_head        drivers;

        /* Default operations for these types of devices */
        int     (*shutdown)(struct sys_device *);
        int     (*suspend)(struct sys_device *, pm_message_t state);
        int     (*resume)(struct sys_device *);
        struct kset kset;
};

struct sys_device {
        u32 id;
        struct sysdev_class     * cls;
        struct kobject          kobj;
};

struct bus_type_private {
        struct kset *drivers_kset;
        struct kset *devices_kset;
        struct klist klist_devices;
        struct klist klist_drivers;
        struct blocking_notifier_head bus_notifier;
        unsigned int drivers_autoprobe:1;
        struct bus_type *bus;
};

struct driver_private {
        struct kobject kobj;
        struct klist klist_devices;
        struct klist_node knode_bus;
        struct module_kobject *mkobj;
        struct device_driver *driver;
};

Kobject Relationships as of summer 2007 (now obsolete)

[Figure: kobject relationships, summer 2007]

Kobject Relationships as of summer 2008

[Figure: kobject relationships, summer 2008]

This changed considerably from 2007 to 2008.

In the 2007 figure the black lines indicate containment of an instance of the lower structure within the upper structure. In the 2008 figure this relationship is shown only by nesting.

The dashed blue lines indicate fields of the source object that point to an object of the target class.

The dashed red lines in the 2007 figure indicate fields of the source object that are a kset of the target class.

The diagram is not complete. Left out are many other relationships, including fields that are sets of pointers to another class, fields that are lists of other classes of objects with pointers to another class, etc., etc.

This is an exceedingly (and, to me, probably unnecessarily) complex structure, with apparent redundancies and many apparent opportunities for inconsistencies and errors. It seems to be evolving, but it is unclear whether this evolution is converging and whether it is improving or just reflecting a change in personal styles of whoever currently has most influence over the code base. Specifically, from 2007 to 2008 we see the introduction of several new stylistic elements, including for example:

Are these trends? That is, will other structures be split into two parts, and will the remaining ksets be replaced by klists?

One should probably expect continued evolution in these details with future Linux kernel releases.


How It Plays Together: Adding a PCI Device

[Figure: adding a PCI device]

The following is a partial trace of what happens, with links to the code. I have not found the place where device_register is called for ordinary PCI devices.


Take a look at pci_scan_bus, which appears to be called from architecture-dependent code.


... Still could add more material here!...


Hotplug

two views:


When a PCI device is removed


/sbin/hotplug Utility


Note that the "hotplug" utilities have been subsumed by a more general "userspace event" (see kobject_uevent) mechanism.


Hotplug Scripts


udev

See the Linux Symposium paper on udev for more details.


Loading Firmware (accessing user-space files)


Kernel Firmware Interface

#include <linux/firmware.h>
int request_firmware
   (const struct firmware **fw, char *name, struct device *device);
struct firmware {
   size_t size;
   u8 *data;
};
void release_firmware(struct firmware *fw);
int request_firmware_nowait
   (struct module *module,
    char *name,
    struct device *device,
    void *context,
    void (*cont)(const struct firmware *fw, void *context));
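The driver-side pattern for request_firmware is: request the image by name, check the result (0 on success, a negative errno on failure), validate and copy what is needed, and call release_firmware on every path once the data is no longer needed. The user-space sketch below models that pattern. The in-memory file table stands in for the file that user space would supply to the kernel, and all toy_* names, the file name, and the image bytes are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct toy_firmware {
        size_t size;
        unsigned char *data;
};

/* Hypothetical in-memory stand-in for the files user space would supply. */
static const unsigned char toy_blob[] = { 0x01, 0x02, 0x03, 0x04 };
static const struct {
        const char *name;
        const unsigned char *bytes;
        size_t size;
} toy_fw_files[] = {
        { "toy_dev.bin", toy_blob, sizeof toy_blob },
};

/* Models request_firmware: 0 and a filled-in *fw on success, -errno on failure. */
static int toy_request_firmware(const struct toy_firmware **fw, const char *name)
{
        size_t i;

        assert(fw != NULL && name != NULL);
        *fw = NULL;
        for (i = 0; i < sizeof toy_fw_files / sizeof toy_fw_files[0]; i++) {
                struct toy_firmware *f;

                if (strcmp(toy_fw_files[i].name, name) != 0)
                        continue;
                f = malloc(sizeof *f);
                if (!f)
                        return -ENOMEM;
                f->size = toy_fw_files[i].size;
                f->data = malloc(f->size);
                if (!f->data) {
                        free(f);
                        return -ENOMEM;
                }
                memcpy(f->data, toy_fw_files[i].bytes, f->size);
                *fw = f;
                return 0;
        }
        return -ENOENT;                         /* no such firmware file */
}

static void toy_release_firmware(const struct toy_firmware *fw)
{
        if (fw) {
                free(fw->data);
                free((void *)fw);
        }
}

/* Driver-side pattern: request, validate, copy what is needed, release. */
static int toy_load_firmware(const char *name, unsigned char *dst, size_t dst_len)
{
        const struct toy_firmware *fw;
        int err = toy_request_firmware(&fw, name);

        if (err)
                return err;
        if (fw->size > dst_len)
                err = -EINVAL;                  /* image too large for the device */
        else
                memcpy(dst, fw->data, fw->size);
        toy_release_firmware(fw);               /* released on every path */
        return err;
}
```

request_firmware_nowait follows the same request, use, release discipline, except that the result arrives later in the cont callback together with the context pointer.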
© 2005 T. P. Baker ($Id: ch14.html,v 1.1 2007/06/05 16:14:32 baker Exp baker $)