Linux Kernel & Device Driver Programming
The /sys filesystem is large. Executing "ls -R /sys" on the CS Dept. Web server resulted in 2414 lines of output.
struct kobject is used for:
Objects of type struct kobject are embedded within other types of objects, to provide capabilities. For example, see the declaration of type struct cdev, which contains a field kobj of this type.
If we know that kp is a pointer to a kobject that is embedded within a struct cdev object we can obtain a pointer to the containing object as follows:
struct cdev *device = container_of(kp, struct cdev, kobj);
Observe that container_of performs no checking. It gives a meaningless result if kp does not point to a kobject that is embedded in an object of the specified type, in the field with the specified name.
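The pointer arithmetic behind container_of can be tried out in user space. The following is a minimal sketch, not the kernel's own definition: the macro is rebuilt from offsetof, and struct mock_kobject, struct mock_cdev, cdev_from_kobj and container_of_demo are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */

/* Userspace stand-in for the kernel macro; nothing here checks the type. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for struct kobject and struct cdev. */
struct mock_kobject {
    int refcount;
};

struct mock_cdev {
    int major;
    struct mock_kobject kobj;   /* embedded in the object, not pointed to */
};

/* Recover the enclosing mock_cdev from a pointer to its kobj field. */
struct mock_cdev *cdev_from_kobj(struct mock_kobject *kp)
{
    return container_of(kp, struct mock_cdev, kobj);
}

/* Self-check: returns the major number reached through the kobj pointer. */
int container_of_demo(void)
{
    struct mock_cdev dev = { .major = 42, .kobj = { .refcount = 1 } };
    struct mock_cdev *back = cdev_from_kobj(&dev.kobj);

    assert(back == &dev);
    return back->major;
}
```

If cdev_from_kobj were handed a mock_kobject that is not inside a mock_cdev, the subtraction would still be carried out and the result would point at unrelated memory, which is the hazard described above.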
void kobject_init (struct kobject *kobj);
The kobj field of struct cdev is initialized in cdev_alloc:
memset(p, 0, sizeof(struct cdev));
p->kobj.ktype = &ktype_cdev_dynamic;
INIT_LIST_HEAD(&p->list);
kobject_init(&p->kobj);
void kobject_set_name (struct kobject *kobj, const char *format, ...);
The kobj field of struct cdev is named in register_chrdev:
...
kobject_set_name(&cdev->kobj, "%s", name);
for (s = strchr(kobject_name(&cdev->kobj),'/'); s; s = strchr(s, '/'))
    *s = '!';
err = cdev_add(cdev, MKDEV(cd->major, 0), 256);
if (err)
    goto out;
cd->cdev = cdev;
return major ? 0 : cd->major;
out:
kobject_put(&cdev->kobj);
...
struct kobject *kobject_get(struct kobject *kobj);
Increments the reference count.
void kobject_put(struct kobject *kobj);
Calls kref_put, which decrements the reference count of this object and cleans up if the reference count goes to zero.
void my_object_release(struct kobject *kobj);
One of these must be defined for each type of kobject.
Go back to the example above to see how this fits in.
The function cdev_alloc initializes the reference count to 1 by calling kobject_init, which calls kref_init.
Therefore, the error-recovery code inside register_chrdev must give up that reference by calling kobject_put, rather than freeing the struct cdev object directly.
kobject_put calls kref_put and passes along the kobject_release function, which kref_put calls if the reference count goes to zero.
The kobject_release function calls the release method of the ktype of the specific kobject.
Go back and look at cdev_alloc to see where the ktype is set to ktype_cdev_dynamic.
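The chain just described (allocation sets the count to 1, put decrements it, and the ktype's release method runs at zero) can be mimicked in user space. This is only a sketch under simplifying assumptions: a plain int replaces the atomic kref, and every mock_ name, release_calls and refcount_demo are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct mock_kobject;

/* Stand-in for struct kobj_type: only the release method is modelled. */
struct mock_ktype {
    void (*release)(struct mock_kobject *kobj);
};

/* Stand-in for struct kobject: a plain int replaces the atomic kref. */
struct mock_kobject {
    int refcount;
    const struct mock_ktype *ktype;
};

struct mock_cdev {
    int major;
    struct mock_kobject kobj;
};

static int release_calls;   /* counts releases so the demo can check them */

/* Plays the role of cdev_dynamic_release: free the containing object. */
static void mock_cdev_release(struct mock_kobject *kobj)
{
    struct mock_cdev *p = container_of(kobj, struct mock_cdev, kobj);

    release_calls++;
    free(p);
}

static const struct mock_ktype mock_ktype_cdev = {
    .release = mock_cdev_release,
};

/* Like cdev_alloc: the new object starts with a reference count of 1. */
struct mock_cdev *mock_cdev_alloc(void)
{
    struct mock_cdev *p = calloc(1, sizeof(*p));

    if (!p)
        return NULL;
    p->kobj.ktype = &mock_ktype_cdev;
    p->kobj.refcount = 1;
    return p;
}

struct mock_kobject *mock_kobject_get(struct mock_kobject *kobj)
{
    kobj->refcount++;
    return kobj;
}

/* Drop one reference; the ktype's release runs when the count hits zero. */
void mock_kobject_put(struct mock_kobject *kobj)
{
    if (--kobj->refcount == 0)
        kobj->ktype->release(kobj);
}

/* Returns how often release ran over one alloc/get/put/put cycle. */
int refcount_demo(void)
{
    struct mock_cdev *p = mock_cdev_alloc();

    assert(p != NULL);
    release_calls = 0;
    mock_kobject_get(&p->kobj);   /* count: 2 */
    mock_kobject_put(&p->kobj);   /* count: 1, object still alive */
    assert(release_calls == 0);
    mock_kobject_put(&p->kobj);   /* count: 0, release frees p */
    return release_calls;
}
```

The caller never frees the object itself. The last put does that through the release method, which is why an error path only needs to drop its reference.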
struct kobj_type is used for:
struct kobj_type {
void (*release)(struct kobject *);
struct sysfs_ops * sysfs_ops;
struct attribute ** default_attrs;
};
struct kobj_type *get_ktype(struct kobject *kobj);
In the case of struct cdev, only the release method is specified.
static struct kobj_type ktype_cdev_dynamic = {
.release = cdev_dynamic_release,
};
A release method must always be specified. In this case it is:
static void cdev_dynamic_release(struct kobject *kobj)
{
struct cdev *p = container_of(kobj, struct cdev, kobj);
cdev_purge(p);
kfree(p);
}
It is possible (though not shown in the diagram) that the parent and kset pointers of a kobject could point to different objects.
It also seems possible (though not shown in the diagram) for an object to belong to more than one kset, in which case the kset pointer of the kobject could point to only one of the ksets.
struct kset {
struct list_head list;
spinlock_t list_lock;
struct kobject kobj;
struct kset_uevent_ops * uevent_ops;
};
int kobject_add(struct kobject *kobj);
Adds the kobject to the set specified by its kset field.
void kobject_del(struct kobject *kobj);
kobject_register combines initialization and adding to a set, and kobject_unregister combines deletion from set and "put".
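How the four operations relate can be shown with a toy model. The sketch below is a userspace mock, not kernel code: a singly linked list stands in for the kset's list_head, no locking or sysfs work is done, an object without a kset is simply rejected to keep things short, and all mock_ names, the in_set flag and kset_demo are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct mock_kset;

struct mock_kobject {
    int refcount;
    int in_set;                  /* hypothetical flag: currently linked? */
    struct mock_kset *kset;      /* set this object should join */
    struct mock_kobject *next;   /* singly linked stand-in for list_head */
};

struct mock_kset {
    struct mock_kobject *head;
    int count;
};

/* Initialize the bookkeeping fields; the caller sets kset beforehand. */
void mock_kobject_init(struct mock_kobject *kobj)
{
    kobj->refcount = 1;
    kobj->in_set = 0;
    kobj->next = NULL;
}

/* Link kobj into the set named by its kset field. */
int mock_kobject_add(struct mock_kobject *kobj)
{
    if (!kobj->kset || kobj->in_set)
        return -1;   /* simplification made by this sketch */
    kobj->next = kobj->kset->head;
    kobj->kset->head = kobj;
    kobj->kset->count++;
    kobj->in_set = 1;
    return 0;
}

/* Unlink kobj from its set; the reference count is left alone. */
void mock_kobject_del(struct mock_kobject *kobj)
{
    struct mock_kobject **pp;

    if (!kobj->in_set)
        return;
    for (pp = &kobj->kset->head; *pp; pp = &(*pp)->next) {
        if (*pp == kobj) {
            *pp = kobj->next;
            kobj->kset->count--;
            break;
        }
    }
    kobj->next = NULL;
    kobj->in_set = 0;
}

void mock_kobject_put(struct mock_kobject *kobj)
{
    kobj->refcount--;   /* a real put would call release at zero */
}

/* register = init + add */
int mock_kobject_register(struct mock_kobject *kobj)
{
    mock_kobject_init(kobj);
    return mock_kobject_add(kobj);
}

/* unregister = del + put */
void mock_kobject_unregister(struct mock_kobject *kobj)
{
    mock_kobject_del(kobj);
    mock_kobject_put(kobj);
}

/* Registers two objects, unregisters one, returns the remaining count. */
int kset_demo(void)
{
    struct mock_kset set = { NULL, 0 };
    struct mock_kobject a = { .kset = &set };
    struct mock_kobject b = { .kset = &set };

    assert(mock_kobject_register(&a) == 0);
    assert(mock_kobject_register(&b) == 0);
    assert(set.count == 2);
    mock_kobject_unregister(&a);
    assert(a.refcount == 0);
    return set.count;   /* only b remains */
}
```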
There have been many changes to struct kset between kernels 2.6.16 and 2.6.25. For example, there was once a type struct subsystem, and a field subsys in struct kset. In its place there is now the field list.
Operations are extensions of those on struct kobject:
struct attribute {
char *name; /* as it appears in a sysfs directory */
struct module *owner; /* module that implements the attribute */
mode_t mode; /* file protection bits, e.g., S_IRUGO */
};
struct sysfs_ops {
ssize_t (*show)(struct kobject *, struct attribute *,char *);
ssize_t (*store)(struct kobject *,struct attribute *,const char *, size_t);
};
The kobj_types for subsystem cdev, including ktype_cdev_dynamic, do not specify any default attributes or provide any sysfs_ops to show or store them.
We can (finally) find an example of default attributes via a struct driver_attribute in the declaration of driver_attribute_serio_driver_attrs. This defines "description" and "bind_mode" attributes, and an example of a show method in serio_driver_show_description.
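The way a single show method in sysfs_ops serves many attributes can be sketched in user space: each specific attribute wraps the generic one and carries its own show function, and the generic method finds the wrapper with container_of. This is an illustration under assumptions, not the serio code: all mock_ names, attr_description, show_description and show_demo are invented, long stands in for ssize_t, and MOCK_PAGE_SIZE is an assumed buffer size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define MOCK_PAGE_SIZE 4096   /* assumed size of the buffer given to show */

struct mock_kobject {
    const char *name;
};

struct mock_attribute {
    const char *name;    /* file name in the sysfs directory */
    unsigned int mode;   /* permission bits, e.g. 0444 */
};

/* A driver-specific attribute wraps the generic one and adds a method. */
struct mock_driver_attribute {
    struct mock_attribute attr;
    long (*show)(struct mock_kobject *kobj, char *buf);
};

/* Plays the role of sysfs_ops.show: dispatch to the wrapper's method. */
long mock_sysfs_show(struct mock_kobject *kobj,
                     struct mock_attribute *attr, char *buf)
{
    struct mock_driver_attribute *da =
        container_of(attr, struct mock_driver_attribute, attr);

    if (!da->show)
        return -1;   /* the kernel would return an errno value here */
    return da->show(kobj, buf);
}

/* One concrete show method: format a line of text into buf. */
static long show_description(struct mock_kobject *kobj, char *buf)
{
    return snprintf(buf, MOCK_PAGE_SIZE, "%s driver\n", kobj->name);
}

struct mock_driver_attribute attr_description = {
    .attr = { .name = "description", .mode = 0444 },
    .show = show_description,
};

/* Reads the attribute into out and returns the number of bytes shown. */
int show_demo(char *out)
{
    struct mock_kobject kobj = { .name = "serio" };
    long n = mock_sysfs_show(&kobj, &attr_description.attr, out);

    assert(n == (long)strlen(out));
    return (int)n;
}
```

A store method would follow the same pattern in the other direction, receiving the text written by the user together with its length.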
int sysfs_create_file(struct kobject *kobj, struct attribute *attr);
Creates a sysfs attribute entry for attr under kobj.
int sysfs_remove_file(struct kobject *kobj, struct attribute *attr);
Removes the sysfs attribute entry for attr under kobj.
For example, see fs_add_slot in linux/drivers/pci/hotplug/pci_hotplug_core.c.
struct bin_attribute {
struct attribute attr;
size_t size;
void *private;
ssize_t (*read)(struct kobject *, char *, loff_t, size_t);
ssize_t (*write)(struct kobject *, char *, loff_t, size_t);
int (*mmap)(struct kobject *, struct bin_attribute *attr,
struct vm_area_struct *vma);
};
int sysfs_create_bin_file(struct kobject * kobj, struct bin_attribute * attr);
int sysfs_remove_bin_file(struct kobject * kobj, struct bin_attribute * attr);
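Unlike show, the read method of a binary attribute receives an offset and a count, so it has to handle partial reads. The following userspace sketch shows the usual clamping logic for such a handler. It is an assumption about how one might be written, not code from the kernel: struct mock_bin_attribute, its image field, mock_bin_read and bin_read_demo are hypothetical, and long stands in for loff_t and ssize_t.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a binary attribute backed by a memory image. */
struct mock_bin_attribute {
    const char *name;
    size_t size;                  /* total size of the binary file */
    const unsigned char *image;   /* backing data, e.g. a copy of a ROM */
};

/* Copy up to count bytes starting at off; returns bytes copied, 0 at end. */
long mock_bin_read(const struct mock_bin_attribute *attr,
                   char *buf, long off, size_t count)
{
    if (off < 0)
        return -1;
    if ((size_t)off >= attr->size)
        return 0;                              /* nothing left to read */
    if (count > attr->size - (size_t)off)
        count = attr->size - (size_t)off;      /* clamp to the file end */
    memcpy(buf, attr->image + off, count);
    return (long)count;
}

/* Reads past the end of a 5-byte image; returns the clamped byte count. */
int bin_read_demo(void)
{
    static const unsigned char rom[] = { 0x55, 0xAA, 0x01, 0x02, 0x03 };
    struct mock_bin_attribute attr = { "rom", sizeof(rom), rom };
    char buf[8];

    assert(mock_bin_read(&attr, buf, 0, 2) == 2);
    assert((unsigned char)buf[1] == 0xAA);
    assert(mock_bin_read(&attr, buf, 5, 4) == 0);
    return (int)mock_bin_read(&attr, buf, 3, sizeof(buf));
}
```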
int sysfs_create_link(struct kobject * kobj, struct kobject * target, char * name);
void sysfs_remove_link(struct kobject * kobj, char * name);
struct kset_hotplug_ops {
int (*filter)(struct kset *kset, struct kobject *kobj);
char *(*name)(struct kset *kset, struct kobject *kobj);
int (*hotplug)(struct kset *kset, struct kobject *kobj, char **envp,
int num_envp, char *buffer, int buffer_size);
};
It is interesting to read through the code of kobject_hotplug to see how this all works.
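As a rough guide for that reading, the role of the three methods can be sketched in user space: filter may suppress the event, name may override the subsystem name, and hotplug may append extra environment variables. This is a simplified mock and does not reproduce kobject_hotplug: the environment is built as one space-separated string, no helper program is run, the handling of a missing kset or missing ops is a choice made by this sketch, and all mock_ names, demo_ops, filter_skip_hidden, add_name_var and hotplug_demo are invented.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct mock_kset;

struct mock_kobject {
    const char *name;
    struct mock_kset *kset;
};

struct mock_hotplug_ops {
    int (*filter)(struct mock_kset *kset, struct mock_kobject *kobj);
    const char *(*name)(struct mock_kset *kset, struct mock_kobject *kobj);
    int (*hotplug)(struct mock_kset *kset, struct mock_kobject *kobj,
                   char *buffer, int buffer_size);
};

struct mock_kset {
    const char *name;
    const struct mock_hotplug_ops *ops;
};

/* Returns 1 if an event would be sent, 0 if suppressed, -1 on error. */
int mock_hotplug_event(struct mock_kobject *kobj, const char *action,
                       char *env, int env_size)
{
    struct mock_kset *kset = kobj->kset;
    const struct mock_hotplug_ops *ops;
    const char *subsys;
    int used;

    if (!kset)
        return 0;                          /* no kset: no event here */
    ops = kset->ops;
    if (ops && ops->filter && !ops->filter(kset, kobj))
        return 0;                          /* filter returned 0: drop it */
    subsys = (ops && ops->name) ? ops->name(kset, kobj) : kset->name;
    used = snprintf(env, env_size, "ACTION=%s SUBSYSTEM=%s", action, subsys);
    if (used < 0 || used >= env_size)
        return -1;
    if (ops && ops->hotplug &&
        ops->hotplug(kset, kobj, env + used, env_size - used) != 0)
        return -1;                         /* callback ran out of room */
    return 1;
}

/* Example filter: suppress events for names starting with a dot. */
static int filter_skip_hidden(struct mock_kset *kset, struct mock_kobject *kobj)
{
    (void)kset;
    return kobj->name[0] != '.';
}

/* Example hotplug method: append one extra variable. */
static int add_name_var(struct mock_kset *kset, struct mock_kobject *kobj,
                        char *buffer, int buffer_size)
{
    int n;

    (void)kset;
    n = snprintf(buffer, buffer_size, " NAME=%s", kobj->name);
    return (n < 0 || n >= buffer_size) ? -1 : 0;
}

static const struct mock_hotplug_ops demo_ops = {
    .filter = filter_skip_hidden,
    .name = NULL,              /* fall back to the kset's own name */
    .hotplug = add_name_var,
};

/* One suppressed event, then one delivered event written into env. */
int hotplug_demo(char *env, int env_size)
{
    struct mock_kset set = { "block", &demo_ops };
    struct mock_kobject hidden = { ".tmp", &set };
    struct mock_kobject disk = { "sda", &set };
    int rc;

    assert(mock_hotplug_event(&hidden, "add", env, env_size) == 0);
    rc = mock_hotplug_event(&disk, "add", env, env_size);
    assert(rc != 1 || strstr(env, "SUBSYSTEM=block") != NULL);
    return rc;
}
```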
We will abbreviate the treatment of the following topics, due to in-class time limits:
Some of these will be covered via examples, as we walk through bits of the Linux source tree.
struct bus_type {
char *name;
struct bus_attribute *bus_attrs;
struct device_attribute *dev_attrs;
struct driver_attribute *drv_attrs;
int (*match)(struct device *dev, struct device_driver *drv);
int (*uevent)(struct device *dev, struct kobj_uevent_env *env);
int (*probe)(struct device *dev);
int (*remove)(struct device *dev);
void (*shutdown)(struct device *dev);
int (*suspend)(struct device *dev, pm_message_t state);
int (*suspend_late)(struct device *dev, pm_message_t state);
int (*resume_early)(struct device *dev);
int (*resume)(struct device *dev);
struct bus_type_private *p;
};
See example of use in pci-driver.c:
struct bus_type pci_bus_type = {
.name = "pci",
.uevent = pci_uevent,
.probe = pci_device_probe,
.remove = pci_device_remove,
.suspend = pci_device_suspend,
.suspend_late = pci_device_suspend_late,
.resume_early = pci_device_resume_early,
.resume = pci_device_resume,
.shutdown = pci_device_shutdown,
.dev_attrs = pci_dev_attrs
};
struct device {
struct klist klist_children;
struct klist_node knode_parent; /* node in sibling list */
struct klist_node knode_driver;
struct klist_node knode_bus;
struct device *parent;
struct kobject kobj;
char bus_id[BUS_ID_SIZE]; /* position on parent bus */
struct device_type *type;
unsigned is_registered:1;
unsigned uevent_suppress:1;
struct semaphore sem; /* semaphore to synchronize calls to
* its driver.
*/
struct bus_type *bus; /* type of bus device is on */
struct device_driver *driver; /* which driver has allocated this
device */
void *driver_data; /* data private to the driver */
void *platform_data; /* Platform specific data, device
core doesn't touch it */
struct dev_pm_info power;
#ifdef CONFIG_NUMA
int numa_node; /* NUMA node this device is close to */
#endif
u64 *dma_mask; /* dma mask (if dma'able device) */
u64 coherent_dma_mask;/* Like dma_mask, but for
alloc_coherent mappings as
not all hardware supports
64 bit addresses for consistent
allocations such descriptors. */
struct device_dma_parameters *dma_parms;
struct list_head dma_pools; /* dma pools (if dma'ble) */
struct dma_coherent_mem *dma_mem; /* internal for coherent mem
override */
/* arch specific additions */
struct dev_archdata archdata;
spinlock_t devres_lock;
struct list_head devres_head;
/* class_device migration path */
struct list_head node;
struct class *class;
dev_t devt; /* dev_t, creates the sysfs "dev" */
struct attribute_group **groups; /* optional groups */
void (*release)(struct device *dev);
};
struct device_driver {
struct bus_type *bus;
struct module *owner;
const char *mod_name; /* used for built-in modules */
int (*probe) (struct device *dev);
int (*remove) (struct device *dev);
void (*shutdown) (struct device *dev);
int (*suspend) (struct device *dev, pm_message_t state);
int (*resume) (struct device *dev);
struct attribute_group **groups;
struct driver_private *p;
};
struct pci_driver {
struct list_head node;
char *name;
const struct pci_device_id *id_table; /* must be non-NULL for probe to be called */
int (*probe) (struct pci_dev *dev, const struct pci_device_id *id); /* New device inserted */
void (*remove) (struct pci_dev *dev); /* Device removed (NULL if not a hot-plug capable driver) */
int (*suspend) (struct pci_dev *dev, pm_message_t state); /* Device suspended */
int (*suspend_late) (struct pci_dev *dev, pm_message_t state);
int (*resume_early) (struct pci_dev *dev);
int (*resume) (struct pci_dev *dev); /* Device woken up */
void (*shutdown) (struct pci_dev *dev);
struct pci_error_handlers *err_handler;
struct device_driver driver;
struct pci_dynids dynids;
};
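The comment on id_table says that probe is called only if the table is non-NULL. The idea behind that can be sketched in user space: the device's identifiers are compared with each table entry until a zero entry ends the table. This is a reduced illustration, not the PCI core's matching code: only vendor and device are compared (the real struct pci_device_id has more fields), and struct mock_device_id, struct mock_pci_dev, MOCK_ANY_ID, mock_match_id, demo_ids and match_demo are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_ANY_ID 0xffffffffu   /* wildcard, in the spirit of PCI_ANY_ID */

/* Simplified stand-in for struct pci_device_id (vendor and device only). */
struct mock_device_id {
    unsigned int vendor;
    unsigned int device;
};

/* Simplified stand-in for the identifying fields of struct pci_dev. */
struct mock_pci_dev {
    unsigned short vendor;
    unsigned short device;
};

/* Return the first matching entry; an all-zero entry ends the table. */
const struct mock_device_id *
mock_match_id(const struct mock_device_id *ids, const struct mock_pci_dev *dev)
{
    if (!ids)
        return NULL;   /* no table, so nothing can match */
    for (; ids->vendor || ids->device; ids++) {
        if ((ids->vendor == MOCK_ANY_ID || ids->vendor == dev->vendor) &&
            (ids->device == MOCK_ANY_ID || ids->device == dev->device))
            return ids;
    }
    return NULL;
}

static const struct mock_device_id demo_ids[] = {
    { 0x8086, 0x100e },        /* one specific vendor/device pair */
    { 0x10ec, MOCK_ANY_ID },   /* any device from this vendor */
    { 0, 0 }                   /* terminator */
};

/* Returns the index of the entry that matches the sample device. */
int match_demo(void)
{
    struct mock_pci_dev nic = { 0x10ec, 0x8139 };
    struct mock_pci_dev other = { 0x1234, 0x0001 };
    const struct mock_device_id *hit = mock_match_id(demo_ids, &nic);

    assert(mock_match_id(demo_ids, &other) == NULL);
    assert(mock_match_id(NULL, &nic) == NULL);
    return hit ? (int)(hit - demo_ids) : -1;
}
```

In this model a driver's probe would be called only when mock_match_id returns a non-NULL entry, and that entry would be passed along as the id argument.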
struct pci_dev {
struct list_head global_list; /* node in list of all PCI devices */
struct list_head bus_list; /* node in per-bus list */
struct pci_bus *bus; /* bus this device is on */
struct pci_bus *subordinate; /* bus this device bridges to */
void *sysdata; /* hook for sys-specific extension */
struct proc_dir_entry *procent; /* device entry in /proc/bus/pci */
unsigned int devfn; /* encoded device & function index */
unsigned short vendor;
unsigned short device;
unsigned short subsystem_vendor;
unsigned short subsystem_device;
unsigned int class; /* 3 bytes: (base,sub,prog-if) */
u8 revision; /* PCI revision, low byte of class word */
u8 hdr_type; /* PCI header type (`multi' flag masked out) */
u8 pcie_type; /* PCI-E device/port type */
u8 rom_base_reg; /* which config register controls the ROM */
u8 pin; /* which interrupt pin this device uses */
struct pci_driver *driver; /* which driver has allocated this device */
u64 dma_mask; /* Mask of the bits of bus address this
device implements. Normally this is
0xffffffff. You only need to change
this if your device has broken DMA
or supports 64-bit transfers. */
struct device_dma_parameters dma_parms;
pci_power_t current_state; /* Current operating state. In ACPI-speak,
this is D0-D3, D0 being fully functional,
and D3 being off. */
pci_channel_state_t error_state; /* current connectivity state */
struct device dev; /* Generic device interface */
int cfg_size; /* Size of configuration space */
/*
* Instead of touching interrupt line and base address registers
* directly, use the values stored here. They might be different!
*/
unsigned int irq;
struct resource resource[DEVICE_COUNT_RESOURCE]; /* I/O and memory regions + expansion ROMs */
/* These fields are used by common fixups */
unsigned int transparent:1; /* Transparent PCI bridge */
unsigned int multifunction:1;/* Part of multi-function device */
/* keep track of device state */
unsigned int is_busmaster:1; /* device is busmaster */
unsigned int no_msi:1; /* device may not use msi */
unsigned int no_d1d2:1; /* only allow d0 or d3 */
unsigned int block_ucfg_access:1; /* userspace config space access is blocked */
unsigned int broken_parity_status:1; /* Device generates false positive parity */
unsigned int msi_enabled:1;
unsigned int msix_enabled:1;
unsigned int is_managed:1;
unsigned int is_pcie:1;
pci_dev_flags_t dev_flags;
atomic_t enable_cnt; /* pci_enable_device has been called */
u32 saved_config_space[16]; /* config space saved at suspend time */
struct hlist_head saved_cap_space;
struct bin_attribute *rom_attr; /* attribute descriptor for sysfs ROM entry */
int rom_attr_enabled; /* has display of the rom attribute been enabled? */
struct bin_attribute *res_attr[DEVICE_COUNT_RESOURCE]; /* sysfs file for resources */
#ifdef CONFIG_PCI_MSI
struct list_head msi_list;
#endif
};
struct class_device {
struct list_head node;
struct kobject kobj;
struct class *class;
dev_t devt;
struct device *dev;
void *class_data;
struct class_device *parent;
struct attribute_group **groups;
void (*release)(struct class_device *dev);
int (*uevent)(struct class_device *dev, struct kobj_uevent_env *env);
char class_id[BUS_ID_SIZE];
};
struct sysdev_class {
const char *name;
struct list_head drivers;
/* Default operations for these types of devices */
int (*shutdown)(struct sys_device *);
int (*suspend)(struct sys_device *, pm_message_t state);
int (*resume)(struct sys_device *);
struct kset kset;
};
struct sys_device {
u32 id;
struct sysdev_class * cls;
struct kobject kobj;
};
struct bus_type_private {
struct kset *drivers_kset;
struct kset *devices_kset;
struct klist klist_devices;
struct klist klist_drivers;
struct blocking_notifier_head bus_notifier;
unsigned int drivers_autoprobe:1;
struct bus_type *bus;
};
struct driver_private {
struct kobject kobj;
struct klist klist_devices;
struct klist_node knode_bus;
struct module_kobject *mkobj;
struct device_driver *driver;
};
This changed considerably from 2007 to 2008.
In the 2007 figure the black lines indicate containment of an instance of the lower structure within the upper structure. In the 2008 figure this relationship is shown only by nesting.
The dashed blue lines indicate fields of the source object that point to an object of the target class.
The dashed red lines in the 2007 figure indicate fields of the source object that are a kset of the target class.
The diagram is not complete. Left out are many other relationships, including fields that are sets of pointers to another class, fields that are lists of other classes of objects with pointers to another class, etc., etc.
This is an exceedingly (and, to me, probably unnecessarily) complex structure, with apparent redundancies and many apparent opportunities for inconsistencies and errors. It seems to be evolving, but it is unclear whether this evolution is converging and whether it is improving or just reflecting a change in personal styles of whoever currently has most influence over the code base. Specifically, from 2007 to 2008 we see the introduction of several new stylistic elements, including for example:
Are these trends? That is, will other structures be split into two parts, and will the remaining ksets be replaced by klists?
One should probably expect continued evolution in these details with future Linux kernel releases.
The following is a partial trace of what happens, with links to the code. I have not found the place where device_register is called for ordinary PCI devices.
Take a look at pci_scan_bus, which appears to be called from architecture-dependent code.
two views:
DIR="/etc/hotplug.d"
for I in "${DIR}/$1/"*.hotplug "${DIR}/"default/*.hotplug ; do
if [ -f $I ]; then
test -x $I && $I $1 ;
fi
done
exit 1
Note that the "hotplug" utilities have been subsumed by a more general "userspace event" (see kobject_uevent) mechanism.
See Linux symposium paper for more details.
#include <linux/firmware.h>
int request_firmware(const struct firmware **fw, char *name, struct device *device);
struct firmware {
size_t size;
u8 *data;
};
void release_firmware(struct firmware *fw);
int request_firmware_nowait
(struct module *module,
char *name,
struct device *device,
void *context,
void (*cont)(const struct firmware *fw, void *context));
© 2005 T. P. Baker ($Id: ch14.html,v 1.1 2007/06/05 16:14:32 baker Exp baker $)