The I/O Subsystem



Connections between CPU scheduling and the I/O Subsystem

Introduction to I/O Subsystem

The main reason for the complexity of the I/O subsystem is the huge and continually expanding number and variety of devices that must be supported, each with corresponding device-specific code.

I/O device interfaces are generally not well documented by the manufacturers, and increasingly even the manufacturers do not have internal documentation. The reasons include:

Categories of I/O Devices

Differences in I/O Devices Affect their Treatment by the OS

Models of I/O Interaction

Evolution of the I/O Function

  1. Processor directly controls a peripheral device
  2. Controller or I/O module is added
  3. Controller or I/O module with interrupts
  4. Direct Memory Access
  5. I/O module is a separate processor ("I/O channel")
  6. I/O processor is a full-fledged computer, with its own OS

Note: The latter model of I/O processor was already in use at least as early as 1970 for mainframe computers. It has only recently made its way down to microprocessor-based computing.

Three Different I/O Interaction Models

                                                No Interrupts    Use Interrupts
 Data transferred to/from memory via processor  Programmed I/O   Interrupt-driven I/O
 Data transferred to/from memory directly                        Direct Memory Access (DMA)

Logic of the Three I/O Interaction Models

Direct Memory Access

DMA and Interrupt Breakpoints During an Instruction Cycle

DMA does not need to wait for an instruction to complete, since (unlike an interrupt) there is no effect on the flow of instruction execution.

Single-bus, detached DMA

If the DMA module must use the system bus to get to the I/O device, there is a bottleneck.

Single-bus, Integrated DMA-I/O

If there is a path between DMA module and I/O module that does not include the system bus the contention is reduced, and the number of required busy (stolen) system bus cycles can be cut. One way is by integrating the DMA and I/O functions in a single module. (Can you think of a specific example of this?)

I/O Bus

This can be carried further, by providing a separate bus between the DMA controller module and a collection of I/O devices. (Can you think of a specific example of this?)

Operating System I/O Design Issues

A Model of I/O Organization

I/O System Layers for a Stream Device

In reality, each of these may be divided into further layers, especially to handle classes of devices. For example, we might have a hierarchy of Block Device -> Disk Device -> SCSI Device -> Specific SCSI Controller.

I/O Buffering

Single Buffer

No Buffering

Single Buffering

Double Buffer

Double Buffering

Circular Buffer

Recall the example of the Producer-Consumer synchronization problem.

Circular Buffering

Notice that time spent copying data from the OS buffer to the user buffer can be saved if we use memory mapping.

Disk Performance Parameters

Timing of a Disk I/O Transfer

Disk Performance Parameters

Disk Performance Parameters

Disk Scheduling Policies


Last-In, First-Out (LIFO)

First-In, First-Out (FIFO)

Shortest Service Time First (SSTF)

  • Select the disk I/O request that requires the least movement of the disk arm from its current position
  • Always choose the minimum Seek time


SCAN

  • Arm moves in one direction only, satisfying all outstanding requests until it reaches the last track in that direction
  • Direction is then reversed


C-SCAN (Circular SCAN)

  • Restricts scanning to one direction only
  • When the last track has been visited in one direction, the arm is returned to the opposite end of the disk and the scan begins again



FSCAN

  • Another way to avoid "stickiness"
  • Uses two queues
  • While one queue is being serviced by a scan, all new requests are placed in the other queue

Summary of Disk Scheduling Policies

Note that within the policies we have considered you can distinguish trade-offs between maximizing overall disk utilization (and I/O throughput), and providing some quality of service to specific processes.

Note also that the environment in which disk scheduling by the OS really paid off does not quite exist anymore, at least in smaller systems. When all disk I/O was done under direct control of the OS, the OS could implement scheduling policies such as the above. As more intelligence has been added to disk controllers and disk drives themselves, control over the real order of disk operations may be taken over by a device external to the OS. Deciding whether and how to schedule disk I/O within the OS becomes more problematic. Some factors that complicate the situation include:

If you detect a trend here, toward relying on multi-layered caches more than scheduling, you may be right.

RAID (Redundant Array of Inexpensive Disks)

Redundancy, Reliability, Performance

                                One Unit    N Units
 Data throughput                r           r * N
 Probability of single failure  p           p * N
 Probability of total failure   p           p^N

The situation here with multiple disks is analogous to the situation with multiple CPUs in an SMP system.

Through parallel operation of multiple disks, we can achieve higher data throughput, up to the bandwidth limit of our other system components such as the I/O and memory buses.

However, adding more units also increases the probability that one of the units will fail.

Fortunately, the probability of total system failure goes down faster. Therefore, if we can find a way to use some of the disk redundancy to allow us to recover from failures we can maintain or improve total system reliability.

Data Mapping for a RAID Level 0 Array

The different types of RAID are identified by numbers. The numbers are sometimes called "levels", but that does not mean they are numbered hierarchically according to any measure of performance or reliability. The numbers have more to do with the order in which the various types of RAID were invented.

The figure illustrates the basic idea, which applies in all cases: to implement a virtual disk, whose actual data is distributed over several actual disks. In this case, it is RAID 0, which seeks only parallelism. The data is divided up into strips, and the strips are distributed over the drives.

RAID 0 (non-redundant)

What is gained by distributing strips over disks, as compared to distributing entire files?

The answer depends on the kind of computing, i.e., whether we want to read/write single large files very fast, versus reading/writing multiple independent files. The main use of this kind of RAID is for supercomputers, for transferring huge blocks of data. There may be no gain in performance at all, if one is reading or writing small blocks of data.

RAID 1 (a.k.a. Mirroring)

This type of RAID is mainly done for reliability, to tolerate disk drive failures. It also helps with read performance, since read operations on different files may proceed in parallel. It is practical for small systems (two disks) where reliability is needed but RAID 5 cannot be afforded.

RAID 2 (redundancy through Hamming code)

This type of RAID does not appear to have been successful. It is complicated to implement, and the kind of correction it provides is more than we need. The virtue of the Hamming code is the ability to detect and correct local bit-level errors in the data read from a drive. This kind of error detection is already done internally by modern disk drives, so there is no need for the RAID system to do the same.

It is unlikely that a disk drive will produce bad data on a single read or write operation, but continue to function normally otherwise. When we do have a disk failure the failure is usually more catastrophic, involving the entire disk (e.g. bearing failure) or an entire disk surface. (e.g., head failure). The other types of RAID handle major failures equally well, and are simpler.

However, it is interesting to look for a moment at Hamming codes, and the use of redundancy for detecting and correcting errors in general.

Using Parity to Detect Errors

Hamming Code

Hamming Code uses extra parity bits to allow the identification (and correction) of a single error. Creating an encoded word is done as follows:

To encode 10101100:

    position:  1  2  3  4  5  6  7  8  9 10 11 12
               p  p  d  p  d  d  d  p  d  d  d  d   (p = parity bit, d = data bit)
    encoded:   1  0  1  1  0  1  0  1  1  0  1  1

First, we lay out the data word, reserving spaces for the parity bits in the bit positions that are powers of two (1, 2, 4, 8, ...). All other bit positions (3, 5, 6, 7, 9, 10, ...) are available to hold data.

Each parity bit is set to achieve even parity for a different subsequence of the bits, chosen according to an overlapping binary pattern, as follows:

To correct 101101111011:

    received:   1  0  1  1  0  1  1  1  1  0  1  1
    corrected:  1  0  1  1  0  1  0  1  1  0  1  1

    failing parity bits: 1 + 2 + 4 = 7

The way to verify and correct an encoded number is to verify each parity bit. Write down all the incorrect parity bits, and add up their positions. The sum of the positions of the incorrect bits gives the location of the bad bit.

For example, suppose the encoded word of the example above were corrupted from 101101011011 into 101101111011. We verify the parity bits and discover that parity bits 1, 2, and 4 do not match. This tells us that the error is in bit 7. We correct it and get back the original word.

Using Parity to Reconstruct Data for Other RAID Types

 M   K   M xor K   (M xor K) xor K   (M xor K) xor M
 0   0      0             0                 0
 0   1      1             0                 1
 1   0      1             1                 0
 1   1      0             1                 1

The higher numbered types of RAID make use of a simpler error correction scheme, based on the exclusive or (xor) operation.

This is just one of the many uses of the exclusive or operation. Other applications include data structures (storing ambidirectional pointers) and cryptography (encryption-decryption). A useful property of this operation is that it is invertible: one can recover the value of either operand by combining the other operand with the xor of the pair. The table shows how the value of M can be recovered from the values of K and M xor K, and vice versa.

RAID 3 (bit-interleaved parity)

Creation of parity disk X4:

X4(i) = X3(i) xor X2(i) xor X1(i) xor X0(i)

Recreation of failed disk X1:

X1(i) = X4(i) xor X3(i) xor X2(i) xor X0(i)

This form of RAID achieves high data bandwidth and can tolerate loss of a single disk drive, but only allows one transaction at a time. Thus, the performance on independent concurrent I/O processes is worse than if we had not used RAID (instead using each disk as an independent drive).

RAID 4 (block-level parity)

Creation of parity disk X4:

X4(i) = X3(i) xor X2(i) xor X1(i) xor X0(i)

Update for a modification to one strip on disk X1:

X4'(i) = X3(i) xor X2(i) xor X1'(i) xor X0(i) xor X1(i) xor X1(i)
       = X4(i) xor X1(i) xor X1'(i)

This form of RAID allows concurrent independent writes on different disks.

The algebra shows how this works. When we update one strip, we only need to update two disks: the "original" disk and the parity disk.

However, the parity disk is still involved in every write operation.

RAID 4 (block-level parity)

The parity disk can become a bottleneck. It is also subject to the most wear, and so is the most likely disk to fail.

RAID 5 (block-level distributed parity)

In practice, this seems to be the preferred type of RAID for large systems.

RAID 6 (dual redundancy)

The benefit is higher reliability. Now we can tolerate two (2) disk failures without data loss.

Disk Cache

As mentioned above, there are actually several layers of disk caching, including cache inside the disk drive, cache inside the disk controller, and cache inside main memory. The general principles of replacement algorithms mentioned below could apply to any level.

Least Recently Used Buffer Replacement Scheme

Why is LRU practical as a disk cache replacement policy, but not as a VM page replacement policy? What is different here?

Least Frequently Used Buffer Replacement Scheme

The idea was to improve on LRU, but a "memory" problem crops up, which makes this policy perform very poorly.

Note the similarity of the problem here to a problem in processor scheduling: if we just use the average burst processing time, the average is dominated by ancient history. The solution there was to use an exponential average. In principle, the same technique could be used here. However, the techniques described by Stallings take a different approach, more analogous to the clock algorithms for page replacement.

LFU variant: "Frequency Based Allocation"

Note that the phrase "reference count" is used quite differently here from the way it is used in data structures. These counts represent instances of reference (not pointers) and they are never decremented.

Blocks in the "new" section are not eligible for replacement. Blocks in the "old" section are replaced, LFU-first. This is an improvement, but there is still a bias against newer blocks.

Addition of a "middle" section allows newer blocks a chance to build up their reference counts before they have to compete against older blocks.

Performance with this method is actually better than LRU.

Take a look at Figures 11.12 and 11.13 in the text, and the explanation that goes with them. We cannot reproduce those figures here. The publisher did not provide them (probably because they are used in the text by permission of another publisher). Pay attention to the comment about sensitivity of results to reference strings, and the incomparability of results from simulations with different reference strings.

I/O Subsystem in Traditional Unix Architecture

What real information can you get from this figure?

Go back to Chapter 2.5 and review what it says about the relationship of the I/O subsystem to the rest of the operating system, for the original Unix system, and then for SVR4 Unix.

UNIX I/O Buffering Types

 Device Type   Unbuffered I/O   Buffer Cache   Character Queue
 disk               X                 X
 tape               X                 X
 terminal                                            X
 comm. line                                          X
 printer            X                                X

Unix uses two kinds of buffered I/O, as well as unbuffered I/O.

Block devices are treated differently from character devices in that a block that has been written may be re-read from the buffer. Character device buffering follows the producer-consumer model. Block device buffering follows the readers-writers model.

UNIX SVR4 Buffered I/O

We will look at the Unix block buffer scheme in more detail below.

Unix Block I/O Buffer Scheme

The Unix buffer cache scheme uses a hash table to look up disk blocks in the cache.

What is the purpose of this scheme?

How does allocating more RAM for disk cache affect the performance of the system?

How does this relate to paged virtual memory?

BSD Buffer Header

The figure shows the approximate contents of a 4.3 BSD Unix block buffer header.

An interesting feature is that the real memory allocated to a block buffer can vary from 512 bytes to 8 Kbytes, but the virtual memory size is always 8 Kbytes. The system uses the VM system to map only the number of pages that the buffer actually needs. Real memory is "moved" around from one virtual buffer to another, as needed.

BSD Buffer Hash Table & Free Lists

The chained hash table is used to find out whether a given block of data on disk is also in a memory buffer.

There are four free lists. The locked buffers always stay in the cache. This feature was intended for "superblocks" of mounted filesystems, but was never used because of a deadlock problem with processes trying to share access to superblocks. The LRU list is ordered by recency of use. The AGE list contains buffers moved from the LRU list, that have not been accessed recently. The EMPTY list contains buffer headers that have no real memory allocated to them. When a buffer is needed, it ordinarily is taken from the AGE list, and if that list is empty one is taken from the head of the LRU list.

Unification with VM Paging System

Observe that the disk block buffering functionality and the design considerations are very similar to those of the paging system that is needed to support virtual memory. Some simplicity and economy of design can be realized by using some of the same software for both. Modern Unix systems, like Linux and Solaris, do this.

I/O Subsystem in the Linux Architecture

There is an excellent book, Linux Device Drivers that is available for downloading and printing over the Internet, free of charge. You can learn a lot about the whole operating system from this book. It covers the interfaces to other parts of the operating system, because device drivers interact with the other parts of the operating system. I recommend you read at least part of it.

The diagram, from that book, gives a good overall view of how the device drivers and I/O system relate to the rest of the OS.

Logical Structure of Linux Device Drivers

  1. Actions initiated by a running process (synchronously) via a kernel call
  2. Actions initiated by a hardware device (asynchronously) via an interrupt
    1. "Top half": called asynchronously, a direct hardware interrupt handler
      • used to notify driver of completion of an I/O operation
      • cannot access most kernel data structures, since it can preempt the kernel
      • generally kept very short and simple
    2. "Bottom half": called by the dispatcher, in response to flag set by the top half
      • can use spinlocks to access all kernel data structures,
        and so can unblock process that was waiting for I/O completion
      • can be longer, since other interrupts are not blocked

Linux device drivers can be separated into three parts, according to how the execution is initiated.

System Call Interface Hierarchy

The actions of an I/O device driver that are initiated by the action of a process begin with a system call. The first part of the control path of a call to the write() function in Linux 2.14 is shown in the figure. More details, including links to the full code, are given below.

System Call Interface Hierarchy: Specifics

  1. library subprogram call -- e.g., write()
  2. kernel trap:
    mov    $0x4,%eax
    int    $0x80
  3. kernel trap handler -- e.g., see system_call in entry.S of Linux kernel 2.14 source code:
    	call *SYMBOL_NAME(sys_call_table)(,%eax,4)
  4. call table -- e.g., see sys_call_table in entry.S of Linux kernel 2.14 source code:
    	.long SYMBOL_NAME(sys_ni_syscall)	/* 0  -  old "setup()" system call*/
    	.long SYMBOL_NAME(sys_exit)
    	.long SYMBOL_NAME(sys_fork)
    	.long SYMBOL_NAME(sys_read)
    	.long SYMBOL_NAME(sys_write)
  5. sys_write in read_write.c of Linux kernel 2.14 source code:
    asmlinkage ssize_t sys_write(unsigned int fd, const char * buf, size_t count)
       if (file->f_op && (write = file->f_op->write) != NULL)
          ret = write(file, buf, count, &file->f_pos);

Observe that at two points the flow of control goes through pointers to functions. The first point where this happens is inside the system_call() function, which uses the value of a register (EAX, on the x86 architecture) to select a function pointer from the vector sys_call_table. The second point where this happens is inside the sys_write() function, which uses the value of file->f_op->write to call the (you could say "overloaded" or "polymorphic") write function that is appropriate to the particular class of file.

Kernel Modules

Linux provides a framework for adding device drivers, based on kernel modules. A kernel module is a kernel component that can be loaded and unloaded dynamically. Besides making it possible to develop and compile device drivers independently from the rest of the kernel, it also allows the size of the kernel memory footprint to be reduced, by loading only those modules that are needed for the set of devices that are installed and active at any one time.

I/O Subsystem in Win2K Architecture

How much real information can you get from a figure such as this one?

Go back to Section 2.5 and review what it says about the relationship of the I/O subsystem to the rest of the Windows 2000 operating system.

Windows 2000 I/O

© 2002 T. P. Baker & Florida State University. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means without written permission. (Last updated by $Author: cop4610 $ on $Date: 2002/09/26 20:06:06 $.)