COP 4610: Operating Systems & Concurrent Programming

Virtual Memory

 

Background & Motivation

  1. The segmented/paged memory principle
  2. The hierarchical memory (cache) principle.

These are the two key observations that motivate the development of virtual memory. We just finished looking at paged and segmented memories.

Common Features of Paging & Segmentation

Observation: Not all pieces of a process need to be loaded in main memory during execution


Once we have paid the price for the extra hardware to do dynamic address translation (originally to solve the problem of storage management), we discover that we can do more with it.


Advantages of Breaking up a Process


This is another case of the hierarchical memory concept, which we saw previously applied to memory cache. This is a very useful concept, which has several applications in operating systems.


Hierarchical Storage, Memory, Buffers, Caches


We hope that by the end of this course you will be able to answer:

You should have seen memory caches before. We will now look at virtual memory and the TLB. The other two will come up later, when we look at I/O and file systems.

If you have not read Chapter 1 of Stallings' book (including Appendix 1A) carefully before, now is a good time to go back and read it. In particular, make certain you have read and understood the sections on The Memory Hierarchy, Cache Memory, and I/O Communication Techniques. All of what we are studying now is an elaboration on the themes introduced there.

As we have observed before, there are a few basic ideas in computing, which people apply over and over in slightly different ways. One of these ideas is that of a cache. A cache is a special kind of buffer that is used to reduce the average-case cost of accessing a comparatively slow storage medium. Main memory access time is much slower than CPU instruction cycle times. Disk storage access time is MUCH slower than main memory access times.
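To attach some rough, illustrative numbers to "much slower" (the exact values depend on the hardware): a CPU register can be accessed in well under a nanosecond, a main memory access takes on the order of tens of nanoseconds, and a disk access takes on the order of milliseconds, so the gap between main memory and disk is roughly a factor of 100,000.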

Because all of the items on the list above are based on the cache principle, you will see many similarities between them. The similarities will make it easier for you to understand them, but you must take care not to forget that they are all distinct.


Similarities: Cache Principles/Assumptions


Note that all of the above is based on probabilities and on long-term, average-case behavior. The fundamental assumption is that of locality of reference.

Recall from Chapter 1 the two forms of locality:


Similarities: Cache/VM Performance Issues


The design and performance issues for the various applications of the cache principle are similar. Generally, a larger total cache size results in a higher probability of hits, which is good. Generally, the granularity must be tuned to fit the system. If the granularity is too large or too small, the miss ratio will go up. In all cases, a critical decision is the replacement policy, i.e., how to decide which data should be kicked out of the cache when we need to bring in some new data.
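A rough way to quantify this: if an access that hits in the cache costs T_fast, a miss costs T_slow, and the hit ratio is h, then the average access time is approximately

    T_avg = h * T_fast + (1 - h) * T_slow

Since T_slow may be orders of magnitude larger than T_fast, even a small drop in the hit ratio can dominate the average; this is why the cache size, the granularity, and the replacement policy all matter so much. (The symbols here are just for illustration.)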

We will see that there are also important differences in each application of the cache principle. Some of these have to do with the relative speeds of the devices, the relative costs of the devices, the degree of success we can expect in predicting the locality of references, and the relative cost of bad guesses. For example, the faster the replacement decision must be made, the simpler the policy must be, since the performance improvement due to a better hit ratio would otherwise be eaten up by the overhead of making the decisions.


Execution of a Program with Virtual Memory

  1. OS brings a few pieces of the program into main memory when the process starts execution
  2. Resident set = portion of process that is in main memory
  3. A trap/interrupt is generated when an address is needed that is not in the resident set
    (This trap is called a page/segment fault)
  4. OS puts the process in the Blocked state
  5. Piece of process that contains the logical address is brought into main memory
  6. OS puts the process in the Ready state
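As a concrete illustration of steps 1-6, here is a minimal user-level C sketch that simulates a resident set and a page fault. The table sizes, page size, and the fetch_page_from_disk() stub are invented for illustration; in a real system this work is done by the hardware MMU and the OS kernel, and the faulting process would be Blocked while the disk read is in progress.

    #include <stdio.h>
    #include <stdbool.h>

    #define NPAGES    16      /* pages in the process's virtual address space */
    #define PAGE_SIZE 4096

    /* One page table entry: is the page resident, and if so, in which frame? */
    struct pte {
        bool present;
        int  frame;
    };

    static struct pte page_table[NPAGES];
    static int next_free_frame = 0;

    /* Stand-in for the disk read the OS would start on a page fault;
     * the faulting process would be Blocked until it completes. */
    static int fetch_page_from_disk(int page)
    {
        printf("  page fault: bringing page %d into main memory\n", page);
        return next_free_frame++;     /* pretend a free frame is always available */
    }

    /* Translate a virtual address, handling a page fault if necessary. */
    static unsigned long translate(unsigned long vaddr)
    {
        int page   = (int)(vaddr / PAGE_SIZE);
        int offset = (int)(vaddr % PAGE_SIZE);

        if (!page_table[page].present) {            /* step 3: trap (page fault) */
            page_table[page].frame   = fetch_page_from_disk(page);  /* steps 4-5 */
            page_table[page].present = true;        /* step 6: ready to run again */
        }
        return (unsigned long)page_table[page].frame * PAGE_SIZE + offset;
    }

    int main(void)
    {
        unsigned long vaddrs[] = { 0x2345, 0x23FF, 0x7000 };
        for (int i = 0; i < 3; i++)
            printf("vaddr 0x%lx -> paddr 0x%lx\n", vaddrs[i], translate(vaddrs[i]));
        return 0;
    }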

Virtual versus Real Memory


Virtuality

The term virtual is used a lot in computing.


Do you know what these all mean?

Can you think of any more?

You should be able to generalize the meaning. What do these all have in common?

By now, the term virtual has migrated from the technical jargon of computer scientists to common speech, through the popular media, so most people already understand the difference between a virtual thing and a real thing.

At a deeper philosophical level, it may be argued that none of us knows what is real and what is virtual. However, the implementor of a virtual system knows effectively what is virtual, meaning what she/he implements, and what is real, meaning what he/she uses to implement it.


How can Virtual Memory Work?


Principle of Locality

People observed:

These all suggested that virtual memory might work efficiently.

They guessed correctly.

Historical Note: This work was done in the late 1960s, and was pioneered by IBM.


What Factors Affect Locality of Reference?

Positive influences:

Negative influences:


The effectiveness of virtual memory seems to have decreased in recent years. One factor is the lower cost of main memory. When we can afford to put several gigabytes of main memory on a machine, the utility of having a larger virtual address space than physical address space is reduced. Another factor seems to be reduced locality of reference.

How has the adoption of OOP and object-oriented languages affected locality of reference?

Which of the above causes of locality are temporal and which are spatial?

Do you understand how each of the negative influences listed above reduces locality of reference?


Support Needed for Virtual Memory


Paging


Page Table Entries


Address Translation in a Paging System


Handling Large Page Tables


Two-Level Scheme for 32-bit Address
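As an illustration of the arithmetic involved, the following C sketch splits a 32-bit virtual address the way a typical two-level scheme does: a 10-bit index into the page directory, a 10-bit index into the selected page table, and a 12-bit byte offset (i.e., 4-Kbyte pages, with 1024 entries per table). These particular field widths are the common choice on 32-bit x86-style hardware; they are assumptions here, not the only possibility.

    #include <stdio.h>
    #include <stdint.h>

    /* Two-level split of a 32-bit virtual address:
     *   bits 31..22  index into the page directory (1024 entries)
     *   bits 21..12  index into the selected page table (1024 entries)
     *   bits 11..0   byte offset within the 4-Kbyte page
     */
    int main(void)
    {
        uint32_t vaddr = 0x12345678;

        unsigned dir_index   = (vaddr >> 22) & 0x3FF;
        unsigned table_index = (vaddr >> 12) & 0x3FF;
        unsigned offset      =  vaddr        & 0xFFF;

        printf("vaddr 0x%08x: directory entry %u, table entry %u, offset 0x%03x\n",
               (unsigned)vaddr, dir_index, table_index, offset);
        return 0;
    }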


Translation Lookaside Buffer (TLB)


Operation of TLB


How is the TLB distinguished functionally from the memory cache?

How do they operate together?

What effects limit the practical size of the TLB?


Operation of TLB as Flowchart


Use of a Translation Lookaside Buffer


TLB Fault Processing


The figure shows the timing and causal relationships between the events and agents involved in a TLB fault. It starts when the CPU issues a memory fetch operation (possibly for data or for an instruction) that refers to an address whose page frame number is not in the TLB. The TLB fetches the corresponding page table entry from memory. It then translates the virtual address to a physical address and fetches the operand originally requested by the CPU.

The fetch operation ends up taking two memory cycles. In order for this not to slow the execution down appreciably, most memory references must be TLB hits.
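To get a rough feel for the numbers: if one memory cycle takes 100 ns, a TLB hit costs a single cycle (100 ns), while a TLB fault as described above costs two (200 ns). With a TLB hit ratio of h, the average cost is h*100 + (1-h)*200 ns; at h = 0.98 that works out to 102 ns, only 2% slower than the ideal. (These numbers are purely illustrative and ignore the memory cache.)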


Relation of TLB to Memory Cache


Page Fault Processing


The figure shows the timing and causal relationships between some of the events and agents involved in a page fault. It starts when the CPU issues a memory reference for an address that is not in the TLB.

In order to focus on the interesting events, the figure leaves out many details, among them the entire role of the memory cache(s), and the details of the disk subsystem.

The dashed arrows allude to some of these details that we cannot ignore entirely, i.e., the many interactions between the executing OS software and the memory that are needed to handle the page fault, bring the page in from the disk, and update the page table.

The OS actions start with the execution of an interrupt (or trap) handler, the page fault (PF) handler. The OS marks the page-faulted process as being blocked, and initiates a disk read operation to fetch the page into main memory. The PF handler may do some of this work directly, or it may wake up a pager process and let that process do the work. (What are the arguments for each of these two alternatives? Is there a middle ground?)

At some point the PF handler will call the dispatcher, so that some other process can use the CPU while the disk is busy.

When the disk subsystem (what is included in the disk subsystem?) has brought the page into memory, it interrupts the CPU again. The disk I/O interrupt handler, or a pager process that it wakes up, updates the page table in memory, and marks the page-faulted process as being ready.

Eventually, the dispatcher will resume execution of the page-faulted process. It will reissue the instruction that caused the page fault. This time the TLB will succeed in fetching the page table entry from memory, and the operation of the TLB will proceed as in the preceding figure, i.e., there will be a TLB fault, the page table entry will be fetched, the TLB will be updated, and finally the operand originally requested by the CPU will be fetched.

After all this, the system is back at the point of the original TLB fault, which led to the page fault. It is clear that the OS does quite a bit of work to handle each page fault. It follows that we cannot afford to have page faults very often.

In particular, it would be disastrous if the OS code for handling a page fault itself were to cause a page fault! If it did, the system could get into an infinite cycle of page faults. For this reason, the code that handles page faults is kept permanently in main memory. (However, it is possible that some of the processing by the OS for a page fault may generate further page faults. In particular, this may be the case if the page table itself is paged.)


Page Size Performance Trade-Offs

Smaller page size => less internal fragmentation
Smaller page size => fewer page faults
Larger page size  => more disk I/O efficiency
Larger page size  => smaller page tables
                  => fewer pages per process
                  => fewer TLB misses, fewer secondary page faults
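A small worked example of the page table effect: with a 32-bit virtual address space and 4-Kbyte pages, a process can have up to 2^20 pages, so a one-level page table with (say) 4-byte entries would occupy 4 Mbytes per process; with 4-Mbyte pages the same address space needs only 2^10 entries, about 4 Kbytes. (The 4-byte entry size is an assumption made only for illustration.)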

Graph (a) shows the general effect of increasing the page size. I think this is a little bit misleading, since it is not clear from the figure what assumptions are being made about the number of page frames assigned to the process. The explanation for the increase on the left edge of the chart is that as the page size increases the probability of having the entire "working set" in memory goes down. However, that presumes a limited amount of real memory is available. On the other hand, the reason given for the decrease in page fault rate on the right end of the chart is that the whole process fits into one page. That seems to presume there is enough real memory to hold the entire process.

Graph (b) shows how, as the number of frames allocated to the process grows (for any page size), the page fault rate decreases, until the whole process is in memory.


Example Page Sizes

Atlas   512 48-bit words
Honeywell-Multics   1024 36-bit words
IBM 370/XA and 370/ESA   4K bytes
VAX family   512 bytes
IBM AS/400   512 bytes
DEC Alpha   8K bytes
MIPS   4K to 16M bytes
UltraSPARC   8K to 4M bytes
Pentium   4K to 4M bytes
PowerPC   4K bytes

Page sizes are chosen based on experimentation and performance monitoring.


Variable Page Sizes?


Segmentation


Each Segment Table Entry Contains


Segment Table Entries


Combined Paging and Segmentation


Combined Segmentation and Paging


Example: Typical Protection Relationships Between Segments


Other Protection Relationships


What are the benefits of each of these restrictions?


Implementation Issues for Paged Virtual Memory


Fetch Policy


Placement Policy


Replacement Policy


Frame Locking


What are the reasons for locking each of the above?

The Unix API for this feature consists of the functions mlock(), munlock(), mlockall(), and munlockall(). Take a look at the man pages for these.
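Here is a minimal sketch of how a user program might pin a buffer in physical memory with mlock(); the buffer size and the reasons given in the comments are just for illustration. (Locking memory usually requires appropriate privileges or a sufficient resource limit, and on some systems the locked region must be page-aligned.)

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 64 * 1024;               /* 64 Kbytes, chosen arbitrarily */
        char *buf = malloc(len);
        if (buf == NULL) {
            perror("malloc");
            return 1;
        }

        /* Pin the buffer's pages in main memory so they cannot be paged out,
         * e.g., for a time-critical code path or a buffer of sensitive data. */
        if (mlock(buf, len) != 0) {
            perror("mlock");
            free(buf);
            return 1;
        }

        memset(buf, 0, len);                  /* use the locked memory ... */

        munlock(buf, len);                    /* allow paging of these pages again */
        free(buf);
        return 0;
    }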


Basic Replacement Policies


Optimal Policy


Least Recently Used (LRU)


Can you explain why implementing LRU is not feasible? That is, where is the "overhead" mentioned above and how do we know it is too large for us to want to implement LRU?


First-in, first-out (FIFO)


Clock Policy


Example of Clock Policy Operation


Using The Modified-Bit

Modified?   Used?   Priority
    0         0        1
    0         1        2
    1         0        3
    1         1        4

The clock algorithm can be extended to make use of a modified bit along with the use bit. This gives a slightly closer approximation to the LRU behavior, and potentially cuts in half the amount of disk traffic.

Where is the difference in the disk traffic?
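As one way to visualize the extended scan, here is a small user-space C sketch (the frame count, initial contents, and two-pass organization are assumptions for illustration, not the only way to implement it): it first looks for a frame that is neither used nor modified, and only on a second pass, while clearing use bits, settles for an unused but modified frame, which would require a write-back to disk.

    #include <stdio.h>
    #include <stdbool.h>

    #define NFRAMES 4

    struct frame {
        int  page;       /* virtual page number currently held in this frame */
        bool use;        /* set (by hardware) whenever the page is referenced */
        bool modified;   /* set (by hardware) whenever the page is written */
    };

    static struct frame frames[NFRAMES] = {
        { 7, true,  false },
        { 3, true,  true  },
        { 9, false, true  },    /* not recently used, but dirty */
        { 2, true,  true  },
    };
    static int hand = 0;        /* the clock pointer */

    /* Pick a victim, preferring (use=0, modified=0) over (use=0, modified=1).
     * Use bits are cleared only during the second pass, so a victim that needs
     * no write-back is found first whenever one exists. */
    static int choose_victim(void)
    {
        for (;;) {
            /* Pass 1: look for use=0, modified=0, without changing any bits. */
            for (int i = 0; i < NFRAMES; i++) {
                int f = (hand + i) % NFRAMES;
                if (!frames[f].use && !frames[f].modified) {
                    hand = (f + 1) % NFRAMES;
                    return f;
                }
            }
            /* Pass 2: look for use=0, modified=1, clearing use bits as we go. */
            for (int i = 0; i < NFRAMES; i++) {
                int f = (hand + i) % NFRAMES;
                if (!frames[f].use && frames[f].modified) {
                    hand = (f + 1) % NFRAMES;
                    return f;
                }
                frames[f].use = false;
            }
            /* All use bits are now clear; try pass 1 again. */
        }
    }

    int main(void)
    {
        int victim = choose_victim();
        printf("replace frame %d (page %d); write-back %s\n",
               victim, frames[victim].page,
               frames[victim].modified ? "needed" : "not needed");
        return 0;
    }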


Replacement Algorithms


The figure shows approximately how the algorithms compare in performance on a simulated page reference string.


Page Buffering


How can we afford to leave pages in memory if we have decided to replace them?

Only makes sense if the OS reserves some of main memory for the page buffer.

Makes more sense if each process has a set limit on its resident set size (see local replacement, below).


Resident Set Management


Variable Allocation, Global Scope


Variable Allocation, Local Scope


Working Set Concept


What is meant by virtual time here?


Resident Set Size Strategy Based on Working Set Model

Naive idea:

Problems:

Observation:


Page Fault Frequency (PFF) Algorithm


How do we determine the thresholds?

Why does this not perform well during shifts to new locality?

How does the VSWS (variable-interval sampled working set) policy address this problem?
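As a rough sketch of the idea (the threshold values, units, and bookkeeping below are invented for illustration; they are not the exact policy from the text): each page fault is timestamped in the process's virtual time, and the interval since the previous fault is compared against thresholds to decide whether the resident set should grow, stay the same size, or shrink.

    #include <stdio.h>

    /* Illustrative thresholds, in units of the process's virtual time
     * (e.g., memory references executed); the values are arbitrary. */
    #define PFF_LOWER   50   /* faults closer together than this: locality growing */
    #define PFF_UPPER  500   /* faults farther apart than this: resident set too big */

    enum pff_action { GROW, REPLACE_ONLY, SHRINK };

    static long last_fault_time = 0;

    /* Called on each page fault; decides how the resident set should change. */
    static enum pff_action on_page_fault(long virtual_time)
    {
        long interval = virtual_time - last_fault_time;
        last_fault_time = virtual_time;

        if (interval < PFF_LOWER)
            return GROW;           /* faulting often: add a frame for the new page */
        if (interval > PFF_UPPER)
            return SHRINK;         /* faulting rarely: discard pages with use bit 0 */
        return REPLACE_ONLY;       /* keep the resident set size unchanged */
    }

    int main(void)
    {
        long fault_times[] = { 20, 60, 700 };
        const char *names[] = { "GROW", "REPLACE_ONLY", "SHRINK" };
        for (int i = 0; i < 3; i++)
            printf("fault at t=%ld -> %s\n",
                   fault_times[i], names[on_page_fault(fault_times[i])]);
        return 0;
    }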


Cleaning Policy


Load Control


Process Suspension


UNIX and Solaris Memory Management Data Structures


SVR4 Unix Memory Management Table Structures


UNIX and Solaris Memory Management


What is meant by a "lazy" Buddy System? What is the intended advantage over the normal Buddy System?


Linux Memory Management

In order to provide more detail, the notes on Linux memory management are kept in a separate file; see those notes.


Windows 2000 Memory Management




© 2002, 2005 T. P. Baker & Florida State University. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means without written permission. (Last updated 2002/09/26.)