Lecture #2: Distributed Operating Systems: an introduction
Topics for today
-  Overview of major issues in distributed operating systems
 -  Terminology
 -  Communication models
 -  Remote procedure calls
 
These topics are from Chapter 4 in the Advanced Concepts in
OS text.
What is a distributed system?
- It consists of multiple computers that do not share a memory.
 - Each Computer has its own memory and runs its own operating system.
 - The computers can communicate with each other through a communication 
    network.
 - See Figure 4.1 for the architecture of a distributed system.
 
Why build a distributed system?
- Microprocessors are getting more and more powerful.
 - A distributed system combines (and increases) the computing power of 
    individual computer.
 - Some advantages include:
-  Resource sharing
    (but not as easily as if on the same machine)
 -  Enhanced performance
    (but 2 machines are not as good as a single
     machine that is 2 times as fast)
 -  Improved reliability & availability
    (but probability of single failure increases,
     as does difficulty of recovery)
 -  Modular expandability
 
 -  Distributed OS's have not been economically successful!!!
 
System models:
-  the minicomputer model (several minicomputers with each computer
supporting multiple users and providing access to remote resources).
 -  the workstation model (each user has a workstation, the system provides
some common services, such as a distributed file system).
 -  the processor pool model (the model allocates processor to a user
     according to the user's needs).
 
Where is the knowledge of distributed operating systems 
likely to be useful?
-  custom OS's for high performance computer systems
 -  OS subsystems, like NFS, NIS
 -  distributed ``middleware'' for large computations
 -  distributed applications
 
Issues in Distributed Systems
-  the lack of global knowledge
 -  naming
 -  scalability
 -  compatibility
 -  process synchronization (requires global knowledge)
 -  resource management (requires global knowledge)
 -  security
 -  fault tolerance, error recovery
 
Lack of Global Knowledge
-  Communication delays are at the core of the problem
 -  Information may become false before it can be acted upon
 -  these create some fundamental problems:
-  no global clock -- scheduling based on fifo queue?
 -  no global state -- what is the state of a task? What is a correct
                        program?
 
 
Naming
-  named objects: computers, users, files, printers, services
 -  namespace must be large
 -  unique (or at least unambiguous) names are needed
 -  logical to physical mapping needed
 -  mapping must be changeable, expandable, reliable, fast
 
Scalability
- How large is the system designed for?
 - How does increasing number of hosts affect overhead?
 - broadcasting primitives, directories stored at every computer -- these
    design options will not work for large systems.
 
Compatibility
- Binary level: same architecture (object code)
 - Execution level: same source code can be compiled and executed (source code).
 - Protocol level: only requires all system components to support a common set
    of protocols.
 
Process synchronization
-  test-and-set instruction won't work.
 -  Need all new synchronization mechanisms for distributed systems.
 
 
Distributed Resource Management
-  Data migration: data are brought to the location that needs them.
-  distributed filesystem (file migration)
 -  distributed shared memory (page migration)
 
 -  Computation migration: the computation migrates to another location.
-  remote procedure call: computation is done at the remote machine.
 -  processes migration: processes are transferred to other processors.
 
 
Security
- Authetication: guaranteeing that an entity is what it claims to be.
 - Authorization: deciding what privileges an entity has and making only 
                   those privileges available.
 
Structuring
- the monolithic kernel: one piece
 - the collective kernel structure: a collection of processes
 - object oriented: the services provided by the OS are implemented as 
    a set of objects.
 - client-server: servers provide the services and clients use the services.
 
Communication Networks
- WAN and LAN
 - traditional operating systems implement the 
    TCP/IP protocol stack: host to network layer, IP layer, transport layer,
    application layer.
 -  Most distributed operating systems are not concerned with the lower
     layer communication primitives.
 
Communication Models
-  message passing
 -  remote procedure call (RPC)
 
Message Passing Primitives
-  Send (message, destination), Receive (source, buffer)
 -  buffered vs. unbuffered
 -  blocking vs. nonblocking
 -  reliable vs. unreliable
 -  synchronous vs. asynchronous
 
Example: Unix socket I/O primitives
#include <sys/socket.h>
ssize_t sendto(int socket, const void *message,
  size_t length, int flags,
  const struct sockaddr *dest_addr, size_t dest_len);
ssize_t recvfrom(int socket, void *buffer,
  size_t length, int flags, struct sockaddr *address,
  size_t *address_len);
int poll(struct pollfd fds[], nfds_t nfds,
  int timeout);
int select(int nfds, fd_set *readfds, fd_set *writefds,
  fd_set *errorfds, struct timeval *timeout);
You can find more information on these and other
socket I/O operations in the Unix man pages.
RPC
With message passing, the application programmer must
worry about many details:
- parsing messages
 
- pairing responses with request messages
 
- converting between data representations
 
- knowing the address of the remote machine/server
 
- handling communication and system failures
 
RPC is introduced to help hide and automate these details.
RPC is based on a ``virtual'' procedure call model 
-  client calls server, specifying operation and arguments
 -  server executes operation, returning results
 
RPC Issues
-  Stubs  (See Unix rpcgen tool, for example.)
-  are automatically generated, e.g. by compiler
 -  do the ``dirty work'' of communication
 
 -  Binding method
-  server address may be looked up by service-name
 -  or port number may be looked up
 
 -  Parameter and result passing
 -  Error handling semantics
 
RPC Diagram