Summer 1996 - Lesson 07

 
                      The Network File System - NFS


A. Introduction

   - What was life like before NFS?  (files were copied between
     machines with rcp/ftp, or users logged in remotely to get at them)

   - built on top of:

     UDP - User Datagram Protocol (unreliable delivery)
     XDR - eXternal Data Representation (machine independent data format)
     RPC - Remote Procedure Call

  1. NFS is both a set of specifications and an implementation

  2. The protocol specifications are independent of architecture
     and operating system 
 
  3. two protocols - mount protocol and NFS protocols

     - mount protocol establishes the initial link between client and
       server machines (see the showmount example after this list)

     - NFS protocols provide a set of RPCs for remote file
       operations

       > searching a directory
       > reading a set of directory entries
       > manipulating links and directories
       > accessing file attributes
       > reading and writing files
       > notably missing are open() and close()
       > there is no equivalent to UNIX file tables on the server
         side
       > each request must provide full set of arguments including
         a unique file identifier and offset
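
       > the mount protocol can be poked at directly: "showmount -e"
         asks a server's mountd for its export list (a sketch; the
         output is illustrative, patterned on the exports file in
         section B)

         % showmount -e xi
         export list for xi:
         /          lpdaemon,lpdaemon2
         /usr       lpdaemon,lpdaemon2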

  4. problems

     - performance (even with UDP)
       > modified data may be cached locally on the client
       > when the client flushes its cache to the server, the data
         must be committed to disk before the server replies, so the
         client waits on physical writes
       > the benefits of server write caching are lost


     - semantics

       > UNIX semantics (without NFS)
         + writes to an open file are visible immediately to other
           users who have the file open at the same time
         + the file is viewed as a single resource
       > Session semantics (ala Andrew file system)
         + writes to an open file are not visible to others having
           it open at the same time
         + once a file is closed the changes are visible only in
           the sessions opened later
       > NFS claimed to implement UNIX semantics
         + there are two client caches: file blocks and file attributes
         + cached attributes are validated with server on an open()
         + the biod implements read-ahead and delayed-write techniques
         + newly created files may not be visible to other sites for
           up to 30 seconds
         + it is indeterminate whether writes to a file will be immediately
           seen by other clients who have the file open for reading
       > example (watch the attribute-cache lag between two clients)
         + touch file on xi
         + ls on delta   (the new file may not show up yet)
         + rm file on xi
         + ls on delta   (the file may still show up)

     - a single hung NFS stat() request can hang any UNIX command that
       touches every mounted file system, like "df"!

     - "magic cookies" (random numbers) are used to short-cut future
       validations.  The server gives one to the client, and the client
       can use it to reconnect whenever the server comes back up after
       a crash.
       --> can be spoofed <--  Note that "stale cookies" (yuck) can
       make a client hang (solution: remount the file system on the
       client so it gets a new, fresh cookie; see the sketch below).
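
     - a minimal stale-cookie recovery sketch (the hostname and paths
       are illustrative):

       umount /n/server            # may need "umount -f" where supported
       mount server:/export /n/server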

B. Server

   1. mountd - Sun's UNIX implementation of the mount protocol

      - SunOS 4.x reads /etc/exports
      - uses "exportfs" to have mountd reload table ("exportfs -a")

      - example: xi:/etc/exports

      /          -ro,access=lpdaemon:lpdaemon2,root=mu
      /usr       -ro,access=lpdaemon:lpdaemon2,root=mu
      /real/cs25 -access=lpdaemon:lpdaemon2:majorslab,root=mu:nu:tau
      /real/cs26 -access=lpdaemon:lpdaemon2:majorslab,root=mu:nu
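
      - after editing /etc/exports, reload (a sketch; both forms are
        standard SunOS 4.x usage):

        exportfs -a     # (re)export everything listed in /etc/exports
        exportfs        # with no arguments: list current exports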

      - SunOS 5.x reads /etc/dfs/dfstab
      - uses "share" to have mountd reload table (see Table 17.4, p. 371)

      - example: export:/etc/dfs/dfstab

      share -F nfs -o ro,root=nu:mu /
      share -F nfs -o ro,root=nu:mu /usr
      share -F nfs -o rw=lpdaemon:lpdaemon2:majorslab,root=nu:mu: /real/cs13
      share -F nfs -o rw=lpdaemon:lpdaemon2:dad,root=nu:mu: /real/cs14
      share -F nfs -o rw=lpdaemon:lpdaemon2:,root=nu:mu: /real/cs15
      share -F nfs -o rw=lpdaemon:lpdaemon2,root=nu:mu:beta:chi\
            :epsilon:kill:rho:sigma:socket:exec:sync /real/cs16
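
      - after editing /etc/dfs/dfstab, reload (a sketch; "shareall" runs
        every share command in the file):

        shareall        # share everything listed in /etc/dfs/dfstab
        share           # with no arguments: list current shares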

      - Linux (Slackware, at least) uses /etc/exports and a "kill -HUP"
        to mountd to reload it.  Linux (apparently) provides "NFS
        multiplying" -- re-exporting an NFS-mounted file system over NFS.
        A reload sketch follows.
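
        # a sketch; the export line and the PID lookup are illustrative
        # /etc/exports on the Linux server might contain:
        #     /usr/local    (ro)
        # then make mountd reread it:
        kill -HUP `ps ax | awk '/[m]ountd/ {print $1}'`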

      - Tables 17.1, 17.2, and 17.3 give further implementation specifics.

   2. nfsd

      - handles requests for NFS file service
      - very small; it basically turns around and calls the kernel
      - system tuning: how many nfsds to run (see Table 17.5, page 372;
        a boot-time sketch follows)
      - Nemeth says 10 on a dedicated file server
      - Loukides says leave it at 4 (performance tuning book)
      - he says the kernel inode table and file table sizes are
        more important (an NFS server has more files open)
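
      - example (a sketch of a SunOS 4.x boot-time invocation in
        /etc/rc.local; the path and daemon count are illustrative):

        /usr/etc/nfsd 8 &              # start 8 NFS server daemons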

C. Client side

   1. extended "mount" command, accepts "host:path" syntax for NFS filesystems

      - /etc/fstab in SunOS 4.x

      - example: 
          /dev/sd0a             /               4.2 rw 1 1
          /dev/sd0g             /usr            4.2 rw 1 2

-- Where are the remote file systems?  They're handled by the automounter (see below)

      - /etc/vfstab in SunOS 5.x
    
      - example:

#device            device              mount             FS   fsck mount
#to mount          to fsck             point             type pass boot 
#-----------------------------------------------------------------------
/proc              -                   /proc             proc  -   no   
fd                 -                   /dev/fd           fd    -   no   
swap               -                   /tmp              tmpfs -   yes  
/dev/dsk/c0t3d0s0  /dev/rdsk/c0t3d0s0  /                 ufs   1   no   
/dev/dsk/c0t3d0s6  /dev/rdsk/c0t3d0s6  /usr              ufs   2   no   
/dev/dsk/c0t3d0s5  /dev/rdsk/c0t3d0s5  /opt              ufs   5   yes  
/dev/dsk/c0t3d0s1  -                   -                 swap  -   no   

      - Type "mount" to see currently mounted file systems

      - example:

/dev/sd0a on / type 4.2 (rw)
/dev/sd0g on /usr type 4.2 (rw)
mount:/real/cs4 on /tmp_mnt/home/cs4 type nfs (rw,suid,hard,intr)
mount:/real/cs5 on /tmp_mnt/home/cs5 type nfs (rw,nosuid,hard,intr)
access:/real/cs23 on /tmp_mnt/home/cs23 type nfs (rw,nosuid,hard,intr)
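
      - the same host:path syntax works by hand (a sketch; the mount
        point /mnt is illustrative):

        mount -o rw,hard,intr access:/real/cs23 /mnt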


   2. NFS service is provided in kernel

      - transparent to user

   3. biod

      - provides read-ahead and write-behind caching on the client
      - another tuning issue: how many biods to run (sketch below)
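
      - example (a sketch of a SunOS 4.x boot-time invocation; the path
        and daemon count are illustrative):

        /usr/etc/biod 4 &              # start 4 block I/O daemons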


D. Administering NFS

   1. users must have accounts on the file server, or access rights
      can't be checked (unknown users can default to user "nobody")

   2. for example: Majors labs DOS users must have accounts
      on xi, sed, mount, export, access, pi, upsilon

   3. In CompSci, the artificial-shell setup keeps users from logging into
      the file servers and running up the load

   4. must keep UIDs and GIDs consistent across machines (see the
      sketch after this list)

   5. don't mount file systems from outside the local net
 
   6. write performance is an issue
   
      - consider dedicated non-volatile NFS cache cards
      - or spread out the load on more small user disks
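
   - a quick UID-consistency check sketch (a similar pass over
     /etc/group covers GIDs; the hostnames are illustrative, and rsh
     access between the machines is assumed):

     for h in xi delta; do
         rsh $h "awk -F: '{print \$3, \$1}' /etc/passwd | sort -n" \
             > /tmp/passwd.$h
     done
     diff /tmp/passwd.xi /tmp/passwd.delta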

E. auto-mounting

   1. Sun's "automount" daemon (used on CompSci network)

      - nice to keep one NIS automount map instead of ~50 separate
        /etc/fstab files

      Operation (using CompSci mappings):

      - the automounter appears to the kernel to be an NFS server
      - automount uses its maps to locate a real NFS file server
      - it then mounts the file system in a temporary location
      - and creates a symbolic link to the temporary location
      - if the file system is not accessed within an appropriate
        interval (five minutes by default), the daemon unmounts the
        file system and removes the symbolic link
      - if the indicated directory has not already been created, the
        daemon creates it, and then removes it upon exiting
      - this is different from a regular mount, for which the mount point
        must already exist

      - example (somewhat convoluted) configuration maps:

      - auto.master (available via NIS; "ypcat -k auto.master")

      /home   auto.home         # an indirect map, all rooted at "/home"
      /-      auto.direct       # "/-" means a direct map
      /net    -hosts         -rw,nosuid,hard,intr # "-hosts" means use
                                                  # NIS "hosts.byname" to
                                                  # look up the hostname;
                                                  # will mount any
                                                  # permissible NFS
                                                  # server on "/net/..."
      - auto.direct ("ypcat -k auto.direct")

      Path	      mount() options         actual location
      ----	      ---------------         ---------------
      /nu0            -rw,nosuid,hard,intr    sync:/real/nu0
      /nu1            -rw,suid,hard,intr      sync:/real/nu1
      /nu2            -rw,suid,hard,intr      sync:/real/nu2
      /var/spool/mail -rw,nosuid,hard,intr    nu:/usr/spool/realmail

      - auto.home ("ypcat -k auto.home")

      Path    mount() options         actual location
      ----    ---------------         ---------------
      s5      -rw,nosuid,hard,intr    psi:/s5
      s6      -rw,nosuid,hard,intr    psi:/s6
      cs4     -rw,suid,hard,intr      mount:/real/cs4
      cs5     -rw,nosuid,hard,intr    mount:/real/cs5
      cs6     -rw,nosuid,hard,intr    mount:/real/cs6
      cs7     -rw,nosuid,hard,intr    mount:/real/cs7
      cs8     -rw,nosuid,hard,intr    mount:/real/cs8
      cs9     -rw,suid,hard,intr      mount:/real/cs9
      cs10    -rw,nosuid,hard,intr    mount:/real/cs10
      cs11    -rw,suid,hard,intr      mount:/real/cs11 
      .
      .
      .
      cs38    -rw,nosuid,hard,intr    pi:/real/cs38
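
      - with these maps in place, the first reference triggers the
        mount (a sketch; the output is abbreviated and illustrative):

        % ls /home/cs4             # first access makes automount act
        % df /home/cs4
        mount:/real/cs4 ... /tmp_mnt/home/cs4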


  2. "amd" - Public domain automounter from Jan-Simon Pendry's doctoral
     thesis (used at SCRI)

     - new features; more flexible

     - irritating features of the Sun implementation were fixed

       > amd does not hang if a remote file system goes down

       > amd attempts to mount a replacement file system if and
         when one becomes available

    - amd automatically unmounts idle file systems (via "keep-alive")

    - Interesting list of mount types (Table 17.7, page 380)

    - non-blocking operation

    - amd maps can be just as convoluted!  (a tiny sketch follows)
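
    - example: a minimal amd map sketch (amd's ":=" assignment
      syntax; the names reuse the Sun examples above):

      /defaults   opts:=rw,hard,intr;type:=nfs
      cs4         rhost:=mount;rfs:=/real/cs4
      cs5         rhost:=mount;rfs:=/real/cs5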

F. Security

   Don't export to hosts on which non-trusted users have root access.

   If you don't control root on a machine, don't export file systems to it.

   Block NFS UDP traffic (port 2049) at your router, if possible.

G. tuning NFS

   nfsstat -c  to see client side

   ---------------------------------------------------------------------
   Client rpc:
   calls    badcalls retrans  badxid   timeout  wait     newcred  timers
   3175986  1991     0        1232     1991     0        0        5330

   Client nfs:

   calls      badcalls   nclget     nclsleep
   3173970    0          3173970    0

   getattr    setattr    root       lookup     readlink    read
   192650  6% 49  0%     0  0%      831671 26% 2059211 64% 78054  2%


   write      create     remove     rename     link       symlink
   140  0%    124  0%    50  0%     3  0%      7  0%      0  0%

   mkdir      rmdir      readdir    fsstat
   0  0%      0  0%      940  0%    11071  0%

   > what does the client spend most of its time doing?

     - reading links and looking up information about files

     - percentage of writes is low (so maybe no dedicated NFS cache
       card needed?)

     - the client is timing out occasionally but isn't having to retransmit

     - badxid: a reply arrived for which there is no outstanding
       call (typically a late duplicate reply)

     - timeout: a call timed out

     - badxid and timeouts are roughly equal, but are only .0006 of
       all calls

     - if timeouts or retransmissions were high, say > 5% then we
       want to know why

     - if badxid ~= timeout, the server is too slow (replies arrive
       after the client has already given up and retransmitted)

     - if badxid << timeout, go get your network analyzer,
       because packets are getting lost on the net due to some
       other hardware problem


   tuning with the mount command (example below):

       rsize=n       set the read buffer size to n bytes

       wsize=n       set the write buffer size to n bytes

       timeo=n       set the NFS timeout to n tenths of a second

       retrans=n     set the number of NFS retransmissions
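
   example (a sketch; the values shown are illustrative, not
   recommendations):

       mount -o rw,hard,intr,rsize=8192,wsize=8192,timeo=15,retrans=5 \
             mount:/real/cs4 /mnt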

   ---------------------------------------------------------------------
   
   nfsstat -s

   Server rpc:
   calls      badcalls   nullrecv   badlen     xdrcall
   82414467   0          0          0          0

   Server nfs:
   calls      badcalls
   82414467   264

   null       getattr      setattr     root    lookup
   82760  0%  36039746 43% 217061  0%  0  0%   27784077 33%

   readlink     read
   287401  0%   6382386  7%

   wrcache    write        create      remove      rename
   0  0%      2130913  2%  397712  0%  184138  0%  31848  0%

   link       symlink    mkdir      rmdir      readdir     fsstat
   10468  0%  1062  0%   4461  0%   4616  0%   8807761 10% 48057  0%

   > what does the server spend most of its time doing?

     - getting attributes and performing lookups (for ls -l?)

     - it's a good thing that attributes are cached on the client
       side (using biod)


H. Beyond NFS

	o AFS - Andrew File System, from CMU and Transarc Corp.

	  - Much better authentication (Kerberos)
	  - 8 inch high stack of installation books!
	  - Adds new file system type to kernel
	  - Addresses more than just file system semantics: also
	    user authentication, etc.
	  - Large local client-side disk cache improves performance

	o DFS - Distributed File System from OSF

	  - "successor" to AFS; AFS-like
	  - Beginning to show up in most vendors' UNIX implementations
	  - Major part of DCE (Distributed Computing Environment)