COMPUTER AND NETWORK
                       SYSTEM  ADMINISTRATION
                       Summer 1996 - Lesson 12


                     Adding Disks, File Systems


A. The UNIX file system
  
  1. file systems

    - file systems reside on mass storage media such as disks

      > can reside on other media (RAM)

    - each disk is divided into one or more subdivisions called 
      partitions

    - each partition may contain only one filesystem

    - the file system abstraction:

         array of bytes ---> array of logical blocks

      > translates the user view of a file as an array of bytes 
        to the underlying structure of an array of logical blocks
        plus an offset

      > the file system reads and writes to logical blocks 

      > can't address anything smaller

    - a logical block consists of one or more physical blocks

    - the device driver:

         logical blocks ---> physical blocks

      > maps logical blocks to physical blocks on the disk

    - the disk controller:

         physical blocks ---> cylinder+head+sector

      > a physical block consists of one or more contiguous sectors

      > maps physical blocks to cylinder, head, and sector addresses
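
    The three translation layers above can be sketched as follows. This
    is only an illustration: the geometry constants are made up, and the
    sketch treats the logical block number as a position on the
    partition (a real file system first maps a file's block to a disk
    block through the i-node pointers).

```python
# Illustrative parameters only; none of these values come from the notes.
LOGICAL_BLOCK_SIZE = 8192     # bytes per logical block
SECTOR_SIZE = 512             # bytes per sector
SECTORS_PER_PHYS_BLOCK = 2    # contiguous sectors per physical block
HEADS = 9                     # tracks per cylinder
SECTORS_PER_TRACK = 36


def byte_to_logical(offset):
    """File system layer: byte offset -> (logical block, offset in block)."""
    return divmod(offset, LOGICAL_BLOCK_SIZE)


def logical_to_physical(lblock):
    """Device driver layer: logical block -> range of physical blocks."""
    phys_size = SECTOR_SIZE * SECTORS_PER_PHYS_BLOCK
    per_logical = LOGICAL_BLOCK_SIZE // phys_size
    first = lblock * per_logical
    return range(first, first + per_logical)


def physical_to_chs(pblock):
    """Disk controller layer: physical block -> (cylinder, head, sector)."""
    sector_no = pblock * SECTORS_PER_PHYS_BLOCK
    cylinder, rest = divmod(sector_no, HEADS * SECTORS_PER_TRACK)
    head, sector = divmod(rest, SECTORS_PER_TRACK)
    return cylinder, head, sector
```

    For example, byte 20000 falls in logical block 2 at offset 3616,
    which is why the file system cannot address anything smaller than
    a logical block: it reads the whole block and then applies the
    offset.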

  2. directories
    
    - a directory is allocated in units called chunks

    - a chunk consists of a series of directory entries

    - each entry contains:

        a. the i-node number of the file, 
        b. the size of the directory entry, 
        c. the length of the filename, and 
        d. the file name

    - fields (b) and (c) are used merely for keeping track of 
      space in the chunk itself

    - the essential content is the name to i-node mapping,
      fields (a) and (d)
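
    A minimal sketch of walking the entries in one chunk. The field
    widths (4-byte i-node number, 2-byte entry size, 2-byte name length,
    name padded to a 4-byte boundary) and the big-endian byte order are
    assumptions chosen for illustration, and pack_entry and parse_chunk
    are hypothetical helper names, not system routines.

```python
import struct

# Assumed header layout: i-node number, entry size, name length.
HEADER = struct.Struct(">IHH")


def pack_entry(ino, name, reclen=None):
    """Build one directory entry; reclen may be larger to absorb free space."""
    raw = name.encode("ascii")
    # Header plus the name and its terminating NUL, rounded up to 4 bytes.
    needed = HEADER.size + ((len(raw) + 1 + 3) & ~3)
    reclen = needed if reclen is None else reclen
    if reclen < needed:
        raise ValueError("entry size too small for name")
    return HEADER.pack(ino, reclen, len(raw)) + raw.ljust(reclen - HEADER.size, b"\0")


def parse_chunk(chunk):
    """Return the (name, i-node number) pairs stored in a chunk."""
    entries = []
    pos = 0
    while pos < len(chunk):
        ino, reclen, namlen = HEADER.unpack_from(chunk, pos)
        if reclen == 0:          # corrupt entry; stop rather than loop forever
            break
        start = pos + HEADER.size
        if ino != 0:             # i-node 0 is treated here as an unused slot
            entries.append((chunk[start:start + namlen].decode("ascii"), ino))
        pos += reclen            # field (b) tells us where the next entry starts
    return entries
```

    Note how the entry size, not the name length, drives the walk: the
    last entry in a chunk can claim all remaining free space.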

  3. i-nodes

    - index nodes or i-nodes contain information about one 
      particular file

    - the number of i-nodes is fixed and determined when the 
      file system is created (try "df -i")

    - the fields depend on the type of file that the i-node
      references 

      > socket, directory, regular file

    - 128 bytes long (SunOS 4.x)
    
      > 8 i-nodes fit in 1K; if the average file size is 1K, what
        percentage of the file system is taken up with i-nodes?
        (128 / (1024 + 128) = 11%)

      > using UNIX defaults the i-nodes typically take up 3-5%
        of the file system

    - see /usr/include/ufs/inode.h for full structure

    - fields are:

      + type of file and access mode (drwxr-xr-x)
      + file's owner (uid)
      + group-access identifier (group)
      + number of references to the file (hard links)
      + time of last access and modification
      + size of file in bytes
      + direct pointers (12 of them, covering 48 Kbytes if 4K blocks
        are used)
      + indirect pointers (points to a block of direct pointers)
      + double-indirect pointers
      + triple-indirect pointers

    - notably missing is the file name!

      > a file may have many names
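
    The arithmetic in this section can be checked with a short sketch.
    The 128-byte i-node and the 12 direct pointers come from the notes;
    the 4-byte pointer size is an assumption, and the function names are
    made up for illustration.

```python
INODE_SIZE = 128     # bytes per i-node (SunOS 4.x figure from the notes)
NDIRECT = 12         # direct pointers per i-node
POINTER_SIZE = 4     # bytes per block pointer (assumption)


def inode_overhead(avg_file_size, inode_size=INODE_SIZE):
    """Fraction of space spent on i-nodes if every file has this size."""
    return inode_size / (avg_file_size + inode_size)


def reach(block_size):
    """Bytes of file data addressable through each kind of pointer."""
    per_block = block_size // POINTER_SIZE   # pointers held by one block
    return {
        "direct": NDIRECT * block_size,
        "single": per_block * block_size,
        "double": per_block ** 2 * block_size,
        "triple": per_block ** 3 * block_size,
    }
```

    With 4K blocks, reach(4096)["direct"] gives the 48 Kbytes quoted
    above, and inode_overhead(1024) gives roughly 0.11.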
    
  4. file system layout
    
    - bootblock (not really part of the file system)

    - superblock(s): contains (BSD-style superblock)
      + total number of blocks in fs 
      + number of i-node blocks in fs
      + total number of data blocks in fs
      + number of cylinder groups 
      + size of basic blocks in fs
      + pointers to cylinder group blocks can be calculated
        > from cylinder group size, offset into cylinder group 
        > in fact, given an i-node number one can calculate which
          cylinder group it belongs to
      + logical block size
      + lots more!
      + see /usr/include/ufs/fs.h for full structure

      > superblock is replicated to protect against catastrophic loss
      > at least one in each cylinder group

    - cylinder group blocks

      + number of cyl's this cg
      + number of inode blocks this cg
      + number of data blocks this cg
      + free block map
      
    - i-nodes 
      + allocated within each cylinder group

    - data blocks
      + allocated within each cylinder group
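
    Because i-nodes are laid out in fixed-size tables, one table per
    cylinder group, the location of an i-node follows from its number
    alone. A rough sketch, assuming i-nodes are numbered consecutively
    across groups; the parameter names and default sizes are
    illustrative, not taken from fs.h.

```python
def inode_location(ino, inodes_per_group, inode_size=128, block_size=8192):
    """Return (cylinder group, block within its i-node table, slot in block)."""
    cg, index = divmod(ino, inodes_per_group)     # which group, which entry
    per_block = block_size // inode_size          # i-nodes stored per block
    block_in_table, slot = divmod(index, per_block)
    return cg, block_in_table, slot
```

    With 2048 i-nodes per group, i-node 5000 would be entry 904 of
    group 2, which is slot 8 of block 14 in that group's i-node table.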
    

B. BSD Fast File System
  
  1. Increased block size
    
    - block size was increased to a power of two of at least 4096 bytes,
       vs. the old 1024-byte blocks

      > however, a uniformly large block size would waste space since
        many UNIX files are small

  2. Fragments
    
    > try to get the best of both worlds 

    > large block size and little wasted space

    - write file in complete blocks except for last remainder which is
       written into a fragment

    - a block may be broken into 2, 4, or 8 fragments

      > each fragment is addressable

    - note that a fragment may not span blocks

    - to limit the number of fragments the allocation routine must
      handle, only direct blocks may refer to fragments (the first 48K
      of a file with 4K blocks)

    - indirect blocks must be full blocks
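
    A small sketch of how a file's size turns into full blocks plus a
    fragment tail under the rules above. The 8192/1024 sizes are the Sun
    defaults quoted later in these notes; the function name and the
    returned tuple are illustrative choices.

```python
NDIRECT = 12   # direct pointers per i-node


def layout(file_size, block_size=8192, frag_size=1024):
    """Return (full blocks, tail fragments, wasted bytes) for a file."""
    full_blocks, tail = divmod(file_size, block_size)
    tail_frags = -(-tail // frag_size)            # ceiling division
    if tail and full_blocks >= NDIRECT:
        # The tail lies beyond the direct pointers: it must be a full block.
        full_blocks, tail_frags = full_blocks + 1, 0
    elif tail_frags == block_size // frag_size:
        # A tail that needs every fragment of a block is just a full block.
        full_blocks, tail_frags = full_blocks + 1, 0
    allocated = full_blocks * block_size + tail_frags * frag_size
    return full_blocks, tail_frags, allocated - file_size
```

    A 10,000-byte file takes one block and two fragments and wastes 240
    bytes; with whole blocks only it would take two blocks and waste
    6,384 bytes.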
    
  3. Cylinder groups
    
    - increase locality of reference

    - locate i-nodes close to their associated data blocks 

    - keep a copy of the superblock in each cylinder group

  4. global allocation strategy
    
    - localize inodes - keep inodes for files in a directory in the same
      cylinder group

    - when you create a subdirectory, it is placed in a cylinder group
      that has the most available free inodes 

    - localize data blocks for each file 

      >  place all data blocks for a single file in the same 
         cylinder group

    - put in rotationally optimal positions

      > may be thwarted by zone sectoring as mentioned in book

    - keep cylinder groups from getting full

      > if a file exceeds the size covered by its direct pointers,
        move to a new cylinder group

        > SunOS 4.x i-node has 12 direct pointers (at 8K each, so the
          first move comes at 96K)

        > move again every 1 Mbyte thereafter
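
    The rule for spreading a large file can be sketched as a list of
    byte offsets at which allocation moves to a new cylinder group. The
    12 pointers, 8K blocks, and 1 Mbyte step are the figures above; the
    function name is made up for illustration.

```python
def cg_switch_points(file_size, block_size=8192, ndirect=12, step=1024 * 1024):
    """Byte offsets at which a growing file moves to a new cylinder group."""
    points = []
    offset = ndirect * block_size     # end of the direct blocks (96K here)
    while offset < file_size:
        points.append(offset)
        offset += step                # then every `step` bytes after that
    return points
```

    A 50,000-byte file never leaves its first cylinder group, while a
    3 Mbyte file is split across four groups.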
    
  5. local allocation strategy
    
    - when the global allocator requests a block the local allocator
       services the request

    - allocate the requested block if it is available

    - otherwise use the next available block that is rotationally
       closest to the requested block (in the same cylinder but
       perhaps a different platter)

    - if none is available then use a block within the same cylinder 
       group

    - if cg has < 10% free space then find another cylinder group 
      with a free block
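
    The fallback order above can be modelled with a toy allocator. This
    is a simplified sketch, not the kernel's routine: blocks are
    (cylinder group, cylinder, rotational position) tuples, the free
    list is a plain set, and the 10% threshold is not modelled (another
    cylinder group is tried only when the requested one has no free
    block at all).

```python
def allocate(requested, free, positions_per_track=8):
    """Pick a block from `free` following the local allocation order."""
    cg, cyl, pos = requested
    if requested in free:
        choice = requested                         # 1. the requested block
    else:
        same_cyl = [b for b in free if b[0] == cg and b[1] == cyl]
        same_cg = [b for b in free if b[0] == cg]
        if same_cyl:
            # 2. same cylinder: least forward rotation from the request
            choice = min(
                same_cyl,
                key=lambda b: ((b[2] - pos) % positions_per_track, b),
            )
        elif same_cg:
            choice = min(same_cg)                  # 3. same cylinder group
        elif free:
            choice = min(free)                     # 4. another cylinder group
        else:
            return None                            # file system is full
    free.remove(choice)
    return choice
```

    Each call removes the chosen block from the set, so repeated calls
    with the same request walk down the fallback levels in turn.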
    
  
  6. File system parameterization (parameters for newfs command)
    
    - block-size 

      > default on Sun is 8192

    - the number of cylinders per cylinder group in a
      file system 

      > The default is 16.

    - the fragment size of the file system in bytes

      > The default is 1024.

    - bytes/inode. 

      > This specifies the density of inodes in  the  file
         system.
      > The  default  is  to create an inode for
         each 2048 bytes of data space.
      > If fewer  inodes are  desired,  a  larger number should 
        be used
      > to create more inodes  a  smaller  number  should  be
         given.

    - reserved free space 

      > the  percentage  of  space  reserved  from  normal
         users;  the  minimum  free  space  threshold.  The
         default is 10%.

    - optimization (space or time)

      >  The file system  can  either  be
         instructed to try to minimize the time spent allocating blocks
      >  or to try  to  minimize  the  space fragmentation  on  the  disk.
      >  If the minimum free space threshold (as specified by the -m option) is
         less  than  10%,  the  default  is to optimize for
         space;
      >  if the  minimum  free  space  threshold  is
         greater  than  or  equal to 10%, the default is to
         optimize for time.
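
    A quick sketch of what these parameters imply for a given file
    system size. The defaults are the ones listed above; the function
    names are illustrative and this is not how newfs itself is
    implemented.

```python
def inode_count(fs_bytes, bytes_per_inode=2048):
    """Approximate number of i-nodes created for a file system."""
    return fs_bytes // bytes_per_inode


def usable_bytes(fs_bytes, minfree_percent=10):
    """Space available to normal users after the reserve is held back."""
    return fs_bytes * (100 - minfree_percent) // 100


def default_optimization(minfree_percent=10):
    """Default optimization choice as described for the -m threshold."""
    return "time" if minfree_percent >= 10 else "space"
```

    For a 200 Mbyte file system the defaults give about 102,400 i-nodes;
    raising bytes/inode to 8192 would cut that to 25,600.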
    
  7. Performance
    
    - multiply the number of bytes per track by the rotations per second
       (RPS) to get an upper bound on disk bandwidth
       (18,432 x 60 = 1,105,920 bytes, about 1.1 Mbyte per second)

    - write test is "time cp /vmunix /var/tmp"

    - example: vmunix = 1,698,406 bytes, timed at 3.8 sec = 446,948 bytes
       per second, or about 40% of the upper bound
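
    The same calculation as a short sketch, using only the figures
    quoted above (the variable and function names are illustrative):

```python
BYTES_PER_TRACK = 18_432
ROTATIONS_PER_SECOND = 60

# Upper bound: one full track transferred per rotation.
upper_bound = BYTES_PER_TRACK * ROTATIONS_PER_SECOND


def throughput(nbytes, seconds):
    """Observed transfer rate in bytes per second."""
    return nbytes / seconds


observed = throughput(1_698_406, 3.8)   # the timed copy of /vmunix
fraction = observed / upper_bound       # share of the upper bound achieved
```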

  8. Some questions
  
     - The original Berkeley fast file system mentions that 
       someday soft links may be able to span machines. 
       Has this day come? 

       answer: yes, NFS

     - Why aren't hard links permitted to span file systems? 

       answer:
       a hard link is a name-inode pair, and an inode number is only
       unique within a file system; a symbolic link is a name-name pair,
       so it can follow the directory graph to any file system
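
       The difference can be observed from a script. This sketch uses
       Python's os module in a temporary directory; link_demo is a
       made-up name, and the dictionary it returns is just a convenient
       way to report what was seen.

```python
import os
import tempfile


def link_demo():
    """Create a file, a hard link, and a symbolic link, then compare them."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "file")
        hard = os.path.join(d, "hard")
        soft = os.path.join(d, "soft")
        with open(target, "w") as f:
            f.write("data")
        os.link(target, hard)       # second name for the same i-node
        os.symlink(target, soft)    # new i-node whose contents are a name
        t, h = os.stat(target), os.stat(hard)
        return {
            "same_inode": (t.st_dev, t.st_ino) == (h.st_dev, h.st_ino),
            "link_count": t.st_nlink,
            "symlink_stores": os.readlink(soft),
            "symlink_own_inode": os.lstat(soft).st_ino != t.st_ino,
            "target": target,
        }
```

       The hard link shares the device and i-node number of the
       original and raises its link count to 2, while the symbolic link
       has its own i-node that merely stores a path. An attempt to
       hard-link across file systems fails with the EXDEV error.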

     - Why is the per cylinder information placed at varying offsets from
       the beginning of each cylinder group? 

       answer: if the information were kept at the beginning of every
       cylinder group, all of it (including the superblock copies) would
       be on the same platter and could be wiped out by a single hardware
       failure. The info is offset about one additional track into each
       successive cylinder group so that it spirals down into the disk

    - Why did doubling the block size in the old UNIX file system from 512
      to 1024 bytes MORE than double the file system performance? 

      answer:
      The performance doubled because each disk transfer accessed twice as
      much data. An additional speedup occurred because many files no longer
      needed indirect blocks.

    - What are the trade-offs between dedicating more memory to the buffer
      cache versus dedicating it to the virtual memory system? 

      answer:
      dedicating memory to the buffer cache improves the hit rate for disk
      accesses, so I/O throughput improves; dedicating it to the virtual
      memory system keeps more text pages in memory, which reduces the
      paging load, frees I/O bandwidth for other disk requests, and makes
      the system more responsive. Perhaps: more buffers for a file
      server, and more VM for a compute server?

      NOTE: Some more modern UNIXes combine the disk buffer cache and
      virtual memory, thus eliminating this schism (AIX, for example)

    - Why do you think that inodes are allocated statically at the time
      of file system creation rather than at the time of file creation?
 
      answer: to make access to i-nodes fast

      answer: to make fsck operate in less than a century; fsck scans
      the directory structure and then scans the inodes which comprise
      3-5% of the disk space (best to have them in known locations)
  
C. Disk installation

   1. connection

     - look at boot display (dmesg)

     - dkinfo  ("dkinfo sd0" on SunOS 4.x)

     - fdisk (Linux)

     - df

   2. creating device files

     - see if devices exist: ls -lg /dev/*sd3*

     - create devices: MAKEDEV sd3

   3. Low-level formatting, if needed

   4. labelling and partitioning

     - Use "format" command

   5. creating UNIX file system

     - newfs /dev/rsd3?   (? stands for the partition letter, here and
       in the fsck step)

   6. verify integrity of file system

     - fsck /dev/rsd3?

   7. set up automatic mounting

     - add to /etc/fstab

     - type mount -a

       or, to just mount the single filesystem

       mount /new_mountpoint_name