COMPUTER AND NETWORK
SYSTEM ADMINISTRATION
Summer 1999 - Lesson 12

More Performance Analysis

A. Swap space

   1. The disk area used for paging and swapping

   2. Many rules of thumb

      - Look at swap utilization

        pstat -s (BSD-based)
        top, free, cat /proc/meminfo (Linux)
        swap -l or swap -s (Solaris)
        Windows NT Diagnostics (Administrative Tools) -> Memory

      - Example from SunOS 4.x (pstat -s):

        7624k allocated + 1608k reserved = 9232k used
        23488k available

      - Example from Linux (free):
                     total       used       free     shared    buffers     cached
        Mem:         63144      56868       6276      33992      12404      25240
        -/+ buffers:            19224      43920
        Swap:       130748         52     130696

      - Example from NT:

        Pagefile Space (K)
           Total: 44,032
           Total in use: 15,456
           Peak: 16,128
 
           C:\pagefile.sys
              Total: 44,032
              Total in use: 15,456
              Peak: 16,128
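
      - Turning the raw numbers into a utilization percentage makes
        them easier to compare with a rule of thumb. A minimal Python
        sketch, assuming the Linux "free" layout shown above (a
        "Swap:" row with total, used, and free in KB). The function
        name swap_utilization is hypothetical and not part of any tool.

```python
def swap_utilization(free_output):
    """Return swap used as a percentage of total, from `free` text output."""
    for line in free_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Swap:":
            total, used = int(fields[1]), int(fields[2])
            # Guard against systems with no swap configured.
            return 100.0 * used / total if total else 0.0
    raise ValueError("no 'Swap:' row found")


# Sample copied from the Linux example above (assumed layout).
SAMPLE = """\
             total       used       free     shared    buffers     cached
Mem:         63144      56868       6276      33992      12404      25240
-/+ buffers:            19224      43920
Swap:       130748         52     130696
"""

print("swap in use: %.2f%%" % swap_utilization(SAMPLE))
```

        The same arithmetic applies to the SunOS sample, assuming
        total = used + available: 9232k / (9232k + 23488k) is
        roughly 28% in use.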

   3. Adding swap (Unix)

      - Spreading swap over several local disks should improve performance

      - Used to have to re-partition to add swap

      - Now can use a file within an existing filesystem

        mkfile size filename

        > creates a file suitable for swap
          (padded with zeroes)

        /usr/etc/swapon filename
        swap -a filename  (Solaris)

        > Adds file to swap area 

        > Don't add until file system is mounted

        > Advice: only use this as a stop-gap measure until an
          actual swap partition can be added to the system.

   4. Adding swap (NT)

      - System control panel, choose Performance tab

      - click on Change in Virtual Memory section


C. Disk performance

   1. Disk storage efficiency

        > Minimize wasted space

        > Performance and space optimization are usually
          in an inverse relationship

        > Decreasing the block size improves space utilization
          (less wasted space per file) but decreases performance

   2. Best recommendations

        > Spread the disk load around, but place the disks where
          the I/O bandwidth resides (a larger server with
          multiple disk controllers, for example).

        > Most workstations these days have /, swap, and /usr
          on a local disk and everything else on NFS mounted
          partitions.
        
   3. iostat

    - Example: iostat -d 5

              sd0           sd1           sd3 
     bps tps msps  bps tps msps  bps tps msps  
       4   1  0.0   17   2  0.0    1   0  0.0 
      79  11  0.0   23   3  0.0    2   0  0.0 
      59   8  0.0   72  10  0.0    2   0  0.0 
       0   0  0.0   10   2  0.0    0   0  0.0 
      12   2  0.0    0   0  0.0   87  11  0.0 
       0   0  0.0    0   0  0.0  477  60  0.0 
       2   0  0.0   11   2  0.0  452  57  0.0 
      13   2  0.0   11   1  0.0  464  58  0.0 
      13   2  0.0   60   9  0.0  479  61  0.0 
       0   0  0.0    3   0  0.0  458  57  0.0 
       0   0  0.0    3   0  0.0  521  65  0.0 
       0   0  0.0   20   3  0.0  465  58  0.0 
       7   1  0.0   12   2  0.0  501  63  0.0 
       4   1  0.0   23   4  0.0  471  59  0.0 
       0   0  0.0    2   0  0.0  511  64  0.0 
       0   0  0.0   19   3  0.0  137  17  0.0 
       0   0  0.0   16   2  0.0    0   0  0.0 
       0   0  0.0   14   2  0.0    0   0  0.0 
      18   3  0.0   31   5  0.0    4   1  0.0 


      > bps - average Kbytes/sec during last interval

      > msps - average milliseconds per seek; unreliable because
               it depends on the controller, so ignore it

      > tps - average number of transfers/sec during previous
              interval (the figures above reflect an 8K transfer size)
    
    - Take averages over a long period of use and study peaks

    - Move files among disks and servers to equalize load
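
    - Averages and peaks can be pulled out of captured iostat output
      with a short script. A minimal Python sketch, assuming the
      three-columns-per-disk layout shown above (bps, tps, msps) with
      one row of disk names and one row of column headers. The
      function name summarize_iostat is hypothetical, and the sample
      uses only four of the rows above.

```python
def summarize_iostat(text):
    """Return per-disk mean and peak bps/tps from `iostat -d` text output."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    disks = lines[0]                                   # first row: disk names
    # Second row is the bps/tps/msps header; the rest are samples.
    rows = [[float(x) for x in ln] for ln in lines[2:]]
    summary = {}
    for i, disk in enumerate(disks):
        bps = [r[3 * i] for r in rows]                 # Kbytes/sec column
        tps = [r[3 * i + 1] for r in rows]             # transfers/sec column
        summary[disk] = {
            "mean_bps": sum(bps) / len(bps),
            "peak_bps": max(bps),
            "mean_tps": sum(tps) / len(tps),
            "peak_tps": max(tps),
        }
    return summary


# Four rows taken from the iostat example above.
SAMPLE = """\
          sd0           sd1           sd3
 bps tps msps  bps tps msps  bps tps msps
   4   1  0.0   17   2  0.0    1   0  0.0
  79  11  0.0   23   3  0.0    2   0  0.0
   0   0  0.0    0   0  0.0  477  60  0.0
   0   0  0.0    3   0  0.0  521  65  0.0
"""

for disk, stats in summarize_iostat(SAMPLE).items():
    print(disk, stats)
```

      A disk whose mean stays far above the others (sd3 in the
      second half of the example) is a candidate for moving files
      elsewhere.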


   4. Large vs. small partition sizes

    - Example: 23 GB disk

    - Arguments for LARGE partitions:

      > Fewer mounts and unmounts

      > Fewer file systems to manage (fewer quota files, etc.)

      > Less wasted space

    - Arguments for SMALL partitions

      > Easier to back up if the partition is smaller than
        the backup media (2.0 GB, for example)

      > File system overflow affects fewer users

      > File system corruption affects fewer users


   5. df: display file system statistics

      Example (use -k on Solaris):
      % df -k
      Filesystem            kbytes    used   avail capacity  Mounted on
      /dev/dsk/c0t3d0s0     206567   57710  128207    32%    /
      /dev/dsk/c0t3d0s6     529033  175197  300936    37%    /usr
      /proc                      0       0       0     0%    /proc
      /dev/dsk/c0t3d0s4      95994   22734   63670    27%    /var
      /dev/dsk/c0t2d0s0     978848   55892  825076     7%    /real/cs13
      /dev/dsk/c0t2d0s1     979893  526294  355619    60%    /real/cs14
      ...
      xi:/real/cs25         433632  300592   89680    78%    /home/cs25
      nu:/real/par1/mail    394287  116632  238227    33%    /var/spool/mail
      sed:/real/cs37       1668468 1414489   87133    95%    /home/cs37
      ...
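
      Scanning df output for nearly full file systems is easy to
      automate. A minimal Python sketch, assuming the Solaris
      "df -k" column layout shown above (capacity in the fifth
      column, mount point in the sixth). The function name
      nearly_full and the 90% default threshold are illustrative
      assumptions, not recommendations from any tool.

```python
def nearly_full(df_output, threshold=90):
    """Return (mount point, capacity %) pairs at or above the threshold."""
    hits = []
    for line in df_output.splitlines():
        fields = line.split()
        # Skip the header, "..." lines, and anything without a capacity column.
        if len(fields) < 6 or not fields[4].endswith("%"):
            continue
        pct = int(fields[4].rstrip("%"))
        if pct >= threshold:
            hits.append((fields[5], pct))
    return hits


# Rows copied from the df example above.
SAMPLE = """\
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/dsk/c0t3d0s0     206567   57710  128207    32%    /
/dev/dsk/c0t2d0s1     979893  526294  355619    60%    /real/cs14
...
sed:/real/cs37       1668468 1414489   87133    95%    /home/cs37
"""

for mount, pct in nearly_full(SAMPLE):
    print("%s is %d%% full" % (mount, pct))
```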


   6. du: report on disk usage

       Example (use -k on Solaris):
       % du -k public_html
       2       public_html/.AppleDouble
       30      public_html/lecture
       93      public_html

       Example (summary info):
       % du -ks public_html
       93      public_html
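
       What "du -ks" computes can be approximated by walking the
       directory tree and adding up file sizes. A rough Python
       sketch, with the caveat that it sums apparent file sizes
       rather than allocated disk blocks, so its totals will
       usually differ somewhat from du. The function name
       apparent_size_kb is hypothetical.

```python
import os


def apparent_size_kb(root):
    """Sum the sizes of all regular files under root, in KB (rounded up)."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symbolic links so link targets are not counted.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return -(-total // 1024)   # ceiling division to whole kilobytes
```

       For example, apparent_size_kb("public_html") gives a figure
       comparable to the summary line above.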

   7. Disk usage in NT

       - Right-click on a drive or directory and choose Properties
       - Run "Disk Administrator"


D. Network performance (and integrity)

  1. netstat -i 

            input           output        
        packets errs    packets  errs     colls 
        ---------------------------------------
        37135998 8      27664258  72    1753344 

   - A large number of input errors means that there may be
     some faulty hardware on the local net

     > should be under 0.025% of input packets (above is 0.00002%)

   - A large number of output errors may indicate that the
     local machine's transceiver, Ethernet controller, or
     AUI cable is faulty

     > should be under 0.025% (above is 0.0003%)

   - Collisions are normal on an Ethernet and may be as high as
     10% of output packets

     > if consistently higher than 10% then consider bridging
       or subnetting

     > above is 6.3%
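
   - The percentages above are simple ratios of the netstat -i
     counters. A minimal Python sketch of the arithmetic, assuming
     errors are measured against packets in the same direction and
     collisions against output packets (which is how the 6.3%
     figure works out). The function name interface_rates is
     hypothetical.

```python
def interface_rates(in_pkts, in_errs, out_pkts, out_errs, colls):
    """Return error and collision rates as percentages."""
    return {
        "input_err_pct": 100.0 * in_errs / in_pkts,
        "output_err_pct": 100.0 * out_errs / out_pkts,
        # Collisions happen on transmit, so compare with output packets.
        "collision_pct": 100.0 * colls / out_pkts,
    }


# Counters from the netstat -i example above.
rates = interface_rates(37135998, 8, 27664258, 72, 1753344)
for name, value in rates.items():
    print("%-15s %.5f%%" % (name, value))
```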

  
  2. Other ways to assess network load

     - netstat shows active connections ("netstat -a")

     - use "spray" / "tcpspray" to load network and see weak spots
       (not a reliable tool, however)

       Example: 
       % spray -c 100 xi
       sending 100 packets of length 86 to xi ...
           no packets dropped by xi
           7736 packets/sec, 665325 bytes/sec

     - statnet (Linux)


E. The Big Picture/Putting It All Together

   1. xsysstats: displays system statistics in graphical format

      - Can display multiple graphs in one window (even multiple machines)

      - The bottom of each graph shows graph type and scale

      - Graphs will be rescaled on the fly

      - Graph types:

        collisions: number of incoming Ethernet collisions since last update
        context: number of context switches per second
        cpu: percentage of cpu time being used
        disk: number of disk transfers per second
        errors: number of incoming Ethernet errors since last update
        interrupts: average number of device interrupts, per second
        load1, load5, load15: load average for 1, 5 and 15 minutes
        packets: number of incoming Ethernet packets per second
        page: page ins since last update
        swap: swap ins since last update 

      - Nice script that combines graphs with pretty colors: showstats
        Syntax: showstats [-s[how]] [-w[idth] n] [ host1 [host2 ...] ]

   2. monitor (AIX)

      - A large amount of info on one screen (uptime, cpu, I/O, NFS, etc.)


   3. WinNT Performance Monitor

      - Track variations in the use of system resources

      - Start by clicking + icon in toolbar

      - Select objects, instances and counters to monitor

      - Commonly used counters:

        Object        Counter                  Description
        -----------   ----------------------   -------------------------------------
        Processor     %Processor Time          Time executing non-idle thread
        Process       %Processor Time          Time allocated to a specific process
        Memory        Pages/sec                High value indicates excessive paging
        Cache         Data Map Hits%           Low value suggests memory shortage
        LogicalDisk   Avg. Disk Queue Length   Activity on logical disk

      - Note: in order to monitor disk activity, at the command prompt type

        diskperf -y

        and reboot

      - Choose "Explain" for explanation of a counter

      - Legend of monitored counters appears at the bottom

      - Chart is not the only view; report and log views are also available

      - Can save individual settings or entire workspace

      - Can monitor remote computers