CIS 5406 - Lecture Notes # 23 - More Performance Analysis

                          COMPUTER AND NETWORK
                         SYSTEM  ADMINISTRATION
                         Summer 1998 - Lesson 12

                        More Performance Analysis

A. Swap space

   1. the disk area used for paging and swapping

   2. many rules of thumb

      - look at swap utilization

        pstat -s (BSD-based)
        top, free (Linux)
        swap -l or swap -s (Solaris)
        Windows NT Diagnostics (Administrative Tools) -> Memory

      - example from mu:

        7624k allocated + 1608k reserved = 9232k used,
        23488k available

      - example from linuxfs1 (free):
                     total       used       free     shared    buffers     cached
        Mem:         63144      56868       6276      33992      12404      25240
        -/+ buffers:            19224      43920
        Swap:       130748         52     130696

      - example from NT:

        Pagefile Space (K)
           Total: 44,032
           Total in use: 15,456
           Peak: 16,128
 
           C:\pagefile.sys
              Total: 44,032
              Total in use: 15,456
              Peak: 16,128
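      The figures in these examples turn into utilization numbers with
      simple arithmetic. A minimal sketch in Python, using the mu and
      linuxfs1 numbers above; the function names are made up for
      illustration and are not part of any tool.

```python
def swap_utilization_pct(used_k: int, available_k: int) -> float:
    """Percent of total swap in use, given 'used' and 'available' totals in K."""
    return 100.0 * used_k / (used_k + available_k)


def app_memory_k(used_k: int, buffers_k: int, cached_k: int) -> int:
    """Memory held by programs once buffers and cache are excluded
    (the '-/+ buffers' row of the free output above)."""
    return used_k - buffers_k - cached_k


# mu: 7624k allocated + 1608k reserved = 9232k used, 23488k available
mu_used = 7624 + 1608
print(f"mu: {mu_used}k used, {swap_utilization_pct(mu_used, 23488):.1f}% of swap")

# linuxfs1: Mem used 56868, buffers 12404, cached 25240
print(f"linuxfs1: {app_memory_k(56868, 12404, 25240)}K used by programs")
```

      On mu about 28% of swap is in use. On linuxfs1 only 19224K of the
      56868K "used" belongs to programs, and swap is nearly untouched.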

   3. adding swap (Unix)

      - spreading swap over several local disks should improve performance

      - adding swap used to require repartitioning a disk

      - now a file within a file system can be used instead

        mkfile  

        > creates a file suitable for swap
          (padded with zeroes)

        /usr/etc/swapon  (SunOS)
        swap -a  (Solaris)

        > adds the file to the swap area

        > don't add the file until its file system is mounted
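      The point of mkfile is that every block of the file is actually
      written, not left as a hole in a sparse file. A sketch of the same
      idea in Python follows. make_zero_file is a hypothetical helper, not
      a replacement for mkfile, and the file still has to be added with
      swapon or swap -a (as root) afterwards.

```python
import os


def make_zero_file(path: str, size_bytes: int, chunk: int = 1 << 20) -> None:
    """Write size_bytes of zeroes to path (hypothetical mkfile-like helper).

    The zeroes are really written, so every block is allocated on disk;
    a sparse file (seek past the end and write one byte) is unsafe for swap.
    """
    zeros = bytes(chunk)
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # push the data out to the disk
    os.chmod(path, 0o600)     # swap contents should not be readable by users
```

      For example, make_zero_file("/export/swapfile", 64 << 20) would
      create a 64 MB file; the path is only an illustration.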

   4. adding swap (NT)

      - System control panel, choose Performance tab

      - click on Change in Virtual Memory section


C. Disk performance

   1. want to optimize 3 factors

      - per-process disk throughput

      - aggregate disk throughput

      - disk storage efficiency

   2. per-process disk throughput

        > the speed at which a single process can read or
          write to a disk

        > simple to measure - "cp" a large file

        > maximum throughput: 2M/sec

        > local disk to disk: 32M / 35 sec = 914K/sec

        > this is good, since the "cp" process only got about
          half of the CPU time

        > network copy: 125K/sec
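      The "cp" measurement can be scripted: time a copy of a large file
      and divide the size by the elapsed time. A minimal sketch in Python,
      assuming shutil.copyfile is an acceptable stand-in for cp. Note that
      the 914K/sec figure above treats 32M as 32,000K.

```python
import os
import shutil
import time


def copy_throughput_kps(src: str, dst: str) -> float:
    """Copy src to dst and return the observed rate in Kbytes/sec (1K = 1024 bytes).

    Like timing "cp", this measures a single process; a file that is already
    in the buffer cache will make the disk look faster than it really is.
    """
    size = os.path.getsize(src)
    start = time.perf_counter()
    shutil.copyfile(src, dst)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid dividing by zero
    return size / 1024 / elapsed


# The lecture's own figure, with 32M taken as 32,000K:
print(f"{32 * 1000 / 35:.0f}K/sec")  # prints 914K/sec
```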

   3. aggregate disk throughput

        > difficult to measure
 
        > depends on job mix
   
   4. disk storage efficiency

        > minimize wasted space

        > performance and space optimization are usually
          in an inverse relationship

        > decreasing the block size improves space utilization
          but decreases performance
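      The space side of this trade-off is internal fragmentation: the
      unused tail of each file's last block. A sketch in Python under a
      simplified model where every file occupies whole blocks; the file
      sizes are hypothetical.

```python
def wasted_kbytes(file_sizes: list[int], block_size: int) -> float:
    """Space lost to partially filled last blocks, in K.

    Simplified model: each file occupies whole blocks.  Real file systems
    such as the BSD Fast File System soften this with fragments.
    """
    wasted = 0
    for size in file_sizes:
        blocks = -(-size // block_size)  # ceiling division
        wasted += blocks * block_size - size
    return wasted / 1024


sizes = [100, 3000, 9000, 20000]  # hypothetical file sizes in bytes
for bs in (1024, 4096, 8192):
    print(f"{bs:5d}-byte blocks: {wasted_kbytes(sizes, bs):6.1f}K wasted")
```

      Larger blocks waste more space on small files but mean fewer, larger
      transfers per file, which is where the performance gain comes from.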

   5. best recommendation

        > spread disk load around

        > most workstations these days have /, swap, and /usr
          on a local disk and everything else on NFS mounted
          partitions

        > which drives (or NFS servers) are the most loaded?

        > place the handful of most common binaries on local
          /usr partition
        
   6. iostat

    - example: iostat -d 5

              sd0           sd1           sd3 
     bps tps msps  bps tps msps  bps tps msps  
       4   1  0.0   17   2  0.0    1   0  0.0 
      79  11  0.0   23   3  0.0    2   0  0.0 
      59   8  0.0   72  10  0.0    2   0  0.0 
       0   0  0.0   10   2  0.0    0   0  0.0 
      12   2  0.0    0   0  0.0   87  11  0.0 
       0   0  0.0    0   0  0.0  477  60  0.0 
       2   0  0.0   11   2  0.0  452  57  0.0 
      13   2  0.0   11   1  0.0  464  58  0.0 
      13   2  0.0   60   9  0.0  479  61  0.0 
       0   0  0.0    3   0  0.0  458  57  0.0 
       0   0  0.0    3   0  0.0  521  65  0.0 
       0   0  0.0   20   3  0.0  465  58  0.0 
       7   1  0.0   12   2  0.0  501  63  0.0 
       4   1  0.0   23   4  0.0  471  59  0.0 
       0   0  0.0    2   0  0.0  511  64  0.0 
       0   0  0.0   19   3  0.0  137  17  0.0 
       0   0  0.0   16   2  0.0    0   0  0.0 
       0   0  0.0   14   2  0.0    0   0  0.0 
      18   3  0.0   31   5  0.0    4   1  0.0 


      > bps - average Kbytes/sec during the last interval

      > msps - msec / seek; unreliable because it is controller
               specific, so ignore it

      > tps - average number of transfers/sec during the last
              interval (in the sample above, bps / tps is about 8,
              i.e., roughly 8K per transfer)
    
    - take averages over a long period of use and study the peaks

    - move files among disks and servers to equalize the load
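      A sketch that turns iostat -d samples into per-disk averages and
      peaks, and checks the transfer size as bps / tps. It uses four rows
      copied from the sample above and assumes the column layout shown
      there (bps, tps, msps for each disk). The function name is
      hypothetical.

```python
# Four rows copied from the iostat -d 5 sample above (sd0, sd1, sd3).
SAMPLE = """\
   4   1  0.0   17   2  0.0    1   0  0.0
  79  11  0.0   23   3  0.0    2   0  0.0
   0   0  0.0    0   0  0.0  477  60  0.0
   2   0  0.0   11   2  0.0  452  57  0.0
"""


def disk_stats(text: str, disks: list[str]) -> dict[str, dict[str, float]]:
    """Mean and peak bps per disk, plus average K per transfer.

    Assumes three columns per disk (bps, tps, msps); msps is ignored.
    K per transfer is sum(bps) / sum(tps), valid for equal-length intervals.
    """
    rows = [[float(x) for x in line.split()] for line in text.splitlines() if line.strip()]
    stats = {}
    for i, disk in enumerate(disks):
        bps = [row[3 * i] for row in rows]
        tps = [row[3 * i + 1] for row in rows]
        stats[disk] = {
            "mean_bps": sum(bps) / len(bps),
            "peak_bps": max(bps),
            "k_per_transfer": sum(bps) / sum(tps) if sum(tps) else 0.0,
        }
    return stats


for disk, s in disk_stats(SAMPLE, ["sd0", "sd1", "sd3"]).items():
    print(disk, {k: round(v, 1) for k, v in s.items()})
```

      Over a long run, a disk whose mean and peak both stand far above the
      others (sd3 here) is the candidate for moving files elsewhere.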


   7. large vs. small partition sizes

    - example: a 9 GB disk

    - arguments for LARGE partitions:

      > fewer mounts and unmounts

      > fewer file systems to manage (fewer quota files, etc.)

      > less wasted space

    - arguments for SMALL partitions

      > easier to back up if the partition is smaller than the
        backup media (2.0 GB, for example)

      > file system overflow affects fewer users

      > file system corruption affects fewer users


   8. df: display file system statistics

      Example (use -k on Solaris):
      % df -k
      Filesystem            kbytes    used   avail capacity  Mounted on
      /dev/dsk/c0t3d0s0     206567   57710  128207    32%    /
      /dev/dsk/c0t3d0s6     529033  175197  300936    37%    /usr
      /proc                      0       0       0     0%    /proc
      /dev/dsk/c0t3d0s4      95994   22734   63670    27%    /var
      /dev/dsk/c0t2d0s0     978848   55892  825076     7%    /real/cs13
      /dev/dsk/c0t2d0s1     979893  526294  355619    60%    /real/cs14
      ...
      xi:/real/cs25         433632  300592   89680    78%    /home/cs25
      nu:/real/par1/mail    394287  116632  238227    33%    /var/spool/mail
      sed:/real/cs37       1668468 1414489   87133    95%    /home/cs37
      ...
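      The capacity column is not used / kbytes. It is computed against the
      space ordinary users can reach (used + avail), and kbytes also
      includes a reserve (minfree, about 10% in this sample) that only
      root can fill. A small sketch in Python; rounding up is an
      assumption, though it reproduces every percentage above (e.g., for
      /, 57710 / 185917 = 31.04%, shown as 32%).

```python
def df_capacity(used_k: int, avail_k: int) -> int:
    """Capacity %, rounded up, computed as used / (used + avail)."""
    total = used_k + avail_k
    if total == 0:            # e.g. /proc
        return 0
    return -(-100 * used_k // total)  # integer ceiling


def df_reserved(kbytes: int, used_k: int, avail_k: int) -> int:
    """Kbytes held back from ordinary users (the minfree reserve)."""
    return kbytes - used_k - avail_k


# / from the sample: 206567 kbytes, 57710 used, 128207 avail
print(df_capacity(57710, 128207), "% full,",
      df_reserved(206567, 57710, 128207), "K reserved")
```

      This is also why a file system can show more than 100% capacity:
      root can keep writing into the reserve.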


   9. du: report on disk usage

       Example (use -k on Solaris):
       % du -k public_html
       2       public_html/.AppleDouble
       30      public_html/lecture
       93      public_html

       Example (summary info):
       % du -ks public_html
       93      public_html
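       For comparison, a rough Python equivalent of du -ks (a Unix-only
       sketch; st_blocks is assumed to be in 512-byte units, as on
       Solaris, BSD and Linux). It also totals the apparent size
       (st_size) to show why du and ls -l can disagree: du reports
       allocated blocks. The function name is hypothetical.

```python
import os


def du_k(top: str) -> tuple[int, int]:
    """Rough equivalent of `du -ks top`: returns (allocated K, apparent bytes).

    Allocated space is st_blocks * 512.  Each inode is counted once, so
    hard links are not double-counted, and symbolic links are not followed.
    """
    seen = set()
    blocks = 0
    apparent = 0
    for dirpath, _dirnames, filenames in os.walk(top):
        paths = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in paths:
            st = os.lstat(path)
            if (st.st_dev, st.st_ino) in seen:
                continue
            seen.add((st.st_dev, st.st_ino))
            blocks += st.st_blocks
            apparent += st.st_size
    return blocks * 512 // 1024, apparent
```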

   10. disk usage in NT

       - right-click on a drive or directory and choose Properties
       - run "Disk Administrator"


D. Network performance (and integrity)

  1. netstat -i 

            input           output        
        packets errs    packets  errs     colls 
        ---------------------------------------
        37135998 8      27664258  72    1753344 

   - a large number of input errors means that there may be
     some faulty hardware on the local net

     > should be under 0.025% of input packets (above is 0.00002%)

   - a large number of output errors may indicate that the
     local machine's transceiver, ethernet controller, or
     AUI cable is faulty

     > should be under 0.025% (above is 0.0003%)

   - collisions are normal on an Ethernet and may be as high as 10%
     of output packets

     > if consistently higher than 10% then consider bridging
       or subnetting

     > above is 6.3%
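     The percentages quoted above come straight from the netstat -i
     counters. Collisions are taken relative to output packets, since a
     host only sees collisions on frames it transmits. A minimal check in
     Python using the sample numbers:

```python
def pct(part: int, whole: int) -> float:
    """part as a percentage of whole."""
    return 100.0 * part / whole


in_pkts, in_errs = 37135998, 8
out_pkts, out_errs, colls = 27664258, 72, 1753344

print(f"input errors:  {pct(in_errs, in_pkts):.5f}%  (limit 0.025%)")
print(f"output errors: {pct(out_errs, out_pkts):.5f}%  (limit 0.025%)")
print(f"collisions:    {pct(colls, out_pkts):.1f}% of output packets (limit ~10%)")
```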

  2. nfsstat -c (revisited)

   - can also be used to test network corruption of packets

   Example from nu:

   Client rpc:

   calls    badcalls retrans  badxid   timeout  wait     newcred  timers
   1966207  21774    0        549      21756    0        0        8344   

   - the retrans field is the number of packets this host had to
     retransmit as an RPC client

   - the badxid field counts replies the client received for which
     there was no outstanding call

   - if retrans is over 5% of total calls then suspect trouble

     > in the above example retrans is 0; badcalls and timeout are
       each about 1.1% of calls (21774 and 21756 out of 1966207)

   - what kind of trouble?

     > if badxid and retrans are roughly equal then some
       NFS server is having trouble keeping up with NFS load

     > if retrans is high and badxid is low then network itself
       is the problem (high load or data corruption)

   - nfsstat lets you zero the counters at any time with:

                    nfsstat -z
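   The decision rule above can be written down directly. A sketch in
   Python; the function name and the factor-of-two meaning of "roughly
   equal" are assumptions for illustration, not part of nfsstat.

```python
def diagnose_nfs_client(calls: int, retrans: int, badxid: int,
                        threshold: float = 0.05, ratio: float = 2.0) -> str:
    """Apply the retrans / badxid rule of thumb to nfsstat -c counters.

    `ratio` is a hypothetical cutoff: badxid within a factor of `ratio`
    of retrans counts as "roughly equal".
    """
    if calls == 0 or retrans / calls <= threshold:
        return "ok"
    if badxid * ratio >= retrans and retrans * ratio >= badxid:
        return "server overloaded"   # replies arrive, just late
    if badxid * ratio < retrans:
        return "network problem"     # requests or replies are being lost
    return "inconclusive"


# Counters from nu: calls 1966207, retrans 0, badxid 549
print(diagnose_nfs_client(1966207, 0, 549))
```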


  3. tracking network problems

    - the right way: buy a LAN analyzer

      > time reflections with a Time Domain Reflectometer (TDR),
        which tells you how far along the wire a reflection
        occurs (to within inches)

    - the wrong way

      > methodically dissect the network
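      A TDR sends a pulse down the cable and times the echo from an
      impedance change (a break, a short, a bad tap). Converting the echo
      time to a distance is simple arithmetic. A sketch in Python,
      assuming a velocity factor of 0.66, typical for RG-58 thin coax;
      the real value comes from the cable's data sheet.

```python
SPEED_OF_LIGHT = 299_792_458  # m/s in a vacuum


def tdr_distance_m(round_trip_s: float, velocity_factor: float) -> float:
    """Distance to a reflection; the pulse travels out and back, so halve it."""
    return velocity_factor * SPEED_OF_LIGHT * round_trip_s / 2


# An echo 100 ns after the pulse, on cable with an assumed velocity factor of 0.66
print(f"{tdr_distance_m(100e-9, 0.66):.2f} m")
```

      With these assumptions a 100 ns echo puts the fault about 9.9 m
      down the cable.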

  4. other ways to assess network load

     - netstat shows active connections ("netstat -a")

     - use "spray" / "tcpspray" to load the network and find weak spots

       Example: 
       % spray -c 100 xi
       sending 100 packets of length 86 to xi ...
           no packets dropped by xi
           7736 packets/sec, 665325 bytes/sec

     - statnet (Linux)
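     To judge a spray result, convert bytes/sec to bits/sec and compare
     with the link speed. A sketch in Python, assuming a 10 Mbit/s
     Ethernet and counting only the spray payload (Ethernet headers,
     preamble and inter-frame gaps are ignored, so the real wire
     utilization is somewhat higher). 665325 bytes/sec is about
     5.3 Mbit/s, roughly half of such a link.

```python
def spray_utilization(bytes_per_sec: int, link_bits_per_sec: int = 10_000_000) -> float:
    """Fraction of link bandwidth used by the spray payload (headers not counted)."""
    return bytes_per_sec * 8 / link_bits_per_sec


rate = 665325  # bytes/sec reported by spray to xi
print(f"{rate * 8 / 1e6:.2f} Mbit/s, "
      f"{100 * spray_utilization(rate):.0f}% of 10 Mbit/s Ethernet")
```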


E. The Big Picture/Putting It All Together

   1. xsystats: displays system statistics in graphical format

      - can display multiple graphs in one window (even multiple machines)

      - the bottom of each graph shows graph type and scale

      - graphs will be rescaled on the fly

      - graph types:
        collisions: number of incoming ethernet collisions since last update
        context: number of context switches per second
        cpu: percentage of cpu time being used
        disk: number of disk transfers per second
        errors: number of incoming ethernet errors since last update
        interrupts: average number of device interrupts, per second
        load1, load5, load15: load average for 1, 5 and 15 minutes
        packets: number of incoming ethernet packets per second
        page: page ins since last update
        swap: swap ins since last update 

      - nice script that combines graphs with pretty colors: showstats
        Syntax: showstats [-s[how]] [-w[idth] n] [ host1 [host2 ...] ]
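      The load1, load5 and load15 graphs plot the usual Unix load
      averages: exponentially damped averages of the number of runnable
      processes (Linux also counts processes in uninterruptible sleep). A
      sketch of the calculation in Python, assuming a 5-second sampling
      interval as in 4.4BSD and Linux (the kernels use fixed-point
      arithmetic rather than floats). The function names are
      hypothetical.

```python
import math


def update_load(load: float, runnable: int, interval_s: float, period_s: float) -> float:
    """One step of an exponentially damped moving average of the run queue."""
    decay = math.exp(-interval_s / period_s)
    return load * decay + runnable * (1 - decay)


def simulate(runnable_per_tick: list[int], interval_s: float = 5.0) -> tuple[float, ...]:
    """Return (load1, load5, load15) after feeding one run-queue length per tick."""
    loads = [0.0, 0.0, 0.0]
    periods = (60.0, 300.0, 900.0)
    for n in runnable_per_tick:
        loads = [update_load(l, n, interval_s, p) for l, p in zip(loads, periods)]
    return tuple(loads)


# One CPU-bound process for one minute (12 ticks of 5 seconds)
print([round(x, 2) for x in simulate([1] * 12)])
```

      After one minute of a single busy process, load1 is only about
      0.63, not 1.0, which is why the graphs lag behind sudden changes.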


   2. monitor (AIX)

      - a large amount of info on one screen (uptime, cpu, I/O, NFS, etc.)


   3. WinNT Performance Monitor

      - track variations in the use of system resources

      - start by clicking + icon in toolbar

      - select objects, instances and counters to monitor

      - Commonly used counters:

        Object        Counter                  Description
        -----------   ----------------------   -------------------------------------
        Processor     %Processor Time          Time executing non-idle thread
        Process       %Processor Time          Time allocated to a specific process
        Memory        Pages/sec                High value indicates excessive paging
        Cache         Data Map Hits%           Low value suggests memory shortage
        LogicalDisk   Avg. Disk Queue Length   Activity on logical disk

      - Note: in order to monitor disk activity, at the command prompt type
        diskperf -y
        and reboot

      - choose "Explain" for explanation of a counter

      - legend of monitored counters appears at the bottom

      - the chart is not the only view; there are also report and log views

      - can save individual settings or entire workspace

      - can monitor remote computers