COMPUTER AND NETWORK SYSTEM ADMINISTRATION
Summer 1998 - Lesson 11

Performance Analysis

A. Introduction

   1. When performance is bad, user complaints come in the form of:
      "Why is the system so sloooooow?" or "My job is taking forever to
      run!" Users will report slow keyboard response or long compilation
      times. Ideally, you as the administrator notice these problems
      before the bombardment of user complaints begins.

   2. Where to start? What to monitor? Performance is affected by the
      efficiency of the four main resources that a system offers:
      - CPU
      - Memory
      - Disk
      - Network

   3. These are all related.
      - NFS traffic depends on network bandwidth as well as disk bandwidth
      - disk bandwidth depends on memory if disk caching is in place

   4. What is good performance?
      - the system administrator must distinguish between poor
        performance caused by system malfunctioning and that caused by
        heavy usage
      - times of heavy usage are good times to analyze the system and
        see where the bottlenecks are
      - this will help you determine where to put scarce funds
      - long term analysis

B. CPU monitoring

   1. time: time a command
      - several system commands will time a job
      - /usr/bin/time, /usr/5bin/time (Solaris), shell's built-in "time"

      Example (/usr/bin/time):

         % /usr/bin/time find / -name csh.1 -print
         /usr/share/man.xi.orig/man1/csh.1
         real        3.2
         user        0.4
         sys         1.9

         real: wall clock time
         user: user CPU time
         sys:  system CPU time

      Example (csh built-in time):

         % time find / -name csh.1 -print
         /usr/share/man.xi.orig/man1/csh.1
         0.39u 1.64s 0:02.56 79.2%

         0.39u:   user CPU time
         1.64s:   system CPU time
         0:02.56: wall clock time
         79.2%:   percentage of wall clock time spent on CPU ((u+s)/w)

   3. uptime: report current time, amount of time system has been up,
      number of users, load average

      Example:

         % uptime
         3:03pm up 1 day(s), 1:20, 14 users, load average: 0.20, 0.09, 0.08

      - load average is a rough measure of CPU use
      - reports the average number of active processes over the last
        1 minute, 5 minutes, and 15 minutes

   4.
      rup: show host status of remote machines

      Example:

         % rup xi upsilon sed nu linuxfs1 linuxfs2
               xi    up  1 day,   1:31,  load average: 0.13, 0.26, 0.19
          upsilon    up 63 days, 23:31,  load average: 0.00, 0.02, 0.02
              sed    up  1 day,  21:47,  load average: 0.00, 0.00, 0.00
               nu    up 37 days, 21:39,  load average: 0.11, 0.09, 0.00
         linuxfs1    up  1 day,  23:15,  load average: 0.00, 0.06, 0.09
         linuxfs2    up 14 days, 17:35,  load average: 0.03, 0.01, 0.00

   5. ps: report process status
      - has many options
      - read man page for specifics

      Example (Solaris):

         % ps -ef
              UID   PID  PPID  C    STIME TTY      TIME CMD
             root     0     0  0   Jul 03 ?        0:00 sched
             root     1     0  0   Jul 03 ?        0:04 /etc/init -r
             root     2     0  0   Jul 03 ?        0:00 pageout
             root     3     0  0   Jul 03 ?        3:36 fsflush
             root   449     1  0   Jul 03 ?        0:00 /usr/lib/saf/sac -t 300
             root   224     1  0   Jul 03 ?        1:09 /usr/lib/autofs/automountd
             root   136     1  0   Jul 03 ?        0:27 /usr/sbin/rpcbind
            healy  7763  7712  1 14:18:43 pts/7    0:12 emacs signal.c
            koshy  3279  3276  0   Jul 03 pts/13   0:00 -reg-csh
             root 17242   243  0 19:20:12 ?        0:00 /usr/samba/bin/smbd -D

         UID:   login name
         PID:   process ID
         PPID:  process ID of the parent process
         C:     current scheduler value
         STIME: start time
         TTY:   associated terminal
         TIME:  accumulated CPU time
         CMD:   command

      - ps is often used in pipes

      Example:

         % ps -ef | grep httpd
           nobody  9538   299  0 15:29:23 ?        0:00 /usr/local/etc/httpd/httpd
           nobody  9302   299  0 15:22:59 ?        0:00 /usr/local/etc/httpd/httpd
           nobody  9557   299  0 15:31:29 ?        0:00 /usr/local/etc/httpd/httpd
           nobody  9540   299  0 15:29:24 ?        0:00 /usr/local/etc/httpd/httpd
           nobody  9112   299  0 15:17:34 ?        0:00 /usr/local/etc/httpd/httpd
           nobody  9304   299  0 15:23:22 ?        0:00 /usr/local/etc/httpd/httpd

   6.
      top: display and update information about the top cpu processes
      - excellent tool for overall view of system
      - combines output of several commands (uptime, ps, vmstat)

      Example:

         % top
         last pid: 9649; load averages: 0.03, 0.05, 0.12          15:36:19
         113 processes: 112 sleeping, 1 on cpu
         CPU states: 97.6% idle, 1.0% user, 1.4% kernel, 0.0% iowait, 0.0% swap
         Memory: 152M real, 48M free, 54M swap, 721M free swap

           PID USERNAME PRI NICE  SIZE   RES STATE   TIME   WCPU    CPU COMMAND
          9649 barnash   31    0 1864K 1472K cpu     0:00  1.26%  0.99% top
          5356 sheff     33    0 2064K 1384K sleep   8:19  0.33%  0.39% xsysstats
          9114 nobody    33    0 1968K 1472K sleep   0:00  0.11%  0.16% httpd
          9585 nobody    35    0 1968K 1448K sleep   0:00  0.11%  0.12% httpd
          9304 nobody    33    0 1984K 1480K sleep   0:00  0.01%  0.10% httpd

         PID:      process ID
         USERNAME: name of the process's owner
         PRI:      current priority of the process
         NICE:     nice amount (in the range -20 to 20)
         SIZE:     total size of the process (text, data and stack;
                   kilobytes)
         RES:      current amount of resident memory (kilobytes)
         STATE:    current state (sleep, wait, run, idl, zomb, stop)
         TIME:     number of system and user cpu seconds the process
                   has used
         WCPU:     weighted percentage of cpu time
         CPU:      raw percentage of cpu time
         COMMAND:  name of the command

      - from within top, you can control the behavior of processes with
        renice and kill

        renice: change nice number (requested execution priority)
        - Syntax: r new-nice-number pid
        - nice range is either -20 to 20 or 0 to 39
        - the lower the nice number, the higher the priority
        - only the superuser can lower a nice number

        kill: terminate process
        - Syntax: k [-signal] pid

   7.
      Task Manager - Windows NT
      - press ctrl-alt-delete and choose Task Manager, or right click on
        the taskbar and choose Task Manager
      - Applications: shows which applications are active
        > status should be "Running"
        > if status is "Not Responding" you can use End Task to kill it
        > double click on an application or click "Switch To" to bring
          it to the front
        > click "New Task" to start a new application
        > right click on an application to bring up a menu of options
      - Processes: shows which processes are active (similar to top)
        > every application has one or more processes, but not every
          process belongs to an application
        > can end a process by choosing "End Process"; be sure you know
          what the process is before ending it
        > right click on a process to set its priority
        > can reorder the listing by clicking on column headings
      - Performance: CPU and memory utilization
        > graphical representation of CPU utilization
        > minimize Task Manager for a CPU utilization graphic on the
          taskbar

   8. QuickSlice - WinNT Server Resource Kit
      - nice graphical tool for analyzing cpu utilization

B. Memory performance analysis

   1. buying more memory is generally the cheapest way to improve
      performance

   2. generally, active processes require more physical memory than is
      available
      - paging: moving sections of a process's memory to disk
      - page fault: occurs when a process needs a page of memory that is
        not resident and must be read in from disk
      - swapping: writing an entire process to disk, freeing all of its
        memory

   3.
      swapper (BSD) / sched (Solaris)
      - the swapper swaps out processes that have been idle for more
        than 20 seconds (preventative swapping - normal housekeeping)
      - if the pagedaemon cannot keep free memory at lotsfree, and the
        number of Kbytes of free memory falls below minfree, then the
        swapper kicks in (desperation swapping)
      - the swapper chooses a process to swap out based on 2 criteria:
        > longest sleep time
        > if none are sleeping, then resident memory size (the swapper
          chooses the largest 4 processes, then picks the one that has
          been resident longest)
      - when a process is swapped out, everything goes - even the user
        structure and the page tables
      - swapping is much more expensive than paging, so a highly loaded
        system that swaps frequently does not perform well

   4. When do we have problems?
      - preventative swapping is normal
      - ps -aux usually shows many swapped-out processes
        > STAT column: W as second letter means swapped out
        > Linux top also has a STAT column
      - paging is also part of normal operations
        > a new process must have new pages brought into memory
        > a process must also page in when it references a section of
          memory that has not been used recently
      - page faults always cause a performance degradation
      - usually, the pagedaemon quickly fixes the problem by getting rid
        of unneeded pages and loading the needed ones
      - when the pagedaemon fails, desperation swapping begins
      - what types of processes are likely to be swapped out by
        desperation swapping?
        > ans: ones that sleep: editors, shells, generally interactive
          processes
        > keyboard response time goes to pot, since a keystroke then
          requires a disk access (and the disk is probably heavily
          loaded at this time)

   5. how to diagnose

      1. tools
         - BSD:     vmstat
         - S5:      sar
         - Solaris: mpstat
         - WinNT:   Task Manager

      2. these tools report:
         - page-ins
         - page-outs
         - swap-ins
         - swap-outs

      3.
         page-ins
         - most UNIX systems use 'demand paging'
         - when a process is started, only the memory maps for the
           process are loaded in physical memory
         - the first access to each page causes a page fault, and the
           page is brought in 'on demand'
         - the alternative is 'pre-paging'
         - thus page-ins are normal

      4. swap-ins
         - a new process acts like a swap-in
         - not a very useful indicator

      5. page-outs
         - this is a first indicator that your memory is inadequate
         - some page-out activity is normal
         - does the frequency of page-outs dramatically increase
           whenever system performance is sluggish?
         - acceptable rate is O/S and hardware dependent
         - in order to know, you need to establish baselines of activity

      6. swap-outs
         - a heavy amount of swap-outs signifies a problem

      7. Example (BSD):

         % vmstat -S
         procs  memory           page              disk       faults      cpu
         r b w avm  fre si so  pi po  fr de  sr d0 d1 d2 d3  in  sy  cs us sy id
         0 0 0   0 3028  4  1   1  2   1  0   0  2  2  0  0   0  82 177 89 33  9

         - procs
           Number of processes:
           r - runnable (not waiting for I/O or sleeping)
           b - blocked for resources (i/o, paging, etc.)
           w - runnable or short sleeper (< 20 secs) but swapped
           - any number but 0 in the w column indicates what?
             > ans: desperation swapping

         - memory
           avm - number of active virtual Kbytes (used in last 20 secs)
           fre - size of the free list in Kbytes
             > when this gets close to lotsfree, page-outs begin

         - page
           Report information about swapping, page faults, and paging
           activity. Reported in units per second (averaged over last
           5 seconds).
           si - procs swap-ins
           so - procs swap-outs (not due to idle)
           pi - kilobytes per second paged in
           po - kilobytes per second paged out
           fr - kilobytes freed per second
           de - anticipated short term memory shortfall in Kbytes
           sr - pages scanned by clock algorithm, per second

         - disk
           Report number of disk operations per second.
         - faults
           Report trap/interrupt rate averages per second over last
           5 seconds.
           in - (non clock) device interrupts per second
           sy - system calls per second
           cs - CPU context switch rate (switches/sec)

         - cpu
           Give a breakdown of percentage usage of CPU time.
           us - user time for normal and low priority processes
           sy - system time
           id - CPU idle

         - we are most concerned with swap-outs and page-outs

         procs  memory           page              disk       faults      cpu
         r b w avm  fre si so  pi po  fr de  sr d0 d1 d2 d3  in  sy  cs us sy id
         0 0 0   0 2508 20  0   0  0   0  0   0 13  0  0  0 226 216 350  7  6 87
         0 0 0   0 2280  0  0  16  0   0  0   0  3  0  0  0 258 361 343  5  8 87
         0 0 0   0 2104 21  0 124 56 184  0 111  5  0  0  0 545 667 563 14 16 70
         0 0 0   0 2120  0  0  36 12  60  0  37  0  0  0  0 338 387 345  3  5 92
         0 0 0   0 2076  0  0  12  0  28  0  23  1  0  0  0 263 271 370  3  4 92
         0 1 0   0 2048  5  0   0  0  44 16  33  1  0  0  0 320 473 497  6  9 85
         8 1 0   0 2116 10  0   0  0 100  0  56 23  0  0  0 514 377 898 14 14 72
         0 0 0   0 2084  5  0  24 16 148  0  67  6  0  0  0 350 424 529  9 10 81

      8. Example (Solaris):

         % sar -g 5

         SunOS xi 5.5.1 Generic_103640-03 sun4u    07/05/97

         20:15:43  pgout/s ppgout/s pgfree/s pgscan/s %ufs_ipf
         20:15:48     0.00     0.00     0.00     0.00     0.00

         - pgout/s:  number of page-out operations per second
         - ppgout/s: number of pages paged out per second
         - pgfree/s: number of pages reclaimed per second
         - pgscan/s: number of pages scanned per second in order to find
                     candidates to reclaim
         - %ufs_ipf: percentage of inodes removed from the free list

      9. Example: Task Manager
         - Choose the "Processes" tab
         - From the "View" menu, choose "Select columns..."
         - Can choose from a variety of columns including Page Faults,
           Virtual Memory Size, etc.
         - The "Performance" tab has a graphical depiction of memory
           usage and other statistics
           > how many handles, threads and processes exist
           > total physical memory, how much is free and how much is
             used for cache
           > commit charge shows how much memory is allocated to
             application and system programs; also shows the memory
             limit and peak
           > memory used by the kernel, how much is paged and nonpaged

      10.
         wmem
         - freeware utility for NT: shows RAM and paging information
           ftp://ftp.winsite.com/pub/pc/winnt/dskutil/wmem.zip
           ftp://mirrors.aol.com/pub/cica/pc/winnt/dskutil/wmem.zip
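
      Sketch: screening vmstat output for memory pressure

      The diagnosis items above say that swap-outs and page-outs are the
      numbers to watch, and that a nonzero w column points to desperation
      swapping. The following is a minimal Python sketch of that check,
      not a definitive tool. It assumes the 22-column BSD "vmstat -S"
      layout shown in these notes; other systems order their columns
      differently. The function name flag_memory_pressure and the
      po_baseline threshold of 20 KB/s are hypothetical; a real threshold
      has to come from your own baseline measurements.

```python
# Column order of the BSD "vmstat -S" sample in these notes (an assumption;
# other vmstat versions differ). "sy" appears twice (system calls, then
# system CPU time), so columns are addressed by position, not by name.
COLUMNS = ("r b w avm fre si so pi po fr de sr "
           "d0 d1 d2 d3 in sy cs us sy id").split()
W = COLUMNS.index("w")    # runnable but swapped-out processes
SO = COLUMNS.index("so")  # swap-outs
PO = COLUMNS.index("po")  # kilobytes per second paged out


def flag_memory_pressure(lines, po_baseline=20):
    """Return a list of (row_number, reasons) for suspicious data rows.

    po_baseline is a hypothetical, site-specific page-out threshold in
    KB/s; it is not a recommended value.
    """
    flagged = []
    row_number = 0
    for line in lines:
        fields = line.split()
        # Skip blank lines and the two header lines: only rows with the
        # expected number of purely numeric fields count as data.
        if len(fields) != len(COLUMNS):
            continue
        if not all(field.isdigit() for field in fields):
            continue
        row_number += 1
        values = [int(field) for field in fields]
        reasons = []
        if values[W] > 0:
            reasons.append("swapped-out runnable processes (w=%d)" % values[W])
        if values[SO] > 0:
            reasons.append("swap-outs (so=%d)" % values[SO])
        if values[PO] > po_baseline:
            reasons.append("page-outs above baseline (po=%d)" % values[PO])
        if reasons:
            flagged.append((row_number, reasons))
    return flagged


# Three data rows copied from the vmstat example in these notes.
SAMPLE = """\
procs memory page disk faults cpu
r b w avm fre si so pi po fr de sr d0 d1 d2 d3 in sy cs us sy id
0 0 0 0 2280 0 0 16 0 0 0 0 3 0 0 0 258 361 343 5 8 87
0 0 0 0 2104 21 0 124 56 184 0 111 5 0 0 0 545 667 563 14 16 70
0 0 0 0 2120 0 0 36 12 60 0 37 0 0 0 0 338 387 345 3 5 92
"""

if __name__ == "__main__":
    for row, reasons in flag_memory_pressure(SAMPLE.splitlines()):
        print("row %d: %s" % (row, "; ".join(reasons)))
```

      With the three sample rows and the hypothetical baseline of 20,
      only the row with po=56 is reported. In practice the input would
      come from saved vmstat output collected over time, so that busy
      periods can be compared against the baseline.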