Resolving intermittent performance problems in computer systems is made easier by pinpointing when a change occurs in the system's performance-determining factors (e.g., workload composition, configuration). Since we often lack direct measurements of performance factors, this paper presents a procedure for indirectly detecting such changes by analyzing performance characteristics (e.g., response times, queue lengths). Our procedure employs a widely used clustering algorithm to identify candidate change points (the times at which performance factors change), and a newly developed statistical test (based on an AR(1) time series model) to determine the significance of candidate change points. We evaluate our procedure by using simulations of M/M/1, FCFS queueing systems and by applying it to measurements of a mainframe computer system at a large telephone company. These evaluations suggest that our procedure is effective in practice, especially for larger sample sizes and smaller utilizations. We further conclude that indirectly detecting changes in performance factors appears to be inherently difficult, in that the sensitivity of a detection procedure depends on the magnitude of the change in performance characteristics, which often has a nonlinear relationship with the change in performance factors. Thus, a change in performance factors (e.g., increased service times) may be more readily detected in some situations (e.g., very low or very high utilizations) than in others (e.g., moderate utilizations). A key insight here is that the sensitivity of the detection procedure can be improved by choosing appropriate measures of performance characteristics. For example, our experience and analysis suggest that queue lengths can be more sensitive than response times to changes in arrival rates.
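The last point can be illustrated with the textbook steady-state formulas for an M/M/1 queue. The sketch below is not the paper's procedure or analysis. It only compares how the mean response time, 1/(mu - lambda), and the mean number in system, rho/(1 - rho), respond to the same increase in arrival rate. The function names, the use of the mean number in system as a stand-in for queue length, and the example rates are all assumptions made for illustration.

```python
def mm1_means(arrival_rate, service_rate):
    """Steady-state M/M/1 means: (response time, number in system)."""
    if not 0 < arrival_rate < service_rate:
        raise ValueError("need 0 < arrival_rate < service_rate for stability")
    rho = arrival_rate / service_rate              # utilization
    mean_response = 1.0 / (service_rate - arrival_rate)
    mean_in_system = rho / (1.0 - rho)             # equals arrival_rate * mean_response
    return mean_response, mean_in_system


def relative_changes(rate_before, rate_after, service_rate):
    """Relative change in each mean when the arrival rate shifts."""
    r0, n0 = mm1_means(rate_before, service_rate)
    r1, n1 = mm1_means(rate_after, service_rate)
    return (r1 - r0) / r0, (n1 - n0) / n0


if __name__ == "__main__":
    # Hypothetical scenario: service rate 1.0, arrival rate rises by 10%.
    for rate in (0.2, 0.5, 0.8):
        d_resp, d_num = relative_changes(rate, 1.1 * rate, 1.0)
        print(f"utilization {rate:.1f}: response time {d_resp:+.1%}, "
              f"number in system {d_num:+.1%}")
```

Because the mean number in system is the arrival rate times the mean response time (Little's law), an increase in arrival rate always moves it by a larger relative amount than the response time under these assumptions. The gap between utilizations also shows the nonlinear relationship between performance factors and performance characteristics. Whether a shift is statistically detectable also depends on the variability and autocorrelation of the measurements, which this sketch ignores.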
|Journal||ACM SIGMETRICS Performance Evaluation Review|
|Publication status||Published - 2 Jun 1991|