Continuous Profiling: Where Have All the Cycles Gone?

References

AndL91
T. E. Anderson and E. D. Lazowska. Quartz: A tool for tuning parallel program performance. Proceedings of the ACM SIGMETRICS 1990 Conference on Measurement & Modeling of Computer Systems, 18(1):115--125, January 1991.
BalL94
T. Ball and J. Larus. Optimally profiling and tracing programs. ACM TOPLAS, 16(4):1319--1360, July 1994.
Bli92
D. Blickstein et al. The GEM optimizing compiler system. Digital Technical Journal, 4(4), 1992.
Car90
D. Carta. Two fast implementations of the "minimal standard" random number generator. CACM, 33(1):87--88, January 1990.
CohGLR97
R. Cohn, D. Goodwin, P. G. Lowney, and N. Rubin. Spike: An optimizer for Alpha/NT executables. In USENIX Windows NT Workshop, Seattle, August 1997.
CohL96
R. Cohn and P. G. Lowney. Hot cold optimization of large Windows/NT applications. In 29th Annual International Symposium on Microarchitecture (Micro-29), Paris, France, December 1996.
DCPI
DIGITAL Continuous Profiling Infrastructure project. http://www.research.digital.com/SRC/dcpi/.
DEC95a
Digital Equipment Corporation. Alpha 21164 Microprocessor Hardware Reference Manual. Maynard, MA, 1995. Order Number EC-QAEQB-TE.
DEC95b
Digital Equipment Corporation. DECchip 21064 and DECchip 21064A Alpha AXP Microprocessors Hardware Reference Manual. Maynard, MA, 1995. Order Number EC-Q9ZUA-TE.
GolH93
A. J. Goldberg and J. L. Hennessy. MTOOL: An integrated system for performance debugging shared memory multiprocessor applications. IEEE Trans. on Parallel and Distributed Systems, pages 28--40, January 1993.
GraKM82
S. Graham, P. Kessler, and M. McKusick. gprof: A call graph execution profiler. SIGPLAN Notices, 17(6):120--126, June 1982.
Hal96
M. Hall et al. Maximizing multiprocessor performance with the SUIF compiler. IEEE Computer, 29(12):84--89, December 1996.
Ipr
Iprobe. Digital internal tool.
JohPP94
R. Johnson, D. Pearson, and K. Pingali. The program structure tree: Computing control regions in linear time. In ACM PLDI, pages 171--185, 1994.
McC95
J. D. McCalpin. Memory bandwidth and machine balance in high performance computers. IEEE Technical Committee on Computer Architecture Newsletter, December 1995. http://www.cs.virginia.edu/stream.
McCKAK
J. McCormack, P. Karlton, S. Angebranndt, and C. Kent. x11perf. http://www.specbench.org/gpc/xpc.static/index.html.
MIPS90
MIPS Computer Systems. UMIPS-V Reference Manual (pixie and pixstats). Sunnyvale, CA, 1990.
prof
prof. Digital Unix man page.
ReiS94
J. F. Reiser and J. P. Skudlarek. Program profiling problems, and a solution via machine language rewriting. SIGPLAN Notices, 29(1):37--45, January 1994.
RosHWG95
M. Rosenblum, S. Herrod, E. Witchel, and A. Gupta. Complete computer simulation: The SimOS approach. IEEE Parallel and Distributed Technology, Fall 1995.
SitW95
R. Sites and R. Witek. Alpha AXP Architecture Reference Manual. Digital Press, Newton, MA, 1995.
SPEC95
The Standard Performance Evaluation Corporation. http://www.specbench.org/osg/spec95.
TPPC
Transaction Processing Performance Council. http://www.tpc.org/bench.descrip.html.
Vtune
Vtune: Intel's visual tuning environment. http://developer.intel.com/design/perftool/vtune.
Zag96
M. Zagha et al. Performance analysis using the MIPS R10000 performance counters. In Proceedings of Supercomputing, November 1996.
Zha97
X. Zhang et al. Operating system support for automated profiling & optimization. In Proceedings of the 16th ACM Symposium on Operating Systems Principles, St. Malo, France, October 1997.

This paper was published in the Proceedings of the 16th ACM Symposium on Operating Systems Principles, October 1997. Copyright 1997 by the Association for Computing Machinery. All rights reserved. Republished by permission.