dcpid - DIGITAL Continuous Profiling Infrastructure daemon.
dcpid [-event type[:period]] [-mux interval] [-bypid image] [-map mapfile] [-forkid seconds] [-unknown] [-epoch] [-merge seconds] [-flush seconds] [-hash bytes] [-chunk bytes] [-device device] [-log logfile] [-quiet] [-verbose] [-status seconds] [-logmaps] [-help] [-version] [-bug] [-nodaemon] [-socket socket] database
- -event type[:period]
- -event type[:period]+type[:period]+...+type[:period]
- -t is shorthand for -event
- Selects event types to monitor, and specifies the sampling period for each event type. This option can be repeated; each instance of -event specifies a set of event types to monitor using a single hardware performance counter. When only one event type is specified, it is always monitored. When several event types are specified, they are time-multiplexed onto the same hardware counter.
If no -event arguments are specified on the command line, the default is to always monitor cycles and imiss events using the default sampling periods.
Event Types
Event types supported on all Alpha processors:
- cycles = processor cycles
- issues = instruction issues
- nonissue = non-issue cycles
- imiss = instruction cache misses
- dmiss = data cache misses
- branchmp = branch mispredicts
- flow = flow control changes (see Caveats below)
- pipelinedry = pipeline dry cycles (no valid I-stream data)
- issue2 = cycles with 2 issues
- intop = integer operations (excluding loads/stores)
- fpop = floating point operations (excluding loads/stores/br)
- load = load instructions
- store = store instructions
Additional event types supported on the Alpha 21064 processor:
- pipefrozen = pipeline frozen due to resource conflict
- palmode = cycles executing palcode
Additional event types supported on the Alpha 21164 processor:
- itbmiss = instruction translation buffer misses
- dtbmiss = data translation buffer misses
- pcmp = PC mispredicts
- iaccess = instruction cache accesses
- daccess = data cache accesses
- smiss = on-chip secondary cache misses
- srmiss = on-chip secondary cache read misses
- swmiss = on-chip secondary cache write misses
- saccess = on-chip secondary cache accesses
- sread = on-chip secondary cache reads
- swrite = on-chip secondary cache writes
- svictim = on-chip secondary cache victims
- sshwrite = on-chip secondary cache shared writes
- bmiss = board-level cache misses
- bhit = board-level cache hits
- bvictim = board-level cache victims
- bref = board-level cache references
- sysinv = system invalidates
- sysread = system read requests
- sysreq = system requests
- splitissue = split issue cycles
- replaytrap = replay traps
- issue1 = cycles with 1 issue
- issue3 = cycles with 3 issues
- issue4 = cycles with 4 issues
- mb = memory barriers
- loadmerged = loads merged (in MAF)
- ldureplay = load/use (ldu) replays
- wbmafreplay = write buffer or maf full replays
- loadlocked = LDx_L instructions
- longstall = stall longer than 12 cycles
- external = external event (system-specific or unused)
The optional event period follows the event type, and has the format :Mperiod, where M is a period modifier, and period is the sampling period. If the event period is omitted, reasonable defaults are automatically chosen based on the particular event type and the processor hardware.
The period modifier must be R, denoting a random sampling interval with a mean equal to period events, or F, denoting a fixed sampling interval equal to period events. If omitted, the default is to use a random sampling interval on hardware that supports it, or a fixed sampling interval otherwise.
The sampling period specifies how often the event should be sampled, expressed as a decimal number. The suffix K can be used to scale the specified period by 1024.
The period modifier and period specifications are limited on the Alpha 21064 processor, which uses a fixed sampling period (65536 for cycles, issues, and flow, and 4096 for the other events). Later Alpha processors such as the 21164 have hardware support for modifying the sampling period and can support arbitrary fixed periods, as well as randomized periods. Randomization of the sampling interval helps avoid undesirable synchronization effects with periodic code execution. Caveat: The current driver implementation restricts the set of valid randomized periods. For the cycles event, a valid randomized period must have the form (65536 - 2^n). Future versions of the driver may allow more flexibility.
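The period syntax and the randomized-period restriction described above can be illustrated with a short Python sketch. This is not part of dcpid; the function names are hypothetical, and the sketch assumes that the K suffix is upper case and that n may be any non-negative integer that leaves a positive period.

```python
import re

def parse_event_spec(spec):
    """Split 'type[:[R|F]period[K]]' into (type, modifier, period).

    Modifier and period are None when the spec omits them, in which
    case dcpid chooses its own defaults.
    """
    name, _, period_part = spec.partition(":")
    if not period_part:
        return name, None, None
    m = re.fullmatch(r"([RF]?)(\d+)(K?)", period_part)
    if m is None:
        raise ValueError("malformed period: %r" % period_part)
    modifier = m.group(1) or None
    # The K suffix scales the period by 1024.
    period = int(m.group(2)) * (1024 if m.group(3) else 1)
    return name, modifier, period

def is_valid_random_cycles_period(period):
    """True if period has the form 65536 - 2**n (assumed n >= 0)."""
    diff = 65536 - period
    return period > 0 and diff > 0 and (diff & (diff - 1)) == 0
```

For example, the period 63488 used in the Examples below equals 65536 - 2^11, so it satisfies the restriction on randomized periods for the cycles event.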
Examples
- -event cycles:R63488 -event imiss+dmiss+branchmp
- Always monitor cycle counter events, with a randomized sampling period whose mean is one sample every 63488 cycles. In addition, rotate among gathering imiss, dmiss, and branchmp events, using the default sampling rates for those events.
- -event cycles:F64K -event imiss+imiss+imiss+dmiss
- Always sample cycles with a fixed period of 65536 (64K) cycles per sample, and switch between sampling imiss events 75% of the time and dmiss events 25% of the time, using the default sampling rate for those events. In this example, events are repeated within a single multiplexing -event option, in order to sample one kind of event more frequently than other kinds of events.
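The 75%/25% split in the second example follows from counting how often each event type appears in the multiplexed list. A minimal Python sketch of that arithmetic, assuming each entry in the list receives an equal share of multiplexing intervals (the function name is hypothetical and not part of dcpid):

```python
from collections import Counter
from fractions import Fraction

def mux_shares(event_option):
    """Return the fraction of multiplexing intervals given to each event type.

    event_option is the argument of one -event option, such as
    'imiss+imiss+imiss+dmiss'. Any ':period' suffix is ignored here.
    """
    events = [entry.split(":", 1)[0] for entry in event_option.split("+")]
    counts = Counter(events)
    return {name: Fraction(n, len(events)) for name, n in counts.items()}
```

Calling mux_shares("imiss+imiss+imiss+dmiss") yields 3/4 for imiss and 1/4 for dmiss, matching the description above.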
Caveats
Alpha performance counter interrupts are not precise for events other than cycles and dtbmiss, so a sample for some other event may not be correctly attributed to the instruction which generated the event.
There are only a limited number of hardware performance counters (2 on 21064 processors and 3 on 21164 processors), and each counter can only count a subset of all events. Thus, certain combinations of events cannot be simultaneously monitored. Consult the Alpha AXP Architecture Reference Manual by Sites & Witek, Appendix D, for detailed information about legal event combinations.
When multiplexing events, the cycles event type must always be monitored, since cycle sample interrupts are used to decide when to switch to the next multiplexed event type. This switching interval is controlled by the -mux option (see below).
On the Alpha 21064 processor, issues counts the total number of instruction issues divided by 2, and nonissue counts the total number of non-issue cycles divided by 2.
On the Alpha 21164, the meaning of the "flow" event is altered by whether the "branchmp" or "pcmp" events are sampled at the same time as the "flow" event: With "branchmp" sampling, "flow" events happen only at conditional branches. With "pcmp" sampling, "flow" events happen only at jsr and ret instructions. (Simultaneous sampling of "branchmp" and "pcmp" events is not possible, though multiplexed sampling of these events is possible.)
- -mux interval
- -I is shorthand for -mux
- For event multiplexing, switch the events being monitored every interval occurrences of the cycle event performance counter interrupt. The default multiplexing interval is 10 on Alpha 21164-based machines; i.e. the monitored events will be switched every 10 cycle counter interrupts.
The default multiplexing interval is 100 for Alpha 21064-based machines. On the 21064, counter values cannot be read and restored. During event multiplexing, this means that the counter values are reset to zero whenever a multiplexing interval expires. With frequent time-multiplexed switching, this can result in distortion in the sampling of events other than cycles. For this reason, it is recommended that the multiplexing interval not be set below about 20 for this processor.
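Because switching is driven by cycle counter interrupts, the number of processor cycles between switches is roughly the multiplexing interval multiplied by the cycles sampling period. A small Python sketch of that arithmetic (the function name is hypothetical; for a randomized period the result is a mean rather than an exact value):

```python
def cycles_between_switches(mux_interval, cycles_period):
    """Approximate processor cycles between multiplexing switches.

    mux_interval is the -mux value; cycles_period is the sampling
    period (or mean period) of the cycles event.
    """
    if mux_interval <= 0 or cycles_period <= 0:
        raise ValueError("interval and period must be positive")
    return mux_interval * cycles_period

# 21064 with the default interval of 100 and its fixed cycles period of 65536:
alpha_21064_default = cycles_between_switches(100, 65536)

# 21164 with the default interval of 10 and, as an example only, a
# cycles period of 63488 as in "-event cycles:R63488":
alpha_21164_example = cycles_between_switches(10, 63488)
```

Under these inputs the 21064 default works out to 6553600 cycles between switches, and the 21164 example to 634880 cycles.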
- -bypid image
- -i is shorthand for -bypid
- Store separate profiles for each process that loads the specified executable image. By default, the profile associated with an executable image contains aggregate samples for all processes that execute that image. This option allows samples to be identified by process as well as by image. The filenames for per-process profiles have the suffix "_hostPID", where host is the local hostname, and PID is a local process identifier. This option can be repeated to specify per-process profiling for multiple executable images.
- -map mapfile
- -m is shorthand for -map
- Use specified map file generated by dcpiscan(1) for associating processes with named images. This option can be repeated, allowing several map files to be specified; information from all of the supplied map files is merged.
A default map for common Digital Unix 3.2 and 4.0 binaries is compiled into dcpid; specifying additional maps will allow dcpid to identify site-specific binaries.
- -forkid seconds
- The modified system loader dcpiloader(5) provides information to dcpid about image loadmaps for dynamically-loaded processes. A hook in the kernel exec path provides information to dcpid about image loadmaps for statically-loaded processes. Unfortunately, there is no convenient hook for capturing information about processes that are created via fork(2) which do not subsequently invoke exec(2).
To obtain loadmap information for such forked processes that are relatively long-lived, periodic scans of system tables are performed to match unknown forked processes with information known about their parents. By default, a scan is performed every 30 seconds. This feature can be disabled by specifying a scan period of 0 seconds.
- -unknown
- -u is shorthand for -unknown
- Store separate per-process profiles for samples that cannot be associated with any image. Unknown samples will be stored in profiles associated with 1MB regions of each process address space; these "anonymous" profiles are given names of the form hostPID@address. If this option is not specified, a count of all unknown samples is stored in a single profile named unknown@host.
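As an illustration of how a sample address could be assigned to a 1MB region, the following Python sketch rounds an address down to a 1MB boundary. It assumes the regions are aligned on 1MB boundaries, which this page does not state explicitly, and it does not attempt to reproduce how dcpid formats the address in the profile name. The function name is hypothetical.

```python
REGION_SIZE = 1 << 20  # 1MB, the region size stated above

def region_base(address):
    """Return the base address of the (assumed 1MB-aligned) region
    that contains the given sample address."""
    if address < 0:
        raise ValueError("address must be non-negative")
    return address & ~(REGION_SIZE - 1)
```

With this assumption, all unknown samples of one process whose addresses share the same region base would be stored in the same anonymous profile.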
- -epoch
- -e is shorthand for -epoch
- Use the most recent existing epoch for storing new profiles. By default, a new epoch is created each time dcpid is restarted. New epochs can also be started using dcpiepoch(1).
- -merge seconds
- -M is shorthand for -merge
- Merge buffered profile samples from dcpid to the non-volatile profile database every seconds seconds. Defaults to every 600 seconds (10 minutes).
- -flush seconds
- -F is shorthand for -flush
- Flush samples from the performance counter device driver to dcpid every seconds seconds. Defaults to every 300 seconds (5 minutes). Samples are also automatically flushed from the driver to dcpid whenever remaining driver buffer space is low.
- -hash bytes
- -H is shorthand for -hash
- Specifies the desired size of the driver hash table data structure in bytes. The default is 262144 (256K bytes). The driver treats the specified size as a hint, and may impose additional constraints, such as forcing the actual size to be a power of two.
- -chunk bytes
- -C is shorthand for -chunk
- Specifies the desired chunk size to use when flushing driver hash table data structure. The default is 16384 (16K bytes). The driver treats the specified size as a hint, and may impose additional constraints, such as forcing the actual size to be a power of two.
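Both defaults (262144 and 16384) are already powers of two. The Python sketch below shows a power-of-two check and one possible way a size hint could be adjusted. Rounding down is purely an assumption made for illustration; this page does not say how the driver adjusts a size that is not a power of two. The function names are hypothetical.

```python
def is_power_of_two(n):
    """True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0

def round_down_to_power_of_two(n):
    """Largest power of two not exceeding n.

    Assumed policy for illustration only; the driver may round
    differently or apply other constraints.
    """
    if n < 1:
        raise ValueError("size must be at least 1")
    return 1 << (n.bit_length() - 1)
```

For instance, a hint of 300000 bytes is not a power of two; under the round-down assumption it would become 262144.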
- -device device
- -d is shorthand for -device
- Use specified performance counter device for collecting raw samples. Default device is /dev/pcount0.
- -log logfile
- -l is shorthand for -log
- Use specified file for logging warnings, errors, debugging information, and other messages. Defaults to dcpid-host.log in the specified profile database directory, where host is the local hostname. The log file is written using append mode, so it is safe to reuse the same log file across dcpid invocations.
Note: the Unix command tail -f is useful for watching the log as it is written.
- -quiet
- -q is shorthand for -quiet
- Operate in quiet mode, disabling most message logging. By default, dcpid logs errors, debugging information, and other messages to the specified log file.
- -verbose
- -v is shorthand for -verbose
- Operate in verbose mode, enabling extra message logging.
- -status seconds
- -L is shorthand for -status
- Log dcpid status information to the log file every seconds seconds. The default period is 0 (i.e. disabled).
- -logmaps
- -x is shorthand for -logmaps
- Log image loadmap information as it becomes available.
- -help
- Print dcpid usage message and then terminate.
- -version
- -V is shorthand for -version
- Print dcpid version string.
- -bug
- -b is shorthand for -bug
- Avoid DIGITAL Unix bug by disabling the initial scan of system tables to identify executables started before dcpid. This option is provided to eliminate the possibility of dcpid triggering kernel bugs that can hang dcpid or even crash the kernel in rare cases.
Unlike earlier versions of dcpid, which performed frequent scans to identify images, the current version only performs a single scan during initialization. Thus, it is extremely unlikely that this problem will be encountered. It is generally worth the risk of performing the initial scan in order to obtain useful information about processes that were already executing when dcpid was started.
When -bug is specified, periodic scans for identifying forked images are also disabled. This is merely a precaution, since there are no reports of problems with -forkid scans.
- -nodaemon
- -D is shorthand for -nodaemon
- Do not run dcpid as a daemon. By default, dcpid places itself in the background, detaches from its terminal, and redirects all output to its log file.
- -socket socket
- -s is shorthand for -socket
- Use specified local Unix socket pathname for incoming messages from client applications such as dcpiflush(1), dcpiepoch(1), and dcpiquit(1). Defaults to /tmp/.dcpid0, the default path used by these client applications and dcpiloader(5).
Dcpid continuously extracts raw samples from the specified performance counter device, associates them with their corresponding images, and updates disk-based image profiles in the specified profile database. A new profile database can be created by specifying an empty directory.
Dcpid shuts down gracefully in response to termination signals, flushing all unsaved samples to their corresponding profiles before terminating. Dcpid may also be terminated using dcpiquit(1).
Dcpid must be executed with root privileges. It is recommended that dcpid be installed as a setuid-root program.
- 2.22
- Expanded set of supported event types. Supports labelled profiles. By default, place log file in profile database directory. Added "-forkid" option to help identify forked images. Classifies samples from dynamically-loaded kernel modules.
- 2.09
- Significantly improved functionality and performance. Uses new pdb file format. Supports long option names. Ported to Alpha/NT.
- 1.90
- Added support for sampling multiple events simultaneously, and for multiplexing events.
- 1.87
- Added "-i" option to support per-process profiles. Improved identification of dcpid, kernel, and loader images.
- 1.74
- Accurately classify samples resulting from loader activity, resulting in significantly fewer unknown samples. Rebuilt using updated pdb library to support enhanced profile headers.
- 1.50
- First external release beyond SRC/WRL.
dcpi(1), dcpiflow(1), dcpiprof(1), dcpilist(1), dcpidis(1), dcpiscan(1), dcpiepoch(1), dcpiflush(1), dcpicalc(1), dcpilabel(1), dcpi2ps(1), dcpicat(1), dcpiquit(1), dcpidiff(1), dcpitopstalls(1), dcpiwhatcg(1), dcpictl(1), dcpisource(1), dcpicc(1), dcpiversion(1), dcpiuninstall(1), dcpi2pix(1), dcpikdiff(1), dcpix(1), dcpisumxct(1), dcpistats(1), dcpiformat(4), dcpiloader(5)
For more information, see the DIGITAL Continuous Profiling Infrastructure project home page (http://www.research.digital.com/SRC/dcpi/ from outside DIGITAL).
Carl Waldspurger, Sanjay Ghemawat, Jeffrey Dean