Patch Name: PHSS_28959

Patch Description: s700_800 11.X OV ITO7.1X HP-UX 10.x Agent Patch A.07.20

Creation Date: 03/07/02

Post Date: 03/07/07

Hardware Platforms - OS Releases:
    s700: 11.00 11.11
    s800: 11.00 11.11

Products:
    OpenView Operations 7.1

Filesets:
    OVOPC-CLT.OVOPC-UX10-CLT,fr=A.07.10,fa=HP-UX_B.11.00_32/64,v=HP
    OVOPC-CLT.OVOPC-UX10-CLT,fr=A.07.10,fa=HP-UX_B.11.11_32/64,v=HP

Automatic Reboot?: No

Status: General Release

Critical: No

Category Tags:
    defect_repair general_release

Path Name: /hp-ux_patches/s700_800/11.X/PHSS_28959

Symptoms:
    PHSS_28959:
    - SR: B555015785
      opcecaas may report that it has run out of free process
      slots and therefore cannot start any more processes.
    - SR: B555008674
      The opcagt and opcragt commands have a new option
      '-version'. The option was not documented in either man
      page and was not part of the usage strings of opcagt and
      opcragt.
    - SR: 8606189243
      A utility for checking basic types of communication between
      the OVO server and agents is missing.
    - SR: H555009399
      opcmona dumps core if it tries to convert an invalid string.
    - SR: B555015479
      After executing opcagt -kill, an entry for the Control Agent
      (opcctla) sometimes remains in the pids file. This should
      not happen because opcctla is no longer running.
    - SR: B555015626
      opcif_read() doesn't return data if the signal pipe is
      empty but there is still data in the queue file. This can
      happen, for example, if the maximum pipe size of 8192 bytes
      was reached and therefore no more signal bytes could be
      written into the signal pipe.
    - SR: B555015496
      Queue file handling was inefficient for larger queue files.
    - SR: 8606290675
      The agent should report 'agent start' in the opcerror file.
    - SR: R555018185
      If the agent is started in the Simplified Chinese codeset,
      the monitor agent, logfile encapsulator and message
      interceptor processes fail.
    - SR: B555015449
      The monitor agent may abort while receiving
      templates/policies.
    - SR: B555015155
      Errors in the opcerror file are not written to the trace
      file. This makes it difficult to find the place in the trace
      file where the error occurred.
    - SR: B555015047
      opcecaas (Annotation Server) might log the error
      "Illegal NULL parameter (function ) (OpC10-11)"
      with many similar lines.
    - SR: B555015557
      The current implementation of the control agent does not
      allow aborted sub agent processes to be restarted
      automatically.
    - SR: B555015199
      If OPC_IP_ADDRESS (nodeinfo) and OPC_NAMESRV_LOCAL_NAME
      (opcinfo) were used in combination, the entry in
      OPC_IP_ADDRESS was not used correctly.
    - SR: 8606282247
      The Logfile Encapsulator does not perform variable
      replacement for all Message Defaults fields.
    - SR: B555015331
      The monitor agent, opcmona, may report wrong results of
      executed monitor scripts or programs when using many
      'advanced monitors' such as OVPERF. In some of these cases
      opcmona might even abort.
    - SR: B555015712
      The trap/event interceptor forwards traps with the source
      address 127.0.0.1 unchanged; they are therefore discarded by
      the management server because of the unknown source address.
    - SR: B555015758
      opcmsgi aborts if one of the set attributes has an unmatched
      '<'.
    - SR: 8606232431
      VPO tries to resolve node names that contain only blanks
      because of a typo in a template definition or variable
      assignment. This leads to a lot of unnecessary DNS traffic.
    - SR: B555013121
      When distributing agent software after the installation of
      an agent patch, all agent components (opc_pkg.Z, comm_pkg.Z
      and perf_pkg.Z) are distributed and installed, although only
      one of the components was changed.
    - SR: B555015349
      The monitor agent might abort while using templates/policies
      that have an external source specified.
    - SR: B555015325
      opcmsga wrongly generates a message OpC30-3002 'opcctla not
      running on node localhost.' when receiving a reconfigure
      signal, or at exit time.
    - SR: 8606300119
      The logfile encapsulator does not return the last line of a
      logfile immediately if there is no carriage return in this
      line.
    - SR: 8606297998
      The ECS engine (opcecm & opceca) might have problems when a
      circuit is using global dictionaries. (This is the case for
      the MessageStorm detection ECS circuit.)

    PHSS_27386:
    - SR: H555008602
      If OPC_RPC_ONLY is set to TRUE in opcinfo, the message agent
      dumps core after a while.
    - SR: R555019153
      When deploying policies from a Windows management server and
      using the "StoreCollection" method within Perl or VBScript,
      the monitor agent might show the policy name instead of the
      given metric name, or may abort.
    - SR: B555014591
      When OPC_INT_MSG_FLT is set to TRUE, the filtered message is
      received corrupted on the server in a Japanese environment.
    - SR: 8606275496
      When installing an OVO 7.10 HP-UX Agent on a Managed Node
      which already has an OVO 7.0 or 6.X agent on it, the
      following error occurs:
        ERROR: Unexpected swinstall problem on system xyz.
        refer to the logfile /var/adm/sw/swagent.log and
        /var/adm/sw/swinstall.log on the managed node xyz for
        further information on the problem.
    - SR: B553003927
      opcmsgi leaks memory when handling messages with custom
      message attributes.
    - SR: B555014942
      The opcle process loops if a logfile is removed while it is
      being read.
    - SR: H555006719
      If the agent is running as a non-root user and the
      management server processes are restarted, the agent does
      not resume sending messages but continues to buffer them.
    - SR: B555017068
      A.07.12 changed the behavior of opcle: newly created
      logfiles that are discovered by the dynamic logfile
      discovery functionality are now read from the beginning.
      Often this is not what is desired, for example when using
      DBSPI with new instances that need to be monitored.
    - SR: B555014851
      opcmsga sends the same message operation (e.g. an
      acknowledge request created by opcmack(1)) again and again
      if the related message is not in the cache and one of the
      target managers cannot be reached.
    - SR: B555013891
      In MoM environments, opcmsga does not return action
      responses to SECONDARY managers if their name is not
      resolvable.
    - SR: H555008631
      The customer receives a lot of OpC20-61 and OpC20-63
      messages in the error logfile when using NCS agents.
    - SR: B555014574
      opcagt -start/-stop/-status doesn't work correctly if the
      currently running agent can't be reached over RPC.
    - SR: B555014093
      opcmona may crash (UNIX) or doesn't process all SCHEDULE
      templates (Windows) when using SCHEDULE templates.
    - SR: 8606262299
      The logfile encapsulator reports that the file to be
      executed for preprocessing of a logfile template failed.
      This error occurs randomly and only from time to time. You
      will get an error message similar to the following:
        Command 'opcfwtmp /tmp/wtmp.stat /var/adm/wtmp
        /tmp/wtmp.out' configured in source 'Logins (10.x/11.x
        HP-UX)' returns 1. Ignoring this logfile. (OpC30-107)
    - SR: R555017956
      The monitor agent is terminated if you use a session
      variable within the message text of a template followed by
      any other variable (for example $INSTANCE).
    - SR: R555018043
      Japanese characters in Perl scripts within templates don't
      get converted correctly into the server code set. If the
      Japanese characters are used as message text, for example,
      these characters are corrupted.
    - SR: B555014715
      The Control Agent slowly grows in memory usage.
    - SR: B555012869
      Instead of using the trap's UUID, the trap interceptor
      created a new message id for all but the first trap
      template. The original message id was not set correctly in
      these cases.
    - SR: B555014215
      The port on which opctrapi listens for incoming traps should
      be configurable.
    - SR: B555013719
      The message agent doesn't stop message buffering when the
      management server is available again after a network outage,
      fixed DNS problem or similar. This can happen when the agent
      restarts or the machine reboots while the network problem is
      present.
    - SR: 8606187183
      After deploy/undeploy of opcmsg policies/templates the
      suppressing times are lost. Messages that should be
      suppressed after a deploy/undeploy of policies/templates are
      shown.
    - SR: H555008275
      The Message Agent can hang for no apparent reason and stop
      sending all messages to the Management Server regardless of
      its state.
    - SR: B555014667
      The first lines of a logfile are not forwarded to the
      message browser when using a command to discover logfiles
      and the logfile was created after the first polling
      interval.
    - SR: B555014873
      The exit code of commands executed through an ECS annotate
      node and the OVO annotation server is always 0.
    - SR: B555014132
      During a distribution the agent may report an error like:
        ITO responsible manager configuration. (OpC30-1203)
        Cannot open file \usr\OV\tmp\OpC\cfgchg.
        System Error Number: 13 (d) - The data is invalid.
        (OpC20-63)
    - SR: B555014771
      The opcqchk support utility dumps message operations (e.g.
      acknowledge requests from opcmack) only as a hex dump;
      readable output would be preferable.
    - SR: B555013548
      The manual agent installation script opc_inst expects
      compressed packages. If you run it a second time, nothing
      happens because the packages are already uncompressed.
    - SR: B555009284
      The authorization verification for remote start and stop
      requests of the agent was sometimes unreliable. This
      possibly allowed more OVO servers to start or stop the agent
      than specified in the MoM configuration.
    - SR: 8606242614
      Messages are incorrectly suppressed by the logfile
      encapsulator if "suppress identical output messages" is
      specified and the messages differ only in the values of
      <$LOGFILE> and/or <$LOGPATH>.
    - SR: B555013620
      Support for pmd's "u" option is needed in opctrapi: use the
      UDP packet's address as the source of the trap.
    - SR: B555014759
      When enabling/disabling policies you might discover a memory
      leak in the agent processes.
    - SR: H555008529
      If a process dies immediately after being started by the
      Control Agent, it is possible that OpC30-1094 messages start
      appearing in the error logfile.

Defect Description:
    PHSS_28959:
    - SR: B555015785
      One possible cause for opcecaas to report that there are no
      more free process slots is that all slots are in use by
      applications that run for a very long time or might even
      hang. To control this, opcecaas retrieves the timeout set
      for the "Annotate Node" in the ECS circuit and kills the
      process if it runs beyond this timeout.
    - SR: B555008674
      The man pages for opcagt and opcragt now document the new
      option '-version'. The message catalog was updated to show
      the '-version' option in the usage string of the opcagt and
      opcragt commands.
    - SR: 8606189243
      The support tool /opt/OV/contrib/OpC/opcnetchk was
      introduced to allow a basic ICMP check, TCP check and SNMP
      check.
    - SR: B555015496
      Previously, a queue file garbage collection was done if more
      than 256 Kbyte were unused. Now the unused space has to be
      more than 256 Kbyte and more than a quarter of the queue
      file size. This drastically reduces file I/O when handling
      large queue files.
    - SR: R555018185
      The Simplified Chinese codeset is now mapped as a valid
      codeset.
    - SR: B555015155
      Whenever an error is added to the internal error list, a
      trace line with the DEBUG area ERRLIST is written to the
      trace file. When the error list is written to opcerror,
      another trace line with the DEBUG area ERROR is written.
      ERROR and ERRLIST are distinguished because some errors are
      added to the error list but later ignored, and therefore
      never appear in the opcerror file.
    - SR: B555015557
      The control agent has been changed to make the restart of an
      aborted sub agent process configurable. The control agent
      can be configured to restart aborted sub agent processes.
      Furthermore, it can be defined how often a process should be
      restarted in a certain time interval. To configure this, you
      can use the following variables in the opcinfo file:

      OPC_RESTART_SUBAGENT
        If set to TRUE, the control agent tries to restart aborted
        sub agent processes. The restart is done a defined number
        of times (OPC_RESTART_COUNT) in a specified period of time
        (OPC_RESTART_MINIMUM_RUN_TIME). If the process aborts more
        often, it won't be restarted again.
        Type/Unit : TRUE|FALSE
        Default   : TRUE

      OPC_RESTART_COUNT
        Defines how often an aborted sub agent process should be
        restarted within the specified minimum runtime. If a
        process stops more often, it won't be restarted.
        (See OPC_RESTART_SUBAGENT)
        Type/Unit : integer
        Default   : 5

      OPC_RESTART_DELAY
        Defines the time the control agent waits before it
        restarts an aborted sub agent process. The time is
        specified in seconds.
        Type/Unit : integer
        Default   : 10

      OPC_RESTART_MINIMUM_RUN_TIME
        Defines the time frame a sub agent process should run
        without being restarted more than specified by
        OPC_RESTART_COUNT. The time is specified in minutes.
        Type/Unit : integer (minutes)
        Default   : 60

    - SR: 8606282247
      Variable replacement is now performed for all Message
      Defaults fields.
    - SR: B555015331
      opcmona holds a central table for all subprocess-related
      information. Advanced monitors are executed in separate
      threads and could access this table in parallel, thus
      overwriting each other's data. The table accesses are now
      serialized by a mutex.
    - SR: B555015712
      To correctly handle traps that have the source address
      127.0.0.1, the trap/event interceptor is now able to replace
      the localhost address (127.0.0.1) with the IP address of the
      node processing the trap. To enable this, add the following
      line to the opcinfo file on your managed node:
        OPC_RESOLVE_TRAP_LOCALHOST TRUE
    - SR: 8606232431
      VPO now ignores node names that contain only white space
      characters without contacting the name service.
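
The opcinfo variables described above (the OPC_RESTART_* settings and
OPC_RESOLVE_TRAP_LOCALHOST) are plain "NAME VALUE" lines. The sketch
below shows one possible way to set such a line without creating
duplicates. The helper name set_opcinfo_var is hypothetical and not
part of the product, and the location of the opcinfo file on the
managed node is deliberately left as a parameter; the demo writes to
a scratch file only.

```shell
#!/bin/sh
# set_opcinfo_var FILE NAME VALUE   (hypothetical helper, not shipped with OVO)
# Drops any existing line whose first field is NAME, then appends "NAME VALUE".
set_opcinfo_var() {
    file=$1
    name=$2
    value=$3
    tmp="${file}.tmp.$$"
    if [ -f "$file" ]; then
        # Keep every line whose first field differs from NAME.
        awk -v n="$name" '$1 != n' "$file" > "$tmp"
    else
        : > "$tmp"
    fi
    printf '%s %s\n' "$name" "$value" >> "$tmp"
    mv "$tmp" "$file"
}

# Demo on a scratch file, using the defaults listed for SR B555015557.
scratch=$(mktemp)
set_opcinfo_var "$scratch" OPC_RESTART_SUBAGENT TRUE
set_opcinfo_var "$scratch" OPC_RESTART_COUNT 5
set_opcinfo_var "$scratch" OPC_RESTART_DELAY 10
set_opcinfo_var "$scratch" OPC_RESTART_MINIMUM_RUN_TIME 60
cat "$scratch"
rm -f "$scratch"
```

Rewriting through a temporary file and renaming it keeps the update
idempotent: running the helper twice for the same variable leaves a
single line with the latest value.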
    - SR: B555013121
      A new tool has been introduced that is called when an agent
      patch is installed. It sets the software flag in the
      database for all nodes of that platform to MODIFIED (node
      needs new agent software). Thus, it is no longer necessary
      to use force update to install the agent software on nodes
      of that platform. A second change checks which component
      versions are already installed on the agent and distributes
      and installs only the newer agent packages, if force update
      is NOT used.
    - SR: B555015349
      The monitor agent aborts when using templates/policies with
      external sources as soon as it receives a value from the
      external source while it is checking the threshold with a
      previously received value. In this case the received value
      is stored temporarily, and as soon as the monitor agent
      tries to process this value it aborts. The root cause is
      that the monitor agent tries to free memory that has already
      been freed.
    - SR: B555015325
      During startup a timing issue prevented a connection from
      opcmsga to opcctla, which generated the message. This
      message was not reported immediately, but only after
      receiving a signal. Retries now prevent the connection
      failure. If the connection still fails because opcctla is
      really not running, the error is reported immediately.

    PHSS_27386:
    - SR: H555008602
      When using OPC_RPC_ONLY, ICMP handling is not initialized,
      but the message agent calls opc_pb_ping_reset() after a
      successful server checkalive cycle. This causes an invalid
      (NULL) pointer to be dereferenced and causes a core dump.
      opc_pb_ping_reset() now checks whether ICMP handling has
      been initialized and, if not, returns immediately.
    - SR: B555014591
      The defect was caused by the double conversion from the
      server code set to the internal code set, once on the
      agent's side when it sent the internal message to opcmsga
      and once by opcmsgi when it forwarded the message again.
      Now, the message is converted back from the internal code
      set to the server code set in opcmsga before sending the
      message to the opcmsgi queue. The management server will get
      the message through opcmsga in the internal code set, and it
      will convert it into the server code set. The conversion is
      made only if the internal code set is different from the
      server code set.
    - SR: H555006719
      When communication to a message receiver fails, the message
      agent starts buffering messages. It periodically checks
      whether a server is alive by sending it ICMP packets. If the
      server cannot be reached with ICMP packets, no RPC
      communication is attempted. Sending ICMP packets is not
      possible when the agent is running as a non-root user, so
      the sending function cannot actually send anything.
      Therefore no replies are ever received and the message agent
      buffers messages forever. To fix this, the internal state of
      the message agent is updated after the attempt to send an
      ICMP packet if the agent is running as a non-root user.
    - SR: B555017068
      The default behavior is now again the behavior of A.07.10:
      if a new logfile is returned by the logfile discovery
      program, only new lines of the new logfile are processed. If
      you want all lines of newly added logfiles to be processed,
      add the following line to the opcinfo file:
        OPC_NEW_LOGFILE_FROM_BEGIN TRUE
    - SR: B555014851
      opcmsga maintains an internal cache to find out the target
      managers per message ID. The cache expires after 1 hour (can
      be changed with the opcsvinfo variable
      OPC_STORE_TIME_FOR_MGR_INFO). After that, a problem in its
      algorithm caused a message operation on a non-cached message
      to be sent again and again until the last target manager in
      an internal list could be reached.
    - SR: B555013891
      Even if the IP address of the management server was
      specified in the mgrconf file, it was not used except for
      the primary manager. This behavior was changed to give the
      mgrconf file precedence over name resolution.
    - SR: H555008631
      NCS agent open() and stat() calls did not handle EINTR, so a
      check/loop was implemented to handle it.
    - SR: B555014574
      With these changes, opcctla is now able to deal with a
      running opcctla that is not reachable via RPC:
      opcagt -status displays a warning if the currently running
      opcctla is not reachable over RPC, and then displays the
      status according to the pids file.
      opcagt -stop also kills the unresponsive opcctla and tries
      to start a new one.
      If opcctla is not reachable over RPC, opcagt -start kills
      all running agent processes and then starts a new opcctla,
      which starts the agent processes. The agent will not be able
      to start, however, if RPC is still not available at that
      time.
    - SR: B555014093
      opcmona may crash (UNIX) or doesn't process all SCHEDULE
      templates (Windows) when using SCHEDULE templates. This can
      occur when there are only spaces in one of the schedule
      fields (Minute, Hour, Day of the Month, Month, Year, Day of
      the Week). You can verify this by going to the conf/OpC
      directory on the node and running opcdcode on the monitor
      file. When there are entries like
        WEEKDAY " "
      the problem can occur. Now the monitor agent treats
      sequences of spaces like an empty string, which is a
      wildcard, and uses all valid values in the possible range.
      For WEEKDAY this is 0-6.
    - SR: B555014215
      Using the new opcinfo variable SNMP_TRAP_PORT, opctrapi can
      now be configured to listen on a port other than 162. This
      is only effective if traps are not received through the NNM
      pmd.
    - SR: B555013719
      The message agent remains in buffering mode even when the
      management server is available again. The reason is that the
      agent wasn't able to resolve the management server name to
      an IP address at startup and doesn't try again during
      runtime. This has been fixed by checking for a resolvable
      name every time a message is to be buffered, until the name
      can be resolved. After this, the normal checkalive
      mechanism, which handles buffered messages, takes over.
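
For SR B555014093, decoded template output can be scanned for
schedule fields that contain only blanks. The filter below is a rough
sketch: it reads decoded template text on standard input and reports
every line of the form KEYWORD " ". Only the WEEKDAY " " form is
taken from the description above; the other keyword names in the
sample (MINUTE, HOUR) and the one-field-per-line layout are
assumptions, and the pattern may also flag non-schedule keywords
whose value is blank.

```shell
#!/bin/sh
# find_blank_schedule_fields   (hypothetical helper)
# Reads decoded template text on stdin and prints, with line numbers,
# every line of the form:  KEYWORD "<only white space>"
find_blank_schedule_fields() {
    grep -nE '^[[:space:]]*[A-Z_]+[[:space:]]+"[[:space:]]+"[[:space:]]*$'
}

# Demo with made-up sample input; in practice the input would be the
# opcdcode output mentioned in the description.
find_blank_schedule_fields <<'EOF'
MINUTE "0"
HOUR "8"
WEEKDAY " "
EOF
# prints: 3:WEEKDAY " "
```

The function returns a non-zero status when no blank field is found,
so it can be used directly in an if statement.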
    - SR: 8606187183
      The opcmsg interceptor restarts after a deploy/undeploy of
      policies/templates. During this process all the
      policy/template information is cleaned and read again from a
      temporary file. Because suppressing times are not stored in
      this temporary file, these times are lost. Now the
      suppressing times are carried over to the new data.
    - SR: H555008275
      The signal handler for SIGIO was installed before the socket
      on which ICMP replies are received was set to non-blocking
      mode. An unsolicited SIGIO would trigger the signal handler,
      which would wait indefinitely on the socket for data that
      would never arrive. Since the NCS agent is single-threaded,
      all communication would stop. The fix is to set the
      non-blocking mode before installing the signal handler, so
      that it does not wait forever.
    - SR: B555014873
      The exit code of commands executed through an ECS annotate
      node and the OVO annotation server is always 0. The reason
      is a hardcoded return value in the OVO annotation server.
      With this patch the annotation server passes the received
      exit code to ECS.
    - SR: B555014132
      During a distribution the agent may report error number
      OPC30-1203/OPC20-63 when trying to access the cfgchg file.
      The cause of this problem is that several processes try to
      get exclusive access to this file at the same time. The
      problem has been fixed by retrying up to 10 times, with a
      delay of one second, if the error occurs.
    - SR: B555014771
      The opcqchk support utility dumps message operations (Tag:
      43, like acknowledge requests from opcmack). This tag type
      was not implemented in opcqchk, so you only got the hex dump
      output. Now you get more readable output like:
        Size of item 1: 76 bytes.
        Tag: 43
        Data:
        Message operation = acknowledge request
        Msg id = >2878c8b8-d45e-71d6-00d3-c0a8f4220000<
    - SR: B555009284
      There were two different authorization algorithms which
      interfered with each other. They have been consolidated, and
      authorization is now checked more strictly.
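
The retry scheme described for SR B555014132 (up to 10 attempts, one
second apart) can be illustrated with a generic shell wrapper. This
is only a sketch of the technique, not the agent's implementation;
retry_cmd and flaky are hypothetical names, and the demo command
merely simulates a resource that becomes available on the third try.

```shell
#!/bin/sh
# retry_cmd ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]   (hypothetical helper)
# Runs COMMAND until it succeeds or ATTEMPTS tries have been made,
# sleeping DELAY_SECONDS between tries. Returns 0 on success, 1 otherwise.
retry_cmd() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while :; do
        "$@" && return 0
        [ "$i" -ge "$attempts" ] && return 1
        i=$((i + 1))
        sleep "$delay"
    done
}

# Demo: a stand-in command that fails until its third invocation.
counter=$(mktemp)
echo 0 > "$counter"
flaky() {
    n=$(cat "$counter")
    n=$((n + 1))
    echo "$n" > "$counter"
    [ "$n" -ge 3 ]
}
retry_cmd 10 1 flaky && echo "succeeded after $(cat "$counter") attempts"
rm -f "$counter"
```

Bounding the number of attempts keeps a permanently locked file from
blocking the caller forever, while the short delay gives competing
processes time to release it.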
    - SR: 8606242614
      The variables <$LOGFILE> and <$LOGPATH> were replaced after
      the suppression rules were evaluated. Therefore the
      comparison did not use the actual logfile name or path, but
      compared the string "<$LOGFILE>" or "<$LOGPATH>".
    - SR: B555013620
      NNM 6.2 introduced an event option to pmd - "u". This option
      specifies that the IP address in an SNMPv1 trap's UDP header
      is preferred over the contents of the SNMPv1 trap PDU's
      agent_addr field. A new opcinfo variable
      OPC_USE_UDP_AS_TRAP_SOURCE was added for opctrapi. If set to
      TRUE, opctrapi uses the UDP address instead of the
      agent_addr.
    - SR: H555008529
      This is a timing issue, where internal structures are not
      updated by the signal handler in time for proper values to
      be written to the PIDS file. An additional check for process
      presence has been implemented before writing the PIDS file.

Enhancement:
    No

Patch Files:
    OVOPC-CLT.OVOPC-UX10-CLT,fr=A.07.10,fa=HP-UX_B.11.00_32/64,v=HP:
    OVOPC-CLT.OVOPC-UX10-CLT,fr=A.07.10,fa=HP-UX_B.11.11_32/64,v=HP:
    /var/opt/OV/share/databases/OpC/mgd_node/vendor/hp/s700/hp-ux10/A.07.10/RPC_DCE_TCP/opc_pkg.Z
    /var/opt/OV/share/databases/OpC/mgd_node/vendor/hp/s700/hp-ux10/A.07.10/RPC_DCE_TCP/opc_version
    /var/opt/OV/share/databases/OpC/mgd_node/vendor/hp/s700/hp-ux10/A.07.10/RPC_DCE_TCP/install/opcrinst
    /var/opt/OV/share/databases/OpC/mgd_node/vendor/hp/s700/hp-ux10/A.07.10/RPC_DCE_TCP/opc_inst

what(1) Output:
    OVOPC-CLT.OVOPC-UX10-CLT,fr=A.07.10,fa=HP-UX_B.11.00_32/64,v=HP:
    /var/opt/OV/share/databases/OpC/mgd_node/vendor/hp/s700/hp-ux10/A.07.10/RPC_DCE_TCP/opc_pkg.Z:
        None
    /var/opt/OV/share/databases/OpC/mgd_node/vendor/hp/s700/hp-ux10/A.07.10/RPC_DCE_TCP/opc_version:
        None
    /var/opt/OV/share/databases/OpC/mgd_node/vendor/hp/s700/hp-ux10/A.07.10/RPC_DCE_TCP/install/opcrinst:
        HP OpenView Operations A.07.20 (04/09/03)
    /var/opt/OV/share/databases/OpC/mgd_node/vendor/hp/s700/hp-ux10/A.07.10/RPC_DCE_TCP/opc_inst:
        HP OpenView Operations A.07.20 (04/09/03)

cksum(1) Output:
    OVOPC-CLT.OVOPC-UX10-CLT,fr=A.07.10,fa=HP-UX_B.11.00_32/64,v=HP:
    701270558 16049257 /var/opt/OV/share/databases/OpC/mgd_node/vendor/hp/s700/hp-ux10/A.07.10/RPC_DCE_TCP/opc_pkg.Z
    4030162770 8 /var/opt/OV/share/databases/OpC/mgd_node/vendor/hp/s700/hp-ux10/A.07.10/RPC_DCE_TCP/opc_version
    3936934096 120714 /var/opt/OV/share/databases/OpC/mgd_node/vendor/hp/s700/hp-ux10/A.07.10/RPC_DCE_TCP/install/opcrinst
    4037647179 1293 /var/opt/OV/share/databases/OpC/mgd_node/vendor/hp/s700/hp-ux10/A.07.10/RPC_DCE_TCP/opc_inst

Patch Conflicts: None

Patch Dependencies: None

Hardware Dependencies: None

Other Dependencies: None

Supersedes:
    PHSS_27386

Equivalent Patches:
    ITOSOL_00224:
    sparcSOL: 2.7 2.8

Patch Package Size: 14860 KBytes

Installation Instructions:
    Please review all instructions and the Hewlett-Packard
    SupportLine User Guide or your Hewlett-Packard support terms
    and conditions for precautions, scope of license,
    restrictions, and limitation of liability and warranties,
    before installing this patch.
    ------------------------------------------------------------
    1. Back up your system before installing a patch.

    2. Login as root.

    3. Copy the patch to the /tmp directory.

    4. Move to the /tmp directory and unshar the patch:

        cd /tmp
        sh PHSS_28959

    5. Run swinstall to install the patch:

        swinstall -x autoreboot=true -x patch_match_target=true \
            -s /tmp/PHSS_28959.depot

    By default swinstall will archive the original software in
    /var/adm/sw/save/PHSS_28959. If you do not wish to retain a
    copy of the original software, include the patch_save_files
    option in the swinstall command above:

        -x patch_save_files=false

    WARNING: If patch_save_files is false when a patch is
             installed, the patch cannot be deinstalled. Please be
             careful when using this feature.

    For future reference, the contents of the PHSS_28959.text file
    are available in the product readme:

        swlist -l product -a readme -d @ /tmp/PHSS_28959.depot

    To put this patch on a magnetic tape and install from the tape
    drive, use the command:

        dd if=/tmp/PHSS_28959.depot of=/dev/rmt/0m bs=2k

Special Installation Instructions:
    BEFORE LOADING THIS PATCH...

    (A) Patch Installation Instructions
        -------------------------------

    (A1) Install the patch, following the standard installation
         instructions. For backing up the system before installing
         a patch, you may use opc_backup(1m).

         NOTE: MAKE SURE THAT NO AGENT OF THE PLATFORM ADDRESSED BY
               THIS PATCH IS DISTRIBUTED (either from the VPO
               Administrator's GUI or from command line using
               inst.sh) WHILE RUNNING SWINSTALL.

         NOTE: This patch must be installed on the VPO Management
               Server system, NOT on a VPO Managed Node directly.
               Changes will take effect on managed nodes by means
               of VPO Software Distribution. See chapter 2 of the
               VPO Administrator's Reference manual for more
               information.

         NOTE: The VPO Agent consists of several components that
               are patched individually. This patch updates only
               the Event/Action component. Therefore the software
               distribution will report that the agent software is
               updated to A.07.10, not to A.07.20. The version of
               the ITOAgent bundle will also still be A.07.10. You
               can verify the installed version of the components
               on the agent system using opcragt -agent_version.
               For example:

               opcragt -agent_version hpbbln8
               Node hpbbln8.bbn.hp.com:
               OPC_INSTALLED_VERSION = A.07.20
               PERF_INSTALLED_VERSION = A.07.10
               COMM_INSTALLED_VERSION = 2.5.3.9
               Done.

    (B) Patch Deinstallation Instructions
        ---------------------------------

    (B1) To deinstall the patch PHSS_28959 run swremove:

         NOTE: MAKE SURE THAT NO AGENT OF THE PLATFORM ADDRESSED BY
               THIS PATCH IS DISTRIBUTED (either from the ITO
               Administrator's GUI or from command line using
               inst.sh) WHILE RUNNING SWREMOVE.

         # swremove PHSS_28959
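
The opcragt -agent_version check from note (A1) can be scripted. The
sketch below extracts one component version from output of the form
shown in that note. The helper name component_version is
hypothetical, and the one-value-per-line "NAME = VALUE" layout is
assumed from the example; the demo parses a copy of the example text
rather than calling opcragt.

```shell
#!/bin/sh
# component_version NAME   (hypothetical helper)
# Reads "opcragt -agent_version"-style output on stdin and prints the
# value of the first "NAME = VALUE" line, or nothing if NAME is absent.
component_version() {
    awk -v key="$1" '$1 == key && $2 == "=" { print $3; exit }'
}

# Demo using the example output from note (A1).
component_version OPC_INSTALLED_VERSION <<'EOF'
Node hpbbln8.bbn.hp.com:
OPC_INSTALLED_VERSION = A.07.20
PERF_INSTALLED_VERSION = A.07.10
COMM_INSTALLED_VERSION = 2.5.3.9
Done.
EOF
# prints: A.07.20
```

On a management server the same function could be fed from the real
command, for example: opcragt -agent_version <node> | component_version
OPC_INSTALLED_VERSION.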