Patch Name: PHSS_30370

Patch Description: s700_800 11.11 MC/ServiceGuard A.11.15.00

Creation Date: 04/03/12

Post Date: 04/03/17

Hardware Platforms - OS Releases:
    s700: 11.11
    s800: 11.11

Products:
    MC/ServiceGuard A.11.15.00

Filesets:
    Cluster-Monitor.CM-CORE,fr=A.11.15.00,fa=HP-UX_B.11.11_32/64,v=HP
    Package-Manager.CM-PKG,fr=A.11.15.00,fa=HP-UX_B.11.11_32/64,v=HP
    Cluster-Monitor.CM-CORE-MAN,fr=A.11.15.00,fa=HP-UX_B.11.11_32/64,v=HP

Automatic Reboot?: No

Status: General Release

Critical:
    Yes
    PHSS_30370: OTHER MEMORY_LEAK ABORT PANIC
        If the root filesystem is full, a package may fail to halt successfully.
        There is a 16k memory leak in the subagent, /usr/lbin/cmsnmpd, when it retrieves new Serviceguard cluster configuration information.
        In a 2-node Serviceguard cluster using a serial heartbeat link, cmcld can abort with a segmentation violation or bus error, resulting in a node TOC.
    PHSS_30087: PANIC HANG
        In a cluster configured with 3 or more heartbeat LANs, the Serviceguard daemon cmcld may wait indefinitely for replies to a message it has sent and fail to complete a crucial step in cluster reformation. The node will TOC.
        Unreliable network traffic may trigger the Serviceguard daemon cmcld to consume 100% of the CPU. On a single-CPU system this could result in a system hang.
    PHSS_29901: ABORT PANIC HANG
        The Serviceguard daemon, cmcld, aborts when it fails to receive UDP data with errno EAGAIN while trying to join a cluster. The following error is shown in syslog:
            cmcld: recvfrom failed: 11 Resource temporarily unavailable
            cmcld: Aborting! UDP recvfrom failed (file: rcomm/comm_ip_recv.c, line 662)
        If CVM disk groups are still active when VxVM-CVM-pkg is halted during cmhaltcl or cmhaltnode, the VxVM-CVM-pkg halt will time out and fail. vxclustd will hang waiting for the disk groups to be deactivated. Because FAILFAST is set for this system package, the VxVM-CVM-pkg timeout will cause the node to TOC.
        The commands cmhaltnode and cmhaltcl can hang when cmtaped is configured to be started by Serviceguard but is not running when the commands are issued.
    PHSS_29053: PANIC HANG ABORT CORRUPTION
        If an EMS resource is configured with no RESOURCE_UP_VALUE criteria, a later online change of the resource may cause cmcld to abort.
        cmcld may hang in an accept() call on the local communications socket if the socket pops but there is no connection to accept. This causes various threads to hang and frequent cluster reformations. When a connection eventually arrives and the accept() call proceeds, all the threads resume execution, but processing the backed-up activity results in a deadlock. The node is unable to respond to a sync request and aborts.
        When 2 IPv6 subnets are specified in the package ASCII file and only the second one (not the first) is powerfailed, memory corruption may occur in cmcld.
        cmcld may hang in an accept() call on a remote communications socket if the socket pops but there is no connection to accept. If the cluster contains more than one node at the time, the problem node may TOC. If the cluster contains only one node at the time, cmcld may hang, commands may hang, and other nodes may be unable to join the cluster.

Category Tags:
    defect_repair general_release critical panic halts_system corruption memory_leak manual_dependencies

Path Name: /hp-ux_patches/s700_800/11.X/PHSS_30370

Symptoms:

PHSS_30370:
    1. Cluster formation fails after the machine is rebooted. /etc/rc.log shows cmrunnode failing with the following message:
          "Unable to connect to online target of running cluster [cluster_name] (Device busy). Cluster may have been in transition."
    2. If a Serviceguard cluster reformation happens while a cmhaltpkg command is in progress, cmhaltpkg may return an error even though the package has halted successfully. The error looks like:
          cmhaltpkg : Package orafintest is not currently running.
          Check the syslog and pkg log files for more detailed information.
       This can also happen if cmhaltpkg is called from the control script of another package at package start time.
    3. These messages appear in syslog when a configuration command (cmquerycl, cmcheckconf, cmapplyconf) is issued:
          "Unable to query the I/O interface: Key is undefined for specified token, or token is NULL."
          "Unable to get interface type for disk [disk name]"
    4. If the root filesystem is full, a package may fail to halt successfully. The following error messages can be seen in the package log file:
          umount_pidsxxxx: Cannot find or open the file.
          vgchange_pidsxxxx: Cannot create the specified file.
       Examples of possible package log file names:
          /etc/cmcluster/(package name)/(package name).control.log
          /etc/cmcluster/(package name)/(package name).cntl.log
    5. There is a 16k memory leak in the subagent, /usr/lbin/cmsnmpd, when it retrieves new Serviceguard cluster configuration information.
    6. In a Serviceguard cluster using a Fibre Channel cluster lock disk, cmcld can get stuck on a partially open cluster lock disk. The following messages can be observed in syslog:
          cmcld: Unable to query the health of cluster lock disk /dev/dsk/c3t6d6: Device busy
          cmcld: Check device, power, and cables.
       Running diskinfo on the cluster lock disk in this state fails with "device busy".
    7. In a 2-node Serviceguard cluster using a serial heartbeat link, cmcld can abort with a segmentation violation or bus error, resulting in a node TOC. One observed stack trace is:
          (gdb) bt
          #0 0xa2f58 in cl_comm_reply+0x9f0 ()
          #1 0xbc81c in rcomm_health_event_handler+0x288 ()
          #2 0x146784 in cl_event_loop+0x458 ()
          #3 0x1f146c in cma__thread_base+0x204 ()
          #4 0x1f3c40 in cma__thread_start1+0x38 ()
          #5 0x1f36d8 in cma__thread_start0_PA20+0xc ()

PHSS_30087:
    1. Unreliable network response may trigger the Serviceguard daemon cmcld to consume 100% of the CPU. On a single-CPU system this could result in a system hang.
    2. In a cluster configured with 3 or more heartbeat LANs, the Serviceguard daemon cmcld may wait indefinitely for replies to a message it has sent and fail to complete a crucial step in cluster reformation. The node will TOC with the following messages in syslog:
          cmcld: Halting to preserve data integrity
          cmcld: Reason: This node did not reach sync step 0 for activity 3 within timeout
          cmcld: Aborting! This node did not reach sync step 0 for activity 3 within timeout (file: utils.c, line: 228)
    3. Relocatable IPv4 and IPv6 package addresses are not properly switched to a healthy standby LAN interface when multiple LAN failures occur in quick succession. The problem typically occurs on a node with more than one standby LAN interface in a bridged net, where the primary and first standby interfaces are connected to the same switch and that switch fails. The result is loss of connectivity to the package after the switch failure.
    4. cmscancl outputs incorrect information in the "network connection checking" section when HyperFabric interfaces are present. It reports that the connection between clic0 (HyperFabric interface) and lan0 (Ethernet interface) is OK, although this can never be the case because HyperFabric connections are point-to-point.
    5. Serviceguard can reduce the number of file descriptors available to package applications, which may cause applications requiring 1024 or more file descriptors to fail.

PHSS_29901:
    1. Package control scripts do not display the patch ID of the Serviceguard version they were generated from.
    2. When a network component between two interfaces does not allow any data link level (DLPI) traffic through, commands such as cmquerycl, cmcheckconf and cmapplyconf do not report the illegal configuration, but may instead print a misleading error message:
          Error: Non-uniform connections detected, successfully received from but did not receive from .
          This is probably due to heavy network traffic or heavy load on .
    3. Stale, inactive TCP connections between nodes may never be detected and cleaned up. The stale connections do not cause any problems with cluster behaviour, but they should be cleaned up.
    4. cmcheckconf and cmapplyconf succeed in adding a node to a cluster even when the node is already a member of another cluster. The commands should fail in this case.
    5. The Serviceguard daemon, cmcld, aborts when it fails to receive UDP data with errno EAGAIN while trying to join a cluster. The following error is shown in syslog:
          cmcld: recvfrom failed: 11 Resource temporarily unavailable
          cmcld: Aborting! UDP recvfrom failed (file: rcomm/comm_ip_recv.c, line 662)
    6. The command cmquerycl -c -n node1 -n node2 -C fails with the following errors when it is used to remove a node from a cluster that has IPv6 addresses configured:
          Error: Invalid node id on network probe
          Failed to evaluate network
          Failed to gather configuration information.
    7. The commands cmhaltnode and cmhaltcl can hang when cmtaped is configured to be started by Serviceguard but is not running when the commands are issued.
    8. The cmcld daemon may log the message "timers delayed x.x seconds" because of kernel latency, or a network partition may separate nodes in the cluster. In a Serviceguard cluster of more than 2 nodes with a cluster lock, such a hang or partition may result in the formation of 2 clusters. This is a corner case in which the hang or partition happens while a node is joining a previously formed 2-node cluster: the joining node forms a cluster with the original coordinator node, while the non-coordinator node forms a cluster by itself.
    9. The Serviceguard daemon, cmcld, aborts in the presence of CPU starvation and/or frequent network packet loss.
       One of the following errors is shown in syslog:
          cmcld: Assertion failed: node != (cm_node_t *)0, file: cm/comm.c, line: 930
       or
          cmcld: Assertion failed: icp->state == CL_CONN_CLOSING, file: rcomm/comm_ip_state.c, line: 171
    10. cmcheckconf and cmapplyconf fail with inappropriate error messages if CLUSTER_NAME or NODE_NAME in the cluster ASCII file is more than 39 characters long. A similar problem exists if PKG_NAME or SERVICE_NAME in the package ASCII file is more than 40 characters long. cmapplyconf may succeed with a CLUSTER_NAME of 40 to 42 characters, but afterwards other cluster commands may fail, stating that the cluster cannot be found.
    11. In a Serviceguard cluster configured with a serial heartbeat link, cmcld may abort if the heartbeat LAN becomes congested and experiences delays. When this happens, the following messages are logged to syslog.log:
          cmcld: Out of order message 1346602 > 1346601 from node 1
          cmcld: Received REQ msg 1346602 req 0 from node 1 for group 1 service 2
          cmcld: cl_abort: abort
          cl_kepd_printf failed: Invalid argument
          cmcld: cl_kepd_printf, fstat: kepd_fd=7, st_dev=1073741827, st_ino=658, st_rdev=-486539264
          cmcld: Aborting! out of order message
    12. In the Serviceguard package control scripts, VxVM disk group imports can be done in parallel by setting the variable CONCURRENT_DISKGROUP_OPERATIONS to a value other than 1. This was added on the theory that it would improve performance, but vxclustd is designed to work only serially. In some rare cases, multiple concurrent requests can cause problems and result in failed disk group (dg) imports.
    13. During the hourly cluster lock health check, if the lock disk returns NOT_READY, no message about this is logged to syslog at that time.
        However, if the cluster later reforms and tries to form a one-node cluster, the reformation might fail with:
          cmcld: Obtaining Cluster Lock
          cmcld: Request to obtain cluster lock /dev/dsk/c7t1d0 failed: Device busy
          cmcld: Failed to request cluster lock.
          cmcld: Failed to get Cluster Lock.
        Serviceguard gives no earlier indication of the problem during the health check.
    14. Applications that need more than 2048 file descriptors might fail if they are started as Serviceguard packages.
    15. The documentation of the Quorum Server parameter QS_TIMEOUT_EXTENSION is not clear. Also, if QS_TIMEOUT_EXTENSION is configured to more than 35 minutes, the value actually becomes negative.
    16. If CVM disk groups are still active when VxVM-CVM-pkg is halted during cmhaltcl or cmhaltnode, the VxVM-CVM-pkg halt will time out and fail. vxclustd will hang waiting for the disk groups to be deactivated. Because FAILFAST is set for this system package, the VxVM-CVM-pkg timeout will cause the node to TOC.
    17. Messages printed by cmquerycl on stdout or the console can be confusing for bridged networks, because they do not show which kind of network probing is done.

PHSS_29053:
    1. When a service configured with SERVICE_FAIL_FAST_ENABLED set to "YES" fails, cmviewcl may display the node on which the service was running as "down" and "unknown" while displaying the package as "up" and "running", until the cluster reforms.
    2. When cmapplyconf is run with the VxVM-CVM-pkg package specified along with the cluster configuration and/or failover packages, the command may fail with:
          CDB Prepare - Unable to link to /pkgs/VxVM-CVM-pkg, object does not exist
          CDB Prepare - Unable to perform configuration operation 4. Return value is 2.
          Error: Unable to apply the configuration change: No such file or directory
    3. When CVM is run using VxVM-CVM-pkg, vxclustd may abort during cluster reformation while trying to query the heartbeat network information. /etc/cmcluster/cvm/VxVM-CVM-pkg.sh.log will show:
          ERROR: Cluster volume manager is inactive after 60 secs
    4. If a node is shutting down and, at the same time, an online cmapplyconf deletes a package with EMS resources, cmcld may abort and dump core. A stack trace of the core will contain something like the following:
          #2 0x60000000c04584d0:0 in abort+0x190 () from /usr/lib/hpux32/libc.so.1
          #3 0x432b330:0 in cl_list_next ()
          #4 0x417f3b0:0 in pm_resource_shutdown ()
          #5 0x40c5e60:0 in resource_shutdown ()
       cmclconfd will detect that cmcld has aborted and will log a message in syslog indicating a lost connection.
    5. If an EMS resource is configured with no RESOURCE_UP_VALUE criteria, a later online change of the resource may cause cmcld to abort. Syslog will show:
          Jun 18 21:02:24 cmcld: Aborting: cl_ems_support.c 1448 (Unknown resource type
    6. If a node is shutting down and, during that time, the following happen in this order:
          a) a cmmodpkg command is issued (ignoring the cmhaltnode warning)
          b) an online apply deletes the same package (ignoring the cmhaltnode warning)
          c) cmlvmd does not shut down, because an external VG is still active
       then cmcld on the shutting-down node may abort. The stack trace from the core dump will contain something similar to:
          #3 0x4355020:0 in cl_assfail ()
       and syslog will show:
          cmcld: Assertion failed: p_ptr != NULL, file: pkg/pkg_owner_handler.c, line: 1635
    7. If an online delete of a package is in progress and, at the same time, another resource becomes available on a node that satisfies the requirements to run that package, cmcld may dump core after the online delete.
       Syslog may contain something like the following:
          Mar 29 16:38:53 node1 cmcld: Unknown package 32548 for message op 27
          Mar 29 16:38:53 node1 cmcld: Aborting: pkg/pkg_coord_handler.c 115 (Unknown package).
    8. After a ServiceGuard cluster is upgraded from version A.10.06 or earlier to A.11.15.00, the cmrunnode command may fail with:
          # cmrunnode
          cmrunnode : Unable to determine the nodes on the current cluster
          cmrunnode : Either no cluster configuration file exists, or the file is corrupted, or cmclconfd is unable to run
       Also, if the /usr/sbin/convert command is run manually, it fails without any error message and with exit code 1:
          # convert -f /etc/cmcluster/cmclconfig
          NOTE: Executing the conversion tool.
          # echo $?
          1
          #
    9. The -k option of the ServiceGuard commands cmcheckconf/cmapplyconf may fail if the volume groups listed in the cluster ASCII file are not present on all nodes in the cluster.
    10. cmcld may hang in an accept() call on the local communications socket if the socket pops but there is no connection to accept. This causes various threads to hang and frequent cluster reformations. When a connection eventually arrives and the accept() call proceeds, all the threads resume execution, but processing the backed-up activity results in a deadlock. The node is unable to respond to a sync request and aborts with the following syslog messages:
          vmunix: Halting to preserve data integrity
          vmunix: Reason: This node did not reach sync step 0 for activity 3 within timeout
          cmcld: Daemon exiting due to halt message from node
          vmunix: Service Guard Aborting!
          vmunix: Cause: This node did not reach sync step 0 for activity 3 within timeout(File: utils.c, Line: 228)
          cmcld: Halting to preserve data integrity
          cmcld: Reason: This node did not reach sync step 0 for activity 3 within timeout
          cmcld: Aborting! This node did not reach sync step 0 for activity 3 within timeout (file: utils.c, line: 228)
    11. During cmcld startup, configured IPv6 STATIONARY_IPs are not logged to syslog.
        Only the IPv4 addresses are logged to syslog:
          May 5 19:06:29 mtlrc103 cmcld: lan0 0x00306e39d793 15.37.115.103 bridged net:1
          May 5 19:06:29 mtlrc103 cmcld: lan1 0x00306e3957b1 192.6.100.103 bridged net:2
        The IPv6 networks should also be noted, for example:
          May 5 19:06:29 mtlrc103 cmcld: lan1 0x00306e3957b1 192.6.100.103 bridged net:2 IPv6 2001::103 fec0::103
    12. cmquerycl may sometimes dump core when no IPv6 addresses are configured on a node. The stack trace will look similar to:
          #0 0x2000000000124f80 in memcpy() from /lib/libc.so.6.1
          #1 0x4000000000281810 in local_probe ()
          #2 0x40000000002853c0 in cf_private_probe_network ()
          #3 0x40000000002adaa0 in cf_private_find_config ()
          #4 0x40000000002ae7f0 in cf_find_config ()
          #5 0x40000000000c8f20 in query_main ()
          #6 0x40000000000d8190 in main ()
    13. When 2 IPv6 subnets are specified in the package ASCII file and only the second one (not the first) is powerfailed, memory corruption may occur in cmcld. One example of a stack trace from a cmcld core is:
          #0 0xc0209378 in kill () from /usr/lib/libc.2
          #1 0xc01a46ac in raise () from /usr/lib/libc.2
          #2 0xc01e4aa0 in abort_C () from /usr/lib/libc.2
          #3 0xc01e4afc in abort () from /usr/lib/libc.2
          #4 0x1ed4f4 in removeEntry ()
          #5 0x1edb54 in sgFree ()
          #6 0x12e978 in pm_check_subnet6_status ()
          #7 0x12d980 in pm_owner_eval ()
          #8 0x12aa2c in pm_subnet_check ()
          #9 0x129910 in pm_subnet_status_event ()
          #10 0x10176c in pm_status_event ()
          #11 0x101aa8 in pm_event_handler ()
          #12 0x1c7814 in cl_event_loop ()
          #13 0xc004b168 in __pthread_body () from /usr/lib/libpthread.1
          #14 0xc00549ec in __pthread_start () from /usr/lib/libpthread.1
    14. "Failed to release device in volume group : No such device or address.." error messages are printed to syslog by cmclconfd when cmcheckconf, cmapplyconf, cmgetconf or cmquerycl is issued.
        One possible side effect is that syslog may report at cluster start that the cluster lock is not initialized, even though earlier, after cmapplyconf, it reported that the cluster lock was already initialized. Another symptom is that subsequent attempts to create or import a VG can fail.
    15. In a few cases, when DLPI primitives fail to complete and Serviceguard configuration commands such as cmquerycl, cmcheckconf, or cmapplyconf fail as a result, end users see no error messages other than non-specific indications of network driver problems.
    16. cmquerycl hangs for 30 seconds when a TEAC DV-28E-B DVD drive is installed on the system.
    17. cmcld may hang in an accept() call on a remote communications socket if the socket pops but there is no connection to accept. If the cluster contains more than one node at the time, the problem node may TOC. If the cluster contains only one node at the time, cmcld may hang, commands may hang, and other nodes may be unable to join the cluster. A typical stack trace of cmcld obtained during the hang may look like:
          Thread 29 (user thread (2654, 28)):
          #0 0x1e0150 in cma__dispatch ()
          #1 0x1dfe5c in cma__block ()
          #2 0x1d7f0c in cma__int_wait ()
          #3 0x1f3418 in cma__io_wait ()
          #4 0x1bfe04 in cma_accept ()
          #5 0xc0213a28 in accept () from /usr/lib/libc.2
          #6 0x155468 in sg_accept ()
          #7 0xab5e4 in cl_comm_ip_accept ()
          #8 0xa3480 in cl_comm_ip_loop ()
          #9 0x1ee32c in cma__thread_base ()
          #10 0x1f0b00 in cma__thread_start1 ()
          #11 0x1f0598 in cma__thread_start0_PA20 ()
    18. On HP-UX 11.11 systems with the IPv6 bundle installed and EMS version A.04.00 used for cluster-configured EMS resources, cmcld might report an external error in syslog during cluster start, during cluster reformation, or during startup of a package that uses EMS resources:
          cmcld: External error - Unable to connect to EMS registrar (An error occurred while trying to connect to a remote system: Invalid argument)

Defect Description:

PHSS_30370:
    1. On reboot, the cluster restart is handled by the rc script cmcluster.init, which runs cmrunnode and looks for the string "Trying again may succeed" in the cmrunnode output to trigger a retry. The message did not include this string, although it is clearly a transient, retryable event.
       Resolution: Added "Trying again may succeed." to the error message.
    2. In this case the cmhaltpkg command was in progress, waiting for the package control script to finish. If a reconfiguration happens during that time, the command does not know whether the control script halted the package successfully. The retry to learn the status may find that the package is not running even though it halted successfully, so the command exits with an error.
       Resolution: Because this is a race condition involving several concurrent events, the error cannot be avoided in all cases. Instead, a clearer message is displayed when this condition is encountered.
    3. When Serviceguard probes disks during the configuration process, it queries the I/O interface of each disk and expects only the type "INTERFACE". When the type is something else ("VIRTBUS" in this case), Serviceguard tries to move up and query the parent node in the I/O tree, but fails.
       Resolution: Serviceguard now recognizes the type "VIRTBUS".
    4. When a package starts to halt, the package control script creates a temporary file on the root filesystem to save the PIDs of vgchange and umount. If the root filesystem is full, the temporary file cannot be created and the package fails to halt.
       Resolution: The PID information is kept in local variables instead of a temporary file.
    5. When the subagent prepares to get new Serviceguard configuration information, it deletes the entire package structure before freeing the dependencies and storage group lists.
       Resolution: Free the package dependencies and storage group lists first, before freeing the package structure.
    6. cmcld opens a physical link so that, before cluster lock acquisition, a bus reset can be done to clear any pending I/O. For a Fibre Channel cluster lock disk, this open returns successfully, but the disk can be left partially open (its LUN size is 0), and all subsequent attempts by cmcld to access the disk fail. To clear the condition at the HP-UX level, the device needs to be closed by the processes that have it open.
       Resolution: Bus resets are not supported for Fibre Channel storage, so cmcld no longer opens Fibre Channel cluster lock devices.
    7. In a 2-node Serviceguard cluster using a serial heartbeat link, one code path references an uninitialized pointer, which can cause cmcld to abort. The code path is executed only if a serial heartbeat is configured.
       Resolution: Removed the uninitialized pointer reference; another variable that serves the same purpose is used instead.

PHSS_30087:
    1. Unreliable network response may leave network connections in an inconsistent state and cause cmcld to loop checking for incoming data without actually reading it.
       Resolution: cmcld no longer loops checking for incoming data, and inconsistent network connections are cleaned up.
    2. Due to a logic error, a node may discard a message on all its connections and never reply to the sender.
       Resolution: Fixed the logic error.
    3. Because of a logic error and a timing-sensitive issue, only stationary IP addresses are switched to the second standby interface, while relocatable IP addresses remain on the primary interface after the shared network switch is powered down.
       Resolution: Fixed the logic error so that Serviceguard switches all IP addresses to the second standby properly.
    4. HyperFabric interfaces have the same PPA values as Ethernet interfaces, which causes the linkloop command used in cmscancl to give incorrect results. clic0 (HyperFabric interface) has PPA 0, the same as lan0 (Ethernet interface).
       linkloop uses the card PPA number and so cannot distinguish between these cards.
       Resolution: Skip the network connectivity check for non-LAN hardware (e.g. HyperFabric, ATM), if present, because the linkloop command is supported only for LAN hardware.
    5. The cmclconfd and cmcld daemons recalculate the number of file descriptors for their own environment and set the new limit in the system (currently 1024, set by cmclconfd). The same limit is passed from cmclconfd to cmcld and from cmcld to applications started by Serviceguard, which causes the problem.
       Resolution: Although cmclconfd and cmcld still recalculate and set their own file descriptor limit, the original value is restored for child processes so that they get the original system limit.

PHSS_29901:
    1. cmmakepkg puts only the Serviceguard revision (e.g. 11.15), not the patch ID, in package control scripts.
       Resolution: Changed cmmakepkg to put the patch ID in package control scripts.
    2. The commands do not check for the illegal configuration described in the symptom.
       Resolution: Changed the commands to verify that all connections that can communicate at the IP level can also communicate at the data link level.
    3. Serviceguard does not set the keepalive option on its connections, which would detect stale connections after a certain period of inactivity.
       Resolution: Set the SO_KEEPALIVE option on all connections.
    4. cmcheckconf and cmapplyconf do not recognize that a node being added to a cluster is already a member of another cluster.
       Resolution: The commands now keep track of the cluster, if any, to which each node in the ASCII file currently belongs, so that they can report the error.
    5. The Serviceguard daemon, cmcld, was not prepared to handle the EAGAIN error returned when the kernel temporarily runs out of resources, which caused the abort.
       Resolution: The daemon is now more resilient to transient errors such as EAGAIN when it fails to receive UDP data.
       It keeps running instead of aborting when such a temporary error occurs.
    6. The node ID of the node being deleted is not properly removed from the cluster, so other nodes make unnecessary references to the deleted node.
       Resolution: Changed the command to cleanly remove the node ID of the deleted node.
    7. When halting the cmtaped service, Serviceguard expects the routine that does the halting to post an event, so after calling the routine it deletes the event without checking whether the routine returned successfully.
       Resolution: If the routine returns an error, log an error message and do not delete the event.
    8. While a node is joining a 2-node cluster, a kernel hang occurs on the coordinator node, or a network partition separates the 2 non-joining nodes. The non-coordinator node gets the cluster lock and forms a 1-node cluster. When the coordinator node resumes execution, a logic error allows it to set or clear the cluster lock and form a 2-node cluster with the joining node.
       Resolution: Fixed the logic error and added assertions to ensure that the same kind of error is not reintroduced in the future.
    9. While a redundant heartbeat TCP connection is being re-initialized, another heartbeat connection is closed, and the cleanup logic did not handle this combination of events correctly.
       Resolution: Fixed the logic error.
    10. The Serviceguard manual gives the limit on CLUSTER_NAME and some other parameters as 40 characters. Internally, strcpy is used to copy these parameters into Serviceguard data structures; strcpy does not check the length of its destination, so a longer value is not handled correctly and the command returns an inappropriate error message.
        Resolution: Enforce the string length while reading the ASCII file, and print an error if a value exceeds the maximum allowed length.
    11. During heartbeat exchange, other messages, such as health messages, are also exchanged.
        If an error is encountered while processing such a message, the error is returned to the sender. Because the IP network and the serial link run at different speeds, an error returned over the serial link was delayed while another message on the IP network arrived first. This created an out-of-order sequence, and cmcld aborted.
        Resolution: Only heartbeat traffic is now sent over the serial link; all other traffic uses the IP network.
    12. vxclustd does not support concurrent activation of disk groups (dgs), and doing so can cause problems.
        Resolution: Removed the CONCURRENT_DISKGROUP_OPERATIONS option from the package control script, so that dgs are activated sequentially.
    13. During the hourly health check, cmcld does detect that the cluster lock disk is returning NOT_READY (EBUSY), but it logs no message about the problem at the default level. A message is already logged at a higher logging level, so it can be seen if tuning is on.
        Resolution: Log the message at the default level so that it appears in syslog and the user can correct the problem early.
    14. The cmcld daemon recalculates the number of file descriptors for its own environment and sets the new limit in the system (currently 2048, set by cmcld). The same limit is passed to processes started from cmcld, which causes the problem.
        Resolution: Although cmcld still recalculates and sets its own file descriptor limit, the original value is restored by cmsrvassistd so that child processes get the original system limit.
    15. The cmquerycl man page and the cluster ASCII file comments do not give enough information about QS_TIMEOUT_EXTENSION, and the upper limit of this value is not checked.
        Resolution: Added comments to the cluster ASCII file and the cmquerycl man page stating that the recommended value for QS_TIMEOUT_EXTENSION is 0 and the maximum supported value is 5 minutes.
        Also added a check to make sure that the value is within the limit.
    16. If a Serviceguard cluster is configured with packages using CVM dgs, the CVM daemon vxclustd is started under the system multi-node package VxVM-CVM-pkg. During cmhaltcl or cmhaltnode, this package is halted before the cluster halts. If some dgs are still active, because they were activated manually outside a package or a package did not deactivate them properly, the halt of vxclustd fails. This results in a VxVM-CVM-pkg halt failure, and because this system multi-node package has NODE_FAIL_FAST_ENABLED set, the node will TOC.
        Resolution: Before VxVM-CVM-pkg is actually halted, a preshutdown script is run against vxclustd to make sure that no dgs are active. If any are, appropriate error messages are logged and the command fails; otherwise the halt proceeds. The script is provided by Veritas in VxVM patch PHCO_29600 and is located at:
          /etc/cmcluster/cvm/VxVM-CVM-pkg-preshutdown.sh
        If the script does not exist on the system, the original behaviour is retained.
    17. The stdout/console messages displayed by cmquerycl do not say what kind of network probing is done.
        Resolution: The command now prints what kind of network probing is done, and the cmquerycl man page is updated to describe which kind of probing is done when.

PHSS_29053:
    1. Node information is often obtained differently from package information, so inconsistent data may be displayed.
       Resolution: Added extra checks to make sure that the information displayed for nodes and packages is consistent.
    2. Because of dependencies involving CVM disk groups, cmapplyconf must process the VxVM-CVM-pkg package before the cluster configuration and/or any failover packages that use CVM disk groups.
If cmapplyconf is invoked with the VxVM-CVM-pkg package together with the cluster configuration and/or other failover packages, the VxVM-CVM-pkg package is not guaranteed to be processed first. This may produce the aforementioned error.
Resolution: New restrictions have been imposed on cmapplyconf. The user may invoke cmapplyconf in one of two ways:
- with the VxVM-CVM-pkg package alone, or
- with the cluster configuration and/or failover packages, but without the VxVM-CVM-pkg package.
The cmapplyconf man page has also been updated to reflect these restrictions.

3. The API that retrieves cluster configuration information checks whether the configuration version has changed since the lookups occurred. On an SGeRAC cluster, cmgmsd transactions can slip in and increase the version, causing the API call to fail.
Resolution: This check is turned off for realtime API clients (of which vxclustd is the only one). This is safe because vxclustd does not depend on whether the cmgmsd part of the CDB tree has been modified.

4. The user may ignore the warning message and perform an online change that deletes a package with EMS resources while cmhaltnode is in the middle of shutting down a node. If this happens, data in cmcld may be corrupted, leading to a cmcld abort and core dump. The cause was a critical region that two threads could access simultaneously without mutex protection.
Resolution: The code has been modified so that only one thread at a time can access the critical region.

5. cmapplyconf does not check that at least one RESOURCE_UP_VALUE criterion is defined for each resource. When a resource is configured with no criteria, invalid information is stored in cmcld, and later comparison operations on it are fatal.
Resolution: cmapplyconf now checks that at least one RESOURCE_UP_VALUE criterion is defined for each resource.

6. During a node shutdown, the halting node postponed status update requests. If the shutdown was then aborted, the node dumped core.
Resolution: The code now processes status update requests even while the node is shutting down.

7. A resource may become available during an online delete of a package while the package is runnable on a node. In that case cmcld may abort after the package is deleted, because it did not handle the notify ownership request correctly.
Resolution: The code now handles the notify message from the owner correctly during an online package delete operation.

8. The convert utility converts old ServiceGuard binary configuration files into the new format. A defect in this utility causes the conversion to fail, and as a result the cmrunnode command fails. If convert is invoked manually, it also fails, without reporting any error.
Resolution: The defect in the convert utility has been fixed.

9. With the -k option, the cmcheckconf/cmapplyconf commands send all nodes a list of volume groups to verify (the volume groups listed in the cluster ASCII file). If a volume group is not present on one of the nodes, the command treats this as an error and fails.
Resolution: cmcheckconf and cmapplyconf no longer fail when volume groups listed in the cluster ASCII file are missing from some nodes. Instead, they configure those volume groups in the cluster correctly, as is done without the -k option.

10. The local communications socket is blocking. If the socket pops but there is no connection to accept, cmcld hangs in accept().
Resolution: The socket has been changed to be non-blocking.

11. During cmcld startup, the IPv6 data structures were not consulted, so no IPv6 information was printed.
Resolution: At startup, cmcld now reads the IPv6 data structures to determine whether any IPv6 information needs to be logged to syslog.

12. If no IPv6 addresses are configured on the node, the code dereferences an uninitialized pointer while forming a log message. Accessing this invalid address may cause a core dump, and may also corrupt memory.
Resolution: Initialized pointers are now used to access the data and form the log message. This avoids core dumps and memory corruption.

13. Under an error condition, the output parameters of a function were not handled correctly. This led to the same memory being freed twice (a double free).
Resolution: The parameters are now initialized appropriately, and the error condition is handled.

14. During the device query process of cmcheckconf/cmapplyconf, the ServiceGuard config daemon, cmclconfd, tries to detach physical volume groups. To do this, it passes an array of physical volume names to an LVM library function. Later, cmclconfd frees the memory allocated for the array, but the LVM library keeps using it. The resulting memory corruption causes LVM to detach from an incorrect list of physical volumes and therefore to fail to release the volume group. As a side effect, in some circumstances the cluster lock is not initialized properly.
Resolution: The LVM library now makes a copy of the physical volume data rather than keeping a pointer to the existing data.

15. In a few cases where DLPI primitives fail to complete, the Serviceguard config daemon, cmclconfd, does not log the DLPI errors. This makes problems harder to debug with the Serviceguard production binaries.
Resolution: The config daemon now logs these DLPI error messages.

16. The Serviceguard config daemon, cmclconfd, does not recognize the TEAC DV-28E-B as a read-only device. It hangs for 30 seconds while trying to open the device, until the open times out.
Resolution: The config daemon now filters out the TEAC DV-28E-B before disk probing.

17. The remote communications sockets are blocking. If one pops but there is no connection to accept, cmcld hangs.
Resolution: The sockets have been changed to be non-blocking.

18. When cmcld contacts EMS for resource status, EMS returns an error, which cmcld prints. The real problem is therefore in EMS with the IPv6 bundle.
JAGae73291, filed against EMS, explains the problem in more detail.
Resolution: Upgrade EMS to version A.04.00.01, which does not exhibit this problem. Alternatively, for EMS version A.04.00, apply the following workaround to the /etc/inetd.conf file. Locate the following entry in /etc/inetd.conf:
    registrar stream tcp nowait root /etc/opt/resmon/lbin/registrar /etc/opt/resmon/lbin/registrar
and change it to:
    registrar stream tcp6 nowait root /etc/opt/resmon/lbin/registrar /etc/opt/resmon/lbin/registrar
Then reboot the machine or restart inetd by executing 'inetd -c'.

Enhancement: No

SR:
    8606294079 8606307146 8606307795 8606309796 8606316216
    8606302528 8606304014 8606304286 8606308755 8606321527
    8606309795 8606312418 8606314968 8606326283 8606310984
    8606309783 8606329554 8606329445 8606325992 8606320321
    8606331371 8606326802 8606330137 8606329294 8606322722
    8606329533 8606331747 8606331750 8606318010 8606323190
    8606327010 8606321422 8606333423 8606331731 8606307156
    8606311351 8606331371 8606335370 8606337577 8606330279
    8606340496 8606343060 8606343063 8606345054 8606339173
    8606323570 8606351564 8606299501 8606319400 8606352983

Patch Files:
    Cluster-Monitor.CM-CORE,fr=A.11.15.00,fa=HP-UX_B.11.11_32/64,v=HP:
        /usr/lbin/cmclconfd
        /usr/lbin/cmcld
        /usr/lbin/cmsnmpd
        /usr/lbin/cmsrvassistd
        /usr/lib/libsgcl.2
        /usr/sbin/cmapplyconf
        /usr/sbin/cmcheckconf
        /usr/sbin/cmdeleteconf
        /usr/sbin/cmgetconf
        /usr/sbin/cmhaltcl
        /usr/sbin/cmhaltnode
        /usr/sbin/cmquerycl
        /usr/sbin/cmruncl
        /usr/sbin/cmrunnode
        /usr/sbin/cmscancl
        /usr/sbin/cmviewcl
        /usr/sbin/convert
    Package-Manager.CM-PKG,fr=A.11.15.00,fa=HP-UX_B.11.11_32/64,v=HP:
        /usr/sbin/cmhaltpkg
        /usr/sbin/cmhaltserv
        /usr/sbin/cmmakepkg
        /usr/sbin/cmmigrate
        /usr/sbin/cmmodnet
        /usr/sbin/cmmodpkg
        /usr/sbin/cmrunpkg
        /usr/sbin/cmrunserv
        /usr/sbin/cmstartres
        /usr/sbin/cmstopres
    Cluster-Monitor.CM-CORE-MAN,fr=A.11.15.00,fa=HP-UX_B.11.11_32/64,v=HP:
        /usr/share/man/man1m.Z/cmapplyconf.1m
        /usr/share/man/man1m.Z/cmquerycl.1m
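The QS_TIMEOUT_EXTENSION limit described in item 15 of the first defect list above (recommended value 0, maximum supported value 5 minutes) can be checked before running cmcheckconf. The following is a minimal sketch and is not part of the patch. The function names are hypothetical. It assumes that the cluster ASCII file gives this value in microseconds, like the other Serviceguard timing parameters, so 5 minutes is taken to be 300000000. Verify that unit against your cmquerycl man page.

```shell
#!/bin/sh
# Hypothetical pre-check for QS_TIMEOUT_EXTENSION (patch text: recommended 0,
# maximum supported value 5 minutes).
# Assumption: the cluster ASCII file gives this value in microseconds.
QS_EXT_MAX=300000000    # 5 min = 300 s = 300000000 microseconds

# Print the QS_TIMEOUT_EXTENSION value from a cluster ASCII file.
# Comment lines start with "#", so their first field never matches.
qs_ext_from_file() {
    awk '$1 == "QS_TIMEOUT_EXTENSION" { print $2; exit }' "$1"
}

# Return 0 if the value is within the limit, 1 if too large, 2 if malformed.
check_qs_timeout_extension() {
    value=$1
    case $value in
        ''|*[!0-9]*)
            echo "QS_TIMEOUT_EXTENSION: not a non-negative integer: '$value'" >&2
            return 2 ;;
    esac
    if [ "$value" -gt "$QS_EXT_MAX" ]; then
        echo "QS_TIMEOUT_EXTENSION: $value exceeds the 5-minute maximum" >&2
        return 1
    fi
    # Accept non-zero values within the limit, but point out the recommendation.
    [ "$value" -eq 0 ] || echo "QS_TIMEOUT_EXTENSION: $value accepted; 0 is recommended" >&2
    return 0
}
```

For example, run check_qs_timeout_extension "$(qs_ext_from_file cluster.ascii)", where cluster.ascii is a placeholder for your cluster ASCII file. The patched commands perform their own range check, so this sketch only surfaces a bad value earlier.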
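The fix in item 16 of the first defect list above depends on the Veritas preshutdown script from VxVM patch PHCO_29600. Without that script, the original VxVM-CVM-pkg halt behaviour applies. The sketch below is one quick way to confirm, on each node, that the script is in place. The function name is hypothetical. Treating "installed" as "a regular file with execute permission" is an assumption; this patch text does not specify it.

```shell
#!/bin/sh
# Hypothetical helper: report whether the Veritas preshutdown script used by
# the VxVM-CVM-pkg halt (delivered in VxVM patch PHCO_29600) is installed.
check_cvm_preshutdown() {
    script=${1:-/etc/cmcluster/cvm/VxVM-CVM-pkg-preshutdown.sh}
    # Assumption: "installed" means a regular file with execute permission.
    if [ -f "$script" ] && [ -x "$script" ]; then
        echo "present: $script"
        return 0
    fi
    echo "missing: $script (original VxVM-CVM-pkg halt behaviour applies)" >&2
    return 1
}
```

Called with no argument, it checks the documented location /etc/cmcluster/cvm/VxVM-CVM-pkg-preshutdown.sh. It returns non-zero if the script is absent, so it can be used in a loop over the cluster nodes.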
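Item 2 of the PHSS_29053 list restricts how cmapplyconf may be invoked. It may be run with VxVM-CVM-pkg on its own, or with the cluster configuration and/or failover packages but without VxVM-CVM-pkg. The hypothetical wrapper below inspects only the -C and -P arguments a user intends to pass, and rejects the mixed form. It is a sketch, not the patch's own check. It assumes that the VxVM-CVM-pkg ASCII file can be recognised because its path contains the string VxVM-CVM-pkg, which may not hold at every site.

```shell
#!/bin/sh
# Hypothetical pre-check mirroring the cmapplyconf restriction added in
# PHSS_29053: VxVM-CVM-pkg must be applied on its own, never together with
# a cluster file (-C) or other package files (-P).
# Assumption: the VxVM-CVM-pkg ASCII file is recognised by its path name.
check_cmapplyconf_args() {
    cvm=0     # set when a -P argument names the VxVM-CVM-pkg file
    other=0   # number of -C files and non-CVM -P files
    while [ $# -gt 0 ]; do
        case $1 in
            -C) other=$((other + 1)) ;;
            -P) case ${2-} in
                    *VxVM-CVM-pkg*) cvm=1 ;;
                    *) other=$((other + 1)) ;;
                esac ;;
        esac
        shift
    done
    if [ "$cvm" -eq 1 ] && [ "$other" -gt 0 ]; then
        echo "apply VxVM-CVM-pkg in a separate cmapplyconf run" >&2
        return 1
    fi
    return 0
}
```

Run it with the arguments you plan to give cmapplyconf. For example, check_cmapplyconf_args -C cluster.ascii -P pkg1.conf passes. Adding a VxVM-CVM-pkg file to that same list fails, and that file should then be applied in its own cmapplyconf run. The file names here are placeholders.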
what(1) Output:
    Package-Manager.CM-PKG,fr=A.11.15.00,fa=HP-UX_B.11.11_32/64,v=HP:
    /usr/sbin/cmhaltpkg:
        HP92453-02A.11.00 HP-UX SYMBOLIC DEBUGGER (END.O ILP 32) $Revision: 75.02 $
        Build date: Thu Mar 11 16:27:34 PST 2004
        Build id: ibld_sg_a1115patch_1111_product
        Build platform: hpux
        Cluster Monitor Product $Revision: 82.2 $
        Cluster Monitor Product Only $Revision: 82.2 $
        A.11.15.00 Date: 03/11/04 Patch: PHSS_30370
    /usr/sbin/cmhaltserv:
        HP92453-02A.11.00 HP-UX SYMBOLIC DEBUGGER (END.O ILP 32) $Revision: 75.02 $
        Build date: Thu Mar 11 16:27:34 PST 2004
        Build id: ibld_sg_a1115patch_1111_product
        Build platform: hpux
        Cluster Monitor Product $Revision: 82.2 $
        Cluster Monitor Product Only $Revision: 82.2 $
        A.11.15.00 Date: 03/11/04 Patch: PHSS_30370
    /usr/sbin/cmmakepkg:
        HP92453-02A.11.00 HP-UX SYMBOLIC DEBUGGER (END.O ILP 32) $Revision: 75.02 $
        Build date: Thu Mar 11 16:27:34 PST 2004
        Build id: ibld_sg_a1115patch_1111_product
        Build platform: hpux
        Cluster Monitor Product $Revision: 82.2 $
        Cluster Monitor Product Only $Revision: 82.2 $
        A.11.15.00 Date: 03/11/04 Patch: PHSS_30370
    /usr/sbin/cmmigrate:
        HP92453-02A.11.00 HP-UX SYMBOLIC DEBUGGER (END.O ILP 32) $Revision: 75.02 $
        Build date: Thu Mar 11 16:27:34 PST 2004
        Build id: ibld_sg_a1115patch_1111_product
        Build platform: hpux
        Cluster Monitor Product $Revision: 82.2 $
        Cluster Monitor Product Only $Revision: 82.2 $
        A.11.15.00 Date: 03/11/04 Patch: PHSS_30370
    /usr/sbin/cmmodnet:
        HP92453-02A.11.00 HP-UX SYMBOLIC DEBUGGER (END.O ILP 32) $Revision: 75.02 $
        Build date: Thu Mar 11 16:27:34 PST 2004
        Build id: ibld_sg_a1115patch_1111_product
        Build platform: hpux
        Cluster Monitor Product $Revision: 82.2 $
        Cluster Monitor Product Only $Revision: 82.2 $
        A.11.15.00 Date: 03/11/04 Patch: PHSS_30370
    /usr/sbin/cmmodpkg:
        HP92453-02A.11.00 HP-UX SYMBOLIC DEBUGGER (END.O ILP 32) $Revision: 75.02 $
        Build date: Thu Mar 11 16:27:34 PST 2004
        Build id: ibld_sg_a1115patch_1111_product
        Build platform: hpux
        Cluster Monitor Product $Revision: 82.2 $
        Cluster Monitor Product Only $Revision: 82.2 $
        A.11.15.00 Date: 03/11/04 Patch: PHSS_30370
    /usr/sbin/cmrunpkg:
        HP92453-02A.11.00 HP-UX SYMBOLIC DEBUGGER (END.O ILP 32) $Revision: 75.02 $
        Build date: Thu Mar 11 16:27:34 PST 2004
        Build id: ibld_sg_a1115patch_1111_product
        Build platform: hpux
        Cluster Monitor Product $Revision: 82.2 $
        Cluster Monitor Product Only $Revision: 82.2 $
        A.11.15.00 Date: 03/11/04 Patch: PHSS_30370
    /usr/sbin/cmrunserv:
        HP92453-02A.11.00 HP-UX SYMBOLIC DEBUGGER (END.O ILP 32) $Revision: 75.02 $
        Build date: Thu Mar 11 16:27:34 PST 2004
        Build id: ibld_sg_a1115patch_1111_product
        Build platform: hpux
        Cluster Monitor Product $Revision: 82.2 $
        Cluster Monitor Product Only $Revision: 82.2 $
        A.11.15.00 Date: 03/11/04 Patch: PHSS_30370
    /usr/sbin/cmstartres:
        HP92453-02A.11.00 HP-UX SYMBOLIC DEBUGGER (END.O ILP 32) $Revision: 75.02 $
        Build date: Thu Mar 11 16:27:34 PST 2004
        Build id: ibld_sg_a1115patch_1111_product
        Build platform: hpux
        Cluster Monitor Product $Revision: 82.2 $
        Cluster Monitor Product Only $Revision: 82.2 $
        A.11.15.00 Date: 03/11/04 Patch: PHSS_30370
    /usr/sbin/cmstopres:
        HP92453-02A.11.00 HP-UX SYMBOLIC DEBUGGER (END.O ILP 32) $Revision: 75.02 $
        Build date: Thu Mar 11 16:27:34 PST 2004
        Build id: ibld_sg_a1115patch_1111_product
        Build platform: hpux
        Cluster Monitor Product $Revision: 82.2 $
        Cluster Monitor Product Only $Revision: 82.2 $
        A.11.15.00 Date: 03/11/04 Patch: PHSS_30370
    Cluster-Monitor.CM-CORE,fr=A.11.15.00,fa=HP-UX_B.11.11_32/64,v=HP:
    /usr/lbin/cmclconfd:
        HP92453-02A.11.00 HP-UX SYMBOLIC DEBUGGER (END.O ILP 32) $Revision: 75.02 $
        Build date: Thu Mar 11 16:17:11 PST 2004
        Build id: ibld_sg_a1115patch_1111_product
        Build platform: hpux
        Config Daemon Cluster Monitor Product $Revision: 82.2 $
        Cluster Monitor Product Only $Revision: 82.2 $
        A.11.15.00 Date: 03/11/04 Patch: PHSS_30370
    /usr/lbin/cmcld:
        HP92453-02A.11.00 HP-UX SYMBOLIC DEBUGGER (END.O ILP 32) $Revision: 75.02 $
        Build date: Thu Mar 11 16:26:15 PST 2004
        Build id: ibld_sg_a1115patch_1111_product
        Build platform: hpux
        A.11.15.00 Date: 03/11/04 Patch: PHSS_30370
        Daemon Cluster Monitor Product $Revision: 82.2 $
        Cluster Monitor Product Only $Revision: 82.2 $
    /usr/lbin/cmsnmpd:
        HP92453-02A.11.00 HP-UX SYMBOLIC DEBUGGER (END.O ILP 32) $Revision: 75.02 $
        Copyright 1992-2001 SNMP Research, Incorporated
        SNMP Research Distribution version 15.3.1.0
        Copyright 1992-2001 SNMP Research, Incorporated
        SNMP Research Distribution version 15.3.1.0
        Copyright 1992-2001 SNMP Research, Incorporated
        SNMP Research Distribution version 15.3.1.0
        Copyright 1992-2001 SNMP Research, Incorporated
        SNMP Research Distribution version 15.3.1.0
        Copyright 1992-2001 SNMP Research, Incorporated
        SNMP Research Distribution version 15.3.1.0
        Copyright 1992-2001 SNMP Research, Incorporated
        SNMP Research Distribution version 15.3.1.0
        Copyright 1992-2001 SNMP Research, Incorporated
        SNMP Research Distribution version 15.3.1.0
        Build date: Thu Mar 11 16:29:31 PST 2004
        Build id: ibld_sg_a1115patch_1111_product
        Build platform: hpux
        A.11.15.00 Date: 03/11/04 Patch: PHSS_30370
        Cluster Monitor Product $Revision: 82.2 $
        Cluster Monitor Product Only $Revision: 82.2 $
        Copyright 1992-2001 SNMP Research, Incorporated
        SNMP Research Distribution version 15.3.1.0
        Copyright 1992-2001 SNMP Research, Incorporated
        SNMP Research Distribution version 15.3.1.0
        Copyright 1992-2001 SNMP Research, Incorporated
        SNMP Research Distribution version 15.3.1.0
        Copyright 1992-2001 SNMP Research, Incorporated
        SNMP Research Distribution version 15.3.1.0
    /usr/lbin/cmsrvassistd:
        HP92453-02A.11.00 HP-UX SYMBOLIC DEBUGGER (END.O ILP 32) $Revision: 75.02 $
        Build date: Thu Mar 11 16:28:12 PST 2004
        Build id: ibld_sg_a1115patch_1111_product
        Build platform: hpux
        A.11.15.00 Date: 03/11/04 Patch: PHSS_30370
    /usr/lib/libsgcl.2:
        Cluster Monitor Product $Revision: 82.2 $
        Cluster Monitor Product Only $Revision: 82.2 $
        Build date: Thu Mar 11 16:37:09 PST 2004
        Build id: ibld_sg_a1115patch_1111_product
        Build platform: hpux
    /usr/sbin/cmapplyconf:
        HP92453-02A.11.00 HP-UX SYMBOLIC DEBUGGER (END.O ILP 32) $Revision: 75.02 $
        Build date: Thu Mar 11 16:27:34 PST 2004
        Build id: ibld_sg_a1115patch_1111_product
        Build platform: hpux
        Cluster Monitor Product $Revision: 82.2 $
        Cluster Monitor Product Only $Revision: 82.2 $
        A.11.15.00 Date: 03/11/04 Patch: PHSS_30370
    /usr/sbin/cmcheckconf:
        HP92453-02A.11.00 HP-UX SYMBOLIC DEBUGGER (END.O ILP 32) $Revision: 75.02 $
        Build date: Thu Mar 11 16:27:34 PST 2004
        Build id: ibld_sg_a1115patch_1111_product
        Build platform: hpux
        Cluster Monitor Product $Revision: 82.2 $
        Cluster Monitor Product Only $Revision: 82.2 $
        A.11.15.00 Date: 03/11/04 Patch: PHSS_30370
    /usr/sbin/cmdeleteconf:
        HP92453-02A.11.00 HP-UX SYMBOLIC DEBUGGER (END.O ILP 32) $Revision: 75.02 $
        Build date: Thu Mar 11 16:27:34 PST 2004
        Build id: ibld_sg_a1115patch_1111_product
        Build platform: hpux
        Cluster Monitor Product $Revision: 82.2 $
        Cluster Monitor Product Only $Revision: 82.2 $
        A.11.15.00 Date: 03/11/04 Patch: PHSS_30370
    /usr/sbin/cmgetconf:
        HP92453-02A.11.00 HP-UX SYMBOLIC DEBUGGER (END.O ILP 32) $Revision: 75.02 $
        Build date: Thu Mar 11 16:27:34 PST 2004
        Build id: ibld_sg_a1115patch_1111_product
        Build platform: hpux
        Cluster Monitor Product $Revision: 82.2 $
        Cluster Monitor Product Only $Revision: 82.2 $
        A.11.15.00 Date: 03/11/04 Patch: PHSS_30370
    /usr/sbin/cmhaltcl:
        HP92453-02A.11.00 HP-UX SYMBOLIC DEBUGGER (END.O ILP 32) $Revision: 75.02 $
        Build date: Thu Mar 11 16:27:34 PST 2004
        Build id: ibld_sg_a1115patch_1111_product
        Build platform: hpux
        Cluster Monitor Product $Revision: 82.2 $
        Cluster Monitor Product Only $Revision: 82.2 $
        A.11.15.00 Date: 03/11/04 Patch: PHSS_30370
    /usr/sbin/cmhaltnode:
        HP92453-02A.11.00 HP-UX SYMBOLIC DEBUGGER (END.O ILP 32) $Revision: 75.02 $
        Build date: Thu Mar 11 16:27:34 PST 2004
        Build id: ibld_sg_a1115patch_1111_product
        Build platform: hpux
        Cluster Monitor Product $Revision: 82.2 $
        Cluster Monitor Product Only $Revision: 82.2 $
        A.11.15.00 Date: 03/11/04 Patch: PHSS_30370
    /usr/sbin/cmquerycl:
        HP92453-02A.11.00 HP-UX SYMBOLIC DEBUGGER (END.O ILP 32) $Revision: 75.02 $
        Build date: Thu Mar 11 16:27:34 PST 2004
        Build id: ibld_sg_a1115patch_1111_product
        Build platform: hpux
        Cluster Monitor Product $Revision: 82.2 $
        Cluster Monitor Product Only $Revision: 82.2 $
        A.11.15.00 Date: 03/11/04 Patch: PHSS_30370
    /usr/sbin/cmruncl:
        HP92453-02A.11.00 HP-UX SYMBOLIC DEBUGGER (END.O ILP 32) $Revision: 75.02 $
        Build date: Thu Mar 11 16:27:34 PST 2004
        Build id: ibld_sg_a1115patch_1111_product
        Build platform: hpux
        Cluster Monitor Product $Revision: 82.2 $
        Cluster Monitor Product Only $Revision: 82.2 $
        A.11.15.00 Date: 03/11/04 Patch: PHSS_30370
    /usr/sbin/cmrunnode:
        HP92453-02A.11.00 HP-UX SYMBOLIC DEBUGGER (END.O ILP 32) $Revision: 75.02 $
        Build date: Thu Mar 11 16:27:34 PST 2004
        Build id: ibld_sg_a1115patch_1111_product
        Build platform: hpux
        Cluster Monitor Product $Revision: 82.2 $
        Cluster Monitor Product Only $Revision: 82.2 $
        A.11.15.00 Date: 03/11/04 Patch: PHSS_30370
    /usr/sbin/cmscancl:
        None
    /usr/sbin/cmviewcl:
        HP92453-02A.11.00 HP-UX SYMBOLIC DEBUGGER (END.O ILP 32) $Revision: 75.02 $
        Build date: Thu Mar 11 16:27:34 PST 2004
        Build id: ibld_sg_a1115patch_1111_product
        Build platform: hpux
        Cluster Monitor Product $Revision: 82.2 $
        Cluster Monitor Product Only $Revision: 82.2 $
        A.11.15.00 Date: 03/11/04 Patch: PHSS_30370
    /usr/sbin/convert:
        HP92453-02A.11.00 HP-UX SYMBOLIC DEBUGGER (END.O ILP 32) $Revision: 75.02 $
        Build date: Thu Mar 11 16:30:59 PST 2004
        Build id: ibld_sg_a1115patch_1111_product
        Build platform: hpux
        Cluster Monitor Product $Revision: 82.2 $
        Cluster Monitor Product Only $Revision: 82.2 $
        A.11.15.00 Date: 03/11/04 Patch: PHSS_30370
    Cluster-Monitor.CM-CORE-MAN,fr=A.11.15.00,fa=HP-UX_B.11.11_32/64,v=HP:
    /usr/share/man/man1m.Z/cmapplyconf.1m:
        None
    /usr/share/man/man1m.Z/cmquerycl.1m:
        None

cksum(1) Output:
    Package-Manager.CM-PKG,fr=A.11.15.00,fa=HP-UX_B.11.11_32/64,v=HP:
    920987119 3058200 /usr/sbin/cmhaltpkg
    920987119 3058200 /usr/sbin/cmhaltserv
    920987119 3058200 /usr/sbin/cmmakepkg
    920987119 3058200 /usr/sbin/cmmigrate
    920987119 3058200 /usr/sbin/cmmodnet
    920987119 3058200 /usr/sbin/cmmodpkg
    920987119 3058200 /usr/sbin/cmrunpkg
    920987119 3058200 /usr/sbin/cmrunserv
    920987119 3058200 /usr/sbin/cmstartres
    920987119 3058200 /usr/sbin/cmstopres
    Cluster-Monitor.CM-CORE,fr=A.11.15.00,fa=HP-UX_B.11.11_32/64,v=HP:
    1899567829 3250712 /usr/lbin/cmclconfd
    688503949 3680792 /usr/lbin/cmcld
    4101194807 3054104 /usr/lbin/cmsnmpd
    734995816 281112 /usr/lbin/cmsrvassistd
    904833413 2449408 /usr/lib/libsgcl.2
    920987119 3058200 /usr/sbin/cmapplyconf
    920987119 3058200 /usr/sbin/cmcheckconf
    920987119 3058200 /usr/sbin/cmdeleteconf
    920987119 3058200 /usr/sbin/cmgetconf
    920987119 3058200 /usr/sbin/cmhaltcl
    920987119 3058200 /usr/sbin/cmhaltnode
    920987119 3058200 /usr/sbin/cmquerycl
    920987119 3058200 /usr/sbin/cmruncl
    920987119 3058200 /usr/sbin/cmrunnode
    2643030591 17566 /usr/sbin/cmscancl
    920987119 3058200 /usr/sbin/cmviewcl
    3521337021 2746904 /usr/sbin/convert
    Cluster-Monitor.CM-CORE-MAN,fr=A.11.15.00,fa=HP-UX_B.11.11_32/64,v=HP:
    827612781 5138 /usr/share/man/man1m.Z/cmapplyconf.1m
    4191773914 7565 /usr/share/man/man1m.Z/cmquerycl.1m

Patch Conflicts: None

Patch Dependencies: None

Hardware Dependencies: None

Other Dependencies:
    Only if running on IPv6: item 18 (JAGae90568) also requires an upgrade of the OnlineDiag EMS software to revision A.04.00.01 (Sep03) or later. Alternatively, see the Special Installation Instructions below for a possible workaround.

Supersedes: PHSS_29053 PHSS_29901 PHSS_30087

Equivalent Patches:
    PHSS_30371: s700: 11.23 s800: 11.23

Patch Package Size: 6120 KBytes

Installation Instructions:
Please review all instructions and the Hewlett-Packard SupportLine User Guide or your Hewlett-Packard support terms and conditions for precautions, scope of license, restrictions, and limitation of liability and warranties, before installing this patch.
------------------------------------------------------------
1. Back up your system before installing a patch.

2. Log in as root.

3. Copy the patch to the /tmp directory.

4. Move to the /tmp directory and unshar the patch:

     cd /tmp
     sh PHSS_30370

5. Run swinstall to install the patch:

     swinstall -x autoreboot=true -x patch_match_target=true \
         -s /tmp/PHSS_30370.depot

By default swinstall will archive the original software in /var/adm/sw/save/PHSS_30370. If you do not wish to retain a copy of the original software, include the patch_save_files option in the swinstall command above:

     -x patch_save_files=false

WARNING: If patch_save_files is false when a patch is installed, the patch cannot be deinstalled. Please be careful when using this feature.

For future reference, the contents of the PHSS_30370.text file are available in the product readme:

     swlist -l product -a readme -d @ /tmp/PHSS_30370.depot

To put this patch on a magnetic tape and install from the tape drive, use the command:

     dd if=/tmp/PHSS_30370.depot of=/dev/rmt/0m bs=2k

Special Installation Instructions:

For MC/ServiceGuard Clusters:
1) Halt ServiceGuard on the node the patch is to be installed on.
2) Install this patch on that node.
3) Restart ServiceGuard on that node.
4) The patch needs to be installed on all nodes in the cluster.

These instructions apply to any A.11.15.00 MC/ServiceGuard cluster, in all configurations, including, for example, those using SGeRAC, SGeSAP, and MetroCluster.

To correct Defect 8 (JAGae67631) listed for patch PHSS_29053, the convert utility must be run manually on each node in the cluster after the patch is installed. If the old configuration file is located at /etc/cmcluster/cmclconfig, use the following command to run convert manually:

     # convert -f /etc/cmcluster/cmclconfig

Then reissue the cmrunnode command on each node. This is required only if the symptoms are similar to those of Defect #8 listed in PHSS_29053. This step is not required for any other fix in this or other patches.

Item 18 (JAGae90568) listed for patch PHSS_29053 requires either a workaround or an upgrade of EMS to version A.04.00.01 or later. This ONLY applies in an IPv6 environment. The workaround is to modify the /etc/inetd.conf file as follows. Locate the following entry in /etc/inetd.conf:

     registrar stream tcp nowait root /etc/opt/resmon/lbin/registrar /etc/opt/resmon/lbin/registrar

and change it to:

     registrar stream tcp6 nowait root /etc/opt/resmon/lbin/registrar /etc/opt/resmon/lbin/registrar

Then reboot the machine or restart inetd by executing 'inetd -c'.

This additional consideration is required only for Item 18 in PHSS_29053. This step is not required for any other fix in this or other patches. These instructions ONLY apply in an IPv6 environment.
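The /etc/inetd.conf workaround for Item 18 can be applied with a small script instead of a manual edit. This is a sketch under stated assumptions:
- the function name is hypothetical;
- it keeps a copy of the original file as inetd.conf.orig next to it;
- it relies on the installed sed supporting POSIX bracket classes such as [[:space:]].
It changes only a registrar line whose protocol field is tcp and leaves every other entry untouched.

```shell
#!/bin/sh
# Hypothetical helper: switch the EMS registrar entry in an inetd.conf-style
# file from "tcp" to "tcp6" (IPv6 workaround for EMS A.04.00).
# Assumes a sed that understands POSIX classes such as [[:space:]].
switch_registrar_to_tcp6() {
    conf=${1:-/etc/inetd.conf}
    tmp=${TMPDIR:-/tmp}/inetd.conf.$$
    # Keep a copy of the current file before changing anything.
    cp -p "$conf" "$conf.orig" || return 1
    # Rewrite only "registrar stream tcp ..."; a "tcp6" line does not match,
    # because "tcp" must be followed directly by whitespace.
    sed 's/^\(registrar[[:space:]][[:space:]]*stream[[:space:]][[:space:]]*\)tcp\([[:space:]]\)/\1tcp6\2/' \
        "$conf" > "$tmp" || return 1
    # Write back through the existing file so its owner and mode are kept.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

Run switch_registrar_to_tcp6 as root (it defaults to /etc/inetd.conf) and inspect the result. Then reboot the machine or run 'inetd -c' as described above. Running the script a second time leaves the file unchanged, because a line that already says tcp6 no longer matches. It does, however, overwrite the .orig backup.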
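The per-node sequence in the Special Installation Instructions (halt ServiceGuard, install the patch, restart ServiceGuard, and repeat on every node) can be sketched as below. The function is hypothetical. By default it performs a dry run that only prints the commands; setting RUN to an empty string executes them. The use of cmhaltnode -f is our assumption about how ServiceGuard should be halted. As we understand it, -f halts the node even when packages are still running there, and those packages are halted or fail over. Review this against your package failover policy before use.

```shell
#!/bin/sh
# Hypothetical helper for the per-node steps of the Special Installation
# Instructions. By default it only prints the commands (dry run); call it
# with RUN= (empty) in the environment to execute them.
patch_node() {
    depot=${1:-/tmp/PHSS_30370.depot}
    run=${RUN-echo}
    # 1) Halt ServiceGuard on this node (-f: halt even if packages run here).
    $run cmhaltnode -f || return 1
    # 2) Install the patch from the depot created by unsharing PHSS_30370.
    $run swinstall -x autoreboot=true -x patch_match_target=true -s "$depot" || return 1
    # 3) Restart ServiceGuard on this node so it rejoins the cluster.
    $run cmrunnode || return 1
}
```

Run patch_node once to review the printed commands. Then run RUN= patch_node on each node in turn, waiting for each node to rejoin the cluster before moving to the next, so that the patch ends up on all nodes.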
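After installation, the files on each node can be compared with the values in the cksum(1) Output section. The hypothetical helper below reads lines in the same "checksum size path" form and runs cksum(1) on each path. It reports OK or MISMATCH for each file and returns non-zero if any file differs or cannot be read. It relies on the POSIX behaviour that cksum prints only "checksum size" when it reads standard input.

```shell
#!/bin/sh
# Hypothetical helper: verify files against "checksum size path" lines read
# from standard input (the format of the cksum(1) Output section).
verify_cksums() {
    status=0
    while read -r sum size path; do
        [ -n "$path" ] || continue          # skip blank or short lines
        if [ -r "$path" ]; then
            actual=$(cksum < "$path")       # POSIX: "checksum size" for stdin
        else
            actual=""
        fi
        if [ "$actual" = "$sum $size" ]; then
            echo "OK       $path"
        else
            echo "MISMATCH $path"
            status=1
        fi
    done
    return "$status"
}
```

For example, feed it a few lines copied from the cksum(1) Output section, such as "688503949 3680792 /usr/lbin/cmcld" and "4101194807 3054104 /usr/lbin/cmsnmpd", through a here-document on standard input.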