Patch Name: PHKL_27851

Patch Description: s700_800 11.00 reboot -h; hang; timeout; ServiceGuard TOC

Creation Date: 02/09/09

Post Date: 02/11/05

Hardware Platforms - OS Releases:
    s700: 11.00
    s800: 11.00

Products: N/A

Filesets:
    OS-Core.CORE2-KRN,fr=B.11.00,fa=HP-UX_B.11.00_32,v=HP
    OS-Core.CORE2-KRN,fr=B.11.00,fa=HP-UX_B.11.00_64,v=HP

Automatic Reboot?: Yes

Status: General Release

Critical:
    No (superseded patches were critical)
    PHKL_27314: HANG
        System hang during reboot.
    PHKL_22785: OTHER
        HPMC on panic.
    PHKL_22363: OTHER
        Rebooting one or more hosts in an FC-AL configuration may cause loss of access to LUNs for other hosts on the loop.
    PHKL_19752: HANG

Category Tags:
    defect_repair enhancement general_release critical halts_system

Path Name: /hp-ux_patches/s700_800/11.X/PHKL_27851

Symptoms:
    PHKL_27851:
    ( SR:8606269490 CR:JAGae33725 )
    When a node in a ServiceGuard cluster fails, it is difficult to determine why without extensive dump and log analysis.

    PHKL_27314:
    ( SR:8606257427 CR:JAGae21730 )
    System hang during reboot.

    PHKL_26306:
    ( SR:8606232808 CR:JAGae02036 )
    When reboot -qh is executed to quickly halt a system (for example, when envd calls it to halt an overheating system), the system does not always halt completely. It may later restart without any user intervention, even though the message 'System has halted, OK to turn off power or reset system' has been displayed. The problem appears only on ServiceGuard clusters.

    PHKL_22785:
    ( SR: 8606161973 DTS: JAGad31289 )
    The system HPMCs during a panic.

    PHKL_22363:
    ( SR: 8606158594 DTS: JAGad27924 )
    In a configuration with multiple hosts attached to a Fibre Channel private loop, if one or more of the hosts is rebooted, the non-rebooting host(s) experience I/O performance problems. In addition, in some cases, if 'ioscan' is run on a non-rebooting host during the reboot window, a status of "NO_HW" may be reported for some of the devices on the loop.

    PHKL_21119:
    ( SR: 8606126623 DTS: JAGac59464 )
    On N-Class and L-Class machines, the run LED remains on while the system is reset in preparation for reboot.

    PHKL_20979:
    ( SR: 8606125655 DTS: JAGac42317 )
    The chassis code log files show incorrect time stamps: either March 06, 2106 or January 01, 1970 instead of the current date. These incorrect dates appear every four hours and last for approximately one hour.
    The logs are accessible through two mechanisms. They are maintained and available when the online diagnostics subsystem is enabled and running (that is, when the cclogd daemon is running on the system). The last 200 entries are also kept on the Guardian Service Processor (GSP) and are accessible from the system console after typing [ctrl]-b.
    The date errors are cosmetic only and do not affect the operating system or applications. They are generated on the GSP, which is supported on L-Class and N-Class systems only.

    PHKL_20176:
    ( SR: 8606108529 DTS: JAGab16799 )
    Wrong chassis codes (Cxxx) are displayed during shutdown.
    ( SR: 8606103410 DTS: JAGab70129 )
    I2O RAID disks are not supported.

    PHKL_19752:
    ( SR: 1653287672 DTS: JAGab12583 )
    When a node other than a V-Class is soft-rebooted (reboot/shutdown -r) without a hardware reset, connections to all Fibre Channel devices are lost, making the whole cluster unusable.

Defect Description:
    PHKL_27851:
    ( SR:8606269490 CR:JAGae33725 )
    When a node in a ServiceGuard cluster fails, the cause is most commonly a power failure, a critical hardware failure, a High Priority Machine Check (HPMC), a kernel panic, or a ServiceGuard-initiated TOC or Safety Timer timeout. A Safety Timer timeout usually means that the timeout value is too small for the node that failed. However, it is difficult to confirm this without extensive dump and log analysis.
    Resolution:
    Whenever a ServiceGuard-initiated TOC (or Safety Timer timeout) occurs, a global variable is set to indicate that the TOC was initiated by ServiceGuard. This variable is checked during the dump, and a message is logged indicating that the failure was generated by a timeout, allowing the root cause to be identified faster.

    PHKL_27314:
    ( SR:8606257427 CR:JAGae21730 )
    If the reboot process and the vxfsd threads are running on different processors in a multiprocessor system, the system can hang because of a deadlock in the file system. The hang occurs when the reboot process attempts to sync a buffer that the file system has locked.
    Resolution:
    The system now waits up to 3 minutes to sync the buffer. If it still cannot sync the buffer after 3 minutes, the reboot process skips syncing that buffer and continues to reboot the system.

    PHKL_26306:
    ( SR:8606232808 CR:JAGae02036 )
    The reboot system call does not disable the safety timer that ServiceGuard uses to TOC the system if cmcld does not run. Therefore, after the system has been halted, the safety timer eventually expires and TOCs the system.
    Resolution:
    We now disable the ServiceGuard safety timer once the system has been halted.

    PHKL_22785:
    ( SR: 8606161973 DTS: JAGad31289 )
    Sometimes the OS sleeps inside panic while it is flushing dirty buffers to disk. Upon waking up, the scheduler sets a flag that incorrectly indicates the OS is executing in a process context. As a result, the OS later tries to write to a virtual address from real mode.
    Resolution:
    Reset the flag to indicate that the OS is not executing in a process context, and so avoid writing to a virtual address in real mode (the code path that would cause the HPMC).
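The three reboot-path fixes described above (PHKL_27851, PHKL_27314, PHKL_26306) can be illustrated with a small user-space C model. This is a sketch under stated assumptions, not the HP-UX implementation: every name in it (safety_timer, sg_initiated_toc, sync_buffer_bounded, halt_system, dump_report_cause) is hypothetical, time is simulated in whole seconds, and the TOC and the locked buffer are stand-ins. It shows the safety timer recording that it caused a TOC so the dump can report it, a locked buffer being skipped after 3 minutes instead of hanging the reboot, and halting disarming the timer so a halted system is not later TOCed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space model; no name here is taken from HP-UX.
 * Time is simulated in whole seconds instead of using real clocks. */

#define SYNC_TIMEOUT_SECS (3 * 60)  /* PHKL_27314: give up after 3 minutes */

struct safety_timer {
    bool armed;      /* while cmcld runs, it would keep pushing expires_at out */
    int expires_at;  /* simulated time at which the timer fires */
};

static bool sg_initiated_toc = false;  /* PHKL_27851: records who caused the TOC */
static bool toc_issued = false;        /* stand-in for actually issuing a TOC */

/* Called on every simulated clock tick. */
static void safety_timer_tick(struct safety_timer *t, int now)
{
    if (t->armed && now >= t->expires_at) {
        sg_initiated_toc = true;  /* PHKL_27851: remember the cause */
        toc_issued = true;
        t->armed = false;
    }
}

/* Called while the crash dump is written (PHKL_27851). */
static void dump_report_cause(void)
{
    if (sg_initiated_toc)
        printf("TOC initiated by ServiceGuard safety timer timeout\n");
}

/* PHKL_27314: wait for a locked buffer for at most SYNC_TIMEOUT_SECS.
 * unlock_at is when the file system releases the buffer (-1 = never).
 * Returns true if the buffer was synced, false if it was skipped. */
static bool sync_buffer_bounded(int unlock_at, int *now)
{
    assert(now != NULL);
    int deadline = *now + SYNC_TIMEOUT_SECS;
    while (*now < deadline) {
        if (unlock_at >= 0 && *now >= unlock_at)
            return true;  /* buffer is free: write it out */
        (*now)++;         /* wait one simulated second, then retry */
    }
    return false;         /* still locked: skip it and keep rebooting */
}

/* PHKL_26306: halting the system must also disarm the safety timer. */
static void halt_system(struct safety_timer *t)
{
    t->armed = false;
}
```

A real kernel would sleep on the buffer rather than poll it; polling against a simulated clock only keeps the sketch deterministic. For example, a buffer released at second 10 is synced at second 10, while a buffer that is never released is skipped once 180 simulated seconds have passed.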

    PHKL_22363:
    ( SR: 8606158594 DTS: JAGad27924 )
    When a machine on a Fibre Channel loop is rebooted, its FC card may start sending garbage. This can cause severe performance degradation, up to and including the inability to communicate with devices on the loop. Although we do not expect many customers to observe this defect, we believe it is serious enough to warrant a patch for those who might encounter the problem.

    PHKL_21119:
    ( SR: 8606126623 DTS: JAGac59464 )
    Previously, the standard set by FES was that the run LED should remain on during a reboot. FES has changed that standard: the run LED should now be turned off just before the machine is reset.
    Resolution:
    Added code to turn the run LED off before resetting the system.

    PHKL_20979:
    ( SR: 8606125655 DTS: JAGac42317 )
    At boot time and every four hours thereafter, the operating system sends the GSP the current time to update the GSP's clock. Every hour, cclogd also sends the GSP the time.
    The time value sent by the operating system encodes the year 2000 as '0' instead of 100, the number of years since 1900. The GSP interprets the year as 1900, which is invalid, and generates an error code of -1, which appears in the logs as the date March 06, 2106. The GSP then resets its clock to January 01, 1970. This happens with every operating system time update to the GSP.
    The time value sent by cclogd to the GSP is correct. Because cclogd sends it every hour, log entries generated between an operating system update and the next cclogd update carry the erroneous dates.
    This defect affects L-Class and N-Class systems only, is visible only in the chassis code logs, and has no impact on the operating system or any applications. The log files are viewable through the online diagnostic subsystem.
    Resolution:
    Change the current rounding-up procedure in the operating system's GSP update routine so that the year is sent as the number of years since 1900, as the GSP expects.
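The year encoding at the heart of PHKL_20979 can be shown with standard C. This sketch assumes, as the text states, that the GSP wants the number of years since 1900; that is exactly the convention of the tm_year field of struct tm, so the year 2000 must be sent as 100, and a field of 0 reads back as 1900, the invalid year described above. The function names gsp_year_field and gsp_calendar_year are hypothetical; this is not the actual GSP update routine, whose interface the patch text does not show.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helpers; the real GSP interface is not described here.
 * Assumption from the patch text: the GSP expects the year as years
 * since 1900, the same convention as the tm_year field of struct tm. */

/* Year field the OS should send for a given UTC time (the PHKL_20979 fix). */
static int gsp_year_field(time_t t)
{
    const struct tm *utc = gmtime(&t);  /* not thread-safe; fine for a sketch */
    assert(utc != NULL);
    return utc->tm_year;                /* 2000 -> 100, 2002 -> 102 */
}

/* How the GSP turns the received field back into a calendar year. */
static int gsp_calendar_year(int years_since_1900)
{
    return 1900 + years_since_1900;
}
```

For example, gsp_calendar_year(gsp_year_field(946684800)) yields 2000 (946684800 is 2000-01-01 00:00:00 UTC), whereas gsp_calendar_year(0), the value the unpatched operating system sent, yields 1900.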

    PHKL_20176:
    ( SR: 8606108529 DTS: JAGab16799 )
    Chassis codes were changed for N-Class, which broke legacy systems: Cxxx codes were displayed instead of Dxxx codes.
    Resolution:
    Restore the classic Dxxx chassis codes and reduce the alert level for Dxxx codes to Forward Progress.
    ( SR: 8606103410 DTS: JAGab70129 )
    I2O RAID disks are not supported.
    Resolution:
    I2O RAID requires some modification to the boot path.

    PHKL_19752:
    ( SR: 1653287672 DTS: JAGab12583 )
    Fibre Channel interfaces were not reset when soft-rebooting a machine other than a V-Class, so all connected devices were disconnected and inaccessible.
    Resolution:
    Add a PDC_IO call to reset Fibre Channel interfaces when rebooting the kernel.

Enhancement:
    No (superseded patches contained enhancements)
    PHKL_27851:
    Enhancements were delivered in a patch that this one supersedes. Please review the Defect Description text for more information.

SR:
    1653287672 8606103410 8606108529 8606125655 8606126623 8606158594 8606161973 8606232808 8606257427 8606269490

Patch Files:
    OS-Core.CORE2-KRN,fr=B.11.00,fa=HP-UX_B.11.00_32,v=HP:
        /usr/conf/lib/libhp-ux.a(chassis_log.o)
        /usr/conf/lib/libhp-ux.a(machdep.o)
        /usr/conf/lib/libhp-ux.a(safety_time.o)

    OS-Core.CORE2-KRN,fr=B.11.00,fa=HP-UX_B.11.00_64,v=HP:
        /usr/conf/lib/libhp-ux.a(chassis_log.o)
        /usr/conf/lib/libhp-ux.a(machdep.o)
        /usr/conf/lib/libhp-ux.a(safety_time.o)

what(1) Output:
    OS-Core.CORE2-KRN,fr=B.11.00,fa=HP-UX_B.11.00_32,v=HP:
        /usr/conf/lib/libhp-ux.a(chassis_log.o):
            chassis_log.c $Date: 2000/09/08 11:43:00 $Revision: r11ros/5 PATCH_11.00 (PHKL_22363)
        /usr/conf/lib/libhp-ux.a(machdep.o):
            machdep.c $Date: 2002/09/06 14:15:25 $Revision: r11ros/19 PATCH_11.00 (PHKL_27851)
        /usr/conf/lib/libhp-ux.a(safety_time.o):
            safety_time.c $Date: 2002/09/06 14:15:50 $Revision: r11ros/1 PATCH_11.00 (PHKL_27851)

    OS-Core.CORE2-KRN,fr=B.11.00,fa=HP-UX_B.11.00_64,v=HP:
        /usr/conf/lib/libhp-ux.a(chassis_log.o):
            chassis_log.c $Date: 2000/09/08 11:43:00 $Revision: r11ros/5 PATCH_11.00 (PHKL_22363)
        /usr/conf/lib/libhp-ux.a(machdep.o):
            machdep.c $Date: 2002/09/06 14:15:25 $Revision: r11ros/19 PATCH_11.00 (PHKL_27851)
        /usr/conf/lib/libhp-ux.a(safety_time.o):
            safety_time.c $Date: 2002/09/06 14:15:50 $Revision: r11ros/1 PATCH_11.00 (PHKL_27851)

cksum(1) Output:
    OS-Core.CORE2-KRN,fr=B.11.00,fa=HP-UX_B.11.00_32,v=HP:
        4139263615 7652 /usr/conf/lib/libhp-ux.a(chassis_log.o)
        2032925627 29328 /usr/conf/lib/libhp-ux.a(machdep.o)
        2681187684 4820 /usr/conf/lib/libhp-ux.a(safety_time.o)

    OS-Core.CORE2-KRN,fr=B.11.00,fa=HP-UX_B.11.00_64,v=HP:
        1532598299 16176 /usr/conf/lib/libhp-ux.a(chassis_log.o)
        349637193 76368 /usr/conf/lib/libhp-ux.a(machdep.o)
        2671448036 11832 /usr/conf/lib/libhp-ux.a(safety_time.o)

Patch Conflicts: None

Patch Dependencies:
    s700: 11.00: PHKL_18543
    s800: 11.00: PHKL_18543

Hardware Dependencies: None

Other Dependencies: None

Supersedes:
    PHKL_27314 PHKL_26306 PHKL_22785 PHKL_22363 PHKL_21119 PHKL_20979 PHKL_20176 PHKL_19752

Equivalent Patches: None

Patch Package Size: 180 KBytes

Installation Instructions:
    Please review all instructions and the Hewlett-Packard SupportLine User Guide or your Hewlett-Packard support terms and conditions for precautions, scope of license, restrictions, and limitation of liability and warranties before installing this patch.
    ------------------------------------------------------------
    1. Back up your system before installing a patch.

    2. Log in as root.

    3. Copy the patch to the /tmp directory.

    4. Move to the /tmp directory and unshar the patch:

        cd /tmp
        sh PHKL_27851

    5. Run swinstall to install the patch:

        swinstall -x autoreboot=true -x patch_match_target=true \
            -s /tmp/PHKL_27851.depot

    By default swinstall will archive the original software in /var/adm/sw/save/PHKL_27851.
    If you do not wish to retain a copy of the original software, include the patch_save_files option in the swinstall command above:

        -x patch_save_files=false

    WARNING: If patch_save_files is false when a patch is installed, the patch cannot be deinstalled. Please be careful when using this feature.

    For future reference, the contents of the PHKL_27851.text file are available in the product readme:

        swlist -l product -a readme -d @ /tmp/PHKL_27851.depot

    To put this patch on a magnetic tape and install from the tape drive, use the command:

        dd if=/tmp/PHKL_27851.depot of=/dev/rmt/0m bs=2k

Special Installation Instructions:
    This patch depends on base patch PHKL_18543. For successful installation, please ensure that PHKL_18543 is already installed, or that PHKL_18543 is included in the same depot as this patch and is selected for installation.