Patch Name: PHKL_30033

Patch Description: s700_800 11.11 Core PM, vPar, Psets Cumulative, slpq1; FSS

Creation Date: 03/12/09

Post Date: 03/12/18

Repost: 04/03/11
    The Other Dependencies section of the patch documentation
    was modified to clarify that PHKL_30037 must also be
    installed on systems with the HP-UX Processor Sets software.

Hardware Platforms - OS Releases:
    s700: 11.11
    s800: 11.11

Products: N/A

Filesets:
    OS-Core.CORE2-KRN,fr=B.11.11,fa=HP-UX_B.11.11_32,v=HP
    OS-Core.CORE2-KRN,fr=B.11.11,fa=HP-UX_B.11.11_64,v=HP

Automatic Reboot?: Yes

Status: General Release

Critical: No (superseded patches were critical)
    PHKL_29706: PANIC
    PHKL_27091: PANIC HANG
    PHKL_24257: OTHER
        Hung, Unkillable Process

Category Tags:
    defect_repair enhancement general_release critical panic
    halts_system manual_dependencies

Path Name: /hp-ux_patches/s700_800/11.X/PHKL_30033

Symptoms:
    PHKL_30033:
    ( SR:8606314571 CR:JAGae77335 )
    Certain workloads cannot achieve their entitlements with the
    Fair Share Scheduler (FSS) when capping is enabled. This
    results in a performance degradation for some workloads.

    PHKL_29706:
    ( SR:8606236276 CR:JAGae05337 )
    System panics with "data page fault". There are two stack
    traces that represent the same failure:

        panic string : Data page fault
        panic+0x6c
        report_trap_or_int_and_panic+0x94
        trap+0xed4
        thandler+0xd20
        -------- TRAP -----------
        find_thread_other_spu+0x60
        idle_nonpset_loop+0x4e4
        idle+0x4e0
        swidle+0x28

        panic: Zombie thread walks !
        stack trace for event 0
        crash event was a panic
        panic+0x6c
        thread_exit+0x200
        thread_process_suspend+0x1ec
        issig+0x2a4
        syscall+0x9e4
        $syscallrtn+0x0

    ( SR:8606236816 CR:JAGae05866 )
    Processor Sets based systems show performance degradation
    when relatively idle. This problem is observed only when the
    optional Processor Sets (PROCSETS) product is installed.
    ( SR:8606274083 CR:JAGae38161 )
    When running I/O-based jobs with the Fair Share Scheduler
    (FSS) enabled via Process Resource Manager (PRM) or Workload
    Manager (WLM), performance degrades by 10-20%.

    ( SR:8606316028 CR:JAGae78747 )
    When Process Resource Manager (PRM) is enabled, certain
    workloads, especially memory intensive workloads, may show
    significant performance degradation on relatively idle
    systems.

    PHKL_27091:
    ( SR:8606236276 CR:JAGae05337 )
    Panics due to run queue corruption may occur on systems with
    patches PHKL_24551 or PHKL_25389. The panics occur on
    systems in which at least one processor is idle, and
    symptoms may take the form of a data page fault panic in
    find_thread_other_spu() or gs_rendezvous_thread(), or a
    spinlock deadlock panic on the 'Per SPU RUNQ Lock'.

    ( SR:8606249635 CR:JAGae16022 )
    Applications may hang with threads in the accept(2) system
    call. The problem occurs only when multiple threads are
    issuing accept(2) on the same socket, and when no thread
    calls accept(2) again after a thread is interrupted by a
    signal.

    ( SR:8606259436 CR:JAGae23754 )
    System may panic with a data page fault in the clock
    interrupt path. The stack trace is as follows:

        panic+0x14
        report_trap_or_int_and_panic+0x84
        interrupt+0x1d4
        $ihndlr_rtn+0x0
        determine_processor_state+0xbc
        per_spu_hardclock+0xc8
        clock_int+0x58
        mp_ext_interrupt+0x150
        ivti_patch_to_nop3+0x0
        idle+0x108
        swidle_exit+0x0

    ( SR:8606234249 CR:JAGae03469 )
    Enhancement:
    This product update is a member of a set needed to support
    the kernel sleep/wakeup queuing performance enhancement. The
    full list of product updates required for this feature is:
    PHKL_27091, PHKL_27294, PHKL_27093 and PHKL_27094.

    Performance degradation may be seen on systems in which a
    large number (500 or more) of TIMESHARE threads call the
    accept(2) function on a single socket.

    ( SR:8606245859 CR:JAGae12318 )
    Processes which call vfork(2) can sometimes hang and become
    unkillable.
    Further, executing a setpriority(2) operation (e.g. via
    renice(1M)) on such a process may cause a kernel panic due
    to a Data Page Fault, with the stack trace:

        is_realtime+0x0
        get_pregionnice+0x34
        update_preg_nice+0x44
        donice+0xc8
        setpriority+0x6c
        syscall+0x750
        syscallinit+0x5b0

    PHKL_25389:
    ( SR:8606215976 CR:JAGad85148 )
    When thousands of threads are waiting on a select(2) call,
    application performance slows down considerably. This is an
    enhancement to sleep queues to boost performance.

    ( SR:8606226427 CR:JAGad95496 )
    Possible races from kernel subsystems that assume the entry
    to kernel sleep is atomic. This can result in missed wakeup
    events.

    PHKL_24551:
    ( SR:8606200799 CR:JAGad69975 )
    This patch is a member of a set of patches needed to enable
    the HP-UX Processor Sets product (PROCSETS). When the
    PROCSETS product is installed, it will install the full set
    of required patches for that product, including this patch.
    If the HP-UX Processor Sets product is not installed, this
    change will have no Processor Sets impact on your system.

    ( SR:8606199577 CR:JAGad68764 )
    This patch is a member of a set of patches needed to enable
    the HP-UX Virtual Partitions product. When the HP-UX Virtual
    Partitions product (VPARSBASE or T1335AA) is installed, it
    will install the full set of required patches for that
    product, including this patch. If the HP-UX Virtual
    Partitions product is not installed, this change will have
    no Virtual Partitions impact on your system.

    ( SR:8606194817 CR:JAGad64023 )
    Load averages reported by such utilities as top and uptime
    are overall higher in 11.11 than they were in earlier
    releases.

    PHKL_23665:
    ( SR:8606128017 CR:JAGac78818 )
    vhand priority does not match scheduling policy for brief
    durations.

    PHKL_24257:
    ( SR:8606159451 CR:JAGad28779 ) Duplicate
    ( SR:8606103740 CR:JAGab70789 )
    A multi-threaded process being executed over NFS can become
    hung and unkillable while performing a fork, core,
    setrlimit, SIGSTOP, or debugger operation.
    This can happen with multiple threads in different processes
    competing for the same resource when one thread is stopped.

Defect Description:
    PHKL_30033:
    ( SR:8606314571 CR:JAGae77335 )
    Existing algorithms of the Fair Share Scheduler (FSS) make
    some decisions that are inappropriate for some workloads
    when the capping feature of FSS is enabled. This causes
    processors to remain idle even when some FSS groups have not
    attained their entitlements.

    Resolution:
    The FSS balancer and thread selection algorithms have been
    modified where capping is enabled so that the processors do
    not inappropriately idle. This improves the ability of FSS
    groups to attain their entitlements. The run queue
    management support has been updated to support the improved
    FSS capping mechanism.

    PHKL_29706:
    ( SR:8606236276 CR:JAGae05337 )
    The first symptom is caused by looping forever on a thread's
    run queue links, which point back to the thread itself. The
    second is caused by dereferencing a thread's null run queue
    links. Both are separate stages of the same problem. In the
    idle() path, compiler optimization reverses the store order
    of two synchronized flags that are set in sequence, which
    causes an inconsistent thread state that leads to run queue
    corruption. PHKL_24551 and PHKL_25389 were impacted by the
    flipped store order. However, PHKL_27091, which supersedes
    those two patches, has the correct order even without the
    code fix.

    Resolution:
    Explicitly declare the two flags volatile in the idle code
    path so that compiler optimization does not reorder the
    stores.

    ( SR:8606236816 CR:JAGae05866 )
    The Processor Sets functionality consumes huge amounts of
    CPU cycles in the "idle" loop due to heavy lock contention
    and cache misses.

    Resolution:
    Changes to reduce lock contention in a Processor Sets
    kernel.

    ( SR:8606274083 CR:JAGae38161 )
    On systems with a large PRM group count, HP-UX walks the run
    queue once per group. Also, per-group tick accounting is not
    very precise.
    Resolution:
    Remember what groups a system has on the first pass through
    the run queue and only walk it a second time if the walk is
    guaranteed to succeed.

    ( SR:8606316028 CR:JAGae78747 )
    When PRM is enabled, an extreme case of cache thrashing is
    observed due to the unnecessary constant update of a global
    volatile variable in the idle() path. This scenario causes
    heavy traffic on the system bus, greatly impacting overall
    system performance on relatively idle systems. The idle()
    path exists in two different places: one in the base kernel
    and the other in the Processor Sets kernel. These two paths
    are independent of each other.

    Resolution:
    Remove the update of the global volatile variable in the
    base kernel idle() path when PRM is enabled.

    PHKL_27091:
    ( SR:8606236276 CR:JAGae05337 )
    Patches PHKL_24551 and PHKL_25389 introduced a race
    condition in the interaction between the idle and suspend
    paths, leading to a thread being in an inconsistent state
    while either actively running or on the run queue.

    Resolution:
    In PHKL_27091 the race condition no longer exists.

    ( SR:8606249635 CR:JAGae16022 )
    A thread receiving an event wakeup and a signal
    simultaneously will handle the signal. The event will not be
    handled even though there may be other threads waiting for
    that event. They will wait forever, unless another duplicate
    event occurs.

    Resolution:
    A signaled thread will now determine if it also received an
    event wakeup. If so, it will wake up the next waiting thread
    to handle the event.

    ( SR:8606259436 CR:JAGae23754 )
    A clock interrupt occurring as soon as the idle loop enables
    interrupts may attempt to dereference a null thread pointer
    if the cpu state is stale, causing the panic.

    Resolution:
    Set the processor state information earlier in the idle
    loop, before interrupts are enabled.

    ( SR:8606234249 CR:JAGae03469 )
    This product update contains a performance enhancement to
    the kernel sleep/wakeup queuing mechanism.
    Resolution:
    Implement a new sleep/wakeup queuing mechanism that
    addresses the performance issue.

    ( SR:8606245859 CR:JAGae12318 )
    A race condition in vfork(2) causes a wakeup to be missed.
    As the parent is left in an incoherent state, a subsequent
    priority setting operation encounters a stale pointer,
    causing the Data Page Fault.

    Resolution:
    Fixed operation sequence to close the race, so that the
    wakeup is not missed.

    PHKL_25389:
    ( SR:8606215976 CR:JAGad85148 )
    This is an enhancement for a performance problem seen while
    trying to remove a single thread from a long sleep queue.
    This would be useful to customers who are making numerous
    system calls that would cause threads to sleep on the same
    sleep queue, such as select(2).

    Resolution:
    The sleep queues were changed from a single-linked list to a
    double-linked list.

    ( SR:8606226427 CR:JAGad95496 )
    This is an enhancement that allows kernel subsystems to
    enter kernel sleep with alternative locking rules. By
    permitting these new locking rules, other subsystems are
    able to close race windows around entering and leaving
    kernel sleep. If there is no other patch that requires this
    change, it will do nothing.

    Resolution:
    Permit kernel subsystems entering kernel sleep to hold an
    additional resource to prevent race conditions.

    PHKL_24551:
    ( SR:8606200799 CR:JAGad69975 )
    This patch contains minor enhancements required to support
    the HP-UX Processor Sets product.

    Resolution:
    Enhancements added to enable the scheduler to recognize and
    work with processor sets when the Processor Sets product is
    enabled.

    ( SR:8606199577 CR:JAGad68764 )
    This patch contains minor enhancements required to support
    the HP-UX Virtual Partitions product.

    Resolution:
    Enhancements added to support CPU migration.

    ( SR:8606194817 CR:JAGad64023 )
    System daemon threads are factored into the load average
    calculations in 11.11 where they were not in earlier
    releases. This makes the reported load averages higher than
    they were in earlier releases.
    Resolution:
    This patch changes the load average calculations to once
    more disregard system daemon threads, resulting in load
    averages much more closely aligned to those in earlier
    releases.

    PHKL_23665:
    ( SR:8606128017 CR:JAGac78818 )
    When vhand's priority is increased due to it being
    preempted, there is a race with other threads which are also
    raising vhand's priority at the same time. Thus, when vhand
    switches back to run again, its policy and priority number
    do not match. The priority and policy will be back in sync
    once the thread which elevated the priority of vhand is
    restored to run again.

    Resolution:
    Preemption_point no longer elevates the priority of the
    preempted thread.

    PHKL_24257:
    ( SR:8606159451 CR:JAGad28779 ) Duplicate
    ( SR:8606103740 CR:JAGab70789 )
    A thread acquires a lock and then sleeps interruptibly. The
    interruptible sleep permits the thread to be stopped. Any
    other thread attempting to acquire this lock will sleep
    uninterruptibly until the lock is available. This
    uninterruptible thread is also unkillable. This introduces a
    deadlock potential in multi-threaded processes: when a
    thread holding the lock, a thread desiring the lock, and a
    third thread doing one of fork, setrlimit, core, SIGSTOP, or
    debugger operations, all occur at the same time in the same
    process, the deadlock is reached. The only way to resolve
    the deadlock is to reboot the system.

    A similar situation can occur when threads in different
    processes are competing for the same NFS resource and the
    thread that owns that resource is stopped via a signal, a
    debugger, or a ctrl-Z.

    This patch is part of a set of five patches (PHKL_24253,
    PHKL_24254, PHKL_24255, PHKL_24256, PHKL_24257) that enable
    P_NOSTOP, a new feature that prevents a process from being
    unkillable. Each patch is independently installable. Without
    all five installed, P_NOSTOP will be unavailable.

    In order to prevent the process executed over NFS from
    becoming unkillable, NFS must use the P_NOSTOP feature.
    Usage of this feature was added to PHNE_23502.

    Resolution:
    If a thread acquires a lock and then sleeps interruptibly,
    it is not permitted to be stopped if P_NOSTOP is set. This
    prevents this thread from becoming unkillable and prevents
    the deadlock.

Enhancement: No (superseded patches contained enhancements)
    PHKL_29706:
    Enhancements were delivered in a patch this one has
    superseded. Please review the Defect Description text for
    more information.

SR:
    8606103740 8606128017 8606159451 8606194817 8606199577
    8606200799 8606215976 8606226427 8606234249 8606236276
    8606236816 8606245859 8606249635 8606259436 8606274083
    8606314571 8606316028

Patch Files:
    OS-Core.CORE2-KRN,fr=B.11.11,fa=HP-UX_B.11.11_32,v=HP:
        /usr/conf/lib/libpm.a(pm_swtch.o)
        /usr/conf/lib/libvm.a(vm_stats.o)

    OS-Core.CORE2-KRN,fr=B.11.11,fa=HP-UX_B.11.11_64,v=HP:
        /usr/conf/lib/libpm.a(pm_swtch.o)
        /usr/conf/lib/libvm.a(vm_stats.o)

what(1) Output:
    OS-Core.CORE2-KRN,fr=B.11.11,fa=HP-UX_B.11.11_32,v=HP:
        /usr/conf/lib/libpm.a(pm_swtch.o):
            pm_swtch.c $Date: 2003/12/05 14:05:15 $Revision: r11.11/11 PATCH_11.11 (PHKL_30033)
        /usr/conf/lib/libvm.a(vm_stats.o):
            vm_stats.c $Date: 2001/07/17 16:02:01 $Revision: r11.11/1 PATCH_11.11 (PHKL_24551)

    OS-Core.CORE2-KRN,fr=B.11.11,fa=HP-UX_B.11.11_64,v=HP:
        /usr/conf/lib/libpm.a(pm_swtch.o):
            pm_swtch.c $Date: 2003/12/05 14:05:15 $Revision: r11.11/11 PATCH_11.11 (PHKL_30033)
        /usr/conf/lib/libvm.a(vm_stats.o):
            vm_stats.c $Date: 2001/07/17 16:02:01 $Revision: r11.11/1 PATCH_11.11 (PHKL_24551)

cksum(1) Output:
    OS-Core.CORE2-KRN,fr=B.11.11,fa=HP-UX_B.11.11_32,v=HP:
        2147077744 43316 /usr/conf/lib/libpm.a(pm_swtch.o)
        3506511545 10060 /usr/conf/lib/libvm.a(vm_stats.o)

    OS-Core.CORE2-KRN,fr=B.11.11,fa=HP-UX_B.11.11_64,v=HP:
        1955573375 105008 /usr/conf/lib/libpm.a(pm_swtch.o)
        2324151306 24640 /usr/conf/lib/libvm.a(vm_stats.o)

Patch Conflicts: None

Patch Dependencies:
    s700: 11.11:
        PHKL_27093 PHKL_27094 PHKL_30032 PHKL_30034 PHKL_30035
        PHKL_30036
    s800: 11.11:
        PHKL_27093 PHKL_27094
        PHKL_30032 PHKL_30034 PHKL_30035 PHKL_30036

Hardware Dependencies: None

Other Dependencies:
    PHKL_30033:
    On systems with the HP-UX Processor Sets product (PROCSETS)
    version A.01.00.00.06 installed, PHKL_30037 must be
    installed with this patch to avoid a system panic.

    PHKL_29706:
    To solve the Processor Sets performance degradation problem
    JAGae05866 and the PRM performance degradation problem
    JAGae78747, PHKL_29709 must be installed.

    PHKL_24257:
    If NFS is installed on the system, PHNE_23502 and all five
    P_NOSTOP patches (PHKL_24253, PHKL_24254, PHKL_24255,
    PHKL_24256, PHKL_24257) are required to resolve the process
    hang/deadlock due to unkillable processes executed over NFS.
    However, if NFS is not in use, none of these patches are
    required.

Supersedes:
    PHKL_29706 PHKL_27091 PHKL_25389 PHKL_24551 PHKL_24257
    PHKL_23665

Equivalent Patches: None

Patch Package Size: 110 KBytes

Installation Instructions:
    Please review all instructions and the Hewlett-Packard
    SupportLine User Guide or your Hewlett-Packard support terms
    and conditions for precautions, scope of license,
    restrictions, and limitation of liability and warranties,
    before installing this patch.
    ------------------------------------------------------------
    1. Back up your system before installing a patch.

    2. Login as root.

    3. Copy the patch to the /tmp directory.

    4. Move to the /tmp directory and unshar the patch:

        cd /tmp
        sh PHKL_30033

    5. Run swinstall to install the patch:

        swinstall -x autoreboot=true -x patch_match_target=true \
            -s /tmp/PHKL_30033.depot

    By default swinstall will archive the original software in
    /var/adm/sw/save/PHKL_30033. If you do not wish to retain a
    copy of the original software, include the patch_save_files
    option in the swinstall command above:

        -x patch_save_files=false

    WARNING: If patch_save_files is false when a patch is
    installed, the patch cannot be deinstalled. Please be
    careful when using this feature.
    For future reference, the contents of the PHKL_30033.text
    file are available in the product readme:

        swlist -l product -a readme -d @ /tmp/PHKL_30033.depot

    To put this patch on a magnetic tape and install from the
    tape drive, use the command:

        dd if=/tmp/PHKL_30033.depot of=/dev/rmt/0m bs=2k

Special Installation Instructions: None
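After installation, the values listed under cksum(1) Output can be compared against the installed object files. The following is a minimal sketch and is not part of the patch documentation: verify_cksum is a hypothetical helper name, and the sketch assumes that the documented checksum and size apply to the object file as extracted from its archive with ar(1).

```shell
# verify_cksum EXPECTED_SUM EXPECTED_SIZE FILE
# Hypothetical helper (not shipped with the patch): succeeds only if
# cksum(1) reports the expected checksum and byte count for FILE.
# Returns 2 if FILE cannot be read.
verify_cksum() {
    expected_sum=$1
    expected_size=$2
    file=$3
    # Reading from stdin makes cksum print "<checksum> <bytes>" first,
    # so the two fields can be split on the first spaces.
    actual=$(cksum < "$file") || return 2
    actual_sum=${actual%% *}
    actual_size=${actual#* }
    actual_size=${actual_size%% *}
    [ "$actual_sum" = "$expected_sum" ] && [ "$actual_size" = "$expected_size" ]
}
```

For example, on a system with the 32-bit fileset installed, the documented values for pm_swtch.o could be checked as follows (again assuming the member is extracted first):

    cd /tmp
    ar x /usr/conf/lib/libpm.a pm_swtch.o
    verify_cksum 2147077744 43316 pm_swtch.o && echo "pm_swtch.o matches"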