Patch Name: PHKL_30972

Patch Description: s700_800 11.00 hang in aio_shutdown_fd_begin();EVP

Creation Date: 04/06/07

Post Date: 04/06/14

Hardware Platforms - OS Releases:
    s700: 11.00
    s800: 11.00

Products: N/A

Filesets:
    OS-Core.CORE-KRN,fr=B.11.00,fa=HP-UX_B.11.00_32/64,v=HP
    ProgSupport.C-INC,fr=B.11.00,fa=HP-UX_B.11.00_32/64,v=HP
    OS-Core.CORE2-KRN,fr=B.11.00,fa=HP-UX_B.11.00_32,v=HP
    OS-Core.CORE2-KRN,fr=B.11.00,fa=HP-UX_B.11.00_64,v=HP

Automatic Reboot?: Yes

Status: General Release

Critical: Yes
    PHKL_30972: PANIC
    PHKL_30372: HANG
    PHKL_22264: OTHER
        This is a performance improvement to support large numbers
        of outstanding AIO requests (aio_read or aio_write). With
        this patch we can see an order-of-magnitude decrease in
        system time for AIO.
    PHKL_20991: PANIC
    PHKL_29385: CORRUPTION
    PHKL_22840: PANIC
    PHKL_22209: PANIC
    PHKL_22146: PANIC
    PHKL_21857: PANIC

Category Tags:
    defect_repair enhancement general_release critical panic
    halts_system corruption

Path Name: /hp-ux_patches/s700_800/11.X/PHKL_30972

Symptoms:
    PHKL_30972:
    ( SR:8606345444 CR:JAGaf06294 )
    Panic occurs during an ioctl(2) or close on an event port
    device. The stack traces can look like:

        soo_select2+0x8
        unp_poll_handler+0x2c
        so_poll_switch+0x4c
        evp_dp_poll+0x1d8
        evp_ioctl+0x80
        spec_ioctl+0xb0
        vno_ioctl+0x8c
        ioctl+0x138

    or

        so_eventreg+0x78
        evp_dereg_objhdr+0x30
        evp_close+0x108
        call_open_close+0x234
        closed+0x98
        spec_close+0x48
        vn_close+0x40
        vno_close+0x20
        closef+0x60

    PHKL_30372:
    ( SR:8606244386 CR:JAGae10873 )
    A multithreaded process may hang with one thread inside exit
    and another thread continuously looping inside the AIO code.
    The stack trace of the thread sleeping in exit code can look
    like:

        _sleep+0x214
        thread_halt_wait+0x198
        for_specific_threads+0xc0
        process_wide_halt_wait+0x2c
        exit+0x3a0

    The stack trace of the thread looping inside the AIO code can
    look like:

        sleep_spinunlock+0x60
        aio_shutdown_fd_begin+0x160
        close+0x28

    PHKL_22264:
    ( SR: 8606146004 CR: JAGad15340 )
    ( SR: 8606126859 CR: JAGac59700 )
    Large numbers of outstanding AIO requests (aio_read or
    aio_write) show exponential growth in system time and
    seriously degrade performance.

    ( SR: 8606105265 CR: JAGab73237 )
    ( SR: 8606105049 CR: JAGab72870 )
    ( SR: 8606123979 CR: JAGac39339 )
    ( SR: 8606131023 CR: JAGad00181 )
    When an application using POSIX Async I/O is killed, the
    system could sometimes panic with a data page fault within the
    routine aio_rw_child_thread().

    ( SR: 8606139114 CR: JAGad08401 )
    Applications using POSIX Async I/O could panic with a data
    page fault within the routine aio_fsync() in some memory
    shortage cases.

    ( SR: 8606141682 CR: JAGad11042 )
    ( SR: 8606141293 CR: JAGad10654 )
    ( SR: 8606144570 CR: JAGad13910 )
    ( SR: 8606145187 CR: JAGad14525 )
    Applications using POSIX Async I/O could panic with a data
    page fault within the routine aio_rw_child_thread() when one
    thread tries to create a new aio request for a file while
    another thread is closing the same file.

        panic+0x100
        report_trap_or_int_and_panic+0x94
        trap+0x1198
        nokgdb+0x8
        IsAProc+0xc
        aio_rw_child_thread+0x3c
        kthread_daemon_startup+0x24
        kthread_daemon_startup+0x0
        aio_rw_child_thread+0x3c

    ( SR: 8606142810 CR: JAGad12159 )
    ( SR: 8606146666 CR: JAGad16009 )
    ( SR: 8606147082 CR: JAGad16425 )
    ( SR: 8606147022 CR: JAGad16365 )
    ( SR: 8606151982 CR: JAGad21321 )
    Applications using POSIX Async I/O could panic with a data
    page fault within the routine close() when a user
    coincidentally kills the process while a thread is closing a
    file.

    PHKL_20991:
    ( SR: 8606109916 CR: JAGab82617 )
    Applications using POSIX Async I/O could panic with a data
    page fault within the routine aio_rw_child_thread() in some
    circumstances.

    PHKL_29385:
    ( SR:8606305253 CR:JAGae68301 )
    In a multi-threaded process, a race condition may occur
    between a thread calling dup2(2) and a thread calling open(2),
    which can result in data loss or file corruption.

    PHKL_25613:
    ( SR:8606195573 CR:JAGad64777 )
    This patch is a member of a set of patches needed to enable
    the eventport pseudo driver feature delivered in PHKL_24064.
    The eventport driver patch specifies the full set of required
    patches for this new feature. If the eventport pseudo driver
    patch (or superseding patch) is not installed, this change
    will have no impact on your system.

    PHKL_22840:
    ( SR: 8606165509 DTS: JAGad34802 )
    Multithreaded applications may panic the system after doing
    select, poll or other system calls, because of a bad
    file/socket pointer. The stack of the panic thread might look
    like:

        panic+0x14
        report_trap_or_int_and_panic+0x4c
        trap+0xe9c
        $RDB_trap_patch+0x38
        select+0x36c
        syscall+0x750
        $syscallrtn+0x0

        panic+0x14
        report_trap_or_int_and_panic+0x84
        trap+0xd9c
        nokgdb+0x8
        soo_select2+0x14
        soo_select+0x14
        pollscan+0xa8
        poll+0x104
        syscall+0x480
        $syscallrtn+0x0

    PHKL_22209:
    ( SR: 8606144099 DTS: JAGad13432 )
    Multithreaded applications may panic the system after doing a
    fork(2). The stack of the panic thread might look like this:

        panic+0x14
        report_trap_or_int_and_panic+0x80
        trap+0xdb8
        nokgdb+0x8
        vn_close+0x10
        vno_close+0x20
        closef+0x68
        close+0x48
        syscall+0x480
        $syscallrtn+0x0

    PHKL_22146:
    ( SR: 8606144971 CR: JAGad14309 )
    A multiprocessor system running a multithreaded application
    panics due to spinlock contention. This happens in an
    environment where heavy file system processing is done over
    the net. The crux of the problem is that the application is
    attempting to close a file twice.

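    The double close named in the PHKL_22146 symptom can also be
    avoided on the application side. The following is a minimal
    sketch in C, not the kernel change delivered by this patch.
    The helper name close_once() is hypothetical and does not
    exist in HP-UX; the sketch assumes that a single thread owns
    the descriptor variable, so shared variables would need their
    own locking.

```c
#include <unistd.h>

/*
 * Hypothetical helper (not an HP-UX interface): close the descriptor
 * stored in *fdp at most once. After the first call the caller's copy
 * is set to -1, so a later call does not issue a second close(2) on a
 * stale descriptor number.
 *
 * Returns the result of close(2), or 0 if *fdp was already invalid.
 */
static int close_once(int *fdp)
{
    int rc;

    if (*fdp < 0)
        return 0;       /* already closed through this variable */

    rc = close(*fdp);
    *fdp = -1;          /* invalidate even on error; do not retry */
    return rc;
}
```

    A caller would keep the descriptor in a variable, pass its
    address to close_once() on every cleanup path, and rely on the
    -1 value to make repeated cleanup calls harmless.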
    PHKL_21857:
    ( SR: 8606141690 CR: JAGad11050 )
    Currently kernel threads that allocate file descriptors are
    prevented access to that file descriptor until the open is
    complete. Certain file types -- such as sockets -- that have a
    delayed opening mechanism require, for multithreaded
    applications, that the opening thread have access to the file
    descriptor during this opening transition state and that all
    other threads of the process be prevented access, in order for
    syscalls such as accept(2) to work correctly.

    PHKL_21355:
    ( SR: 8606132618 CR: JAGad01767 )
    The Praesidium IDS/9000 product requires this patch in order
    to run. This patch has no impact on systems without the
    Praesidium IDS/9000 product installed and enabled.

Defect Description:
    PHKL_30972:
    ( SR:8606345444 CR:JAGaf06294 )
    The defect can happen in the following two scenarios:
    1) The registration of a file descriptor with an event port
       was not handled properly in certain cases.
    2) A race condition exists between registration and
       deregistration of a file descriptor with a given event
       port.

    Resolution:
    1) The implementation was modified to handle all possible
       registration paths.
    2) The race condition was closed.

    PHKL_30372:
    ( SR:8606244386 CR:JAGae10873 )
    The exiting thread was waiting for the thread looping inside
    the AIO code to finish. The thread inside the AIO code calls
    sleep to let the AIO operation finish. Because the process is
    exiting, the sleep is immediately interrupted, and the thread
    loops continuously, calling sleep and being interrupted.

    Resolution:
    The thread which was looping inside the AIO code now sets
    EINTR after its sleep is interrupted, and returns.

    PHKL_22264:
    ( SR: 8606146004 CR: JAGad15340 )
    ( SR: 8606126859 CR: JAGac59700 )
    Kernel profiling shows locating aio requests as a bottleneck.
    A small hash table is needed there to improve aio performance.

    Resolution:
    A hash table has been added for aio requests, and
    corresponding changes have been made in the AIO data
    structures and internals.

    ( SR: 8606105265 CR: JAGab73237 )
    ( SR: 8606105049 CR: JAGab72870 )
    ( SR: 8606123979 CR: JAGac39339 )
    ( SR: 8606131023 CR: JAGad00181 )
    If an AIO process is interrupted (by killing the process)
    while closing down its file descriptors, there is a
    possibility that it will free resources associated with that
    file descriptor before all threads have completed, which will
    lead to a data page fault.

    Resolution:
    The AIO shutdown routine has been modified to recheck the file
    descriptor state after being interrupted to ensure there are
    no outstanding references before releasing resources.

    ( SR: 8606139114 CR: JAGad08401 )
    If the creation of the aio daemon thread fails (because of a
    memory shortage), the request's aio sync queue should still be
    cleaned up before the request is destroyed. Otherwise, a data
    page fault would result.

    Resolution:
    The AIO sync routine has been modified. The removal of the
    incomplete sync request from the aio sync queue is skipped
    only when the aio sync queue is null; the request is then
    destroyed.

    ( SR: 8606141682 CR: JAGad11042 )
    ( SR: 8606141293 CR: JAGad10654 )
    ( SR: 8606144570 CR: JAGad13910 )
    ( SR: 8606145187 CR: JAGad14525 )
    If an aio read/write in one thread tries to create a new aio
    request while another thread is shutting down aio requests by
    calling close(), there is a race between them which could
    leave the aio request queue inconsistent. This could cause a
    data page fault.

    Resolution:
    Introduce a file pointer and use a spin lock to prevent other
    threads from creating new aio requests until the aio shutdown
    has completed.

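    The locking pattern described in the resolution above can be
    illustrated with a user-space analogue. This is only a sketch
    under stated assumptions: it uses a POSIX mutex and condition
    variable in place of the kernel spin lock, and the structure
    and function names (file_aio_state, sketch_request_begin,
    sketch_request_end, sketch_shutdown) are hypothetical and are
    not taken from the HP-UX kernel source.

```c
#include <pthread.h>

/* Hypothetical per-file bookkeeping; all names are illustrative. */
struct file_aio_state {
    pthread_mutex_t lock;          /* stands in for the kernel spin lock */
    pthread_cond_t  drained;       /* signalled when outstanding hits 0  */
    int             shutting_down; /* set once close-time shutdown began */
    int             outstanding;   /* requests accepted but not finished */
};

#define FILE_AIO_STATE_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 }

/*
 * Called before queuing a new request. Returns 0 if the request may be
 * created, or -1 if a shutdown is already in progress for this file.
 */
static int sketch_request_begin(struct file_aio_state *st)
{
    int rc = -1;

    pthread_mutex_lock(&st->lock);
    if (!st->shutting_down) {
        st->outstanding++;
        rc = 0;
    }
    pthread_mutex_unlock(&st->lock);
    return rc;
}

/* Called when a previously accepted request has completed. */
static void sketch_request_end(struct file_aio_state *st)
{
    pthread_mutex_lock(&st->lock);
    if (--st->outstanding == 0)
        pthread_cond_broadcast(&st->drained);
    pthread_mutex_unlock(&st->lock);
}

/*
 * Called from the closing thread: block new requests, then wait until
 * every accepted request has finished before resources are released.
 * In this sketch the flag stays set afterwards, because the file is
 * assumed to be gone once shutdown returns.
 */
static void sketch_shutdown(struct file_aio_state *st)
{
    pthread_mutex_lock(&st->lock);
    st->shutting_down = 1;
    while (st->outstanding > 0)
        pthread_cond_wait(&st->drained, &st->lock);
    pthread_mutex_unlock(&st->lock);
}
```

    Checking the flag and incrementing the counter under the same
    lock is what closes the window: a request is either counted
    before shutdown starts, and therefore waited for, or it is
    refused.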
    ( SR: 8606142810 CR: JAGad12159 )
    ( SR: 8606146666 CR: JAGad16009 )
    ( SR: 8606147082 CR: JAGad16425 )
    ( SR: 8606147022 CR: JAGad16365 )
    ( SR: 8606151982 CR: JAGad21321 )
    If exit() and close() both shut down aio operations (when a
    user coincidentally kills the process while a thread is
    closing a file), there could be a race between them for the
    file's aio shutting down flag. Based on the value of this
    flag, the cleaning up of aio requests should wait until the
    completion of all aio requests for this file. Otherwise, it
    would cause a data page fault.

    Resolution:
    The AIO shutdown routine has been modified so that the aio
    shutdown function waits until the completion of all aio
    requests for this file, even when the file's aio shutting
    down flag is already set.

    PHKL_20991:
    ( SR: 8606109916 CR: JAGab82617 )
    If an AIO process is interrupted while closing down its file
    descriptors, there is a possibility that it will free
    resources associated with that file descriptor before all
    threads have completed, which will lead to a data page fault.

    Resolution:
    The AIO shutdown routine has been modified to recheck the file
    descriptor state after being interrupted to ensure there are
    no outstanding references before releasing resources.

    PHKL_29385:
    ( SR:8606305253 CR:JAGae68301 )
    During the dup2() processing, the destination fd must be
    closed if already opened, and then reallocated in order to
    duplicate the source fd. If an open() call from another thread
    occurs between the close of the fd and the reallocation of the
    fd, the destination fd may incorrectly be used by both the
    open() and the dup2() call, subsequently resulting in data
    loss or file corruption.

    Resolution:
    The dup2() routine was changed to free and allocate the file
    descriptor atomically.

    PHKL_25613:
    ( SR:8606195573 CR:JAGad64777 )
    This change contains minor enhancements required to support
    the eventport feature.

    Resolution:
    Enhancements added include a file descriptor subsystem
    interface used by the eventport driver and respective
    eventport driver callbacks.

    PHKL_22840:
    ( SR: 8606165509 DTS: JAGad34802 )
    There are two different panics involved.
    1. One problem is a race condition between two threads in a
       process. One thread is allocating a user file descriptor
       while the other is trying to access this same file
       descriptor.
    2. The other problem is that the getf/putf scheme does not
       support multiple recursive getf/putf pairs in a single
       thread. There is a race condition between two threads in a
       process when one thread is in multiple recursive getf/putf
       calls for a file descriptor while the other is also trying
       to get this file descriptor.

    Resolution:
    1. Before accessing a file descriptor with getf(), the file
       descriptor thread lock should first be obtained to exclude
       other operations on this file descriptor by other threads.
    2. Add a condition statement in putf() to ensure that the
       protection for the file descriptor is released only when
       no one is accessing it.

    PHKL_22209:
    ( SR: 8606144099 DTS: JAGad13432 )
    During a fork from a multithreaded process, some file
    descriptors may be copied from the parent to the child without
    having a hold (i.e., incrementing the reference count) on the
    file for the child. These files may become inactive during the
    child's lifetime, and thereafter referencing the fields of the
    file by the child results in a Data Page Fault.

    Resolution:
    Modified fork code to put a hold (increment the reference
    count) on all applicable files while setting up the child
    process. The code which manages the highest file descriptor
    count (which fork relies on) has been corrected so that it
    always reflects an accurate value with respect to the parent
    process.

    PHKL_22146:
    ( SR: 8606144971 CR: JAGad14309 )
    In a multiprocessor system one processor panics when it can't
    get a spinlock. This happens when one processor is executing
    falloc() and the other crfree().
    The panic occurs in crfree() because the application is trying
    to close the same file twice; in essence it is trying to close
    a non-existing file or 'fp' the second time.

    Resolution:
    Check the file credentials before they are freed by crfree()
    in the closef() function, which is used to close a file.

    PHKL_21857:
    ( SR: 8606141690 CR: JAGad11050 )
    Enhancement to the file descriptor handling code that gives
    multithreaded applications appropriate access to files that
    are in the process of being opened.

    Resolution:
    Set the fd_locker_tid field in the ufalloc() function so that
    the thread that did falloc() has access to the file descriptor
    until the open is complete, and in the interim no other thread
    is allowed access to the file descriptor.

    PHKL_21355:
    ( SR: 8606132618 CR: JAGad01767 )
    This patch is one of 16 patches (PHKL_21348-PHKL_21363)
    required by the Praesidium IDS/9000 product. These patches
    enable the collection and tracking of information from various
    system calls. Unless all of the enabling patches (or their
    successors) and the product are installed, and the product is
    enabled, this patch has no impact on the system.

    Resolution:
    This patch enables the gathering of information from the
    fcntl(), fstat(), and close() system calls.

Enhancement:
    No  (superseded patches contained enhancements)

    PHKL_30372:
    Enhancements were delivered in a patch this one has
    superseded. Please review the Defect Description text for more
    information.

    PHKL_29385:
    Enhancements were delivered in a patch this one has
    superseded. Please review the Defect Description text for more
    information.

SR:
    8606109916 8606132618 8606141690 8606144099 8606144971
    8606146004 8606165509 8606195573 8606244386 8606305253
    8606345444

Patch Files:
    OS-Core.CORE-KRN,fr=B.11.00,fa=HP-UX_B.11.00_32/64,v=HP:
    /usr/conf/h/aio.h

    ProgSupport.C-INC,fr=B.11.00,fa=HP-UX_B.11.00_32/64,v=HP:
    /usr/include/sys/aio.h

    OS-Core.CORE2-KRN,fr=B.11.00,fa=HP-UX_B.11.00_32,v=HP:
    /usr/conf/lib/libhp-ux.a(aio_subr.o)
    /usr/conf/lib/libhp-ux.a(aio_syscall.o)
    /usr/conf/lib/libhp-ux.a(kern_dscrp.o)

    OS-Core.CORE2-KRN,fr=B.11.00,fa=HP-UX_B.11.00_64,v=HP:
    /usr/conf/lib/libhp-ux.a(aio_subr.o)
    /usr/conf/lib/libhp-ux.a(aio_syscall.o)
    /usr/conf/lib/libhp-ux.a(kern_dscrp.o)

what(1) Output:
    OS-Core.CORE-KRN,fr=B.11.00,fa=HP-UX_B.11.00_32/64,v=HP:
    /usr/conf/h/aio.h:
        aio.h $Date: 2000/08/22 08:19:46 $Revision: r11ros/2
            PATCH_11.00 (PHKL_22264)

    ProgSupport.C-INC,fr=B.11.00,fa=HP-UX_B.11.00_32/64,v=HP:
    /usr/include/sys/aio.h:
        aio.h $Date: 2000/08/22 08:19:46 $Revision: r11ros/2
            PATCH_11.00 (PHKL_22264)

    OS-Core.CORE2-KRN,fr=B.11.00,fa=HP-UX_B.11.00_32,v=HP:
    /usr/conf/lib/libhp-ux.a(aio_subr.o):
        aio_subr.c $Date: 2004/01/27 01:39:00 $Revision: r11ros/11
            PATCH_11.00 (PHKL_30372)
    /usr/conf/lib/libhp-ux.a(aio_syscall.o):
        aio_syscall.c $Date: 2000/08/22 08:19:46 $Revision: r11ros/8
            PATCH_11.00 (PHKL_22264)
    /usr/conf/lib/libhp-ux.a(kern_dscrp.o):
        kern_dscrp.c $Date: 2004/05/25 08:52:57 $Revision: r11ros/19
            PATCH_11.00 (PHKL_30972)

    OS-Core.CORE2-KRN,fr=B.11.00,fa=HP-UX_B.11.00_64,v=HP:
    /usr/conf/lib/libhp-ux.a(aio_subr.o):
        aio_subr.c $Date: 2004/01/27 01:39:00 $Revision: r11ros/11
            PATCH_11.00 (PHKL_30372)
    /usr/conf/lib/libhp-ux.a(aio_syscall.o):
        aio_syscall.c $Date: 2000/08/22 08:19:46 $Revision: r11ros/8
            PATCH_11.00 (PHKL_22264)
    /usr/conf/lib/libhp-ux.a(kern_dscrp.o):
        kern_dscrp.c $Date: 2004/05/25 08:52:57 $Revision: r11ros/19
            PATCH_11.00 (PHKL_30972)

cksum(1) Output:
    OS-Core.CORE-KRN,fr=B.11.00,fa=HP-UX_B.11.00_32/64,v=HP:
    3378983998 13666 /usr/conf/h/aio.h

    ProgSupport.C-INC,fr=B.11.00,fa=HP-UX_B.11.00_32/64,v=HP:
    3378983998 13666 /usr/include/sys/aio.h

    OS-Core.CORE2-KRN,fr=B.11.00,fa=HP-UX_B.11.00_32,v=HP:
    3048312636 17792 /usr/conf/lib/libhp-ux.a(aio_subr.o)
    3678888404 13124 /usr/conf/lib/libhp-ux.a(aio_syscall.o)
    2703466521 19528 /usr/conf/lib/libhp-ux.a(kern_dscrp.o)

    OS-Core.CORE2-KRN,fr=B.11.00,fa=HP-UX_B.11.00_64,v=HP:
    3931524776 39856 /usr/conf/lib/libhp-ux.a(aio_subr.o)
    3699229736 28368 /usr/conf/lib/libhp-ux.a(aio_syscall.o)
    137233734 44264 /usr/conf/lib/libhp-ux.a(kern_dscrp.o)

Patch Conflicts: None

Patch Dependencies:
    s700: 11.00: PHKL_18543
    s800: 11.00: PHKL_18543

Hardware Dependencies: None

Other Dependencies:
    PHKL_30973 along with this patch provides the fix for the
    defects in the event port driver reported in JAGaf06294.

Supersedes:
    PHKL_29385 PHKL_25613 PHKL_22840 PHKL_22209 PHKL_22146
    PHKL_21857 PHKL_21355 PHKL_20991 PHKL_30372 PHKL_22264

Equivalent Patches:
    PHKL_30541:
    s700: 11.11
    s800: 11.11

Patch Package Size: 130 KBytes

Installation Instructions:
    Please review all instructions and the Hewlett-Packard
    SupportLine User Guide or your Hewlett-Packard support terms
    and conditions for precautions, scope of license,
    restrictions, and limitation of liability and warranties,
    before installing this patch.
    ------------------------------------------------------------
    1. Back up your system before installing a patch.

    2. Login as root.

    3. Copy the patch to the /tmp directory.

    4. Move to the /tmp directory and unshar the patch:

        cd /tmp
        sh PHKL_30972

    5. Run swinstall to install the patch:

        swinstall -x autoreboot=true -x patch_match_target=true \
            -s /tmp/PHKL_30972.depot

    By default swinstall will archive the original software in
    /var/adm/sw/save/PHKL_30972. If you do not wish to retain a
    copy of the original software, include the patch_save_files
    option in the swinstall command above:

        -x patch_save_files=false

    WARNING: If patch_save_files is false when a patch is
             installed, the patch cannot be deinstalled. Please
             be careful when using this feature.

    For future reference, the contents of the PHKL_30972.text file
    are available in the product readme:

        swlist -l product -a readme -d @ /tmp/PHKL_30972.depot

    To put this patch on a magnetic tape and install from the tape
    drive, use the command:

        dd if=/tmp/PHKL_30972.depot of=/dev/rmt/0m bs=2k

Special Installation Instructions: None