Patch Name: PHKL_2874

Patch Description: s700 8.07 Patch for NFS corruption/EACCESS/automount

This patch fixes data corruption and/or premature end-of-file conditions across NFS. The problem has been experienced by two customers in two different ways:

Case1: The customer was using the -async option on the NFS server. The client was doing sequential writes using the O_APPEND flag. The client was also running multiple biods. The O_APPEND forces the client to lseek to the end-of-file before processing the write. Therefore, if the end-of-file is incorrect, the client will overwrite sections of the file and the file will end up shorter than expected.

The root cause for this case was simply that the client was processing NFS write replies out-of-order. This was caused by the fact that the server was sending the replies too close to each other due to the use of -async.

A good workaround to this problem is to run with 0 biods or to run with 1 biod. Not so good workarounds include using O_SYNC along with O_APPEND or not using the -async option on the server.

Case2: The customer was using a Sun server running NFS version 4.2. The client was doing sequential writes with a random read executed once-in-a-while to verify the last 512 bytes written. The read was sometimes failing with 0 bytes returned, indicating end-of-file. (That is called a premature end-of-file.)

The root cause for this case was in the behavior of the Sun NFS 4.2 server. The Sun 4.2 server code returns stale file attributes whenever a duplicate write request is made. The HP client was receiving incorrect information from the server, specifically, the location of the end-of-file (i.e. the file size was too small). Note that this would happen only on the reply to a duplicate write request. Since the read was happening on the last 512 bytes, the HP client would believe the end-of-file coming back from the Sun server (as a result of a duplicate write) and return 0 bytes to the user program.
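To illustrate the Case1 root cause, the following C sketch replays write replies in the order the client processes them and shows how the cached end-of-file can move backward. It is a simplified model, not HP-UX kernel code: the struct write_reply type, the functions cached_size_after() and bytes_overwritten(), and the "last processed reply wins" update rule are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of one NFS write reply: the file size the
 * server reported after applying that write. Not an HP-UX structure. */
struct write_reply {
    long size_after_write;
};

/* Assumed naive client behaviour: whichever reply is processed last sets
 * the cached file size, even if it is an older (smaller) size. */
long cached_size_after(const struct write_reply *replies, size_t n)
{
    long cached = 0;
    size_t i;

    assert(replies != NULL || n == 0);
    for (i = 0; i < n; i++)
        cached = replies[i].size_after_write;
    return cached;
}

/* With O_APPEND the client seeks to its cached end-of-file before writing,
 * so the next write starts at cached_size. Returns how many bytes of data
 * already on the server that write would overwrite. */
long bytes_overwritten(long cached_size, long true_size, long write_len)
{
    long gap = true_size - cached_size;

    if (gap <= 0)
        return 0;                       /* cached EOF is not stale */
    return write_len < gap ? write_len : gap;
}
```

For three 512-byte appends whose replies are processed in the order 1, 3, 2, cached_size_after() returns 1024 while the server holds 1536 bytes. The next 512-byte append then starts at offset 1024 and overwrites the third record, so the file ends up 512 bytes shorter than expected.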
The fix for these two problems was to force the HP client code to double check with the server BEFORE truncating the file size. Essentially, if a write reply comes back with a file size that is smaller than expected, the client will go across-the-wire with a getattr call to double check. If the getattr returns the smaller file size again, then the HP client assumes the file has honestly been truncated.

==================================================================

PHKL_2475:

This patch fixes a bug that occurs in an NFS server when it receives a duplicate CREATE request. The bug is that this duplicate request may fail with EACCES even though the original request succeeded. This patch allows the NFS server to properly process the duplicate CREATE request. The corresponding 9.01/S700 patch is PHKL_2476.

A typical scenario where this problem may occur is a bunch of NFS clients working very heavily (> 100 NFS requests per second; see nfsstat(1M)) on a single server. One customer experienced the problem when running lots of rm/cp/mv commands; once in a while a cp or mv command would fail with "permission denied", which translates into an EACCES returned from a creat system call.

The problem is caused by the following sequence of events:
=========================================================

   o The client generates a CREATE request with 0000 permissions.
   o The server performs the CREATE operation and sends a reply.
   o The reply is dropped and never received by the client.
   o The client generates a second/duplicate CREATE request.
   o The server attempts to do a second CREATE which sets the u.u_error flag to EACCES.
   o The server identifies the second CREATE as a duplicate request.
   o The server fails to reset the u.u_error flag to 0; this is key!
   o The server performs a LOOKUP operation to return the file id.
   o The entry is not in the directory name lookup cache (dnlc) which causes the server to invoke native file system code.
   o The native file system code sees the EACCES in u.u_error and returns immediately.

The fix was simply to clear the u.u_error flag after detecting a duplicate request and before performing the LOOKUP operation.

PHKL_1785: Fixes NFS data loss.

This patch fixes a problem where some writes to an NFS file can be lost if another process is stat(2)'ing the file. The process writing the file must also be doing an lseek(2) to the end of the file between writes. This combination is likely to cause the problem. It is also possible that multiple processes stat()'ing the file while one is writing it could cause the problem.

PHKL_1602: All NFS kernel patches.

This patch contains all the kernel patches for NFS. It is designed to make it easier for customers to get all the patches at once. There are seven problems fixed with this patch that were not fixed with PHKL_1102, the previous NFS super KERNEL patch. They are the first seven problems described below.

This patch was originally released with a bug in nfs_server.o that could cause spurious error messages of "rfs_readdir: bad directory: mangled entry." This would rarely occur and was not a real error. This patch has been updated to fix this bug, and you should make sure that the what strings in your kernel match the one given below for nfs_server.o.

This patch fixes a problem where NFS file systems may be incorrectly marked as "ignore" in /etc/mnttab on diskless systems. This would cause commands like bdf to not report disk space for the file systems. The "ignore" designation is only used by the automounter, but sometimes a diskless client would mark a file system as "ignore" in /etc/mnttab. The problem would occur whether or not the automounter was running on the system. Also, if the automounter was running on the system, file systems may not have been marked "ignore" when they should have been. This patch fixes both symptoms of the problem.

This patch allows the automounter to work correctly on diskless clusters.
Without this patch the clients cannot access the automounted file systems which are configured as direct-mapped mounts. You should also install patch PHCO_1527 to fix a booting problem when the automounter is used in a diskless cluster.

The patch lowers some timeout values that are used to control how often the lock manager retries remote locks. Without this patch, retries will take 15 seconds, which can be very painful.

The patch also fixes a memory leak that can occur using NFS. The amount of memory available to the system can be bled off until there is not enough left to execute processes. This only occurs if a non-HP machine is an NFS client.

The patch fixes an infinite loop that occurs if NFS tries to read a mangled directory entry. In the case of a readdir(2) system call, if the directory length is 0, the kernel can loop infinitely, making the processor unavailable to any process. The processor will still respond to interrupts, so the machine will respond to ping(1M), but it is essentially locked up. Note that readdir() is used by the ls(1) command, so just using ls over NFS can cause the hang.

This patch also fixes a problem where a process that opens an NFS file and then reads and writes to it in a loop runs slowly. The problem was that every read after a write on the same file was causing a flush to the server.

Without this patch, automount will not clean up direct-mapped mount points correctly when it is killed. This can cause processes to hang if they access these mount points. This was originally in PHKL_0836. This fix is in HP-UX 9.0.

The fix in patch PHKL_1374 is included in this patch since it was mistakenly made so that applying PHKL_1374 would undo part of PHKL_1102 or vice-versa. It fixes a rare problem where a full or nearly full disk that has a fragment size greater than 1k can cause a panic with the message "freeing free frag".
Only SDS disk arrays have larger blocksizes by default, but other filesystems can have large blocksizes if created using options to makefs or newfs. This fix is in HP-UX 9.0.

This patch also fixes a problem where using the automounter can cause processes to hang if too many processes try to talk to the automounter daemon at the same time. The processes would use all the CLIENT structures, but the automounter needed one to service the process's request so it would sleep waiting for one to be freed, while the processes would hold the CLIENT structures until the automounter responded. Classic deadlock. This patch fixes the problem by increasing the number of CLIENT structures from 6 to 100. This was originally in PHKL_0876. This fix is in HP-UX 9.0.

A problem is fixed where making a directory, then mounting an NFS file system on that directory with invalid options, acts like it succeeds but really doesn't. An example of an invalid option is rsize=-1. This fix is in HP-UX 9.0.

Finally, this patch also fixes a problem where clients may keep stale data indefinitely. If a file on the NFS server is updated within a second of the last access by an NFS client, the client may keep the stale data indefinitely. This fix is in HP-UX 9.0.

The NFS Problem on HP-UX:
========================

After the first access, the NFS client kernel has the file attributes and file data in its cache. In a subsequent access, the kernel client code will:

   - go across the wire to refresh the file attributes
   - compare the new modification time (coming over the wire), using the seconds only, to the previous modification time (stored in the client's rnode).
        o if the seconds are the same (namely, the server managed to modify the file within a second of the client's last access), the client kernel code will NOT invalidate the file data in its buffer cache and will therefore keep the stale file data in its buffer cache indefinitely.

How come Sun's implementation works?
===================================

The Sun code compares seconds and microseconds; it's very unlikely that the server could modify the file within a microsecond of the client's last access.

Why is HP-UX different?
======================

The system call utime(2) allows user level programs to change the modification time for any particular file. Unfortunately, this system call takes time_t as an argument (HP-UX/POSIX/Bell), not a struct timeval (see ). The time_t limitation limits the granularity of the time stamp to seconds; consequently, utime(2) resets the microseconds field to 0. (NOTE: Sun's implementation of utime(2) is BSD-based and does take a struct timeval as an argument.)

The HP-UX NFS client code change has a comment stating that "We've had instances of things like fbackup(1), ftio(1), file(1) etc. zero out the microseconds field and cause an unexpected "killed on text modification error" even though the contents of the file had not really changed."

In other words, with the original Sun code, if a backup utility on the server uses utime(2) on an executable that is running on a client, the client kernel code will kill the process because it thinks that the executable has been modified on the server. The change for HP-UX was made to eliminate this problem for executables. In essence, the HP-UX client code ignores differences in the microseconds fields, which, unfortunately, creates the problem we are now patching.

PHKL_0736: system panics with bus error in nfs kernel code.

The system panics with a bus error in the kernel routine do_bio().
The file fixed was nfs/nfs_vnops.c

PHKL_0836: automount fails to clean-up properly when killed.

Without this patch, automount will not clean up direct-mapped mount points correctly when it is killed. This can cause processes to hang if they access these mount points.
The file fixed was nfs/nfs_vnops.c

PHKL_0876: Processes hang when using NFS automounter

This patch fixes a problem where using the automounter can cause processes to hang if too many processes try to talk to the automounter daemon at the same time. The processes would use all the CLIENT structures, but the automounter needed one to service the process's request so it would sleep waiting for one to be freed, while the processes would hold the CLIENT structures until the automounter responded. Classic deadlock. This patch fixes the problem by increasing the number of CLIENT structures from 6 to 100.
The file fixed was nfs/nfs_subr.o

PHKL_0942: automounter patches. "Data segmentation fault" in do_bio.

This patch is a compilation of all previous patches that affect the automounter. It is designed to make it easier for customers.

PHKL_1102: automounter patches. NFS patches.

This patch is a compilation of all previous patches that affect the automounter and NFS. It is designed to make it easier for customers to get all the patches at once.

PHKL_1374: Fixes freeing free frag for systems with > 1k frag sizes

Path Name: /hp-ux_patches/s700/8.X/PHKL_2874

Effective Date: 930825

Patch Files:
    /system/PHKL_2874/new/dux_lookup.o
    /system/PHKL_2874/new/dux_mount.o
    /system/PHKL_2874/new/klm_lckmgr.o
    /system/PHKL_2874/new/nfs_server.o
    /system/PHKL_2874/new/nfs_subr.o
    /system/PHKL_2874/new/nfs_vnops.o
    /system/PHKL_2874/new/ufs_alloc.o

SR/APR#: 5003094326

"what" string/timestamp:
    dux_lookup.o:
        PATCH_8.07: dux_lookup.o 1.7.61.4 92/10/21 PHKL_1602
    dux_mount.o:
        PATCH_8.07: dux_mount.o 1.8.61.5 92/10/21 PHKL_1602
    klm_lckmgr.o:
        PATCH_8.07: klm_lckmgr.o 1.4.61.3 92/10/21 PHKL_1602
    nfs_server.o:
        nfs_server.c $Date: 93/04/28 16:22:16 $ $Revision: 1.17.77.4 $ PATCH_8.07 (PHKL_2475)
    nfs_subr.o:
        PATCH_8.07: nfs_subr.o 1.12.61.2 92/03/02 PHKL_0942 PHKL_1102 PHKL_1602
    nfs_vnops.o:
        PATCH_8.07 nfs_vnops.c Revision: 1.17.77.11 Date: 93/08/25 18:37:32 PHKL_2874
    ufs_alloc.o:
        PATCH_8.07: ufs_alloc.o 1.29.61.2 92/11/16 PHKL_0942 PHKL_1102 PHKL_1374 PHKL_1602

"sum" output:
    8835 25 dux_lookup.o
    50705 34 dux_mount.o
    35170 11 klm_lckmgr.o
    50502 46 nfs_server.o
    50480 28 nfs_subr.o
    9146 49 nfs_vnops.o
    43189 34 ufs_alloc.o

Dependencies: None.

Supersedes: PHKL_0736 PHKL_0836 PHKL_0876 PHKL_0942 PHKL_1102 PHKL_1374 PHKL_1602 PHKL_1785 PHKL_2475

Patch Package Size: 185 Kbytes

Installation Instructions:

Please review all instructions and the Hewlett-Packard SupportLine User Guide or your Hewlett-Packard support terms and conditions for precautions, scope of license, restrictions, and, limitation of liability and warranties, before installing this patch.

Note: Please back up your system before you patch.

---------------------------------------------------------------------------

After getting the patch onto your machine, unshar the patch (sh PHKL_2874). To install this patch do the following:

1) Run /etc/update (Note: you must be logged in as root to update a system).

2) Once in the update "Main Menu" move the highlighted line to "Change Source or Destination ->" and press "Return" or "Select Item".

3) Make sure the highlighted item in the "Change Source or Destination" window is "From Tape Device to Local System ...", then press "Return" or "Select Item".

4) You should now be in the "From Tape Device to Local System" window. Change the "Source: /dev/rmt/0m" to "Source: /tmp/PHKL_2874.updt" (this assumes that you are in the /tmp directory where PHKL_2874.updt has been placed). Note: You must enter the complete path name.

5) Press "Done".

6) From here on follow the standard directions for update.

The customized script that update runs will move the original software to /system/PHKL_2874/orig. HP recommends keeping this software there in order to recover from any potential problems. It is also recommended that you move the PHKL_2874.text file to /system/PHKL_2874 to be retained for future reference.

If you wish to put this patch on a magnetic tape and update from the tape drive, dd a copy of the patch to the tape drive. As an example the following will create a copy of the patch that update can read:

    dd if=PHKL_2874.updt of=/dev/rmt/0m bs=2048