Patch Name: PHCO_23651

Patch Description: s700_800 11.00 fsck_vxfs(1M) cumulative patch

Creation Date: 01/03/21

Post Date: 01/03/23

Hardware Platforms - OS Releases:
    s700: 11.00
    s800: 11.00

Products: N/A

Filesets:
    JournalFS.VXFS-BASE-RUN,fr=B.11.00,fa=HP-UX_B.11.00_32/64,v=HP

Automatic Reboot?: No

Status: General Release

Critical:
    Yes
    PHCO_23651: CORRUPTION
    A significantly fragmented VxFS filesystem may be corrupted by fsck_vxfs(1M). Two consecutive invocations of fsck_vxfs(1M) can corrupt a directory.
    PHCO_22453: CORRUPTION OTHER
    fsck may incorrectly detect corruption in an LCT entry that is correct. Attempting to fix it will cause that entry to become corrupt.
    fsck may fail to recover a filesystem after a system hang or panic, and the filesystem will then fail to mount.
    PHCO_20882: CORRUPTION
    When invoked by extendfs(1M) while a VxFS file system is being extended, fsck(1M) may corrupt the file system. Subsequent invocations of fsck(1M) will fail to repair the file system, and the file system will fail to mount.
    PHCO_13411: OTHER
    This is a backward-compatibility issue. If you are using OmniStore (a DMAPI application), you will experience data corruption.
    PHCO_13377: OTHER
    The vxfs fsck command will sometimes fail to recover the file system with the message "no valid ILISTS for fset 999". After this, the file system cannot be mounted.

Category Tags:
    defect_repair general_release critical corruption

Path Name: /hp-ux_patches/s700_800/11.X/PHCO_23651

Symptoms:
    PHCO_23651:
    1. Data corruption can occur when fsck_vxfs(1M) is run on a significantly fragmented filesystem. During the fsck run, a message similar to the following may be displayed:

        fileset 999 iau 5 summary incorrect - fix? (ynq) y

    However, instead of fixing the IAU summary, fsck may overwrite an arbitrary block. Fsck then reports that everything has been fixed and allows the filesystem to be mounted.
    Unpredictable behavior may result from this error, depending on the data contained in the corrupted block.
    2. If fsck_vxfs(1M) is run twice in a row in log replay mode, it may corrupt directory entries. A full fsck will then be required, and the lost data must be restored manually from lost+found.

    PHCO_22453:
    1. fsck deletes user data stored in extended attribute inodes. For each file that has such attributes, a message is displayed:

        fileset 999 primary inode ???? has invalid attributes clear? (ynq)

    By default these are cleared.
    2. fsck incorrectly detects LCT corruption:

        pass0 - checking structural files
        pass1 - checking inode sanity and blocks
        pass2 - checking directory linkage
        pass3 - checking reference counts
        Fileset 999 LCT entries are incorrect, fix? (ynq)n
        pass4 - checking resource maps
        OK to clear log? (ynq)n

    3. After a system hang or panic, fsck may fail with the following error:

        log replay in progress
        vxfs fsck: file system does not contain a valid log
        vxfs fsck: cannot perform log replay
        pass0 - checking structural files
        pass1 - checking inode sanity and blocks
        vxfs fsck: fileset 999 primary inode xxxx reorg extent list overflow
        file system check failure, aborting ...

    PHCO_20882:
    fsck fails to repair a filesystem during an extendfs operation. This problem was introduced in PHCO_15037. fsck goes into an infinite loop as follows:

        # fsck -F vxfs -o /dev/vg01/rlvol2
        replay in progress
        file system is not clean, full fsck required
        pass0 - checking structural files
        vxfs fsck: structural inode 97 (Primary Ilist 1) failed validation
        clear? (ynq)y
        pass1 - checking inode sanity and blocks
        pass2 - checking directory linkage
        pass3 - checking reference counts
        rebuild structural files? (ynq)y
        pass0 - checking structural files
        vxfs fsck: structural inode 97 (Primary Ilist 1) failed validation
        clear? (ynq)y
        Pass2 ...

    fsck -m fails to check the sanity of an insane filesystem when run through the fs_wrapper.
    fsck fails to repair the filesystem after conversion to vxfs by vxconvert.

    PHCO_17556:
    fsck fails with the following message:

        fileset 1 primary inode 65 has invalid size (2190737408)
        fileset 1 primary inode 97 has invalid size (2190737408)
        no valid ILISTs for fileset 999
        file system check failure, aborting ...

    PHCO_17009:
    fsck fails with the following message:

        log replay in progress
        pass0 - checking structural file
        pass1 - checking inode sanity and blocks
        pass2 - checking directory linkage
        pass3 - checking reference counts
        vxfs fsck: invalid LCT extent

    PHCO_15037:
    Upon a system crash, a full fsck occurred on filesystems even though fsck reported that a log replay was not required and that the filesystem was clean.
    Fsck incorrectly marks IFQUO, IFILIST, IFIAU, and IFEMR inodes bad if they are sparse (have fewer blocks allocated to them than their size in blocks).
    The reorg pointer (rlp) is not properly incremented as it traverses the reorglist.
    If a resize operation is in progress when a system failure occurs, fsck cannot clean the filesystem.
    Add assertions to the HOLD_BP() and RELE_BP() macros so that the hold count on the buffer is sane.
    Fsck was not correctly validating inodes against the CUT value.
    An additional change for the modifications made in PHCO_13377 concerning failure of fsck to recover a file system, resulting in the message "no valid ILISTS for fset 999".

    PHCO_13411:
    OmniStore makes extensive use of extended attributes via the DMAPI interface. Its use of the vx_attr_direct structure requires that member ad_len be a 32-bit unsigned type. Its current 64-bit type has broken compatibility with the 10.20/10.10/10.01 releases. Currently, only OmniStore uses the vx_attr_direct structure. This patch defines a new structure, vx_attr_direct2, instead of modifying vx_inode.h. This patch requires the installation of PHKL_13387.

    PHCO_13377:
    The vxfs fsck command may complain about invalid LCT entries.
    The vxfs fsck command fails to recover a file system, resulting in the message "no valid ILISTS for fset 999".

Defect Description:
    PHCO_23651:
    1. fsck_vxfs(1M) does not handle odd-sized extents correctly. On a significantly fragmented filesystem, extent sizes can become odd, so that the IAU summary block does not immediately follow the IAU header. However, fsck assumes that the two are consecutive blocks and overwrites the block immediately following the IAU header.
    2. In log replay mode, fsck_vxfs attempts to optimize the directory structure by moving entries toward the beginning of a list when there is room. If it is interrupted and restarted in log replay mode, it is not aware of the moved entries and, following the intent log, writes to the old locations of the moved directory entries.
    Resolution:
    1. The code in fsck_vxfs(1M) was modified to correctly handle extents in which the IAU summary does not immediately follow the IAU header.
    2. Directory optimization during log replay was removed from fsck_vxfs(1M).

    PHCO_22453:
    1. fsck deletes extended attributes with user data (created by a third-party application). The problem is that a 32-bit value is passed to a 64-bit function without a proper cast.
    2. lct_check() calls iget() for each inode corresponding to an entry in the LCT buffer without holding the LCT buffer. Sometimes the LCT buffer is reused for inode data, after which fsck very quickly discovers "incorrect entries". If fsck is allowed to "fix" these entries, it actually corrupts the filesystem.
    3. The fsck failure "fileset 999 primary inode xxx reorg extent list overflow" was due to fsck having a limit of 1000 extents in a reorg inode and an incorrect algorithm for detecting inconsistencies between the reorg inode and the original inode. This fsck failure may occur after a system crash or hang. Two such documented system failures were fixed in PHKL_21941 and PHKL_22393.
    Resolution:
    1. Cast the variable to 64 bits.
    2. Put a hold on the LCT buffer while it is in use.
    3. Implemented a new algorithm for reorg inode checking.

    PHCO_20882:
    fsck fails to repair a filesystem during an extendfs operation. The bug is the use of an incorrect variable name when switching the org type to TYPED. This problem was introduced in PHCO_15037.
    fsck fails to validate structural inodes, which sends it into an infinite loop. The file system remains unmountable.
    fsck -m fails to detect an insane file system when it is run through the fs_wrapper, which is the normal way of running fsck. The problem occurred because of an uninitialized data structure.
    After upgrading from HFS to VxFS on a 128G+ system, fsck fails to fix the filesystem, resulting in an unmountable filesystem. The problem is an incorrect buffer offset calculation.
    Resolution:
    Corrected the variable name to fix the fsck failure during extendfs operations.
    Code was added to validate structural inodes and to fix corrupted ones.
    Code was added to initialize a data structure that was previously left uninitialized.
    A one-line change was implemented to correct the calculation of the offset into the buffer.

    PHCO_17556:
    fsck will fail on a filesystem with more than 8 million inodes and the largefiles option not set. For a filesystem to accommodate more than 8 million inodes, the structural ILIST file must be larger than 2GB. The fsck failure occurs during validation of the inode referencing the structural ILIST file: the inode's size field is greater than 2GB, and because the largefiles option is not set, an error flag is set.
    Resolution:
    This fix requires both an fsck command patch and a kernel patch. An additional test has been added to the fsck inode validation routine to check whether the inode references a structural file before setting an error condition when the size field is greater than 2GB and the largefiles option is not set. This patch requires the installation of kernel patches PHKL_17869 and PHKL_14764.

    PHCO_17009:
    fsck was incorrectly handling extents smaller than 8k.
    For Version 3 filesystems, variable-sized indirect extents are allowed. An extent smaller than 8k can occur because of filesystem fragmentation.

    PHCO_15037:
    Upon a system crash, a full fsck occurred on filesystems even though fsck reported that a log replay was not required and that the filesystem was clean. This occurred on filesystems tagged as "dusty". The filesystems were correctly tagged; however, fsck incorrectly marked them for a full fsck.
    Fsck incorrectly marks IFQUO, IFILIST, IFIAU, and IFEMR inodes bad if they are sparse (have fewer blocks allocated to them than their size in blocks). These inode types, along with regular inodes and inodes with IMMED organization, are allowed to be sparse.
    The reorg pointer (rlp) was not properly incremented as it traversed the reorglist.
    If a resize operation is in progress when a system failure occurs, fsck cannot clean the filesystem. Fsck will now complete (or undo, for pre-v3 filesystems) a pending resize operation when run with the -y command-line option.
    Add assertions to the HOLD_BP() and RELE_BP() macros so that the hold count on the buffer is sane.
    Fsck was not correctly validating inodes against the CUT value.
    An additional change for the modifications made in PHCO_13377 concerning failure of fsck to recover a file system, resulting in the message "no valid ILISTS for fset 999".

    PHCO_13411:
    The vx_attr_direct structure is defined in the vx_inode.h file. This structure is used by OmniStore. During one of numerous integration cycles with DFS, the ad_len member of the vx_attr_direct structure was redefined to a DFS-compatible data type, vxhyper_t (a 64-bit value). Unfortunately, the impact on the disk layout was overlooked. OmniStore makes extensive use of extended attributes via DMAPI, and most of those attributes are direct. The vx_attr_direct structure should keep ad_len as a 32-bit unsigned type in order to maintain backward compatibility with the 10.20/10.10/10.01 releases.
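    The layout break described for PHCO_13411 can be illustrated with a minimal C sketch. It assumes a simplified two-member record: ad_len is the only member name taken from the text, while ad_type and the struct names attr_direct_v32 and attr_direct_v64 are hypothetical. The real vx_attr_direct in vx_inode.h contains members not shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record with a 32-bit ad_len, as in the 10.x releases. */
struct attr_direct_v32 {
    uint32_t ad_type;   /* hypothetical member preceding ad_len */
    uint32_t ad_len;    /* 32-bit unsigned length */
};

/* The same record after ad_len is widened to a 64-bit type (vxhyper_t). */
struct attr_direct_v64 {
    uint32_t ad_type;
    uint64_t ad_len;    /* 64-bit length; the ABI may insert padding before it */
};

/* Returns nonzero if bytes written with one layout would be misread by
   code compiled with the other: the size or the ad_len offset differs. */
int layouts_incompatible(void)
{
    return sizeof(struct attr_direct_v32) != sizeof(struct attr_direct_v64)
        || offsetof(struct attr_direct_v32, ad_len)
           != offsetof(struct attr_direct_v64, ad_len);
}
```

    Depending on the compiler ABI, widening ad_len changes the structure size, the offset of ad_len, or both (for example, 8 versus 16 bytes where uint64_t is 8-byte aligned), so attribute records written by 10.x code would no longer be parsed correctly. Keeping a separate 32-bit definition, as the patch does with vx_attr_direct2, leaves the existing layout untouched.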
    PHCO_13377:
    The vxfs fsck command may complain about invalid LCT entries. It complained because the disk had LCT counts of 0 with the lcd_free bit set, while the incore copy of the table also had a count of 0 but without the free bit set. Since fsck never sets the free bit incore, if this is the only difference, the disk is correct.
    For every fileset, fsck calls lct_check, which checks the LCT for all filesets. lct_check should either process only one fileset, or fsck should call it only once.
    The vxfs fsck command fails to recover a file system, resulting in the message "no valid ILISTS for fset 999". The ilist for fset 999 has gone double indirect. While fileset 999 is being added, the primary and replica inodes for the fileset ilist are read into buffers in the buffer cache and then compared. Because of the double indirection, the extent maps of both inodes are compared by calling a function that uses the inodes in the buffer cache. One of the calls to bread() within that function does not pass a buffer to read the data into; consequently, bread() reads data into random memory, causing corruption. The fix is to pass the missing buffer argument to bread().
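    The LCT comparison rule from the first PHCO_13377 paragraph can be sketched as a C predicate. This is an illustration under stated assumptions: struct lct_entry, its count and flags fields, the LCD_FREE value, and the function name lct_entry_matches are hypothetical and do not reflect the actual VxFS LCT layout; only the idea of an lcd_free bit comes from the text.

```c
#include <stdbool.h>
#include <stdint.h>

#define LCD_FREE 0x1u   /* hypothetical bit standing in for lcd_free */

/* Hypothetical link count table entry. */
struct lct_entry {
    uint32_t count;     /* link count */
    uint32_t flags;     /* LCD_FREE may be set in the on-disk copy */
};

/* Returns true if the on-disk entry agrees with fsck's incore copy.
   fsck never sets the free bit incore, so a zero-count disk entry whose
   only difference is LCD_FREE must not be reported as incorrect. */
bool lct_entry_matches(const struct lct_entry *disk,
                       const struct lct_entry *incore)
{
    if (disk->count != incore->count)
        return false;
    if (disk->count == 0) {
        /* Ignore the free bit when comparing zero-count entries. */
        return (disk->flags & ~LCD_FREE) == (incore->flags & ~LCD_FREE);
    }
    /* Assumption: for nonzero counts, all flag bits must agree. */
    return disk->flags == incore->flags;
}
```

    Treating a nonzero-count entry with a differing free bit as a mismatch is an assumption of this sketch; the text only describes the zero-count case.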

SR:
    8606147578 5003459271 5003460253 8606125754 1653291369 1653276618 4701376574

Patch Files:
    JournalFS.VXFS-BASE-RUN,fr=B.11.00,fa=HP-UX_B.11.00_32/64,v=HP:
        /sbin/fs/vxfs/fsck

what(1) Output:
    JournalFS.VXFS-BASE-RUN,fr=B.11.00,fa=HP-UX_B.11.00_32/64,v=HP:
        /sbin/fs/vxfs/fsck:
            $ PATCH/11.00:PHCO_19491 Aug 9 1999 09:49:32 $
            PATCH_11_00: aggr.o attr.o bmap.o dir.o extent.o extern.o fset.o inode.o links.o lwrite.o machdep.o main.o olt.o readi.o replay.o subr.o subreplay.o super.o 01/03/21

cksum(1) Output:
    JournalFS.VXFS-BASE-RUN,fr=B.11.00,fa=HP-UX_B.11.00_32/64,v=HP:
        4242980044 491520 /sbin/fs/vxfs/fsck

Patch Conflicts: None

Patch Dependencies:
    s700: 11.00: PHKL_17869
    s800: 11.00: PHKL_17869

Hardware Dependencies: None

Other Dependencies: None

Supersedes:
    PHCO_13377 PHCO_13411 PHCO_15037 PHCO_17009 PHCO_17556 PHCO_20882 PHCO_22453

Equivalent Patches: None

Patch Package Size: 510 KBytes

Installation Instructions:
    Please review all instructions and the Hewlett-Packard SupportLine User Guide or your Hewlett-Packard support terms and conditions for precautions, scope of license, restrictions, and limitation of liability and warranties, before installing this patch.
    ------------------------------------------------------------
    1. Back up your system before installing a patch.

    2. Log in as root.

    3. Copy the patch to the /tmp directory.

    4. Move to the /tmp directory and unshar the patch:

         cd /tmp
         sh PHCO_23651

    5. Run swinstall to install the patch:

         swinstall -x autoreboot=true -x patch_match_target=true \
                   -s /tmp/PHCO_23651.depot

       By default swinstall will archive the original software in /var/adm/sw/save/PHCO_23651. If you do not wish to retain a copy of the original software, use the patch_save_files option:

         swinstall -x autoreboot=true -x patch_match_target=true \
                   -x patch_save_files=false -s /tmp/PHCO_23651.depot

       WARNING: If patch_save_files is false when a patch is installed, the patch cannot be deinstalled. Please be careful when using this feature.

       For future reference, the contents of the PHCO_23651.text file are available in the product readme:

         swlist -l product -a readme -d @ /tmp/PHCO_23651.depot

       To put this patch on a magnetic tape and install from the tape drive, use the command:

         dd if=/tmp/PHCO_23651.depot of=/dev/rmt/0m bs=2k

Special Installation Instructions: None