Patch Name: PHSS_28322

Patch Description: s700_800 11.X MetroCluster CA A.04.10 patch

Creation Date: 02/11/27

Post Date: 03/01/17

Repost: 04/07/20
    The patch documentation was modified to clarify information
    regarding issue #2 documented in the Symptoms and Defect
    Description fields.

Hardware Platforms - OS Releases:
    s700: 11.00 11.11
    s800: 11.00 11.11

Products:
    B8109BA MetroCluster CA A.04.10
    B7659BA Continental Cluster A.03.03

Filesets:
    SG-CA-Tool.CM-SGCA,fr=A.04.10,fa=HP-UX_B.11.00_32/64,v=HP

Automatic Reboot?: No

Status: General Release

Critical: Yes
    PHSS_28322: CORRUPTION
    Possible silent data loss during failover.

Category Tags:
    defect_repair general_release critical corruption

Path Name: /hp-ux_patches/s700_800/11.X/PHSS_28322

Symptoms:
    PHSS_28322:
    1. During a normal failover where the XP/CA pair device is
    expected to swap personalities, the MetroCluster CA script can
    allow packages to start up even though the personality swap for
    the disk arrays fails. Unless the user checks the device state,
    they may assume that the data is remotely protected. In this
    case, however, the data is not remotely protected.

    2. While an application is running on the PVOL side, the CA
    links fail in the direction of the PVOL to the SVOL, but the CA
    links in the direction of the SVOL to the PVOL are still intact.
    Note that in this state, if the application continues to write
    data to the PVOL, the SVOL side will become non-current. If the
    application then fails over to the remote data center and
    MetroCluster/CA cannot get the status of the PVOL,
    MetroCluster/CA will swap the pair device using the old data on
    the SVOL, and thus the PVOL data will be overwritten without the
    consent of the user.

    Please see the Special Installation Instructions for possible
    changes required to the MetroCluster/CA environment file after
    the installation of this patch.

Defect Description:
    PHSS_28322:
    1. When performing a SWAP takeover, the MetroCluster/CA script
    accepts a return code for SVOL takeover.
    The MetroCluster/CA script should only accept a successful
    return code for SWAP takeover. Any other return code from SWAP
    takeover indicates some kind of problem with the XP disk array.

    2. The MetroCluster/CA script performs a "pairresync -swaps".
    This command doesn't check that the data on the PVOL has been
    flushed over to the SVOL prior to swapping the personalities.
    Therefore, under circumstances where only the CA links in the
    PVOL to SVOL direction fail and a failover then occurs after
    some data has been written to the PVOL, the MetroCluster/CA
    script will execute "pairresync -swaps" and swap the
    personalities. This swap will cause the PVOL data to be
    overwritten with the SVOL data, which is non-current.

    Please see the Special Installation Instructions for possible
    changes required to the MetroCluster/CA environment file after
    the installation of this patch.

Enhancement: No

SR: 8606266733 8606282619 8606368589

Patch Files:
    SG-CA-Tool.CM-SGCA,fr=A.04.10,fa=HP-UX_B.11.00_32/64,v=HP:
    /usr/sbin/DRCheckXPCADevGrp.ed

what(1) Output:
    SG-CA-Tool.CM-SGCA,fr=A.04.10,fa=HP-UX_B.11.00_32/64,v=HP:
    /usr/sbin/DRCheckXPCADevGrp.ed:
        A.04.10 MCCA - PHSS_28322 - Last modified 12/02/02

cksum(1) Output:
    SG-CA-Tool.CM-SGCA,fr=A.04.10,fa=HP-UX_B.11.00_32/64,v=HP:
    3282569063 3227 /usr/sbin/DRCheckXPCADevGrp.ed

Patch Conflicts: None

Patch Dependencies: None

Hardware Dependencies: None

Other Dependencies: None

Supersedes: None

Equivalent Patches: None

Patch Package Size: 30 KBytes

Installation Instructions:
    Please review all instructions and the Hewlett-Packard
    SupportLine User Guide or your Hewlett-Packard support terms
    and conditions for precautions, scope of license, restrictions,
    and limitation of liability and warranties, before installing
    this patch.
    ------------------------------------------------------------
    1. Back up your system before installing a patch.

    2. Login as root.

    3. Copy the patch to the /tmp directory.

    4. Move to the /tmp directory and unshar the patch:

        cd /tmp
        sh PHSS_28322

    5. Run swinstall to install the patch:

        swinstall -x autoreboot=true -x patch_match_target=true \
            -s /tmp/PHSS_28322.depot

    By default swinstall will archive the original software in
    /var/adm/sw/save/PHSS_28322. If you do not wish to retain a
    copy of the original software, include the patch_save_files
    option in the swinstall command above:

        -x patch_save_files=false

    WARNING: If patch_save_files is false when a patch is installed,
             the patch cannot be deinstalled. Please be careful
             when using this feature.

    For future reference, the contents of the PHSS_28322.text file
    are available in the product readme:

        swlist -l product -a readme -d @ /tmp/PHSS_28322.depot

    To put this patch on a magnetic tape and install from the
    tape drive, use the command:

        dd if=/tmp/PHSS_28322.depot of=/dev/rmt/0m bs=2k

Special Installation Instructions:
    PHSS_28322:
    The fix implemented in this patch for Symptom item 2 changes
    the behavior of the MetroCluster/CA script, which may require
    changing the currently configured value of the AUTO_NONCURDATA
    variable for any MetroCluster/CA packages. Before or
    immediately after the installation of this patch, it is
    therefore recommended to determine the appropriate values of
    the AUTO_NONCURDATA (and in some cases the AUTO_FENCEDATA_SPLIT)
    variables in the MetroCluster/CA package environment files for
    all packages using MetroCluster/CA. Use the Background
    information and example scenarios below to help determine the
    correct settings for your MetroCluster/CA packages.

    Background information:

    In MetroCluster/CA A.04.10 prior to the installation of this
    patch, when a device group state is SVOL_PAIR on the local site
    and EX_ENORMT or EX_CMDIOE on the remote site, MetroCluster/CA
    will try to start up the package using the "pairresync -swaps"
    command.
    If the CA link is in a good state (meaning that the SVOL data
    is still in sync with the PVOL), this command resynchronizes
    the data from SVOL to PVOL and then swaps the roles of the PVOL
    and SVOL device groups. MetroCluster/CA then allows the package
    to start up.

    The above assumptions and the actions being taken were wrong
    and might allow automatic package startup on non-current data,
    and possible overwriting of current data on the previous PVOL
    site (refer to the "Defect Description" section). This patch
    was created to fix this defect.

    The new logic introduced in this patch changes the behavior so
    that when a device group state is SVOL_PAIR on the local site
    and EX_ENORMT (Raid Manager or node failure) or EX_CMDIOE (disk
    I/O failure) on the remote site (meaning that it is impossible
    for MetroCluster/CA to determine if the data on the SVOL site
    is current), MetroCluster/CA conservatively assumes that the
    data on the SVOL site may not be current and uses the value of
    AUTO_NONCURDATA to determine whether the package is allowed to
    start up automatically. If the value is 1, MetroCluster/CA
    allows the package to start up; otherwise, the package will not
    be started.

    With MetroCluster/CA A.04.10 prior to this patch, when you set
    AUTO_NONCURDATA=0, the device group state is as above, and the
    CA link is in a good state, the package will be allowed to
    start up automatically. With this patch, however, the package
    will not be allowed to start up. As a result of this change in
    behavior, you may need to consider changing the value of
    AUTO_NONCURDATA to 1, based on your device group fence level
    and package failover policy.

    Definition of AUTO_NONCURDATA after this patch has been
    installed:

    AUTO_NONCURDATA set to 0 means the user does not want the
    application to start up on data that may not be current. If
    MetroCluster/CA cannot determine that the data is current, it
    will not allow the package to start up.
    (To ensure the data is current, both PVOL and SVOL have to be
    in PAIR state.)

    AUTO_NONCURDATA set to 1 means the user wants the application
    to start up even when the data may not be current. The data
    may or may not be current when the device group pair state is
    not PVOL_PAIR/SVOL_PAIR.

    After applying this patch, use the two scenarios below to help
    you determine the correct environment settings for
    AUTO_NONCURDATA and AUTO_FENCEDATA_SPLIT for your
    MetroCluster/CA packages.

    Scenario 1:

    With the package device group fence level DATA, if you set
    AUTO_FENCEDATA_SPLIT=0, you are guaranteed that the remote data
    site will never contain non-current data (this assumes that
    the FORCEFLAG has not been used to allow the package to start
    up if the CA links or SVOL site are down). For this
    environment, you can set AUTO_NONCURDATA=1 to make the package
    start up automatically on the SVOL site when the PVOL site
    fails, and you are guaranteed that the package data is current.
    (If you set AUTO_NONCURDATA=0, the package will not be started
    up automatically on the SVOL site.)

    Scenario 2:

    With the package device group fence level NEVER or ASYNC, you
    are not guaranteed that the remote (SVOL) data site still
    contains current data. (The application can continue to write
    data to the device group on the PVOL site if the CA links or
    SVOL site are down, and it is impossible for MetroCluster/CA to
    determine whether the data on the SVOL site is current after
    that.) For this environment, you must set AUTO_NONCURDATA=0 if
    you want to ensure your package application is running on
    current data. (If you set AUTO_NONCURDATA=1, the package will
    be started up on the SVOL site whether the data is current or
    not.)
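
    Illustrative sketch:

    The post-patch startup decision described in the Background
    information can be summarized as a small shell function. This
    is only a sketch for reasoning about the AUTO_NONCURDATA
    setting and is not taken from the MetroCluster/CA script. The
    function name may_autostart, its argument order, and its return
    codes are assumptions made for illustration. Only the SVOL_PAIR
    local state combined with EX_ENORMT or EX_CMDIOE on the remote
    site is modeled; every other combination is reported as not
    covered.

```shell
#!/bin/sh
# Illustrative sketch only; this is NOT the MetroCluster/CA script.
# may_autostart is a hypothetical helper name.
#
# Usage: may_autostart LOCAL_STATE REMOTE_STATE AUTO_NONCURDATA
# Returns 0 if automatic package startup is allowed,
#         1 if it is refused,
#         2 for state combinations this sketch does not cover.
may_autostart() {
    local_state=$1
    remote_state=$2
    auto_noncurdata=$3

    # Only the case described in the Background information is modeled.
    if [ "$local_state" != "SVOL_PAIR" ]; then
        return 2
    fi

    case "$remote_state" in
        EX_ENORMT|EX_CMDIOE)
            # The remote status cannot be obtained, so the SVOL data is
            # treated as possibly non-current and AUTO_NONCURDATA decides.
            if [ "$auto_noncurdata" = "1" ]; then
                return 0
            fi
            return 1
            ;;
        *)
            # Any other remote state is outside the scope of this sketch.
            return 2
            ;;
    esac
}
```

    For example, "may_autostart SVOL_PAIR EX_ENORMT 0" returns 1,
    which corresponds to the package not being started with
    AUTO_NONCURDATA=0, while the same states with a value of 1
    return 0.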