Patch Name: PHSS_30417

Patch Description: s700_800 11.11 ServiceGuard Extension for RAC A.11.15.00

Creation Date: 04/07/05

Post Date: 04/07/16

Hardware Platforms - OS Releases:
    s700: 11.11
    s800: 11.11

Products:
    ServiceGuard Extension for RAC A.11.15.00

Filesets:
    SG-NMAPI.CM-NMAPI,fr=A.11.15.00,fa=HP-UX_B.11.11_32/64,v=HP

Automatic Reboot?: No

Status: General Release

Critical:
    Yes
    PHSS_30417: OTHER
        An Oracle OPS 8i instance may not shut down gracefully;
        a shutdown abort must be issued to shut down the
        database. The Oracle RAC 8i trace file might show the
        following errors:

        *** SESSION ID:(32.34) 2004-05-01 19:19:19.926
        *** 2004-05-01 19:19:19.926
        ksedmp: internal or fatal error
        ORA-00600:internal error code, arguments:[ksimgchg:1],
        [4294967295],[],[],[],[],[],[]

        RAC 9i and 10g will shut down without any problem.
        However, error messages appear in the cmgmsd log file
        (if logging is enabled with gmsetlog) and in the NMAPI2
        log files, as shown below.

        NMAPI log files might show the following errors:

        gm_slave_detach: cl_msg_tcp_recv failed (231, Software caused connection abort)
        May 17 21:44:45 [11159] Closing fd 18

        In the cmgmsd log file, the following error messages are
        seen:

        May 17 21:44:45 [4890] slave_deregister: exiting
        May 17 21:44:45 [4890] WARNING: Failed to send reply message to client (235,Socket is not connected)

    Yes
    PHSS_29096: OTHER
        A RAC instance cannot open the database because it fails
        to register with cmgmsd. Registration fails with an
        EOVERFLOW error.
        The following error message is seen in the Oracle trace
        file:

        ORA-00600: internal error code, arguments: [ksimgchg:1], [4294967295], [], [], [], [], [], []

        The error is currently permanent.

        Other errors observed:

        Sun Dec 21 21:01:05 2003
        ORA-00304: requested INSTANCE_NUMBER is busy
        ORA-27300: OS system dependent operation:skgxnreg: gm_primary_attach failed

        In the gms log file, the following error message is
        found:

        Dec 23 16:45:37 [26520] ERROR: Failed to get process name for pid: 13868 (-1,72)

        Because of these errors, RAC instances cannot open the
        database.

Category Tags:
    defect_repair general_release critical

Path Name: /hp-ux_patches/s700_800/11.X/PHSS_30417

Symptoms:
    PHSS_30417:
    1. An Oracle OPS 8i database may not shut down gracefully;
       a shutdown abort must be used to shut down the database.
       The Oracle RAC 8i trace file might show the following
       errors:

       *** SESSION ID:(32.34) 2004-05-01 19:19:19.926
       *** 2004-05-01 19:19:19.926
       ksedmp: internal or fatal error
       ORA-00600: internal error code, arguments:[ksimgchg:1],
       [4294967295],[],[],[],[],[],[]

       With RAC 9i and 10g, the database will shut down, but
       error messages such as the following appear in the cmgmsd
       log file if logging is enabled with gmsetlog:

       Jun 11 10:44:06 [2391] WARNING: Failed to send reply message to client (235,Socket is not connected).

    PHSS_29096:
    1. A RAC instance may not be able to register with cmgmsd
       because the system does not have enough free network
       port numbers. The errno encountered is EADDRINUSE.

    2. A RAC instance may fail to start up on a 64-bit system.
       The RAC instance cannot open the database because it
       fails to register with cmgmsd. Registration fails with
       an EOVERFLOW error.
       The following error messages in the Oracle trace file
       may indicate this problem:

       ORA-00600: internal error code, arguments: [ksimgchg:1], [4294967295], [], [], [], [], [], []

       Sun Dec 21 21:01:05 2003
       ORA-00304: requested INSTANCE_NUMBER is busy
       ORA-27300: OS system dependent operation:skgxnreg: gm_primary_attach failed

       In the gms log file, the following error message is
       found:

       Dec 23 16:45:37 [26520] ERROR: Failed to get process name for pid: 13868 (-1,72)

       The number 72 in the previous line is the errno value
       for EOVERFLOW. Because of these errors, the RAC instance
       is not able to open the database.

Defect Description:
    PHSS_30417:
    1. cmgmsd keeps the socket connection alive with the RAC
       primary and slave processes. During shutdown of the
       database, the clients expect a reply from cmgmsd, but
       cmgmsd closes the socket before replying on it.

       Resolution:
       Close the socket after replying to the client.

    PHSS_29096:
    1. SGeRAC disconnects the socket connection between the
       client and cmgmsd, so a new network port is needed each
       time the client makes another connection with cmgmsd.
       If the system does not have enough network ports
       available, it may not be possible to establish new
       network connections.

       Resolution:
       The socket connections made between the client process
       and cmgmsd are now reused.

    2. When a 32-bit process runs on a 64-bit OS,
       pstat_getproc() may return an EOVERFLOW error. An
       executable built in a 32-bit environment uses 32-bit
       data types, but when it runs on a 64-bit OS the values
       used to fill the structure members are 64-bit. Depending
       on the data, this may cause an EOVERFLOW error. We have
       observed EOVERFLOW when pst_addr (the address of the
       process in memory) exceeds 2 GB. The code does not
       currently handle the EOVERFLOW return and exits, so the
       RAC instance fails to start.
       The particular fields that we need from pstat_getproc()
       never overflow, so the EOVERFLOW error return can be
       safely ignored.

       Resolution:
       Ignore the EOVERFLOW error from pstat_getproc() and, if
       the process name is returned correctly, continue with
       registration.

Enhancement: No

SR:
    8606301658 8606343589 8606366642

Patch Files:
    SG-NMAPI.CM-NMAPI,fr=A.11.15.00,fa=HP-UX_B.11.11_32/64,v=HP:
    /opt/nmapi/nmapi2/lib/libnmapi2.1
    /opt/nmapi/nmapi2/lib/pa20_64/libnmapi2.1
    /usr/lbin/cmgmsd

what(1) Output:
    SG-NMAPI.CM-NMAPI,fr=A.11.15.00,fa=HP-UX_B.11.11_32/64,v=HP:

    /opt/nmapi/nmapi2/lib/libnmapi2.1:
        A.11.15.00 Date: 06/20/04 Patch: PHSS_30417
        Build date: Sun Jun 20 20:28:59 PDT 2004
        Build id: ibld_sgerac_a1115patch_1111_product
        Build platform: hpux
        NMAPI2
    /opt/nmapi/nmapi2/lib/pa20_64/libnmapi2.1:
        Build date: Sun Jun 20 20:21:45 PDT 2004
        Build id: ibld_sgerac_a1115patch_1111_product
        Build platform: hpux - 64 bit
        NMAPI2
        A.11.15.00 Date: 06/20/04 Patch: PHSS_30417
    /usr/lbin/cmgmsd:
        HP92453-02A.11.00 HP-UX SYMBOLIC DEBUGGER (END.O ILP 32) $Revision: 75.02 $
        Build date: Sun Jun 20 20:19:30 PDT 2004
        Build id: ibld_sgerac_a1115patch_1111_product
        Build platform: hpux
        A.11.15.00 Date: 06/20/04 Patch: PHSS_30417
        CMGMSD
        CMGMSD

cksum(1) Output:
    SG-NMAPI.CM-NMAPI,fr=A.11.15.00,fa=HP-UX_B.11.11_32/64,v=HP:

    3000099526 303104 /opt/nmapi/nmapi2/lib/libnmapi2.1
    503945498 188616 /opt/nmapi/nmapi2/lib/pa20_64/libnmapi2.1
    4174369725 354840 /usr/lbin/cmgmsd

Patch Conflicts: None

Patch Dependencies: None

Hardware Dependencies: None

Other Dependencies: None

Supersedes:
    PHSS_29096

Equivalent Patches:
    PHSS_30418:
    s700: 11.23  s800: 11.23

Patch Package Size: 270 KBytes

Installation Instructions:
    Please review all instructions and the Hewlett-Packard
    SupportLine User Guide or your Hewlett-Packard support terms
    and conditions for precautions, scope of license,
    restrictions, and limitation of liability and warranties,
    before installing this patch.
    ------------------------------------------------------------
    1. Back up your system before installing a patch.

    2. Log in as root.

    3. Copy the patch to the /tmp directory.

    4. Move to the /tmp directory and unshar the patch:

        cd /tmp
        sh PHSS_30417

    5. Run swinstall to install the patch:

        swinstall -x autoreboot=true -x patch_match_target=true \
            -s /tmp/PHSS_30417.depot

    By default swinstall will archive the original software in
    /var/adm/sw/save/PHSS_30417. If you do not wish to retain a
    copy of the original software, include the patch_save_files
    option in the swinstall command above:

        -x patch_save_files=false

    WARNING: If patch_save_files is false when a patch is
             installed, the patch cannot be deinstalled. Please
             be careful when using this feature.

    For future reference, the contents of the PHSS_30417.text
    file are available in the product readme:

        swlist -l product -a readme -d @ /tmp/PHSS_30417.depot

    To put this patch on a magnetic tape and install from the
    tape drive, use the command:

        dd if=/tmp/PHSS_30417.depot of=/dev/rmt/0m bs=2k

Special Installation Instructions:
    1) Shut down all RAC database instances on the node where
       you will be installing the patch.
    2) Halt ServiceGuard on that node.
    3) Install this patch on that node.
    4) Restart ServiceGuard on that node.
    5) Restart all RAC instances.
    6) Repeat the steps above to install this patch on all
       nodes in the cluster.
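
Post-installation check (illustrative sketch, not part of the patch):
    The "cksum(1) Output" section above lists a checksum and a byte
    count for each patch file. After step 3 of the special
    instructions, those values could be compared against the
    installed files with a small sh helper such as the one below.
    The function names verify_cksum and verify_phss_30417 are
    hypothetical; only cksum(1) and awk(1) are assumed to be
    present, and the expected values are copied from this document.

```shell
#!/bin/sh
# Hypothetical helper: compare a file against an expected
# "checksum size" pair as printed by cksum(1).
#   $1 = expected checksum, $2 = expected size in bytes, $3 = path
# Returns 0 on match, 1 on mismatch, 2 if the file is unreadable.
verify_cksum() {
    if [ ! -r "$3" ]; then
        echo "MISSING  $3"
        return 2
    fi
    # Reading from stdin makes cksum print only "<checksum> <size>".
    vc_actual=$(cksum < "$3" | awk '{print $1, $2}')
    if [ "$vc_actual" = "$1 $2" ]; then
        echo "OK       $3"
        return 0
    fi
    echo "MISMATCH $3 (expected $1 $2, got $vc_actual)"
    return 1
}

# Hypothetical wrapper: check the three files listed under
# "cksum(1) Output" for PHSS_30417. Returns non-zero if any check fails.
verify_phss_30417() {
    vp_rc=0
    verify_cksum 3000099526 303104 /opt/nmapi/nmapi2/lib/libnmapi2.1 || vp_rc=1
    verify_cksum 503945498 188616 /opt/nmapi/nmapi2/lib/pa20_64/libnmapi2.1 || vp_rc=1
    verify_cksum 4174369725 354840 /usr/lbin/cmgmsd || vp_rc=1
    return $vp_rc
}
```

    Sourcing this file and calling verify_phss_30417 on a patched
    node prints one status line per file and returns non-zero if
    any file is missing or differs. The expected values apply to
    PHSS_30417 only.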