Patch Name: PHKL_25053

Patch Description: s700_800 11.04 (VVOS) FCMS Driver Patch

Creation Date: 01/09/28

Post Date: 01/10/22

Hardware Platforms - OS Releases:
  s700: 11.04
  s800: 11.04

Products: N/A

Filesets:
  OS-Core.CORE2-KRN,fr=B.11.04,fa=HP-UX_B.11.04_32,v=HP
  OS-Core.CORE2-KRN,fr=B.11.04,fa=HP-UX_B.11.04_64,v=HP
  FCMassStorage.FCMS-ENG-A-MAN,fr=B.11.04,fa=HP-UX_B.11.04_32/64,v=HP
  FCMassStorage.FCMS-KRN,fr=B.11.04,fa=HP-UX_B.11.04_32,v=HP
  FCMassStorage.FCMS-KRN,fr=B.11.04,fa=HP-UX_B.11.04_64,v=HP
  FCMassStorage.FCMS-RUN,fr=B.11.04,fa=HP-UX_B.11.04_32/64,v=HP
  VirtualVaultOS.VVOS-AUX-IA,fr=B.11.04,fa=HP-UX_B.11.04_32/64,v=HP

Automatic Reboot?: Yes

Status: General Release

Critical:
  Yes
  PHKL_25053: PANIC CORRUPTION HANG
  Based on HP-UX Patch PHKL_23939: PANIC CORRUPTION
  PHKL_23045: PANIC CORRUPTION HANG
  Based on HP-UX Patch PHKL_21834: PANIC CORRUPTION
  Based on HP-UX Patch PHKL_21000: PANIC
  Based on HP-UX Patch PHKL_20207: PANIC

Category Tags:
  defect_repair hardware_enablement enhancement general_release
  critical panic halts_system corruption

Path Name: /hp-ux_patches/s700_800/11.X/PHKL_25053

Symptoms:
  PHKL_25053:
  Ported HP-UX patch PHKL_23939 to VVOS.

  Based on HP-UX patch PHKL_23939:

  (1) JAGab77626/8606107413
  When the Hazard disaster test (c16) is run against an array device (Hitachi disks), a number of DIOC_RSTCLR failures with EIO are reported.

  (2) JAGad08849/8606139546
  A data page fault occurs in the FC driver when a timer is left pending after its login has been removed. No stack trace is available. The problem was seen with a suspected faulty card.

  (3) JAGac29405/8606114642
  The system panics on an assertion failure when the driver attempts to pop from an empty callback function stack.
  panic assertion failed(CBFN stack underflow)

  stack trace for event 0
  crash event was a panic
    panic+0x14
    assfail+0x3c
    _assfail+0x30
    POP_CBFN+0x68
    fcpbh_fcp_cbfn+0x1e8
    fcpbh_xmit_completer+0x344
    fcT1_isr+0x864
    epic_isr+0xa0
    mp_ext_interrupt+0x270
    ivti_patch_to_nop3+0x0
    fcT1_watchdog_timer+0x98
    invoke_callouts_for_self+0x1d0
    sw_service+0xa4
    mp_ext_interrupt+0x28c
    ivti_patch_to_nop3+0x0
    splx+0xa4
    resume_cleanup+0x4a8
    no_fcoproc_restore+0xc
    swtch+0x1b0
    trap+0x2928
    nokgdb+0x8

  stack trace for event 0
  crash event was a panic
    r0 /r1 /r2    0'00000000 0'00000001 0'0005253c
    r3 /r4 /r5    0'41758000 0'5bf4e500 0'4e431268
    r6 /r7 /r8    0'4e431200 0'00835988 0'416d0070
    r9 /r10/r11   0'416d0060 0'416d0068 0'00000800
    r12/r13/r14   0'416d1bc4 0'416d1c44 0'416d1bfc
    r15/r16/r17   0'416c76e0 0'00000000 0'00000000
    r18/r19/r20   0'00000008 0'00000000 0'00000053
    r21/r22/r23   0'00000003 0'0000000c 0'0800001f
    r24/r25/r26   0'00000000 0'004257e4 0'08115580
    r27/r28/r29   0'08278e40 0'00000058 0'08115750
    r30/r31/r32   0'08115780 0xffffffff'ffffffff
    sr0 /sr1 /sr2 0'079bb800 0'00000000 0'00000000
    sr3 /sr4 /sr5 0'00000000 0'00000000 0'0b12a400
    sr6 /sr7 /sr8 0'05728c00 0'00000000
  Reference64 failed at 0.0x81b248

  LEVEL FUNC
    lev 0)  panic+0x14  ARG0 0x08115580  ARG1-ARG7 n/a
    lev 1)  assfail+0x3c  ARG0-ARG7 n/a
    lev 2)  _assfail+0x30  ARG0 0'008359c8  ARG1 0'008359a0  ARG2 0'000002a0  ARG3 0'00000000  ARG4-ARG7 n/a
    lev 3)  POP_CBFN+0x68  ARG0 0'4e4312d8  ARG1-ARG7 n/a
    lev 4)  fcpbh_fcp_cbfn+0x1e8  ARG0 0'41758000  ARG1 0'00000003  ARG2 0'5bf4e500  ARG3-ARG7 n/a
    lev 5)  fcpbh_xmit_completer+0x344  ARG0 n/a  ARG1 0'00000000  ARG2 0'416c76e0  ARG3 0'00000000  ARG4-ARG7 n/a
    lev 6)  fcT1_isr+0x864  ARG0 0'416d0000  ARG1 0'00000000  ARG2 0'00000000  ARG3 0'00000000  ARG4-ARG7 n/a
    lev 7)  epic_isr+0xa0  ARG0-ARG7 n/a
    lev 8)  mp_ext_interrupt+0x270  ARG0 0'08114980  ARG1-ARG7 n/a
    lev 9)  ivti_patch_to_nop3+0x0  ARG0-ARG7 n/a
    lev 10) fcT1_watchdog_timer+0x98  ARG0 0'410c0000  ARG1 0'00000000  ARG3-ARG7 n/a
    lev 11) invoke_callouts_for_self+0x1d0  ARG0 0'00000000  ARG1-ARG6 n/a  ARG7 0'007c07a0
    lev 12) sw_service+0xa4  ARG0 n/a  ARG1 0'08114050  ARG2-ARG7 n/a
    lev 13) mp_ext_interrupt+0x28c  ARG0 0'08114050  ARG1-ARG7 n/a
    lev 14) ivti_patch_to_nop3+0x0  ARG0-ARG7 n/a
    lev 15) splx+0xa4  ARG0 0x1004'409ced68  ARG1-ARG7 n/a
    lev 16) resume_cleanup+0x4a8  ARG0 0xfffffff0'ffffffff  ARG1-ARG5 n/a  ARG6 0x400003ff'ffff0000  ARG7 n/a
    lev 17) no_fcoproc_restore+0xc  ARG0 0x400003ff'ffff0000  ARG1-ARG7 n/a
    lev 18) swtch+0x1b0  ARG0-ARG7 n/a
    lev 19) trap+0x2928  ARG0 n/a  ARG1 0x400003ff'ffff07a8  ARG2-ARG7 n/a
    lev 20) nokgdb+0x8  ARG0-ARG7 n/a

  (4) JAGad32471/8606163155
  The system hangs because hundreds of I/Os are stuck in the Tachyon FC driver.

  (5) JAGac59739/8606126898
  The system panics with the following stack trace:
    panic+0x14
    wait_for_lock+0x350
    sl_retry+0x1c
    fcT1_awaiting_acc_to_els_timer+0x2d8
    fcT1_els_xmit_completer+0x25b0
    fcT1_nonperf_isr+0x718
    fcT1_isr+0x38c
    epic_isr+0x50
    mp_ext_interrupt+0x260
    ivti_patch_to_nop3+0x0
    idle+0xb04
    swidle_exit+0x0

  (6) JAGad42201/8606172941
  The system panics as a result of a data page fault.
  Stack trace 1:
  LEVEL FUNC
    lev 0)  panic+0x6c  ARG0-ARG7 n/a
    lev 1)  report_trap_or_int_and_panic+0x94  ARG0 0'00000002  ARG1 0'0000000f  ARG2 0'01138bb0  ARG3 0'00746d68  ARG4-ARG7 n/a
    lev 2)  interrupt+0x208  ARG0 n/a  ARG1 0'01138bb0  ARG2-ARG7 n/a
    lev 3)  $ihndlr_rtn+0x0  ARG0-ARG7 n/a
    lev 4)  fcpbh_act_dequeue+0x10  ARG0 0'40f62000  ARG1 0'48321300  ARG2-ARG7 n/a
    lev 5)  fcpbh_notify+0x1888  ARG0 n/a  ARG1 0'0000000c  ARG2 0'000000e1  ARG3-ARG7 n/a
    lev 6)  fcT1_notifyfc4_logout_sync+0x7c  ARG0 0'4df28940  ARG1 0'000000e1  ARG2-ARG7 n/a
    lev 7)  fcT1_ctrl+0x1598  ARG0 0'40e80000  ARG1 0'00000003  ARG2 0'480ebc00  ARG3 0'00000000  ARG4-ARG7 n/a
    lev 8)  fcpbh_send_logout_fc4err+0x250  ARG0 0'40f62000  ARG1 0'480ebc00  ARG2 0'00000001  ARG3-ARG7 n/a
    lev 9)  fcpbh_prlo_handler+0x190  ARG0 0'40f62000  ARG1 0'40ee7620  ARG2 0'43acc300  ARG3-ARG7 n/a
    lev 10) fcpbh_rcv_completer+0x1210  ARG0-ARG7 n/a
    lev 11) fcT1_process_els_frame+0x368  ARG0 0'40e80000  ARG1 0'43acc300  ARG2 0'40ee7620  ARG3 0'00000004  ARG4-ARG7 n/a
    lev 12) fcT1_process_read_pkt+0x12d8  ARG0 0'40e80000  ARG1 0'00000004  ARG2 0'43acc300  ARG3 0'43acc300  ARG4 0'40ee7620  ARG5-ARG7 n/a
    lev 13) fcT1_isr+0x60c  ARG0-ARG7 n/a
    lev 14) mp_ext_interrupt+0x2ec  ARG0 0'01137000  ARG1-ARG7 n/a
    lev 15) ivti_patch_to_nop3+0x0  ARG0-ARG7 n/a
  ******************************************************
  Stack trace 2:
  LEVEL FUNC
    lev 0)  panic+0x6c  ARG0-ARG7 n/a
    lev 1)  report_trap_or_int_and_panic+0x94  ARG0 0'00000002  ARG1 0'0000000f  ARG2 0'0113c470  ARG3 0'00746d68  ARG4-ARG7 n/a
    lev 2)  interrupt+0x208  ARG0 n/a  ARG1 0'0113c470  ARG2-ARG7 n/a
    lev 3)  $ihndlr_rtn+0x0  ARG0-ARG7 n/a
    lev 4)  scb_dequeue+0x40  ARG0 0'492ca818  ARG1 0'00000001  ARG2-ARG7 n/a
    lev 5)  fcpbh_kick_start+0x48  ARG0-ARG7 n/a
    lev 6)  fcpbh_cont_login_after_adisc+0x5b0  ARG0 0'41b7c000  ARG1 n/a  ARG2 0'4025e200  ARG3-ARG7 n/a
    lev 7)  fcpbh_adisc_cbfn+0x770  ARG0-ARG7 n/a
    lev 8)  fcpbh_rcv_completer+0xe98  ARG0-ARG7 n/a
    lev 9)  fcT1_process_els_frame+0x208  ARG0 0'40fa0000  ARG1 0'48fb7d40  ARG2 0'40f9d360  ARG3 0'00000004  ARG4-ARG7 n/a
    lev 10) fcT1_process_read_pkt+0x12d8  ARG0 0'40fa0000  ARG1 0'00000004  ARG2 0'48fb7d40  ARG3 0'48fb7d40  ARG4 0'40f9d360  ARG5-ARG7 n/a
    lev 11) fcT1_isr+0x60c  ARG0-ARG7 n/a
    lev 12) mp_ext_interrupt+0x2ec  ARG0 0'0113b000  ARG1-ARG7 n/a
    lev 13) ivti_patch_to_nop3+0x0  ARG0-ARG7 n/a

  (7) JAGad56591/8606187384
  The fcmsutil man page does not list the Tachyon TL cards A6684A and A6685A, even though the fcmsutil command can be used with these cards.

  (8) JAGad50173/8606180952
  Tape access fails (I/Os in progress fail) when ioscan is run while a tape backup is in progress.

  PHKL_23045:
  Ported HP-UX patch PHKL_21834 to VVOS.

  Based on HP-UX patch PHKL_21834:

  (1) JAGaa92689/8606160434
  Corruption was detected on a transformer device configured with 60 disks while an HP internal test program was running in a multi-initiator environment.

  (2) JAGab69015/8606102940
  System panic caused by a data page fault. Stack trace:
    panic+0x14
    report_trap_or_int_and_panic+0x80
    interrupt+0x1d4
    $ihndlr_rtn+0x0
    fcpbh_act_dequeue+0x14
    fcpbh_scsi_comp+0x120
    fcpbh_fcp_cbfn+0x14c
    fcpbh_rcv_completer+0x108
    fcT1_isr+0x84c
    mp_ext_interrupt+0x34c
    ivti_patch_to_nop3+0x0
    idle+0x54c
    swidle_exit+0x0

  (3) JAGac40831/8606125441
  System panic caused by an assertion failure. Stack trace:
    panic+0x14
    assfail+0x30
    _assfail+0x2c
    fcpbh_act_dequeue+0xb8
    fcpbh_scsi_comp+0x10c
    fcpbh_fcp_cbfn+0x294
    fcpbh_rcv_completer+0xa10
    fcT1_isr+0x5b8
    mp_ext_interrupt+0x358
    ivti_patch_to_nop3+0x0
    spinunlock+0x44
    b_vsema+0xf4
    vhand_vfdcheck_4k+0x194
    vhand_vfdcheck+0xc8
    for_val3+0x78
    for_val2+0x168
    foreach_valid+0xc0
    agepages+0x1d0
    vhand_core+0x56c
    vhand_global_pager+0x144
    vhand+0x1b0
    im_vhand+0xd8
    DoCalllist+0x3c
    main+0x24
    $vstart+0x34
    $locore+0x90

  (4) JAGad00807/8606131657
  Possible system panic due to unmapping memory that has not been mapped.

  (5) JAGad01417/8606132268
  Possible system panic or corruption due to a mapping failure. No stack trace is available.
  (6) JAGad01418/8606132269
  The Fibre Channel driver's internal trace sometimes generates incorrect entries.

  (7) JAGad02280/8606133133
  System panic caused by an assertion failure. Stack trace:
    panic+0x10
    assfail+0x30
    _assfail+0x2c
    fcpbh_map_data+0xa38
    fcpbh_res_acquire+0x380
    fcpbh_res_queue+0x204
    fcpbh_scsi_start+0x1a4
    fcparray_start+0x268
    scsi_start_bus_locked+0x750
    scsi_start+0xc0
    scsi_strategy_real+0x4d0
    ioforw_sched+0x614
    scsi_strategy+0x104
    vx_dev_strategy+0x318
    vx_flush_chain+0x24c
    vx_vnode_flush+0x228
    vx_do_putpage+0x234
    vx_write_flush+0x70
    vx_write_default+0x2bc
    vx_write1+0xcdc
    vx_rdwr+0x1c0
    vno_rw+0xbc
    4_2dfb_cl_rwuio+0x230
    write+0x84
    syscall+0x56c
    $syscallrtn+0x0

  (8) JAGad11001/8606141638
  System panic caused by an assertion failure. Stack trace:
    panic+0x14
    assfail+0x3c
    _assfail+0x2c
    fcpbh_fcp_cbfn+0xa0
    fcpbh_rcv_completer+0x17fc
    fcT1_process_read_pkt+0x1310
    fcT1_nonperf_isr+0x174
    fcT1_isr+0x798
    mp_ext_interrupt+0x378
    ivti_patch_to_nop3+0x0
    spinunlock+0x48
    getblk1+0x224
    bread1+0xa4
    bread+0x14
    blkatoff+0x140
    dirlook_loop+0x140
    dirlook+0xe8
    sdo_lookup+0xcc
    ufs_lookup+0x28
    lookuppn+0x538
    vn_create+0xb8
    mkdir+0x80
    syscall+0x62c
    $syscallrtn+0x0

  (9) JAGad11250/8606141896
  Some I/O requests to the Fibre Channel driver may hang when the interface cable is disconnected from the Fibre Channel host bus adapter.

  (10) JAGad33045/8606163741
  Possible system panic while freeing an mbuf that is already free. Stack trace:
    panic+0x14
    m_free+0x3c0
    m_freem+0x14
    fcT1_reset_clean_and_reprogram+0xe34
    invoke_callouts_for_self+0xc0
    sw_service+0xb0
    mp_ext_interrupt+0x144
    ivti_patch_to_nop3+0x0

  (11) JAGad34888/8606165597
  Possible channel errors while the driver is processing many active I/Os.

  (12) JAGad34891/8606165600
  Possible system panic due to a channel error. Stack trace:
    panic+0x14
    fcT1_isr+0xc8
    epic_isr+0x58
    mp_ext_interrupt+0x34c
    ivti_patch_to_nop3+0x0
    scsi_strategy_real+0x5f8
    ioforw_int+0xd8
    mp_ext_interrupt+0x144
    ivti_patch_to_nop3+0x0
    idle+0x4f8
    swidle_exit+0x0

  Based on HP-UX patch PHKL_21381:

  (1) JAGad03305/8606134165
  Enhancement request to enable Fabric support for the Tachyon TL A5158A card.

  (2) JAGad02946/8606133802
  Enhancement request to add Fabric-related command options to fcmsutil.

  Based on HP-UX patch PHKL_21000:

  (1) JAGab75432/8606106386
  System panic.

  (2) JAGab79008/8606108561
  System panic with a data page fault in fcpbh_xmit_completer().

  (3) JAGac39376/8606124016
  An HP Hazard c16 test program reported DIOC_RSTCLR messages. BDRs issued to the Fibre Channel devices were not completing successfully.

  (4) JAGac40135/8606124743
  fcmsutil does not display 'Elastic Store Errors'.

  (5) JAGac56862/8606126297
  The link was repeatedly activated and deactivated on a V2200 system, and thousands of elastic store errors were recorded. The system finally panicked with a data page fault in scb_dequeue. The stack trace showed that the panic occurred in scb_dequeue() called from fcpbh_act_dequeue().

  Based on HP-UX patch PHKL_20207:

  (1) JAGab82322/8606109622
  Enhancement request to add support for the Tachyon TL A5158A card.

  (2) JAGab82817/8606110114
  System panic while running the HP System Reliability Test Suites. The stack trace includes the Fibre Channel routine fcT1_reset_clean_and_reprogram.

  (3) JAGab84453/8606112165
  When a Tachlite card is used, a control node is created and left unclaimed.

  Based on HP-UX patch PHKL_19416:
  Reduces the number of open failures.

  PHKL_19143:
  Ported HP-UX patch PHKL_19124 to VVOS.

  Based on HP-UX patch PHKL_19124:
  System hang.

  Based on HP-UX patch PHSS_18652:
  Disks fail to show up in ioscan, which results in LVM activation errors. T600 machines experience process timeouts.

  Based on HP-UX patch PHKL_18232:
  Unpredictable behavior on a host system may result from an insufficient time interval between consecutive PIO write operations.
  If the interval is too short, the PCI host bus adapter may be improperly reset and behave erratically.

  Based on HP-UX patch PHSS_18136:
  LVM activation errors have been seen on Model 12H arrays connected via a SCSI Mux and on EMC arrays connected via Fibre Channel. The FCMS driver has been changed to make it more robust during boot-up. In disaster recovery configurations, problems have been seen when failing over from one system to another. The disk I/O timeout has been limited to 10 seconds for LVM I/Os and for I/Os to block-special devices. The I/O error recovery time has been reduced, potentially by more than a second. I/O hangs can occur on a loop where multiple back-to-back LIPs are seen.

  Based on HP-UX patch PHSS_17199:
  Channel error in the FCMS driver.

  Based on HP-UX patch PHSS_17108:
  Short-term resolution for interface chip parity errors.

  Based on HP-UX patch PHSS_16824:
  The function 'add_to_sys' is missing from the postinstall script, so the FCMS drivers do not get installed on the system.

  Based on HP-UX patch PHSS_16128:
  The host system hangs during Logical Volume Manager testing. The LVM layer is not informed of the switchover to the alternate link, and the system hangs.

  Based on HP-UX patch PHSS_16001:
  The host system panics due to topology toggling: the FCMS driver tries to come up in a loop topology while the host system tries to come up in a point-to-point topology.

  Based on HP-UX patch PHSS_15946:
  The FW update utility fails for the FCMS Host Bus Adapter card A3404A.

  Based on HP-UX patch PHSS_15381:
  ioscan fails to find the Fibre Channel disk array (A3661A) in an FCMS configuration.

  Based on HP-UX patch PHSS_14652:
  The files fcp_cdio.h, fcp_ioctls.h, and fcp_ctrl.c are required for Diagnostic IOCTL functionality.

  Based on HP-UX patch PHSS_14241:
  The file fcms.o, which is needed for debugging FCMS problems, must be integrated into the libhp-ux.a file.
  Based on HP-UX patch PHSS_13495:
  Pre-fetch is re-enabled now that a workaround for the V-class PCI firmware defect has been identified and implemented.

Defect Description:
  PHKL_25053:
  Ported HP-UX patch PHKL_23939 to VVOS.

  Based on HP-UX patch PHKL_23939:

  (1) JAGab77626/8606107413
  The bus driver "fcparray" does not implement a retry policy for the failed diskbdr requests issued by the Hazard c16 test. A "Target Reset" command issued by the FC interface driver can time out when a frame is affected by a LIP or is dropped because the chip's inbound buffers are full. Without retries, the SCSI services therefore report a DIOC_RSTCLR error when Hazard c16 is run.
  Resolution: A code change has been made to add retries in fcparray, which helps resolve the DIOC_RSTCLR errors.

  (2) JAGad08849/8606139546
  The problem was seen as a panic on a bad FC card and was not reproducible after the card was replaced with a new one. There are concurrency issues if fcT1_awaiting_acc_to_els_timer(), which runs at interrupt level, and fcT1_clean_everything() execute at the same time. These concurrency issues must be resolved with the internal locking mechanism.
  Resolution: In fcT1_clean_everything(), fcT1_free_outb_esb_timer(), and fcT1_awaiting_acc_to_els_timer(), the concurrency issues are resolved by using appropriate locks.

  (3) JAGac29405/8606114642
  The assertion failure is caused by the call to POP_CBFN in the fcpbh_fcp_cbfn routine. The code takes the POP_CBFN path when completion messages arrive out of order. At that point the callback function stack is empty, so the pop causes a stack underflow and hence the assertion failure.
  Resolution: The macro in fcpbh_fcp_cbfn() that pops an element from the callback function stack has been removed.
  (4) JAGad32471/8606163155
  In the current Tachyon driver, the BDR_IN_PROGRESS flag is set in fcpbh_target_reset before resources are acquired for the BDR scb. In a corner case, setting the flag this early causes I/Os to hang indefinitely in the driver when the BDR scb is stuck on a resource queue.
  Resolution: The code has been changed to set the BDR_IN_PROGRESS flag in fcpbh_send_odb after all the resources for the BDR scb have been acquired. This prevents the hung I/O situation in the Tachyon FC driver.

  (5) JAGac59739/8606126898
  A spinlock held for too long causes the system to panic.
  Resolution: The spinlock problem was seen along the reset path and in clean_everything. Along with the fixes for JAGad08849, additional spinlock code was added in the timer path. fcT1_awaiting_acc_to_els_timer() now makes sure the resource lock is held when it accesses the login data structures, because a reset path on another processor could be running fcT1_clean_everything() at the same time.

  (6) JAGad42201/8606172941
  The problem is caused by queue corruption involving an adisc scb. Because of the corruption, an already-dequeued adisc scb (a null scb) is referenced, which causes the data page fault and eventually panics the system.
  Resolution: The driver now checks OCQ_Q, NPORT_WAIT_Q, and TEMP_Q and handles the adisc scb (that is, dequeues it properly) before it is enqueued again on HP_OCQ_Q/OCQ_Q.

  (7) JAGad56591/8606187384
  The fcmsutil man page was not updated with the new cards.
  Resolution: The man page has been updated with the new card names.

  (8) JAGad50173/8606180952
  The problem occurs because an inquiry command is sent to a tape drive (an untagged device that can handle only one I/O at a time) while a backup operation is in progress. The inquiry command was sent because the user initiated an ioscan.
  Resolution: The inquiry is not sent if the device is currently in use by more than one application.
  This is achieved by reusing the inquiry data obtained during the previous ioscan.

  PHKL_23045:
  Ported HP-UX patch PHKL_21834 to VVOS.

  Based on HP-UX patch PHKL_21834:

  (1) JAGaa92689/8606160434
  If the I/O timer expires, the Fibre Channel driver starts logging out and releasing I/O resources without synchronizing with xmit_completer(). During cleanup, while an I/O is still on the ocq, the driver fetches a reused iova, which causes corruption.
  Resolution: The timer is ignored if any I/O is on the ocq_q.

  (2) JAGab69015/8606102940
  The err_delay_q and the active_q became entangled. The scb that caused the panic was on the err_delay_q instead of the active_q. Its forward pointer pointed to an scb on the err_delay_q and its backward pointer to an scb on the active_q. When the scb was queued on the err_delay_q, it was not removed from the active_q, which led to the queue entanglement.
  Resolution: Before fcpbh_fcp_comp() is called, the queue field in the scb is checked to make sure the scb has been dequeued from the active_q.

  (3) JAGac40831/8606125441
  The ocq_q in the Fibre Channel driver became corrupted after a link down and link up. At the head of the ocq_q was an scb with both its forward and backward pointers set to null. An attempt to enqueue a new scb on this queue resulted in a system panic. The linked-list corruption is a consequence of a device violating the protocol.
  Resolution: The queue corruption is prevented by checking whether the scb is on the active_q before dequeuing it from the active_q.

  (4) JAGad00807/8606131657
  This occurs when FC_MAP fails (for payload_iov), which almost never happens on high-end systems. The code that unmaps the unmapped area is in fcT1_xmit.c, in the routine fcT1_bld_oib_od(...). The edb_iov is unmapped if mapping payload_iov fails. However, edb was previously only allocated with FC_MALLOC and never mapped, so unmapping it here is an inconsistency.
  Resolution: Removed the FC_UNMAP() for edb_iov and its corresponding FCTRACE_UNMAP trace.

  (5) JAGad01417/8606132268
  If the mapping fails, sfsbq->m is not released, which could show wrong values in q4 analysis. When mapping fails in fcT1_replenish_sfsbq, the mbuf is freed but its value is not cleared from fcp->sfsbq[].m. The issue arises because fcp->sfsbq[rpi].m is populated before FC_MAP is called for l_io_vec.
  Resolution: The population of fcp->sfsbq[rpi].m and fcp->sfsbq[rpi].virtual_addr has been moved after the FC_MAP for l_io_vec, so nothing needs to be cleared if FC_MAP fails. A similar fix has been made in the fcT1_replenish_mfsbq() routine.

  (6) JAGad01418/8606132269
  The trace statement that records the sfsbq rci and rpi indices and the mfsbq index in fcT1_reset_clean_and_reprogram performs a logical OR on the three values instead of a bitwise OR.
  Resolution: The trace in the fcT1_reset_clean_and_reprogram routine has been corrected.

  (7) JAGad02280/8606133133
  An assertion failure occurs in the Tachyon driver after more than 8 AL-pairs are received. The limit comes from a constant set to 64.
  Resolution: The constant 64 was replaced by FCP_SDB_SIZE, which is the SDB length of 128. This allows 512 KB of data to be processed and removes the 8 AL-pair limitation.

  (8) JAGad11001/8606141638
  An INBOUND_MFS_COMPLETION message triggered the assert. The fcpbh_fcp_cbfn callback function handles this message by pushing itself on the stack, so the assert is not required.
  Resolution: The assert has been removed.

  (9) JAGad11250/8606141896
  The Fibre Channel driver treats a cable disconnect as a link-down and link-failure event and initiates cleanup of the I/O queues. The ocq_q is cleaned up before the sest_inv_q. I/Os for which an abort was initiated just after the link failure, but before cleanup started, are moved to the ocq_q via the sest_inv_q when the active_q is cleaned up. These I/Os remain on the ocq_q because it is not cleaned up again.
  Resolution: At the end of cleanup, the ocq_q is cleaned up once more to eliminate any I/Os that were moved from the sest_inv_q.

  (10) JAGad33045/8606163741
  Messages posted to the IMQ by Tachyon are processed both in fcT1_isr and in the fcT1_reset_clean_and_reprogram routine. These two routines can be active at the same time, so a message in the IMQ can be processed by both. Simultaneous execution of fcT1_reset_clean_and_reprogram and fcT1_isr must be eliminated to prevent panics due to data page faults, freeing a buffer twice, and so on. The fix for JAGab82817 still left a window during which both routines could run.
  Resolution: Simultaneous execution of fcT1_isr and fcT1_reset_clean_and_reprogram has been eliminated by checking and setting a common flag. If fcT1_isr finds the flag set, it returns from the ISR without processing the IMQ. If fcT1_reset_clean_and_reprogram finds the flag set, it reschedules itself and returns without processing the IMQ.

  (11) JAGad34888/8606165597
  The driver's link-down processing code checks the BOOTUP_LOOPBACK flag before scheduling the reset_clean_and_reprogram routine. When this flag is set, the driver executes reset_clean_and_reprogram after a delay of 0.5 seconds. This delay is sufficient if the card is reset during boot-up, but it is insufficient for a card reset when the system is up and running with many active I/Os on the card. The BOOTUP_LOOPBACK flag is set only once, and the very first execution of reset_clean_and_reprogram clears it. So, with the card reset code removed from boot-up, the driver executes the reset_clean_repro... routine without sufficient delay on the first link down.
  Resolution: The setting of the BOOTUP_LOOPBACK flag in the claim routine has been removed.

  (12) JAGad34891/8606165600
  Sequential driver writes to Tachyon's OCQ producer index register complete out of order. This creates a false OCQ-full condition.
  When Tachyon detects that the OCQ is full, it starts working on the ODBs in the OCQ. Because the OCQ-full condition is false, the ODBs Tachyon picks up are invalid, and acting on an invalid ODB results in a read channel context error.
  Resolution: Code has been added to read Tachyon's OCQ producer index register after every write to it, before the spinlock is released. This ensures in-order completion of OCQ producer index updates made by multiple driver threads. Memory for the host indices is now allocated in EPIC shared memory instead of host memory, so Tachyon now updates the OCQ consumer index in EPIC shared memory.

  Based on HP-UX patch PHKL_21381:

  (1) JAGad03305/8606134165
  To enable Fabric support for the Tachyon TL A5158A card, the FCP CDIO module must support Fabric scan.
  Resolution: The FCP CDIO module has been enhanced to perform a Fabric scan.

  (2) JAGad02946/8606133802
  For Fabric support with the Tachlite driver, fcmsutil(1M) needs Fabric-related command options, and the fcmsutil(1M) man page must be updated to reflect the changes.
  Resolution: fcmsutil has been changed and the man pages have been updated.

  Based on HP-UX patch PHKL_21000:

  (1) JAGab75432/8606106386
  An incorrect offset was passed to the fcmsutil program, which did no boundary checking on arguments passed as an offset. As a result, the driver accessed an unknown location.
  Resolution: fcmsutil now checks for valid offsets. If a bad offset is received, the correct address range is displayed.

  (2) JAGab79008/8606108561
  Because of a Tachyon card hardware problem, the contents of the Outbound Descriptor Block are shifted by one word. The transaction ID gets corrupted, which eventually causes the system panic.
  Resolution: The driver code has been changed to detect a corrupted transaction ID and reset the card to abort all I/Os.
  (3) JAGac39376/8606124016
  The bus driver 'fcpdev.c' does not retry BDR failures because it uses the wrong retry time period. fcpdev does not convert FCPDEV_RETRY_PERIOD from seconds to ticks, which results in a retry period that is far too short.
  Resolution: fcpdev now converts seconds to ticks, so failed BDRs are retried.

  (4) JAGac40135/8606124743
  The Elastic Store Error count is not made available to the application.
  Resolution: The driver now passes the elastic store error count to the application.

  (5) JAGac56862/8606126297
  This problem is caused by link instability. In this scenario, the driver sends an adisc (address discovery) to the device, but before the device can respond, the link goes down and comes up again. The device misbehaves and sends a completion of the adisc, violating the protocol. This mixes up the scb queues and finally leads to a system panic when the driver calls scb_dequeue to dequeue the scb.
  Resolution: The driver now checks the hpcq_q (high-priority command queue) for each completion message. If the scb (SCSI control block) for the current transaction is found on the hpcq_q, it is removed from the hpcq_q and then enqueued on the active queue (act_q). This puts the scb on the correct queue and eliminates the system panic.

  Based on HP-UX patch PHKL_20207:

  (1) JAGab82322/8606109622
  The fcmsutil utility needs to be enhanced to execute tdutil (the diagnostic utility for Tachlite) when the device file of a Tachyon TL A5158A card is specified.
  Resolution: The fcmsutil program has been changed to execute (exec) tdutil.

  (2) JAGab82817/8606110114
  The problem occurs when the card's ISR (fcT1_isr) and fcT1_reset_clean_and_reprogram are triggered at the same time and run simultaneously on two different processors.
  Because neither of them holds any locks, both start executing. After reading imq_ci, both routines try to process the same IMQ entries, and hence the same mbufs. As a result, the same mbuf is freed twice.
  Resolution: fcT1_reset_clean_and_reprogram now checks whether the CI_CHANGING state is set. If it is, an ISR is being processed, and fcT1_reset_clean_and_reprogram processing is postponed. This prevents both routines from being scheduled for processing at the same time.

  (3) JAGab84453/8606112165
  The FCMS driver for the Tachyon card creates the FCP (0.8) and control (0.5) nodes for all FC adapters. Tachlite cards have no control driver, so the control node is not needed and is left unclaimed.
  Resolution: The FCMS driver's scan routine now checks for the presence of a Tachlite card. If this card is installed, the FCMS driver does not create the control node.

  Based on HP-UX patch PHKL_19416:
  While I/O was running with aborts, a large number (100+ per hour) of open device failures occurred.
  Resolution: Bounds checking for a non-assisted OXID was introduced to the driver in 990P. In one abort case, the lower-bound check was not valid, so it was removed.

  PHKL_19143:
  Ported HP-UX patch PHKL_19124 to VVOS.

  Based on HP-UX patch PHKL_19124:
  V2500 hang. On several processors, processes hang waiting for sched_lock, which is not owned. FC30 (A3661A) "loses credit" problems.
  Resolution: A Tachyon reset is issued after several interrupts are detected with no I/O progress.

  Based on HP-UX patch PHSS_18652:
  A cache was not being properly flushed on the T-Class.
  Resolution: Flush the cache.

  Based on HP-UX patch PHKL_18232:
  Consecutive PIO write operations are being issued to the PCI adapter without a sufficient time interval between them. If the interval is too short, the host bus adapter may be improperly reset and behave erratically.
  Resolution: Force the proper time interval between PIO writes during reset by issuing a PIO read, which guarantees completion of prior PIO writes.

  Based on HP-UX patch PHSS_18136:
  Increased the robustness of the FCMS driver to guard against random link errors during boot-up.
  Resolution: The discovery process has been made more robust. This essentially involves doing two retries of the first open attempt as part of the boot-up ioscan.

  Based on HP-UX patch PHSS_17199:
  Channel error in the FCMS driver.

  Based on HP-UX patch PHSS_17108:
  Short-term resolution for interface chip parity errors.

  Based on HP-UX patch PHSS_16824:
  The function 'add_to_sys' is missing from the postinstall script, so the FCMS drivers do not get installed on the system.

  Based on HP-UX patch PHSS_16128:
  The K570 host system hangs during failover testing with LVM. The LVM layer is not informed of the switch to the alternate path.

  Based on HP-UX patch PHSS_16001:
  The FCMS driver was performing automatic topology discovery. However, because of an unhealthy link, the link comes back up in point-to-point mode after a link down, which causes all the previously allocated loop-related resources to be released. The link does not stay in point-to-point mode; it bounces back to loop. The system then panics while trying to access structures that have been released.

  Based on HP-UX patch PHSS_15946:
  STM (the FW update utility) fails for the FCMS Host Bus Adapter card A3404A.

  Based on HP-UX patch PHSS_15381:
  ioscan fails to discover FCMS devices in an FCMS configuration.

  Based on HP-UX patch PHSS_14652:
  To enable entry points for the FCMS driver, the files fcp_cdio.h, fcp_ioctls.h, and fcp_ctrl.c are needed.

  Based on HP-UX patch PHSS_14241:
  For debugging problems in the field, the file fcms.o must be included in libhp-ux.a for the FCMS driver. A previous patch did not include this file, so the version of fcms.o delivered with 11.LR and the version delivered with patch PHSS_13495 were out of sync.
  Based on HP-UX patch PHSS_13495:
  FCMS Driver workaround for V-class PCI firmware defect.

SR:
  8606107413 8606139546 8606114642 8606163155 8606126898 8606172941
  8606187384 8606180952 8606165600 8606165597 8606163741 8606160434
  8606141896 8606141638 8606134165 8606133802 8606133133 8606132269
  8606132268 8606131657 8606126297 8606125441 8606124743 8606124016
  8606112165 8606110114 8606109622 8606108561 8606106386 8606102940
  4701423889 1653281956

Patch Files:

  OS-Core.CORE2-KRN,fr=B.11.04,fa=HP-UX_B.11.04_32,v=HP:
    /usr/conf/lib/libhp-ux.a(fcms.o)

  OS-Core.CORE2-KRN,fr=B.11.04,fa=HP-UX_B.11.04_64,v=HP:
    /usr/conf/lib/libhp-ux.a(fcms.o)

  FCMassStorage.FCMS-ENG-A-MAN,fr=B.11.04,fa=HP-UX_B.11.04_32/64,v=HP:
    /usr/share/man/man1m.Z/fcmsutil.1m

  FCMassStorage.FCMS-KRN,fr=B.11.04,fa=HP-UX_B.11.04_32,v=HP:
    /usr/conf/master.d/fcms
    /usr/conf/space.h.d/fcms.h
    /usr/conf/lib/libfcms.a

  FCMassStorage.FCMS-KRN,fr=B.11.04,fa=HP-UX_B.11.04_64,v=HP:
    /usr/conf/master.d/fcms
    /usr/conf/space.h.d/fcms.h
    /usr/conf/lib/libfcms.a

  FCMassStorage.FCMS-RUN,fr=B.11.04,fa=HP-UX_B.11.04_32/64,v=HP:
    /opt/fcms/bin/fcmsutil

  VirtualVaultOS.VVOS-AUX-IA,fr=B.11.04,fa=HP-UX_B.11.04_32/64,v=HP:
    /etc/auth/system/files.fcdb/05.patches/19143_PHKL.fcdb

what(1) Output:

  OS-Core.CORE2-KRN,fr=B.11.04,fa=HP-UX_B.11.04_32,v=HP:
    /usr/conf/lib/libhp-ux.a(fcms.o):
      None

  OS-Core.CORE2-KRN,fr=B.11.04,fa=HP-UX_B.11.04_64,v=HP:
    /usr/conf/lib/libhp-ux.a(fcms.o):
      None

  FCMassStorage.FCMS-ENG-A-MAN,fr=B.11.04,fa=HP-UX_B.11.04_32/64,v=HP:
    /usr/share/man/man1m.Z/fcmsutil.1m:
      None

  FCMassStorage.FCMS-KRN,fr=B.11.04,fa=HP-UX_B.11.04_32,v=HP:
    /usr/conf/master.d/fcms:
      None

  FCMassStorage.FCMS-KRN,fr=B.11.04,fa=HP-UX_B.11.04_32,v=HP:
    /usr/conf/space.h.d/fcms.h:
      None

  FCMassStorage.FCMS-KRN,fr=B.11.04,fa=HP-UX_B.11.04_32,v=HP:
    /usr/conf/lib/libfcms.a:
      $Source: kern/wsio/fcT1_ctrl.c, hpuxsysio, vvos_rose , rose0198 $
      $Date: 01/01/15 01:42:36 $
      $Revision: 1.7 PATCH_11.04 (PHKL_23045) $
      libfcms.a $Date: 2000/10/18 17:32:48 $Revision: PATCH_11.00 (PHKL_21834)

  FCMassStorage.FCMS-KRN,fr=B.11.04,fa=HP-UX_B.11.04_64,v=HP:
    /usr/conf/master.d/fcms:
      None

  FCMassStorage.FCMS-KRN,fr=B.11.04,fa=HP-UX_B.11.04_64,v=HP:
    /usr/conf/space.h.d/fcms.h:
      None

  FCMassStorage.FCMS-KRN,fr=B.11.04,fa=HP-UX_B.11.04_64,v=HP:
    /usr/conf/lib/libfcms.a:
      $Source: kern/wsio/fcT1_ctrl.c, hpuxsysio, vvos_rose , rose0198 $
      $Date: 01/01/15 01:42:36 $
      $Revision: 1.7 PATCH_11.04 (PHKL_23045) $
      libfcms.a $Date: 2000/10/18 17:32:48 $Revision: PATCH_11.00 (PHKL_21834)

  FCMassStorage.FCMS-RUN,fr=B.11.04,fa=HP-UX_B.11.04_32/64,v=HP:
    /opt/fcms/bin/fcmsutil:
      $Source: net/lanlink/LAN/fcmsutil/fcmsutil.c, hpuxcmdnet, vvos_rose, rose0198 $
      $Date: 01/01/15 02:08:08 $
      $Revision: 1.5 PATCH_11.04 (PHKL_23045) $
      fcmsutil : Version: B.11.00.13
      $Revision: Hewlett-Packard ISSL Level vvos_rose42 $
      $Header: Hewlett-Packard ISSL Release vvos_rose $
      $Date: Tue Oct 9 09:50:17 EDT 2001 $

  VirtualVaultOS.VVOS-AUX-IA,fr=B.11.04,fa=HP-UX_B.11.04_32/64,v=HP:
    /etc/auth/system/files.fcdb/05.patches/19143_PHKL.fcdb:
      $Revision: Hewlett-Packard ISSL 1.2 etc/auth/system/files.fcdb/05.patches/19143_PHKL.fcdb, hpuxpatch, vvos_rose $
      $Date: 99/08/27 18:34:46 $
      $Source: etc/auth/system/files.fcdb/05.patches/19143_PHKL.fcdb, hpuxpatch, vvos_rose $
      $Date: 99/08/27 18:34:46 $
      $Revision: 1.2 PATCH_11.04 (PHKL_19143) $

cksum(1) Output:

  OS-Core.CORE2-KRN,fr=B.11.04,fa=HP-UX_B.11.04_32,v=HP:
    267923401 249856 /usr/conf/lib/libhp-ux.a(fcms.o)

  OS-Core.CORE2-KRN,fr=B.11.04,fa=HP-UX_B.11.04_64,v=HP:
    2756968825 309784 /usr/conf/lib/libhp-ux.a(fcms.o)

  FCMassStorage.FCMS-ENG-A-MAN,fr=B.11.04,fa=HP-UX_B.11.04_32/64,v=HP:
    1008881678 7584 /usr/share/man/man1m.Z/fcmsutil.1m

  FCMassStorage.FCMS-KRN,fr=B.11.04,fa=HP-UX_B.11.04_32,v=HP:
    1691551532 5828 /usr/conf/master.d/fcms

  FCMassStorage.FCMS-KRN,fr=B.11.04,fa=HP-UX_B.11.04_32,v=HP:
    1085605995 364 /usr/conf/space.h.d/fcms.h

  FCMassStorage.FCMS-KRN,fr=B.11.04,fa=HP-UX_B.11.04_32,v=HP:
    1725801327 594058 /usr/conf/lib/libfcms.a

    FCMassStorage.FCMS-KRN,fr=B.11.04,fa=HP-UX_B.11.04_64,v=HP:
        1691551532 5828 /usr/conf/master.d/fcms

    FCMassStorage.FCMS-KRN,fr=B.11.04,fa=HP-UX_B.11.04_64,v=HP:
        1085605995 364 /usr/conf/space.h.d/fcms.h

    FCMassStorage.FCMS-KRN,fr=B.11.04,fa=HP-UX_B.11.04_64,v=HP:
        2235783167 1077888 /usr/conf/lib/libfcms.a

    FCMassStorage.FCMS-RUN,fr=B.11.04,fa=HP-UX_B.11.04_32/64,v=HP:
        3056717395 69632 /opt/fcms/bin/fcmsutil

    VirtualVaultOS.VVOS-AUX-IA,fr=B.11.04,fa=HP-UX_B.11.04_32/64,v=HP:
        1415393186 608 /etc/auth/system/files.fcdb/05.patches/19143_PHKL.fcdb

Patch Conflicts: None

Patch Dependencies:
    s700: 11.04: PHKL_19142
    s800: 11.04: PHKL_19142

Hardware Dependencies: None

Other Dependencies: None

Supersedes: PHKL_19143 PHKL_23045

Equivalent Patches: PHKL_23939:
    s700: 11.00
    s800: 11.00

Patch Package Size: 2360 KBytes

Installation Instructions:

Please review all instructions and the Hewlett-Packard SupportLine User Guide or your Hewlett-Packard support terms and conditions for precautions, scope of license, restrictions, and limitation of liability and warranties, before installing this patch.

------------------------------------------------------------

1. Back up your system before installing a patch.

2. Log in as root.

3. Copy the patch to the /tmp directory.

4. Move to the /tmp directory and unshar the patch:

       cd /tmp
       sh PHKL_25053

5. Run swinstall to install the patch:

       swinstall -x autoreboot=true -x patch_match_target=true \
           -s /tmp/PHKL_25053.depot

By default swinstall will archive the original software in /var/adm/sw/save/PHKL_25053. If you do not wish to retain a copy of the original software, use the patch_save_files option:

       swinstall -x autoreboot=true -x patch_match_target=true \
           -x patch_save_files=false -s /tmp/PHKL_25053.depot

WARNING: If patch_save_files is false when a patch is installed, the patch cannot be deinstalled. Please be careful when using this feature.
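
After installation, the delivered files can be spot-checked against the cksum(1) Output values listed in this patch text. The sketch below assumes the installed copies are byte-identical to the delivered ones and that you list only the entries for the fileset variant (32-bit or 64-bit) actually installed, since both variants install to the same paths with different checksums. The verify_cksum function and the manifest file are hypothetical helpers, not part of the patch or of SD-UX; only the standard cksum(1) utility is assumed.

```shell
#!/bin/sh
# Sketch: check files against a manifest of "checksum size path" lines,
# the same layout as the cksum(1) Output entries in this patch text.
# verify_cksum and the manifest file are hypothetical helpers, not part
# of the patch or of SD-UX.

verify_cksum() {
    manifest=$1
    status=0
    while read -r want_sum want_size path; do
        # Skip blank lines and comment lines in the manifest.
        case $want_sum in
            '' | '#'*) continue ;;
        esac
        if [ ! -f "$path" ]; then
            echo "MISSING  $path"
            status=1
            continue
        fi
        # Reading from stdin makes cksum print only "checksum size".
        set -- $(cksum < "$path")
        if [ "$1" = "$want_sum" ] && [ "$2" = "$want_size" ]; then
            echo "OK       $path"
        else
            echo "MISMATCH $path (got $1 $2)"
            status=1
        fi
    done < "$manifest"
    return "$status"
}

# Example use on a system with the 32-bit FCMS-KRN fileset installed
# (values copied from the cksum(1) Output list):
#
#   cat > /tmp/PHKL_25053.cksum <<'EOF'
#   1691551532 5828 /usr/conf/master.d/fcms
#   1085605995 364 /usr/conf/space.h.d/fcms.h
#   1725801327 594058 /usr/conf/lib/libfcms.a
#   EOF
#   verify_cksum /tmp/PHKL_25053.cksum
```

Entries such as /usr/conf/lib/libhp-ux.a(fcms.o) name an archive member rather than a plain file, so cksum cannot read them directly; presumably the member would first have to be extracted into a scratch directory (for example with "ar x /usr/conf/lib/libhp-ux.a fcms.o") and the extracted copy listed in the manifest instead.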

For future reference, the contents of the PHKL_25053.text file are available in the product readme:

       swlist -l product -a readme -d @ /tmp/PHKL_25053.depot

To put this patch on a magnetic tape and install from the tape drive, use the command:

       dd if=/tmp/PHKL_25053.depot of=/dev/rmt/0m bs=2k

Special Installation Instructions:

ALERT messages

One of the major changes to the Fibre Channel driver carried forward in PHKL_25053 is that the HBA is reset when the system is booted. This reset causes link errors, which produce the ALERT messages seen at boot time. ALERT messages at boot time should therefore not be considered a problem. ALERT messages seen after the boot process has completed should be considered a possible problem and need to be investigated.