Patch Name: PHKL_30508

Patch Description: s700_800 11.00 SCSI IO Subsystem Cumulative Patch

Creation Date: 04/05/18

Post Date: 04/06/07

Hardware Platforms - OS Releases:
    s700: 11.00
    s800: 11.00

Products: N/A

Filesets:
    OS-Core.ADMN-ENG-A-MAN,fr=B.11.00,fa=HP-UX_B.11.00_32/64,v=HP
    OS-Core.CORE-KRN,fr=B.11.00,fa=HP-UX_B.11.00_32/64,v=HP
    ProgSupport.C-INC,fr=B.11.00,fa=HP-UX_B.11.00_32/64,v=HP
    OS-Core.CORE2-KRN,fr=B.11.00,fa=HP-UX_B.11.00_32,v=HP
    OS-Core.KERN2-RUN,fr=B.11.00,fa=HP-UX_B.11.00_32,v=HP
    SCSI-Passthru.SPT2-DVR,fr=B.11.00,fa=HP-UX_B.11.00_32,v=HP
    OS-Core.CORE2-KRN,fr=B.11.00,fa=HP-UX_B.11.00_64,v=HP
    OS-Core.KERN2-RUN,fr=B.11.00,fa=HP-UX_B.11.00_64,v=HP
    SCSI-Passthru.SPT2-DVR,fr=B.11.00,fa=HP-UX_B.11.00_64,v=HP

Automatic Reboot?: Yes

Status: General Release

Critical: No (superseded patches were critical)
    PHKL_29834: PANIC HANG
    PHKL_29049: PANIC
    PHKL_29041: HANG
    PHKL_28496: PANIC CORRUPTION
    PHKL_28131: PANIC MEMORY_LEAK HANG
    PHKL_27003: OTHER HANG
        Please see the defect description for details
    PHKL_26452: PANIC HANG
    PHKL_25675: PANIC HANG
    PHKL_24004: PANIC HANG OTHER
        Update 11.00 to 11.11 fails.
    PHKL_23281: PANIC HANG
    PHKL_22941: HANG OTHER
        See description for details on enhancements
    PHKL_22759: OTHER
        This patch replaces PHKL_22460, which addressed PANIC, HANG and MEMORY_LEAK symptoms.
    PHKL_22460: PANIC HANG MEMORY_LEAK
    PHKL_21607: PANIC HANG
    PHKL_21504: PANIC
    PHKL_20688: HANG
    PHKL_20629: PANIC HANG
    PHKL_20452: PANIC HANG ABORT CORRUPTION MEMORY_LEAK OTHER
        See list of Defect Symptoms for details.
    PHKL_20208: PANIC
    PHKL_19245: PANIC
    PHKL_17333: HANG
    PHKL_13371: PANIC
    PHKL_14688: PANIC

Category Tags:
    defect_repair hardware_enablement enhancement general_release
    critical panic halts_system corruption memory_leak

Path Name: /hp-ux_patches/s700_800/11.X/PHKL_30508

Symptoms:

PHKL_30508:

( SR:8606349130 CR:JAGaf09949 )
If a system with a mirrored VxVM root disk configuration is powered off and on, then on booting up from the primary disk, the secondary disk is shown in the "failed" state.

    # vxdisk -g rootdg list
    DEVICE       TYPE      DISK         GROUP        STATUS
    c0t6d0       simple    rootdisk01   rootdg       online
    -            -         rootmirror   rootdg       failed

( SR:8606344298 CR:JAGaf05149 )
The open(2) command sequence takes a long time for SPC-2 compliant devices.

( SR:8606351535 CR:JAGaf12340 )
open(2) on the device node corresponding to the SCSI initiator ID of the card fails.

PHKL_29834:

( SR:8606322906 CR:JAGae85372 )
When an external single-ended wide SCSI disk is connected to the narrow 50 pin SCSI connector on a C3750 model workstation, the following unexpected behavior may be observed:
    - Hang during I/O request to disk
    - Disk not accessible
    - diskinfo(1M) showing incorrect output
    - ioscan(1M) showing invalid description strings

( SR:8606295123 CR:JAGae58817 )
System panics with the following stack trace when an asynchronous write request fails.

    panic+0x14
    assfail+0x3c
    scsi_iodone_error+0x1f0
    scsi_iodone+0x4ec
    scsi_cbfn+0x750
    scsi_fast_cbfn+0x26c
    c720_call_cbfns+0x9c
    c720_invalid_req_done+0x1a4
    invoke_callouts_for_self+0x19c
    sw_service+0xb4
    mp_ext_interrupt+0x330
    ivti_patch_to_nop3+0x0
    sul_pcxu_stop_here+0x0
    spinunlock+0x44
    idle+0x9f0
    swidle_exit+0x0

( SR:8606335728 CR:JAGae96782 )
System performance monitoring tools such as glance(1)/gpm(1) show invalid values for byte count disk metrics.

PHKL_29364:

( SR:8606299275 CR:JAGae62769 )
The logging subsystem does not display the recovered error events from disks.
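The defect description for PHKL_29364, later in this document, states that logging of recovered errors is enabled by setting the 0x40 flag in scsi_log_mask, and shows the mask changing from 1F238B10 to 1F238B50. The following is only a minimal sketch of that arithmetic; the function and constant names are hypothetical and are not part of HP-UX, and the actual change is made with adb as shown in the defect description.

```python
# Hypothetical helper: computes the scsi_log_mask value to write with adb.
# Only the 0x40 flag value and the example mask come from the patch text.
RECOVERED_ERROR_FLAG = 0x40


def with_recovered_error_logging(current_mask: int) -> int:
    """Return current_mask with the 0x40 flag set.

    A bitwise OR is used so that a mask which already has the flag
    set is returned unchanged.
    """
    return current_mask | RECOVERED_ERROR_FLAG


if __name__ == "__main__":
    current = 0x1F238B10  # example mask value from the defect description
    print(f"{with_recovered_error_logging(current):08X}")
```

Using OR rather than plain addition is a design choice of this sketch: it gives the same result as "add 0x40" when the flag is clear, and avoids corrupting the mask if the flag is already set.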
PHKL_29049:

( SR:8606304724 CR:JAGae68058 )
I/O requests to disk devices are slow and sar(1M) shows large avwait and avque values.

( SR:8606266268 CR:JAGae30517 )
The system panics with the following stack trace:

    ...
    ...
    scsi_frequency+0x1ac
    scsi_ioctl+0x1344
    sdisk_ioctl+0x1c
    spec_ioctl+0x168
    vno_ioctl+0x88
    ioctl+0x108
    syscall+0x1bc
    $syscallrtn+0x0
    ...
    ...

( SR:8606304019 CR:JAGae67368 )
SIOC get/set ioctls will not function properly with the SCSI interface card supporting Ultra320 speed.

PHKL_29041:

( SR:8606286789 CR:JAGae50728 )
Wrong block device activity data is reported by sar(1M) when the disks have failed operations. This resulted in sar -d displaying wrong values of avque.

( SR:8606298657 CR:JAGae62156 )
Application hang due to error returned by the SCSI driver.

PHKL_28496:

( SR:8606230478 CR:JAGad99528 )
The system may panic on select timeout with data page fault.

( SR:8606226043 CR:JAGad95114 )
Data integrity issues or HPMC with Channel B of A5159A and Core I/O FWD SCSI HBA on the following systems: rp24xx (A-class), rp54xx (L-class), rp7400 (N-class). The description field in ioscan output for affected Core I/O FWD SCSI cards will contain the string 'C875'.

( SR:8606286272 CR:JAGae50215 )
SCSI controllers with the 896 chip (revision 4) may, under certain circumstances, send wrong data on the SCSI bus after a bus reset.

( SR:8606135832 CR:JAGad04964 )
Corruption of 16 byte CDB command.
( SR:8606289589 CR:JAGae53519 )
Panic in SCSI stack with the following trace:

    crash event was a panic
    panic+0x14
    wait_for_lock+0x2cc
    call_wait_for_lock+0x20
    scsi_start+0x50
    scsi_free_scb+0xac
    scsi_strategy_real+0xcd4
    ioforw_sched+0xa4
    scsi_cmd+0x3a4
    scsi_probe+0x444
    parallel_scsi_probe+0x1b4
    wsio_probe+0xe0
    wsio_find_it+0x34
    wsio_scan+0x70
    gio_scan_subtree+0x188
    gio_scan_subtree+0x1c4
    gio_scan_subtree+0x1c4
    gio_scan_subtree+0x1c4
    gio_scan_subtree+0x1c4
    gio_scan_subtree+0x1c4
    io_scan+0x9c
    do_io_scan+0x48
    dev_config_ioctl+0xd8
    spubind_cdev_ioctl+0x94
    spec_ioctl+0xac
    vno_ioctl+0x90
    ioctl+0x1f4
    syscall+0x28c
    $syscallrtn+0x0

( SR:8606293572 CR:JAGae57320 )
The open count of the non boot device always remains greater than one. This may result in failure of operations that expect to do a first open on the device.

PHKL_28131:

( SR:8606225743 CR:JAGad94816 )
The system panics with the following stack trace when a read/write request with an odd-byte-aligned buffer and a size greater than 64K bytes is issued to a raw disk device connected to an HP Precision Bus (HP-PB) fast-wide SCSI interface.

    panic+0x14
    invade_other_pools+0xb0
    alloc_from_pool+0x80
    io_send+0x58
    d30_power_on_reply+0x8c
    disc30+0x4ec
    io_send+0x2a8
    s3_send_power_on_req+0x3c
    s3_pass_power_on+0x240
    scsi3+0x364
    io_send+0x124
    d30_send_scsi_io_req+0x2dc
    d30_rdwr+0x1cc
    disc30+0x180
    io_send+0x124
    d3startio+0x6ac
    d3startreq+0x90
    d3pwrfail+0xc8
    d3llioerr+0x3f4
    disc3+0x194
    io_send+0x2a8
    disc3_strategy_real+0x22c
    ioforw_int+0xd8
    mp_ext_interrupt+0x6c
    ivti_patch_to_nop3+0x0

( SR:8606165403 CR:JAGad34697 )
During open, the LUN ownership is changed, causing the LUN to be bound to the current controller.
( SR:8606242143 CR:JAGae09397 )
The system may experience intermittent bus hangs followed by resets on the ports of the A5159A card and Core I/O FWD SCSI HBA on the following systems: rp24xx (A-class), rp54xx (L-class), rp7400 (N-class), when connected to a disk enclosure.

( SR:8606265990 CR:JAGae30243 )
I/O hang due to a pending I/O request in the lun disk queue.

( SR:8606165305 CR:JAGad34599 )
On systems with HP-PB backplanes, the SIOC_IO ioctl returns incorrect data when the buffer passed is a kernel buffer.

( SR:8606241873 CR:JAGae09130 )
The ioscan may hang, and on the following reboot the system panics with a stack trace that is not consistent.

( SR:8606177456 CR:JAGad46688 )
The driver issued the Synchronize Cache command during close for FC60 devices. This caused the FC60 device to trigger Auto LUN Transfer (ALT) and automatically change LUN-controller ownership to the path that is currently being used.

( SR:8606282310 CR:JAGae46262 )
An incorrect debug assert statement in the disk driver causes a panic when a Check Condition with Medium Change sense occurs.

( SR:8606264850 CR:JAGae29181 )
An open() with the O_NDELAY flag takes too long when no CD is present in the DVD/CD-ROM drive.

( SR:8606238711 CR:JAGae07734 )
LVM does not switch to an available alternate path in spite of the SCSI driver returning an error.

( SR:8606226361 CR:JAGad95431 )
Applications may hang due to incorrect SCSI error handling.

( SR:8606173887 CR:JAGad43140 )
Improper handling of error conditions by the SCSI disk driver in 11.00. There are various symptoms as described under the following CRs.

( SR:8606178041 CR:JAGad47268 )
"vgchange -a n " command hangs when the cable is disconnected on the alternate link, if immediate reporting (IR) is true.

( SR:8606167814 CR:JAGad37097 )
The ioscan -fn command hangs when there is a bad disk present.

( SR:8606139670 CR:JAGad08981 )
The system panics when a certain type of SCSI error occurs while doing writes on an hfs filesystem.
The /var/adm/syslog/syslog.log reports Check Condition status with sense key: (03) Medium Error. PHKL_27003: ( SR:8606230706 CR:JAGad99756 ) When PHKL_22941 is installed and LVM is trying to switch from the primary path to an alternate path, the SCSI subsystem may report false read errors to LVM. ( SR:8606232873 CR:JAGae02101 ) Customer gets an I/O error on attempting to do more than one backup on tape after installing PHKL_22460. ( SR:8606244278 CR:JAGae10766 ) The LVM I/Os hang due to the disk-driver retrying the request forever, on getting a 'busy' status from disk. Due to this, it was not possible to login or do any work on the system. ( SR:8606249862 CR:JAGae16248 ) Some of the I/O requests may fail due to SCSI driver not performing the retry operation. PHKL_26452: ( SR:8606185203 CR:JAGad54405 ) System panics with a Data Page Fault when a read command is issued on a SCSI pass through driver and the read failed with a check condition on a deferred error : panic+0x14 report_trap_or_int_and_panic+0x80 interrupt+0x1d4 $ihndlr_rtn+0x0 b_pcxu_loop+0x58 privlbcopy+0x1c scsi_fix_alignment_done+0x44 scsi_iodone+0xd4 scsi_cbfn+0x4b0 fcpdev_scsi_comp+0x94 fcpbh_scsi_comp+0x2cc fcpbh_fcp_cbfn+0x14c fcpbh_rcv_completer+0x108 fcT1_isr+0x900 sapic_interrupt+0x2c mp_ext_interrupt+0x34c ivti_patch_to_nop3+0x0 ( SR:8606147432 CR:JAGad16775 ) On a K-Class system with no devices connected, if an inquiry request is issued using the SIOC_IO command on a SCSI pass through device (spt0) and an ioscan is also issued, the system panics with the following stack trace : panic+0x14 report_trap_or_int_and_panic+0x4c interrupt+0x1e8 $ihndlr_rtn+0x0 s3_chain_ios+0x864 s3_check_ioq+0x5c s3_io_request+0xac s3_probe_request+0x1ac s3_send_reply+0x234 s3_int_direct+0xe0 scsi3+0xf4 io_send+0x130 int_direct+0x74 mp_ext_interrupt+0x300 ivti_patch_to_nop3+0x0 idle+0x3bc swidle_exit+0x0 ( SR:8606225743 CR:JAGad94816 ) System panics with the following stack trace when an odd byte aligned buffer, or 
greater than 64K-1 byte, read/write request is issued to a raw disk device connected to an HP Precision Bus fast-wide SCSI interface. panic+0x14 invade_other_pools+0xb0 alloc_from_pool+0x80 io_send+0x58 d30_power_on_reply+0x8c disc30+0x4ec io_send+0x2a8 s3_send_power_on_req+0x3c s3_pass_power_on+0x240 scsi3+0x364 io_send+0x124 d30_send_scsi_io_req+0x2dc d30_rdwr+0x1cc disc30+0x180 io_send+0x124 d3startio+0x6ac d3startreq+0x90 d3pwrfail+0xc8 d3llioerr+0x3f4 disc3+0x194 io_send+0x2a8 disc3_strategy_real+0x22c ioforw_int+0xd8 mp_ext_interrupt+0x6c ivti_patch_to_nop3+0x0 ( SR:8606216118 CR:JAGad85288 ) When the scsi bus is being opened and if an interrupt gets serviced at the same time, the system panics with the following stack trace : panic+0x14 report_trap_or_int_and_panic+0x84 interrupt+0x1d4 $ihndlr_rtn+0x0 c720_isr+0x890 sapic_interrupt+0x2c mp_ext_interrupt+0x318 ivti_patch_to_nop3+0x0 bz_pre_sl_loop+0x4 c720_if_bus_open+0x318 scsi_lun_open+0x12d4 sctl_open+0x24 scsi_probe+0x370 parallel_scsi_probe+0x1a8 wsio_probe+0xe0 wsio_find_it+0x34 wsio_scan+0x70 gio_scan_subtree+0x188 gio_scan_subtree+0x1c4 gio_scan_subtree+0x1c4 io_scan+0x9c do_io_scan+0x48 dev_config_ioctl+0xd8 spubind_cdev_ioctl+0x94 spec_ioctl+0xac vno_ioctl+0x90 ioctl+0x1f4 syscall+0x480 $syscallrtn+0x0 ( SR:8606223745 CR:JAGad92841 ) On workstation model C3700, the external narrow SCSI bus is set up incorrectly. The 'diskinfo' command returns invalid information and I/O's on this bus hang. PHKL_25938: ( SR:8606186960 CR:JAGad56170 ) Bogus error messages "SCSI: asense data-done lbolt:..." are displayed even if a device correctly returns sense data. This is seen with Plasmon optical drives. ( SR:8606204859 CR:JAGad74037 ) The SCSI driver can not communicate with the target (nCipher encryption device) that initiates speed and width negotiation. This results in parity errors on the SCSI bus and as a result SCSI bus resets. 
( SR:8606193416 CR:JAGad62628 )
With PHKL_21607 or a subsequent SCSI patch installed, if the SCSI driver detects an error, the line below is displayed without the associated information:

    scb->cdb: 12 00 00 00 80 00

PHKL_25675:

( SR:8606137271 CR:JAGad06389 )
Some processes might become unkillable if many processes access the same bus. This error condition has been experienced only on systems with a hundred or more luns on the same bus.

( SR:8606207857 CR:JAGad77034 )
The ioctl system call returns invalid values if called with SIOC_GET_TGT_LIMITS or SIOC_GET_TGT_PARMS parameters for a SCSI device controlled by the c8xx driver.

( SR:8606168360 CR:JAGad37642 )
A Data Page Fault panic occurs when an application uses the sctl/ioctl passthrough interface with the read/write data mismatching the buffer size. The stack trace would look similar to the following:

    panic+0x14
    report_trap_or_int_and_panic+0x4c
    interrupt+0x1e8
    $ihndlr_rtn+0x0
    lbcopy_pcxu_method+0xc
    privlbcopy+0x1c

PHKL_24004:

( SR: 8606179935 CR: JAGad49157 )
If an error occurs that causes LVM to switch to an alternate link (if configured) to access the physical volume, a subsequent attempt to deactivate the volume group with the following command hangs:

    vgchange -a n [vg_name]

( SR: 8606158737 CR: JAGad28067 )
The following informative message on the console and in /var/adm/syslog/syslog.log unnecessarily alarmed customers:

    SCSI: Attempt to access partially open device -- dev: %x

( SR:8606189487 CR: JAGad58701 )
An operating system update from 11.00 to 11.11 fails during the kernel rebuild. The compilation of conf.c fails with the following messages:

    WARNING: Duplicate tunable scsi_max_qdepth found in /usr/conf/master.d/sctl.
    Ignoring the following entry from /usr/conf/master.d/sctl.
    scsi_max_qdepth SCSI_MAX_QDEPTH 8
    Compiling /stand/build/conf.c...
    (Bundled) cc: "/usr/conf/space.h.d/scsi_ctl_space.h", line 54: error 1588: "SCSI_MAX_QDEPTH" undefined.
    (Bundled) cc: "/usr/conf/space.h.d/scsi_ctl_space.h", line 54: error 1521: Incorrect initialization.
    (Bundled) cc: "/usr/conf/space.h.d/scsi_ctl_space.h", line 54: error 1521: Incorrect initialization.
    (Bundled) cc: "/usr/conf/space.h.d/scsi_ctl_space.h", line 72: error 1584: Inconsistent type declaration: "scsi_max_qdepth".
    (Bundled) cc: "/usr/conf/space.h.d/scsi_ctl_space.h", line 72: error 1521: Incorrect initialization.
    *** Error exit code 1

( SR: 8606199984 CR: JAGad69170 )
With heavy stress on Fibre Channel (FC) devices, the system panics with the following stack trace:

    panic+0x14
    report_trap_or_int_and_panic+0x4c
    interrupt+0x1e8
    $ihndlr_rtn+0x0
    scsi_is_synchronous_err+0x6c
    scsi_action+0xb0
    sd_retry+0x5c
    scsi_cbfn+0x294
    fcpdev_scsi_comp+0x20c
    fcpbh_scsi_comp+0x5ec
    fcpbh_fcp_cbfn+0x284
    fcpbh_rcv_completer+0x450
    fcT1_isr+0x77c

PHKL_23281:

( SR: 8606173791 CR: JAGad43048 )
A system panic occurs if a specific I/O logging level is set while the system is experiencing I/O errors using the passthrough driver. This panic may not occur if logging is not enabled for investigation purposes. The panic causes the following stack trace:

    panic+0x14
    report_trap_or_int_and_panic+0x80
    interrupt+0x1d4
    $ihndlr_rtn+0x0
    scsi_dmesg_log_io+0xf8
    scsi_action+0x1b8
    scsi_status_action+0x6c
    scsi_cbfn+0x41c
    scsi_fast_cbfn+0x1b0
    c720_call_cbfns+0x60
    c720_isr+0x5bc
    epic_isr+0x58
    mp_ext_interrupt+0x34c
    ivti_patch_to_nop3+0x0
    idle+0x164
    swidle_exit+0x0

( SR: 8606161696 CR: JAGad31012 )
A defective SCSI bus controller generates many SCSI bus resets and causes the system to panic. The panic results in the following stack trace:

    panic+0x14
    settimeout_for_cpu+0x174
    Ktimeout+0x3c
    c720_reset_chip+0x129c
    c720_isrRST+0x94
    c720_isr+0x15cc
    sapic_interrupt+0x2c

( SR: 8606176639 CR: JAGad45877 )
LVM requests to a Volume Group may hang instead of switching to the alternate link under certain disk failure conditions.
After TOC'ing the system, the resulting dump showed the following stack trace of the lvmkd process: _swtch+0x138 real_sleep+0x234 _sleep+0x14 scsi_sleep+0x34 scsi_iowait+0x54 scsi_cmdx+0x20c scsi_cmd+0x3c scsi_init_inquiry_data+0xe4 scsi_ioctl+0x1024 sdisk_ioctl+0x28 lv_check_dev_accessability+0x134 lv_bufio+0x23c lv_test_a_link+0x8c lv_check_pf_pvs+0x3a0 lvmkd_daemon+0xd4 lvmkd_fork+0xa0 lvmkd_init+0x1c main+0x870 $vstart+0x34 $locore+0x74 PHKL_22941: ( SR: 8606112261 CR: JAGab84575 ) The same scsi queue depth can be set for all tagged devices but not on a per device basis. ( SR: 8606135046 CR: JAGad04180 ) Frequent resets in systems with Fibre Channel devices (possibly due to addition/removal of devices in the loop) can cause excessive logging resulting in diag2 overrun or /var filesystem free space to be exhausted. ( SR: 8606158437 CR: JAGad27767 ) An XP256 array connected to a Fibre Channel adapter can have placeholder LUNs with capacity zero. Using scsictl command on those zero capacity LUNs causes an unrecoverable process hang. ( SR: 8606166729 CR: JAGad36016 ) When a bus is shared between two systems, if one of the systems continuously sends out bus resets, the I/Os from the other system on this bus hang, consequently the PV-Link switch would not occur. ( SR: 8606167125 CR: JAGad36411 ) Disk I/O hangs even when LVM PV-Link is configured. The system could report a "DIAGNOSTIC SYSTEM WARNING". The on-line diagnostic log would show an I/O Error. ( SR: 8606169435 CR: JAGad38710 ) High Availability systems hang when under heavy load and many I/O errors are being returned by the scsi driver (possibly due to a hardware problem). PHKL_22759: ( SR: 8606169631 CR: JAGad38905 ) This patch replaces PHKL_22460. The recent changes to SCSI services introduced within PHKL_22460 to address Change Request JAGad04900 broke the SIOC_IO ioctl interface. Any SIOC_IO ioctl sent down with a null data buf (ie, test ready, rewind, etc) fails. 
This will break backup applications and other Unix commands like mc that send SIOC_IO ioctls with null data buffers.

PHKL_22460:

( SR: 8606158623 CR: JAGad27953 )
The system can panic with a Data Page Fault in scsi_start_bus_locked(). This defect has been found on a V-class running ioscan after starting then halting ServiceGuard in a single node configuration. It can potentially be found on other systems. The stack trace for this is:

    panic+0x14
    report_trap_or_int_and_panic+0x80
    trap+0xa8c
    nokgdb+0x8
    scsi_start_bus_locked+0x5a4
    scsi_start+0xb0
    scsi_strategy_real+0x1a4
    pa_ioforw_sched+0x360
    scsi_probe+0x640
    parallel_scsi_probe+0x100
    wsio_probe+0xe0
    wsio_find_it+0x34
    wsio_scan+0x6c
    gio_scan_subtree+0x188
    gio_scan_subtree+0x1c4
    gio_scan_subtree+0x1c4
    gio_scan_subtree+0x1c4
    gio_scan_subtree+0x1c4
    io_scan+0xbc
    do_io_scan+0x48
    dev_config_ioctl+0xe8
    spubind_cdev_ioctl+0x94
    spec_ioctl+0xac
    vno_ioctl+0x90
    ioctl+0x168
    syscall+0x200
    $syscallrtn+0x0

( SR: 8606157951 CR: JAGad27281 )
On a K-class or T-class machine with the HP-PB boards using the scsi3 driver, this defect may cause some processes to hang, and SCSI abort messages will be found in the syslog file. However, this defect has not been encountered by any customers and the chances of experiencing it are extremely low.

( SR: 8606155189 CR: JAGad24506 )
This problem would most likely show as an instruction fault panic trying to execute address 0x.0. Careful debugging may be able to produce a stack trace with:

    c720_if_tgt_open+0xa4
    scsi_tgt_open+0xf04
    scsi_lun_open+0xf10
    sdisk_open+0x1c
    call_open_close+0xb5c
    opend+0x2f4
    spec_open+0xe8
    vns_copen+0x4c
    vn_open+0xdc
    copen+0x128
    open+0x44
    syscall+0x5f4
    $syscallrtn+0x0

( SR: 8606155173 CR: JAGad24490 )
This problem can occur on any system with heavy SCSI I/O. Abort messages are likely to be seen in the syslog file.
The stack trace ended with:

    c720_start+0xadc
    c720_isrDeactivate+0x280
    c720_cleanup+0x78
    c720_done+0x6c
    c720_isrCmdComp+0x164
    c720_isrGuts+0xbcc
    c720_isr+0x218
    epic_isr+0xa4
    mp_ext_interrupt+0x264
    ivti_patch_to_nop3+0x0
    spinunlock+0x48
    lookuppn+0x1a4
    vn_create+0xc0
    mkdir+0x80

( SR: 8606155151 CR: JAGad24468 )
This problem could show up as a process hang waiting for an I/O to complete, or as a memory leak of a 512 byte bucket on 32-bit kernels and a 1024 byte bucket on 64-bit kernels. It can occur on any kind of machine during regular use of the system.

( SR: 8606133057 CR: JAGad02204 )
The sctl device driver was not designed to run on multiple processors at the same time (i.e. it is not MP safe). As a consequence, all processes using the sctl device driver run on the same processor in a multi-processor system. CPU load distribution will be uneven and may lead, in the worst cases, to severe degradation of performance.

( SR: 8606135767 CR: JAGad04900 )
During regular use of the sctl driver requesting information from a device, a data page fault panic occurs when an unwritable buffer is given to the sctl driver. The stack will show the following lines:

    panic+0x14
    report_trap_or_int_and_panic+0x4c
    interrupt+0x1e8
    $ihndlr_rtn+0x0
    lbcopy_pcxu_method+0xc
    privlbcopy+0x1c

( SR: 8606155155 CR: JAGad24472 )
This problem can be found on any system experiencing memory pressure. A data page fault panic from c720_timer occurred when the system was running out of memory. SCSI aborts are likely to be found in the syslog file. The stack trace should show:

    c720_timer+0x6e4
    invoke_callouts_for_self+0x238
    sw_service+0x108
    mp_ext_interrupt+0x394
    ivti_patch_to_nop3+0x0
    idle+0x3c4
    swidle_exit+0x0

( SR: 8606155022 CR: JAGad24339 )
The system will suffer performance degradation once a device queue has been filled. The syslog file will include a QUEUE DEPTH message for the corresponding device, and the depth gets set to 1. It is never reset.
This problem was partially handled by a previous patch (PHKL_21607), but needs to be completed. This fix is not required for a machine to work properly, but it is needed for a complete solution that handles the QUEUE FULL condition correctly.

( SR: 8606125977 CR: JAGac46733 )
This problem would be seen as a data page fault panic in LspToScratch. This panic can occur during regular use of the c720 driver. The stack trace should end with:

    LspToScratch+0x8
    c720_isrSelect+0x38
    c720_isrGuts+0x84c
    c720_timer+0x668
    invoke_callouts_for_self+0xc0
    sw_service+0xb0
    mp_ext_interrupt+0x144

( SR: 8606138825 CR: JAGad08088 )
A data page fault can occur during regular use of the c720 driver. Although no specific stack trace can be expected, if ONE of the following five functions appears near the top of the stack trace, it is likely that this defect has occurred:

    c720_isrSelect()
    c720_isrDataDone()
    c720_isrExtMsgLenIn()
    c720_isrWdtrRespRcvd()
    c720_isrSdtrRespRcvd()

( SR: 8606155947 CR: JAGad25258 )
This problem is likely to show up as a hung process. Debugging of the problem would provide the following stack for this process:

    _sleep+0x7d4
    scsi_sleep+0x3c
    scsi_lun_close+0x758
    sdisk_close+0x10
    call_open_close+0x504
    closed+0xb0
    spec_close+0x54
    vn_close+0x48
    vno_close+0x20
    closef+0x68
    close+0x48
    syscall+0x480
    $syscallrtn+0x0

( SR: 8606160406 CR: JAGad29728 )
SCSI bus throughput is not as expected on LVD SCSI boards. These boards are 895 & 896 chip based boards. The system will show slow performance from the LVD SCSI boards.

( SR: 8606105472 CR: JAGab73559 )
This is an enhancement that creates a persistent tunable to manage the queue depth for all the SCSI tagged devices on a system. The queue depth gives the maximum number of concurrent I/Os to the same device.

( SR: 8606160479 CR: JAGad29800 )
A data page fault panic from scsi_start occurs. The problem can occur if the system is accessing multiple tape devices on a single SCSI bus.
The panic and stack trace will be:

    panic: (display==0xb800, flags==0x0) Data page fault

The stack trace was:

    scsi_start+0x18
    scsi_retry+0xd8
    invoke_callouts+0x160
    softclock+0x38
    sw_service+0x154
    mp_ext_interrupt+0x2a0
    $RDB_int_patch+0x58
    mpn_splx_free_lock_ul4_brn_target+0x4
    net_callout+0x90
    netisr_netisr+0x1bc
    netisr_daemon+0x68

PHKL_21989:

( SR: 8606142756 CR: JAGad12108 )
Any wide SCSI devices attached to the built-in narrow single-ended SCSI bus using a 50 pin to 68 pin cable will not function properly. The description shown by ioscan for the built-in narrow single-ended SCSI bus will incorrectly show the bus as "Wide".

PHKL_21607:

( SR: 8606132292 CR: JAGad01441 )
The QUEUE FULL handling has caused performance problems at customer sites.

( SR: 8606130227 CR: JAGac95098 )
The "incomplete" field of the scsi_lun structure keeps track of the number of I/O requests pending for a specific lun. This field increases to large numbers, which gives misleading information.

( SR: 8606132288 CR: JAGad01437 )
The stated limitations on lun numbering are that they must start at zero and be in sequential order. If there are any "gaps" in lun numbering, ioscan will not recognize the high order luns above the gap. The request is for ioscan to recognize all luns, regardless of lun numbering and order.

( SR: 8606132426 CR: JAGad01575 )
A typical SCSI message in the syslog contains a dev (for device). It would be useful to have the hardware path to make the log message clearer.

( SR: 8606106155 CR: JAGab75050 )
The system panics with Spinlock held too long. The panic most likely occurs on a debug kernel.
The end of the stack should show the following:

    panic+0x14
    check_held_time+0x42c
    spinlock_delete+0x50
    su_pre_check+0x144
    c720_unlock+0x9c
    c720_isr+0x254

( SR: 8606133146 CR: JAGad02293 )
A panic occurred when an assertion failed:

    panic: assertion failed ((ReadLong(isc,(ubit32 *)&pScript [Ent_PtCmd+4], &j), (j == PTR_TO_CHIPWORD( ((struct c720_OutBuf *)lbp->uPhysOutBuf

( SR: 8606133067 CR: JAGad02214 )
LVM hangs due to I/O requests never being returned by the IO subsystem. The message "Device violation of Contingent Allegiance" is issued to syslog.

( SR: 8606133280 CR: JAGad02425 )
A SCSI spinlock panic occurred. The panic string was:

    panic: assertion failed ((lisc)->cbfns == NULL)

( SR: 8606125811 CR: JAGac42754 )
The robotics LUN on a LVD/SE DDS4 Autoloader is not discovered by ioscan when the autoloader is attached to a HSC SE bus. The drive LUN is discovered and is attached to stape correctly, but the robotics controller LUN doesn't even come up as unclaimed.

PHKL_21504:

( SR: 8606125610 CR: JAGac41000 )
With vmtrace configured, the system panics with a data memory protection fault and the following stack trace:

    panic+0x14
    report_trap_or_int_and_panic+0x80
    interrupt+0x1d4
    $ihndlr_rtn+0x0
    spinlock+0x14
    scsi_lun_lock+0x14
    sd_strategy_error+0x170
    sd_strategy+0x12c
    scsi_strategy_real+0xd78
    ioforw_int+0xcc
    mp_ext_interrupt+0x144
    ivti_patch_to_nop3+0x0
    idle+0x4dc
    swidle_exit+0x0

This problem is reproducible on systems with SCSI LUNs configured with no storage (i.e. zero storage size) and occurs when an I/O request which attempts to access the no-storage LUN coincides with an I/O request to close the same LUN.

( SR: 8606130829 CR: JAGac97596 )
If SCSI SCRIPTS RAM is modified, pNext in c720_bus_open_real() will not be pointing to the end of SCRIPTS RAM.

PHKL_20688:

( SR: 8606127757 CR: JAGac78558 )
SCSI hardware failure causes system hang with multiple processes waiting for I/O to return.
Multiple console messages are generated which read: SCSI: Third party detected bus hang -- lbolt: xxxxxxxx, bus: x PHKL_20629: ( SR: 8606112882 CR: JAGab93301 ) Panic using spt0 SCSI pass-thru or Omniback with patch PHKL_20452 installed. Panic stack trace: proc[82] at 0x0326b780 ("/opt/omni/lbin/bma -load 1.000000 -name hpcc557_DLTL-1.2 -po"): stack trace for event 0 crash event was a panic panic+0x14 report_trap_or_int_and_panic+0x4c trap+0xea8 $RDB_trap_patch+0x38 spt_getbuf+0x18 spt_sioc_io+0x110 spt_ioctl+0x1c8 spubind_cdev_ioctl+0x88 spec_ioctl+0xb0 vno_ioctl+0x8c ioctl+0x138 syscall+0x1c8 $syscallrtn+0x0 ( SR: 8606110931 CR: JAGab83681 ) Panic in scsi_dmesg_log_io function of scsi driver: "Data Page Fault at line# 1361 in wsio/scsi_ctl.c" Panic Stack Trace: panic+0x14 report_trap_or_int_and_panic+0x80 trap+0xdb8 nokgdb+0x8 lbcopy_pcxu_method+0xc privlbcopy+0x1c scsi_dmesg_log_io+0x4e4 PHKL_20452: ( SR: 5003432120 CR: JAGaa22888 ) SCSI IO subsystem retries non-responsive SCSI devices "forever". Appears to user as IO hang to a device or Logical Volume. ( SR: 8606103129 CR: JAGaa44450 ) After opening a device defined as a scsi_fast_read or write device, SCSI pass-through command mode stops working. ( SR: 8606103814 CR: JAGab19070 ) SCSI Data Page Fault panic: c720_isrAbort/$ihndlr_rtn. crash event was a panic panic+0x14 report_trap_or_int_and_panic+0x80 interrupt+0x1d4 $ihndlr_rtn+0x0 c720_isrAbort+0x1c c720_isr+0xc94 sapic_interrupt+0x2c mp_ext_interrupt+0x33c ivti_patch_to_nop3+0x0 idle+0x508 swidle_exit+0x0 ( SR: 8606103810 CR: JAGab19072 ) System hung running SCSI disk IO, Filesystem, and LVM stress test, due to SCSI controller looping forever to process interrupts. ( SR: 1653307298 CR: JAGab20815 ) File system hang on two-way mirrored LVM configuration when a disk drive in the mirror fails, due to infinite retries on SCSI Parity Errors. 
( SR: 8606100396 CR: JAGab31749 ) Machine can have HPMC while doing register dump of the SCSI controller, if the SCSI IO Processor (SIOP) has not first been stopped. ( SR: 8606103820 CR: JAGab39677 ) Data corruption can be experienced on early revision of SCSI 896 controller chip, when Parity Errors occur on the SCSI bus. ( SR: 8606103148 CR: JAGab69517 ) Data Page Fault panic in scsi_start_bus_locked from parallel_scsi_probe. crash event was a panic panic+0x14 report_trap_or_int_and_panic+0x80 trap+0xa8c nokgdb+0x8 scsi_start_bus_locked+0x5a4 scsi_start+0xb0 scsi_strategy_real+0x1a4 pa_ioforw_sched+0x360 scsi_probe+0x640 parallel_scsi_probe+0x100 wsio_probe+0xe0 wsio_find_it+0x34 wsio_scan+0x6c gio_scan_subtree+0x188 gio_scan_subtree+0x1c4 io_scan+0xbc do_io_scan+0x48 dev_config_ioctl+0xe8 spubind_cdev_ioctl+0x94 spec_ioctl+0xac vno_ioctl+0x90 ioctl+0x168 syscall+0x200 $syscallrtn+0x0 ( SR: 8606103192 CR: JAGab69594 ) Infinite SCSI IO retries; appears to user as IO hang to a device or Logical Volume. ( SR: 8606105969 CR: JAGab74731 ) SCSI Bus Reset occurs during an xstm firmware download to disks on A,K,V class machines. ( SR: 8606106038 CR: JAGab74836 ) Ultra-II speeds not set correctly in scsi c720 driver, so Ultra-II devices go slower than expected. Also, SCSI speed and bus width set incorrectly on PCI-attached SCSI controllers, as shown in ioscan(1M) output. ( SR: 8606113541 CR: JAGab76136 ) SCSI Unexpected Disconnect on DLT7000 tape drive. ( SR: 8606108198 CR: JAGab78589 ) Long system hangs when using the NIO pass-through driver. ( SR: 8606110476 CR: JAGab83179 ) Extended Interrupt Vectors would not work on 64-bit PA RISC Processors. ( SR: 8606110477 CR: JAGab83180 ) Display values for SCSI Width and Mode are set incorrectly, as shown by ioscan on some machines: J7000, J5000, C3000, and B1000. ( SR: 8606110479 CR: JAGab83182 ) Data corruption with early 896 SCSI chip in 32-bit mode, as observed on the SCSI bus. 
( SR: 8606110481 CR: JAGab83184 ) Panic on dereference of NULL pointer "lsp" in the msg_printf at the end of c720_isrGuts_LBP_STALL() routine. ( SR: 8606110616 CR: JAGab83364 ) Online Deletion causes memory leak. ( SR: 8606110653 CR: JAGab83401 ) SCSI Abort Message not working properly with PCI bus. ( SR: 8606110782 CR: JAGab83531 ) SIOP is started with an empty IO request. PHKL_20208: ( SR: 1653281824 DTS: JAGaa42584 ) If immediate reporting is enabled and a deferred error occurs, the system will panic with "scsi unrecovered deferred error". PHKL_19776: ( SR: 8606103698 CR: JAGab70738 ) ( SR: 8606113358 CR: JAGab70313 ) LVM VG failover to alternate HW path takes up to 8 minutes. ( SR: 1653310672 CR: JAGab31999 ) A bad disk shows as good, based on cached INQUIRY data which can be shown with "diskinfo -v " on bad disk. Note: the cause of this also affected above two CRs, so this SCSI INQUIRY fix was required for this patch. PHKL_19245: ( SR: 8606103582 DTS: JAGaa09970 ) Enhancement to add new IOCTL for issuing SCSI commands to SCSI disks. PHKL_19561: ( SR: 4701424978 DTS: JAGab13476 ) I/O error on reading odd-length records from tape device. PHKL_17333: ( SR: 1653284257 DTS: JAGaa44107 ) NIO disks may become unresponsive, causing processes which access them to hang and become unkillable. PHKL_14807: The previous patch has been recut to include compile-based performance tuning. There is no functional change in this patch. PHKL_13371: ( SR: 4701376111 DTS: JAGaa09879) The following has been the configuration that produced the problem in practice, though this problem could potentially occur on other configurations. When a CASCADE (C2430D) device is connected to a T520 or T600 and a probe goes down to the device possibly from an ioscan running, it may get a check condition. This leads to a data page fault PANIC. 
PHKL_19287: ( SR: 8606101377 DTS: JAGab17408 ) LVM failover to alternate path fails on logical volumes configured on SCSI3 devices PHKL_20157: ( SR: 8606107849 DTS: JAGab78147 ) c720 driver does not support SYM 53C895A SCSI chip ( SR: 8606107164 DTS: JAGab76873 ) Internal SCSI on N-class is identified as SE instead of LVD ( SR: 8606113567 DTS: JAGab76903 ) Ultra-2 speeds not set correctly in scsi c720 driver ( SR: 8606105969 DTS: JAGab74731 ) Reset during Firmware Download ( SR: 8606103151 DTS: JAGab69533 ) N/L system could NOT see disk when booting up with install kernel PHKL_17368: Hot Spares configured on a Disk Array will not be visible via SAM. SAM issues the errors: "Failed to open newly created LUN device file" and "SAM is unable to communicate with the device controller at hardware path, x.xx. As a result, SAM cannot retrieve information about the state of any LUNS owned by this controller. SAM will display the LUNS owned by this controller but use extreme caution when using LUNS. If this controller is part of a dual controller disk array, the LUNs may be listed twice instead of correctly being listed only once. Again, use extreme caution when configuring LUNS." SAM works fine when the HOT Spare(s) is removed. PHKL_14688: Data Page Fault panic when commands are repeatedly sent to a device controlled by the scsi_pt driver, with I/O activity on other devices on the same bus. Defect Description: PHKL_30508: ( SR:8606349130 CR:JAGaf09949 ) When VxVM does an open(2) on the secondary disk with the O_NDELAY flag set, the SCSI disk driver does not retry the SCSI Start Unit command even when a retryable error is seen. Resolution: The SCSI disk driver code has been modified to retry the SCSI Start Unit command when a retryable error is seen, even with the O_NDELAY flag set. ( SR:8606344298 CR:JAGaf05149 ) A SCSI sense key of Illegal Request returned by some SPC-2 compliant devices was being treated incorrectly as a transient error thus resulting in retries.
Resolution: Code has been modified to handle SPC-2 compliant devices correctly. ( SR:8606351535 CR:JAGaf12340 ) The management processor on MSA30 disk enclosure does not have a dedicated SCSI target ID. It responds to the SCSI ID of the initiator. In order to communicate with the management processor, open(2) on the device node corresponding to the SCSI initiator ID should be allowed. Resolution: Code has been modified to allow open(2) on the device node corresponding to the SCSI initiator ID. PHKL_29834: ( SR:8606322906 CR:JAGae85372 ) SCSI driver incorrectly sets up the external narrow SCSI bus to wide mode. Resolution: The SCSI c720 driver code has been modified to setup SCSI bus correctly for narrow or wide. ( SR:8606295123 CR:JAGae58817 ) When an asynchronous write request fails, the SCSI subsystem did not log the error. The system panicked due to a related assertion check in the code. Resolution: The SCSI subsystem has been modified to log the error appropriately. ( SR:8606335728 CR:JAGae96782 ) SCSI subsystem does not pass the updated values for residual byte count to kmetric subsystem. Resolution: SCSI subsystem has been modified to pass updated values for the residual byte count. PHKL_29364: ( SR:8606299275 CR:JAGae62769 ) SCSI Driver does not log the recovered error events reported by disks to the logging subsystem. Resolution: SCSI Driver has been modified to log the recovered error events. To enable logging, set the 0x40 flag in scsi_log_mask. This may be done as below. adb -w /stand/vmunix /dev/kmem scsi_log_mask/X <===== Get the value of current log mask scsi_log_mask: scsi_log_mask: 1F238B10 Add 0x40 to current mask scsi_log_mask/W 0x1F238B50 scsi_log_mask: 1F238B10 = 1F238B50 scsi_log_mask?W 0x1F238B50 scsi_log_mask: 1F238B10 = 1F238B50 to exit adb PHKL_29049: ( SR:8606304724 CR:JAGae68058 ) When large number of I/O requests are sent to the device and if this number exceeds the queue length limit of the device, the device returns QUEUE FULL status. 
To avoid frequent QUEUE FULL status messages from the device, the driver will lower its own queue length so that the number of I/O requests sent to the device is reduced. The driver will increase the queue length when a certain number of I/O's have successfully completed at the current queue length. If the driver gets QUEUE FULL status when operating in queue length of 1 it will switch to untagged mode (no queuing). The driver will not return to tagged mode once this happens (even when the I/O load is reduced later). This results in slow I/O to the device and large avwait and avque values reported by sar(1M). Resolution: The code is modified to return to tagged queuing when a certain number of I/O requests complete successfully in the untagged mode. ( SR:8606266268 CR:JAGae30517 ) The system panicked because of divide by zero operation in one of the SCSI routines. Resolution: The code is changed to handle divide by zero operation. ( SR:8606304019 CR:JAGae67368 ) The present code needs to be modified to handle Ultra320 rate in the SIOC get/set ioctls. Resolution: The code is modified to handle Ultra320 rate in the SIOC get/set ioctls. PHKL_29041: ( SR:8606286789 CR:JAGae50728 ) The disk failure paths do not ensure that kmetric completion routines are called to update the block device activity data. This results in wrong data being reported for block device activity. Resolution: The kmetric completion routines are called in the disk driver error paths to ensure that the correct values are reported for block device activity. ( SR:8606298657 CR:JAGae62156 ) When the non-LVM I/O request fails with a sense key of "Illegal Request", the disk driver retries the I/O forever instead of returning failure. This caused the application to hang indefinitely. Resolution: The disk driver is modified to return an error EINVAL for a non-LVM I/O request when we have a sense key of "Illegal Request". 
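The QUEUE FULL handling described above for PHKL_29049 can be pictured with a small model. This is only a sketch under stated assumptions, not the driver's code: the class name, the decrement-by-one policy, and both thresholds (max_depth, successes_to_grow) are hypothetical, since the patch text says only that "a certain number" of successful I/Os triggers each step.

```python
class LunQueue:
    """Toy model of per-LUN queue-depth throttling (illustrative only)."""

    def __init__(self, max_depth=8, successes_to_grow=4):
        self.max_depth = max_depth                  # hypothetical ceiling
        self.successes_to_grow = successes_to_grow  # hypothetical threshold
        self.depth = max_depth
        self.tagged = True
        self._successes = 0

    def on_queue_full(self):
        """Device returned QUEUE FULL: back off."""
        self._successes = 0
        if self.depth > 1:
            self.depth -= 1      # lower the driver's own queue length
        else:
            self.tagged = False  # already at depth 1: switch to untagged mode

    def on_success(self):
        """One I/O completed successfully."""
        self._successes += 1
        if self._successes < self.successes_to_grow:
            return
        self._successes = 0
        if not self.tagged:
            # Behavior added by the fix: return to tagged queuing after
            # enough successful completions in untagged mode.
            self.tagged = True
        elif self.depth < self.max_depth:
            self.depth += 1      # grow back toward the previous maximum
```

Without the branch that sets tagged back to True, the model stays in untagged mode indefinitely once it gets there, which corresponds to the slow I/O and large avwait/avque values described in the defect.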
PHKL_28496: ( SR:8606230478 CR:JAGad99528 ) On select timeout, c720 driver may dereference a null pointer resulting in a panic with data page fault. Resolution: On entering select timeout interrupt service routine, check if the bus pointer passed is NULL. ( SR:8606226043 CR:JAGad95114 ) In extremely rare conditions, single byte writes to onboard memory (SCRIPT RAM) may not complete on Channel B of A5159A and Core I/O FWD SCSI HBA on rp24xx, rp54xx and rp7400 systems. This may result in following problems: a. Data integrity issues b. System crash due to HPMC Resolution: Driver is changed to perform word writes instead of byte writes. ( SR:8606286272 CR:JAGae50215 ) To avoid data corruption Disable Pipe Request(DPR) bit is to be set during SCSI operations. In the present code it is being done only once, in chip initialization routine, and it gets reset after a successful chip reset operation. Resolution: Set the DPR bit in the chip reset routine instead of chip initialization routine. This will make sure that DPR bit is set on chip reset. ( SR:8606135832 CR:JAGad04964 ) 16 Byte CDB corruption is seen when a 16 byte CDB command is sent through the WSIO pass-thru driver. The device returns a check condition with invalid CDB. Resolution: Enable the driver to handle 16 byte CDB in the same way it handles fewer byte (6 or 10) CDBs. ( SR:8606289589 CR:JAGae53519 ) The SCSI LUN pointer is invalid for the bus scsi control block (SCB) and therefore can cause a recursive bus lock held panic in Multi-LUN configuration. Resolution: The LUN pointer is reset to zero for the bus pool SCB before it is freed. Hence, recursive holding of the bus lock is avoided. ( SR:8606293572 CR:JAGae57320 ) The occurrence of this state prevents doing a last close on the device. Resolution: The problem occurred because two flags are assigned the same value. The problem is resolved by changing the value of one of the flags. 
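The DPR fix in this section (JAGae50215) comes down to where the bit is set. The following is a minimal model, assuming a single control register that a chip reset returns to zero and assuming that initialization itself performs a reset. The bit position and all names are hypothetical and are not taken from the c720 driver.

```python
DPR_BIT = 0x01  # hypothetical bit position for Disable Pipe Request


class Chip:
    """Toy model of a controller whose reset clears its control register."""

    def __init__(self, fixed):
        self.fixed = fixed  # True models the patched driver
        self.ctrl = 0

    def reset(self):
        self.ctrl = 0                 # hardware reset clears the register
        if self.fixed:
            self.ctrl |= DPR_BIT      # patched: set DPR in the reset routine

    def initialize(self):
        self.reset()                  # assumption: init goes through a reset
        if not self.fixed:
            self.ctrl |= DPR_BIT      # unpatched: set DPR only once, at init

    def dpr_set(self):
        return bool(self.ctrl & DPR_BIT)
```

In the unpatched model the bit survives initialization but is lost on the first later reset. In the patched model it is restored by every reset.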
PHKL_28131: ( SR:8606225743 CR:JAGad94816 ) Instead of returning an error up to the user level, the disc3 driver retries the I/O over and over again causing all the I/O message frames to be used up. This eventually causes the system to panic. Resolution: The code resolution can be split into two parts: 1. Read/write on raw disk devices is handled by a new function that splits buffers at 64K, when the buffer is odd-byte-aligned and request size is greater than or equal to 64KB. 2. Read/write on LVM devices is handled by allocating a temporary buffer and using this buffer to do the actual operation. ( SR:8606165403 CR:JAGad34697 ) FC60 devices get bound to the current controller because the Mode select command was getting issued to these devices during open. Resolution: The driver code has been changed to not issue a Mode select command for FC60 device during device open. ( SR:8606242143 CR:JAGae09397 ) The Disable Overlapped Arbitration bit in the Control register Zero is used for gaining access to the PCI bus while another function is executing a PCI cycle. The register bit was not set and hence caused intermittent bus hangs and bus resets on the cards containing the 53C876 chip. Resolution: The Disable Overlapped Arbitration bit is now set on the cards containing the 53C876 chip whenever the chip is reset. This avoids the hang and subsequent resets. ( SR:8606265990 CR:JAGae30243 ) The I/O subsystem hang occurred because an I/O request remained in the LUN disk queue. The I/O request remained in the queue because of a failure in allocating the resource. Resolution: The code has been modified to take care that the I/O subsystem hang does not happen when allocation of the resource fails. ( SR:8606165305 CR:JAGad34599 ) The driver code was not able to determine if the data buffer passed in the ioctl was a user space or kernel buffer. This caused the SIOC_IO ioctl to return incorrect data.
Resolution: The driver code has been modified to handle kernel buffers passed with SIOC_IO ioctl system call. ( SR:8606241873 CR:JAGae09130 ) The size of scsi_isc array is 255. The interface drivers currently do not verify the array bounds before populating the array with the bus instance number. If the bus instance number greater than 255 is assigned, then it would overflow the array and cause memory corruption thereby resulting in a system panic. Resolution: The driver init routine was changed to check if the bus instance numbers were greater than 255 and if so return an error. This avoids a system panic. In this case, even if the node is claimed with the instance number greater than 255, the devices under the node are neither visible on ioscan nor accessible. ( SR:8606177456 CR:JAGad46688 ) The Auto LUN Transfer feature on FC60 will cause the LUN-controller ownership to change automatically to the path that is currently being used. When a Synchronize Cache is issued from the disk driver close routine, the alternate path of a dual-path configuration is open and closed, and ALT will cause that path to become the primary path even if no other I/O is being done on this path. Resolution: The driver code has been modified to not issue a Synchronize cache command during close of FC60 disk array. ( SR:8606282310 CR:JAGae46262 ) The ATAPI interface driver (side) returns without an interrupt context, when a Check Condition with Medium change sense occurs. This causes a panic in the scsi disk driver due to an incorrect debug assert statement. Resolution: The corresponding debug assert statement in the disk driver has been removed. ( SR:8606264850 CR:JAGae29181 ) The open() on a CDROM drive without a CD in it takes a considerable amount of time compared to having a CD in the drive even with the O_NDELAY flag set. The disk driver code does not handle the O_NDELAY flag correctly. Resolution: The driver code has been modified to handle the O_NDELAY flag correctly. 
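The scsi_isc bounds check described above (JAGae09130) follows a common pattern: validate the index before writing to a fixed-size table. The sketch below is illustrative only. The function name, the table representation, and the choice of EINVAL as the returned error are assumptions. The patch text states only that an error is returned, and the exact boundary value used by the driver is not confirmed here.

```python
import errno

SCSI_ISC_SIZE = 255  # array size quoted in the defect description


def register_bus(table, instance, isc):
    """Store isc at table[instance]; refuse out-of-range instance numbers.

    Returns 0 on success, or an errno value (EINVAL here, an assumption)
    when the instance number does not fit in the table.
    """
    if not 0 <= instance < len(table):
        return errno.EINVAL   # reject instead of writing past the table
    table[instance] = isc
    return 0
```

As the resolution notes, a rejected bus instance means the devices under that node are neither visible in ioscan nor accessible, but the table is never overrun.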
( SR:8606238711 CR:JAGae07734 ) The disk driver returns EINVAL for I/O request to LVM due to some hardware condition. LVM was not retrying the I/O requests even when an alternate path to the LUN existed. This resulted in some filesystem and system hang condition. Resolution: The disk driver is modified to return an error of EPOWERF when an EINVAL condition is reported by the device for an LVM I/O except for ASC=0x0C, ASCQ=0xA0 (Oracle Hard Integrity error). LVM will retry the I/O on an alternate path due to EPOWERF returned by the disk driver. ( SR:8606226361 CR:JAGad95431 ) When I/O requests from LVM fail or time-out due to bad disks, the SCSI disk driver returns an incorrect error code to LVM causing the LVM to retry the I/O request forever instead of returning failure. This causes the application which has issued the I/O request to hang indefinitely. Resolution: Ensure that the I/O request failed due to MEDIUM ERROR is reported back to LVM with EMEDIA error. ( SR:8606173887 CR:JAGad43140 ) ( SR:8606178041 CR:JAGad47268 ) ( SR:8606167814 CR:JAGad37097 ) ( SR:8606139670 CR:JAGad08981 ) Few error conditions were retried indefinitely causing process hang or PVLink switch not to occur. While in the case of error-intolerant upper layers (like the hfs filesystem) the error returns caused file system panics. These problems were fixed in 11.11 as part of Error Cleanup but still had to be fixed in 11.00. Resolution: Porting the error cleanup code from 11.11 to 11.00. Depending on where the I/O is issued from: 1. Device open/ioctl, 2. I/Os from an error-intolerant upper layer or 3. I/Os from LVM-like upper layers, various error conditions are now handled appropriately. PHKL_27003: ( SR:8606230706 CR:JAGad99756 ) After an LVM I/O times out, the flag L_FAIL_QUEUE_IO can remain set and prevent LVM probes from being sent to the device to see if it has returned on-line. Also, many SCSI read error messages will be seen in syslog. 
Resolution: Only set the flag (L_FAIL_QUEUE_IO) if there are I/O requests queued to be sent to the device. ( SR:8606232873 CR:JAGae02101 ) PHKL_22460 caused the c720 driver to only map the sense buffer during bus open and re-use the physical address for each I/O until the bus is closed. However, while re-using the request sense buffers between I/O requests, the driver was not invalidating the buffer. This resulted in I/O error due to stale data access from the cache during multiple backups to tape. Resolution: Modify the request sense buffer handling code in c720 driver as follows: 1. Allocate and map one request sense buffer in initialization function and re-use it during the life of the card. 2. Invalidate the buffer after every completion status receipt from device. ( SR:8606244278 CR:JAGae10766 ) The driver-retry logic causes the disk driver to retry I/O requests forever on getting a 'busy' status from disk. Even the LVM I/Os are retried forever, thereby giving an impression that the process/system has hung. Resolution: Correct the retry logic of the disk driver so that the LVM I/Os are retried only for the duration of the timeout set. ( SR:8606249862 CR:JAGae16248 ) Upon detecting a timed out I/O request, the driver sets a flag in the LUN data structure indicating, "do not retry any requests for this LUN". After a successful completion of a subsequent I/O request, this flag should be cleared. However, when the subsequent I/O request completes successfully, the driver's normal completion path (in which this flag is cleared) does not get executed and hence the flag remains set. So, if any subsequent I/O requests do not complete successfully, they are failed immediately without performing the retry. Resolution: The fix is to make sure that the driver follows the normal completion path for the first successful completion of an I/O request following a failed I/O request.
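The "do not retry" flag behavior in the last defect above (JAGae16248) can be modeled in a few lines. This is a sketch under stated assumptions: the names, the three status strings, and the fixed switch are hypothetical and exist only to contrast the unpatched and patched completion paths.

```python
class Lun:
    """Toy LUN state holding the 'do not retry' flag."""

    def __init__(self):
        self.no_retry = False


def complete(lun, status, fixed=True):
    """Decide what happens to one completed request.

    status is "ok", "timeout", or "error" (hypothetical labels).
    Returns "done", "retry", or "fail".
    """
    if status == "timeout":
        lun.no_retry = True       # timed-out I/O: stop retrying on this LUN
        return "fail"
    if status == "ok":
        if fixed:
            lun.no_retry = False  # normal completion path clears the flag
        return "done"
    # Any other error: retry unless the flag forbids it.
    return "fail" if lun.no_retry else "retry"
```

With fixed=False the flag stays set after a later success, so the next failing request is failed immediately rather than retried, which is the symptom the patch removes.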
PHKL_26452: ( SR:8606185203 CR:JAGad54405 ) When a SCSI pass-through read fails with a check condition on a deferred error, the scsi function used to process the completion of the I/O is incorrectly called twice. Since the scsi function is called twice, the number of bytes to be copied from the kernel space buffer to the user space buffer in the kernel is incorrectly being incremented. This resulted in writing past the end of the user space buffer causing the system to panic with a Data Page Fault. Resolution: The fix is not to call the scsi function twice when there is a deferred error. ( SR:8606147432 CR:JAGad16775 ) A kernel internal data structure for the scsi device was freed when there were outstanding I/O requests. Resolution: The fix is to check if there are any outstanding I/O requests, and only when there are none remaining, to deallocate the data structure. ( SR:8606225743 CR:JAGad94816 ) Instead of returning an error up to the user level, the disc3 driver retries the I/O over and over again causing all the I/O message frames to be used up. This eventually causes the system to panic. Resolution: The fix is to return EINVAL to the user level, if the I/O request has an odd aligned buffer, or exceeds 64K-1 bytes. ( SR:8606216118 CR:JAGad85288 ) The system panics because of a race condition between the scsi bus open and the interrupt being serviced. The interrupt was getting serviced even before the internal data structures in the bus open routine were completely initialized. Resolution: The fix is to set a flag after initializing the data structures in the scsi bus open routine. In the ISR routine, a check is made to verify if this flag is set. The interrupt is serviced only if this flag is set. The flag is unset in the scsi bus close routine. ( SR:8606223745 CR:JAGad92841 ) The SCSI bus on the C3700 was not being correctly identified as narrow. The bus was being set up incorrectly as wide.
Resolution: The SCSI bus identification routine was updated to correctly identify the SCSI bus used on the C3700 as narrow. PHKL_25938: ( SR:8606186960 CR:JAGad56170 ) The SCSI driver incorrectly displays this message when the device returns exactly the amount of sense data asked for. Resolution: This incorrect message log is removed in the SCSI driver. ( SR:8606204859 CR:JAGad74037 ) The SCSI driver does not distinguish between speed/width negotiations initiated by the target or the driver. The mismatch in the speed setting on the host and the target results in Parity Error on the bus. Resolution: The SCSI driver now tracks whether the response from the target is a response to host initiated negotiation or an unsolicited request from the target. ( SR:8606193416 CR:JAGad62628 ) The message scb->cdb: 12 00 00 00 80 00 is always logged. The additional information to log with it is issued at specific levels of SCSI interface driver logging. Resolution: The scb->cdb: 12 00 00 00 80 00 message is now logged at the same level of logging as the additional information. PHKL_25675: ( SR:8606137271 CR:JAGad06389 ) Under heavy I/O load on the same bus, when some per bus resource (tag, nexus) becomes unavailable, I/Os are stored in specific queues, waiting for the resource to become available. Under some conditions, the queues are not checked once the resource is once again available, leaving the I/O requests unserviced. The corresponding processes remain in an unkillable state, waiting for I/O completion or failure that never occurs. Resolution: Additional tests were added to check if I/Os are pending in the queues, and to process them if the resources are now available. ( SR:8606207857 CR:JAGad77034 ) The SCSI services did not support the SIOC_GET_TGT_LIMITS and SIOC_GET_TGT_PARMS ioctl for the c8xx driver. Resolution: SCSI services is enhanced to support these ioctls for the c8xx driver.
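The PHKL_25675 fix for stranded I/Os (JAGad06389) amounts to re-checking the wait queue whenever a per-bus resource is released. The following sketch models that idea with a simple tag counter. The class and method names are hypothetical, and the real driver manages several resource types (tags, nexus) that are collapsed into one here.

```python
from collections import deque


class BusResources:
    """Toy model of a per-bus tag pool with a wait queue."""

    def __init__(self, tags):
        self.free_tags = tags
        self.waiting = deque()   # I/Os parked until a tag is free
        self.started = []        # I/Os handed to the hardware, in order

    def submit(self, io):
        if self.free_tags > 0:
            self.free_tags -= 1
            self.started.append(io)
        else:
            self.waiting.append(io)

    def complete(self, io):
        """An I/O finished: release its tag and drain the wait queue."""
        self.free_tags += 1
        # The added check: start any parked I/Os now that a tag is free.
        while self.free_tags > 0 and self.waiting:
            self.free_tags -= 1
            self.started.append(self.waiting.popleft())
```

If complete() only released the tag and returned, a request parked while the pool was empty would wait forever unless a new submit happened to arrive, which matches the unkillable-process symptom in the defect description.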
( SR:8606168360 CR:JAGad37642 ) If a SCSI I/O is initiated using the sctl/ioctl passthrough function and the transfer size is greater than the size of the malloc'd buffer for this transfer, the system panics. Resolution: Check the access permissions of the buffer supplied by the user before using it. This ensures the system won't panic if the size of the I/O is greater than the size of the buffer. PHKL_24004: ( SR: 8606179935 Chart: JAGad49157 ) The command issued to close the LUN sleeps forever because a counter for pending I/Os was not decremented when returning the I/Os issued to a timed-out LUN. Resolution: The counter for tracking the pending I/Os is now decremented when returning errors to I/Os issued to a timed-out LUN. ( SR: 8606158737 Chart: JAGad28067 ) This informative message was always logged while trying to access a partially opened device (that is, a device with zero capacity). Resolution: This message is not logged by default for partially opened devices. This message log can be enabled for debugging purposes by setting an appropriate value in scsi_log_mask. ( SR:8606189487 Chart: JAGad58701 ) New files introduced to 11.00 by a patch interfere with an update to 11.11. The defect is seen on 11.00 systems with PHKL_22460 or superseding patches. These patches introduced two new files: /usr/conf/master.d/scsi-disk and /usr/conf/space.h.d/scsi-disk.h, needed to define a new tunable (scsi_max_qdepth). 11.11 includes this tunable too, but in different files. When recompiling the kernel, the compiler sees the scsi_max_qdepth symbol defined in two files. It uses the definition found in the first file. The compilation fails because the tunable definition is different (default value does not correspond to the same constant). Resolution: A new script, scsi.clean, is created by this patch. The file is installed in the /usr/lbin/sw/pre_kernel directory. On updating to 11.11, this script is run and removes scsi-disk and scsi-disk.h.
The kernel build will then be successful and the update process will complete. Even though files are removed, the tunable settings are kept across the update. ( SR: 8606199984 Chart: JAGad69170 ) A data page fault panic can occur if the scsi driver tries to access stale sense data unconditionally, for a SCSI error that had no associated sense data. Resolution: The scsi driver will now access sense data only when it receives CHECK CONDITION error. Sense data will always be valid during this error condition. PHKL_23281: ( SR: 8606173791 CR: JAGad43048 ) Because the logging function was called by the passthrough driver, a pointer was not set. The logging function uses this pointer to reference some elements, causing the system to panic. Resolution: The function checks if the calling driver is a passthrough and if the pointer is set to NULL. If so, no specific I/O logs are generated. ( SR: 8606161696 CR: JAGad31012 ) The SCSI bus reset management is handled poorly by the system. On every SCSI bus reset, a new timer is generated for later processing. This leads to a timer table overflow which causes the system to panic. Resolution: The SCSI bus reset management was improved by the system checking if a reset timer is set for the corresponding bus each time the SCSI bus is reset. If a reset timer already exists for that bus, the previous timer is removed and another timer is set. Otherwise, a new timer is set for the corresponding bus. This ensures that only one reset timer can be set per reset on a specific SCSI bus and prevents the possibility of timer table overflow resulting from a defective SCSI card. ( SR: 8606176639 CR: JAGad45877 ) When a powerfail condition occurs, the LVM requests are queued to the Powerfail queue and the lvmkd daemon will attempt to test the links by sending a SCSI_INQUIRY request to each link. If a device returns SCTL_INCOMPLETE, the scsi code will retry the SCSI_INQUIRY every 2 seconds until the inquiry is successful.
If the inquiry is never successful, the scsi driver will never return to the lvmkd, and the powerfail recovery will hang rather than timing out and switching to the alternate link. Resolution: On a SCSI_INQUIRY done through the sdisk_ioctl path where the device returns SCTL_INCOMPLETE, the inquiry request will be retried every 2 seconds, but now there is a maximum of 5 retries. PHKL_22941: ( SR: 8606112261 CR: JAGab84575 ) There was no mechanism available to set the scsi queue depth tunable on a particular device. Resolution: A new ioctl interface is provided for setting the scsi queue depth tunable for a particular device. However, this change takes effect on subsequent device opens. scsictl command is used to set the scsi queue depth tunable on a device. It is defaulted to the value set by global tunable "scsi_queue_depth" unless overridden by the scsictl command. ( SR: 8606135046 CR: JAGad04180 ) The SCSI subsystem logs errors for each I/O attempt in syslog.log. Bus resets being more common in Fibre Channel systems, the errors for each retry attempt could result in excessive error logs. Resolution: Recoverable errors are not logged except for Unit attention or Deferred errors. However, the errors that are persistent even after the maximum number of retries, are logged once. ( SR: 8606158437 CR: JAGad27767 ) This was caused by the driver indefinitely retrying a failing mode sense command on a LUN with capacity zero. Resolution: The mode sense command is not retried indefinitely and the scsictl command now fails with an I/O error after maximum retry attempts on such LUNs. ( SR: 8606166729 CR: JAGad36016 ) The LVM requests were retried indefinitely in the SCSI subsystem when the device returns "unit attention" error. This prevents LVM from recognizing this as an error and switching to an alternate path. 
Resolution: The LVM requests are not retried indefinitely for such errors, instead an error is returned after the maximum retries, allowing LVM to switch-over to an alternate path. ( SR: 8606167125 CR: JAGad36411 ) LVM requests were retried indefinitely when an invalid driver-internal status was returned. Resolution: The LVM requests are not retried indefinitely when an invalid driver-internal status is returned, instead an error is returned to LVM after retrying in the SCSI layer for the maximum number of attempts. ( SR: 8606169435 CR: JAGad38710 ) I/Os which timeout were not always being returned with an error but being retried indefinitely. This resulted in requests that had timed out, getting stuck in the disk driver's queue, resulting in a hang. Resolution: The requests are tracked and those which timeout are returned to the upper layer, thus allowing it to switch to an alternate path if one is configured. PHKL_22759: ( SR: 8606169631 CR: JAGad38905 ) The bad fix introduced into PHKL_22460 was to check the write access on the buffer passed to the sctl driver. Commands are allowed to pass NULL pointers for the data through the pass thru driver when they don't need any return data. The I/O generated by those commands won't pass the test and fail. Resolution: The fix is to totally remove the check introduced in PHKL_22460 (for JAGad04900) so that all its other fixes can be available. The specific problem incorrectly fixed previously will be addressed in a future patch. PHKL_22460: ( SR: 8606158623 CR: JAGad27953 ) The system panicked because the SCSI driver tried to dereference a field in a structure after the structure had been freed. Resolution: The dd_lun structure is freed in two places in the code, one in scsi_lun_open(), the other in scsi_lun_close(). However, the scb_q_nonempty field is only NULLed out in scsi_lun_open(), not scsi_lun_close(). The fix is to NULL it out in both places.
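Several PHKL_22941 fixes above (JAGad36016, JAGad36411, JAGad27767) replace an indefinite retry loop with a bounded one, so that an error eventually reaches LVM and a path switch can occur. A minimal sketch of that pattern follows. The function name and the (ok, attempts) return shape are hypothetical, and send stands in for whatever issues the request.

```python
def issue_with_retries(send, max_retries):
    """Call send() until it succeeds or the retry budget is used up.

    send is a callable returning True on success and False on a
    retryable failure. Returns (ok, attempts), where attempts counts
    the initial try plus any retries.
    """
    attempts = 0
    while True:
        attempts += 1
        if send():
            return True, attempts
        if attempts > max_retries:
            # Give up and report failure so the caller (for example
            # LVM) can switch to an alternate path.
            return False, attempts
```

The unpatched behavior corresponds to a loop with no exit on failure, which is why a persistent error appeared to the user as a hang rather than as an I/O error.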
( SR: 8606157951 CR: JAGad27281 ) During regular use of a system, a request can be aborted. When the abort is issued, if the controller returns an error, the expected behavior is to resend the original request. The active request needs to be initialized before processing the I/O. In the current design, the request is left untouched and therefore is not retried. Resolution: The active request that returned with an error is assigned to NULL and is placed back on the request queue so that it can be correctly resent. ( SR: 8606155189 CR: JAGad24506 ) The system tries to execute a callback function that has not been initialized. Resolution: The callback function was set in the c720_isrRST routine (initializing lisc->cbfns). A call to C720_START then unlocks the bus, which leaves a window for another processor waiting in an open routine to acquire it. The problem is that lisc->cbfns is non-null. The fix for this is to remove the UNLOCK,LOCK from C720_START which, after analysis, was found to be unnecessary. ( SR: 8606155173 CR: JAGad24490 ) The kernel did improper clean up in an abort condition. A SCSI script MOVE instruction is patched in the cmd_setup procedure in case of an abort to an INT instruction. The instruction is not restored to MOVE in the cleanup of Abort. Resolution: The MOVE instruction is restored in the c720_cleanup_ABORT procedure. This ensures that c720_start finds the command bytes. ( SR: 8606155151 CR: JAGad24468 ) A buffer structure can get lost during heavy use of the SCSI driver. Scsi_sp_start() does a dequeue of a buffer from lp->special_scb_q queue and then tries to allocate a SCSI control block (SCB). If the SCB allocation fails, the routine simply returns and the buffer is not enqueued back. This causes scsi_iowait() to wait forever, and the process will hang. This also causes the buffer to be lost, resulting in a memory leak.
Resolution: The fix is to enqueue the buffer back into the head of the queue if the SCB memory allocation fails in scsi_sp_start(). ( SR: 8606133057 CR: JAGad02204 ) The sctl device driver's original design was not MP safe. If an application makes heavy use of the driver, the code is forced to execute on the same CPU. All processes using the driver will accumulate on the same processor. Resolution: The necessary code to make the driver MP safe has been added into the driver. ( SR: 8606135767 CR: JAGad04900 ) A user application can pass an unwritable buffer (or portion of it) to the sctl driver and it isn't verified before use. We have to check the write access to this buffer before trying to use it. Resolution: We validate the entire buffer prior to using it. ( SR: 8606155155 CR: JAGad24472 ) Memory was not available for error handling. Resolution: The system panicked because the kernel failed to preallocate memory for its sense data. Due to this failure, a piece of code which sets the owner of the bus is skipped. The fix is to do a dma setup during the opening of the bus. The sense buffer is preallocated to ensure there is memory available for the autosense. ( SR: 8606155022 CR: JAGad24339 ) Once a QUEUE FULL condition has been hit by a device, the device is switched to untagged state. The device will then process only one I/O at a time and stop queuing the I/Os. The problem is that this condition is never reset even if the device is back to a state where it can process queued I/Os again. Resolution: The problem was that when we handle a QUEUE FULL condition we turn off tagged queuing for that LUN, and never turn it back on. A new algorithm was elaborated to throttle the incoming bp rate by dynamically changing the queue depth value. The fix actually completes that algorithm which only partially fixed the problem (in PHKL_21607). ( SR: 8606125977 CR: JAGac46733 ) Some c720 driver information can be lost while trying to recover from a non-responding device.
Resolution: The c720_chip_hang() call may reset the value of lbp->owner to NULL in some cases. If the lbp->owner is used after the call it causes a panic. To avoid this panic, we cache this information before calling the function. ( SR: 8606138825 CR: JAGad08088 ) A pointer is trusted and dereferenced in these functions. We found that the pointer can be NULL for a variety of corner-case reasons in the operation of the driver, and thus checking for NULL should have been done and was not. Resolution: We now check the value of the pointer before dereferencing it. If NULL, we dump the contents of the SCSI I/O card registers to the syslog file and continue processing. ( SR: 8606155947 CR: JAGad25258 ) Twice in the code, we are missing wakeup(). Resolution: In two places in the code, the lp->in_use field was decremented but no corresponding wakeup() was called. This defect is fixed by waking up the sleeping thread at the two appropriate places. ( SR: 8606160406 CR: JAGad29728 ) The speed is defaulted improperly to Fast 20 on the LVD SCSI boards where it should be Ultra2. Resolution: When the PDC settings are uninitialized for LVD SCSI boards, the scsi speed default for any board is Fast 20. LVD boards were not handled in the code that decides the speed settings when the PDC settings are uninitialized. The switch statement deciding on the min_period has been changed to handle LVD boards (895 & 896). ( SR: 8606105472 CR: JAGab73559 ) The current architecture allows the queue depth to be set by ioctl(). The value is not retained across reboots. Resolution: The driver code was enhanced to provide this feature. The tag queue depth will now be controlled by a tunable value (scsi_max_qdepth) which will not be lost after reboots. ( SR: 8606160479 CR: JAGad29800 ) The data page fault occurred when a stale pointer was accessed while trying to resend the timed-out requests.
Resolution:
To solve this defect, the timed-out I/O requests in the internal SCSI queues are protected with proper locking.

PHKL_21989:
( SR: 8606142756 CR: JAGad12108 )
The built-in narrow single-ended SCSI bus on workstation models J5600 and C3600 is incorrectly set up as a wide bus.

Resolution:
The model numbers J5600 and C3600 were added to conditionals in the c720_init() and c720_pci_attach() routines.

PHKL_21607:
( SR: 8606132292 CR: JAGad01441 )
Specifically, the problem is that when a QUEUE FULL condition occurs, tagged queuing is turned off for that LUN, and is never turned back on when the queue empties. For a complete solution a dynamic adjustment algorithm is needed.

Resolution:
After a QUEUE FULL, we wait for any outstanding I/Os to complete before turning tagged queuing back on, and then we gradually increase the queue depth back up to the previous max queue depth in such a way as to minimize the likelihood of another immediate QUEUE FULL condition.

( SR: 8606130227 CR: JAGac95098 )
The "incomplete" field is decremented in the debug path in SCSI services, but not in the non-debug path. So a non-debug kernel has incorrect information in this field.

Resolution:
The field is now updated correctly, the same way it is done in the debug path.

( SR: 8606132288 CR: JAGad01437 )
This is a known limitation of ioscan and the HSC bus. This limitation was by design to reduce the scan times for HSC devices.

Resolution:
A redesign of scsi_probe() has been completed which minimizes the time needed to probe and display all the existing devices.

( SR: 8606132426 CR: JAGad01575 )
The scsi_dmesg_log_io() routine only gives the dev in hexadecimal format.

Resolution:
The GIO services are now used to translate the dev into its hardware path.

( SR: 8606106155 CR: JAGab75050 )
If we try to print from the ISR (which we may), get_printf_lock() can wait up to 10 ms per call if it cannot acquire the lock. We easily do enough printing to exceed the allowed time to hold a spinlock.
Resolution:
Several levels of messages have been introduced to greatly reduce the number of printf() calls on a normal kernel and on a debug kernel.

( SR: 8606133146 CR: JAGad02293 )
An unlock at the end of c720_start() allowed c720_timer() to modify scripts memory. The assertion checked that the scripts were not modified, and hence failed.

Resolution:
Removed the unlock/lock pair at the end of c720_start().

( SR: 8606133067 CR: JAGad02214 )
When the message is issued (typically caused by a bus RESET during a contingent allegiance condition (CAC)), the corresponding I/O request is lost and never returned to the requestor, eventually causing a system hang.

Resolution:
When a bus RESET happens during a CAC, the c720 driver now ensures that all currently active I/O requests are posted as incomplete and scheduled to be retried.

( SR: 8606133280 CR: JAGad02425 )
The c720_timer() routine uses C720_LOCK(), which uses an assert to check the state of lisc->cbfns. Since c720_timer() is called asynchronously to the rest of the code, lisc->cbfns could be in any state and therefore should not be checked.

Resolution:
In c720_timer(), changed C720_LOCK(lisc) to spinlock(lisc->id_lock) to avoid the assert.

( SR: 8606125811 CR: JAGac42754 )
The scsi_is_multilun() routine is hardcoded with a list of tape devices, and that list does not include this particular device.

Resolution:
Take DDS4 autochangers into account in scsi_is_multilun().

PHKL_21504:
( SR: 8606125610 CR: JAGac41000 )
The in-use count for a SCSI LUN was not being incremented before the lock on the LUN was released and the I/O request completed. Since the in-use count was zero, a succeeding request to close the LUN deallocated the memory for the LUN lock. The driver then attempted to reacquire the LUN lock, which had since been deallocated, causing a data memory protection panic.
Resolution:
The SCSI LUN in-use count is now incremented before the LUN is unlocked and biodone() is called, and decremented after biodone() returns and the LUN is locked again.

( SR: 8606130829 CR: JAGac97596 )
The calculation for a SCSI hardware memory pointer incorrectly computed the number of bytes for the size of SCSI SCRIPTS RAM.

Resolution:
The calculation was modified to use sizeof() to return the number of bytes allocated to SCSI SCRIPTS RAM.

PHKL_20688:
( SR: 8606127757 CR: JAGac78558 )
The SCSI driver detects a hardware failure and resets the bus. However, the reset operation cannot resolve the bus hang and the reset interrupt never occurs. Without a bus reset timeout, the processes hang waiting for I/Os queued for the bus. LVM expects either an I/O error or an EPOWERF (timeout) to continue.

Resolution:
Added code to time out on an unsuccessful bus reset and abort the I/O with a return of EPOWERF.

PHKL_20629:
( SR: 8606112882 CR: JAGab93301 )
While resolving defect JAGab78589 (in SCSI pass-through driver modules spt and spt0), patch PHKL_20452 introduced a defect which resulted in memory being initialized at the wrong address.

Resolution:
Corrected initialization of memory buffer addresses.

( SR: 8606110931 CR: JAGab83681 )
The SCSI message logging function was calling a privileged copy routine with the KERNELSPACE id and a stack buffer. The kernel stack was moved out of KERNELSPACE on 64-bit systems, which caused a Data Page Fault panic.

Resolution:
The KERNELSPACE id was replaced with the ldsid() function, which uses the correct space id for the buffer.

PHKL_20452:
( SR: 5003432120 CR: JAGaa22888 )
NOT READY devices are continually retried, extending beyond the pftimeout period, until the device becomes READY.

Resolution:
I/O retries to NOT READY devices now go through a new routine (sd_retry_check) which limits the retries to the pftimeout period if B_PFTIMEOUT is set.
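The retry limit described for JAGaa22888 can be pictured with a small user-space model. This is only a sketch of the idea, not the driver's sd_retry_check() code: the names retry_allowed() and simulate(), their parameters, and the one-second retry interval are all hypothetical, and the behavior without the flag (retry indefinitely) is an assumption taken from the defect description.

```python
def retry_allowed(first_failure_time, now, pftimeout, pftimeout_flag_set):
    """Decide whether a NOT READY request may be retried again.

    Hypothetical model: if the request carries the power-fail timeout
    flag (B_PFTIMEOUT in the patch text), retries stop once pftimeout
    seconds have elapsed since the first failure. Without the flag the
    request keeps retrying (assumed pre-fix behavior).
    """
    if not pftimeout_flag_set:
        return True
    return (now - first_failure_time) < pftimeout


def simulate(ready_at, pftimeout, pftimeout_flag_set,
             retry_interval=1, max_time=1000):
    """Retry every retry_interval seconds until the device is ready.

    Returns a (result, time) tuple where result is "ok", "timeout",
    or "gave_up" (simulation limit reached).
    """
    t = 0
    while t < max_time:
        if t >= ready_at:
            return ("ok", t)
        if not retry_allowed(0, t, pftimeout, pftimeout_flag_set):
            return ("timeout", t)
        t += retry_interval
    return ("gave_up", t)
```

In this model a device that becomes ready after 5 seconds completes normally, while a device that stays NOT READY past a 30-second pftimeout has its request failed at the 30-second mark, which is what lets an upper layer such as LVM fail over.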
( SR: 8606103129 CR: JAGaa44450 )
scsi_fast_read and scsi_fast_write used the driver switch table in a non-standard way, causing unexpected side effects on pass-through functionality.

Resolution:
Removed scsi_fast_read/write because inserting them into the driver switch table at open time causes pass-through command mode to stop working.

( SR: 8606103814 CR: JAGab19070 )
No check for a NULL lsp pointer in c720_isrAbort() before dereference.

Resolution:
Check whether the lsp pointer is set in c720_isrAbort() before using it.

( SR: 8606103810 CR: JAGab19072 )
In the open path the SIOP driver loops until all interrupts are cleared, but there is no check to prevent looping forever when either bad data or -1 is returned.

Resolution:
Added a check to recover if a return of -1 is received.

( SR: 1653307298 CR: JAGab20815 )
Driver resets the SCSI bus on Parity Error, instead of aborting and retrying.

Resolution:
Changed the SIOP driver to abort the I/O request on Parity Error.

( SR: 8606100396 CR: JAGab31749 )
Wrong test condition in the SCSI driver routine that logs SIOP register access.

Resolution:
Changed the test condition to only verify that the SIOP is not running, or that we are only accessing the ISTAT register.

( SR: 8606103820 CR: JAGab39677 )
Master Parity checking was only enabled for SIOPs 53C720 and 53C770, although it needs to be set for all SIOPs.

Resolution:
Removed the check for specific SIOPs; Master Parity checking is now enabled for all SIOPs (including PCI-attached).

( SR: 8606103148 CR: JAGab69517 )
Pointer not initialized to NULL in LUN open routine.

Resolution:
Initialized the pointer to NULL.

( SR: 8606103192 CR: JAGab69594 )
Infinite SCSI IO retry due to a variable not getting set.

Resolution:
Set the variable at the end of the strategy routine.

( SR: 8606105969 CR: JAGab74731 )
SIOP not started correctly after Bus Device Reset.

Resolution:
Call the SIOP's startup routine following a SCSI Bus Device Reset.
( SR: 8606106038 CR: JAGab74836 )
SCSI Ultra-II speeds not negotiated for PCI-attached Ultra-II adapters with Ultra-II devices.

Resolution:
Added SCSI Ultra-II speed negotiations for Ultra-II adapters and devices.

( SR: 8606113541 CR: JAGab76136 )
Improper handling of WDTR and SDTR in some cases for PCI-attached SIOPs.

Resolution:
Expanded script interrupt support and associated scripts support for handling SDTR and WDTR, and other conditions that weren't handled completely correctly.

( SR: 8606108198 CR: JAGab78589 )
Newly allocated memory for the buf structure not initialized.

Resolution:
Fully initialized the buf structure after allocation.

( SR: 8606110476 CR: JAGab83179 )
The EIM mask of ~0x1f did not mask the lower 6 bits of the register for the 64-bit architecture, whereas the 32-bit architecture always has the 6th bit set to zero.

Resolution:
Changed the EIM mask to ~0x3f to mask all 6 bits for the 64-bit architecture.

( SR: 8606110477 CR: JAGab83180 )
Some machines return more data from the SIOPs for SCSI Width, Speed, and Mode than was allowed by the original design of the SCSI driver.

Resolution:
Changed the SCSI driver to provide buffer area for all the Width, Speed, and Mode data returned from all supported SIOPs.

( SR: 8606110479 CR: JAGab83182 )
Early revision 896 SCSI controllers send wrong data on the SCSI bus under certain circumstances.

Resolution:
During initialization of the 896, disabled the DPR bit that permits this corner-case data corruption.

( SR: 8606110481 CR: JAGab83184 )
The SCSI driver did not verify that a pointer was set before using it in a printf call.

Resolution:
Checked for a NULL pointer, and used the value zero in the printf call if the pointer is not populated.

( SR: 8606110616 CR: JAGab83364 )
Pointer variable set incorrectly.

Resolution:
Correctly set the pointer variable.

( SR: 8606110653 CR: JAGab83401 )
Wrong value sent onto the SCSI bus for the SCSI Abort Message for PCI-attached SIOPs.
Resolution:
Changed the value used for the SCSI Abort Message to be endian-neutral in SIOP script initialization.

( SR: 8606110782 CR: JAGab83531 )
The SCSI LUN pointer was populated before it was known to be needed.

Resolution:
Changed to only populate the SCSI LUN pointer when required.

PHKL_20208:
( SR: 1653281824 DTS: JAGaa42584 )
When a device's write cache is enabled, the device may signal successful completion of a write command upon receiving (and caching) the data, but before the data has been written to the media. This is referred to as "immediate reporting". If an error (e.g. bad media) occurs during the actual execution of the write to media, the data can be lost. The device reports this error back to the driver as an "unrecovered deferred error". The driver panics.

Resolution:
The driver was modified to handle the "unrecovered deferred error" by blocking all IO requests for the disk when a deferred error occurs, until the device is closed and reopened. It no longer panics the system.

PHKL_19776:
( SR: 8606103698 CR: JAGab70738 )
( SR: 8606113358 CR: JAGab70313 )
When a SCSI disk was inaccessible, the code would keep retrying the failed IO continually.

To reproduce the long LVM VG failover times, do:
1. Create an LV with a 4-way PV on a dual-ported AutoRAID, with two HW paths defined for the VG access (see vgdisplay).
2. With continual IO via the primary path, pull the SCSI cable off the primary path.
3. If it takes inordinately longer than the time set via pvchange -t XX to fail over to the alternate path, then you have duplicated the problem. If it takes about (2*XX)+15 seconds, then you have fixed the problem.

Note that XX is the number of seconds for pftimeout, set by pvchange -t XX. Also note that the value of XX should be 30 seconds or longer.

Resolution:
Stopped retries on inaccessible devices; retries are allowed when the device is once again accessible.

( SR: 1653310672 CR: JAGab31999 )
Cached SCSI INQUIRY data was returned instead of going directly to the disk for this information.
To reproduce the cached INQUIRY data problem, do:
1. Boot up with a disk that can be removed from service.
2. Use ioscan to identify the disk.
3. Perform continual IO on the disk (read, writing to /dev/null).
4. Power off the disk.
5. Perform diskinfo -v on the disk devfile. If it shows full disk information, you have duplicated the problem. If it shows no such device or file, you have fixed it.

Resolution:
Always perform the SCSI INQUIRY directly to the device.
Note: this fix was required for the Retries fix to know when the disk was actually not accessible, and to give positive recognition of it becoming accessible, for CR JAGab70738 and CR JAGab70313.

PHKL_19245:
( SR: 8606103582 DTS: JAGaa09970 )
WSIO did not support the kernel SCSI Pass thru ioctl(), and SIO did not support it for kernel or user space.

Resolution:
Enhanced the SCSI Pass thru ioctl(), SIOC_IO, on the WSIO side to support calls from kernel space too. Added SCSI Pass thru ioctl() functionality on the SIO side; it supports calls from both user and kernel space.

PHKL_19561:
( SR: 4701424978 DTS: JAGab13476 )
After installing the latest firmware in the HP-PB F/W SCSI Adapter, reading an odd-length record from a tape device that supports wide transfer mode causes an I/O error. The HP-PB FW SCSI Adapter expected a specific sequence of interactions by the driver when handling this event.

Resolution:
Changed the sequence of interactions to that expected by the card.

PHKL_17333:
( SR: 1653284257 DTS: JAGaa44107 )
NIO disks become unresponsive, causing commands like "ioscan" and "dd" to hang. This problem can only be seen on s800 systems, and happens when the "frozen" bit is set in the PDA for the disk without any means of "unfreezing" it.

Resolution:
A target is put into the frozen state when an abort command is sent to it. When the command completes, all I/O queues are checked for additional abort commands. The completion reply for the last abort command unfreezes the target.
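The freeze/unfreeze rule in the PHKL_17333 resolution can be modeled in a few lines. This is a simplified sketch under stated assumptions, not driver code: the Target class and its method names are hypothetical, and a counter of outstanding aborts stands in for the driver's scan of its I/O queues.

```python
class Target:
    """Toy model of a SCSI target that is frozen while aborts are pending."""

    def __init__(self):
        self.frozen = False
        self.pending_aborts = 0  # stands in for scanning the I/O queues

    def send_abort(self):
        # Sending an abort command puts the target into the frozen state.
        self.pending_aborts += 1
        self.frozen = True

    def abort_completed(self):
        # Only the completion of the last outstanding abort unfreezes.
        if self.pending_aborts == 0:
            raise RuntimeError("no abort command outstanding")
        self.pending_aborts -= 1
        if self.pending_aborts == 0:
            self.frozen = False


t = Target()
t.send_abort()
t.send_abort()
t.abort_completed()
still_frozen = t.frozen   # one abort is still outstanding
t.abort_completed()
finally_frozen = t.frozen  # last abort completed
```

The point of the model is that every path that sets the frozen state has a matching path that clears it, which is what the original defect lacked.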
PHKL_14807:
None; the previous patch was recut with enhanced optimization. No code was changed.

PHKL_13371:
( SR: 4701376111 DTS: JAGaa09879 )
When a check condition occurs on a probe to the CASCADE device (C2430D), a panic occurs because a code path is taken which leads to an uninitialized pointer being dereferenced. Note that the CASCADE may not be the only disk device that could cause this problem to occur. The panic is a data page fault.

PHKL_19287:
( SR: 8606101377 DTS: JAGab17408 )
LVM failover to the alternate path fails on logical volumes configured on SCSI3 devices because the driver ignores sense data from SCSI3 devices.

Resolution:
Modified scsi_sense_action() to process sense data from SCSI3 devices too.

PHKL_20157:
( SR: 8606107849 DTS: JAGab78147 )
The PCI ID for the 53C895A chip is not in the list of supported PCI IDs. Therefore the c720 driver does not claim this chip.

Resolution:
Added the 53C895A chip to the list of supported devices.

( SR: 8606107164 DTS: JAGab76873 )
The c720 driver makes an erroneous assumption that unless an interface is running at Ultra2 speeds, it is not in LVD mode. N Class systems run the built-in LVD SCSI at Fast speed, not Ultra2.

Resolution:
Made the driver look at the actual SCSI bus mode irrespective of bus speed and set the description accordingly.

( SR: 8606113567 DTS: JAGab76903 )
There was a logic problem in the checking for PCI HBA cards which caused the driver to think that it was on an Ultra card when it was actually on an Ultra-2 card.

Resolution:
Corrected the logic error so that the driver detects Ultra2 cards correctly.

( SR: 8606105969 DTS: JAGab74731 )
The c720 driver does not restart the SCSI chip properly when an unexpected disconnect occurs.

Resolution:
Removed the return statement that caused the routine handling the unexpected disconnect to return before restarting the chip.
( SR: 8606103151 DTS: JAGab69533 )
The c720 driver misinterpreted the data returned by the firmware for the width of the bus and erroneously configured the SCSI interface as narrow when the interface truly is wide. This caused communication failure with the drives, and the drives were not detected correctly.

Resolution:
Corrected the interpretation of the data returned by the firmware for the width of the bus.

PHKL_17368:
In the SCSI pass-through driver, we check to see if the device is scsi3 (so that we can set the "fast wide" flag). A "lun report page command" is sent to hot spares (and they are therefore visible to SAM) only if the "fast wide" flag is set. To reproduce the defect, configure hot spare(s) on a Disk Array (attached to a SCSI3 card) and run SAM.

Resolution:
A strcmp(drv_name,"scsi13") was being done; this should be strcmp(drv_name,"scsi3").

PHKL_14688:
The scsi_pt driver issued a SCSI_CTRL_REQ_MSG to the lower driver from spt_open() and exited, cleaning up the active request, without waiting for command completion.

Reproduction method:
Have a device, say a tape drive, on a bus along with other devices (disks). Bind the tape device to the scsi_pt driver. Run 'dd' on the other devices on the bus and repeatedly issue an inquiry command to the tape drive. The system panics almost immediately.

Enhancement:
No (superseded patches contained enhancements)

PHKL_28496:
Enhancements were delivered in a patch this one has superseded. Please review the Defect Description text for more information.
SR: 1653256065 1653281824 1653284257 1653307298 1653310672 4701376111 4701424978 5003432120 5003440982 8606100396 8606101377 8606103129 8606103148 8606103151 8606103192 8606103582 8606103698 8606103810 8606103814 8606103820 8606105472 8606105969 8606106038 8606106155 8606107164 8606107849 8606108198 8606110476 8606110477 8606110479 8606110481 8606110616 8606110653 8606110782 8606112261 8606112882 8606113358 8606113541 8606113567 8606125610 8606125811 8606125977 8606127757 8606130227 8606130829 8606132288 8606132292 8606132426 8606133057 8606133067 8606133146 8606133280 8606135046 8606135767 8606135832 8606137271 8606138825 8606139670 8606142756 8606147432 8606155022 8606155151 8606155155 8606155173 8606155189 8606155947 8606157951 8606158437 8606158623 8606158737 8606160406 8606160479 8606161696 8606165305 8606165403 8606166729 8606167125 8606167814 8606168360 8606169435 8606169631 8606173791 8606173887 8606176639 8606177456 8606178041 8606179935 8606185203 8606186960 8606189487 8606193416 8606199984 8606204859 8606207857 8606216118 8606223745 8606225743 8606226043 8606226361 8606230478 8606230706 8606232873 8606238711 8606241873 8606242143 8606244278 8606249862 8606264850 8606265990 8606266268 8606282310 8606286272 8606286789 8606289589 8606293572 8606295123 8606298657 8606299275 8606304019 8606304724 8606322906 8606335728 8606344298 8606349130 8606351535 Patch Files: OS-Core.ADMN-ENG-A-MAN,fr=B.11.00,fa=HP-UX_B.11.00_32/64, v=HP: /usr/share/man/man7.Z/scsi.7 OS-Core.CORE-KRN,fr=B.11.00,fa=HP-UX_B.11.00_32/64,v=HP: /usr/lbin/sw/pre_kernel/scsi.clean ProgSupport.C-INC,fr=B.11.00,fa=HP-UX_B.11.00_32/64,v=HP: /usr/include/sys/scsi_ctl.h OS-Core.CORE2-KRN,fr=B.11.00,fa=HP-UX_B.11.00_32,v=HP: /usr/conf/lib/libhp-ux.a(disc3.o) /usr/conf/lib/libhp-ux.a(disc30.o) /usr/conf/lib/libhp-ux.a(scsi3.o) /usr/conf/lib/libhp-ux.a(scsi_c720.o) /usr/conf/lib/libhp-ux.a(scsi_ctl.o) /usr/conf/lib/libhp-ux.a(scsi_disk.o) OS-Core.KERN2-RUN,fr=B.11.00,fa=HP-UX_B.11.00_32,v=HP: 
/usr/conf/master.d/scsi-disk /usr/conf/space.h.d/scsi-disk.h SCSI-Passthru.SPT2-DVR,fr=B.11.00,fa=HP-UX_B.11.00_32,v=HP: /usr/conf/lib/libspt.a(scsi_pt.o) /usr/conf/lib/libspt.a(scsi_pt0.o) OS-Core.CORE2-KRN,fr=B.11.00,fa=HP-UX_B.11.00_64,v=HP: /usr/conf/lib/libhp-ux.a(disc3.o) /usr/conf/lib/libhp-ux.a(disc30.o) /usr/conf/lib/libhp-ux.a(scsi3.o) /usr/conf/lib/libhp-ux.a(scsi_c720.o) /usr/conf/lib/libhp-ux.a(scsi_ctl.o) /usr/conf/lib/libhp-ux.a(scsi_disk.o) OS-Core.KERN2-RUN,fr=B.11.00,fa=HP-UX_B.11.00_64,v=HP: /usr/conf/master.d/scsi-disk /usr/conf/space.h.d/scsi-disk.h SCSI-Passthru.SPT2-DVR,fr=B.11.00,fa=HP-UX_B.11.00_64,v=HP: /usr/conf/lib/libspt.a(scsi_pt.o) /usr/conf/lib/libspt.a(scsi_pt0.o) what(1) Output: OS-Core.ADMN-ENG-A-MAN,fr=B.11.00,fa=HP-UX_B.11.00_32/64, v=HP: /usr/share/man/man7.Z/scsi.7: None OS-Core.CORE-KRN,fr=B.11.00,fa=HP-UX_B.11.00_32/64,v=HP: /usr/lbin/sw/pre_kernel/scsi.clean: scsi.clean $Date: 2001/04/24 14:55:09 $Revision: r11 ros/1 PATCH_11.00 (PHKL_24004) ProgSupport.C-INC,fr=B.11.00,fa=HP-UX_B.11.00_32/64,v=HP: /usr/include/sys/scsi_ctl.h: scsi_ctl.h $Date: 2003/02/18 20:46:27 $Revision: r11 ros/15 PATCH_11.00 (PHKL_28496) OS-Core.CORE2-KRN,fr=B.11.00,fa=HP-UX_B.11.00_32,v=HP: /usr/conf/lib/libhp-ux.a(disc3.o): disc3.c $Date: 2002/10/28 21:54:59 $Revision: r11ros /13 PATCH_11.00 (PHKL_28131) /usr/conf/lib/libhp-ux.a(disc30.o): disc30.c $Date: 2002/10/28 07:16:27 $Revision: r11ro s/8 PATCH_11.00 (PHKL_28131) /usr/conf/lib/libhp-ux.a(scsi3.o): scsi3.c $Date: 2002/02/25 14:16:43 $Revision: r11ros /10 PATCH_11.00 (PHKL_26452) /usr/conf/lib/libhp-ux.a(scsi_c720.o): scsi_c720.c $Date: 2004/03/31 00:19:32 $Revision: r1 1ros/32 PATCH_11.00 (PHKL_30508) /usr/conf/lib/libhp-ux.a(scsi_ctl.o): scsi_ctl.c $Date: 2004/03/31 00:19:32 $Revision: r11 ros/39 PATCH_11.00 (PHKL_30508) /usr/conf/lib/libhp-ux.a(scsi_disk.o): scsi_disk.c $Date: 2004/03/31 03:21:03 $Revision: r1 1ros/24 PATCH_11.00 (PHKL_30508) 
OS-Core.KERN2-RUN,fr=B.11.00,fa=HP-UX_B.11.00_32,v=HP: /usr/conf/master.d/scsi-disk: scsi-disk $Date: 2000/11/15 11:33:46 $Revision: r11r os/2 PATCH_11.00 (PHKL_22759) /usr/conf/space.h.d/scsi-disk.h: scsi-disk.h $Date: 2000/11/15 11:33:46 $Revision: r1 1ros/2 PATCH_11.00 (PHKL_22759) SCSI-Passthru.SPT2-DVR,fr=B.11.00,fa=HP-UX_B.11.00_32,v=HP: /usr/conf/lib/libspt.a(scsi_pt.o): scsi_pt.h $Date: 1998/04/04 14:46:29 $Revision: r11r os/1 PATCH_11.00 (PHKL_14688) scsi_pt.c $Date: 2002/10/28 18:52:38 $Revision: r11r os/7 PATCH_11.00 (PHKL_28131) /usr/conf/lib/libspt.a(scsi_pt0.o): scsi_pt0.c $Date: 2002/10/28 18:52:38 $Revision: r11 ros/7 PATCH_11.00 (PHKL_28131) OS-Core.CORE2-KRN,fr=B.11.00,fa=HP-UX_B.11.00_64,v=HP: /usr/conf/lib/libhp-ux.a(disc3.o): disc3.c $Date: 2002/10/28 21:54:59 $Revision: r11ros /13 PATCH_11.00 (PHKL_28131) /usr/conf/lib/libhp-ux.a(disc30.o): disc30.c $Date: 2002/10/28 07:16:27 $Revision: r11ro s/8 PATCH_11.00 (PHKL_28131) /usr/conf/lib/libhp-ux.a(scsi3.o): scsi3.c $Date: 2002/02/25 14:16:43 $Revision: r11ros /10 PATCH_11.00 (PHKL_26452) /usr/conf/lib/libhp-ux.a(scsi_c720.o): scsi_c720.c $Date: 2004/03/31 00:19:32 $Revision: r1 1ros/32 PATCH_11.00 (PHKL_30508) /usr/conf/lib/libhp-ux.a(scsi_ctl.o): scsi_ctl.c $Date: 2004/03/31 00:19:32 $Revision: r11 ros/39 PATCH_11.00 (PHKL_30508) /usr/conf/lib/libhp-ux.a(scsi_disk.o): scsi_disk.c $Date: 2004/03/31 03:21:03 $Revision: r1 1ros/24 PATCH_11.00 (PHKL_30508) OS-Core.KERN2-RUN,fr=B.11.00,fa=HP-UX_B.11.00_64,v=HP: /usr/conf/master.d/scsi-disk: scsi-disk $Date: 2000/11/15 11:33:46 $Revision: r11r os/2 PATCH_11.00 (PHKL_22759) /usr/conf/space.h.d/scsi-disk.h: scsi-disk.h $Date: 2000/11/15 11:33:46 $Revision: r1 1ros/2 PATCH_11.00 (PHKL_22759) SCSI-Passthru.SPT2-DVR,fr=B.11.00,fa=HP-UX_B.11.00_64,v=HP: /usr/conf/lib/libspt.a(scsi_pt.o): scsi_pt.h $Date: 1998/04/04 14:46:29 $Revision: r11r os/1 PATCH_11.00 (PHKL_14688) scsi_pt.c $Date: 2002/10/28 18:52:38 $Revision: r11r os/7 PATCH_11.00 (PHKL_28131) 
    /usr/conf/lib/libspt.a(scsi_pt0.o):
        scsi_pt0.c $Date: 2002/10/28 18:52:38 $Revision: r11ros/7 PATCH_11.00 (PHKL_28131)

cksum(1) Output:

OS-Core.ADMN-ENG-A-MAN,fr=B.11.00,fa=HP-UX_B.11.00_32/64,v=HP:
    2477585752 5697 /usr/share/man/man7.Z/scsi.7

OS-Core.CORE-KRN,fr=B.11.00,fa=HP-UX_B.11.00_32/64,v=HP:
    738809134 992 /usr/lbin/sw/pre_kernel/scsi.clean

ProgSupport.C-INC,fr=B.11.00,fa=HP-UX_B.11.00_32/64,v=HP:
    4170464417 59614 /usr/include/sys/scsi_ctl.h

OS-Core.CORE2-KRN,fr=B.11.00,fa=HP-UX_B.11.00_32,v=HP:
    3199384661 30228 /usr/conf/lib/libhp-ux.a(disc3.o)
    3531510560 45704 /usr/conf/lib/libhp-ux.a(disc30.o)
    453184929 87068 /usr/conf/lib/libhp-ux.a(scsi3.o)
    1180232596 132296 /usr/conf/lib/libhp-ux.a(scsi_c720.o)
    1107003528 95816 /usr/conf/lib/libhp-ux.a(scsi_ctl.o)
    2872667021 26384 /usr/conf/lib/libhp-ux.a(scsi_disk.o)

OS-Core.KERN2-RUN,fr=B.11.00,fa=HP-UX_B.11.00_32,v=HP:
    3614771302 491 /usr/conf/master.d/scsi-disk
    2284398589 1176 /usr/conf/space.h.d/scsi-disk.h

SCSI-Passthru.SPT2-DVR,fr=B.11.00,fa=HP-UX_B.11.00_32,v=HP:
    3174597030 20232 /usr/conf/lib/libspt.a(scsi_pt.o)
    1436035651 22784 /usr/conf/lib/libspt.a(scsi_pt0.o)

OS-Core.CORE2-KRN,fr=B.11.00,fa=HP-UX_B.11.00_64,v=HP:
    482509784 70400 /usr/conf/lib/libhp-ux.a(disc3.o)
    1450580338 103632 /usr/conf/lib/libhp-ux.a(disc30.o)
    421221428 154752 /usr/conf/lib/libhp-ux.a(scsi3.o)
    2348076294 275064 /usr/conf/lib/libhp-ux.a(scsi_c720.o)
    1187271752 228856 /usr/conf/lib/libhp-ux.a(scsi_ctl.o)
    1177371051 56664 /usr/conf/lib/libhp-ux.a(scsi_disk.o)

OS-Core.KERN2-RUN,fr=B.11.00,fa=HP-UX_B.11.00_64,v=HP:
    3614771302 491 /usr/conf/master.d/scsi-disk
    2284398589 1176 /usr/conf/space.h.d/scsi-disk.h

SCSI-Passthru.SPT2-DVR,fr=B.11.00,fa=HP-UX_B.11.00_64,v=HP:
    2817562799 50736 /usr/conf/lib/libspt.a(scsi_pt.o)
    1387208667 55064 /usr/conf/lib/libspt.a(scsi_pt0.o)

Patch Conflicts: None

Patch Dependencies:
    s700: 11.00: PHKL_18543 PHKL_24187 PHKL_30509
    s800: 11.00: PHKL_18543 PHKL_24187 PHKL_30509

Hardware Dependencies: None

Other Dependencies: None

Supersedes:
    PHKL_17368 PHKL_14688 PHKL_20157 PHKL_19776 PHKL_19561 PHKL_19287 PHKL_19245 PHKL_17333 PHKL_14807 PHKL_13371 PHKL_23281 PHKL_22941 PHKL_22759 PHKL_22460 PHKL_21989 PHKL_21607 PHKL_21504 PHKL_20688 PHKL_20629 PHKL_20452 PHKL_20208 PHKL_29834 PHKL_29364 PHKL_29049 PHKL_29041 PHKL_28496 PHKL_28131 PHKL_27003 PHKL_26452 PHKL_25938 PHKL_25675 PHKL_24004

Equivalent Patches:
    PHKL_30510: s700: 11.11 s800: 11.11

Patch Package Size: 770 KBytes

Installation Instructions:
Please review all instructions and the Hewlett-Packard SupportLine User Guide or your Hewlett-Packard support terms and conditions for precautions, scope of license, restrictions, and limitation of liability and warranties, before installing this patch.
------------------------------------------------------------
1. Back up your system before installing a patch.
2. Login as root.
3. Copy the patch to the /tmp directory.
4. Move to the /tmp directory and unshar the patch:

    cd /tmp
    sh PHKL_30508

5. Run swinstall to install the patch:

    swinstall -x autoreboot=true -x patch_match_target=true \
        -s /tmp/PHKL_30508.depot

By default swinstall will archive the original software in /var/adm/sw/save/PHKL_30508. If you do not wish to retain a copy of the original software, include the patch_save_files option in the swinstall command above:

    -x patch_save_files=false

WARNING: If patch_save_files is false when a patch is installed, the patch cannot be deinstalled. Please be careful when using this feature.

For future reference, the contents of the PHKL_30508.text file are available in the product readme:

    swlist -l product -a readme -d @ /tmp/PHKL_30508.depot

To put this patch on a magnetic tape and install from the tape drive, use the command:

    dd if=/tmp/PHKL_30508.depot of=/dev/rmt/0m bs=2k

Special Installation Instructions:
For patch PHKL_29834 and later:

JAGae62769:
The SCSI driver has been modified to log the recovered error events. To enable logging, set the 0x40 flag in scsi_log_mask.
This may be done as below.

    adb -w /stand/vmunix /dev/kmem
    scsi_log_mask/X      <===== Get the value of current log mask
    scsi_log_mask:
    scsi_log_mask:       1F238B10
    Add 0x40 to current mask
    scsi_log_mask/W 0x1F238B50
    scsi_log_mask:       1F238B10 = 1F238B50
    scsi_log_mask?W 0x1F238B50
    scsi_log_mask:       1F238B10 = 1F238B50
    to exit adb

JAGae85372:
For model C3750 workstations that have external single-ended wide SCSI disks connected to the narrow 50 pin SCSI connector, it is imperative that these disks be powered down and then powered up after this patch is installed. If these devices are not power-cycled, they will not be accessible and any I/O requests to them may hang.
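The value written to scsi_log_mask in the adb session above can be derived rather than worked out by hand. The following is a minimal sketch: the 0x40 flag value and the sample mask 0x1F238B10 come from the instructions above, while the constant and function names are hypothetical. It uses a bitwise OR instead of addition so that the result is unchanged if the flag is already set.

```python
# Flag value from the installation instructions; the name is hypothetical.
LOG_RECOVERED_ERRORS = 0x40


def enable_recovered_error_logging(current_mask):
    """Return the mask with the 0x40 flag set, leaving other bits alone."""
    return current_mask | LOG_RECOVERED_ERRORS


# Sample value read back by "scsi_log_mask/X" in the session above.
current = 0x1F238B10
new_mask = enable_recovered_error_logging(current)

# Print the two adb write commands for the computed mask.
print(f"scsi_log_mask/W 0x{new_mask:X}")
print(f"scsi_log_mask?W 0x{new_mask:X}")
```

For the sample mask this yields 0x1F238B50, matching the value used in the session. On a different system, substitute the value that scsi_log_mask/X actually reports.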