To analyse and resolve issues we need to view log files such as vmkernel.log and vmksummary.log, using a program such as tail to follow the end of the file. In this example we will examine vmkernel.log, which records storage-related messages and contains SCSI sense codes. SCSI sense codes are an industry standard maintained by Technical Committee T10, and the ESXi host conforms to this standard. Information on SCSI sense codes can be found at http://www.t10.org/lists/1spc-lst.htm.
The SCSI status code is sent during the status phase, which occurs prior to the Command Complete message, and indicates success or failure. Whenever a SCSI command is sent to a target, the initiator expects a completion status. The various status codes are displayed below:
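If tail is not to hand, for example when working on a copy of the log on a workstation, the same idea can be sketched in a few lines of Python. This is only an illustration: the function names `tail` and `storage_errors` are made up for this post, and the filter string is simply the phrase that appears in the log entry examined later on.

```python
from collections import deque


def tail(path, count=20):
    """Return the last `count` lines of a text file, similar to `tail -n`."""
    # errors="replace" stops the sketch failing on stray undecodable bytes.
    with open(path, "r", errors="replace") as handle:
        # A bounded deque keeps only the most recent `count` lines in memory.
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]


def storage_errors(lines):
    """Keep only the lines that carry SCSI sense data."""
    return [line for line in lines if "Valid sense data" in line]
```

On an ESXi host the log normally lives at /var/log/vmkernel.log, so a call might look like `storage_errors(tail("/var/log/vmkernel.log", 200))`; adjust the path to wherever your copy of the log is.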
Status Code | Description |
00h | Good |
02h | Check Condition |
04h | Condition Met |
08h | Busy |
18h | Reservation Conflict |
28h | Task Set Full |
30h | ACA Active |
40h | Task Aborted |
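When a device returns Check Condition, it also supplies sense data, whose sense key identifies the class of error:
SCSI Sense Key | Description |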
0h | No Sense |
1h | Recovered Error |
2h | Not Ready |
3h | Medium Error |
4h | Hardware Error |
5h | Illegal Request |
6h | Unit Attention |
7h | Data Protect |
8h | Blank Check |
9h | Vendor Specific |
Ah | Copy Aborted |
Bh | Aborted Command |
Dh | Volume Overflow |
Eh | Miscompare |
Fh | Completed |
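For scripting, the two tables above can be written as plain lookup dictionaries. The following is a minimal sketch; the names `SCSI_STATUS`, `SENSE_KEY`, `describe_status` and `describe_sense_key` are chosen for this post and are not part of any VMware tooling. Values that are not in the tables fall through to an "Unknown" label.

```python
# Status codes, taken from the status table above.
SCSI_STATUS = {
    0x00: "Good",
    0x02: "Check Condition",
    0x04: "Condition Met",
    0x08: "Busy",
    0x18: "Reservation Conflict",
    0x28: "Task Set Full",
    0x30: "ACA Active",
    0x40: "Task Aborted",
}

# Sense keys, taken from the sense key table above.
SENSE_KEY = {
    0x0: "No Sense",
    0x1: "Recovered Error",
    0x2: "Not Ready",
    0x3: "Medium Error",
    0x4: "Hardware Error",
    0x5: "Illegal Request",
    0x6: "Unit Attention",
    0x7: "Data Protect",
    0x8: "Blank Check",
    0x9: "Vendor Specific",
    0xA: "Copy Aborted",
    0xB: "Aborted Command",
    0xD: "Volume Overflow",
    0xE: "Miscompare",
    0xF: "Completed",
}


def describe_status(code):
    """Translate a SCSI status byte into its description."""
    return SCSI_STATUS.get(code, f"Unknown status {code:#04x}")


def describe_sense_key(key):
    """Translate a SCSI sense key into its description."""
    return SENSE_KEY.get(key, f"Unknown sense key {key:#x}")
```

For example, `describe_status(0x2)` gives "Check Condition" and `describe_sense_key(0x5)` gives "Illegal Request".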
So, if we take a log entry from the vmkernel.log file, we can analyse the error message being reported and isolate the SCSI event.
2015-02-14T08:29:47.778Z cpu1:32784)ScsiDeviceIO: 2337: Cmd(0x412e80896940) 0x1a, CmdSN 0x3884 from world 0 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0
From the Status Code received we can determine that the device is reporting a Check Condition.
- Host Status – H:0x0 = Good
- Device Status – D:0x2 = Check Condition
- Plugin Status – P:0x0 = Good
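The same fields can be pulled out of a log line with a regular expression. This is a sketch that assumes the entry follows the layout of the example above (H:, D:, P:, then "Valid sense data:" and three hex values); the names `LOG_PATTERN` and `parse_scsi_result` are made up for illustration, and entries in other layouts simply return None.

```python
import re

# Matches the "H:.. D:.. P:.. Valid sense data: key asc ascq" part of an entry.
LOG_PATTERN = re.compile(
    r"H:(?P<host>0x[0-9a-fA-F]+)\s+"
    r"D:(?P<device>0x[0-9a-fA-F]+)\s+"
    r"P:(?P<plugin>0x[0-9a-fA-F]+)\s+"
    r"Valid sense data:\s+"
    r"(?P<key>0x[0-9a-fA-F]+)\s+"
    r"(?P<asc>0x[0-9a-fA-F]+)\s+"
    r"(?P<ascq>0x[0-9a-fA-F]+)"
)


def parse_scsi_result(line):
    """Return the status and sense fields of a log line as integers, or None."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    # int(value, 16) accepts the "0x" prefix used in the log.
    return {name: int(value, 16) for name, value in match.groupdict().items()}


# The log entry examined in this post.
SAMPLE = (
    '2015-02-14T08:29:47.778Z cpu1:32784)ScsiDeviceIO: 2337: '
    'Cmd(0x412e80896940) 0x1a, CmdSN 0x3884 from world 0 to dev '
    '"mpx.vmhba32:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 '
    'Valid sense data: 0x5 0x20 0x0.'
)

print(parse_scsi_result(SAMPLE))
```

Returning integers rather than strings makes the result easy to compare against the status code and sense key tables.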
From the SCSI sense key (0x5) we can determine that the error message is being reported due to an Illegal Request. Finally, we need to determine the cause of the error by analysing the additional sense code (0x20) and the additional sense code qualifier (0x0). The list of additional sense data is far too detailed to document in this blog, so the following can be used as a reference: http://www.t10.org/lists/asc-alph.txt.
From the additional sense code and ASC qualifier, we can now determine that the error reported by the device is ‘INVALID COMMAND OPERATION CODE’, meaning the device does not recognise the command it was sent. The opcode in the log entry, 0x1a, is MODE SENSE(6), so the device most likely does not implement that command.
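The ASC and ASC qualifier are looked up as a pair, which maps naturally onto a dictionary keyed by a tuple. The sketch below holds only a small hand-picked excerpt of the T10 list, not the full table, and the names `ASC_ASCQ` and `describe_additional_sense` are made up for this post.

```python
# A small excerpt of the T10 ASC/ASCQ assignments, keyed by (ASC, ASCQ).
ASC_ASCQ = {
    (0x20, 0x00): "INVALID COMMAND OPERATION CODE",
    (0x24, 0x00): "INVALID FIELD IN CDB",
    (0x25, 0x00): "LOGICAL UNIT NOT SUPPORTED",
    (0x29, 0x00): "POWER ON, RESET, OR BUS DEVICE RESET OCCURRED",
    (0x3A, 0x00): "MEDIUM NOT PRESENT",
}


def describe_additional_sense(asc, ascq):
    """Translate an ASC/ASCQ pair, pointing at the T10 list when unknown."""
    return ASC_ASCQ.get(
        (asc, ascq),
        f"ASC {asc:#04x} / ASCQ {ascq:#04x} (not in this excerpt, see the T10 list)",
    )


# The pair from the log entry examined in this post.
print(describe_additional_sense(0x20, 0x0))
```

Any pair you meet regularly in your own logs can be added to the dictionary from the T10 reference.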