[tech] Motsugo Downtime (Don't worry, it already happened)

Matt Johnston matt at ucc.asn.au
Fri Aug 23 22:34:19 WST 2013


On Fri, Aug 23, 2013 at 09:55:32PM +0800, Andrew Adamson wrote:
> I had a quick look at the motsugo IPMI event logs and there's nothing in 
> there about any ECC errors or SMART errors. It did log that the case cover 
> was taken off at 16:26 though, so the log is definitely working.
> 
> A software bug perhaps?

In the remote syslog it looks like something was unhappy with the root SSD, sda.
It could be cabling; smartctl isn't showing anything interesting on sda.

Matt


Aug 23 13:44:37 motsugo kernel: [3023702.430597] tad[12002]: segfault at 0 ip 00007f9d6a46b86f sp 00007fffef94b4d8 error 4 in libc-2.13.so[7f9d6a352000+180000]
Aug 23 14:10:52 motsugo kernel: [3025274.555130] ata1.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x6 frozen
Aug 23 14:10:52 motsugo kernel: [3025274.555149] ata1.00: failed command: WRITE FPDMA QUEUED
Aug 23 14:10:52 motsugo kernel: [3025274.555165] ata1.00: cmd 61/10:00:a0:62:89/00:00:04:00:00/40 tag 0 ncq 8192 out
Aug 23 14:10:52 motsugo kernel: [3025274.555166]          res 40/00:01:00:00:00/00:00:00:00:00/e0 Emask 0x4 (timeout)
Aug 23 14:10:52 motsugo kernel: [3025274.555197] ata1.00: status: { DRDY }
Aug 23 14:10:52 motsugo kernel: [3025274.555207] ata1.00: failed command: WRITE FPDMA QUEUED
Aug 23 14:10:52 motsugo kernel: [3025274.555222] ata1.00: cmd 61/08:08:c0:22:48/00:00:04:00:00/40 tag 1 ncq 4096 out
Aug 23 14:10:52 motsugo kernel: [3025274.555223]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 23 14:10:52 motsugo kernel: [3025274.555254] ata1.00: status: { DRDY }
Aug 23 14:10:52 motsugo kernel: [3025274.555267] ata1: hard resetting link
Aug 23 14:10:57 motsugo kernel: [3025279.906332] ata1: link is slow to respond, please be patient (ready=0)
Aug 23 14:11:02 motsugo kernel: [3025284.550712] ata1: COMRESET failed (errno=-16)
Aug 23 14:11:02 motsugo kernel: [3025284.550730] ata1: hard resetting link
Aug 23 14:11:07 motsugo kernel: [3025289.901888] ata1: link is slow to respond, please be patient (ready=0)
Aug 23 14:11:12 motsugo kernel: [3025294.546269] ata1: COMRESET failed (errno=-16)
Aug 23 14:11:12 motsugo kernel: [3025294.546285] ata1: hard resetting link
Aug 23 14:11:17 motsugo kernel: [3025299.897310] ata1: link is slow to respond, please be patient (ready=0)
Aug 23 14:11:47 motsugo kernel: [3025329.544549] ata1: COMRESET failed (errno=-16)
Aug 23 14:11:47 motsugo kernel: [3025329.544570] ata1: limiting SATA link speed to 1.5 Gbps
Aug 23 14:11:47 motsugo kernel: [3025329.544574] ata1: hard resetting link
Aug 23 14:11:52 motsugo kernel: [3025334.568247] ata1: COMRESET failed (errno=-16)
Aug 23 14:11:52 motsugo kernel: [3025334.568268] ata1: reset failed, giving up
Aug 23 14:11:52 motsugo kernel: [3025334.568281] ata1.00: disabled
Aug 23 14:11:52 motsugo kernel: [3025334.568288] ata1.00: device reported invalid CHS sector 0
Aug 23 14:11:52 motsugo kernel: [3025334.568292] ata1.00: device reported invalid CHS sector 0
Aug 23 14:11:52 motsugo kernel: [3025334.568306] ata1: EH complete
Aug 23 14:11:52 motsugo kernel: [3025334.568330] sd 0:0:0:0: [sda] Unhandled error code
Aug 23 14:11:52 motsugo kernel: [3025334.568333] sd 0:0:0:0: [sda]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Aug 23 14:11:52 motsugo kernel: [3025334.568338] sd 0:0:0:0: [sda] Unhandled error code
Aug 23 14:11:52 motsugo kernel: [3025334.568345] sd 0:0:0:0: [sda]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Aug 23 14:11:52 motsugo kernel: [3025334.568354] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 04 89 62 a0 00 00 10 00
Aug 23 14:11:52 motsugo kernel: [3025334.568375] end_request: I/O error, dev sda, sector 76112544
Aug 23 14:11:52 motsugo kernel: [3025334.568479] Aborting journal on device dm-2-8.
Aug 23 14:11:52 motsugo kernel: [3025334.568532] sd 0:0:0:0: [sda] Unhandled error code
Aug 23 14:11:52 motsugo kernel: [3025334.568544] sd 0:0:0:0: [sda] CDB: 
Aug 23 14:11:52 motsugo kernel: [3025334.568555] EXT4-fs error (device dm-2) in ext4_reserve_inode_write:4499: Journal has aborted
Aug 23 14:11:52 motsugo kernel: [3025334.568570] Write(10): 2a 00 04 48 22 c0 00 00 08 00

Lots of CDB errors for many more screens.
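The Write(10) CDBs in the log can be matched against the "end_request" sector numbers and the two timed-out NCQ writes. Below is a minimal Python sketch. It assumes the standard SCSI WRITE(10) layout (opcode 0x2a, big-endian LBA in bytes 2-5, transfer length in bytes 7-8) and 512-byte sectors. decode_write10 is a hypothetical helper, not an existing tool.

```python
from typing import Tuple


def decode_write10(cdb_hex: str) -> Tuple[int, int]:
    """Return (lba, blocks) for a SCSI WRITE(10) CDB given as hex bytes.

    Hypothetical helper; assumes the CDB is space-separated hex, as printed
    in the kernel log lines "CDB: Write(10): 2a 00 ...".
    """
    cdb = bytes.fromhex(cdb_hex)  # spaces between hex byte pairs are accepted
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, blocks


if __name__ == "__main__":
    # The two CDBs from the log above; byte counts assume 512-byte sectors.
    for cdb in ("2a 00 04 89 62 a0 00 00 10 00", "2a 00 04 48 22 c0 00 00 08 00"):
        lba, blocks = decode_write10(cdb)
        print(lba, blocks, blocks * 512)
    # prints:
    # 76112544 16 8192
    # 71836352 8 4096
```

The first CDB decodes to sector 76112544, the same sector as the "end_request: I/O error" line. Its 16 sectors (8192 bytes) match the "ncq 8192" tag 0 write that timed out. The second matches the tag 1 "ncq 4096" write. So the I/O errors are those two queued writes, failed after the link reset gave up. They are not reads of bad media.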

