kernel: [955316.300735] sd 0:0:1:0: [sda] abort


Matthew Lenz

Dec 6, 2017, 10:57:55 AM
to gce-discussion
Has anyone else experienced this in the past with a persistent disk on GCE?

Any idea why one of my production persistent SSDs decided to die and cost me a day's work restoring it?

In 20 years of working with Linux, I've never seen something like this happen except as the result of a hardware failure. I was under the impression that persistent volumes were mapped to RAIDed SANs.


Taher (Cloud Platform Support)

Dec 6, 2017, 4:24:41 PM
to gce-discussion

Hello Matthew,


To investigate the issue further, I would require your project number and instance ID. Also please provide the date/time of the issue with associated logs.


To protect your private information, you can reply through private email, by using the drop-down menu of the "reply" command at the top right of the edit window.

Taher (Cloud Platform Support)

Dec 7, 2017, 3:50:18 PM
to gce-discussion

Hi Matthew,


Thank you for providing the data. It seems that an issue on Google's end caused this. The root cause has been mitigated, so there should be no further impact. Long term, we are focusing on automation to prevent this kind of issue from recurring.



Matthew Lenz

Dec 7, 2017, 9:50:07 PM
to Taher (Cloud Platform Support), gce-discussion
So was that the reason the drive failed, the reason the snapshot failed, or both?


Taher (Cloud Platform Support)

Dec 11, 2017, 10:51:55 AM
to gce-discussion

Hello Matthew,


The issue would have manifested in two ways:


1. Slow snapshot operations, which may then time out and fail. Because snapshots read from the same place as the primary data path, a disk that is unavailable may also be impossible to snapshot.


2. Slow (or hanging) reads, which can then time out and produce the SCSI error you saw in dmesg. Most filesystems are not designed to run on network-backed storage, so latency this high can sometimes lead to data corruption.
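The timeout in point 2 is the SCSI command timeout the Linux kernel applies to each disk; when a command exceeds it, the SCSI error handler typically aborts the command (the `[sda] abort` lines) and can escalate to a device reset. Below is a minimal sketch for checking that value from inside a guest, assuming a Linux VM where the persistent disk appears as the SCSI device `sda` and sysfs is mounted at `/sys`. The helper name `read_scsi_timeout` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def read_scsi_timeout(device: str = "sda", sysfs_root: str = "/sys/block") -> Optional[int]:
    """Return the SCSI command timeout (in seconds) for a block device, or None.

    Reads <sysfs_root>/<device>/device/timeout, the per-device value the
    kernel waits before aborting an outstanding SCSI command.
    """
    path = Path(sysfs_root) / device / "device" / "timeout"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # Missing file (not a SCSI disk, wrong name), no permission,
        # or content that is not an integer.
        return None


if __name__ == "__main__":
    # "sda" is an assumption; adjust it to the disk you are investigating.
    timeout = read_scsi_timeout("sda")
    if timeout is None:
        print("No SCSI command timeout found for sda")
    else:
        print(f"sda SCSI command timeout: {timeout}s")
```

Raising this value (as root, by writing to the same sysfs file) only changes how long the kernel waits before aborting; it does not remove the platform-side latency described above.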


Hope this answers your query.



Brent Chang

Jul 23, 2018, 9:00:45 AM
to gce-discussion
I have the same issue on a GCE instance running the public CentOS 7.3.1611 image (kernel 3.10.0-514.16.1.el7.x86_64).

From /var/log/messages:
Jul 22 17:11:21 kernel: sd 0:0:1:0: [sda] abort
Jul 22 17:11:21 kernel: sd 0:0:1:0: [sda] abort
Jul 22 17:11:21 kernel: sd 0:0:1:0: [sda] abort
Jul 22 17:11:21 kernel: sd 0:0:1:0: [sda] abort
Jul 22 17:11:21 kernel: sd 0:0:1:0: [sda] abort
Jul 22 17:11:21 kernel: sd 0:0:1:0: [sda] abort
Jul 22 17:11:21 kernel: sd 0:0:1:0: [sda] abort
Jul 22 17:11:21 kernel: sd 0:0:1:0: device reset
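To see how many aborts a disk logged and whether they escalated to a device reset, you can tally kernel lines like the ones above. This is a minimal sketch, assuming syslog or dmesg lines in the format shown in this thread. The regex and the helper name `count_scsi_events` are hypothetical, not part of any GCE tool.

```python
import re
import sys
from collections import Counter

# Matches SCSI error-handler lines such as:
#   "Jul 22 17:11:21 kernel: sd 0:0:1:0: [sda] abort"
#   "Jul 22 17:11:21 kernel: sd 0:0:1:0: device reset"
#   "kernel: [955316.300735] sd 0:0:1:0: [sda] abort"
SCSI_EVENT = re.compile(
    r"\bsd (?P<hctl>\d+:\d+:\d+:\d+): (?:\[\w+\] )?(?P<event>abort|device reset)"
)


def count_scsi_events(lines):
    """Count abort/device-reset events per (H:C:T:L address, event type)."""
    counts = Counter()
    for line in lines:
        match = SCSI_EVENT.search(line)
        if match:
            counts[(match.group("hctl"), match.group("event"))] += 1
    return counts


if __name__ == "__main__":
    # Assumption: CentOS 7 writes kernel messages to /var/log/messages.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    try:
        with open(path, errors="replace") as log:
            for (hctl, event), n in sorted(count_scsi_events(log).items()):
                print(f"{hctl} {event}: {n}")
    except OSError as exc:
        print(f"could not read {path}: {exc}")
```

Applied to the excerpt above, this would report seven aborts followed by one device reset for the disk at address 0:0:1:0.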

After that, my web server refused every connection on port 80.

Please help me figure out what happened and how to prevent it.

Regards,
Brent


Fady (Google Cloud Platform)

Jul 23, 2018, 7:33:55 PM
to gce-dis...@googlegroups.com

Hello Brent,


I have not found any recent reports of this particular error. Since this forum is meant for general discussion, I suggest opening a private issue tracker report so we can investigate. When you open the report, please include the project ID and the instance name/ID, and we will be happy to assist.


Brent Chang

Jul 24, 2018, 1:48:33 AM
to gce-discussion
Hello Fady,

Got it, but I don't have an account at the link you gave. Where can I sign up?

Regards,
Brent


Fady (Google Cloud Platform)

Jul 24, 2018, 4:56:14 PM
to gce-discussion

Hello Brent,


Sorry about that. The link I provided was wrong. Here is the correct one (I have already edited my earlier message in this thread with the correct link). For further instructions on opening issue tracker reports, you can also check this guide.

