DSR failures mapping to file paths

Steve Maggioncalda

Mar 12, 2015, 9:56:17 AM
to isilon-u...@googlegroups.com
Hey all.
I recall a method of determining the file path of the file affected by
the block error but cannot remember the syntax.

Does anyone have this handy?

Here's an example of the error:
isi_spy_d: Node MyNode6 (id8) : DSR repair attempt unsuccessful for
inode(s) {} and/or block(s) { 7,8,745325469696:8192 } - EIO.

Thanks a ton
Steve

Jamie Ivanov

Mar 12, 2015, 10:28:28 AM
to isilon-u...@googlegroups.com


Steve,

The 8k size indicates that it would be a block and not an inode, so you may try reading the raw block in DDA mode on the node the DSR is being reported from:

# isi_cpr -2r b7,8,745325469696:8192

or for reverse block mapping in DDA mode:

# isi_cpr -2r r7,8,745325469696:8192

or

# isi_cpr -2r R7,8,745325469696:8192

(note: the prefixed "r", "R", or "b" are not typos, but you may need to try it with or without a trailing slash)

"-2" Don't require connecting to slave process on every node.
"-r" Read information in DDA mode.

DISCLAIMER: This is typically not a command that is recommended, because it can cause data loss/corruption if not used properly. The above commands only try to acquire diagnostic information about that particular block reference, but too much tinkering will lead to Isilon engineering sighing heavily at your expense. DSR errors should be handled by the Filesystems team in Isilon support.

Jamie Ivanov
Mobile: 608.399.4252
http://www.linkedin.com/in/jamieivanov
-- -- -- -- -- -- -- -- -- -- -- --
This transmission (including any attachments) may contain confidential information, privileged material (including material protected by the solicitor-client or other applicable privileges), or constitute non-public information. Any use of this information by anyone other than the intended recipient is prohibited. If you have received this transmission in error, please immediately reply to the sender and delete this information from your system. Use, dissemination, distribution, or reproduction of this transmission by unintended recipients is not authorized and may be unlawful.


Jason Davis

Mar 12, 2015, 11:02:47 AM
to isilon-u...@googlegroups.com
Nice disclaimer :)

Bernie Case

Mar 12, 2015, 1:44:23 PM
to isilon-u...@googlegroups.com
I would see if the cluster logged something in /var/log/messages on any of the nodes, from around the time the error was logged, that includes the LIN along with the block address, as the LIN may still be resolvable via isi get -L <LIN>. That may give you the path, and from there you can figure out whether the file is properly protected. EIO would normally indicate that the block in question couldn't be read, perhaps due to a down device, or a sector of a drive that can't be read (perhaps due to an ECC).

Since this was logged by isi_spy_d, I'd venture a guess that this version of OneFS is no longer supported.  That being said, the reverse block map has not existed for several years (having been replaced by something else called block history), so I'm not certain that cpr command would work correctly.  Besides, as was mentioned, running isi_cpr should only be done under the direction of Isilon support or engineering.

Steve Maggioncalda

Mar 12, 2015, 2:29:20 PM
to isilon-u...@googlegroups.com
Thanks so much, folks.
So, the back story:
OneFS v5.0.6.4 on 9000i. Yes, EOL.

Two drives failed on one node and the smartfail mechanism seemed to go
south. During this, the DSRs started flying. We powered down the
mis-smartfailing node, moved data off the cluster, and we're now
smartfailing the node.

I don't necessarily need the files associated with these blocks (a loss we're willing to take).
Also, I don't necessarily need the node back online (unless I get lucky somewhere).
I just need the smartfail to complete so I can get back to n+1. The node smartfail is quitting after 44-45 errors. We have a total of 55 unique DSR failures. Since the sick node was taken offline, we have not encountered any "new" DSR failures.
I am trying to determine the files associated with these blocks so I can delete those files from the filesystem, hoping that will render the DSR repair unnecessary and silence the errors.

Thanks again!
Steve

Bernie Case

Mar 12, 2015, 5:57:30 PM
to isilon-u...@googlegroups.com
Let me preface this by saying that I think you might want to look at opening a time & materials support case with EMC so that they can fully attempt to get your cluster back into protection.

Two drives down in a cluster at n+1 is the same as having the node down. The system would have to try to rebuild from parity any stripes of data that were on those two drives, but other stripes might be completely intact and could be reprotected to the remaining nodes. Do you think there's any hope of getting the node back online? That might help FlexProtect quite a lot. Otherwise, I'll see if I can explain a bit more about what might be needed.

And, 5.0.6.4 on 9000i.  Pretty old release, but some of the tools that support uses now are still there; the logging is a bit different, but likely similar enough that you might be able to work through the problem.

Are your files snapshotted?  If so, then simply running rm on the file won't remove the file... all references to its blocks just move into the snapshot when you run rm.  Removing the file in that case will require support involvement, hence the T&M support case.

With the DSR failure, there *should* have been something logged in /var/log/messages indicating the LIN that was being accessed when the filesystem attempted to perform DSR. If you have that, you can convert the LIN to a path with isi get -L <LIN>. You'll want to grep for the DSR failure in /var/log/messages on all nodes to find all the log messages, and then pull the list of LINs from those. You should also look in /var/log/restripe.log, as the restriper would also have been logging EIOs while FlexProtect was running. isi restripe -wD might also have more detail about the FlexProtect failure.

Once you have the list of LINs, you convert them to paths with isi get -L. Assuming they're not snapshotted, you can then rm the corresponding files, which removes them from the filesystem and prevents FlexProtect from attempting to reprotect them the next time it runs.
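Roughly, that whole sequence looks something like the following. This is only a sketch: the grep pattern is just a starting point since the exact log wording varies by release, and <LIN> and the path are placeholders, not real output.

# grep -i dsr /var/log/messages
# grep -i dsr /var/log/restripe.log
# isi get -L <LIN>
# rm /ifs/path/reported/by/isi_get

Run the greps on every node (or pull the logs together in one place), resolve each unique LIN, and only rm the result after confirming the file isn't snapshotted.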

There's another possibility: that the blocks being accessed are in damaged HDD sectors (what Isilon calls ECCs). Support might be able to recover those blocks via another procedure, but even that is tricky and would require a support case just to get it looked at. That's why I suggested having that node back online, even for the smartfail process, as FlexProtect will attempt to read from a device in smartfail even if it's in a read-only state, assuming it can't get the data/parity from some other device that's already online (and not smartfailed).

So, in summary:
  • With the node in smartfail, but online, FlexProtect might be able to reprotect more data than if the node's powered down
  • ECCs on other drives in the cluster might prevent some stripes from being reprotected
  • rm will not delete data if the data's snapshotted; it'll just move that data into the snapshot
  • If you can't remove the data cleanly with rm, or there are ECCs on drives, you'll need to try to get support involved.  I'm not sure what the quote would be for T&M in this case.
I think that covers the majority of issues here...

Steve Maggioncalda

Mar 12, 2015, 8:34:22 PM
to isilon-u...@googlegroups.com
Thanks for the solid write-up, Bernie.
I really appreciate the time and energy.

I tried many times to get a T&M case opened with EMC, but they refused each time: too old, you need to upgrade.
Well, yeah, I do. But right now I'm stuck! Anyway...

No snapshots, so I'm hoping that once I find the LINs, I'll be moving in the right direction.

The powered-down node has two faulty drives, neither fully smartfailed. Since the cluster has this node in a smartfailing state, maybe I should try powering it up, as you mentioned. I'm just worried that the inconsistency/instability of that node may corrupt more files. Also, it's been off a few days and the cluster has changed considerably. I imagine that shouldn't have an effect, but I can't help thinking about it.

Clearing more data off tonight in hopes I can get off of some bad
blocks. Also, as we all know, it's much easier to work on a lightly
loaded cluster!

Cheers all
Steve

Saker Klippsten

Mar 12, 2015, 8:42:10 PM
to isilon-u...@googlegroups.com
Hey Steve!

Powering up that node should not be an issue, as long as it has not been removed from the cluster yet. It will be in a read-only state and still part of the cluster. No data will be written to it, just read from it.

We have been down this road in the past: we had a node off for two weeks, and support had us power it back on to help when we had a two-drive failure while we were only at n+1. Long story for sure...

I have a load of 12000i's if you need any to add to the cluster to increase space. They are in Vancouver, but we could FedEx some down to you.

-S
Eugene Lipsky

Mar 12, 2015, 10:31:07 PM
to isilon-u...@googlegroups.com
Hey Bernie,

Sorry to hijack the thread, but I just wanted to ask whether your statement below means there is a way to delete data even when there are snapshots involved, even if it means getting support involved?

"If so, then simply running rm on the file won't remove the file... all references to its blocks just move into the snapshot when you run rm.  Removing the file in that case will require support involvement, hence the T&M support case"

Thanks,
Eugene


Jamie Ivanov

Mar 13, 2015, 10:06:49 AM
to isilon-u...@googlegroups.com
Eugene,

Yes, it is possible to remove the data completely which removes it from all snapshots. You would need to contact Isilon support and speak with one of the Filesystems support engineers.


Eugene Lipsky

Mar 13, 2015, 10:12:12 AM
to isilon-u...@googlegroups.com
Thanks, Jamie. Good to know. I was told in the past by support that it was impossible :(

Jamie Ivanov

Mar 13, 2015, 10:15:20 AM
to isilon-u...@googlegroups.com
Eugene,

Various OneFS releases have various capabilities, and this type of operation is not a common one. It would require manipulating metadata to remove all references to the blocks used by the file(s) in question, which would then mark them as reusable blocks. This needs to be done carefully; otherwise it could be a resume-generating event for someone.


Peter Serocka

Mar 13, 2015, 10:16:41 AM
to isilon-u...@googlegroups.com
Deleting might well be possible, but getting through to the right engineers is a bit difficult...

:-|


— Peter

Jamie Ivanov

Mar 13, 2015, 10:19:17 AM
to isilon-u...@googlegroups.com
Peter,

I completely understand. I used to work in the Plymouth, MN office, so I can personally vouch for the level of skill within the Filesystems team there. Those engineers have some excellent resources at that site, and they also have good resources to reach out to in order to help them along.

I hope everyone doesn't get the bright idea to request all of their cases be moved to Plymouth haha ;-)


Eugene Lipsky

Mar 13, 2015, 10:38:28 AM
to isilon-u...@googlegroups.com
Gotcha. We're on 7.1.0.5 now; when I asked, we were on 7.1.0.2 (I think) and were nearly out of space on the cluster. It took forever to delete a chunk of data because the snapshot policy in place was two weeks of retention, taken every 15 minutes. We've since redesigned our snapshot policy, but it's still painful deleting large chunks of data, as we basically have to wait for snapshots to expire.

Peter Serocka

Mar 13, 2015, 10:45:15 AM
to isilon-u...@googlegroups.com

On 2015 Mar 13 Fri, at 22:38, Eugene Lipsky <eli...@gmail.com> wrote:

> Gotcha. We're on 7.1.0.5 now, when I asked we were on 7.1.0.2 (I think) and were nearly out of space on the cluster. Took forever to delete a chunk of data because the snapshot policy in place was 2 weeks, taken every 15 minutes. We've since redesigned our snapshot policy but it's still painful deleting large chunks of data as we basically have to wait for snapshots to expire.

You don’t have to (wait)... you can expire snaps ad hoc (isi snaps delete).

But obviously you can’t have “both”.

— Peter

Jamie Ivanov

Mar 13, 2015, 10:50:37 AM
to isilon-u...@googlegroups.com
Eugene,

I've done exactly what I mentioned on 7.x clusters, more than once, in the past, so I do know it's possible. I've also written some custom scripts that forcibly remove pending snapshots, which have been used in some emergency situations with full clusters. Deleting those snapshots will be painful, but you can manually delete snapshots, as was mentioned in another email; that activity is exceptionally metadata-intensive as well, so if you have SSDs it would be beneficial to have the snapshot metadata configured for full read/write acceleration (this can be done via the web UI or a series of cluster-wide sysctl OIDs).


Eugene Lipsky

Mar 13, 2015, 10:59:35 AM
to isilon-u...@googlegroups.com
Yes, but if you have many snapshots (two weeks' worth at a 15-minute interval) and the data in question is multi-terabyte, it takes a while to delete each snapshot, one at a time.

Peter Serocka

Mar 13, 2015, 11:53:22 AM
to isilon-u...@googlegroups.com
On 2015 Mar 13 Fri, at 22:59, Eugene Lipsky <eli...@gmail.com> wrote:

> Yes but if you have many snapshots, 2 weeks worth at 15 minute interval and data in question is multi-terabyte it takes a while to delete each snapshot, 1 at a time.

They can be “expired” all at once, but yeah, SnapshotDelete will process them back to front.

Hard to imagine how this can be made faster by fiddling with the internal bits.

But wasn’t your original question aiming at deleting large amounts of alive (but snapshotted) data and having their disk space freed **asap**?

That is easy, though only semi-obvious:

Expire (isi snaps delete) all relevant snapshots; this takes a few seconds (maybe minutes if thousands).

While these snaps are pending their final removal by SnapshotDelete, just go ahead and rm the alive files.

They will be wiped immediately, rather than going into the snapshots; the bar is closed…

Voila
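In command form, the sequence is roughly as follows. The snapshot names and the path are hypothetical, and the exact isi snaps arguments differ a bit between OneFS releases, so treat this as a sketch rather than exact syntax.

# isi snaps delete snap-data-0301
# isi snaps delete snap-data-0302
# rm -rf /ifs/data/big-directory-to-remove

Because the snapshots are already pending deletion when the rm runs, the freed blocks aren't pulled into any snapshot and the space comes back right away.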

Jamie Ivanov

Mar 13, 2015, 12:34:55 PM
to isilon-u...@googlegroups.com
Peter,

There is plenty that can be done given certain circumstances.

"Internal bits" would be metadata manipulation and job engine performance.

There are other ways to delete snapshots rather than relying on the Snapshot Delete job -- I wrote scripts that would parallel delete snapshots that were part of different snapshot domains. The benefit to the Snapshot Delete job is utilizing multiple nodes in parallel as well as throttling job engine activity. If you manually delete snapshots, it will utilize that one node to handle the I/O transactions for the entire snapshot domain; you can delete multiple snapshots at the same time this way BUT they must not share a common path or a lot of unhappiness will follow. For example, if you have 20 snapshots on /ifs/data and 20 snapshots on /ifs/home, Snapshot Delete will delete the first expiring snapshot from all domains; however, you can manually simultaneously delete the oldest snapshot from both /ifs/data and /ifs/home.

This isn't going to be a well-documented technique, as there is a margin for error and you will be saturating a single node with I/O requests whilst it also handles client I/O. Is it possible? Of course. Is it recommended? Not as an everyday practice, but in some instances you just need to get results or production will be impacted.


Eugene Lipsky

Mar 13, 2015, 5:12:57 PM
to isilon-u...@googlegroups.com
Yes my goal was to free space asap.

Sorry if I'm being a bit dense, but again, I'm just going by what I was told by support, apparently incorrectly. I guess, based on what you're saying, I may have been going about it the wrong way. For example, the last time I needed to delete a large chunk of data (~50TB), I deleted it first, and then the snapshot became 50TB. Trying to delete that snapshot took a few days, only to grow the next snapshot in the series to the same size. I guess what you're saying is: delete the snapshot series first, which should be quick, then remove the files, which would then also be quick. If so, that makes sense.

Peter Serocka

Mar 14, 2015, 12:34:18 AM
to isilon-u...@googlegroups.com
Yes, you got me correctly. And the fun part is that you don’t even need to wait for the SnapshotDelete job to finish.

Just “isi snaps delete”, which marks the snaps as pending deletion, already provides solid ground for the “rm” or TreeDelete of the alive files.

— Peter

Steve Maggioncalda

Mar 16, 2015, 12:49:56 PM
to isilon-u...@googlegroups.com
Yo Saker!
Always good to hear from you.
Thanks for the input and the offer, sir. I'll certainly let you know about the 12000s.

So I tracked down the LINs and replaced the "corrupted" files with
files from backup.
No more DSR errors and the smartfail of the node completed. Back to
n+1. I can sleep a bit deeper.
But I'm a node down. Now trying to work out getting this smartfailed
node back into the cluster.
Good times!

Thanks again, bud!
Steve