Large SyncIQ "latest" snapshots


John Beranek - PA

Aug 6, 2015, 9:00:51 AM
to Isilon Technical User Group
Hi all,

We have 2 Isilon clusters that we synchronise data between using SyncIQ. The plan is that at any one time there is one Isilon cluster which is active, and one which is passive/DR. Further to this, we plan to fail between the 2 clusters on a regular basis.

This much we have now done a few times, but I've just noticed that our current passive/DR cluster has a few large snapshots created by SyncIQ of the form:

SIQ-UUID-latest

Given the SyncIQ policy which is responsible for the snapshots is up-to-date with respect to synchronising from the current active cluster to the current passive/DR cluster, why would there be a big snapshot on the passive/DR cluster?

Cheers,

John

Chris Pepper

Aug 6, 2015, 9:34:04 AM
to isilon-u...@googlegroups.com
John,

SyncIQ (optionally) keeps a snapshot between the last complete run and current (HEAD). That provides a map of all the changes, which can then be efficiently updated.
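
They are the snapshots named SIQ-<policy ID>-latest, so a quick way to spot them and check each one's reported size is something like the following sketch (using the short-form isi snap/isi snaps commands; <policy-id> is a placeholder):

$ isi snap list | grep SIQ-
$ isi snaps view SIQ-<policy-id>-latest | grep -E 'Name|Created|Size'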

Chris

John Beranek - PA

Aug 7, 2015, 5:54:05 AM
to Isilon Technical User Group
I don't really follow...if the directory covered by the policy is identical on each cluster, why are there changes to keep track of on the passive/DR cluster?

These snapshots are somewhat concerning to me, as they add up to 5 TiB, which is a significant proportion of the cluster's capacity.
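
(In case it's useful to anyone else, the per-snapshot sizes can be pulled out with something like the sketch below; it assumes the snapshot ID is the first column of the isi snap list output:)

$ for s in $(isi snap list | awk '/SIQ-.*-latest/ {print $1}'); do isi snaps view "$s" | grep -E 'Name|Size'; done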

Is there a way to safely delete them?

Cheers,

John

Peter Serocka

Aug 7, 2015, 6:05:40 AM
to John Beranek - PA, Isilon Technical User Group
On 7 Aug 2015, at 17:54, John Beranek - PA wrote:

I don't really follow...if the directory covered by the policy is identical on each cluster, why are there changes to keep track of on the passive/DR cluster?

Good question :)

You might be able to see whether they are left over and should
have been deleted. Run (with your snap ID or snap name as the argument):

 isi snaps view 311136
               ID: 311136
             Name: home_hourly_2015-08-07_13-40
             Path: /ifs/data/home
        Has Locks: No
         Schedule: home hourly
  Alias Target ID: -
Alias Target Name: -
          Created: 2015-08-07T13:40:05
          Expires: 2015-08-08T13:40:00
             Size: 8.542G
     Shadow Bytes: 0b
        % Reserve: 0.00%
     % Filesystem: 0.00%
            State: active

When was your snap created, and how
does that match your SyncIQ schedule?
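
(A quick sketch for lining the two up -- the policy name is a placeholder, and the exact field names in the output may vary by OneFS release:)

 isi snaps view SIQ-<policy-id>-latest | grep Created
 isi sync policies view <policy-name> | grep -i -E 'schedule|last'

If the Created time matches a one-off manual run rather than the regular schedule, that already tells you something.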


Cheers

-- Peter

Peter Serocka
CAS-MPG Partner Institute for Computational Biology (PICB)
Shanghai Institutes for Biological Sciences (SIBS)
Chinese Academy of Sciences (CAS)
320 Yue Yang Rd, Shanghai 200031, China





John Beranek - PA

Aug 7, 2015, 6:51:36 AM
to Isilon Technical User Group, john.b...@pressassociation.com
On Friday, 7 August 2015 11:05:40 UTC+1, Pete wrote:

On 7 Aug 2015, at 17:54, John Beranek - PA wrote:

I don't really follow...if the directory covered by the policy is identical on each cluster, why are there changes to keep track of on the passive/DR cluster?

Good question :)

You might be able to see whether they are left over and should
have been deleted. Run (with your snap ID or snap name as the argument):

Well, the snapshot times equate to the last manual SyncIQ run, which was performed before the site failover. So, on the night of the failover we did the following (a rough CLI sketch is included below):

On "old primary": Last sync
On "new primary": Allow writes
On "old primary": Prepare resync
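
A minimal sketch of what those three steps look like on the CLI (assuming the standard SyncIQ failover/failback recovery commands -- exact syntax can vary by OneFS release, so treat this as illustrative rather than a verbatim transcript; PolicyA is the policy name from the listings below):

# On "old primary": run a final sync of the policy
$ isi sync jobs start PolicyA

# On "new primary": make the local target writable (failover)
$ isi sync recovery allow-write PolicyA

# On "old primary": create the mirror/resync policy (PolicyA_mirror)
$ isi sync recovery resync-prep PolicyA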

Here follows output from the now passive/DR cluster, heavily redacted (apologies):

$ sudo isi sync pol list
Name                       | Path                              | Action | State | Target
---------------------------+-----------------------------------+--------+-------+----------------------
PolicyA_mirror             | /ifs/data_a                       | Sync   | Off   | sync.sitea.example.com

$ sudo isi sync target list
Policy             | Source       | Target Path                      | Run     | FOFB State
-------------------+--------------+----------------------------------+---------+----------------------
PolicyA            | isilon-sitea | /ifs/data_a                      | Success | writes disabled

Then, from current active cluster:

$ isi sync pol list
Name               Path                             Action  Enabled  Target
------------------------------------------------------------------------------------------
PolicyA            /ifs/data_a                      sync    Yes      sync.siteb.example.com

$ isi sync target list
Name                       Source       Target Path                       Last Job State  FOFB State
------------------------------------------------------------------------------------------------------------
PolicyA_mirror             isilon-siteb /ifs/data_a                       finished        resync_policy_created


 
Cheers,

John

Chris Pepper

Aug 7, 2015, 9:39:58 AM
to isilon-u...@googlegroups.com, john.b...@pressassociation.com
I *think* I normally see the SIQ snaps only on the source, which is the one that changes -- the target is read-only. Note that if you delete the snapshot to reclaim space, the SyncIQ policy may break. In the best case it will rescan the entire source and target trees to figure out what to copy. In the worst case you would need to delete and recreate the policy and then rescan source & target.
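
In that worst case, the recovery is something along these lines (from memory -- verify the exact subcommands against your OneFS documentation before relying on them; using John's PolicyA as the example name):

# Force the policy to treat its next run as a full (re)scan
$ isi sync policies reset PolicyA

# Or, more drastically, remove and recreate the policy
$ isi sync policies delete PolicyA

Either way you pay for a complete walk of the source and target trees.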

Chris

John Beranek - PA

Aug 7, 2015, 9:43:08 AM
to Isilon Technical User Group, john.b...@pressassociation.com
On Friday, 7 August 2015 14:39:58 UTC+1, Chris Pepper wrote:
        I *think* I normally see the SIQ snaps only on the source, which is the one that changes -- the target is read-only. Note that if you delete the snapshot to reclaim space, the SyncIQ policy may break. In the best case it will rescan the entire source and target trees to figure out what to copy. In the worst case you would need to delete and recreate the policy and then rescan source & target.

I think I'm going to take this to EMC Support...

John

Chris Pepper

Aug 7, 2015, 9:52:53 AM
to isilon-u...@googlegroups.com, john.b...@pressassociation.com
Note that SIQ snaps are named after the SyncIQ policy which created them.

Chris

> gclisi-5# isi sync policies view gclisi-input-to-solisi|grep ID
> ID: 092576798ec496580f5f9a76e2c4a19b
> gclisi-5# isi snap list|grep 0925
> 4841 SIQ-092576798ec496580f5f9a76e2c4a19b-latest /ifs/nl/input

Peter Serocka

Aug 7, 2015, 10:46:38 AM
to isilon-u...@googlegroups.com, john.b...@pressassociation.com
So the large snap is now on cluster A, from
the time (say day 0) when you failed over to cluster B.

Since then cluster B has been active, and
syncing data changes back to passive cluster A.

As the snap in question is being kept, it is
no surprise that it keeps growing, right?

(You can compare the snap folder on A
with the actual target data folder on A
to hopefully confirm this.)

Now the remaining question is: does the snap
on cluster A from day 0 actually help
with failing back from B to A in the
next round, or could it be safely deleted?

And yes, I’d take that to support :)

In the meantime, I believe this educational clip
is spot on, in a certain way:

https://www.youtube.com/watch?feature=player_detailpage&v=8osRaFTtgHo#t=133

John Beranek - PA

Aug 13, 2015, 5:49:11 AM
to Isilon Technical User Group, john.b...@pressassociation.com
On Friday, 7 August 2015 15:46:38 UTC+1, Pete wrote:
So the large snap is now on cluster A, from
the time (say day 0) when you failed over to cluster B.

Since then cluster B has been active, and
syncing data changes back to passive cluster A.

As the snap in question is being kept, it is
no surprise that it keeps growing, right?

(You can compare the snap folder on A
with the actual target data folder on A
to hopefully confirm this.)

Well sure, cluster A continues to get updates due to being the SyncIQ target of cluster B, but if cluster A doesn't have an active SyncIQ policy, why does it still have a SIQ-*-latest snapshot?

I've taken the issue to support, but haven't got very far yet.

Noticed a similar report on the EMC Community https://community.emc.com/thread/218227 where someone reports they get the same thing, and now just follows a procedure to delete the SIQ-*-latest snapshots after a planned failover...


Cheers,

John

John Beranek - PA

Sep 1, 2015, 9:37:46 AM
to Isilon Technical User Group, john.b...@pressassociation.com

On Thursday, 13 August 2015 10:49:11 UTC+1, John Beranek - PA wrote:
On Friday, 7 August 2015 15:46:38 UTC+1, Pete wrote:
So the large snap is now on cluster A, from
the time (say day 0) when you failed over to cluster B.

Since then cluster B has been active, and
syncing data changes back to passive cluster A.

As the snap in question is being kept, it is
no surprise that it keeps growing, right?

(You can compare the snap folder on A
with the actual target data folder on A
to hopefully confirm this.)

Well sure, cluster A continues to get updates due to being the SyncIQ target of cluster B, but if cluster A doesn't have an active SyncIQ policy, why does it still have a SIQ-*-latest snapshot?

I've taken the issue to support, but haven't got very far yet.

Well, I finally got a definitive response from EMC Support (after suggesting that the testing behind their earlier advice not to delete the snapshots was flawed): I could safely delete the snapshots as long as I had already prepared a re-sync policy and started synchronising PreviousTarget -> PreviousSource.
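
For anyone else hitting this, the sequence boils down to something like the following sketch (the exact snapshot delete subcommand may differ by OneFS release, so double-check before running it):

# 1. Confirm the resync (mirror) policy has been prepared and has run
$ isi sync pol list
$ isi sync target list

# 2. Only then delete the leftover snapshot on the cluster it lives on
$ isi snap list | grep SIQ-
$ isi snaps delete <id-or-name-of-the-SIQ-...-latest-snap>

The key point from support was to have the re-sync policy prepared and running before touching the snapshot.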

Cheers,

John

Steffen Lützenkirchen

Sep 2, 2015, 10:09:52 AM
to Isilon Technical User Group, john.b...@pressassociation.com


On Thursday, 13 August 2015 at 11:49:11 UTC+2, John Beranek - PA wrote:

Well sure, cluster A continues to get updates due to being the SyncIQ target of cluster B, but if cluster A doesn't have an active SyncIQ policy, why does it still have a SIQ-*-latest snapshot?

I've taken the issue to support, but haven't got very far yet.

Noticed a similar report on the EMC Community https://community.emc.com/thread/218227 where someone reports they get the same thing, and now just follows a procedure to delete the SIQ-*-latest snapshots after a planned failover...


Cheers,

John

Hi John,

That someone is me.

The funny side is: depending on the source, EMC currently states either "do not delete that snap, the whole world will crash!" or "you are safe to delete that snap".

No matter what, from my point of view this behaviour is a design flaw. There should be neither a snap that grows and is necessary, nor a snap that grows and isn't necessary.

My experience so far is that we have seen no negative impact from deleting the snaps.

I'm talking to EMC on several channels (Community, Account Team, Support) to get that behaviour fixed.