SyncIQ and 'default IPs'


Chris Pepper

Aug 22, 2014, 4:14:13 PM
to isilon-u...@googlegroups.com
We are investigating a SyncIQ issue with EMC, and a few odd things came up.

We already knew that every Isilon cluster has a 'head' node. Isilon loves to say their clusters are headless, but this is not actually true. SmartConnect Advanced requires a single node to act as a nameserver, directing clients to available pool members. Various functions use a single coordinator node. As I understand it, the lowest-numbered available node is always the 'head'. In hybrid clusters (using SmartPools) you might want to make sure node #1 is a faster model, so a slower/older node doesn't impair performance of faster nodes.

Today I learned that every node also has a 'default' IP. This is the IP that shows up in "isi status" and the status web page. If the SyncIQ policy does not specify source and target pools, the SyncIQ source and target nodes communicate from these default IPs. This is important because we want to make sure SyncIQ runs over 10GE interfaces rather than 1GE. In practice, for us, SyncIQ communications paths are governed by the *target* IPs, because we do not have routes between 10GE and 1GE networks.

When a SyncIQ policy starts, the source and target negotiate how they will communicate with each other. You can see this with "isi_for_array tail -f /var/log/isi_migrate.log" on source & target clusters. There is some extra logic here which attempts to work through NAT but fails in our particular scenario. The tipoff is that the target provides a list of IPs to the source, and the source then lists target IPs it will (try to) communicate with. Our source used *different* IPs than the target specified, and SyncIQ of course failed to transfer data to these invalid IPs.

The SyncIQ source starts by getting an accessible target node IP via DNS. The source then contacts this node to start communications. The target returns its own cluster name and a list of usable node IPs (those 'default IPs' mentioned above) for the source to communicate with. Our problem is that *IF* the IP returned from SmartConnect Advanced is not in the list of (default) IPs returned to the source cluster, SyncIQ on the source cluster goes into 'NAT' mode and discards the specified target IPs. Instead it performs DNS queries. It generates node names from the returned cluster name (such as $CLUSTER-1, $CLUSTER-2, $CLUSTER-3, etc.) and looks them up -- SyncIQ does **NOT** use the name specified in the SyncIQ policy. Our target cluster's internal name is valid but currently maps to invalid IPs, so the source fails to connect.
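
A minimal Python sketch of the decision described above may make the failure mode easier to see. It is only a model of the behaviour reported in this post, not SyncIQ's actual code: the function name `pick_target_ips`, the `resolve` callback, the assumption that one name is looked up per advertised IP, and all addresses (documentation ranges) are hypothetical.

```python
from typing import Callable, List, Optional


def pick_target_ips(
    dns_ip: str,
    cluster_name: str,
    advertised_ips: List[str],
    resolve: Callable[[str], Optional[str]],
) -> List[str]:
    """Model of the source-side choice of target IPs (hypothetical, not SyncIQ code)."""
    # Normal case: the IP reached via DNS is one of the IPs the target
    # advertised, so the source simply uses the advertised list.
    if dns_ip in advertised_ips:
        return list(advertised_ips)

    # 'NAT' mode: the advertised list is discarded. Node names are built
    # from the cluster name the target returned (CLUSTER-1, CLUSTER-2, ...)
    # and looked up in DNS; the name in the SyncIQ policy is not used.
    # Assumption: one lookup per advertised IP.
    resolved = []
    for n in range(1, len(advertised_ips) + 1):
        ip = resolve(f"{cluster_name}-{n}")
        if ip is not None:
            resolved.append(ip)
    return resolved


# Hypothetical data: DNS entries for the internal cluster name are stale.
stale_dns = {"target-1": "203.0.113.1", "target-2": "203.0.113.2"}
advertised = ["192.0.2.11", "192.0.2.12"]

# DNS answer is in the advertised list: the advertised IPs are used.
print(pick_target_ips("192.0.2.11", "target", advertised, stale_dns.get))
# DNS answer is not in the list: the stale DNS entries are used instead.
print(pick_target_ips("198.51.100.5", "target", advertised, stale_dns.get))
```

In the second call the source ends up with addresses the target never offered, which matches the symptom described above.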

There are a couple of workarounds. First, SyncIQ has an undocumented option which can make a policy skip these NAT-traversal DNS queries. Instead it simply accepts the IPs provided by the target cluster, and data transfers properly.

Alternatively, the SyncIQ policy can specify specific source and target pools, which *should* avoid this problem. We primarily use NFS with dynamic SmartConnect zones, though, and SyncIQ can only use static SmartConnect zones, which we don't have. This matters all the more because it now seems that every SyncIQ source or target may need a static pool to make sure communications run over the faster network rather than the slower one.

We also need visibility into the cluster's default pool, and a way to specify the preferred pool. In all 5 of our clusters, "isi status" shows 1gbps 'campus' IPs rather than our preferred 10gbps IPs. Now we need to see whether SyncIQ is actually running over our 1gbps network...
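
One rough way to check which network the default IPs sit on is to classify them by subnet. The Python sketch below assumes the per-node IPs have been copied by hand from "isi status"; it does not parse any real command output. The function name `classify_ips`, the network labels, the subnets, and the addresses are all hypothetical placeholders.

```python
import ipaddress
from typing import Dict, List


def classify_ips(ips: List[str], networks: Dict[str, str]) -> Dict[str, str]:
    """Map each IP to the label of the first network that contains it."""
    parsed = {label: ipaddress.ip_network(cidr) for label, cidr in networks.items()}
    result = {}
    for ip in ips:
        addr = ipaddress.ip_address(ip)
        # Fall back to "unknown" when no configured network matches.
        result[ip] = next(
            (label for label, net in parsed.items() if addr in net),
            "unknown",
        )
    return result


# Hypothetical subnets and node IPs (documentation ranges, not real ones).
NETWORKS = {"campus-1GE": "192.0.2.0/24", "private-10GE": "198.51.100.0/24"}
default_ips = ["192.0.2.21", "192.0.2.22", "198.51.100.23"]

for ip, label in classify_ips(default_ips, NETWORKS).items():
    print(f"{ip}\t{label}")
```

Any node whose default IP lands in the 1GE network is a candidate for carrying SyncIQ traffic over the slower path when the policy does not name pools.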

I hope this is of use to someone.

Regards,

Chris Pepper

Peter Serocka

Aug 23, 2014, 8:03:29 AM
to isilon-u...@googlegroups.com
Thanks for sharing this journey… so did you arrive at using static pools successfully?

— Peter

LinuxRox

Aug 23, 2014, 3:14:21 PM
to isilon-u...@googlegroups.com
On my cluster it's actually the other way around. We wanted to use 1G interfaces for SyncIQ only and 10G interfaces for client connectivity. The 1G interfaces are in a dedicated subnet with a static IP allocation pool, and we always restrict source nodes to this dedicated "replication" subnet. The same applies when specifying the target cluster: a SmartConnect zone is used as the target host, and only the nodes/interfaces in that SmartConnect zone can be connected to.



Chris Pepper

Aug 23, 2014, 7:36:20 PM
to isilon-u...@googlegroups.com
Peter,

I only found out what's going on on Friday. On 5 clusters (10 networks) we only have 2 static pools. I don't think we have enough IPs to add all the static pools SyncIQ would require. This is an unusual case because the 10GE network exists on the source and in DNS, but the target isn't connected until later this week, so I very much hope we can get SyncIQ running over the 10GE network without burning all those IPs. I certainly don't want to bring up additional Isilon-only subnets to make room for lots more static pools.

Chris

Saker Klippsten

Aug 24, 2014, 1:43:11 AM
to isilon-u...@googlegroups.com
Are you syncing over a WAN? or LAN?

Peter Serocka

Aug 24, 2014, 7:56:19 AM
to isilon-u...@googlegroups.com
Chris,

let me phrase it this way, you are about to find out
what is more painful in your environment, obtaining more
IPs or running SyncIQ with dynamic IPs...

But that issue can be separated from the other issue,
so let me focus on restricting to 10GE interfaces.

The SyncIQ Best Practices White Paper
https://support.emc.com/docu39531_White_Paper:_Best_Practices_for_Data_Replication_with_EMC_Isilon_SyncIQ.pdf
has a neat trick for this, on page 30.

Create static pools to collect the
10GE interfaces, but do not(!) assign IP addresses.

So these are more a kind of “interface pools” rather
than “IP pools” -- certainly a less obvious move.
Specify the pools for source and target respectively,
with --force_interface=on as shown on page 30.

That will restrict the SyncIQ traffic
to the 10GE lines, using whatever IPs
are present from *other* pools.

And in case you will have more IPs available
later on, you can just throw them into
these “interface pools” turning them
into real static “IP pools” ;-)
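
To make the idea concrete, here is a toy model in Python. It is only a reading of the trick as described in this message, not how OneFS implements it: the `Pool` class, the `usable_ips` function, the pool names, the interface labels, and the addresses are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Pool:
    name: str
    interfaces: Set[str]  # hypothetical labels such as "node1:10gige-1"
    ips: Dict[str, List[str]] = field(default_factory=dict)  # interface -> IPs


def usable_ips(restrict_to: Pool, all_pools: List[Pool]) -> List[str]:
    """IPs that live on the interfaces of restrict_to, whichever pool owns them."""
    found = []
    for pool in all_pools:
        for iface, ips in pool.ips.items():
            if iface in restrict_to.interfaces:
                found.extend(ips)
    return sorted(found)


# A dynamic pool that owns the 10GE IPs, and a campus pool on 1GE.
nfs_10g = Pool(
    "nfs-10g",
    {"node1:10gige-1", "node2:10gige-1"},
    {"node1:10gige-1": ["198.51.100.11"], "node2:10gige-1": ["198.51.100.12"]},
)
campus = Pool("campus", {"node1:ext-1"}, {"node1:ext-1": ["192.0.2.11"]})

# The "interface pool": same 10GE interfaces, but no IPs of its own.
synciq = Pool("synciq", {"node1:10gige-1", "node2:10gige-1"})

print(usable_ips(synciq, [nfs_10g, campus, synciq]))
```

In this model the restriction selects interfaces, and the addresses come from whichever other pools have placed IPs on those interfaces, so the 1GE campus address is never chosen.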

Hope this helps; best of luck!

— Peter

Chris Pepper

Aug 24, 2014, 10:52:35 AM
to isilon-u...@googlegroups.com
Saker,

SyncIQ runs over a WAN. We always put replication targets on the *other* side of the Hudson River.

Peter,

Each node has a static IP in a pool for administration, but they were not configured as SmartConnect zones. I will have to reconsider.

Thanks for the suggestion to assign nodes but not IPs to a pool for this purpose. That's perverse but useful. Only useful for the source side, presumably, since SyncIQ on the source cannot access the target pool if it has no IPs.

Chris


LinuxRox

Aug 24, 2014, 11:10:17 AM
to isilon-u...@googlegroups.com
Chris,

How do you have IPs in a pool for administration but no SmartConnect zone? When you created the IP pool, did you not specify anything for Zone Name?


Saker Klippsten

Aug 24, 2014, 11:40:12 AM
to isilon-u...@googlegroups.com
How big is that pipe? If a lack of IPs is the issue and you have just a handful, maybe assign them to a few nodes instead of all nodes to do the sync. It won't take much to saturate a GigE pipe. If it's 10 gig or dark fiber muxed out, you would need more, depending on the type of data.

Chris Pepper

Aug 24, 2014, 12:10:33 PM
to isilon-u...@googlegroups.com
LinuxRox,

We don't assign SmartConnect zone names to static node pools -- we use them only to access the nodes directly. We will revisit that.

Saker,

Our 10GE private network has 10GE WAN links, shared among multiple sites and nodes. The 1GE connections to the campus network have something like 80gbps aggregate bandwidth between sites. We also need to revisit which path we *want* SyncIQ to use...

Thanks, all

Chris