URL resolution issue


Stéphane Poss

Aug 5, 2014, 10:57:35 AM
to diracgri...@googlegroups.com
Hi,
I seem to have a problem with service URL resolution. I thought that if a URL wasn't available, the client would automatically try another one. This seems not to be the case:

Proceed and submit job(s)? y/[n] :
Job submission failure Can't connect to dips://54.77.33.240:9132/WorkloadManagement/JobManager: {'Message': "Could not connect to ('54.77.33.240', 9132): ('54.77.33.240', 9132): Can't connect: timed out", 'OK': False}

That host is the Elastic IP I have on Amazon, and it's off right now because I'm doing development and don't need or want to run it.

That yields lots of nasty little errors here and there, which keeps me from tracking down exactly where my problems come from...

Thanks,
S P

Stéphane Poss

Aug 5, 2014, 11:05:53 AM
to diracgri...@googlegroups.com
OK, I found the reason: at line 193 of BaseClient (in v6r10p23, I know I'm a bit late), the actual host URLs are randomized and the first one is taken. So basically it's not possible to have a machine down, as that would induce failures. Would it be possible to add a "ping" to that service and, if it does not reply, choose another URL until one works, and fail only if none is available?
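
To make the request concrete, here is a rough sketch of the selection logic. It is not the BaseClient code: the function names (tcp_probe, pick_reachable_url) and the 2 second timeout are made up for illustration, and the "ping" is assumed to be a plain TCP connect to the host and port taken from the URL.

```python
import random
import socket
from urllib.parse import urlparse


def tcp_probe(host, port, timeout):
    """Hypothetical 'ping': True if a TCP connection opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, DNS failure...
        return False


def pick_reachable_url(urls, probe=tcp_probe, timeout=2.0):
    """Shuffle the candidate URLs and return the first one that answers the probe.

    Raises RuntimeError only when no candidate answers.
    """
    candidates = list(urls)
    random.shuffle(candidates)  # keep the random spread over the servers
    for url in candidates:
        parsed = urlparse(url)  # host and port come from the netloc part
        if probe(parsed.hostname, parsed.port, timeout):
            return url
    raise RuntimeError("No reachable URL among: %s" % ", ".join(candidates))
```

The probe is passed in as a parameter so it could be swapped for a real service-level ping. With a plain TCP connect, a dead host costs at most one timeout before the next URL is tried.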

Thanks,
S Poss




Adrià Casajús

Sep 23, 2014, 8:17:28 AM
to diracgri...@googlegroups.com
That would slow down all connections to any server. If you want to do an intervention on a host, remove it from the URL list, do whatever is needed, and add it again. That's the easiest way.

Stéphane Poss

Sep 23, 2014, 9:07:06 AM
to diracgri...@googlegroups.com
Hi Adria,
What I meant was that at the first connection, when the server URL is chosen, a ping could be issued. I didn't mean to say that at each connection there should be a ping, as that would be wrong...

The idea is that in case there is a server down when restarting agents/services, things do not fail in a nasty way. It surely does not prevent failures further down...
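
A small sketch of the "probe once, then reuse" behaviour, under the assumption that the client object lives for many calls. The class name LazyEndpoint and the probe callback are hypothetical, not anything that exists in DIRAC.

```python
class LazyEndpoint:
    """Hypothetical helper: pick a URL on first use, then stick with it."""

    def __init__(self, urls, probe):
        self._urls = list(urls)
        self._probe = probe      # callable(url) -> bool, the "ping"
        self._chosen = None
        self.probe_calls = 0     # counter kept only to show the cost

    def url(self):
        # Only the very first call pays for the probing.
        if self._chosen is None:
            for candidate in self._urls:
                self.probe_calls += 1
                if self._probe(candidate):
                    self._chosen = candidate
                    break
            else:
                raise RuntimeError("No reachable URL")
        return self._chosen
```

After the first call to url(), later calls return the cached choice without probing, so regular connections are not slowed down. As noted above, this does nothing for a server that dies after it has been chosen.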

Cheers,
SP

Christophe HAEN

Sep 24, 2014, 6:51:06 AM
to diracgri...@googlegroups.com
I second Stephane's request. We should have the possibility to define several URLs that would work in round-robin or failover mode.
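
For illustration only, the difference between the two modes could look like the sketch below. Both helpers are hypothetical; they only decide the order in which URLs would be tried and do not connect to anything.

```python
def failover_order(urls):
    """Failover mode: always try the URLs in their configured priority order."""
    return list(urls)


class RoundRobin:
    """Round-robin mode: each request starts from the next URL in the list."""

    def __init__(self, urls):
        if not urls:
            raise ValueError("at least one URL is required")
        self._urls = list(urls)
        self._next = 0

    def order(self):
        n = len(self._urls)
        start = self._next
        self._next = (self._next + 1) % n  # the next request starts one further
        # The remaining URLs follow as fallbacks for this request.
        return [self._urls[(start + i) % n] for i in range(n)]
```

In failover mode the first URL takes all the traffic while it is up. In round-robin mode the load is spread, and the rest of the list still serves as a fallback for each request.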