Testing UCX with a high number of workers


Ahmet Uyar

Sep 4, 2020, 10:16:27 AM
to Twister2
Hi Chathura,

I tested UCX by running TeraSort with more than 200 workers on victor in standalone mode.
It sometimes works, but most of the time it fails with the following exception:
Caused by: org.openucx.jucx.UcxException: Destination is unreachable
        at org.openucx.jucx.ucp.UcpListener.createUcpListener(Native Method)
        at org.openucx.jucx.ucp.UcpListener.<init>(UcpListener.java:25)
        at org.openucx.jucx.ucp.UcpWorker.newListener(UcpWorker.java:49)
        at edu.iu.dsc.tws.comms.ucx.TWSUCXChannel.createUXCWorker(TWSUCXChannel.java:100)
        at edu.iu.dsc.tws.comms.ucx.TWSUCXChannel.<init>(TWSUCXChannel.java:86)
        ... 13 more

I attached the logs. 

Ahmet

auyar-terasort-obhqr17.log

Chathura Widanage

unread,
Sep 4, 2020, 10:28:21 AM9/4/20
to Ahmet Uyar, Twister2
Hi Ahmet,

When it does happen to work, do you still see the log lines below?

[1599227712.059761] [v-014:184865:0]       context.cc:48   UCX  WARN  JUCX: no such key UCX_SOCKADDR_TLS_PRIORITY, ignoring
[1599227712.187487] [v-011:159084:0]   ucp_listener.c:429  UCX  ERROR none of the available transports can listen for connections on 172.29.200.211:39732

Regards,
Chathura



Ahmet Uyar

Sep 4, 2020, 10:39:20 AM
to Chathura Widanage, Twister2
The WARN messages are always there, but the ERROR message is printed only on failures.

Ahmet

Chathura Widanage

Sep 4, 2020, 11:02:34 AM
to Ahmet Uyar, Twister2
I raised this as an issue in the UCX mailing list. Let's wait for their response.

Regards,
Chathura

Chathura Widanage

Sep 4, 2020, 11:22:01 AM
to Ahmet Uyar, Twister2
Ahmet,

I have applied a small fix based on Peter's feedback. Could you please rerun with the change below?


Regards,
Chathura

Ahmet Uyar

Sep 4, 2020, 11:35:28 AM
to Chathura Widanage, Twister2
I am testing that. Thanks.

Ahmet