Fwd: Twister2 debug session


Ahmet Uyar

Sep 16, 2020, 7:05:24 AM
to chathura widanage, Twister2
Hi Chathura,

When I set UCX_TCP_CM_ALLOW_ADDR_INUSE in mpiworker.sh as follows:
export UCX_TCP_CM_ALLOW_ADDR_INUSE=y

I get the warning below. So our UCX version probably does not include the commit that Peter mentions.
[1600254107.765897] [v-015:73504:0]         parser.c:1626 UCX  WARN  unused env variable: UCX_TCP_CM_ALLOW_ADDR_INUSE (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning)

Ahmet

---------- Forwarded message ---------
From: Peter Rudenko <prud...@nvidia.com>
Date: Wed, Sep 16, 2020 at 12:51 PM
Subject: RE: Twister2 debug session
To: Chathura Widanage <chathura...@gmail.com>, Yossi Itigin <yos...@nvidia.com>
Cc: Ahmet Uyar <ahme...@gmail.com>, Alina Sklarevich <ali...@mellanox.com>, Vibhatha Abeykoon <vibh...@gmail.com>, Yossi Itigin <yos...@mellanox.com>, Peter Rudenko <pet...@mellanox.com>, Supun Kamburugamuve <sup...@gmail.com>, Dmitry Gladkov <dmit...@mellanox.com>


Hi, you need ucx from master with this PR: https://github.com/openucx/ucx/commit/086140eb89d6f621fe3547c95bb909a6ac779c41

 

Then you can set: export UCX_TCP_CM_ALLOW_ADDR_INUSE=y

 

 

Thanks,

Peter Rudenko
Mellanox Software Engineer
+380 67 9348614

NVIDIA

 

From: Chathura Widanage <chathura...@gmail.com>
Sent: Wednesday, September 16, 2020 3:01 AM
To: Yossi Itigin <yos...@nvidia.com>
Cc: Ahmet Uyar <ahme...@gmail.com>; Peter Rudenko <prud...@nvidia.com>; Alina Sklarevich <ali...@mellanox.com>; Vibhatha Abeykoon <vibh...@gmail.com>; Yossi Itigin <yos...@mellanox.com>; Peter Rudenko <pet...@mellanox.com>; Supun Kamburugamuve <sup...@gmail.com>; Dmitry Gladkov <dmit...@mellanox.com>
Subject: Re: Twister2 debug session

 

Hi,

 

How can we set SO_REUSEADDR on the TCP server socket opened by UCX from JUCX?
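
For readers unfamiliar with the option: a minimal sketch of what SO_REUSEADDR looks like on an ordinary Java server socket. This uses plain java.net only and does not touch UCX or JUCX; the class and method names are made up for illustration, and the thread does not say how (or whether) JUCX exposes the option.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

final class ReuseAddrSketch {
  // Opens a listening socket with SO_REUSEADDR enabled.
  // Plain java.net illustration only; not a JUCX API.
  static ServerSocket openReusable(int port) throws IOException {
    ServerSocket server = new ServerSocket(); // unbound, so the option can still be set
    server.setReuseAddress(true);             // has to happen before bind()
    server.bind(new InetSocketAddress(port));
    return server;
  }

  // Opens and closes such a socket, reporting whether it bound with the option enabled.
  static boolean reuseAddressTakesEffect(int port) {
    try (ServerSocket server = openReusable(port)) {
      return server.isBound() && server.getReuseAddress();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Passing port 0 lets the operating system pick a free port, which is convenient for a quick check.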


Regards,

Chathura

 

 

On Sun, Sep 6, 2020 at 5:59 AM Yossi Itigin <yos...@nvidia.com> wrote:

Hi,

 

I think there is a high chance that increasing this value would solve the issue.

In MPI flows, there are several retries for connect(), which could be why it works even with a larger number of workers.
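
The retry idea mentioned above can be sketched generically as follows. This is not how OpenMPI or UCX implement it (the thread does not show that); the `Attempt` interface, the method name, and the doubling backoff are all assumptions made for illustration.

```java
final class ConnectRetrySketch {
  /** Hypothetical stand-in for one connection attempt; returns true on success. */
  interface Attempt {
    boolean tryOnce();
  }

  // Retries the attempt up to maxAttempts times, doubling the sleep between tries.
  // Returns the number of attempts used, or -1 if every attempt failed.
  static int connectWithRetries(Attempt attempt, int maxAttempts, long initialDelayMs) {
    long delay = initialDelayMs;
    for (int i = 1; i <= maxAttempts; i++) {
      if (attempt.tryOnce()) {
        return i;
      }
      if (i < maxAttempts) {
        try {
          Thread.sleep(delay);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // preserve the interrupt and give up
          return -1;
        }
        delay *= 2;
      }
    }
    return -1;
  }
}
```

With retries like this, a listener whose backlog is briefly full gets a second chance instead of failing the whole job on the first refused connection.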

 

--Yossi

 

From: Ahmet Uyar <ahme...@gmail.com>
Sent: Saturday, 5 September 2020 12:39
To: Yossi Itigin <yos...@nvidia.com>
Cc: Chathura Widanage <chathura...@gmail.com>; Peter Rudenko <prud...@nvidia.com>; Alina Sklarevich <ali...@mellanox.com>; Vibhatha Abeykoon <vibh...@gmail.com>; Yossi Itigin <yos...@mellanox.com>; Peter Rudenko <pet...@mellanox.com>; Supun Kamburugamuve <sup...@gmail.com>; Dmitry Gladkov <dmit...@mellanox.com>
Subject: Re: Twister2 debug session

 

Hi Yossi,

 

The value of /proc/sys/net/core/somaxconn is 128. 

However, I cannot increase it, since I do not have root permissions.

If you think that the problem might be related to this, I can ask our system admin to increase it. 

 

By the way, when I run the same application with OpenMPI, there is no problem. It works fine even with 400 workers. 

 

many thanks, 

 

Ahmet

 

 

 

On Fri, Sep 4, 2020 at 10:07 PM Yossi Itigin <yos...@nvidia.com> wrote:

Hi,

 

Perhaps there are too many connection attempts at the same time.

What is the value in /proc/sys/net/core/somaxconn? Can you please try increasing it (say, to 1024)?
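
A small sketch of reading that value from Java and comparing it with the worker count. The class and method names are hypothetical, and the assumption that up to N-1 peers may connect to one listener at once is mine, not something the thread confirms.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.OptionalInt;

final class SomaxconnSketch {
  // Reads a single integer from the given file, e.g. /proc/sys/net/core/somaxconn.
  // Returns empty if the file is missing or does not contain a number.
  static OptionalInt readSomaxconn(Path path) {
    try {
      String text = new String(Files.readAllBytes(path), StandardCharsets.UTF_8).trim();
      return OptionalInt.of(Integer.parseInt(text));
    } catch (IOException | NumberFormatException e) {
      return OptionalInt.empty();
    }
  }

  // True when more peers may connect at once than the listen backlog limit allows.
  static boolean backlogMayOverflow(int simultaneousPeers, int somaxconn) {
    return simultaneousPeers > somaxconn;
  }
}
```

Reading the file needs no root permissions; only changing the value does.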

 

From: Ahmet Uyar <ahme...@gmail.com>
Sent: Friday, 4 September 2020 21:19
To: Chathura Widanage <chathura...@gmail.com>
Cc: Peter Rudenko <prud...@nvidia.com>; Alina Sklarevich <ali...@mellanox.com>; Vibhatha Abeykoon <vibh...@gmail.com>; Yossi Itigin <yos...@mellanox.com>; Peter Rudenko <pet...@mellanox.com>; Supun Kamburugamuve <sup...@gmail.com>; Dmitry Gladkov <dmit...@mellanox.com>
Subject: Re: Twister2 debug session

 

Hi Peter and all, 

 

This is from another run with 240 workers. In this case, after logging these error messages, the application just hangs. I had to kill it forcefully.

These errors seem to be different. 

 

Ahmet

 

On Fri, Sep 4, 2020 at 7:22 PM Ahmet Uyar <ahme...@gmail.com> wrote:

Sorry, the log file did not have the error messages. I added them to the attached file.

 

Ahmet

 

On Fri, Sep 4, 2020 at 7:19 PM Ahmet Uyar <ahme...@gmail.com> wrote:

Hi Peter,

 

We have set the UCX parameters as:

.setConfig("SOCKADDR_CM_ENABLE", "y")
.setConfig("SOCKADDR_TLS_PRIORITY", "tcp")

Now we do not get warnings about env variables.
It works fine with a smaller number of workers, such as 100.
However, when we test with 240 workers, we get the attached errors.

Ahmet

 

On Fri, Sep 4, 2020 at 6:26 PM Chathura Widanage <chathura...@gmail.com> wrote:

Hi Peter,

 

Attached are the logs for all 240 worker instances. We are seeing the same WARN/ERROR repeated. Hopefully, this will be fixed once we drop the UCX prefix. Regarding the listener failure, we see only the two log lines below.

 

[1599227712.187487] [v-011:159084:0]   ucp_listener.c:429  UCX  ERROR none of the available transports can listen for connections on 172.29.200.211:39732
[1599227712.187522] [v-011:159084:0]      listener.cc:53   UCX  ERROR JUCX: Destination is unreachable


Regards,

Chathura

 

 

On Fri, Sep 4, 2020 at 11:12 AM Peter Rudenko <prud...@nvidia.com> wrote:

Hi Chathura,

 

  1. When you set config programmatically, there is no need for the UCX prefix. It’s only for environment variables. So just set: .setConfig("SOCKADDR_CM_ENABLE", "y")
  2. Can you please send the full UCX logs for the failed listener?
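
The prefix rule in point 1 can be sketched as a small helper that turns environment-style names into the keys passed to setConfig. The class and method names are hypothetical and are not part of JUCX.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class UcxConfigKeys {
  private static final String ENV_PREFIX = "UCX_";

  // Strips the environment-variable prefix, leaving names without it untouched.
  static String toProgrammaticKey(String envName) {
    return envName.startsWith(ENV_PREFIX) ? envName.substring(ENV_PREFIX.length()) : envName;
  }

  // Converts a whole map of environment-style settings, preserving insertion order.
  static Map<String, String> toProgrammaticConfig(Map<String, String> envStyle) {
    Map<String, String> result = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : envStyle.entrySet()) {
      result.put(toProgrammaticKey(e.getKey()), e.getValue());
    }
    return result;
  }
}
```

Each resulting entry would then be handed to `.setConfig(key, value)`, as in the call quoted in point 1.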

 

BR,

 

Peter Rudenko
Mellanox Software Engineer
+1 380 67 9348614

NVIDIA

 

From: Chathura Widanage <chathura...@gmail.com>
Sent: Friday, September 4, 2020 6:00 PM
To: Alina Sklarevich <ali...@mellanox.com>; Vibhatha Abeykoon <vibh...@gmail.com>
Cc: Yossi Itigin <yos...@mellanox.com>; Ahmet Uyar <ahme...@gmail.com>; Peter Rudenko <pet...@mellanox.com>; Supun Kamburugamuve <sup...@gmail.com>; Dmitry Gladkov <dmit...@mellanox.com>
Subject: Re: Twister2 debug session

 

Hi,

 

Could you please clarify the two issues below?

 

1. We tried configuring UCX from Java as follows.

 

UcpContext context = new UcpContext(new UcpParams().requestTagFeature()
    .setMtWorkersShared(false)
    .setConfig("UCX_SOCKADDR_CM_ENABLE", "y")
    .setConfig("UCX_SOCKADDR_TLS_PRIORITY", "tcp"));

 

But we are getting the WARN and ERROR below at runtime.

 

[1599227712.059761] [v-014:184865:0]       context.cc:48   UCX  WARN  JUCX: no such key UCX_SOCKADDR_TLS_PRIORITY, ignoring
[1599227712.187487] [v-011:159084:0]   ucp_listener.c:429  UCX  ERROR none of the available transports can listen for connections on 172.29.200.211:39732

 

2. When we create a UcpListener, does it block at the newListener() call until UCX internally starts listening?

 

UcpListener ucpListener = ucpWorker.newListener(new UcpListenerParams().setSockAddr(
    new InetSocketAddress(iWorkerController.getWorkerInfo().getWorkerIP(),
        iWorkerController.getWorkerInfo().getPort())));

We end up getting the ERROR below along with the two messages above.

 

[1599227712.187522] [v-011:159084:0]      listener.cc:53   UCX  ERROR JUCX: Destination is unreachable

 

We are using the below revision.

 

 

Thanks a lot!


Regards,

Chathura

 

 

On Mon, Jul 20, 2020 at 9:16 PM Chathura Widanage <chathura...@gmail.com> wrote:

Adding Vibhatha.

 

Regards,

Chathura

 

 

On Mon, Jul 20, 2020 at 1:02 PM Chathura Widanage <chathura...@gmail.com> wrote:

Thanks Alina. We will check this out.

 

Regards,

Chathura

 

 

On Mon, Jul 20, 2020 at 12:54 PM Alina Sklarevich <ali...@mellanox.com> wrote:

Hi,

 

In order to use the new TCP connection manager in the UCX master branch, please set the following environment parameters:

UCX_SOCKADDR_CM_ENABLE=y 

UCX_SOCKADDR_TLS_PRIORITY=tcp

 

Thanks,

Alina.

 

From: Yossi Itigin <yos...@mellanox.com>
Sent: Monday, July 20, 2020 19:22
To: Chathura Widanage <chathura...@gmail.com>; Alina Sklarevich <ali...@mellanox.com>
Cc: Ahmet Uyar <ahme...@gmail.com>; Peter Rudenko <pet...@mellanox.com>; Supun Kamburugamuve <sup...@gmail.com>; Dmitry Gladkov <dmit...@mellanox.com>
Subject: RE: Twister2 debug session

 

Hi,

 

The TCP connection manager fixes are currently available in the UCX master branch.

@Alina Sklarevich which parameters do we need to pass to UCX in order to use it?

 

From: Chathura Widanage <chathura...@gmail.com>
Sent: Monday, 20 July 2020 18:57
To: Yossi Itigin <yos...@mellanox.com>
Cc: Ahmet Uyar <ahme...@gmail.com>; Peter Rudenko <pet...@mellanox.com>; Supun Kamburugamuve <sup...@gmail.com>; Dmitry Gladkov <dmit...@mellanox.com>; Alina Sklarevich <ali...@mellanox.com>
Subject: Re: Twister2 debug session

 

Hi Yossi and UCX team,

 

I hope you all are doing good. We released twister2 v0.7.0 on 17th Friday. 

Could you please let us know whether the latest release of UCX (1.8.1) includes the fix mentioned above?

 

Regards,

Chathura

 

 

On Sun, Apr 5, 2020 at 10:41 AM Chathura Widanage <chathura...@gmail.com> wrote:

Hi Yossi,

Thanks for the update. Yeah! We are fine with the July timeframe. Let us know, once this component is available in your master branch, so we also would be able to test the new implementation on our environments. 

Regards,

Chathura

 

 

On Sun, Apr 5, 2020 at 8:28 AM Yossi Itigin <yos...@mellanox.com> wrote:

Hi,

 

Peter managed to reproduce the issues on our cluster, and it seems the problem is in the TCP connection manager (“sockcm”).

We are planning to replace this component in the next UCX release (v1.9, July timeframe).

We’re wondering whether it would be acceptable to wait for the new TCP connection manager to fix these issues.

 

Thanks,

--Yossi

 

From: Ahmet Uyar <ahme...@gmail.com>
Sent: Wednesday, 1 April 2020 15:41
To: Peter Rudenko <pet...@mellanox.com>
Cc: Chathura Widanage <chathura...@gmail.com>; Supun Kamburugamuve <sup...@gmail.com>; Yossi Itigin <yos...@mellanox.com>
Subject: Re: Twister2 debug session

 

I set twister2 logging level to WARNING. My logs are small I think. 

in conf/common/logger.properties file,

I set:

 .level=WARNING

edu.iu.dsc.tws.level=WARNING
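
The same setting can also be applied in code. A minimal sketch, assuming Twister2 logs through java.util.logging (the properties syntax above suggests this, but the thread does not state it); the helper class name is hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class LogLevelSketch {
  // Raises the threshold of the named logger so that INFO and below are dropped.
  static Logger quiet(String loggerName) {
    Logger logger = Logger.getLogger(loggerName);
    logger.setLevel(Level.WARNING);
    return logger;
  }
}
```

Calling `LogLevelSketch.quiet("edu.iu.dsc.tws")` mirrors the second line of the properties file; keep a reference to the returned logger so the setting is not lost to garbage collection.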

 

Ahmet

 

 

On Wed, Apr 1, 2020 at 3:34 PM Peter Rudenko <pet...@mellanox.com> wrote:

Yes, I’m using tee to collect the logs, but it creates a 600 MB file with 250 workers:

 

./bin/twister2 submit standalone jar examples/libexamples-java.jar   edu.iu.dsc.tws.examples.batch.terasort.TeraSort  -size 1   -valueSize 90   -keySize 10   -instances 250   -instanceCPUs 1   -instanceMemory 4000   -sources 125  -sinks 125   -memoryBytesLimit 200   -fileSizeBytes 1000  2>&1 | tee twister2.log

 

From: Ahmet Uyar <ahme...@gmail.com>
Sent: Wednesday, April 1, 2020 3:32 PM
To: Peter Rudenko <pet...@mellanox.com>
Cc: Chathura Widanage <chathura...@gmail.com>; Supun Kamburugamuve <sup...@gmail.com>; Yossi Itigin <yos...@mellanox.com>
Subject: Re: Twister2 debug session

 

Are you running twister2 in standalone mode with OpenMPI?

 

Ahmet

 

On Wed, Apr 1, 2020 at 3:19 PM Peter Rudenko <pet...@mellanox.com> wrote:

Hi, thanks, one more question. When I submit a twister2 job, I see logs aggregated from all the nodes. Some logs also appear in the jobs/job-id/logs directory.

 

E.g. in code I do:

LOG.info("log1");
LOG.info("log2");

 

Then I may see log1 on the master node in the aggregated logs, and log2 in jobs/jobid1/logs/worker-123/worker-123.log

 

Is it possible to capture stdout to the log files? If I run with UCX_LOG_LEVEL=debug, it prints logs from all workers, but the log files are empty.

 

Thanks,

Peter

 

From: Ahmet Uyar <ahme...@gmail.com>
Sent: Tuesday, March 31, 2020 6:16 PM
To: Peter Rudenko <pet...@mellanox.com>
Cc: Chathura Widanage <chathura...@gmail.com>; Supun Kamburugamuve <sup...@gmail.com>; Yossi Itigin <yos...@mellanox.com>
Subject: Re: Twister2 debug session

 

Hi Peter,

 

I added the following parameter to the conf/common/network.yaml file:

twister2.network.interfaces.for.workers: ['interface1', 'interface2']

 

With this, you can specify the network interfaces for workers to use.

If this parameter is not set, we get the IP address from InetAddress.getLocalHost().getHostAddress().

If this parameter is set, the worker uses the first IP address it finds, from the interfaces in this list, that is up and is not a loopback address.
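
The selection rule described above can be sketched with the standard java.net.NetworkInterface API. This is an illustration only, not the code in the ahmet/scalability-tests branch; the class and method names are hypothetical.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

final class WorkerAddressSketch {
  // Walks the configured interface names in order and returns the first address
  // that belongs to an interface which exists, is up, and is not loopback.
  static Optional<InetAddress> firstUsableAddress(List<String> interfaceNames) {
    for (String name : interfaceNames) {
      try {
        NetworkInterface nif = NetworkInterface.getByName(name); // null if no such interface
        if (nif == null || !nif.isUp() || nif.isLoopback()) {
          continue;
        }
        for (InetAddress addr : Collections.list(nif.getInetAddresses())) {
          if (!addr.isLoopbackAddress()) {
            return Optional.of(addr);
          }
        }
      } catch (SocketException e) {
        // An interface that cannot be queried is treated as unusable; try the next one.
      }
    }
    return Optional.empty();
  }
}
```

When the result is empty, a caller could fall back to InetAddress.getLocalHost().getHostAddress(), matching the behavior described for the unset parameter.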

 

This is not in the master branch yet; you can use it from the ahmet/scalability-tests branch.

 

Hope it works, all the best,

 

Ahmet

 

 

 

On Mon, Mar 30, 2020 at 6:31 PM Ahmet Uyar <ahme...@gmail.com> wrote:

Hi Peter,

 

Currently, there is no configuration parameter for this. We will try to implement it quickly and let you know. 

 

Ahmet

 

 

On Mon, Mar 30, 2020 at 2:44 PM Peter Rudenko <pet...@mellanox.com> wrote:

Hi, quick question: in my environment UcxListener starts listening on the public network interface (which is a slow network interface). How can I tell twister2 which network interface to use, so that iWorkerController.getWorkerInfo().getWorkerIP() returns the correct IP? I’ve set conf/standalone/nodes to the hostnames of the correct netiface, but it still starts on the public one.

 

Thanks,

Peter

 

From: Chathura Widanage <chathura...@gmail.com>
Sent: Friday, March 27, 2020 10:02 PM
To: Peter Rudenko <pet...@mellanox.com>
Cc: Ahmet Uyar <ahme...@gmail.com>; Supun Kamburugamuve <sup...@gmail.com>; Yossi Itigin <yos...@mellanox.com>
Subject: Re: Twister2 debug session

 

One ucpWorker.progress call couldn’t (and wouldn’t) be enough to “internally progress all the send, recv events”.


I misread this part. Yes, we make multiple progress calls until (the worker has received END from all the sources && the worker has successfully sent its own messages to the destinations). We don't want to keep calling ucpWorker.progress in a loop on a condition, as you suggested in 2, since that blocks the thread. Instead, we call progress periodically until the above condition is met.

Regards,

Chathura

 

 

On Fri, Mar 27, 2020 at 4:00 PM Peter Rudenko <pet...@mellanox.com> wrote:

https://github.com/openucx/ucx/blob/master/src/ucp/api/ucp.h#L1449

 

/**
 * @ingroup UCP_WORKER
 * @brief Progress all communications on a specific worker.
 *
 * This routine explicitly progresses all communication operations on a worker.
 *
 * @note
 * @li Typically, request wait and test routines call @ref
 * ucp_worker_progress "this routine" to progress any outstanding operations.
 * @li Transport layers, implementing asynchronous progress using threads,
 * require callbacks and other user code to be thread safe.
 * @li The state of communication can be advanced (progressed) by blocking
 * routines. Nevertheless, the non-blocking routines can not be used for
 * communication progress.
 *
 * @param [in]  worker    Worker to progress.
 *
 * @return Non-zero if any communication was progressed, zero otherwise.
 */
unsigned ucp_worker_progress(ucp_worker_h worker);

 

worker.progress() is responsible for connection wireup communication and other communication that is not visible to the user.

 

So if you submit 1 request and then call worker.progress(), it is not guaranteed that this request has completed, even if progress() returned a non-zero number. You need to keep calling progress as long as you have uncompleted requests or have not received your callbacks.
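
The point can be illustrated with a toy model. Everything below is a stand-in: `FakeWorker` and `FakeRequest` are not JUCX classes, and the rule "one request completes per progress() call" is an assumption chosen only to make the effect visible.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

final class ProgressLoopSketch {
  /** Stand-in for a request handle; not the JUCX UcpRequest class. */
  static final class FakeRequest {
    private boolean completed;

    boolean isCompleted() {
      return completed;
    }
  }

  /** Stand-in worker that completes at most one pending request per progress() call. */
  static final class FakeWorker {
    private final Queue<FakeRequest> pending = new ArrayDeque<>();

    FakeRequest submit() {
      FakeRequest request = new FakeRequest();
      pending.add(request);
      return request;
    }

    // Returns non-zero if something was progressed, zero otherwise.
    int progress() {
      FakeRequest next = pending.poll();
      if (next == null) {
        return 0;
      }
      next.completed = true;
      return 1;
    }
  }

  // Calls progress() while any request is still incomplete; returns the number of calls.
  // The requests must have been submitted to this worker, otherwise the loop never ends.
  static int progressUntilAllComplete(FakeWorker worker, List<FakeRequest> requests) {
    int calls = 0;
    while (requests.stream().anyMatch(r -> !r.isCompleted())) {
      worker.progress();
      calls++;
    }
    return calls;
  }
}
```

With three submitted requests, a single progress() call returns non-zero but leaves the last request incomplete, which is why the loop condition checks the requests rather than the return value.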

 

Thanks,

Peter

 

 

From: Chathura Widanage <chathura...@gmail.com>
Sent: Friday, March 27, 2020 9:49 PM
To: Peter Rudenko <pet...@mellanox.com>
Cc: Ahmet Uyar <ahme...@gmail.com>; Supun Kamburugamuve <sup...@gmail.com>; Yossi Itigin <yos...@mellanox.com>
Subject: Re: Twister2 debug session

 

Hi,

With the last message from each source, they send a flag indicating the END. That's how we determine when to stop progressing (a worker stops once it has received END from all the sources and has successfully sent all of its messages to its destinations).

Could you please explain the behavior of ucpWorker.progress()? If it doesn't progress the send and recv events internally, what is the purpose of this method?

Regards,

Chathura

 

 

On Fri, Mar 27, 2020 at 3:43 PM Peter Rudenko <pet...@mellanox.com> wrote:

Hi,

 

  1. One ucpWorker.progress call couldn’t (and wouldn’t) be enough to “internally progress all the send, recv events”.
  2. I would recommend progressing on some condition, e.g. while (!requestMap.isEmpty()) progress();
  3. I don’t see any calls to

@Override
public void progressSends() {
  this.progress();
}

How does it progress send requests?

  4. Here’s a stack trace:

"MPIWorker-52" #1 prio=5 os_prio=0 tid=0x00007ffff000d800 nid=0x6423 runnable [0x00007ffff7fce000]
   java.lang.Thread.State: RUNNABLE
        at org.openucx.jucx.ucp.UcpWorker.progressWorkerNative(Native Method)
        at org.openucx.jucx.ucp.UcpWorker.progress(UcpWorker.java:63)
        at edu.iu.dsc.tws.comms.ucx.TWSUCXChannel.progress(TWSUCXChannel.java:266)
        at edu.iu.dsc.tws.executor.threading.BatchSharingExecutor2.runExecution(BatchSharingExecutor2.java:181)
        at edu.iu.dsc.tws.executor.threading.BatchSharingExecutor2.execute(BatchSharingExecutor2.java:119)
        at edu.iu.dsc.tws.task.impl.TaskExecutor.execute(TaskExecutor.java:195)
        at edu.iu.dsc.tws.task.impl.TaskExecutor.execute(TaskExecutor.java:208)
        at edu.iu.dsc.tws.examples.batch.terasort.TeraSort.execute(TeraSort.java:216)
        at edu.iu.dsc.tws.rsched.schedulers.standalone.MPIWorker.startWorker(MPIWorker.java:419)
        at edu.iu.dsc.tws.rsched.schedulers.standalone.MPIWorker.<init>(MPIWorker.java:174)
        at edu.iu.dsc.tws.rsched.schedulers.standalone.MPIWorker.main(MPIWorker.java:233)

 

Where do you decide when to stop progressing?

 

Sorry for the basic questions; I’m trying to better understand the twister2 architecture to figure out where the issue is.

 

Thanks,
Peter

 

From: Chathura Widanage <chathura...@gmail.com>
Sent: Friday, March 27, 2020 9:26 PM
To: Peter Rudenko <pet...@mellanox.com>
Cc: Ahmet Uyar <ahme...@gmail.com>; Supun Kamburugamuve <sup...@gmail.com>; Yossi Itigin <yos...@mellanox.com>
Subject: Re: Twister2 debug session

 

Hi Peter,

When Twister2 starts, the worker posts recv requests to all possible sources by calling the method below.

https://github.com/DSC-SPIDAL/twister2/blob/b8d53eaf9cd65580cd5aba834e74df8f99a80de6/twister2/comms/src/java/edu/iu/dsc/tws/comms/ucx/TWSUCXChannel.java#L250

Queue<DataBuffer> receiveBuffers contains the available buffers.

Then we rely completely on the UcpWorker.progress() method, and we assume it internally progresses all the send and recv events.

So if the worker receives something, we expect UCX to call the callback below.

https://github.com/DSC-SPIDAL/twister2/blob/b8d53eaf9cd65580cd5aba834e74df8f99a80de6/twister2/comms/src/java/edu/iu/dsc/tws/comms/ucx/TWSUCXChannel.java#L215

Then we call the below callback.

https://github.com/DSC-SPIDAL/twister2/blob/b8d53eaf9cd65580cd5aba834e74df8f99a80de6/twister2/comms/src/java/edu/iu/dsc/tws/comms/ucx/TWSUCXChannel.java#L228

Then twister2 starts processing that buffer and, once done, adds the buffer back to Queue<DataBuffer> receiveBuffers.

 

Then it posts another recv request to the same source. This continues until the end of the job.
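
The buffer cycle described above can be sketched as follows. This is a simplified stand-in, not the TWSUCXChannel code: it uses java.nio.ByteBuffer instead of Twister2's DataBuffer, leaves out the actual UCX receive call, and all names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Queue;

final class RecvBufferCycleSketch {
  private final Queue<ByteBuffer> receiveBuffers = new ArrayDeque<>();
  private int postedReceives; // receives currently holding a buffer

  RecvBufferCycleSketch(int bufferCount, int bufferSize) {
    for (int i = 0; i < bufferCount; i++) {
      receiveBuffers.add(ByteBuffer.allocate(bufferSize));
    }
  }

  // Posts a receive only if a free buffer exists.
  // Returns the buffer that would be handed to the transport, or null if none is free.
  ByteBuffer postReceiveIfPossible() {
    ByteBuffer buffer = receiveBuffers.poll();
    if (buffer != null) {
      postedReceives++;
    }
    return buffer;
  }

  // Called after the application has finished processing a completed receive:
  // the buffer goes back to the queue so that a new receive can be posted.
  void release(ByteBuffer buffer) {
    buffer.clear();
    receiveBuffers.add(buffer);
    postedReceives--;
  }

  int freeBuffers() {
    return receiveBuffers.size();
  }

  int postedReceives() {
    return postedReceives;
  }
}
```

The number of outstanding receives is therefore bounded by the number of buffers, whatever the number of progress calls.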

 

-----

 

Do we need to manually call progress on receive requests instead of relying on the ucxWorker.progress()?

 

Regards,

Chathura

 

 

On Fri, Mar 27, 2020 at 2:11 PM Chathura Widanage <chathura...@gmail.com> wrote:

Hi Peter, 

 

It doesn't post N recv requests at every iteration. It does that only if buffers are available. For buffers to be available, previous recvs must have completed. So if any of the previous recvs from source X has completed, it posts a new recv request to the same source X. It goes through this cycle continuously.

 

Regards,

Chathura

 

On Fri, Mar 27, 2020, 14:02 Peter Rudenko <pet...@mellanox.com> wrote:

I’m confused about this method: https://github.com/DSC-SPIDAL/twister2/blob/ahmet/scalability-tests/twister2/comms/src/java/edu/iu/dsc/tws/comms/ucx/TWSUCXChannel.java#L208

 

Does it submit N recv requests on every iteration of progress?

https://github.com/DSC-SPIDAL/twister2/blob/ahmet/scalability-tests/twister2/comms/src/java/edu/iu/dsc/tws/comms/ucx/TWSUCXChannel.java#L264

 

And it doesn’t actually progress.

 

Best Regards,

Peter Rudenko

Software engineer | Mellanox Technologies Ltd.

+380679348614

 

 

From: Chathura Widanage <chathura...@gmail.com>
Sent: Friday, March 27, 2020 7:52 PM
To: Peter Rudenko <pet...@mellanox.com>
Cc: Ahmet Uyar <ahme...@gmail.com>; Supun Kamburugamuve <sup...@gmail.com>; Yossi Itigin <yos...@mellanox.com>
Subject: Re: Twister2 debug session

 

Hi Peter,

We don't expect anything to complete during the progress() call. The Twister2 thread calls progress() repeatedly until the communication completes (but this is not the only method it calls during a progress cycle). The reason is that we don't want the thread to be blocked in this method. Do you think this behavior could lead to issues?

Regards,

Chathura

 

 

On Fri, Mar 27, 2020 at 1:44 PM Peter Rudenko <pet...@mellanox.com> wrote:

Hi, it seems I found a problem:

https://github.com/DSC-SPIDAL/twister2/blob/ahmet/scalability-tests/twister2/comms/src/java/edu/iu/dsc/tws/comms/ucx/TWSUCXChannel.java#L266

 

/**
* This routine explicitly progresses all communication operations on a worker.
* @return Non-zero if any communication was progressed, zero otherwise.
*/

 

Worker.progress() doesn’t guarantee that ALL, or even one, of the communication requests will finish. You need to progress on a specific request:

UcpRequest request = ucpWorker.recvTaggedNonBlocking(…);
ucpWorker.progressRequest(request);

 

Or, if you have a sequence of requests, keep progressing while any of them is still incomplete (needs java.util.Arrays):

while (Arrays.stream(requestArray).anyMatch(r -> !r.isCompleted())) {
  ucpWorker.progress();
}

 

What happens with a large number of workers is that the queues (send and receive) become full of requests.

 

I’m now trying to progress on a request; I will make a PR if it succeeds.

 

 

Best Regards,

Peter Rudenko

Software engineer | Mellanox Technologies Ltd.

+380679348614

 

 

 

 

From: Chathura Widanage <chathura...@gmail.com>
Sent: Wednesday, March 25, 2020 10:06 PM
To: Peter Rudenko <pet...@mellanox.com>
Cc: Ahmet Uyar <ahme...@gmail.com>; Supun Kamburugamuve <sup...@gmail.com>; Yossi Itigin <yos...@mellanox.com>
Subject: Re: Twister2 debug session

 

Hi Peter,

 

Do you split a message to several buffers? What’s size?

Yes. Buffer size is  102400 bytes by default. But this can be configured(twister2.network.buffer.size) in https://github.com/DSC-SPIDAL/twister2/blob/master/twister2/config/src/yaml/conf/common/network.yaml

 

-memoryBytesLimit 200000000 – what is this parameter for?

This has nothing to do with communication. It's a configuration for TeraSort. Twister2 will keep this many bytes of tuples in memory before it starts writing to disk.

More about this example specific parameters can be found here https://twister2.org/docs/examples/terasort/terasort

 

Will track calling this method: https://github.com/DSC-SPIDAL/twister2/blob/ahmet/scalability-tests/twister2/comms/src/java/edu/iu/dsc/tws/comms/ucx/TWSUCXChannel.java#L286 and maybe we can add memory preregister there to speedup during data path.

 

This method is called only during initialization, to allocate buffers for each communication operation (keyed-gather in the TeraSort example). Thereafter it reuses the same set of buffers.

Number of buffers allocated can also be configured(twister2.network.sendBuffer.count, twister2.network.receiveBuffer.count) here https://github.com/DSC-SPIDAL/twister2/blob/master/twister2/config/src/yaml/conf/common/network.yaml

 

Regards,

Chathura

 

 

On Wed, Mar 25, 2020 at 2:29 PM Peter Rudenko <pet...@mellanox.com> wrote:

Ok, so we localized the issue. It reproduces even with 60 GB. So I have several questions:

https://github.com/DSC-SPIDAL/twister2/blob/ahmet/scalability-tests/twister2/comms/src/java/edu/iu/dsc/tws/comms/ucx/TWSUCXChannel.java#L133

 

  1. Do you split a message to several buffers? What’s size?
  2. -memoryBytesLimit 200000000 – what is this parameter for?
  3. Will track calling this method: https://github.com/DSC-SPIDAL/twister2/blob/ahmet/scalability-tests/twister2/comms/src/java/edu/iu/dsc/tws/comms/ucx/TWSUCXChannel.java#L286 and maybe we can add memory preregister there to speedup during data path.

So there seems to be some memory overconsumption. Each process is taking ~10 GB of resident memory.

 

 

 

Will investigate more.

 

Thanks,

Peter

 

From: Ahmet Uyar <ahme...@gmail.com>
Sent: Wednesday, March 25, 2020 7:30 PM
To: Peter Rudenko <pet...@mellanox.com>
Cc: Supun Kamburugamuve <sup...@gmail.com>; chathura widanage <chathura...@gmail.com>; Yossi Itigin <yos...@mellanox.com>
Subject: Re: Twister2 debug session

 

Hi Peter,

 

In kubernetes tests, I used 6GB of memory on each worker/pod. 

I set instance memory as: 

-instanceMemory 6144

 

Memory is in MB, so this is 6 GB.

161440 would be about 161 GB of memory, which we cannot allocate to each worker/pod.

 

Ahmet

 

 

 

On Wed, Mar 25, 2020 at 6:57 PM Peter Rudenko <pet...@mellanox.com> wrote:

Sorry, my bad. Actual fix is not with UCX lib, but with increasing instance memory:

-instanceMemory 161440

 

With this it works for me with your binaries. Can you please try that?

 

Thanks,

Peter

 

From: Peter Rudenko
Sent: Wednesday, March 25, 2020 4:30 PM
To: Ahmet Uyar <ahme...@gmail.com>; Supun Kamburugamuve <sup...@gmail.com>; chathura widanage <chathura...@gmail.com>
Cc: Yossi Itigin <yos...@mellanox.com>
Subject: RE: Twister2 debug session

 

Ok, cool. Let me know if you need any assistance.

 

 

Best Regards,

Peter Rudenko

Software engineer | Mellanox Technologies Ltd.

+380679348614

 

 

 

 

From: Ahmet Uyar <ahme...@gmail.com>
Sent: Wednesday, March 25, 2020 4:29 PM
To: Peter Rudenko <pet...@mellanox.com>; Supun Kamburugamuve <sup...@gmail.com>; chathura widanage <chathura...@gmail.com>
Cc: Yossi Itigin <yos...@mellanox.com>
Subject: Re: Twister2 debug session

 

Hi Peter,

 

Many thanks for that. 

 

I can try it tomorrow and let you know the result. Meanwhile, we can postpone the meeting I guess. 

 

Ahmet

 

 

On Wed, Mar 25, 2020 at 4:59 PM Peter Rudenko <pet...@mellanox.com> wrote:

Ok, I think I finally figured it out. The problem is that you included only the *.so objects. You also need to put the static library *.a objects into ucx. Can you please put this https://drive.google.com/open?id=1DuWsamVdwPJ1qBXVgLtBwHhNh6JYHFr8 into lib/ucx?

 

 

With this it works.

Let me know the results,

 

 

Best Regards,

Peter Rudenko

Software engineer | Mellanox Technologies Ltd.

+380679348614

 

 

 

From: Peter Rudenko
Sent: Wednesday, March 25, 2020 2:35 PM
To: Ahmet Uyar <ahme...@gmail.com>
Cc: Yossi Itigin <yos...@mellanox.com>
Subject: RE: Twister2 debug session

 

It’s in the aggregated log when I launch a job. I’m now trying to locate the problem:

  1. Is it TCP stack causing this – will it work on RDMA transport?
  2. Is it UCX causing this – will it work with other Twister2 channels?

 

Best Regards,

Peter Rudenko

Software engineer | Mellanox Technologies Ltd.

+380679348614

 

 

 

 

From: Ahmet Uyar <ahme...@gmail.com>
Sent: Wednesday, March 25, 2020 2:02 PM
To: Peter Rudenko <pet...@mellanox.com>
Cc: Yossi Itigin <yos...@mellanox.com>
Subject: Re: Twister2 debug session

 

Hi Peter,

 

which file is this error from? 

 

I have seen it neither in the regular log file nor in the hs_err-pid16.log file.

 

Ahmet

 

 

 

On Wed, Mar 25, 2020 at 2:20 PM Peter Rudenko <pet...@mellanox.com> wrote:

Hi, it seems I was able to reproduce your issue. Before the same segfault that you see, I got:

 

d160c3b76dd-0/part_1
*** Error in `/hpc/scrap/users/peterr/java//bin/java': corrupted double-linked list: 0x00007ffff0adfbd0 ***
======= Backtrace: =========
/usr/lib64/libc.so.6(+0x80b4f)[0x7ffff7458b4f]
/usr/lib64/libc.so.6(+0x82205)[0x7ffff745a205]
/usr/lib64/libc.so.6(__libc_malloc+0x4c)[0x7ffff745d7dc]
/hpc/scrap/users/peterr/jdk1.8.0_151/jre/lib/amd64/libjava.so(Java_java_lang_ClassLoader_defineClass1+0x53)[0x7ffff5aaaab3]
[0x7fffe147a1cb]

 

I’m checking whether it’s a jucx issue or some other Java memory issue. Have you seen this in your logs?

 

 

Best Regards,

Peter Rudenko

Software engineer | Mellanox Technologies Ltd.

+380679348614

 

 

 

 

-----Original Appointment-----
From: Peter Rudenko
Sent: Tuesday, March 24, 2020 6:39 PM
To: ahme...@gmail.com
Cc: Yossi Itigin
Subject: Twister2 debug session
When: Wednesday, March 25, 2020 12:30 PM-1:30 PM (UTC-05:00) Eastern Time (US & Canada).
Where: Skype Meeting

 

Hi, I’m setting up the invite in Skype; I will try to figure out how to set up Webex and update with a link.

 

.........................................................................................................................................

Join Skype Meeting      

Trouble Joining? Try Skype Web App

Join by phone

 

+972 (74) 7237000 (Israel),,84640283# (Mellanox)                      English (United States)

+33 (805) 100672 (France Toll Free),,84640283# (Mellanox)                     English (United States)  

+49 (800) 5892698 (Germany Toll Free),,84640283# (Mellanox)                              English (United States)  

8 800 1006947 (Russia Toll Free),,84640283# (Mellanox)                           English (United States)  

+44 (800) 1700936 (UK Toll Free),,84640283# (Mellanox)                          English (United States)  

+1 (888) 3314421 (US Toll Free),,84640283# (Mellanox)                            English (United States)  

+972 (1809) 494180 (IL Toll Free),,84640283# (Mellanox)                          English (United States)  

+40 (800) 895907 (Romania Toll Free),,84640283# (Mellanox)                 English (United States)  

+61 (1800) 055101 (Australia Toll Free),,84640283# (Mellanox)                               English (United States)  

+82 (007) 9885214386 (South Korea Toll Free),,84640283# (Mellanox)                  English (United States)  

+4 580 880 552 (Denmark Toll free),,84640283# (Mellanox)                     English (United States)  



 

Chathura Widanage

Sep 16, 2020, 9:57:16 AM
to Ahmet Uyar, Twister2
Ahmet,

It seems they haven't merged this PR into the master branch yet. We took our revision directly from the master branch.

Regards, 
Chathura

Chathura Widanage

Sep 16, 2020, 11:48:56 AM
to Ahmet Uyar, Twister2
Hi Ahmet,

Please try the PR below.


Regards,
Chathura

Ahmet Uyar

Sep 17, 2020, 7:08:08 AM
to Chathura Widanage, Twister2
Hi Chathura,

I cleaned and recompiled Twister2 with your pull request. However, it still has the same problem.

First, this time it gives a slightly different error, shown below:
[1600339445.114256] [v-015:86172:0]           sock.c:404  UCX  ERROR bind(fd=205 addr=172.29.200.215:40171) failed: Address already in use
[1600339445.114298] [v-015:86172:0]      listener.cc:53   UCX  ERROR JUCX: Device is busy
It says "Device is busy".

Previously it said "Input/output error":
[1600197786.800022] [v-002:228135:0]           sock.c:376  UCX  ERROR bind(fd=205 addr=172.29.200.202:34761) failed: Address already in use
[1600197786.800083] [v-002:228135:0]      listener.cc:53   UCX  ERROR JUCX: Input/output error

Second, if we set the UCX parameter in TWSUCXChannel, it complains:
[1600339260.618307] [v-015:83935:0]       context.cc:48   UCX  WARN  JUCX: no such key TCP_CM_ALLOW_ADDR_INUSE, ignoring
If I set it in mpiworker.sh as:
export UCX_TCP_CM_ALLOW_ADDR_INUSE=y
it does not give the above warning message. 

Ahmet
auyar-terasort-otwb6st.log

Chathura Widanage

Sep 17, 2020, 10:17:56 AM
to Ahmet Uyar, Twister2
Ahmet,

They might not support setting this programmatically from JUCX yet.
I tried searching for an API call to force-close a socket opened by Java.
But it seems there is nothing we can do other than call close().

Shall we add a significantly long sleep (1-5 minutes) before the UCX initialization and see whether that works?

Also, shall we try to connect to the same ServerSocket from Java instead of trying from UCX? If Java succeeds, let's report that on the UCX mailing list.

Regards,
Chathura

Ahmet Uyar

Sep 17, 2020, 3:38:09 PM
to Chathura Widanage, Twister2
Hi Chathura,

I have tested with the regular TWSTCPChannel and have not seen this error.
I ran the tests around 20 times and they all worked.
With UCX, I usually get this error within about 5 tries.

I have given UCX more time to retry, but that did not help either. It kept trying for around 200 seconds.
I attached the logs.

Could you check my retry logic for the UCX channel:

Ahmet
auyar-terasort-oueg69m.log
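
For reference, a fixed-delay retry loop of the kind discussed in this message might look like the Java sketch below. It is not the Twister2 retry code, which is not included in this thread. The class name RetrySketch, the method names, and the simulated failure are all hypothetical. In the real case the action would be the UCX listener creation.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetrySketch {
    /** Runs the action up to maxAttempts times, sleeping delayMillis between failed attempts. */
    static <T> T retry(Callable<T> action, int maxAttempts, long delayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;                      // remember the most recent failure
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis); // wait before the next attempt
                }
            }
        }
        throw last;                            // every attempt failed
    }

    /** Simulated action (hypothetical): fails twice, then succeeds. Returns the number of calls made. */
    static int demoAttempts() {
        int[] calls = {0};
        try {
            retry(() -> {
                calls[0]++;
                if (calls[0] < 3) {
                    throw new IOException("simulated: Address already in use");
                }
                return calls[0];
            }, 5, 10L);
        } catch (Exception e) {
            return -1;
        }
        return calls[0];
    }

    public static void main(String[] args) {
        System.out.println("attempts needed: " + demoAttempts());
    }
}
```

A loop like this only helps if the port becomes free again within the retry window, which is not what the logs in this thread show.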

Chathura Widanage

Sep 17, 2020, 3:53:51 PM
to Ahmet Uyar, Twister2
Hi Ahmet,

The retry approach seems good to me. I wonder what would happen if we try to create a Java socket immediately after the UCX failure (and close it immediately if the Java socket can be created successfully), just to see whether this has something to do with the Java code (the JVM might still be holding the socket) or the native code.

Regards,
Chathura
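
A minimal Java sketch of the check suggested above, using only java.net: try to bind a plain ServerSocket to the same address and port, and close it immediately if the bind succeeds. The class name PortProbe and the method canBind are hypothetical, and the demo uses the loopback address rather than the worker's real address.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class PortProbe {
    /** Returns true if a plain Java ServerSocket can bind host:port right now. */
    static boolean canBind(InetAddress host, int port) {
        try (ServerSocket probe = new ServerSocket()) {
            probe.bind(new InetSocketAddress(host, port));
            return true;   // bound; try-with-resources closes the socket immediately
        } catch (IOException e) {
            return false;  // typically a BindException such as "Address already in use"
        }
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket holder = new ServerSocket(0, 50, loopback)) {
            int port = holder.getLocalPort();
            // While another socket is listening on the port, the probe reports the conflict.
            System.out.println("can bind while held: " + canBind(loopback, port));
        }
    }
}
```

If canBind returns false right after the UCX failure, the port is held by something outside the UCX listener code path. If it returns true, the problem is more likely specific to the native bind.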

Ahmet Uyar

Sep 18, 2020, 7:46:58 AM
to Chathura Widanage, Twister2
Hi Chathura,

I tested creating a TCP server with Java after the UCX failure. Java also cannot create the socket on that port.
So I guess we should try to find out which program is using that port at that time.

Ahmet

auyar-terasort-ov8o9fw.log

Ahmet Uyar

Sep 18, 2020, 1:02:59 PM
to Chathura Widanage, Twister2
Today I ran the test with the TCP channel again. I ran it 50 times, and it worked fine every time.

So, this issue comes up only when the UCX channel is used.

Ahmet

Chathura Widanage

Sep 18, 2020, 1:20:22 PM
to Ahmet Uyar, Twister2
Could there be a case where UCX internally uses a random port for its own housekeeping? Shall we ask that on their mailing list?

Regards,
Chathura



Ahmet Uyar

Sep 18, 2020, 2:06:35 PM
to Chathura Widanage, Twister2
I think we need to report this to them. 

Ahmet

Ahmet Uyar

Sep 21, 2020, 7:04:49 AM
to Twister2
Hi guys,

I think we may not be able to provide different local IP addresses for UCX.

When the UCX channel is used, can we initially start a UcxSocket instead of a Java ServerSocket to hold the port?

WDYT?

Ahmet

---------- Forwarded message ---------
From: Peter Rudenko <prud...@nvidia.com>
Date: Sat, Sep 19, 2020 at 6:01 PM
Subject: RE: Twister2 debug session
To: Ahmet Uyar <ahme...@gmail.com>
Cc: Chathura Widanage <chathura...@gmail.com>, Yossi Itigin <yos...@nvidia.com>, Alina Sklarevich <ali...@mellanox.com>, Vibhatha Abeykoon <vibh...@gmail.com>, Yossi Itigin <yos...@mellanox.com>, Peter Rudenko <pet...@mellanox.com>, Supun Kamburugamuve <sup...@gmail.com>, Dmitry Gladkov <dmit...@mellanox.com>


Hi, here’s an answer:

 

  1. UCX_TCP_CM_ALLOW_ADDR_INUSE can be used only with UCX_SOCKADDR_CM_ENABLE=y
  2. SO_REUSEADDR allows a single process to bind the same port to multiple sockets, as long as each bind specifies a different local IP address.
  3. SO_REUSEPORT allows fully duplicate binding, but only if each socket that wants to bind the same IP address and port sets this option. We may add this option to the UCX TCP sockcm as well.

So for now you need to bind the ServerSocket and the UcpListener to different local IP addresses, but the same port:

UcpParams params = new UcpParams()
    .setConfig("TCP_CM_ALLOW_ADDR_INUSE", "y")
    .setConfig("SOCKADDR_CM_ENABLE", "y")
    .requestTagFeature();
UcpContext context = new UcpContext(params);

UcpWorker worker = context.newWorker(new UcpWorkerParams());

ServerSocket socket = new ServerSocket();
socket.setReuseAddress(true);
InetSocketAddress address = new InetSocketAddress("1.1.24.1", 12345);
socket.bind(address);

InetSocketAddress address2 = new InetSocketAddress("1.1.23.1", 12345);
UcpListener listener = worker.newListener(new UcpListenerParams().setSockAddr(address2));

assertNotNull(listener.getNativeId());

listener.close();
socket.close();
worker.close();
context.close();

We may add the SO_REUSEPORT option to UCX; then you can start the server socket and the listener on the same IP address and port.
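
To illustrate what SO_REUSEPORT permits, here is a minimal Java sketch with two plain ServerSockets, with no UCX involved. It assumes Java 9 or later, where StandardSocketOptions.SO_REUSEPORT exists, and a platform that supports the option. The class name ReusePortSketch and the method bindSamePortTwice are hypothetical, and the sketch says nothing about whether a UcpListener can share a port this way.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.StandardSocketOptions;

public class ReusePortSketch {
    /** Binds two server sockets to the same loopback address and port, both with SO_REUSEPORT set. */
    static String bindSamePortTwice() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket first = new ServerSocket();
             ServerSocket second = new ServerSocket()) {
            if (!first.supportedOptions().contains(StandardSocketOptions.SO_REUSEPORT)) {
                return "unsupported";  // the platform or JDK does not expose the option
            }
            // Both sockets must set the option before they bind.
            first.setOption(StandardSocketOptions.SO_REUSEPORT, true);
            second.setOption(StandardSocketOptions.SO_REUSEPORT, true);
            first.bind(new InetSocketAddress(loopback, 0));  // the OS picks a free port
            second.bind(new InetSocketAddress(loopback, first.getLocalPort()));
            return "shared";
        } catch (IOException | UnsupportedOperationException e) {
            return "failed: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(bindSamePortTwice());
    }
}
```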

 

Thanks,

Peter

 

From: Ahmet Uyar <ahme...@gmail.com>
Sent: Friday, September 18, 2020 10:07 PM
To: Peter Rudenko <prud...@nvidia.com>
Cc: Chathura Widanage <chathura...@gmail.com>; Yossi Itigin <yos...@nvidia.com>; Alina Sklarevich <ali...@mellanox.com>; Vibhatha Abeykoon <vibh...@gmail.com>; Yossi Itigin <yos...@mellanox.com>; Peter Rudenko <pet...@mellanox.com>; Supun Kamburugamuve <sup...@gmail.com>; Dmitry Gladkov <dmit...@mellanox.com>
Subject: Re: Twister2 debug session

 

Hi Peter and all,

 

We tested with the UCX_TCP_CM_ALLOW_ADDR_INUSE variable set. However, we are still getting the port binding error once in a while.

Let me briefly describe our use case:

-- We have 40 workers on each of two nodes, 80 workers in total.

-- When a worker starts, it determines a free port by executing:

   ServerSocket socket = new ServerSocket(0);

   return socket.getLocalPort();

-- Each worker keeps this socket bound during initialization.

-- Each worker connects to the job master, using another, automatically assigned port.

-- After all workers have connected to the job master, each worker closes the initially constructed socket, freeing the port.

-- The UCX channel is created on the initially determined free port.
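
The reserve-then-release steps above can be sketched in plain Java as follows. This is an illustration only: the class and method names are hypothetical, and a second ServerSocket stands in for the UCX listener. The point it shows is that nothing reserves the port between close() and the later bind, so the operating system is free to hand it to another socket in that gap. Whether that is what happens in the failing runs is not established in this thread.

```java
import java.io.IOException;
import java.net.ServerSocket;

public class PortHandoffSketch {
    /** Picks a free port, holds it for a while, then releases it and returns its number (-1 on error). */
    static int reserveThenRelease() {
        try (ServerSocket holder = new ServerSocket(0)) {  // port 0: the OS picks a free port
            int port = holder.getLocalPort();
            // ... the worker would register with the job master here, while the port is held ...
            return port;                                   // the socket is closed on the way out
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        int port = reserveThenRelease();
        if (port < 0) {
            System.out.println("could not reserve a port");
            return;
        }
        // From here until the bind below, the port is unreserved.
        try (ServerSocket second = new ServerSocket(port)) {
            System.out.println("re-bound port " + second.getLocalPort());
        } catch (IOException e) {
            System.out.println("port " + port + " was taken in the gap: " + e.getMessage());
        }
    }
}
```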

 

This application works fine most of the time, but sometimes it gives the errors and the exception below:

[1600421801.337247] [v-015:215549:0]           sock.c:404  UCX  ERROR bind(fd=203 addr=172.29.200.215:46485) failed: Address already in use
[1600421801.337307] [v-015:215549:0]      listener.cc:53   UCX  ERROR JUCX: Device is busy

 

Caused by: org.openucx.jucx.UcxException: Device is busy
        at org.openucx.jucx.ucp.UcpListener.createUcpListener(Native Method)
        at org.openucx.jucx.ucp.UcpListener.<init>(UcpListener.java:25)
        at org.openucx.jucx.ucp.UcpWorker.newListener(UcpWorker.java:49)
        at edu.iu.dsc.tws.comms.ucx.TWSUCXChannel.createUXCWorker(TWSUCXChannel.java:115)

 

When we run the same application with regular TCP channels, it works fine. We have never encountered this problem with them.

 

Although the port is freed by closing the Java socket with socket.close(), UCX still complains that the device is busy. After the initial failure, we waited some time and retried several times, but that did not help either. It always complains that the device is busy.

 

Any help would be appreciated.

 

thanks in advance,

 

Ahmet

 

[1599227712.187522] [v-011:159084:0]      listener.cc:53   UCX  ERROR JUCX: Destination is unreachable


Regards,

Chathura

 

 

On Fri, Sep 4, 2020 at 11:12 AM Peter Rudenko <prud...@nvidia.com> wrote:

Adding Vibhatha.

 

Regards,

Chathura

 

 

Ahmet Uyar

Sep 21, 2020, 12:17:38 PM
to Twister2
Hi guys,

I tested with socket.setReuseAddress(true) in the initial port binding, and unfortunately it still gives the same error.
The logs are attached.

Ahmet

auyar-terasort-ozwkdox.log

Ahmet Uyar

Sep 23, 2020, 8:58:19 AM
to Twister2
Hi guys,

I tested without using the JobMaster, thinking the problem might somehow be related to running the job master on the same nodes. Unfortunately, it still has the same problem.
So, it seems to be related to re-using the released port for UCX.

Ahmet
auyar-terasort-p2kugxd.log