Reminder of SIG-Networking meeting tomorrow

Bairen Yi

Mar 11, 2019, 1:14:53 PM
to netwo...@tensorflow.org, Jeroen Bédorf
Hi SIG Networking,

We are meeting this Tuesday at 8 am PT (11 pm GMT+8 because of DST, which also means one hour earlier than usual for everyone outside the US). Here is the agenda; feel free to add to it:


There seems to be a fair amount of interest in adding MPI-style collective all-reduce to the networking plugin, so I suggest you take a look at the C API design draft written by Anna and Paul:


For those who can't connect to Zoom online, here are the dial-in details:

Dial-in +1 646 558 8656 US (New York)
Meeting ID: 998 688 064
Find your local number: https://zoom.us/u/abtW4Tk28v

Looking forward to talking with you all tomorrow!

Cheers,
Bairen

Bairen Yi

Mar 12, 2019, 1:19:32 PM
to netwo...@tensorflow.org, Jeroen Bédorf
Hi folks,

Thanks, everyone, for attending the meeting today. The meeting notes can be found here:


We had an extensive discussion of the scope and design of the networking C API. We reached consensus to carve out the BaseRendezvousManager and the RecvTensor RPC as our first step. We have not yet agreed on the best approach for implementing collective ops in the networking plugin: it could either override the RecvBuf RPC or completely re-implement the collective ops' OpKernels.

Thanks to help from Paul and his colleagues on the TF team, we expect to ship some of our plugins with TF 2.0 soon.

Cheers,
Bairen

Anna Revinskaya

Mar 13, 2019, 6:54:10 PM
to Bairen Yi, netwo...@tensorflow.org, Jeroen Bédorf
Hi All,

I want to follow up to say that we would like to spend more time on the design of the networking plugin and come up with something that works in the long term rather than a short-term solution. I have been working on the approach Bairen mentioned above (re-implementing at the BaseRendezvousMgr level). However, we want to revisit it and consider whether we should carve out APIs higher up in the stack (above gRPC).

As a result, it might take additional time to come up with the right API.

--
You received this message because you are subscribed to the Google Groups "SIG Networking" group.
To unsubscribe from this group and stop receiving emails from it, send an email to networking+...@tensorflow.org.
Visit this group at https://groups.google.com/a/tensorflow.org/group/networking/.

Bairen YI

Mar 13, 2019, 7:20:25 PM
to Anna Revinskaya, netwo...@tensorflow.org, Jeroen Bédorf
Hi Anna,

Thanks for syncing with us. I think most people in the SIG would lean toward a long-term solution, so I'm glad we are on the same page. It's better to get it right than to rush into something we have to fix later on.

Cheers,
Bairen