Which ICE connection states do you actually use/need?

Peter Thatcher

Jan 28, 2016, 4:21:38 PM
to discuss-webrtc
Recently, there's been difficulty in nailing down what the "failed" ICE connection state means.  And part of the difficulty is knowing how applications are using it and what application developers expect/need from the ICE connection state.

So, please write back and let us know:  What ICE connection states do you actually use/need?
If you prefer not to post publicly, you can email me directly with your responses.

connected:  I assume everyone needs this one.

completed: This means connected+"done".  Do you care that it's "done" or not?  Or do you just care if it's connected or not?

disconnected:  This means not connected, but was previously connected.  Do you care that it was previously connected or not?  Or does it just matter that it's not connected?

checking: This means never connected, but checking.  Do you care that it's checking or not?  If so, would you care to know if it's checking or not when disconnected?

failed: This is the tricky one, and the one on which we need the most input.  It means not connected + "done", for some definition of "done" (that's the hard part we're trying to nail down: what does "done" mean?).

- If we just removed the "failed" state altogether, would you miss it?

- If we implemented "continual gathering" which would allow ICE to better handle switches between WiFi and cellular (or other network changes) without an ICE restart, but which prevented the "failed" state from being permanent (until an ICE restart), which would you choose?  The failed state being permanent (until an ICE restart), or the better network behavior without ICE restarts?

- If a PeerConnection went to "failed" and then later became "connected", would that be a good thing or a bad thing for your application?

- If knowing ICE were "done" required signalling a new event/message, would you bother hooking up to the event and sending the new message and pushing it down with the new method?  Or would it be more trouble than it's worth?

- If we redefine what ICE connection states mean and when they change, how difficult will it be for you to change your application to match the new meanings?

Thanks for your input.  Knowing how applications really use or would like to use these states will help us greatly in knowing how to design them.  

- Peter



Iñaki Baz Castillo

Jan 28, 2016, 4:56:05 PM
to discuss...@googlegroups.com
Thanks for raising this question. I will reply assuming that, in the
future, ICE will evolve to be "mobility aware", which means that new
candidates can be provided at any time and a new 5-tuple can be selected
at any time without the need of an annoying SDP renegotiation. And
thus we can transition from "connected" to "disconnected" and again to
"connected".



2016-01-28 22:20 GMT+01:00 'Peter Thatcher' via discuss-webrtc
<discuss...@googlegroups.com>:
> connected: I assume everyone needs this one.

Yes.


> completed: This means connected+"done". Do you care that it's "done" or
> not? Or do you just care if it's connected or not?

Just remove this state and we will all be happier. In fact, the ICE spec
allows media (let's say DTLS or RTP/RTCP) to flow once the first
candidate pair is found (regardless of whether the ICE controller selects
another candidate pair later). So, once media can flow we are done (at
application level).


> disconnected: This means not connected, but was previously was connected.
> Do you care that it was previously connected or not? Or does it just matter
> that it's not connected?

"disconnected" should mean (IMHO) that somehow ICE failed or was
closed so, after being "checking" or "connected", it is now
"disconnected".



Let's check another common API: WebSocket or DataChannel:

- "close" event fires when the connection fails (so it didn't ever
connect) or it is disconnected/closed.

- "error" event fires when the connection fails without previously
being connected.

I don't like this because if the connection just fails both events are
fired and it becomes hard for the app to correlate them.



> checking: This means never connected, but checking. Do you care that it's
> checking or not? If so, would you care to know if it's checking or not when
> disconnected?

"checking" should mean "not connected" but trying, regardless it was
connected before or not. That's simple.


> failed: This is the tricky one, and which we need the most input. It means
> not connected + "done", for some definition of "done" (that's the hard part
> we're trying to nail down: what does "done" mean?).
>
> - If we just removed the "failed" state altogether, would you miss it?
>
> - If we implemented "continual gathering" which would allow ICE to better
> handle switches between WiFI and Cellular (or other network changes) without
> an ICE restart, but which prevented the "failed" state from being permanent
> (until an ICE restart), which would you choose? The failed state being
> permanent (until an ICE restart), or the better network behavior without ICE
> restarts?
>
> - If a PeerConnection went to "failed" and then later become "connected",
> would that be a good thing or a bad thing for your application?
>
> - If knowing ICE were "done" required signalling a new event/message, would
> you bother hooking up to the event and sending the new message and pushing
> it down with the new method? Or would it be too much trouble than it's
> worth?

"failed" should not exist (in case we have continual gathering).


IMHO we need:

- "checking": It was connected or not, but now it is trying to connect.

- "connected": ICE connected and media can flow.

- "disconnected": If it was "connected" then this means that the
chosen pair has suddenly failed (ICE consent, ICE TCP closed, etc). If
it was not "connected" then this means that all the candidate pairs
were tested and failed. When "disconnected" fires it should be
expected that "checking" is fired again (if there are reasons to try
again).
Also, "disconnected" should be fired with a "reason" or "cause"
attribute in the corresponding Event object.

- "closed": The user called stop().



> - If we redefine what ICE connection states mean and when they change, how
> much difficulty will it be for you to change your application to match the
> new meanings?

The current states are not easy to manage.
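
A minimal sketch of the four-state model proposed above, in TypeScript. The type names, the `reason` values, and the transition table are illustrative assumptions drawn from the description in this post; none of them are part of the WebRTC or ORTC APIs.

```typescript
// Hypothetical reduced state set from the proposal above.
type SimpleIceState = "checking" | "connected" | "disconnected" | "closed";

// Hypothetical "reason" values for a "disconnected" event.
type DisconnectReason = "consent-expired" | "tcp-closed" | "all-pairs-failed";

interface SimpleIceEvent {
  state: SimpleIceState;
  reason?: DisconnectReason; // only meaningful when state is "disconnected"
}

// Transitions the proposal appears to allow; "closed" is terminal.
const allowedTransitions: Record<SimpleIceState, SimpleIceState[]> = {
  checking: ["connected", "disconnected", "closed"],
  connected: ["disconnected", "closed"],
  disconnected: ["checking", "closed"],
  closed: [],
};

function canTransition(from: SimpleIceState, to: SimpleIceState): boolean {
  return allowedTransitions[from].indexOf(to) !== -1;
}

// Builds the event for a transition, or throws if the model forbids it.
function transition(
  from: SimpleIceState,
  to: SimpleIceState,
  reason?: DisconnectReason
): SimpleIceEvent {
  if (!canTransition(from, to)) {
    throw new Error("illegal transition: " + from + " -> " + to);
  }
  const event: SimpleIceEvent = { state: to };
  if (to === "disconnected") {
    if (from === "checking") {
      // Never connected: every candidate pair was tested and failed.
      event.reason = "all-pairs-failed";
    } else if (reason !== undefined) {
      event.reason = reason;
    }
  }
  return event;
}
```

Note that there is no "failed" or "completed" here: a lost connection always surfaces as "disconnected" with a reason, and any retry shows up as a fresh "checking".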





--
Iñaki Baz Castillo
<i...@aliax.net>

Philipp Hancke

Jan 28, 2016, 5:12:04 PM
to discuss...@googlegroups.com
2016-01-28 13:20 GMT-08:00 'Peter Thatcher' via discuss-webrtc <discuss...@googlegroups.com>:
> Recently, there's been difficulty in nailing down what the "failed" ICE connection state means.

I hear ya :-)
 
> And part of the difficulty is knowing how applications are using it and what application developers expect/need from the ICE connection state.
>
> So, please write back and let us know:  What ICE connection states do you actually use/need?
> If you prefer not to post publicly, you can email me directly with your responses.
>
> connected:  I assume everyone needs this one.

But what are you really doing on the transition to connected?
Start sending media? The browser does that for you.
Start sending data on datachannels? Well, maybe you should be looking for the bufferedAmountLow event instead?

I'm gathering analytics data here, mostly on the first connect.
One of the more important uses is figuring out if the local client is using a TURN-relayed connection (UX: turn/tcp? screw quality expectations)
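
The relay check can be sketched roughly as follows. It operates on plain objects shaped like entries of a `getStats()` report; the field names (`nominated`, `state`, `localCandidateId`, `candidateType`, `relayProtocol`) follow the current W3C statistics spec as I understand it, not necessarily what browsers exposed at the time of this thread, and the helper and interface names are made up for illustration.

```typescript
// Stand-in for the subset of stats fields this sketch needs (assumption:
// names as in the present-day WebRTC statistics spec).
interface StatEntry {
  id: string;
  type: string; // e.g. "candidate-pair" or "local-candidate"
  nominated?: boolean;
  state?: string;
  localCandidateId?: string;
  candidateType?: string; // "host" | "srflx" | "prflx" | "relay"
  relayProtocol?: string; // "udp" | "tcp" | "tls"
}

interface LocalPathInfo {
  relayed: boolean;
  relayProtocol: string | null;
}

// Hypothetical helper: finds the nominated, succeeded pair and reports
// whether its local candidate is a TURN relay (and over which protocol).
function describeLocalPath(stats: StatEntry[]): LocalPathInfo | null {
  let localId: string | null = null;
  for (const s of stats) {
    if (
      s.type === "candidate-pair" &&
      s.nominated === true &&
      s.state === "succeeded" &&
      s.localCandidateId !== undefined
    ) {
      localId = s.localCandidateId;
      break;
    }
  }
  if (localId === null) return null; // nothing connected yet

  for (const s of stats) {
    if (s.type === "local-candidate" && s.id === localId) {
      const relayed = s.candidateType === "relay";
      return {
        relayed: relayed,
        relayProtocol: relayed && s.relayProtocol !== undefined ? s.relayProtocol : null,
      };
    }
  }
  return null;
}
```

In a browser the array would come from iterating the report returned by `getStats()` on the first transition to connected; a `relayProtocol` of "tcp" is the "turn/tcp" case mentioned above.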

> completed: This means connected+"done".  Do you care that it's "done" or not?  Or do you just care if it's connected or not?

From a UX perspective I don't care. I want to show the user that media is to be expected from the peer at this point. Even though I think there is a better way by listening to the tracks readyStateChange here?

I do run certain analytics on the transition though.
 
> disconnected:  This means not connected, but was previously connected.  Do you care that it was previously connected or not?  Or does it just matter that it's not connected?

I want to be able to distinguish these two cases:
1) the connection could not be established at all
2) the connection was interrupted

> checking: This means never connected, but checking.  Do you care that it's checking or not?  If so, would you care to know if it's checking or not when disconnected?

I only care for certain analytics. And for those I could take the time at which both local and remote description are set.
 

> failed: This is the tricky one, and the one on which we need the most input.  It means not connected + "done", for some definition of "done" (that's the hard part we're trying to nail down: what does "done" mean?).
>
> - If we just removed the "failed" state altogether, would you miss it?

With its current definition of "the call could not be established at all" yes. However, I could easily check if the transition was checking->disconnected.
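
A small sketch of that check: it keeps the history of observed states and distinguishes "never established" from "interrupted" without relying on "failed" having a special meaning. The type and function names are illustrative only.

```typescript
type IceConnState =
  | "new" | "checking" | "connected" | "completed"
  | "disconnected" | "failed" | "closed";

type LossKind = "never-established" | "interrupted" | "not-lost";

// Classifies the most recent state given everything observed before it.
function classifyLoss(history: IceConnState[]): LossKind {
  if (history.length === 0) return "not-lost";
  const last = history[history.length - 1];
  if (last !== "disconnected" && last !== "failed") return "not-lost";

  // Any earlier connected/completed means the call was up at some point.
  for (let i = 0; i < history.length - 1; i++) {
    if (history[i] === "connected" || history[i] === "completed") {
      return "interrupted";
    }
  }
  return "never-established"; // e.g. checking -> disconnected
}
```

An application would append to `history` from its ICE connection state change handler and call `classifyLoss` on each change.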


> - If we implemented "continual gathering" which would allow ICE to better handle switches between WiFi and cellular (or other network changes) without an ICE restart, but which prevented the "failed" state from being permanent (until an ICE restart), which would you choose?  The failed state being permanent (until an ICE restart), or the better network behavior without ICE restarts?

better network behaviour without ice restarts. I am worried that ice-restarts are not glare-free in the 1.0 API (e.g. when the responder adds media while the initiator does an ice restart) which violates the separation of media and transport imo.

 
> - If a PeerConnection went to "failed" and then later became "connected", would that be a good thing or a bad thing for your application?

From a UX perspective:
- bad, if I currently show a message like "things have failed. We are sorry".
- good, if I show a message "something is broken, let us check if we can fix it" (with an ice restart). That would require just removing the ice restart (or no-oping it in the browser)

> - If knowing ICE were "done" required signalling a new event/message, would you bother hooking up to the event and sending the new message and pushing it down with the new method?  Or would it be more trouble than it's worth?

Yes. Even if this was not required in the success case, i think it helps with error diagnosis.
 
> - If we redefine what ICE connection states mean and when they change, how difficult will it be for you to change your application to match the new meanings?

I would like to see a very clear migration path. From all browsers making the change. I only learned about the getStats-maplike change (which arguably is not implemented yet) by accident and was not happy.

> Thanks for your input.  Knowing how applications really use or would like to use these states will help us greatly in knowing how to design them.
>
> - Peter



--

---
You received this message because you are subscribed to the Google Groups "discuss-webrtc" group.
To unsubscribe from this group and stop receiving emails from it, send an email to discuss-webrt...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/discuss-webrtc/CAJrXDUGur7V-jJD8-3u0NwHON724LBCse%3DRp%3Dh8aB_cRd7iing%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Rahul Behera

Jan 28, 2016, 5:25:40 PM
to discuss-webrtc
From what I understand, if you transition from checking to failed, that means that you failed to connect. Network issues (restricted ports, etc.) can be seen at this state.
Connected to disconnected means that the live connection has been severed and the PeerConnection is actively trying to rebuild the connection. If it goes back to connected then the connection is back and live. If it hits failed then the PeerConnection will no longer try to reconnect. I'm not sure how it can go from failed to disconnected (separate post).

Once you hit the connected state, I'd recommend you unplug ethernet and plug it back in (or toggle airplane mode) without touching anything else. Watch how the state will drop to disconnected, then after ~15 seconds drop to a failed state. If you reconnect to the network before hitting the failed state you will get back to a connected state, which brings back video.

Xander Dumaine

Jan 28, 2016, 6:20:26 PM
to discuss-webrtc
I echo pretty much everything that Fippo said. We're heavily using ICE connection states for UX, and most of the time there isn't a 1:1 mapping of ICE connection state to UX state because of the ambiguity of disconnected, failed, and checking, but also because of media streams. If we think about these states in terms of user experience, then I'd break it down into these states:

1. The connection is establishing 
2. The connection is established
3. The connection is dropped, reestablishing
4. The connection failed

Unfortunately, what you actually get is something much more complicated, so you have to map the following steps to the ones above:

A. The connection is establishing (ICE checking)
B. The connection is established, but media hasn't started
C. The connection is established and media is started
D. The connection is dropped, and is attempting reconnect (ICE Checking)
E. The connection is dropped, ICE restart has succeeded, but media hasn't recovered
F. The connection has failed

Now, in most human scenarios, you can just lump B in with C (i.e., once ICE connection state is connected, stop showing the loading state), but from what we've noticed, it's better to account for B and delay the "progressing" of the UX from A to C by lumping B in with A instead, so that B still shows a loading state despite the ICE connection state having transitioned from checking to completed (the same applies to the D/E -> C transition). This is all a way for me to say I like both connected and checking, though they aren't sufficient on their own for a good UX.

The next ambiguity comes in with the combination of "disconnected" and "checking" in UX. Do I want to show a red "disconnected" icon, or do I want to show a spinning "loading" icon? This requires me to know, or at least be optimistic, that the state will quickly transition from disconnected to checking. But, if I want to show this state differently in my UI, I have to know that this "checking" is not the same as the first "checking". We refer to this as a "recovering" (or alternatively, "rechecking") - maybe that might be a good state to have, instead of "disconnected" and then "checking". It would basically be "checking: never connected, but checking" "rechecking: was connected, but checking" - I'm not sure if others agree with having that level of state in the iceConnectionState.
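
The mapping from steps A-F to the four UX states, including the "checking" versus "rechecking" distinction, can be sketched like this. The names are illustrative; `wasEstablished` is an application-maintained flag (set once media has flowed), not something the browser provides.

```typescript
// Subset of ICE connection states discussed in this post.
type ObservedIceState = "checking" | "connected" | "completed" | "disconnected" | "failed";

// The four UX states (1-4) listed above.
type UxState = "establishing" | "established" | "reestablishing" | "failed";

interface CallSnapshot {
  ice: ObservedIceState;
  mediaFlowing: boolean;   // tracked by the app, e.g. from track events or stats
  wasEstablished: boolean; // true once the call has previously reached step C
}

function toUxState(s: CallSnapshot): UxState {
  if (s.ice === "failed") return "failed"; // F

  const iceUp = s.ice === "connected" || s.ice === "completed";
  if (iceUp && s.mediaFlowing) return "established"; // C

  // Everything else is a loading state. B is lumped in with A, and E with D:
  // ICE may be up, but the UI waits for media before progressing.
  return s.wasEstablished ? "reestablishing" : "establishing";
}
```

With this shape, "rechecking" is simply any non-established, non-failed snapshot where `wasEstablished` is true, so the UI can show a "recovering" indicator instead of the initial spinner.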

Finally, "failed." If a connection is failed, it should be "done with an error" and should not go back to other states. It's a terminal state, and I would be quite frustrated if I saw failed go back to checking or connected, because that's what "disconnected" is for.

As for how difficult this would be to implement - probably difficult, but worth it, if, as Fippo says, there's a good upgrade path, a reliable way to feature detect the change, and consistent browser behavior.

Dag-Inge Aas

Jan 29, 2016, 2:31:46 AM
to discuss-webrtc
Agreeing with Xander and Philipp here. In our current application we use:

- Checking -> Show a loading spinner thing to tell the user to wait
- Connected/Completed -> Both mean we are connected and will show the video element
- Failed -> We can't connect you at all, and will show an error message.

We do not want failed to mean anything other than: "We have tried everything we can and we can't connect you. Please go to our FAQ for more information". We do want a state like that, should the webrtc stack just give up or detect that nothing is going to work at this point, but we do not want it to ever transition to anything else unless we explicitly ask it to retry.
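
One way an application can enforce this on its own side, whatever the browser ends up doing, is a small wrapper that makes "failed" sticky until the application explicitly retries. This is a sketch with made-up names, not an existing API.

```typescript
type RawIceState =
  | "new" | "checking" | "connected" | "completed"
  | "disconnected" | "failed" | "closed";

class StickyFailure {
  private hasFailed = false;

  // Feed every raw state change through here; the return value is the
  // state the UI should act on.
  observe(raw: RawIceState): RawIceState {
    if (raw === "closed") return "closed"; // closing always wins
    if (this.hasFailed) return "failed";   // ignore later recoveries
    if (raw === "failed") this.hasFailed = true;
    return raw;
  }

  // Call only when the application explicitly asks for a retry,
  // for example right before it triggers an ICE restart.
  retry(): void {
    this.hasFailed = false;
  }
}
```

The UI then switches on the value returned by `observe` rather than on the raw state, so an unexpected failed -> connected transition never replaces the error message by itself.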

From a UX perspective we really only need:
- We are trying to connect
- You are connected and media is flowing
- You have become disconnected, we are trying to reestablish a connection for you
- We have given up trying to connect you

As far as upgrade path goes, as long as this is a local client change that we can transition into (that is, we don't have to care about backwards compatibility, which I believe is the case here), we should have no problem upgrading all our clients to whatever you decide.

I'm being intentionally vague on the naming of things because quite frankly I suck at naming, and we will implement anything you decide, as long as it conforms to the UX expectations that we have (and is consistent).

Martin Gartner

Jan 29, 2016, 4:22:43 AM
to discuss-webrtc
Good that this question is raised!

Most important for me are the ICE states disconnected and failed.
- Failed is used as an indicator that the media connection is permanently lost, so it is not useful to continue the call and it is meaningful to terminate the call.
- Disconnected is important for informing the user that the connection is at least temporarily disturbed/interrupted, but the user has to know that it is meaningful to wait for a possible recovery of the media connection, so a pop-up telling him "reconnect in progress" or something like that.
- Disconnected can also be used to try setting up an alternative media connection, or to do an ICE-restart on the current (disturbed) media connection.

Regards,

Martin

gordon...@telenordigital.com

Jan 29, 2016, 6:44:54 AM
to discuss-webrtc
For the sake of correlating track stats with the connection currently in use, I would like to be notified whenever a new ICE candidate pair is selected. This could be achieved simply by firing a connectionStateChanged 'connected' event each time. Thus, the state transition graph would include a self-transition from connected to connected.
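
As an application-side sketch of the same idea: a tiny notifier that invokes a callback each time the selected pair changes, which stands in for the proposed connected -> connected self-transition. The class and field names are hypothetical, and where the pair comes from (for example, periodic stats polling) is left to the application.

```typescript
interface CandidatePair {
  localId: string;
  remoteId: string;
}

class PairChangeNotifier {
  private currentKey: string | null = null;
  private listener: (pair: CandidatePair) => void;

  constructor(listener: (pair: CandidatePair) => void) {
    this.listener = listener;
  }

  // Report the currently selected pair; returns true if it differs from the
  // previous one, in which case the listener has been called.
  report(pair: CandidatePair): boolean {
    const key = pair.localId + "|" + pair.remoteId;
    if (key === this.currentKey) return false;
    this.currentKey = key;
    this.listener(pair); // stands in for firing "connected" again
    return true;
  }
}
```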

Other than this, I'm amenable to the suggestions already given here.

-Gordon

Iñaki Baz Castillo

Jan 29, 2016, 6:46:42 AM
to discuss...@googlegroups.com
2016-01-29 12:44 GMT+01:00 <gordon...@telenordigital.com>:
> For the sake of correlating track stats with the connection currently in use, I would like to be notified whenever a new ICE candidate pair is selected. This could be achieved simply by firing a connectionStateChanged 'connected' event each time. Thus, the state transition graph would include a self-transition from connected to connected.

Hopefully something like this:

http://ortc.org/wp-content/uploads/2015/11/ortc.html#rtcicecandidatepairchangedevent-interface-definition*

Taylor Brandstetter

Jan 29, 2016, 1:22:29 PM
to discuss...@googlegroups.com
What I'm hearing from multiple people is that it's important from a UX perspective to have some indication that the call is not working, and isn't going to recover.

The "failed" state currently is close to fulfilling this role, but consider these two scenarios:

1. The local PeerConnection finishes gathering candidates, and only gets a host candidate from the remote peer. The connectivity checks quickly fail with ICMP errors. Should the state then transition to "failed"? The browser doesn't know if a new remote candidate will arrive in the future, which will succeed. So if the application is relying on "failed" to show the user a message, that message may be shown prematurely.

2. In the future, we could transition to a model of "continual gathering", where new ICE candidates can be gathered at any time. This means the connection will be able to recover at any time, without explicit action by the application. If this is the case, what should the criteria be for showing the user a "failed" message? Something like "The state has been 'disconnected' and no candidate with a new foundation has been seen for X seconds"? Even then, the user could switch to a different network, and get a new candidate that succeeds. What should happen then?
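
To make the candidate criterion in scenario 2 concrete, here is a sketch of it as a pure function. All names are illustrative and the timestamps are supplied by the caller. It only encodes "disconnected, and no candidate with a new foundation for X seconds"; it does not answer the open question of what should happen if a later candidate succeeds.

```typescript
interface FailureCheck {
  nowMs: number;
  state: "checking" | "connected" | "disconnected";
  disconnectedSinceMs: number | null; // when the state became "disconnected"
  lastNewFoundationMs: number | null; // last time a candidate with a new foundation was seen
  quietPeriodMs: number;              // the "X seconds" from the text
}

function shouldShowFailed(c: FailureCheck): boolean {
  if (c.state !== "disconnected" || c.disconnectedSinceMs === null) {
    return false;
  }
  // The quiet period restarts whenever a new foundation shows up.
  let quietSince = c.disconnectedSinceMs;
  if (c.lastNewFoundationMs !== null && c.lastNewFoundationMs > quietSince) {
    quietSince = c.lastNewFoundationMs;
  }
  return c.nowMs - quietSince >= c.quietPeriodMs;
}
```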


Dag-Inge Aas

Jan 29, 2016, 2:38:28 PM
to discuss...@googlegroups.com
On Fri, Jan 29, 2016 at 7:22 PM, 'Taylor Brandstetter' via
discuss-webrtc <discuss...@googlegroups.com> wrote:
> 1. The local PeerConnection finishes gathering candidates, and only gets a
> host candidate from the remote peer. The connectivity checks quickly fail
> with ICMP errors. Should the state then transition to "failed"? The browser
> doesn't know if a new remote candidate will arrive in the future, which will
> succeed. So if the application is relying on "failed" to show the user a
> message, that message may be shown prematurely.

Trickle ICE certainly complicates things, that I can agree with.


> 2. In the future, we could transition to a model of "continual gathering",
> where new ICE candidates can be gathered at any time. This means the
> connection will be able to recover at any time, without explicit action by
> the application. If this is the case, what should the criteria be for
> showing the user a "failed" message? Something like "The state has been
> 'disconnected' and no candidate with a new foundation has been seen for X
> seconds"? Even then, the user could switch to a different network, and get a
> new candidate that succeeds. What should happen then?

Good points here. In both of these cases a "failed" state would not
make sense where the browser simply gives up. However, from the
simplistic view we have now, failed represents a state where
everything has, well, failed. This will as you mention change in the
future.

It all comes down to end-user UX. Users will not sit and wait forever,
and an application is at some point expected to give valuable
feedback on what's going on. I think this is more of a timeout issue:
if nothing has happened in X seconds (where I would suspect X < 5)
then an application should enter a failed state. This can of course be
measured and tested. And it doesn't have to have anything to do with
whatever ICE state we're currently in.

Nonetheless, if the connection state goes to failed, I expect it to
stay failed. Otherwise you are still checking or simply disconnected.

Just my two cents, late at night.


--
Dag-Inge Aas
Tech Lead @ appear.in

Thomas Bruun

Jan 29, 2016, 3:13:29 PM
to discuss...@googlegroups.com
On Thu, Jan 28, 2016 at 10:20 PM, 'Peter Thatcher' via discuss-webrtc
<discuss...@googlegroups.com> wrote:
> disconnected: This means not connected, but was previously was connected.
> Do you care that it was previously connected or not? Or does it just matter
> that it's not connected?
>

Previously we used this state to tell the user that "something went
wrong, you are now disconnected", but there were two problems if I
recall correctly:

1. The PC enters this state if liveness checks for _any_ of the
components have failed, but it wasn't made clear which one. If for
some reason audio liveness checks had failed but video had not, we
would tell the user that both video and audio had stopped
transmitting. If the state had more context, such as _which_ component
had failed liveness checks, that would be interesting.

2. We saw that for some flaky connections, the state transitioned
between "connected" and "disconnected" with only seconds in between.
This made using the state alone for decorating the UI pretty confusing
for the user.

We instead started doing our own liveness checks using the getStats
API, and stopped relying on the "disconnected" state altogether. It
isn't optimal, but now we only decorate the UI if _no_ data has been sent
and/or received in the past 10 seconds.
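
A rough sketch of that kind of stats-based liveness check, under some assumptions: the application periodically samples cumulative byte counters (however it obtains them from getStats), and each direction is tracked separately. The class and field names are made up for illustration.

```typescript
interface CounterSample {
  timeMs: number;
  bytesSent: number;     // cumulative
  bytesReceived: number; // cumulative
}

interface Liveness {
  sendStalled: boolean;
  receiveStalled: boolean;
}

class LivenessMonitor {
  private windowMs: number;
  private prev: CounterSample | null = null;
  private lastSendMs = 0;
  private lastRecvMs = 0;

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  // Feed one sample per polling interval. A direction counts as stalled once
  // its counter has not advanced for at least windowMs.
  update(s: CounterSample): Liveness {
    if (this.prev === null || s.bytesSent > this.prev.bytesSent) {
      this.lastSendMs = s.timeMs;
    }
    if (this.prev === null || s.bytesReceived > this.prev.bytesReceived) {
      this.lastRecvMs = s.timeMs;
    }
    this.prev = s;
    return {
      sendStalled: s.timeMs - this.lastSendMs >= this.windowMs,
      receiveStalled: s.timeMs - this.lastRecvMs >= this.windowMs,
    };
  }
}
```

Because the result only flips after a full window without progress, a connection that flaps between "connected" and "disconnected" every few seconds does not make the UI flicker.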

As Dag-Inge said earlier, we treat "failed" as a state you cannot recover from.

--
Thomas Bruun
Software Engineer, appear.in

Lance Stout

Jan 29, 2016, 3:25:05 PM
to discuss-webrtc
UX considerations are not going to be able to escape the fact that there are multiple state machines that need to be monitored in order to deliver the best UX. There is the signaling state (did we lose connection to the signaling service?!), ICE state, media state, etc.


For ICE, I really only care about the checking, connected, disconnected states. The completed and failed states are nice for metrics analysis, but not as directly useful for application UX. I can already keep a timer to check if no media/data has been received recently, and for how long, and update the UI accordingly.


That said, if failed exists, I expect it to stay failed. It sounds like most of us would be in favor of a new state for not-quite-failed when doing trickle where all candidates have been exhausted and a reasonable amount of time has passed, but the connection could still be reestablished if the network changes and a new candidate is found. The connection isn't dead, it's only mostly dead :)

--
Lance

Steve Mcfarlin

Jan 29, 2016, 4:01:06 PM
to discuss-webrtc

> connected:  I assume everyone needs this one.

Yes. 
 
> completed: This means connected+"done".  Do you care that it's "done" or not?  Or do you just care if it's connected or not?

I do not use this state in my code other than to check if it is in this state. I have seen it go into this state and not transition to connected even though media is flowing. This may have been occurring in older webrtc versions.

The only use for this state I can think of is for analytics. You could check how long candidate gathering was taking.
 
> disconnected:  This means not connected, but was previously connected.  Do you care that it was previously connected or not?  Or does it just matter that it's not connected?

I like the current aggressive nature of the transition to this state. It tells me something is going wrong (e.g. high packet loss, loss of UDP/TCP transport, etc.). I was considering hooking ICE restarts up to this state, but have since stepped back from that approach due to glare issues we will have to handle (your suggestion from here https://groups.google.com/forum/#!topic/discuss-webrtc/CRHffPWGpg4).
 

> checking: This means never connected, but checking.  Do you care that it's checking or not?  If so, would you care to know if it's checking or not when disconnected?

I currently do nothing with this state. Again maybe useful for analytics.
 
> failed: This is the tricky one, and the one on which we need the most input.  It means not connected + "done", for some definition of "done" (that's the hard part we're trying to nail down: what does "done" mean?).

I do currently use this state in an edge case. If our signaling channel goes down we assume RTC comms are down as well. When the signaling channel reconnects we signal for an ICE restart. If the PC is in Connected state we set a flag such that if it does go to failed at some point in the future we will trigger the ICE restart at that time. Again total edge case. 
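
A sketch of that edge-case logic, based on my reading of the paragraph above: when signaling reconnects while ICE is still connected, the restart is deferred and only triggered if the state later reaches "failed"; in any other state the restart is requested right away. The class name and the `restartIce` callback are placeholders for whatever the application uses to signal an ICE restart.

```typescript
type PcIceState =
  | "new" | "checking" | "connected" | "completed"
  | "disconnected" | "failed" | "closed";

class DeferredIceRestart {
  private restartPending = false;
  private restartIce: () => void;

  constructor(restartIce: () => void) {
    this.restartIce = restartIce;
  }

  // Call when the signaling channel comes back after an outage.
  onSignalingReconnected(current: PcIceState): void {
    if (current === "connected") {
      // Media may still be fine; remember that a restart is owed.
      // (Whether "completed" should be treated the same is left open.)
      this.restartPending = true;
    } else {
      this.restartIce();
    }
  }

  // Call on every ICE connection state change.
  onIceStateChange(state: PcIceState): void {
    if (state === "failed" && this.restartPending) {
      this.restartPending = false;
      this.restartIce();
    }
  }
}
```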

I simply want to know two things. 

- Things are going wrong (STUN bindings have been lost).
- All transport channels no longer have connectivity.

Currently disconnected and failed provide this for me. However, failed takes a long time to transition to. Most users would not wait around that long for any ICE restart handling logic to fire. They would probably refresh the browser or reconnect the mobile app. So with this it is only useful for an edge case IMO.
 
> - If we just removed the "failed" state altogether, would you miss it?

I would not, given that you could use closed as the terminal state (no transports were successful).
 
> - If we implemented "continual gathering" which would allow ICE to better handle switches between WiFi and cellular (or other network changes) without an ICE restart, but which prevented the "failed" state from being permanent (until an ICE restart), which would you choose?  The failed state being permanent (until an ICE restart), or the better network behavior without ICE restarts?

Continual gathering 100%. https://tools.ietf.org/html/draft-uberti-mmusic-nombis-00. If you can implement nombis, or something similar, I would be very very happy. You would make my life much easier, as at the moment I am working on getting the transition to and from wifi/wwan working. Using the ICE states is kinda complex, so I have a mix of ICE state code and signaling channel network state code. It would be great if I could decouple ICE states from our signaling channel state. I would also not have to worry about the current glare issue with ICE restarts.
 
> - If a PeerConnection went to "failed" and then later became "connected", would that be a good thing or a bad thing for your application?

It really depends on the time to get into that state. When I use RTC in the browser, if video freezes and audio is lost for more than say 5 seconds I will refresh. I am not going to wait around 15+ seconds for a call to reconnect.
 
> - If knowing ICE were "done" required signalling a new event/message, would you bother hooking up to the event and sending the new message and pushing it down with the new method?  Or would it be more trouble than it's worth?

I agree with Philipp here. 
 
> - If we redefine what ICE connection states mean and when they change, how difficult will it be for you to change your application to match the new meanings?

If it gives me opportunities to make a clean implementation of things like the wifi->wwan transition then I don't care if it takes me two weeks to change. I will do it.
 
> Thanks for your input.  Knowing how applications really use or would like to use these states will help us greatly in knowing how to design them.

Thanks for asking this question.

- steve
 
Steve Mcfarlin

Jan 29, 2016, 4:14:44 PM
to discuss-webrtc
Also, if Google's RTC implementation included continuous nomination (nombis) that could be signaled with a nonstandard SDP extension (until standards are agreed upon), I would gladly handle the extra complexity in my PC wrapper code to detect this and use ICE restarts when it is not supported by other endpoints.