Help! Unstable connection, strophe ignored the response and send again - Openfire BOSH terminates connection.

585 views
Skip to first unread message

Simon So

unread,
Jun 13, 2012, 4:29:26 PM6/13/12
to str...@googlegroups.com
Hi, 

I wonder if anyone came across this issue.

Technologies: Phonegap android app + Strophe.js + Openfire BOSH.

I noticed that, when wifi on client is unstable, and Openfire BOSH came back with a response, Strophe ignored the response and send the msg with the same RID.  

Meanwhile, when wifi on client is back to normal, it proceeded sending msg of RID+1 without waiting for RID to respond back.

Here are my questions:

1. In the log below, the first RID (2337229287) got reply back, but why Strophe sends another one at time 06-12 15:54:30.626?  Strophe didn't error out. 

2. Why does Strophe send RID+1 message ahead of RID, even though RID hasn't come back yet?

Openfire BOSH chokes on #2 because it won't process previous RID.  I am thinking about making Openfire BOSH idempotent by caching the {RID, response} and cleans up the cache periodically.

However, that still wouldn't explain #1.

Any insights on why strophe is doing #1?

Logs:
====

(1) RID = x

06-12 15:53:57.906: V/root.js(1835): {Thread-327}SENT<body rid='2337229287' xmlns='http://jabber.org/protocol/httpbind' sid='9bb5e34e'><iq type='result' to=' ... blah blah blah ></iq></body>

06-12 15:53:58.456: V/root.js(1835): {Thread-328}RECV: <body xmlns='http://jabber.org/protocol/httpbind' ack='2337229287'/>


(2) wifi flaps

06-12 15:53:59.106: D/WifiLocator(215): Good cache hits. Computing WiFi location locally hasLocation=14 noLocation=0 cacheMiss=4

06-12 15:53:59.106: D/WifiLocator(215): Computing location using MaxLre.

06-12 15:53:59.176: D/WifiLocator(215): Finished computing WiFi location: WifiLocationResult [position=Position [redacted], confidence=78, outliers=[]]

06-12 15:53:59.176: D/CellLocator(215): Found cell location: Position [redacted]

06-12 15:53:59.196: D/androidNlpServiceThread(215): reporting Location[mProvider=network,mTime=1339541639187,mLatitude=37.3526663,mLongitude=-122.0613866,mHasAltitude=false,mAltitude=0.0,mHasSpeed=false,mSpeed=0.0,mHasBearing=false,mBearing=0.0,mHasAccuracy=true,mAccuracy=30.0,mExtras=Bundle[{networkLocationSource=cached, networkLocationType=wifi}]]

06-12 15:53:59.276: D/dalvikvm(215): GC_CONCURRENT freed 859K, 54% free 3899K/8391K, external 1625K/2137K, paused 4ms+4ms

06-12 15:53:59.286: D/ClientReporter(215): reported location

06-12 15:54:30.446: I/wpa_supplicant(249): CTRL-EVENT-STATE-CHANGE id=0 state=8 BSSID=00:00:00:00:00:00

06-12 15:54:30.446: V/WifiMonitor(103): Event [CTRL-EVENT-STATE-CHANGE id=0 state=8 BSSID=00:00:00:00:00:00]

06-12 15:54:30.446: V/WifiStateTracker(103): Changing supplicant state: COMPLETED ==> DORMANT

06-12 15:54:30.446: D/WifiStateTracker(103): Reset connections and stopping DHCP

06-12 15:54:30.446: I/wpa_supplicant(249): CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys

06-12 15:54:30.456: V/WifiMonitor(103): Event [CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys]

06-12 15:54:30.456: I/wpa_supplicant(249): CTRL-EVENT-STATE-CHANGE id=-1 state=8 BSSID=00:00:00:00:00:00

06-12 15:54:30.456: V/WifiMonitor(103): Event [CTRL-EVENT-STATE-CHANGE id=-1 state=8 BSSID=00:00:00:00:00:00]

06-12 15:54:30.546: W/Smack/Packet(215): notify conn break (IOEx), close connection

06-12 15:54:30.546: D/Smack(215): [XMPPConn] close connection, notifyClosed=false

06-12 15:54:30.576: W/System.err(1835): javax.net.ssl.SSLException: Write error: ssl=0x70f7a0: I/O error during system call, Broken pipe

06-12 15:54:30.586: E/root.js(1835): {Thread-329}STROPHE: request id 71.1 error 0 happened

06-12 15:54:30.606: W/System.err(1835):         at org.apache.harmony.xnet.provider.jsse.NativeCrypto.SSL_write(Native Method)

06-12 15:54:30.606: W/System.err(1835):         at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl$SSLOutputStream.write(OpenSSLSocketImpl.java:839)

06-12 15:54:30.606: W/System.err(1835):         at org.apache.http.impl.io.AbstractSessionOutputBuffer.flushBuffer(AbstractSessionOutputBuffer.java:87)

06-12 15:54:30.606: W/System.err(1835):         at org.apache.http.impl.io.AbstractSessionOutputBuffer.flush(AbstractSessionOutputBuffer.java:94)

06-12 15:54:30.606: W/System.err(1835):         at android.net.http.AndroidHttpClientConnection.doFlush(AndroidHttpClientConnection.java:286)

06-12 15:54:30.606: W/System.err(1835):         at android.net.http.AndroidHttpClientConnection.close(AndroidHttpClientConnection.java:232)

06-12 15:54:30.606: W/System.err(1835):         at android.net.http.HttpsConnection.closeConnection(HttpsConnection.java:388)

06-12 15:54:30.606: W/System.err(1835):         at android.net.http.Connection.httpFailure(Connection.java:450)

06-12 15:54:30.606: W/System.err(1835):         at android.net.http.Connection.processRequests(Connection.java:247)

06-12 15:54:30.606: W/System.err(1835):         at android.net.http.ConnectionThread.run(ConnectionThread.java:113)06-12 15:54:30.606: W/root.js(1835): {Thread-330}STROPHE: request errored, status: 0, number of errors: 1


(3) see strophe retries 2337229287 even though it got the response already.  


06-12 15:54:30.626: V/root.js(1835): {Thread-331}SENT: <body rid='2337229287' xmlns='http://jabber.org/protocol/httpbind' sid='9bb5e34e'><iq type='result' to=' ... blah blah blah ></iq></body>

06-12 15:54:30.646: E/root.js(1835): {Thread-332}STROPHE: request id 72.2 error 0 happened

06-12 15:54:30.656: W/root.js(1835): {Thread-333}STROPHE: request errored, status: 0, number of errors: 2

06-12 15:54:30.696: V/root.js(1835): {Thread-334}SENT: <body rid='2337229287' xmlns='http://jabber.org/protocol/httpbind' sid='9bb5e34e'><iq type='result' to=' ... blah blah blah ></iq></body>

06-12 15:54:30.706: D/GTalkService(215): [GTalkConnection.23] connectionClosed: connId=66048, error=CONNECTION FAILED

06-12 15:54:31.456: D/WifiStateTracker(103): Disabling interface

06-12 15:54:31.776: V/WifiStateTracker(103): New network state is DISCONNECTED

06-12 15:54:31.776: V/WifiStateTracker(103): Changing supplicant state: DORMANT ==> DORMANT

06-12 15:54:31.776: I/TelephonyRegistry(103): notifyDataConnection: state=0 isDataConnectivityPossible=false reason=apnSwitched interfaceName=null networkType=8

06-12 15:54:31.806: I/TelephonyRegistry(103): notifyDataConnection: state=1 isDataConnectivityPossible=true reason=apnSwitched interfaceName=null networkType=8

06-12 15:54:31.826: D/GTalkService(215): ##### Network broadcast (connected=false) type=WIFI, state=DISCONNECTED

06-12 15:54:31.846: I/ActivityManager(103): Start proc com.google.android.gm for broadcast com.google.android.gm/.downloadprovider.DownloadReceiver: pid=2214 uid=10015 gids={3003, 1015}

06-12 15:54:31.856: D/GTalkService(215): [GTalkConnection.1] setInternalNetworkState: type=1, state=DISCONNECTED

06-12 15:54:31.896: D/Tethering(103): MasterInitialState.processMessage what=3

06-12 15:54:31.906: I/wpa_supplicant(249): CTRL-EVENT-DRIVER-STATE STOPPED

06-12 15:54:31.906: V/WifiMonitor(103): Event [CTRL-EVENT-DRIVER-STATE STOPPED]

06-12 15:54:31.966: I/ActivityThread(2214): Pub gmail-ls: com.google.android.gm.provider.MailProvider

06-12 15:54:31.976: I/ActivityThread(2214): Pub com.google.android.gm.attachmentspreviews: com.google.android.gm.AttachmentPreviewProvider

06-12 15:54:31.976: I/ActivityThread(2214): Pub com.google.android.gmail.SuggestionProvider: com.google.android.gm.SuggestionsProvider

06-12 15:54:31.976: I/ActivityThread(2214): Pub gmail-downloads: com.google.android.gm.downloadprovider.DownloadProvider

06-12 15:54:31.996: I/ConnectivityHandler(1835): {main}updateConnectivityInfo ->type: WIFI, subType: state:DISCONNECTED

06-12 15:54:32.006: D/MonitorPlugin(1835): {main}ProcessMonitorEvent - eventID android.net.conn.CONNECTIVITY_CHANGEIntent mycompany.hb.action.MONITOR_EVENT

06-12 15:54:32.006: I/MonitorPlugin(1835): {main}Connectivity Action. Type:WIFI

06-12 15:54:32.016: I/MonitorPlugin(1835): {main}Connectivity Action. subType:

06-12 15:54:32.016: I/ActivityManager(103): No longer want com.google.android.apps.uploader (pid 1089): hidden #16

06-12 15:54:32.066: I/ActivityManager(103): Start proc com.google.android.apps.uploader for broadcast com.google.android.apps.uploader/.ConnectivityBroadcastReceiver: pid=2222 uid=10038 gids={3003}

06-12 15:54:32.156: I/ActivityThread(2222): Pub com.google.android.apps.uploader: com.google.android.apps.uploader.UploadsContentProvider

06-12 15:54:32.166: I/ActivityThread(2222): Pub com.google.photos.provider.Album: com.google.android.apps.uploader.clients.picasa.AlbumProvider

06-12 15:54:32.166: D/MediaUploader(2222): UploaderApplication.onCreate

06-12 15:54:32.176: I/MonitorPlugin(1835): {main}Connectivity Action. state:DISCONNECTED

06-12 15:54:32.196: D/MediaUploader(2222): nonWifiLimit=20971520, default=20971520

06-12 15:54:32.196: I/MediaUploader(2222): No need to wake up

06-12 15:54:32.196: I/ActivityManager(103): No longer want com.google.android.apps.googlevoice (pid 1741): hidden #16

06-12 15:54:32.546: I/console(1835): {Thread-335}=====>Got event: offline

06-12 15:54:32.706: D/dalvikvm(1835): GC_CONCURRENT freed 1187K, 51% free 4035K/8199K, external 2002K/2137K, paused 5ms+4ms

06-12 15:54:36.436: I/TelephonyRegistry(103): notifyDataConnection: state=2 isDataConnectivityPossible=true reason=apnSwitched interfaceName=rmnet0 networkType=8

06-12 15:54:36.466: W/NetworkStateTracker(103): net.tcp.buffersize.hsdpa not found in system properties. Using defaults

06-12 15:54:36.476: D/NetUtils(103): failed to add 10.12.43.161 as default route for rmnet0: File exists

06-12 15:54:36.476: D/GTalkService(215): ##### Network broadcast (connected=true) type=mobile, state=CONNECTED

06-12 15:54:36.486: D/GTalkService(215): [GTalkConnection.1] setInternalNetworkState: type=0, state=CONNECTED

06-12 15:54:36.486: D/Tethering(103): MasterInitialState.processMessage what=3

06-12 15:54:36.546: I/ConnectivityHandler(1835): {main}updateConnectivityInfo ->type: mobile, subType: HSDPAstate:CONNECTED

06-12 15:54:36.566: D/MonitorPlugin(1835): {main}ProcessMonitorEvent - eventID android.net.conn.CONNECTIVITY_CHANGEIntent actmycompany.hb.action.MONITOR_EVENT

06-12 15:54:36.566: I/MonitorPlugin(1835): {main}Connectivity Action. Type:mobile

06-12 15:54:36.576: I/MonitorPlugin(1835): {main}Connectivity Action. subType:HSDPA


(4) connection comes back


06-12 15:54:36.576: I/MonitorPlugin(1835): {main}Connectivity Action. state:CONNECTED

06-12 15:54:36.576: I/console(1835): {Thread-336}=====>Got event: online


(5) RID=x+1, totally different message now!


06-12 15:54:36.686: I/IQSender.js(1835): {Thread-337}***** new stuff *****

06-12 15:54:36.816: V/root.js(1835): {Thread-339}SENT: <body rid='2337229288' xmlns='http://jabber.org/protocol/httpbind' sid='9bb5e34e'><iq type='set' to='yeah yeah yah.....'</iq></body>

06-12 15:54:37.526: V/root.js(1835): {Thread-340}RECV: <body xmlns='http://jabber.org/protocol/httpbind'><iq xmlns='jabber:client' type='result' id='7420' .....'/></body>

(6) server throws exception.  

(7) Got us a BOSH terminate.

06-12 15:54:38.946: V/root.js(1835): {Thread-341}RECV: <body xmlns='http://jabber.org/protocol/httpbind' type='terminate' condition='item-not-found'/>

06-12 15:54:38.956: I/root.js(1835): {Thread-342}Strophe failed to connect [item-not-found]

06-12 15:54:38.956: I/root.js(1835): {Thread-343}Strophe is disconnecting: undefined

06-12 15:54:38.966: I/root.js(1835): {Thread-344}STROPHE: Disconnect was called because: undefined

06-12 15:54:38.976: I/root.js(1835): {Thread-345}STROPHE: _sendTerminate was called

06-12 15:54:38.976: V/root.js(1835): {Thread-346}SENT: <body rid='2337229289' xmlns='http://jabber.org/protocol/httpbind' sid='9bb5e34e' type='terminate'><presence xmlns='jabber:client' type='unavailable'/></body>

06-12 15:54:39.166: D/dalvikvm(1835): GC_CONCURRENT freed 1042K, 51% free 4079K/8199K, external 2002K/2137K, paused 4ms+3ms

06-12 15:54:39.366: W/root.js(1835): {Thread-347}STROPHE: request errored, status: 404, number of errors: 1

06-12 15:56:20.666: D/dalvikvm(103): GC_CONCURRENT freed 1183K, 42% free 5893K/10055K, external 3481K/4250K, paused 4ms+8ms

06-12 15:56:20.686: I/root.js(1835): {Thread-348}STROPHE: _onDisconnectTimeout was called

06-12 15:56:20.696: I/root.js(1835): {Thread-349}STROPHE: _doDisconnect was called

06-12 15:56:20.716: I/root.js(1835): {Thread-350}Strophe is disconnected.

06-12 15:58:11.726: I/power(103): *** set_screen_state 1




Jack Moffitt

unread,
Jun 13, 2012, 4:48:05 PM6/13/12
to str...@googlegroups.com
> 1. In the log below, the first RID (2337229287) got reply back, but why
> Strophe sends another one at time 06-12 15:54:30.626?  Strophe didn't error
> out.

The server sends the ack to acknowledge that it received the stanza
request. However, the response to this request is not received yet.
That ack is sent as the response to the previous held request.

So now we have a request sitting at the server, but at least we know
the server has seen it.

Now imagine the server generates a reply, but the reply is lost. How
will we deal with this?

In BOSH, if you send the same request again, you should get the same
answer. So if the reply never comes back, you send the same request to
try and get that reply.

See http://xmpp.org/extensions/xep-0124.html#rids-broken

So here the RID request died, as you can see from your log that
request 71 got error 0. In order not to miss any responses, we must
resend it in case Openfire had replied with some new data.
Unfortunately, in this case, openfire hasn't generated an answer yet,
and therefor nothing exists in the buffer for this RID. However, what
should happen is that they should notice this, and throw away the old
connection and start waiting on the new one. I believe this is what
Punjab and ejabberd do.

So openfire and Strophe's behaviour are both technically correct here,
but non-ideal. Strophe is doing what it should be doing in this
situation, since I have no information about whether or not Openfire
has generated a reply yet, so it can do nothing else but try and hope
for the best.

> 2. Why does Strophe send RID+1 message ahead of RID, even though RID hasn't
> come back yet?

This is completely normal. Why shouldn't it send RID+1? We can't wait
for RID to come back until we send new data, or the sending latency
would be very high. When RID+1 appears, Openfire should immediately
release the request for RID and hold on to this new one. This is basic
long polling.

I'm pretty sure the reason your connection terminates is the situation
described in #1.


This is probably worth adding to the BOSH spec, so that this situation
is properly handled. Also, I'm not sure why this caching behavior is
not a MUST. It pretty much destroys data reliability if you can't
count on getting cached responses when network errors occur.

jack.
Reply all
Reply to author
Forward
0 new messages