I am working on an app that uses TouchDB on iPad, with a continuous replication. I am trying to test the effect of losing and regaining connectivity in the iPad simulator. As you might know, the iPad simulator uses the same connection as the Mac, so I ran three primitive tests of connectivity loss and recovery. In each test, I saw abnormal behavior. If I am doing this wrong, can you please tell me how to test TouchDB's handling of connectivity loss during continuous replication? Here are the results of my three tests. Please note that I replaced sensitive information with [...] in the results.
#1 Turn DHCP off and back on. Sometimes replication works after the test, sometimes it doesn't
---------------------------
Result
----------------------------
1) Turned DHCP off
2) Received following message:
WARNING*** : TDPusher[[...] cloudant.com/ [...]]: Unable to save remote checkpoint: Error Domain=NSURLErrorDomain Code=-1009 "The Internet connection appears to be offline." UserInfo=0x19f6de90 {NSErrorFailingURLStringKey=[...] cloudant.com/ [...], NSErrorFailingURLKey=[...] cloudant.com/ [...], NSLocalizedDescription=The Internet connection appears to be offline., NSUnderlyingError=0x18af31a0 "The Internet connection appears to be offline."}
3) Turned DHCP on
4) REPLICATION WORKS!
5) Turned DHCP off
6) Turned DHCP on
7) Received following message:
[{"error":[-1003,"A server with the specified hostname could not be found."],"x_active_requests":[],"type":"Replication","continuous":true,"source":"[...]","task":"repl001","target":"[...] cloudant.com/ [...]","status":"Idle"},{"error":[-1003,"A server with the specified hostname could not be found."],"x_active_requests":[],"type":"Replication","continuous":true,"source":"rocktown_datahex","task":"repl002","target":"[...] cloudant.com/ [...]","status":"Idle"}]
#2 Plug/unplug ethernet cable
(Same results as #1)
#3 Use Little Snitch 3 software to block outgoing HTTP requests and unblock it
---------------------------
Result
----------------------------
3.1 Blocked the HTTP requests. TouchDB returns a "Stopped" status
[{"type":"Replication","source":[...] ,"status":"*Stopped*","error":[-1004,"Could not connect to the server."],"target": "[...] cloudant.com/ [...]":"repl001"},{"x_active_requests":[],"type":"Replication","continuous":true,"source":"[...]","task":"repl002","target": "[...] cloudant.com/ [...]","status":"*Idle*"}]
3.2 Unblocked the HTTP requests. The replication returns to an idle state when I unblock them (abnormal behavior IMO)
{{ "x_active_requests": [], "type": "Replication", "continuous": true, "source": [...], "task": "repl002", "target": "[...] cloudant.com/ [...]", "status": "*Idle*" }}
(NOTICE: it switched to *repl002*; does this mean it tried to re-init the local DB?)
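For anyone comparing runs, the `_active_tasks` output pasted above is easier to read once it is reduced to task, status, and error name. The following is a minimal Python sketch, not part of TouchDB: `summarize_tasks` and the sample payload (including the `localdb` source and the `example.invalid` target) are made up for illustration, and the field names are assumed to match the JSON in this message. The three codes are the standard NSURLErrorDomain constants.

```python
import json

# NSURLErrorDomain codes that show up in this thread.
NSURL_ERRORS = {
    -1003: "NSURLErrorCannotFindHost",
    -1004: "NSURLErrorCannotConnectToHost",
    -1009: "NSURLErrorNotConnectedToInternet",
}


def summarize_tasks(raw):
    """Return one (task, status, error_name) tuple per replication task."""
    rows = []
    for task in json.loads(raw):
        if task.get("type") != "Replication":
            continue
        error = task.get("error")  # [code, message] when present
        name = NSURL_ERRORS.get(error[0], str(error[0])) if error else None
        rows.append((task.get("task"), task.get("status"), name))
    return rows


# Hypothetical payload shaped like the output above (real values redacted).
SAMPLE = (
    '[{"error":[-1003,"A server with the specified hostname could not be found."],'
    '"x_active_requests":[],"type":"Replication","continuous":true,'
    '"source":"localdb","task":"repl001",'
    '"target":"https://example.invalid/db","status":"Idle"}]'
)

for row in summarize_tasks(SAMPLE):
    print(row)
```

Reading the summary this way makes it clear that test #1 ends with an "Idle" task that still carries a DNS error, while test #3 ends with an "Idle" task and no error at all.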
On Nov 9, 2012, at 8:41 AM, Paul K. Gedeon <paul.ged...@gmail.com> wrote:
> #1 Turn DHCP off and back on. Sometimes replication works after the test, sometimes it doesn't
> ---------------------------
> Result
> ----------------------------
> 1) Turned DHCP off
> 2) Received following message:
> WARNING*** : TDPusher[[...] cloudant.com/ [...]]: Unable to save remote checkpoint: Error Domain=NSURLErrorDomain Code=-1009 "The Internet connection appears to be offline." UserInfo=0x19f6de90 {NSErrorFailingURLStringKey=[...] cloudant.com/ [...], NSErrorFailingURLKey=[...] cloudant.com/ [...], NSLocalizedDescription=The Internet connection appears to be offline., NSUnderlyingError=0x18af31a0 "The Internet connection appears to be offline."}
That’s normal (so probably this message should be downgraded from a warning.)
> 6) Turned DHCP on
> 7) Received following message:
> [{"error":[-1003,"A server with the specified hostname could not be found."],"x_active_requests":[],"type":"Replication","continuous":true,"source":"[...]","task":"repl001","target":"[...] cloudant.com/ [...]","status":"Idle"},{"error":[-1003,"A server with the specified hostname could not be found."],"x_active_requests":[],"type":"Replication","continuous":true,"source":"rocktown_datahex","task":"repl002","target":"[...] cloudant.com/ [...]","status":"Idle"}]
This looks like the system notification that the host is reachable was sent too early, before DNS came back up. Seems like an OS issue, but if it happens often I should figure out a workaround. How commonly does this happen, and does it happen in more realistic situations? (What exactly did you do to turn off DHCP?)
> 3.2 Unblocked the HTTP requests. The replication returns to an idle state when I unblock them (abnormal behavior IMO)
What is it doing that’s abnormal?
> {{ "x_active_requests": [], "type": "Replication", "continuous": true, "source": [...], "task": "repl002", "target": "[...] cloudant.com/ [...]", "status": "Idle" }}
> (NOTICE: it switched to repl002; does this mean it tried to re-init the local DB?)
No, that’s just the identifier of the replication task. Pay no attention to it :)
By the way, there is a “Network Link Conditioner” system-prefs pane you can install that will let you simulate various network conditions. You can specify bandwidth, latency, packet loss, DNS delays, etc. I believe it’s downloadable from Apple’s developer website.
> [{"error":[-1003,"A server with the specified hostname could not be found."],"x_active_requests":[],"type":"Replication","continuous":true,"source":"[...]","task":"repl001","target":"[...] cloudant.com/ [...]","status":"Idle"},{"error":[-1003,"A server with the specified hostname could not be found."],"x_active_requests":[],"type":"Replication","continuous":true,"source":"rocktown_datahex","task":"repl002","target":"[...] cloudant.com/ [...]","status":"Idle"}]
> This looks like the system notification that the host is reachable was sent too early, before DNS came back up. Seems like an OS issue, but if it happens often I should figure out a workaround. How commonly does this happen, and does it happen in more realistic situations? (What exactly did you do to turn off DHCP?)
Network Preferences --> Ethernet ---> Configure IPv4 = OFF then
Network Preferences --> Ethernet ---> Configure IPv4 = Using DHCP
After those steps, I receive this error multiple times continuously.
> > 3.2 Unblocked the HTTP requests. The replication returns to an idle state when I unblock them (abnormal behavior IMO)
> What is it doing that’s abnormal?
Well, I keep receiving "Idle" when I query active_tasks and it doesn't replicate anymore. When I lose my connection, do I have to restart my continuous replication manually? Maybe it's a conceptual mistake on my part to assume that the replication should restart automatically. I got this impression because sometimes it does resume replicating by itself. I have to admit that I am not familiar at all with the replication lifecycle.
> > (NOTICE: it switched to repl002; does this mean it tried to re-init the local DB?)
> No, that’s just the identifier of the replication task. Pay no attention to it :)
Excellent, thank you. I asked this question because the only place I saw the session ID being modified is in initWithDB, but I didn't dig into the code much further than that.
> By the way, there is a “Network Link Conditioner” system-prefs pane you can install that will let you simulate various network conditions. You can specify bandwidth, latency, packet loss, DNS delays, etc. I believe it’s downloadable from Apple’s developer website.
Yeah, I read that on a forum, but I thought it was only usable through Xcode and I'm developing in MonoDevelop. Nevertheless, you are right, I didn't explore this avenue enough. Do you think it is the best way to simulate a replication connectivity loss for an app using TouchDB?
Thank you very much for the quick feedback, Jens, and congratulations on hitting 1.0. :-)
On Nov 9, 2012, at 10:26 AM, Paul K. Gedeon <paul.ged...@gmail.com> wrote:
> Network Preferences --> Ethernet ---> Configure IPv4 = OFF
> then
> Network Preferences --> Ethernet ---> Configure IPv4 = Using DHCP
I’m not entirely sure what that will do; it’s probably effectively the same as unplugging Ethernet. From TouchDB’s perspective, what matters is whether the SystemConfiguration framework posts host-unreachable/reachable notifications. (If you turn on ‘Sync’ logging in TouchDB it will log messages when it gets these notifications. See the wiki for instructions on logging.)
> After those steps, I receive this error multiple times continuously.
This would be worth filing a bug report on. I can also ask on Apple’s forums whether this sort of DNS hiccup should be happening.
> Well, I keep receiving "Idle" when I query active_tasks and it doesn't replicate anymore. When I lose my connection, do I have to restart my continuous replication manually?
Persistent replications will restart when the network connection comes back online; non-persistent ones won’t.
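To make the persistent/non-persistent distinction concrete, here is a hedged Python sketch of the two request bodies as CouchDB defines them: a one-off body for `POST /_replicate`, and a document stored in the `_replicator` database. It assumes TouchDB's persistent replications follow the same `_replicator` convention; the helper names, the document ID, and the URLs are hypothetical.

```python
import json


def replicate_request(source, target, continuous=True):
    """Body for POST /_replicate: a replication that is not stored anywhere."""
    return {"source": source, "target": target, "continuous": continuous}


def replicator_doc(doc_id, source, target, continuous=True):
    """Document for the _replicator database: stored, so it can be restarted."""
    return {"_id": doc_id, "source": source, "target": target,
            "continuous": continuous}


# Hypothetical names; the real source and target are redacted in this thread.
print(json.dumps(replicate_request("localdb", "https://example.invalid/db")))
print(json.dumps(replicator_doc("push-to-cloud", "localdb",
                                "https://example.invalid/db")))
```

The only structural difference is that the second form is a stored document with an `_id`, which is what lets the replicator find it again after a connectivity change.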
> Yeah I read that on a forum, but I thought it was only usable through XCode and I'm developing in MonoDevelop. Nevertheless, you are right, I didn't explore this avenue enough. Do you think it is the best way to simulate a replication connectivity loss for an app using TouchDB?
I haven’t tried the Link Conditioner, but it looks like it’s most useful for simulating poor network connections, not complete connection loss.
Alright, I will try Link Conditioner as soon as I can tackle this task again and I will keep you informed if I find anything else. Thanks again for your time!
I just thought about the DHCP issue a bit more. The DHCP reset changes the IP address of the Mac. My replication target (the Cloudant server) has a fixed IP. My replication source points to my local database and should not be affected by the IP address change, since it's localhost. Nevertheless, I figured that if there was a problem with the testing procedure, that would likely be it. Depending on how long I left DHCP down, I might or might not be assigned a new IP. I'd like to think that this difference explains why restarting the continuous replication after the downtime worked once and failed once. So, do you think that the IP change can affect the continuous replication and trigger the "error":[-1003,"A server with the specified hostname could not be found."] multiple times?
On Nov 9, 2012, at 12:22 PM, Paul K. Gedeon <paul.ged...@gmail.com> wrote:
> I just thought about the DHCP issue a bit more. The DHCP reset changes the IP address of the Mac. My replication target has a fixed IP e.g. Cloudant server. Now, my replication source should be pointing on my local database and should not be impacted by the IP address change since it's localhost. Nevertheless, I figured that if there was to be a problem with the testing procedure, that could likely be it.
I don’t think so. The local database isn’t even accessed over the network; the replicator just calls into the TouchDB database classes, which in turn call sqlite. So TouchDB really doesn’t care what the local IP address is.
> So, do you think that the IP change can have an impact on the continuous replication and trigger the "error":[-1003,"A server with the specified hostname could not be found."] multiple times?
Again, I don’t think so. That error is caused by a DNS lookup failure, which to me implies that the SystemConfiguration notification got sent too early, before the interface was actually up. Or maybe the DNS resolver has some hiccups in the instants after connectivity is restored.
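One possible app-level workaround for this kind of early "reachable" notification is to wait until the hostname actually resolves before restarting a replication. The Python sketch below only illustrates the retry-with-backoff idea; it is not how TouchDB behaves. `wait_for_dns`, its parameters, and the injectable `resolve`/`sleep` hooks are invented for this example.

```python
import socket
import time


def wait_for_dns(hostname, attempts=5, first_delay=0.5,
                 resolve=socket.getaddrinfo, sleep=time.sleep):
    """Return True once hostname resolves, retrying with exponential backoff."""
    delay = first_delay
    for attempt in range(attempts):
        try:
            resolve(hostname, 443)
            return True
        except socket.gaierror:
            # DNS is not back yet; wait longer each time, but not after the last try.
            if attempt < attempts - 1:
                sleep(delay)
                delay *= 2
    return False
```

A caller would run this once after the reachability change and restart the replication only if it returns True. The `resolve` and `sleep` parameters exist so the function can be exercised without touching the network.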
I made a new test, this time with the ipfw firewall, which does about the same as Network Link Conditioner. I really need to test complete connection loss, since my users can move in and out of their Wi-Fi zone. To do this, I am doing a 100% packet loss test. I am not sure if the results are normal, because I never receive the "Offline" status in the active tasks.
Test #1 --- 100% Packet Loss
1) Open app, start replicating. Everything is normal.
2) Activate 100% packet loss with ipfw.
3) Active tasks returns the Idle status until there is something new to replicate (no offline status yet, even though 100% packet loss is active)
4) When there are new changes to replicate, I receive the status *Processed N / N Progress 100* continuously until I deactivate the 100% packet loss. *No offline status sent*
5) When I deactivate the 100% packet loss, the X changes that needed to be replicated are now detected and the first status I receive is *Processed N + X / N Progress [...]*
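The status strings in steps 4 and 5 can be turned into numbers for logging. This is a small Python sketch that assumes the status text has the form "Processed N / M" as quoted above; `parse_progress` is a hypothetical helper, not a TouchDB API.

```python
import re

# Matches the "Processed N / M" part of a status string.
STATUS_RE = re.compile(r"Processed\s+(\d+)\s*/\s*(\d+)")


def parse_progress(status):
    """Return (processed, total) from a 'Processed N / M' status, else None."""
    match = STATUS_RE.search(status)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_progress("Processed 12 / 20"))
print(parse_progress("Idle"))
```

Statuses such as "Idle", "Offline", or "Stopped" carry no counts, so the helper returns None for them rather than guessing.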
So, I have a few questions concerning that:
1) Is it normal that I don't receive an offline status when there is a 100% packet loss?
2) Reading the FAQ, I noted that using HTTP to start the replication between TouchDB and Cloudant is not the standard procedure. The reason I use HTTP is that I am using a C# binding to use TouchDB with MonoTouch, and since the replication entities are in CouchCocoa, they are very hard to bind. I thought HTTP replication was more or less supported; do you think this could be a problem?
Thank you for your time, it is much appreciated.
Paul
Interesting development: I managed to call the replication methods on the CouchDatabase through the C# bindings and I have the same behavior under the 100% packet loss condition.
I've redone the DHCP test, but this time on a persistent continuous replication created via the CouchDatabase object instead of HTTP. I also simply turned on and off the WiFi instead of using DHCP. Now, I have noted three different behaviors:
1) Detects it is offline; active_tasks continuously contains the offline status. When it goes back online, active_tasks continuously contains the previous error message and replication doesn't restart. I suspect that it is the same DNS error you mentioned earlier.
[{"error":[-1003,"A server with the specified hostname could not be found."],"x_active_requests":[],"type":"Replication","continuous":true,"source":"[...]","task":"repl001","target":"[...] cloudant.com/ [...]","status":"Idle"},{"error":[-1003,"A server with the specified hostname could not be found."],"x_active_requests":[],"type":"Replication","continuous":true,"source":"rocktown_datahex","task":"repl002","target":"[...] cloudant.com/ [...]","status":"Idle"}]
2) Detects it is offline; active_tasks contains the following: "status":"Stopped","error":[-1009,"The Internet connection appears to be offline."]
When I turn on Wi-Fi, active_tasks is empty and replication doesn't restart.
3) Replication restarts normally.
I am starting to run out of ideas for simulating a real connectivity loss: maybe a proxy, turning off the company's router, or even a Faraday cage :-P
On Nov 14, 2012, at 7:39 AM, Paul K. Gedeon <paul.ged...@gmail.com> wrote:
> I made a new test, this time with the ipfw firewall, which does about the same as Network Link Conditioner. I really need to test complete connection loss, since my users can move in and out of their Wi-Fi zone. To do this, I am doing a 100% packet loss test. I am not sure if the results are normal, because I never receive the "Offline" status in the active tasks.
That’s not an accurate simulation, because you haven’t actually taken down the network interface.
What happens on a real device is that, if the WiFi or cellular connection is lost, the associated network interface goes down, and the SystemConfiguration framework sends out a notification that the host is no longer reachable, and TouchDB receives that and goes into offline mode.
What you’ve simulated is something more like unplugging the Ethernet on the upstream WiFi base station. The device can’t tell that anything’s happened; it’s just that it stops getting any incoming packets. Eventually it will figure out that it’s offline, but it’ll take quite some time before various layers of the network stack time out. (For a TCP connection it’s on the order of a minute if it’s actively trying and failing to send data; or more like 90 minutes if it’s idle.)
In short, if you want to simulate a device losing connectivity, the way to do it is to turn off the interfaces in the Network system pref, or unplug your Ethernet cable, or turn off WiFi from its system menu.
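Since the network stack can take a long time to notice silent packet loss, an app that needs a faster answer could run its own connection probe with an explicit deadline. The Python sketch below shows that idea only; it is an assumption about what an app might add, not something TouchDB does, and `can_connect` with its `timeout` default is hypothetical.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Try a fresh TCP connection with an explicit deadline.

    Returns False on refusal, unreachable network, DNS failure, or timeout,
    all of which Python reports as OSError subclasses.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Under 100% packet loss the connect attempt gets no reply, so the probe fails after `timeout` seconds instead of waiting for the much longer timeouts of the network stack.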
Ok, thank you very much for the detailed explanation! I will not use the packet loss test again then. Now I'm doing tests on the real device itself and logging the active_tasks status, going in and out of my Wi-Fi zone. I do not have enough data as of now to draw any conclusion. But the three different behaviors when I turn off and on the Wi-Fi that I reported in my last message are still very weird.
Here are the results of testing with the app installed on the real device. I did two kinds of tests: turning the Wi-Fi on and off *on the device* (my previous Wi-Fi tests were on the simulator) and moving in and out of my Wi-Fi area.
For turning the Wi-Fi on and off on the device, everything works perfectly. (Idle status to offline and back to idle; coming back online replicates all the docs created during the offline period.)
Result: Good replication, got the expected status every time.
Nevertheless, when I tried moving in and out of my Wi-Fi area with the device, things started to get messy. I found 2 particular patterns and another weird behavior, but I only saw it once so let's leave it aside for now.
First pattern: Idle status while online, no active_tasks while offline (no offline status!), back to idle status when online again ---> everything replicates perfectly when back online.
Result: Good replication, weird/no active_task status
Second pattern: Idle status while online, offline status while offline, no active_tasks status when back online ---> Replicates everything. (Once I noticed that it only replicated partially, which might be a testing error on my part.)
Result: Good replication, weird/no active_task status
On Nov 14, 2012, at 1:02 PM, Paul K. Gedeon <paul.ged...@gmail.com> wrote:
> First pattern: Idle status while online, no active_tasks while offline (no offline status!), back to idle status when online again ---> everything replicates perfectly when back online.
> Result: Good replication, weird/no active_task status
When a TDReplicator goes into offline mode, it mostly just cancels any pending HTTP transactions; it doesn’t explicitly stop itself. So if it’s a continuous replication, it should still show up in _active_tasks. Running with ‘Sync’ logging mode turned on would help figure out what’s actually going on.
In any case, it sounds like the actual behavior is ok, i.e. replication resumes when you come back online, right?
Yes, the replication on the device seems to be working fine. I noticed one instance where it might have been problematic, but it's a single case and it might have been a mistake, so it's inconclusive. I think the reason it wasn't working on the simulator is the DNS problem you mentioned. Polling active_tasks every 100 ms might also have been a bit too often.
Basically, I need to track the progress of big replications when we come back online after a long time in offline mode. I think what I will do is watch the CouchDatabase Mode property and, when I see it's active, try to pull the progress from active_tasks. I might also turn on Sync logging mode, just out of curiosity, to understand why nothing shows up in active_tasks at some point.
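For anyone who wants to inspect the active_tasks output programmatically, here is a minimal sketch in Python of how the response could be summarized. It is not TouchDB code: the field names ("type", "task", "status", "continuous", "error") are taken from the output posted earlier in this thread, while the helper names `summarize_replications` and `missing_tasks` and the sample payload are made up for illustration. Fetching the JSON and choosing a polling interval are left out.

```python
import json

# Sample payload shaped like the active_tasks output posted earlier in this
# thread (source/target omitted); the values are illustrative only.
SAMPLE = """[
  {"error": [-1003, "A server with the specified hostname could not be found."],
   "x_active_requests": [], "type": "Replication", "continuous": true,
   "task": "repl001", "status": "Idle"}
]"""


def summarize_replications(active_tasks_json):
    """Return one summary dict per replication entry in the response."""
    summaries = []
    for task in json.loads(active_tasks_json):
        if task.get("type") != "Replication":
            continue
        # In the posted output, "error" is a [code, message] pair when present.
        error = task.get("error")
        summaries.append({
            "task": task.get("task"),
            "status": task.get("status"),
            "continuous": task.get("continuous", False),
            "error_code": error[0] if error else None,
            "error_message": error[1] if error else None,
        })
    return summaries


def missing_tasks(expected_ids, summaries):
    """Return the expected task IDs that are absent from the response."""
    seen = {s["task"] for s in summaries}
    return sorted(set(expected_ids) - seen)


if __name__ == "__main__":
    summaries = summarize_replications(SAMPLE)
    for s in summaries:
        print(s["task"], s["status"], s["error_code"])
    # A replication that disappears from the list (the "no active_tasks while
    # offline" pattern) shows up here instead of being silently ignored.
    print("missing:", missing_tasks(["repl001", "repl002"], summaries))
```

Comparing the response against the task IDs you expect makes the "nothing in active_tasks" case visible as data rather than as an empty log line, which may help when correlating it with the Sync log.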
On Wednesday, 14 November 2012 16:16:42 UTC-5, Jens Alfke wrote:
> On Nov 14, 2012, at 1:02 PM, Paul K. Gedeon <paul....@gmail.com> wrote:
> First pattern: Idle status while online, no active_tasks while offline (no offline status!), back to idle status when online again ---> everything replicates perfectly when back online.
> Result: Good replication, weird/no active_task status
> When a TDReplicator goes into offline mode, it mostly just cancels any pending HTTP transactions; it doesn’t explicitly stop itself. So if it’s a continuous replication, it should still show up in _active_tasks. Running with ‘Sync’ logging mode turned on would help figure out what’s actually going on.
> In any case, it sounds like the actual behavior is ok, i.e. replication resumes when you come back online, right?
If anyone else ever wonders, the best solution we came up with is to use the Mac's Internet Sharing feature: connect the device to the Mac's shared WiFi, then simply turn Internet Sharing off whenever we want to fake a connection loss on the device.
On Wednesday, 14 November 2012 16:27:37 UTC-5, Paul K. Gedeon wrote:
> First of all, thank you again for your patience.
> Yes, the replication on the device seems to be working fine. I noticed once where it might have been problematic, but it's a singular case and it might be a mistake, so it's inconclusive. I think the reason why it wasn't working on the simulator is because of the DNS problem you mentioned, polling active_tasks every 100 ms might have been a bit too often too.
> Basically I need to track the progress of big replications when we come back online after a long time in offline mode. I think what I will do is watch the CouchDatabase Mode property and when I see it's active, I will try to pull the progress from active_tasks. I might try turning on Sync logging mode just by curiosity so I understand why nothing is showing in active_tasks at some point.
> Thank you,
> Paul