timeouts in http request - off topic testing request


Simon H

May 24, 2017, 6:09:44 AM
to Node-RED
Hi all,

I spent most of yesterday diagnosing timeouts in the http-request node and tracked them down to packet loss as packets leave my ISP's network,
i.e. if I request against one of THEIR servers, all is OK, but if I request against other internet servers, I get far too many failures for comfort.
(I tested with wget to exclude NR, node.js or any libraries as the cause, and from Linux as well as Windows.)

So I was wondering what level of packet loss (showing up as http request timeouts) is generally considered acceptable.

The below flow repeatedly requests http://www.example.com, and once per second emits a debug of the count so far.
If the http-request node gets a connection timeout (or other error), the flow will stop and emit the msg from the http request node.

It would be interesting for me to hear the average of a few runs in terms of the maximum count achieved, and the ISP you are on.
It would be good to see results from some corporate LANs too.

My ISP is PlusNet, and I've raised a question in their 'fibre broadband' forum.  'http://www.plus.net/robots.txt' does not seem to fail at all for me.... whereas www.bt.com, www.facebook.com and the above all fail fairly quickly (e.g. 75-200 iterations, sometimes a lot less).

If you get a chance to quickly test, please post back here with:
     the count where it stops, 
     your ISP, 
     the url you tested against, 
     and whether the site was considered to be within your ISP's network or outside of it.

thanks,

Simon

p.s. if you want to STOP the flow, disconnect a wire and deploy.  I'm not sure if deploy SHOULD stop such a flow, but it stops only sometimes for me (0.17 branch) (maybe a bug in NR http request node close?).

p.p.s. you may want to run against a different (random) server rather than the one in the flow; could soliciting people to request hundreds of times from multiple places be classed as inciting DoS?





[
    {
        "id": "3b7d8b80.234a14",
        "type": "http request",
        "z": "47b3725c.2a535c",
        "name": "",
        "method": "GET",
        "ret": "txt",
        "url": "http://www.example.com",
        "tls": "",
        "x": 450,
        "y": 100,
        "wires": [
            [
                "925c7db3.93ba4"
            ]
        ]
    },
    {
        "id": "4b8ff552.64e4ec",
        "type": "inject",
        "z": "47b3725c.2a535c",
        "name": "",
        "topic": "",
        "payload": "",
        "payloadType": "date",
        "repeat": "",
        "crontab": "",
        "once": false,
        "x": 120,
        "y": 60,
        "wires": [
            [
                "ef4ad503.62f3d8"
            ]
        ]
    },
    {
        "id": "925c7db3.93ba4",
        "type": "function",
        "z": "47b3725c.2a535c",
        "name": "",
        "func": "context.count = context.count || 0;\n\nif (msg.cmd === 'init'){\n    context.count = 0;\n}\n\nif (msg.statusCode !== 200){\n    node.send([null, msg]);\n    return;\n}\n\ncontext.count++;\n\nvar newmsg = {payload:'trigger'};\n\nsetTimeout(function(){\n    node.send([newmsg, {payload:'success ' + context.count}]);\n}, 0);\n\nreturn;",
        "outputs": "2",
        "noerr": 0,
        "x": 423.9999694824219,
        "y": 225.8203067779541,
        "wires": [
            [
                "3b7d8b80.234a14"
            ],
            [
                "2de38359.c5101c"
            ]
        ]
    },
    {
        "id": "e38278e.9cfe988",
        "type": "debug",
        "z": "47b3725c.2a535c",
        "name": "",
        "active": true,
        "console": "false",
        "complete": "false",
        "x": 870,
        "y": 240,
        "wires": []
    },
    {
        "id": "ef4ad503.62f3d8",
        "type": "change",
        "z": "47b3725c.2a535c",
        "name": "",
        "rules": [
            {
                "t": "set",
                "p": "cmd",
                "pt": "msg",
                "to": "init",
                "tot": "str"
            }
        ],
        "action": "",
        "property": "",
        "from": "",
        "to": "",
        "reg": false,
        "x": 190,
        "y": 120,
        "wires": [
            [
                "3b7d8b80.234a14"
            ]
        ]
    },
    {
        "id": "2de38359.c5101c",
        "type": "delay",
        "z": "47b3725c.2a535c",
        "name": "",
        "pauseType": "rate",
        "timeout": "5",
        "timeoutUnits": "seconds",
        "rate": "1",
        "nbRateUnits": "1",
        "rateUnits": "second",
        "randomFirst": "1",
        "randomLast": "5",
        "randomUnits": "seconds",
        "drop": true,
        "x": 610,
        "y": 240,
        "wires": [
            [
                "e38278e.9cfe988"
            ]
        ]
    }
]
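
For readability, the logic inside the function node of the flow above can be written out unescaped roughly as follows. This is only a sketch: the wrapper name `countOrStop` is made up for illustration, and `msg`, `context` and `node` are passed in as parameters here, whereas inside Node-RED the runtime provides them.

```javascript
// Mirrors the body of the function node in the flow above.
// countOrStop is a hypothetical wrapper; in Node-RED the body runs directly
// with msg, context and node supplied by the runtime.
function countOrStop(msg, context, node) {
    context.count = context.count || 0;

    // The change node sets msg.cmd to 'init' on the pass started by the inject node.
    if (msg.cmd === 'init') {
        context.count = 0;
    }

    // Anything other than a 200 ends the loop: forward the message on
    // output 2 only, so no further request is triggered.
    if (msg.statusCode !== 200) {
        node.send([null, msg]);
        return;
    }

    context.count++;

    var newmsg = { payload: 'trigger' };

    // Send asynchronously: output 1 triggers the next request,
    // output 2 reports the running count to the rate-limited debug node.
    setTimeout(function () {
        node.send([newmsg, { payload: 'success ' + context.count }]);
    }, 0);
}
```

The count only resets when a message carrying `cmd: 'init'` arrives, which is why the inject node is routed through the change node first.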

Julian Knight

May 24, 2017, 8:38:42 AM
to Node-RED
As you are UK based, I normally use bbc.co.uk as a baseline for external performance tests since all ISPs can reasonably be expected to have good links to it.

The acceptable packet loss from a home broadband connection to somewhere like bbc.co.uk is ZERO. Well, OK that is putting it a little strongly.

As you are not getting packet loss to PlusNet's services, it is not unreasonable to assume that this is a problem on their end. I would continue to record packet loss to a number of well-known servers and send them an email at least once a day until they get fed up and respond! I've always found Twitter is also great for getting organisations to listen since your complaints there are public.

I would also look for excessive response times. Just a quick test from a corporate Wi-Fi that traverses a large private WAN before getting to the Internet gives me timings of 9-14ms pretty consistently for bbc.co.uk even during the normally busy lunch period. Out of 117 pings, I got 6 timeouts which is interesting and a little more than I'd expected. Can't really do packet traces here so I can't really see if that is due to packet loss.

To directly answer your question about acceptable levels: under 1-2% is considered acceptable for most purposes. By around 5% or so, you will certainly start to have problems with real-time services such as video streaming, VoIP, etc.

Consistent loss rates of 2%+ would be considered an indication of a network problem.
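
As a rough illustration of those thresholds, the ping figures above (6 timeouts out of 117) can be turned into a percentage and bucketed. The function names and the exact cut-off points are assumptions made for this sketch, and a ping timeout is not necessarily a lost packet, so treat the result as indicative only.

```javascript
// Hypothetical helpers; the cut-offs (2% and 5%) follow the rough figures
// quoted above and are not an official standard.
function lossPercent(sent, lost) {
    if (sent <= 0) {
        throw new RangeError('sent must be positive');
    }
    return (lost / sent) * 100;
}

function classifyLoss(percent) {
    if (percent < 2) {
        return 'acceptable';
    }
    if (percent < 5) {
        return 'possible network problem';
    }
    return 'likely to affect real-time services';
}

// 6 timeouts out of 117 pings, as in the corporate Wi-Fi test above.
var pct = lossPercent(117, 6);
console.log(pct.toFixed(1) + '% -> ' + classifyLoss(pct));
```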

Interestingly, running your flow here at work failed after just 4 then 6 then zero then 28 successful runs.

I tried again on some other sites. bbc.co.uk & www.bt.com failed to ever respond - I think because they need https.
When trying to use https, they generally fail because they do clever redirects and the request node can't cope with the certificate. Eventually managed to get www.bt.com to work but it was very slow, got fed up after 10 successful runs.

I'm having problems with the request node and wildcard certs so not sure what is happening.

Simon H

May 24, 2017, 12:08:40 PM
to Node-RED
Thanks Julian, 
some good info there...

If we are to expect some packet loss, maybe the http-request node should have an auto-retry option on timeout?
Most of the time, I'm seeing failure to connect in NR, but wireshark reveals a whole story of packets out of order, etc. which will have been recovered from within the TCP layer (except in the rarer response timeouts that I do also see).
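
To make the auto-retry idea concrete, here is a minimal sketch in plain Node.js. It is not an existing option of the http-request node; `withRetry` and `getOnce` are hypothetical names, and the attempt count, delay and timeout values are arbitrary assumptions.

```javascript
var http = require('http');

// Retry any Promise-returning function up to maxAttempts times,
// waiting delayMs between attempts. Rejects with the last error.
function withRetry(attempt, maxAttempts, delayMs) {
    return new Promise(function (resolve, reject) {
        function tryOnce(n) {
            attempt().then(resolve, function (err) {
                if (n >= maxAttempts) {
                    reject(err);
                    return;
                }
                setTimeout(function () { tryOnce(n + 1); }, delayMs);
            });
        }
        tryOnce(1);
    });
}

// One GET with an idle timeout; resolves with the HTTP status code.
function getOnce(url, timeoutMs) {
    return new Promise(function (resolve, reject) {
        var req = http.get(url, function (res) {
            res.resume(); // discard the body, only the status matters here
            resolve(res.statusCode);
        });
        req.setTimeout(timeoutMs, function () {
            req.destroy(new Error('request timed out'));
        });
        req.on('error', reject);
    });
}

// Example use (not executed here):
// withRetry(function () { return getOnce('http://www.example.com', 5000); }, 3, 500)
//     .then(function (status) { console.log('status', status); })
//     .catch(function (err) { console.log('gave up:', err.message); });
```

Retrying is only safe for requests that can be repeated without side effects, such as the GET used in the test flow.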

bbc.co.uk also fails for me....
Regarding the http-request node, it's based on some fairly stable stuff in the background (I spent a whole day following it down to its roots, suspecting node before I reproduced with wget!).
I don't see any noticeable difference in failures between http and https; I went for http because it's easier to follow in Wireshark.
I've done a lot of pinging today, and apart from some sites which will only respond if you have not pinged in a few seconds, and some sites which are not responsive to ICMP (BT: you can't even tracert to it), I always get good response times and no failures.  But then a ping is one packet... and they probably test their networks with ICMP.  The failure to make a TCP connection is the most worrying thing.

s

Julian Knight

May 25, 2017, 6:03:07 AM
to Node-RED
It seems there is an issue somewhere because wildcard certs are generating errors for me on the http request node - this clearly isn't happening for everyone.

I've not had a chance to track down what is causing it. It might be a network thing as I'm on a somewhat odd Wi-Fi at work. Or it might be something to do with being on Windows.

Simon H

May 25, 2017, 6:19:55 AM
to Node-RED
@Julian, have you got a specific website which has wildcard certs?
I've got 0.17 on Windows (on my Plusnet connection) and 0.16.2 on Linux (on a BT connection), either of which I could test from, if the errors show up in debug?

In terms of my testing, the machine on a BT fibre-to-home connection does not show the same evidence of packet loss - i.e. I've not had any failures in NR, so any failures at the network level are recovered (don't have wireshark there to check...).
My Plusnet connection is still poor (FTTC).
Plus, if I target MY webserver from the BT location, I get similar failures; other webservers, no failures.



Julian Knight

May 25, 2017, 12:00:33 PM
to Node-RED
Hi, the BBC and the BT sites both seem to use them. My own sites also report as wildcard certs as they are fronted by CloudFlare: https://it.knightnet.org.uk & https://www.totallyinformation.com

Julian Knight

May 26, 2017, 4:11:17 AM
to Node-RED
Just gone back to test at home. On Windows anything with a wildcard cert fails, works fine on the Pi. I will raise an issue. May be an upstream issue.

Incidentally, running against example.com at home gives me no errors at all, already over 1200 loops.

Simon H

May 26, 2017, 6:04:57 AM
to Node-RED
for:
https://www.totallyinformation.com/robots.txt
I got to 32 before failure.  Raised an issue with plusnet by phone, expecting 2-3 day response.
But no failure based on certs.....
I'm running NR 0.17 from github from about 2 weeks ago? Windows 10 x64, Node 6.9.1 x64


Julian Knight

May 28, 2017, 3:15:13 PM
to Node-RED
Oops! I accidentally left it running and got to 1,197,556 iterations before I cancelled it!

Simon H

May 30, 2017, 3:01:39 PM
to Node-RED
Just for the record, I think my issues are down to MTU.
I *think* that the MTU (or possibly MSS) is changing on Plusnet's network (i.e. traffic takes different routes, and TCP does not cope with the MTU changing from packet to packet).

Having said that, I made some changes in OpenWRT (LEDE), and it's now working; but I don't know if they have also made changes :(.
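
For anyone unfamiliar with how MTU and MSS relate, a small worked sketch: for TCP over IPv4 without header options, the MSS is the MTU minus 40 bytes. The function names are hypothetical, and the 1492 figure is the usual PPPoE MTU used only as an example; whether it matches this particular connection is an assumption.

```javascript
// IPv4 header (20 bytes) plus TCP header (20 bytes), both without options.
var IPV4_TCP_OVERHEAD = 40;

// Largest TCP payload that fits in one packet on a link with the given MTU.
function mssForMtu(mtu) {
    return mtu - IPV4_TCP_OVERHEAD;
}

// The smallest MTU along a path limits what gets through unfragmented.
function pathMss(hopMtus) {
    return mssForMtu(Math.min.apply(null, hopMtus));
}

console.log(mssForMtu(1500));             // plain Ethernet: 1460
console.log(mssForMtu(1492));             // PPPoE (8 bytes of overhead): 1452
console.log(pathMss([1500, 1492, 1500])); // limited by the PPPoE hop: 1452
```

If different routes have different smallest MTUs, a segment size that fits one route may be too large for another, which is consistent with the intermittent failures described above.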

S
