Cause of ETIMEDOUT?

17,080 views
Skip to first unread message

Felix E. Klee

unread,
Dec 22, 2012, 3:42:29 PM12/22/12
to redi...@googlegroups.com
In the logs of a Node.js app deployed to Nodejitsu, I frequently (once
every couple of hours or so) get error messages such as:

Redis connection to 165.225.131.162:10657 failed - read ETIMEDOUT

What could be the cause of these error messages?

I am asking because, for testing, I ran the app locally and shut down
the Redis server, and the error message is totally different:

[Error: Redis connection to 192.168.233.130:6379 failed - connect
ECONNREFUSED]

Marc Gravell

unread,
Dec 22, 2012, 3:55:21 PM12/22/12
to redi...@googlegroups.com
The following is just "in general", rather than nodejitsu...

I would normally expect "refused" to mean: it could talk to the other machine, but the redis-server software was not listening on the port (or was blocked by firewall), where-as a connect timeout would usually (from what I see, at least) mean: it couldn't get hold of the other machine *at all* (especially if accessing it by name instead of IP).

Marc



--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To post to this group, send email to redi...@googlegroups.com.
To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.




--
Regards,

Marc

Felix E. Klee

unread,
Dec 22, 2012, 5:40:44 PM12/22/12
to redi...@googlegroups.com
Marc Gravell wrote:
> a connect timeout would usually (from what I see, at least) mean: it
> couldn't get hold of the other machine *at all* (especially if
> accessing it by name instead of IP).

Interesting, but when the machine is not reachable, then the error is
neither `ETIMEDOUT` nor `ECONNREFUSED`. It is `EHOSTUNREACH`:

[Error: Redis connection to 165.225.131.162:10657 failed - connect
EHOSTUNREACH]

(from a local test where I switched of wifi)

So the question stands: What's the possible cause for `ETIMEDOUT` here?

Greg Andrews

unread,
Dec 22, 2012, 5:58:41 PM12/22/12
to redi...@googlegroups.com
The 'E' is for Error.  The rest of the error name is 'TIMED OUT'.  Your Node.js application has a timeout defined for when it sends a command to Redis.  It must receive a reply from Redis within the defined time period, or it gives up on the command/response and logs the error you see.  The app might also take other actions like breaking that connection to the Redis server and trying to open another.

There is a wide range of things that could cause your application to think Redis was too slow in returning the response.  Anything from your app's timeout being too short, through intermittent network troubles between your Node.js machine and the Redis machine, or Redis is configured to save db snapshots to a slow disk, all the way to something is sending Redis a command that takes a long time to execute, so all the other clients are made to wait.

There isn't just one cause of slow responses from Redis, so the tiny amount of info you gave us doesn't point to the cause of your error messages.  The most recent versions of Redis have a "slow log" file where long-running commands will be written.  That would be one way to find out if these errors are caused by a long-running command.  Testing the network or verifying that snapshot disk writes are happening at ETIMEDOUT times will require you to do further checking and testing.  If you post he timeout in your app or Node.js config, folks here on the list can comment on whether ot not it might be too short.

  -Greg

Matthew Palmer

unread,
Dec 22, 2012, 6:05:31 PM12/22/12
to redi...@googlegroups.com
On Sat, Dec 22, 2012 at 11:40:44PM +0100, Felix E. Klee wrote:
> Marc Gravell wrote:
> > a connect timeout would usually (from what I see, at least) mean: it
> > couldn't get hold of the other machine *at all* (especially if
> > accessing it by name instead of IP).
>
> Interesting, but when the machine is not reachable, then the error is
> neither `ETIMEDOUT` nor `ECONNREFUSED`. It is `EHOSTUNREACH`:
>
> [Error: Redis connection to 165.225.131.162:10657 failed - connect
> EHOSTUNREACH]
>
> (from a local test where I switched of wifi)

That's because your ARP failed and the local machine was able to detect that
and return a different error code. Try hitting an IP address out of your
local network and see how often you get EHOSTUNREACH instead of ETIMEDOUT.

> So the question stands: What's the possible cause for `ETIMEDOUT` here?

The two most likely causes of a connection timeout are: 1) the machine is
down; 2) firewalling along the way is dropping your packets (rather than
rejecting them, which is the polite-but-uncommon way to firewall off traffic
you don't want to deal with). Do a tcpdump at each end of the connection
and I'll bet the answer will become fairly obvious, both what the cause is
and where the firewall is that's doing the blocking (if it's a firewall
problem).

- Matt

--
Just because we work at a University doesn't mean we're surrounded by smart
people.
-- Brian Kantor, in the monastery

Felix E. Klee

unread,
Dec 22, 2012, 7:15:33 PM12/22/12
to redi...@googlegroups.com
On Sat, Dec 22, 2012 at 11:58 PM, Greg Andrews <hva...@gmail.com>
wrote:
> The 'E' is for Error. The rest of the error name is 'TIMED OUT'.

Yes, I've discovered that after staring at the string.

> Your Node.js application has a timeout defined

No timeout explicitly defined. I'm using `node_redis` to connect. In
fact, looking at the code and at the `node_redis` documentation, I don't
even see the possibility to set a timeout.

I would like to lower the timeout to trigger the error on my development
system.

> or it gives up on the command/response and logs the error you see.

The weird thing is that the error somehow bypasses the normal error
handling. Normally, when the app listens to the `node_redis` `error`
event, then the connection is reopened. In the case of `ETIMEDOUT`, the
`error` event is somehow not emitted, and there is no attempt to
reconnect. Relevant code from `node_redis` (possibly the first `return`
is hit):

var message = "Redis connection to " + this.host + ":" + this.port
+ " failed - " + msg;

if (this.closing) {
return;
}

if (exports.debug_mode) {
console.warn(message);
}

this.flush_and_error(message);

this.connected = false;
this.ready = false;

this.emit("error", new Error(message));
// "error" events get turned into exceptions if they aren't
listened for. If the user handled this error
// then we should try to reconnect.
this.connection_gone("error");

> There isn't just one cause of slow responses from Redis, so the tiny
> amount of info you gave us doesn't point to the cause of your error
> messages.

According to OpenRedis, the machine I am on is underutilized. In fact, I
was the first user of OpenRedis on Joyent (they set up the server
specifically on my request). Output of `INFO` is attached below.

> The most recent versions of Redis have a "slow log" file where
> long-running commands will be written.

That log is empty, as expected: My app is extremely light weight on the
database. There are only a couple of bytes written once every couple of
minutes or so. Also, while there are LUA scripts, they are not long
running.

Greg, thanks a lot for the detailed reply!

Finally, the output of `INFO`:

# Server
redis_version:2.6.6
redis_git_sha1:ed709555
redis_git_dirty:0
redis_mode:standalone
os:Linux 3.6.5-joyent-ubuntu-12-opt x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.6.3
process_id:26309
run_id:226b4a8911fd1608b1f4f80ef51d64519bbaa6e8
tcp_port:10657
uptime_in_seconds:1905326
uptime_in_days:22
lru_clock:1404348

# Clients
connected_clients:3
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

# Memory
used_memory:462968
used_memory_human:452.12K
used_memory_rss:2367488
used_memory_peak:4645600
used_memory_peak_human:4.43M
used_memory_lua:35840
mem_fragmentation_ratio:5.11
mem_allocator:jemalloc-3.2.0

# Persistence
loading:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1356217515
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:0
rdb_current_bgsave_time_sec:-1
aof_enabled:1
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:0
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_current_size:328883
aof_base_size:18871
aof_pending_rewrite:0
aof_buffer_length:0
aof_rewrite_buffer_length:0
aof_pending_bio_fsync:0
aof_delayed_fsync:0

# Stats
total_connections_received:85583
total_commands_processed:1733254
instantaneous_ops_per_sec:0
rejected_connections:0
expired_keys:0
evicted_keys:0
keyspace_hits:1480046
keyspace_misses:111723
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:277

# Replication
role:master
connected_slaves:0

# CPU
used_cpu_sys:1973.72
used_cpu_user:2489.85
used_cpu_sys_children:0.00
used_cpu_user_children:0.00

# Keyspace
db0:keys=18,expires=0

Felix E. Klee

unread,
Dec 23, 2012, 7:59:39 AM12/23/12
to redi...@googlegroups.com
Now I let the app run locally, connected with the production database
on OpenRedis. After some time, I got:

Redis connection gone from close event.

Weird.

Felix E. Klee

unread,
Dec 23, 2012, 5:09:45 PM12/23/12
to redi...@googlegroups.com
Further tests showed that the when the Nodejitsu logs show `Redis
connection to 165.225.131.162:10657 failed - read ETIMEDOUT`, then that
time out does not happen in the locally running instance, connected to
the same database!

First thought: The issue is related to Nodejitsu.

In any case, what I need is a way to reliably reconnect the client in
case the connection was dropped. Ideally, I would be able to reproduce
the problem locally, but so far I haven't been successful with that.

Felix E. Klee

unread,
Dec 25, 2012, 5:18:58 PM12/25/12
to redi...@googlegroups.com
Latest findings:

* When running the app on a different server (Amazon EC2 in Ireland),
then the problem does not seem to appear, and database access is
quick.

* Close inspection of the log suggests that the timeout happens more
than five minutes after a request. Sometimes, it doesn't happen, but
database access in general is slow.

So, it looks like this is an issue with Nodejitsu, which is weird
especially because the database server and the Nodejitsu server should
be in the same data center.

Bryce Baril

unread,
Dec 27, 2012, 11:56:42 AM12/27/12
to redi...@googlegroups.com
When you get this is it after a long gap in activity for your app?

I don't get these frequently, but I do have an app running on EC2 that gets them pathologically every few days. Given the infrequency and difficulty to reproduce for debug purposes I haven't looked into it much yet, but it sounds like it could be related. My gut feeling is there could be a bug in node-redis' keep-alive logic that loses accuracy of the server connection status.

In my case they are all running in the same EC2 region/availability zone as the Redis server, and are the least-active clients, with often days between Redis calls. I've yet to see anything similar on my other much more active node/redis jobs.


Felix E. Klee

unread,
Dec 28, 2012, 7:16:49 AM12/28/12
to redi...@googlegroups.com
On Thu, Dec 27, 2012 at 5:56 PM, Bryce Baril <vati...@gmail.com> wrote:
> When you get this is it after a long gap in activity for your app?

Looking at the logs, this could indeed be the case, where "long" would be
"some minutes".

> I don't get these frequently, but I do have an app running on EC2 that
> gets them pathologically every few days.

Thing is, I only notice the timeouts on Nodejitsu (Joyent). With my app
hosted on EC2, communication with Redis is fast.

Note that the EC2 and the Redis server are on different continents,
while the Nodejitsu and the Redis server are likely in the same data
center!

Felix E. Klee

unread,
Apr 14, 2013, 1:25:01 PM4/14/13
to redi...@googlegroups.com, Jacob Friis Saxberg
On Sat, Apr 6, 2013 at 10:43 AM, Jacob Friis Saxberg <ja...@webcom.dk> wrote:
> Did you find a solution?

I moved my app to Amazon's EC2, and performance and stability is quite good:

<http://sq.rotogame.com>

Redis is provided by OpenRedis. Hosting Redis on EC2 as well should
give a considerable performance boost.

Still, I prefer hosted solutions. So, perhaps one day I will try
Nodejitsu again, with IrisCouch Redis, which currently is the official
Redis provider for Nodejistu:

<https://www.nodejitsu.com/paas/addons/>

Oh, I just noticed that the hiscores tables in the game (link above)
are reset to an older version. It almost looks like OpenRedis restored
an old backup. So, my statement above about stability should be
revised.

Clark Endrizzi

unread,
Aug 28, 2013, 1:50:20 AM8/28/13
to redi...@googlegroups.com, Jacob Friis Saxberg
Felix.

I am seeing this potentially too. I will monitor this closely but just signed up with openredis after the irisredis had pretty constant problems (not the same ones). I've so far seen this once (several hours later). Maybe this is acceptable but it doesn't seem like it should be to me.

I really hate to have to deal with supporting my own infrastructure with the likes of EC2 so I really hope I can get this solution working well (you would think this stuff, at least node.js, would work best on Joyent). I also happen to like how nodejitsu works (but I've seen my share of small problems here and there).

Also I have a biz account so my nodejitsu instance is dedicated (and even using the higher 512mb ram).

We will see :(.

Vlad Vega

unread,
Jan 31, 2014, 1:01:22 PM1/31/14
to redi...@googlegroups.com, Jacob Friis Saxberg
Hi,

Has anyone been able to solve this issue at all? I'm still getting ETIMEOUT every 1.5 hours when using nodejitsu + openredis on joyent use-east-1.

Vlad

Tom M

unread,
Feb 11, 2014, 10:16:30 AM2/11/14
to redi...@googlegroups.com
I'm also having same issue, every day I got my app off on nodejitsu....this is getting to be annoying... 

Itamar Haber

unread,
Feb 11, 2014, 10:46:43 AM2/11/14
to redi...@googlegroups.com
I may be on a wild goose chase here, but is there any chance that your app is using Node's hiredis/socket.io and doing UNSUBSCRIBE? If so, I've seen similar reports that may indicate there's an issue with that client.


--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+u...@googlegroups.com.

To post to this group, send email to redi...@googlegroups.com.



--
Itamar Haber
VP Customer Relations | Garantia Data the commercial Redis company
Mobile (IL): +972 54-567-9692
Blog    Twitter    LinkedIn

Redis Cloud is a fully managed service for hosting and running infinitely-scalable & highly-available Redis.  
Memcached Cloud is a fully managed service for hosting and running highly-available Memcached.

Tadeu Panato

unread,
Feb 18, 2014, 1:23:07 PM2/18/14
to redi...@googlegroups.com
We had the same issue here and we tried the following thing in the network interface for each redis server.

first run the following to check if its already off
ethtool -k eth0

# sudo /sbin/ethtool -K eth0 tso off && sudo /sbin/ethtool -K eth0 gso off


works here.

Vlad Vega

unread,
Feb 19, 2014, 8:06:28 PM2/19/14
to redi...@googlegroups.com
I think I have found a fix finally => it's been running without error for 24 hours. Ping every 20min with a 30min openredis timeout: https://github.com/VladVega/openRedisOnNodejitsuIssue
Reply all
Reply to author
Forward
0 new messages