etcd2: etcdserver: could not get cluster response: dial tcp: connection refused


gnresende

Aug 4, 2015, 8:37:09 PM
to CoreOS User
Thanks in advance for your help. I really tried to solve this by myself, but I can't.

I started with CoreOS this week and am trying to configure an "Easy Development/Testing Cluster" as described in the docs: https://coreos.com/os/docs/latest/cluster-architectures.html. The etcd role started without problems and I can verify it with `etcdctl member list`. On the worker role, I'm having problems with etcd2, which logs the following message:

etcd2[528]: 2015/08/04 23:48:36 etcdserver: could not get cluster response from http://192.168.56.10:2380: Get http://192.168.56.10:2380/members: dial tcp 192.168.56.10:2380: connection refused

I can curl this URI without any problem and get the values from /members.

Can someone help me? I'm really sorry I can't figure out what is happening here.

Thanks.
 

anton....@coreos.com

Aug 5, 2015, 4:01:57 AM
to CoreOS User
Hi!

I have to ask some questions to clarify your problem.
Did you try to tune etcd cluster options? https://coreos.com/etcd/docs/latest/tuning.html
What is your ping time between hosts?
Did you install these machines in one network?
Can you post full journal logs from your proxy node? journalctl -b -u etcd2 --no-pager

Regards,
Anton

gnresende

Aug 5, 2015, 3:26:10 PM
to CoreOS User
Hi Anton, 

Thank you for your attention.

Did you install these machines in one network?
I have three VirtualBox VMs on my notebook, with the VirtualBox Host-Only Ethernet Adapter configured with the IP range 192.168.56.x. One VM runs Ubuntu with development tools and etcd-v2.1.0-alpha.1-linux-amd64 ("2015/08/5 15:37:48 etcdserver: published {Name:default ClientURLs:[http://192.168.56.10:2379]} to cluster 7e27652122e8b2ae"). The other two VMs run CoreOS, and I configured those two nodes exactly as described in the "Easy Development/Testing Cluster" doc (https://coreos.com/os/docs/latest/cluster-architectures.html). The etcd service on Ubuntu is configured as described in that doc too.

Did you try to tune etcd cluster options? https://coreos.com/etcd/docs/latest/tuning.html
No, I'm sorry. I will read it, but I won't apply any tuning until we can figure out what is happening with these services.

What is your ping time between hosts?
CoreOS Node1 (192.168.56.20) to DEVNODE (192.168.56.10)
PING 192.168.56.10 (192.168.56.10) 56(84) bytes of data.
64 bytes from 192.168.56.10: icmp_seq=1 ttl=64 time=0.522 ms
64 bytes from 192.168.56.10: icmp_seq=2 ttl=64 time=1.09 ms

CoreOS Node 2 (192.168.56.50) to DEVNODE (192.168.56.10)
PING 192.168.56.10 (192.168.56.10) 56(84) bytes of data.
64 bytes from 192.168.56.10: icmp_seq=1 ttl=64 time=0.338 ms
64 bytes from 192.168.56.10: icmp_seq=2 ttl=64 time=0.702 ms

DEVNODE to CoreOS Node 1
PING 192.168.56.20 (192.168.56.20) 56(84) bytes of data.
64 bytes from 192.168.56.20: icmp_seq=1 ttl=64 time=0.417 ms
64 bytes from 192.168.56.20: icmp_seq=2 ttl=64 time=0.527 ms
64 bytes from 192.168.56.20: icmp_seq=3 ttl=64 time=0.582 ms

Can you post full journal logs from your proxy node? journalctl -b -u etcd2 --no-pager
Well... this is interesting. Yesterday, when posting, I could see the address 192.168.56.10 in these logs; today it's only localhost, and I'm not sure why. I declared the correct address in cloud-config. I'm attaching the cloud-config (user_data) below as well.

-- Logs begin at Tue 2015-08-04 12:04:56 , end at Wed 2015-08-05 18:51:18 . --
Aug 05 18:50:43 vmnode1 systemd[1]: Started etcd2.
Aug 05 18:50:43 vmnode1 systemd[1]: Starting etcd2...
Aug 05 18:50:43 vmnode1 etcd2[521]: 2015/08/05 18:50:43 etcd: already initialized as proxy before, starting as etcd proxy...
Aug 05 18:50:43 vmnode1 etcd2[521]: 2015/08/05 18:50:43 proxy: using peer urls [http://localhost:2380 http://localhost:7001] from cluster file .//var/lib/etcd2/proxy/cluster
Aug 05 18:50:43 vmnode1 etcd2[521]: 2015/08/05 18:50:43 etcdserver: could not get cluster response from http://localhost:2380: Get http://localhost:2380/members: dial tcp 127.0.0.1:2380: connection refused
Aug 05 18:50:43 vmnode1 etcd2[521]: 2015/08/05 18:50:43 etcdserver: could not get cluster response from http://localhost:7001: Get http://localhost:7001/members: dial tcp 127.0.0.1:7001: connection refused
Aug 05 18:50:43 vmnode1 etcd2[521]: 2015/08/05 18:50:43 proxy: etcdserver: could not retrieve cluster information from the given urls
Aug 05 18:50:43 vmnode1 etcd2[521]: 2015/08/05 18:50:43 proxy: listening for client requests on localhost:2379
Aug 05 18:50:43 vmnode1 etcd2[521]: 2015/08/05 18:50:43 proxy: zero endpoints currently available
Aug 05 18:50:43 vmnode1 etcd2[521]: 2015/08/05 18:50:43 proxy: zero endpoints currently available
Aug 05 18:50:43 vmnode1 etcd2[521]: 2015/08/05 18:50:43 proxy: zero endpoints currently available
Aug 05 18:50:44 vmnode1 etcd2[521]: 2015/08/05 18:50:44 proxy: zero endpoints currently available
Aug 05 18:50:44 vmnode1 etcd2[521]: 2015/08/05 18:50:44 proxy: zero endpoints currently available
Aug 05 18:50:44 vmnode1 etcd2[521]: 2015/08/05 18:50:44 proxy: zero endpoints currently available
Aug 05 18:50:44 vmnode1 etcd2[521]: 2015/08/05 18:50:44 proxy: zero endpoints currently available
Aug 05 18:50:45 vmnode1 etcd2[521]: 2015/08/05 18:50:45 proxy: zero endpoints currently available
Aug 05 18:50:46 vmnode1 etcd2[521]: 2015/08/05 18:50:46 proxy: zero endpoints currently available
Aug 05 18:50:46 vmnode1 etcd2[521]: 2015/08/05 18:50:46 proxy: zero endpoints currently available
Aug 05 18:50:46 vmnode1 etcd2[521]: 2015/08/05 18:50:46 proxy: zero endpoints currently available
Aug 05 18:50:47 vmnode1 etcd2[521]: 2015/08/05 18:50:47 proxy: zero endpoints currently available
Aug 05 18:50:47 vmnode1 etcd2[521]: 2015/08/05 18:50:47 proxy: zero endpoints currently available
Aug 05 18:50:47 vmnode1 etcd2[521]: 2015/08/05 18:50:47 proxy: zero endpoints currently available
Aug 05 18:50:47 vmnode1 etcd2[521]: 2015/08/05 18:50:47 proxy: zero endpoints currently available
Aug 05 18:50:47 vmnode1 etcd2[521]: 2015/08/05 18:50:47 proxy: zero endpoints currently available
Aug 05 18:50:47 vmnode1 etcd2[521]: 2015/08/05 18:50:47 proxy: zero endpoints currently available
Aug 05 18:50:47 vmnode1 etcd2[521]: 2015/08/05 18:50:47 proxy: zero endpoints currently available
Aug 05 18:50:48 vmnode1 etcd2[521]: 2015/08/05 18:50:48 proxy: zero endpoints currently available
Aug 05 18:50:50 vmnode1 etcd2[521]: 2015/08/05 18:50:50 proxy: zero endpoints currently available
Aug 05 18:50:50 vmnode1 etcd2[521]: 2015/08/05 18:50:50 proxy: zero endpoints currently available
Aug 05 18:50:50 vmnode1 etcd2[521]: 2015/08/05 18:50:50 proxy: zero endpoints currently available
Aug 05 18:50:51 vmnode1 etcd2[521]: 2015/08/05 18:50:51 proxy: zero endpoints currently available
Aug 05 18:50:51 vmnode1 etcd2[521]: 2015/08/05 18:50:51 proxy: zero endpoints currently available
Aug 05 18:50:51 vmnode1 etcd2[521]: 2015/08/05 18:50:51 proxy: zero endpoints currently available
Aug 05 18:50:51 vmnode1 etcd2[521]: 2015/08/05 18:50:51 proxy: zero endpoints currently available
Aug 05 18:50:52 vmnode1 etcd2[521]: 2015/08/05 18:50:52 proxy: zero endpoints currently available
Aug 05 18:50:56 vmnode1 etcd2[521]: 2015/08/05 18:50:56 proxy: zero endpoints currently available
Aug 05 18:50:56 vmnode1 etcd2[521]: 2015/08/05 18:50:56 proxy: zero endpoints currently available
Aug 05 18:50:56 vmnode1 etcd2[521]: 2015/08/05 18:50:56 proxy: zero endpoints currently available
Aug 05 18:50:57 vmnode1 etcd2[521]: 2015/08/05 18:50:57 proxy: zero endpoints currently available
Aug 05 18:50:57 vmnode1 etcd2[521]: 2015/08/05 18:50:57 proxy: zero endpoints currently available
Aug 05 18:50:57 vmnode1 etcd2[521]: 2015/08/05 18:50:57 proxy: zero endpoints currently available
Aug 05 18:50:57 vmnode1 etcd2[521]: 2015/08/05 18:50:57 proxy: zero endpoints currently available
Aug 05 18:50:57 vmnode1 etcd2[521]: 2015/08/05 18:50:57 proxy: zero endpoints currently available
Aug 05 18:50:57 vmnode1 etcd2[521]: 2015/08/05 18:50:57 proxy: zero endpoints currently available
Aug 05 18:50:58 vmnode1 etcd2[521]: 2015/08/05 18:50:58 proxy: zero endpoints currently available
Aug 05 18:50:58 vmnode1 etcd2[521]: 2015/08/05 18:50:58 proxy: zero endpoints currently available
Aug 05 18:51:06 vmnode1 etcd2[521]: 2015/08/05 18:51:06 proxy: zero endpoints currently available
Aug 05 18:51:06 vmnode1 etcd2[521]: 2015/08/05 18:51:06 proxy: zero endpoints currently available
Aug 05 18:51:06 vmnode1 etcd2[521]: 2015/08/05 18:51:06 proxy: zero endpoints currently available
Aug 05 18:51:07 vmnode1 etcd2[521]: 2015/08/05 18:51:07 proxy: zero endpoints currently available
Aug 05 18:51:07 vmnode1 etcd2[521]: 2015/08/05 18:51:07 proxy: zero endpoints currently available
Aug 05 18:51:07 vmnode1 etcd2[521]: 2015/08/05 18:51:07 proxy: zero endpoints currently available
Aug 05 18:51:07 vmnode1 etcd2[521]: 2015/08/05 18:51:07 proxy: zero endpoints currently available
Aug 05 18:51:08 vmnode1 etcd2[521]: 2015/08/05 18:51:08 proxy: zero endpoints currently available
Aug 05 18:51:13 vmnode1 etcd2[521]: 2015/08/05 18:51:13 etcdserver: could not get cluster response from http://localhost:2380: Get http://localhost:2380/members: dial tcp 127.0.0.1:2380: connection refused
Aug 05 18:51:13 vmnode1 etcd2[521]: 2015/08/05 18:51:13 etcdserver: could not get cluster response from http://localhost:7001: Get http://localhost:7001/members: dial tcp 127.0.0.1:7001: connection refused
Aug 05 18:51:13 vmnode1 etcd2[521]: 2015/08/05 18:51:13 proxy: etcdserver: could not retrieve cluster information from the given urls
Aug 05 18:51:18 vmnode1 etcd2[521]: 2015/08/05 18:51:18 proxy: zero endpoints currently available
Aug 05 18:51:18 vmnode1 etcd2[521]: 2015/08/05 18:51:18 proxy: zero endpoints currently available
Aug 05 18:51:18 vmnode1 etcd2[521]: 2015/08/05 18:51:18 proxy: zero endpoints currently available
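The telling line above is `proxy: using peer urls [...] from cluster file`: once the proxy has written /var/lib/etcd2/proxy/cluster, restarts reuse that cached value instead of the cloud-config initial-cluster. A minimal sketch of the effect, with a temp file standing in for the real cache (the file contents here are illustrative, not the exact format):

```shell
# Stand-in for /var/lib/etcd2/proxy/cluster (contents illustrative only).
cache=$(mktemp)
printf 'default=http://localhost:2380,default=http://localhost:7001\n' > "$cache"

# On restart the proxy prefers this cache over --initial-cluster, so stale
# localhost peers keep being dialed until the file is removed.
peers=$(cat "$cache")
case "$peers" in
  *localhost*) verdict="stale: proxy will dial 127.0.0.1, not 192.168.56.10" ;;
  *)           verdict="peers look routable" ;;
esac
echo "$verdict"
rm -f "$cache"
```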

Cloud-config 
#cloud-config

hostname: vmnode1

coreos:
  etcd2:
    proxy: on
    listen-client-urls: http://localhost:2379
    initial-cluster: etcdserver=http://192.168.56.10:2380

  fleet:
    etcd_servers: "http://localhost:2379"

  units:
    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start
    - name: static.network
      content: |
        [Match]
        Name=enp0s3

        [Network]
        Address=192.168.56.20/24

users:
  - name: gnresende
    passwd: xxxxxxxxxxxxxxxxxxxxxxx
    groups:
      - sudo
      - docker
    ssh_authorized_keys:
      - ssh-rsa xxxxxxxxxxxxxxxxxxxxx
 

Let me know if you need more information.

Thank you.

gnresende

Aug 5, 2015, 5:15:09 PM
to CoreOS User
Anton,

just to add to that, this is the message I got in the journal yesterday:

Aug 04 23:38:06 vmnode1 etcd2[528]: 2015/08/04 23:38:06 etcdserver: could not get cluster response from http://192.168.56.10:2380: Get http://192.168.56.10:2380/members: dial tcp 192.168.56.10:2380: connection refused
Aug 04 23:38:06 vmnode1 etcd2[528]: 2015/08/04 23:38:06 proxy: etcdserver: could not retrieve cluster information from the given urls

And if I test with curl, it seems to be OK:

[{"id":14855829450254237642,"peerURLs":["http://localhost:2380","http://localhost:7001"],"name":"default","clientURLs":["http://192.168.56.10:2379"]}]

Regards,


Yicheng Qin

Aug 5, 2015, 6:21:02 PM
to gnresende, CoreOS User
It may be related to this issue: https://github.com/coreos/etcd/issues/3215

getClusterFromRemotePeers has a fixed timeout today, which is 1s. Considering your TTL is 500ms, it is easy to time out.

--
You received this message because you are subscribed to the Google Groups "CoreOS User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to coreos-user...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

gnresende

Aug 7, 2015, 5:25:52 AM
to CoreOS User, gil.n....@gmail.com
Hi Yicheng,

I don't think that is my root cause. As you can see, my latency (ping) is very low, and all the VMs are on my notebook (VirtualBox):

PING 192.168.56.10 (192.168.56.10) 56(84) bytes of data.
64 bytes from 192.168.56.10: icmp_seq=1 ttl=64 time=0.522 ms
64 bytes from 192.168.56.10: icmp_seq=2 ttl=64 time=1.09 ms

Thanks for your attention and help.

Regards.

Yicheng Qin

Aug 7, 2015, 5:59:52 PM
to gnresende, CoreOS User
I read the TTL value wrong.

Did you curl from vmnode1 to http://192.168.56.10:2380? Could you record the time it took to finish the curl?

gnresende

Aug 7, 2015, 6:05:32 PM
to CoreOS User
Between 0.006s and 0.014s for the following command:


Thank you.



Yicheng Qin

Aug 7, 2015, 7:47:21 PM
to gnresende, CoreOS User
`time curl 192.168.56.10/members`: it should be against the 2380 port, right?

Could you share the command line / systemd service file with us? And the member list (https://github.com/coreos/etcd/blob/master/Documentation/other_apis.md#list-members)?

The "peerURLs":["http://localhost:2380","http://localhost:7001"] should be your real peer URLs, which are http://192.168.56.10:2380, and you should set -listen-peer-urls to the same.
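To make that concrete: a proxy does not keep talking to the address you curled, it follows the advertised peerURLs in the response. A mechanical check of the members response posted earlier in this thread (the JSON is canned here, not re-fetched):

```shell
# Members response as posted above (canned for illustration).
members='[{"id":14855829450254237642,"peerURLs":["http://localhost:2380","http://localhost:7001"],"name":"default","clientURLs":["http://192.168.56.10:2379"]}]'

# The proxy re-resolves the cluster through peerURLs, so "localhost" here
# means each proxy node will dial itself on 2380/7001.
case "$members" in
  *'http://localhost:'*) problem="peerURLs advertise localhost" ;;
  *)                     problem="" ;;
esac
echo "${problem:-peerURLs look routable}"
```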


gnresende

Aug 7, 2015, 8:01:59 PM
to CoreOS User, gil.n....@gmail.com
I'm sorry for the mistake...

I'm not sure I understood your request for the "command line/systemd service file", but I think you are talking about the etcd service, right? The following is the command line for the etcd service:

./etcd -initial-cluster "etcdserver=http://192.168.56.10:2380" -initial-advertise-peer-urls "http://192.168.56.10:2380" -listen-client-urls "http://0.0.0.0:2379,http://0.0.0.0:4001" -advertise-client-urls "http://192.168.56.10:2379" -listen-peer-urls "http://192.168.56.10:2380"

And for the members list... I think that you got the root cause:

vmnode1 ~ # curl 192.168.56.10:2380/v2/members
404 page not found
vmnode1 ~ # curl 192.168.56.10:2380/members

[{"id":14855829450254237642,"peerURLs":["http://localhost:2380","http://localhost:7001"],"name":"default","clientURLs":["http://192.168.56.10:2379"]}]
vmnode1 ~ #


Probably I'm not running etcd2, right?
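One quick way to check which etcd actually answered is the /version endpoint on the client port. A sketch (the JSON shape below is canned for illustration and assumed from the v2 API; verify against your own output):

```shell
# Canned response for illustration; fetch the real one with:
#   curl http://192.168.56.10:2379/version
response='{"releaseVersion":"2.1.0-alpha.1","internalVersion":"2"}'
# Extract the release version from the JSON.
version=$(echo "$response" | sed -n 's/.*"releaseVersion":"\([^"]*\)".*/\1/p')
echo "etcd release: $version"
```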

Thank you.

gnresende

Aug 7, 2015, 8:09:03 PM
to CoreOS User, gil.n....@gmail.com
One more addition, the etcd log:

2015/08/7 19:02:37 etcdmain: setting maximum number of CPUs to 1, total number of available CPUs is 2
2015/08/7 19:02:37 etcdmain: no data-dir provided, using default data-dir ./default.etcd
2015/08/7 19:02:37 etcdmain: the server is already initialized as member before, starting as etcd member...
2015/08/7 19:02:37 etcdmain: listening for peers on http://192.168.56.10:2380
2015/08/7 19:02:37 etcdmain: listening for client requests on http://0.0.0.0:2379
2015/08/7 19:02:37 etcdmain: listening for client requests on http://0.0.0.0:4001
2015/08/7 19:02:37 etcdserver: recovered store from snapshot at index 300030
2015/08/7 19:02:37 etcdserver: name = default
2015/08/7 19:02:37 etcdserver: data dir = default.etcd
2015/08/7 19:02:37 etcdserver: member dir = default.etcd/member
2015/08/7 19:02:37 etcdserver: heartbeat = 100ms
2015/08/7 19:02:37 etcdserver: election = 1000ms
2015/08/7 19:02:37 etcdserver: snapshot count = 10000
2015/08/7 19:02:37 etcdserver: advertise client URLs = http://192.168.56.10:2379
2015/08/7 19:02:37 etcdserver: loaded cluster information from store: <nil>
2015/08/7 19:02:38 etcdserver: restarting member ce2a822cea30bfca in cluster 7e27652122e8b2ae at commit index 302378
2015/08/7 19:02:38 raft: ce2a822cea30bfca became follower at term 13
2015/08/7 19:02:38 raft: newRaft ce2a822cea30bfca [peers: [ce2a822cea30bfca], term: 13, commit: 302378, applied: 300030, lastindex: 302378, lastterm: 13]
2015/08/7 19:02:38 etcdserver: starting server... [version: 2.1.0-alpha.1, cluster version: 2.1.0]
2015/08/7 19:02:39 raft: ce2a822cea30bfca is starting a new election at term 13
2015/08/7 19:02:39 raft: ce2a822cea30bfca became candidate at term 14
2015/08/7 19:02:39 raft: ce2a822cea30bfca received vote from ce2a822cea30bfca at term 14
2015/08/7 19:02:39 raft: ce2a822cea30bfca became leader at term 14
2015/08/7 19:02:39 raft: raft.node: ce2a822cea30bfca elected leader ce2a822cea30bfca at term 14
2015/08/7 19:02:39 etcdserver: published {Name:default ClientURLs:[http://192.168.56.10:2379]} to cluster 7e27652122e8b2ae
2015/08/7 20:06:25 etcdserver: start to snapshot (applied: 310031, lastsnap: 300030)
2015/08/7 20:06:25 etcdserver: saved snapshot at index 310031
2015/08/7 20:06:25 etcdserver: compacted raft log at 305031
2015/08/7 20:06:38 fileutil: purged file default.etcd/member/snap/000000000000000d-000000000003f7ba.snap successfully

Thank you.

gnresende

Aug 11, 2015, 7:02:40 AM
to CoreOS User, gil.n....@gmail.com
Hi Qin,

I tried some changes here and still can't solve the problem, even with your help. Do you have any other tips for this situation?

Thank you in advance.

Yicheng Qin

Aug 11, 2015, 5:56:52 PM
to gnresende, CoreOS User
I think you started etcd without any flags the first time, so it registered its peerURLs as "http://localhost:2380","http://localhost:7001". On restart, etcd ignores the -initial-* flags and loads the data from the data dir, so the peerURLs stay "http://localhost:2380","http://localhost:7001". You could update them through https://github.com/coreos/etcd/blob/master/Documentation/other_apis.md#change-the-peer-urls-of-a-member

Considering the server logs `listening for peers on http://192.168.56.10:2380`, the etcd proxy should be able to connect to it. Is there a firewall, or does etcd have enough file descriptors? Could you try running an etcd proxy on the same node to see whether it works?
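That last suggestion could be sketched as a throwaway proxy invocation (the port and data dir here are hypothetical, picked so they don't collide with the running member; the command is composed and echoed, not executed):

```shell
# Run this on 192.168.56.10 itself to rule out host-to-host networking
# or firewall issues: if a local proxy can reach the member, the cluster
# side is fine and the problem is on the path from the CoreOS nodes.
cmd='etcd2 --proxy on --listen-client-urls http://127.0.0.1:12379 --initial-cluster default=http://192.168.56.10:2380 --data-dir /tmp/etcd-proxy-test'
echo "$cmd"
```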

gnresende

Aug 13, 2015, 8:29:55 AM
to CoreOS User, gil.n....@gmail.com
Hi Yicheng,

It's working now!

I removed etcd from my Ubuntu node (not the CoreOS ones) and reinstalled it, taking care with the addresses. I didn't mention localhost in any part of the script that starts etcd, but after it started, the peerURLs had the localhost value. I used your tip to change those values:

curl http://192.168.56.10:2379/v2/members/ce2a822cea30bfca -XPUT \
-H "Content-Type: application/json" -d '{"peerURLs":["http://192.168.56.10:2380"]}'

After this change, and after removing the /var/lib/etcd2/proxy/cluster file on the CoreOS node, the etcd proxy works fine and fleet also started well.

So, thank you for your help and attention. Let me know if you need more information about my case to check whether it is a bug or something that needs to be documented.

Now I will start to use my CoreOS and Docker environment to develop a new product.

If you need some more information or something from Brazil, let me know.

Best regards!

Yicheng Qin

Aug 14, 2015, 1:50:41 AM
to gnresende, CoreOS User
Glad to know that it works!

You can use --initial-advertise-peer-urls to set the peer URLs at the first bootstrap. See https://github.com/coreos/etcd/blob/master/Documentation/configuration.md#-initial-advertise-peer-urls for more details. localhost is the default setting.

Moreover, to update peer URLs, you can try `etcdctl member update`, which is released in 2.2.0-alpha.
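A sketch of that newer path, using the member ID from the logs earlier in this thread (the command is composed and echoed here, not run against a live cluster):

```shell
member_id=ce2a822cea30bfca                    # raft member ID from the journal above
new_peer_url=http://192.168.56.10:2380        # routable peer URL instead of localhost
cmd="etcdctl member update $member_id $new_peer_url"
echo "$cmd"
```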

Let us know if you have any problems.

Cheers!