Hey Philip, thanks for responding. Sorry for my confusion there, and thanks for clarifying how it's supposed to work. Makes intuitive sense now that I think a little harder about the cluster-control problem.
In my defense though... :-) ... I had already tried what you suggested and it also fails with a fairly dire-looking error, which made it easier for me to fall into the wrong assumption. I just retried per your suggestion in a freshly-minted GCE VM, and I still see the same error in that vanilla environment. Here are the steps, in case the problem is still in my own (mis)understanding, and/or if anyone wants to try.
[... apt-based install of docker.io, add user to the "docker" group, etc. ...]
In shell A:
$ docker network create foobar
$ docker run --rm --name=rqlite1 --network=foobar rqlite/rqlite
In shell B:
$ docker run --rm --name=rqlite2 --network=foobar rqlite/rqlite -join rqlite1:4001
That's pretty simple, so I'm curious whether the above _should_ work. Or can you suggest a corrected but equally simple variant? (FWIW, a couple of reference command lines of this sort would be really helpful in the markdown README for rqlite-docker and/or on the dockerhub page for rqlite/rqlite.)
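In case it helps narrow things down, here's the variant I'd guess at myself, based on the advertised-address flags I saw mentioned in the clustering docs. I'm assuming `-http-adv-addr` and `-raft-adv-addr` are the right knobs for telling each node to advertise its container hostname instead of 0.0.0.0, so treat this as an unverified sketch rather than something I know works:
$ docker network create foobar
$ docker run --rm --name=rqlite1 --network=foobar rqlite/rqlite \
    -http-adv-addr rqlite1:4001 -raft-adv-addr rqlite1:4002
$ docker run --rm --name=rqlite2 --network=foobar rqlite/rqlite \
    -http-adv-addr rqlite2:4001 -raft-adv-addr rqlite2:4002 \
    -join rqlite1:4001
If that's roughly the intended usage, a note to that effect in the README would have saved me here.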
The first instance starts up OK. This is its output prior to starting the second instance:
[rqlited] 2021/05/22 15:39:03 rqlited starting, version v5.12.1, commit 181761d953fad4ce05d52ceeee70718be18cbee1, branch master
[rqlited] 2021/05/22 15:39:03 go1.15.7, target architecture is amd64, operating system target is linux
[rqlited] 2021/05/22 15:39:03 launch command: rqlited -http-addr 0.0.0.0:4001 -raft-addr 0.0.0.0:4002 /rqlite/file/data
[rqlited] 2021/05/22 15:39:03 no preexisting node state detected in /rqlite/file/data, node may be bootstrapping
[rqlited] 2021/05/22 15:39:03 no join addresses set
[store] 2021/05/22 15:39:03 opening store with node ID 0.0.0.0:4002
[store] 2021/05/22 15:39:03 ensuring directory at /rqlite/file/data exists
[store] 2021/05/22 15:39:03 0 preexisting snapshots present
[store] 2021/05/22 15:39:03 first log index: 0, last log index: 0, last command log index: 0:
2021-05-22T15:39:03.522Z [INFO] raft: initial configuration: index=0 servers=[]
[store] 2021/05/22 15:39:03 executing new cluster bootstrap
2021-05-22T15:39:03.522Z [INFO] raft: entering follower state: follower="Node at [::]:4002 [Follower]" leader=
2021-05-22T15:39:05.521Z [WARN] raft: heartbeat timeout reached, starting election: last-leader=
2021-05-22T15:39:05.521Z [INFO] raft: entering candidate state: node="Node at [::]:4002 [Candidate]" term=2
2021-05-22T15:39:05.525Z [INFO] raft: election won: tally=1
2021-05-22T15:39:05.525Z [INFO] raft: entering leader state: leader="Node at [::]:4002 [Leader]"
[store] 2021/05/22 15:39:05 waiting for up to 2m0s for application of initial logs
[http] 2021/05/22 15:39:05 service listening on [::]:4001
[rqlited] 2021/05/22 15:39:05 node is ready
When I start the second instance, the first instance starts repeating the following:
[store] 2021/05/22 15:42:03 received request to join node at 0.0.0.0:4002
[store] 2021/05/22 15:42:03 failed to remove node: need at least one voter in configuration: {[]}
[store] 2021/05/22 15:42:09 received request to join node at 0.0.0.0:4002
[store] 2021/05/22 15:42:09 failed to remove node: need at least one voter in configuration: {[]}
[store] 2021/05/22 15:42:14 received request to join node at 0.0.0.0:4002
[store] 2021/05/22 15:42:14 failed to remove node: need at least one voter in configuration: {[]}
And here is the output of the second instance:
[store] 2021/05/22 15:42:03 first log index: 0, last log index: 0, last command log index: 0:
2021-05-22T15:42:03.997Z [INFO] raft: initial configuration: index=0 servers=[]
[store] 2021/05/22 15:42:03 no cluster bootstrap requested
[rqlited] 2021/05/22 15:42:03 join addresses are: [rqlite1:4001]
2021-05-22T15:42:03.997Z [INFO] raft: entering follower state: follower="Node at [::]:4002 [Follower]" leader=
[cluster-join] 2021/05/22 15:42:03 failed to join cluster at [rqlite1:4001]: failed to join, node returned: 500 Internal Server Error: (need at least one voter in configuration: {[]}), sleeping 5s before retry
2021-05-22T15:42:05.817Z [WARN] raft: no known peers, aborting election
[cluster-join] 2021/05/22 15:42:09 failed to join cluster at [rqlite1:4001]: failed to join, node returned: 500 Internal Server Error: (need at least one voter in configuration: {[]}), sleeping 5s before retry
[cluster-join] 2021/05/22 15:42:14 failed to join cluster at [rqlite1:4001]: failed to join, node returned: 500 Internal Server Error: (need at least one voter in configuration: {[]}), sleeping 5s before retry
BTW, this is the first time I've tried to run this in an un-hacked-up environment, and there had also been plenty of other moving parts until now, so one thing I haven't looked at is trying different rqlite versions. I figured that "rqlite/rqlite" on dockerhub was likely to map to a stable release, but if I'm unwittingly running a dev snapshot, or should check different versions for any other reason, I'd be thankful for any suggestions.
Best,
Geoff