Jepsen for tendermint faults testing

Abiramie Shree TGR

Sep 18, 2019, 6:36:35 AM
to Jepsen Talk
I am now working at Wipro Limited, where I need to test Tendermint with the Jepsen tool. I found this analysis: https://jepsen.io/analyses/tendermint-0-10-2
Tendermint is a distributed, byzantine fault-tolerant consensus system designed to replicate arbitrary state machines. We experimentally verify Tendermint’s safety properties using its built-in key-value store, Merkleeyes, as the hosted state machine, while creating simple and complex network partitions, clock skew, process crashes, write-ahead-log truncation, simple byzantine faults, and ...
However, I am not clear on how to conduct a test by injecting faults into Tendermint.

So far, I have created Tendermint testnet nodes with docker-compose, but I don't know how to proceed further.

Could you please help me and point me to material that explains how to test Tendermint with the Jepsen tool?

Thank you,
Abiramie Shree T G R

Kyle Kingsbury

Sep 18, 2019, 8:57:02 AM
to ta...@jepsen.io

Have you taken a look at the Jepsen and Tendermint READMEs? They might be a helpful starting place.
https://github.com/jepsen-io/jepsen

--Kyle

Abiramie Shree TGR

Sep 18, 2019, 8:59:38 AM
to Jepsen Talk
No, I haven't seen those yet. I will read them. Thanks for your reply.

Abiramie Shree TGR

Sep 27, 2019, 3:11:35 AM
to Jepsen Talk
I read those README files, but I am still not clear on how to run the test against my own docker-compose Tendermint nodes. Can you help me with that?

Kyle Kingsbury

Sep 30, 2019, 2:01:20 PM
to ta...@jepsen.io
I don't know much about docker-compose, but in general, you'll need SSH access to each node running Tendermint. The Tendermint test readme has instructions for running a test against arbitrary nodes. Have you tried following those directions? What did you get stuck on? How did you try to resolve that problem?

I can offer more specific advice if I know more about what you've already tried.

--Kyle
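
The SSH prerequisite mentioned above can be checked before running any test. The following is a minimal sketch, not part of Jepsen: it assumes the OpenSSH `ssh` client is installed on the control node, and the function names, the `root` user, and the node names in the usage note are placeholders chosen for illustration.

```python
import subprocess


def ssh_command(node, user="root", timeout_s=5):
    """Build an ssh invocation that runs `true` on the node and exits."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                # fail instead of prompting for a password
        "-o", f"ConnectTimeout={timeout_s}",  # give up quickly on unreachable hosts
        f"{user}@{node}",
        "true",
    ]


def unreachable_nodes(nodes, user="root", run=subprocess.run):
    """Return the nodes whose ssh check did not exit with status 0.

    `run` is injectable so the logic can be exercised without real hosts.
    """
    failed = []
    for node in nodes:
        result = run(ssh_command(node, user), capture_output=True)
        if result.returncode != 0:
            failed.append(node)
    return failed
```

Calling `unreachable_nodes(["n1", "n2", "n3", "n4", "n5"])` from the control node would list the hosts that still need SSH set up; an empty list means every node accepted a non-interactive login.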

--
You received this message because you are subscribed to the Google Groups "Jepsen Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to talk+uns...@jepsen.io.
To view this discussion on the web visit https://groups.google.com/a/jepsen.io/d/msgid/talk/ad402d0d-7bcb-4379-8067-2f1a5645bc74%40jepsen.io.

Abiramie Shree TGR

Oct 1, 2019, 1:35:34 AM
to Jepsen Talk
Hi,
Thanks for your help. Actually, I don't know how to set up an SSH connection to my Tendermint nodes.
I tried to follow this documentation: https://github.com/jepsen-io/tendermint
But I don't know whether I am doing it correctly.
Thank you,
Abiramie Shree T G R

Kyle Kingsbury

Oct 1, 2019, 10:55:46 AM
to ta...@jepsen.io
On 10/1/19 1:35 AM, Abiramie Shree TGR wrote:
> Hi,
> Thanks for your help. Actually i don't know how to make a SSH connection with
> my tendermint nodes.

Okay, so that sounds like a good place to start. Figure out SSH first; that's a
prerequisite for Jepsen. If you prefer, we have a pre-built AWS cluster with SSH
credentials already in place, which should be able to run the Tendermint tests
out of the box:
https://aws.amazon.com/marketplace/pp/B01LZ7Y7U0?qid=1486758124485&sr=0-1&ref_=srh_res_product_title

--Kyle

Abiramie Shree TGR

Oct 3, 2019, 1:01:45 AM
to Jepsen Talk
OK, thanks for your reply. I will check the link you gave.

Abiramie Shree TGR

Oct 4, 2019, 8:19:49 AM
to Jepsen Talk
Is there any other way to set up an SSH connection to the Jepsen nodes?
Following https://github.com/jepsen-io/tendermint gives these errors:

Screenshot from 2019-10-04 17-38-28.png
Screenshot from 2019-10-04 17-40-25.png
Screenshot from 2019-10-04 17-41-13.png
Screenshot from 2019-10-04 17-41-44.png

Can you help me solve this error?
I also tried the Jepsen Docker setup at https://github.com/jepsen-io/jepsen/tree/master/docker, where I can run a test successfully. Is there any way to feed the Tendermint docker-compose network I created into that Jepsen Docker setup to run the test?

My Tendermint nodes running under Docker:

keerthi@keerthi-Precision-Tower-5810:~/projects/src/github.com/tendermint/tendermint$ make localnet-start
docker-compose down
Removing node0 ... done
Removing node3 ... done
Removing node2 ... done
Removing node1 ... done
Removing node4 ... done
Removing network tendermint_localnet
make[1]: Entering directory '/home/keerthi/projects/src/github.com/tendermint/tendermint/networks/local'
docker build --tag tendermint/localnode localnode
Sending build context to Docker daemon  4.608kB
Step 1/11 : FROM alpine:3.7
 ---> 6d1ef012b567
Step 2/11 : MAINTAINER Greg Szabo <gr...@tendermint.com>
 ---> Using cache
 ---> 938543951190
Step 3/11 : RUN apk update &&     apk upgrade &&     apk --no-cache add curl jq file
 ---> Using cache
 ---> 3a1cfb2d4620
Step 4/11 : VOLUME [ /tendermint ]
 ---> Using cache
 ---> 3713309f746d
Step 5/11 : WORKDIR /tendermint
 ---> Using cache
 ---> 8c08b4383bd7
Step 6/11 : EXPOSE 26656 26657
 ---> Using cache
 ---> 892d8ecddc36
Step 7/11 : ENTRYPOINT ["/usr/bin/wrapper.sh"]
 ---> Using cache
 ---> fdded4514508
Step 8/11 : CMD ["node", "--proxy_app", "kvstore"]
 ---> Using cache
 ---> 22dd207b0713
Step 9/11 : STOPSIGNAL SIGTERM
 ---> Using cache
 ---> 075e26ab2653
Step 10/11 : COPY wrapper.sh /usr/bin/wrapper.sh
 ---> Using cache
 ---> 12b82a8ce3ae
Step 11/11 : COPY config-template.toml /etc/tendermint/config-template.toml
 ---> Using cache
 ---> 35b56a738b06
Successfully built 35b56a738b06
Successfully tagged tendermint/localnode:latest
make[1]: Leaving directory '/home/keerthi/projects/src/github.com/tendermint/tendermint/networks/local'
I[2019-10-01|05:04:23.129] Generated private validator                  module=main keyFile=node0/config/priv_validator_key.json stateFile=node0/data/priv_validator_state.json
I[2019-10-01|05:04:23.129] Generated node key                           module=main path=node0/config/node_key.json
I[2019-10-01|05:04:23.129] Generated genesis file                       module=main path=node0/config/genesis.json
I[2019-10-01|05:04:23.204] Generated private validator                  module=main keyFile=node1/config/priv_validator_key.json stateFile=node1/data/priv_validator_state.json
I[2019-10-01|05:04:23.205] Generated node key                           module=main path=node1/config/node_key.json
I[2019-10-01|05:04:23.205] Generated genesis file                       module=main path=node1/config/genesis.json
I[2019-10-01|05:04:23.272] Generated private validator                  module=main keyFile=node2/config/priv_validator_key.json stateFile=node2/data/priv_validator_state.json
I[2019-10-01|05:04:23.272] Generated node key                           module=main path=node2/config/node_key.json
I[2019-10-01|05:04:23.272] Generated genesis file                       module=main path=node2/config/genesis.json
I[2019-10-01|05:04:23.347] Generated private validator                  module=main keyFile=node3/config/priv_validator_key.json stateFile=node3/data/priv_validator_state.json
I[2019-10-01|05:04:23.348] Generated node key                           module=main path=node3/config/node_key.json
I[2019-10-01|05:04:23.348] Generated genesis file                       module=main path=node3/config/genesis.json
I[2019-10-01|05:04:23.440] Generated private validator                  module=main keyFile=node4/config/priv_validator_key.json stateFile=node4/data/priv_validator_state.json
I[2019-10-01|05:04:23.440] Generated node key                           module=main path=node4/config/node_key.json
I[2019-10-01|05:04:23.440] Generated genesis file                       module=main path=node4/config/genesis.json
Successfully initialized 5 node directories
docker-compose up
Creating network "tendermint_localnet" with driver "bridge"
Creating node4 ... done
Creating node1 ... done
Creating node2 ... done
Creating node3 ... done
Creating node0 ... done
Attaching to node0, node2, node4, node1, node3
node0    | I[2019-10-01|05:04:31.737] Version info                                 module=main software=0.32.3 block=10 p2p=7
node2    | I[2019-10-01|05:04:31.931] Version info                                 module=main software=0.32.3 block=10 p2p=7
node0    | I[2019-10-01|05:04:32.040] Starting Node                                module=main impl=Node
node0    | I[2019-10-01|05:04:32.106] Started node                                 module=main nodeInfo="{ProtocolVersion:{P2P:7 Block:10 App:1} ID_:c088765b2d766568da7b85768d9ecce6afe16c68 ListenAddr:tcp://0.0.0.0:26656 Network:chain-rQyJWz Version:0.32.3 Channels:4020212223303800 Moniker:70E3E28BF8899642 Other:{TxIndex:on RPCAddress:tcp://127.0.0.1:26657}}"
node0    | E[2019-10-01|05:04:32.161] dialing failed (attempts: 1): dial tcp 192.167.10.5:26656: connect: connection refused module=pex addr=118c061c6b79cb250beb...@192.167.10.5:26656
node0    | E[2019-10-01|05:04:32.161] dialing failed (attempts: 1): dial tcp 192.167.10.4:26656: connect: connection refused module=pex addr=642b5440de365c62b8ab...@192.167.10.4:26656
node0    | E[2019-10-01|05:04:32.161] dialing failed (attempts: 1): dial tcp 192.167.10.3:26656: connect: connection refused module=pex addr=dd5c20e0120391faa2ae...@192.167.10.3:26656
node0    | E[2019-10-01|05:04:32.161] dialing failed (attempts: 1): dial tcp 192.167.10.6:26656: connect: connection refused module=pex addr=15cbf28e987333b72908...@192.167.10.6:26656
node2    | I[2019-10-01|05:04:32.299] Starting Node                                module=main impl=Node
node2    | I[2019-10-01|05:04:32.381] Started node                                 module=main nodeInfo="{ProtocolVersion:{P2P:7 Block:10 App:1} ID_:642b5440de365c62b8ab30ea2f57ff958bee5eaa ListenAddr:tcp://0.0.0.0:26656 Network:chain-rQyJWz Version:0.32.3 Channels:4020212223303800 Moniker:2DBC60526E1EC269 Other:{TxIndex:on RPCAddress:tcp://127.0.0.1:26657}}"
node2    | E[2019-10-01|05:04:32.382] dialing failed (attempts: 1): dial tcp 192.167.10.5:26656: connect: connection refused module=pex addr=118c061c6b79cb250beb...@192.167.10.5:26656
node2    | E[2019-10-01|05:04:32.382] dialing failed (attempts: 1): dial tcp 192.167.10.6:26656: connect: connection refused module=pex addr=15cbf28e987333b72908...@192.167.10.6:26656
node2    | E[2019-10-01|05:04:32.383] dialing failed (attempts: 1): dial tcp 192.167.10.3:26656: connect: connection refused module=pex addr=dd5c20e0120391faa2ae...@192.167.10.3:26656
node2    | E[2019-10-01|05:04:32.399] Error dialing peer                           module=p2p err="dial tcp 192.167.10.6:26656: connect: connection refused"
node1    | I[2019-10-01|05:04:32.604] Version info                                 module=main software=0.32.3 block=10 p2p=7
node4    | I[2019-10-01|05:04:32.604] Version info                                 module=main software=0.32.3 block=10 p2p=7
node3    | I[2019-10-01|05:04:32.639] Version info                                 module=main software=0.32.3 block=10 p2p=7
node1    | I[2019-10-01|05:04:32.808] Starting Node                                module=main impl=Node
node4    | I[2019-10-01|05:04:32.809] Starting Node                                module=main impl=Node
node3    | I[2019-10-01|05:04:32.834] Starting Node                                module=main impl=Node
node4    | I[2019-10-01|05:04:32.914] Started node                                 module=main nodeInfo="{ProtocolVersion:{P2P:7 Block:10 App:1} ID_:15cbf28e987333b72908640fd392e89866cd0e2e ListenAddr:tcp://0.0.0.0:26656 Network:chain-rQyJWz Version:0.32.3 Channels:4020212223303800 Moniker:F5D7F171FBCF7218 Other:{TxIndex:on RPCAddress:tcp://127.0.0.1:26657}}"
node1    | I[2019-10-01|05:04:32.914] Started node                                 module=main nodeInfo="{ProtocolVersion:{P2P:7 Block:10 App:1} ID_:dd5c20e0120391faa2ae41ca920c744cfef61b7a ListenAddr:tcp://0.0.0.0:26656 Network:chain-rQyJWz Version:0.32.3 Channels:4020212223303800 Moniker:996E62F8069CB345 Other:{TxIndex:on RPCAddress:tcp://127.0.0.1:26657}}"
node1    | E[2019-10-01|05:04:32.921] dialing failed (attempts: 1): duplicate ID<15cbf28e987333b72908640fd392e89866cd0e2e> module=pex addr=15cbf28e987333b72908...@192.167.10.6:26656
node4    | E[2019-10-01|05:04:32.921] Stopping peer for error                      module=p2p peer="Peer{MConn{192.167.10.3:38924} dd5c20e0120391faa2ae41ca920c744cfef61b7a in}" err=EOF
node4    | E[2019-10-01|05:04:32.921] MConnection flush failed                     module=p2p peer=dd5c20e0120391faa2ae...@192.167.10.3:38924 err="write tcp 192.167.10.6:26656->192.167.10.3:38924: use of closed network connection"
node3    | I[2019-10-01|05:04:32.933] Started node                                 module=main nodeInfo="{ProtocolVersion:{P2P:7 Block:10 App:1} ID_:118c061c6b79cb250beb6ad58cf1cae0c2ed479e ListenAddr:tcp://0.0.0.0:26656 Network:chain-rQyJWz Version:0.32.3 Channels:4020212223303800 Moniker:47B060EE54A0076E Other:{TxIndex:on RPCAddress:tcp://127.0.0.1:26657}}"
node3    | E[2019-10-01|05:04:32.937] dialing failed (attempts: 1): duplicate ID<dd5c20e0120391faa2ae41ca920c744cfef61b7a> module=pex addr=dd5c20e0120391faa2ae...@192.167.10.3:26656
node2    | I[2019-10-01|05:04:35.083] Executed block                               module=state height=1 validTxs=0 invalidTxs=0
node3    | I[2019-10-01|05:04:35.083] Executed block                               module=state height=1 validTxs=0 invalidTxs=0
node0    | I[2019-10-01|05:04:35.121] Executed block                               module=state height=1 validTxs=0 invalidTxs=0
node1    | I[2019-10-01|05:04:35.121] Executed block                               module=state height=1 validTxs=0 invalidTxs=0
node4    | I[2019-10-01|05:04:35.121] Executed block                               module=state height=1 validTxs=0 invalidTxs=0
node2    | I[2019-10-01|05:04:35.165] Committed state                              module=state height=1 txs=0 appHash=0000000000000000
node3    | I[2019-10-01|05:04:35.165] Committed state                              module=state height=1 txs=0 appHash=0000000000000000
node4    | I[2019-10-01|05:04:35.207] Committed state                              module=state height=1 txs=0 appHash=0000000000000000
node0    | I[2019-10-01|05:04:35.207] Committed state                              module=state height=1 txs=0 appHash=0000000000000000
node1    | I[2019-10-01|05:04:35.207] Committed state                              module=state height=1 txs=0 appHash=0000000000000000
node1    | I[2019-10-01|05:04:37.152] Executed block                               module=state height=2 validTxs=0 invalidTxs=0
node4    | I[2019-10-01|05:04:37.240] Executed block                               module=state height=2 validTxs=0 invalidTxs=0
node3    | I[2019-10-01|05:04:37.285] Executed block                               module=state height=2 validTxs=0 invalidTxs=0
node2    | I[2019-10-01|05:04:37.288] Executed block                               module=state height=2 validTxs=0 invalidTxs=0
node1    | I[2019-10-01|05:04:37.288] Committed state                              module=state height=2 txs=0 appHash=0000000000000000
node4    | I[2019-10-01|05:04:37.324] Committed state                              module=state height=2 txs=0 appHash=0000000000000000
node0    | I[2019-10-01|05:04:37.326] Executed block                               module=state height=2 validTxs=0 invalidTxs=0
node3    | I[2019-10-01|05:04:37.374] Committed state                              module=state height=2 txs=0 appHash=0000000000000000
node2    | I[2019-10-01|05:04:37.374] Committed state                              module=state height=2 txs=0 appHash=0000000000000000
node0    | I[2019-10-01|05:04:37.449] Committed state                              module=state height=2 txs=0 appHash=0000000000000000
node1    | I[2019-10-01|05:04:39.292] Executed block                               module=state height=3 validTxs=0 invalidTxs=0
node2    | I[2019-10-01|05:04:39.323] Executed block                               module=state height=3 validTxs=0 invalidTxs=0
node3    | I[2019-10-01|05:04:39.324] Executed block                               module=state height=3 validTxs=0 invalidTxs=0
node0    | I[2019-10-01|05:04:39.325] Executed block                               module=state height=3 validTxs=0 invalidTxs=0
node4    | I[2019-10-01|05:04:39.326] Executed block                               module=state height=3 validTxs=0 invalidTxs=0
node1    | I[2019-10-01|05:04:39.386] Committed state                              module=state height=3 txs=0 appHash=0000000000000000
node4    | I[2019-10-01|05:04:39.422] Committed state                              module=state height=3 txs=0 appHash=0000000000000000
node3    | I[2019-10-01|05:04:39.422] Committed state                              module=state height=3 txs=0 appHash=0000000000000000
node0    | I[2019-10-01|05:04:39.422] Committed state                              module=state height=3 txs=0 appHash=0000000000000000
node2    | I[2019-10-01|05:04:39.422] Committed state                              module=state height=3 txs=0 appHash=0000000000000000
node0    | I[2019-10-01|05:04:41.259] Executed block                               module=state height=4 validTxs=0 invalidTxs=0
node1    | I[2019-10-01|05:04:41.261] Executed block                               module=state height=4 validTxs=0 invalidTxs=0
node3    | I[2019-10-01|05:04:41.299] Executed block                               module=state height=4 validTxs=0 invalidTxs=0
node2    | I[2019-10-01|05:04:41.300] Executed block                               module=state height=4 validTxs=0 invalidTxs=0
node4    | I[2019-10-01|05:04:41.303] Executed block                               module=state height=4 validTxs=0 invalidTxs=0
node1    | I[2019-10-01|05:04:41.345] Committed state                              module=state height=4 txs=0 appHash=0000000000000000
node0    | I[2019-10-01|05:04:41.345] Committed state                              module=state height=4 txs=0 appHash=0000000000000000
node3    | I[2019-10-01|05:04:41.404] Committed state                              module=state height=4 txs=0 appHash=0000000000000000
node4    | I[2019-10-01|05:04:41.404] Committed state                              module=state height=4 txs=0 appHash=0000000000000000
node2    | I[2019-10-01|05:04:41.404] Committed state                              module=state height=4 txs=0 appHash=0000000000000000

Can you help me with that? Thanks in advance.

Kyle Kingsbury

unread,
Oct 4, 2019, 8:56:49 AM10/4/19
to ta...@jepsen.io
Again, I don't really know how docker-compose works, so I can't help you there, but I can say that the error message you're hitting suggests that the node "n1" isn't resolvable from your control node. Did you name your DB nodes something different?
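
Whether names such as "n1" resolve from the control node can be tested directly. This is a small sketch using only Python's standard `socket` module; the function name and the injectable `resolve` parameter are illustrative choices, and the node names passed in are whatever the test is configured to use.

```python
import socket


def unresolvable(nodes, resolve=socket.gethostbyname):
    """Return a dict of node name -> error text for names that fail to resolve."""
    failures = {}
    for node in nodes:
        try:
            resolve(node)
        except OSError as err:  # socket.gaierror is a subclass of OSError
            failures[node] = str(err)
    return failures
```

Running `unresolvable(["n1", "n2", "n3", "n4", "n5"])` on the control node returns an empty dict when every name resolves; any name that appears in the result needs a DNS record or a hosts-file entry before SSH can reach it.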

Abiramie Shree TGR

Oct 4, 2019, 9:05:15 AM
to Jepsen Talk
No, I haven't changed any names. I simply downloaded the tendermint repository and tried to run the lein run test command.

Thanks for your quick response.

Kyle Kingsbury

Oct 4, 2019, 9:07:08 AM
to ta...@jepsen.io
So your DB nodes are named "n1", "n2", etc, but you can't resolve them?
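
If the names do not resolve, one common remedy is to add hosts-file entries on the control node. The sketch below only formats such entries; the function name is illustrative, and the pairing of names with addresses in the usage note is purely an assumption. Name resolution alone is not sufficient, since each node must also accept SSH logins.

```python
def hosts_entries(name_to_ip):
    """Format a name -> IPv4 address mapping as /etc/hosts lines, sorted by name."""
    return "\n".join(
        f"{ip}\t{name}" for name, ip in sorted(name_to_ip.items())
    )
```

For example, `hosts_entries({"n1": "192.167.10.3", "n2": "192.167.10.4"})` yields two tab-separated lines that could be appended to `/etc/hosts`. The addresses are taken from the docker-compose log earlier in the thread, but which Jepsen name belongs to which container is hypothetical.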

Abiramie Shree TGR

Oct 4, 2019, 9:15:41 AM
to Jepsen Talk
Actually, I haven't modified any files from the repository. In validator.clj, two nodes are specified as n1 and n2.
validator.clj code:
(ns jepsen.tendermint.validator
  {:lang :core.typed
   :doc "Supports validator set configuration and changes."}
  (:require [clojure.set :as set]
            [clojure.tools.logging :refer [info warn]]
            [clojure.pprint :refer [pprint]]
            [clojure.core.typed :as t]
            [cheshire.core :as json]
            [dom-top.core :as dt]
            [jepsen.tendermint [client :as tc]
                               [util :refer [base-dir]]]
            [jepsen [util :as util :refer [map-vals]]
                    [control :as c]
                    [client :as client]
                    [nemesis :as nemesis]
                    [generator :as gen]])
  (:import (clojure.tools.logging.impl LoggerFactory Logger)
           (clojure.lang Namespace
                         Symbol)))

; Type support
(defmacro tk
  "Typechecked keyword function. Returns the given keyword, but tells
  core.typed it's a function of [m -> v]."
  [kw m v]
  `(t/ann-form ~kw [~m ~'-> ~v]))

(defmacro tmfn
  "Typed map fn. Core.typed doesn't know (Map a b) is also the fn [a -> (Option
  b)], so we have to tell it."
  [m K V]
  `(t/fn [k# :- ~K] :- (t/Option ~V)
     (get ~m k#)))


; Domain types

(t/defalias Node
  "Jepsen nodes are strings."
  String)

(t/defalias Test
  "Jepsen tests have nodes and a current validator atom."
  (HMap :mandatory {:nodes            (t/NonEmptyVec  Node)
                    :validator-config (t/Atom1        Config)}
        :optional {:dup-validators              Boolean
                   :max-byzantine-vote-fraction Number
                   :super-byzantine-validators  Boolean}))

(t/defalias Version
  "Tendermint cluster version numbers."
  Long)

(t/defalias ShortKey
  "In some places, Tendermint represents keys only by their raw data."
  String)

(t/defalias Key
  "A key is a map with :type and :data. Tendermint uses this to represent
  public and private keys in validators"
  (HMap :mandatory {:type String
                    :data ShortKey}
        :complete? true))

(t/defalias GenValidator
  "The structure of a validator as generated by tendermint, and stored in
  priv_validator.json. Does not include votes."
  (HMap :mandatory {:address  String
                    :pub_key  Key
                    :priv_key Key}))

(t/defalias Validator
  "A Validator's complete structure, including both votes and information
  necessary to construct priv_validator.json & genesis.json."
  (HMap :mandatory {:address  String
                    :pub_key  Key
                    :priv_key Key
                    :votes    Long}))

(t/defalias Config
  "A configuration represents a definite state of the cluster: the validators
  which are a part of the cluster, what nodes are running what validators, the
  version number of the config in tendermint, the nodes that are in the test,
  etc.

  :prospective-validators is used to track validators we *try* to add to the
  cluster, but which haven't *actually* been added yet."
  (HMap :mandatory {:version                      Version
                    :node-set                     (t/Set Node)
                    :nodes                        (t/Map Node Key)
                    :validators                   (t/Map Key Validator)
                    :prospective-validators       (t/Map Key Validator)
                    :max-byzantine-vote-fraction  Number
                    :super-byzantine-validators   Boolean}))

(t/defalias TendermintValidator
  "The cluster's representation of a validator."
  (t/HMap :mandatory {:pub_key ShortKey
                      :power   Long}))

(t/defalias TendermintValidatorSet
  "The cluster's representation of a validator set."
  (t/HMap :mandatory {:validators (t/Coll TendermintValidator)
                      :version     Version}))

(t/defalias CreateTransition
  "Create an instance of a validator on a node"
  (t/HMap :mandatory {:type       (t/Val :create)
                      :node       Node
                      :validator  Validator}
          :complete? true))

(t/defalias DestroyTransition
  "Destroy an instance of a validator"
  (t/HMap :mandatory {:type (t/Val :destroy)
                      :node Node}
          :complete? true))

(t/defalias AddTransition
  "Add a new validator to the config"
  (t/HMap :mandatory {:type       (t/Val :add)
                      :version    Version
                      :validator  Validator}
          :complete? true))

(t/defalias RemoveTransition
  "Remove a validator from the config"
  (t/HMap :mandatory {:type     (t/Val :remove)
                      :version  Version
                      :pub_key  Key}
          :complete? true))

(t/defalias AlterVotesTransition
  "Change the votes allocated to a validator"
  (t/HMap :mandatory {:type     (t/Val :alter-votes)
                      :version  Version
                      :pub_key  Key
                      :votes    Long}
          :complete? true))

(t/defalias Transition (t/U CreateTransition
                            DestroyTransition
                            AddTransition
                            RemoveTransition
                            AlterVotesTransition))

; External types

(t/ann jepsen.control/*dir* String)
(t/ann jepsen.tendermint.util/base-dir String)

(t/ann ^:no-check clojure.core/update
       (t/All [m k v v' arg ...]
              (t/IFn
                [m k [v arg ... arg -> v'] arg ... arg -> (t/Assoc m k v')])))

(t/ann ^:no-check clojure.tools.logging/*logger-factory* LoggerFactory)

(t/ann ^:no-check clojure.tools.logging.impl/get-logger
       [LoggerFactory (t/U clojure.lang.Symbol Namespace)
              -> clojure.tools.logging.impl.Logger])

(t/ann ^:no-check clojure.tools.logging.impl/enabled?
       [Logger t/Keyword -> Boolean])

(t/ann ^:no-check clojure.tools.logging/log*
       [Logger t/Keyword (t/U Throwable nil) String -> nil])

(t/ann ^:no-check jepsen.util/map-vals
       (t/All [k v1 v2]
              [[v1 -> v2] (t/Map k v1) -> (t/Map k v2)]))

(t/ann ^:no-check jepsen.control/on-nodes
       (t/All [res]
              (t/IFn [Test [Test Node -> res]
                      -> (t/Map Node res)]
                     [Test (t/NonEmptyColl Node) [Test Node -> res]
                      -> (t/I (t/Map Node res)
                              (t/NonEmptySeqable
                                (clojure.lang.AMapEntry Node res)))])))

(t/ann ^:no-check jepsen.control/expand-path [String -> String])

(t/ann ^:no-check jepsen.control/exec [t/Any * -> String])

(t/ann ^:no-check cheshire.core/parse-string [String true ->
                                              (t/Map t/Keyword t/Any)])

(t/ann jepsen.tendermint.client/validator-set
       [Node -> TendermintValidatorSet])

; A regression in core.typed breaks occurrence typing for locals (!?), so we
; can only convince the type system of filters using function args.
(t/ann conform-map [t/Any -> (t/Map t/Any t/Any)])
(defn conform-map
  [x]
  (assert (map? x))
  x)

(t/ann conform-string [t/Any -> String])
(defn conform-string
  [x]
  (assert (string? x))
  x)

(t/ann conform-long [t/Any -> Long])
(defn conform-long
  [x]
  (assert (instance? Long x))
  x)

(t/ann conform-key [t/Any -> Key])
(defn conform-key
  [x]
  (let [m (conform-map x)]
    {:type (conform-string (:type m))
     :data (conform-string (:data m))}))

(t/ann conform-gen-validator [t/Any -> GenValidator])
(defn conform-gen-validator
  [x]
  (let [m (conform-map x)]
    {:address   (conform-string (:address m))
     :pub_key   (conform-key    (:pub_key m))
     :priv_key  (conform-key    (:priv_key m))}))

; OK, let's begin

(t/ann nodes-running-validators [Config -> (t/Map Key (t/Coll Node))])
(defn nodes-running-validators
  "Takes a config, yielding a map of validator keys to groups of nodes that run
  that validator."
  [config]
  (->> (:nodes config)
       (reduce (t/fn [m               :- (t/Map Key (t/Vec Node))
                      [node pub-key]  :- '[Node Key]]
                   (assoc m pub-key (conj (get m pub-key []) node)))
               {})))

(t/ann ^:no-check byzantine-validators [Config -> (t/Coll Validator)])
(defn byzantine-validators
  "A collection of all validators in the validator set which are running on
  more than one node."
  [config]
  (->> (nodes-running-validators config)
       (filter (t/fn [[key nodes] :- '[Key (t/Coll Node)]]
                 (< 1 (count nodes))))
       (map key)
       (keep (tmfn (:validators config) Key Validator))))


(t/ann initial-validator-votes [Config -> (t/Map Key Long)])
(defn initial-validator-votes
  "Takes a config. Computes a map of validator public keys to votes. When there
  are byzantine validators and the config has :super-byzantine-validators
  enabled, allocates just shy of 2/3 votes to the byzantine validator.
  Otherwise, allocates just shy of 1/3 votes to the byzantine validator."
  [config]
  (if-let [bs (seq (byzantine-validators config))]
    (do (assert (= 1 (count bs))
                "Only know how to deal with 1 or 0 byzantine validators")
        (let [b (:pub_key (first bs))
              n (count (:validators config))]
          ; For super dup validators, we want the dup validator key to have
          ; just shy of 2/3 voting power. That means the sum of the normal
          ; nodes weights should be just over 1/3, so that the remaining node
          ; can make up just under 2/3rds of the votes by itself. Let a normal
          ; node's weight be 2. Then 2(n-1) is the combined voting power of the
          ; normal bloc. We can then choose 4(n-1) - 1 as the weight for the
          ; dup validator. The total votes are
          ;
          ;    2(n-1) + 4(n-1) - 1
          ;  = 6(n-1) - 1
          ;
          ; which implies a single dup node has fraction...
          ;
          ;    (4(n-1) - 1) / (6(n-1) - 1)
          ;
          ; which approaches 2/3 from below as n -> infinity (it is 3/5 at
          ; n = 2), and if a single regular node is added to a duplicate node,
          ; a 2/3+ majority is available for all n >= 2.
          ;
          ; For regular dup validators, let an individual node have weight 2.
          ; The total number of individual votes is 2(n-1), which should be
          ; just larger than twice the number of dup votes, e.g:
          ;
          ;     2(n-1) = 2d + e
          ;
          ; where e is some small positive integer, and d is the number of dup
          ; votes. Solving for d:
          ;
          ;     (2(n-1) - e) / 2 = d
          ;          n - 1 - e/2 = d    ; Choose e = 2
          ;                n - 2 = d
          ;
          ; The total number of votes is therefore:
          ;
          ;     2(n-1) + n - 2
          ;   = 3n - 4
          ;
          ; So a dup validator alone has vote fraction:
          ;
          ;     (n - 2) / (3n - 4)
          ;
          ; which is always under 1/3. And with a single regular validator
          ; added, it has vote fraction:
          ;
          ;     ((n - 2) + 2) / (3n - 4)
          ;   =             n / (3n - 4)
          ;
          ; which is always over 1/3.
          (let [base-votes (zipmap (remove #{b} (keys (:validators config)))
                                   (repeat 2))
                byz-votes  {b (conform-long
                                (if (:super-byzantine-validators config)
                                  (dec (* 4 (dec n)))
                                  (- n 2)))}]
            (t/ann-form base-votes (t/Map Key Long))
            (merge base-votes byz-votes))))

    ; Default case: no byzantine validator, everyone has 2 votes.
    (zipmap (keys (:validators config)) (repeat 2))))
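
; Worked check of the weight arithmetic in the comment above. This is a sketch
; for the REPL, not part of the original file: `dup-fractions` is a
; hypothetical helper that only reuses the formulas from that comment (regular
; weight 2, dup weight 4(n-1) - 1 or n - 2).
(comment
  (defn dup-fractions
    "For n validators, one of them duplicated, returns the fraction of the
    vote held by the dup validator alone and together with one regular
    validator."
    [n super?]
    (let [regular 2
          dup     (if super? (dec (* 4 (dec n))) (- n 2))
          total   (+ (* regular (dec n)) dup)]
      {:alone    (/ dup total)
       :with-one (/ (+ dup regular) total)}))

  (dup-fractions 4 false) ; => {:alone 1/4, :with-one 1/2}, straddling 1/3
  (dup-fractions 4 true)  ; => {:alone 11/17, :with-one 13/17}, straddling 2/3
  )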

(t/ann with-initial-validator-votes [Config -> Config])
(defn with-initial-validator-votes
  "Takes a config, computes the correct distribution of initial validator
  votes, and assigns those votes to validators, returning the resulting
  config."
  [config]
  (let [votes (initial-validator-votes config)
        validators (reduce (t/fn [m :- (t/Map Key Validator)
                                  [k votes] :- '[Key Long]]
                             (let [v (get m k)]
                               (assert v)
                               (assoc m k (assoc v :votes votes))))
                           (:validators config)
                           votes)]
    (assoc config :validators validators)))

(t/ann gen-validator [-> GenValidator])
(defn gen-validator
  "Generate a new validator structure, and return the validator's data as a
  map."
  []
  (conform-gen-validator
    (c/cd base-dir
          (-> (c/exec "./tendermint" :--home base-dir :gen_validator)
              (json/parse-string true)))))

(t/ann augment-gen-validator [GenValidator -> Validator])
(defn augment-gen-validator
  "Takes a GenValidator, as generated by tendermint, and adds :votes to make it
  a complete representation of a Validator."
  [v]
  (assoc v :votes 2))

(t/ann config [(HMap :optional {:version    Version
                                :node-set   (t/Set Node)
                                :nodes      (t/Map Node Key)
                                :validators (t/Map Key Validator)
                                :super-byzantine-validators Boolean
                                :max-byzantine-vote-fraction Number})
                 -> Config])
(defn config
  "There are two pieces of state we need to handle. The first is the validator
  set, as known to the cluster, which maps public keys to maps like:

      {:address
       :pub_key {:type ...
                 :data ...}
       :priv_key {:type ...
                  :data ...}
       :votes    an-int}

  And the second is a map of nodes to the validator key they're running:

      {\"n1\" \"ABCD...\"
       ...}

  Additionally, we need a bound :max-byzantine-vote-fraction on the fraction of
  the vote any byzantine validator is allowed to control, a :version, denoting
  the version of the validator set that the cluster knows, and a :node-set, the
  set of nodes that exist."
  [opts]
  (merge {:validators             {}
          :nodes                  {}
          :node-set               #{}
          :version                -1
          :max-byzantine-vote-fraction 1/3
          :super-byzantine-validators false}
         opts
         {:prospective-validators {}}))
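
; REPL sketch of the merge order in `config`: defaults first, then the caller's
; opts, then a fixed empty :prospective-validators map, so that last key
; cannot be overridden. The node names and the :ignored entry are made up for
; illustration.
(comment
  (config {:node-set               #{"n1" "n2"}
           :prospective-validators {:ignored true}})
  ; => {:validators {}, :nodes {}, :node-set #{"n1" "n2"}, :version -1,
  ;     :max-byzantine-vote-fraction 1/3, :super-byzantine-validators false,
  ;     :prospective-validators {}}
  )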

(t/ann initial-config [Test -> Config])
(defn initial-config
  "Constructs an initial configuration for a test with a list of :nodes
  provided."
  [test]
  (let [; Generate a validator for every node
        validators (c/with-test-nodes test
                     (augment-gen-validator (gen-validator)))
        ; Map of nodes to validators
        nodes (map-vals (tk :pub_key Validator Key) validators)
        ; Map of validator keys to validators
        validators (reduce (t/fn [m         :- (t/Map Key Validator)
                                  [node v]  :- '[Node Validator]]
                             (assoc m (:pub_key v) v))
                           {}
                           validators)

        ; If we're working with dup validators, run the second validator on 2
        ; nodes and drop the first.
        [n1 n2]     (:nodes test)
        validators  (if (:dup-validators test)
                      (let [v1 (get nodes n1)]
                        (assert v1)
                        (dissoc validators v1))
                      validators)
        nodes       (if (:dup-validators test)
                      (let [v2 (get validators (get nodes n2))]
                        (assert v2)
                        (assoc nodes n1 (:pub_key v2)))
                      nodes)]
    (t/ann-form validators (t/Map Key Validator))
    (-> {:validators validators
         :nodes      nodes
         :node-set   (set (:nodes test))
         :super-byzantine-validators (:super-byzantine-validators test false)
         :max-byzantine-vote-fraction (:max-byzantine-vote-fraction test 1/3)}
        config
        with-initial-validator-votes)))

(t/ann genesis [Config -> t/Any])
(defn genesis
  "Computes a genesis.json structure for the given config."
  [config]
  {:app_hash      ""
   :chain_id      "jepsen"
   :genesis_time  "0001-01-01T00:00:00.000Z"
   :validators    (->> (:validators config)
                       vals
                       (map (t/fn [validator :- Validator]
                              (let [pub-key (:pub_key validator)
                                    name (->> (:nodes config)
                                              (filter
                                                (t/fn [[_ v] :- '[t/Any Key]]
                                                   (= v pub-key)))
                                              first)
                                    _ (assert name)
                                    name (key name)]
                                {:amount  (:votes validator)
                                 :name    name
                                 :pub_key pub-key}))))})

(t/ann pub-key-on-node [Config Node -> (t/Option Key)])
(defn pub-key-on-node
  "What pubkey is running on a given node?"
  [config node]
  (-> config :nodes (get node)))

(t/ann total-votes [Config -> Number])
(defn total-votes
  "How many votes are in the validator set total?"
  [config]
  (->> (:validators config)
       vals
       (map (tk :votes Validator Number))
       (reduce + 0)))

(t/ann compact-key [Key -> ShortKey])
(defn compact-key
  "A compact, lossy, human-friendly representation of a validator key."
  [k]
  (subs (:data k) 0 5))

(t/ann compact-config [Config -> (HMap)])
(defn compact-config
  "Just the essentials, please. Compacts a config into a human-readable,
  limited representation for debugging."
  [c]
  {:version (:version c)
   :total-votes (total-votes c)
   :prospective-validators (->> (:prospective-validators c)
                                (map (t/fn [[k v] :- '[Key Validator]]
                                       (compact-key k)))
                                sort)
   :validators (->> (:validators c)
                    (map (t/fn [pair :- (clojure.lang.AMapEntry Key Validator)]
                           (let [k (key pair)
                                 v (val pair)]
                             [(compact-key k)
                              {:votes (:votes v)}])))
                    (into (sorted-map)))
   :nodes (map-vals compact-key (:nodes c))
   :max-byzantine-vote-fraction (:max-byzantine-vote-fraction c)})



(t/ann vote-fractions [Config -> (t/Map Key Number)])
(defn vote-fractions
  "A map of validator public keys to the fraction of the vote they control."
  [config]
  (let [total (total-votes config)]
    (->> (:validators config)
         (map-vals (t/fn [v :- Validator]
                     (/ (:votes v) total))))))

(t/ann running-validators [Config -> (t/Option (t/Coll Validator))])
(defn running-validators
  "A collection of validators running on at least one node."
  [config]
  (->> (set (vals (:nodes config)))
       (keep (tmfn (:validators config) Key Validator))))

(t/ann ghost-validators [Config -> (t/Coll Validator)])
(defn ghost-validators
  "A collection of validators not running on any node."
  [config]
  (set/difference (set (vals (:validators config)))
                  (set (running-validators config))))

(t/ann byzantine-validator-keys [Config -> (t/Coll Key)])
(defn byzantine-validator-keys
  "A collection of all validator keys in the validator set which are running on
  more than one node."
  [config]
  (map (tk :pub_key Validator Key) (byzantine-validators config)))

(t/ann dup-groups [Config -> (HMap :mandatory {:groups (t/Coll (t/Coll Node))
                                               :singles (t/Coll (t/Coll Node))
                                               :dups    (t/Coll (t/Coll Node))}
                                   :complete? true)])
(defn dup-groups
  "Takes a config. Computes a map of:

      {:groups  A collection of groups of nodes, each running the same validator
       :singles Groups with only one node
       :dups    Groups with multiple nodes}"
  [config]
  (let [groups (-> config nodes-running-validators vals)]
    {:groups  groups
     :singles (filter (t/fn [g :- (t/Coll t/Any)] (= 1 (count g))) groups)
     :dups    (filter (t/fn [g :- (t/Coll t/Any)] (< 1 (count g))) groups)}))
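
; REPL sketch of `dup-groups` on a hand-written :nodes map. The keys `ka` and
; `kb` are placeholders (their type and data are assumptions, not real
; Tendermint output); `dup-groups` only reads :nodes, so the rest of the
; config is left out here.
(comment
  (let [ka {:type "ed25519" :data "aaaa-pub"}
        kb {:type "ed25519" :data "bbbb-pub"}]
    (dup-groups {:nodes {"n1" ka, "n2" ka, "n3" kb}}))
  ; => {:groups (["n1" "n2"] ["n3"]), :singles (["n3"]), :dups (["n1" "n2"])}
  )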

(t/ann at-least-one-running-validator? [Config -> Boolean])
(defn at-least-one-running-validator?
  "Does the given config have at least one validator which is running on some
  node?"
  [config]
  (boolean (seq (running-validators config))))

(t/ann omnipotent-byzantines? [Config -> Boolean])
(defn omnipotent-byzantines?
  "Does this config contain any byzantine validator which controls at least
  max-byzantine-vote-fraction of the vote?"
  [config]
  (let [vfs       (vote-fractions config)
        threshold (:max-byzantine-vote-fraction config)]
    (boolean (some (t/fn [k :- Key]
                     (let [vf (get vfs k)]
                       (assert vf (str "No vote fraction for " k))
                       (<= threshold vf)))
                   (byzantine-validator-keys config)))))

(t/ann ghost-limit Long)
(def ghost-limit
  "Ghosts are souls without bodies. How many validators can exist without
  actually running on any node?"
  2)

(t/ann too-many-ghosts? [Config -> Boolean])
(defn too-many-ghosts?
  "Does this config have too many validators which aren't running on any
  nodes?"
  [config]
  (< ghost-limit
     (count
       (set/difference (set (keys (:validators config)))
                       (set (vals (:nodes config)))))))

(t/ann zombie-limit Long)
(def zombie-limit
  "Zombies are bodies without souls. How many nodes can run a validator that's
  not actually a part of the cluster?"
  2)

(t/ann too-many-zombies? [Config -> Boolean])
(defn too-many-zombies?
  "Does this config have too many nodes which are running validators that
  aren't a part of the cluster?"
  [config]
  (< zombie-limit
     (count
       (remove (set (keys (:validators config)))
               (vals (:nodes config))))))

(t/ann quorum Number)
(def quorum
  "What fraction of the configuration's voting power should be
  online and non-byzantine in order to perform operations?"
  2/3)

(t/ann quorum? [Config -> Boolean])
(defn quorum?
  "Does the given configuration provide a quorum of running votes?"
  [config]
  (< quorum (/ (reduce + 0 (map (tk :votes Validator Number)
                                (running-validators config)))
               (total-votes config))))

(t/ann fault-limit Number)
(def fault-limit
  "What fraction of votes can be either byzantine or ghosts?"
  1/3)

(t/ann faulty? [Config -> Boolean])
(defn faulty?
  "Do byzantine and ghost validators together hold at least fault-limit of
  the total votes?"
  [config]
  (<= fault-limit
      (/ (reduce + 0 (map (tk :votes Validator Number)
                          (set/union (set (byzantine-validators config))
                                     (set (ghost-validators config)))))
         (total-votes config))))


(t/ann assert-valid [Config -> Config])
(defn assert-valid
  "Ensures that the given config is valid, and returns it. Throws
  AssertionError if not."
  [config]
  (assert (at-least-one-running-validator? config))
  (assert (not (omnipotent-byzantines? config)))
  (assert (not (too-many-ghosts? config)))
  (assert (not (too-many-zombies? config)))
  (assert (quorum? config))
  (assert (not (faulty? config)))
  (assert (every? (:node-set config) (keys (:nodes config))))
  (assert (every? pos? (map (tk :votes Validator Number)
                            (vals (:validators config)))))
  config)

; Possible state transitions:
; - Create an instance of a validator on a node
; - Destroy a validator instance on some node

; - Add a validator to the validator set
; - Remove a validator from the validator set

; - Adjust the weight of a validator

(t/ann pre-step [Config Transition -> Config])
(defn pre-step
  "Where `step` defines the consequences of an atomic transition, we don't
  actually get to perform all transitions atomically. In particular, when we
  add or remove a validator, we *request* that the system add or remove
  it, but we don't actually *know* whether it will happen until the transaction
  completes. This function transitions a configuration to that in-between
  state."
  [config transition]
  (assert-valid
    (case (:type transition)
      :create       config
      :destroy      config
      :add          (let [v (:validator transition)]
                      (assert (not (get-in config [:validators (:pub_key v)])))
                      (assoc config :prospective-validators
                             (assoc (:prospective-validators config)
                                    (:pub_key v) v)))
      :remove       config
      :alter-votes  config)))

(t/ann post-step [Config Transition -> Config])
(defn post-step
  "Complete a transition once we know it's been executed."
  [config transition]
  (assert-valid
    (case (:type transition)
      ; Create a new validator on a node
      :create (let [n (:node transition)
                    v (:validator transition)]
                (assert (not (get-in config [:nodes n])))
                (assoc config :nodes (assoc (:nodes config) n (:pub_key v))))

      ; Destroy a validator on a node
      :destroy (assoc config :nodes (dissoc (:nodes config)
                                            (:node transition)))


      ; Add a validator to the validator set
      :add (let [v (:validator transition)]
             (assert (not (get-in config [:validators (:pub_key v)])))
             (-> config
                 (assoc :prospective-validators
                        (dissoc (:prospective-validators config)
                                (:pub_key v)))
                 (assoc :validators
                        (assoc (:validators config) (:pub_key v) v))))

      ; Remove a validator from the validator set
      :remove (assoc config :validators
                     (dissoc (:validators config) (:pub_key transition)))

      ; Change the votes allocated to a validator
      :alter-votes (let [k (:pub_key transition)
                         v (:votes transition)
                         validators (:validators config)
                         validator  (get validators k)
                         _ (assert validator)
                         validator' (assoc validator :votes v)
                         validators' (assoc validators k validator')]
                     (assoc config :validators validators')))))

(t/ann step [Config Transition -> Config])
(defn step
  "Apply a low-level state transition to a config, returning a new config.
  Throws if the requested transition is illegal."
  [config transition]
  (-> config
      (pre-step transition)
      (post-step transition)))
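
; REPL sketch of how `step` accepts or rejects a transition. The four
; validators below are made-up placeholders (key type and data are
; assumptions, not real Tendermint output); only `config`, `step`, and
; `quorum?` come from this file. Assumes assertions are enabled, which is the
; Clojure default.
(comment
  (let [mk    (fn [id] {:address  id
                        :pub_key  {:type "ed25519" :data (str id "-pub")}
                        :priv_key {:type "ed25519" :data (str id "-priv")}
                        :votes    2})
        vs    (mapv mk ["aaaa" "bbbb" "cccc" "dddd"])
        nodes ["n1" "n2" "n3" "n4"]
        c     (config {:node-set   (set nodes)
                       :validators (into {} (map (juxt :pub_key identity) vs))
                       :nodes      (zipmap nodes (map :pub_key vs))})
        ; Losing one of four equal validators leaves 6/8 of the votes running
        c1    (step c {:type :destroy, :node "n4"})]
    [(quorum? c1)
     ; Losing a second one would leave only 4/8, below the 2/3 quorum
     (try (step c1 {:type :destroy, :node "n3"})
          (catch AssertionError _ :illegal))])
  ; => [true :illegal]
  )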

(t/ann rand-validator [Config -> Validator])
(defn rand-validator
  "Selects a random validator from the config."
  [config]
  (rand-nth (vals (:validators config))))

(t/ann rand-free-node [Config -> (t/Option Node)])
(defn rand-free-node
  "Selects a random node which isn't running anything."
  [config]
  (when-let [candidates (seq (set/difference (:node-set config)
                                             (set (keys (:nodes config)))))]
    (rand-nth candidates)))

(t/ann rand-taken-node [Config -> (t/Option Node)])
(defn rand-taken-node
  "Selects a random node that's running a validator."
  [config]
  (rand-nth (keys (:nodes config))))

(t/ann rand-transition [Test Config -> Transition])
(defn rand-transition
  "Generates a random transition on the given config."
  [test config]
  (or (condp <= (rand)
        ; Create a new instance of a validator on a node.
        4/5 (let [v (rand-validator config)
                  n (rand-free-node config)]
              (when (and v n)
                {:type      :create
                 :node      n
                 :validator v}))

        ;; Nuke a node
        3/5 (when-let [node (rand-taken-node config)]
              {:type :destroy
               :node node})

        ;; Generate a new validator and add it to the validator set
        2/5 (let [v (-> (c/on-nodes test
                                    [(rand-nth (:nodes test))]
                                    (t/fn [test :- Test, node :- Node]
                                      (gen-validator)))
                        first
                        val
                        (assoc :votes 2))]
              {:type      :add
               :version   (:version config)
               :validator v})

        ;; Remove a validator
        1/5 (let [v (rand-validator config)]
              {:type :remove
               :version (:version config)
               :pub_key (:pub_key v)})

        ; Adjust a validator's weight
        0/5 (let [v (rand-validator config)]
              {:type    :alter-votes
               :version (:version config)
               :pub_key (:pub_key v)
               ; Long forces core.typed to admit it's not AnyInteger;
               ; strictly speaking we might go out of bounds
               :votes   (long (max 1 (+ (:votes v) (- (rand-int 11) 5))))}))

      ; We rolled an impossible transition; try again
      (recur test config)))
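
; How the `condp <=` dispatch above carves [0, 1) into five equal bands, as a
; REPL sketch. `band` is a hypothetical helper, not part of the original file;
; it takes the roll as an argument instead of calling (rand).
(comment
  (defn band
    [roll]
    (condp <= roll
      4/5 :create
      3/5 :destroy
      2/5 :add
      1/5 :remove
      0/5 :alter-votes))

  (map band [0.9 0.7 0.5 0.3 0.1])
  ; => (:create :destroy :add :remove :alter-votes)
  )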

(t/ann ^:no-check rand-legal-transition [Test Config -> Transition])
(defn rand-legal-transition
  "Generates a random transition on the given config which results in a legal
  state."
  [test config]
  (dt/with-retry [i 0]
    (if (<= 100 i)
      (throw (RuntimeException. (str "Unable to generate state transition from "
                                     (pr-str config)
                                     " in fewer than 100 tries; aborting.")))
      (let [t (rand-transition test config)]
        (step config t)
        t))
    (catch AssertionError e
      (retry (inc i)))))

(t/ann prospective-validator-by-short-key
       [Config ShortKey -> (t/Option Validator)])
(defn prospective-validator-by-short-key
  "Looks up a prospective validator by key data alone; e.g. instead of by
  {:type ...  :data ...}."
  [config pub-key-data]
  (t/loop [validators :- (t/Option (t/NonEmptyASeq Validator))
           (seq (vals (:prospective-validators config)))]
    (when validators
      (if (= pub-key-data (:data (:pub_key (first validators))))
        (first validators)
        (recur (next validators))))))

(t/ann validator-by-short-key [Config ShortKey -> (t/Option Validator)])
(defn validator-by-short-key
  "Looks up a validator by key data alone; e.g. instead of by {:type ... :data
  ...}."
  [config pub-key-data]
  (t/loop [validators :- (t/Option (t/NonEmptyASeq Validator))
           (seq (vals (:validators config)))]
    (when validators
      (if (= pub-key-data (:data (:pub_key (first validators))))
        (first validators)
        (recur (next validators))))))

(t/ann tendermint-validator-set->vote-map
       [Config TendermintValidatorSet -> (t/Map Key Long)])
(defn tendermint-validator-set->vote-map
  "Converts a Tendermint validator set, whose validators are identified by
  short keys, into a map of full public keys to votes."
  [config validator-set]
  (->> validator-set
       :validators
       (map (t/fn [v :- TendermintValidator]
              (let [short-key (:pub_key v)
                    k (or (validator-by-short-key             config short-key)
                          (prospective-validator-by-short-key config short-key)
                          (throw (IllegalStateException.
                                   (str "Don't recognize cluster validator "
                                        (pr-str v)
                                        "; where did it come from?"))))]
                [(:pub_key k) (:power v)])))
       (into {})))

(t/ann clear-removed-nodes [Config (t/Map Key Long) -> Config])
(defn clear-removed-nodes
  "Takes a config and a map of validator keys to votes, and clears out config
  validators which no longer exist in the votes map."
  [config votes]
  (->> (:validators config)
       (filter (t/fn [[k v] :- '[Key Validator]]
                 (contains? votes k)))
       (into {})
       (assoc config :validators)))

(t/ann update-known-nodes [Config (t/Map Key Long) -> Config])
(defn update-known-nodes
  "Takes a config and a map of validator keys to votes, and where a key is
  present in the vote map but is not yet a validator in the config, promotes
  that validator from :prospective-validators to :validators in the config.
  Also updates all votes in the config."
  [config votes]
  (reduce (t/fn [config :- Config, [k v] :- '[Key Long]]
            (let [validators (:validators config)]
              (if-let [validator (get validators k)]
                ; We know this validator already
                (let [validator (assoc validator :votes v)
                      validators (assoc validators k validator)]
                  (assoc config :validators validators))

                ; Promote from prospective-validators
                (let [prospective (:prospective-validators config)
                      validator   (get prospective k)
                      _           (assert validator
                                         (str "Don't recognize validator "
                                              k "; where did it come from?"))
                      validator   (assoc validator :votes v)
                      validators  (assoc validators k validator)
                      prospective (dissoc prospective k)]
                  (assoc config
                         :validators              validators
                         :prospective-validators  prospective)))))
          config
          votes))
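
; REPL sketch of `update-known-nodes` promoting a prospective validator. The
; keys are placeholders and the validator maps are trimmed to the fields this
; function touches (an assumption made for brevity; real validators also carry
; :address and :priv_key).
(comment
  (let [k1 {:type "ed25519" :data "aaaa-pub"}
        k2 {:type "ed25519" :data "bbbb-pub"}
        c  {:validators             {k1 {:pub_key k1 :votes 2}}
            :prospective-validators {k2 {:pub_key k2 :votes 2}}}
        c' (update-known-nodes c {k1 5, k2 3})]
    [(map-vals :votes (:validators c'))
     (:prospective-validators c')])
  ; => [{k1 5, k2 3} {}], with k1 and k2 standing for the full key maps
  )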

(t/ann current-config [Test Node -> Config])
(defn current-config
  "Combines our internal test view of which nodes are running what validators
  with a transactional read of the current state of validator votes, producing
  a config that can be used to generate cluster transitions."
  [test node]
  ; TODO: improve node selection
  (let [local-config   @(:validator-config test)
        cluster-config (tc/validator-set node)
        votes          (tendermint-validator-set->vote-map
                         local-config cluster-config)]
    (-> local-config
        (clear-removed-nodes votes)
        (update-known-nodes votes)
        (assoc :version (:version cluster-config)))))

(t/tc-ignore

(defn refresh-config!
  "Attempts to update the test's config with new information from the cluster.
  Returns our estimate of the current config. Not threadsafe."
  [test]
  ; TODO: make this threadsafe
  (or (reduce (fn [_ node]
                (try
                  (when-let [c (current-config test node)]
                    (reset! (:validator-config test) c)
                    (reduced c))
                  (catch java.io.IOException e
                    ; (info e "unable to fetch current validator set config")
                    nil)))
              nil
              (shuffle (:nodes test)))
      @(:validator-config test)))

(defn generator
  "A generator of legal state transitions on the current validator state."
  []
  (reify gen/Generator
    (op [this test process]
      (try
        (info "refreshing config")
        (let [config (refresh-config! test)]
          (info :config-refreshed)
          (info (with-out-str (pprint config)))
          (info (with-out-str (pprint (compact-config config))))
          {:type  :info
           :f     :transition
           :value (rand-legal-transition test config)})
        (catch Exception e
          (warn e "error generating transition")
          (throw e))))))

)
node0    | E[2019-10-01|05:04:32.161] dialing failed (attempts: 1): dial tcp 192.167.10.5:26656: connect: connection refused module=pex addr=118c061c6b79cb250beb6ad58cf1cae0...@192.167.10.5:26656
node0    | E[2019-10-01|05:04:32.161] dialing failed (attempts: 1): dial tcp 192.167.10.4:26656: connect: connection refused module=pex addr=642b5440de365c62b8ab30ea2f57ff95...@192.167.10.4:26656
node0    | E[2019-10-01|05:04:32.161] dialing failed (attempts: 1): dial tcp 192.167.10.3:26656: connect: connection refused module=pex addr=dd5c20e0120391faa2ae41ca920c744c...@192.167.10.3:26656
node0    | E[2019-10-01|05:04:32.161] dialing failed (attempts: 1): dial tcp 192.167.10.6:26656: connect: connection refused module=pex addr=15cbf28e987333b72908640fd392e898...@192.167.10.6:26656
node2    | I[2019-10-01|05:04:32.299] Starting Node                                module=main impl=Node
node2    | I[2019-10-01|05:04:32.381] Started node                                 module=main nodeInfo="{ProtocolVersion:{P2P:7 Block:10 App:1} ID_:642b5440de365c62b8ab30ea2f57ff958bee5eaa ListenAddr:tcp://0.0.0.0:26656 Network:chain-rQyJWz Version:0.32.3 Channels:4020212223303800 Moniker:2DBC60526E1EC269 Other:{TxIndex:on RPCAddress:tcp://127.0.0.1:26657}}"
node2    | E[2019-10-01|05:04:32.382] dialing failed (attempts: 1): dial tcp 192.167.10.5:26656: connect: connection refused module=pex addr=118c061c6b79cb250beb6ad58cf1cae0...@192.167.10.5:26656
node2    | E[2019-10-01|05:04:32.382] dialing failed (attempts: 1): dial tcp 192.167.10.6:26656: connect: connection refused module=pex addr=15cbf28e987333b72908640fd392e898...@192.167.10.6:26656
node2    | E[2019-10-01|05:04:32.383] dialing failed (attempts: 1): dial tcp 192.167.10.3:26656: connect: connection refused module=pex addr=dd5c20e0120391faa2ae41ca920c744c...@192.167.10.3:26656
node2    | E[2019-10-01|05:04:32.399] Error dialing peer                           module=p2p err="dial tcp 192.167.10.6:26656: connect: connection refused"
node1    | I[2019-10-01|05:04:32.604] Version info                                 module=main software=0.32.3 block=10 p2p=7
node4    | I[2019-10-01|05:04:32.604] Version info                                 module=main software=0.32.3 block=10 p2p=7
node3    | I[2019-10-01|05:04:32.639] Version info                                 module=main software=0.32.3 block=10 p2p=7
node1    | I[2019-10-01|05:04:32.808] Starting Node                                module=main impl=Node
node4    | I[2019-10-01|05:04:32.809] Starting Node                                module=main impl=Node
node3    | I[2019-10-01|05:04:32.834] Starting Node                                module=main impl=Node
node4    | I[2019-10-01|05:04:32.914] Started node                                 module=main nodeInfo="{ProtocolVersion:{P2P:7 Block:10 App:1} ID_:15cbf28e987333b72908640fd392e89866cd0e2e ListenAddr:tcp://0.0.0.0:26656 Network:chain-rQyJWz Version:0.32.3 Channels:4020212223303800 Moniker:F5D7F171FBCF7218 Other:{TxIndex:on RPCAddress:tcp://127.0.0.1:26657}}"
node1    | I[2019-10-01|05:04:32.914] Started node                                 module=main nodeInfo="{ProtocolVersion:{P2P:7 Block:10 App:1} ID_:dd5c20e0120391faa2ae41ca920c744cfef61b7a ListenAddr:tcp://0.0.0.0:26656 Network:chain-rQyJWz Version:0.32.3 Channels:4020212223303800 Moniker:996E62F8069CB345 Other:{TxIndex:on RPCAddress:tcp://127.0.0.1:26657}}"
node1    | E[2019-10-01|05:04:32.921] dialing failed (attempts: 1): duplicate ID<15cbf28e987333b72908640fd392e89866cd0e2e> module=pex addr=15cbf28e987333b72908640fd392e898...@192.167.10.6:26656
node4    | E[2019-10-01|05:04:32.921] Stopping peer for error                      module=p2p peer="Peer{MConn{192.167.10.3:38924} dd5c20e0120391faa2ae41ca920c744cfef61b7a in}" err=EOF
node4    | E[2019-10-01|05:04:32.921] MConnection flush failed                     module=p2p peer=dd5c20e0120391faa2ae41ca920c744c...@192.167.10.3:38924 err="write tcp 192.167.10.6:26656->192.167.10.3:38924: use of closed network connection"
node3    | I[2019-10-01|05:04:32.933] Started node                                 module=main nodeInfo="{ProtocolVersion:{P2P:7 Block:10 App:1} ID_:118c061c6b79cb250beb6ad58cf1cae0c2ed479e ListenAddr:tcp://0.0.0.0:26656 Network:chain-rQyJWz Version:0.32.3 Channels:4020212223303800 Moniker:47B060EE54A0076E Other:{TxIndex:on RPCAddress:tcp://127.0.0.1:26657}}"
node3    | E[2019-10-01|05:04:32.937] dialing failed (attempts: 1): duplicate ID<dd5c20e0120391faa2ae41ca920c744cfef61b7a> module=pex addr=dd5c20e0120391faa2ae41ca920c744c...@192.167.10.3:26656

Abiramie Shree TGR

Oct 10, 2019, 3:17:59 AM
to Jepsen Talk
Is there a solution for my error? 

Abiramie Shree TGR

Oct 14, 2019, 1:54:44 AM
to Jepsen Talk
Is that error related to the SSH connection? While searching on Google, I found a link,
https://stackoverflow.com/questions/44813439/jepsen-ssh-issue, which is similar to my error. But the answer in that link is not clear to me. Can you please help me with that?

Kyle Kingsbury

Oct 15, 2019, 1:00:27 PM
to ta...@jepsen.io
On 10/14/19 1:54 AM, Abiramie Shree TGR wrote:
> Is that error related to the SSH connection? While searching on Google, I
> found a link, https://stackoverflow.com/questions/44813439/jepsen-ssh-issue,
> which is similar to my error. But the answer in that link is not clear to me.
> Can you please help me with that?

Yes: as I've said repeatedly, you need to sort out SSH connectivity to your DB
nodes first. I don't think I can offer you additional help at this juncture; I
think you're going to need more assistance than I can reasonably provide for free.
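
A minimal sketch of one way to check SSH connectivity before running a test: the function below attempts a non-interactive login on each node and reports the result. The root login and the names `check_nodes` and `SSH_CMD` are assumptions introduced for illustration; adjust the user and node names to match your environment.

```sh
# Report whether each node accepts a non-interactive SSH login.
# SSH_CMD is a hypothetical override so the loop can be tried without a cluster.
check_nodes() {
  for node in "$@"; do
    # BatchMode=yes makes ssh fail instead of prompting for a password;
    # ConnectTimeout=5 gives up after five seconds on an unreachable host.
    if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 \
         "root@$node" true </dev/null 2>/dev/null; then
      echo "$node ok"
    else
      echo "$node FAILED"
    fi
  done
}
```

Calling `check_nodes n1 n2 n3 n4 n5` prints one line per node. Any node reported as FAILED needs its SSH setup fixed before Jepsen can do anything with it.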

--Kyle

Abiramie Shree TGR

Oct 15, 2019, 1:08:08 PM
to Jepsen Talk
Thanks for your reply. I initially thought the repository https://github.com/jepsen-io/Tendermint itself created the Tendermint nodes (DB nodes), because the README is not clear: it doesn't describe how the nodes are created. That's why I kept asking this question.

-Abiramie Shree T G R

Kyle Kingsbury

Oct 15, 2019, 1:17:20 PM
to ta...@jepsen.io
On 10/15/19 1:08 PM, Abiramie Shree TGR wrote:
> Thanks for your reply. I initially thought the repository https://github.com/jepsen-io/Tendermint itself created the Tendermint nodes (DB nodes), because the README is not clear: it doesn't describe how the nodes are created. That's why I kept asking this question.

All Jepsen tests require a Jepsen environment, as described by the Jepsen
README: https://github.com/jepsen-io/jepsen. The Tendermint test is responsible
for setting up Tendermint on DB nodes.

--Kyle

Abiramie Shree TGR

Oct 15, 2019, 1:21:44 PM
to Jepsen Talk
OK. Thanks for your help.
Message has been deleted
Message has been deleted
Message has been deleted
Message has been deleted

Abiramie Shree TGR

Oct 30, 2019, 7:24:14 AM
to Jepsen Talk

I solved this error. Now I am getting a different error while running the command lein run test in the repository https://github.com/jepsen-io/tendermint

The error output is:

INFO [2019-10-30 16:13:08,306] jepsen node n3 - jepsen.os.debian n3 setting up debian

INFO [2019-10-30 16:13:08,306] jepsen node n5 - jepsen.os.debian n5 setting up debian

INFO [2019-10-30 16:13:08,306] jepsen node n4 - jepsen.os.debian n4 setting up debian

INFO [2019-10-30 16:13:08,306] jepsen node n1 - jepsen.os.debian n1 setting up debian

INFO [2019-10-30 16:13:08,306] jepsen node n2 - jepsen.os.debian n2 setting up debian

INFO [2019-10-30 16:13:08,830] jepsen node n2 - jepsen.os.debian Installing #{iproute man-db curl psmisc ntpdate faketime libzip2 unzip wget iptables}

INFO [2019-10-30 16:13:08,830] jepsen node n5 - jepsen.os.debian Installing #{iproute man-db curl psmisc ntpdate faketime libzip2 unzip wget iptables}

INFO [2019-10-30 16:13:08,831] jepsen node n4 - jepsen.os.debian Installing #{iproute man-db curl psmisc ntpdate faketime libzip2 unzip wget iptables}

INFO [2019-10-30 16:13:08,840] jepsen node n3 - jepsen.os.debian Installing #{iproute man-db curl psmisc ntpdate faketime libzip2 unzip wget iptables}

INFO [2019-10-30 16:13:08,961] jepsen node n1 - jepsen.os.debian Installing #{iproute man-db curl psmisc ntpdate faketime libzip2 unzip wget iptables}

ERROR [2019-10-30 16:13:09,589] main - jepsen.cli Oh jeez, I'm sorry, Jepsen broke. Here's why:

java.util.concurrent.ExecutionException: java.lang.RuntimeException: sudo -S -u root bash -c "cd /; apt-get install -y --force-yes iproute man-db curl psmisc ntpdate faketime libzip2 unzip wget iptables" returned non-zero exit status 100 on n1. STDOUT:

Reading package lists...

Building dependency tree...

Reading state information...

Package iproute is not available, but is referred to by another package.

This may mean that the package is missing, has been obsoleted, or

is only available from another source

However the following packages replace it:

  iproute2




STDERR:

W: --force-yes is deprecated, use one of the options starting with --allow instead.

E: Package 'iproute' has no installation candidate

E: Unable to locate package libzip2


at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[na:1.8.0_222]

at java.util.concurrent.FutureTask.get(FutureTask.java:192) ~[na:1.8.0_222]

at clojure.core$deref_future.invokeStatic(core.clj:2208) ~[clojure-1.8.0.jar:na]

at clojure.core$future_call$reify__6962.deref(core.clj:6688) ~[clojure-1.8.0.jar:na]

at clojure.core$deref.invokeStatic(core.clj:2228) ~[clojure-1.8.0.jar:na]

at clojure.core$deref.invoke(core.clj:2214) ~[clojure-1.8.0.jar:na]

at clojure.core$map$fn__4785.invoke(core.clj:2644) ~[clojure-1.8.0.jar:na]

at clojure.lang.LazySeq.sval(LazySeq.java:40) ~[clojure-1.8.0.jar:na]

at clojure.lang.LazySeq.seq(LazySeq.java:49) ~[clojure-1.8.0.jar:na]

at clojure.lang.RT.seq(RT.java:521) ~[clojure-1.8.0.jar:na]

at clojure.core$seq__4357.invokeStatic(core.clj:137) ~[clojure-1.8.0.jar:na]

at clojure.core.protocols$seq_reduce.invokeStatic(protocols.clj:24) ~[clojure-1.8.0.jar:na]

at clojure.core.protocols$fn__6738.invokeStatic(protocols.clj:75) ~[clojure-1.8.0.jar:na]

at clojure.core.protocols$fn__6738.invoke(protocols.clj:75) ~[clojure-1.8.0.jar:na]

at clojure.core.protocols$fn__6684$G__6679__6697.invoke(protocols.clj:13) ~[clojure-1.8.0.jar:na]

at clojure.core$reduce.invokeStatic(core.clj:6545) ~[clojure-1.8.0.jar:na]

at clojure.core$into.invokeStatic(core.clj:6610) ~[clojure-1.8.0.jar:na]

at clojure.core$into.invoke(core.clj:6604) ~[clojure-1.8.0.jar:na]

at jepsen.control$on_nodes.invokeStatic(control.clj:353) ~[jepsen-0.1.6.jar:na]

at jepsen.control$on_nodes.invoke(control.clj:337) ~[jepsen-0.1.6.jar:na]

at jepsen.control$on_nodes.invokeStatic(control.clj:342) ~[jepsen-0.1.6.jar:na]

at jepsen.control$on_nodes.invoke(control.clj:337) ~[jepsen-0.1.6.jar:na]

at jepsen.core$run_BANG_$fn__3400$fn__3403.invoke(core.clj:411) ~[jepsen-0.1.6.jar:na]

at jepsen.core$run_BANG_$fn__3400.invoke(core.clj:400) ~[jepsen-0.1.6.jar:na]

at jepsen.core$run_BANG_.invokeStatic(core.clj:382) ~[jepsen-0.1.6.jar:na]

at jepsen.core$run_BANG_.invoke(core.clj:329) ~[jepsen-0.1.6.jar:na]

at jepsen.cli$single_test_cmd$fn__4053.invoke(cli.clj:327) ~[jepsen-0.1.6.jar:na]

at jepsen.cli$run_BANG_.invokeStatic(cli.clj:271) [jepsen-0.1.6.jar:na]

at jepsen.cli$run_BANG_.invoke(cli.clj:201) [jepsen-0.1.6.jar:na]

at jepsen.tendermint.cli$_main.invokeStatic(cli.clj:25) [na:na]

at jepsen.tendermint.cli$_main.doInvoke(cli.clj:23) [na:na]

at clojure.lang.RestFn.invoke(RestFn.java:408) [clojure-1.8.0.jar:na]

at clojure.lang.Var.invoke(Var.java:379) [clojure-1.8.0.jar:na]

at user$eval720.invokeStatic(form-init2121207811388987748.clj:1) [na:na]

at user$eval720.invoke(form-init2121207811388987748.clj:1) [na:na]

at clojure.lang.Compiler.eval(Compiler.java:6927) [clojure-1.8.0.jar:na]

at clojure.lang.Compiler.eval(Compiler.java:6917) [clojure-1.8.0.jar:na]

at clojure.lang.Compiler.load(Compiler.java:7379) [clojure-1.8.0.jar:na]

at clojure.lang.Compiler.loadFile(Compiler.java:7317) [clojure-1.8.0.jar:na]

at clojure.main$load_script.invokeStatic(main.clj:275) [clojure-1.8.0.jar:na]

at clojure.main$init_opt.invokeStatic(main.clj:277) [clojure-1.8.0.jar:na]

at clojure.main$init_opt.invoke(main.clj:277) [clojure-1.8.0.jar:na]

at clojure.main$initialize.invokeStatic(main.clj:308) [clojure-1.8.0.jar:na]

at clojure.main$null_opt.invokeStatic(main.clj:342) [clojure-1.8.0.jar:na]

at clojure.main$null_opt.invoke(main.clj:339) [clojure-1.8.0.jar:na]

at clojure.main$main.invokeStatic(main.clj:421) [clojure-1.8.0.jar:na]

at clojure.main$main.doInvoke(main.clj:384) [clojure-1.8.0.jar:na]

at clojure.lang.RestFn.invoke(RestFn.java:421) [clojure-1.8.0.jar:na]

at clojure.lang.Var.invoke(Var.java:383) [clojure-1.8.0.jar:na]

at clojure.lang.AFn.applyToHelper(AFn.java:156) [clojure-1.8.0.jar:na]

at clojure.lang.Var.applyTo(Var.java:700) [clojure-1.8.0.jar:na]

at clojure.main.main(main.java:37) [clojure-1.8.0.jar:na]

Caused by: java.lang.RuntimeException: sudo -S -u root bash -c "cd /; apt-get install -y --force-yes iproute man-db curl psmisc ntpdate faketime libzip2 unzip wget iptables" returned non-zero exit status 100 on n1. STDOUT:

Reading package lists...

Building dependency tree...

Reading state information...

Package iproute is not available, but is referred to by another package.

This may mean that the package is missing, has been obsoleted, or

is only available from another source

However the following packages replace it:

  iproute2




STDERR:

W: --force-yes is deprecated, use one of the options starting with --allow instead.

E: Package 'iproute' has no installation candidate

E: Unable to locate package libzip2


at jepsen.control$throw_on_nonzero_exit.invokeStatic(control.clj:128) ~[jepsen-0.1.6.jar:na]

at jepsen.control$throw_on_nonzero_exit.invoke(control.clj:121) ~[jepsen-0.1.6.jar:na]

at jepsen.control$exec_STAR_.invokeStatic(control.clj:165) ~[jepsen-0.1.6.jar:na]

at jepsen.control$exec_STAR_.doInvoke(control.clj:162) ~[jepsen-0.1.6.jar:na]

at clojure.lang.RestFn.applyTo(RestFn.java:137) [clojure-1.8.0.jar:na]

at clojure.core$apply.invokeStatic(core.clj:646) ~[clojure-1.8.0.jar:na]

at clojure.core$apply.invoke(core.clj:641) ~[clojure-1.8.0.jar:na]

at jepsen.control$exec.invokeStatic(control.clj:181) ~[jepsen-0.1.6.jar:na]

at jepsen.control$exec.doInvoke(control.clj:175) ~[jepsen-0.1.6.jar:na]

at clojure.lang.RestFn.applyTo(RestFn.java:137) [clojure-1.8.0.jar:na]

at clojure.core$apply.invokeStatic(core.clj:654) ~[clojure-1.8.0.jar:na]

at clojure.core$apply.doInvoke(core.clj:641) ~[clojure-1.8.0.jar:na]

at clojure.lang.RestFn.invoke(RestFn.java:533) [clojure-1.8.0.jar:na]

at jepsen.os.debian$install.invokeStatic(debian.clj:98) ~[na:na]

at jepsen.os.debian$install.invoke(debian.clj:78) ~[na:na]

at jepsen.os.debian$reify__1367$fn__1368.invoke(debian.clj:148) ~[na:na]

at jepsen.os.debian$reify__1367.setup_BANG_(debian.clj:146) ~[na:na]

at jepsen.os$fn__1153$G__1147__1157.invoke(os.clj:4) ~[jepsen-0.1.6.jar:na]

at jepsen.os$fn__1153$G__1146__1162.invoke(os.clj:4) ~[jepsen-0.1.6.jar:na]

at clojure.core$partial$fn__4759.invoke(core.clj:2516) ~[clojure-1.8.0.jar:na]

at jepsen.control$on_nodes$fn__1797.invoke(control.clj:352) ~[jepsen-0.1.6.jar:na]

at clojure.lang.AFn.applyToHelper(AFn.java:154) [clojure-1.8.0.jar:na]

at clojure.lang.AFn.applyTo(AFn.java:144) [clojure-1.8.0.jar:na]

at clojure.core$apply.invokeStatic(core.clj:646) ~[clojure-1.8.0.jar:na]

at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1881) ~[clojure-1.8.0.jar:na]

at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1881) ~[clojure-1.8.0.jar:na]

at clojure.lang.RestFn.applyTo(RestFn.java:142) [clojure-1.8.0.jar:na]

at clojure.core$apply.invokeStatic(core.clj:650) ~[clojure-1.8.0.jar:na]

at clojure.core$bound_fn_STAR_$fn__4671.doInvoke(core.clj:1911) ~[clojure-1.8.0.jar:na]

at clojure.lang.RestFn.invoke(RestFn.java:408) [clojure-1.8.0.jar:na]

at jepsen.util$real_pmap$launcher__950$fn__951.invoke(util.clj:48) ~[jepsen-0.1.6.jar:na]

at clojure.core$binding_conveyor_fn$fn__4676.invoke(core.clj:1938) ~[clojure-1.8.0.jar:na]

at clojure.lang.AFn.call(AFn.java:18) [clojure-1.8.0.jar:na]

at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_222]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_222]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_222]

at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_222]


Can you help me solve this error and explain what it means?


Thank you,

Abiramie Shree T G R


Kyle Kingsbury

Oct 30, 2019, 7:26:09 AM
to ta...@jepsen.io
On 10/30/19 7:24 AM, Abiramie Shree TGR wrote:

E: Package 'iproute' has no installation candidate

E: Unable to locate package libzip2

IIRC, the Tendermint tests were written for Debian Jessie. What version of Debian are you running on your DB nodes?
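
One quick way to answer that question is to read /etc/os-release, which both Debian and Ubuntu ship. The sketch below is only illustrative; `os_pretty_name` is a hypothetical helper name, not part of Jepsen.

```sh
# Print the distribution name recorded in an os-release file.
# Defaults to /etc/os-release when no path is given.
os_pretty_name() {
  file="${1:-/etc/os-release}"
  # Source the file in a subshell so its variables do not leak into the caller.
  ( . "$file" && printf '%s\n' "$PRETTY_NAME" )
}

# Example (assumes root SSH access to a node named n1):
#   ssh root@n1 'grep PRETTY_NAME /etc/os-release'
```

Running the function on a DB node, or the commented ssh one-liner from the control node, shows whether the node is Debian Jessie or something else.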

--Kyle

Abiramie Shree TGR

Oct 30, 2019, 7:29:39 AM
to Jepsen Talk
I created them with Ubuntu, following your LXC documentation: https://github.com/jepsen-io/jepsen/blob/master/doc/lxc.md

I created 5 nodes running Ubuntu Bionic.

Kyle Kingsbury

Oct 30, 2019, 7:47:59 AM
to ta...@jepsen.io
On 10/30/19 7:29 AM, Abiramie Shree TGR wrote:
> I created them with Ubuntu, following your LXC documentation:
> https://github.com/jepsen-io/jepsen/blob/master/doc/lxc.md
>
> I created 5 nodes running Ubuntu Bionic.

The instructions specify Debian Jessie, not Ubuntu Bionic, so... part of the
problem here is that you're not following the directions. These are different
operating systems with different names for packages. That's why Jepsen is
complaining about mismatching package names. You have three options here:

1. Keep using Bionic, and replace the :os in tendermint's test maps with a
custom implementation of a Jepsen OS adapter specifically for Bionic. You can
use Jepsen 0.1.6's jepsen.os.debian as a guide, but of course the package names
will be different.

2. Switch to Debian Jessie. This is the obvious answer, but Jessie was EOL'ed
back in March, and many of the apt repos for Jessie were taken down. That's
gonna make it hard for Jepsen to install packages. You'll have to find a
still-running (or build your own) Jessie mirror so you can install packages.

3. Use Debian Stretch, and upgrade Jepsen. Tendermint was written against Jepsen
0.1.6, which came out shortly after Stretch's release, and consequently still
used Jessie. Jepsen 0.1.13+ supports Debian Stretch, so you can upgrade the
Jepsen dependency in project.clj to 0.1.13, 14, 15, ... and run with that.
However, there have been a few API changes in the intervening years, and you'll
be responsible for patching the Tendermint test to follow them.
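
To make the package-name mismatch behind option 1 concrete, here is a rough shell sketch of the renaming involved. The iproute2 replacement comes straight from apt's error output above; libzip4 as the Bionic counterpart of libzip2 is an assumption that should be checked with apt-cache on a node. `bionic_pkg` and `bionic_pkgs` are hypothetical helper names, and a real fix would carry the same mapping into a custom Jepsen OS adapter rather than a shell script.

```sh
# Hypothetical helper: map a package name from the Jessie-era list to a name
# that Ubuntu Bionic can install.
bionic_pkg() {
  case "$1" in
    iproute) echo iproute2 ;;  # replacement suggested by apt in the error output
    libzip2) echo libzip4 ;;   # assumption: verify with `apt-cache search libzip`
    *)       echo "$1" ;;      # every other name is passed through unchanged
  esac
}

# Translate a whole package list, one name per argument.
bionic_pkgs() {
  out=""
  for pkg in "$@"; do
    out="$out $(bionic_pkg "$pkg")"
  done
  # Drop the leading space left by the first append.
  echo "${out# }"
}
```

For example, `bionic_pkgs iproute curl libzip2` prints the translated list, which can be passed to apt-get install on a Bionic node to confirm that every package resolves.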

--Kyle

Abiramie Shree TGR

Oct 30, 2019, 7:57:25 AM
to Jepsen Talk
Sorry for disturbing you again.
Thanks for your detailed reply. I will try your options and suggestions. 
If I run into further issues or errors, I may need to ask you again.

Thank you
Abiramie Shree T G R

Kyle Kingsbury

Oct 30, 2019, 9:15:01 AM
to ta...@jepsen.io
On 10/30/19 6:50 AM, Abiramie Shree TGR wrote:
> Sorry to disturb you again.

On 10/30/19 6:54 AM, Abiramie Shree TGR wrote:
> Sorry to disturb you again.

On 10/30/19 7:01 AM, Abiramie Shree TGR wrote:
> Sorry for disturbing you again.

On 10/30/19 7:21 AM, Abiramie Shree TGR wrote:
> Sorry for disturbing you again.

On 10/30/19 7:24 AM, Abiramie Shree TGR wrote:
> i solved this error. Now, i am getting a different error

Abiramie, please, never do this again.

--Kyle

Abiramie Shree TGR

Oct 30, 2019, 9:29:08 AM
to Jepsen Talk
Sorry for the inconvenience. When I tried to post, it showed that my message had been deleted. That's why I posted the same question repeatedly. I never thought the messages would still be delivered to you later.

Abiramie Shree TGR

Jan 14, 2020, 1:27:36 AM
to Jepsen Talk
I have one question. It is about using Tendermint from Clojure, not about testing with Jepsen.

Is it possible to create two clients in Tendermint (using Clojure), one to handle read operations and another to handle write operations?




Thank you,
Abiramie Shree T G R. 