vault won't unseal, ping-pongs unseal progress


Rich Fromm

Jan 12, 2017, 4:04:24 PM
to Vault
I've gotten vault into a bad state, where it won't unseal. My backend is S3. I used the default settings during initialization and have not re-keyed, so there are 5 keys and a key threshold of 3. Up until sometime yesterday, this worked fine.

So far this is just a test instance, but we are exploring the setup to see what we want to change before rolling it out into production. I've been playing around with multiple S3 buckets (eventually I plan to turn on automatic Cross-Region Replication, but I haven't done that yet), with S3 Versioning, and with using aws s3 sync both to copy data from one bucket to another and for backup and restore. So it's possible I've done something outside the normal flow, although I can't see how anything I've done should cause what I'm seeing.

As to what is actually happening: normally when I restart vault, it comes up sealed. I successively give it keys, the unseal progress goes from 0 to 1 to 2, and then it is unsealed, like so, from the vault logs:

Thu Jan 12 00:49:02 2017 - Starting vault
Thu Jan 12 00:49:02 2017 - Vault started
Thu Jan 12 00:49:02 2017 - AWS region is us-east-2
==> Vault server configuration:

                 Backend: s3
              Listener 1: tcp (addr: "0.0.0.0:8200", cluster address: "", tls: "disabled")
               Log Level: debug
                   Mlock: supported: true, enabled: false
                 Version: Vault v0.6.1

==> Vault server started! Log data will stream in below:

2017/01/12 00:49:35.726889 [DBG] core: cannot unseal, not enough keys keys=1 threshold=3
2017/01/12 00:49:55.767824 [DBG] core: cannot unseal, not enough keys keys=2 threshold=3
2017/01/12 00:50:02.354043 [INF] core: vault is unsealed

But now when I try to unseal, the progress goes 0 to 1 to 2, and then instead of unsealing, it goes back to 1. Then 2. etc.

From the command line (timestamps in PST):

$ vault status
Sealed: true
Key Shares: 5
Key Threshold: 3
Unseal Progress: 0
Version: Vault v0.6.1

High-Availability Enabled: false
rich@rich-trusty [12:49:41 Thu Jan 12] ~
$ vault unseal
Key (will be hidden):
Sealed: true
Key Shares: 5
Key Threshold: 3
Unseal Progress: 1
rich@rich-trusty [12:49:53 Thu Jan 12] ~
$ vault unseal
Key (will be hidden):
Sealed: true
Key Shares: 5
Key Threshold: 3
Unseal Progress: 2
rich@rich-trusty [12:49:59 Thu Jan 12] ~
$ vault unseal
Key (will be hidden):
Sealed: true
Key Shares: 5
Key Threshold: 3
Unseal Progress: 1
rich@rich-trusty [12:50:06 Thu Jan 12] ~
$ vault unseal
Key (will be hidden):
Sealed: true
Key Shares: 5
Key Threshold: 3
Unseal Progress: 2
rich@rich-trusty [12:50:12 Thu Jan 12] ~
$ vault unseal
Key (will be hidden):
Sealed: true
Key Shares: 5
Key Threshold: 3
Unseal Progress: 1
rich@rich-trusty [12:50:19 Thu Jan 12] ~

From the vault log (timestamps in UTC):

Thu Jan 12 20:49:37 2017 - Starting vault
Thu Jan 12 20:49:37 2017 - Vault started
Thu Jan 12 20:49:37 2017 - AWS region is us-west-2
==> Vault server configuration:

                 Backend: s3
              Listener 1: tcp (addr: "0.0.0.0:8200", cluster address: "", tls: "disabled")
               Log Level: debug
                   Mlock: supported: true, enabled: false
                 Version: Vault v0.6.1

==> Vault server started! Log data will stream in below:

2017/01/12 20:49:53.657949 [DBG] core: cannot unseal, not enough keys keys=1 threshold=3
2017/01/12 20:49:59.275866 [DBG] core: cannot unseal, not enough keys keys=2 threshold=3
2017/01/12 20:50:06.907048 [DBG] core: cannot unseal, not enough keys keys=1 threshold=3
2017/01/12 20:50:12.002856 [DBG] core: cannot unseal, not enough keys keys=2 threshold=3
2017/01/12 20:50:19.603509 [DBG] core: cannot unseal, not enough keys keys=1 threshold=3

Before I started any of this, one of the first things I did was make a local backup of the current state of my s3 bucket (when things were definitely working) with aws s3 sync. Even creating an entirely new bucket from scratch, and syncing from my local backup to that, I get the same broken behavior shown above.

Any help anyone can offer would be most appreciated. As I said, it's all test data, so if it's lost it's not a huge deal. But if I can't explain what went wrong, it might cause us to question using Vault for production use.

My vault is 0.6.1. I realize the latest is 0.6.4, and I could try upgrading to that, but everything was working fine with 0.6.1 until sometime late yesterday.

Jeff Mitchell

Jan 12, 2017, 10:23:09 PM
to vault...@googlegroups.com
Hi Rich,

That's definitely strange. So this persists across restarts, but only
started happening (with the same Vault version) yesterday? Before that
you could unseal just fine?

I don't have a great answer for you at the moment; the code that
collects up the unseal keys is really quite simple -- an array of the
unseal key bytes that gets appended to with each submission. (FWIW,
this is tested, many times over, with every Travis run, which is
against every commit.)

You say that you can repeat this behavior with a fresh bucket. Do you
do anything other than simply init and then try to unseal before
seeing this?

Do you get this behavior with a different backend, like inmem or file?

Best,
Jeff

Rich Fromm

Jan 13, 2017, 1:13:56 AM
to Vault
On Thursday, January 12, 2017 at 7:23:09 PM UTC-8, Jeff Mitchell wrote:

> That's definitely strange. So this persists across restarts, but only
> started happening (with the same Vault version) yesterday? Before that
> you could unseal just fine?

I will have to go back through my notes (and terminal sessions) to see if I can pinpoint exactly when things broke. I think it may have been after trying to sync one s3 bucket to another (I'm exploring replicated buckets). But even if I somehow corrupted the destination bucket by doing so, I can't understand why that would also corrupt the source bucket.

I had previously unsealed the original s3 bucket many times, and I think (will need to check my notes tomorrow) that I also unsealed the duplicate bucket at least a few times.

> You say that you can repeat this behavior with a fresh bucket. Do you
> do anything other than simply init and then try to unseal before
> seeing this?

Sorry, I guess I did not make myself clear. I have not tried re-running vault init. When I say a fresh bucket, what I did was make an empty bucket with the S3 web console, then run aws s3 sync to upload the contents that I had previously downloaded (back when this was working), also with aws s3 sync. I am assuming this is a viable option for backup and restore -- is this wrong? At that point it gets stuck when I try to unseal.

> Do you get this behavior with a different backend, like inmem or file?

Since this is about restoring the previous data, and not just starting from scratch, I don't think it would be feasible (or at least not easy) to test this. I don't know enough about the s3 format to try to migrate the s3 data manually to some other backend.

I am fairly confident that if I totally start over and re-run vault init, things will be fine. But that worries me as an answer. I assume in a real backup and restore situation I would not be reinitializing, just restoring the saved bucket's contents verbatim?

Jeff Mitchell

Jan 13, 2017, 9:20:12 AM
to vault...@googlegroups.com
Sure. And to be clear, I don't have any suggestions as to why S3 would have anything to do with this, because the progress of the unseal process takes place entirely in-memory. That is also a large part of why I don't have any good ideas at the moment: it's not backend-specific, and it's continually-tested code. (It's also why I'm wondering if you can replicate with a newly-inited S3 bucket, in case S3 does, in some way I can't see right now, actually have something to do with it.)

Can you, at each step along the way, run (the equivalent in your
environment of) 'curl http://127.0.0.1:8200/v1/sys/seal-status' ?
Before you start the unsealing process and after each key? I realize
it gives you status as you go, but it would be good to see e.g. before
the process is started.

Best,
Jeff

Rich Fromm

Jan 13, 2017, 3:58:01 PM
to Vault
On Friday, January 13, 2017 at 6:20:12 AM UTC-8, Jeff Mitchell wrote:

> Can you, at each step along the way, run (the equivalent in your
> environment of) 'curl http://127.0.0.1:8200/v1/sys/seal-status' ?
> Before you start the unsealing process and after each key? I realize
> it gives you status as you go, but it would be good to see e.g. before
> the process is started.

Previously, unsealing and status checking were done from a remote host.

This time it's all done on the local host. Same results.

To be clear, the five unseal attempts below pass the five unique keys in succession.

ubuntu@ip-172-19-129-10:~$ sudo stop vault
vault stop/waiting
ubuntu@ip-172-19-129-10:~$ sudo start vault
vault start/running, process 16350
ubuntu@ip-172-19-129-10:~$ echo $VAULT_ADDR
http://127.0.0.1:8200/v1/sys/seal-status
ubuntu@ip-172-19-129-10:~$ curl http://127.0.0.1:8200/v1/sys/seal-status
{"sealed":true,"t":3,"n":5,"progress":0,"version":"Vault v0.6.1"}
ubuntu@ip-172-19-129-10:~$ vault status

Sealed: true
Key Shares: 5
Key Threshold: 3
Unseal Progress: 0
Version: Vault v0.6.1

High-Availability Enabled: false
ubuntu@ip-172-19-129-10:~$ vault unseal

Key (will be hidden):
Sealed: true
Key Shares: 5
Key Threshold: 3
Unseal Progress: 1
ubuntu@ip-172-19-129-10:~$ curl http://127.0.0.1:8200/v1/sys/seal-status
{"sealed":true,"t":3,"n":5,"progress":1,"version":"Vault v0.6.1"}
ubuntu@ip-172-19-129-10:~$ vault unseal

Key (will be hidden):
Sealed: true
Key Shares: 5
Key Threshold: 3
Unseal Progress: 2
ubuntu@ip-172-19-129-10:~$ curl http://127.0.0.1:8200/v1/sys/seal-status
{"sealed":true,"t":3,"n":5,"progress":2,"version":"Vault v0.6.1"}
ubuntu@ip-172-19-129-10:~$ vault unseal

Key (will be hidden):
Sealed: true
Key Shares: 5
Key Threshold: 3
Unseal Progress: 1
ubuntu@ip-172-19-129-10:~$ curl http://127.0.0.1:8200/v1/sys/seal-status
{"sealed":true,"t":3,"n":5,"progress":1,"version":"Vault v0.6.1"}
ubuntu@ip-172-19-129-10:~$ vault unseal

Key (will be hidden):
Sealed: true
Key Shares: 5
Key Threshold: 3
Unseal Progress: 2
ubuntu@ip-172-19-129-10:~$ curl http://127.0.0.1:8200/v1/sys/seal-status
{"sealed":true,"t":3,"n":5,"progress":2,"version":"Vault v0.6.1"}
ubuntu@ip-172-19-129-10:~$ vault unseal

Key (will be hidden):
Sealed: true
Key Shares: 5
Key Threshold: 3
Unseal Progress: 1
ubuntu@ip-172-19-129-10:~$ curl http://127.0.0.1:8200/v1/sys/seal-status
{"sealed":true,"t":3,"n":5,"progress":1,"version":"Vault v0.6.1"}
ubuntu@ip-172-19-129-10:~$

I'm going to try some other things: possibly reinitializing, redeploying, upgrading, etc. I'll report back when those are done. My guess is that reinitializing will solve the problem and I won't be able to reproduce, but we'll see.

Rich Fromm

Jan 13, 2017, 4:18:32 PM
to Vault

I realize that the S3 backend is community supported, so perhaps this level of detail is outside the scope of what's relevant, but I'm going to try to itemize what I did when things went wrong.

I have an S3 bucket in us-west-2. I've been testing with that for a while now, and all has been fine. For the purpose of this discussion, let's call it "primary". Initially, this bucket does *not* have Versioning enabled.

As part of my recent work, I made a new S3 bucket in us-east-2. Let's call that "backup". It was created initially empty, with the same configuration as primary, except that Versioning was enabled. (I'm pretty sure of this last statement.)

Part of this involved running `aws s3 sync`, which I had not used before. Using it, I made a local copy of the S3 data; let's call it "local".

1) All works fine pointing at primary.
2) Sync primary to local. This is the initial creation of local.
3) Sync local to backup. backup had been previously empty, so the assumption now is that primary and backup have identical contents. (I could have presumably sync'd primary directly to backup, but I wanted a local copy for further testing.)
4) Restart vault pointing at backup. All works fine. I was able to unseal, auth, and read data. (Really this is an oversimplification. I initially had some problems where it wasn't working due to AWS IAM issues, but I resolved those and got it to work.)
5) Enable Versioning on primary
6) Sync backup to primary. (The reason for this is that enabling Versioning does nothing by itself; only future writes get versioned. My idea was that this would rewrite everything with the same contents, so that it would all now be versioned. I'm not positive that logic is sound.)
7) I would be extremely surprised if this matters, but for completeness: soon after step (6) I lost my connectivity to Vault (the instance is in AWS EC2) due to some fiddling with the network that was being done unbeknownst to me. Note that the connection loss was *not* in the middle of the sync operation, so I can't see how it could be relevant.
8) Restart vault pointing at primary. This is when things first broke.
9) Various debugging happened, not all of which is documented, but the bottom line is that when I restart vault pointing at the backup (which previously worked), it's also broken. Even if I somehow hosed the primary with the sync, I'm baffled as to why the backup is hosed too.
10) I also tried sync'ing local to primary, and local to a newly created bucket. All fail in the same way.

Jeff Mitchell

Jan 13, 2017, 8:13:52 PM
to vault...@googlegroups.com
Hi Rich,

> I realize that the S3 backend is community supported, so perhaps this level
> of detail is outside the scope of what's relevant, but I'm going to try to
> itemize what I did when things went wrong.

It's possible the flow you describe has something to do with...something...but I don't see how this could be S3-specific (although I also don't see what could be causing this generally). The main piece of code at issue is this line:

https://github.com/hashicorp/vault/blob/v0.6.1/vault/core.go#L752

As you can see there, the value being returned is the length of the slice of submitted unseal keys, which is in-memory. That's why I see neither what is going wrong nor why S3 would have anything to do with it. While the existing seal configuration with the threshold information is stored on disk, it doesn't seem to be relevant here.

That slice is only ever appended to, unless the unseal operation is canceled, in which case it is cleared. That can happen either because of an explicit cancelation, or because a third key was actually submitted and turned out not to be valid, which causes the process to reset. Why that would happen, I have no idea, but it's (in a literal sense) within the realm of possibility.
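
In other words, the bookkeeping amounts to something like this -- a minimal sketch, not Vault's actual code, assuming the github.com/hashicorp/vault/shamir package; the bytes.Equal check against a known master key is just a stand-in for actually decrypting the keyring:

package main

import (
	"bytes"
	"errors"
	"fmt"

	"github.com/hashicorp/vault/shamir"
)

// Toy stand-in for the core: unseal progress is nothing more than the
// length of an in-memory slice of submitted key shares.
type core struct {
	threshold int
	master    []byte   // stand-in; real Vault instead tries to decrypt the keyring
	parts     [][]byte // unseal key shares submitted so far
}

func (c *core) unseal(share []byte) (progress int, unsealed bool, err error) {
	for _, p := range c.parts {
		if bytes.Equal(p, share) {
			return len(c.parts), false, nil // duplicate share: no progress
		}
	}
	c.parts = append(c.parts, share)
	if len(c.parts) < c.threshold {
		return len(c.parts), false, nil // "cannot unseal, not enough keys"
	}
	combined, err := shamir.Combine(c.parts)
	c.parts = nil // cleared whether or not the combine works: the reset
	if err != nil || !bytes.Equal(combined, c.master) {
		return 0, false, errors.New("unseal failed, invalid key")
	}
	return 0, true, nil
}

func main() {
	master := []byte("0123456789abcdef0123456789abcdef")
	shares, _ := shamir.Split(master, 5, 3)
	c := &core{threshold: 3, master: master}
	for _, s := range shares[:3] {
		fmt.Println(c.unseal(s)) // 1 false <nil>; 2 false <nil>; 0 true <nil>
	}
}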

I've looped in some other people here to see if anyone has ideas. You could also try 0.6.4, in case this is related to something we happened to have fixed since 0.6.1, although I don't remember anything specific around that (if you do upgrade, make sure you look at the 0.6.2/0.6.3/0.6.4 upgrade notes).

I also realize you assume that if you init-ed a new bucket with new keys you wouldn't be able to reproduce -- but it would be good to try, because if you *can* reproduce in that scenario, then that's a very useful data point.

Best,
Jeff

Jeff Mitchell

Jan 16, 2017, 11:21:36 AM
to vault...@googlegroups.com
Hi Rich,

If I were to add something to try to help debug this, would you be okay running a build from master?

Best,
Jeff

Rich Fromm

Jan 16, 2017, 3:45:51 PM
to Vault
On Monday, January 16, 2017 at 8:21:36 AM UTC-8, Jeff Mitchell wrote:

> If I were to add something to try to help debug this, would you be okay running a build from master?

My first step is to try this with 0.6.4 (currently running 0.6.1) and maximal logging and see if that helps (I doubt it). I am however slightly blocked on trying that right now due to some totally unrelated networking issues; hopefully those will be fixed very soon.

I don't currently have a Go dev environment set up. I assume doing so won't be too hard? (My desktop is Ubuntu 14.04) I'll look into that, which is a prereq for me to build from master. Assuming I can get that working without too much trouble, sure I'll give it a try.

Are you going to add some debug or trace logging? I have to say I was a little surprised at how little additional logging there is at debug and trace beyond info. I was expecting trace to spew tons of logs, but it doesn't.

Jeff Mitchell

Jan 16, 2017, 4:14:36 PM
to vault...@googlegroups.com
On Mon, Jan 16, 2017 at 3:45 PM, Rich Fromm <rich....@gmail.com> wrote:
> On Monday, January 16, 2017 at 8:21:36 AM UTC-8, Jeff Mitchell wrote:
> My first step is to try this with 0.6.4 (currently running 0.6.1) and
> maximal logging and see if that helps (I doubt it). I am however slightly
> blocked on trying that right now due to some totally unrelated networking
> issues; hopefully those will be fixed very soon.
>
> I don't currently have a Go dev environment set up. I assume doing so won't
> be too hard? (My desktop is Ubuntu 14.04) I'll look into that, which is a
> prereq for me to build from master. Assuming I can get that working without
> too much trouble, sure I'll give it a try.

Sure. Worst comes to worst, I can make a build for you.

> Are you going to add some debug or trace logging?

No.

> I have to say I was a little surprised at how little additional
> logging there is at debug and trace beyond info. I was expecting trace
> to spew tons of logs, but it doesn't.

We try to ensure all errors are properly handled, so usually when we need to figure out the cause of something there is already a unique error message that gives the relevant details.

We're careful about too much logging because a) with too many logs, nobody will run a higher level anyway, since it'll just kill disk; and b) we don't want things to accidentally creep in that end up being security-sensitive (e.g. randomly logging certain objects in trace "just in case", then forgetting about that when we add more fields to those objects that contain sensitive values).

Best,
Jeff

Rich Fromm

Jan 16, 2017, 5:04:58 PM
to Vault
On Monday, January 16, 2017 at 1:14:36 PM UTC-8, Jeff Mitchell wrote:
> On Mon, Jan 16, 2017 at 3:45 PM, Rich Fromm <rich....@gmail.com> wrote:
>>
>> I don't currently have a Go dev environment set up. I assume doing so won't
>> be too hard? (My desktop is Ubuntu 14.04) I'll look into that, which is a
>> prereq for me to build from master. Assuming I can get that working without
>> too much trouble, sure I'll give it a try.

> Sure. Worst comes to worst, I can make a build for you.

No need, this gave me a good excuse to set a Go env up. I successfully built hello world, and built vault (and verified that it started up okay as a dev server). I also ran the test suite. I do note that one of the tests failed:

--- FAIL: TestConsulHABackend (29.36s)
    consul_test.go:538: exit status 1
FAIL
FAIL    github.com/hashicorp/vault/physical    44.676s

This is building from f1c8b772fdecdd20f483653d18619d41b25ef934 on master.

Jeff Mitchell

Jan 16, 2017, 6:33:46 PM
to vault...@googlegroups.com
Hi Rich,

Can you please build the "nonce-unseal" branch and then post the output from the unseal attempts?

Thanks,
Jeff
Rich Fromm

Jan 16, 2017, 8:51:05 PM
to Vault
On Monday, January 16, 2017 at 3:33:46 PM UTC-8, Jeff Mitchell wrote:

> Can you please build the "nonce-unseal" branch and then post the output from the unseal attempts?

One thing at a time, I have not yet tried deploying the build from the branch. (I will do that next.)

But I thought what I did find was worth reporting. My previous tests were with vault 0.6.1. I tried upgrading to 0.6.4. It is still broken, but the behavior is different.

The third unseal no longer semi-silently ping-pongs back from unseal progress 2 to 1. Instead, the unseal generates a 400, claiming the key is invalid, and goes back to 0.

Local attempt at unsealing:

ubuntu@ip-172-19-130-208:~$ export VAULT_ADDR=http://127.0.0.1:8200
ubuntu@ip-172-19-130-208:~$ curl $VAULT_ADDR/v1/sys/seal-status
{"sealed":true,"t":3,"n":5,"progress":0,"version":"0.6.4"}
ubuntu@ip-172-19-130-208:~$ vault status

Sealed: true
Key Shares: 5
Key Threshold: 3
Unseal Progress: 0
Version: 0.6.4

High-Availability Enabled: false
ubuntu@ip-172-19-130-208:~$ vault unseal

Key (will be hidden):
Sealed: true
Key Shares: 5
Key Threshold: 3
Unseal Progress: 1
ubuntu@ip-172-19-130-208:~$ curl $VAULT_ADDR/v1/sys/seal-status
{"sealed":true,"t":3,"n":5,"progress":1,"version":"0.6.4"}
ubuntu@ip-172-19-130-208:~$ vault status

Sealed: true
Key Shares: 5
Key Threshold: 3
Unseal Progress: 1
Version: 0.6.4

High-Availability Enabled: false
ubuntu@ip-172-19-130-208:~$ vault unseal

Key (will be hidden):
Sealed: true
Key Shares: 5
Key Threshold: 3
Unseal Progress: 2
ubuntu@ip-172-19-130-208:~$ curl $VAULT_ADDR/v1/sys/seal-status
{"sealed":true,"t":3,"n":5,"progress":2,"version":"0.6.4"}
ubuntu@ip-172-19-130-208:~$ vault status

Sealed: true
Key Shares: 5
Key Threshold: 3
Unseal Progress: 2
Version: 0.6.4

High-Availability Enabled: false
ubuntu@ip-172-19-130-208:~$ vault unseal
Key (will be hidden):
Error: Error making API request.

URL: PUT http://127.0.0.1:8200/v1/sys/unseal
Code: 400. Errors:

* Unseal failed, invalid key
ubuntu@ip-172-19-130-208:~$ curl $VAULT_ADDR/v1/sys/seal-status
{"sealed":true,"t":3,"n":5,"progress":0,"version":"0.6.4"}
ubuntu@ip-172-19-130-208:~$ vault status

Sealed: true
Key Shares: 5
Key Threshold: 3
Unseal Progress: 0
Version: 0.6.4

High-Availability Enabled: false
ubuntu@ip-172-19-130-208:~$

Corresponding log with trace level enabled:
(I understand the rationale for keeping the logs down, but no logging at all for the failure strikes me as a little odd.)

Tue Jan 17 01:34:00 2017 - Starting vault
Tue Jan 17 01:34:00 2017 - Vault started
Tue Jan 17 01:34:00 2017 - AWS region is us-west-2

==> Vault server configuration:

                 Backend: s3
                     Cgo: disabled

              Listener 1: tcp (addr: "0.0.0.0:8200", cluster address: "", tls: "disabled")
               Log Level: trace

                   Mlock: supported: true, enabled: false
                 Version: Vault v0.6.4
             Version Sha: f4adc7fa960ed8e828f94bc6785bcdbae8d1b263


==> Vault server started! Log data will stream in below:

2017/01/17 01:34:00.360592 [TRACE] physical/cache: creating LRU cache: size=32768
2017/01/17 01:34:00.360757 [TRACE] cluster listener addresses synthesized: cluster_addresses=[0.0.0.0:8201]
2017/01/17 01:40:38.158257 [DEBUG] core: cannot unseal, not enough keys: keys=1 threshold=3
2017/01/17 01:40:59.768372 [DEBUG] core: cannot unseal, not enough keys: keys=2 threshold=3

I'm pretty sure the problem is not just that I corrupted the 3rd key. I used the default settings when running vault init, so there are 5 keys total. I tried various permutations, and it's always the third one that generates the 400, regardless of which keys I use and in which order.
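
(A sketch of why the permutations shouldn't matter, assuming the github.com/hashicorp/vault/shamir package: Combine is order-independent, and combining a threshold of well-formed shares always "succeeds" mathematically -- wrong shares just yield a different secret -- so a bad key set would fail at the same point every time.)

package main

import (
	"bytes"
	"fmt"

	"github.com/hashicorp/vault/shamir"
)

func main() {
	secret := []byte("an arbitrary 32-byte test secret")
	shares, err := shamir.Split(secret, 5, 3)
	if err != nil {
		panic(err)
	}

	// Any 3 of the 5 shares, in any order, interpolate to the same secret.
	a, _ := shamir.Combine([][]byte{shares[0], shares[2], shares[4]})
	b, _ := shamir.Combine([][]byte{shares[4], shares[0], shares[2]})
	fmt.Println(bytes.Equal(a, secret), bytes.Equal(b, secret)) // true true

	// Shares from a *different* split also combine without error -- the
	// result is simply a different secret, which would only fail later,
	// when it can't decrypt the keyring.
	other, _ := shamir.Split([]byte("a different 32-byte test secret!"), 5, 3)
	c, err := shamir.Combine([][]byte{other[0], other[1], other[2]})
	fmt.Println(err, bytes.Equal(c, secret)) // <nil> false
}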

I will try the branch build tomorrow.

Jeff Mitchell

Jan 16, 2017, 9:44:47 PM
to vault...@googlegroups.com
Hi Rich,

On Mon, Jan 16, 2017 at 8:51 PM, Rich Fromm <rich....@gmail.com> wrote:
> On Monday, January 16, 2017 at 3:33:46 PM UTC-8, Jeff Mitchell wrote:
> One thing at a time, I have not yet tried deploying the build from the
> branch. (I will do that next.)
>
> But I thought what I did find was worth reporting. My previous tests were
> with vault 0.6.1. I tried upgrading to 0.6.4. It is still broken, but the
> behavior is different.
>
> The third unseal no longer semi-silently ping-pongs back from unseal
> progress 2 to 1. Instead, the unseal generates a 400, claiming the key is
> invalid, and goes back to 0.

The branch probably won't provide much more information then, because what it did was add a nonce to the unseal process so that we could see whether the ping-ponging you were seeing was actually *different* unseal operations. It seems, based on your new results, that it was -- which is exactly what I'd expect, because the process resets after a failed Shamir combination. I just looked again at the 0.6.1 code responsible for that -- and in fact a diff of 0.6.1 to 0.6.4 for that code shows nothing at all, because it's literally the same code in the function. The API part of the function has very minimal changes, and the CLI unseal command is exactly the same. So the behavior you see in 0.6.4 is what is expected, and what you should have seen in 0.6.1, because it hasn't changed in the interim. Something is very strange. What's the SHA256 of your 0.6.1 binary?
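
For the record, here is the idea of the branch as a simplified sketch (assuming github.com/hashicorp/go-uuid), not the actual diff:

package main

import (
	"fmt"

	uuid "github.com/hashicorp/go-uuid"
)

// A fresh nonce is generated when the first share of an attempt is
// submitted and cleared on reset, so seal-status can show whether two
// submissions belong to the same unseal operation.
type unsealState struct {
	nonce string
	parts [][]byte
}

func (s *unsealState) submit(share []byte) (nonce string, progress int, err error) {
	if len(s.parts) == 0 {
		if s.nonce, err = uuid.GenerateUUID(); err != nil {
			return "", 0, err
		}
	}
	s.parts = append(s.parts, share)
	return s.nonce, len(s.parts), nil
}

func (s *unsealState) reset() { s.nonce, s.parts = "", nil }

func main() {
	var s unsealState
	n1, _, _ := s.submit([]byte("share-1"))
	s.reset() // e.g. after a failed combine
	n2, _, _ := s.submit([]byte("share-1"))
	fmt.Println(n1 != n2) // true: the reset shows up as a nonce change
}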

> I'm pretty sure the problem is not just that I corrupted the 3rd key. I used
> the default settings when running vault init, so there are 5 keys total. I
> tried various permutations, and it's always the third one that generates the
> 400, regardless of which keys I use and in which order.

That would be the case if the unseal keys you have don't match the key
used to encrypt the keyring. There would be three potential causes of
this:

1) Your barrier got rekeyed and nobody updated your unseal keys -- or,
conversely, you rekeyed your barrier and then reverted the data to an
old version, but not to the old unseal keys. We're working on audit
logging of rekey events (right now it's not supported because we only
log authenticated requests/responses), but you could look at your past
server logs to see if a rekey happened. Not sure how far back you'd
have to go if you hadn't unsealed in a long time before this.

2) Some kind of data corruption on the keyring file...maybe the result
of S3 sync not actually working properly, or changing the encoding, or
some such thing. (We've seen this kind of thing in the past when
people did backups with tools that stored data into JSON without
base64'ing it on the way in/out, since some backends store as raw
bytes...backend specific though.)

3) Some kind of issue with the Shamir code itself

At this point I'm not sure which of those might be the culprit, but
the strange behavior of 0.6.1 for you makes me think something is
strange on your end...
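
For what it's worth, cause (1) would look like this at the barrier level. This is a sketch using Go's stdlib AES-GCM, which is the barrier's cipher; the keys, nonce, and keyring payload here are all made up:

package main

import (
	"crypto/aes"
	"crypto/cipher"
	"fmt"
)

func mustGCM(key []byte) cipher.AEAD {
	block, err := aes.NewCipher(key) // 32-byte key -> AES-256
	if err != nil {
		panic(err)
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		panic(err)
	}
	return gcm
}

func main() {
	rightKey := []byte("0123456789abcdef0123456789abcdef")
	wrongKey := []byte("abcdef0123456789abcdef0123456789")
	nonce := []byte("12-byte-nonc")       // fixed nonce, demo only
	keyring := []byte(`{"keys":["..."]}`) // stand-in keyring payload

	ct := mustGCM(rightKey).Seal(nil, nonce, keyring, nil)

	_, err := mustGCM(rightKey).Open(nil, nonce, ct, nil)
	fmt.Println(err) // <nil>: the right master key decrypts the keyring

	_, err = mustGCM(wrongKey).Open(nil, nonce, ct, nil)
	fmt.Println(err) // cipher: message authentication failed -> "invalid key"
}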

Best,
Jeff

Rich Fromm

Jan 17, 2017, 5:26:35 PM
to Vault
OMG, I'm so sorry, this is totally user error on my part.

The tl;dr version is that I was using the wrong set of unseal keys.

Given that, I assume the behavior I'm now seeing with 0.6.4, namely:

1) OK, Unseal Progress: 1
2) OK, Unseal Progress: 2
3) 400, Unseal Progress: 0

that this is expected?

And that for some reason the behavior I was seeing with 0.6.1 is wrong:

1) OK, Unseal Progress: 1
2) OK, Unseal Progress: 2
3) OK, Unseal Progress: 1

even if you can't explain what code diff caused the change.

As to how I ended up with the wrong set of keys: I had a post all prepared to send (in which I showed the progress from the nonce branch), and in it I was justifying that the keys couldn't be corrupted because they're just text stored in a file of notes, and I could compare them to other copies (which are in revision control, and have appeared in reviews) -- that's how I knew they were the same. And that's the point when I realized that what I had copied into my notes for my current work was from the wrong set.

The longer detail (which you might not care about, but in my defense): I had one set of keys that I used for initial work, and then I generated a new set of keys when I changed the backend from file to S3. I had been looking at my notes from the S3 conversion when doing my most recent work, but got tired of having to keep flipping back to a previous set of notes, so I just copied the vault init output into my current notes. At the moment I did the copy, though, I was looking at the wrong file of notes, containing the wrong set of keys. The reason I was able to make such a silly mistake so easily is that the notes are stored by Jira issue number, and the two sets of keys are stored in files corresponding to issues that differ by only one digit (3007 vs. 3077). So my bad. Sorry.

I don't know how to explain the 0.6.1 behavior, and I suspect it's not worth exploring much further if the 0.6.4 behavior is as expected. The SHA256 sums of my binaries are listed below:

# 0.6.1
ubuntu@ip-172-19-129-10:~$ vault version
Vault v0.6.1
ubuntu@ip-172-19-129-10:~$ sha256sum `which vault`
25efc88563c68600dff526d72c1518280f0d427762b62eeef76dba120ab3ca5d  /usr/bin/vault

# 0.6.4
ubuntu@ip-172-19-130-208:~$ vault version
Vault v0.6.4 ('f4adc7fa960ed8e828f94bc6785bcdbae8d1b263')
ubuntu@ip-172-19-130-208:~$ sha256sum `which vault`
65d695492dfde004c91b6c4ae938079d97d9c026ec3a2a5951720a70cd5086ba  /usr/bin/vault

ubuntu@ip-172-19-131-248:~$ vault version
Vault v0.6.4 ('2adc70378b07273ef14456a614849889f7bc4bad')
# 2adc70378b07273ef14456a614849889f7bc4bad from nonce-unseal branch
ubuntu@ip-172-19-131-248:~$ sha256sum `which vault`
a26c4ef9450e4ab344d92640a54407321c5f4107da8639e7ead1601eaf79e85c  /usr/bin/vault

I see that https://releases.hashicorp.com/vault/0.6.1/vault_0.6.1_SHA256SUMS only gives SHA256 values for the zip files, not for the binary within the zip file. However, I did try downloading https://releases.hashicorp.com/vault/0.6.1/vault_0.6.1_linux_amd64.zip, and its SHA256 matched what was listed on that page (4f248214e4e71da68a166de60cc0c1485b194f4a2197da641187b745c8d5b8be). I then unzipped it, and the SHA256 of the binary matched what I had deployed (25efc88563c68600dff526d72c1518280f0d427762b62eeef76dba120ab3ca5d).

So I think the bottom line is that unless you want to explore further what changed between 0.6.1 and 0.6.4, I'm good now.

Again, I'm so sorry for wasting your time.

At least I have a working Go environment now, though. :)  Maybe that will motivate me to consider actual code contributions in the future.

Jeff Mitchell

Jan 17, 2017, 5:41:22 PM
to vault...@googlegroups.com
Hi Rich,

No problem. To be honest, I'm quite glad to be able to put a notch in
the correct column under "unseal keys legitimately stop working" vs.
"user error". (For the record, there is nothing that has ever been
substantiated in the first column, and many notches in the second :-)
)

However, this report did push me to spend time doing something I've
been meaning to do for a long time, which was to convert all of the
unit tests to use multiple test parts instead of a single part (single
parts don't get split and recombined by Shamir). To be clear, we did
have many unit tests exercising Shamir on every run, but this now
means *every* test exercises it. Because defense in depth!
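
A minimal sketch of the shape of such a test, using the shamir package directly (illustrative only, not Vault's actual test helpers):

package shamir_test

import (
	"bytes"
	"testing"

	"github.com/hashicorp/vault/shamir"
)

// Splits the key into multiple parts and recombines them, so the
// Shamir path is exercised rather than bypassed.
func TestUnsealKeySplitRoundTrip(t *testing.T) {
	barrierKey := []byte("0123456789abcdef0123456789abcdef")

	parts, err := shamir.Split(barrierKey, 3, 3)
	if err != nil {
		t.Fatalf("split: %v", err)
	}
	recombined, err := shamir.Combine(parts)
	if err != nil {
		t.Fatalf("combine: %v", err)
	}
	if !bytes.Equal(recombined, barrierKey) {
		t.Fatal("recombined key does not match barrier key")
	}
}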

Best,
Jeff