Hi Rich,
On Mon, Jan 16, 2017 at 8:51 PM, Rich Fromm <rich....@gmail.com> wrote:
> On Monday, January 16, 2017 at 3:33:46 PM UTC-8, Jeff Mitchell wrote:
> One thing at a time, I have not yet tried deploying the build from the
> branch. (I will do that next.)
>
> But I thought what I did find was worth reporting. My previous tests were
> with vault 0.6.1. I tried upgrading to 0.6.4. It is still broken, but the
> behavior is different.
>
> The third unseal no longer semi-silently ping-pongs back from unseal
> progress 2 to 1. Instead, the unseal generates a 400, claiming the key is
> invalid, and goes back to 0.
The branch probably won't provide much more information then, because
what it did was add a nonce to the unseal process so that we could see
if the ping-ponging you were seeing was actually *different* unseal
operations. It seems, based on your new results, that they are --
which is exactly what I'd expect, because the process resets after a
failed Shamir combination. I just looked again at the 0.6.1 code
responsible for that -- and in fact a diff of 0.6.1 to 0.6.4 for that
code shows nothing at all, because it's literally the same code in the
function. The API part of the function has very minimal changes and
the CLI unseal command is exactly the same. So the behavior you see in
0.6.4 is what is expected, and what you should have seen in 0.6.1,
because it hasn't changed in the interim. Something is very strange.
What's the SHA256 of your 0.6.1 binary?
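If a checksum tool isn't handy on that box, a minimal sketch like the
following would compute the digest. The function name and the path in
the usage comment are placeholders, not anything Vault ships; compare
the result by hand against the published checksum for your platform.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so a large binary is never held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (hypothetical path; point it at wherever your vault binary lives):
#   print(sha256_of_file("/usr/local/bin/vault"))
```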
> I'm pretty sure the problem is not just that I corrupted the 3rd key. I used
> the default settings when running vault init, so there are 5 keys total. I
> tried various permutations, and it's always the third one that generates the
> 400, regardless of which keys I use and in which order.
That would be the case if the unseal keys you have don't match the key
used to encrypt the keyring. There would be three potential causes of
this:
1) Your barrier got rekeyed, and nobody updated your unseal keys -- or
conversely, you rekeyed your barrier, and then reverted the data back to
an old version without also going back to your old unseal keys. We're
working on audit logs of
rekey events (right now it's not supported because we only log
authenticated requests/responses) but you could look at your past
server logs to see if a rekey happened. Not sure how far back you'd
have to go if you haven't unsealed in a long time before this.
2) Some kind of data corruption on the keyring file...maybe the result
of S3 sync not actually working properly, or changing the encoding, or
some such thing. (We've seen this kind of thing in the past when
people did backups with tools that stored data into JSON without
base64'ing it on the way in/out, since some backends store as raw
bytes...backend specific though.)
3) Some kind of issue with the Shamir code itself
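To make cause 2 concrete, here is a rough sketch of how raw bytes can
get mangled when a backup tool puts them into JSON as text instead of
base64. The byte values are made up for illustration and are not a real
keyring; which tool or backend would do this in your setup is an
assumption you'd need to check.

```python
import base64
import json

# Hypothetical stand-in for raw backend data; deliberately not valid UTF-8.
raw = bytes([0x00, 0xFF, 0x80, 0x7F, 0xC3, 0x28])

# Lossy path: the bytes are forced into a JSON string as if they were text.
# Invalid sequences are replaced, so the original bytes cannot be recovered.
lossy_json = json.dumps({"value": raw.decode("utf-8", errors="replace")})
lossy_restored = json.loads(lossy_json)["value"].encode("utf-8")

# Safe path: base64 on the way in, base64 decode on the way out.
safe_json = json.dumps({"value": base64.b64encode(raw).decode("ascii")})
safe_restored = base64.b64decode(json.loads(safe_json)["value"])

print("lossy round trip intact:", lossy_restored == raw)
print("base64 round trip intact:", safe_restored == raw)
```

If a backup or sync step did something like the lossy path, the restored
keyring would no longer decrypt even with the correct unseal keys.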
At this point I'm not sure which of those might be the culprit, but
the strange behavior of 0.6.1 for you makes me think something is
strange on your end...
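For what it's worth, here is a toy sketch of why a mismatch only shows
up on the third key, whichever keys you use and in whatever order. It
is a textbook Shamir split over a prime field, not Vault's code, and
the names and numbers are made up. The point is that combining shares
never fails on its own; with a threshold of 3 nothing can be checked
until three shares are in, and only then can the reconstructed key be
tried against the keyring.

```python
import random

PRIME = 2**127 - 1  # field modulus for this toy example (a Mersenne prime)


def split(secret, shares=5, threshold=3):
    """Split `secret` into `shares` points on a random degree threshold-1 polynomial."""
    rng = random.SystemRandom()
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(threshold - 1)]

    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME

    return [(x, f(x)) for x in range(1, shares + 1)]


def combine(points):
    """Recover the polynomial's value at x=0 by Lagrange interpolation."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        # Modular inverse via Fermat's little theorem, since PRIME is prime.
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret


current_key = 123456789               # hypothetical key protecting the keyring now
stale_shares = split(987654321)       # shares that belong to some other key

# Any three stale shares combine cleanly, just to the wrong value.
recovered = combine([stale_shares[4], stale_shares[0], stale_shares[2]])
print("matches current key:", recovered == current_key)
```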
Best,
Jeff