Vault 0.5.0 and Amazon Aurora


Jerry Walling

Feb 25, 2016, 11:02:06 AM
to Vault
We have been working on standing up Vault on approximately 700 MySQL servers (prod/non-prod). These servers are a mix of Percona, Amazon RDS MySQL, and Amazon Aurora. With Percona and RDS, things seem to be going very well, but not so much with Aurora.

One issue we were having has to do with clusters. Aurora has a concept of one writer and many readers. Only one node, "the writer", can actually write to the database at a time, and what makes this even more difficult is that Amazon can and does move the writer around the cluster. The problem this presents has to do with the way connections are stored in Vault. Traditionally, we were storing each node in Vault. When we requested a connection, Vault would connect to the node, create the user, and return it with the corresponding password. With Aurora, you cannot create a user on a reader. It would return:

{
 "errors": [
   "Error 1290: The MySQL server is running with the --read-only option so it cannot execute this statement"
 ]
}

So, to work around this, we store the cluster endpoint in Vault for each participating node in the cluster. Amazon always points this endpoint at the writer. This seems to be OK for the most part. However, we are seeing other weirdness that we have not been able to work around.

The first issue is connections to the database. We are not sure what happened, but over time, Vault created 230+ connections to an Aurora server. All connections were live and in a sleep state. This is problematic in that it has the potential to exhaust all available connections to the database. When we attempted to DELETE the mount from Vault, we received the following error:

{
 "errors": [
   "failed to revoke 'mysql_servers/our_server_name.amazonaws.com/creds/dba/121ac32a-2d66-f671-ce98-5599749c77a7' (1 / 2): failed to revoke entry: Error 1269: Can't revoke all privileges for one or more of the requested users"
 ]
}

So far, we have been unsuccessful in dropping the mount. The permissions that have been set up for the Vault user are:

GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, RELOAD, PROCESS, REFERENCES, INDEX, ALTER, SHOW DATABASES, CREATE TEMPORARY TABLES, LOCK TABLES, EXECUTE, REPLICATION SLAVE, REPLICATION CLIENT, CREATE VIEW, SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, CREATE USER, EVENT, TRIGGER ON *.* TO ...WITH GRANT OPTION

These are the same permissions we use on the other MySQL instances (RDS, Percona). We even went as far as dropping the Vault user from the database, and when we attempt to DELETE the mount, we receive:

{
 "errors": [
   "failed to revoke 'mysql_servers/our_server_name..amazonaws.com/creds/dba/121ac32a-2d66-f671-ce98-5599749c77a7' (1 / 2): failed to revoke entry: Error 1045: Access denied for user 'vault_user'@'10.168.20.138' (using password: YES)"
 ]
}


Why does Vault need to check the database? Is it trying to clean up users it created? In our case, there were no Vault-created users in the database.

Two requests we have coming from our findings so far:

1. Allow unconditional deleting of mounts, or at least the ability to force drop the mount.
2. Provide an option as to whether or not Vault maintains persistent connections for a given mount.

One question we have:

1. How can we drop this mount from Vault?

We understand that HashiCorp did not have Aurora in mind when they designed Vault. We will continue to test the product with Aurora and share our findings to help solidify support on this front.

Regards,
Jerry

Michael Fischer

Feb 25, 2016, 11:17:36 AM
to vault...@googlegroups.com
On Thu, Feb 25, 2016 at 7:59 AM, Jerry Walling <jerry....@pearson.com> wrote:

> Two requests we have coming from our findings so far:
>
> 1. Allow unconditional deleting of mounts, or at least the ability to force drop the mount.
> 2. Provide an option as to whether or not Vault maintains persistent connections for a given mount.

These are great suggestions.  As for (2), persistent connections are useful, so I wouldn't want to eliminate them altogether, but it sounds like the connection pool size Vault is using might be unbounded or set too high. 

Can you file GitHub issues for these so we can discuss further there?

--Michael

Jerry Walling

Feb 25, 2016, 12:06:24 PM
to Vault
Issue #1132

Regards, 
Jerry

Jeff Mitchell

Feb 25, 2016, 12:21:56 PM
to vault...@googlegroups.com
Hi Jerry,

I'm assuming this is the follow-up from our earlier conversation :-)
Some comments:

On Thu, Feb 25, 2016 at 11:02 AM, Jerry Walling
<jerry....@pearson.com> wrote:
> So, to work around this, we store the cluster endpoint in Vault for each
> participating node in the cluster. Amazon always points this endpoint at the
> writer. This seems to be OK for the most part. We are seeing other weirdness
> however that we have not been able to work around. The first issue is
> connections to the database. We are not sure what happened, but over time,
> Vault had created 230+ connections to an Aurora server. All connections were
> live and in a sleep state. This is problematic in that it has the potential
> to exhaust all available connections to the database.

Vault uses normal connection pools from Go's SQL support. I can't say
whether or not they are unbounded by default, but it does seem that
way.
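
For reference, Go's database/sql documentation gives the default for SetMaxOpenConns as 0, which means no limit on open connections. One way to watch for the kind of pile-up described above is to tally idle sessions per account from the output of SHOW PROCESSLIST, where idle sessions report a Command of Sleep. The sketch below assumes the rows have already been fetched as dictionaries keyed by the PROCESSLIST column names; the function names, the sample rows, and the limit are illustrative and not taken from this thread.

```python
from collections import Counter


def sleeping_by_user(rows):
    """Count idle ('Sleep') connections per user.

    `rows` is assumed to be an iterable of dicts keyed by the
    SHOW PROCESSLIST column names ('User', 'Command', ...).
    """
    return Counter(r["User"] for r in rows if r["Command"] == "Sleep")


def over_limit(rows, limit):
    """Return the users holding more than `limit` idle connections."""
    return {user: n for user, n in sleeping_by_user(rows).items() if n > limit}


# Hypothetical sample data for illustration only.
sample = [
    {"Id": 1, "User": "vault_user", "Command": "Sleep"},
    {"Id": 2, "User": "vault_user", "Command": "Sleep"},
    {"Id": 3, "User": "vault_user", "Command": "Query"},
    {"Id": 4, "User": "app", "Command": "Sleep"},
]
print(over_limit(sample, limit=1))
```

Running a check like this on a schedule against the cluster endpoint would show whether the idle count keeps growing or levels off at the configured pool size.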

> Why does Vault need to check the database? Is it trying to clean up users it
> created? In our case, there were no Vault-created users in the database.
>
> Two requests we have coming from our findings so far:
>
> 1. Allow unconditional deleting of mounts, or at least the ability to force
> drop the mount.

Good idea. Please file an issue for this.

> 2. Provide an option as to whether or not Vault maintains persistent
> connections for a given mount.

Please file an issue for this. Right now you can control the number of
open and idle connections in the MySQL backend but there is no way to
say that you want no connection pooling at all.
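
As a rough illustration of capping the pool, the sketch below builds a request body for the backend's connection configuration. The parameter names max_open_connections and max_idle_connections are my understanding of the MySQL backend's settings and should be treated as assumptions to verify against the documentation for your Vault version; the DSN, the limits, and the helper name are made up.

```python
import json


def connection_config(connection_url, max_open, max_idle):
    """Build a config body that bounds the connection pool.

    The key names are assumptions about the MySQL backend's
    connection settings; check them against your Vault version.
    """
    if max_open < 1:
        raise ValueError("max_open must be at least 1 to bound the pool")
    if not 0 <= max_idle <= max_open:
        raise ValueError("max_idle must be between 0 and max_open")
    return {
        "connection_url": connection_url,
        "max_open_connections": max_open,
        "max_idle_connections": max_idle,
    }


# Hypothetical DSN pointing at the cluster (writer) endpoint.
body = connection_config(
    "vault_user:PASSWORD@tcp(cluster-endpoint.example:3306)/",
    max_open=5,
    max_idle=2,
)
print(json.dumps(body, indent=2))
```

Keeping the idle limit at or below the open limit means the pool can never hold more sleeping connections than it is allowed to open in total.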

> One question we have:
>
> 1. How can we drop this mount from Vault?

Right now you can't, because for security reasons Vault won't let
mounts be dropped that have pending revocations. By dropping the Vault
user from the database you have ensured that this can never be
resolved. So the first step would be to restore the user in the
database, and then to work through the other issues. If you have a
Vault cluster, you could try moving the active node around (by sealing
the current active node); when the new active node takes over and
handles a request it will resolve the address of the MySQL server at
that time, ideally leading it to the current leader node in Aurora.

Thanks,
Jeff

Jerry Walling

Feb 25, 2016, 3:02:14 PM
to Vault
Jeff,

What is the vault_user trying to execute when we attempt to DELETE the MySQL mount? So far, we have recreated the vault_user in the "mounted" database with the following permissions:

SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, RELOAD, PROCESS, REFERENCES, INDEX, ALTER, SHOW DATABASES, CREATE TEMPORARY TABLES, LOCK TABLES, EXECUTE, REPLICATION SLAVE, REPLICATION CLIENT, CREATE VIEW, SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, CREATE USER, EVENT, TRIGGER ...WITH GRANT OPTION

Note: these are the maximum permissions allowed for a user in Aurora (and MySQL RDS). 
When we attempt to DELETE the mount, we still receive the following error:

{
 "errors": [
   "failed to revoke 'mysql_servers/our_server.amazonaws.com/creds/dba/121ac32a-2d66-f671-ce98-5599749c77a7' (1 / 2): failed to revoke entry: Error 1269: Can't revoke all privileges for one or more of the requested users"
 ]
}

Thanks!
-j

Jeff Mitchell

Feb 25, 2016, 3:06:59 PM
to vault...@googlegroups.com
Hi Jerry,

When you attempt to delete a mount in Vault, Vault revokes all leases
associated with the mount. It basically does a listing of all leases
issued for that mount, revokes them in a loop, and on success,
unmounts. So it's running the same SQL used during a manual revocation
(or during expiration).

This is a security feature; if it didn't do that, unmounting a mount
would cause all of the dynamic secrets to never be revoked,
potentially allowing those credentials to be usable forever.
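
The behavior described above can be modeled in a few lines. This is a simplified sketch of the described flow, not Vault's actual code: the function and exception names are hypothetical, leases and mounts are plain sets, and the `revoke` callable stands in for whatever SQL the backend runs. The "(i / n)" in the message only imitates the shape of the errors quoted earlier in the thread.

```python
class RevocationError(Exception):
    """Raised when a lease cannot be revoked; the mount is left in place."""


def unmount(mount, leases, mounts, revoke):
    """Revoke every lease under `mount`, then remove the mount.

    leases: set of lease IDs; mounts: set of mount paths;
    revoke: callable that raises on failure.
    """
    pending = sorted(l for l in leases if l.startswith(mount))
    for i, lease in enumerate(pending, start=1):
        try:
            revoke(lease)
        except Exception as exc:
            # Abort on the first failure so the mount is never dropped
            # while credentials it issued may still be valid.
            raise RevocationError(
                f"failed to revoke '{lease}' ({i} / {len(pending)}): {exc}"
            ) from exc
        leases.discard(lease)
    mounts.discard(mount)


def always_fails(lease):
    """Stand-in for a revocation the database rejects."""
    raise RuntimeError("Error 1269")


mounts = {"mysql/"}
leases = {"mysql/creds/dba/a", "mysql/creds/dba/b"}
try:
    unmount("mysql/", leases, mounts, always_fails)
except RevocationError as err:
    print(err)
print(mounts)  # the mount is still present after the failed revocation
```

In this model a single lease that can never be revoked blocks the unmount indefinitely, which matches the situation reported in this thread.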

--Jeff

Jeff Mitchell

Feb 25, 2016, 3:25:02 PM
to vault...@googlegroups.com
Note: I've filed https://github.com/hashicorp/vault/issues/1135
against force-unmounting.

--Jeff

Jerry Walling

Feb 25, 2016, 3:59:08 PM
to Vault
Jeff,
So this is what we have been able to figure out so far.

I believe this is the code that Vault runs when we attempt to drop the mount:

https://github.com/hashicorp/vault/blob/47309289ae8f53ff97be17a4ab21b4a5afa317ef/builtin/logical/mysql/secret_creds.go#L73

MySQL, Aurora, RDS, etc. perform the following checks, in order, when a REVOKE statement is executed:

1. Determine whether the user attempting the REVOKE has the privileges to perform the revoke being requested, e.g. REVOKE SELECT ON *.* FROM BLAH_USER.
   A. If so, determine whether the user being revoked from has that privilege, e.g. whether BLAH_USER has SELECT. If not, throw the following error:
       Code: 1141 SQL State: 42000 --- There is no such grant defined for user 'blah_user' on host '%'
   B. If BLAH_USER has SELECT, but the revoking user does not have the privileges to perform the REVOKE, throw the following error:
       Code: 1045 SQL State: 28000 --- Access denied for user 'blah_user'@'%' (using password: YES)
   C. If the revoking user has the required privileges and the user being revoked from has the privilege in question, MySQL then checks whether the user being revoked from exists.
       aa. If the user exists, the revoke is performed (no error).
       bb. If the user does not exist, throw the following error:
             Code: 1141 SQL State: 42000 --- There is no such grant defined for user 'blah_user' on host '%'
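
The ordering in the list above can be written down as a small decision function. This only encodes the cases as reported in this message and has not been checked against MySQL's documented behavior; the function name and boolean parameters are hypothetical, and the combination the list does not cover (revoker lacks the privilege and the target does not hold it) is assumed to behave like case B.

```python
def revoke_outcome(revoker_may_revoke, target_has_privilege, target_exists):
    """Return the reported error code for a single-privilege REVOKE,
    or None when the revoke succeeds."""
    if revoker_may_revoke and not target_has_privilege:
        return 1141  # case A: no such grant defined
    if not revoker_may_revoke:
        # Case B. The list only describes this when the target holds
        # the privilege; the other combination is assumed to match.
        return 1045
    if not target_exists:
        return 1141  # case C, bb: user does not exist
    return None  # case C, aa: revoke performed


for case in [(True, False, True), (False, True, True),
             (True, True, False), (True, True, True)]:
    print(case, revoke_outcome(*case))
```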

There is a special case, however, when you attempt to execute:
REVOKE ALL PRIVILEGES, GRANT OPTION FROM user
In this case, we receive the error:
Code: 1269 SQL State: HY000 --- Can't revoke all privileges for one or more of the requested users

This statement will never work in Aurora because Aurora does not allow you to create a user with ALL PRIVILEGES, so ALL PRIVILEGES cannot be revoked.

This kind of falls back on a conversation you and I had a couple of months back. If HashiCorp makes the revoke syntax configurable (as the CREATE USER/GRANT syntax already is), we can avoid the problem of not being able to DELETE the mount.
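
To make the idea concrete, here is a minimal sketch of what a configurable revocation statement could look like, assuming a "{{name}}" placeholder like the one used in the role's CREATE USER/GRANT SQL. The helper name, both templates, and the username check are hypothetical and not an existing Vault setting. The Aurora variant relies on DROP USER removing the account along with its privileges, so no REVOKE ALL PRIVILEGES is needed.

```python
import re

# Conservative whitelist for illustration; real usernames may differ.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def render_revocation(template, username):
    """Expand a ';'-separated SQL template into a list of statements."""
    if not _SAFE_NAME.fullmatch(username):
        raise ValueError(f"refusing to render unsafe username: {username!r}")
    statements = (s.strip() for s in template.split(";"))
    return [s.replace("{{name}}", username) for s in statements if s]


# Hypothetical templates: a stock-MySQL style and an Aurora-friendly one.
mysql_template = (
    "REVOKE ALL PRIVILEGES, GRANT OPTION FROM '{{name}}'@'%';"
    " DROP USER '{{name}}'@'%'"
)
aurora_template = "DROP USER '{{name}}'@'%'"

for stmt in render_revocation(aurora_template, "dba-121ac32a"):
    print(stmt)
```

With something like this, each mount or role could carry the revocation SQL that its database flavor accepts, instead of sharing one hard-coded statement.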

I hope all this babble makes some sense ;)

Regards,
Jerry

Jeff Mitchell

Feb 25, 2016, 4:25:33 PM
to vault...@googlegroups.com
On Thu, Feb 25, 2016 at 3:59 PM, Jerry Walling
<jerry....@pearson.com> wrote:
> There is a special case, however, when you attempt to execute:
>
> REVOKE ALL PRIVILEGES, GRANT OPTION FROM user
>
> In this case, we receive the error:
>
> Code: 1269 SQL State: HY000 --- Can't revoke all privileges for one or more
> of the requested users
>
>
> This statement will never work in Aurora because Aurora does not allow you
> to create a user with ALL PRIVILEGES, so ALL PRIVILEGES cannot be revoked.

Doh, haha.

> This kind of falls back on a conversation you and I had a couple of months
> back. If HashiCorp makes the revoke syntax configurable (as the CREATE
> USER/GRANT syntax already is), we can avoid the problem of not being able to
> DELETE the mount.

Yep -- it's going to be.

--Jeff