Jira (PDB-1883) Allow Blacklisting of Resources With Certain Tags That Should Not Be Stored In PuppetDB

Nick Walker (JIRA)

Aug 7, 2015, 1:26:06 PM

to puppe...@googlegroups.com

Nick Walker created an issue

PuppetDB / PDB-1883

Allow Blacklisting of Resources With Certain Tags That Should Not Be Stored In PuppetDB

Issue Type:	New Feature
Assignee:	Unassigned
Created:	2015/08/07 10:25 AM
Labels:	tcse
Priority:	Normal
Reporter:	Nick Walker

Sometimes I manage files (or other resources) that contain passwords, and I'd like a way to keep those resources out of PuppetDB to avoid accidental exposure.

The way I imagine using this is that I'd put a specific tag on any resource I didn't want stored, say a "DO NOT STORE" tag. Then, on the PDB side, there would be a configurable list of tags that should not be stored, and any resources carrying those tags would be removed when the catalog is parsed, before it is written to the database.
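A minimal Python sketch of the filtering step described above, assuming a simplified catalog shape (a dict with a `resources` list whose entries carry `type`, `title`, `tags` and `parameters`, plus an `edges` list). It is not PuppetDB's actual code; the tag name `do_not_store` and the helper `filter_catalog` are hypothetical.

```python
from typing import Any

# Hypothetical blacklist; in the proposal this list would be configurable on the PDB side.
BLACKLISTED_TAGS = {"do_not_store"}


def filter_catalog(catalog: dict[str, Any], blacklist: set[str]) -> dict[str, Any]:
    """Return a new catalog without resources carrying any blacklisted tag.

    Only the resource list is filtered; edges are left untouched, so any
    edge that pointed at a removed resource now dangles.
    """
    kept = [
        res for res in catalog["resources"]
        if not blacklist.intersection(res.get("tags", []))
    ]
    return {**catalog, "resources": kept}


# Example catalog in the simplified, hypothetical shape.
catalog = {
    "resources": [
        {"type": "File", "title": "/etc/app/db.conf",
         "tags": ["file", "do_not_store"], "parameters": {"content": "password=s3cret"}},
        {"type": "Service", "title": "app", "tags": ["service"],
         "parameters": {"ensure": "running"}},
    ],
    "edges": [
        {"source": {"type": "File", "title": "/etc/app/db.conf"},
         "target": {"type": "Service", "title": "app"}, "relationship": "notifies"},
    ],
}

stored = filter_catalog(catalog, BLACKLISTED_TAGS)
print([r["title"] for r in stored["resources"]])  # ['app']
```

Because only the resource list is filtered, the edge from the file to the service is left pointing at a resource that no longer exists in the stored catalog.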

This message was sent by Atlassian JIRA (v6.4.5#64020-sha1:78acd6c)

Nick Walker (JIRA)

Aug 7, 2015, 1:39:06 PM

to puppe...@googlegroups.com

Nick Walker updated an issue

PuppetDB / PDB-1883

Allow Blacklisting of Resources With Certain Tags That Should Not Be Stored In PuppetDB

Change By:	Nick Walker

Sometimes I manage files (or other resources) that contain passwords, and I'd like a way to keep those resources out of PuppetDB to avoid accidental exposure.

The way I imagine using this is that I'd put a specific tag (or a new meta-attribute) on any resource I didn't want stored, say a "DO NOT STORE" tag. Then, either before the catalog is sent to PDB or on the PDB side, there would be a configurable list of tags that should not be stored, and any resources carrying those tags would be removed when the catalog is parsed, before it is written to the database.

Kenneth Barber (JIRA)

Aug 7, 2015, 1:49:08 PM

to puppe...@googlegroups.com

Kenneth Barber commented on

PDB-1883

Re: Allow Blacklisting of Resources With Certain Tags That Should Not Be Stored In PuppetDB

Can you provide some non-contrived examples? Extra points for manifest code and desired outcome, perhaps how you see it displayed later in the report/catalog (or how it's not, if we're that brutal).

I'm also curious about filtering on fields versus entire resources. It seems to me we could preserve some data and just hide certain private fields; at least that feels like the right granularity to aim for with such a solution. Otherwise you might lose graph connectivity, which would be a shame.

Also, what about hiding the data but letting the consumer know it was hidden? That feels like a useful distinction for whoever consumes the data later: there would be a field/resource placeholder saying that this particular piece of data was sensitive and is therefore not shown.
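One way to picture field-level hiding with a visible placeholder, sketched in Python on a simplified catalog dict. The `SENSITIVE_PARAMS` set of (type, parameter) pairs and the `REDACTED` marker are hypothetical; matching values are replaced rather than deleted, so the resource and its edges survive and a consumer can tell that something was withheld.

```python
import copy

# Hypothetical configuration: (resource type, parameter name) pairs to hide.
SENSITIVE_PARAMS = {("User", "password")}

# Hypothetical placeholder telling consumers that a value was withheld.
REDACTED = "<<redacted: sensitive>>"


def redact_params(catalog: dict, sensitive: set) -> dict:
    """Return a deep copy of `catalog` with sensitive parameter values replaced."""
    out = copy.deepcopy(catalog)
    for res in out["resources"]:
        params = res.get("parameters", {})
        for name in params:
            if (res["type"], name) in sensitive:
                # Overwriting an existing key does not change the dict's size,
                # so doing it while iterating is safe.
                params[name] = REDACTED
    return out


catalog = {
    "resources": [
        {"type": "User", "title": "deploy",
         "parameters": {"ensure": "present", "password": "not-a-real-hash"}},
    ],
    "edges": [],
}

print(redact_params(catalog, SENSITIVE_PARAMS)["resources"][0]["parameters"])
# {'ensure': 'present', 'password': '<<redacted: sensitive>>'}
```

A plain string marker is the simplest choice for a sketch; a real design would need a placeholder that consumers cannot confuse with a genuine value.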

It's a shame I can't see a good generic way of doing this implicitly: some types have fields that contain either sensitive or insensitive data depending on the context. In custom, pure Ruby resource types this is easier, though (for example, PostgreSQL passwords, if they are managed by a pure Ruby resource).

This problem resembles ones we've had with anonymization, where we try to forensically "sniff out" sensitive data and, when we can't tell, fall back to assuming it's all sensitive.

Kenneth Barber (JIRA)

Aug 7, 2015, 1:52:05 PM

to puppe...@googlegroups.com

Kenneth Barber commented on

PDB-1883

Re: Allow Blacklisting of Resources With Certain Tags That Should Not Be Stored In PuppetDB

Depending on the solution, the PuppetDB terminus might not be the right place to do all this work. We are just consumers of data from Puppet, so I would ask whether it's wiser to place any implementation in Puppet instead, since the catalog and report interfaces are general and we are just one consumer of them. Without knowing the end-to-end solution, however, I'm not sure.

Nick Walker (JIRA)

Aug 10, 2015, 7:51:04 PM

to puppe...@googlegroups.com

Nick Walker commented on

PDB-1883

Re: Allow Blacklisting of Resources With Certain Tags That Should Not Be Stored In PuppetDB

Kenneth Barber, this ticket is effectively asking for a shorter-term solution than PE-1387. Instead of full-on anonymization, I'm looking for a quicker workaround to make sure my passwords aren't stored in clear text.

The basic use case: I use hiera-eyaml to store my password, then I put the result of the hiera-eyaml lookup into a resource attribute, and now it shows up in PuppetDB for anyone to see via the resources endpoint.

Maybe we'd want some sort of meta-parameter on each resource that tells Puppet which of that resource's attributes to obfuscate before storing it in PuppetDB? That seems to be what you were alluding to in your first comment.
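A sketch of the per-resource variant, assuming a hypothetical meta-parameter named `sensitive_params` (Puppet has no such meta-parameter) whose value lists the attributes to mask. Because the resource declares its own sensitive attributes, no separate central list has to be kept in sync.

```python
import copy

# Hypothetical meta-parameter name; not part of Puppet.
META_PARAM = "sensitive_params"
MASK = "********"


def obfuscate(resource: dict) -> dict:
    """Mask the attributes that a resource itself lists under META_PARAM."""
    res = copy.deepcopy(resource)
    params = res.get("parameters", {})
    targets = params.get(META_PARAM, [])
    if isinstance(targets, str):  # accept a single name or a list of names
        targets = [targets]
    for name in targets:
        if name in params and name != META_PARAM:
            params[name] = MASK
    return res


resource = {
    "type": "Exec", "title": "set-db-password",
    "parameters": {"command": "/usr/local/bin/setpw s3cret",
                   "sensitive_params": "command"},
}
print(obfuscate(resource)["parameters"]["command"])  # ********
```

The meta-parameter itself is left in place, so a consumer can still see which attributes were masked.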

> I'm also curious about filtering on fields versus entire resources, seems to me we could preserve some data, just hide certain private fields - at least that feels like a good granularity you would want to aim for with such a solution. Otherwise you might lose the graph connectivity, which is a shame.

Yea, upon further thought, I think maintaining graph connectivity is a requirement for any solution we'd imagine.

Kenneth Barber (JIRA)

Aug 12, 2015, 5:53:09 PM

to puppe...@googlegroups.com

Kenneth Barber commented on

PDB-1883

Re: Allow Blacklisting of Resources With Certain Tags That Should Not Be Stored In PuppetDB

> Kenneth Barber This ticket is effectively asking for a shorter term solution than PDB-1387. Instead of full on anonymization, I'm looking for a quicker workaround to make sure my passwords aren't stored in clear text.

I know that you don't want anonymization here. That's not my point; it's just that the problems are related, i.e. it would be great to know field sensitivity at the schema level for both problems, instead of trying to match it much later. There is a proper solution sitting here somewhere, and I'd sleep better if we knew what it was, even if the short-term fix is something dodgy. Creating a list of resources and parameters that must be filtered is certainly one solution, though.

> Yea, upon further thought, I think maintaining graph connectivity is a requirement for any solution we'd imagine.

So yeah, I believe you probably don't want to filter resources per se, but rather the parameters of resources.

So let's talk about that possibility first:

So... hiera-eyaml knows which fields are encrypted. Can you see a way to let Puppet know this information as well? I'm guessing not, because Hiera is just a big bag of variables accessed through a function, and the value then gets passed around... unless, perhaps, people explicitly follow specific Hiera/class creation patterns.

Still, it could be that in the class, resource, or Ruby type and provider we are able to mark fields as sensitive through the DSL (and through the type's .rb file for the Ruby type part). This would perhaps be more of a mid- to long-term solution.
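To make the schema-level idea concrete, here is a Python analogy rather than Puppet DSL: the hypothetical `PostgresRole` type declares its `password` field as sensitive through `dataclasses.field(metadata=...)`, and a generic serializer (the hypothetical `to_storable`) consults that declaration instead of a separate exclusion list.

```python
from dataclasses import dataclass, field, fields
from typing import Any

SENSITIVE_PLACEHOLDER = "<sensitive>"  # hypothetical marker


@dataclass
class PostgresRole:
    """Hypothetical resource type; sensitivity is declared next to the field."""
    name: str
    login: bool = True
    # repr=False also keeps the value out of the dataclass's default repr.
    password: str = field(default="", repr=False, metadata={"sensitive": True})


def to_storable(obj: Any) -> dict[str, Any]:
    """Serialize a dataclass instance, masking fields declared as sensitive."""
    out = {}
    for f in fields(obj):
        value = getattr(obj, f.name)
        out[f.name] = SENSITIVE_PLACEHOLDER if f.metadata.get("sensitive") else value
    return out


print(to_storable(PostgresRole("app", password="s3cret")))
# {'name': 'app', 'login': True, 'password': '<sensitive>'}
```

Adding a new sensitive field then only requires marking it where it is defined, which is the property that avoids keeping a second list in sync.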

That would certainly save us from building up a completely separate list of resource/parameter combinations to exclude (or whatever) somewhere else.

Nick Walker, to explain why I'm going on about this: one of my concerns is that the whole mechanism for deciding what to encode becomes an action-at-a-distance anti-pattern. It becomes very easy for someone to encrypt something new in hiera-eyaml but forget to add it to whatever static list we create for this purpose, thus accidentally exposing themselves.

Another approach is to do what anonymization does to some degree: rather than just a 'list' of items, there is support for general regexp matching to catch things that fall through. Check out how this works in that code, for example:

https://github.com/puppetlabs/puppetdb/blob/master/src/puppetlabs/puppetdb/cli/anonymize.clj#L69-L70

So here we just presume that anything matching 'password-like' words gets anonymized by default (that is, anything matching /password/, /pwd/, /secret/, /key/ or /private/); anything beyond that would need manual specification. This is probably a reasonable bet. Having said that, it might wipe too much, which is not great, so we'd have to leave that kind of thing configurable.
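A Python sketch of that name-based default, assuming the five patterns listed above plus a hypothetical `extra` argument for site-specific additions. Case-insensitive matching is a choice made for this sketch, not necessarily what the anonymizer does.

```python
import re

# Default 'password-like' patterns, as listed above.
DEFAULT_PATTERNS = ["password", "pwd", "secret", "key", "private"]


def build_matcher(extra: tuple[str, ...] = ()) -> re.Pattern:
    """Combine the defaults with any site-specific patterns into one regex."""
    return re.compile("|".join([*DEFAULT_PATTERNS, *extra]), re.IGNORECASE)


def looks_sensitive(param_name: str, matcher: re.Pattern) -> bool:
    """True if any pattern occurs anywhere in the parameter name."""
    return matcher.search(param_name) is not None


matcher = build_matcher(extra=("token",))
for name in ["db_password", "api_token", "ensure", "keyboard_layout"]:
    print(name, looks_sensitive(name, matcher))
# db_password True
# api_token True
# ensure False
# keyboard_layout True
```

The last line is a false positive (`keyboard_layout` contains "key"), which illustrates how a default like /key/ can wipe too much and why the pattern list needs to stay configurable.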

The other problem here is that some passwords do need to be exported and collected. How would you see us handling that case? We already know which resources are 'exported'; do we need different policies for exported versus non-exported resources?

Going back to your 'tag' filtering idea ...

This becomes a filter on resources, then, and perhaps we add a way to pass in a list of tags. Now, I'm going to say straight up that removing a resource like this will break the catalog graph. I don't think we can just reconnect these edges to something else to fake it. If we just remove the resource, we'll have to do something about that loss of connectivity; otherwise it will break tools like catalog view, catalog diff, and basically all the magical stuff we are doing with catalogs in PE.
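A small Python check for the connectivity loss described above, assuming edges reference resources by `type` and `title` as in a simplified catalog dict; `dangling_edges` is a hypothetical helper, not an existing PuppetDB function.

```python
def dangling_edges(catalog: dict) -> list[dict]:
    """Return edges whose source or target resource is missing from the catalog."""
    present = {(r["type"], r["title"]) for r in catalog["resources"]}

    def ref(end: dict) -> tuple[str, str]:
        return (end["type"], end["title"])

    return [
        e for e in catalog["edges"]
        if ref(e["source"]) not in present or ref(e["target"]) not in present
    ]


# After removing a tagged File resource, only the Service remains.
filtered = {
    "resources": [{"type": "Service", "title": "app"}],
    "edges": [
        {"source": {"type": "File", "title": "/etc/app/db.conf"},
         "target": {"type": "Service", "title": "app"}, "relationship": "notifies"},
    ],
}
print(len(dangling_edges(filtered)))  # 1
```

Any non-empty result means the stored graph no longer matches what the agent applied, which is the problem for downstream catalog tooling.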

Another idea is to retain the resource and just clear all its parameters, or something like that. So yeah, we could wipe the parameters in that case instead, or wipe only the values while retaining the 'keys'. This all sounds a bit icky, right? Or is it just me?
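The "keep the resource, wipe the parameters" alternative can be sketched the same way. Under an assumed simplified catalog dict, the hypothetical `scrub_tagged` keeps every resource carrying a blacklisted tag (so edges still resolve) and either keeps its parameter keys with values set to `None` or drops its parameters entirely.

```python
import copy


def scrub_tagged(catalog: dict, blacklist: set, keep_keys: bool = True) -> dict:
    """Keep blacklisted-tag resources but wipe their parameter values (or all parameters)."""
    out = copy.deepcopy(catalog)
    for res in out["resources"]:
        if blacklist.intersection(res.get("tags", [])):
            params = res.get("parameters", {})
            res["parameters"] = {k: None for k in params} if keep_keys else {}
    return out


catalog = {
    "resources": [
        {"type": "File", "title": "/etc/app/db.conf", "tags": ["do_not_store"],
         "parameters": {"ensure": "file", "content": "password=s3cret"}},
    ],
    "edges": [],
}
print(scrub_tagged(catalog, {"do_not_store"})["resources"][0]["parameters"])
# {'ensure': None, 'content': None}
print(scrub_tagged(catalog, {"do_not_store"}, keep_keys=False)["resources"][0]["parameters"])
# {}
```

Note that harmless values such as `ensure` are wiped along with the secret, which is part of what makes this approach feel blunt.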

Henrik Lindberg, a penny for your thoughts? I'm curious whether this "type" of solution would double as a solution for any other security/sensitivity problems you've heard of. That is, has this kind of thing come up before, as far as you know?

Henrik Lindberg (JIRA)

Aug 31, 2015, 8:55:04 AM

to puppe...@googlegroups.com

Henrik Lindberg commented on

PDB-1883

Re: Allow Blacklisting of Resources With Certain Tags That Should Not Be Stored In PuppetDB

Kenneth Barber, as I see it, it is difficult to do anything when all attributes are untyped (as they are now) and are typically encoded as strings. This leads to heuristics (based on attribute names or similar, e.g. 'password'). Once we have typed attributes, I imagine we would add an Encrypted type to the type system, with type parameters that further detail the properties of the encryption: e.g. 'searchability' (in full, using a pattern, using only == on the full encrypted representation, only if authorized, etc.), information about the encryption (the encryption algorithm), and representation (string, binary).

I imagine the encrypted data type having two values: the encrypted value, used for external representation, and the clear text, used "in memory" for computations. I can also imagine the clear text being unavailable if logic is not allowed to operate on it.
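A rough Python analogy for the two-value idea: the hypothetical `Encrypted` class keeps the clear text in memory and exposes only an external token. As a swapped-in technique, the token is an HMAC-SHA256 keyed digest rather than real encryption (Python's standard library has no cipher), so it cannot be decrypted; it only supports the "== only" kind of searchability. The hypothetical `revealable` flag models clear text that logic is not allowed to operate on.

```python
import hashlib
import hmac


class Encrypted:
    """Clear text for in-memory use; a keyed digest for external representation.

    Stand-in only: HMAC-SHA256 is one-way, so the external form supports
    equality comparison but cannot be decrypted back to clear text.
    """

    def __init__(self, clear_text: str, key: bytes, revealable: bool = True):
        self._clear = clear_text
        self._revealable = revealable
        self._token = hmac.new(key, clear_text.encode("utf-8"), hashlib.sha256).hexdigest()

    def external(self) -> str:
        """What would be stored or sent elsewhere (e.g. to PuppetDB)."""
        return self._token

    def reveal(self) -> str:
        """Clear text for computation, unless this value forbids it."""
        if not self._revealable:
            raise PermissionError("clear text is not available to this logic")
        return self._clear

    def __eq__(self, other: object) -> bool:
        # '==' search works on the external representation only.
        return isinstance(other, Encrypted) and hmac.compare_digest(self._token, other._token)

    def __hash__(self) -> int:
        return hash(self._token)

    def __repr__(self) -> str:
        # Never include the clear text in logs or debug output.
        return f"Encrypted({self._token[:8]}...)"


key = b"hypothetical-site-key"
a = Encrypted("s3cret", key)
b = Encrypted("s3cret", key, revealable=False)
print(a == b, "s3cret" in repr(a))  # True False
```

A real Encrypted type would need reversible encryption and key management; the digest here only shows how the external and in-memory representations can be kept apart.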

Needless to say, these ideas need further thought. It may be worth searching for papers on the topic; I know this is a common problem in systems that deal with classified information. There they also have the issue that single values may be OK in isolation but not in aggregation or association with other elements. I'm not sure we ever have such scenarios, but we may have something similar, in that it would be OK to get a single clear-text value but not a list of all of them.