Revised SCAP data collection architecture specification

Charles Schmidt

Jun 10, 2020, 10:57:17 AM
to scap-dev-endpoint
Hi all,

I have posted a revised data collection architecture summary. I believe that this pulls in the last few months of discussions (but there were a lot of those so please check me on this). I also added a section on open questions.

Comments are welcome.

Thanks,
Charles
SCAP v2 Data Collection Architecture 20200610.docx

Charles Schmidt

Jul 29, 2020, 2:56:49 PM
to scap-dev-endpoint
Hello all,

Just a reminder: please complete your review of the architecture document and provide any feedback. The hope is that we can wrap up this document and its higher-level characterization of the design, and then turn our attention to fleshing out the details of individual design elements.

Thanks,
Charles

Adam Montville

Jul 30, 2020, 8:42:55 AM
to Charles Schmidt, scap-dev-endpoint
Charles,

Thank you for the reminder. I could use a few extra days to get to this.

Kind regards,

Adam

--
To unsubscribe from this group, send email to scap-dev-endpo...@list.nist.gov
Visit this group at https://list.nist.gov/scap-dev-endpoint
---
To unsubscribe from this group and stop receiving emails from it, send an email to scap-dev-endpo...@list.nist.gov.

David Kemp

Jul 31, 2020, 9:25:42 AM
to Adam Montville, Charles Schmidt, scap-dev-endpoint
I had one big question and some minor comments.  The big one: is Managed Asset Identity Management in or out of scope?  It is a critical dependency for creating assessment instructions and asset bindings, and we can't assume it already exists.  If building the asset database is out of scope, then it must at least be shown as an external interface, the way "Application" is the interface to receive assessment instructions.

The other comments are just stream-of-consciousness thoughts.

Let me know if you can't see the comments; I can attach a Word file if needed. https://docs.google.com/document/d/1DwT4noHSEfKYmg1qvySk8ufS5eJ4q19LUVbsaAXlC8w

Regards,
Dave

David Solin

Jul 31, 2020, 9:50:27 AM
to David Kemp, Adam Montville, Charles Schmidt, scap-dev-endpoint
I don’t see any comments on the Google Doc, Dave.

I’m not entirely sure that I grasp your question; however, I have given some thought to enterprise CMDBs (in an ITIL sense) and how they relate to the SCAPv2 architecture. The Repository component was actually formerly labeled the “CMDB”. We changed this for a couple of reasons; the ones I recall are:

1) We didn’t want to make adoption of SCAPv2 a political issue that revolved around control of the CMDB.
2) We didn’t want to imply that a single component — least of all the SCAP data repository — should fulfill every purpose of an enterprise CMDB.

Typically in a very large enterprise, management applications each have their own working data repositories, and those repositories are periodically synchronized with a CMDB via an ETL process.  The essential function of this synchronization process becomes asset identity management, which is to say, the CMDB should be able to reference asset identities across all the systems management products, so that services, SLAs and organizational associations made within the CMDB can be mapped to assets as they are represented in the various working repositories.

This makes it possible for someone to call the helpdesk complaining about an issue with the “order entry application”, and for IT to then look at events related to that business service in the enterprise performance monitoring system, the enterprise patch management system, the enterprise change management system, etc. The correlation capability also enables other use cases, such as determining compliance with some standard for a particular organizational unit, implementing departmental charge-back accounting for IT services, and maintaining SLAs across business applications.

Is this what you mean by Managed Asset Identity Management?

If so, then I think you raise a good point.  We should probably introduce a dotted bi-directional line between the Repository and an external enterprise CMDB.  Having such a dotted line would imply that SCAPv2 activities can also be managed and correlated with respect to records represented in a CMDB.  However, I would say that any detailed specification of that dotted line is probably outside the scope of the SCAPv2 architecture.  In a sense, this is similar to the line connecting the Collector component (and its extensions) and so-called PCEs (Posture Collection Engines) — we point out there is a connection, but specifying that connection in detail lies outside the scope of the SCAPv2 architecture.

Does that make sense?

Best regards,
—David Solin

David Kemp

Jul 31, 2020, 10:54:06 AM
to David Solin, Adam Montville, Charles Schmidt, scap-dev-endpoint
David,

Yes, the distributed CMDB is exactly the sort of thing I was thinking of. And as with the Collector, I think there would need to be a CMDB box, not just a dashed line, to translate enterprise-specific asset identities into the SCAP-standard identities used in the Repository, in the assessment instructions used by the Manager, and in Bound Asset Lists. As this is a design architecture, not a system architecture, the "translation" might be implemented as ETL to a new repository or just as views into existing enterprise repositories.
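
To make the "views" option concrete, here is a minimal Python sketch of a read-only translation from enterprise identifiers to SCAP-side identifiers. Everything in it is an assumption for illustration: the names `EnterpriseAssetRef` and `AssetIdentityView`, the system labels, and the identifier formats are hypothetical and do not come from the architecture document.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class EnterpriseAssetRef:
    """An asset as named by one enterprise system (hypothetical shape)."""
    system: str    # which enterprise system issued the id, e.g. "cmdb"
    local_id: str  # the identifier used inside that system


class AssetIdentityView:
    """Read-only mapping from enterprise references to SCAP-side asset ids.

    This models the "view" option: nothing is copied into a new repository;
    the mapping is only consulted at lookup time.
    """

    def __init__(self, mapping: Dict[Tuple[str, str], str]) -> None:
        self._mapping = dict(mapping)

    def to_scap_id(self, ref: EnterpriseAssetRef) -> Optional[str]:
        # Unknown assets yield None instead of an invented identity.
        return self._mapping.get((ref.system, ref.local_id))


# Two enterprise systems that know the same machine under different names.
view = AssetIdentityView({
    ("cmdb", "CI-0042"): "scap-asset-1",
    ("patch-mgmt", "host-17"): "scap-asset-1",
})
```

An ETL-based implementation would expose the same lookup but populate the mapping by periodic copy, so the rest of the architecture would not need to know which option an enterprise chose.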

I'll have to look at the Google Docs permissions to figure out why comments aren't visible, but here's the file.
SCAP v2 Data Collection Architecture 20200610-dpk.docx

David Solin

Jul 31, 2020, 11:18:53 AM
to David Kemp, Adam Montville, Charles Schmidt, scap-dev-endpoint
Hi Dave,

That makes perfect sense — except I know that with the BMC/Remedy CMDB product, the ID reconciliation function is actually performed within the CMDB itself.  So even the box you are describing is notional.  I also didn’t mean for my concrete example to imply that we should specify an ETL process be used.  Indeed, we should just generically indicate that an “integration” of some kind could optionally exist.  I say optionally because I also do not necessarily think we must require that enterprises perform this integration — after all not everyone really has a CMDB — but we do want to indicate this kind of integration is useful.

So, perhaps an arrow to a box representing an otherwise unspecified “CMDB integration” of some kind is what we need. However, I’ll add that if we get to the point of describing a schema for our repository, we will want a field for a CMDB identifier on every asset, to enable the kind of bidirectional integration you have described.
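
As a rough illustration of what such a field could look like, the Python sketch below keeps an optional CMDB identifier on each asset record and indexes it in both directions. No repository schema exists yet, so `AssetRecord`, `AssetTable`, and every field name here are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AssetRecord:
    asset_id: str                  # id used inside the SCAP repository (assumed)
    hostname: str
    cmdb_id: Optional[str] = None  # optional, since not everyone has a CMDB


class AssetTable:
    """Asset records indexed by repository id and, when present, by CMDB id."""

    def __init__(self) -> None:
        self._by_asset_id: Dict[str, AssetRecord] = {}
        self._by_cmdb_id: Dict[str, AssetRecord] = {}

    def add(self, record: AssetRecord) -> None:
        self._by_asset_id[record.asset_id] = record
        if record.cmdb_id is not None:
            self._by_cmdb_id[record.cmdb_id] = record

    def find_by_asset_id(self, asset_id: str) -> Optional[AssetRecord]:
        return self._by_asset_id.get(asset_id)

    def find_by_cmdb_id(self, cmdb_id: str) -> Optional[AssetRecord]:
        return self._by_cmdb_id.get(cmdb_id)


table = AssetTable()
table.add(AssetRecord("scap-asset-1", "db01", cmdb_id="CI-0042"))
table.add(AssetRecord("scap-asset-2", "lab-box"))  # no CMDB entry
```

Leaving `cmdb_id` optional matches the point that the integration should be possible but not required.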

And, I’ll check out your doc, thanks!

Best regards,
—David

David Kemp

Jul 31, 2020, 11:46:12 AM
to David Solin, Adam Montville, Charles Schmidt, scap-dev-endpoint
Hi All,

I'm looking at the architecture, and particularly the three "SCAP Prototype Architecture" 8 July slides, from the perspective of an OpenC2 Proof of Concept experiment.  So we're looking now, not later, at a prototype schema for the repository and the messages that flow through the architecture.   It's a matter of successive refinement - the first thing to do is collect one piece of information from one asset in the simplest way possible.  That requires placeholders for asset ids, collector scopes, assessment instructions, etc.  Finding all of those placeholders was my myopic look at the architecture :-).

When we can collect one SBOM blob from one asset using a minimum viable product, the schemas and interaction sequences can then blossom into something suitable for various kinds of real enterprises.
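
A minimum viable exchange of that kind might look like the Python sketch below. It is only a placeholder-level illustration: the message shapes, the field names, and the `StubCollector` class are all assumptions, not part of the SCAP v2 or OpenC2 specifications, and the "SBOM" payload is a fixed dummy string.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class CollectionRequest:
    request_id: str
    asset_id: str     # placeholder asset identifier
    instruction: str  # placeholder assessment instruction, e.g. "sbom"


@dataclass
class CollectionResult:
    request_id: str
    asset_id: str
    payload: str      # opaque blob for now


class StubCollector:
    def __init__(self, scope: List[str]) -> None:
        # Placeholder collector scope: the asset ids this collector can reach.
        self.scope = set(scope)

    def collect(self, request: CollectionRequest) -> CollectionResult:
        if request.asset_id not in self.scope:
            raise ValueError(f"asset {request.asset_id!r} is outside this collector's scope")
        # A real collector would query the asset; the stub returns a fixed blob.
        return CollectionResult(request.request_id, request.asset_id, "<sbom placeholder>")


request = CollectionRequest("req-1", "asset-1", "sbom")
result = StubCollector(scope=["asset-1"]).collect(request)
wire = json.dumps(asdict(result))  # what would travel back over the message fabric
```

Each placeholder (asset id, scope, instruction) can later be refined independently without changing the overall request/result flow.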

Dave

David Kemp

Aug 5, 2020, 4:45:37 PM
to Charles Schmidt, scap-dev-endpoint
Charles,

Feel free to skim over or disregard the comments I previously shared; after today's meeting most of them have been answered.

A couple of new observations:
* Roles / Repository: The architecture should explicitly state that the repository contains collections annotated with the specific assessment instructions used to create them.  The idea that an application can reuse a particular *request*, not just data in general cached from previous requests, is important, as is the specific content of the annotations.

* Roles / Message Fabric: Although the message fabric or a shim just above it can provide reliable message delivery, application errors also need to be addressed. A Collector receiving instructions that it does not understand is an error condition, as opposed to receiving instructions that are valid but out of scope. The sender needs to be told that something is wrong and that re-submitting won't help. This is needed to satisfy the "be transparent" design requirement.

* Supporting Data Sets / Collector Scopes:  "a list of assets about which the Collector can collect" looks like a duplicate of Bound Asset Lists: "a set of assets it [the Collector] is capable of assessing".  If they are the same, one can be deleted.  If they are different, the distinction should be explained.

* Collector/PCX capabilities: A note that "check system" refers to an individual component, not a class or category of check systems, would be helpful.

* Applications need to "implement a standardized interface in order to interact with the data collection architecture" and also be able to construct report requests applicable to a particular enterprise. A "query system" command or other information needed by applications, beyond the standardized interface, should be included in the architecture.
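
The first two observations can be sketched in a few lines of Python. This is an illustration under stated assumptions only: the `Status` values, the instruction fields `check` and `asset_id`, and the idea of keying stored collections by a canonical JSON form of the instructions are all hypothetical choices, not anything the architecture document specifies.

```python
import json
from enum import Enum
from typing import Dict, Optional, Set


class Status(Enum):
    OK = "ok"
    NOT_UNDERSTOOD = "not_understood"  # an error: re-submitting will not help
    OUT_OF_SCOPE = "out_of_scope"      # valid instructions, wrong Collector


class Collector:
    def __init__(self, known_checks: Set[str], scope: Set[str]) -> None:
        self.known_checks = known_checks
        self.scope = scope

    def handle(self, instruction: dict) -> Status:
        check = instruction.get("check")
        asset = instruction.get("asset_id")
        # Missing or unknown fields mean the instructions were not understood.
        if not isinstance(check, str) or not isinstance(asset, str):
            return Status.NOT_UNDERSTOOD
        if check not in self.known_checks:
            return Status.NOT_UNDERSTOOD
        if asset not in self.scope:
            return Status.OUT_OF_SCOPE
        return Status.OK


class Repository:
    """Stores each collection annotated with the instructions that produced it."""

    def __init__(self) -> None:
        self._collections: Dict[str, dict] = {}

    @staticmethod
    def _key(instruction: dict) -> str:
        # Canonical JSON so that equal instructions map to the same key.
        return json.dumps(instruction, sort_keys=True)

    def store(self, instruction: dict, data: dict) -> None:
        self._collections[self._key(instruction)] = {"instruction": instruction, "data": data}

    def lookup(self, instruction: dict) -> Optional[dict]:
        # Lets an application reuse a specific earlier request, not just cached data.
        return self._collections.get(self._key(instruction))
```

Returning a distinct status for each case is what lets the sender tell "fix your request" apart from "try another Collector".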

Regards,
David Kemp

Charles Schmidt

Aug 5, 2020, 6:23:53 PM
to scap-dev-endpoint
Thanks, Dave. Good points again. I will try to capture them in the next revision. I'll reach out if I need any clarification.

Thanks,
Charles

Charles Schmidt

Aug 17, 2020, 5:01:58 PM
to scap-dev-endpoint
Hi Dave K.,

I'm wrapping up my revisions to the SCAP Architecture based on the feedback you and David Solin provided. I have one question.

In your final comment you said:

* Applications need to "implement a standardized interface in order to interact with the data collection architecture" and also be able to construct report requests applicable to a particular enterprise. A "query system" command or other information needed by applications, beyond the standardized interface, should be included in the architecture.

I'm not sure what you mean by this. It seems like you are saying that there need to be commands defined for the Application beyond those specified using a standardized interface, but if we define those commands, doesn't that mean we have just standardized them in the interface? Could you clarify what you mean here?

Thanks,
Charles

David Kemp

Aug 18, 2020, 7:37:47 AM
to Charles Schmidt, scap-dev-endpoint
The API for collection requests, report requests, and their responses is (or will be) standardized. There is also enterprise-specific information that an application needs in order to create those requests. I mean the API should include a "give me the menu" query so that an application knows what it can order before it places the order.

Don't want to stretch an analogy too far, but a TV API that lets you stream a movie isn't much good if there isn't also an API to find the movies that are available.

Regards,
Dave  

Charles Schmidt

Aug 18, 2020, 12:10:22 PM
to scap-dev-endpoint
Thanks, Dave.

I'll add some text to clarify the type of information that can be queried from a Repository. I believe that this will allow Applications to get a "menu" of the information available relevant to their interests.

Charles

David Solin

Aug 18, 2020, 1:19:27 PM
to David Kemp, Charles Schmidt, scap-dev-endpoint
Hi Dave K,

Do you mean, we should indicate there will be something like an IDL for the standardized interfaces?

Best regards,
—Dave S

David Kemp

Aug 18, 2020, 3:53:48 PM
to David Solin, Charles Schmidt, scap-dev-endpoint
Hi Dave S and Charles,

If I were going to sell an Application, an API / IDL would allow me to develop a product that customers could purchase, connect, and use with their Managers and Repositories.  OpenC2's mission is to define interfaces that support plug and play interoperability between vendor products, so I'm looking at SCAP v2 messaging from that viewpoint.

Using conceptual/logical/physical terminology (https://www.visual-paradigm.com/guide/data-modeling/what-is-entity-relationship-diagram/), the architecture document is currently at the conceptual and logical levels, defining the names of data objects ("report request") and some of the data that goes in those objects. The other nodes in the architecture (Manager, Collector, Repository) are reactive: they receive data and respond to it. The Application node is unique because it initiates requests. So instead of getting data that tells it what to do, it needs to find data that allows it to create requests.


Regarding Charles's note:

"I'll add some text to clarify the type of information that can be queried from a Repository. I believe that this will allow Applications to get a "menu" of the information available relevant to their interests."

and Dave S's question:

"Do you mean, we should indicate there will be something like an IDL for the standardized interfaces?"

At whatever level (conceptual or logical) the "report request" is defined, I think it would be useful to define a "config request" from the Application to the Repository that would return the enterprise's "configuration info" to the Application.
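
A minimal Python sketch of that idea follows, with the Application fetching the "menu" first and then building a report request that it knows is valid. The names `ConfigInfo`, `ReportRequest`, `config_request`, and `build_report_request`, and the two kinds of configuration info shown, are hypothetical; the actual content of a config response is exactly what would need to be defined.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConfigInfo:
    """Enterprise-specific information an Application needs (assumed contents)."""
    asset_ids: List[str] = field(default_factory=list)
    collection_types: List[str] = field(default_factory=list)


@dataclass
class ReportRequest:
    asset_id: str
    collection_type: str


class Repository:
    def __init__(self, config: ConfigInfo) -> None:
        self._config = config

    def config_request(self) -> ConfigInfo:
        # The "menu": what this enterprise's deployment can be asked about.
        return self._config


def build_report_request(menu: ConfigInfo, asset_id: str, collection_type: str) -> ReportRequest:
    """Create a report request only from items that appear on the menu."""
    if asset_id not in menu.asset_ids:
        raise ValueError(f"unknown asset: {asset_id!r}")
    if collection_type not in menu.collection_types:
        raise ValueError(f"unsupported collection type: {collection_type!r}")
    return ReportRequest(asset_id, collection_type)


repository = Repository(ConfigInfo(asset_ids=["asset-1"], collection_types=["sbom"]))
menu = repository.config_request()
report_request = build_report_request(menu, "asset-1", "sbom")
```

With this split, the report request stays a standardized message while the enterprise-specific values that fill it come from the config response.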

Regards,
Dave