Fedora 4 Audit Service Planning

David Wilcox

Feb 9, 2015, 1:18:44 PM
to fedora-...@googlegroups.com, Fedora Community, isla...@googlegroups.com, hydra-c...@googlegroups.com
Hi everyone,

In a previous message I outlined the top feature priorities for the next major Fedora 4.x release [1]. The first item on this list is an audit service, which is an essential component of any repository platform. The audit service would provide a means to track a resource’s history within the repository, and possibly connect to other related services such as a PREMIS event service [2]. The rest of this message lays out the details of planning and implementing this feature, including the type and level of effort required.

First, we need to design the audit service by creating and validating use cases, and generating JIRA tickets to implement the functionality. This effort will commence in mid-February and end in mid-March, just ahead of the implementation. If you have use cases related to an audit service and would like to participate in this planning and design stage, please get in touch with me (both developers and non-developers are welcome). I would like to host a brief call to get things started, so please fill out this poll to indicate your availability: http://doodle.com/fuzv27epb9h5fnmc.

Following the planning and design phase, the implementation will likely require 3 developers working over the course of 2 code sprints to complete. The sprints have already been scheduled (beginning March 23), and 1 developer has already committed to both sprints. If this feature is important to you, please sign up for a code sprint on the wiki [3].

Regards,

David



-- 
David Wilcox
Fedora Product Manager
DuraSpace
Skype Name: david.wilcox82

David Wilcox

Feb 13, 2015, 12:41:38 PM
to Fedora Community, isla...@googlegroups.com, fedora-...@googlegroups.com, hydra-c...@googlegroups.com
Hi everyone,

Thanks to all who responded to the Doodle poll for the Fedora 4 Audit Service planning meeting. It looks like almost everyone is available on Friday, February 20 at 3pm AST, so we will hold the meeting then. I suggest we use Skype for the meeting, so please add me as a contact if you have not already done so: david.wilcox82. If anyone cannot attend a Skype meeting, please let me know and I will suggest another way to connect.

I have posted an agenda to the wiki [1]. Please feel free to edit or add comments. If anyone else would like to attend just let me know.

Regards,

David



Andrew Woods

Feb 16, 2015, 12:30:58 PM
to David Wilcox, Fedora Community, islandora, fedora-...@googlegroups.com, Hydra Community
Hello All,
In advance of the Audit Service planning meeting this Friday, it would be helpful if you could respond to this thread with specific requirements, use cases, and examples.

From discussions at the recent Code4Lib in Portland, the following suggestions were made:
1- Audit service should automatically record who updated which resource when and with which action.
2- Audit service should be able to include/import events that were performed external to the repository.
3- Audit service should be able to purge events.
4- Audit service should be RDF-based, and use PATCH semantics for updates.
5- PROV-O [1] ontology may be better suited than PREMIS [2].
6- Audit service would ideally support map-reduce-style analytics.
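
To make suggestion 1 concrete, here is a minimal sketch of an event record that captures who, which resource, when, and which action. The class name `AuditEvent`, its field names, and the `record` helper are hypothetical and are not part of any Fedora API; the `external` flag is an assumption that anticipates suggestion 2.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    """Hypothetical audit record; field names are assumptions, not a Fedora API."""
    agent: str       # who performed the action
    resource: str    # which resource was affected (e.g. its URI)
    action: str      # which action was performed (e.g. "update")
    # when the action happened; defaults to the current UTC time
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    external: bool = False  # True for events imported from outside the repository


def record(log: list, agent: str, resource: str, action: str) -> AuditEvent:
    """Append an internally generated event to an in-memory log and return it."""
    event = AuditEvent(agent=agent, resource=resource, action=action)
    log.append(event)
    return event
```

A real service would persist these records (the thread suggests RDF) rather than keep them in a Python list; the list only stands in for that store.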

Regards,
Andrew



Nick Ruest

Feb 17, 2015, 12:45:56 PM
to hydra...@googlegroups.com, hydra-c...@googlegroups.com, isla...@googlegroups.com, island...@googlegroups.com


On 15-02-17 12:40 PM, Justin Simpson wrote:
>
> On Tue, Feb 17, 2015 at 9:29 AM, Robert Sanderson <azar...@gmail.com> wrote:
>
> On Mon, Feb 16, 2015 at 9:30 AM, Andrew Woods <awo...@duraspace.org> wrote:
>
> From discussions at the recent Code4Lib in Portland, the
> following suggestions were made:
> 1- Audit service should automatically record who updated which
> resource when and with which action.
>
>
> +1; requiring Hydra to pass through the "who" to Fedora4, I
> believe. So the requirements would be a standardized way to do
> this, that Fedora would need to allow external identities rather
> than managed, and should have a flag that an identity is external.
>
> It would be nice to have a basic user model such that if you
> dereference the identity, either internal or external, you got back
> at least some set of linked data attributes.

+1

>
> 2- Audit service should be able to include/import events that
> were performed external to the repository.
>
>
> +1 such as migration of audit trails to ensure continuous coverage
> for trusted repository status when people upgrade or migrate.
> Also useful for restoring from external backup systems. Events
> should still be flagged as imported rather than native. Would need
> to have some format for importing them.

+1

>
> 3- Audit service should be able to purge events.
>
>
> +0. Use case? If you can purge events then you can hide your
> tracks regarding any changes made to the system, thus leaving the
> repository of indeterminate trustworthiness.
>
>
> One use case for purging events would be checksumming events. You
> probably only care about the first event (when the checksum was
> generated) and the last event (failed checksum check or most recent
> pass). I assume the issue of purging events is a scalability concern?
> That a long list of events takes a long time to load? Perhaps being
> able to archive events, rather than purge them, so they are not gone,
> but are not returned by default.

+1 to Justin's use case. I think just keeping the first and most recent
fixity check would be fine.
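
A minimal sketch of that retention rule, combined with Justin's suggestion to archive rather than purge: keep the first and the most recent fixity event for a resource, and hand back the rest for archiving. The function name and the event shape (a dict with a sortable "timestamp" key) are assumptions for illustration only.

```python
from typing import Dict, List, Tuple


def split_fixity_events(events: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Return (kept, archived) for one resource's fixity events.

    Keeps the earliest and the most recent event; everything in between is
    returned separately so it can be archived rather than deleted.
    Each event is assumed to be a dict with a sortable "timestamp" key.
    """
    ordered = sorted(events, key=lambda e: e["timestamp"])
    if len(ordered) <= 2:
        # Nothing to archive when there are at most two events.
        return ordered, []
    return [ordered[0], ordered[-1]], ordered[1:-1]
```

Whether the archived events are still queryable on request, or simply not returned by default, is a policy decision this sketch leaves open.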

>
> 4- Audit service should be RDF-based, and use PATCH semantics
> for updates.
>
>
> RDF: +1
> PATCH: +0 -- Patching RDF is non-trivial, and means selecting a
> particular patch format. LD Patch? SPARQL Update? JSON Patch? Plain
> old diff? More than one?
>
> 5- PROV-O [1] ontology may be better suited than PREMIS [2].

PREMIS is pretty well established in the Islandora community. What about
the PREMIS ontology[1]? ...and I guess I need to learn more about PROV-O.

>
> +1 to PROV-O
>
>
> 6- Audit service would ideally support map-reduce-style analytics.
>
>
> To rephrase: A service should be able to be built on top of the
> audit data to do analytics. That service should be able to be
> written in a map-reduce style.
>
> In other words, the analytics service is not part of audit (and thus
> not part of this work) but should be able to be built using the API
> the audit service provides.

+1


cheers!

-nruest

[1] http://id.loc.gov/ontologies/premis.html
>
> And my regrets for the meeting, closing on our new house here in the
> bay area :)
>
> Rob
> --
> Rob Sanderson
> Information Standards Advocate
> Digital Library Systems and Services
> Stanford, CA 94305

Andrew Woods

Feb 18, 2015, 7:00:29 PM
to Hydra-Tech, Hydra Community, isla...@googlegroups.com, island...@googlegroups.com
Hello All,
Do you have any Audit Service requirements/scenarios that have not been mentioned in this thread?
Regards,
Andrew

On Tue, Feb 17, 2015 at 12:54 PM, Esmé Cowles <esco...@ticklefish.org> wrote:
On 02/17/15, at 12:45 PM, Nick Ruest <rue...@gmail.com> wrote:
...


> PREMIS is pretty well established in the Islandora community. What about the PREMIS ontology[1]? ...and I guess I need to learn more about PROV-O.

I haven't had time to actually read it yet, but I was pointed at this article last week, which I think advocates using PREMIS and PROV-O together:

http://dcpapers.dublincore.org/pubs/article/view/3709

So I'll just put that out there as a possible way to reconcile this.

-Esme


David Wilcox

Feb 19, 2015, 9:27:38 AM
to Andrew Woods, Fedora Community, fedora-...@googlegroups.com, islandora, Hydra Community
Hi everyone,

Just a reminder to send me a Skype contact request (david.wilcox82) if you’d like to join the audit service meeting tomorrow. Some of those who responded to the poll have not yet done so, and I don’t want to miss anyone who would like to attend. Also, if Skype is not an option for you, please let me know so we can work something else out.

Regards,

David


David Wilcox

Feb 19, 2015, 9:50:39 AM
to Andrew Woods, Fedora Community, islandora, fedora-...@googlegroups.com, Hydra Community
It looks like Skype will not be a good option for some participants, so let’s switch over to using the DuraSpace conference line instead. Call-in details, including the access code, have been posted to the meeting agenda [1]. Apologies for any confusion.

Regards,

David



John Doyle

Feb 19, 2015, 5:53:52 PM
to hydra-c...@googlegroups.com, hydra...@googlegroups.com, isla...@googlegroups.com, island...@googlegroups.com, fedora-c...@googlegroups.com
Hi Andrew,

I'm generally in agreement with the +1s in this thread.  

I am, however, wondering about the intended role of the Audit Service in maintaining evidence of Fedora fixity checks.  TRAC/TDR (and I presume the ISO 16363 spec) look for evidence of fixity checking on a "routine basis", and with logs "stored separately or protected separately from the AIPs themselves" [4.4.1.2 The repository shall actively monitor the integrity of AIPs].  I think it would be helpful therefore to design the Audit Service with these requirements in mind, and allow (potentially) all checksumming activities to be recorded and available for reporting (with such reporting carried out via a separate mechanism, I would imagine).

Regards,
John

Andrew Woods

Feb 20, 2015, 9:01:52 AM
to Susan Lafferty, David Wilcox, Fedora Community, islandora, fedora-...@googlegroups.com, Hydra Community
Thanks Susan,
We will add Arif's notes to today's discussion.
Andrew

On Fri, Feb 20, 2015 at 1:00 AM, Susan Lafferty <susan.l...@unsw.edu.au> wrote:

Hi Andrew,

Arif has put together the following:

Please find below use cases from UNSW Library:

- Detailed event-related information about ingestion, updates and deletion of records should be automatically captured. At minimum, the information should include the following:
  - Who did it
  - What was done
  - When it was done
  - Outcome/Success/Failures/Errors
  - The related record, node, property etc.
- Information captured should be in RDF and be queryable using SPARQL. For example, the following types of queries should be supported:
  - List all events associated with record X
  - List all events of type “T” associated with record X
  - List all events of type “T” that occurred to record X within a specified date range
  - Get an event with a specific identifier (e.g. SPARQL DESCRIBE query)
- Optional: Fedora 4 REST API should support dissemination of event/audit information.
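
The four query types listed above could look roughly like the following sketch, which only builds SPARQL query strings. The modelling is an assumption, since the ontology question is still open in this thread: each event is treated as a `prov:Activity` linked to the record with `prov:used` and timestamped with `prov:startedAtTime`, and the event type is an arbitrary class URI. The function names are hypothetical.

```python
from typing import Optional

PREFIXES = """PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
"""


def events_query(record_uri: str,
                 event_type: Optional[str] = None,
                 start: Optional[str] = None,
                 end: Optional[str] = None) -> str:
    """Build a SPARQL SELECT for events on one record.

    Assumes (hypothetically) that each event is a prov:Activity linked to the
    record with prov:used and timestamped with prov:startedAtTime.
    URIs are interpolated as-is; a real implementation should validate them.
    """
    lines = [
        "?event a prov:Activity ;",
        f"       prov:used <{record_uri}> ;",
        "       prov:startedAtTime ?when .",
    ]
    if event_type is not None:
        # Restrict to events of type "T".
        lines.append(f"?event a <{event_type}> .")
    if start is not None and end is not None:
        # Restrict to a date range (inclusive on both ends).
        lines.append(
            f'FILTER (?when >= "{start}"^^xsd:dateTime && '
            f'?when <= "{end}"^^xsd:dateTime)'
        )
    body = "\n  ".join(lines)
    return f"{PREFIXES}SELECT ?event ?when WHERE {{\n  {body}\n}} ORDER BY ?when"


def describe_query(event_uri: str) -> str:
    """Build a SPARQL DESCRIBE for a single event identifier."""
    return f"DESCRIBE <{event_uri}>"
```

Calling `events_query(uri)` covers the first case, adding `event_type` the second, adding `start` and `end` the third, and `describe_query` the fourth.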

 

Regards

 

Susan Lafferty

Director Digital Library Services

 

Phone +61 2 9385 3479 | Mob +61 402 346 001| Fax +61 2 9385 8002 | susan.l...@unsw.edu.au | http://www.library.unsw.edu.au

UNSW Library | UNSW Australia | UNSW Sydney NSW 2052 AUSTRALIA | CRICOS Provider Code: 00098G

Justin Simpson

Feb 20, 2015, 10:17:59 AM
to hydra-c...@googlegroups.com, isla...@googlegroups.com, island...@googlegroups.com, hydra...@googlegroups.com

+1 from me on the paper Esme linked to here.  I think this idea of mapping PREMIS entities as subclasses or equivalent classes to Prov entities allows the best of both worlds, and is worth exploring more.


Justin Simpson

Feb 20, 2015, 10:17:59 AM
to hydra-c...@googlegroups.com, isla...@googlegroups.com, hydra...@googlegroups.com, fedora-c...@googlegroups.com, island...@googlegroups.com


On 19 Feb 2015 13:17, "John Doyle" <doy...@mail.nlm.nih.gov> wrote:
>
> Hi Andrew,
>
> I'm generally in agreement with the +1s in this thread.  
>
> I am, however, wondering about the intended role of the Audit Service in maintaining evidence of Fedora fixity checks.  TRAC/TDR (and I presume the ISO 16363 spec) look for evidence of fixity checking on a "routine basis", and with logs "stored separately or protected separately from the AIPs themselves" [4.4.1.2 The repository shall actively monitor the integrity of AIPs].  I think it would be helpful therefore to design the Audit Service with these requirements in mind, and allow (potentially) all checksumming activities to be recorded and available for reporting (with such reporting carried out via a separate mechanism, I would imagine).
>

I think this use case is covered.  One of the other requirements listed from the discussion at Code4Lib was the ability for the audit service to be able to record events from external agents.  We used the one example of an external piece of software that generates checksums ( e.g., a bagit tool).  I think if an external application independently verified checksums, and kept its own external logs, it still makes sense for Fedora to have a way to also keep records of these verification events.  The external application can post these to fedora via this audit service. 
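
As a rough sketch of what such a post might look like, the following prepares (without sending) an HTTP POST that carries one externally generated fixity event as Turtle. Everything specific here is an assumption for illustration: the endpoint URL, the `ex:` vocabulary, and the property names are hypothetical, and no such Fedora API has been defined in this thread.

```python
import urllib.request


def build_external_event_request(endpoint: str, resource_uri: str,
                                 tool: str, outcome: str,
                                 when: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST carrying one external fixity event.

    The endpoint, the ex: vocabulary, and the property names are assumptions
    made for illustration; no such Fedora API is confirmed by this thread.
    """
    turtle = f"""@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/audit#> .

<> a prov:Activity, ex:ExternalEvent ;
   prov:used <{resource_uri}> ;
   prov:startedAtTime "{when}"^^xsd:dateTime ;
   ex:performedBy "{tool}" ;
   ex:outcome "{outcome}" .
"""
    return urllib.request.Request(
        endpoint,
        data=turtle.encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
        method="POST",
    )
```

The request would be sent with `urllib.request.urlopen(req)`. Typing the event as `ex:ExternalEvent` is one way to keep imported events distinguishable from native ones.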


John Doyle

Feb 20, 2015, 10:43:49 AM
to hydra-c...@googlegroups.com, isla...@googlegroups.com, hydra...@googlegroups.com, fedora-c...@googlegroups.com, island...@googlegroups.com
That sounds reasonable to me, and useful.  But I'm looking for Fedora to satisfy this particular TDR requirement without delegating the checksumming, and logging, to an external application - which also seems reasonable and useful. :-)  Perhaps we can discuss this further during the call - maybe your scenario would also allow for mine.

Andrew Woods

Feb 23, 2015, 7:49:15 PM
to Fedora Community, islandora, Hydra Community
Hello All,
We had a productive meeting [1] last week which set the foundation for how we, as a community, will approach teasing out the details required for implementation of the Audit Service.

We plan on having a follow-on meeting next week. In advance of that, however, the following tasks should be addressed, or at least furthered. 

Comments and decisions related to the following items are being documented (by you!) on the wiki:

...but feel free to also respond directly to this thread.

======
Actions
======
1) Define required Audit Service queries
2) Perform comparative analysis of PROV-O vs. PREMIS-RDF
3) Define repository events that should be recorded by the Audit Service
4) Define event agents that should be supported by the Audit Service
5) Define capability of the Audit Service REST-API

======
Unresolved Questions
======
1) Should there be support for adding external events to the Audit Service?
1a) If yes, what restrictions, if any, should be enforced on this capability? (e.g. only when migrating from Fedora3? only by administrators?)
1b) If yes, what should the import format be?
2) What is the scope of the Audit Service: the Repository? or the Resource?
3) What is the most appropriate ontology? PROV-O, PREMIS, combination? other?
4) How should the Audit Service support different users? administrators, users?
5) For event tracking, where is the user principal expected to come from?
6) How will user principals be mapped to persistent user identifiers?

Regards,
Andrew

David Wilcox

Feb 24, 2015, 6:17:49 PM
to islandora, Fedora Community, Hydra Community
Hi everyone,

I have created a poll for a follow-up audit service planning meeting next week. The agenda for the meeting will be based on the action items and questions below. Please respond with your availability (timezone support has been enabled): http://doodle.com/vvrgw9g4wped8bep.

Our immediate goal is to define the requirements for the Fedora 4 audit service so we can create related JIRA tickets for the technical team to implement (the first sprint starts March 30). To this end, we need to accomplish the following tasks before next week’s meeting:

1. Assign and begin working on the action items listed below
2. Answer the unresolved questions from the last meeting

There are 4 action items, each of which needs an owner to push it forward (though contributions from others are welcome and encouraged):

======
Actions
======
1) Define required Audit Service queries
2) Perform comparative analysis of PROV-O vs. PREMIS-RDF
3) Define repository events and event agents that should be recorded and supported by the Audit Service
4) Define capability of the Audit Service REST-API

If you have an interest in pushing any of these actions forward, please let me know or sign up on the wiki [1].

There are also 3 unanswered questions that need to be resolved before we can move forward (the first question in particular is very important):

=================
Unresolved Questions
=================
1) Should there be support for adding external events to the Audit Service?
- If yes, what restrictions, if any, should be enforced on this capability? (e.g. only when migrating from Fedora3? only by administrators?)
- If yes, what should the import format be?
2) For event tracking, where is the user principal expected to come from?
3) How will user principals be mapped to persistent user identifiers?

Please join the conversation and help answer these questions [2] before our next meeting [3]. Default answers (based on the discussions so far) have been provided in case there is insufficient community input.

Regards,

David



Andrew Woods

Feb 25, 2015, 11:08:04 AM
to Joshua Westgard, fedora-community, islandora, Hydra Community
Hello Josh,
I agree that the question of the exact scope of the Audit Service must be driven by the use cases. Where the Audit Service is viewed to serve a primary purpose of supporting internal debugging and troubleshooting, scoping it to the repository and internal events makes sense. Maintaining a more complete provenance of a Resource to include events before ingest (e.g. Fedora 3 audit log), external events on the Resource (e.g. transfer to other systems, external fixity, etc.), as well as internal repository events, argues for scoping the service to the Resource vs. the repository.

In any case, I may be missing some nuance in your suggestion of having "import external audit records" as a class of Fedora event. On last week's call [1], there was what may be a related suggestion of marking external events with an appropriate flag, which may include import events. Is your suggestion analogous to this?

Thank you for the input. It is important to tease these issues out.
Andrew

On Tue, Feb 24, 2015 at 6:24 PM, Joshua Westgard <westg...@gmail.com> wrote:
Andrew and everyone,

I feel as though limiting the scope of the audit service to internal (i.e. Fedora) events is the most sensible approach, but I can absolutely see a use case for the ability to import events, especially (but not only) in cases of migration from an earlier repository with audit events.  I wonder if one way to resolve the question would be to allow only internal Fedora events to be part of the audit service, but to have "import external audit records" itself be a class of Fedora event. That would allow Fedora as a system to remain somewhat agnostic about the quality of the imported audit data, while at the same time allowing users some flexibility to bring in data from other sources for use with the Fedora audit events.

Does that make any sense to anyone?  It is quite possible that I'm not fully understanding what's possible or what's at stake here.

Josh Westgard
UMD Libraries

On Monday, February 23, 2015 at 7:49:16 PM UTC-5, awoods wrote:

1) Should there be support for adding external events to the Audit Service?
1a) If yes, what restrictions, if any, should be enforced on this capability? (e.g. only when migrating from Fedora3? only by administrators?)
1b) If yes, what should the import format be?
 


Andrew Woods

Feb 25, 2015, 5:30:39 PM
to Joshua Westgard, fedora-community, islandora, Hydra Community
Hello Josh,
Noting in the resource's audit log the source of an externally generated event (among other details) along with a clear indication of the event having been created externally is very reasonable. Thanks for the clarity.
Andrew

On Wed, Feb 25, 2015 at 2:51 PM, Joshua Westgard <westg...@gmail.com> wrote:
Eric's statement gets a lot closer to a clear formulation of the idea I was trying to express than my original attempt.  It seems like some of the needs of those who come down on either side of this question might be met by just giving external audit events a clear provenance, so that internal and external events are distinguishable on some fundamental level. It would also be important that the source of external audit events (external service or user action) be recorded. Sorry I'm not formulating this any better ...

Josh

On Wednesday, February 25, 2015 at 11:33:58 AM UTC-5, eric.james wrote:
Andrew, Josh, all,

I too may be missing a key nuance, but it seems to me that whether the audit event being captured is external or internal, it would have a similar representation (a combination of PREMIS/PROV-O/other) in Fedora.  The difference is that an internal audit event would be triggered by an internal repository event, while an external audit event would be generated by a loaded POST, or perhaps by a special endpoint designed specifically for external event upload.

-Eric


David Wilcox

Feb 26, 2015, 7:47:12 AM
to Joshua Westgard, Andrew Woods, Hydra Community, fedora-community, islandora
Hi Josh,

I added Andrew’s summary of your comments to the Unanswered Questions page [1] so we don’t lose track of your input. Please feel free to edit if I did not properly capture your thoughts. If you do not already have an account for the DuraSpace wiki I can create one for you - just let me know.

David



David Wilcox

Feb 26, 2015, 8:13:56 AM
to islandora, Fedora Community, Hydra Community
Hi everyone,

I accidentally listed the wrong dates for the Doodle poll, so I have created a new poll with the correct dates for the meeting (March 4-6): http://doodle.com/52b5dwtxmztehadf. Please fill out this new poll instead (the previous poll has been closed). Sorry for the mistake. I will close the poll at the end of the day on Friday, Feb. 27 so please get your responses in by then.

Regards,

David


Mark Jordan

Feb 27, 2015, 12:20:02 AM
to Eric James, Andrew Woods, Joshua Westgard, fedora-community, islandora, Hydra Community
Hi,

I'd like to offer three specific use cases illustrating events external to the repository, two of which would typically occur during ingestion into the repository and the third of which would be performed periodically during the file's life within the repository:

1) an external application scanning a file for viruses,
2) an external application validating a file's content against an external schema or profile, or using domain-specific validation tools, and
3) an external application, or an internal service provided by the repository itself, verifying a file's checksum.

Details describing when these events happened, the applications (and their environments) that performed the events, and the outcome of the events should all be recorded. I can see value in separating the recording of events initiated within the repository and events external to the repository, but don't see any value in privileging one type of event above the other - both are equally valid and important. I also think that we need to accommodate both types of events in a way that can be expressed in whatever ontologies or vocabularies evolve over time as community standards to demonstrate a chain of preservation.

Mark




Andrew Woods

Feb 27, 2015, 9:26:06 AM
to Mark Jordan, Eric James, Joshua Westgard, fedora-community, islandora, Hydra Community
Thank you, Mark.
I have added your examples of external events which would benefit from being included in a resource's audit log in the planning page:
Andrew

Andrew Woods

Feb 27, 2015, 9:42:26 AM
to Friscia, Michael, hydra-c...@googlegroups.com, fedora-community, islandora
Hello Mike,
I also added your "move" event to the planning page. I believe the "actor" and "fixity" points have already been accounted for.

Your point regarding reporting is important. I would like to see the Audit Service requirements split along the lines of input requirements and output requirements. Ideally, the service could produce an output format that could be transformed into something more suitable to various proprietary systems. If you happen to have the details of what Tableau expects, for example, that would be informative.

I could not exactly tell from your message if there was an implicit assumption that the audit logs were required to be stored in Fedora. At this point, we are gathering requirements around what the Audit Service should do, not necessarily how it is implemented. Is there a requirement that the audit logs be stored within or external to the repository? There are certainly arguments both ways.

Thanks,
Andrew

On Fri, Feb 27, 2015 at 6:52 AM, Friscia, Michael <michael...@yale.edu> wrote:
Just to add

- every time the file was moved
- who moved the file including machine names
- every time a fixity check happened as a result of a move


Most of our files are digitized by a vendor, so we end up with workflows where a hard drive is sent to us, files copied to staging area, files copied to Q/C area, files copied into a curation system, files copied to pre-ingest staging area, files ingested into repository. So there are at least 8 events we would capture on every file prior to entering Fedora.

Eric did suggest that what we could do is create the Fedora record, less the file, at the point of creation or transfer to our systems from a vendor. This is a good idea going forward but we still have all the legacy data that we would have to import.

I might offer that it would be ideal that the system flags events Fedora creates differently from events that are imported.

Eric and I also talked a little about reporting, since audits tie directly to reporting. In our enterprise systems, we separate the reports server from the production servers. In some cases this means that the data is transferred to another system nightly; in other cases it happens in near real time. The point is that some of the reports that are run could affect the production system, so I have concerns that the amount of auditing performed could impact general ingest operations. It may therefore be valuable to consider how the audits can be transferred to another system. Reporting is not something I would build in Fedora. There are very good reporting tools that can be connected to databases and provide more functionality than we could ever program. In our case, we would want all the audit data to replicate to a SQL-based system so that we could use Tableau to run reports, as we are in the process of adopting this product for reporting needs. But I can see use cases for Microsoft SQL Reporting Services or the Oracle equivalent.

But in the use case of having an external application produce reports we must also consider how these products operate. In the case of Tableau, we could have reports generated on timers. Given the potential size of our repository, the amount of reporting activity could be significant just with the reports that run regularly. The ad-hoc reports could also be significant. So again, it is important to not only consider the creation of the audit data but the use of it and ensure that systems performance is not affected.
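As a rough illustration of the replication idea above, here is a minimal sketch that copies audit events into a separate SQL table and runs a report against that copy instead of the production system. Everything in it is an assumption made for illustration: the table name `audit_event`, the column layout, the sample events, and the use of SQLite as a stand-in for whatever reporting database a site actually uses.

```python
import sqlite3

# Hypothetical audit events as they might be exported from the repository.
EVENTS = [
    ("2015-02-27T06:52:00Z", "/objects/a1", "move", "staging-host-1"),
    ("2015-02-27T06:55:00Z", "/objects/a1", "fixity", "staging-host-1"),
    ("2015-02-27T07:10:00Z", "/objects/b2", "move", "qc-host-2"),
]


def replicate(conn, events):
    """Copy audit events into a reporting table, skipping rows already present."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audit_event ("
        "event_time TEXT, resource TEXT, event_type TEXT, agent TEXT, "
        "PRIMARY KEY (event_time, resource, event_type))"
    )
    # INSERT OR IGNORE makes a repeated (e.g. nightly) transfer idempotent.
    conn.executemany(
        "INSERT OR IGNORE INTO audit_event VALUES (?, ?, ?, ?)", events
    )
    conn.commit()


def events_per_type(conn):
    """A sample report: number of events of each type."""
    rows = conn.execute(
        "SELECT event_type, COUNT(*) FROM audit_event "
        "GROUP BY event_type ORDER BY event_type"
    )
    return dict(rows.fetchall())
```

Because the report only touches the replicated table, scheduled or ad-hoc reporting load would fall on the reporting database rather than on ingest operations.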

-mike

_______________________________________
Michael Friscia
Manager, Digital Library & Programming Services
Yale University Library
(203) 432-1856
________________________________________
From: hydra-c...@googlegroups.com [hydra-c...@googlegroups.com] on behalf of Mark Jordan [mjo...@sfu.ca]
Sent: Friday, February 27, 2015 12:19 AM
To: James, Eric
Cc: Andrew Woods; Joshua Westgard; fedora-community; islandora; Hydra Community
Subject: [hydra-community] Re: [fedora-community] Re: Fedora 4 Audit Service Planning


Hi,

I'd like to offer three specific use cases illustrating events external to the repository, two of which would typically occur during ingestion into the repository and the third which would be performed periodically during the file's life within the repository:

1) an external application scanning a file for viruses,
2) an external application validating a file's content against an external schema, profile, or using domain-specific validation tools, and
3) an external application, or an internal service provided by the repository itself, verifying a file's checksum.

Details describing when these events happened, the applications (and their environments) that performed the events, and the outcome of the events should all be recorded. I can see value in separating the recording of events initiated within the repository and events external to the repository, but don't see any value in privileging one type of event above the other - both are equally valid and important. I also think that we need to accommodate both types of events in a way that can be expressed in whatever ontologies or vocabularies evolve over time as community standards to demonstrate a chain of preservation.
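To make the list of details above concrete, here is a minimal sketch of one event record, using the checksum use case as the example. The field names, the `fixity_event` helper, and the choice of SHA-1 are all assumptions for illustration; they are not taken from PREMIS, PROV-O, or any Fedora API.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    resource: str      # identifier of the file the event is about
    event_type: str    # e.g. "virus-scan", "validation", "fixity"
    agent: str         # application that performed the event
    environment: str   # where the agent ran
    outcome: str       # "success" or "failure"
    timestamp: str     # when the event happened (ISO 8601, UTC)


def fixity_event(resource, content, expected_sha1, agent, environment):
    """Verify a checksum and describe the result as an audit event."""
    actual = hashlib.sha1(content).hexdigest()
    outcome = "success" if actual == expected_sha1 else "failure"
    return AuditEvent(
        resource, "fixity", agent, environment, outcome,
        datetime.now(timezone.utc).isoformat(),
    )
```

The same record shape would serve whether the check was run by an external application or by the repository itself; only the `agent` and `environment` values would differ.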

Mark

________________________________
Andrew, Josh, all,

I too may be missing a key nuance, but it seems to me that whether the audit event being captured is external or internal, it would have a similar representation (combination of PREMIS/PROV-O/other) in Fedora.  The difference is that an internal audit event would be triggered by an internal repository event, while an external audit event would be generated by a loaded POST, or perhaps a special endpoint designed specifically for external event upload.

-Eric
________________________________
From: fedora-c...@googlegroups.com [fedora-c...@googlegroups.com] on behalf of Andrew Woods [awo...@duraspace.org]
Sent: Wednesday, February 25, 2015 11:08 AM
To: Joshua Westgard
Cc: fedora-community; islandora; Hydra Community
Subject: Re: [fedora-community] Re: Fedora 4 Audit Service Planning


Hello Josh,
I agree that the question of the exact scope of the Audit Service must be driven by the use cases. Where the Audit Service is viewed to serve a primary purpose of supporting internal debugging and troubleshooting, scoping it to the repository and internal events makes sense. Maintaining a more complete provenance of a Resource to include events before ingest (e.g. Fedora 3 audit log), external events on the Resource (e.g. transfer to other systems, external fixity, etc.), as well as internal repository events, argues for scoping the service to the Resource vs. the repository.

In any case, I may be missing some nuance in your suggestion of having "import external audit records" as a class of Fedora event. On last week's call [1], there was what may be a related suggestion of marking external events with an appropriate flag, which may include import events. Is your suggestion analogous to this?

Thank you for the input. It is important to tease these issues out.
Andrew


On Tue, Feb 24, 2015 at 6:24 PM, Joshua Westgard <westg...@gmail.com<mailto:westg...@gmail.com>> wrote:
Andrew and everyone,

I feel as though limiting the scope of the audit service to internal (i.e. Fedora) events is the most sensible approach, but I can absolutely see a use case for the ability to import events, especially (but not only) in cases of migration from an earlier repository with audit events.  I wonder if one way to resolve the question would be to allow only internal Fedora events to be part of the audit service, but to have "import external audit records" itself be a class of Fedora event. That would allow Fedora as a system to remain somewhat agnostic about the quality of the imported audit data, while at the same time allowing users some flexibility to bring in data from other sources for use with the Fedora audit events.

Does that make any sense to anyone?  It is well possible that I'm not fully understanding what's possible or what's at stake here.
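One way to picture the suggestion above is a sketch in which the import is itself logged as an internal event, while the imported records are kept to one side without any judgement about their quality. The `AuditLog` class, its method names, and the event type string are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AuditLog:
    internal: list = field(default_factory=list)  # events the repository itself recorded
    imported: list = field(default_factory=list)  # external records, stored as supplied

    def record(self, event_type, resource, agent):
        """Record an internal event."""
        event = {"type": event_type, "resource": resource, "agent": agent}
        self.internal.append(event)
        return event

    def import_external(self, resource, records, agent):
        """Store external records unchanged and log the import as an internal event."""
        self.imported.extend(dict(r, resource=resource) for r in records)
        return self.record("import-external-audit-records", resource, agent)
```

Under this sketch the audit service only vouches for the fact that an import happened, by whom, and for which resource.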

Josh Westgard
UMD Libraries

On Monday, February 23, 2015 at 7:49:16 PM UTC-5, awoods wrote:

1) Should there be support for adding external events to the Audit Service?
1a) If yes, what restrictions, if any, should be enforced on this capability? (e.g. only when migrating from Fedora3? only by administrators?)
1b) If yes, what should the import format be?



--
You received this message because you are subscribed to the Google Groups "Fedora Community" group.





David Wilcox

Mar 2, 2015, 9:35:41 AM
to islandora, Fedora Community, Hydra Community
Hi everyone,

Thanks to all who participated in the Doodle poll. The best time for most participants is Thursday, March 5 at 3pm ET, so I will schedule the meeting for that date/time. I will follow up shortly with an agenda, including call-in information.

Regards,

David

-- 
David Wilcox
Fedora Product Manager
DuraSpace
Skype Name: david.wilcox82

David Wilcox

Mar 3, 2015, 2:06:23 PM
to Fedora Community, Hydra Community, islandora
Hi everyone,

I have created an agenda page for Thursday’s 3pm ET meeting here: https://wiki.duraspace.org/display/FF/2015-03-05+-+Audit+Service+Planning+Meeting. You will find the call-in information on that page. Feel free to add to the agenda or respond to this thread.

Regards,

David

-- 
David Wilcox
Fedora Product Manager
DuraSpace
Skype Name: david.wilcox82

David Wilcox

Mar 6, 2015, 12:21:58 PM
to Fedora Community, islandora, hydra-c...@googlegroups.com, fedora-...@googlegroups.com
Hi everyone,

The notes from yesterday’s audit service planning meeting have been posted to the wiki [1]. Thanks to everyone who attended. As indicated in the notes, there are a number of outstanding actions that need to be completed:

- Review events and event agents [2] to make sure they cover your needs. Edit as required. 
- Establish consensus on combining PROV-O with PREMIS [3], using them separately, or using a different ontology within the audit service.
- List, in plain English, queries [4] that must be supported by the audit service.
- Participate in the mailing list discussion about what the REST-API [5] will support.
- Leave any final comments on the unresolved questions [6] before they are resolved.

These actions should be completed before our next call (which needs to be scheduled sometime in the week of March 16th). If you would like to attend, please indicate your availability by filling out this poll: http://doodle.com/3ep4ff68c8dx4qa3.

On that call we will discuss the results and finalize the functional requirements. This will allow us to create JIRA tickets in time for the implementation sprints, which will begin on March 30 [7].

Regards,

David


Andrew Woods

Mar 6, 2015, 8:30:27 PM
to Robert Sanderson, Hydra Community, Fedora Community, islandora, fedora-...@googlegroups.com
Ben,
Your #1 mentions the potential of creating audit-LDPCs automatically for objects (aka containers). Would you envision audit entries for datastreams (aka Non-RDF sources) being maintained within these container-level audit-LDPCs? We would also want to consider nested containers.

As a note: as of the 4.0.0 release, Fedora does not have a query REST API.
Andrew

On Fri, Mar 6, 2015 at 2:01 PM, Robert Sanderson <azar...@gmail.com> wrote:

+1 to round, reused wheels.

Rob

On Fri, Mar 6, 2015 at 10:59 AM, Benjamin Armintor <armi...@gmail.com> wrote:
How is this service different from using a LDPC to manage the audit events as resources, and using the existing REST API to query and manage them? 
I'd like to see a proposal in which FCR4 advertises the memberRelation for audits in the way it advertises for membership, and most of the API is no different than querying other nodes.
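For readers less familiar with LDP, here is a minimal sketch of what a container of audit events for a single resource could look like, written as a small helper that emits Turtle. The LDP terms (`ldp:DirectContainer`, `ldp:membershipResource`, `ldp:hasMemberRelation`) come from the W3C LDP vocabulary; the resource URI and the member relation predicate passed in are placeholders, not anything FCR4 defines.

```python
LDP = "http://www.w3.org/ns/ldp#"


def audit_container_turtle(resource_uri, member_relation):
    """Build the Turtle body for a direct container holding a resource's audit events."""
    return (
        f"@prefix ldp: <{LDP}> .\n"
        f"<> a ldp:DirectContainer ;\n"
        f"   ldp:membershipResource <{resource_uri}> ;\n"
        f"   ldp:hasMemberRelation <{member_relation}> .\n"
    )
```

Each audit event created inside such a container would then be linked from the audited resource through the chosen predicate, and could be read with the same requests used for any other container.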

- Ben

--
You received this message because you are subscribed to the Google Groups "hydra-community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hydra-communi...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.




--
Rob Sanderson
Information Standards Advocate
Digital Library Systems and Services
Stanford, CA 94305


Andrew Woods

Mar 11, 2015, 8:12:03 AM
to Arif Shaon, David Wilcox, Fedora Community, islandora, hydra-c...@googlegroups.com, fedora-...@googlegroups.com
Thank you, Arif.

For All: If you have an interest in Fedora's Audit Service, please review the initial set of queries Arif has proposed it support:

Additionally, we will be having the next Audit Service meeting next week. Please indicate your availability on the poll by this Friday so we can get the schedule information out.

Regards,
Andrew

On Tue, Mar 10, 2015 at 8:08 PM, Arif Shaon <a.s...@unsw.edu.au> wrote:

Hi All,

I have updated the “Audit Service Query” page based on our discussion last week.  Please review and provide feedback.

I have added an example section for each of the queries/use cases to provide “practical” examples of the corresponding query. It would be good if we have examples from at least two different institutions. Can I ask for volunteers?

I have also listed a couple of potential use cases, i.e. adding and deleting custom/external events, which are currently listed as “Unresolved Questions”.

Also, please do feel free to either add directly to page or email any other types of queries/use cases that should be considered for implementation.

Look forward to your response.

Regards
Arif

Dr Arif Shaon
Lead Technical Officer, Library Repository Services, UNSW Library
Phone +61 2 9385 3088 | Fax +61 2 9385 8002 | a.s...@unsw.edu.au| http://www.library.unsw.edu.au
UNSW Library | UNSW Australia | UNSW Sydney NSW 2052 AUSTRALIA | CRICOS Provider Code: 00098G


Andrew Woods

Mar 13, 2015, 9:36:20 AM
to islandora, Fedora Community, hydra-c...@googlegroups.com, fedora-...@googlegroups.com
Hello All,
Thank you for the questions and comments in this thread and for participating in the Doodle poll [1]. The next meeting [2] will take place on Thursday, March 19th @3pm ET [3].
Please plan on discussing and building consensus around the Audit Service's functional and non-functional requirements.

Regards,
Andrew

On Wed, Mar 11, 2015 at 10:24 AM, James, Eric <eric....@yale.edu> wrote:
Looking at the Audit Service Query wiki page Arif revised (https://wiki.duraspace.org/display/FF/Audit+Service+Queries), and responding on these lists rather than on the wiki as the wiki is subject to change.  Questions/comments:

1) Clarification question, do these 2 examples (resource or repository events) correspond to the examples on the Audit Service REST API https://wiki.duraspace.org/display/FF/Audit+Service+REST+API page?

2) Fedora 4 does not have a search API.  Would implementing this require creating one, or using an external triplestore?

3) Beyond date, type, and user, I think there could be a use case for more granular search (ex: what upstream system an object is captured from, the format types involved in derivative creation)

4) If this does involve some kind of search, would this search have to necessarily pertain to audit? Why not just enable full graph querying?
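As a rough picture of the kind of query discussed in points 3 and 4, here is a minimal sketch that filters audit events by date range plus any other property, rather than by a fixed set of fields. The event dictionaries, the `upstream` property, and the `query` function are all made up for illustration; this says nothing about how Fedora or a triplestore would actually expose search.

```python
from datetime import date

# Hypothetical audit events; "upstream" stands in for a more granular property.
EVENTS = [
    {"resource": "/objects/a1", "type": "create", "user": "alice",
     "date": date(2015, 3, 1), "upstream": "vendor-drive"},
    {"resource": "/objects/a1", "type": "modify", "user": "bob",
     "date": date(2015, 3, 5), "upstream": "curation-system"},
    {"resource": "/objects/b2", "type": "create", "user": "alice",
     "date": date(2015, 3, 9), "upstream": "curation-system"},
]


def query(events, since=None, until=None, **criteria):
    """Return events inside the date range whose properties match all criteria."""
    result = []
    for event in events:
        if since is not None and event["date"] < since:
            continue
        if until is not None and event["date"] > until:
            continue
        # Any property can be a criterion, not only type and user.
        if all(event.get(key) == value for key, value in criteria.items()):
            result.append(event)
    return result
```

For example, `query(EVENTS, type="create", user="alice")` selects by type and user, while `query(EVENTS, upstream="curation-system")` uses a property outside the date/type/user set.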

Also regarding Ben's question "What does the internal/external distinction offer us that identifying an Agent does not?"

I agree, the internal/external distinction doesn't seem too large.  1) An external event would be uploaded, while an internal event for the most part would be triggered. 2) At the agent level an internal agent might be the repository and any authenticated user, while the external agent might be a system/user for the external event if that information is available.

-Eric

From: fedora-c...@googlegroups.com [fedora-c...@googlegroups.com] on behalf of Benjamin Armintor [armi...@gmail.com]
Sent: Wednesday, March 11, 2015 9:11 AM
To: hydra-c...@googlegroups.com
Cc: Arif Shaon; David Wilcox; Fedora Community; islandora; fedora-...@googlegroups.com
Subject: Re: [hydra-community] Re: [fedora-community] Fedora 4 Audit Service Planning

I'm looking over the event types, and I have some questions:
1. What does the internal/external distinction offer us that identifying an Agent does not?
2. What do the two modification event types offer us besides restating the rdf:type of the object related to the event?
3. How does derivative creation differ from a creation/modification event for the derivative?

- Ben


Andrew Woods

Mar 16, 2015, 6:29:22 PM
to James, Eric, Benjamin Armintor, hydra-c...@googlegroups.com, Arif Shaon, David Wilcox, Fedora Community, islandora, fedora-...@googlegroups.com
Hello Eric,
My responses are inline, although I expect some of your questions are directed to Arif...

On Wed, Mar 11, 2015 at 10:24 AM, James, Eric <eric....@yale.edu> wrote:
Looking at the Audit Service Query wiki page Arif revised (https://wiki.duraspace.org/display/FF/Audit+Service+Queries), and responding on these lists rather than on the wiki as the wiki is subject to change.  Questions/comments:

1) Clarification question, do these 2 examples (resource or repository events) correspond to the examples on the Audit Service REST API https://wiki.duraspace.org/display/FF/Audit+Service+REST+API page?
The strawdog REST-API addresses the 2 examples in the "Audit Service Queries", although the REST-API also defines "single audit entry" interactions.

2) Fedora 4 does not have a search API.  Would implementing this require creating one, or using an external triplestore?
Yes. 

3) Beyond date, type, and user, I think there could be a use case for more granular search (ex: what upstream system an object is captured from, the format types involved in derivative creation)
Please add your more detailed queries to the "Audit Service Queries" wiki page. We want to establish as comprehensive a list as possible to inform the pending design. 

4) If this does involve some kind of search, would this search have to necessarily pertain to audit? Why not just enable full graph querying?
Are you suggesting having audit entries in the same external triplestore that optionally holds repository resource triples? Can you elaborate on "enable full graph querying"?

Also regarding Ben's question "What does the internal/external distinction offer us that identifying an Agent does not?"

I agree, the internal/external distinction doesn't seem too large.  1) An external event would be uploaded, while an internal event for the most part would be triggered. 2) At the agent level an internal agent might be the repository and any authenticated user, while the external agent might be a system/user for the external event if that information is available.

-Eric
Regards,
Andrew 

From: fedora-c...@googlegroups.com [fedora-c...@googlegroups.com] on behalf of Benjamin Armintor [armi...@gmail.com]
Sent: Wednesday, March 11, 2015 9:11 AM
To: hydra-c...@googlegroups.com
Cc: Arif Shaon; David Wilcox; Fedora Community; islandora; fedora-...@googlegroups.com
Subject: Re: [hydra-community] Re: [fedora-community] Fedora 4 Audit Service Planning

I'm looking over the event types, and I have some questions:
1. What does the internal/external distinction offer us that identifying an Agent does not?
2. What do the two modification event types offer us besides restating the rdf:type of the object related to the event?
3. How does derivative creation differ from a creation/modification event for the derivative?

- Ben
On Wed, Mar 11, 2015 at 8:12 AM, Andrew Woods <awo...@duraspace.org> wrote:

Andrew Woods

Mar 16, 2015, 6:31:40 PM
to Benjamin Armintor, Hydra Community, Robert Sanderson, Fedora Community, islandora, fedora-...@googlegroups.com
+1 to:
"It would be a great service to this community if we designed FCR4 in a way that allowed the ontology, usage patterns, and middleware to be repurposed- in as transparent a way as possible- for other LDP containers." --- especially if you mean "...other LDP <implementations>"!
Andrew

On Wed, Mar 11, 2015 at 8:51 AM, Benjamin Armintor <armi...@gmail.com> wrote:
The lack of a query API doesn't bother me here, for two reasons:
1. Of the two REST APIs described, one is specific to a resource, and thus really for a particular description
2. I think the other, for all events, would be better served by defining an actual query API, or being built against a triplestore

Obviously this is only my opinion, but I think everything we do around FCR4 needs to be considered as:
* an implementation of LDP and optional extensions (API)
* a model for repository content in LDP
* a set of practices/patterns for interacting with LDP
* separate middleware that combines 2 or more of the above

It would be a great service to this community if we designed FCR4 in a way that allowed the ontology, usage patterns, and middleware to be repurposed- in as transparent a way as possible- for other LDP containers.

- Ben
You received this message because you are subscribed to the Google Groups "Fedora Leaders" group.
To unsubscribe from this group and stop receiving emails from it, send an email to fedora-leader...@googlegroups.com.

Andrew Woods

Mar 16, 2015, 6:55:24 PM
to Benjamin Armintor, Hydra Community, Arif Shaon, David Wilcox, Fedora Community, islandora, fedora-...@googlegroups.com
Hello Ben,
I believe questions 2) and 3) are best answered by Esmé. As for 1), I find the internal/external distinction useful for clarifying which events should be captured by F4, itself, versus events that are simply imported. I was not imagining that there would be a field, per se, in the audit log indicating internal/external.
Thanks,
Andrew

On Wed, Mar 11, 2015 at 9:11 AM, Benjamin Armintor <armi...@gmail.com> wrote:
I'm looking over the event types, and I have some questions:
1. What does the internal/external distinction offer us that identifying an Agent does not?
2. What do the two modification event types offer us besides restating the rdf:type of the object related to the event?
3. How does derivative creation differ from a creation/modification event for the derivative?

- Ben
On Wed, Mar 11, 2015 at 8:12 AM, Andrew Woods <awo...@duraspace.org> wrote: