Re: REVISED TAXONOMY FOR LTDP for SNIA's CLOUD ARCHIVE SIG


LTDPRM

Jul 15, 2011, 5:44:24 PM
to Sebastian Zangaro, Cloud Archive SIG, ltd...@googlegroups.com
All

Suggested improvements are below. The more I think about this, the more I want us to clearly distinguish among an archive, digital preservation, and a digital preservation service. I had to add the term Digital Preservation Service to do this, though. By separating the archive from preservation, and the preservation service from preservation itself, we now end up with:
 
a] Digital Archive - infrastructure (a storage repository with special capabilities but not exclusively focused on long-term or digital preservation)

b] Digital Preservation - keeping information reliable and authentic over time for information consumers; it is not about how you did it. Information is preserved when I can reuse it and know it is authentic 100 years from now. This has nothing to do with how it was preserved, just that it was, and that is all that counts. Unlike the typical usage pointed out in Priscilla Caplan's note today, preservation is not about preventing bit rot or dealing with format conversion; those are only a few of the many practice details that obscure the goal.

c] Digital preservation service, DPS - the management, curation, and infrastructure required to achieve digital preservation.  An archive is a component of a DPS, necessary but not sufficient. 

Revised Proposed Definitions:

  • Cloud Archive Service: (n)  [Long-Term Digital Retention and Preservation]
    • A cloud-based service providing a specialized online storage repository for purposes of compliance, litigation support, and/or retention for extended periods of time including long-term. An archive is used as infrastructure to support digital preservation, and thus is a component of a complete digital preservation service. 

  •  Digital Archive: (n)   [Long-Term Digital Retention and Preservation]
    • A specialized storage repository with supporting data and storage services used to secure, retain, and protect digital information and data  for extended periods of time including long-term.    

  • Digital Preservation:  (n)  [Long-Term Digital Retention and Preservation]   
    • A digital object is preserved when information-consumers can access, examine, reuse, and interpret digital information and verify it as authentic over any period of time including long-term.  The goals of digital preservation are to keep any designated digital object accessible, interpretable, secure, reliable, and authentic over time.   (see "preservation object")

  • Digital Preservation Service: (n) [Long-Term Digital Retention and Preservation]
    • A service providing digital preservation of information and data. A digital preservation service is a comprehensive management and curation function that controls its supporting infrastructure, information, data, and storage services in accordance with the requirements of the information objects it manages to accomplish the goals of digital preservation.  


  • Digital Auditing: (n)  [Long Term Digital Retention and Preservation]
    • A methodology to assure the long-term maintenance of the accessibility, protection, and authenticity of digital objects held in a digital archive using rigorous cryptographic techniques. Digital auditing is a process of routine periodic testing of stored digital objects, comparing their previous digital signature and secure time stamp with their current values to verify that change, loss of access, or data loss has not occurred. Digital audit methods must force the service to actually compute a new hash value each time an audit is requested, to overcome security vulnerabilities such as a service answering from a previously stored hash.


  •   Preservation Object: (n)   [Long Term Digital Retention and Preservation]
    • A digital information object consisting of indexes, fixity, audit logs, data files, reference information, and metadata wrapped into a single digital container.  A preservation object provides the functionality required to assure the future ability to use, secure, interpret, and verify authenticity of the information or data in the container and is the foundational element for digital preservation of information and data. 
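To make the Preservation Object definition above more concrete, here is a minimal Python sketch that wraps data files, an index, fixity values, metadata, reference information, and an audit log into a single ZIP container. The layout (a `data/` folder plus a `manifest.json`) and the function names `build_preservation_object` and `verify_preservation_object` are hypothetical illustrations, not a SIG-defined or standard format; real packaging formats such as BagIt (RFC 8493) differ in detail.

```python
import hashlib
import io
import json
import zipfile
from datetime import datetime, timezone


def build_preservation_object(files: dict[str, bytes], metadata: dict,
                              reference_info: dict) -> bytes:
    """Wrap data files and their supporting components into one ZIP container.

    The container layout (data/ folder + manifest.json) is a hypothetical
    illustration, not a standard preservation-object format.
    """
    manifest = {
        "index": sorted(files),  # which data files the container holds
        "fixity": {name: hashlib.sha256(body).hexdigest()  # SHA-256 per file
                   for name, body in files.items()},
        "metadata": metadata,  # descriptive / provenance metadata
        "reference_information": reference_info,  # e.g. format identifiers
        "audit_log": [{"event": "created",
                       "at": datetime.now(timezone.utc).isoformat()}],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, body in files.items():
            zf.writestr("data/" + name, body)
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
    return buffer.getvalue()


def verify_preservation_object(container: bytes) -> list[str]:
    """Recompute each data file's SHA-256 and list files whose fixity no longer matches."""
    failures = []
    with zipfile.ZipFile(io.BytesIO(container)) as zf:
        manifest = json.loads(zf.read("manifest.json"))
        for name in manifest["index"]:
            actual = hashlib.sha256(zf.read("data/" + name)).hexdigest()
            if actual != manifest["fixity"][name]:
                failures.append(name)
    return failures
```

For an untouched container, `verify_preservation_object` returns an empty list. Because the fixity values travel inside the container, any holder of the object can re-verify it without consulting the repository it came from.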
 

Please comment. 


Best regards,
Michael Peterson


(805)201-3178  |  mpet...@ltdprm.com  |  www.ltdprm.org  |  www.ilm20.org

On Jul 15, 2011, at 7:43 AM, Sebastian Zangaro wrote:

Submitter's message
Updated Taxonomy document with the following definitions:


1. Cloud Archive Service (n.) [Long Term Digital Retention and Preservation]
A service that provides digital preservation or long-term retention offered over a WAN.
 A cloud-based archive providing a service for digital preservation and/or long-term retention.

2. Archive (n., also archives) [Long Term Digital Retention and Preservation]
 An archive is a specialized repository used to retain digital information and data.

3. Digital Preservation (n.) [Long Term Digital Retention and Preservation]
 Services that provide curation, protection, and assurance of the ability of information-consumers to access, reuse, and interpret digital information objects as authentic over any period of time including long-term.
The ongoing management and orchestration of comprehensive practices and services that enable information-consumers to access, examine, reuse, and interpret digital information objects and to verify them as authentic over any period of time including long-term.

4. Digital Auditing: [Long Term Digital Retention and Preservation]
 A methodology to assure the long-term maintenance of the accessibility, protection, and authenticity of digital objects held in a digital archive using rigorous cryptographic techniques. Digital auditing is a process of routine periodic testing of stored digital objects, comparing their previous digital signature and secure time stamp with their current values to verify that change, loss of access, or data loss has not occurred. Digital audit methods must force the service to actually compute a new hash value each time an audit is requested, to overcome security vulnerabilities.

5. Preservation Object [Long Term Digital Retention and Preservation]
 A digital container (a digital information object) that provides functionality required to assure the future ability to use, interpret, and verify authenticity of the information or data in the container.




-- Mr. Sebastian Zangaro
Document Name: Cloud Archive SIG Taxonomy Terms for SNIA Dictionary

Description
This document contains the initial terms being proposed to be added to the
SNIA dictionary.

Submitter: Mr. Sebastian Zangaro
Group: Cloud Archive SIG
Folder: Charter, Business Plan and Formation Document
Date submitted: 2011-07-15 07:43:26
Revision: 1
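The Digital Auditing definition in the submission above requires that each audit force the service to compute a fresh hash value. One common way to get that property is a challenge-response check: the auditor sends an unpredictable nonce, and the service must hash the nonce together with the stored bytes, so an answer cached from an earlier audit is useless. The minimal Python sketch below assumes SHA-256 and a set of one-time challenges prepared by the auditor at ingest; the classes `ArchiveService` and `Auditor` are hypothetical, and this is one illustrative scheme, not the method the SIG specifies.

```python
import hashlib
import hmac
import secrets


def challenge_digest(nonce: bytes, data: bytes) -> str:
    """SHA-256 over nonce || data; a fresh nonce makes any cached digest useless."""
    return hashlib.sha256(nonce + data).hexdigest()


class ArchiveService:
    """Hypothetical storage service that keeps object bytes by identifier."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def store(self, object_id: str, data: bytes) -> None:
        self._objects[object_id] = data

    def respond(self, object_id: str, nonce: bytes) -> str:
        # The nonce is unknown until now, so the service must re-read the stored bytes.
        return challenge_digest(nonce, self._objects[object_id])


class Auditor:
    """Hypothetical auditor holding one-time challenges prepared at ingest time."""

    def __init__(self) -> None:
        self._challenges: dict[str, list[tuple[bytes, str]]] = {}

    def prepare(self, object_id: str, data: bytes, count: int = 10) -> None:
        # Computed while the trusted original is in hand; afterwards only the
        # nonces and expected answers are kept, not the object itself.
        self._challenges[object_id] = [
            (nonce, challenge_digest(nonce, data))
            for nonce in (secrets.token_bytes(32) for _ in range(count))
        ]

    def audit(self, service: ArchiveService, object_id: str) -> bool:
        pending = self._challenges[object_id]
        if not pending:
            raise RuntimeError(f"no unused challenges left for {object_id}")
        nonce, expected = pending.pop()  # each nonce is revealed and used once
        answer = service.respond(object_id, nonce)
        return hmac.compare_digest(answer, expected)  # constant-time comparison
```

The design trade-off is that the number of audits is bounded by `count`; more elaborate proof-of-storage protocols remove that limit at the cost of more machinery.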


LTDPRM

Jul 15, 2011, 8:08:25 PM
to Henry Gladney, Cloud Archive SIG, ltd...@googlegroups.com, John Swinden, Peter Farwell
Henry

Thanks for chipping in.  Since posting this I received a note from Bob with a similar thought. So, here is an additional round of edits.

1] I wrote the intro to frame the thinking, and I now see that the "necessary but not sufficient" afterthought was not appropriate. I agree that an archive is not a necessary component of a DPS.

2] Using the term 'archive' to denote an organization is an artifact of OAIS and, as you say, is in common use. To me, that is now all the more reason to separate our definitions from that usage. You also challenged me to articulate some new thinking that we can use to stimulate this conversation, and here is one idea: push an 'archive' into the corner as a repository, and decouple digital preservation from being tightly coupled to or dependent upon an archive. With the exception of the library, government, and cultural heritage communities, I think it is safe to say that the rest of the world is not connecting the two. Even the traditional community speaks out of both sides of its mouth: most definitions of preservation make no reference to the infrastructure, but instead focus on the information. These terms are confused, and we can bring clarity to the table.

3] Digital auditing: Richard Pearce-Moses pointed this out as well. "Digital auditing" has been a well-known storage term for over 15 years, and we are defining a data service that should not be confused with a financial audit. Hence the "-ing" added to the term to help differentiate it. I'm not inclined to let go of this term yet.

4] Preservation Object: another ISO term from the OAIS days of AIP, SIP, and DIP. It is also much maligned and poorly defined across the industry, so it is an opportunity for this effort, as it ties so nicely to the goals of digital preservation. I think we need to separate the term from the quality measures. Adding "trustworthy" connotes a special quality, and since there are already 5+ preservation object software forms on the market, let's deal with their quality separately. This leaves room for differentiation, which is good.

I see that the comparison-to-paper idea has surfaced, and I think explanation is separate from definition. May we push that into another piece?

On the question of representation: yes, we are also caught in semantics, and you found it, thanks. In a digital IT context, digital information is an "object" by definition, as it is composed of the data plus its metadata contained in some digital wrapper, often called a file or, as a special case, a Preservation Object. So, unlike OAIS, which says an information object can be either physical or digital, let's focus on the digital and, to your point, add the word "digital" before all of these terms.

Information Object: A Data Object together with its Representation Information. An Information Object is composed of a Data Object that is either physical or digital, and the Representation Information that allows for the full interpretation of the data into meaningful information.
o Data Object: Either a Physical Object or a Digital Object.
o Digital Object: An object composed of a set of bit sequences. (Source: OAIS)

Oh, and as I looked up references today, I found an academic paper on these topics, "Defining a Preservation Object." It even uses your work as a reference, so we found another one (see attached). However, I will note that the authors threw up their hands and adopted a philosophical metaphor to define information and the preservation object, so in the end, in my opinion, they added no value to the conversation.


I do want to hear people's thoughts on the concepts espoused here. The separation of archive from digital information preservation is not new, but it is poorly understood. The attached paper by Quisbert points this out well. It clearly states:

Abstract: In this paper, we discuss long-term digital preservation from an information perspective, rather than the predominant approaches; the Archival and the Technocratic Approach. Information lives longer than people, tools (software) and organizations live. The Information Continuum Model provides support for this standpoint. However, we find that there exists no concept to support practical action in preservation from the information perspective. Existing concepts as information object, digital object, preservation object, electronic record, information package and significant properties are context dependent and focus on the object to be preserved, rather than preservation of information. Consequently, they are not suitable for realizing the information perspective in long-term digital preservation. The concept of Digital Information Preservation Object is therefore introduced and a tentative definition of the concept is presented.

And, as Henry is declaring loud and clear, the focus on what Quisbert calls the "Archival and Technocratic Approach" has failed to produce any measurable progress over the past 10 years.

I'll keep updating these definitions as comments come in.  Please keep the googlegroups reflector on the distribution list. 

Best regards,
Michael Peterson

Quisbert_Towards-Def-Preservation-Object.pdf

MPeterson

Jul 15, 2011, 8:17:16 PM
to LTDPRM
All,
There is a parallel discussion underway in the digital curation group
on this thread. Take a look and note the differences in approach
and viewpoint. We do have a number of semantic challenges in this
domain.

http://groups.google.com/group/digital-curation/browse_thread/thread/7a1a699194fd5c6b?hl=en

LTDPRM

Jul 15, 2011, 10:02:51 PM
to Charles Dollar, Cloud Archive SIG, ltd...@googlegroups.com
Charles
Thanks. I actually have, and Richard jumped into this discussion a week ago; I looked at his thinking and the SAA glossary again and was not happy with what I saw relative to our goals. (And I don't claim that what I've put out so far is right; it is clearly a work in progress and will be further refined and normalized as we think through all the implications.) So you raise a good question. Let me think through what we're doing with the approach I'm pursuing.

I believe that all we're really doing is trying to be a lot more precise than previous taxonomies: drawing the distinction between what digital preservation is all about (the information) and separating it from the generic term 'archive', then differentiating the infrastructure with the term "digital archive", and then putting it into a services context, which is an important way to look at complex systems, especially in the age of cloud-based services. As you know, and as we all fight, these terms have been and are used in so many different ways that we have to spend some time talking about them together to unify our thinking, and I find that putting things down where we can review them helps us make progress. I also find context and precision useful for this purpose. For example, we defined the context narrowly as long-term digital preservation, and then, by treating "digital archive" as different from "archive", we gain a useful distinction. An archive is a building, an organization, a service, a place, a box, a shelf, a repository; the word is used in so many ways that have nothing to do with digital preservation. I found that 'digital archive' accommodates all the ways that IT communities and vendors use the term, independent of digital preservation, and by separating infrastructure from preservation, we no longer have a conflict to fight. With that separation, definitions and architectures can align around a common set of semantics. Will 'archivists' accept the narrower interpretation for this use case? Can we get away with it if it is useful? I submit that, at worst, we have a story to tell.

Seeing the JISC definition of digital preservation in Priscilla Caplan's comments today stimulated me, and the discussion in the Quisbert paper, in which he declares that the "Archival and Technocratic Approach" has failed to produce any measurable progress over the past 10 years because it failed to focus on preservation as an information problem, stimulated me further to try to tie these disparate pieces together, and here we are. I should now draw a mind map showing how all these associated pieces fit together.

Stepping back and taking a page from Henry, we need a story that gets people talking about these topics. This approach charts a path through the maze. It is not new or different; it simply focuses on information rather than infrastructure, and it will let us frame and talk about architectures, practices, methods, etc. with an interesting and profound voice. I think we need to keep exploring it.

All of this ties to the discussion of 'audit', TDRs, architectures, and requirements. If we're to add value to the industry, we have to define and agree on the goals and on the framework for the approaches we take. This simple taxonomy work could be profoundly important.

I look forward to exploring this further with you.


Best regards,
Michael Peterson



On Jul 15, 2011, at 6:00 PM, Charles Dollar wrote:

Michael,
 
Thanks for your thoughts about how to articulate a taxonomy for digital preservation.
 
My first impression is that your categories (digital archives, digital preservation, and archives, among others) may not resonate with archivists. Have you looked at the glossary that Richard Pearce-Moses published several years ago?
 
Many thanks.
 
Charles
 

MPeterson

Jul 15, 2011, 10:12:03 PM
to LTDPRM
7/15 REVISIONS BASED ON TODAY'S FEEDBACK


Cloud Digital Archive Service: (n)  [Long-Term Digital Retention and
Preservation]
A cloud-based service providing a specialized online storage
repository for purposes of compliance, litigation support, and/or
retention for extended periods of time including long-term. A digital
archive can be an infrastructure component of a complete digital
preservation service. 

Digital Archive: (n)   [Long-Term Digital Retention and Preservation]
A specialized storage repository with supporting data and storage
services used to secure, retain, and protect digital information and
data for extended periods of time including long-term.   An archive
can be a component of a digital preservation service, but is not
sufficient by itself to accomplish digital preservation.

Digital Preservation Object: (n)   [Long Term Digital Retention and
Preservation]
A special type of digital information object consisting of indexes,
fixity, audit logs, data files, reference information, and metadata
wrapped into a single or compound digital container.  A preservation
object provides the functionality required to assure the future
ability to use, secure, interpret, and verify authenticity of the
metadata, information, and data in the container and is the

MPeterson

Jul 19, 2011, 5:55:32 PM
to LTDPRM
7/20/11 UPDATES TO THE PROPOSED DEFINITIONS

I have included a taxonomy and the ISO 14721 definition of "Archive",
which I also suggest we advance, since we need to revise the SNIA
dictionary entry for this term as well and replace the old thinking in
the dictionary.

What I like best about this approach, as illustrated in the
'taxonomy' [ http://groups.google.com/group/ltdprm/t/a2e7a7124270ee8b
], is that we have finally identified a way to overcome the confusion over
the term archive by simply separating it into its component pieces:

"An archive" - the organization and structure to retain physical and
digital information and data - let's accept the ISO definition and
concept
"physical archive" - the service providing retention of physical
artifacts
"digital archive" - the service providing retention of digital
information and data - but not preservation (Most archives do not
provide digital preservation so this is appropriate and accommodates
all IT uses of the term as well.)
"digital preservation" - defined in an information context, not an
infrastructure, services, or obstacle context. Note that preservation
does not require, and is fully independent of, a digital archive ("no
dependencies").
"Digital Preservation Service" - the service (not a system) providing
all necessary functions to assure digital preservation of digital
information and data over the lifecycle of the managed digital
objects.

I hope this helps.

Michael Peterson


Archive: (n) [Retention and Preservation]
An organization of people and systems that have accepted the
responsibility to (protect, retain, and) preserve information and data
and make it available for a Designated Community. (Source: ISO 14721)

Cloud Digital Archive Service: (n)  [Long-Term Digital Retention and
Preservation]
A cloud-based service providing a specialized online storage
repository for purposes of compliance, litigation support, and/or
retention for extended periods of time, not including “long-term.” A
cloud digital archive service can be utilized as a component of a
complete digital preservation service, but does not provide adequate
services to accomplish digital preservation. 

Digital Archive: (n)   [Long-Term Digital Retention and Preservation]
A specialized storage repository or service with supporting data and
storage services used to secure, retain, and protect digital
information and data for extended periods of time, not including “long-
term.”   A digital archive can be an infrastructure component of a
complete digital preservation service, but is not sufficient by itself
to accomplish digital preservation.

Digital Preservation:  (n)  [Long-Term Digital Retention and
Preservation]   
A digital object is preserved when information-consumers can access,
examine, reuse, and interpret digital information and verify it as
authentic over any period of time including long-term.  The goals of
digital preservation are to keep any designated digital object
accessible, interpretable, secure, reliable, and authentic over its
lifetime.   (see "preservation object")

Digital Preservation Service: (n) [Long-Term Digital Retention and
Preservation]
A service providing digital preservation of information and data. A
digital preservation service includes a comprehensive management and
curation function that controls its supporting infrastructure,
information, data, and storage services in accordance with the
requirements of the information objects it manages to accomplish the
goals of digital preservation.  

Digital Auditing: (n)  [Long Term Digital Retention and Preservation]
A methodology to assure the long-term maintenance of the
accessibility, protection, and authenticity of digital objects held in
a digital archive or digital preservation service using rigorous
cryptographic techniques. Digital auditing is a process of routine
periodic testing of stored digital objects, comparing their previous
digital signature and secure time stamp with their current values to
verify that change, loss of access, or data loss has not occurred.
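The 7/20 wording of Digital Auditing describes a routine periodic pass that compares each object's earlier fixity record with a freshly computed value and flags change, loss of access, or data loss. A minimal Python sketch of one such pass follows, assuming SHA-256 digests and a caller-supplied `read` function; `FixityRecord`, `record_fixity`, and `audit_pass` are hypothetical names. It records only local UTC time and a plain hash: a production service would instead obtain a trusted time stamp (for example under RFC 3161) and sign the record, which this sketch does not do.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass(frozen=True)
class FixityRecord:
    object_id: str
    sha256: str
    # Local UTC time only; not a trusted (e.g. RFC 3161) time stamp, and unsigned.
    recorded_at: str


def record_fixity(object_id: str, data: bytes) -> FixityRecord:
    """Capture the baseline digest for an object at ingest time."""
    return FixityRecord(object_id, hashlib.sha256(data).hexdigest(),
                        datetime.now(timezone.utc).isoformat())


def audit_pass(baseline: list[FixityRecord],
               read: Callable[[str], bytes]) -> dict[str, str]:
    """Run one periodic audit: recompute every digest and compare with its baseline."""
    results: dict[str, str] = {}
    for rec in baseline:
        try:
            data = read(rec.object_id)
        except (KeyError, OSError):
            # Object cannot be read: loss of access or data loss.
            results[rec.object_id] = "unreadable"
            continue
        current = hashlib.sha256(data).hexdigest()
        results[rec.object_id] = "intact" if current == rec.sha256 else "changed"
    return results
```

For example, with objects held in a dict, passing `store.__getitem__` as `read` makes a deleted entry raise `KeyError`, which the pass reports as "unreadable", while an overwritten entry is reported as "changed".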

Digital Preservation Object: (n)   [Long Term Digital Retention and
Preservation]
A special type of digital information object consisting of indexes,
fixity, audit logs, data files, reference information, and metadata
wrapped into a single or compound digital container.  A preservation
object provides the functionality required to assure the future
ability to use, secure, interpret, and verify authenticity of the
metadata, information, and data in the container and is the

Michael Peterson

Jul 21, 2011, 12:34:47 AM
to LTDPRM
Update to "Digital Auditing": because of confusion with physical
audits, let's change the term to "Digital Object Auditing".


Digital Object Auditing: (n)  [Long Term Digital Retention and
Preservation]
A methodology to assure the long-term maintenance of the
accessibility, protection, and authenticity of digital objects held in
a digital archive or digital preservation service using rigorous
cryptographic techniques. Digital object auditing is a process of