For those of you who haven't seen it, the Science piece I wrote on accessible reproducible research discusses the need for two things: a system to capture the analysis automatically, and an easy way to embed it in the manuscript itself. "Accessible" in this case means usable by someone who doesn't program and never wants to.
I'm enclosing the relevant links for your amusement - there's a video that shows the doc in action.
This is GREAT - and exactly the type of thing that we were envisioning for the "Source Content" feature of PDF 2.0.
I'd personally love to see a companion plugin for Adobe Acrobat and/or Reader to enable...the Word plugin would embed the necessary information into the produced PDF which could be picked up in Acrobat/Reader and enable the same views, reruns, etc.
-----Original Message----- From: beyond-the-pdf@googlegroups.com [mailto:beyond-the-pdf@googlegroups.com] On Behalf Of Jill Mesirov Sent: Friday, November 12, 2010 6:12 PM To: beyond-the-pdf@googlegroups.com Subject: capturing workflows and embedding in word documents
Thanks, Jill. I'm really impressed that you're embedding interactivity in a way that's both easy for the author and seems to suit the science perfectly! Native Mac and Linux versions of this plugin would be interesting; I took a look but don't run Parallels or VMWare.
The ideas of Utopia are excellent, but their implementation isn't (IMO) the right approach. The PDF itself doesn't contain any of that rich information in a form that can be used/mined/extracted - instead, it appears to be sitting in one (or more) databases or data repositories online that Utopia is able to "magically" locate and then enable.
I'd prefer to see the same user experience (which is quite well done!) applied to a PDF with that type of rich semantics embedded...
Leonard
This "in the PDF" approach is all good if you are talking only about contributions / annotations of a single person, or about something that is both completely authoritative and public. But there is a strong use case for multiple, shareable perspectives - for example within a lab or collaboration. This is why Steve with Utopia and our MGH + NIF with Annotation Framework use a standoff metadata model, and why we (speaking for myself but I believe Steve is likely to be in agreement) advocate standardizing and opening the model of metadata, which can be done using a fairly simple ontology model.
IMO what you are advocating with the "baked in the PDF" approach is simply the addition of more and more metadata to the existing stuff already there. But what if this is not *fully authoritative* metadata? What if it is discussion, or comes from multiple sources, or contains contradictory views? You will end up with more and more bloat, among other negative results. Also, what about private metadata, i.e. notes?
If you have a PDF and I have a copy of the same PDF, and we make notes about the same content, we should be able to share - or not share - them freely without getting into all the mess of multiple file copies etc. If there is a group of ten people working on the same problem, they should be able to share equally. This should not require that they all have shared access to a single file copy.
But I also realize PDF has always been about a "self contained" model of information. If you disconnect from the Web, PDFs still work. So is there perhaps a way to implement this as a spectrum where the metadata can exist within or outside of the PDF? In fact, "outside the PDF" means on the Web, and there are various existing and emerging standards for how to do this.
I believe that if we were to achieve agreement on a model of annotation metadata that could exist in the same form within the PDF, or outside the PDF, or both, that would be ideal.
Also - ideally, when I open a PDF that contains annotation referencing some entity that is commonly studied or used outside the document itself - e.g. a protein, a database, a reagent, a computational tool, a workflow - my Web browser should just natively be able to connect to all other sources of information about that entity wherever they are on the Web, and use these connections to enhance the information I see without jumping all over the place. Annotation itself is, or at least should be, an independently sharable boundary object.
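As a rough illustration of the standoff model discussed above, here is a minimal Python sketch in which an annotation is a small record that points at a document rather than living inside it. Everything in it is an assumption made for illustration: the `Annotation` fields, the `selector` string format, and the use of a SHA-256 digest of the file bytes as the document identifier are hypothetical and are not taken from Utopia, the Annotation Framework, or any PDF standard.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Annotation:
    doc_id: str            # identifies the document; here a digest of the PDF bytes (assumption)
    selector: str          # hypothetical locator inside the document, e.g. "page=3;text=BRCA1"
    body: str              # the note itself
    author: str
    private: bool = False  # private notes are never exported for sharing


def fingerprint(pdf_bytes: bytes) -> str:
    """Derive a document ID that is the same for every identical copy of the file."""
    return hashlib.sha256(pdf_bytes).hexdigest()


def to_json(annotations, include_private=False) -> str:
    """Serialize annotations; the same text could be stored beside the PDF or on the Web."""
    shared = [asdict(a) for a in annotations if include_private or not a.private]
    return json.dumps(shared, sort_keys=True)


def from_json(text: str):
    """Rebuild annotation records from their serialized form."""
    return [Annotation(**item) for item in json.loads(text)]


def annotations_for(pdf_bytes: bytes, annotations):
    """Select the annotations that refer to this particular document."""
    doc = fingerprint(pdf_bytes)
    return [a for a in annotations if a.doc_id == doc]


if __name__ == "__main__":
    pdf = b"%PDF-1.7 stand-in bytes for a real file"
    notes = [
        Annotation(fingerprint(pdf), "page=1", "Compare with our assay", "tim"),
        Annotation(fingerprint(pdf), "page=2", "Note to self", "tim", private=True),
    ]
    exported = to_json(notes)  # only the public note is exported
    print(annotations_for(pdf, from_json(exported)))
```

Two people holding separate copies of the same file compute the same `doc_id`, so they can exchange the exported text without sharing a file. One limitation of this particular choice: a digest of the bytes changes as soon as anything is written into the PDF, so a scheme that also embeds annotations would need an identifier that survives re-saving.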
Best
Tim
On Nov 14, 2010, at 9:51 AM, Leonard Rosenthol wrote:
> The ideas of Utopia are excellent, but their implementation isn’t (IMO) the right approach. The PDF itself doesn’t contain any of that rich information, so that it can be used/mined/extracted – instead, it appears to be sitting in one (or more) databases or data repositories online that Utopia is able to “magically” locate and then enable.
> I’d prefer to see the same user experience (which is quite well done!) applied to a PDF with that type of rich semantics embedded…
> Leonard
> From: beyond-the-pdf@googlegroups.com [mailto:beyond-the-pdf@googlegroups.com] On Behalf Of Jodi Schneider > Sent: Sunday, November 14, 2010 5:19 AM > To: beyond-the-pdf@googlegroups.com > Subject: Re: capturing workflows and embedding in word documents
> Leonard, you might be interested in looking at the Utopia Documents PDF viewer and enhanced PDFs. I took a look yesterday which I wrote about here: > http://jodischneider.com/blog/2010/11/14/utopia-documents-pulling-sci... > There are 2 short screencasts of some of the interactive content.
> -Jodi
> On Sun, Nov 14, 2010 at 1:26 AM, Leonard Rosenthol <lrose...@adobe.com> wrote: > This is GREAT - and exactly the type of thing that we were envisioning for the "Source Content" feature of PDF 2.0.
> I'd personally love to see a companion plugin for Adobe Acrobat and/or Reader to enable...the Word plugin would embed the necessary information into the produced PDF which could be picked up in Acrobat/Reader and enable the same views, reruns, etc.
> Leonard
I agree about the Mac/Linux versions. Microsoft sponsored the implementation for Windows only, but did make the code open source and available. I think the implementation is very .NET-dependent, and we don't have that kind of expertise in house. We'd love to collaborate with Apple on a native Mac version - know anyone who might be interested? J
-- Jill P. Mesirov, Ph.D. Associate Director and Chief Informatics Officer Director, Computational Biology and Bioinformatics
Broad Institute of MIT and Harvard 7 Cambridge Center Cambridge MA 02142 phone: 617-714-7070 fax : 617-714-8991 email: mesi...@broad.mit.edu
There are definitely two different, but potentially related, items here...
1 - richer material included natively into the PDF at the time of publication
This is where the actual data (XML, XLS, etc.) would be "attached" to the visual table/graph/chart in the PDF, or the MathML or ChemML (or whatever) associated with a given equation or molecular structure, etc. This would enable the type of extended UI that Utopia presents on various elements in the PDF - which are indeed the types of things you want to be able to do, whether you are connected to the internet (or some subset thereof) or not.
2 - annotations, added after publication
At Adobe, because we believe that a PDF should be "self-contained", we've approached document collaboration via a "synchronization model". Everyone can work on their own copy of the document, and their comments are submitted (when they want) up to a "repository". At any time, each person can manually (or automatically) synch their comments with all others in the repository. This gives you the "best of both worlds", as you get individual copies of documents, private and public comments, collaboration on comments (replies, etc.) AND they can also live in the PDF itself for offline viewing/processing.
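To make the synchronization model above a little more concrete, here is a minimal Python sketch of one way such a sync step could work. It is an illustration only, not Adobe's implementation: the `Comment` fields, the integer `modified` counter, and the "newer version wins" rule are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    comment_id: str
    author: str
    text: str
    modified: int          # hypothetical logical timestamp; larger means newer
    private: bool = False  # private comments stay in the local copy


def merge(base: dict, incoming: dict) -> dict:
    """Union two comment sets keyed by ID, keeping the newer version of each comment."""
    merged = dict(base)
    for cid, comment in incoming.items():
        if cid not in merged or comment.modified > merged[cid].modified:
            merged[cid] = comment
    return merged


def synchronize(local: dict, repository: dict):
    """Submit public local comments, then pull the repository into the local copy.

    Returns (new_local, new_repository); the inputs are left unchanged.
    """
    public = {cid: c for cid, c in local.items() if not c.private}
    new_repository = merge(repository, public)
    new_local = merge(local, new_repository)
    return new_local, new_repository


if __name__ == "__main__":
    alice = {
        "a1": Comment("a1", "alice", "Axis label is missing units", 1),
        "a2": Comment("a2", "alice", "Reminder for myself", 1, private=True),
    }
    bob = {"b1": Comment("b1", "bob", "Agree with the reviewer", 2)}
    repo = {}
    alice, repo = synchronize(alice, repo)
    bob, repo = synchronize(bob, repo)
    alice, repo = synchronize(alice, repo)
    print(sorted(alice), sorted(bob), sorted(repo))
```

Each person's dictionary stands in for the comments stored in their own copy of the PDF, so the merged result is still available offline, while the private comment never reaches the repository.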
Which also takes us to another issue, and that's archiving (esp. long term archiving) - which is another reason that the above solutions also work well - in that when the document needs to be "archived off" (be it for personal use, organization use or submission to something like NARA or LOC - or even submission to the FDA) you already have all the necessary pieces.
Leonard
From: beyond-the-pdf@googlegroups.com [mailto:beyond-the-pdf@googlegroups.com] On Behalf Of Tim Clark Sent: Sunday, November 14, 2010 10:19 AM To: beyond-the-pdf@googlegroups.com Subject: Re: capturing workflows and embedding in word documents
> This "in the PDF" approach is all good if you are talking only about contributions / annotations of a single person, or about something that is both completely authoritative and public. But there is a strong use case for multiple, shareable perspectives - for example within a lab or collaboration. This is why Steve with Utopia and our MGH + NIF with Annotation Framework use a standoff metadata model, and why we (speaking for myself but I believe Steve is likely to be in agreement) advocate standardizing and opening the model of metadata, which can be done using a fairly simple ontology model.
Tim, we are in total agreement! [And Leonard, thanks for the positive comments about Utopia -- I'm sorry we disagree on the rest!]
I would be very happy indeed to see a PDF in which it is both possible (I believe much of it is already) and common practice (sadly, not currently very common) to add additional metadata ('baked in' as Tim nicely puts it). However, this metadata can only ever refer to either a) the article of record, or b) external data captured at that moment in time. For example, let's say you wished to refer from the PDF to a particular database entry: you have the choice of a) copying that database entry into the PDF in some form (which could be bloated, but would work offline and be reliable etc), or b) including a link to whatever the up-to-date version of the record may be online, and relying on this being resolved at 'read time', or c) both. If you are prepared to accept b) or c) as a sensible option, then why not use the same mechanism for accessing all the richer data / metadata associated with the article, since the infrastructure for doing b) or c) will be much the same (and reductio ad absurdum, the only thing you need to store in the PDF is a unique ID that allows the rest to be fetched at runtime).
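The a) / b) / c) choice above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Reference` record, the `fetch` callable, and the identifier "P12345" are hypothetical, and no real database service or PDF feature is being described.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Reference:
    record_id: str                  # the unique ID stored in the document
    snapshot: Optional[str] = None  # option a): a copy captured at publication time


def resolve(ref: Reference, fetch: Callable[[str], str]) -> Tuple[str, str]:
    """Option c): prefer the live record at read time, fall back to the embedded copy.

    `fetch` is supplied by the caller and is expected to raise OSError
    (ConnectionError is a subclass) when the record cannot be reached.
    """
    try:
        return fetch(ref.record_id), "live"
    except OSError:
        if ref.snapshot is not None:
            return ref.snapshot, "snapshot"
        raise  # option b) only: nothing embedded, so the failure is passed on


if __name__ == "__main__":
    def offline_fetch(record_id: str) -> str:
        raise ConnectionError("no network")

    ref = Reference("P12345", snapshot="record as captured at publication")
    print(resolve(ref, offline_fetch))
```

A reference with only `record_id` set corresponds to option b), one that is never fetched corresponds to option a), and the fallback path is option c). Returning the source alongside the content lets a viewer tell the reader whether they are seeing the current record or the archived one.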
> I believe that if we were to achieve agreement on a model of annotation metadata that could exist in the same form within the PDF, or outside the PDF, or both, that would be ideal.
My view is that metadata for the Article of Record goes in the PDF, size permitting, but also that links are kept to data outside the PDF, which can be resolved at 'read time', so that the PDF is kept both as an Article of Record (JV's 'minutes of science') and as a 'Living Document' with links to up-to-date data, comments etc. [And referring to my previous whitterings on the subject: as much as I like PDFs and think they make an excellent 'View', I don't believe they make good vehicles for storing an article's 'Model'.]
I have no problem with there being external references and other information in a PDF - but as you note, there is the related issue of standardizing where/how such information is referenced. What format? Where in the PDF? Etc. If we could all agree on what goes in and how, then we would have interoperability, and that's the most important aspect.
Today there is no question that PDF is just the 'view' of the MVC model. However, our goal going forward is to add the 'model' to that - so that not only do you have a specific view, but you also have all the necessary pieces to go back to "edit mode" with the model and perhaps even recreate a view (this may be the entire PDF or just some subsection of it). PDF already has all the necessary components (and many nice-to-have optional ones) for doing this - but it's all about standardizing how it gets done and then building the tooling...
Leonard
From: beyond-the-pdf@googlegroups.com [mailto:beyond-the-pdf@googlegroups.com] On Behalf Of Steve Pettifer Sent: Sunday, November 14, 2010 1:50 PM To: beyond-the-pdf@googlegroups.com Subject: Re: capturing workflows and embedding in word documents
Best wishes
Steve
Also - ideally when I open a PDF that contains annotation referencing some entity that is commonly studied or used outside the document itself - e.g. a protein, a database, a reagent, a computational tool, a workflow - my Web browser should just natively be able to connect to all other sources of information about that entity wherever they are on the Web, and use these connections to enhance the information I see without jumping all over the place. Annotation itself is or at least should be, an independently sharable boundary object.
Best
Tim
On Nov 14, 2010, at 9:51 AM, Leonard Rosenthol wrote:
The ideas of Utopia are excellent, but their implementation isn't (IMO) the right approach. The PDF itself doesn't contain any of that rich information, so that it can be used/mined/extracted - instead, it appears to be sitting in one (or more) databases or data repositories online that Utopia is able to "magically" locate and then enable.
I'd prefer to see the same user experience (which is quite well done!) applied to a PDF with that type of rich semantics embedded...
Leonard
From: beyond-the-pdf@googlegroups.com<mailto:beyond-the-pdf@googlegroups.com> [mailto:beyond-the-pdf@googlegroups.com] On Behalf Of Jodi Schneider Sent: Sunday, November 14, 2010 5:19 AM To: beyond-the-pdf@googlegroups.com<mailto:beyond-the-pdf@googlegroups.com> Subject: Re: capturing workflows and embedding in word documents
Thanks, Jill. I'm really impressed that you're embedding interactivity in a way that's both easy for the author and seems to suit the science perfectly! Native Mac and Linux versions of this plugin would be interesting; I took a look but don't run Parallels or VMWare.
-Jodi
I thought I would mention some of the work we are doing on reproducible publications in the context of the VisTrails project.
Some background: VisTrails (http://www.vistrails.org) is an open-source data analysis and visualization tool that combines and extends features of scientific workflows and visualization systems. A distinguishing feature of VisTrails is its provenance infrastructure: VisTrails maintains provenance of data products (e.g., visualizations, plots), of the workflows that derive these products, and of their executions.
Essentially, as you explore data and create visualizations, VisTrails captures all the steps transparently. Once you get a result you like, you can 'publish' it in different ways. The video shows how this is done for a LaTeX document, a wiki, and a PowerPoint presentation.
We have also integrated this capability with CrowdLabs (http://www.crowdlabs.org), a social Web site where users can share not only their results, but also the specifications of the analysis that derived the results and their provenance. Through CrowdLabs, it is also possible to publish mashups that allow users to interactively manipulate the results (e.g., try different parameters) without having to install and run VisTrails on their desktop. For an example, see http://www.crowdlabs.org/vistrails/medleys/details/24
VisTrails and the reproducible publication package run on Mac, Linux, and Windows.
> I agree about the Mac/Linux versions - Microsoft sponsored the implementation for Windows only but did make the code open source and available. I think the implementation is very .NET dependent and we don't have that kind of expertise in house. We'd love to collaborate with Apple on a native Mac version - know anyone who might be interested?
> J
> Jodi Schneider wrote:
>> Thanks, Jill. I'm really impressed that you're embedding interactivity in a way that's both easy for the author and seems to suit the science perfectly! Native Mac and Linux versions of this plugin would be interesting; I took a look but don't run Parallels or VMWare.
>> Leonard, you might be interested in looking at the Utopia Documents PDF viewer and enhanced PDFs. I took a look yesterday which I wrote about here:
>> http://jodischneider.com/blog/2010/11/14/utopia-documents-pulling-sci...
>> There are 2 short screencasts of some of the interactive content.
>> -Jodi
I agree with Steve and Tim here: some metadata can and should be stored with the PDF, but not all of it can be.
In particular, I'm thinking about the provenance of the work. Provenance by its very nature goes beyond the PDF itself and can be much, much bigger than the PDF. For example, we've done some work where we maintain a reproducible representation of the results of an astronomy workflow by maintaining a virtual machine image along with the workflow itself. Obviously, this is an extreme case, but I doubt people are going to want to embed a 3GB virtual machine image in their PDFs.
Essentially, we need both embedding in the PDF and linking to the outside, and we need some nice guidance for how to do this.
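As a rough illustration of the "both embedding and linking" idea discussed in this thread, here is a minimal Python sketch. Nothing in it comes from PDF, Utopia, or any tool mentioned here: the `MetadataItem` record, the `package` and `resolve` functions, the `fetch` callback, and the `MAX_EMBED_BYTES` cutoff are all hypothetical names chosen for the example. It assumes each piece of metadata always keeps an external link, carries an embedded snapshot only when it is small enough, and is resolved at read time by trying the link first and falling back to the snapshot.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical cutoff; the thread only says "size permitting".
MAX_EMBED_BYTES = 1_000_000


@dataclass
class MetadataItem:
    """One piece of metadata attached to an article (hypothetical model)."""
    name: str
    link: Optional[str] = None        # external location, resolved at read time
    embedded: Optional[bytes] = None  # snapshot stored inside the document


def package(name: str, payload: bytes, link: str,
            max_embed_bytes: int = MAX_EMBED_BYTES) -> MetadataItem:
    """Always keep the link; embed a snapshot only when it is small enough."""
    snapshot = payload if len(payload) <= max_embed_bytes else None
    return MetadataItem(name=name, link=link, embedded=snapshot)


def resolve(item: MetadataItem,
            fetch: Callable[[str], Optional[bytes]]) -> Optional[bytes]:
    """Prefer the up-to-date external copy; fall back to the embedded snapshot."""
    if item.link is not None:
        try:
            live = fetch(item.link)
        except OSError:
            live = None  # offline, or the link no longer resolves
        if live is not None:
            return live
    return item.embedded
```

Under these assumptions a small database entry would travel with the document and still work offline, while something like a multi-gigabyte virtual machine image would be carried only as a link. The `fetch` callback is left to the caller so the sketch does not commit to any particular network library.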
Steve Pettifer wrote:
>> This "in the PDF" approach is all good if you are talking only about contributions / annotations of a single person, or about something that is both completely authoritative and public. But there is a strong use case for multiple, shareable perspectives - for example within a lab or collaboration. This is why Steve with Utopia and our MGH + NIF with Annotation Framework use a standoff metadata model, and why we (speaking for myself but I believe Steve is likely to be in agreement) advocate standardizing and opening the model of metadata, which can be done using a fairly simple ontology model.
> Tim, we are in total agreement! [And Leonard, thanks for the positive comments about Utopia -- I'm sorry we disagree on the rest!]
> I would be very happy indeed to see a PDF in which it is both possible (I believe much of it is already) and common practice (sadly, not currently very common) to add additional metadata ('baked in' as Tim nicely puts it). However, this metadata can only ever refer to either a) the article of record, or b) to external data captured at that moment in time. For example, let's say you wished to refer from the PDF to a particular database entry: you have the choice of a) copying that database entry into the PDF in some form (which could be bloated, but would work offline and be reliable etc), b) including a link to whatever the up-to-date version of the record may be online, and relying on this being resolved at 'read time', or c) both. If you are prepared to accept b) or c) as a sensible option, then why not use the same mechanism for accessing all the richer data / metadata associated with the article, since the infrastructure for doing b) or c) will be much the same (and reductio ad absurdum, the only thing you need to store in the PDF is a unique ID that allows the rest to be fetched at runtime).
With respect to provenance, I would encourage you all to review Atul Butte's recent piece in Nature Biotech (Nat Biotechnol. 2010 Nov;28(11):1181-5), where he discusses leveraging cloud resources for this process.
Best,
J
Paul Groth wrote:
> I agree with Steve and Tim here: some metadata can and should be stored with the PDF, but not all of it can be.
> In particular, I'm thinking about the provenance of the work. Provenance by its very nature goes beyond the PDF itself and can be much, much bigger than the PDF. For example, we've done some work where we maintain a reproducible representation of the results of an astronomy workflow by maintaining a virtual machine image along with the workflow itself. Obviously, this is an extreme case, but I doubt people are going to want to embed a 3GB virtual machine image in their PDFs.
> Essentially, we need both embedding in the PDF and linking to the outside, and we need some nice guidance for how to do this.
> cheers,
> Paul
>>>> -----Original Message----- From: beyond-the-pdf@googlegroups.com [mailto:beyond-the-pdf@googlegroups.com] On Behalf Of Jill Mesirov Sent: Friday, November 12, 2010 6:12 PM To: beyond-the-pdf@googlegroups.com Subject: capturing workflows and embedding in word documents
>>>> For those of you who haven't seen the Science piece I wrote on accessible reproducible research it discusses the need for 2 things - a system to capture the analysis automatically and then an easy way to embed in the manuscript itself - accessible in this case means to someone who doesn't program and never wants to.
>>>> I'm enclosing the relevant links for your amusement - there's a video that shows the doc in action.
>>>> - Has all the relevant links available.
>>>> In particular - even if you don't have a Science subscription you can get to the paper from there.
-- Jill P. Mesirov, Ph.D. Associate Director and Chief Informatics Officer Director, Computational Biology and Bioinformatics
Broad Institute of MIT and Harvard 7 Cambridge Center Cambridge MA 02142 phone: 617-714-7070 fax: 617-714-8991 email: mesi...@broad.mit.edu