Lots of talk here about RDF and NLM XML but we have not got to talking
about how these might be captured when someone sits down to write
their paper ("narrative", people are calling it).
There are several dimensions to the toolset needed (data management
framework with IDs for data, packaging mechanisms for bundling
research inputs and outputs), but one important one is an authoring
environment which fits with researchers' practice. As Lee noted, lots
use MS Word - but in an environment where we also have people wanting
WordPress, others working in OpenOffice.org, and others using text-based
formats like LaTeX, what can we build?
I am planning to demo some ideas that would take us towards
word-processor and web-based authoring tools in which authors can make
explicit links between their articles (and pre-articles) and data,
disambiguate terms, and embed metadata and structural labels
(abstract, intro, method, etc.) from the start. I think some variant of
the MS Research Ontology and NLM Add-in for Word would be a potential
deliverable, although both have some serious limitations, not the
least of which is interoperability with other tools (even other MS
tools) in their current form. I will post a suggestion I put to MS
Research about what we could do to improve the Ontology Add-in soon.
Phil - I'd still like to get my hands on the author's Word manuscript
(or a similar one) - one of the things I think is important is closing
the gap between author tools and publisher tools. At the moment we
have people talking about doing stuff with XML - but where is the XML
coming from?
(One more deliverable I'd like to see, but which I think might be
contentious, would be a series of microformats and conventions that
together allowed us to build an HTML 5 based schema for research
communications - I am dubious about the value of very hierarchical XML
schemas like NLM or DocBook in this world where authors are using
tools that are not optimized for such deep hierarchies. Converting
arbitrary text to XML is expensive and hard - and no, Lemon8 XML does
not do the job at all well.)
--
Peter Sefton
Manager, Software Research and Development Laboratory,
Australian Digital Futures Institute,
University of Southern Queensland
Toowoomba Queensland 4350 AUSTRALIA
Work: sef...@usq.edu.au
Private: p...@ptsefton.com
IM accounts:
Gmail: ptse...@gmail.com
Yahoo: peter_...@yahoo.com
MSN: p...@ptsefton.com
AIM: ptsefton
p: +61 (0)7 4631 1640
m: +61 (0)410 326 955
USQ Website: http://www.usq.edu.au
Personal Website: http://ptsefton.com
Just one comment on your idea for microformats: there's really no reason not to use RDFa there and leverage all the work that has gone into things like SALT, SWAN, etc. If you want simple HTML authoring, this can be easily accomplished using RDFa profiles.
Paul
Peter - I added an EarlyDraft.zip file under "Files" on the Workshop
Website that hopefully meets your needs.
We will need some kind of profile, for sure - but even this simple
example on the RDFa profiles wiki is impossible to author in a word
processor:
<div class="haudio">
<span class="fn">Start Wearing Purple</span> by
<span class="contributor">Gogol Bordello</span>
</div>
I am working on examples that encode triples in URLs - a kind of
'nanoformat' - see my blog:
http://ptsefton.com/2010/11/14/before-beyond-the-pdf-authoring-tools-for-document-semantics.htm
Peter
Cool, thanks for the blog. I see what you are trying to do. So you don't need users to write their own RDFa (a good thing); you need to mint new URLs identifying some metadata.
Question: do you care if the URLs are meaningful to people?
I'm thinking of a URL-shortener-like thing that produces these "metadata URLs".
Cheers,
Paul
I think they need to be readable by machines so you can decode them
without resolving them - each one can be read like a triple.
Here the subject is implicit (the page that holds the assertion). p is
the predicate or property and o is the object.
I would not expect users to mint these by hand, but they should be
usable in a wide variety of contexts.
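
(A minimal sketch of how such a link might be decoded without resolving
it. It assumes the predicate and object travel as query parameters named
p and o, as described above; the function name and the example URLs are
hypothetical, and the actual encoding may differ.)

```python
from urllib.parse import urlsplit, parse_qs


def decode_triple(link_url, page_url):
    """Read a (subject, predicate, object) triple out of a link URL.

    The subject is implicit: it is the page that holds the link.
    Nothing is fetched over the network; only the URL text is parsed.
    """
    query = parse_qs(urlsplit(link_url).query)
    predicates = query.get("p", [])
    objects = query.get("o", [])
    # Treat anything other than exactly one p and one o as "not a triple link".
    if len(predicates) != 1 or len(objects) != 1:
        return None
    return (page_url, predicates[0], objects[0])


# Hypothetical link of the kind an authoring tool might insert.
link = (
    "http://example.org/meta"
    "?p=http%3A%2F%2Fpurl.org%2Fdc%2Fterms%2Fcreator"
    "&o=http%3A%2F%2Fexample.org%2Fpeople%2Falice"
)
triple = decode_triple(link, "http://example.org/papers/draft-1")
print(triple)
```

Because the link is an ordinary URL, it survives in any tool that can
store a hyperlink, including a word processor.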
Peter
That's an impressive feature list.
I'm interested in whether you have any ideas about how the validating
XML authoring will work - will this be an adaptation of the built-in
WYSIWYG editor, a different existing component, or something
completely new?
Peter Sefton
Well, it is of course an initial list, subject to change (and budget!)
Peter Sefton wrote:
>
> That's an impressive feature list.
>
but I think these features are a minimum set for a complete working
system.
I'd agree. We (Joe Townsend, copied, with support from Alex Wade at Microsoft) have been working on doing this for chemistry. Joe has insisted that we adhere to strict validation, and this goes far beyond the DTD (which is almost useless in chemistry).
Carl,
This looks great; your workshop paper describes 100% of what I have in mind with WordPress. Based on the discussion on this list I am positive that we will be able to form a good-sized group of people willing to move this concept forward in the next 12 months. It would be lovely if we also found a publisher willing to work with us on this with one of their journals.
Kind regards, Martin
I agree we need a visionary publisher; if we cannot find one, the
emergent tools will enable us to become one. The key issue in my mind
is then to get the broader scientific community, i.e. the authors, to buy
in. New scientific discoveries resulting from new ways of
communicating and analyzing open science will be the driver in my
view.
We've been working with PLoS to get the contents of their PLoS Currents
title into PubMed Central (http://www.plos.org/journals/currents.php).
Currents uses a web-based WYSIWYG editor for all article authoring. Once
the articles are published on the Currents site (part of Google Knol), we
get a (mostly structured) XML submission for PMC.
I hope to get into the gory details of which parts of this workflow are
working and which are not during the Workshop. This leads nicely into why
we are interested in a WYSIWYG (or better WYSIWYM) authoring/editing tool
that writes truly structured content at the beginning - like the one that
Carl is working on.
Jeff Beck
Be...@ncbi.nlm.nih.gov
If we can come up with a tool that works (even one that mostly works), we
shouldn't have any trouble finding a few early adopter publishers.
Peter,
Thanks for your additional comments. The question of additional markup and content (beyond the written and graphical article content itself), and how to capture it, is probably one worthy of further discussion.
I would agree that dealing with the em-dash and curly quote issues involved when MSWord is part of the authoring chain is a core requirement for our system in the initial version.
However, I expect that the first version of the tool will defer anything not explicitly part of the NLM Journal Article DTD, such as data, domain-specific markup, and the like.
Such add-ons could be added in a later release, and others would certainly be free to develop additional functionality in the form of plugins or ancillary tools.
It was not our intention, when we put the NLM DTD together, to design it for bioscience. Actually we made significant efforts to do just the opposite: design it for journal articles in general. And I think we have done that pretty well (AND, we are constantly taking user suggestions to do it better with each new version).
But the general nature of the article model means that there is nothing specifically designed in for physical science or mathematics either. We do have a number of "escape valves" where domain-specific content can be tagged using the existing elements and attributes. To be done well (that is, to control your domain-specific application of the general article model), you will need to apply a domain-specific validation layer on top of your basic schema validation.
The advantage of this is that we have a general article model that we can use to archive and exchange information with a shared set of tools. The disadvantage is that you don't get the domain-specific stuff for free with those general tools. One idea I've had recently that I haven't shared with Carl yet is to allow users (really, the groups who are defining these domain-specific applications of the general model) to define another validation layer that can be pulled into the authoring/editing tool and used for validation and real-time checking during the authoring process.
In my non-tool-builder mind, this could be done with something as simple as a Schematron. (Schematron allows you to make assertions, with error messages or warnings, about a document and test those assertions.)
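
(A minimal sketch of what such an assertion layer could look like. This is Python standing in for a real Schematron schema, not Schematron itself; the two rules, their messages, and the sample document are made up for illustration, and a real domain group would define its own.)

```python
import xml.etree.ElementTree as ET

# Hypothetical domain rules, each one (context path, test, message),
# loosely mirroring a Schematron rule context with one assert.
RULES = [
    (
        ".//fig",
        lambda el: el.find("caption") is not None,
        "A fig element should contain a caption",
    ),
    (
        ".//named-content[@content-type='compound']",
        lambda el: el.get("id") is not None,
        "A compound mention should carry an id attribute",
    ),
]


def check(xml_text, rules=RULES):
    """Return the message of every assertion that fails in the document."""
    root = ET.fromstring(xml_text)
    failures = []
    for path, test, message in rules:
        for element in root.findall(path):
            if not test(element):
                failures.append(message)
    return failures


SAMPLE = """<article><body>
<fig id="f1"><caption><p>Spectrum</p></caption></fig>
<fig id="f2"/>
<p><named-content content-type="compound">benzene</named-content></p>
</body></article>"""

failures = check(SAMPLE)
for message in failures:
    print(message)
```

The point of the design is that the rule list is data: an authoring tool could load a different list per domain and run it on every save, on top of the basic schema validation.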
Jeff
A first pass of an 'Ontology of Rhetorical Blocks' is currently out - http://esw.w3.org/HCLSIG/SWANSIOC/Actions/RhetoricalStructure/models/blocksontology - and we are working on a 'medium-grained' model - http://esw.w3.org/HCLSIG/SWANSIOC/Actions/RhetoricalStructure/alignment/mediumgrain
Paolo Ciccarese, Tim Clark, Jodi Schneider and I are all working on this, and very much invite comments, contributions, and discussions at or before the San Diego meeting.
Anita de Waard
Disruptive Technologies Director, Elsevier Labs
http://elsatglabs.com/labs/anita/
a.de...@elsevier.com
-----Original Message-----
From: beyond-...@googlegroups.com on behalf of Peter Murray-Rust
Sent: Thu 12/16/2010 13:11
To: beyond-...@googlegroups.com
Cc: Joe Townsend; Alex Wade
Subject: Re: Authoring tools / Deliverables
Peter,
We've been working on wordpress as the basis for a publication
environment for most of last year. You can see the results at
http://www.knowledgeblog.org or
http://ontogenesis.knowledgeblog.org
which is a journal/book/tutorial for ontologies and their use in
biology. Initially, this started off as a pet project of my own; since
then, we have used funding from an EPSRC Network to generate content and
are now funded by JISC.
We see wordpress primarily as a publication tool. At our first workshop,
where much of the content for ontogenesis was developed, people moaned a
lot about the editing environment. We recently held another workshop
which produced http://taverna.knowledgeblog.org/. Here people used a
variety of tools -- mostly word, but also live writer and some text
tools (textmate/markdown, asciidoc/blogpost); our experience is that
they managed this with little effort beyond that of authoring the
article.
We've also used google docs, open office and latex. The point is, I
think, people already have their tool chains and already have their
collaborative environments (SVN, email, dropbox). Not that I am against
additions to wordpress to support collaborative editing; I just don't
think it is that important.
Of course, we lose the absolute WYSIWYG element, but then WYSIWYG isn't
really WYSIWYG anymore -- after all the acronym means "What you see (on
screen) is what you get (when you print it on paper)" -- I rarely do the
latter these days. In most cases, it's close enough, and it's easier
than traditional publishing where it takes several weeks to see what the
final form is; here, the author can update their article and see the end
published form automatically and immediately.
So far, from the JISC money, we have produced or repurposed
- lots of documentation (process.knowledgeblog.org)
- a process for formal review, based around the EditFlow plugin
- Support for maths with MathJax (which gives scalable fonts, rather
than images), using MathML or TeX markup
- A site-wide table of contents
- Article level table of contents
- Multiple author support
- Some customized themes
- Some latex to wordpress support (it works but is a little hard at the
moment).
- DOIs from DataCite.
- Archiving from the British Library.
- Content!
Ironically, given my background, at the moment, we haven't added any
support for semantic markup, but we hope this will come. We're also
working on references, both server-side (basically, the idea is, drop a
DOI into your article, wordpress will generate a link, and the full
reference list) and inside the tool chain (so latex users should get
bibtex, word users should get word tools). Our hope is that all of this
can be done "for real" -- that is, as a usable process.
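
(A rough sketch of the server-side idea, for illustration only: this is
not our plugin. It assumes DOIs appear as bare text in the article, uses
a simplified DOI pattern, and the DOIs in the example are placeholders.
A real version would also look up each DOI's metadata to build full
reference entries, which is left out here.)

```python
import re

# Simplified DOI pattern (an assumption; real DOIs can be messier).
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def link_dois(text):
    """Replace bare DOIs with numbered links and collect a reference list."""
    references = []

    def replace(match):
        raw = match.group(0)
        # Sentence punctuation stuck to the end is not part of the DOI.
        doi = raw.rstrip(".,;)")
        trailing = raw[len(doi):]
        if doi not in references:
            references.append(doi)
        number = references.index(doi) + 1
        return '<a href="https://doi.org/%s">[%d]</a>%s' % (doi, number, trailing)

    return DOI_PATTERN.sub(replace, text), references


# Placeholder DOIs, not real articles.
article = (
    "The method is described in 10.5555/example-one, and extended in "
    "10.5555/example-two. We reuse the data from 10.5555/example-one."
)
html, references = link_dois(article)
print(html)
for number, doi in enumerate(references, start=1):
    print("%d. https://doi.org/%s" % (number, doi))
```

A repeated DOI gets the same number each time, so the reference list
stays short however often a paper is cited.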
We're hoping to finish off with some slightly more innovative and
forward-looking stuff as demonstrators; we'd like to produce a microarray paper,
where the author generates no figures, but submits R and an array
express ID. Figures get generated on-the-fly, with the R still
accessible in the published version. I think this form of customisability,
adding features that are useful for some areas of science, is a key
advantage of this sort of environment.
We are actively pursuing this; we'd welcome collaboration from anyone else
who is generating tooling for use within wordpress. Like Martin and
Carl, I think wordpress gets us 90% of the way there. We need to plug
the gaps, support more tools, work with scientists' existing practices
and exploit the extensibility. I think that the structuring support
mentioned in Carl's paper sounds excellent, for instance, and is not
something we had in sight at the moment.
I hope that the workshop goes well; I would have loved to come, but I
can't travel at the moment.
Phil
>>>
> The point is, I think, people already have their tool chains and already
> have their
> collaborative environments (SVN, email, dropbox). Not that I am against
> additions to wordpress to support collaborative editing; I just don't
> think it is that important.
>>>
>
> With hosted collaborative editing (e.g. within WordPress), having the
> edits in one place makes the experience more like version-controlled
> software development -- the code [content] is all in one place, and
> you can easily see who contributed what. The trick is going to be
> getting the WYSIWYG browser control to display those differences and
> changes in a meaningful way.
Sure. But these tools exist as well. Google docs or Live Writer give you
hosted collaboration. Dropbox gives you pass-the-parcel semantics. SVN
gives you concurrent edit and merge (at least with latex). Git gives you
versioning on steroids.
It's a busy application space. My worry would be that anything added
into wordpress would just be the poor relation of these tools.
> <<
> Like Martin and Carl, I think wordpress gets us 90% of the way there. We
> need to plug
> the gaps, support more tools, work with scientists' existing practices
> and exploit the extensibility. I think that the structuring support
> mentioned in Carl's paper sounds excellent, for instance, and is not
> something we had in sight at the moment.
>>>
>
> That's what Annotum is intended to do - plug those gaps.
>
> Enforcing document structure is a major challenge with the use of current
> tools including MSWord (and most current WYSIWYG browser controls).
>
> It's not so much a matter of WYSIWYG, but rather WYGIWYW - what you GET is
> what you WANT. I've worked with browser-based content management systems
> for a long time, and the challenges always come down to two things: making
> it as 'easy' or comfortable as the tools authors prefer (today this is
> likely MSWord; for us years ago at fool.com it was the AOL Mail client), and
> getting content that the system can use: headings properly marked, exhibits
> and figures properly captioned and sourced, and so on. This gets back to
> the point of using a centrally-hosted, collaborative editing paradigm: If
> all content can be entered via a mechanism that enforces structure, you can
> be sure that it can be formatted, displayed, and exported consistently.
I agree. There is a tension here. If you use existing tools, you make
the author's life easy, but the presentation harder. If you use bespoke
tools, you make the author's life slightly harder but the presentation
and structuring easier.
We're going for the former approach; I think that this is practical when
using wordpress because most of the hard work for linking between the
tools and wordpress has already been done. MS Word can already do blog
posting, so can live writer, so can google docs. If we had to get word
talking to wordpress ourselves, well, I wouldn't have gone this route.
None of these approaches contradict, of course. A collaborative
environment for wordpress would be excellent.
Phil