Continuing BtPDF - Hackfest on Scholarly HTML March Cambridge UK


Peter Murray-Rust

Feb 8, 2011, 5:59:48 AM
to beyond-...@googlegroups.com, Murray-Rust Group, David FLANDERS
We are planning to continue the impetus of the Beyond the PDF "Writing" group with an informal hack event in Cambridge, UK, in mid-March. We are making arrangements for Peter Sefton and Martin Fenner to be physically present in our group between Mar 11 and Mar 20, 2011. The precise dates on which each will come will be posted later.

A hackfest consists of a collection of geeks fuelled by geek food and drink. Dates and times depend on the local availability of rooms, etc. The possibilities are some or all of:

* 2-day hackfest on Mar 12-13 (Sat/Sun)
* 1-day hackfest on Mar 20 (Sun); Saturday is science open day, so we might go to the Panton Arms or the Open Knowledge Foundation
* an ad hoc seminar given by whoever volunteers.

In January we ran a hackfest for #pmrhack, which was very successful and had about 16 people over a Saturday and Sunday. It consisted of some tables, free wifi, laptops, and some geek food. Our group will be involved, and it's likely that we'll be looking at integrating web services such as OSCAR (chemistry annotation) and OPSIN (chemical name-to-structure). Ben O'Steen is also on our projects and we hope he'll be here for significant chunks. We shall also welcome visitors during the week, but we may have to specify particular days as this is in term and rooms are scarce.

The goal of the coming hackfest is to create a prototype of "Scholarly HTML". This prototype will embody a (probably declarative) approach to compound scientific documents, with embedded behaviour such as chemistry using CML, or data visualisation. It is platform-agnostic, but the likely tools are WordPress and EPrints, although others can be used.

Peter and Martin have done HUGE amounts of tool and system building and we'll be looking for their guidance and code to hack some demonstrators during the period.

I've included Dave Flanders; he will mail the JISC developer community, and we can also float it at Dev8D. Anyone is welcome, but there will be a concentration on people who can:
* hack code
* introduce new tools
* provide examples
* create documentation
* package and disseminate

Here are emails from Martin and PT:

Martin:
I'm starting to get excited about the workshop/hackfest. And I agree that ePub only makes sense if we first create good scholarly HTML. To continue our discussions at the Beyond the PDF workshop in January (particularly the writing group), maybe the goal of the hackfest could be to do what is necessary (discussions, coding, documentation, etc.) so that people can start using "Beyond the PDF Tools v.0.1" on April 1st. Phil Bourne, Jonathan Eisen and Pat Brown had agreed to start using these tools with graduate students in their labs. Our tools should of course also work for Peter. Is that a goal we can achieve?

PT:
We need protocols for using HTML. One of the really important things we need to think about is how we can make documents with declarative contents, so that citations, chemistry, maths etc. are all specified in a way that is independent of the delivery platform and/or authoring tool. For CML, for example, there needs to be a way to link to a CML description with some decoration that says 'this is CML'; the repository or CMS can then decide whether it can display CML or not, using something like the oEmbed standard. I'd like to have EPrints and DSpace people around to code this up for their platforms. Don't forget that people will use plain old web browsers to read stuff; there's no need to make them deal with ePub if they don't need to.
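To make the CML point concrete, here is a minimal Python sketch of the decision step only. It assumes a hypothetical markup convention in which a plain HTML link carries a type attribute naming the linked object's media type (chemical/x-cml is the conventional chemical MIME type for CML); the file paths are placeholders. It is not oEmbed itself, where the consumer fetches an embed representation from a provider endpoint; it only shows how one declarative document degrades gracefully across platforms.

```python
from html.parser import HTMLParser


class DeclaredLinkFinder(HTMLParser):
    """Collect (href, media type) for every <a> that declares what it points to."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        if attributes.get("href") and attributes.get("type"):
            self.links.append((attributes["href"], attributes["type"]))


def plan_rendering(html, supported_types):
    """For each declared link, decide whether this platform embeds it or leaves a plain link."""
    finder = DeclaredLinkFinder()
    finder.feed(html)
    finder.close()
    return [
        (href, media_type, "embed" if media_type in supported_types else "link")
        for href, media_type in finder.links
    ]


# Hypothetical convention: the link's type attribute says "this is CML".
doc = """<p>The structure is in
<a href="data/structure.cml" type="chemical/x-cml">this CML file</a>
and the raw data in <a href="data/structure.cif">a CIF</a>.</p>"""

# A repository with a CML viewer installed embeds the molecule...
print(plan_rendering(doc, {"chemical/x-cml"}))  # [('data/structure.cml', 'chemical/x-cml', 'embed')]
# ...while a plain CMS or browser just keeps the ordinary hyperlink.
print(plan_rendering(doc, set()))
```

The author writes the link once; whether it becomes a live molecule is the platform's decision, which is the independence from the delivery platform that PT describes.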

And then there are the standards formerly known as HTML 5: for many devices that's all you would need to 'take away' a scholarly publication. We saw one HTML-based reader at the workshop, my group has a utilitarian one, and then there are things like http://www.alistapart.com/articles/a-simpler-page/, with a lot more class.

... and PT's draft blurb:

Exploring Scholarly HTML – the Hackfest


To some of us it's pretty clear that HTML is THE document format for scholarly documents, and the web is the glue that will bind together C21 research. In March we're going to bring together hackers and thinkers in Cambridge, England, to riff on the topic “Exploring and defining Scholarly HTML”.


What should scholarly documents look like on the web? What do we call the research object of the future? How do we link to data files in robust, sustainable ways, and allow research objects to come alive when placed in the right environment? How do we embed metadata, scientific semantics, formal semantic statements, citations, et al. in a research object? How do we write these documents? How do we let machines contribute, annotate, and take over writing the boring bits? How do we ship research to mobile devices, the tablet du jour, and to discipline and institutional repositories? How do we get rid of empty rituals like citation formatting? (Hint: a semantic link to a reliable shared bibliography would do.) [1] How do we seed documents with annotation points for comment and review and have that work across all the places research reaches?


We're sure that the hackfest will bring together people and their pet technologies: word processors, WordPress, repositories like EPrints and DSpace, file management like Subversion, Git and Dropbox™, ePub, SWORD, and the Blue Obelisk chemistry toolkit.

The goal? Running code ready for early-adopter researchers, implementing and defining "Scholarly HTML" with an emphasis on chemistry.



[1] PMR: I'd make this "shared OPEN bibliography" - i.e. vendor-independent. This is likely to be a deliverable from #jiscopenbib - not comprehensive but addressing many of the areas.

--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dept. of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069

Peter Sefton

Feb 8, 2011, 6:37:46 PM
to beyond-...@googlegroups.com, Murray-Rust Group, David FLANDERS
I think the weekend of the 11th is now looking by far the best. I'm currently working on being on the ground for the following weekend as well, but may leave earlier if not needed.
--
Peter Sefton
Manager, Software Research and Development Laboratory,
Australian Digital Futures Institute,
University of Southern Queensland
Toowoomba Queensland 4350 AUSTRALIA


Work: sef...@usq.edu.au
Private: p...@ptsefton.com

IM accounts:
Gmail: ptse...@gmail.com
Yahoo: peter_...@yahoo.com
MSN:  p...@ptsefton.com
AIM: ptsefton

p: +61 (0)7 4631 1640
m: +61 (0)410 326 955


USQ Website: http://www.usq.edu.au
Personal Website: http://ptsefton.com


Bretwood

Feb 16, 2011, 5:08:00 AM
to Beyond the PDF
I'm new to BtPDF, but working in a tiny way on some similar issues.
Though I dabble in the peer-reviewed lit., most of my work is lower-
class reports and writing for the public. I'm currently working with
a Django programmer to build tools for publishing technical content
online.

I've been thinking about how the next generation of scientific
publication might blur the line between research and publication.
Right now I'm working on a couple of ideas:

* Commonly scientists collaborate and work with the same dataset. It
makes perfect sense to put that dataset in the cloud where everyone
accesses the same version of the data. Sub-versioning techniques
might be needed for some complex datasets. So that's the research
end... when researchers move to publication, why not have their
publication connected to this dataset. At its simplest, this is just
an online supplement. But it might be that data figures in the
publication are linked to the database itself, and that this gives
interactive access to the data (e.g. click on a point in a scatterplot
and view a whole table row including x,y, and other variables.)

* Building off the idea of a figure linked to an online data set...
Figures aren't just for publication, they're also for analysis. A
researcher might build analysis code into a figure... a simple example
would be a scatterplot where pull-down menus would let you set the
variables for the two axes. Now why not expose the final publication
reader to this interactivity? The author can provide a version of the
figure that shows what they want shown, but still gives the reader the
ability to explore other cross-sections of the data (hopefully
convincing themselves that the author did a good job of exposing the
most relevant data cross-section.) Another great example is an
interactive map for showing GIS data. When a page is loaded, the
reader sees the map as the author wishes them to, but they can then
zoom, change layers, maybe even upload new data...
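Both ideas come down to keeping every plotted point linked to its row in the shared dataset, and treating the author's figure as a default view rather than a fixed picture. A minimal Python sketch, assuming an in-memory dataset with made-up values and hypothetical names (AUTHOR_VIEW, choose_axes, row_under_click); a real version would sit behind a web front end and query the cloud dataset instead.

```python
from dataclasses import dataclass, replace

# Hypothetical shared dataset (made-up values): each row has a stable id,
# so a figure can point back to the exact record behind each mark.
DATASET = {
    "s1": {"site": "A", "depth_m": 1.2, "age_ka": 3.4, "grain_size_mm": 0.05},
    "s2": {"site": "B", "depth_m": 2.5, "age_ka": 6.1, "grain_size_mm": 0.40},
    "s3": {"site": "C", "depth_m": 0.8, "age_ka": 1.9, "grain_size_mm": 0.12},
}
PLOTTABLE = ("depth_m", "age_ka", "grain_size_mm")  # contents of the pull-down menus


@dataclass(frozen=True)
class FigureView:
    x_var: str
    y_var: str


# The cross-section the author chose to show in the published figure.
AUTHOR_VIEW = FigureView("depth_m", "age_ka")


def choose_axes(view, x_var=None, y_var=None):
    """Reader picks new axes from the menus; the author's view itself is never modified."""
    new_view = replace(view, x_var=x_var or view.x_var, y_var=y_var or view.y_var)
    for var in (new_view.x_var, new_view.y_var):
        if var not in PLOTTABLE:
            raise ValueError(f"not a plottable variable: {var}")
    return new_view


def plot_points(view, dataset):
    """(row id, x, y) for every row, so each plotted point stays linked to its record."""
    return [(row_id, row[view.x_var], row[view.y_var]) for row_id, row in dataset.items()]


def row_under_click(view, dataset, click_x, click_y):
    """Return the whole dataset row behind the plotted point nearest the click."""
    row_id, _, _ = min(
        plot_points(view, dataset),
        key=lambda p: (p[1] - click_x) ** 2 + (p[2] - click_y) ** 2,
    )
    return dataset[row_id]


# Clicking near (2.4, 6.0) in the author's depth/age plot brings up site B's full row.
print(row_under_click(AUTHOR_VIEW, DATASET, 2.4, 6.0))
# A reader re-plots grain size against depth to check another cross-section.
print(plot_points(choose_axes(AUTHOR_VIEW, y_var="grain_size_mm"), DATASET))
```

Because views are immutable values, the published figure can always be restored exactly, however much a reader explores.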

I hope to have some simple examples of this sort of publication
working within a few months. All our code will be open source, and
I'd love to chat with anyone about concept or execution.

Peter Murray-Rust

Feb 16, 2011, 3:22:23 PM
to beyond-...@googlegroups.com, Bretwood
On Wed, Feb 16, 2011 at 10:08 AM, Bretwood <hig...@gmail.com> wrote:
> I'm new to BtPDF, but working in a tiny way on some similar issues.
> Though I dabble in the peer-reviewed lit., most of my work is lower-
> class reports and writing for the public.  I'm currently working with
> a Django programmer to build tools for publishing technical content
> online.

These are all great ideas, and all feasible. The problem is getting scientists and publishers to think they are worth doing. :-)
 

Hig

Feb 16, 2011, 5:47:12 PM
to Peter Murray-Rust, beyond-...@googlegroups.com
Maybe Google would be game to launch an interactive scientific journal...?

I think that if there were some stunning examples of publications that used this sort of approach, some scientists would follow.  I wonder if it would be possible to publish in a conventional journal, but provide an interactive version of the publication as an online supplement?

I've found it difficult to get across the idea that if I build an interactive report, a .pdf version is necessarily incomplete.  And it can be hard to communicate to readers that they can do things like zoom in on a map...

I can definitely see reasons to focus on just getting scientists into HTML as a first step.  But maybe in parallel we can explore some further steps outside the inertia of the peer-reviewed literature.  The nonprofits and consultants I work with are really interested in making their reports readily accessible, so maybe we can come up with some tools that will ultimately be useful to academia as well.

-Hig
--
Hig (Bretwood Higman, PhD)
hig...@gmail.com
(907) 399 5530
Ground Truth Trekking (www.groundtruthtrekking.org)
Nuka Research (www.nukaresearch.com)
Geological Hazards  (www.groundtruthtrekking.org/Reports/FaultHunt01/)
Sundrop Jewelry (www.sundropjewelry.com)

Phillip Lord

Feb 17, 2011, 6:39:00 AM
to beyond-...@googlegroups.com, Peter Murray-Rust

Hig <hig...@gmail.com> writes:
> Maybe Google would be game to launch an interactive scientific journal...?

Effectively, they already did: PLoS Currents, for example (published by
PLoS, obviously), which runs on Google Knol. PLoS Currents is a great
idea, I think, although the dependency on Knol is less so. Ultimately, I
think, we need to fit in with existing scientific tooling.


> I think that if there were some stunning examples of publications that used
> this sort of approach, some scientists would follow. I wonder if it would
> be possible to publish in a conventional journal, but provide an interactive
> version of the publication as an online supplement?

This would be twice the work. Publishing is already a hell of a lot of
effort.


> I've found it difficult to get across the idea that if I build an
> interactive report, a .pdf version is necessarily incomplete. And it can be
> hard to communicate to readers that they can do things like zoom in on a
> map...
>
> I can definitely see reasons to focus on just getting scientists into html
> as a first step. But maybe in parallel we can explore some further steps
> outside the inertia of the peer-reviewed literature. The nonprofits and
> consultants I work with are really interested to make their reports readily
> accessible, so maybe we can come up with some tools that will ultimately be
> useful to academia as well.


This is part of the idea behind my knowledgeblog.org project: a
lightweight publishing platform suitable for formal publications.
WordPress already gives us publishing, tool integration, RSS, media
handling, word clouds, and the rest. We've borrowed and repurposed a
peer-review system. We've added citation, maths and some citation-indexing
support. Archiving and DOIs come from colleagues. It's all pluggable:
you can take what you want and add what you want to it.

If you want zoomable maps, then you could add, for instance, this:

http://avi.alkalay.net/2006/11/google-maps-plugin-for-wordpress.html#complex

It's all just a plugin away. This is the advantage of adapting commodity
software to fit, as opposed to the current bespoke system: someone has
already done most of the work for you.


Phil

Bretwood

Feb 18, 2011, 8:29:10 PM
to Beyond the PDF
Interesting stuff. Can you point to some prime examples of
KnowledgeBlog in action?

The WordPress map insertion relies on Google's My Maps... There are
severe limitations here. The main ones are that the size and number
of data objects is limited, and there's no facility for sophisticated
data management... e.g. dynamic content generation. It's fine for a
traditional blog, but if we're talking scientific publications then
you'd want to have your maps talking to datasets.

Two versions of a paper isn't really twice the work... make the
online/interactive version and then generate a static image of it
(this is what I'm doing for a client who insists on a .pdf
publication). But an interactive paper would be a lot more work than
a flat paper. This is ameliorated somewhat if interactive tools are
used for the analysis too, so they're already built by the time a
paper is drafted. Perhaps in the sci-fi future, nearly the whole
workflow is managed online... Analysis tools (e.g. an analytical
machine at some far-away lab) are linked to an online database as soon
as they begin running samples. The author processes and visualizes
that data using cloud tools, and then saves specific views of specific
analyses to include in the final publication. Readers then have
access to those same tools.

This sort of system would also be a natural setting to handle a series
of revisions and reviews, possibly extending beyond a publication that
has entered the scientific literature proper. As soon as the author
is comfortable with the public seeing their work, they publish a
draft. This draft is given a unique permanent url, and people can
reference it. After peer review (which might be publicly posted) a
final draft is concocted. Anyone who navigates to the original draft
sees a prominent note at the top that there's an updated version.
Similarly, if mistakes are discovered or additions concocted after
final publication, a new final publication could be generated that is
referenced from the original "final" publication.
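The draft/final/correction workflow described here maps onto a simple append-only list of versions. A minimal Python sketch, with a hypothetical URL scheme (/v1, /v2, ...) and class names invented for illustration:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Version:
    number: int
    url: str     # permanent: once issued, never reused or overwritten
    status: str  # e.g. "draft", "final", "corrected final"


@dataclass
class Publication:
    base_url: str  # hypothetical URL scheme for illustration
    versions: list = field(default_factory=list)

    def publish(self, status):
        """Issue a new version at a fresh permanent URL; earlier versions stay reachable."""
        number = len(self.versions) + 1
        version = Version(number, f"{self.base_url}/v{number}", status)
        self.versions.append(version)
        return version

    def banner_for(self, url):
        """Notice shown at the top of a version, or None if it is already the latest."""
        if url not in {v.url for v in self.versions}:
            raise KeyError(url)
        latest = self.versions[-1]
        if url == latest.url:
            return None
        return f"An updated version ({latest.status}) is available at {latest.url}"


paper = Publication("https://example.org/papers/42")
draft = paper.publish("draft")
final = paper.publish("final")
print(paper.banner_for(draft.url))  # An updated version (final) is available at https://example.org/papers/42/v2
print(paper.banner_for(final.url))  # None
```

Because versions are only ever appended, every URL that someone has cited keeps resolving; only the banner on older versions changes.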

Given the limitations of what is available right now, open-source
custom software seems like not a bad way to go. Presumably there will
be many visions of what is "beyond the .pdf" before we actually do
move beyond the .pdf.

Phillip Lord

Feb 21, 2011, 11:14:48 AM
to beyond-...@googlegroups.com

Bretwood <hig...@gmail.com> writes:
> Interesting stuff. Can you point to some prime examples of
> KnowledgeBlog in action?

http://ontogenesis.knowledgeblog.org has been running for a year now.
Our primary focus for this was getting a useful resource out; that is,
it is the content that counts. It's a small experiment to see whether
we could replace the existing academic book publishing process. We've
got 12k reads in the first year. A book wouldn't have been published
yet.


> The WordPress map insertion relies on Google's My Maps... There are
> severe limitations here. The main ones are that the size and number
> of data objects is limited, and there's no facility for sophisticated
> data management... e.g. dynamic content generation. It's fine for a
> traditional blog, but if we're talking scientific publications then
> you'd want to have your maps talking to datasets.

Yes, but it would still be good enough for many purposes. It's always
possible to find "what if" examples where commodity technology is not
enough. The solution, then, is to build something more
extensive.

Given that the current technology is a) take a picture of the map, b)
stick it into a PDF, any advance is a good thing.

> Two versions of a paper isn't really twice the work... make the
> online/interactive version and then generate a static image of it
> (this is what I'm doing for a client who insists on a .pdf
> publication).

Yes, and we've achieved both .pdf and epub versions in the same way. Of
course, these format translations are not lossless or perfect.

> Analysis tools (e.g. an analytical machine at some far-away lab) are
> linked to an online database as soon as they begin running samples.

This already happens in some areas, although it's not the norm.


> This sort of system would also be a natural setting to handle a series
> of revisions and reviews, possibly extending beyond a publication that
> has entered the scientific literature proper. As soon as the author
> is comfortable with the public seeing their work, they publish a
> draft. This draft is given a unique permanent url, and people can
> reference it. After peer review (which might be publicly posted) a
> final draft is concocted.

Yes. Ontogenesis works in this way.


> Anyone who navigates to the original draft sees a prominent note at
> the top that there's an updated version. Similarly, if mistakes are
> discovered or additions concocted after final publication, a new final
> publication could be generated that is referenced from the original
> "final" publication.

We have a variation on that theme, but yes all versions are accessible.


> Given the limitations of what is available right now, open-source
> custom software seems like not a bad way to go. Presumably there will
> be many visions of what is "beyond the .pdf" before we actually do
> move beyond the .pdf.

Absolutely. It's a slow process.

Phil

Bretwood

Feb 21, 2011, 9:09:10 PM
to Beyond the PDF
I like the open peer reviews on Ontogenesis. I know some are
skeptical of review being open, I think in part because that creates
social pressure to hold back controversial criticism. But to me it
seems like there are as many issues with closed peer review. And
reviews can include important content that it's a shame to hide.

I'm a geologist, which puts maps at a high priority. Google provides
lots of great functionality in its Maps API, so it's not that far out
there to make a more functional version.

I have a couple of somewhat interactive reports up that use some of the
elements I'm interested in. The first, hand-built report looks a
little nicer and has some elements the newer one lacks, but hopefully
within months the admin-supported report will catch up in styling and
functionality. The biggest hurdle is supporting the maps, but we're
getting there.
2009 report hand-built in HTML: http://www.groundtruthtrekking.org/Reports/FaultHunt01/
2010 report supported by a Django admin:
http://groundtruthtrekking.org/Reports/Faulthunt2010Terracedeformation/1/Introduction/

-Hig


Phillip Lord

Feb 22, 2011, 5:43:59 AM
to beyond-...@googlegroups.com

Bretwood <hig...@gmail.com> writes:
> I like the open peer reviews on Ontogenesis. I know some are
> skeptical of review being open, I think in part because that creates
> social pressure to hold back controversial criticism. But to me it
> seems like there are as many issues with closed peer review. And
> reviews can include important content that it's a shame to hide.

I agree with this. I think it can hold back criticism, but mostly
unsupported criticism. There are some questions remaining in my mind about
reviewing: for Ontogenesis, our authors have generally said that they
prefer a more collaborative relationship. Perhaps reviewers should in
future be considered "one-step-removed" authors? Open review as it
stands, though, also makes reviewing more worthwhile for reviewers: as they
are no longer anonymous, they can demonstrate their own work.

We chose open review in the first instance for pragmatic reasons:
it saves messing around with privacy and access control. I think it has
academic benefits also.


> I'm a geologist which puts maps at a high-priority. Google provides
> lots of great functionality in its maps API, so it's not that far out
> there to make a more functional version.
>
> I have a couple somewhat interactive reports up that uses some of the
> elements I'm interested in. The first hand-built report looks a
> little nicer and has some elements the newer one lacks, but hopefully
> within months the admin-supported report will catch up in styling and
> functionality. The biggest hurdle is supporting the maps, but we're
> getting there.
> 2009 report hand-built in HTML: http://www.groundtruthtrekking.org/Reports/FaultHunt01/
> 2010 report supported by a Django admin:
> http://groundtruthtrekking.org/Reports/Faulthunt2010Terracedeformation/1/Introduction/

These look really nice. If you are interested in working on something
similar for knowledgeblog, please let me know. If not, I'd just be
interested in the sort of functionality you need.

We're getting a bit off topic here; if you want to talk, can I suggest
knowledgeb...@knowledgeblog.org, which is publicly accessible.

Phil

Peter Murray-Rust

Feb 22, 2011, 6:07:58 AM
to beyond-...@googlegroups.com, Phillip Lord, Charlotte Bolton
Suggest we revert this thread to its original theme.

Peter Sefton is coming for an extended stay in Cambridge - ca 2011/03/09-2011/03/19. Martin Fenner will be joining us for the first weekend (12/13). Others will be visiting during the weekdays on an ad hoc basis.

Peter is in the process of creating a prototype of the USQ system to work with.

Please let me know if you plan to come and also copy in Sue Begg (smb28 at cam d_t ac d_t uk)

There will be a wiki page and Etherpad to be announced here.

P.








RebholzSchuhmann

Feb 22, 2011, 5:54:29 PM
to beyond-...@googlegroups.com, Peter Murray-Rust, Phillip Lord, Charlotte Bolton
Hi all,

I want to make you aware of the second CALBC workshop (16/17/18 March, EBI, Hinxton, Cambridge).

The workshop will address the core questions around the development of the CALBC corpus that contains a large number of documents and a large number of biomedical annotations. The workshop will also deal with a number of additional questions that are of greater importance:
* the normalisation of terminological and literature resources
* the integration of annotated literature resources in the bioinformatics infrastructure
* proposals for standards
* semantic Web solutions around annotated corpora
* the comparison of silver standard corpora with gold standard corpora
* use of the CALBC approach for multilingual corpora

On Friday (18th March) we will have room for a significant amount of open discussion.  The topics are on the Web page and mainly indicate that we see further developments around the silver standard corpus approach. Certainly the large CALBC corpus could serve as a significant resource to the BtPDF community.

On 16/17 March we charge a fee (mainly to cover the dinner costs); entrance will be free on the 18th.

Please make suggestions if you would like to bring up related topics for discussion at the meeting.

Please check the Web site (www.calbc.eu, Workshop II) for further details. Registration remains open for another 2 weeks.

Best wishes,
Dietrich & CALBC project partners
-- 
Dietrich Rebholz-Schuhmann, MD, PhD - Research Group Leader
EBI, Wellcome Trust Genome Campus, Hinxton CB10 1SD (UK)
 - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
TM support:www.ebi.ac.uk/Rebholz-svr | tm-su...@ebi.ac.uk

barend mons

Feb 23, 2011, 7:55:19 AM
to beyond-...@googlegroups.com, Peter Murray-Rust, Phillip Lord, Charlotte Bolton
Dietrich, I am struggling (really) with four deadlines at the same time before flying to the USA tomorrow, but I will look at it later.
I would very much like to come, and maybe even check whether we can combine it with my visit to Rolf and John.
best


**************************************
Dr. Barend Mons
Scientific Director 
Support and external relations
Netherlands Bioinformatics Centre (NBIC)
and Biosemantics Group
Leiden University Medical Centre
Phone:  +31 (0)24 36 19 500
Fax:       +31 (0)24 89 01 798

Mail: Netherlands Bioinformatics Centre
260 NBIC
P.O. Box 9101
6500 HB Nijmegen

Visiting address:
LUMC building 2, Einthovenweg 20
2333 ZC Leiden, The Netherlands










Peter Murray-Rust

Mar 4, 2011, 4:27:54 AM
to beyond-...@googlegroups.com, Murray-Rust Group, David FLANDERS
Details are firming up for the Scholarly HTML BtPDF hackfest in Cambridge on March 12/13 and afterwards. For details see the web page http://www-pmr.ch.cam.ac.uk/wiki/Scholarly_HTML

One main thrust of the hackfest will be to create a "data journal" initially based on crystallography but extensible to other areas of science where objects can be created in semantic form. I have blogged this - see:
http://blogs.ch.cam.ac.uk/pmr/2011/03/04/scholarly-html-hackfest-and-visit-of-peter-sefton-and-martin-fenner/

Here is the gist:

The general plan is to CREATE something during the time that PT is here. PT runs a world-class team at the University of Southern Queensland, which has created a proven Open toolset based on WordPress for high-quality scholarly documents (e.g. course materials, papers, theses). Martin has likewise pioneered many plugins for WordPress.

We shall invite Peter and Martin to give presentations (but this will need to be on a weekday)

The theme is Scholarly HTML, with particular emphasis on data publication. It is to give authors the freedom to author as they wish, not as they are constrained by the recipient. A consequence is that all data should be semantic (i.e. understandable by machine). This means that bitmaps such as PNG should be replaced or augmented by, say, SVG or HTML5. Much of the impetus for the meeting came from “Beyond the PDF”, run by Phil Bourne and Anita de Waard.
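As an illustration of why SVG helps, here is a minimal Python sketch using the standard library's ElementTree. Unlike the pixels of a PNG, each point is its own element and keeps its original values, so a machine can read the data back out of the figure. The data-* attribute names are a hypothetical convention, not part of any Scholarly HTML specification, and the numbers are made up.

```python
import xml.etree.ElementTree as ET


def semantic_scatter_svg(points, x_label, y_label, width=200, height=120):
    """Render (x, y) pairs as SVG, one <circle> per data point, each carrying its own values."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, y_min = min(xs), min(ys)
    # Guard against a zero range when all values are equal.
    x_span = (max(xs) - x_min) or 1.0
    y_span = (max(ys) - y_min) or 1.0

    svg = ET.Element("svg", {
        "xmlns": "http://www.w3.org/2000/svg",
        "width": str(width),
        "height": str(height),
        # Hypothetical convention: the meaning of each axis is recorded on the figure itself.
        "data-x-label": x_label,
        "data-y-label": y_label,
    })
    for x, y in points:
        ET.SubElement(svg, "circle", {
            "cx": f"{(x - x_min) / x_span * width:.1f}",
            # SVG's y axis points down, so flip it.
            "cy": f"{height - (y - y_min) / y_span * height:.1f}",
            "r": "3",
            # The original values survive alongside the drawing coordinates.
            "data-x": str(x),
            "data-y": str(y),
        })
    return ET.tostring(svg, encoding="unicode")


print(semantic_scatter_svg([(1.2, 3.4), (2.5, 6.1), (0.8, 1.9)], "depth (m)", "age (ka)"))
```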

In general we would like to be able to publish:

  • Semantic (mainly rectangular) tables where columns have defined semantics
  • Semantic graphs where axes are semantic and points, lines, bars etc are first-class objects
  • Maths (MathML)
  • Semantic bibliography (technically solved, but we’d like to include online OPEN resources, e.g. from Open Bibliography)
  • Scalable diagrams (probably SVG)
  • Chemistry/crystallography as CML
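For the first item, one possible sketch (Python, standard library only) is an HTML table whose header cells declare what each column means. Using CIF data names such as _cell_length_a as the column concepts is an assumption made for illustration, as are the data-concept/data-unit attribute names and the numbers.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class Column:
    label: str    # what a human reader sees
    concept: str  # what a machine reads; here a CIF data name (assumed convention)
    unit: str


def semantic_table_html(columns, rows):
    """Render rows as an HTML table whose header cells declare each column's meaning and unit."""
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("every row needs exactly one value per declared column")
    header = "".join(
        f'<th data-concept="{escape(c.concept)}" data-unit="{escape(c.unit)}">{escape(c.label)}</th>'
        for c in columns
    )
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(value))}</td>" for value in row) + "</tr>"
        for row in rows
    )
    return f"<table><thead><tr>{header}</tr></thead><tbody>{body}</tbody></table>"


columns = [
    Column("a", "_cell_length_a", "Å"),
    Column("c", "_cell_length_c", "Å"),
    Column("β", "_cell_angle_beta", "°"),
]
# Illustrative numbers only.
print(semantic_table_html(columns, [(10.512, 15.33, 97.3)]))
```

Any browser still shows an ordinary table; a tool that understands the attributes can treat each column as a typed quantity.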

There will be many ideas but as a focus we have come up with a unifying project. After discussion with Simon Hodson (JISC) and Brian McMahon (IUCr) we plan to implement the following idea in our JISCXYZ project and to start this during the hackfest. (Simon and Brian hope to be present for some of the time).

A data-journal for crystallography

Every week Crystaleye automatically aggregates a few hundred structures and creates fully semantic CML. These are currently published as HTML pages with embedded CML and PNGs (http://wwmm.ch.cam.ac.uk/crystaleye). A typical page (there are ca 250,000) is http://wwmm.ch.cam.ac.uk/crystaleye/summary/acta/c/2008/01-00/data/av3113/av3113sup1_I/av3113sup1_I.cif.summary.html (you can twiddle the molecule and create the unit cell by clicking). We wish to create a “data publication” from this material.

The proposed data journal will automatically select ca 10 interesting structures per week and publish these as a Scholarly HTML blog. The hackfest will educate us in the best ways of representing these as Scholarly HTML and of supporting the best modes of presentation. Because we shall be using a blog, readers can comment on these structures using the blog mechanism and also add their own ideas about interesting structures that we have not included. In this way we hope to build up a sense of publication and comment.
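Selecting the ~10 structures is the part most open to debate. A minimal Python sketch, assuming a purely hypothetical interest score (size, unusual elements, refinement quality) and placeholder structure ids with made-up values; the real criteria would be for the crystallographers to decide.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Structure:
    block_id: str          # placeholder identifiers below, not real entries
    n_atoms: int
    r_factor: float        # refinement R factor: lower means a better fit
    unusual_elements: int  # hypothetical count of elements rarely seen in the week's intake


def interest(s):
    """Hypothetical score: favour large and chemically unusual structures, penalise poor refinement."""
    return s.n_atoms / 100 + 2 * s.unusual_elements - 10 * s.r_factor


def weekly_selection(structures, k=10):
    """Pick the k highest-scoring structures (the plan suggests roughly 10 a week)."""
    return heapq.nlargest(k, structures, key=interest)


week = [
    Structure("s001", n_atoms=48, r_factor=0.035, unusual_elements=0),
    Structure("s002", n_atoms=320, r_factor=0.060, unusual_elements=1),
    Structure("s003", n_atoms=25, r_factor=0.120, unusual_elements=0),
]
print([s.block_id for s in weekly_selection(week, k=2)])  # ['s002', 's001']
```

Keeping the score in one small function makes it easy to publish alongside each weekly issue, so readers can argue with the criteria as well as the picks.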

There is also the possibility for readers to submit their own structures, which will be automatically validated during the submission process. We’ll work very closely with the IUCr on this. We can add to the interest by having ranking tables for authors or contributors and various “records”, such as largest structure.
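Submission-time validation would in practice rely on the IUCr's own checking tools (such as checkCIF). The Python sketch below only illustrates rejecting incomplete submissions early: it reads simple single-valued "_name value" lines from a CIF, skipping loops and multi-line values, and checks a hypothetical list of required items. The example values are made up.

```python
import re

# Hypothetical minimum for a submission; a real list would come from the IUCr.
REQUIRED = ("_cell_length_a", "_cell_length_b", "_cell_length_c", "_chemical_formula_sum")
NUMERIC = ("_cell_length_a", "_cell_length_b", "_cell_length_c")

# CIF numbers may carry a standard uncertainty in parentheses, e.g. 10.512(3).
CIF_NUMBER = re.compile(r"^-?\d+(\.\d*)?(\(\d+\))?$")


def simple_items(cif_text):
    """Collect single-valued `_name value` lines; loops and multi-line values are skipped."""
    items = {}
    for line in cif_text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0].startswith("_"):
            items[parts[0]] = parts[1].strip().strip("'\"")
    return items


def validate_submission(cif_text):
    """Return a list of problems; an empty list means the submission passes this toy check."""
    items = simple_items(cif_text)
    problems = [f"missing {name}" for name in REQUIRED if name not in items]
    for name in NUMERIC:
        if name in items and not CIF_NUMBER.match(items[name]):
            problems.append(f"{name} is not a number: {items[name]}")
    return problems


good = """data_example
_cell_length_a 10.512(3)
_cell_length_b 7.201(2)
_cell_length_c 15.33(1)
_chemical_formula_sum 'C12 H10 N2'
"""
print(validate_submission(good))  # []
print(validate_submission("_cell_length_a 10.5\n_cell_length_b abc\n"))
```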


All are welcome. It's hands-on. Let us know if you are interested.

P.

Aaron Culich

Mar 4, 2011, 12:44:53 PM
to beyond-...@googlegroups.com, Peter Murray-Rust, Murray-Rust Group, David FLANDERS
I added the hackfest to the BtPDF calendar. I'd like to encourage people to put events on the calendar because I think it is helpful to see at a glance what everyone else is up to, even if we can't attend all the events. I wish I could be there in person, but I'll definitely jump on Etherpad during the event.

-Aaron

Peter Murray-Rust

Mar 4, 2011, 12:56:42 PM
to Aaron Culich, beyond-...@googlegroups.com, Murray-Rust Group, David FLANDERS
On Fri, Mar 4, 2011 at 5:44 PM, Aaron Culich <acu...@gmail.com> wrote:
> I added the hackfest to the BtPDF calendar. I'd like to encourage people to put events on the calendar because I think it is helpful to see with just a glance what everyone else is up to even if we can't attend all the events. I wish I could be there in person, but I'll definitely jump on etherpad during the event.

Great.

We should perhaps think about syncing some public screen display.
 

Peter Murray-Rust

Mar 12, 2011, 5:31:07 AM
to beyond-...@googlegroups.com, Murray-Rust Group, David FLANDERS
The hackfest is continuing in Cambridge - see Etherpad for current discussion and debate:
http://scholarly-html.okfnpad.org/1
This will capture our current notes and ideas. It will also point to other resources.
NOTE: Anyone can contribute in the CHAT box. Please give yourself a name...




