Scientific Papers as HTML+RDFa


Petersen, Niklas

Apr 3, 2018, 5:52:10 AM
to or...@googlegroups.com

Hi,

while I assume everybody is aware of it, I nevertheless wanted to write a mail about it:

Sören's student Sarven Capadisli [1] works on different topics & technologies to facilitate digital publishing. Everything he develops is open source [2]. While not everything works perfectly, I would say he is on the right path. I manually created a "web copy" of one of my own papers [3], and you can find a long list here: [4].

If you know of people who work on similar topics or have related ideas, it would be nice if you could share them.


[1] http://csarven.ca/
[2] https://github.com/linkeddata/dokieli
[3] http://np00.github.io/d/scorvoc
[4] https://github.com/linkeddata/dokieli/wiki



Best regards,
Niklas

http://np00.github.io/#i 

Vladimir Alexiev

Apr 5, 2018, 7:46:22 AM
to Open Research Knowledge Graph
I've been very interested in Dokieli, but it seems to me that for most realistic work you need to hand-edit HTML.
Dokieli is great for tweaking a paper, inserting specialized things, and adding annotations, but I don't think you can use it for creating a whole new article unless you edit HTML.
I'd love to be proven wrong. E.g., how did you convert your SCOR article to this semantic format?

I think that HTML editing is out of reach for most researchers.
I know HTML, but I hate editing it.
Instead, I write in plain text (emacs orgmode), then produce PDF or HTML using one of the org exporters ("ox-*").
There are also a number of editors based on Markdown.
Pandoc is also a very powerful converter between formats.

I'd love it if someone could extend orgmode to produce semantic papers (in Dokieli, RASH or whatever).

Petersen, Niklas

Apr 5, 2018, 4:24:53 PM
to Open Research Knowledge Graph

Because I assumed someone would ask, I added the word "manually" to it :)

Once I realized that this is what you have to do, I looked for ways to automate it and realized how hard it is. Nothing works out of the box, but I think that by combining several tools you can get good-enough results for a subset of papers. I am actually close to that, but my PhD thesis is keeping me busy for now. Of course you will never reach 100%, but I don't think that is necessary. The "typos" would hopefully be fixed by the crowd; at least, that is how I imagined it. And once you have more than 1k papers in RDF and HTML+RDFa, you have enough to (hopefully) start a discussion. Well, OK, RDF alone is probably not enough. You have to build at least one nice app on top that exploits the benefits of the underlying model.

For me, HTML+RDFa is just a view and not too important. As I mentioned at the workshop, I sense that it could get very interesting to explore ways to represent, classify, and link every piece of a scientific article at as granular a level as possible: paragraphs, sentences, figures, algorithms, mathematical formulas, etc. Not for the sake of linking them, but to build some cool algorithms on top.

But one step after another.
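To make "as granular as possible" a bit more concrete: a minimal Python sketch, assuming each paragraph has already been split into sentences, that gives every paragraph and sentence its own IRI and links it to its parent as N-Triples. The base IRI is hypothetical, and using DoCO classes with dcterms:hasPart is just one possible vocabulary choice, not something agreed on in this thread.

```python
# Hypothetical base IRI for one article; the DoCO classes and dcterms:hasPart
# are one possible vocabulary choice, assumed here for illustration.
BASE = "https://example.org/paper"
DOCO = "http://purl.org/spar/doco/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
HAS_PART = "http://purl.org/dc/terms/hasPart"


def iri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets for N-Triples."""
    return f"<{value}>"


def literal(text: str) -> str:
    """Serialize a plain string literal with N-Triples escaping."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def article_to_ntriples(paragraphs: list[list[str]]) -> list[str]:
    """Give every paragraph and sentence its own IRI and link it to its parent.

    `paragraphs` is a list of paragraphs, each already split into sentences.
    """
    triples = []
    article = iri(BASE)
    for p_num, sentences in enumerate(paragraphs, start=1):
        para = iri(f"{BASE}#p{p_num}")
        triples.append(f"{article} {iri(HAS_PART)} {para} .")
        triples.append(f"{para} {iri(RDF + 'type')} {iri(DOCO + 'Paragraph')} .")
        for s_num, text in enumerate(sentences, start=1):
            sent = iri(f"{BASE}#p{p_num}-s{s_num}")
            triples.append(f"{para} {iri(HAS_PART)} {sent} .")
            triples.append(f"{sent} {iri(RDF + 'type')} {iri(DOCO + 'Sentence')} .")
            triples.append(f"{sent} {iri(RDF + 'value')} {literal(text)} .")
    return triples


for line in article_to_ntriples([["HTML is a view.", "RDF is the model."]]):
    print(line)
```

Once every sentence has a stable IRI, figures, formulas, or crowd annotations could point at exactly the piece they refer to, which is what an algorithm on top would need.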

On the topic of "HTML editing":
To me, that doesn't make sense. Nobody should ever have to edit HTML to create or edit a scientific article. I would prefer something like "Google Docs on top of static scientific articles", so as to collect as much "scientists' crowd knowledge" as possible and add it to the Research Graph. However, I am afraid that if such a thing ever existed, scientists would start promoting their own papers or doing nasty things to researchers they dislike, and then you would have to spend a lot of resources on quality control.


Best regards,
Niklas


From: or...@googlegroups.com <or...@googlegroups.com> on behalf of Vladimir Alexiev <vladimir...@ontotext.com>
Sent: Thursday, April 5, 2018 13:46
To: Open Research Knowledge Graph
Subject: Re: Scientific Papers as HTML+RDFa
--
You received this message because you are subscribed to the Google Groups "Open Research Knowledge Graph" group.
To unsubscribe from this group and stop receiving emails from it, send an email to orkg+uns...@googlegroups.com.
To post to this group, send email to or...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/orkg/7ff2cfc1-85b2-4e6a-8290-3228286da45a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Vladimir Alexiev

Apr 5, 2018, 5:18:40 PM
to Open Research Knowledge Graph
Although I love orgmode, I think that pandoc and pandoc-scholar are more advanced for scientific article production.
See this paper about pandoc-scholar: https://peerj.com/articles/cs-112/.
They produce JSON-LD for authors, affiliations/institutions, and citations (CITO). This is configured with a simple YAML block in the Markdown source.
They still don't do everything dokieli does (e.g., the ability to mark up the semantic purpose of every section), but they do a lot.
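As a rough illustration of the "YAML metadata in, JSON-LD out" idea, here is a minimal Python sketch. It is not pandoc-scholar's actual implementation or output format. Assumptions: the YAML block has already been parsed into a dict (to avoid a YAML dependency), the field names `authors`, `affiliation`, `citations`, and `type` are hypothetical, and schema.org plus the CITO `cites` property serve as the vocabulary.

```python
import json

# Hypothetical metadata, standing in for a YAML block at the top of a
# Markdown file (already parsed into a dict to avoid a YAML dependency).
METADATA = {
    "title": "An Example Article",
    "authors": [
        {"name": "Jane Doe", "affiliation": "Example University"},
        {"name": "John Roe", "affiliation": "Example Institute"},
    ],
    "citations": [
        {"id": "https://example.org/cited-paper", "type": "cites"},
    ],
}


def to_jsonld(meta: dict) -> dict:
    """Map simple article metadata to a JSON-LD document (sketch only)."""
    authors = [
        {
            "@type": "Person",
            "name": a["name"],
            "affiliation": {"@type": "Organization", "name": a["affiliation"]},
        }
        for a in meta["authors"]
    ]
    doc = {
        "@context": {
            "@vocab": "http://schema.org/",
            "cito": "http://purl.org/spar/cito/",
        },
        "@type": "ScholarlyArticle",
        "name": meta["title"],
        "author": authors,
    }
    # Group cited works under their CITO property, e.g. "cito:cites".
    for c in meta.get("citations", []):
        key = "cito:" + c["type"]
        doc.setdefault(key, []).append({"@id": c["id"]})
    return doc


print(json.dumps(to_jsonld(METADATA), indent=2))
```

The point is that authors only maintain a few lines of plain metadata, and the machine-readable graph is generated from it rather than hand-written.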

Interesting that you mention automated analysis of math formulas. Pandoc uses LaTeX for math markup and then
"parses formulas into internal structures and allows conversion into formats other than LaTeX. This allows for format-specific formula representation and enables computational analysis of the formulas (Corbí & Burgos, 2015)."

Dokieli's rendering in ACM and LNCS layouts is neat, but pandoc can also produce other formats such as DOCX, ODT, EPUB, PDF, and JATS.
In particular, JATS is important for publishing and archiving by large publishers and libraries such as LoC.
Dokieli espouses the opposite, revolutionary ideology: that HTML is *the* universal format, that nothing else is needed, and that authors should take care of their own publishing.

Petersen, Niklas

Apr 6, 2018, 2:14:12 AM
to Open Research Knowledge Graph

FYI: I read a very interesting discussion on Hacker News about what goes wrong in science, including ideas and links:

https://news.ycombinator.com/item?id=16764321

Favorite quotes:

"Who says we need to convince them? How about we leave them behind? They are rentiers, gatekeeping society's access to publicly funded scientific knowledge. I can't think of a reason why society should allow this hostage situation to continue."

"but you need to figure out a way to take over the universities then because scientists are also animals who need food and shelter etc. etc. and depend on grants, stipends, and salaries to buy those things.

I'm not saying this to be dismissive - I'm strongly in favor of faculties organizing to unseat administrators from their privileged positions. It's baffling to me that bright minds on campuses complain at length about the state of higher education but seem oddly averse to doing anything about it."



It looks like a lot of people would be very willing to move to something else. And we are surely not alone :)

Niklas

Sent: Thursday, April 5, 2018 23:18
To: Open Research Knowledge Graph
Subject: Re: Scientific Papers as HTML+RDFa