Your comments wanted -- Schema.org extensions for biological database entries


MORITA Mizuki

Aug 10, 2012, 5:43:34 AM
to biohac...@googlegroups.com
Dear all,

We proposed “schema.org” extensions for biological DB entries at the
end of last year, but unfortunately, we have received very few
comments. We need more discussion:
http://www.w3.org/wiki/WebSchemas/BioDatabases

(They are meta-data, which are supposed to be buried in the HTML code
of DB entries and to be retrieved by search engines, such as Google,
Yahoo! and Bing, to make snippets in search results more valuable.)

If you have time, we'd like your comments on the proposal in any of the
following ways:
1. Reply to the W3C public-vocabs ML:
* http://lists.w3.org/Archives/Public/public-vocabs/2012Mar/0080.html
2. Reply to a Twitter post:
* https://twitter.com/keyboardrobot/status/179411931994652672
3. Reply to this mail

Thank you very much in advance.

===== References =====
* About Schema.org
* http://schema.org/
* http://googleblog.blogspot.jp/2011/06/introducing-schemaorg-search-engines.html
* http://developer.yahoo.com/blogs/ydn/posts/2011/06/introducing-schema-org-a-collaboration-on-structured-data/
* http://www.bing.com/community/site_blogs/b/search/archive/2011/06/02/bing-google-and-yahoo-unite-to-build-the-web-of-objects.aspx
* About microdata
* http://www.w3.org/TR/microdata/
* http://support.google.com/webmasters/bin/answer.py?hl=en&answer=176035
* About rich snippets
* http://support.google.com/webmasters/?hl=en

--

MORITA Mizuki
森田 瑞樹

M. Scott Marshall

Aug 10, 2012, 5:55:36 AM
to morita...@gmail.com, biohac...@googlegroups.com
Dear Morita,

There was an HCLS discussion and a teleconference a few months ago
with people from Schema.org that might be useful to you. I think that
if you read through the related mailing list threads and minutes, you
will find some useful ideas and opinions.

I recommend that you post this request for comments to the HCLS list
at "HCLS" <public-sem...@w3.org> and/or directly respond to
some of the old discussion threads.

Kind regards,
Scott

--
M. Scott Marshall, PhD
MAASTRO clinic, http://www.maastro.nl/en/1/
http://eurecaproject.eu/
https://plus.google.com/u/0/114642613065018821852/posts
http://www.linkedin.com/pub/m-scott-marshall/5/464/a22

Jerven Bolleman

Aug 10, 2012, 6:01:36 AM
to biohac...@googlegroups.com
Dear All,

This is an excellent idea and I support the concept. I am just thinking
that the link to a species should be a link to a Taxon concept instead
of a property with an NCBI ID, which would allow other taxonomic
databases to be used. Otherwise, this is definitely something that is
implementable for uniprot.org.

Regards,
Jerven





--
Jerven Bolleman
m...@jerven.eu

José María Fernández González

Aug 10, 2012, 6:00:58 AM
to biohac...@googlegroups.com
Dear Mizuki,
the proposal is very interesting (I have to admit I read it too quickly). At the technical level, my suggestion is to use RDFa for the annotations in the HTML, instead of custom tags. You can see several examples here:

http://en.wikipedia.org/wiki/RDFa#Examples_of_RDFa

So my technical recommendation is to translate the ontology you have already outlined into OWL or something similar, and then use RDFa and that ontology to annotate the HTML content.
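
To make the suggestion concrete, a minimal RDFa Lite sketch might look like the following. It assumes the type name BiologicalDatabaseEntry (a proposed type, not part of schema.org), reuses the schema.org properties name and url, and uses a placeholder entry URL. It only illustrates the attribute style and is not a finished mapping of the proposal.

```html
<!-- RDFa Lite: vocab sets the default vocabulary, typeof the type, property the predicate. -->
<!-- BiologicalDatabaseEntry is a proposed type; the URL below is a placeholder. -->
<div vocab="http://schema.org/" typeof="BiologicalDatabaseEntry">
  <h1 property="name">Example protein entry</h1>
  <a property="url" href="http://www.example.org/entry/12345">Entry 12345</a>
</div>
```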

Best Regards,
José María Fernández
"There is no reason why anybody would want a computer in their home" -
Ken Olson, founder of DEC 1977
"640K ought to be enough for anybody" - Bill Gates, 1981
"Nobody will ever outgrow a 20Mb hard drive." - ???

"Premature optimization is the root of all evil." - Donald Knuth

José María Fernández González
Tlfn: (+34) 91 732 80 00 / 91 224 69 00 (ext 3061)
e-mail: jmfer...@cnio.es Fax: (+34) 91 224 69 76
Unidad del Instituto Nacional de Bioinformática
Biología Estructural y Biocomputación Structural Biology and Biocomputing
Centro Nacional de Investigaciones Oncológicas
C.P.: 28029 Zip Code: 28029
C/. Melchor Fernández Almagro, 3 Madrid (Spain)


**CONFIDENTIALITY NOTICE** This email communication and any attachments may contain confidential and privileged information for the sole use of the designated recipient named above. Distribution, reproduction or any other use of this transmission by any party other than the intended recipient is prohibited. If you are not the intended recipient please contact the sender and delete all copies.

Hilmar Lapp

Aug 10, 2012, 4:45:24 PM
to biohac...@googlegroups.com
Dear Morita,

I can't see from the documentation on that W3C page that this is being coordinated with BioDBcore; shouldn't it though?

Gaudet, Pascale, Amos Bairoch, Dawn Field, Susanna-Assunta Sansone, Chris Taylor, Teresa K Attwood, Alex Bateman, et al. 2011. “Towards BioDBcore: a Community-defined Information Specification for Biological Databases.” Database: The Journal of Biological Databases and Curation 2011 (0) (January): baq027. doi:10.1093/database/baq027.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :
===========================================================



MORITA Mizuki

Aug 10, 2012, 10:52:40 PM
to M. Scott Marshall, biohac...@googlegroups.com
Dear Scott,

Thank you for your suggestion! I haven't read the HCLS ML. I will
check it and post my request.

Best regards,
Mizuki

--

MORITA Mizuki
森田 瑞樹


MORITA Mizuki

Aug 11, 2012, 7:55:54 AM
to biohac...@googlegroups.com
Dear Jerven,

I did not know about Taxon Concept. Thank you for your suggestion.

Taxon Concept (http://taxonconcept.stratigraphy.net/search.php)
returned nothing for my queries: “homo sapiens”, “human”, “mus
musculus” and “mouse”. Mmm... Why? Did I make a mistake? Anyway, I need
to learn about Taxon Concept.

# UniProt seems to use NCBI Taxonomy ID.

--

MORITA Mizuki
森田 瑞樹


MORITA Mizuki

Aug 11, 2012, 8:19:15 AM
to biohac...@googlegroups.com
Dear José,

Thank you for your comment.

It's great that RDFa is so expressive. For now, however, we are
focusing not on expressing all the information in each database entry
but on enriching the results shown by search engines. For this
purpose, we think it's better to follow the agreement (= schema.org)
among the major search engines (Google, Yahoo! and Bing). This means we
will use microdata instead of RDFa.

Best regards,
Mizuki

--

MORITA Mizuki
森田 瑞樹



MORITA Mizuki

Aug 11, 2012, 9:01:22 AM
to biohac...@googlegroups.com
Dear Hilmar,

Thank you for your valuable comment. I added a link to the BioDBcore
website on the W3C wiki page. I realize that we should add a description
of how the proposal relates to BioDBcore, but I don't have a good idea
for that yet.

We did, of course, take BioDBcore into account when making our set. But
because our proposal aims at improving search engine results pages, our
set is smaller than BioDBcore's.

Best regards,
Mizuki

--

MORITA Mizuki
森田 瑞樹


Hilmar Lapp

Aug 12, 2012, 1:04:07 PM
to biohac...@googlegroups.com
Morita,

There's a ratified TDWG (Biodiversity Information Standards, f.k.a. Taxonomic Databases Working Group) standard, the Taxon Concept Schema:

http://www.tdwg.org/standards/117/

This is probably where you want to start. I don't know what the folks at Stratigraphy.net do, but it is not a standard project.

-hilmar

Jerven Bolleman

Aug 12, 2012, 2:42:11 PM
to biohac...@googlegroups.com
Hi All,

First of all, I am sorry for the confusion I caused. When I talked about a taxon concept I should have included some examples.
The main reason is that NCBI taxonomy (including as used in UniProt) does not cover all organism concepts used in the wider biology field.
Take for example the "Homo erectus"[1] concept: you can't find it in the NCBI taxonomy because there are no nucleotide/protein sequences for it.
When discussing the evolutionary history of Homo sapiens, a link to Homo erectus is necessary, so databases that deal with organisms
without a nucleotide sequence use different taxonomic databases.

So, in example form, this is what I propose:
<!-- A UniProtKB protein entry -->
<div itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
  <link itemprop="additionalType" href="http://purl.uniprot.org/Protein">
  <h1><a itemprop="url" href="http://purl.uniprot.org/uniprot/P05067">P05067</a></h1>
  <span itemprop="isEntryOf" itemscope itemtype="http://schema.org/BiologicalDatabase">
    <a itemprop="url" href="http://purl.uniprot.org/uniprot/">
      <span itemprop="name">UniProt KB</span>
    </a>
  </span>
</div>

<!-- A taxonomy entry that belongs to two taxonomic databases -->
<div itemscope itemtype="http://schema.org/TaxonomicDatabaseEntry">
  <link itemprop="additionalType" href="http://purl.uniprot.org/Taxon">
  <h1><a itemprop="url" href="http://purl.uniprot.org/taxonomy/9606">9606</a></h1>
  <span itemprop="isEntryOf" itemscope itemtype="http://schema.org/BiologicalDatabase">
    <a itemprop="url" href="http://purl.uniprot.org/taxonomy/">
      <span itemprop="name">UniProt Taxonomy</span>
    </a>
  </span>
  <span itemprop="isEntryOf" itemscope itemtype="http://schema.org/BiologicalDatabase">
    <a itemprop="url" href="http://www.ncbi.nlm.nih.gov/taxonomy/">
      <span itemprop="name">NCBI Taxonomy</span>
    </a>
  </span>
</div>

I hope that this helps resolve the confusion I caused.
Regards,
Jerven Bolleman

[1] https://en.wikipedia.org/wiki/Homo_erectus / https://ja.wikipedia.org/wiki/%E3%83%9B%E3%83%A2%E3%83%BB%E3%82%A8%E3%83%AC%E3%82%AF%E3%83%88%E3%82%B9

Andrea Splendiani

Aug 12, 2012, 9:32:06 PM
to biohac...@googlegroups.com
Hi,

as a side question arising from this discussion:
Do you think there is still some space (in the technological landscape) for RDFa?
As far as I can see (perhaps not far...), the interface between web content and structured data is of primary interest for search-engine-related issues. This was the area that RDFa was meant to cover, and it is now probably filled by microdata.
Other than search engine friendliness... why embed RDF in HTML, rather than just having an RDF header with the data content or, better, a link to RDF?

Just curious to know opinions...

best,
Andrea

Rutger Vos

Aug 13, 2012, 3:58:45 AM
to biohac...@googlegroups.com
I hope there is, we're using it - though not in HTML but in XML:
http://nexml.org
--
Dr. Rutger A. Vos
Bioinformaticist
NCB Naturalis
Visiting address: Office A109, Einsteinweg 2, 2333 CC, Leiden, the Netherlands
Mailing address: Postbus 9517, 2300 RA, Leiden, the Netherlands
http://rutgervos.blogspot.com

Joachim Baran

Aug 13, 2012, 10:22:23 AM
to biohac...@googlegroups.com
Hello,

On 13 August 2012 03:58, Rutger Vos <rutge...@gmail.com> wrote:
> I hope there is, we're using it - though not in HTML but in XML: http://nexml.org

On Mon, Aug 13, 2012 at 5:32 AM, Andrea Splendiani <and...@sgtp.net> wrote:
> Do you think there is still some space (in the technological landscape) for RDFa ?
> [...]
> Other than search engine friendliness... why embedding RDF in html, and not just having an RDF header with the data content or, better, a link to RDF ?

  Perhaps we should take some time during the hackathon and get together over a beer to jot down current uses of Semantic Web formats/technologies in bioinformatics. We could then capture opinions and write up a recommendation/publication, spiced up with a literature review on the current state of the art of the Semantic Web and its anticipated future.

  I think that would benefit the community rather well, because I am usually having a rather hard time advocating the use of Semantic Web technologies due to the many approaches toward representing semantic annotations. An overview with good recommendations that are backed up by references would provide people from outside of the field a good introduction, whilst it also might help researchers within the field to position themselves more distinctively.

  Would that be something interesting to pursue?

Best,
Joachim

Andrea Splendiani

Aug 13, 2012, 10:36:57 AM
to biohac...@googlegroups.com
Hi,

I think it would be a good idea, if coming from the right perspective.
I don't read generic presentations of the benefits of semantic web technologies anymore ;)
It would be interesting to start from a non-semantic-web perspective: take a couple of paradigmatic problems (and the associated resources) and discuss how sem web technologies can or can't help, and at what "cost".
Cost (in a generic sense) is a big point...

Definitely something to discuss over a beer, or two, or three...

Best,
Andrea

Sent from my iPad


Joachim Baran

Aug 13, 2012, 11:21:22 AM
to biohac...@googlegroups.com
Hello,

On 13 August 2012 10:36, Andrea Splendiani <and...@sgtp.net> wrote:
> I think it would be a good idea, if coming from the right perspective.
> I don't read anymore generic presentations of benefits of semantic web technologies ;)

  Oh yes, I absolutely agree with that.

Joachim

Pjotr Prins

Aug 13, 2012, 1:16:05 PM
to biohac...@googlegroups.com
On Mon, Aug 13, 2012 at 11:21:22AM -0400, Joachim Baran wrote:
> Hello,
>
> On 13 August 2012 10:36, Andrea Splendiani <and...@sgtp.net> wrote:
>
> I think it would be a good idea, if coming from the right perspective.
> I don't read anymore generic presentations of benefits of semantic web
> technologies ;)
>
> Oh yes, I absolutely agree with that.
> Joachim

So why is RDF not more ubiquitous/common? I think these generic
presentations don't really make a great selling point. Name me one
convincing paper for bioinformatics I can show my boss.

Pj.

Joachim Baran

Aug 13, 2012, 1:46:00 PM
to biohac...@googlegroups.com
Hello,

On 13 August 2012 13:16, Pjotr Prins <pjot...@thebird.nl> wrote:
> [...] I think these generic presentations don't really make a great selling point.

  Unless I misunderstood Andrea, we all agree that generic presentations are not sufficient anymore.

> [...] Name me one convincing paper for bioinformatics I can show my boss.

  Well, let's work on that. :)

Best wishes,
Joachim 

Pjotr Prins

Aug 13, 2012, 4:51:27 PM
to biohac...@googlegroups.com
On Mon, Aug 13, 2012 at 01:46:00PM -0400, Joachim Baran wrote:
> Hello,
>
> On 13 August 2012 13:16, Pjotr Prins <pjot...@thebird.nl> wrote:
>
> [...] I think these generic presentations don't really make a great
> selling point.
>
> Unless I misunderstood Andrea, then we are all agreeing that generic
> presentations are not sufficient anymore.

I don't agree. Unless you show me a convincing 'presentation' in the
form of a paper.

> [...] Name me one convincing paper for bioinformatics I can show my
> boss.
>
> Well, let's work on that. :)

Sure. By the sound of it, we need two papers. One delivering the
'state-of-the-art', the other a 'convincing argument for using RDF' in
bioinformatics. I don't see how we can marry the two. But many drinks
may settle that, I am happy to oblige.

Truth is, I think in all existing presentations, RDF is missold to the
general audience. RDF's strength is *not* about sharing data. And
because it put *me* on the wrong foot for years, I think it is time for
a generic presentation that can sell RDF to anyone in bioinformatics
:)

Again, point me to the paper that does that. If there is such a
paper, we don't need to redo it. Eagerly awaiting a reference...

Pj.

Andrea Splendiani

Aug 13, 2012, 7:44:36 PM
to biohac...@googlegroups.com
Hi,

that's the problem with "generic" presentations. In my experience with biologists, I was able to sell quite complex approaches to information design. But the problem came with the question: where is the data ?
Perhaps at this stage the semantic web requires more engineering than innovation...

ciao,
Andrea

Andrea Splendiani

Aug 13, 2012, 7:54:31 PM
to biohac...@googlegroups.com
Hi,

I guess I answered too early in the thread ;)

>> On 13 August 2012 13:16, Pjotr Prins <pjot...@thebird.nl> wrote:

>> [...] I think these generic presentations don't really make a great
>> selling point.

>> Unless I misunderstood Andrea, then we are all agreeing that generic
>> presentations are not sufficient anymore.
>
> I don't agree. Unless you show me a convincing 'presentation' in the
> form of a paper.

I have read a lot of "presentations" on the virtues of the semantic web in the life sciences. They all make sense, but there is a limit to how "convincing" they can be.
In a sense, the benefits are still to be proved...

>
>> [...] Name me one convincing paper for bioinformatics I can show my
>> boss.
>>
>> Well, let's work on that. :)
>
> Sure. By the sound of it, we need two papers. One delivering the
> 'state-of-the-art', the other a 'convincing argument for using RDF' in
> bioinformatics. I don't see how we can marry the two. But many drinks
> may settle that, I am happy to oblige.

Then the main question is: sake or beer?

On the "state of the art": I don't have a strong opinion on this. Perhaps it makes sense to make a point, as schema.org and a few other things have actually changed the landscape.

> Truth is, I think in all existing presentations, RDF is missold to the
> general audience. RDF's strength is *not* about sharing data. And
> because it put *me* on the wrong foot for years, I think it is time for
> a generic presentation that can sell RDF to anyone in bioinformatics
> :)

Uhm... I think bioinformatics has its biases, and for some socio-historic reason RDF has been sold as a data sharing solution in some cases. Which it may or may not be (sharing is a part of communication, in the end).

> Again, point me to the paper that does that. If there is such a
> paper, we don't need to redo it. Eagerly awaiting a reference...

I know many. What is it that you find missing in, say, just to mention one we know, Eric's paper in BIB some years ago?
Just to know what you are looking for...

best,
Andrea


Jerven Bolleman

Aug 14, 2012, 4:23:51 AM
to biohac...@googlegroups.com
Hi All,

The reason to use RDF and SPARQL is in my opinion an economic one not
a scientific one. Therefore to convince your boss to use it does not
require an academic paper but a budget.

There are two major reasons that RDF and SPARQL are slow to grow/hype.
The first is that they depend on a network effect. The second is the
technical maturity of the tools.
The network effect is simple: data integration is only cheaper if
multiple datasets of interest are available as RDF. The one that was
harder to solve is technical maturity.

Today I think these two obstacles have been overcome. On the network
effect, my 15-minute presentation will combine the SPARQL endpoints of at
least 4 primary data sources and answer interesting questions, i.e.
there is a network. On technical maturity: today I can recommend
four stores that can achieve performance at scale. Two years ago I could
not recommend any.

Personally, I would rather write a set of tutorials about sharing your data
in a SPARQL endpoint, and have a paper discussing the results of a
SPARQL query that answers a difficult and interesting problem.

Regards,
Jerven
--
Jerven Bolleman
m...@jerven.eu

Pjotr Prins

Aug 14, 2012, 4:48:03 AM
to biohac...@googlegroups.com
On Tue, Aug 14, 2012 at 10:23:51AM +0200, Jerven Bolleman wrote:
> Hi All,
>
> The reason to use RDF and SPARQL is in my opinion an economic one not
> a scientific one. Therefore to convince your boss to use it does not
> require an academic paper but a budget.

Most bosses do not provide a budget without justification (mine is not
pointy-haired). A well-written, well-known paper helps justify a budget.

All I am saying, is that until now I have not seen a convincing paper
for use and uptake of RDF in Bioinformatics. If this is a
representative example:

http://www.biomedcentral.com/content/pdf/1471-2105-8-S3-S1.pdf

you can see what I mean. The following passage, which I quote,

"""To achieve this goal, the Semantic Web community has proposed and
developed new standard Web languages such as RDF (the Resource
Description Framework) [3] and OWL (the Web Ontology Language) [4],
which provide enhanced capability for resource description and
knowledge representation going far beyond the content presentation
capabilities of HTML language and data tagging capabilities of the XML
language."""

is in some sense descriptive, but it is not meant to give real insight. It
is bound to make my boss's eyes glaze over (and mine too).

What paper, or even web resource, sets a bioinformatician on track for
even trying RDF and SPARQL? In the field I see very little uptake of
what is in principle very exciting stuff, and *we* know it can help
research groups tremendously (who are now mired in the mud of
spreadsheets and SQL models). Come on guys and gals, I am talking
excitement here!

I am not receiving real references to papers I can use, so I assume
there are none and we have some evangelizing to do. Prove me
wrong. Not by arguing, but with a reference. Only a really good
paper will convince me there is no need to write one.

Pj.

Rutger Vos

Aug 14, 2012, 7:29:30 AM
to biohac...@googlegroups.com
I'm going to have to agree with Pjotr here, especially if he means a
more practical paper that actually *demonstrates* the utility of RDF
in a bioinformatics context. The starry-eyed blue sky papers and
talks, we've all seen them.

Mark

Aug 10, 2012, 6:47:12 AM
to biohac...@googlegroups.com
+1 to the Taxon concept idea. There are SIO predicates for attaching taxon-database IDs to this concept.

M

Bruno Aranda

Aug 14, 2012, 7:48:00 AM
to biohac...@googlegroups.com
Hi all,

Although I am not attending the BioHackathon this year, here are my thoughts :)

Basically, one of the major things limiting the adoption of Linked Data is that the field is highly academic. In addition, there is the popularity of Big Data, which helps to solve most of the simpler problems and which everybody talks about. If you want to argue for how useful RDF can be for bioinformatics, it would be good to discuss how a model that uses graph data is simpler to develop and extend, and facilitates insight and data analysis, compared to the Big Data approach of using highly uniform resources with limited use cases. So, if you really want to analyse data, graph data is the way to go.

However, sometimes I feel that we are letting the solution drive the problem, instead of the problem driving a solution. RDF is a solution. Is it really addressing the main problem?

Cheers,

Bruno

Rutger Vos

Aug 14, 2012, 7:55:19 AM
to biohac...@googlegroups.com
May I just issue a slight word of warning: taxon concepts are hard,
figuring out how to model them is TDWG's core business and they
haven't (fully) solved it in decades so one hackathon won't do it
either. Be prepared to dive into the scary world of homonyms,
synonyms, different authorities, etc.

Hilmar Lapp

Aug 14, 2012, 8:46:22 AM
to biohac...@googlegroups.com
Plus, while taxonomists obsess about them, nobody else cares about taxon concepts, including us. That's why I'm always a little scared when non-taxonomists talk about representing and linking taxon concepts - it's likely that they don't know what they are talking about, and that they don't need to know either.

-hilmar

Sent with a tap.

Fumihiro Kato

Aug 14, 2012, 9:26:07 AM
to biohac...@googlegroups.com
This discussion is interesting for me as we have decided to manage
only taxon names and taxon ranks for Japanese common names and
scientific names because it was very hard for us to recognize and define
what taxon concepts were.

Fumi

Andrea Splendiani

Aug 14, 2012, 9:45:00 AM
to biohac...@googlegroups.com
Hi,

so, on why there is a (relatively) limited adoption of sem-web technologies, a few thoughts in an unordered list:

-) Not all bits of it have the same levels of maturity and potential practical impact. Yet, when sem-web technologies are presented, people tend to present the whole picture. A bit like selling a car with an AI pilot: maybe that's the final vision of the car, but not the one that would go on a street complying with regulations today. 

-) In all the presentations on the benefits of sem-web technologies, there is always little "hard" evidence: numbers. Besides, in practice many examples provided so far centralize all data in some repository. For the uninitiated, this is yet another ETL approach.

-) For a technological framework, there have been too many proofs of principle relative to valuable solutions. Big Data is hyped because Hadoop was there to develop solutions on top of, not because of some prototype approach or proof of concept in massively parallel computation.

-) Publishing data is relatively easy; generating or maintaining it is not. As a result, we had a lot of data of unclear value on the sem web. Now this is changing...

-) It all works well with open data and publicly funded data. It is still unclear how the core shareability implied by sem web technologies would interact with a commercial environment.

-) Not all problems have the sem-web as a silver bullet: can't see easily how this would help in next-gen, and in more "approximate" spaces. But I didn't look into this deep enough, perhaps.

-) As a paradigm shift, you just get a lot of resistance out of "human factors".

Just a few thoughts...

ciao,
Andrea

MORITA Mizuki

Aug 14, 2012, 10:08:49 AM
to biohac...@googlegroups.com
Dear Hilmar,

Thank you for letting me know. I will start with TDWG.

Regards,
Mizuki

--

MORITA Mizuki
森田 瑞樹


MORITA Mizuki

Aug 14, 2012, 10:14:45 AM
to biohac...@googlegroups.com
Dear Jerven,

I think I see! Currently users can use only NCBI Taxonomy IDs, and you
mean we should let users choose their preferred taxonomy ID
collection. Is that right?

Regards,
Mizuki

--

MORITA Mizuki
森田 瑞樹


Jerven Bolleman

Aug 14, 2012, 10:18:31 AM
to biohac...@googlegroups.com
Exactly!
--
Jerven Bolleman
m...@jerven.eu

MORITA Mizuki

Aug 14, 2012, 10:22:12 AM
to biohac...@googlegroups.com
Dear Mark,

> There are SIO predicates for attaching taxon-database ID's to this concept.

That sounds very nice. I'm interested in it. Could you recommend a
reference if you have one?

Regards,
Mizuki

--

MORITA Mizuki
森田 瑞樹


MORITA Mizuki

Aug 14, 2012, 11:25:35 AM
to biohac...@googlegroups.com
Dear Jerven and others,

Thank you! Examples help us overcome the language barrier.

So, the issue is the balance between simplicity and expressiveness.

Previously, people at a cell bank told me that they want to
distinguish human races for the human cells in the bank, and that NCBI
Taxonomy is not enough for this. Now, Jerven has raised another issue
and proposed an alternative.

I think that in many cases NCBI Taxonomy is enough for the purpose
of improving search engine results, and that it is easier to use if only
one ID collection is allowed. But I'm not fully convinced.

Does anyone have another idea or advice?

Regards,
Mizuki

--

MORITA Mizuki
森田 瑞樹


Thomas Lütteke

Aug 14, 2012, 1:02:17 PM
to biohac...@googlegroups.com
Am 14.08.2012 14:46, schrieb Hilmar Lapp:
> Plus, while taxonomists obsess about them, nobody else cares about taxon concepts, including us. That's why I'm always a little scared when non-taxonomists talk about representing and linking taxon concepts - it's likely that they don't know what they are talking about, and that they don't need to know either.
>
> -hilmar
Hi Hilmar,

I agree that people might not know what they are talking about when
talking about taxa, but I disagree with your statement that they don't
need to know. From the computer science side of bioinformatics, taxa
might not be useful or even necessary, but from the biological side
they can be very useful / important, as taxon concepts allow comparing
data of groups of individuals that have a certain kind of relationship.
For example, you might be interested in finding specific differences
between humans (or mammals in general) and certain groups of pathogens;
and such differences can be important to identify potential targets to
attack the pathogens without harming the host.

Therefore, it is important to include taxonomic information in your
data, and it is also important to do this in a comparable way, i.e. to
ideally use unique names or IDs. I know that this is not an easy task
given the large number of different common names etc., but nevertheless
it is an important task and definitely nothing that people do not need
to think about.

Best regards,
Thomas

Hilmar Lapp

Aug 14, 2012, 1:29:46 PM
to biohac...@googlegroups.com
Thomas,

I wasn't arguing against including taxonomic information at all - quite the opposite. Taxon concepts are not the same as taxonomic designation or identification, though, nor are they about unique IDs. NCBI Taxonomy doesn't have taxon concepts, nor has anyone missed them there.

Taxon concepts are important to track the history of taxa and their nomenclatural acts, including publications, descriptions, and voucher specimens. I recommend the following article as an introduction:

Franz, N, R Peet, and A Weakley. 2008. “On the Use of Taxonomic Concepts in Support of Biodiversity Research and Taxonomy.” New Taxonomy Proceedings (Systematics Association).
http://labs.bio.unc.edu/Peet/pubs/Cardiff.pdf

On taxonomies and taxon concepts in the context of reasoning and the semantic web, I recommend this paper:
Franz, Nico M., and David Thau. 2010. “Biological Taxonomy And Ontology Development: Scope And Limitations.” Biodiversity Informatics 7: 45–66.
https://journals.ku.edu/index.php/jbi/article/viewArticle/3927

-hilmar

Jerven Bolleman

Aug 14, 2012, 3:27:15 PM
to biohac...@googlegroups.com
I am really sorry to have used the term "Taxon Concept". What I really
meant was: instead of using an NCBI tax ID, use a URL. That allows
everyone to be very flexible and do what they need.
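
In markup terms, the difference could be sketched roughly as below. The property names taxonID and taxon are hypothetical, chosen only for this illustration and not taken from the proposal; the type name and the URL patterns are the ones used in the example earlier in this thread.

```html
<!-- Option 1 (hypothetical property name taxonID): a bare NCBI Taxonomy identifier as text -->
<div itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
  <a itemprop="url" href="http://purl.uniprot.org/uniprot/P05067">P05067</a>
  <span itemprop="taxonID">9606</span>
</div>

<!-- Option 2 (hypothetical property name taxon): the organism given as a URL,
     so the target can be an entry in any taxonomy database -->
<div itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
  <a itemprop="url" href="http://purl.uniprot.org/uniprot/P05067">P05067</a>
  <a itemprop="taxon" href="http://purl.uniprot.org/taxonomy/9606">Homo sapiens</a>
</div>
```

With the second form, a database covering organisms that NCBI Taxonomy lacks could point at whatever taxonomy resource it uses.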
--
Jerven Bolleman
m...@jerven.eu