Here are some pointers for further investigation about this topic, which
is still more an area of research than something broadly applied in practice:
1. SemWebQuality.org 
2. qualitywebdata.org 
3. Quality Criteria for Linked Data sources 
Just to add to Bob's list,
On 14/10/2011 09:30, Bob Ferris wrote:
> Hi Bastin,
> here are some pointers for further investigation about this topic, which
> is still more an area of research, rather than broadly applied in practice:
> 1. SemWebQuality.org 
> 2. qualitywebdata.org 
> 3. Quality Criteria for Linked Data sources 
4. pedantic-web.org [4a] esp. the FOPs page [4b]
5. "Weaving the Pedantic Web" LDOW paper 
6. State of the LOD cloud 
7. Denny Vrandecic's thesis  on quality of Web ontologies
8. Halpin et al. [8a] Ding et al. [8b] on quality of owl:sameAs
...to name but a few resources that come to mind.
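To make the owl:sameAs quality problem from [8a]/[8b] concrete: since owl:sameAs is symmetric and transitive, a single erroneous link can merge two identity clusters that should stay separate. A minimal, self-contained sketch in Python (plain tuples instead of an RDF library; all URIs here are invented for illustration):

```python
# Hypothetical owl:sameAs links harvested from several sources.
# All URIs are made up for illustration only.
same_as = [
    ("ex:ParisFR", "dbp:Paris"),
    ("dbp:Paris", "geo:2988507"),
    ("ex:ParisTX", "ex:ParisTexas"),
    # One erroneous link, of the kind the papers above report in practice:
    ("ex:ParisTexas", "dbp:Paris"),
]

def same_as_clusters(pairs):
    """Compute identity clusters under owl:sameAs
    (symmetric + transitive closure) via union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for a, b in pairs:
        union(a, b)

    clusters = {}
    for node in parent:
        clusters.setdefault(find(node), set()).add(node)
    return list(clusters.values())

# With the erroneous link, Paris (France) and Paris (Texas)
# collapse into a single cluster of 5 URIs; without it, there
# are two separate clusters.
print(same_as_clusters(same_as))
```

Dropping the last (bad) link from the list yields two clusters again, which is the point the sameAs-quality papers make: one wrong assertion silently corrupts every identity in both chains.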
>  http://semwebquality.org
>  http://qualitywebdata.org/
I'm wondering about something important.
The era of Linked Data started in October 2007. Did it create, from
scratch, a new dimension for the earlier work published about the data
quality of the Semantic Web?
What I mean is that I found many papers from 2001 to 2007 discussing
"metadata quality evaluation and assurance", such as:
- Detecting Quality Problems in Semantic Metadata without the Presence
of a Gold Standard
- A Framework for Evaluating Semantic Metadata
- An Infrastructure for Acquiring High Quality Semantic Metadata
- On-To-Knowledge: Semantic Web Enabled Knowledge Management
- Metrics for Evaluation of Ontology-based Information Extraction
- Strategies for the Evaluation of Ontology Learning
To put it another way: will the evaluation and assurance of data
quality (metric measurement and metric evaluation) then be concentrated
on the Linked Data context of the Semantic Web?
Thanks a lot in advance
So instead of discussing Semantic Web quality within the scope of these
papers, will we now evaluate data quality in the context of Linked
Data?
Not sure I fully understand your remarks (roughly speaking, that Linked
Data quality papers are reinventing the wheel?), but in general I would
say that the challenges for Linked Data are significantly different
from the challenges identified in most traditional Semantic Web
literature. This is very much true of issues regarding data quality, etc.
Speaking in a very, very general sense, the shift in requirements comes
from putting more focus on the "Web" and less focus on "Semantics". This
is quite a major shift. Certainly, Linked Data quality can learn from
earlier pre-2007 papers on the topic, but directly translating their
results to modern Linked Data is (IMO) not always easy or even possible.
Thanks for your reply and attention. I will try to reformulate my
question. I also have some questions concerning your answers.
Are the quality measurements used to evaluate the Semantic Web before
Linked Data the same as the quality measurements used to evaluate the
Semantic Web after Linked Data?
What I understand is that the Semantic Web after October 2007 (after
the Linked Data project) is defined as a Web of Data in which all data
are interlinked with each other, and this is the goal they have wanted
to reach for the Semantic Web since 1999 (but which was not realized
until 2007, due to the heterogeneity of data sources).
Sorry if I have misunderstood the above, but please clarify it for me.
My questions concerning your answers :
> but in general I would say that challenges for Linked Data are significantly different than the challenges
> identified in most traditional Semantic Web literature
With Linked Data, will we have a new Semantic Web based on Linked Data,
while the traditional one that you mention (before Linked Data)
disappears? Sorry if it is a stupid question.
> Speaking in a very very general sense, the shift in requirements comes from
> putting more focus on the "Web" and less focus on "Semantics".
I can't understand what you mean here by more focus on the "Web" and
less focus on "Semantics". All the W3C Recommendations provide
different languages -- RDF, RDF(S), OWL, etc. -- which are characterized
by their formal semantics (which grow stronger as we move up the
semantic layers). Meanwhile, Linked Data is built upon such languages
to describe different resources.
> Certainly, Linked Data quality can learn from pre-2007
> earlier papers on the topic,
Here I am confused: do there exist papers published about the data
quality of Linked Data before 2007? How?
> but directly translating results to modern
> Linked Data is (IMO) not always easy or even possible.
So it appears from your answer that there is an old Linked Data and a
modern Linked Data. What does "IMO" stand for? If there are two types
of Linked Data, what is the difference between them? I think the
modern one is the one based on the design principles (RDF, URIs, and
HTTP).
A question now outside of this discussion: I would like to know your
opinion about the quality of the Semantic Web, and to what degree we
can rely on it.
Thanks in advance
My best Regards
Your post indicates some misunderstandings that I would like to clarify. I might have a fringe point of view on this, and I am happy for anyone to correct me or disagree.
There is no semantic web before the LOD web. The semantic web, from the beginning, was built on URIs, RDF, etc. Long before 2007, data was published following the LOD principles; they were just not explicated by TimBL. Just to name two examples: the AIFB portal was publishing data before 2003, enabling follow-your-nose discovery, and Semantic MediaWiki was publishing its data the same way in 2006.
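For readers unfamiliar with the term: "follow your nose" discovery means that dereferencing a resource's URI yields RDF that mentions further URIs, which can in turn be dereferenced. A minimal sketch in Python, simulating dereferencing with an in-memory dictionary instead of real HTTP lookups with content negotiation; all URIs and data are invented for illustration:

```python
from collections import deque

# Simulated web: URI -> triples returned when that URI is dereferenced.
# In a real crawler this would be an HTTP GET with content negotiation
# (Accept: text/turtle or application/rdf+xml).
WEB = {
    "ex:alice": [("ex:alice", "foaf:knows", "ex:bob")],
    "ex:bob":   [("ex:bob", "foaf:knows", "ex:carol"),
                 ("ex:bob", "foaf:name", '"Bob"')],
    "ex:carol": [("ex:carol", "foaf:name", '"Carol"')],
}

def follow_your_nose(seed, dereference):
    """Breadth-first discovery: dereference each URI and queue every
    new URI mentioned in the triples that come back."""
    seen, queue, triples = {seed}, deque([seed]), []
    while queue:
        uri = queue.popleft()
        for s, p, o in dereference(uri):
            triples.append((s, p, o))
            for term in (s, o):
                # Only dereference URIs, never literals.
                if not term.startswith('"') and term not in seen:
                    seen.add(term)
                    queue.append(term)
    return triples

discovered = follow_your_nose("ex:alice", lambda u: WEB.get(u, []))
```

Starting from `ex:alice` alone, the crawl reaches `ex:bob` and then `ex:carol`, which is exactly why publishing data at the URI itself matters: if the triples are not served there (e.g. only inside a ZIP file elsewhere), the chain breaks.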
The major difference is a major shift in research and especially the development community surrounding the semantic web. We moved from a focus on ontologies, decidability of ontology languages, modeling methodologies etc. to a more technical focus on publishing data on the web. But the LOD principles were not new.
Thus, there is no pre-2007 semantic web. There was no difference in heterogeneity on the semantic web that hampered any uptake.
The other difference was that many of the ontologies that were developed were published, if at all, somewhere on the web, without any connection to their URI -- sometimes even in a ZIP file containing several such ontologies. For many of the research questions that were attacked back then this was acceptable, but not for a web of linked data, because the links could not be followed.
Regarding quality of data on the semantic web: the research done before linked data was taken seriously focuses mostly on questions of modeling quality (e.g. OntoClean), accuracy of the data, the possibility of effective reasoning, formal-logical consistency (there are tons of papers on that one!), etc. Almost none of the work focused on the web aspects of the data.
Aidan already mentioned my thesis. I think it is fair to say that it builds a bridge from the pre-2007 world (with its focus on ontologies) to the post-2007 world (with its focus on linked data). I have tried to look at everything from the lowest syntactic aspects -- how should URIs be formed? -- through the integration of the ontology into the web of data, all the way to usability aspects in applications.
The LOD principles are necessary for the semantic web to come around. But they are not sufficient. We have a paper at ISWC next week arguing about the shortcomings of the current usage of labels on the web. Another necessary step. But there is quite some work left to do.
Now, closing -- your question is about "the quality of the semantic web, to which degree we can rely on". I would say that the question does not make much sense. Would you ask about the quality of the web? Would you ask how much you can rely on the web itself? It is obvious that there are many sites you cannot rely on, and others that you can. This kind of question is very complex, and in Chris Bizer's thesis -- also mentioned by Aidan -- you can find a number of ideas on how to start thinking about it.
So really, what do you mean by your question? How accurate is the data out there on the web? How well are the technical standards implemented (which is what this mailing list is mostly concerned with)? How widely shared are the formalized conceptualizations? How useful is the data for my application? All of these are valid interpretations -- with very different answers. But talking about quality in general does not really make sense. If you don't believe me, try Zen and the Art of Motorcycle Maintenance.
I hope this helps,
sorry for the long post,
Sorry, I could have been clearer earlier. +1 to Denny's comments below.
> what is the IMO stands for?
"in my opinion". ;)
would you mind moving that quite general discussion somewhere else?
publi...@w3.org would be suitable.
"Note that the mailing list is not a general discussion list or Q&A board, and
we discourage any traffic that is not directly concerned with fixing problems
with data that is already published on the Web."