[CFP] Second International Workshop on Semantic Statistics (SemStats 2014)

Sarven Capadisli

Jun 24, 2014, 3:04:09 AM

to publishing-st...@googlegroups.com

SemStats 2014 Call for Papers
=============================

Second International Workshop on Semantic Statistics (SemStats 2014)

Workshop website: http://semstats.org/
Event hashtags: #SemStats #ISWC2014

in conjunction with

ISWC 2014
The 13th International Semantic Web Conference
Riva del Garda - Trentino, Italy, October 19-23, 2014
http://iswc2014.semanticweb.org/

Workshop Summary
================

The goal of this workshop is to explore and strengthen the relationship
between the Semantic Web and statistical communities, to provide better
access to the data held by statistical offices. It will focus on ways in
which statisticians can use Semantic Web technologies and standards in
order to formalize, publish, document and link their data and metadata.
It follows the first Semantic Statistics workshop, held at ISWC 2013
(SemStats 2013, http://www.datalift.org/en/event/semstats2013), which was
a success, attracting more than 50 participants throughout the day.

The statistical community is showing growing interest in the Semantic
Web. In particular, initiatives have been launched to develop semantic
vocabularies representing statistical classifications and discovery
metadata. Tools are also being created by statistical organizations to
support the publication of dimensional data conforming to the Data Cube
W3C Recommendation. But statisticians see challenges in the Semantic
Web: how can data and concepts be linked in a statistically rigorous
fashion? How can we avoid fuzzy semantics leading to wrong analyses? How
can we preserve data confidentiality?

The workshop will also cover the question of how to apply statistical
methods or treatments to linked data, and how to develop new methods and
tools for this purpose. Except for visualisation techniques and tools,
this question is relatively unexplored, but the subject is likely to
grow in importance in the near future.

Motivation
==========

There is growing interest in linked data and the Semantic Web
in the statistical community. A large amount of statistical data from
international and national agencies has already been published on the
web of data, for example Census data from the U.S., Spain or France,
amongst others. In most cases, though, this publication is done by
people outside the statistical office (see also
http://datahub.io/dataset/istat-immigration, http://270a.info/ or
http://eurostat.linked-statistics.org/), which raises issues such as
long-term URI persistence, institutional commitment and data maintenance.

Statistical organisations are also interested in how the Semantic Web
might make it simpler for analysts to use well-described statistical data
in conjunction with other forms of data (e.g. geospatial information,
scientific data, "big data" from various sources) that are expressed
semantically. The ability to bring together diverse types of data in
this way should enable new insights on multifaceted issues.

Statistical organizations also possess an important corpus of structural
metadata such as concept schemes, thesauri, code lists and
classifications. Some of those are already available as linked data,
generally in SKOS format (e.g. FAO's Agrovoc or UN's COFOG). Semantic
Web standards useful to statisticians have now reached maturity.
The best examples are the W3C Data Cube, DCAT and ADMS vocabularies. The
statistical community is also working on the definition of more
specialized vocabularies, especially under the umbrella of the DDI
Alliance. For example, XKOS extends SKOS for the representation of
statistical classifications, and Disco defines a vocabulary for data
documentation and discovery. The Visual Analytics Vocabulary is a first
step towards semantic descriptions of user interface components
developed to visualize Linked Statistical Data, which can lead to
increased linked data consumption and accessibility. We are now at the
tipping point where the statistical and Semantic Web communities need a
formal exchange in order to share experiences and tools and to think
ahead about the upcoming challenges.

Statisticians have a long-standing culture of data integrity, quality and
documentation. They have developed industrialized data production and
publication processes, and they care about data confidentiality and more
generally how data can be used.

The web of data will benefit from rich data published by
professional and trustworthy data providers. It is also important that
metadata maintained by statistical offices like concept schemes of
economic or societal terms, statistical classifications, well-known
codes, etc., are available as linked data, because they are of good
quality, well-maintained, and they constitute a corpus to which a lot of
other data can refer.

It seems that after a period where the aim was to publish as many
triples as possible, the focus of the Semantic Web community is now
shifting to having a better quality of data and metadata, more coherent
vocabularies (see the LOV initiative), good and documented naming
patterns, etc. This workshop aims to contribute to solving these
longer-term problems in order to have a significant impact.

The statistics community sometimes faces challenges when trying to adopt
Semantic Web technologies, in particular:

* difficulty in creating and publishing linked data: this can be alleviated
by providing methods, tools, lessons learned and best practices, by
publicizing successful examples and by providing support.
* difficulty in seeing the purpose of publishing linked data: we must
develop end-user tools leveraging statistical linked data, provide
convincing examples of real use in applications or mashups, so that the
end-user value of statistical linked data and metadata appears more clearly.
* difficulty in using external linked data in their daily activity: it is
important to develop statistical methods and tools especially tailored
for linked data, so that statisticians can get accustomed to using them
and become convinced of their specific utility.

To conclude, statisticians know how misleading it can be to exploit
semantic connections without carefully considering and weighing
information about the quality of these connections, the validity of
inferences, etc. A challenge for them is to determine, to ensure and to
inform consumers about the quality of semantic connections which may be
used to support analysis in some circumstances but not others. The
workshop will enable participants to discuss these very important issues.

Topics
======

The workshop will address topics related to statistics and linked data.
This includes but is not limited to:

How to publish linked statistics?

* What are the relevant vocabularies for the publication of statistical
data?
* What are the relevant vocabularies for the publication of statistical
metadata (code lists and classifications, descriptive metadata,
provenance and quality information, etc.)?
* What are the existing tools? Can the usual statistical software
packages (e.g. R, SAS, Stata) do the job?
* How do we include linked data production and publication in the data
lifecycle?
* How do we establish, document and share best practices?

How to use linked data for statistics?

* Where and how can we find statistical data: data catalogues, dataset
descriptions, data discovery?
* How do we assess data quality (collection methodology, traceability,
etc.)?
* How can we perform data reconciliation, ontology matching and instance
matching with statistical data?
* How can we apply statistical processes on linked data: data analysis,
descriptive statistics, estimation, correction?
* How can we intuitively represent statistical linked data: visual
analytics, results of data mining?

Submissions
===========

This workshop is aimed at an interdisciplinary audience of researchers
and practitioners involved or interested in Statistics and the Semantic
Web. All papers must represent original and unpublished work that is not
currently under review. Papers will be evaluated according to their
significance, originality, technical content, style, clarity, and
relevance to the workshop. At least one author of each accepted paper is
expected to attend the workshop.

Workshop participation is available to ISWC 2014 attendees at an
additional cost, see http://iswc2014.semanticweb.org/registration for
details.

The workshop will also feature a challenge based on Census Data
published on the web or provided by Statistical Institutes. It is
expected that data from Australia, France and Italy will be available.
The challenge will consist of creating mashups or visualizations, as
well as comparing, aligning and enriching the data and concepts
involved.

We welcome the following types of contributions:

* Full research papers (up to 12 pages)
* Short papers (up to 6 pages)
* Challenge papers (up to 6 pages)

All submissions must be written in English and must be formatted
according to the information for LNCS Authors (see
http://www.springer.com/computer/lncs?SGWID=0-164-6-793341-0). Please
note that (X)HTML(+RDFa) submissions are also welcome as long as the
layout complies with the LNCS style. Authors can for example use the
template provided at https://github.com/csarven/linked-research.
Submissions are NOT anonymous. Please submit your contributions
electronically in PDF format at
http://www.easychair.org/conferences/?conf=semstats2014 before July
7, 2014, 23:59 Hawaii Time. All accepted papers will be archived in
electronic proceedings published by CEUR-WS.org.

See important dates and contact info on the workshop home page.

If you are interested in submitting a paper but would like more
preliminary information, please contact semsta...@easychair.org.

Chairs
======

* Sarven Capadisli, University of Leipzig, Germany, and Bern University
of Applied Sciences, Switzerland
* Franck Cotton, INSEE, France
* Armin Haller, CSIRO, Australia
* Alistair Hamilton, ABS, Australia
* Monica Scannapieco, Istat, Italy
* Raphaël Troncy, EURECOM, France

Program Committee
=================

* Phil Archer, W3C, i-sieve, UK
* Ghislain Auguste Atemezing, Eurecom, France
* Jay Devlin, Statistics New Zealand, New Zealand
* Miguel Expósito Martín, Instituto Cántabro de Estadística, Spain
* Dan Gillman, US Bureau of Labor Statistics, USA
* Arofan Gregory, Metadata Technology NA, USA
* Tudor Groza, School of ITEE, The University of Queensland, Australia
* Christophe Guéret, Data Archiving and Networked Services (DANS), The
Netherlands
* Andreas Harth, AIFB, Karlsruhe Institute of Technology, Germany
* Hak Lae Kim, Samsung Electronics
* Laurent Lefort, CSIRO ICT Centre, Australia
* Domenico Lembo, Sapienza University of Rome, Italy
* Vincenzo Patruno, Istat, Italy
* Marco Pellegrino, Eurostat, Luxembourg
* Dave Reynolds, Epimorphics, UK
* Hideaki Takeda, National Institute of Informatics, Japan
* Wendy Thomas, Minnesota Population Center, USA
* Bernard Vatant, Mondeca, France
* Boris Villazón-Terrazas, iSOCO, Intelligent Software Components, Spain
* Joachim Wackerow, GESIS - Leibniz Institute for the Social Sciences,
Germany
* Stuart Williams, Epimorphics, UK
