ANN: LDIF - Linked Data Integration Framework Version 0.2 released

Andrea Matteini

Aug 31, 2011, 11:49:37 AM
to LDIF
Hi all,

We are happy to announce the release of Version 0.2 of LDIF, the
Linked Data Integration Framework.

The LDIF - Linked Data Integration Framework can be used within Linked
Data applications to translate heterogeneous data from the Web of
Linked Data into a clean local target representation while keeping
track of data provenance. LDIF provides an expressive mapping language
for translating data from the various vocabularies that are used on
the Web into a consistent, local target vocabulary. LDIF includes an
identity resolution component which discovers URI aliases in the input
data and replaces them with a single target URI based on user-provided
matching heuristics. For provenance tracking, LDIF employs the Named
Graphs data model.
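
To make the identity resolution step concrete, here is a minimal Python sketch. It is not LDIF's actual implementation or API. It assumes the matching heuristics have already produced pairs of URIs judged to be aliases, and it picks the lexicographically smallest URI of each alias cluster as the target URI, which is purely an illustrative choice. All names (Quad, find, union, resolve_aliases) are hypothetical. Statements are modelled as quads so that the graph name, which carries the provenance, survives the rewrite.

```python
# Illustrative sketch only; not LDIF's actual implementation or API.
from typing import Dict, Iterable, List, Tuple

# A quad is (subject, predicate, object, graph); the graph name carries provenance.
Quad = Tuple[str, str, str, str]


def find(parent: Dict[str, str], uri: str) -> str:
    """Return the representative URI of the alias cluster containing uri."""
    parent.setdefault(uri, uri)
    while parent[uri] != uri:
        parent[uri] = parent[parent[uri]]  # path halving keeps lookups short
        uri = parent[uri]
    return uri


def union(parent: Dict[str, str], a: str, b: str) -> None:
    """Merge the alias clusters of a and b."""
    root_a, root_b = find(parent, a), find(parent, b)
    if root_a != root_b:
        # Assumption: the lexicographically smaller URI becomes the target URI.
        keep, drop = sorted((root_a, root_b))
        parent[drop] = keep


def resolve_aliases(
    quads: Iterable[Quad], alias_pairs: Iterable[Tuple[str, str]]
) -> List[Quad]:
    """Rewrite subjects and objects to one target URI per alias cluster."""
    parent: Dict[str, str] = {}
    for a, b in alias_pairs:
        union(parent, a, b)

    def canon(term: str) -> str:
        # Terms that never appeared in an alias pair (including literals) pass through.
        return find(parent, term) if term in parent else term

    # Predicates and graph names are left untouched, so provenance is preserved.
    return [(canon(s), p, canon(o), g) for s, p, o, g in quads]
```

Calling resolve_aliases with a list of quads and a list of alias pairs returns the quads with every alias replaced by its cluster's target URI, while each quad stays in its original named graph.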

Compared to the initial Version 0.1 release, the new LDIF release
provides:
- improved performance (faster data loading, parallelization of the
data translation),
- smaller memory footprint,
- a new N-Triples output module,
- more thorough performance evaluations on use cases with up to
100 million triples.

More information about LDIF, concrete usage examples and performance
details are available at http://www4.wiwiss.fu-berlin.de/bizer/ldif/

Over the coming months, we plan to extend LDIF along the following
lines:

1. Add Web Data Access Modules (Linked Data Crawler, SPARQL Endpoint
Reader, Remote RDF File Loader) as well as a scheduling component
which provides for regularly updating the local input data cache.
2. Implement a Hadoop Version of the Runtime Environment in order to
be able to scale to really large amounts of input data. Processes and
data will be distributed over a cluster of machines.
3. Add a Data Quality Evaluation and Data Fusion Module which allows
Web data to be filtered according to different data quality assessment
policies and provides for fusing Web data according to different
conflict resolution methods.
4. Make the integration workflow more flexible. Currently the
integration flow is static and can only be influenced by predefined
configuration parameters. We plan to make the workflow and its
configuration more flexible so that it is easier to include additional
modules that cover other data integration aspects.

The development of LDIF is supported in part by Vulcan Inc. as part of
its Project Halo (http://www.projecthalo.com) and by the EU FP7
project LOD2 - Creating Knowledge out of Interlinked Data
(http://lod2.eu, Grant No. 257943).

Many thanks to:

Andreas Schultz (FUB)
Robert Isele (FUB)
Chris Bizer (FUB)
Christian Becker (MES)


Cheers,

Andrea


--
Andrea Matteini
MediaEvent Services GmbH & Co. KG
Stendaler Straße 4 · 10559 Berlin · Germany
http://mediaeventservices.com