Hi all,

The Web-based Systems Group and our industry partner mes|semantics are happy to announce the release of the LDIF – Linked Data Integration Framework, Version 0.4 "Scale-Out".
LDIF can be used within Linked Data applications to translate heterogeneous data from the Web of Linked Data into a clean local target representation while keeping track of data provenance. LDIF translates data into a consistent target vocabulary and includes an identity resolution component which translates URI aliases into a single target URI.
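To illustrate the idea behind identity resolution (this is a generic sketch, not LDIF's actual API or configuration): URIs linked by owl:sameAs statements form alias clusters, and every alias in a cluster is rewritten to one canonical target URI. A minimal union-find version, with hypothetical example URIs, might look like this:

```python
# Illustrative sketch of identity resolution (NOT LDIF's API):
# cluster URI aliases connected by owl:sameAs links, then rewrite
# all triples so each cluster uses a single canonical URI.

def resolve_aliases(same_as_pairs):
    """Union-find over owl:sameAs pairs; returns alias -> canonical URI map."""
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for a, b in same_as_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # pick the lexicographically smallest URI as the canonical one
            parent[max(ra, rb)] = min(ra, rb)

    return {u: find(u) for u in parent}

def rewrite_triples(triples, canonical):
    """Replace every aliased subject/object with its canonical URI."""
    return [(canonical.get(s, s), p, canonical.get(o, o))
            for s, p, o in triples]

# Hypothetical example: A, B and C are aliases of the same resource.
pairs = [("http://ex.org/A", "http://ex.org/B"),
         ("http://ex.org/B", "http://ex.org/C")]
canon = resolve_aliases(pairs)
triples = [("http://ex.org/C", "http://ex.org/name", "Alice")]
print(rewrite_triples(triples, canon))
# → [('http://ex.org/A', 'http://ex.org/name', 'Alice')]
```

LDIF itself additionally records provenance for each rewritten statement, which the toy sketch above omits.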
Until now, LDIF stored data purely in-memory, which restricted the amount of data that could be processed.
LDIF Version 0.4 introduces two new implementations of the LDIF runtime environment which allow LDIF to scale to large data sets:

1. The new triple-store-backed implementation scales to larger data sets on a single machine.
2. The new Hadoop-based implementation enables the processing of very large data sets on a Hadoop cluster, for instance on Amazon EC2.
We have tested LDIF for integrating RDF data sets ranging from 25 million to 3.6 billion triples.
A comparison of the performance of all three implementations can be found on the LDIF benchmark page:

http://www.assembla.com/spaces/ldif/wiki/Benchmark
LDIF is provided under the terms of the Apache Software License. It can be downloaded from the project webpage, which also provides detailed information about the features and the configuration of the framework:

http://www4.wiwiss.fu-berlin.de/bizer/ldif/
The development of LDIF is supported in part by Vulcan Inc. as part of its Project Halo and by the EU FP7 project LOD2 (Grant No. 257943).
Cheers,
Andreas Schultz, Andrea Matteini, Robert Isele, Chris Bizer and
Christian Becker