"HTML boilerplate code is used on webpages as presentation
directives for a browser to display data to a human end user. For machines,
our community has made tremendous efforts to provide querying
endpoints using agreed-upon schemas, protocols, and principles since the
advent of the Semantic Web. These data lifting efforts have been some of
the primary materials for bootstrapping the Web of data. Data lifting
usually involves an original data structure from which the semantic architect
has to produce a mapper to RDF vocabularies. Less effort is
made to lift data produced by a Web mining process, due to
the difficulty of providing an efficient and scalable solution. Nonetheless,
the Web of documents is mainly composed of natural language twisted
in HTML boilerplate code, and few data schemas can be mapped into
RDF. In this paper, we present CommentsLifter, a system that is able
to lift SIOC data from user-generated comments in Web 2.0."
In: KECSM-2012: Knowledge Extraction & Consolidation from Social Media 2012. Proceedings of the 1st International Workshop on Knowledge Extraction & Consolidation from Social Media, in conjunction with the 11th International Semantic Web Conference (ISWC 2012), Boston, USA, November 12, 2012, pages 48-62.