Article: Learning Concept Hierarchies from Text Corpora using Formal Concept Analysis


jai...@ptolemy.arc.nasa.gov

Aug 31, 2005, 9:52:39 PM
JAIR is pleased to announce the publication of the following article:

Cimiano, P., Hotho, A. and Staab, S. (2005)
"Learning Concept Hierarchies from Text Corpora using Formal Concept Analysis",
Volume 24, pages 305-339.

For quick access via your WWW browser, use this URL:
http://www.jair.org/abstracts/cimiano05a.html

Abstract:
We present a novel approach to the automatic acquisition of taxonomies
or concept hierarchies from a text corpus. The approach is based on
Formal Concept Analysis (FCA), a method mainly used for the analysis
of data, i.e. for investigating and processing explicitly given
information. We follow Harris' distributional hypothesis and model
the context of a certain term as a vector representing syntactic
dependencies which are automatically acquired from the text corpus
with a linguistic parser. On the basis of this context information,
FCA produces a lattice that we convert into a special kind of partial
order constituting a concept hierarchy. The approach is evaluated by
comparing the resulting concept hierarchies with hand-crafted
taxonomies for two domains: tourism and finance. We also directly
compare our approach with hierarchical agglomerative clustering as
well as with Bi-Section-KMeans as an instance of a divisive clustering
algorithm. Furthermore, we investigate the impact of using different
measures weighting the contribution of each attribute as well as of
applying a particular smoothing technique to cope with data
sparseness.
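
For readers new to FCA, the core step the abstract mentions can be illustrated with a small Python sketch. That step is deriving formal concepts from a context that pairs terms with their syntactic-dependency attributes. This is not the authors' implementation. The terms, the verb-derived attributes, and the names `TOY_CONTEXT`, `formal_concepts` and `is_subconcept` are hypothetical. The sketch uses a plain binary context with naive intersection closure. It omits the attribute weighting, the smoothing, and the lattice-to-hierarchy conversion described in the article.

```python
"""Naive Formal Concept Analysis over a toy term/attribute context.

Hypothetical illustration only: terms and attributes are made up, and no
weighting, smoothing or lattice-to-hierarchy conversion is applied.
"""

# Hypothetical context: each term maps to the syntactic-dependency
# attributes it occurs with (e.g. "rent" = term appears as object of "rent").
TOY_CONTEXT = {
    "hotel": frozenset({"book"}),
    "apartment": frozenset({"book", "rent"}),
    "car": frozenset({"book", "rent", "drive"}),
    "bike": frozenset({"book", "rent", "drive", "ride"}),
    "excursion": frozenset({"book", "join"}),
    "trip": frozenset({"book", "join"}),
}


def formal_concepts(context):
    """Return all formal concepts (extent, intent) of a binary context.

    Every intent is an intersection of some object intents, with the
    empty intersection taken to be the full attribute set.
    """
    all_attrs = frozenset().union(*context.values())
    intents = {all_attrs}
    for attrs in context.values():
        # The comprehension is built before |= runs, so the set is not
        # modified while it is being iterated.
        intents |= {attrs & known for known in intents}
    concepts = []
    for intent in intents:
        # Extent: every term that carries all attributes of the intent.
        extent = frozenset(t for t, a in context.items() if intent <= a)
        concepts.append((extent, intent))
    # Most general concepts (largest extents) first.
    return sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1])))


def is_subconcept(c1, c2):
    """(A1, B1) <= (A2, B2) in the concept lattice iff A1 is a subset of A2."""
    return c1[0] <= c2[0]


if __name__ == "__main__":
    for extent, intent in formal_concepts(TOY_CONTEXT):
        print(sorted(extent), sorted(intent))
```

On this toy context the sketch yields six concepts. For example, the concept with intent {book, join} has extent {excursion, trip}. The bottom concept, whose intent contains every attribute, has an empty extent. Enumerating intersections is exponential in the worst case, so a real system would use a dedicated FCA algorithm instead.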

The article is available via:

-- comp.ai.jair.papers (also see comp.ai.jair.announce)

-- World Wide Web: The URL for our World Wide Web server is
http://www.jair.org/
For direct access to this article and related files try:
http://www.jair.org/abstracts/cimiano05a.html

-- Anonymous FTP from Carnegie-Mellon University (USA):
ftp://ftp.cs.cmu.edu/project/jair/volume24/cimiano05a.ps
The compressed PostScript file is named cimiano05a.ps.Z

For more information about JAIR, visit our WWW or FTP sites, or
contact jai...@isi.edu

--
Steven Minton
JAIR Managing Editor
