Release of the dsm-parameter-analysis GitHub repository

Andras Dobo

Sep 17, 2019, 6:01:42 PM
to acl-...@googlegroups.com
[Apologies for multiple postings]

**** Release of the dsm-parameter-analysis GitHub repository ****


Dear Colleagues,


We are pleased to announce the release of the GitHub repository
connected to the PhD dissertation of András Dobó:
Dobó, A.: A comprehensive analysis of the parameters in the creation and
comparison of feature vectors in distributional semantic models for
multiple languages. University of Szeged (2019)
http://doktori.bibl.u-szeged.hu/10120/1/AndrasDoboThesis2019.pdf

The GitHub repository, including the source code as well as the
libraries, resources and test datasets used, is available at:
https://github.com/doboandras/dsm-parameter-analysis


The project implements a distributional semantic model (DSM) with 10
freely adjustable parameters. For some of the parameters more than a
thousand possible settings are implemented, resulting in trillions of
possible configurations. This freely configurable DSM can have any
corpus or word vectors as input, and can be tested on multiple standard
test datasets. It currently works for the following languages: English,
Spanish and Hungarian.
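
As a rough illustration of what "freely adjustable parameters" means in
such a model, here is a minimal Python sketch with three of them: the
context window size, the weighting scheme and the vector similarity
measure. It is not the repository's code or API; every function and
parameter name below is hypothetical, and the two weighting schemes and
two similarity measures are common textbook choices picked only for the
example.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_counts(sentences, window=2):
    """Count context words occurring within `window` tokens of each target word."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[target][tokens[j]] += 1
    return counts


def raw_weighting(counts):
    """Use the raw co-occurrence counts as feature weights."""
    return {word: dict(ctx) for word, ctx in counts.items()}


def ppmi_weighting(counts):
    """Positive pointwise mutual information; non-positive values are dropped."""
    total = sum(sum(ctx.values()) for ctx in counts.values())
    word_totals = {word: sum(ctx.values()) for word, ctx in counts.items()}
    context_totals = Counter()
    for ctx in counts.values():
        context_totals.update(ctx)
    weighted = {}
    for word, ctx in counts.items():
        weighted[word] = {}
        for context, n in ctx.items():
            pmi = math.log2(n * total / (word_totals[word] * context_totals[context]))
            if pmi > 0:
                weighted[word][context] = pmi
    return weighted


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(key, 0.0) for key, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def jaccard(u, v):
    """Jaccard overlap of the feature sets, ignoring the weights."""
    union = set(u) | set(v)
    return len(set(u) & set(v)) / len(union) if union else 0.0


# Hypothetical parameter registries: each key is one possible "setting".
WEIGHTINGS = {"raw": raw_weighting, "ppmi": ppmi_weighting}
SIMILARITIES = {"cosine": cosine, "jaccard": jaccard}


def build_similarity(sentences, window=2, weighting="ppmi", similarity="cosine"):
    """Return a word-pair similarity function for one parameter configuration."""
    vectors = WEIGHTINGS[weighting](cooccurrence_counts(sentences, window))
    measure = SIMILARITIES[similarity]
    return lambda a, b: measure(vectors.get(a, {}), vectors.get(b, {}))
```

Calling, for example, build_similarity(corpus, window=2,
weighting="raw", similarity="jaccard") yields the similarity function for
one configuration. With only two settings each for two of the parameters
this sketch already has four combinations per window size, which is how
a model with 10 parameters and many settings per parameter reaches a very
large number of configurations.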


Abstract of the dissertation:
Measuring the semantic similarity and relatedness of words is important
for many natural language processing tasks. Although distributional
semantic models designed for this task have many different parameters,
such as vector similarity measures, weighting schemes and dimensionality
reduction techniques, there is no truly comprehensive study
simultaneously evaluating these parameters while also analysing the
differences in the findings for multiple languages.

We would like to address this gap with our systematic study by searching
for the best configuration in the creation and comparison of feature
vectors in distributional semantic models for English, Spanish and
Hungarian separately, and then comparing our findings across these
languages.

During our extensive analysis we test a large number of possible
settings for all parameters, with more than a thousand novel variants
for some of them. As a result, we were able to find configurations that
significantly outperform conventional configurations and achieve
state-of-the-art results.


For more information, please see the publications below:

Dobó, A.: A comprehensive analysis of the parameters in the creation and
comparison of feature vectors in distributional semantic models for
multiple languages. University of Szeged (2019)
http://doktori.bibl.u-szeged.hu/10120/1/AndrasDoboThesis2019.pdf

Dobó A., Csirik J.: Comparison of the Best Parameter Settings in the
Creation and Comparison of Feature Vectors in Distributional Semantic
Models Across Multiple Languages. In: MacIntyre J., Maglogiannis I.,
Iliadis L., Pimenidis E. (eds) Artificial Intelligence Applications and
Innovations. AIAI 2019. IFIP Advances in Information and Communication
Technology, vol 559. 487-499. Springer, Cham. (2019)
http://www.inf.u-szeged.hu/~dobo/Publications/Comparison%20of%20the%20best%20parameter%20settings%20of%20DSMs%20across%20languages.pdf

Dobó A., Csirik J.: A Comprehensive Study of the Parameters in the
Creation and Comparison of Feature Vectors in Distributional Semantic
Models. Journal of Quantitative Linguistics (2019)
https://doi.org/10.1080/09296174.2019.1570897


Best regards,
Andras Dobo
Institute of Informatics
University of Szeged
http://www.inf.u-szeged.hu/~dobo/