Dear all,
We are currently developing an Annif-based automated indexing system for documentary resources on LGBTQ+ topics. We translated Homosaurus (homosaurus.org), a globally known domain-specific vocabulary, into Bengali and Hindi (two major languages in India), converted the translated version to TTL format (after SKOSification), and loaded it into Annif with load-vocab. The process went smoothly, and Annif created two files under the vocabs folder: subjects.csv and subjects.ttl.
The subjects.csv looks like this:
Then we trained different backends (lexical, associative, ensemble) on 0.45 million labeled records (resources indexed with the Homosaurus vocabulary). We ran hyperparameter optimization on the simple ensemble against a validation dataset of 4,500 records not used in training, and applied the resulting weights in the neural network (NN) ensemble model. After comparing F1@5 and NDCG scores on a test dataset of 5,000 records (around 1% of the records, not used for training or validation), we found that the NN model has a better score profile, as expected. It can predict possible indexing terms from Homosaurus with accuracy scores like:
echo "Stigma and lesbian, gay, bisexual, transgender, and queer (and additional identities) (LGBTQ+) parent socialization self-efficacy: Mediating roles of identity and community. || In the United States, cultural forces have led to the stigmatization of lesbian, gay, bisexual, transgender, and queer (and additional identities) (LGBTQ+) parenthood. However, pushing back against this stigmatization, developing a positive LGBTQ+ identity, and investing in one's LGBTQ+ community may inform empowering narratives of future parenthood and related constructs, such as LGBTQ+ parent socialization. Perceived self-efficacy related to preparation for bias (i.e., discussions of discrimination, prejudice, or bias-based bullying) socialization is likely associated with an individual's own perceptions or experiences of stigmatization given the conceptual overlap of bias and stigma. However, other constructs related to stigmatization and socialization self-efficacy, such as positive LGBTQ+ identity or community connectedness, have yet to be simultaneously considered (to our knowledge). Further, previous research has rarely included different assessments of stigma (i.e., perceived and enacted) and/or dimensions of positive LGBTQ+ identity (i.e., authenticity and self-awareness). Thus, this study aimed to rectify these gaps and provide a greater understanding of sexual stigma and LGBTQ+ parent socialization self-efficacy. Using data from a survey-based, online, cross-sectional study of LGBTQ+ childfree adults (N = 433; Mage = 29.85 years old) in the United States, we found that experiences of enacted or perceived sexual stigma were differentially associated with LGBTQ+ parent socialization preparation for bias self-efficacy. Further, positive LGBTQ+ identity authenticity and self-awareness, as well as LGBTQ+ community connectedness played distinct roles as mediators of the relationships between sexual stigma and LGBTQ+ parent socialization self-efficacy. 
These findings have implications for how we might understand the role of stigma, identity, community, and socialization among future LGBTQ+ parents. (PsycInfo Database Record (c) 2024 APA, all rights reserved)." | annif suggest homoIT-nn
2024-02-26T08:05:24.850Z INFO [omikuji::model] Loading model from data/projects/homoIT-omikujiB/omikuji-model...
2024-02-26T08:05:24.850Z INFO [omikuji::model] Loading model settings from data/projects/homoIT-omikujiB/omikuji-model/settings.json...
2024-02-26T08:05:24.850Z INFO [omikuji::model] Loaded model settings Settings { n_features: 433577, classifier_loss_type: Hinge }...
2024-02-26T08:05:24.855Z INFO [omikuji::model] Loading tree from data/projects/homoIT-omikujiB/omikuji-model/tree0.cbor...
2024-02-26T08:05:25.176Z INFO [omikuji::model] Loading tree from data/projects/homoIT-omikujiB/omikuji-model/tree1.cbor...
2024-02-26T08:05:25.498Z INFO [omikuji::model] Loading tree from data/projects/homoIT-omikujiB/omikuji-model/tree2.cbor...
2024-02-26T08:05:25.828Z INFO [omikuji::model] Loaded model with 3 trees; it took 0.98s
2024-02-26T08:05:26.544Z INFO [omikuji::model] Loading model from data/projects/homoIT-omikujiP/omikuji-model...
2024-02-26T08:05:26.544Z INFO [omikuji::model] Loading model settings from data/projects/homoIT-omikujiP/omikuji-model/settings.json...
2024-02-26T08:05:26.544Z INFO [omikuji::model] Loaded model settings Settings { n_features: 433577, classifier_loss_type: Hinge }...
2024-02-26T08:05:26.549Z INFO [omikuji::model] Loading tree from data/projects/homoIT-omikujiP/omikuji-model/tree0.cbor...
2024-02-26T08:05:26.961Z INFO [omikuji::model] Loading tree from data/projects/homoIT-omikujiP/omikuji-model/tree1.cbor...
2024-02-26T08:05:27.389Z INFO [omikuji::model] Loading tree from data/projects/homoIT-omikujiP/omikuji-model/tree2.cbor...
2024-02-26T08:05:27.809Z INFO [omikuji::model] Loaded model with 3 trees; it took 1.27s
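For readers less familiar with the two metrics mentioned above, here is a minimal Python sketch of F1@5 and NDCG for a single document, assuming binary relevance (a suggested subject is either in the gold standard or not). The function names and the placeholder subject identifiers are ours for illustration only; this is not Annif's own evaluation code, which may differ in details such as averaging across documents.

```python
import math


def f1_at_k(suggested, gold, k=5):
    """F1 score over the top-k suggestions for one document."""
    top = suggested[:k]
    hits = sum(1 for subject in top if subject in gold)
    if hits == 0:
        return 0.0
    precision = hits / len(top)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


def ndcg_at_k(suggested, gold, k=5):
    """NDCG over the top-k suggestions, with binary relevance."""
    top = suggested[:k]
    # Discounted gain: a hit at rank i (0-based) contributes 1 / log2(i + 2).
    dcg = sum(1.0 / math.log2(i + 2)
              for i, subject in enumerate(top) if subject in gold)
    # Ideal ranking: all gold subjects placed at the top of the list.
    ideal_hits = min(len(gold), len(top))
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


# Hypothetical identifiers standing in for Homosaurus subject URIs.
suggested = ["a", "b", "c", "d", "e", "f"]  # ranked by score, best first
gold = {"a", "c", "x"}                      # manually assigned subjects

print(f"F1@5   = {f1_at_k(suggested, gold):.3f}")
print(f"NDCG@5 = {ndcg_at_k(suggested, gold):.3f}")
```

In this toy case two of the three gold subjects appear in the top five, so F1@5 rewards the overlap while NDCG additionally rewards the fact that one of the hits is ranked first.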
So far so good. Now comes the query part. Our configuration for the NN backend of the project is:
[homoIT-nn]
name=Homosaurus NN Ensemble project
language=en
backend=nn_ensemble
sources=homoIT-mllm:0.0966,homoIT-stwfsa:0.1608,homoIT-fastText:0.3379,homoIT-omikujiB:0.3339,homoIT-omikujiP:0.0709
limit=100
vocab=homoIT
nodes=100
dropout_rate=0.2
epochs=10
lmdb_map_size=2147483648
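To make the role of the weights in the sources line concrete, the following Python sketch parses that line and combines per-backend scores as a weighted average, which is our understanding of how a simple ensemble merges its sources. The helper names and the example scores are hypothetical, and the NN ensemble is understood to learn a further adjustment on top of such a combination, so this is only an approximation of what homoIT-nn does.

```python
def parse_sources(sources):
    """Parse 'project:weight,project:weight,...' into a dict.

    A missing weight is treated as 1.0 (an assumption of this sketch).
    """
    weights = {}
    for item in sources.split(","):
        project_id, _, weight = item.strip().partition(":")
        weights[project_id] = float(weight) if weight else 1.0
    return weights


def weighted_average(scores, weights):
    """Combine per-backend scores for one subject, normalising the weights."""
    total = sum(weights.values())
    return sum(w * scores.get(p, 0.0) for p, w in weights.items()) / total


# The sources line copied from the configuration above.
SOURCES = ("homoIT-mllm:0.0966,homoIT-stwfsa:0.1608,homoIT-fastText:0.3379,"
           "homoIT-omikujiB:0.3339,homoIT-omikujiP:0.0709")
weights = parse_sources(SOURCES)

# Hypothetical scores that each backend might give to a single subject.
example_scores = {
    "homoIT-mllm": 0.20,
    "homoIT-stwfsa": 0.40,
    "homoIT-fastText": 0.90,
    "homoIT-omikujiB": 0.80,
    "homoIT-omikujiP": 0.10,
}

print(f"sum of weights: {sum(weights.values()):.4f}")
print(f"combined score: {weighted_average(example_scores, weights):.4f}")
```

The weights from hyperopt sum to roughly 1, so the normalisation step changes little here; it is kept so that the sketch also behaves sensibly with unnormalised weights.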
Can the result display be configured to show multilingual labels in this project configuration, both on the command line and via WSGI?
Thanks and regards