Arabic Keyword Generator


Tariq Yousef

Jul 27, 2023, 9:24:34 AM
to sig...@googlegroups.com
Dear SIGARAB Community,

I am currently looking for a pre-trained model that can generate keywords for Arabic texts, with a particular interest in Classical Arabic. If you are aware of any relevant resources or can offer guidance, I would greatly appreciate it.

Thanks in advance!

Best regards,
Tariq Yousef
-----
Postdoctoral Researcher at Hamburg University
Email: tariq....@uni-hamburg.de | in...@tariq-yousef.com
Website: www.tariq-yousef.com

Nor Alhoda

Jul 27, 2023, 11:04:27 AM
to Tariq Yousef, sig...@googlegroups.com
And upon you be peace, and the mercy and blessings of God.

This model might be useful.


Nor Alhoda

Jul 27, 2023, 11:08:37 AM
to Tariq Yousef, sig...@googlegroups.com
You can also use KeyBERT to extract keywords and keyphrases from Arabic text. It lets you adjust the n-gram range and other useful parameters.

Here is the GitHub link: https://github.com/MaartenGr/KeyBERT
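A minimal sketch of what that looks like, assuming a multilingual sentence-transformers backbone that covers Arabic (the model name below is an illustrative choice, not one endorsed in this thread):

    from keybert import KeyBERT

    # Assumed example backbone: any multilingual sentence-transformers
    # model with Arabic coverage should work here.
    kw_model = KeyBERT(model="paraphrase-multilingual-MiniLM-L12-v2")

    doc = "..."  # your Arabic text

    keywords = kw_model.extract_keywords(
        doc,
        keyphrase_ngram_range=(1, 3),  # unigrams up to trigrams
        stop_words=None,  # KeyBERT defaults to English stop words
        top_n=10,
    )
    print(keywords)  # list of (keyphrase, score) tuples

Note that stop_words=None matters for Arabic input, since KeyBERT's default is an English stop-word list.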

Tariq Yousef

Jul 27, 2023, 11:26:04 AM
to Nor Alhoda, sig...@googlegroups.com
Thanks a lot for sharing. 

Rana R Malhas

Jul 29, 2023, 5:36:14 AM
to Tariq Yousef, sig...@googlegroups.com

SA Tariq,

You may benefit from CL-AraBERT (CLassical AraBERT). It is an AraBERT-based pre-trained model that was further pre-trained on a ~1.05B-word Classical Arabic dataset (after its initial pre-training on MSA datasets), making it a better fit for NLP tasks on Classical Arabic text.

I think it is inherently suitable for your task because the model is pre-trained using the same two unsupervised tasks as BERT/AraBERT: Masked Language Modeling (MLM) and Next Sentence Prediction (NSP).

You can download CL-AraBERT's checkpoints from this GitHub link: https://github.com/RanaMalhas/QRCD/blob/main/README.md#cl-arabert-pre-trained-language-model
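To connect this back to keyword extraction, here is a minimal sketch of pulling document embeddings out of the model with Hugging Face transformers. It assumes the released TensorFlow checkpoint has already been converted to the Hugging Face format; the local path below is a placeholder, not a published model ID.

    import torch
    from transformers import AutoTokenizer, AutoModel

    # Placeholder path: point this at your converted CL-AraBERT checkpoint
    model_path = "path/to/cl-arabert"
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModel.from_pretrained(model_path)

    text = "..."  # a Classical Arabic passage

    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)

    # Mean-pool the token embeddings into one document vector; candidate
    # keyphrases embedded the same way can then be ranked by cosine
    # similarity, which is essentially what KeyBERT does internally.
    doc_embedding = outputs.last_hidden_state.mean(dim=1)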


Regards,

Rana Malhas
