SA Tariq,
You may benefit from CL-AraBERT (CLassical AraBERT). It is an AraBERT-based pre-trained model that is further pre-trained on a Classical Arabic dataset of about 1.05B words (after the initial pre-training on MSA datasets), making it a better fit for NLP tasks on Classical Arabic text.
I think it is inherently suitable for your task because the model is pre-trained using the same two unsupervised objectives as BERT/AraBERT: the Masked Language Model (MLM) task and the Next Sentence Prediction (NSP) task.
You can download CL-AraBERT's checkpoints from the GitHub repository: https://github.com/RanaMalhas/QRCD/blob/main/README.md#cl-arabert-pre-trained-language-model
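In case it helps to see what the MLM objective does to the input, here is a minimal, self-contained Python sketch of BERT-style masking. This is an illustration, not CL-AraBERT's actual pre-training code; the 15% selection rate and the 80/10/10 replacement split are the standard values from the original BERT paper, and the tokens/vocabulary here are placeholders:

```python
import random

def mlm_mask(tokens, vocab, mask_prob=0.15, seed=0):
    """BERT-style masked-LM corruption: select ~15% of positions;
    of those, replace 80% with [MASK], 10% with a random vocabulary
    token, and keep 10% unchanged. Returns the corrupted sequence
    and the prediction targets (None where no prediction is made)."""
    rng = random.Random(seed)
    out = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok          # the model must predict this token
            r = rng.random()
            if r < 0.8:
                out[i] = "[MASK]"    # 80%: mask it out
            elif r < 0.9:
                out[i] = rng.choice(vocab)  # 10%: random replacement
            # else (10%): leave the original token in place
    return out, labels
```

During pre-training, the model is trained to recover the labeled tokens from the corrupted sequence, which is what makes a further-pre-trained model like CL-AraBERT sensitive to the distribution of Classical Arabic text.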
Regards,
Rana Malhas
--