New Release: PEACH: A Sentence-Aligned Parallel English-Arabic Corpus for Healthcare

Rania UOS

Aug 1, 2025, 8:24:11 AM
to sig...@googlegroups.com
Dear Colleagues, 

I'm happy to announce the release of PEACH, a manually aligned, gold-standard parallel corpus. It contains 51,671 sentence pairs, totaling approximately 590,517 English and 567,707 Arabic word tokens. Average sentence length ranges from 9.52 to 11.83 words.
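
As a quick sanity check on these figures, the corpus-wide mean sentence length per language can be derived from the totals quoted above. This is only a minimal sketch: the constant and function names are illustrative, and it assumes each sentence pair contributes exactly one sentence per language. The resulting corpus-wide means need not coincide with the endpoints of the range of averages reported above.

```python
# Totals quoted in the announcement.
SENTENCE_PAIRS = 51_671
ENGLISH_TOKENS = 590_517
ARABIC_TOKENS = 567_707


def mean_tokens_per_sentence(total_tokens: int, sentences: int) -> float:
    """Corpus-wide mean sentence length, in word tokens."""
    return total_tokens / sentences


# Assumption: one sentence per language in every aligned pair.
en_mean = mean_tokens_per_sentence(ENGLISH_TOKENS, SENTENCE_PAIRS)
ar_mean = mean_tokens_per_sentence(ARABIC_TOKENS, SENTENCE_PAIRS)

print(f"English: {en_mean:.2f} tokens per sentence")  # 11.43
print(f"Arabic:  {ar_mean:.2f} tokens per sentence")  # 10.99
```

Both corpus-wide means fall inside the 9.52 to 11.83 range given above.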

PEACH supports research in contrastive linguistics, translation studies, and natural language processing. It can be used to derive bilingual lexicons, adapt large language models for domain-specific machine translation, evaluate user perceptions of machine translation in healthcare, and assess the readability and lay-friendliness of patient information leaflets and educational materials. It also serves as a valuable educational resource in translation studies.

PEACH is publicly accessible at Mendeley: https://data.mendeley.com/datasets/5k6yrrhng7/3 

The paper is attached.

For citation: 
Al-Sabbagh, R. (2024). PEACH: A Sentence-Aligned Parallel English-Arabic Corpus for Healthcare. Corpora, 19(3), 395-410. https://doi.org/10.3366/cor.2024.0320

Rania Al-Sabbagh 
Assistant Professor of Linguistics
University of Sharjah, UAE

cor.2024.03203.pdf