I'm happy to announce the release of PEACH, a manually aligned, gold-standard English-Arabic parallel corpus. It contains 51,671 sentence pairs, totaling approximately 590,517 English and 567,707 Arabic word tokens. Average sentence length ranges from 9.52 to 11.83 words.
PEACH supports research in contrastive linguistics, translation studies, and natural language processing. It can be used to derive bilingual lexicons, adapt large language models for domain-specific machine translation, evaluate user perceptions of machine translation in healthcare, and assess the readability and lay-friendliness of patient information leaflets and educational materials. It also serves as a valuable educational resource in translation studies.
PEACH is publicly available on Mendeley Data: https://data.mendeley.com/datasets/5k6yrrhng7/3