Reviewed by: Prodhan Mahbub Ibna Seraj, American International University-Bangladesh, Bangladesh; Marcel Pikhart, University of Hradec Králové, Czechia; Afsheen Rezai, Ayatolah Borujerdei University, Iran
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
The importance of learning vocabulary has become widely recognized (Schmitt, 2000; Nation, 2001; Barcroft, 2004; Smidt and Hegelheimer, 2004). However, it is difficult for foreign language learners to grasp the meanings of vocabulary items and the related meanings they take on in a given context of use. It is also difficult for these learners to compensate for this difficulty, as they generally do not have the opportunity for recurring interaction in the target language that can facilitate retention (Zandieh and Jafarigohar, 2012). Therefore, it is necessary for teachers to offer students an alternative that can improve efficiency and deepen vocabulary learning as well as learning in context.
In the context of eLearning 4.0, new media and new technology expand the possibilities for vocabulary learning. The emergence of multimodal corpora and the application of multimodality theory in vocabulary teaching have been gradually recognized and popularized. However, previous studies on the effects of multimodal teaching have drawn inconsistent conclusions. Although some studies have reported the superiority of the multimodal approach over the single-mode approach (Chun and Plass, 1996a,b; Al-Seghayer, 2001; Ramezanali and Faez, 2019), others (Boers et al., 2017) have reported no benefit of additional modes and no significant difference in L2 vocabulary learning.
Dual coding theory, concerned at a fundamental level with the nature of symbolic systems, assumes that memory and cognition are served by two separate systems, one specialized for dealing with verbal information and the other for non-verbal information (Paivio, 1990). From the perspective of dual coding theory, multimodal input plays an important role in English as a second language (ESL) vocabulary teaching and learning.
Supported by modern technology, multimodal input has become available for vocabulary instruction and has gained great attention in eLearning 4.0 (Bujang et al., 2020; Abdelghani et al., 2021). As multimodal materials convey different types of information through both channels, many empirical studies based on multimodal input have employed not only language and still pictures but also audio clips and animations to explore their effects on vocabulary teaching and learning (Bisson et al., 2015; Khezrlou et al., 2017; Alzahrani and Roberts, 2021; Muñoz et al., 2021; Perez, 2022). Lin and Yu (2017) compared the effects on vocabulary learning of eighth grade students under four input conditions: text only, text plus picture, text plus sound, and text and sound plus picture. Their findings showed that the input of text and sound achieved the best scores in the immediate post-test, and that the input of text and sound plus picture achieved the best scores in the delayed post-test. Ramezanali and Faez (2019) compared the effects of monomodal annotation and multimodal annotation on word acquisition. Their study used three experimental groups and one control group, and its results showed that multimodal input outperformed monomodal input, and that the combination of translation and video produced better results than the combination of translation and audio.
Previous studies on multimodal input in vocabulary learning have mainly focused on college students (Wang and Chen, 2018; Diao and Hu, 2021), adult learners (Boers et al., 2017), and primary school students (Tragant et al., 2016). Only a few studies have worked with junior high school students (Lin and Yu, 2017). Furthermore, the majority of studies on multimodal input in vocabulary instruction available so far have focused on incidental vocabulary acquisition, in which the recall of vocabulary meaning is a by-product of reading or listening, with only two exceptions (Lin and Yu, 2017; Diao and Hu, 2021). It is, therefore, important to verify the effectiveness of multimodal input in explicit vocabulary instruction in class. The materials previous authors used were mostly textbooks, documentaries, or videos; only a few studies have used multimodal online corpora (e.g., iWeb). This attests to a certain distance between these studies and the reality of current-day language learning, since online corpora are one of the main information sources for generation Z when it comes to eLearning 4.0. The goal of this study is to address this gap. The study aims to examine the effects of multimodal input and monomodal input on vocabulary learning based on evidence from post-tests, questionnaires, and interviews.
Target words were sorted out using questionnaire I. A total of 48 words were initially chosen from TEENS Junior (teens.i21st.cn), which was specifically designed for junior high school students as a complementary material to the textbook in use and was chosen by the English teacher of the two classes. After questionnaire I was distributed, eight words were picked out on the basis that all the participants declared that they had never encountered these words.
Microsoft PowerPoint was used to design the courseware consisting of one slide per target word. To illustrate the interface of the slide, Figure 1 presents a labeled screenshot of the target word peak shown in the slide that includes all multimodal verbal and visual representations. The verbal information of the target words is the shared information shown to both the CG and the EG (i.e., L1 translation, L2 definition, written form, and audio pronunciation). The picture and video are visual information designed only for the EG.
The pictures and videos that explain the meaning of the target words were selected by searching iWeb. Then, the multimodal materials were organized and presented in slides. During classroom teaching, the written form of the target words, audio pronunciations, L1 translations, L2 definitions, pictures, and videos were presented contiguously to the EG; and the written form of the target words, audio pronunciations, L1 translations, and L2 definitions were presented contiguously to the CG.
This study took about 2 weeks to conduct and consisted of six phases. One week before the experiment, questionnaire I was handed out with the purpose of sorting out the target words. Following questionnaire I was the teaching phase, which was conducted 1 week later. Both groups learned word meanings through PowerPoint slides for a total of 8 min, with 1 min for each word. The CG received monomodal input information (i.e., written form, L1 translation, L2 definition, and audio pronunciation), and the EG received multimodal input information (i.e., written form, L1 translation, L2 definition, audio pronunciation, picture, and video). After the teaching intervention, both groups were required to finish the immediate post-test within 2 min using a pen and paper. In addition, questionnaire II was distributed to the EG, and the interview was conducted with the EG. One week later, a delayed post-test was conducted for both groups. Figure 2 shows the research procedures used.
In tackling RQ1, both immediate and delayed post-test scores were considered. In scoring the immediate and delayed post-test results for data analysis purposes, a learner received either 1 point or 0 points for each word. A point was given for a response that was identical to the Chinese translation of the word given on the slides. Responses with a wrong Chinese translation and missing responses were counted as incorrect and received 0 points. Thus, the highest possible score is 8 and the lowest is 0. Both the immediate and delayed post-tests were marked by the same researcher using the same marking criterion. The scores were recorded by the researcher for data analysis.
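The scoring rule described above can be sketched in a few lines of Python. This is only a minimal illustration, not the marking procedure the researcher actually followed: the function name `score_post_test`, the dictionary layout, and the choice to ignore surrounding whitespace are all assumptions, and the answer key below is hypothetical (only peak is named in the text as a target word, and its Chinese rendering here is a guess for illustration).

```python
def score_post_test(responses, answer_key):
    """Return the total score: 1 point per identical response, otherwise 0."""
    total = 0
    for word, correct in answer_key.items():
        # A missing response is treated as incorrect (0 points).
        response = responses.get(word, "")
        # Assumption: surrounding whitespace is ignored when comparing.
        if response.strip() == correct:
            total += 1
    return total


# Hypothetical answer key with three items; the study used eight target words.
answer_key = {"peak": "山峰", "harvest": "收获", "shelter": "庇护所"}

# One correct response, one wrong translation, one missing response.
responses = {"peak": "山峰", "harvest": "丰收"}
print(score_post_test(responses, answer_key))
```

With eight target words in the key, the same function would return totals between 0 and 8, matching the score range described above.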
An alpha level of p < 0.05 was set for all the tests. Effect size estimates were obtained by calculating r for the non-parametric tests and the independent samples t-test. Following Plonsky and Oswald (2014), r-values of 0.25, 0.40, and 0.60 were considered as small, medium, and large, respectively.
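The text does not state which formulas were used to obtain r. A minimal sketch, assuming the conventional conversions (r = |z| / sqrt(N) for a rank-based test and r = sqrt(t² / (t² + df)) for a t-test), might look as follows. The function names and the label "below small" for values under 0.25 are assumptions introduced here; only the three thresholds come from the benchmarks cited above.

```python
import math


def r_from_z(z, n):
    """Effect size r from a z statistic and total sample size n (assumed formula)."""
    return abs(z) / math.sqrt(n)


def r_from_t(t, df):
    """Effect size r from a t statistic and its degrees of freedom (assumed formula)."""
    return math.sqrt(t ** 2 / (t ** 2 + df))


def interpret_r(r):
    """Label r using the 0.25 / 0.40 / 0.60 benchmarks cited in the text."""
    r = abs(r)
    if r >= 0.60:
        return "large"
    if r >= 0.40:
        return "medium"
    if r >= 0.25:
        return "small"
    return "below small"  # assumed label; the text names no band under 0.25


# Hypothetical inputs chosen for easy arithmetic, not values from the study.
print(interpret_r(r_from_z(2.0, 64)))
print(interpret_r(r_from_t(2.0, 60)))
```

Both hypothetical inputs give r = 0.25, which sits exactly on the lower bound of the small range.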
As shown in Table 2, the mean score of the CG is 90.85 and the mean score of the EG is 91.32. Table 2 shows that the significance of the mean score is 0.61, which is much greater than 0.05 and indicates no significant difference between the scores of the two groups. The students in the two classes can therefore be regarded as having a comparable level of English proficiency. This supports the assumption that the learners in both groups have a similar ability to learn L2 vocabulary and that, given equivalent instruction, they would be expected to perform at a similar level.
As shown in Table 3, the significance of the mean score is 0.928, which is much greater than 0.05 and indicates that there is no significant difference between the scores of the two classes. This shows that students in the EG and the CG have a comparable level of English proficiency.
Table 4 shows that learners in the CG obtained an average score of 7.25 out of 8 in the immediate post-test, which corresponds to a mean vocabulary learning rate of 90.6%, whereas the learners in the EG obtained an average score of 7.14 out of 8 in the immediate post-test, which corresponds to a mean vocabulary learning rate of 89.3%. As shown in Table 4, there is no significant difference in the mean score between the CG and the EG, with z = 0.033, p = 0.974 > 0.05, even though the mean score of the CG is slightly higher than that of the EG in the immediate post-test. The effect size for this difference was negligible (r = 0.004), well below the 0.25 benchmark for a small effect.
As shown in Table 5, learners in the CG obtained an average score of 4.09 out of 8 in the delayed post-test, which corresponds to a mean vocabulary learning rate of 51.1%, whereas learners in the EG obtained an average score of 3.57 out of 8 in the delayed post-test, which corresponds to a mean vocabulary learning rate of 44.6%. Table 5 shows that there is no significant difference in the mean score on delayed post-tests between the EG and the CG, with t = 0.954, p = 0.344 > 0.05. Still, the mean score of the CG is slightly higher than that of the EG in the delayed post-test, and the effect size r = 0.123 falls below the 0.25 benchmark for a small effect.
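The learning rates reported for Tables 4 and 5 are the mean scores expressed as a percentage of the maximum score of 8. The arithmetic can be checked with a short sketch; the function name `learning_rate` and the dictionary labels are assumptions, and because the reported means are themselves rounded to two decimals, the recomputed percentages may differ from the published ones in the last digit.

```python
MAX_SCORE = 8  # one point per target word


def learning_rate(mean_score, max_score=MAX_SCORE):
    """Express a mean test score as a percentage of the maximum score."""
    return mean_score / max_score * 100


# Mean scores as reported in the text for Tables 4 and 5.
reported_means = {
    "CG immediate": 7.25,
    "EG immediate": 7.14,
    "CG delayed": 4.09,
    "EG delayed": 3.57,
}

for label, mean in reported_means.items():
    print(f"{label}: {learning_rate(mean):.1f}%")

# Drop in learning rate between the immediate and delayed post-tests, per group.
cg_drop = learning_rate(7.25) - learning_rate(4.09)
eg_drop = learning_rate(7.14) - learning_rate(3.57)
print(f"CG drop: {cg_drop:.1f} points, EG drop: {eg_drop:.1f} points")
```

Computing the drop for each group makes it easier to see that retention fell sharply in both conditions over the week between the two tests.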