Fake news is engineered to deceive people and weaken public trust (Pierri & Ceri, 2019; Bozarth & Budak, 2020). To overcome this problem, an automated tool is needed to check the authenticity of news. Natural language processing (NLP) is an approach for handling text data that enables computers to interpret and interact with natural language (Wilks & Brewster, 2009). NLP facilitates the development of several applications, including text classification, question answering, and machine translation (Church & Rau, 1995). Text classification is one of the most commonly used areas of NLP, as it can determine the semantic meaning of a word, sentence, or document (Manek et al., 2017).
Fake news detection is a difficult task with several challenges. For example, collecting a benchmark dataset and annotating it manually is laborious. The problem becomes more complicated for low-resource languages like Urdu, for which few online resources are available. Although several approaches exist for Urdu fake news detection, they fall short in several respects. First, they cover few domains, whereas data from more domains allows a more thorough evaluation of the predictors; Amjad et al. (2020), for instance, detect fake news in five domains. Second, for dataset collection, some studies use Google Translate for English-to-Urdu translation without manual verification, which limits the usefulness of such models in real-world applications (Amjad et al., 2020; Akhter et al., 2021). Third, the dataset used in Amjad et al. (2020) is comparatively small, with only 900 samples across its five domains, so the models might not be trained and tested well.
Akhter et al. (2021) utilized an ensemble approach to detect fake news in the Urdu language. The study contributed a new dataset and also experimented on the benchmark dataset created by Amjad, Sidorov & Zhila (2020). However, the resulting model is not highly accurate at identifying fake news in Urdu. In addition, the study relied on a dataset translated from English into Urdu with Google Translate without manual verification. It also uses data from only five domains, so its scope is limited.
Lina, Fua & Jianga (2020) proposed a deep learning model called CharCNN-RoBERTa for fake news detection in the Urdu language. The study uses the dataset developed by Amjad et al. (2020), which contains 900 samples, for its experiments. It combines RoBERTa, charCNN, and pre-training with word and character n-grams. The results indicate that, with a combination of RoBERTa, charCNN, pre-training, and label smoothing, an accuracy of 0.90 is possible for Urdu fake news detection. Similarly, Balouchzahi & Shashirekha (2020) adopt word and character n-grams to train machine learning models for fake news detection in the Urdu language. Additionally, they use word embedding vectors to train deep learning models for the same purpose. An accuracy of 0.79 and an average F1 score of 0.78 are obtained using a machine learning-based ensemble approach.
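Label smoothing, mentioned above, replaces hard 0/1 training targets with slightly softened ones so that a model is penalized for overconfident predictions. The following is a minimal sketch of the idea, not the cited authors' implementation; the function names, the class encoding, and the smoothing factor `epsilon = 0.1` are assumptions made for illustration.

```python
import math

def smooth_labels(one_hot, epsilon=0.1):
    """Mix a one-hot target with the uniform distribution over the classes."""
    k = len(one_hot)
    return [(1.0 - epsilon) * y + epsilon / k for y in one_hot]

def cross_entropy(target, predicted):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

# Hypothetical encoding: index 0 = fake, index 1 = real.
hard = [1.0, 0.0]
soft = smooth_labels(hard)      # approximately [0.95, 0.05]

# A hypothetical, very confident (and correct) model output.
confident = [0.99, 0.01]

# With smoothed targets, the overconfident prediction incurs a larger loss.
print(cross_entropy(hard, confident))
print(cross_entropy(soft, confident))
```

With only 900 training samples, softening the targets in this way is one common means of discouraging a large pre-trained model from overfitting.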
The crowd-sourced professionals are directed to generate random fake news, which leads to an unbiased dataset. An example of the adopted strategy is shown in Fig. 3. In the previous study (Amjad et al., 2020), the crowd-sourced professionals were asked to generate fake news by making minor changes to real news, which leads to a biased dataset. This study, however, does not generate fake news and considers only items found in existing datasets or obtained from the other sources listed in Table 3.
Lastly, an English dataset from Kaggle was selected for machine-translated news. Instant Scrapper was used to acquire real news data from different websites. The fake news data was collected in English and translated into Urdu, and the translated news was manually checked (Ahmed, Traore & Saad, 2017). If a news item did not convey its meaning properly, it was either removed or its word order or wording was manually corrected. Figure 4 shows a news sample that was discarded.
Table 8 shows the experimental results using TF-IDF features. All models perform well except KNN, which reaches an accuracy of 78.45%. The stacking-based approach achieves the highest accuracy and MCC score, 93.82% and 86.10%, respectively, for Urdu fake and real news detection. RF, ET, and LR obtain MCC scores of 81.51%, 81.02%, and 77.18%, respectively.
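For reference, the Matthews correlation coefficient (MCC) reported alongside accuracy can be computed from the four confusion-matrix counts. The sketch below is illustrative only: the function names and the toy labels are assumptions, not the study's evaluation code or data. MCC lies in [-1, 1]; the percentages reported above presumably correspond to this value multiplied by 100.

```python
import math

def confusion_counts(y_true, y_pred, positive="fake"):
    """Return (tp, fp, tn, fn), treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; defined as 0 when the denominator is 0."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Toy labels, invented for illustration only.
y_true = ["fake", "fake", "fake", "real", "real", "real", "real", "fake"]
y_pred = ["fake", "fake", "real", "real", "real", "fake", "real", "fake"]

counts = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(*counts))
print("MCC:", mcc(*counts))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the fake and real classes are imbalanced.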
This study aims to solve these issues for the low-resource Urdu language. We increased the size of the dataset compared to previously available data. The collected dataset covers nine domains, namely health, business, sports, technology, showbiz, politics, science, crime, and travel, and contains a total of 4,097 news items. The dataset was cleaned by removing special characters, extra white space, non-Urdu characters, and stop words. Regarding stop words, a previous study (Amjad et al., 2020) suggested that removing them in Urdu decreases model performance. We removed the stop words and performance did decrease, so stop words were retained in further experiments. Afterward, two types of features were extracted: first, the preprocessed text is converted into a feature vector; second, the verbs are extracted from the preprocessed text; finally, the two features are combined. For feature computation, TF-IDF and BoW with word-level and character-level n-grams were employed.
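The cleaning and feature computation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: it assumes Urdu text falls in the Arabic Unicode block (U+0600 to U+06FF), uses a smoothed IDF with L2 normalization (the exact weighting variant is not specified in the text), and omits verb extraction, which would require an Urdu part-of-speech tagger. All function names and the two sample sentences are hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: anything outside the Arabic Unicode block (or whitespace) is non-Urdu.
NON_URDU = re.compile(r"[^\u0600-\u06FF\s]")
# Arabic-script comma, semicolon, question mark, and Urdu full stop.
URDU_PUNCT = re.compile(r"[\u060C\u061B\u061F\u06D4]")

def clean(text):
    """Drop non-Urdu characters and punctuation, then collapse white space."""
    text = NON_URDU.sub(" ", text)
    text = URDU_PUNCT.sub(" ", text)
    return " ".join(text.split())

def word_ngrams(text, n):
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def char_ngrams(text, n):
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def bow(docs_terms):
    """Bag of words: raw term counts per document."""
    return [Counter(terms) for terms in docs_terms]

def tfidf(docs_terms):
    """TF-IDF with smoothed IDF, ln((1 + N) / (1 + df)) + 1, and L2 normalization."""
    n_docs = len(docs_terms)
    counts = bow(docs_terms)
    df = Counter()
    for c in counts:
        df.update(c.keys())
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for c in counts:
        vec = {t: tf * idf[t] for t, tf in c.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({t: v / norm for t, v in vec.items()} if norm else vec)
    return vectors

# Two invented sample sentences ("this news is fake" / "this news is real").
docs = ["Breaking: یہ خبر جعلی ہے۔", "یہ خبر اصلی ہے 2021"]
cleaned = [clean(d) for d in docs]

# Word-level unigrams plus bigrams, and character-level trigrams.
word_features = tfidf([word_ngrams(d, 1) + word_ngrams(d, 2) for d in cleaned])
char_features = tfidf([char_ngrams(d, 3) for d in cleaned])
print(cleaned)
print(len(word_features[0]), len(char_features[0]))
```

Character n-grams are attractive for a morphologically rich, low-resource language because they capture sub-word patterns without requiring a tokenizer or stemmer.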
Private schools in Pakistan's troubled north-western Swat district have been ordered to close in a Taleban edict banning girls' education. Militants seeking to impose their austere interpretation of Sharia law have destroyed about 150 schools in the past year. Five more were blown up despite a government pledge to safeguard education, it was reported on Monday. Here a seventh grade schoolgirl from Swat chronicles how the ban has affected her and her classmates. The diary first appeared on BBC Urdu online. THURSDAY JANUARY 15: NIGHT FILLED WITH ARTILLERY FIRE The night was filled with the noise of artillery fire and I woke up three times. But since there was no school I got up later at 10 am. Afterwards, my friend came over and we discussed our homework.
Today is 15 January, the last day before the Taleban's edict comes into effect, and my friend was discussing homework as if nothing out of the ordinary had happened.
Today, I also read the diary written for the BBC (in Urdu) and published in the newspaper. My mother liked my pen name 'Gul Makai' and said to my father 'why not change her name to Gul Makai?' I also like the name because my real name means 'grief stricken'.
My father said that some days ago someone brought the printout of this diary saying how wonderful it was. My father said that he smiled but could not even say that it was written by his daughter. WEDNESDAY 14 JANUARY: I MAY NOT GO TO SCHOOL AGAIN I was in a bad mood while going to school because winter vacations are starting from tomorrow. The principal announced the vacations but did not mention the date the school was to reopen. This was the first time this has happened. In the past the reopening date was always announced clearly. The principal did not inform us about the reason behind not announcing the school reopening, but my guess was that the Taleban had announced a ban on girls' education from 15 January. This time round, the girls were not too excited about vacations because they knew if the Taleban implemented their edict they would not be able to come to school again. Some girls were optimistic that the schools would reopen in February but others said that their parents had decided to shift from Swat and go to other cities for the sake of their education. Since today was the last day of our school, we decided to play in the playground a bit longer. I am of the view that the school will one day reopen but while leaving I looked at the building as if I would not come here again. FRIDAY 9 JANUARY: THE MAULANA GOES ON LEAVE? Today at school I told my friends about my trip to Bunair. They said that they were sick and tired of hearing the Bunair story. We discussed the rumours about the death of Maulana Shah Dauran, who used to give speeches on FM radio. He was the one who announced the ban on girls attending school. Some girls said that he was dead but others disagreed. The rumours of his death are circulating because he did not deliver a speech the night before on FM radio. One girl said that he had gone on leave. Since there was no tuition on Friday, I played the whole afternoon. I switched on the TV in the evening and heard about the blasts in Lahore.
I said to myself 'why do these blasts keep happening in Pakistan?' WEDNESDAY 7 JANUARY: NO FIRING OR FEAR I have come to Bunair to spend Muharram (a Muslim holiday) on vacation. I adore Bunair because of its mountains and lush green fields. My Swat is also very beautiful but there is no peace. But in Bunair there is peace and tranquillity. Neither is there any firing nor any fear. We all are very happy. Today we went to Pir Baba mausoleum and there were lots of people there. People are here to pray while we are here for an excursion. There are shops selling bangles, ear rings, lockets and other artificial jewellery. I thought of buying something but nothing impressed - my mother bought ear rings and bangles. MONDAY 5 JANUARY: DO NOT WEAR COLOURFUL DRESSES I was getting ready for school and about to wear my uniform when I remembered that our principal had told us not to wear uniforms - and come to school wearing normal clothes instead. So I decided to wear my favourite pink dress. Other girls in school were also wearing colourful dresses and the school presented a homely look. My friend came to me and said, 'for God's sake, answer me honestly, is our school going to be attacked by the Taleban?' During the morning assembly we were told not to wear colourful clothes as the Taleban would object to it. I came back from school and had tuition sessions after lunch. In the evening I switched on the TV and heard that curfew had been lifted from Shakardra after 15 days. I was happy to hear that because our English teacher lived in the area and she might be coming to school now. SUNDAY 4 JANUARY: I HAVE TO GO TO SCHOOL Today is a holiday and I woke up late, around 10 am. I heard my father talking about another three bodies lying at Green Chowk (crossing). I felt bad on hearing this news. Before the launch of the military operation we all used to go to Marghazar, Fiza Ghat and Kanju for picnics on Sundays.
But now the situation is such that we have not been out on picnic for over a year and a half. We also used to go for a walk after dinner but now we are back home before sunset. Today I did some household chores, my homework and played with my brother. But my heart was beating fast - as I have to go to school tomorrow. SATURDAY 3 JANUARY: I AM AFRAID I had a terrible dream yesterday with military helicopters and the Taleban. I have had such dreams since the launch of the military operation in Swat. My mother made me breakfast and I went off to school. I was afraid going to school because the Taleban had issued an edict banning all girls from attending schools. Only 11 students attended the class out of 27. The number decreased because of Taleban's edict. My three friends have shifted to Peshawar, Lahore and Rawalpindi with their families after this edict. On my way from school to home I heard a man saying 'I will kill you'. I hastened my pace and after a while I looked back if the man was still coming behind me. But to my utter relief he was talking on his mobile and must have been threatening someone else over the phone.