Bangla Word Software

Martta Borromeo

Aug 5, 2024, 12:41:23 AM

to greenpoisuri

Bengali is one of the most morphologically rich languages, with many inflectional and derivational variants of each word. As a result, determining the stem of a word is quite complicated.

This paper presents a magnetoencephalography (MEG) study on reading in Bangla, an east Indo-Aryan language predominantly written in an abugida script. The study aims to uncover how visual stimuli are processed and mapped onto abstract linguistic representations in the brain. Specifically, we investigate the neural responses that correspond to word length in Bangla, a language with a unique orthography that introduces multiple ways to measure word length. Our results show that MEG signals localised in the anterior left fusiform gyrus, at around 130ms, are highly correlated with word length when measured in terms of the number of minimal graphemic units in the word rather than independent graphemic units (akśar) or phonemes. Our findings suggest that minimal graphemic units could serve as a suitable metric for measuring word length in non-alphabetic orthographies such as Bangla.

Copyright: 2024 Moitra et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by Economic and Social Research Council: [ES/V000012/1] to L. Stockall and NYU Abu Dhabi Institute under Grant G1001 to A. Marantz. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Twenty-four right-handed, self-reported native speakers of Bangla with normal or corrected-to-normal vision participated in the study. They were recruited from the New York University and surrounding communities in Abu Dhabi. Language history was collected via a questionnaire to screen for eligibility, and all participants provided written informed consent prior to the experiment. Compensation was provided upon completion. The NYU Abu Dhabi Institutional Review Board approved all experimental protocols. Recruitment took place between 2/03/2022 and 15/11/2022.

The experiment consisted of a lexical decision task in which participants were presented with strings of characters appearing in the middle of the screen. Participants were instructed to indicate via button press with the non-dominant (left) hand whether they recognised the string as a word of their language, and to answer as quickly and accurately as possible. The buttons were counterbalanced; half of the participants indicated yes by pressing the left button on the response box and the other half by pressing the right button. Between the blocks, participants could take a self-timed break to perform small movements to remain comfortable. The average total time for the experiment was 15 minutes.

Because our research question involves initial stages of processing word forms, prior to lexical access, our analysis includes both grammatical and ungrammatical words. Preliminary analyses on only grammatical words did not produce any reliable effects, likely due to weak power. Because we are focused on the initial stages of interpreting orthographic features, before morphological analysis occurs [5, 19], there are no expected differences between grammatical and pseudoword stimuli.

We ran a separate two-stage spatio-temporal cluster-based permutation test to determine whether the MEG signal distinguished between grammatical words and pseudowords within the time and search coordinates of interest. This analysis was conducted with the same parameters as the analysis of interest, except with a single factorial regressor coding each trial as a grammatical word or a pseudoword.

We also found a significant cluster for normalised perimetric complexity. Although this was included as a nuisance regressor, we report on it here since it overlaps with the WLc cluster. This cluster localised to left fusiform gyrus, from 151 to 180 ms (p = 0.0189), and showed a positive correlation between normalised perimetric complexity and MEG signals. This is shown in Fig 4. No significant effects of any of the other variables were observed.

Results of two-stage regression analysis. Cluster corresponds to normalised perimetric complexity, that is, the squared length of the perimeter of the word divided by the ink area internal to the word, divided by the word length in minimal graphemic units, WLc. Corrected p-value and temporal extent are given in the green box. Brain plot shows the spatial coordinates of the significant clusters, outlined in black. Red shading corresponds to the positive t-values of the second-stage regression at the peak of the cluster timecourse. Timecourse plots show the beta coefficients of the cluster, and gray shading indicates the temporal extent.
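The definition in the caption can be written out as a small function. This is only a sketch of the arithmetic as worded above; the function and parameter names are assumptions, and how the perimeter and ink area are measured from the rendered stimulus is not specified here.

```python
def normalised_perimetric_complexity(perimeter, ink_area, wlc):
    """Squared perimeter over ink area, divided by word length (WLc).

    perimeter: total outline length of the rendered word (e.g. in pixels)
    ink_area:  inked area of the rendered word (e.g. in square pixels)
    wlc:       word length in minimal graphemic units
    """
    if ink_area <= 0 or wlc <= 0:
        raise ValueError("ink_area and wlc must be positive")
    return (perimeter ** 2) / ink_area / wlc
```

For instance, a word with perimeter 10, ink area 5 and four minimal graphemic units gives 100 / 5 / 4 = 5.0.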

Because of the multicollinearity of our regressors, we also conducted a post hoc analysis on the spatial coordinates of the cluster. We averaged the raw activation from the spatial coordinates of both WLc clusters for each participant and trial. Then, for each millisecond, we fit a linear model of the same form as the two-stage regression in (1) to the averaged timecourses extracted for each participant and trial. For each word length variable, we then fit a separate reduced model at each millisecond, with that word length variable removed. We then conducted a likelihood ratio test at each time point comparing the two models, to determine whether the more complex model incorporating the word length variable is a better fit than the model without it. This produces a time-course of the χ²(1) test statistic for each word length variable, showing when each factor contributes significantly to explaining the data. These distributions are given in Fig 1C.
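The model comparison described above can be illustrated for a single time point. The sketch below is a deliberate simplification and not the authors' pipeline: it assumes Gaussian errors, a full model with an intercept and one word length predictor, and a reduced model with the intercept only, whereas the analysis above includes further regressors. For such nested linear models the likelihood ratio statistic is n · ln(RSS_reduced / RSS_full), and the χ²(1) tail probability equals erfc(√(stat / 2)).

```python
import math


def rss_intercept_only(y):
    """Residual sum of squares for the reduced model y = b0."""
    mean_y = sum(y) / len(y)
    return sum((v - mean_y) ** 2 for v in y)


def rss_simple_regression(x, y):
    """Residual sum of squares for the full model y = b0 + b1 * x."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))


def likelihood_ratio_test(x, y):
    """Return (chi-square statistic with 1 df, p-value) for adding x."""
    n = len(y)
    stat = n * math.log(rss_intercept_only(y) / rss_simple_regression(x, y))
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi2(1) survival function
    return stat, p_value
```

Calling `likelihood_ratio_test` once per millisecond on the averaged activation would give a statistic time-course of the kind described; the sketch assumes the full model does not fit perfectly, since a zero residual would make the ratio undefined.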

The two-stage regression model fit to the factorial variable Grammatical vs. Pseudoword did not identify any significant clusters with the search and clustering parameters reported here. This is compatible with the expectation that morphological and lexical analysis is likely not launched during the M100 and M130 time windows.

Our results contribute to our understanding of the fusiform gyri and their relationship to the broader reading network by demonstrating that fusiform gyri facilitate sub-lexical analysis of the word-form [5, 9]. Additionally, our results show that in an abugida like Bangla, these processes are sensitive to the minimal graphemic units, which roughly correspond to phonemes, rather than larger composed graphemes (akśar), which roughly correspond to syllables.

We want to thank Prof. Alec Marantz and Dr. Samantha Wray for their constructive feedback. We would also like to thank Dr. Ishani Guha and Dr. Bidisha Bhattacharjee for their help in preparing the stimuli. We also want to thank Dave Cayado and the New York University Abu Dhabi Neuroscience of Language Lab members for their assistance during the data collection.

We provide a Mikolov-style word-analogy evaluation set specifically for Bangla, with a sample size of 16678, as well as a translated and curated version of the Mikolov dataset, which contains 10594 samples for cross-lingual research.
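For readers unfamiliar with the format, a Mikolov-style analogy question asks "a is to b as c is to ?" and is usually scored by vector arithmetic over word embeddings. The sketch below shows that scoring rule on toy data. The two-dimensional vectors and the word list are invented for illustration and are not drawn from the dataset.

```python
import math

# Toy, hand-made vectors (hypothetical); real embeddings have hundreds of dims.
TOY_VECTORS = {
    "পুরুষ": [1.0, 0.0],  # man
    "নারী": [1.0, 1.0],   # woman
    "রাজা": [3.0, 0.0],   # king
    "রানি": [3.0, 1.0],   # queen
    "আম": [0.0, 3.0],     # mango (distractor)
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def solve_analogy(a, b, c, vectors):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding a, b, c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def analogy_accuracy(questions, vectors):
    """Fraction of (a, b, c, expected) questions answered correctly."""
    correct = sum(solve_analogy(a, b, c, vectors) == d for a, b, c, d in questions)
    return correct / len(questions)
```

With the toy vectors, `solve_analogy("পুরুষ", "নারী", "রাজা", TOY_VECTORS)` picks the word for "queen". An evaluation set of the kind described would supply thousands of such four-word questions.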

You can use our word unscrambler to easily unscramble words, such as bangla. Simply enter your letters (in this case BANGLA) into the letter box (YOUR TILES) and press the red SEARCH button. This will generate a list of the words you can make from the letters in bangla. The list of unscrambled words displays all results sorted by length, and it should be easy to view on both desktops and mobile devices. And be sure to bookmark us so you can find us again quickly!

If you're trying to solve a word puzzle with a wildcard character, never fear. For example, to search for bangla plus a wildcard, simply enter the wildcard in the unscrambler as either a ? or by pressing the spacebar. It will find words that can use that wildcard tile by cycling through all the possible letters in the alphabet.
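How such an unscrambler might work can be sketched in a few lines of Python. This is an assumption about the general technique, not the site's actual code: the word list is a made-up sample, results are sorted longest first (the direction of the sort is not stated above), and instead of literally cycling through the alphabet the sketch counts how many letters a word is missing and checks that the wildcards can cover them, which gives the same result.

```python
from collections import Counter

# Hypothetical sample dictionary; a real tool would load a full word list.
WORDS = ["bangla", "bagel", "bang", "lab", "gala", "zebra"]


def can_build(word, tiles):
    """True if word can be spelled from tiles; '?' or a space is a wildcard."""
    available = Counter(tiles.replace(" ", "?").lower())
    wildcards = available.pop("?", 0)
    # Counter subtraction keeps only the letters the tiles cannot supply.
    missing = Counter(word.lower()) - available
    return sum(missing.values()) <= wildcards


def unscramble(tiles, dictionary):
    """Return buildable words, longest first, then alphabetically."""
    matches = [w for w in dictionary if can_build(w, tiles)]
    return sorted(matches, key=lambda w: (-len(w), w))
```

With the sample list, `unscramble("bangla", WORDS)` finds bangla, bang, gala and lab, while `unscramble("bangla?", WORDS)` additionally finds bagel by spending the wildcard on the letter e.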

Recently, word sense disambiguation has gained increased attention among NLP practitioners due to its many potential applications in language technology. This paper proposes a Naive Bayes classifier for resolving lexical ambiguities of Bangla words with the help of a Bangla sense-annotated corpus. At the initial stage, a Bangla sense-annotated corpus is generated from a raw text corpus to serve as a training dataset. For a given input Bangla sentence, ambiguous words are detected first, and then Bayes' theorem is applied to calculate the posterior probability that an ambiguous word belongs to a particular sense class. The posterior probabilities of the candidate senses are then used by the Naive Bayes classifier to assign the most probable sense to the detected ambiguous word. Experimental results show that the proposed method outperforms existing techniques, achieving the highest F1-score of 90% on the test data.
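The classification step can be illustrated with a small multinomial Naive Bayes sketch. It is not the paper's implementation: the add-one (Laplace) smoothing, the bag-of-context-words features, the romanised tokens and the two sense labels are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sense-annotated contexts for one ambiguous word.
EXAMPLES = [
    (["byatha", "osudh"], "body_part"),
    (["chul", "byatha"], "body_part"),
    (["doler", "neta"], "leader"),
    (["neta", "songothon"], "leader"),
]


def train(examples):
    """Count senses and context words per sense."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for context, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(context)
        vocab.update(context)
    return sense_counts, word_counts, vocab


def classify(context, model):
    """Return the sense with the highest log posterior (add-one smoothing)."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, count in sense_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(word_counts[sense].values()) + len(vocab)
        for word in context:
            score += math.log((word_counts[sense][word] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense
```

After `model = train(EXAMPLES)`, a context containing "byatha" is assigned the body_part sense and one containing "neta" the leader sense. Log probabilities are summed rather than multiplying raw probabilities to avoid numerical underflow on longer contexts.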