Once Upon A Time In China 4 Full Movie English Subtitles


Henry Gallagher

Aug 4, 2024, 2:37:09 PM
to compprovamout
Currently, every time, I have to search for subtitles in the languages I need from the app on tvOS, then open them on my computer and use some tools to merge the two subtitles into a single srt file. If I could render at least two subtitles at a time (including embedded ones from mkv), it would be much appreciated.
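For anyone doing the manual merge described above, here is a minimal Python sketch of one way to combine two srt files into one. It is only an illustration under stated assumptions: the function names (`parse_srt`, `merge`, `dump_srt`) are made up for this example, the primary track keeps its timings, and any secondary cue that overlaps a primary cue in time is appended on a new line. Secondary cues that overlap nothing are dropped, so this is not a complete tool.

```python
import re
from dataclasses import dataclass

# Matches an srt timestamp such as 00:01:02,345 (a dot is tolerated too).
TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


@dataclass
class Cue:
    start: int  # milliseconds
    end: int    # milliseconds
    text: str


def to_ms(stamp: str) -> int:
    """Convert an srt timestamp to milliseconds."""
    h, m, s, ms = map(int, TIME_RE.fullmatch(stamp.strip()).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def to_stamp(ms: int) -> str:
    """Convert milliseconds back to an srt timestamp."""
    s, ms = divmod(ms, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def parse_srt(data: str) -> list[Cue]:
    """Parse srt text into cues; blocks without a timing line are skipped."""
    cues = []
    for block in re.split(r"\n\s*\n", data.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        idx = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if idx is None:
            continue
        start, end = lines[idx].split("-->")
        # Ignore any positioning hints that follow the end timestamp.
        end = end.split()[0]
        text = "\n".join(lines[idx + 1:]).strip()
        cues.append(Cue(to_ms(start), to_ms(end), text))
    return cues


def merge(primary: list[Cue], secondary: list[Cue]) -> list[Cue]:
    """Append the text of overlapping secondary cues under each primary cue."""
    merged = []
    for cue in primary:
        extra = [c.text for c in secondary
                 if c.start < cue.end and cue.start < c.end]
        merged.append(Cue(cue.start, cue.end, "\n".join([cue.text, *extra])))
    return merged


def dump_srt(cues: list[Cue]) -> str:
    """Serialize cues back to srt text, renumbering from 1."""
    blocks = [
        f"{i}\n{to_stamp(c.start)} --> {to_stamp(c.end)}\n{c.text}"
        for i, c in enumerate(cues, 1)
    ]
    return "\n\n".join(blocks) + "\n"
```

In practice you would read both files with `open(path, encoding="utf-8")`, call `dump_srt(merge(parse_srt(a), parse_srt(b)))`, and write the result to a new srt file.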

Hi! Has there been any update on this? This feels like an easy feature to implement once you have a player able to pull subtitles from a database and display them. It looks like somebody has already put this together for Chrome: "Netflix Dual subtitle for learning languages" on the Chrome Web Store.


Hello,

It would be interesting to be able to display a second subtitle in the same video simultaneously.

This can greatly help with language learning.

BS Player and SM Player do it on PC, KM Player (old version) too.

I do not know of any video player that handles dual subtitles on iOS.

Will Infuse be the first?


I know many foreign language learners watch movies and TV to learn a language. Most movies provide subtitles in several languages, but a single subtitle track rarely shows two languages at once, for example Chinese and English together. I therefore hope Infuse can add an option to display two subtitles in different languages at the same time. This would be a great feature!

Thank you very much, hope to reply!




However, if you watch a Chinese movie with Chinese subtitles, that definitely helps. You can see exactly what they're saying, so you can test your hearing and work on your hanzi recognition at the same time.


You'll only distinguish a word once in a while, but some words are used over and over again, so you'll be especially exposed to the more frequent words, which is great for beginners. The subtitles may help, especially if the movie has both hanzi and pinyin hardcoded plus English subs (non-hardcoded).


But if you are watching the movie translated into English, it'll be less helpful. Try the Mandarin subtitles option. When I turn on English subtitles for an English-language movie, the text below shows exactly what is being said, untranslated and unaltered.


Word frequency is the most important variable in language research. However, despite the growing interest in the Chinese language, there are only a few sources of word frequency measures available to researchers, and the quality is less than what researchers in other languages are used to.


Following recent work by New, Brysbaert, and colleagues in English, French and Dutch, we assembled a database of word and character frequencies based on a corpus of film and television subtitles (46.8 million characters, 33.5 million words). In line with what has been found in the other languages, the new word and character frequencies explain significantly more of the variance in Chinese word naming and lexical decision performance than measures based on written texts.


Our results confirm that word frequencies based on subtitles are a good estimate of daily language exposure and capture much of the variance in word processing efficiency. In addition, our database is the first to include information about the contextual diversity of the words and to provide good frequency estimates for multi-character words and the different syntactic roles in which the words are used. The word frequencies are freely available for research purposes.
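To make the measures mentioned above a bit more concrete, here is a rough Python sketch of how word frequency, character frequency, and contextual diversity could be counted from a subtitle corpus. This is not the authors' pipeline. It assumes each film has already been segmented into words, it takes contextual diversity to mean the number of films in which a word occurs, and the function name `frequency_tables` and the film labels are invented for the example.

```python
from collections import Counter


def frequency_tables(films: dict[str, list[str]]):
    """Count words, characters, and per-film occurrence.

    `films` maps a film label to its list of already segmented words
    (an assumption: segmentation is done elsewhere).
    """
    word_counts = Counter()   # raw word frequency over the whole corpus
    char_counts = Counter()   # raw character frequency over the whole corpus
    film_counts = Counter()   # contextual diversity: films containing the word
    for words in films.values():
        word_counts.update(words)
        for word in words:
            char_counts.update(word)  # a str is counted character by character
        film_counts.update(set(words))  # each film counts at most once per word
    total = sum(word_counts.values())
    # Frequency per million words, a common way to report corpus frequencies.
    per_million = {w: c * 1_000_000 / total for w, c in word_counts.items()}
    return word_counts, char_counts, film_counts, per_million


# Toy corpus of three "films", for illustration only.
toy_films = {
    "film_a": ["我", "喜欢", "电影"],
    "film_b": ["我", "喜欢", "你"],
    "film_c": ["电影", "开始", "了"],
}
words, chars, diversity, per_million = frequency_tables(toy_films)
print(words.most_common(3))
print(diversity["电影"])
```

A word that appears many times in a single film gets a high raw count but a contextual diversity of only 1, which is why the two measures can diverge.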


Copyright: 2010 Cai, Brysbaert. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Funding: This work has been supported by an Odysseus Grant from the Government of Flanders (the Dutch-speaking region of Belgium). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.


Research on the Chinese language is becoming an important theme in psycholinguistics. Not only is Chinese one of the most widely spoken languages in the world, it also differs in interesting ways from the alphabetic writing systems used in the Western world. For example, the logographic writing system makes it impossible to compute the word's phonology on the basis of non-lexical letter to sound conversions [1]. Another characteristic of the Chinese writing system is that there are no spaces between the words. This is likely to have consequences for eye movement control in reading [2]. Finally, a Chinese character represents a syllable, which most of the time is a morpheme (i.e., the smallest meaningful element), and many Chinese words in fact are disyllabic compound words [3].


Research on the Chinese language requires reliable information about word characteristics, so that the stimulus materials can be manipulated and controlled properly. By far the most important word feature is word frequency. In this text, we first describe the frequency measures that are available for Chinese. Then, we describe the contribution a new frequency measure based on film subtitles is making in other languages and we present a similar database for Mandarin Chinese.


A second source of word frequency information consists of frequency lists that have been compiled by linguists and official organizations ([7] for an earlier review). Most of these lists are not publicly available, but can be obtained from the researchers. In Table 1 we summarize the most interesting lists we have encountered in our search.


When reading Table 1, it is important to keep in mind that many corpora were meant to be representative for the language produced in Chinese speaking regions and not necessarily for the language daily heard and read by Chinese speaking people. In addition, some of these sources are copyright protected. One main problem with Chinese word frequencies is that Chinese words are not written separately, making the segmentation of the corpus into words labor-intensive if one wants to have information beyond single character frequencies (Chinese words can consist of one to four or even more characters). This situation is currently changing, due to the availability of automatic parsers and part-of-speech taggers, as we will see below.


All in all, despite the existence of several frequency lists in Chinese, there are only three sources that provide easy access for individual researchers and other people interested in the Chinese language. The first is CCL ( :8080/ccl_corpus), which gives access to the unsegmented and untagged corpus and provides information about character frequencies but not word frequencies. The second is LCSMCS ( -bin/yuliao/), which gives word frequencies based on the segmented part of the corpus (2 million words). Unfortunately, words have to be entered separately on the website. Part of the single-character word frequencies from LCSMCS are also available in the Chinese Single-character Word Database (CSWD; available at _norm/psychnorms.html). This database provides information about 2,390 single-character Chinese words including nouns, verbs, and adjectives [11]. Finally, there is the Lancaster Corpus of Mandarin Chinese ( ), which provides frequency information for 5,000 words in A Frequency Dictionary of Mandarin Chinese: Core Vocabulary for Learners [3] and for a larger set of 50,000 words upon request from the authors (also released by Richard Xiao on ).


Keuleers et al. [13] reported essentially the same findings in Dutch. Their subtitle frequency measure, based on a corpus of 40 million words, explained nearly 10% more variance in lexical decision times (based on 14,000 monosyllabic and disyllabic words) than the existing gold standard, the Celex frequencies [17], [18].


Encouraged by the above findings, we decided to compile a word and character frequency list based on Chinese subtitles. A potential problem in this work is that, unlike in most writing systems, there are no spaces between the words in Chinese. Therefore, word segmentation (i.e. splitting the character sequence into words) is a critical step in collecting Chinese word frequencies. Fortunately, in the last decade automatic word segmentation programs have become available with a good output [for a review see 3]. These algorithms are trained on a tagged corpus (i.e., a corpus in which all the words have been identified and given their correct syntactic role) and are then applied to new materials [19]. Their performance is regularly compared in competitions such as the SIGHAN Bakeoff (www.sighan.org; SIGHAN: a Special Interest Group of the Association for Computational Linguistics). A program that consistently performed among the best is ICTCLAS ( ) [3]. It incorporates part-of-speech information (PoS, i.e. the syntactic roles of the words, such as noun, verb, adjective, etc.) and generates multiple hidden Markov models, from which the one with the highest probability is selected [19], [20]. This not only provides the correct segmentation for the vast majority of sentences, but also has the advantage that the most likely syntactic roles of the words are given, which makes it possible to additionally calculate PoS-dependent frequencies. The algorithm is expected to work well for film subtitles, because these subtitles are of a limited syntactic complexity (most of them are short, simple sentences) and because the program has the faculty to recognize out-of-vocabulary words such as foreign names, which often exist in subtitles but are rarely covered by regular vocabularies. The program was also used to parse the LCMC corpus.
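The idea of picking the most probable segmentation can be illustrated with a much simpler model than the one described above. The Python sketch below is not ICTCLAS and does not use hidden Markov models or part-of-speech information. It swaps in a plain unigram dictionary model with dynamic programming, and the function name `segment`, the toy probabilities, and the fallback `unknown_prob` are all assumptions made for this example.

```python
import math


def segment(text, word_prob, max_len=4, unknown_prob=1e-8):
    """Split `text` into the word sequence with the highest unigram probability.

    `word_prob` maps known words to probabilities (toy values here).
    Unknown single characters get `unknown_prob`, so every string can be split.
    """
    n = len(text)
    # best[i] = (best log probability of text[:i], start index of its last word)
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            prob = word_prob.get(word)
            if prob is None:
                if len(word) > 1:
                    continue  # unknown multi-character strings are not words
                prob = unknown_prob
            score = best[start][0] + math.log(prob)
            if score > best[end][0]:
                best[end] = (score, start)
    # Walk the back-pointers from the end of the text to recover the words.
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


# Toy dictionary with made-up probabilities, for illustration only.
toy_prob = {
    "我": 0.05, "喜欢": 0.02, "喜": 0.001, "欢": 0.001,
    "看": 0.02, "电影": 0.02, "电": 0.002, "影": 0.001,
}
print(segment("我喜欢看电影", toy_prob))
```

A real segmenter also has to handle names and other out-of-vocabulary words far better than the single-character fallback used here, which is one reason trained models are preferred.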
