I have read that I can test each character's Unicode code point to find out whether it is in the range of CJK characters. This is helpful, but I would like to separate them, if possible, so I can process the text against different dictionaries. Is there a way to test whether a character is Japanese OR Chinese?
You won't be able to test a single character and tell with certainty whether it is Japanese or Chinese, because of the way the Unihan code points are implemented in the Unicode standard. Basically, every Chinese character is a potential Japanese character. However, the reverse is not true. That said, there are a number of conventions that can be used to test whether a block of text is in one language or the other.
The problem arises with the sheer number of characters and words that are in common. However, if I needed a quick and dirty solution to this problem, I would check my entire blocks of text for kana - if the text contains kana then I know it is Japanese. If you need to distinguish Korean as well, I would test for Hangul. Also, if you need to distinguish what type of Chinese, testing for types of simplifications would be the best approach.
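The quick and dirty check described above could be sketched roughly as follows. This is only an illustration: the function name `guess_language` and the return labels are made up for this example, and the ranges are the main Unicode blocks for Hiragana, Katakana, Hangul and CJK Unified Ideographs (extension blocks are ignored).

```python
# Main Unicode blocks only; extension blocks are deliberately left out.
KANA = [(0x3040, 0x309F), (0x30A0, 0x30FF)]    # Hiragana, Katakana
HANGUL = [(0x1100, 0x11FF), (0xAC00, 0xD7AF)]  # Hangul Jamo, Hangul Syllables
HAN = [(0x4E00, 0x9FFF)]                       # CJK Unified Ideographs


def _in_any(ch, ranges):
    """Return True if the character's code point falls in any of the ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def guess_language(text):
    """Heuristic guess for a whole block of text (hypothetical helper)."""
    if any(_in_any(ch, KANA) for ch in text):
        return "ja"       # any kana at all: treat the block as Japanese
    if any(_in_any(ch, HANGUL) for ch in text):
        return "ko"       # any Hangul: treat the block as Korean
    if any(_in_any(ch, HAN) for ch in text):
        return "zh"       # Han characters only: probably Chinese
    return "unknown"


print(guess_language("これは日本語です"))
print(guess_language("这是中文"))
```

Note that a short Japanese string written entirely in kanji would be reported as Chinese by this sketch, which is exactly the limitation discussed in this thread.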
The process of developing Unicode included the Han Unification. This is because a lot of the Japanese characters are derived from, or the same as, Chinese characters; similarly with Korean. There are some characters (katakana and hiragana - see chapter 12 of the Unicode standard v5.1.0) commonly used in Japanese that would indicate that the text was Japanese rather than Chinese, but I believe it would be a statistical test rather than definitive.
Check out the O'Reilly book on CJKV Information Processing (CJKV is short for Chinese, Japanese, Korean, Vietnamese; I have the CJK predecessor lurking somewhere). There's also the O'Reilly book on Unicode Explained which may be some help, though probably not for this question (I don't recall a discussion of how to identify Japanese and Chinese text).
You probably can't do that reliably. Japanese uses a lot of the same characters as Chinese. I think the best you could do is to look at a block of text. If you see any uniquely Japanese characters, then you can assume the whole block is Japanese. If not, then it's probably Chinese.
Testing for characters in the katakana or hiragana ranges should be a very reliable means of determining whether or not the text is Japanese, especially if you are dealing with 'regular' user-generated text. If you are looking at legal documents or other more official fare, it might be slightly more difficult, as there will be a much greater preponderance of complex Chinese characters, but it should still be pretty reliable.
Japan and China both simplified many characters but often in different ways. You can check for Japanese Shinjitai and Simplified Chinese characters. There are many more of the latter than the former. If there are none of either then you probably have Traditional Chinese.
Of course if you're dealing with Unicode text you may find occasional rare characters or mixed languages which could throw off a heuristic so you're better off going with counting the types of characters to make a judgement.
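A minimal sketch of the counting idea, assuming you have lists of Japanese-only and Simplified-only character forms. The two sets below are tiny hand-picked samples for illustration (a real list would be far longer), and the names `tally_variants` and `guess_variant` are hypothetical.

```python
from collections import Counter

# Tiny illustrative samples only, NOT complete lists.
SHINJITAI_SAMPLE = set("図気広駅楽変")            # Japanese simplified forms
SIMPLIFIED_SAMPLE = set("图气广驿乐变们这说时语书门")  # Simplified Chinese forms


def tally_variants(text):
    """Count how many characters belong to each sample set."""
    counts = Counter()
    for ch in text:
        if ch in SHINJITAI_SAMPLE:
            counts["japanese"] += 1
        elif ch in SIMPLIFIED_SAMPLE:
            counts["simplified"] += 1
    return counts


def guess_variant(text):
    """Pick the label with the most hits; no hits suggests Traditional Chinese."""
    counts = tally_variants(text)
    if not counts:
        return "traditional-or-unknown"
    label, _ = counts.most_common(1)[0]
    return label


print(guess_variant("这是我们的书"))
print(guess_variant("駅前の図書館"))
```

Because the decision is based on counts rather than on the first matching character, a single stray character from another language is less likely to flip the result.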
A good way to find out which characters are common in one language and not in the others is to compare the legacy encodings against each other. You can find mappings of each to Unicode easily on the internet.
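If you happen to be using Python, one way to try this without downloading mapping tables is to ask the built-in codecs whether a character can be encoded. The sketch below assumes Python's `shift_jis`, `gb2312` and `big5` codecs are close enough to the legacy standards for this purpose; the helper name `encodable_in` is made up for the example.

```python
def encodable_in(ch, encodings=("shift_jis", "gb2312", "big5")):
    """Return the legacy encodings (from the given list) that can represent ch."""
    supported = []
    for name in encodings:
        try:
            ch.encode(name)
        except UnicodeEncodeError:
            continue  # this legacy character set has no slot for ch
        supported.append(name)
    return supported


# A Simplified Chinese form and a Japanese-made character, for comparison.
print(encodable_in("们"))
print(encodable_in("働"))
```

Characters that only appear in one encoding's result are candidates for a language-specific list. Keep in mind that the legacy sets overlap in surprising places (GB2312, for instance, includes kana), so this is best used to build lists of Han characters rather than as a classifier on its own.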
The Mandarin Chinese learning materials on Digital Dialects are free to use, do not require any form of registration or the provision of contact or personal details, and were developed for language students of various ages and learning styles. Teachers can use the language learning materials in Mandarin lesson plans or set them as homework assignments. School students and young kids should enjoy some of the colorful and intuitive word games, such as the animal vocabulary game, or the fruit and vegetable quiz. Games include recorded spoken Chinese from a native speaker with a standard accent. For quiet classroom environments or libraries, students are given 'text only' options, which are the games without sound. Games use both pinyin, a romanized Chinese alphabet, and Chinese simplified characters. Travelers and people staying in China for a short time may want to simply brush up on a few simple sentences and phrases, and learn the Chinese numbers using pinyin. Many people in Chinese speaking countries speak little or no English, and a few phrases and words can be essential for everyday basic communication with locals. The educational materials on this website with simplified Chinese characters offer the opportunity to begin to read and write. Characters used are common ones you might see on Chinese menus, metro signs and timetables. Characters for food items can be particularly useful to know in China for reading menus. Choose a Chinese topic to study or brush up on, learn the new words or sentences, and then test your proficiency levels with the fun online quiz. This website also hosts several Chinese flashcards for learning the Mandarin Chinese words and phrases featured in the practice quizzes prior to playing. Our Chinese visual dictionaries were developed for introducing Mandarin Chinese vocabulary topics to younger learners.
Mandarin Chinese is the most widely spoken Chinese language and is the first language of two-thirds of Chinese people. Around 920 million people speak Mandarin Chinese as their first language, and around 200 million more people speak Chinese as a second or other language. There are four subgroups of Mandarin Chinese, and the official variant, Modern Standard Chinese (also known as 'national language') is the form spoken in the Beijing region. Mandarin dialects are not all mutually intelligible. Standard Chinese is the official language of China and Taiwan, and one of Singapore's four official languages.
Chinese phrases - common and useful phrases for beginners and travelers. Uses Pinyin (romanized) characters. Exercise and word list includes spoken Chinese audio provided by a native speaker (from China) with a typical Mandarin Chinese accent. Learn to say hello in Chinese along with other important salutations and greetings. If you'd like to brush up on everyday greetings for a weekend trip to Hong Kong or Taipei, or to surprise a Chinese or Singaporean friend, visit our how to say hello in Chinese webpage.
Numbers in Chinese - three quizzes to learn the numbers starting from numbers 1-12 for absolute beginners. Here you have two options - a pinyin 1-12 numbers game and a 1-12 characters game for beginning to learn Chinese script. You will need the numbers from 1-12 to continue on with the numbers 13-20 game, which teaches you to count up to 20. Here you can choose to play either the Chinese characters numbers 13-20 game or the simple pinyin 13-20 game. For learning the multiples of 10 up to 100 play the numbers 0-100 game, again with both pinyin and Chinese script options.
Colours in Chinese - the most common colour adjectives in the Chinese language. The colours Chinese script game could serve as a handy learning tool for beginning to read and write common Chinese characters. Use our Chinese flashcards: colours for learning the Chinese script and words.
Fruit and vegetable game - 15 words for Chinese fruit and vegetables in a simple quiz with audio included. Vocabulary practice game includes Pinyin, Chinese character and audio options. For elementary level Mandarin Chinese students and kids.
Animals in Chinese - learn the vocabulary for animals and then choose either the game with 17 questions if you are not yet familiar with the vocabulary or the complete 33 question quiz. Teach this vocabulary to kids with our Chinese picture dictionary: animals. Fun kids' Chinese practice exercises for learning and memorizing essential vocabulary.
Chinese vocabulary game - game with vocabulary lists and audio containing 17 Mandarin Chinese words. Choose either the audio vocabulary quiz with spoken Mandarin audio, the pinyin vocabulary version, or for learning characters play the Chinese script vocabulary game. Our Chinese picture dictionary: vocabulary can be used for learning or teaching this essential vocabulary.
Chinese vocabulary quiz 2 - our second first words in Mandarin Chinese practice game. Use our Chinese vocabulary flashcards 2 for learning these words prior to testing with the quiz. Ideal for learning first Chinese characters.
Chinese vocabulary game 4 - our fourth and final visual Mandarin Chinese vocabulary building exercise. Click through our vocabulary flashcards 4 for learning the words with pinyin and simplified Chinese scripts.
Days and months in Chinese - learn the seven days of the week and the 12 months of the year in a simple visual quiz that is suitable for kids and elementary students. The Chinese practice game uses pinyin script only.
Vocabulary builders - as you begin to acquire more vocabulary, these three games each offer 48 additional words. Match the Chinese word with the corresponding English word. Chinese vocabulary builder 1 and the slightly more advanced Chinese vocabulary builder 2.
The BBC website has a section for learning Chinese, and includes a Mandarin Chinese for children section. YouTube probably has the best collection of online Chinese study materials. Try channels such as Chinese for us or Everyday Chinese.