How to read kanji/kana in this file?

Konrad Den Ende

Dec 6, 2003, 7:17:52 PM

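I have a text file (sort of) where every row contains three columns
that are: 1. Kanji 2. Kana-reading 3. English meaning.
I'd like to be able to read (and write to) the file but it's encoded
in a format that looks like:
ï»¿å¤–|ã ã ¨|outside
å¤–å‡ºã ™ã‚‹|ã Œã „ãƒ»ã —ã‚…ã ¤ã ™ã‚‹|to go out
å¤–å›½|ã Œã „ãƒ»ã “ã |a foreign country
å¤–ç§‘|ã ’ãƒ»ã ‹|surgery

Is it doable? I tried to take a look at it with Word but it didn't
give anything fun (I saw the very same characters).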

--

Kindly
Konrad
---------------------------------------------------
May all spammers die an agonizing death; have no burial places;
their souls be chased by demons in Gehenna from one room to
another for all eternity and more.

Sleep - thing used by ineffective people
as a substitute for coffee

Ambition - a poor excuse for not having
enough sense to be lazy
---------------------------------------------------

necoandjeff

Dec 6, 2003, 8:37:58 PM

"Konrad Den Ende" <chamste...@bigfoot.com> wrote in message
news:bqtrji$un8$1...@news.gu.se...

> I have a text file (sort of) where every row contains three columns
> that are: 1. Kanji 2. Kana-reading 3. English meaning.
> I'd like to be able to read (and write to) the file but it's encoded
> in a format that looks like:
> ï»¿å¤–|ã ã ¨|outside
> å¤–å‡ºã ™ã‚‹|ã Œã „ãƒ»ã —ã‚…ã ¤ã ™ã‚‹|to go out
> å¤–å›½|ã Œã „ãƒ»ã “ã |a foreign country
> å¤–ç§‘|ã ’ãƒ»ã ‹|surgery
>
> Is it doable? I tried to take a look at it with Word but it didn't
> give anything fun (I saw the very same characters).

I think the format above is still preserving the double-byte nature of the
original characters, the viewer just doesn't know that they are intended to
be double byte characters as opposed to single byte characters.
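
The garbling can be reproduced in a few lines. This is a minimal sketch in Python, assuming (the thread does not confirm it) that the file is UTF-8 with a byte-order mark and that the viewer read it as Windows-1252; the leading `ï»¿` in the sample is what a UTF-8 BOM looks like under that misreading. The function name `as_misread` is made up for illustration.

```python
def as_misread(text: str) -> str:
    """Encode text as UTF-8 with a BOM, then decode it the way a
    Windows-1252 viewer would (an assumed scenario, not confirmed)."""
    raw = text.encode("utf-8-sig")  # BOM (EF BB BF) followed by UTF-8 bytes
    # Bytes that have no Windows-1252 meaning become U+FFFD instead of raising.
    return raw.decode("cp1252", errors="replace")


garbled = as_misread("外|そと|outside")
print(garbled)
```

Under that assumption every kanji or kana is stored as three bytes, so each one turns into three Western characters, some of which are unprintable and get lost when the text is pasted.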

In Word, go to the General tab of Options and click the checkbox that says
"confirm conversion at open." Then, when you double click on a text file, it
will give you a dialog where you can open it in various formats. When you
open your file, try opening it as encoded text. It will then give you a
second dialog where you choose the encoding to open it with. It may
recognize that there are Japanese characters and default to Japanese. If
not, choose it from the menu and hit OK. This may enable you to open the
file and actually view the Japanese.

Jeff

Kevin Wayne Williams

Dec 6, 2003, 9:44:52 PM

Konrad Den Ende wrote:

> I have a text file (sort of) where every row contains three columns
> that are: 1. Kanji 2. Kana-reading 3. English meaning.
> I'd like to be able to read (and write to) the file but it's encoded
> in a format that looks like:
> ï»¿å¤–|ã ã ¨|outside
> å¤–å‡ºã ™ã‚‹|ã Œã „ãƒ»ã —ã‚…ã ¤ã ™ã‚‹|to go out
> å¤–å›½|ã Œã „ãƒ»ã “ã |a foreign country
> å¤–ç§‘|ã ’ãƒ»ã ‹|surgery
>
> Is it doable? I tried to take a look at it with Word but it didn't
> give anything fun (I saw the very same characters).
>

It looks like EUC-JP to me. Try giving it a .euc extension and opening it
with JWPCE (available for free download at numerous places).
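
If a tool insists on EUC-JP and the file turns out to be something else, the file can be converted first. A rough sketch in Python, assuming the source is UTF-8 (possibly with a BOM); that is only a guess based on the `ï»¿` prefix in the sample. The function name `convert_to_euc` and the file names in the usage note are hypothetical.

```python
from pathlib import Path


def convert_to_euc(src: Path, dst: Path, src_encoding: str = "utf-8-sig") -> None:
    """Re-encode a text file as EUC-JP.

    src_encoding is an assumption about the input; "utf-8-sig" reads
    UTF-8 and silently drops a leading byte-order mark if there is one.
    """
    text = src.read_text(encoding=src_encoding)
    # Raises UnicodeEncodeError if a character has no EUC-JP representation.
    dst.write_text(text, encoding="euc_jp")
```

Usage would be something like `convert_to_euc(Path("words.txt"), Path("words.euc"))`, after which the `.euc` file should open in an editor that expects EUC-JP.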

KWW

Manuel Ubeda

Dec 6, 2003, 11:10:12 PM

Kevin Wayne Williams wrote:
> Konrad Den Ende wrote:
>
>> I have a text file (sort of) where every row contains three columns
>> that are: 1. Kanji 2. Kana-reading 3. English meaning.

Hmmm... doesn't edict use a similar format?

A quote from the documentation that comes with it:
>
> FORMAT
>
> EDICT's format is that of the original "EDICT" format used by the early PC
> Japanese word-processor MOKE (Mark's Own Kanji Editor). It uses EUC-JP
> coding for kana and kanji, however this can be converted to JIS
> (ISO-2022-JP) or Shift-JIS by any of the several conversion programs
> around.
> It is a text file with one entry per line. The format of entries is:
>
> KANJI [KANA] /English_1/English_2/.../
>
> or
>
> KANA /English_1/.../
>
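
The two entry shapes quoted above can be parsed with one regular expression. This is only a sketch in Python based on the format description in the quote, not code from the EDICT distribution; `EDICT_LINE` and `parse_edict_line` are names invented here, and the sample entries are built from the words in this thread.

```python
import re

# KANJI [KANA] /English_1/English_2/.../   or   KANA /English_1/.../
EDICT_LINE = re.compile(
    r"^(?P<head>\S+)(?:\s+\[(?P<kana>[^\]]+)\])?\s+/(?P<glosses>.*)/\s*$"
)


def parse_edict_line(line: str):
    """Return (kanji, kana, glosses); kanji is None for kana-only entries."""
    m = EDICT_LINE.match(line)
    if m is None:
        raise ValueError(f"not an EDICT-style entry: {line!r}")
    glosses = [g for g in m.group("glosses").split("/") if g]
    if m.group("kana") is None:
        # Second form: the headword itself is the kana.
        return None, m.group("head"), glosses
    return m.group("head"), m.group("kana"), glosses


print(parse_edict_line("外国 [がいこく] /a foreign country/"))
print(parse_edict_line("そと /outside/exterior/"))
```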

>> I'd like to be able to read (and write to) the file but it's encoded
>> in a format that looks like:
>> ï»¿å¤–|ã ã ¨|outside
>> å¤–å‡ºã ™ã‚‹|ã Œã „ãƒ»ã —ã‚…ã ¤ã ™ã‚‹|to go out
>> å¤–å›½|ã Œã „ãƒ»ã “ã |a foreign country
>> å¤–ç§‘|ã ’ãƒ»ã ‹|surgery
>>
>> Is it doable? I tried to take a look at it with Word but it didn't
>> give anything fun (I saw the very same characters).
>>
> It looks like EUC-JP to me. Try giving it a .euc extension and opening it
> with JWPCE (available for free download at numerous places).
>
> KWW
>

You might also try, as a quick check, opening it in a web browser and
trying different character encodings. For example, Mozilla Firebird (which I
use) has "View -> Character Coding -> More -> East Asian -> Japanese
(...)" in the menu; I don't know where it is in other browsers.
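
The same trial-and-error can be scripted. A minimal sketch in Python, with the caveat that a clean decode is not proof of the right encoding (some byte sequences are valid in more than one), so the order of the candidates matters and the result should still be checked by eye. The candidate list and the name `first_clean_decoding` are assumptions made for this example.

```python
CANDIDATES = ("utf-8-sig", "euc_jp", "shift_jis")


def first_clean_decoding(data: bytes, candidates=CANDIDATES):
    """Return (encoding, text) for the first candidate that decodes the
    bytes without error, or None if every candidate fails."""
    for name in candidates:
        try:
            return name, data.decode(name)
        except UnicodeDecodeError:
            continue
    return None


sample = "がいこく".encode("euc_jp")
print(first_clean_decoding(sample))
```

UTF-8 is tried first because it is strict: bytes written in another Japanese encoding rarely happen to form valid UTF-8.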

--
Single line signatures are neat :-)

Farrell

Dec 6, 2003, 11:28:01 PM

Konrad Den Ende wrote:

> I have a text file (sort of) where every row contains three columns
> that are: 1. Kanji 2. Kana-reading 3. English meaning.
> I'd like to be able to read (and write to) the file but it's encoded
> in a format that looks like:

>
> Is it doable? I tried to take a look at it with Word but it didn't
> give anything fun (I saw the very same characters).
>

外 |そと - soto | outside
外出する |がいしゅつする - gaishutsusuru| to go out
外国 |がいこく - gaikoku |a foreign country
外科 |げか - geka | surgery

First attempt at sending a non-Western encoding; hope it works.
(If not, someone will have to tell me how to get Thunderbird/Mozilla to
send in the right encoding.)
I copied and pasted it into JWPce and it showed up fine for
me (slightly edited to remove crap and to add romaji :)

--
Iain

Peter Remmers

Dec 7, 2003, 6:02:00 AM

On Sun, 07 Dec 2003 04:28:01 +0000, Farrell wrote:

> 外 |そと - soto | outside
> 外出する |がいしゅつする - gaishutsusuru| to go out
> 外国 |がいこく - gaikoku |a foreign country
> 外科 |げか - geka | surgery
>
> First attempt at sending a non-Western encoding; hope it works.
> (If not, someone will have to tell me how to get Thunderbird/Mozilla to
> send in the right encoding.)
> I copied and pasted it into JWPce and it showed up fine for
> me (slightly edited to remove crap and to add romaji :)

The "crap" you removed wasn't crap. These dots were there to mark the
boundaries between the kanji.
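
For anyone processing the file, the dots are easy to keep. A small sketch in Python, assuming the separator is the katakana middle dot (U+30FB); that reading comes from the `ãƒ»` sequences in the garbled sample, which is how the UTF-8 bytes of that character look when read as Windows-1252. `Entry` and `parse_row` are hypothetical names.

```python
from typing import NamedTuple

MIDDLE_DOT = "\u30fb"  # katakana middle dot, assumed to be the separator


class Entry(NamedTuple):
    kanji: str
    reading_parts: list[str]  # one piece of the reading per dot-separated chunk
    meaning: str


def parse_row(line: str) -> Entry:
    """Split one 'kanji|reading|meaning' row, keeping the reading boundaries."""
    # Drop a leading byte-order mark and the trailing newline, if present.
    kanji, reading, meaning = line.lstrip("\ufeff").rstrip("\n").split("|", 2)
    return Entry(kanji, reading.split(MIDDLE_DOT), meaning)


entry = parse_row("外国|がい\u30fbこく|a foreign country")
print(entry.kanji, entry.reading_parts, "".join(entry.reading_parts))
```

Joining `reading_parts` gives the plain reading, so nothing is lost by keeping the boundaries in the data.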

Peter

David Nettles

Dec 7, 2003, 8:21:54 AM

It looks like you got it. I also use Mozilla on Linux and have to
wrestle with multi-lingual configs --- it can be a bit challenging. :)

--
David Nettles
web: http://www.miteyo.org
email: tetsuo...@yahoo.co.jp

Farrell

Dec 7, 2003, 9:23:24 AM

David Nettles wrote:

> Farrell wrote:
>
>>First attempt at sending a non-Western encoding; hope it works.
>>(If not, someone will have to tell me how to get Thunderbird/Mozilla to
>>send in the right encoding.)
>>I copied and pasted it into JWPce and it showed up fine for
>>me (slightly edited to remove crap and to add romaji :)
>>
>
>
> It looks like you got it. I also use Mozilla on Linux and have to
> wrestle with multi-lingual configs --- it can be a bit challenging. :)
>

especially at 4 o'clock in the morning after a hard night's drinking...

--
Iain

Farrell

Dec 7, 2003, 9:24:57 AM

Peter Remmers wrote:

Yeah, I know - but it's not used in writing (that I've seen), and the
separation should be obvious when they're next to each other - hence 'crap'.
Probably should have used a different word, though (but 4am...drink...meh.)

--
Iain

Konrad Den Ende

Dec 7, 2003, 11:05:30 AM

Thanks. It worked both in Word and in JWPce.