If you're still interested in the parsing program I promised you a few
years ago, you could e-mail me your e-mail address, and I would include
the first part of a novel actually written *in* Japanese which might be
in a way more revealing of the Japanese language than books in English
about other aspects of Japanese culture.
Bart
It is in the general interest here for you to reveal the identity
of the novel you have been referring to in connection
with a deal with Mirror, as well as to communicate
with him through this group if it is concerned
with the studying of Japanese, I believe.
I wonder if the email address given in the headers
of Mirror's posts here has not been a real one.
>It is in the general interest here for you to reveal the identity
>of the novel you have been referring to in connection
>with a deal with Mirror, as well as to communicate
>with him through this group if it is concerned
>with the studying of Japanese, I believe.
Isn't the name of the novel rather irrelevant, or are certain Japanese
novels not worth reading for their language content or are you
considering making a counter-suggestion?
>I wonder if the email address given in the headers
>of Mirror's posts here has not been a real one.
Do you mean this one: "From: mirror <mai...@127.0.0.1>"? Or this one:
"Sender: *"? Or this one: "Reply-To: *"?
>And you are quite right when you said:
>a novel actually written *in* Japanese which might be
>in a way more revealing of the Japanese language
>than books in English about other aspects of Japanese culture.
Well, duh! In a way indeed! :-)
--
Don Kirkman
don...@charter.net
Bart has brought out a most splendid idea
of a parsing program for language skill.
The text for the use of the program must be
a good one, the name of which is worth
being known to all of us.
>>I wonder if the email address given in the headers
>>of Mirror's posts here has not been a real one.
>
> Do you mean this one: "From: mirror <mai...@127.0.0.1>"?
Yes, that's it.
Why not make it public, so that the beneficiary of Bart's lecture
is not to be restricted to Mirror but open to all who will benefit
from the lecture? Actually, quite a few people have offered
such a course here, but none of them came through with their offers.
It's up to Bart and Mirror to decide whether to make it
public or not.
> In my early days in this group I posted small files and got ripped
> apart for the faux pas. Perhaps Bart can email the file to you and
> you can post it and get burned alive by the group (the gang?).
Making a document public is not synonymous with posting it in a
newsgroup. We live in the age of the Web, and cheap/zero-cost web
hosting is available to just about anyone.
Keep newsgroups for discussions; public postings of essentially static
documents can go on web sites.
--
\ “I hope that after I die, people will say of me: ‘That guy sure |
`\ owed me a lot of money’.” —Jack Handey |
_o__) |
Ben Finney
Betcha Don knew that, but thought Chance didn't.
Well, I guess it wouldn't hurt to say here what I was going to say to
Paul in an e-mail before I sent him a bunch of files.
I had hoped to find my most recent version (that would be late 1980,
except for some changes I started to make in 1991) of the program in
machine-readable form somewhere. In fact, it *may* still exist on my old
UH Unix desktop, but if that has not been erased in the last nine years,
my knowledge of how to access it certainly has.
I did find a printout of a version from about 1978, and it obviously has
bugs in it, because some of the easy sentences didn't parse properly. I
don't remember SPITBOL (a SNOBOL variant) well enough to find them now,
but I wanted a machine-readable version anyway, so I retyped it and
added a few comments where I thought they might help explain how it
works (i.e., the parts that do). This is an ASCII text file, not
terribly long (I have all the documents on the Amiga side of this
computer, but I have to do Usenet from Linux if Japanese, etc., is
involved, so I can't give the exact length right now). I also typed up
the only lexicon I have for it, 300-some words is all, I think, so it is
an even shorter (but still ASCII!) file.
Then I found a presentation I had made about the program at a lunch
meeting the UH Linguistics department sponsors, and retyped it, pretty
much exactly as it was, although I find some of the prose a bit
embarrassing (I did fix any misspellings I recognized). ASCII didn't
seem good enough for this, so I made it PDF.
Finally, I typed up a modified version of the text that said
presentation was based on. Since my goal was to eventually create a
parser (and then translator perhaps, had I lived long enough) for the
*spoken* rather than the written language, my Japanese input was always
romanized, in a phonemic form with spaces between the "words" (=文節)
and hyphens separating the morphemes the program looks for. That is the
form of the text that I plan to send to Paul. But I added the Japanese
text, to make it perhaps easier to read. This document is also PDF.
I am considering putting all of this on my web site, but since my web
pages are purely an ego trip, which will when finished tell what a great
scholar, what a great hiker, what a great runner (the only part nearly
finished), what a great musician, etc., I am, I don't think I will ever
want to give anyone the URL.
I imagine that Paul must have originally learned of the parsing program
here; I think I gave a nutshell version of how it (is supposed to)
work(s). I'll do it again.
The bunsetsu is taken to be the minimal syntactic unit (word), but how
it ends (what suffixes, if any, are attached) tells what the word's
Linking Function is, i.e., what it can modify. The earliest versions had
two Linking Functions: Adnominal (must link with a nominal on the right,
unless it is the last word of the sentence) and Adverbial (must link
with a verbal on the right, unless...). I have since added
Adpropositional--links with a proposition--for words like はい = 'yes/no.'
Words also have a Root Function, of which there are three, namely
Nominal (can be linked with by an Adnominal on its left), Verbal (can be
linked with by an Adverbial on its left), and Nil (cannot be
grammatically modified--nothing links with it; one example: この).
The parser looks at the first two words in a sentence and decides
whether they are AdWhatever and Whatever. If not, the first word goes on
a LIFO stack, and the same test is applied to the second and third
words. If the words did have the appropriate Linking and Root Functions,
they are tested for semantic congruity. If that fails, the LeftHandWord
goes on the stack, and the next word to the right in the sentence is
brought into play.
If there is both grammatical and semantic congruence, the words are
linked (連文節) with the functions of the original RHW. If there is
anything waiting on the stack, it is checked as before against this new
RightHandWord.
When the last word is reached, and everything has been successfully
matched (if not, the message "Cannot understand the sentence" is
output), a bracketed tree is output, such as
3(kore-wa 2(1(hito-ni wakar-u-yoo-na)1 nihongo-no)2 bun-da)3
Whether that is correct or not (is it really わかるような日本語, as I'm
sure the last version of my program would parse it, or is it わかるよう
な文だ, which seems more likely to me) is, I concede, an important
question, but I accepted *reasonable* parses, as long as really bad ones
were rejected.
A sample of a sentence JPARS couldn't (and shouldn't) parse follows
similar, but OK ones, below:
(Perhaps looking at a map)
バスはこの道をきた。 → 3(basu-wa 2(1(kono miti-o)1 ki-ta)2)3
(Of course the input was romanized the same way as the output, except
that it was in all caps, since the way to input computer programs and
data when I started this program was Hollerith punch cards.)
かれはこのシャツをきた。→ 3(kare-wa 2(1(kono syatu-o)1 ki-ta)2)3
バスはこのシャツをきた。→ I don't understand the sentence.
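The stack-and-link procedure described above can be sketched roughly as
follows. This is a Python 3.8+ illustration, not the original SPITBOL
program. The six-word lexicon, the semantic classes ("vehicle", "human",
"path", "clothing"), the two senses listed for ki-ta, and the names Unit,
try_link and parse are all assumptions made up for this example; the real
lexicon and semantic tests are not shown in this post. Only the Adnominal
and Adverbial Linking Functions are covered, and an Adnominal link is
assumed to be always semantically congruent.

```python
from dataclasses import dataclass, replace
from typing import Optional

ADNOMINAL, ADVERBIAL = "adnominal", "adverbial"   # Linking Functions
NOMINAL, VERBAL, NIL = "nominal", "verbal", "nil"  # Root Functions


@dataclass
class Unit:
    text: str              # romanized word, or bracketed linked unit
    link: Optional[str]    # Linking Function; None for a sentence-final word
    root: str              # Root Function
    level: int = 0         # bracket depth (0 for a single word)
    particle: str = ""     # hypothetical: case particle of a nominal
    sem: str = ""          # hypothetical: semantic class of a nominal
    senses: tuple = ()     # hypothetical: verbal senses, particle -> classes


# Hypothetical senses for romanized ki-ta, which is ambiguous in this form.
KI_TA = (
    {"wa": {"human", "vehicle"}, "o": {"path"}},   # 'came'
    {"wa": {"human"}, "o": {"clothing"}},          # 'put on'
)

# Hypothetical lexicon, just enough for the three example sentences.
LEXICON = {
    "basu-wa": Unit("basu-wa", ADVERBIAL, NOMINAL, particle="wa", sem="vehicle"),
    "kare-wa": Unit("kare-wa", ADVERBIAL, NOMINAL, particle="wa", sem="human"),
    "miti-o": Unit("miti-o", ADVERBIAL, NOMINAL, particle="o", sem="path"),
    "syatu-o": Unit("syatu-o", ADVERBIAL, NOMINAL, particle="o", sem="clothing"),
    "kono": Unit("kono", ADNOMINAL, NIL),
    "ki-ta": Unit("ki-ta", None, VERBAL, senses=KI_TA),
}


def try_link(left: Unit, right: Unit) -> Optional[Unit]:
    """Return the linked unit, or None if grammar or semantics fail."""
    wanted = {ADNOMINAL: NOMINAL, ADVERBIAL: VERBAL}.get(left.link)
    if wanted is None or right.root != wanted:
        return None                      # not AdWhatever + Whatever
    senses = right.senses
    if left.link == ADVERBIAL:
        # Keep only the verbal senses that accept this nominal.
        senses = tuple(s for s in right.senses
                       if left.sem in s.get(left.particle, ()))
        if not senses:
            return None                  # semantic incongruity
    level = max(left.level, right.level) + 1
    text = f"{level}({left.text} {right.text}){level}"
    # The linked unit keeps the functions of the right-hand word.
    return replace(right, text=text, level=level, senses=senses)


def parse(sentence: str) -> str:
    words = [LEXICON[w] for w in sentence.split()]
    stack, left = [], words[0]
    for right in words[1:]:
        linked = try_link(left, right)
        if linked is None:
            stack.append(left)           # left-hand word waits on the stack
            left = right
            continue
        left = linked
        # Check anything waiting on the stack against the new unit.
        while stack and (linked := try_link(stack[-1], left)) is not None:
            stack.pop()
            left = linked
    return left.text if not stack else "Cannot understand the sentence"


if __name__ == "__main__":
    for s in ("basu-wa kono miti-o ki-ta",
              "kare-wa kono syatu-o ki-ta",
              "basu-wa kono syatu-o ki-ta"):
        print(s, "->", parse(s))
```

In this sketch the third sentence fails because linking kono syatu-o with
ki-ta leaves only the assumed 'put on' sense, which does not accept a
vehicle as its wa-marked word, so basu-wa is still on the stack when the
last word has been processed.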
And Chance, if you're still here, I'll name a few books and see if you
can guess which one I used for my initial data. Bear in mind that my
goal was to parse mid-20th century *colloquial* Japanese.
谷崎潤一郎 『古都』
山本雄三 『波』
森鴎外 『ウィ(I don't know how to type wa-gyou-no i!)タ・セクスアリス』
太宰治 『晩年』
Bart
I am learning
as everybody is learning.
Is 'whoosh' here the same as 'shoo'?
>> I meant no disrespect.
None taken. I have a number of "127.0.0.1" links in my setup.
>Is 'whoosh' here the same as 'shoo'?
It depends. What do you mean by "shoo"? Is it English, Japanese, or
Korean? If "shoo" = "scat" or "gidoudahere," no, it's not the same.
--
Don Kirkman
don...@charter.net
> It depends. What do you mean by "shoo"? Is it English, Japanese, or
> Korean? If "shoo" = "scat" or "gidoudahere," no, it's not the same.
But if you get some scat on your shoo, don't you dare come walking into
the house.
Then what would it be?
TIA
Seems like a double whoosh going on here.
Any one there enlightening the benighted?