British English vs American English spellings


Mike Dowd

Dec 5, 2021, 6:03:10 PM
to link-grammar
I see some effort to include both spellings in the English dictionary. Is the intention to fully support both spellings?

While looking around I randomly noticed that there was "watercolour" but not "watercolor", "tricolour" but not "tricolor". Are these omissions? Do you want bug reports about this kind of thing? Do you want submissions of more words for the project?

Linas Vepstas

Dec 7, 2021, 4:50:39 PM
to link-grammar, Mike Dowd
Hi Mike!

On Sun, Dec 5, 2021 at 5:03 PM Mike Dowd <mike...@gmail.com> wrote:
I see some effort to include both spellings in the English dictionary. Is the intention to fully support both spellings?

Yes, more-or-less. (*, see below)

While looking around I randomly noticed that there was "watercolour" but not "watercolor", "tricolour" but not "tricolor". Are these omissions?

Yes, accidental omissions.

Do you want bug reports about this kind of thing? Do you want submissions of more words for the project?

Yes. Even better than bug reports would be git pull requests adding the needed words. Maintaining everything myself is a bit of a chore.

(*) I said "more or less" for several reasons. One is that there are some experimental efforts to support "dialects": not just alternative spellings, but variations in grammar. This ranges over everything from Shakespearean English, to New York Irish/Italian immigrant speech, to telegraphic newspaper headlines and Twitter posts. Right now, this support is patchy and haphazard. More accurately, it's a "proof of concept": the framework is there, but it's not filled in.

Another issue is that maintaining such a large and complex grammar, and all of its dialects, is a large task. Most of my effort these days goes into creating a system that learns grammar by reading. It is nowhere near ready, though; it is still in the early experimental stages.

-- Linas

--
Patrick: Are they laughing at us?
Sponge Bob: No, Patrick, they are laughing next to us.
 

Mike Dowd

Dec 7, 2021, 8:18:34 PM
to link-grammar
Do you want bug reports about this kind of thing? Do you want submissions of more words for the project?

Yes. Even better than bug reports would be git pull requests adding the needed words. Maintaining everything myself is a bit of a chore. 

I've never done git pull requests before, but I can probably figure it out. I might have more words for you as well.
  

Linas Vepstas

Dec 7, 2021, 8:24:31 PM
to link-grammar
Heh. OK. I'm just trying to nurture future talent. If it's literally just a half-dozen words, it's not worth the effort; if it's hundreds, then that's different. (Git is not ... hard, but it's ... well, there are both technical hurdles, and social/cultural ones as well. It's how developers communicate, but if that's not your bag, then it's not worth the effort.)

--linas

  

--
You received this message because you are subscribed to the Google Groups "link-grammar" group.
To unsubscribe from this group and stop receiving emails from it, send an email to link-grammar...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/link-grammar/abe7bb84-4f59-4cbd-82f9-9edc6992c476n%40googlegroups.com.

Linas Vepstas

Dec 7, 2021, 8:33:20 PM
to link-grammar, Mike Dowd
P.S. I just now added "tricolor" and "watercolor" to the dicts; they will appear in the next version, 5.10.3. That will be released when ... it becomes urgent to do so.

--linas

Mike Dowd

Dec 7, 2021, 8:38:02 PM
to link-grammar
I gotcha. It will take some work to see how many additional words I may have for you. I'll get back to you.

Mike Dowd

Dec 10, 2021, 4:49:45 PM
to link-grammar
If I have massaged all the data correctly, I have 462 British spellings that aren't in en/words/. However, I don't have the American equivalents. If I can find an American equivalent in the dictionary, then the British word goes next to it, with the same suffix. If I can't, then I can either try to determine which part-of-speech suffix to add to both the American and British words and which file they go in, or skip those.

I can't think of a reasonable way of automating any of this. (Unreasonable means it would take me longer to write the code to do it than to do the work by hand.) How valuable is this data to you? Is it worth spending much time on?
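
One way the matching step described above might be partly automated is sketched below in Python. Everything in it is an assumption made for illustration: the rule list is tiny and incomplete, the function names are hypothetical, and the `dictionary` mapping stands in for whatever is actually loaded from en/words/. It proposes American spellings for a British word, keeps only the proposals already present in the dictionary, and copies the subscript across.

```python
import re

# Hypothetical, deliberately incomplete spelling rules:
# (pattern matching a British form, replacement giving an American candidate).
RULES = [
    (re.compile(r"our((?:s|ed|ing|ful|less)?)$"), r"or\1"),  # colour -> color
    (re.compile(r"is(e|es|ed|ing|ation)$"), r"iz\1"),        # organise -> organize
    (re.compile(r"tre(s?)$"), r"ter\1"),                     # centre -> center
    (re.compile(r"ae"), "e"),                                # anaemia -> anemia
]


def american_candidates(word):
    """Return possible American spellings for a British word (may be empty)."""
    candidates = []
    for pattern, replacement in RULES:
        candidate, count = pattern.subn(replacement, word)
        if count and candidate != word:
            candidates.append(candidate)
    return candidates


def match_against_dictionary(british_words, dictionary):
    """Split British words into those with a known American entry and the rest.

    `dictionary` maps a bare word to its subscripted entries,
    e.g. {"color": ["color.n", "color.v"]}.
    """
    matched, unmatched = {}, []
    for word in british_words:
        hits = [entry
                for candidate in american_candidates(word)
                for entry in dictionary.get(candidate, [])]
        if hits:
            # Reuse the American entry's subscript for the British spelling.
            matched[word] = [word + entry[entry.index("."):] if "." in entry else word
                             for entry in hits]
        else:
            unmatched.append(word)
    return matched, unmatched
```

The output would still need review by hand: a rule such as "-our" to "-or" turns "four" into "for", which is in any dictionary but is not an equivalent. Words left in `unmatched` are the ones that would need a part of speech assigned manually.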

Linas Vepstas

Dec 10, 2021, 11:53:56 PM
to link-grammar
If you can sort them into nouns, verbs, adjectives, adverbs, and other, that's a big help. If a word is more than just one of these, it should be repeated in each appropriate list. Sorting nouns into mass nouns and count nouns is even better. If you don't do it, I can, but it does get tedious. I do this by hand, as there aren't really any practical tools.  Separating out anything with unusual plural forms or oddball conjugations also helps.

More than a decade ago, someone noticed that the automatic guesser for unknown words works well enough that explicitly adding the missing words rarely improves overall quality. So I can add the words, but it's unlikely to make a difference. It can even degrade performance: if a word can be both a noun and a verb, and I add one and forget to add the other, the one that gets added prevents guessing of the form that is not added.

--linas


Linas Vepstas

Dec 11, 2021, 12:03:22 AM
to link-grammar
... and if you wanted to get fancy, sorting verbs into transitive and intransitive classes would also be a big help, with a third and fourth list for those that take a particle, and a distinct list for any paraphrasing verbs. I suspect you have mostly nouns and adjectives.

Also, a caution: it's not just a question of whether the words can be found in the dict, but whether they are correctly classified. For example, there might be a word that Brits use as a noun and Americans use as a verb (or adjective, etc.); if it is not listed under each of the categories, it will be misinterpreted, invisible to the parser.

--linas

Mike Dowd

Dec 11, 2021, 1:18:32 AM
to link-grammar
I haven't looked at the 462 too closely, but there are some medical ones in there too.

I can try some sorting. Sounds like I need to look up every single one in the dictionary to catch the distinctions you mentioned.

But you didn't sound terribly enthusiastic. 
> "So I can add the words, but it's unlikely to make a difference."
If you don't think these would be helpful, then let's not bother?

Linas Vepstas

Dec 11, 2021, 7:51:07 PM
to link-grammar
On Sat, Dec 11, 2021 at 12:18 AM Mike Dowd <mike...@gmail.com> wrote:
I haven't looked at the 462 too closely, but there are some medical ones in there too.

I can try some sorting. Sounds like I need to look up every single one in the dictionary to catch the distinctions you mentioned.

Well, if you don't, then I do. I'm negotiating how much work I have to do.

But you didn't sound terribly enthusiastic. 

Well, frankly, it is a chore. I would rather spend my time researching how to learn grammar, automatically. In the long run, it's a better deal.
 
> "So I can add the words, but it's unlikely to make a difference."
If you don't think these would be helpful, then let's not bother?

Well, I presume the reason that you started this conversation is that you noticed ... something. Send me the list. I'll add it. Whatever quality-control assistance you can provide, it would be appreciated.

--linas

Mike Dowd

Dec 11, 2021, 10:50:14 PM
to link-grammar
The reason I offered some British words is that I thought this was a very cool project and I thought maybe I had something that I could contribute. I meant it as a gift. If the reality is that I would spend dozens of hours and you would spend a significant chunk of time that is better spent elsewhere, and the result will make no difference, then I will have made things worse for both of us not better. If you tell me that the list would be a nice thing to have, then I will get started on sorting.

Mike Dowd

Dec 12, 2021, 10:50:28 AM
to link-grammar
Actually, you've been perfectly clear that in my ignorance I made an offer that is not helpful and only a burden. Let's just drop it. I wish the project well.

Linas Vepstas

Dec 13, 2021, 2:47:39 PM
to link-grammar
On Sat, Dec 11, 2021 at 9:50 PM Mike Dowd <mike...@gmail.com> wrote:
The reason I offered some British words is that I thought this was a very cool project and I thought maybe I had something that I could contribute. I meant it as a gift. If the reality is that I would spend dozens of hours and you would spend a significant chunk of time that is better spent elsewhere, and the result will make no difference, then I will have made things worse for both of us not better. If you tell me that the list would be a nice thing to have, then I will get started on sorting.

Thank you!  Yes! It would be a nice thing to have!

Yes, it's a cool project!  I'm very focused on how to take it to the next level, and the difficulty of doing that results in my saying things that sound ill-tempered.  I won't turn down gifts; they are appreciated, and I enjoy the contact!

To recap: if the words are added correctly, they do make things better.  The "time spent better elsewhere" is a multi-year (approaching a decade?) project, so a few hours hardly makes a dent. (the project is to handle all languages, not just English. Early results are promising, final results are still a very long ways off.)

--linas

Mike Dowd

Dec 13, 2021, 10:36:06 PM
to link-grammar
OK. I have some questions. Do I continue to add to this conversation, or can I contact you directly? How do I do that?

Linas Vepstas

Dec 14, 2021, 1:53:50 AM
to Mike Dowd, link-grammar
Hi Mike,

Replying publicly, this is still a generic discussion.

On Mon, Dec 13, 2021 at 11:46 PM Mike Dowd <mike...@gmail.com> wrote:
Do you already have a description of what category of words goes in each of the files in en/words/? I have looked at them but it’s not always obvious to me what precisely the commonality is.

The only description is indirect: one looks at the file 4.0.dict.m4, searches for the name of the file, and then reads the nearby comments.  For example, to see what "words.v.6.1" is about, look here:


Note that 4.0.dict.m4 contains many words that do not fit neatly into any of the word-files!
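
The lookup described above (search 4.0.dict.m4 for a word-file name, then read the nearby comments) can be sketched in Python. This assumes that comment lines in the dictionary begin with `%` and that a word file is referenced by name on a non-comment line; the function name and the `context` parameter are hypothetical.

```python
import re


def comments_near(dict_text, filename, context=5):
    """Return (line_number, nearby_comment_lines) for each reference to `filename`.

    Assumes comment lines start with '%'. The lookahead keeps a search for
    "words.v.6.1" from also matching a longer name such as "words.v.6.12".
    """
    pattern = re.compile(re.escape(filename) + r"(?!\w)")
    lines = dict_text.splitlines()
    results = []
    for i, line in enumerate(lines):
        if line.lstrip().startswith("%") or not pattern.search(line):
            continue
        # Look at up to `context` lines on either side of the reference.
        nearby = lines[max(0, i - context):i] + lines[i + 1:i + 1 + context]
        comments = [text.strip() for text in nearby if text.lstrip().startswith("%")]
        results.append((i + 1, comments))
    return results
```

To use it, read the whole of 4.0.dict.m4 into a string and call `comments_near(text, "words.v.6.1")`; the comments it returns are only as informative as whatever the dictionary author wrote next to that reference.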
 
Can you recommend a dictionary web site that’s better in talking about grammatical function than other ones? I have noticed they vary in how exactly they describe a word.

No, because the 4.0.dict.m4 makes far more detailed and precise classifications than any other existing dictionary, ever. The "disjuncts" that the LG documentation talks about should be considered to be extremely fine-grained "parts of speech" or "word classes" or "word categories". It's not enough to just say something is a transitive verb; you also have to say whether it can take a particle, or whether it can be ditransitive, take "that", take bare infinitives, etc. For example:

"she wrote me a letter"
"she filled out a form"
* "she filled out me a form"
* "she filled me a form"
? "she filled me a glass of water"
"she wrote a letter to me"
"she wrote to me a letter"
? "she filled out for me a form"
* "she wrote out me a letter"
"she filled out a form for me"
"she wrote out a letter for me"
"she wrote that she loved me"
* "she filled out that she loved me"
? "she filled out that she had an auto accident last year"

Verbs have by far the most complex classifications in link-grammar. Everything else is pretty easy.  To handle verbs, I have to think of a bunch of sentences with that verb, and then search for other verbs that behave similarly.

-- Linas

Linas Vepstas

Dec 14, 2021, 1:56:12 AM
to Mike Dowd, link-grammar

Mike Dowd

Dec 14, 2021, 8:50:54 PM
to link-grammar
For the moment I am focusing on the British words that have an American equivalent in the dictionary, until I have more experience. But I'm taking notes on the words that don't have the American counterpart in the dictionary. As well as other discoveries (e.g. a noun in the dictionary is categorized uncountable, but there is also a countable definition for the word as well).

Mike Dowd

Dec 14, 2021, 9:06:26 PM
to link-grammar
Checking: if a noun has both countable and uncountable definitions, it should appear in the dictionary twice? Once with .n and once with .n-u? In the appropriate files.

Mike Dowd

Dec 14, 2021, 9:48:30 PM
to link-grammar
Now I have seen a .s suffix which appears to be for "singular" when the same word is also in the dictionary with a .n-u suffix, for the uncountable version?

Linas Vepstas

Dec 16, 2021, 12:17:29 PM
to link-grammar
Hi Mike,

On Tue, Dec 14, 2021 at 7:50 PM Mike Dowd <mike...@gmail.com> wrote:
>
> For the moment I am focusing on the British words that have an American equivalent in the dictionary, until I have more experience. But I'm taking notes on the words that don't have the American counterpart in the dictionary. As well as other discoveries (e.g. a noun in the dictionary is categorized uncountable, but there is also a countable definition for the word as well).

Yes, it's an adventure.

The distinctions between countable and uncountable nouns are as you
can guess: For example:

countable noun:
"the tree"
"a tree"

mass noun:
"the sand"
*"a sand"

There are some nouns which are both (I can't think of any off the top
of my head). A conventional dictionary will assign these different
word-senses, and explain why they're different.

The link-grammar subscripts (the .n .n-u .s) are enforced debugging
aids. If you list the same word twice in the LG dictionary, it will
complain. This is useful for avoiding mistakes, as otherwise it can be
very hard to understand why some parse is coming out the way it is.
Yet, often, one *does* want to list the same word in multiple places,
and the subscripts provide the mechanism to do so.

The use of subscripts in the LG dictionary is haphazard: they do not
play any grammatical role whatsoever, they are only debugging aids.
Thus, there aren't really any rules for how they should be used, other
than they shouldn't be misleading (don't use the .v subscript for
nouns.)
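
The duplicate check and the role of subscripts can be illustrated with a small Python sketch. This is not how link-grammar implements it; the function names and the sample entries are made up for the example. Each entry is treated as a plain string, so the same word may appear several times only if each occurrence carries a different subscript.

```python
from collections import Counter, defaultdict


def find_duplicates(entries):
    """Return entries (word plus subscript) that are listed more than once."""
    counts = Counter(entries)
    return sorted(entry for entry, n in counts.items() if n > 1)


def subscripts_by_word(entries):
    """Group subscripts under their bare word, e.g. "coffee" -> ["n", "n-u"]."""
    table = defaultdict(set)
    for entry in entries:
        # Split at the first dot: "coffee.n-u" -> ("coffee", "n-u").
        word, _, subscript = entry.partition(".")
        table[word].add(subscript)
    return {word: sorted(subs) for word, subs in table.items()}


if __name__ == "__main__":
    # Illustrative entries only; not taken from the real dictionary files.
    entries = ["sand.n-u", "tree.n", "sand.n-u", "coffee.n", "coffee.n-u"]
    print(find_duplicates(entries))
    print(subscripts_by_word(entries))
```

Here "sand.n-u" would be reported because it occurs twice verbatim, while "coffee.n" and "coffee.n-u" coexist because their subscripts differ.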

Note: it is easy to get the impression that the subscripts identify
"parts of speech". This is incorrect. (They are debugging aids). The
actual part-of-speech is done with connectors/links:

sand: Dmu- & ...;
tree: Ds**c- & ...;
trees: Dp- & ...;

The part of speech is encoded in those connectors: verbs never have
"D". Mass nouns get an "m", singular nouns get "s", plural nouns get
"p", and these rules are strict and uniformly applied: they determine
the grammar; the parser ensures that the link types are satisfied.
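
As a rough illustration of reading the connectors rather than the subscripts, here is a Python sketch that looks only at the D connector in the three expressions above. The regular expression is a simplification of the real connector syntax, and the function name and labels are hypothetical; it only checks the first subscript letter of a leftward D connector.

```python
import re

# Simplified connector shape: uppercase name, optional lowercase/'*' subscript,
# then a direction sign. Real dictionary expressions are richer than this.
CONNECTOR = re.compile(r"\b([A-Z]+)([a-z*]*)([+-])")

# First subscript letter on a D- connector, as described in the message above.
NOUN_KIND = {"m": "mass noun", "s": "singular noun", "p": "plural noun"}


def noun_kind(expression):
    """Guess the noun kind from the D- connector in an expression, else None."""
    for name, subscript, direction in CONNECTOR.findall(expression):
        if name == "D" and direction == "-" and subscript[:1] in NOUN_KIND:
            return NOUN_KIND[subscript[:1]]
    return None


if __name__ == "__main__":
    for word, expr in [("sand", "Dmu- & ..."),
                       ("tree", "Ds**c- & ..."),
                       ("trees", "Dp- & ...")]:
        print(word, "->", noun_kind(expr))
```

An expression with no D- connector at all returns `None`, which matches the point that the classification lives in the connectors: without the right connector, no subscript will make the parser treat the word as that kind of noun.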

I mentioned "conventional dictionaries" and "word senses". The
connectors/links on words correlate strongly with word-senses. So
strongly, that, in fact, you can fish out word senses simply by
looking at the links on a word, and get pretty decent accuracy. I
tried this with a combo of LG+WordNet a decade ago, and hit 70%
accuracy right out the gate. Anyway, the lesson here is that if you
want to know about the "part of speech" of a word, do NOT look at the
subscripts; look at the linkages instead.

- Linas

Mike Dowd

Dec 17, 2021, 12:03:55 PM
to link-grammar
Thank you. I really appreciate the master class in dictionaries.

I believe "coffee" is a good example of a noun that is both a mass noun, "There is a lot of coffee in that pot.", and countable, "The waitress brought us three coffees."
