Merging Homographs


Jonathan

Oct 6, 2010, 1:32:10 AM
to FLEx list
I have a lot of homographs. (Over 2700 entries are listed as
homographs.) Most of these are not different words; they are either
different senses or the same sense with additional information added,
e.g. a semantic domain. How can I merge all of the homographs? Some of
you will cringe, but I would rather merge all of them and then split out
the actual homographs that are separate words later. I have also read
something recently that led me to think that new dictionary users are
most confused by homographs.

Ronald Moe

Oct 6, 2010, 4:27:19 PM
to flex...@googlegroups.com
If you haven't done a lot with your FLEx project yet (like interlinearizing
texts), you can export your dictionary, merge all homographs, and then
reimport your dictionary. I have a CC table that merges homographs in an SFM
(Toolbox) database. There is no easy way to do this in FLEx. It is a feature
that we have requested, but the FLEx team has not had time yet to implement
it. Currently all you can do is filter for homographs, then use the Merge
Entry feature to merge them one by one. If you have 2,700 homographs, this
is going to take you a long time.

Some English dictionaries designed for language learners combine homographs
into single entries. New dictionary users do not understand why some entries
are split. They may read the first entry and not even see the second. It
takes a lot of work for a lexicographer to determine which entries are
really homographs. The only proof that two senses belong to two different
homographs is to do a lot of etymological research.

One of the reasons lexicographers have split up homographs is so that they
can include etymological information for each. This information is useful
for historical linguists and other linguists who are interested in how
semantics change over time (e.g. grammaticalization). It can also be of
interest to the average user. But it can also be confusing or simply
irrelevant to many users who are consulting the dictionary for other
purposes.

The other reason for splitting homographs is so that you can organize your
senses to show semantic similarity. Most dictionaries order their senses
either on the basis of frequency (putting the most frequent sense first) or
on the basis of priority (putting the oldest sense first). Some dictionaries
try to make a judgment as to which sense is most basic. In other words, what
sense would a native speaker think of first? Or, what sense can be used to
semantically derive the other senses? Ordering senses in any of these ways
is complicated if unrelated senses are combined in the same entry.

Some dictionaries also split entries on the basis of grammatical category.
But that is another issue.

Ron Moe

--
You received this message because you are subscribed to the discussion group
"FLEx list". This group is hosted by Google Groups and is open for anyone to
browse.
To post to this group, send email to flex...@googlegroups.com
To unsubscribe from this group, send email to
flex-list-...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/flex-list

John Hatton

Oct 6, 2010, 6:15:00 PM
to flex...@googlegroups.com, Jonathan
Jonathan,
I would be willing to write you a little tool to do this merge, based on the LIFT file. That's likely to be safer and easier than going to SFM and using CC. Let me know if you want this.

jh
--

John Hatton
SIL Papua New Guinea, Palaso, & SIL International Software Development

Mark Penny

Oct 7, 2010, 2:00:21 AM
to flex...@googlegroups.com
Yes, I'd be interested in such a tool as well! :)
Mark



John Hatton

Oct 13, 2010, 3:35:20 AM
to flex...@googlegroups.com
>Yes, I'd be interested in such a tool as well! :)

OK, I found some time to work on the merging tool this week and should have
something to try out next week. Note that as you think about all the possible
2-way merge scenarios (are these two entries really homographs in the various
writing systems, and are they otherwise compatible?) and how to merge (what
if one entry says it has these features, the other has other features, and
they conflict?), you see it gets hard really fast. My guess is that in most
situations requiring this tool, you have lots of very simple entries
gathered from multiple sources, and so a simple algorithm will be
sufficient for the need. Jonathan or Mark, if you have thoughts on that, let
me know. I only expect to be able to deliver a simpler system, since this
isn't really a scheduled priority. If a system with fairly simple merging
logic won't be of any use to you, please tell me that now.
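
To make the "really homographs in the various writing systems" question concrete, here is a minimal Python sketch of one possible rule. It is not the code of the actual tool. It assumes a LIFT file in which each entry has a lexical-unit element containing one form element per writing system, and it takes the strictest reading: two entries are merge candidates only if their forms match in every writing system. The sample data, the "qaa" language codes, and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample; element names assume the usual LIFT layout
# (entry > lexical-unit > form[lang] > text). Check against a real export.
SAMPLE_LIFT = """<lift version="0.13">
  <entry id="bank_1"><lexical-unit><form lang="en"><text>bank</text></form></lexical-unit></entry>
  <entry id="bank_2"><lexical-unit><form lang="en"><text>bank</text></form></lexical-unit></entry>
  <entry id="tala_1"><lexical-unit>
    <form lang="qaa"><text>tala</text></form>
    <form lang="qaa-fonipa"><text>tala</text></form>
  </lexical-unit></entry>
  <entry id="tala_2"><lexical-unit>
    <form lang="qaa"><text>tala</text></form>
    <form lang="qaa-fonipa"><text>ta:la</text></form>
  </lexical-unit></entry>
</lift>"""


def form_key(entry):
    """Return the set of (writing system, text) pairs of an entry's lexical unit."""
    forms = entry.findall("lexical-unit/form")
    return frozenset(
        (f.get("lang"), (f.findtext("text") or "").strip()) for f in forms
    )


def homograph_groups(root):
    """Group entry ids whose forms are identical in every writing system."""
    groups = defaultdict(list)
    for entry in root.iter("entry"):
        key = form_key(entry)
        if key:  # skip entries with no lexical-unit forms at all
            groups[key].append(entry.get("id"))
    # Only groups with more than one entry are merge candidates.
    return [ids for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    print(homograph_groups(ET.fromstring(SAMPLE_LIFT)))
```

Under this rule the two "bank" entries are grouped, while the two "tala" entries are not, because they differ in one writing system. A looser rule (matching on a single writing system) would find more candidates but also more false positives.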

John Hatton
SIL PNG, Palaso, & SIL International Software Development

Mark Penny

Oct 13, 2010, 6:50:13 AM
to flex...@googlegroups.com
John,

Yes, your assumptions about my situation are correct. Merging simple entries
from multiple sources is what I had in mind. So I'm not expecting something
fancy. I just miss being able to even find similar (or identical) records
[like I used to be able to do in Toolbox] - let alone the tools to merge
them efficiently.

Mark


Jonathan

Oct 13, 2010, 10:54:14 AM
to FLEx list
I concur on this also. Simple entries mostly.

Ronald Moe

Oct 13, 2010, 1:45:17 PM
to flex...@googlegroups.com
I would also be happy with a simple tool. But I would also like a
sophisticated tool that enables me to merge well-developed dictionaries with
lots of duplicate entries. Such a tool would have to be
interactive--enabling me to combine entries, combine senses within those
entries, order senses, etc. A minimal tool merely concatenates the entries,
and does so without user interaction, for instance by creating two senses if
the two homographs have two different glosses:

bank n. money institution
bank n. side of river
bank v. (of an airplane or bird) to turn

bank 1) n. money institution 2) n. side of river 3) v. (of an airplane or
bird) to turn

Different grammatical categories should be merged (as above), but different
morphtypes should not:

-ed v.sfx past tense [morphtype: affix]
Ed prop.n a man's given name [morphtype: root]

John, I'm assuming that morphtype would be one of your "conflicting
features". I can see how this would get complicated fast.
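
The minimal, non-interactive behavior described above could be sketched roughly as follows in Python. This is only an illustration of the rule (concatenate senses, keep different grammatical categories together, keep different morphtypes apart), not how any existing tool works. The `Entry` and `Sense` classes, the `normalize` helper, and the "root" morphtype given to the "bank" entries are all assumptions made for the example. In particular, comparing forms after stripping affix hyphens and case is assumed only so that "-ed" and "Ed" collide.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    pos: str    # grammatical category, e.g. "n." or "v."
    gloss: str


@dataclass
class Entry:
    form: str
    morph_type: str
    senses: list = field(default_factory=list)


def normalize(form):
    """Assumed comparison key: ignore affix hyphens and letter case."""
    return form.strip("-").casefold()


def merge_homographs(entries):
    """Concatenate the senses of entries sharing a form and a morphtype."""
    merged = {}
    for e in entries:
        # Morphtype is part of the key, so an affix never merges with a root.
        key = (normalize(e.form), e.morph_type)
        target = merged.setdefault(key, Entry(e.form, e.morph_type))
        for s in e.senses:
            if s not in target.senses:  # drop exact duplicate senses
                target.senses.append(s)
    return list(merged.values())


if __name__ == "__main__":
    entries = [
        Entry("bank", "root", [Sense("n.", "money institution")]),
        Entry("bank", "root", [Sense("n.", "side of river")]),
        Entry("bank", "root", [Sense("v.", "(of an airplane or bird) to turn")]),
        Entry("-ed", "affix", [Sense("v.sfx", "past tense")]),
        Entry("Ed", "root", [Sense("prop.n", "a man's given name")]),
    ]
    for e in merge_homographs(entries):
        numbered = " ".join(
            f"{i}) {s.pos} {s.gloss}" for i, s in enumerate(e.senses, 1)
        )
        print(e.form, numbered)
```

An interactive tool would replace the automatic `append` with a step where the user decides whether two senses should be combined, reordered, or kept apart.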

Ron Moe


David Tuggy

Oct 13, 2010, 7:14:03 PM
to flex...@googlegroups.com
I don’t think I’ve mentioned this before … ?

I (and perhaps others) would greatly benefit from a tool that would let me import \sfm marked examples into FLEx. The files I would be importing from are not yet in FLEx’s database (nor likely to be anytime soon) and even if they were there and analyzed, they would not link usefully to the proper entries in the lexicon, because, at least as currently implemented in the agglutinative language I’m working with, the analyses reference morphemes rather than the multi-morpheme stems where the fun is.

For instance, I have in my file the following sentence:

\r Ilus.6.8.3 \i Motta keh miek tonalli okikalakilih yon tekitl.
\r Ilus.6.8.3 \t Parece que le metió muchos días a ese trabajo.

It would be very nice to have this sentence as an example in the entry for [nik]kalakilia `[I] put it/s.t. into [it]’. (A parsed form ō-ø-k-kal-aki-lia-TRUNC `PST-he-it-house-fit-APPLIC-PRET’ would not automatically link up to that entry, of course.) It illustrates a usage of that complex stem which is perhaps a bit surprising (“He seems to have sunk a good many days’ work into it/invested a good many days’ work in it.”)

What I would like to be able to do would be to select the sentence in Word or OO or whatever, and copy it to the “scrap” (or whatever that is called these days), then move to FLEx and paste it in such a manner that:
• The section marked \i would automatically go in the vernacular field of an example, formatted in that field’s default language (in my case, Nawatl)
• The part marked \r would go automatically into the reference field of the same example, formatted in the appropriate analysis language (Spanish for me).
• The section marked \t would go into the first (in my case Spanish) translation field of the same example, formatted in that field’s default language.
It would be nice if also:
• The cursor would be left in the next open field (which in my case would be the English translation field).
• Leading or trailing spaces would be cut from all of the above.
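
The parsing half of the wish list above (split on the markers, route each part to a field, trim spaces) can be sketched in a few lines of Python. This is only an illustration and does not touch FLEx; the field names "reference", "vernacular" and "translation" and the function name are placeholders chosen for the example. Because the sample repeats the \r marker on both lines, the sketch simply keeps the first value seen for each field.

```python
import re

# Assumed mapping from SFM markers to example fields (placeholder names).
MARKER_TO_FIELD = {"r": "reference", "i": "vernacular", "t": "translation"}

# A marker is a backslash plus letters; its content runs up to the next
# marker or to the end of the text.
MARKER_RE = re.compile(r"\\(\w+)\s+(.*?)(?=\\\w+\s|\Z)", flags=re.S)


def parse_sfm_example(text):
    """Collect trimmed field values from SFM-marked example text."""
    example = {}
    for marker, content in MARKER_RE.findall(text):
        field_name = MARKER_TO_FIELD.get(marker)
        if field_name:
            # Keep the first value for a repeated marker (e.g. a repeated reference).
            example.setdefault(field_name, content.strip())
    return example


if __name__ == "__main__":
    sample = (
        r"\r Ilus.6.8.3 \i Motta keh miek tonalli okikalakilih yon tekitl." "\n"
        r"\r Ilus.6.8.3 \t Parece que le metió muchos días a ese trabajo."
    )
    for name, value in parse_sfm_example(sample).items():
        print(f"{name}: {value}")
```

A different marker set would only require changing `MARKER_TO_FIELD`.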

It costs me at least 9 clicks, ticklish mouse selections, and Ctrl-X/Ctrl-V motions to accomplish this right now, besides (often) the need to repeatedly scan visually up and down the page looking for where the cursor now is, as the screen has been redrawn. (If I want to get rid of leading or trailing spaces, it will take a good many more motions.) I have done those 9 motions hundreds of times now, and would not mind at all not having to do them any more, since the procedure is so mechanically predictable.

I would be happy to go with any other set of \sfms, or rearrange my input file into some other standard arrangement to fit a tool that would be more generally useful for others.

Perhaps overlapping with this request would be the request to be able to select a whole example within FLEx, copy or cut it, and paste it into another entry.

–David Tuggy

John Hatton

Oct 27, 2010, 11:41:30 PM
to flex...@googlegroups.com
The homograph merger is now available for you to try out:
http://projects.palaso.org/projects/lifttools

John Hatton
SIL Papua New Guinea, Palaso, & SIL International Software Development
Chat Google Talk: hattonjohn Skype: hattonjohn Google Wave:
hatto...@googlewave.com


John Hatton

Nov 14, 2010, 8:09:35 PM
to flex...@googlegroups.com

It's been two weeks, and there's no evidence that anyone has grabbed the new LiftTools app which does the homograph merging requested here... at least I've had no feedback.  Open-source software has the following characteristic:  when we developers get a request to provide you with new software or features for free, we assume that you are implicitly agreeing to do your part of testing and giving feedback soon, not just the next time you run into the problem. I understand, this might not be a shared assumption between our sub-cultures ;-) 

Ronald Moe

Nov 17, 2010, 2:14:06 PM
to flex...@googlegroups.com

Hi John,

I have 2,700 homographs to merge. So I downloaded LiftTools and tried it. The first time I got a blue screen of death and Windows shut down. I did the following: (1) opened the program, (2) clicked the Browse button to find the LIFT file, (3) clicked on a folder, (4) tried to use the scroll wheel on my mouse to scroll down to a sub-folder. That’s when I got the blue screen with a message to the effect that Windows was shutting down to protect itself. I had a few other programs running at the time, including FLEx.

I rebooted and tried again with no other programs running. I went slowly, only clicking on buttons in order to specify the LIFT file. I got to it and it was showing properly in the “LIFT File” box. I then clicked ‘Run’. I got the following error message:

Exception: Could not load file or assembly 'Palaso.DictionaryServices, Version=1.5.115.0, Culture=neutral, PublicKeyToken=null' or one of its dependencies. The system cannot find the file specified.

I’m running Vista on a Lenovo T500. I’m using FW 6.0.3, FLEx 3.0.3.40274. The FLEx export tool exported the dictionary to LIFT 0.13 XML.

Ron Moe

John Hatton

Nov 18, 2010, 12:30:37 AM
to flex...@googlegroups.com
Ron, thanks for trying it out.

> "Exception: Could not load file or assembly 'Palaso.DictionaryServices,
> Version=1.5.115.0, Culture=neutral, PublicKeyToken=null' or one of its
> dependencies. The system cannot find the file specified."

Drat, the installer was missing a piece. I'll email you a new installer to
try. Our server guy is away for a couple weeks and our servers have
immediately detected this fact and promptly gone haywire, so I can't post
the fix more widely yet.

jh


Jonathan

Nov 22, 2010, 4:34:10 AM
to FLEx list
John,

Sorry for not letting you know what had happened. Same as Ron. Can I
get the new installer?

Jonathan

John Hatton

Nov 22, 2010, 3:27:13 PM
to flex...@googlegroups.com

Hi Jonathan

>Sorry for not letting you know what had happened. Same as Ron. Can I get the new installer?

You can get it from the server, here. Ron ran into a problem when re-importing the resulting file, and I believe the FLEx team is looking into that. There are a couple issues with LIFT import in FLEx 6.0.5, and I hear they plan to release a 6.0.6 to address these.

jh

Ronald Moe

Nov 22, 2010, 5:11:09 PM
to flex...@googlegroups.com

Ken Zook was able to load the resulting file into FW6.0.6. So the bug is fixed. We just need to wait until 6.0.6 is released.

Ron Moe

Jonathan

Jan 10, 2011, 4:15:00 AM
to FLEx list
I am running the merge homographs tool on a Win 7 machine with a FLEx
6.0.6 LIFT export, and I get a recurring error (posted below). If I click
OK or hit nearly any key it goes on, and it still seems to be merging them.
It still happens in XP mode, when run as admin, and in Vista mode. Any
ideas?

the recurring error:
"Message (not an exception): Could not finish updating LIFT dictionary
file. Will try again later.
(Access to the path 'C:\Users\Jonathan\Documents
\Documents.processed.liftoff3c0on.fpo' is denied.)

Msg: Access to the path 'C:\Users\Jonathan\Documents
\Documents.processed.liftoff3c0on.fpo' is denied.
Source: mscorlib
Assembly: mscorlib, Version=2.0.0.0, Culture=neutral,
PublicKeyToken=b77a5c561934e089
Stack: at System.IO.__Error.WinIOError(Int32 errorCode, String
maybeFullPath)
at System.IO.File.Delete(String path)
at LiftIO.Merging.SynchronicMerger.MakeBackup(String
pathToBaseLiftFile, String pathToMergeInTo)
at LiftIO.Merging.SynchronicMerger.MergeUpdatesIntoFile(String
pathToBaseLiftFile)
at Palaso.Lift.LiftDataMapper`1.MergeIncrementFiles()

--Error Reporting Properties--
culture: en-IN
dotnetversion: 2.0.50727.4952
currentdirectory: C:\Program Files\LiftTools
username: Jonathan
workingset: 12386304
osversion: Microsoft Windows NT 6.1.7600.0
userdomainname: Pilo
commandline: "C:\Program Files\LiftTools\LiftTools.exe"
version: 1.0.6.789 (apparent build date: 22-11-2010)
machinename: PILO

--Log--
No log available."


Cambell Prince

Jan 10, 2011, 9:31:55 AM
to flex...@googlegroups.com, Jonathan
Hi,

Do you have another application open that has the LIFT file open? It looks like the merge tool cannot write to the LIFT file. A likely reason is that another app (maybe an editor, or FLEx) is holding the file open.

Try running the application with other LIFT-using apps closed and see what happens.
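
One rough way to test this hypothesis before running the tool is to try opening the file for writing yourself. The Python sketch below does only that. It assumes Windows-style sharing behavior, where an exclusive hold by another program typically makes the open fail; on other systems it mostly detects permission problems. Note that the stack trace shows the failure inside a delete call, which this check approximates but does not reproduce. The function name is made up for the example.

```python
import os


def can_open_for_writing(path):
    """Return True if an existing file can currently be opened for appending."""
    if not os.path.exists(path):
        return False  # avoid creating the file as a side effect of the check
    try:
        # Append mode leaves the existing contents untouched.
        with open(path, "a"):
            return True
    except OSError:
        # Covers PermissionError, e.g. when another program holds the file.
        return False


if __name__ == "__main__":
    import sys

    for candidate in sys.argv[1:]:
        status = "writable" if can_open_for_writing(candidate) else "NOT writable"
        print(f"{candidate}: {status}")
```

If the check reports the file as not writable while FLEx or an editor is open, and writable after closing them, that would support the explanation above.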

Regards.
Cambell
--
Cambell Prince
Payap Language Software and SIL MSEAG
Skype: cambell.prince | Mobile: +66 87 190 5871
www.palaso.org and www.sil.org  