Creating a LIFT file "by hand"


Jeff Heath

Jun 19, 2020, 3:56:09 AM
to FLEx list
I'm working on importing a complex Toolbox lexicon into FLEx. It has some additional information used for parsing, and I would also like that information to be imported. I have built a sample database in FLEx that has that information for a few entries, most of which is not included when I export that database as SFM, but it is when I export as LIFT. So I'm thinking that I should massage my Toolbox lexicon into a LIFT database (maybe using XSLT), and then import that into FLEx, into an existing database that has the structure, categories, and affixes for parsing already in place.

My main question relates to the use of "id" and "guid" attributes for the various elements. If I don't include them, will it generate them automatically on import? And if I have some variants or other lexical relations that need to point to other entries, can I just create random text for the "id" attributes, and use those id's in the appropriate "ref" attributes to connect them? Will those ids be recreated on the import, or will that FLEx database have those funny ids forever after?
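
To make the id/guid/ref question concrete, here is a minimal Python sketch of two hand-built LIFT entries, where one entry points at the other through a "ref" attribute. The element names (entry, lexical-unit, form, text, relation) follow the LIFT 0.13 schema as I understand it; the id pattern, the "und" language code, the second headword, and the relation type name are assumptions made for illustration. The sketch does not settle whether FLEx keeps or regenerates these ids on import.

```python
import uuid
import xml.etree.ElementTree as ET


def make_entry(lift, headword, lang="und"):
    """Append a minimal <entry> with a fresh GUID and return the element."""
    guid = str(uuid.uuid4())
    # Assumption: id is "<headword>_<guid>"; any string that is unique
    # within the file would serve the same purpose in this sketch.
    entry = ET.SubElement(lift, "entry", id=f"{headword}_{guid}", guid=guid)
    form = ET.SubElement(ET.SubElement(entry, "lexical-unit"), "form", lang=lang)
    ET.SubElement(form, "text").text = headword
    return entry


def link(source, target, relation_type):
    """Point source at target: the relation's ref carries the target's id."""
    ET.SubElement(source, "relation", type=relation_type, ref=target.get("id"))


lift = ET.Element("lift", version="0.13")
main = make_entry(lift, "taan")
variant = make_entry(lift, "tan")  # hypothetical variant form
link(variant, main, "variant")  # hypothetical relation type name
lift_xml = ET.tostring(lift, encoding="unicode")
```

The only thing the "ref" value needs in this sketch is to match the "id" of some entry in the same file, which is why the link is built from the target element rather than typed by hand.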

Kent Rasmussen

Jun 19, 2020, 5:29:02 AM
to flex...@googlegroups.com
I'm not sure that LIFT is valid without unique guid's for each entry and sense, so I wouldn't expect it to import OK. But is there a reason not to import to FLEx using SFM? That is typically how I convert SFM to LIFT, in any case. It takes some work, but is much faster than trying to create LIFT by hand (e.g., with a spreadsheet). Not all relationships come across as expected (especially on first try), so definitely use a dummy FLEx database for import (or return to a backup if/when things go awry), and check for references, etc. being imported as you expect.

Grace,

Kent


--
Kent Rasmussen, Ph.D.
SIL Linguistics Consultant / Conseiller en Linguistique de SIL
Orthographies for Francophone Africa / Orthographes pour l'Afrique francophone
Text/voice in Cameroon: +237 695718832 / +237 682417131
Text/voicemail/WhatsApp: +1 541-357-7276

Bart-Jacqueline Eenkhoorn

Jun 19, 2020, 6:48:24 AM
to flex...@googlegroups.com
I recently asked (in this group, I guess) whether the easiest/best way would be Toolbox -> Lexique Pro -> LIFT export -> FLEx import, or indeed Toolbox (SFM) -> FLEx import directly*. I think Ann replied that with a LIFT import "things get lost" (or not everything is imported) and that the way to go is the SFM import. So, just to say that I am very interested in what your conclusions will be, Jeff.

Bart.

PS* I asked this because the database has a lot of custom styles defined, such as :

\an taan s_an:{femme}
\xe Proverbe : Le pantalon est mieux mouillé que brûlé. e_pr:{Sens : Le mal est préférable au pire.}
\xr Proverb: Wet trousers are better than burnt ones. r_pr:{Meaning: Something bad is preferable to the worst.}

With all these custom features present, I wanted to know whether, after a LIFT import into FLEx, these styles can be defined afterwards in FLEx, or whether the way to go is to import the SFM and define them on import. The SFM import was clearly favoured, although I did not find out whether it is possible to define styles after a LIFT import.


--
"FLEx list" messages are public. Only members can post.
flex_d...@sil.org
http://groups.google.com/group/flex-list.
---
You received this message because you are subscribed to the Google Groups "FLEx list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to flex-list+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/flex-list/913c4209-7239-4d57-a73f-38c6d20ffa92o%40googlegroups.com.

Jeff Heath

Jun 19, 2020, 7:34:08 AM
to FLEx list
Thanks for the input. The problem with SFM import is that not all of the complexity can be imported. Can it create inflection features or allomorphs in the database? The Toolbox lexicon has some data that should be represented with those features in FLEx so that parsing will be successful.

Option #1
But I have an idea. I might be able to
  1. import the SFM into FLEx (even though it doesn't represent the full complexity of the data) to get the basics of all the entries
  2. fix up a couple of model entries with the inflection features and other extras
  3. export that data out in LIFT format
  4. programmatically manipulate the LIFT file (where the ids are established) to add additional fields and data from the SFM file
  5. import that LIFT file back into an "empty" FLEx project that has lists and styles all set up properly (responding to Bart's question)
That just might work... Once I have the LIFT export, I'll have to see what nature of changes is required (from the model entries), and figure out how to collect info from the SFM file entry and apply it to the corresponding LIFT entry. And I'll have to decide if XML parsing or direct text manipulation will be easier...  My hope is that FLEx will be relatively successful at importing a LIFT file if it is basically a LIFT file that it previously exported - fewer things "getting lost" that way?!
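
As a rough illustration of step 4, here is a Python sketch that reads records from an SFM file and copies one marker's value into a <field> element on the matching LIFT entry. Matching on the headword text, the \zf marker, and the "inflection-note" field type are all assumptions made for the example; real data would need a decision about homographs and about which LIFT element the information actually belongs in.

```python
import xml.etree.ElementTree as ET


def parse_sfm(text):
    """Split SFM text into records, each a list of (marker, value) pairs."""
    records = []
    for line in text.splitlines():
        if not line.startswith("\\"):
            continue  # this sketch ignores blank and continuation lines
        marker, _, value = line[1:].partition(" ")
        if marker == "lx":
            records.append([])  # \lx opens a new record
        if records:
            records[-1].append((marker, value.strip()))
    return records


def add_fields(lift_root, records, marker, field_type, lang="en"):
    """Copy each record's `marker` value into a <field> on the matching entry."""
    by_headword = {}
    for entry in lift_root.iter("entry"):
        text = entry.findtext("lexical-unit/form/text")
        if text is not None:
            by_headword.setdefault(text, entry)  # first homograph wins here
    added = 0
    for record in records:
        fields = dict(record)  # a repeated marker keeps only its last value
        entry = by_headword.get(fields.get("lx"))
        if entry is None or marker not in fields:
            continue
        field = ET.SubElement(entry, "field", type=field_type)
        form = ET.SubElement(field, "form", lang=lang)
        ET.SubElement(form, "text").text = fields[marker]
        added += 1
    return added


# Hypothetical data: \zf and "inflection-note" are made-up names.
sfm = "\\lx taan\n\\zf class-2\n\n\\lx kofi\n"
lift_root = ET.fromstring(
    '<lift version="0.13"><entry id="taan_1">'
    '<lexical-unit><form lang="und"><text>taan</text></form></lexical-unit>'
    "</entry></lift>"
)
count = add_fields(lift_root, parse_sfm(sfm), "zf", "inflection-note")
```

Parsing the XML rather than editing it as text keeps the existing ids untouched, which is the main attraction of starting from a file that FLEx itself exported.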

Option #2
Or here’s a completely different idea... The SFM file could be manipulated so that the complexity can be put into distinct SFMs that could be imported into Import Residue, and then I could try to Bulk Edit Entries, search for those specific bits of residue, and then apply inflection features to all lexemes that match each residue criterion. That would likely involve more manipulation of the SFM file in advance (possibly programming, but at least regular expressions), and would require manual bulk editing of the data once I get it into FLEx, to convert the Import Residue into inflection features, etc. But it seems like that would be a lot less work overall.
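
The "regular expressions" part of Option #2 might look something like the following Python sketch, which moves information embedded in one field into a distinct marker of its own. The embedded convention (a class in parentheses after the part of speech) and the \zcl marker are purely hypothetical; the actual Toolbox data will need its own pattern.

```python
import re

# Hypothetical convention: the part of speech carries extra information in
# parentheses, e.g. "\ps n (cl 2)". Neither this pattern nor the \zcl
# marker comes from the actual lexicon.
PATTERN = re.compile(r"^\\ps (?P<pos>.+?) \((?P<cls>[^)]+)\)[ \t]*$", re.MULTILINE)


def split_out_marker(sfm_text):
    """Rewrite each matching \\ps line as a plain \\ps line plus a \\zcl line."""
    # A function is used as the replacement so that the backslashes in the
    # output are not interpreted as regex escape sequences.
    return PATTERN.sub(lambda m: f"\\ps {m['pos']}\n\\zcl {m['cls']}", sfm_text)


before = "\\lx taan\n\\ps n (cl 2)\n\\ge woman\n"
after = split_out_marker(before)
```

Lines that do not match the pattern pass through unchanged, so the rewrite can be run over the whole file and the result compared against the original before importing.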

OK, so it looks like I will be heading towards Option #2, but feel free to chime in or make other suggestions! Thanks again for the input.
Jeff

Bart-Jacqueline Eenkhoorn

Jun 19, 2020, 7:59:25 AM
to flex...@googlegroups.com
I am curious to know whether you have considered the Lexique Pro -> LIFT export route? The inflection features and other extras you mention should all be exported from Lexique Pro, I assume?
Bart


Jeff Heath

Jun 19, 2020, 8:03:53 AM
to FLEx list
I don’t think the SFM file represents the data in some standard way that Lexique Pro would understand. How are inflection features supposed to be represented in SFM?

Bart-Jacqueline Eenkhoorn

Jun 19, 2020, 8:45:23 AM
to flex...@googlegroups.com
Well, looking at the Toolbox manual, on page 23 there are \va and \mn, where inflections can have their place:
[Image attachment: image.png, not reproduced]

and also in the manual:
\1d first person dual inflection
\2d second person dual inflection
and \3d.

I also found https://dictionaria.clld.org/submit, which bases its system on SFM and recommends: "We recommend that each dictionary contain all inflectional and derivational morphemes known in the language as headwords."

When you say : "I don’t think the SFM file represents the data in some standard way that Lexique Pro would understand", do you mean that a lexique pro export to lift would not correctly include the inflections ?

Bart.



Kevin Warfel

Jun 19, 2020, 10:17:04 AM
to flex...@googlegroups.com

Hi Jeff,

I believe that your Option #2 is closest to the recommended way to import things like inflection features. As I understand it, this is how our team (Dictionary & Lexicography Services) imports SFM data into FLEx:

1) Create the target FLEx project and configure it with the desired inflection features, as well as custom fields into which to import the inflection feature details.
2) Import the SFM data into FLEx, mapping the inflection features to the custom fields that were created.
3) Use Filters and Bulk Edit tools in FLEx to assign inflection features to elements based on the contents of the custom fields.
4) Ignore the custom fields and their information (or delete them).

The following articles regarding noun classes and their handling in FLEx may or may not be relevant to what you’re doing, but in case they are, here are links:

https://lingtran.net/Modeling+Bantu+Noun+Classes
African Noun Classes - A How-to Manual

Hope that’s helpful,

Kevin

Kevin Warfel
Associate Dictionary and Lexicography Services Coordinator
a.k.a. Dictionary Development Coordinator
SIL International

Current technology makes it possible to provide those translating into just about any language with both a dictionary and a thesaurus in the target language, the standard tools of the trade for professional translators, so why are mother-tongue translators in minority languages still expected to do their work without these tools?  Ask me about Rapid Word Collection after reading about it at rapidwords.net.


Kent Rasmussen

Jun 19, 2020, 1:46:16 PM
to flex...@googlegroups.com
I think this really is the question. By whatever method you import, you really need to check at each stage if all the data you had is there, and in the place you expect it. At any point it isn't, you need to back up and try again. I have had success doing this with SFM>FLEx only, but not with particularly complex data structures, so your mileage may vary.
Something you will probably hear from *anyone* who does SFM>XML conversion is that each database has unique requirements, so there is a lot of handholding to do in the process, to make the data transfer. If you think you need automaticity, you're likely going to need another human to do that for you, or just deal with loss.
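
One inexpensive way to support that kind of stage-by-stage checking is to tally the markers in the SFM file before import, so there is a number to compare against what shows up afterwards. The Python sketch below only counts markers; how the counts are compared with the imported project (for example against a re-export) is left open, and nothing here is specific to FLEx.

```python
from collections import Counter


def marker_counts(sfm_text):
    """Count how often each SFM marker occurs in the text."""
    counts = Counter()
    for line in sfm_text.splitlines():
        if line.startswith("\\"):
            parts = line[1:].split(None, 1)
            if parts:  # skip a bare backslash with no marker name
                counts[parts[0]] += 1
    return counts


sample = "\\lx a\n\\ge x\n\n\\lx b\n\\ge y\n\\ge z\n"
counts = marker_counts(sample)
```

A count that drops between stages does not say what was lost, but it does say which marker to go looking for.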

Grace,

Kent


On 6/19/20 12:34 PM, Jeff Heath wrote:
> The problem with SFM import is that not all of the complexity can be imported.


Kent Rasmussen

Jun 19, 2020, 1:56:20 PM
to flex...@googlegroups.com
I think this references the MDF (Multi-Dictionary Formatter) manual, as Toolbox itself can use a number of different SFM codes for a number of different things. MDF was in part an attempt to standardize those codes, so we would all mean the same things by the same codes, and so our tools could operate on them in the same way. I don't know if Lexique Pro completely understands the latest MDF version or not.
Also, let's be clear that knowing what an SFM marker stands for is only one thing. Another implication of converting flat text (e.g., SFM) to structured text (e.g., XML, for FLEx or LIFT) is knowing the domain of each field. MDF tried to manage that, but I don't know that it really did. Some things belong in <sense> more logically, others maybe or maybe not. So things following \se in SFM may belong inside <sense/>, or they may belong after it (i.e., in <sense>'s parent, <entry> or some such) --depending on where the sense ends (</sense> in XML, not indicated in SFM). So part of the conversion includes understanding how you have structured your flat database; what belongs with what (hence the need for human intervention). If you have the (common) misfortune to have been not entirely consistent (e.g., in a database built up over time), then that is another layer of complexity.
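
The domain problem can be sketched in a few lines of Python. The function below groups one entry's flat (marker, value) pairs into entry-level fields and per-sense fields, using an explicit table of which markers belong to a sense. The table is an assumption for illustration, not the MDF definition, and building the right table for a given database is exactly the part that needs a human.

```python
# Assumption: which marker opens a sense and which markers belong inside one
# is a per-database decision; these choices are illustrative only.
SENSE_OPENER = "sn"
SENSE_MARKERS = {"ge", "de", "xv", "xe"}


def nest(record):
    """Group one entry's (marker, value) pairs into entry and sense fields."""
    entry = {"fields": [], "senses": []}
    current = None  # the sense being filled, if any
    for marker, value in record:
        if marker == SENSE_OPENER:
            current = []
            entry["senses"].append(current)
        elif marker in SENSE_MARKERS:
            if current is None:  # sense-level data before any \sn
                current = []
                entry["senses"].append(current)
            current.append((marker, value))
        else:
            # Ambiguous case: the flat file never says where a sense ends,
            # so this sketch sends everything else to the entry level.
            entry["fields"].append((marker, value))
    return entry


record = [
    ("lx", "taan"), ("ps", "n"),
    ("sn", "1"), ("ge", "woman"),
    ("sn", "2"), ("ge", "wife"),
    ("dt", "19/Jun/2020"),
]
nested = nest(record)
```

Moving a marker from one set to the other changes where its data lands in the structured output, and an inconsistently structured database will need more than one such table.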

Grace,

Kent


On 6/19/20 12:44 PM, Bart-Jacqueline Eenkhoorn wrote:
> Well, looking at the toolbox manual


Jeff Heath

Jun 20, 2020, 5:21:53 AM
to FLEx list
This discussion has morphed into a discussion very different from the title of the original post. I've continued the discussion here: https://groups.google.com/forum/#!topic/flex-list/kv0yAks_6CU
