DTD / schema for import of text?


Nick Thieberger

Jun 17, 2015, 11:43:32 PM
to flex...@googlegroups.com
I want to import over a hundred texts into FLEx. They are in the following format, which works for a few texts but fails for the whole set. The XML is well-formed and matches the structure of an export file created by FLEx, but I would like to validate it against a schema if there is one. I've looked through the FLEx directories and can't find a DTD. Any suggestions welcome,

Nick

<?xml version="1.0" encoding="UTF-8"?>
<document version="2">
<interlinear-text>
    <item type="title" lang="en">Appendix 4.1</item> 
<paragraphs>
            <paragraph>
                <phrases>
                    <phrase>
                        <item lang="nay" type="txt">Ungu luku wilkin ngera wonyangunangk kura?</item>
                    </phrase>
                </phrases>
            </paragraph>
            <paragraph>
                <phrases>
                    <phrase>
                        <item lang="en" type="gls">When thus looking for net then we to them ask</item>
                    </phrase>
                </phrases>
            </paragraph>
</paragraphs>
</interlinear-text>
</document>

Alexandre Arkhipov

Jun 18, 2015, 6:29:31 AM
to flex...@googlegroups.com
Dear Nick,

1) Here's a Schema that dates back to 2012. I got it from the FLEx developers around the time they were introducing some changes to the structure. However, it may have changed since then, and more importantly, I believe FLEx itself does not use it to validate texts -- at least it did not at that time, AFAIK. There also seemed to be some restrictions applied by FLEx that were not easily encoded in a Schema.

2) From the sample you're citing, I wonder if the problem is due to different languages (writing systems) in the baseline: you have the vernacular in one phrase (without a corresponding translation), and its translation stands alone as another phrase (without the source text). To me it would be correct to have:
            <paragraph>
                <phrases>
                    <phrase>
                        <item lang="nay" type="txt">Ungu luku wilkin ngera wonyangunangk kura?</item>
                        <item lang="en" type="gls">When thus looking for net then we to them ask</item>
                    </phrase>
                </phrases>
            </paragraph>
Also, you might want to check that the project you're importing into has exactly these labels for the languages (writing systems).
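If the files consistently alternate a vernacular paragraph with a gloss-only paragraph, as in the sample, the restructuring could be scripted. The following is a minimal sketch under that assumption, using only the Python standard library; `merge_gloss_paragraphs` and `SAMPLE` are hypothetical names, and the result has not been tested against a FLEx import.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the structure in the original post.
SAMPLE = """<document version="2"><interlinear-text><paragraphs>
<paragraph><phrases><phrase>
<item lang="nay" type="txt">Ungu luku wilkin ngera wonyangunangk kura?</item>
</phrase></phrases></paragraph>
<paragraph><phrases><phrase>
<item lang="en" type="gls">When thus looking for net then we to them ask</item>
</phrase></phrases></paragraph>
</paragraphs></interlinear-text></document>"""


def merge_gloss_paragraphs(root):
    """Move the items of a gloss-only paragraph into the last phrase of the
    paragraph before it, then drop the now-empty gloss paragraph.

    Assumption: a paragraph whose phrase-level items are all type="gls"
    is the translation of the immediately preceding paragraph.
    """
    for paragraphs in list(root.iter("paragraphs")):
        previous_phrase = None
        for paragraph in list(paragraphs):
            phrases = paragraph.findall("./phrases/phrase")
            items = [item for phrase in phrases for item in phrase.findall("item")]
            types = {item.get("type") for item in items}
            if types == {"gls"} and previous_phrase is not None:
                for item in items:
                    previous_phrase.append(item)
                paragraphs.remove(paragraph)
                previous_phrase = None  # one translation per source phrase
            else:
                previous_phrase = phrases[-1] if phrases else None
    return root


if __name__ == "__main__":
    merged = merge_gloss_paragraphs(ET.fromstring(SAMPLE))
    print(ET.tostring(merged, encoding="unicode"))
```

A gloss paragraph with no preceding source paragraph is left untouched, so unexpected input is not silently discarded.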

Best,
Sasha

FlexInterlinear.xsd

Beth Bryson

Jun 23, 2015, 12:00:56 PM
to flex...@googlegroups.com
You may well have gotten it sorted out by now, but just in case...

I tend to figure out what to do by taking a text that is in FLEx that has the parts that I want and then doing an export and looking at the resulting .flextext file.  Some comments based on that and Nick's example:

I don't think you can have a vernacular phrase that isn't broken down into words.  

In an export of a text, there is a segment number for each segment (sentence/phrase).  I don't know if this is required for import, or if it just gets put in internally, and occurs on export.  It may well be optional for import.

I'm not sure if you were intending for the English to be in a completely separate paragraph, or if you do intend for it to be the free translation of the sentence.  If the latter, then it needs to be in the same <phrase>.

Sasha makes a very good point that it is quite important to be sure that the writing systems you are referencing already exist in the project and have the same codes.  Also be sure that "nay" is a vernacular language and "en" is an analysis language.
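One way to see which writing-system codes a file actually references is to collect the `lang` attribute of every `item`, grouped by item type. This is a minimal sketch using only the Python standard library; `langs_by_item_type` and `SAMPLE` are hypothetical names, and the comparison against the codes defined in the FLEx project still has to be done by hand.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample in the same shape as the texts discussed in this thread.
SAMPLE = """<document version="2"><interlinear-text>
<item type="title" lang="en">Appendix 4.1</item>
<paragraphs><paragraph><phrases><phrase>
<words>
<word><item type="txt" lang="nay">Ungu</item></word>
<word><item type="punct" lang="nay">?</item></word>
</words>
<item type="gls" lang="en">When thus looking for net then we to them ask</item>
</phrase></phrases></paragraph></paragraphs>
</interlinear-text></document>"""


def langs_by_item_type(flextext_xml):
    """Map each item type (txt, gls, title, ...) to the set of lang codes used with it."""
    root = ET.fromstring(flextext_xml)
    usage = defaultdict(set)
    for item in root.iter("item"):
        usage[item.get("type")].add(item.get("lang"))
    return dict(usage)


if __name__ == "__main__":
    for item_type, langs in langs_by_item_type(SAMPLE).items():
        print(item_type, sorted(langs))
```

Codes listed under `txt` and `punct` would be expected to be vernacular writing systems in the project, and codes under `gls` analysis writing systems.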

I would expect Nick's example to look more like:

<?xml version="1.0" encoding="UTF-8"?>
<document version="2">
<interlinear-text>
    <item type="title" lang="en">Appendix 4.1</item> 
   <paragraphs>
          <paragraph>
              <phrases>
                  <phrase>
                     <item type="segnum" lang="en">1.1</item>
                     <words>
                       <word>
                          <item type="txt" lang="nay">Ungu</item>
                       </word>
                       <word>
                          <item type="txt" lang="nay">luku</item>
                       </word>
                       <word>
                          <item type="txt" lang="nay">wilkin</item>
                       </word>
                       <word>
                          <item type="txt" lang="nay">ngera</item>
                       </word>
                       <word>
                          <item type="txt" lang="nay">wonyangunangk</item>
                       </word>
                       <word>
                          <item type="txt" lang="nay">kura</item>
                       </word>
                       <word>
                          <item type="punct" lang="nay">?</item>
                       </word>
                     </words>
                      <item type="gls" lang="en">When thus looking for net then we to them ask</item>
                  </phrase>
              </phrases>
          </paragraph>
   </paragraphs>
</interlinear-text>
</document>

I have not tested this in FLEx, but hopefully these are some ideas to point in the right direction.  However, by now you may well have gotten there on your own.  
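For a large set of texts, the word-level breakdown could be generated rather than typed. The following is a minimal sketch using only the Python standard library; `split_phrase_into_words` and the `TOKEN` pattern are hypothetical, and the tokenizer is an assumption that treats every run of letters or digits as a word and every other non-space character as punctuation, so words containing apostrophes or hyphens would be split and need a different pattern. Like the example above, it has not been tested against a FLEx import.

```python
import re
import xml.etree.ElementTree as ET

# Assumed tokenizer: runs of word characters, or single punctuation marks.
TOKEN = re.compile(r"\w+|[^\w\s]")


def split_phrase_into_words(phrase):
    """Replace a phrase-level txt item with a <words> element that holds
    one <word> per token, using type="txt" for words and "punct" otherwise."""
    baseline = next(
        (item for item in phrase.findall("item") if item.get("type") == "txt"),
        None,
    )
    # Nothing to do if there is no phrase-level baseline or words already exist.
    if baseline is None or phrase.find("words") is not None:
        return phrase
    lang = baseline.get("lang")
    words = ET.Element("words")
    for token in TOKEN.findall(baseline.text or ""):
        word = ET.SubElement(words, "word")
        kind = "txt" if re.match(r"\w", token) else "punct"
        item = ET.SubElement(word, "item", {"type": kind, "lang": lang})
        item.text = token
    # Put <words> where the baseline item was, so the gloss stays after it.
    index = list(phrase).index(baseline)
    phrase.remove(baseline)
    phrase.insert(index, words)
    return phrase


if __name__ == "__main__":
    phrase = ET.fromstring(
        '<phrase><item lang="nay" type="txt">Ungu luku wilkin ngera '
        'wonyangunangk kura?</item>'
        '<item lang="en" type="gls">When thus looking for net then we to them ask</item>'
        "</phrase>"
    )
    print(ET.tostring(split_phrase_into_words(phrase), encoding="unicode"))
```

The segment-number item is left out here because it is unclear whether import requires it.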

-Beth



John Mansfield

Jun 23, 2015, 6:31:14 PM
to flex...@googlegroups.com
I've been dealing with the same problem lately. The text import does work for me, but not for all texts that I've tried to import. I'm especially interested in importing texts that already have morphological glossing in them - and this too has worked for some texts.

With the texts that fail to import, I'm fairly sure that I do have the xml correctly structured in terms of element and attribute configurations (i.e. tagging structure). But I wonder how FLEx deals with all the morphs and their glosses? The texts that fail to import contain dozens of morph items that are not already in the project lexicon, and this is the best guess I have for why these imports fail. But I'd be curious to know what FLEx does exactly when it imports a text, and whether it has to go through a procedure for matching all the morphs with the lexicon.

j

Beth Bryson

Jun 25, 2015, 12:24:23 AM
to flex...@googlegroups.com
Hmm, as far as I know, FLEx doesn't currently try to import the morphemes and glosses....  But it sounds like you have had at least some success with that.

There was some work done internal to the program to implement that, but it wasn't taken to the point of actually working for users, as far as I know.  Maybe you have discovered a way to access that!

One reason we have never made that possible is exactly the reason you state:  What to do with entries that are already in the lexicon is challenging, especially if there is more than one piece of information about the morpheme, and some pieces match the lexicon but others don't.

We do hope to be able to do that kind of import some day, but it hasn't risen to the top of the priority list yet.

-Beth

steve_...@sil.org

Jun 29, 2015, 11:27:11 PM
to flex...@googlegroups.com
John,


On Wednesday, June 24, 2015 at 8:31:14 AM UTC+10, John Mansfield wrote:
I'm especially interested in importing texts that already have morphological glossing in them - and this too has worked for some texts.

I'm not sure how you got some glosses to import. At least, not while importing texts. I haven't been able to do it in my text imports to date.

I have, however, been able to import glosses when importing a lexicon.

Steve