convert from .doc to .sfm for importation into FLEx as interlinear text

Michael Galant

Jun 20, 2013, 8:04:56 PM
to flex...@googlegroups.com
I currently have .doc files in which I have Spanish elicitations (which can serve as free translations for the moment) and translations into Zapotec of the elicited sentences, and I would like to enter these as interlinear texts in FLEx.

Rather than copy and paste from my .doc files into FLEx, I am hoping to convert the data into .sfm (or .xml), using some macros and probably a bit of manual manipulation because of the strange way I've formatted the data. However, I don't know what the .sfm needs to look like, format-wise, for the import into FLEx to work (and I have no idea how to create .xml).  Any suggestions?


Thanks,
Mike

J V C

Jun 25, 2013, 12:34:25 AM
to flex...@googlegroups.com
Hi Mike,

I'm guessing you already saw Beth's helpful message describing the SFM format for texts, but I've forwarded it below just in case.

Regarding semi-automated conversion of Word formatting into SFM, I've had very good experiences with Word's ability to find and replace formatting and/or styles. Ctrl+H or Edit > Replace opens the dialog; make sure you click "More" to see all the options. However, it does not combine that with the power of full-blown regular expressions, so it can be hard to match the beginning of a line. So, one trick I use for inserting SFM markers at the beginning of lines is this:
- verify there are no pipes ( | ) in the data yet (if there are, choose something else not in use)
- replace all ^p with ^p|
- replace all | that have a specific formatting with the corresponding backslash marker (repeat for each formatting or style you want to mark)
- look through and perhaps globally delete all remaining pipes
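
If the paragraphs and their style names have already been pulled out of Word by some other means, the same style-to-marker mapping can be sketched in Python. This is only an illustration of the idea, not part of the Word procedure above: the style names "Zapotec" and "Spanish", the marker choices, and the function name paragraphs_to_sfm are all made up for the example.

```python
# Hypothetical style names and markers; adjust both to match the real document.
STYLE_TO_MARKER = {
    "Zapotec": "\\tx",
    "Spanish": "\\ft",
}


def paragraphs_to_sfm(paragraphs, style_to_marker=STYLE_TO_MARKER):
    """Prefix each (style, text) paragraph with the marker mapped to its style.

    Paragraphs whose style is not in the mapping are kept without a marker,
    much like the lines left over after the remaining pipes are deleted.
    """
    lines = []
    for style, text in paragraphs:
        text = text.strip()
        if not text:
            continue  # skip empty paragraphs
        marker = style_to_marker.get(style)
        lines.append(f"{marker} {text}" if marker else text)
    return "\n".join(lines)
```

Unmapped paragraphs are passed through rather than dropped so that nothing disappears silently; they can then be reviewed by hand.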

I've mainly done the above with styles/formatting that apply to the whole MS Word "paragraph", but something similar can often work at the character level as well.

Jon

-------- Original Message --------
Subject: Re: [FLEx] simple SFM text import
Date: Fri, 21 Jun 2013 15:05:26 -0500
From: Beth (work) Bryson <Beth-wor...@sil.org>
Reply-To: flex...@googlegroups.com
To: flex...@googlegroups.com


Yes, it is possible to import texts from an SFM file.  If you are in the Texts & Words area, try File/Import.

The format is fairly simple:

\ref 001
\tx This would be a vernacular sentence.
\et This is the English free translation of the whole sentence.
\ft This is the French free translation of the whole sentence.

\ref 002
\tx This is a second sentence in the same text.
\et This is the English free translation of the second sentence.
\nt This is a note.

You can use any markers you want--the import wizard asks you how to map the markers to the different parts of an interlinear text in FLEx, and which writing system to use for each.
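
For data that is already available as pairs of vernacular sentence and free translation, a file in the layout shown above could be produced with a short script. The following is a minimal sketch: the function name build_sfm and the default markers are assumptions for illustration (the markers are arbitrary, since they get mapped in the import wizard), and the sentences in the usage note are placeholders, not real data.

```python
def build_sfm(pairs, ref_marker="\\ref", text_marker="\\tx", trans_marker="\\ft"):
    """Return SFM text with one record per (vernacular, translation) pair.

    Each record starts with a reference line, followed by the vernacular
    sentence and its free translation. Records are separated by a blank line.
    """
    records = []
    for number, (vernacular, translation) in enumerate(pairs, start=1):
        records.append(
            f"{ref_marker} {number:03d}\n"
            f"{text_marker} {vernacular.strip()}\n"
            f"{trans_marker} {translation.strip()}"
        )
    return "\n\n".join(records) + "\n"
```

Calling build_sfm([("VERNACULAR SENTENCE 1", "TRANSLATION 1")]) returns a single record beginning with \ref 001; the result can be written to a text file and then selected in the import dialog.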

You can only import the whole sentence and the free translation of the whole sentence; it is not possible at this time to import word or morpheme glosses.  But you can have notes in any of the analysis languages, and you can have both free and literal translations.

The reference marker is required so FLEx will know where a new sentence (really, a "segment") begins.  But the numbers are not brought in--FLEx autonumbers.  Each new \ref (or whatever marker you use) will begin a new segment in FLEx.  If you have sentence-ending punctuation within one \ref, that will split that ref up into more than one segment.  (So you can get more segments than \ref's, but not fewer.)
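
Before importing, it may help to check which records are likely to be split into more than one segment. The sketch below flags \ref records whose \tx line has sentence-ending punctuation followed by more text. It assumes that ".", "?" and "!" are the relevant characters; the punctuation FLEx actually treats as sentence-ending may differ, and the function name refs_likely_to_split is made up for the example.

```python
import re

# Assumption: ".", "?" and "!" end a sentence; FLEx's actual rules may differ.
# The lookahead requires more text after the punctuation, so a final period
# at the end of the line is not flagged.
SENTENCE_END = re.compile(r"[.?!]+(?=\s+\S)")


def refs_likely_to_split(sfm_text, ref_marker="\\ref", text_marker="\\tx"):
    """Return the reference values whose text line has internal sentence breaks."""
    flagged = []
    current_ref = None
    for line in sfm_text.splitlines():
        marker, _, rest = line.partition(" ")
        if marker == ref_marker:
            current_ref = rest.strip()
        elif marker == text_marker and SENTENCE_END.search(rest):
            flagged.append(current_ref)
    return flagged
```

Records flagged this way are not wrong; they will simply come in as several segments, which matters if the free translation is meant to line up with a single sentence.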

FLEx numbers sentences sequentially, and also has hierarchy for paragraphs.  So:  2.1 refers to the first sentence of the second paragraph.

-Beth