RE: [FLEx] Mapping two \lx's - SFM import upper planes


Ken Zook

Oct 27, 2025, 11:57:11 AM
to flex...@googlegroups.com

Hi Eline,

 

The issue here is that the characters you are trying to import are in an upper (supplementary) plane of Unicode, i.e. above U+FFFF (e.g., U+1E2D9). Unfortunately, SFM import does not currently support Unicode characters from these planes. There is a workaround that I can help you with off-list. It basically involves converting the characters to unique strings before the import, and then converting these back to the real characters after importing into FLEx.
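
To illustrate the general idea only (this is not necessarily the procedure Ken has in mind), here is a minimal Python sketch of such a round trip. The placeholder format `{U+XXXXX}` and the function names `encode_supplementary` and `decode_supplementary` are assumptions made up for this example. The thread does not say how the reverse step is applied inside FLEx, so the decode step is shown on a plain string.

```python
import re

def encode_supplementary(text: str) -> str:
    """Replace every character above U+FFFF with an ASCII placeholder.

    The {U+XXXXX} placeholder format is a hypothetical choice; any string
    that cannot occur in the real data would do.
    """
    return "".join(
        f"{{U+{ord(ch):05X}}}" if ord(ch) > 0xFFFF else ch
        for ch in text
    )

# Matches the hypothetical placeholders produced above.
_PLACEHOLDER = re.compile(r"\{U\+([0-9A-F]{5,6})\}")

def decode_supplementary(text: str) -> str:
    """Turn the placeholders back into the real characters."""
    return _PLACEHOLDER.sub(lambda m: chr(int(m.group(1), 16)), text)

if __name__ == "__main__":
    line = "\\lx \U0001E2D9\U0001E2D6"
    safe = encode_supplementary(line)
    print(safe)
    print(decode_supplementary(safe) == line)
```

After the first step the SFM file contains only characters from the Basic Multilingual Plane, which is the part the importer is reported to handle.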

 

Ken

 

From: flex...@googlegroups.com <flex...@googlegroups.com> On Behalf Of eel...@gmail.com
Sent: Tuesday, October 21, 2025 10:58 PM
To: FLEx list <flex...@googlegroups.com>
Subject: Re: [FLEx] Mapping two \lx's

 

Hi David,

 

Thanks for trying so elaborately! It's extra strange that the Wancho characters can't be imported because I can manually type Wancho without problems in FLEx. Exports also look good. I really hope someone understands the issue.

 

(I had lx and lx_Wan fields because that's what the sfm-export from flex had too.)

 

Eline

 

 

 

On Wednesday, October 22, 2025 at 00:36:17 UTC+5:30, David Rowe wrote:

Eline,

Thanks for the excellent test data.

The Wancho text in the file is already Unicode, so there should be no need to use any encoding converter when importing that field.

Based on the five records you list, it seems that the \lx and \lx_Wan fields have the same data, so you'll likely only want to import one of those fields.
The \lx_ipa has the IPA equivalent, the \g_Eng has the English gloss, the \ps_Eng has the part of speech. 

I made an SFM test file from your spreadsheet, dropping the \lx_Wan field. (I assume you created something similar with sheetswiper.) Attached is the Wancho.txt file I used.
The lexeme (𞋙𞋖) in the first entry has the characters U+1E2D9 U+1E2D6 correctly encoded in UTF-8 as F0 9E 8B 99 F0 9E 8B 96.
It's been some time since I imported data into FLEx, but there are others on this list who can correct any mistakes I've made. I mapped:

  • \lx to Lexeme Form
  • \lx_ipa to Pronunciation Form
  • \g_Eng to Gloss
  • \ps_Eng to Part of Speech

When I got to the Readiness step in the import, I got the following report (truncated to the errors for line 1):
Error in SFM file at line 1: SFM 'lx' contains character value 0xD838, which is invalid and has been removed.
Error in SFM file at line 1: SFM 'lx' contains character value 0xDED9, which is invalid and has been removed.
Error in SFM file at line 1: SFM 'lx' contains character value 0xD838, which is invalid and has been removed.
Error in SFM file at line 1: SFM 'lx' contains character value 0xDED6, which is invalid and has been removed.

The characters U+1E2D9 U+1E2D6 in the \lx field on line 1 would be encoded as D838 DED9 D838 DED6 in UTF-16. 
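
For anyone who wants to check these values, here is a small Python sketch that derives the UTF-16 surrogate pairs by hand and prints the UTF-8 bytes for comparison. The function name `utf16_code_units` is made up for this example; the arithmetic is the standard UTF-16 encoding rule.

```python
def utf16_code_units(ch: str) -> list[int]:
    """Return the UTF-16 code unit(s) for a single character."""
    cp = ord(ch)
    if cp <= 0xFFFF:
        return [cp]                      # BMP character: one code unit
    offset = cp - 0x10000                # 20-bit value split across two units
    high = 0xD800 + (offset >> 10)       # high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)      # low (trail) surrogate
    return [high, low]

if __name__ == "__main__":
    lexeme = "\U0001E2D9\U0001E2D6"
    units = [u for ch in lexeme for u in utf16_code_units(ch)]
    print(" ".join(f"{u:04X}" for u in units))      # D838 DED9 D838 DED6
    print(lexeme.encode("utf-8").hex(" ").upper())  # F0 9E 8B 99 F0 9E 8B 96
```

The four values in the error report are exactly these surrogate code units, so the importer appears to be rejecting each half of a surrogate pair as if it were a character on its own.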

It's not clear to me why FLEx is discarding the data. I thought that perhaps the vernacular writing system needed to specify the characters, but trying to give the first and last Unicode character in the Wancho block didn't work:


Trying to add the Unicode values of the Wancho characters in the first lexeme didn't work either:


How can we get FLEx to accept Wancho characters?

Thanks,
David

 

On 2025-10-17 23:36, eel...@gmail.com wrote:

Hi David,

 

Attached the file I tried to import.

 

Eline

 

 

 

On Monday, October 13, 2025 at 17:53:57 UTC+2, David Rowe wrote:

Eline,

Is it possible to create a file with a few of your SFM records and post it here? I'd like to look at how your Wancho text is encoded.

Thanks,
David

 

On 2025-10-13 07:03, 'Beth-docs Bryson' via FLEx list wrote:

I know that I have seen some Jira issues that might be related to this.  Please write to FLEx_...@sil.org; I expect there will be some answers from them.

 

-Beth

 

 

On Mon, Oct 13, 2025 at 8:18 AM kevin_warfel via FLEx list <flex...@googlegroups.com> wrote:

Apologies, Eline. I see that you wrote that you did try to import with no converter but got a lot of error messages. I missed that part of your message when I replied earlier.

 

I have no more advice for you, but I’m sure there are others who have relevant knowledge that I’m lacking and will be happy to share it for your benefit. (And I’ll learn something as well.)

 

Kevin

 

From: Kevin Warfel <kevin_...@sil.org>
Sent: Monday, October 13, 2025 7:51 AM
To: flex...@googlegroups.com
Subject: Re: [FLEx] Mapping two \lx's

 

If your Wancho characters are already in a Unicode font, you shouldn't need a converter. I would have expected the "<Already in Unicode>" option to work (rather than "Windows1252<>Unicode"). Did you try that and it didn't work? Or what was your rationale for using a converter?

 

Kevin

 

On Mon, Oct 13, 2025 at 7:37 AM eel...@gmail.com <eel...@gmail.com> wrote:

Hi all,

 

Kevin, you understood my problem correctly. With Anita's tip I got quite far. All fields are imported now. The only remaining problem is that the Wancho script doesn't display correctly (only boxes). I specified the font I use in FLEx (Noto Sans Wancho), and the importer itself indicates there should be a Windows1252<>Unicode conversion. I tried without the conversion but got a lot of error messages then.

 

Screenshot 2025-10-13 133414.png

Screenshot 2025-10-13 133609.png

Do you see any obvious errors?

 

Eline

On Monday, October 13, 2025 at 13:22:10 UTC+2, Kevin Warfel wrote:

Eline,

 

I'm understanding your situation a bit differently than Andreas expressed. Hopefully one of our responses will give you the information you need.

 

I'm understanding you to have two separate columns in a spreadsheet, one with the lexeme form in the Wancho orthography and the other with the lexeme form in IPA, but you want to import both into the Lexeme Form field in FLEx (just in different Writing Systems). If this is your situation, I would use \lx for the Wancho form and \lx-IPA for the IPA form. Then for the import, map \lx to the Wancho Writing System (WS) of the Lexeme Form field and \lx-IPA to the IPA WS of the Lexeme Form field. If you need help with the mapping to a specific WS in the Lexeme Form field, ask for more details. I would need to dig a bit to provide those details just now, so I'll send this for now.
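
As a rough illustration of the two-marker layout described above, here is a minimal Python sketch that writes spreadsheet rows out as SFM records. The function name `to_sfm`, the dictionary keys `wancho` and `ipa`, and the sample values are all placeholders invented for this example, and separating records with a blank line is a common convention assumed here rather than something stated in this thread.

```python
def to_sfm(rows):
    """Build SFM text with one record per row.

    Each row is a dict with the hypothetical keys 'wancho' and 'ipa'.
    \\lx carries the Wancho form and \\lx-IPA the IPA form, so the two
    markers can be mapped to different writing systems at import time.
    """
    records = []
    for row in rows:
        lines = [
            f"\\lx {row['wancho']}",
            f"\\lx-IPA {row['ipa']}",
        ]
        records.append("\n".join(lines))
    # Blank line between records (assumed convention), newline at end of file.
    return "\n\n".join(records) + "\n"

if __name__ == "__main__":
    # Placeholder values, not real dictionary data.
    sample = [{"wancho": "WANCHO_FORM_1", "ipa": "IPA_FORM_1"}]
    print(to_sfm(sample), end="")
```

Because each column gets its own marker, the importer no longer sees two \lx fields per record and can treat the second one as a separate field to map.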

 

Best wishes,

Kevin

 

On Mon, Oct 13, 2025 at 5:15 AM eel...@gmail.com <eel...@gmail.com> wrote:

Hi,

 

I've created a FLEx project where I use two writing systems for the Lexeme form (Wancho and IPA). How do I map this correctly when importing an SFM? I've labeled both the Wancho and the IPA column with \lx but that obviously doesn't work.

 

Eline

--

"FLEx list" messages are public. Only members can post.
flex_d...@sil.org
http://groups.google.com/group/flex-list.
---
You received this message because you are subscribed to the Google Groups "FLEx list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to flex-list+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/flex-list/7be954c0-7175-4e0c-b3c5-6380964f220an%40googlegroups.com.

