7 as a word-forming character

Aaron Broadwell

Aug 9, 2016, 7:52:36 PM
to FLEx list, Michael Galant
Colleagues,

A student in our FLEx workshop is working in a language where the practical orthography uses 7 to represent glottal stop.  (E.g.  7at = [ʔat] )

We went to the Configure Writing Systems menu, but we do not see a way to add a numeral as a word-forming character.  We can enter it manually as a character, but it goes into the Numerals portion, and it does not seem possible to select it and move it into the set of word-forming characters.

We are not sure if this will cause future problems for his project.  Can anyone advise us about this issue?

Thanks,
Aaron Broadwell

Beth-docs Bryson

Aug 9, 2016, 8:18:35 PM
to flex...@googlegroups.com
I believe what is happening is that if something is already defined in FLEx to be a word-forming character, it is not showing up in the list of word-forming characters after leaving that dialog and coming back.

For the digits, are you experiencing that FLEx is breaking words at the digit and treating it as a non-wordforming character?

In 2011 we made it so that digits are by default word-forming characters, so my expectation is that FLEx is treating it that way, even if you cannot see it in the list in the dialog.

-Beth


--
You are subscribed to the publicly accessible group "FLEx list".
Only members can post but anyone can view messages on the website.
---
You received this message because you are subscribed to the Google Groups "FLEx list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to flex-list+...@googlegroups.com.
To post to this group, send email to flex...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/flex-list/6cee2556-71e8-494d-8a9b-e47d944340f3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Aaron Broadwell

Aug 14, 2016, 1:40:03 PM
to FLEx list, mikeg...@netzero.net
Thank you, Beth!

We did not spot any problems with the use of 7 as a word-forming character, but since we are trying to make sure that the initial parts of the FLEx system are set up correctly, we wanted to be sure that there would not be some future problem we had not thought of.

Glad to know that this is not the case.

Aaron Broadwell

David Rowe

Aug 15, 2016, 9:59:23 AM
to flex...@googlegroups.com
Please be very careful that a choice made for ease of use in the present doesn't become a long-term decision. Although FLEx may allow you to use a digit as a word-forming character, other apps may not.

Victor Gaultney (NRSI font designer) comments:

Very bad idea from a long-term perspective. In most fonts the numerals are designed to different parameters than the normal alphabetic ones. Hence they might be too light, too widely or thinly spaced. Many fonts (like Georgia) even have a lowered old-style version that looks very different. Some apps, particularly on mobiles, use different fonts for numbers than text, even in the same layout.

Numbers are a fundamentally different thing from a current and future computing perspective, and using them as alphabetic characters is setting the language group up for a lifetime (or more) of frustration. It will forever brand them as a second-class, odd language with a bad writing system.

David Rowe

Pauline Linton

Oct 21, 2016, 1:23:28 AM
to flex...@googlegroups.com
So from a font design/computing perspective, do percentage signs, asterisks and tildes fall into the category of alphabetic letters, numbers, or something different? The language I work with uses all three as part of the orthography, though the first two are grammatical markers rather than alphabetic characters per se, and the tilde is over 'n'.

Pauline

Paul Nelson

Oct 21, 2016, 6:08:18 AM
to flex...@googlegroups.com
This is a Unicode character property issue, not a font issue.

If 7at = ʔat, why not use the correct Unicode character for ʔ instead of 7? A keyboard can easily be created to allow for easy input of the right character. That will be a huge long-term benefit for the people using the language.
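
To make the "character property" point concrete, here is a minimal sketch using Python's standard `unicodedata` module. It assumes that ʔ means U+0294 LATIN LETTER GLOTTAL STOP; the helper name `describe` is made up for illustration, and this is not how FLEx itself queries ICU.

```python
import unicodedata

def describe(ch):
    """Return (code point, Unicode name, general category, is-letter flag)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),  # e.g. "Nd" means decimal digit
        ch.isalpha(),              # True only for letter categories (L*)
    )

# The digit currently used in the orthography, and the glottal stop letter.
for ch in ["7", "\u0294"]:
    print(describe(ch))
```

The digit reports a numeric category and is not alphabetic, while the glottal stop letter is alphabetic, which is the distinction software typically relies on when deciding what belongs inside a word.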

Best regards,
Paul



Andrew Cunningham

Oct 21, 2016, 7:09:00 PM
to flex...@googlegroups.com, Michael Galant
Sounds messy.

How is an application supposed to know when 7 is being used as a letter vs number?

I would be inclined to input ʔat rather than 7at.

There are three issues:
* What character is being used. This is easier if it is ʔ rather than 7.
* What the character looks like (the glyph). This part is a font issue. It is not unusual for some letters to have variant shapes and glyphs depending on language. I would be inclined to use the Unicode character. If the preferred representation looks like 7, then it is possible to update open-source fonts to include a variant glyph and use it.
* How to input or type the character.

If you separate the issue into

1) What is this letter? And
2) What does this letter look like?

It may be easier to address the technical issue, i.e. to separate the question of which code point and character properties to use from issues about the visual appearance of the character.

I.e. character vs. glyph.

Andrew

--
Andrew Cunningham
lang.s...@gmail.com



Andrew Cunningham

Oct 21, 2016, 7:14:13 PM
to flex...@googlegroups.com
Hi Pauline,

I would handle Percentage signs, asterisks and tildes as punctuation.

It sounds like your language uses n-tilde ñ or n + combining tilde, which are canonically equivalent and are letters.
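
The canonical equivalence mentioned above can be checked with Python's standard `unicodedata` module. This is only a sketch; the helper name `same_letter` is an assumption for illustration, and the general categories printed at the end are what Unicode reports, which may differ from how a particular application groups these characters.

```python
import unicodedata

precomposed = "\u00f1"  # ñ as a single code point
decomposed = "n\u0303"  # n followed by COMBINING TILDE

def same_letter(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(precomposed == decomposed)             # raw code points differ
print(same_letter(precomposed, decomposed))  # equal once normalized

# General categories of the standalone signs and the combining tilde.
for ch in "%*~\u0303":
    print(f"U+{ord(ch):04X}", unicodedata.category(ch))
```

None of the standalone signs is in a letter category, whereas the combining tilde is a mark that attaches to the preceding letter.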

--
Andrew Cunningham
lang.s...@gmail.com



Aaron Broadwell

Oct 22, 2016, 5:39:47 PM
to FLEx list
Hi all,

I'm not the designer of the orthography in question. It was a preexisting orthography used by a student in my workshop.

So my question is not really whether this is a good idea, but how to get FLEx to work with such an orthography.

Aaron Broadwell

Paul Nelson

Oct 22, 2016, 10:10:38 PM
to flex...@googlegroups.com
Hello Aaron,

FLEx is looking at the Unicode character properties (found in ICU). Ken Zook might know if a hack can be done to get around this. Unless this is a long term issue that cannot be changed to use the correct Unicode character, we may not be well served to use valuable time making a hack for this.

I strongly suggest your student does a search/replace to use the correct Unicode character, and FLEx will be happy. I know this may not be a popular stance in the field, but it may be the best long-term solution for that language.

Best regards,

Paul



Hugh Paterson

Oct 23, 2016, 3:18:55 AM
to flex...@googlegroups.com
Two use cases I can think of for legitimate use of numerals in a practical orthography are:

1) Cellphones whose OS supports only a limited set of characters. In such cases it seems quite natural to substitute numerals for letters, and because the mobile screen is the primary place where the language is seen written down, the numeral-for-character substitution appears "correct" to most users in the written-language community.

2) Leet speak and other varieties: leet speak is an internet-user to internet-user phenomenon. It is big on several chat platforms and several gaming platforms. Like natural language, it has variations and dialects. Leet is sometimes called "1337"; "3" often stands for "E" and "1" for "L". Iconicity is generally highly ranked. I've never played a Haxor on TV but I pretend to be one at home. I guess u could call me a n00b. https://en.wikipedia.org/wiki/Leet

If some1 is doing a dictionary of 1337 sp3k based on 1337 texts, then I'd like to kno!

Otherwise I agree that using numerals in words is a dispreferred idea for minority languages in the beginning stages of language development.

- Hugh Paterson III




Michael Maxwell

Oct 23, 2016, 4:12:04 AM
to flex...@googlegroups.com
OTOH, it may be easier to change one computer program than hundreds of users of an established orthography, if that's what it is. The alternative is to change seven to glottal stop every time you import texts (but do it in a context-sensitive way, so you don't change 2017 to 201').
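
One way such a context-sensitive replacement could look is sketched below in Python. The rule is an assumption chosen for illustration: a 7 is treated as the glottal stop letter only when it occurs in a token that also contains at least one letter. The function name `replace_letter_seven` is hypothetical, and ʔ is assumed to be U+0294.

```python
import re

GLOTTAL = "\u0294"  # ʔ, LATIN LETTER GLOTTAL STOP (assumed target character)

def replace_letter_seven(text):
    """Replace 7 with the glottal stop letter, but only inside tokens
    that also contain at least one alphabetic character."""
    def fix(match):
        token = match.group(0)
        if any(ch.isalpha() for ch in token):
            return token.replace("7", GLOTTAL)
        return token  # purely numeric tokens such as 2017 stay untouched
    return re.sub(r"\w+", fix, text)

print(replace_letter_seven("7at ba7 2017 7"))
```

This heuristic would still misfire on mixed tokens that are not words in the language, such as "7th" or a label like "A7", so real import data would need to be checked before relying on it.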


Andreas_Joswig

Oct 23, 2016, 9:38:35 AM
to flex...@googlegroups.com
This is an old problem in Ethiopia, where in typewriter days some official Latin-based orthographies were designed which take 7 as the character for the glottal stop (which none of the languages in question have as a phoneme, incidentally, but they think they have). Nowadays we advise language communities not to choose numbers for letters, and they don't, but these old orthographies are stuck with them. You know how difficult it is to change an orthography...
So this is a real-world problem. If I were to run a FieldWorks workshop in Ethiopia with one of these languages, I would probably tell them to replace all 7s with some other word-forming symbol just for the sake of FLEx. There is no better way, and there are very good reasons why FLEx cannot handle this any better. These people have to live with the consequences of poor choices. When I consult on orthography, I use this as an example that technical considerations are still important when designing an orthography. You cannot just take any odd character to represent any phoneme, as all characters have certain properties in Unicode, and some of these properties may come back to bite you later on.
Andreas

Beth-docs Bryson

Oct 23, 2016, 11:22:39 AM
to flex...@googlegroups.com
This is a good discussion, but some confusion has come up because Pauline picked up on an old thread that happened back in August.  I have pasted that August conversation below, after Andreas’ most recent reply, just to refresh memories.

Andreas’ points are good: that orthographies tend to stick (even when they have ill-advised choices), and using digits for letters can result in problems in some software.

It just so happens that FieldWorks is not among the software for which using a digit as a letter is a problem, but that doesn’t make the practice advisable.

In FieldWorks, in the dialog for setting up writing systems, on the Characters tab, there is the ability to define which characters are word-forming, which are numbers, and which are punctuation.  For some characters, the user can change the category (even against the Unicode properties for that character).  (I am not sure at the moment whether that is true for the percent sign and asterisk.  A combining tilde is defined as word-forming even in Unicode, in contrast to a standalone tilde.)
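
As a rough illustration of what "changing the category against the Unicode properties" means, here is a toy Python tokenizer. It is not FieldWorks code: the function names and the `extra_word_forming` parameter are invented for this sketch, and the default here is deliberately letters-only (unlike the FieldWorks default described in this message) so that the effect of the override is visible.

```python
import unicodedata

def is_word_forming(ch, extra_word_forming=frozenset()):
    """Letters and combining marks count by default; anything listed in
    extra_word_forming is a per-writing-system override."""
    return ch in extra_word_forming or unicodedata.category(ch)[0] in "LM"

def tokenize(text, extra_word_forming=frozenset()):
    """Split text into words, dropping every non-word-forming character."""
    words, current = [], []
    for ch in text:
        if is_word_forming(ch, extra_word_forming):
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words

print(tokenize("7at ba7al"))         # Unicode properties only
print(tokenize("7at ba7al", {"7"}))  # with 7 overridden as word-forming
```

Without the override the digit breaks words apart; with it, 7at and ba7al survive as single words, at the cost that the same 7 can no longer be told apart from a real digit.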

It happens that we have shipped FieldWorks with the default set that digits are word-forming.  This has been true since 2011.  When Aaron was asking, it wasn’t because he had already run into problems, but because he was trying to anticipate them. 

-Beth

On Oct 23, 2016, at 8:38 AM, Andreas_Joswig <andreas...@sil.org> wrote:

This is an old problem in Ethiopia, where in typewriter days some official Latin-based orthographies were designed which take 7 as the character for the glottal stop (which none of the languages in question have as a phoneme, incidentally, but they think they have). Nowadays we advise language communities not to choose numbers for letters, and they don't, but these old orthographies are stuck with them. You know how difficult it is to change an orthography...
So this is a real-world problem. If I were to run a FieldWorks workshop in Ethiopia with one of these languages, I would probably tell them to replace all 7s with some other word-forming symbol just for the sake of FLEx. There is no better way, and there are very good reasons why FLEx cannot handle this any better. These people have to live with the consequences of poor choices. When I consult on orthography, I use this as an example that technical considerations are still important when designing an orthography. You cannot just take any odd character to represent any phoneme, as all characters have certain properties in Unicode, and some of these properties may come back to bite you later on.
Andreas


On Aug 15, 2016, at 8:59 AM, David Rowe <david...@gmail.com> wrote:

Please be very careful that a choice made for ease of use in the present doesn't become a long-term decision. Although FLEx may allow you to use a digit as a word-forming character, other apps may not. 

Victor Gaultney (NRSI font designer) comments:

Very bad idea from a long-term perspective. In most fonts the numerals are designed to different parameters than the normal alphabetic ones. Hence they might be too light, too widely or thinly spaced. Many fonts (like Georgia) even have a lowered old-style version that looks very different. Some apps, particularly on mobiles, use different fonts for numbers than text, even in the same layout.

Numbers are a fundamentally different thing from a current and future computing perspective, and using them as alphabetic characters is setting the language group up for a lifetime (or more) of frustration. It will forever brand them as a second-class, odd language with a bad writing system.

David Rowe

On 8/14/2016 1:40 PM, Aaron Broadwell wrote:
Thank you, Beth!

We did not spot any problems with the use of 7 as a word-forming character, but since we are trying to make sure that the initial parts of the FLEx system are set up correctly, we wanted to be sure that there would not be some future problem we had not thought of.

Glad to know that this is not the case.

Aaron Broadwell


On Aug 9, 2016, at 7:18 PM, Beth-docs Bryson <Beth-doc...@sil.org> wrote:

I believe what is happening is that if something is already defined in FLEx to be a word-forming character, it is not showing up in the list of word-forming characters after leaving that dialog and coming back.

For the digits, are you experiencing that FLEx is breaking words at the digit and treating it as a non-wordforming character?

In 2011 we made it so that digits are by default word-forming characters, so my expectation is that FLEx is treating it that way, even if you cannot see it in the list in the dialog.

-Beth



Andrew Cunningham

Oct 23, 2016, 7:36:23 PM
to flex...@googlegroups.com
Yes, 7 is used as a letter in this specific orthography. But orthographies do not specify the encoding of said letter, just how it looks and behaves in the written language.

The fundamental question is: do orthography users see a distinct difference between the digit 7 and the letter 7? I assume they do. If they don't, they just have to live with the problem. Every programming and scripting language will make assumptions about 7. There is no avoiding this. For instance, word boundary identification in most applications is likely to result in |7|at| not |7at|.

Do digit 7 and letter 7 collate together or separately when sorting both letters and numbers?
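
The collation question can be made concrete with a small Python sketch. The alphabet order below is purely hypothetical (it places the letter written as 7 after z), and `make_sort_key` is an illustrative helper, not part of any library.

```python
def make_sort_key(alphabet):
    """Build a sort key that ranks characters by their position in alphabet."""
    rank = {ch: i for i, ch in enumerate(alphabet)}
    def key(word):
        # Characters outside the alphabet sort last, then by code point.
        return [(rank.get(ch, len(alphabet)), ch) for ch in word]
    return key

# Hypothetical order for illustration: the glottal letter (written 7) is last.
LETTERS_THEN_SEVEN = "abcdefghijklmnopqrstuvwxyz7"

words = ["tat", "7at", "at", "2017"]
print(sorted(words))                                         # code-point order
print(sorted(words, key=make_sort_key(LETTERS_THEN_SEVEN)))  # custom order
```

In plain code-point order 7at lands among the numbers, ahead of every letter. The custom key puts it where the orthography wants it, but the 7 inside 2017 now receives the letter's rank too, because a single code point cannot be ranked as both a digit and a letter.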

If the digit and letter look the same but are regarded as two distinct things, then it is logical to use separate code points.

In the Latin script it is not uncommon to have variant glyphs for some characters.

Fonts can be adjusted to add the 7 glyph for the glottal stop.

Long term, adjusting the fonts and keyboard layouts is less painful than insisting on a poorly implemented encoding of the orthography.

This isn't a question of what should or shouldn't be used in an orthography. This is a question of how that orthography translates to an encoding.

The key issue is whether digit 7 and letter 7 are different things in this language.


Andrew



On Sunday, 23 October 2016, Michael Maxwell <max...@umiacs.umd.edu> wrote:
> Otoh, it may be easier to change one computer program than hundreds of users of an established orthography, if that's what it is. The alternative is to change seven to glottal stop every time you import texts (but do it in a context sensitive way, so you don't change 2017 to 201').
> ________________________________
> From: Paul Nelson
> Sent: ‎10/‎22/‎2016 4:10 PM
> To: flex...@googlegroups.com
> Subject: Re: [FLEx] 7 as a word-forming character
>
> Hello Aaron,
> FLEx is looking at the Unicode character properties (found in ICU). Ken Zook might know if a hack can be done to get around this. Unless this is a long term issue that cannot be changed to use the correct Unicode character, we may not be well served to use valuable time making a hack for this.
> I strongly suggest your student does a search/replace to use the correct Unicode character, and FLEx will be happy. I know this may not be a popular stance in the field, but it may be the best long-term solution for that language.
> Best regards,
> Paul
>
> On Sat, Oct 22, 2016 at 4:39 PM, Aaron Broadwell <g.bro...@gmail.com> wrote:
>>
>> Hi all,
>>
>> I'm not the designer of the orthography in question.  It was a preexisting orthography used by a student in my workshop.
>>
>> So my question is not really whether this is a good idea, but how to get FLEx to work with such an orthography.
>>
>> Aaron Broadwell
>>

--
Andrew Cunningham
lang.s...@gmail.com



Andrew Cunningham

Oct 23, 2016, 7:44:26 PM
to flex...@googlegroups.com
Beth,

If you can specify a digit as a letter in FieldWorks, can you define it both as a letter and a number?

It would seem that if you need to use a digit as a letter, you would still need to use it as a digit as well, unless you are using something else in place of 7 as the corresponding digit.

Andrew

--
Andrew Cunningham
lang.s...@gmail.com



Beth-docs Bryson

Oct 23, 2016, 8:32:26 PM
to flex...@googlegroups.com
It would only be one or the other.  That is one of the reasons it isn’t a good idea to use a digit as a letter.  

However, when writing endangered languages, sometimes it is a matter of documenting what language data already exists.  For a language that is living and spoken and will have new material written down, it makes sense to choose an orthography that will serve the community going forward.  When writing a language that has few or no speakers, the goal may be to document what little language data exists, in the form it was written at the time.  It’s not hard to imagine that maybe no actual digits were used in the data.  Perhaps the words for numbers were spelled out.

Whatever the case, there were enough people writing to the FieldWorks team who had data with digits as word-forming characters, that in 2011 we decided to make it possible in FieldWorks, even though it isn’t advisable.  As Andreas noted, sometimes it is not possible to change an orthography, especially one that was used in the past.

Beth

