FLEx does a small but annoying thing by failing to recognise an
apostrophe as a letter in Vernacular (Fulfulde) script. By way of example:
the word na'i (/cows/) gets split in a text chart into na and i as if it
were two separate words.
I have not been able to see how to rectify this yet, but I feel sure it
must have to do with the apostrophe needing to be in a list of
admissible letters for the vernacular. Can anyone help?
Many thanks,
Catherine Crawford
If you're using a Cameroon Unicode keyboard, you can use ;g to get a
character that looks like an apostrophe but is treated like a letter, e.g.
na'i.
Richard Gravina
--------------------------------------------------
From: "Catherine Crawford" <catherine...@sil.org>
Sent: Thursday, July 08, 2010 5:25 PM
To: <flex...@googlegroups.com>
Subject: [FLEx] apostrophe as a letter
> --
> You received this message because you are subscribed to the discussion
> group "FLEx list". This group is hosted by Google Groups and is open for
> anyone to browse.
> To post to this group, send email to flex...@googlegroups.com
> To unsubscribe from this group, send email to
> flex-list-...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/flex-list
I don't have a lot of experience in this area, but I know I had to get FLEx to recognize the hyphen (-) as a word-forming character in the language I am working on, and this is how I think I was told to get that to happen. (FLEx does handle my hyphens correctly for me now, so I know it can be done.)
First things first: copy your 'apostrophe' onto your "clipboard" in case you need it later in the process I outline below. (To do this, go to your Lexicon, find a word that has an apostrophe in it, highlight just the apostrophe by clicking and dragging, then use Ctrl-c to copy it onto your "clipboard".) Once you've done that, ...
Go to File > Project Management > FieldWorks Project Properties, then click on the "Writing Systems" tab. You should see a dialog box that includes two panes - one for Vernacular Writing Systems and the other for Analysis Writing Systems. Highlight the language in the Vernacular WS pane that corresponds to Fulfulde (probably called 'Fulfulde', but it might possibly have been set up with a different name) if it's not already highlighted in dark blue. Then click on the "Modify" button to the right. Next, click on the "Characters" tab, then on the "Valid Characters" button. That gets you to the screen (technically, a dialog box) referred to on the Help file page I've written about below (at the end of this message).
What I think is the key thing for you is to get the apostrophe that you use in Fulfulde words to appear in the topmost pane (Word-forming characters). (Be aware that there are a number of different Unicode characters that are similar and all look like apostrophes.) I suspect that it is either not in any of the panes you are looking at as you follow these steps, or it is in the middle pane (Punctuation, Symbols, and Spaces).
If you see the apostrophe in the middle pane, you should be able to right-click on it there and receive an option to make it a word-forming character, at which point it should move from the middle pane to the topmost pane, and your problem is solved (I think).
If you don't see the apostrophe in either of the top two panes, you will need to add it manually to the list of word-forming characters. To do this, click on the "Manual Entry" tab (to the left side of the dialog box). Click on the "Single Character" radio button if it hasn't already got a black dot in it. Click in the white space just to the right of the text that reads "Enter a single base character plus any ..." and use Ctrl-v to paste the apostrophe there that you copied from the word in your lexicon. Then click the "Add" button. That will put that character in one of the panes on the right. If it puts it in the topmost pane, you're done. If it puts it in one of the other panes, try what I suggested in the previous paragraph and see if you can get it to transfer to the top one.
For more information, look up "word-forming" in the Help files that come with FLEx, then choose the page that is entitled "Treat punctuation as word-forming characters". It seems to me that what is discussed there is just what you're describing.
Please let me know if this was helpful or not.
Blessings,
Kevin Warfel
- I don't know how long ago you created the project. In projects
that were created more recently, the apostrophe is in the word-
forming characters section by default. But if it was created more
than a year or so ago, it wouldn't be. (Saying this for the benefit
of others who may wonder.)
- Do think about whether you really do want to use apostrophe in
your orthography, or some other character that Unicode does recognize
as word-forming. Particularly think about whether you ever need to
use apostrophe for punctuation--that would be a key indicator that
you want something else for the alphabetic letter. As Kevin said,
there are a number of others, including "modifier letter apostrophe"
and "saltillo" (which has both a lower and upper case version). The
key is to try to encourage a standard across everyone using the
orthography if possible (rather difficult with a language as widely
spoken as Fulfulde!!). It would be nice if every time people saw
that symbol, there were the same Unicode codepoint underneath.
However, that may indeed be a hopeless cause, unless a Fulfulde
language committee chose to make it a priority.
-Beth
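
A small sketch of the distinction discussed above, using Python's standard unicodedata module. It lists a few apostrophe look-alikes and reports whether Unicode classes each one as a letter. The list of candidates and the helper name is_letter are only illustrative, and it is an assumption that FLEx's default word-forming decision follows the Unicode general category in the same way.

```python
import unicodedata

# Characters that can all look like an apostrophe on screen.
CANDIDATES = ["\u0027", "\u2019", "\u02BC", "\uA78B", "\uA78C"]


def is_letter(ch):
    """Return True when the Unicode general category is a letter (L*)."""
    return unicodedata.category(ch).startswith("L")


for ch in CANDIDATES:
    # Print code point, official name, general category and the letter test.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"{unicodedata.category(ch)}, letter={is_letter(ch)}")
```

The plain apostrophe (U+0027) and the curly one (U+2019) come out as punctuation, while the modifier letter apostrophe (U+02BC) and the two saltillo characters come out as letters.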
Andreas Joswig
Even in languages like English and French, the apostrophe is not just a
punctuation mark. When it is used to represent missing letters in
optional contracted forms in English (don't, they're, we'll), or
phonologically mandatory "contractions" in French (l'amour, c'est),
FLEx treating it as punctuation rather than a word-forming character
is not a big problem: these forms are all composed of two morphemes, so
they can be divided up and analyzed, even though they are of course
recognized as single words in spell checkers, dictionaries, etc. But
English and French also both have archaic contractions which require
the apostrophe as a word-forming element ("o'clock," "ain't,"
"aujourd'hui") and which cannot be easily broken down into synchronic
morphemes. And of course many loanwords and names are routinely spelled
with an apostrophe in English and French (e.g. coup d'état, al Q'aida,
N'djamena, Xi'an). When I type apostrophes in these words, Word and
Thunderbird correctly treat each as a single word rather than as a
punctuation error (a missing space after or before the apostrophe,
etc.). Is that because these programs are swapping out my keyboard's
punctuation apostrophe for one of the pseudo-apostrophes that Andreas
mentioned, or are they sophisticated enough to understand that, while
the apostrophe does not represent a phoneme in these languages, for
most type-setting purposes it is in fact a letter as well as a
punctuation symbol?
Having used an apostrophe as a word-forming element in FLEx and other
software for our research language for a number of years now, I find
that if one can live without capital and small forms for this
particular phoneme, using an apostrophe as a word-forming character to
represent a phoneme (e.g. glottal stop) is not a big problem. We do
avoid using it for internal quotes and use the curly apostrophes
instead. For some applications that cannot automatically convert the
straight apostrophe to the curly apostrophes, we use the less-than and
greater-than symbols < > and then find and replace. If we add the
straight quotation marks to the punctuation rules for the writing
system (e.g. I told her, "He said 'Hi.'"), then FLEx considers the
opening and closing quote apostrophes part of the word itself and
parses them with the word, because we've told FLEx to treat the
apostrophe as word-forming and there appears to be no way to tell it to
treat the apostrophe as word-forming only when found word-medially. But
as we are not at liberty to reinvent the official orthography, we must
either continue to use apostrophes as word-forming and substitute other
symbols for the punctuation, or else bring in an "apostrophe-like"
character and be forced to use a keyboard converter like Keyman, for a
writing system that uses only the 26 Latin letters plus the apostrophe,
just to get that one look-alike character. So we've chosen to keep the
default apostrophe and use substitutes for the punctuation instead.
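
To make the "word-forming only when word-medial" idea concrete, here is a minimal sketch in Python. It is not how FLEx breaks words; it only illustrates the rule being wished for, on plain text. The names WORD and tokenize are made up for this example.

```python
import re

# A word is a run of word characters, optionally continued by a single
# straight apostrophe that is immediately followed by more word characters.
# Apostrophes at the start or end of a word are therefore left outside it.
WORD = re.compile(r"\w+(?:'\w+)*")


def tokenize(text):
    """Return the words in text, keeping only word-medial apostrophes."""
    return WORD.findall(text)


print(tokenize("I told her, \"He said 'Hi.'\""))
print(tokenize("na'i ndi'ndang"))
```

With this rule the quoting apostrophes around 'Hi.' are dropped as punctuation, while na'i and ndi'ndang each stay one word.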
However, I wonder whether in the long run the valid characters section
of the writing systems setup will need something more sophisticated
than at present, where it appears that a given symbol cannot be
considered both a word-forming character in certain environments (e.g.
word-medial) and punctuation in other environments (e.g. word-initial
and word-final). Another character besides the apostrophe that should
be treated as both word-forming and punctuation is the hyphen. Quite a
few words in my English and French dictionaries have a hyphen as a
word-forming character (cross-country, n'est-ce, Port-au-Prince). I
can add the hyphen as a word-forming character, but then I lose it as a
punctuation mark. Also, I'm not sure whether it's possible to tell FLEx
that a given symbol should be considered "word-forming" but, because
it does not represent a phoneme, should not be considered in any way
in alphabetization. For example, my Oxford English Dictionary treats
word-forming apostrophes and hyphens as not affecting alphabetization
(so "o'clock" falls between "ocker" and "octad," and "crossfire" falls
between "cross-fertilization" and "cross-grain"). But though I've not
added the apostrophe to my custom sort order, it is treated as a
sortable symbol and therefore I get:
ndi
ndi'ndang
ndiag
ndin
ndip
Rather than:
ndi
ndiag
ndin
ndi'ndang
ndip
This is not actually a problem for me, as I find it easier for my own
use when words are sorted by like syllables. However, for end-users of
a dictionary someday, it could be a bit confusing if they've been
taught to look things up in traditional alphabetical order, and/or are
not sure whether an apostrophe is needed in the sought-after word. (The
apostrophe in "ndi'ndang" /ʔdi¹³ʔdaːŋ¹³/ is there to disambiguate it
from the phonologically possible "ndin'dang" /ʔdin¹³taːŋ¹³/.)
So it seems like we need more ways to characterize symbols beyond just
"word-forming" and "punctuation," to allow word-forming characters to
also be used as punctuation and excluded from sorting.
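
The two orderings above can be reproduced outside FLEx with a short Python sketch. It compares a plain code-point sort with a dictionary-style sort that ignores apostrophes and hyphens. The names IGNORED and sort_key are invented for the illustration, and this says nothing about how FLEx's own sort rules are configured.

```python
# Translation table that deletes apostrophes and hyphens from a string.
IGNORED = str.maketrans("", "", "'-")


def sort_key(word):
    """Sort on the word with apostrophes and hyphens removed.

    The original spelling is kept as a tie-breaker so the order is stable
    when two entries differ only in those characters.
    """
    return (word.translate(IGNORED), word)


words = ["ndi", "ndi'ndang", "ndiag", "ndin", "ndip"]

# Plain sort: the apostrophe counts and sorts before every letter.
print(sorted(words))

# Dictionary-style sort: the apostrophe is ignored.
print(sorted(words, key=sort_key))
```

The first call gives the order I currently get, and the second gives the traditional alphabetical order, with ndi'ndang falling between ndin and ndip.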
Eric
Kevin, your instructions were great and I see the route very clearly.
However, when I look in the Word-forming characters pane there are just
the following: a couple of symbols (circles with exclamation marks in
them), some combinations of capital letters, a hyphen, numbers from 0 to
9 and something that looks very like a straight apostrophe. The latter
suggests that FW should already recognise an apostrophe as a letter.
Is the choice of keyboard significant? I use Clavier du Mali (Keyman
Mali keyboard), linked to Maltese as the system language keyboard.
Catherine
I have not so far observed what happens to our alphabetical order, but
will do now.
Catherine
That actually sounds a lot like what mine looked like, but I thought mine was perhaps unusual, so I didn't mention it. Try this approach, which worked well for me yesterday when I was doing my research in order to respond to you:
Follow my instructions as before: Go to File > Project Management > FieldWorks Project Properties, then click on the "Writing Systems" tab. You should see a dialog box that includes two panes - one for Vernacular Writing Systems and the other for Analysis Writing Systems. Highlight the language in the Vernacular WS pane that corresponds to Fulfulde (probably called 'Fulfulde', but it might possibly have been set up with a different name) if it's not already highlighted in dark blue. Then click on the "Modify" button to the right. Next, click on the "Characters" tab, then on the "Valid Characters" button.
Now, instead of clicking on the "Manual Entry" (at left), click on "From Data". Then click on the "Scan" button. This will scan all of your Fulfulde text material and do an inventory of the characters it finds - letters, numbers, punctuation, and all the different combinations of letters and diacritics (but I'm not sure you have much of that sort of thing in Fulfulde - I worked in Burkina Faso, so am superficially acquainted with the language). Go down through the list and uncheck any that are not valid characters. (I found a few interesting typos of vowels with double tone marks when I did this yesterday, so excluded those and went and cleaned them up in my texts at the same time!) Then click on the "Add" button. That will put in your ɓ and ƴ without you even having to use your Mali keyboard (which doesn't work in this dialog box anyway). You can then try to move the apostrophe to the word-forming group, if necessary, but you may run into the problem that has been alluded to, namely that the apostrophe that is obtained directly from the keyboard is defined as non-word-forming in Unicode, and you may therefore be unable to move it to the word-forming group. It is also possible that Dan Brubaker, who designed the Keyman keyboard you are using, if I'm not mistaken, created it to insert one of the apostrophe look-alikes that have been alluded to, rather than the simple apostrophe that is defined as a "non-letter"; if that is the case, Dan is to be commended for his foresight.
In the event your apostrophes are the "ordinary" ones, here is my suggestion. Since there are other Unicode characters/codepoints that look virtually identical to the apostrophe generated by typing the apostrophe directly from your physical keyboard, talk to Dan about the possibility of altering the Keyman keyboard so that it can produce one of the word-forming apostrophes, then find a way to systematically replace the ordinary apostrophes in your database with the word-forming ones. Your data will *look* no different (so it will conform to the Fulfulde language standards), but it will behave differently (especially regarding your original problem). If you are working with sacred texts, you can use TE's Replace All function to make those changes in the texts. Someone else will have to tell you how to make such a wholesale change in the lexical database or in other texts. As always, however, it is wise to do a backup *before* using the Replace All function.
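
For text that lives outside FLEx (a plain-text export, say), the kind of systematic replacement described above might look like the following Python sketch. It only swaps apostrophes that sit between two word characters, so quoting apostrophes are left alone. The choice of U+02BC (modifier letter apostrophe) as the replacement is an assumption made for the example, as is the function name to_letter_apostrophe; this does not touch a FLEx database.

```python
import re

# Assumed replacement character: MODIFIER LETTER APOSTROPHE.
MODIFIER_APOSTROPHE = "\u02BC"

# A straight apostrophe with a word character on both sides.
MEDIAL = re.compile(r"(?<=\w)'(?=\w)")


def to_letter_apostrophe(text):
    """Replace word-medial U+0027 with the look-alike letter character."""
    return MEDIAL.sub(MODIFIER_APOSTROPHE, text)


converted = to_letter_apostrophe("'na'i' is one word")
# Show the code point of every non-ASCII character that was introduced.
print([f"U+{ord(ch):04X}" for ch in converted if ord(ch) > 127])
```

Run it on a copy of the data first, for the same reason a backup is advised before any Replace All.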
Now that you know more about the nuts and bolts of your FW project, you can speak or write more knowledgeably about it. I would suggest that you contact Dan and work out with him the exact steps you should take to get from where you are currently to where you want to be. Dan is familiar with all of the issues at work in Mali and can better advise you than any of us outsiders, though I am happy to have been able to point you in the right direction.
Blessings,
Kevin
Catherine
Jeff S.
Milange, Mozambique
Paul
-----Original Message-----
From: flex...@googlegroups.com [mailto:flex...@googlegroups.com] On
Behalf Of Beth
Sent: Friday, July 09, 2010 7:38 AM
To: flex...@googlegroups.com
-Beth
> Kevin, your instructions were great and I see the route very
> clearly. However, when I look in the Word-forming characters pane
> there are just the following: a couple of symbols (circles with
> exclamation marks in them), some combinations of capital letters, a
> hyphen, numbers from 0 to 9 and something that looks very like a
> straight apostrophe. The latter suggests that FW should already
> recognise an apostrophe as a letter.
>
> Is the choice of keyboard significant? I use Clavier du Mali
> (Keyman Mali keyboard), linked to Maltese as the system language
> keyboard.
When you are in that dialog, hover over the character you are
wondering about. If you wait long enough, a "tooltip" will appear,
showing you the Unicode value and name of the character you are
hovering over. If it is the normal ASCII apostrophe, it will say
U+0027 Apostrophe. If it is the saltillo, you will find that out.
If the apostrophe is not there, try adding it according to Kevin's
earlier recommendations. Note that there is also a place at the
bottom of the dialog to add things by Unicode value, rather than
typing in the character. There you could type in 0027, to be sure
you're getting the right one in.
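
If you want to double-check a character outside the dialog, the same information the tooltip shows can be had from a few lines of Python. This is only a rough sketch; the helper names describe and from_hex are made up here.

```python
import unicodedata


def describe(ch):
    """Return the code point and Unicode name, e.g. 'U+0027 APOSTROPHE'."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


def from_hex(code):
    """Turn a hexadecimal Unicode value such as '0027' into its character."""
    return chr(int(code, 16))


# A character pasted from your data, and one built from its Unicode value.
print(describe("'"))
print(describe(from_hex("A78C")))
```

Pasting a character copied from the lexicon into describe() tells you which of the look-alikes your keyboard is actually producing.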
In his more recent message he suggests using a different Unicode
character that looks like apostrophe but isn't. However, I would
only recommend doing that if that is the convention for all of
Fulfulde. However, it sounds like most people who type Fulfulde use
the simple apostrophe, and it also sounds like it doesn't figure as a
punctuation character. That sounds just fine. In that case, I
highly recommend using the same character everyone else uses.
Sorting and searching gets very confusing when different people type
different things for the same character. It's in a language group's
interest to define a standard for which Unicode characters to use for
that language, and to encourage the use of that standard.
It may be that you need to speak with Dan Brubaker about helping you
get the left and right guillemet symbols (the angle quotes) out of
your keyboard. It is much better to use the real Unicode values for
those (U+00AB, U+00BB for the double ones and U+2039, U+203A for the
single ones) rather than just two greater-than or less-than signs. It
should be quite possible to make the keyboard produce those.
-Beth