[FLEx] using Pathway (formerly PublishingSolution, or typeset-dictionary, or Dictionary Express)

Jon C

Apr 20, 2010, 11:37:35 PM

to flex...@googlegroups.com

Dear all,

Others on this list have expressed interest in using Pathway to export/print from FLEx, and we've been wanting to do so with our project as well. Last time, we exported to SFM, converted to MDF, used Toolbox's MDF-to-RTF exporter, and published a limited number of pocket dictionaries using MS Word (and PDF995--for some reason PDFCreator couldn't handle our page configuration). This time, I attempted to get to MS Word more directly by installing the Pathway plugin and running it from within FLEx. (I haven't tried the third option, publishing via Lexique Pro, because I was told that its output is too "loose", too draft-like for book publications.)

Overall, I think that Pathway is probably now a better way to go than Toolbox, especially since the data's sort order, separators, etc, can be configured within FLEx itself. Most of the work I had to do after exporting from Pathway was formatting, rather than massaging the data. I did have some issues to overcome, however, so I've reported them at http://code.google.com/p/typeset-dictionary/issues
I've also copied them in full below for those of you who want to read through the gory details.

BTW, our dictionary is trilingual, so Pathway couldn't export it in one pass. I actually did three passes: exported the dictionary, exported the reversals, switched the default analysis language, and exported the reversals again. Then I merged the three documents together with the book's front and back matter to get a single file I could print to a single PDF file.

You may want to make note of the fact that sometimes styles can mysteriously disappear when you're copying data from OpenOffice into MS Word. I was surprised to find that the part of speech style (and a couple of others) had disappeared completely from my data, even though most of the styles came through just fine. This is a serious problem that you want to catch early on in the process. (I didn't, so I had to get creative later.)

Likewise, you can't assume that headers will convert over properly. After I saved as .DOC from OpenOffice and opened the dictionary in Word, the header labels were missing from the top of the page. (They were in the file, but formatted as hidden text and mixed in with the lexical data. I'm not sure why OpenOffice did this.) So if you're going to use Word anyways, it's better not to bother leaving the "Insert first and last word in header" option checked in Pathway, and no need to bother with the macro.

Maybe next time I'll be brave enough to do the whole thing in OpenOffice.

-Jon (working on a root-based trilingual dictionary for an Indonesian language)
Pathway 0.5.6-2010-03-27
Fieldworks 6.0.1
WinXP SP2

SUMMARY:
- Pathway export crashes OpenOffice
(I found a workaround)
- Character and paragraph styles have identical names
(with workaround)
- Exported file performs very slowly
- Homograph numbers not subscripted for reversals
- Duplicate homograph numbers on subentries
- Visual appearance doesn't match FLEx Dictionary view
- Too many section breaks
(with workaround)
- Entries and subentries need to stay together

DETAILS:

Pathway export crashes OpenOffice

1. Export from FLEx to OpenOffice (via Pathway of course)
2. Leave both options checked: "Configured Dictionary" and "Reversal Indexes"
(3. When prompted by OO, Enable Macros)

Instead of the exported ODM file opening properly in OO, it crashes OO
every time.

Character and paragraph styles have identical names

1. Export from Pathway to OpenOffice
2. Save as .DOC
3. Open in Word and try to edit a character style for which there is a
paragraph style with the same name

I would expect to be able to edit the style. Instead, I get an error from
Word: "This style name already exists or is reserved for a built-in style."
Also, it is not possible to search for either style in Word as long as both
exist with identical names.

Deleting or renaming the identically-named paragraph style seems to solve
the problem. Renaming would seem the safer option.

Some styles that I found to have duplicates:
headword_.klw_sense_senses_entry_letData
mainentryref_.klw_complexformcomponents_complexform_primaryrefs_sense_senses_entry_letData
mainentryref_.klw_xitem_complexformcomponents_complexform_primaryrefs_sense_senses_entry_letData
complexform_primaryrefs_sense_senses_entry_letData

Exported file performs very slowly

1. Export to OpenOffice
2a. Enable macros and then try to edit in OpenOffice
OR
2b. Use OpenOffice to save as .DOC and then try to edit in MS Word.

In OpenOffice, the macro that updates the page headings is EXTREMELY slow,
and it runs every time I open the document. Is there perhaps some other way
to generate these headings? (In MS Word, a computed field based on the
lexeme form's style can do this.)

In MS Word, something about the file (perhaps the number and complexity of
styles?) causes repagination to take a long time (about 3 minutes for my
dictionary of about 4,000 entries). In Print Layout mode, various edits can
trigger repagination.

MS Word does not have this issue when working with the document (and
styles) exported by Toolbox's MDF-to-RTF exporter. The same amount of data
repaginates in about 3 seconds.

Homograph numbers not subscripted for reversals

1. Export reversals from FLEx

Homographs ought to be marked with subscripted numbers. Instead, they use
full-sized numbers, which is especially problematic because "1" resembles
the letter "l".

Duplicate homograph numbers on subentries

1. Configure FLEx Dictionary view to root-based.
2. Export data containing subentries that are homographs.

I'd expect those subentries to be marked with just a subscripted number,
like a main entry with homographs is, but these are marked with both a
subscripted number and a duplicate, full-sized number. (The numbers are
separated by a space. Screenshot attached.)

Visual appearance doesn't match FLEx Dictionary view

1. Format the Dictionary view in FLEx as desired.
2. Export to OpenOffice via Pathway.

Instead of using the fonts and sizes defined in FLEx, Pathway exports the
data using its own formatting. This is true for both the default layout
(TwoColumn) and the FieldworksStyles layout.

Fortunately, the data itself (including its order and any separators) is
preserved through the export, and the formatting can be redone via styles.
It's just that it's tedious to do so.

The default formatting is legible, but having the headword in a smaller
point size than the definitions seems a little odd to me. Also, I'm not
sure that Charis SIL is the best default font. It uses up a lot of vertical
space compared to, say, Times New Roman. I guess this wouldn't be a big
deal if most styles could avoid defining the font and just inherit them
from a parent style. Changing the parent style's font would then ripple
down to its children.

Too many section breaks

I see that you're using a similar approach to letter headings (e.g. "--- A
---") as the Toolbox/MDF-to-RTF exporter does, creating section breaks
between letter sections so that the letter headings can span both columns.
The problem with this is that it makes it difficult to format or
cut-and-paste the document as a whole, since each section is somewhat
autonomous.

For example, I saved from OO into a Word doc and then wanted to cut/paste
our dictionary data into our pocket-sized half-A5 template, but I needed to
first carefully delete all of the section breaks, so that page and column
formatting wouldn't be disrupted.

You might want to consider using a simpler solution instead, such as an
in-column heading.

Entries and subentries need to stay together

1. Export a root-based dictionary

Currently some pages end with a root, but all of that root's subentries are
on the next page. I need to set some main-entry paragraphs to "keep with
next" so that they're on the same page as at least one of their subentries.
We cannot apply this option to all main-entry paragraphs (that is, to the
entry_letData style) because this creates lots of white space: not every
main entry has subentries, and if it doesn't it shouldn't be attached to
the following entry.

Maybe "parent" entries (main entries that have subentries) need to be a
special subtype of entry_letData so they can be set to "keep with next".

--
You received this message because you are subscribed to the discussion group "FLEx list". This group is hosted by Google Groups and is open for anyone to browse.
To post to this group, send email to flex...@googlegroups.com
To unsubscribe from this group, send email to flex-list-...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/flex-list

TwoColumn-subentry-homographs.png

Jeff and Peg Shrum

Apr 21, 2010, 2:27:46 AM

to flex...@googlegroups.com

I did not know that the name of PublishingSolution had been changed. What is the link to get the newest version of Pathway?

Jeff S.

Robert Hedinger

Apr 21, 2010, 4:55:55 AM

to flex...@googlegroups.com

http://code.google.com/p/typeset-dictionary/downloads/list

Robert Hedinger

Apr 21, 2010, 12:11:16 PM

to flex...@googlegroups.com

To avoid the crash, don't check both options when exporting. Check only one of them at a time: either "Configured Dictionary" or "Reversal Indexes".
That works for me.

Robert

From: Jon C

Sent: Wednesday, April 21, 2010 4:37 AM

To: flex...@googlegroups.com

Subject: [FLEx] using Pathway (formerly PublishingSolution, or typeset-dictionary, or Dictionary Express)

Dennis Walters

Apr 22, 2010, 2:22:19 AM

to flex...@googlegroups.com

Jon C wrote:

> Maybe next time I'll be brave enough to do the whole thing in OpenOffice.
>

Since you are losing some features by converting to .doc, maybe you
could go ahead and try OOo Writer. It's not a terribly difficult switch.
The OOo help files are excellent and you'll find better control of
styles and formatting I think.

If you have limited bandwidth, getting the download could be a problem.
If you can get OOo easily, you may find the learning curve not too steep and
worth the trouble.

I found OOo much easier to work with than MSWord for typesetting a
tri-lingual dictionary. I recommend the Master Document feature. This
will allow you to keep your front and back matter in separate
sub-documents from the dictionary and finder lists. The style names in
sub-documents will need to match the ones in the master document, but
don't worry about properties of styles in sub-documents. You only have
to manage the properties of styles in the master document and it will be
consistent through the whole publication. Also, you can export your
dictionary and finder lists from FLEx as often as necessary. The OOo
master document will bring in the new sub-document in place of the old
as long as the linked file name is the same.

Another tip. In OOo, you can use Format, Sections to remove the section
breaks between letter sections if you want. That will allow Writer to
process the whole publication much more efficiently if you don't need
the separate sections.

Dennis Walters

jlarmagost

Apr 22, 2010, 12:27:45 PM

to flex...@googlegroups.com

I'm not sure I'll ever be printing a dictionary, but I really appreciate the
comments in this thread. I've suspected for some time that the task wouldn't
be problem free. In particular, my project uses some invalid characters
(i.e. non-IPA private use Unicode) and in a couple of very small test
exports from one of the earlier versions of Flex I found that some of them
didn't print correctly, nor did one IPA character.

All of that was some time ago and I just keep hoping that if and when I do
want to print, it will go smoothly.

Jim

jlarm...@verizon.net

-----Original Message-----
From: flex...@googlegroups.com [mailto:flex...@googlegroups.com]On

Behalf Of Dennis Walters
Sent: Wednesday, April 21, 2010 11:22 PM
To: flex...@googlegroups.com
Subject: Re: [FLEx] using Pathway (formerly PublishingSolution, or
typeset-dictionary, or Dictionary Express)

Jon C

Apr 22, 2010, 3:48:42 PM

to flex...@googlegroups.com

Thanks for the tips, Dennis. For the time being, it is the ODM file (the
Master Document I presume) exported by Pathway that is causing OOo to
crash (both for me and Robert Hedinger). I can avoid this by exporting
to separate ODT files. Maybe making my own ODM instead would solve the
problem. Hmm.

I do have the latest OOo installed (v.3.2.0) and that is what I used to
convert from .ODT to .DOC (I tried both the Save As feature and
copying/pasting over into Word.) There were several reasons why I didn't
want to do the whole thing in OOo just yet:
- Our front and back matter was in Word from before.
- Word's find/replace feature is more powerful with respect to styles.
- Word 2003 is much more responsive and stable on my computer than OOo
3.2.0.
- Word has a simple way to generate the header words needed by a
dictionary by using fields, whereas for OOo Pathway uses a very slow
macro to accomplish this.
- I don't know OOo as well so it's harder for me to deal with problems
that arise (such as unwanted section breaks).
- Word has the option of creating signatures (mini "booklets" that are
folded and then bound together). But I guess this isn't essential for us.

Hopefully these things can be overcome as I learn OOo better. For
example, I didn't know how to delete sections, so thank you for the tip
about Format, Sections, Delete. After deleting all the sections in one
fell swoop, all I had to do was select all and Format Columns back to
two-column format. Very nice.

-Jon

Robert Hedinger

Apr 24, 2010, 12:08:32 PM

to flex...@googlegroups.com

The macro works fast in OO 3.1.1. See earlier postings about this.

Robert

--------------------------------------------------
From: "Jon C" <jvco...@gmail.com>
Sent: Thursday, April 22, 2010 8:48 PM

To: <flex...@googlegroups.com>
Subject: Re: [FLEx] using Pathway (formerly PublishingSolution, or
typeset-dictionary, or Dictionary Express)

Bruce Hooley

Apr 27, 2010, 1:00:49 AM

to flex...@googlegroups.com

Hi.

Can someone tell me how to write a regular expression to filter for all
entries that DON'T contain a particular item? There must surely be a way to
do it and I assume it would be by using regular expressions, but it is not
obvious to me. I'm trying to find all entries where the definition does not
contain : or alternatively, that don't contain see, or see: in the
definition field.
Thanks in advance.

Bruce Hooley

Dennis Walters

Apr 27, 2010, 2:17:27 AM

to flex...@googlegroups.com

jlarmagost wrote:
> I'm not sure I'll ever be printing a dictionary,

If you're putting the data in, you're likely to want it back out at some
point.

> I've suspected for some time that the task wouldn't
> be problem free.

It won't. I believe in the 90-90 rule of project schedules: The first
90% of a project takes 90% of the time, and the last 10% takes the other
90%!

> In particular, my project uses some invalid characters
> (i.e. non-IPA private use Unicode) and in a couple of very small test
> exports from one of the earlier versions of Flex I found that some of them
> didn't print correctly, nor did one IPA character.
>
> All of that was some time ago and I just keep hoping that if and when I do
> want to print, it will go smoothly.

I'd start now working on a solution for that one. Do your characters
appear correctly in FLEx?

Dennis

Robert Hedinger

Apr 27, 2010, 6:33:29 AM

to flex...@googlegroups.com

I am interested in this too.

Robert

--------------------------------------------------
From: "Bruce Hooley" <bruce_...@sil.org>
Sent: Tuesday, April 27, 2010 6:00 AM
To: <flex...@googlegroups.com>
Subject: [FLEx] Regular expressions

D. Rowe

Apr 27, 2010, 11:02:53 AM

to flex...@googlegroups.com

Bruce

> Can someone tell me how to write a regular expression to filter for
> all entries that DON'T contain a particular item? There must surely be
> a way to do it and I assume it would be by using regular expressions,
> but it is not obvious to me. I'm trying to find all entries where the
> definition does not contain : or alternatively, that don't contain
> see, or see: in the definition field.

I'm a beginner with regular expressions, but I'll suggest the following:

^[^:]*$

where
^ = beginning of line
[^:] = a character that is not a ":"
+ = one or more of the preceding
$ = end of line

This loosely translates as "only give me records where the whole
definition (from "beginning of line" to "end of line") can be filled
with a string of non colon characters".

Note that using "+" (meaning "one or more") means that no blank
definitions will be matched.
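
The difference between the "+" and "*" forms can be checked outside FLEx. The following is a minimal sketch using Python's re module as a stand-in for FLEx's ICU engine (an assumption: the two engines should treat patterns this simple the same way). The sample definitions are invented for illustration.

```python
import re

# Hypothetical sample definitions; not taken from any real FLEx project.
definitions = ["to run", "see: walk", ""]

plus_form = re.compile(r"^[^:]+$")  # one or more non-colon characters
star_form = re.compile(r"^[^:]*$")  # zero or more non-colon characters


def matching(pattern, items):
    """Return the items that the pattern matches from start to end."""
    return [s for s in items if pattern.search(s)]


plus_hits = matching(plus_form, definitions)
star_hits = matching(star_form, definitions)
print(plus_hits)  # the blank definition is left out
print(star_hits)  # the blank definition is included
```

In this sketch only the "*" form picks up the blank definition; neither form matches the definition that contains a colon.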

David Rowe

Jon C

Apr 27, 2010, 3:04:39 PM

to flex...@googlegroups.com

I'm afraid I can't find which posting you're referring to--do you know
the date?

The macro was slow for me in both 3.0.0 and 3.2.0. It would seem odd
that 3.1.1 would be somehow better than both of those.

-Jon

Jon C

Apr 27, 2010, 3:15:01 PM

to flex...@googlegroups.com

Just to clarify, I think David meant to write this:
^[^:]+$

I've used this approach too and it works well. However, I don't know how
to match negatively for a sequence of more than one character, such as
the "see:" example Bruce gave.

I've often wished for a "does not contain" (or "does not match" or
"exclude") checkbox in the filter dialog. Whether used in basic mode or
in combination with regular expressions, this could be quite handy.

-Jon

D. Rowe

Apr 27, 2010, 3:32:39 PM

to flex...@googlegroups.com

Jon is right. I mistyped the expression when I copied it. The "*" (zero or more) version will match records with a blank definition.

To match the non-presence of more than one character, I think you can do something like this (for "xy", "xyz" and "xyz:", respectively):

^([^x]|(x[^y]))+x?$

^([^x]|(x[^y])|(xy[^z]))+(x|xy)?$

^([^x]|(x[^y])|(xy[^z])|(xyz[^:]))+(x|xy|xyz)?$

To expand the second example:

^ = beginning of line

( = beginning of a group of three possibilities
[^x] = any characters except "x"
| = or
(x[^y]) = "x" followed by any character except "y"
| = or
(xy[^z]) = "xy" followed by any character except "z"
) = end of the group of three possibilities
+ = one or more times
( = beginning of a group of two possibilities (to take into account an "x" or "xy" string at the end of the line)
x = "x"
| = or
xy = "xy"
) = end of the group of two possibilities
? = zero or one time

$ = end of line

that is, "only give me records where the whole definition (from 'beginning of line' to 'end of line') can be filled with any number of three possible strings -- a string of one character that is not an 'x', or a string of 'x' followed by a character that is not a 'y', or a string of 'xy' followed by a character that is not a 'z' -- in any order, with an optional 'x' or 'xy' at the very end."
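
One way to gain confidence in patterns like these is to compare them with a plain substring test on a handful of strings. This is only a sketch: it uses Python's re module rather than FLEx's ICU engine (assumed to behave the same for this pattern), and the test strings are hypothetical.

```python
import re

# The first pattern from the message above, for "does not contain xy".
no_xy = re.compile(r"^([^x]|(x[^y]))+x?$")


def agrees_with_substring_test(text):
    """True when the regex and a plain 'xy not in text' check agree."""
    return bool(no_xy.search(text)) == ("xy" not in text)


# Hypothetical test strings, chosen for illustration only.
samples = ["abc", "axb", "ax", "axyb", "xxy"]
results = {s: agrees_with_substring_test(s) for s in samples}
print(results)
```

In this sketch the pattern agrees with the substring test on the first four strings, but "xxy" appears to slip through: the branch x[^y] consumes "xx" and the remaining "y" is then accepted as a non-"x" character. Strings with a doubled first letter may therefore be worth checking by hand when using this approach.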

David Rowe

Ronald Moe

Apr 27, 2010, 4:56:36 PM

to flex...@googlegroups.com

Just in case someone wants to use this regular expression in the future and
can't remember it... Go to Help-Language Explorer-Search tab, search for
"regular expressions" and choose "Examples of combinations of regular
expressions". The sixth item in the list is:
^[^ ]+$
This finds all strings that do not contain a space, in other words all
strings that consist of a single word. It would find "brother" or
"older.brother", but not "older brother".
Ron Moe

-----Original Message-----
From: flex...@googlegroups.com [mailto:flex...@googlegroups.com] On

Behalf Of Jon C
Sent: Tuesday, April 27, 2010 12:15 PM
To: flex...@googlegroups.com

John Thomson

Apr 27, 2010, 5:11:20 PM

to flex...@googlegroups.com

FLEx supports a regular expression syntax known as "negative lookahead" which can do this fairly cleanly. For example,

^(?:(?!see).)*$

will match anything that does not contain the string 'see'.

Explaining why is harder. ^...$ makes it apply to the whole string. The *, as usual, says it can have any number of the bit before, that is, the whole string must consist of any number of (?:(?!see).), which is where the magic is.

The critical bit is the negative lookahead, (?!see), which doesn't 'use up' anything in the input, but will prevent a match at the current position if the next three characters are 'see'. Thus, (?!see). will match any single character (the dot) PROVIDED the next three characters aren't 'see'. The outer (?:...) just groups things, so we can have any number of any character, provided it isn't the start of the sequence 'see'.

A simpler example of this is q(?!u), which will match 'q' not followed by 'u'. Unlike q[^u], it will even match 'Iraq' where nothing at all follows the q.
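
For anyone who wants to try this outside FLEx, here is a minimal sketch in Python, whose re module accepts the same lookahead syntax (using it as a stand-in for ICU is an assumption). The sample definitions are invented.

```python
import re

# The pattern above: no character may begin the sequence "see".
no_see = re.compile(r"^(?:(?!see).)*$")

# The simpler lookahead example: "q" not followed by "u".
q_not_u = re.compile(r"q(?!u)")
# Character-class version, which needs some character after the "q".
q_then_other = re.compile(r"q[^u]")

# Hypothetical definitions, invented for illustration.
definitions = ["to walk", "see: walk", "serene", "Tennessee", ""]
without_see = [d for d in definitions if no_see.search(d)]
print(without_see)

iraq_lookahead = bool(q_not_u.search("Iraq"))
iraq_char_class = bool(q_then_other.search("Iraq"))
print(iraq_lookahead, iraq_char_class)
```

Because the pattern uses "*", a blank definition also counts as "not containing see" in this sketch.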

John Thomson

> However, I don't know how to match negatively for a sequence of more than one character, such as the "see:" example Bruce gave.

Bruce Hooley

Apr 27, 2010, 7:47:18 PM

to flex...@googlegroups.com

Many thanks David. That seems to have done the trick. I thought it had to be
something involving the ^, but it didn't occur to me to use two of them!
Regards
Bruce

--------------------------------------------------
From: "D. Rowe" <test...@kastanet.org>
Sent: Wednesday, April 28, 2010 1:02 AM

To: <flex...@googlegroups.com>
Subject: Re: [FLEx] Regular expressions

Bruce Hooley

Apr 27, 2010, 8:05:50 PM

to flex...@googlegroups.com

Thanks to all of you who responded to my query on regular expressions. They
were all very helpful and I learned some things in the process!

Bruce

Jeff and Peg Shrum

Apr 28, 2010, 9:23:21 AM

to flex...@googlegroups.com

I would like to add something a bit new to the discussion.
To find all entries with "see," or "seen," or "saw," use the expression
see| saw
This will also give all entries with these strings like "seem", but you can
just ignore those.
The same approach can be used to find all entries with "go" or "went." Just
use the expression
go| went
I found this method in the help and it has been a quick way for
me to find related words in Flex.
It is also useful for sorting out synonyms, and items in part whole
relationships like "spoon," "knife," "fork," and "plate."
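
The alternation above can be tried out in a small sketch. Python's re module is used here in place of FLEx's ICU engine (an assumption), and the glosses are hypothetical. The second pattern, with \b word boundaries, is not from the message; it is one possible way to drop the stray "seem" matches, and ICU documents \b as well.

```python
import re

# The expression from the message, copied as written (space before "saw").
see_or_saw = re.compile(r"see| saw")

# Assumed alternative, not from the thread: whole words only.
whole_words = re.compile(r"\b(?:see|seen|saw)\b")

# Hypothetical glosses, invented for illustration.
glosses = ["to see", "he was seen", "she saw it",
           "it seems so", "a handsaw", "to walk"]

loose_hits = [g for g in glosses if see_or_saw.search(g)]
strict_hits = [g for g in glosses if whole_words.search(g)]
print(loose_hits)
print(strict_hits)
```

In this sketch the loose form also returns "it seems so", while the word-boundary form returns only the three glosses that contain the whole words.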

Jeff S.
Milange, Mozambique


D. Rowe

Apr 28, 2010, 11:04:45 AM

to flex...@googlegroups.com

John

> FLEx supports a regular expression syntax known as "negative
> lookahead" which can do this fairly cleanly.

It would be very helpful if this could be added to the FLEx documentation.

David Baines

Apr 28, 2010, 3:22:16 PM

to flex...@googlegroups.com

Yes, that is true, and there has been discussion about it. FLEx supports only one variety of regular expressions (ICU), and only one version of that variety (whatever was current when FLEx was built). So I am in favour of adding precise, tested information about regular expressions to the Help file.

However that is a fairly large job, and could take several hours or even days of work. If someone was willing to volunteer to write up some worked examples and test them in FLEx then perhaps we could see whether those could be added to the Helps. I can't make any promises at all, but even if the help is available in a Resource document rather than the help file that would be a step in the right direction. Either way, I think it would have to be a volunteer that writes this up rather than the development team, and it would be good to have it updated should that be necessary.

Any thoughts about that?
David Baines.

Jon C

Apr 29, 2010, 2:57:08 PM

to flex...@googlegroups.com

Yes, this takes a little getting used to. The first ^ means "beginning
of line" and the second ^ means "not" (when it's at the beginning of a
set of bracketed characters). They are quite unrelated to each other,
which is confusing at first.
-Jon

Jon C

Apr 29, 2010, 4:09:12 PM

to flex...@googlegroups.com

Wow, John T's solution is excellent! This is a filter I've wanted for a while now.

^(?:(?!see).)*$

It took me a while to understand it, and I had to look up the (?: ... ) syntax at http://userguide.icu-project.org/strings/regexp to learn that it's "non-capturing". I believe the following, simpler regex works just like John T's except that it's less efficient:

^((?!see).)*$
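
A quick way to compare the two forms is sketched below in Python, used as a stand-in for the ICU engine (an assumption). It checks that both patterns accept and reject the same sample strings, and that only the first one defines a capturing group. It does not measure the efficiency difference.

```python
import re

capturing = re.compile(r"^((?!see).)*$")
non_capturing = re.compile(r"^(?:(?!see).)*$")

# Sample strings for illustration.
samples = ["serene", "se'e", "see", "seer", "Tennessee"]

# True when both patterns give the same yes/no answer for every sample.
same_results = all(
    bool(capturing.search(s)) == bool(non_capturing.search(s))
    for s in samples
)
print(same_results)

# Pattern.groups reports how many capturing groups a pattern defines.
print(capturing.groups, non_capturing.groups)
```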

David, regarding doing a full overhaul of the regex helps, who has the expertise to do so, other than people on the development team? Maybe adding a link to the ICU site above would be a good way to fill the gap for now.

We do already have a nice help page in place, "Examples of combinations of regular expressions", although its descriptions of the dot's (.) behavior ("match... as many times as possible") seem to contradict the definition in "Examples of regular expressions": "Match any single character, without regard to what that character is."

I've copied sections from those two help files, edited them in OpenOffice (since that's our new standard) with revisions turned on, and attached it. (BCC-ed to Marlon, David and John.)

After those minor edits, I added John T's example filter, but I reversed the order of the explanation in order to start simple and build up. Perhaps Marlon could incorporate these changes into the help file without too much trouble.

Are there other as-yet-undocumented regex features that many of us would find useful? If so, maybe now is the time to add them to the help file.

-Jon

EXCERPT:

q[^u]	Match "q" followed by a character that is not a "u"	Finds "Iraq is in the mid-East" but not "Iraq" nor "He came quickly." (Note that a caret leading bracketed contents acts as a "not.")
q(?!u)	Match "q" not followed by "u". This (?! ... ) syntax is for negative lookahead, which doesn't "use up" anything in the input but will prevent a match at the current position if the next character is "u".	Finds "Iraq" but not "quickly".
^((?!see).)*$	Match anything that does not contain the string "see".	Finds "serene" and "se'e" but not "see", "seer", "Tennessee".
^(?:(?!see).)*$	Match anything that does not contain the string "see". This version is more efficient because (?: ... ) is a non-capturing parenthesis