The lexer for .po translation files, which is located inside LexOthers.cxx as ColourisePoDoc / ColourisePoLine, uses a static variable 'state'. This appears to be unsafe and is likely to be incorrect when called multiple times or when multiple files are loaded. If anyone is using this lexer, it should be modified to not use any static variables: it is likely what it wants is to continue on from the pressing style which is the 'initStyle' (3rd) parameter to the lexer.
If anybody received my previous empty message, I'm sorry. I messed up
with my mail client ending up sending the message before writing it...
Fortunately, it seems to have been stopped at some point, so maybe
nobody got it.
Le 10/07/2012 10:20, Neil Hodgson a �crit :
> The lexer for .po translation files, which is located inside
> LexOthers.cxx as ColourisePoDoc / ColourisePoLine, uses a static
> variable 'state'. This appears to be unsafe and is likely to be
> incorrect when called multiple times or when multiple files are
> loaded. If anyone is using this lexer, it should be modified to not
> use any static variables: it is likely what it wants is to continue
> on from the pressing style which is the 'initStyle' (3rd) parameter
> to the lexer.
Trying to fix this I ended up writing a new and a little more complete
lexer. It behaves like the old one but follows a little better the
language like defined by Gettext[1] and as understood by Gettext's
`msgftm` tool; and provides a few more styles [2].
However, I'm not 100% sure about how I dealt with what used the static
variable before. The thing is that there are 3 string styles, and the
chosen one depends on what precedes it, which can be lines before, and
moreover that is separated by the default style. For example, given the
code below:
the style for the strings foo and bar are different, and especially line
4 should be styled with no (default) style. This leads to two problems:
1) to style line 2 and 5, I need to look behind and find what is the
previous non-default style.
2) doing so will break when changing e.g. "msgstr" to "msgid" since line
5 won't be restyled (but it now should use another style).
So, what I finally did is adding 3 "default" styles (
SCE_PO_MSGCTXT_TEXT_DEFAULT, SCE_PO_MSGID_TEXT_DEFAULT and
SCE_PO_MSGSTR_TEXT_DEFAULT) that are used in place of the default styles
when the various string types are expected, using those to remember how
a string should be styled here. This removes the need to look behind,
and fixes the restyling issue at the same time.
Again, I'm not sure it's the right approach, but I don't see another way
of keeping the information through many lines without styling everything
with the string style -- which would not be really great either. I'm
open to comments :)
Attached are two patches, one is the new lexer patch, the other adds the
PO properties to SciTE (which didn't seem to exist at all yet).
Thanks for reading,
Colomban
[1] https://www.gnu.org/software/gettext/manual/gettext.html#PO-Files [2] SCE_PO_PROGRAMMER_COMMENT, SCE_PO_REFERENCE, SCE_PO_FLAGS,
SCE_PO_MSGID_TEXT_EOL, SCE_PO_MSGSTR_TEXT_EOL, SCE_PO_MSGCTXT_TEXT_EOL,
SCE_PO_ERROR and the three SCE_PO_MSGID_TEXT_DEFAULT,
SCE_PO_MSGSTR_TEXT_DEFAULT and SCE_PO_MSGCTXT_TEXT_DEFAULT.
> If anybody received my previous empty message, I'm sorry.
Your account still had the moderate flag on, so I saw it and removed it.
> However, I'm not 100% sure about how I dealt with what used the static
> variable before. The thing is that there are 3 string styles, and the
> chosen one depends on what precedes it, which can be lines before, and
> moreover that is separated by the default style.
The common ways of implementing this without introducing user-visible changes are:
1) Use line states to remember the state at the end of each line. styler.SetLineState / styler.GetLineState
2) Seek back to the start of the feature and discover the state there.
3) Write an object lexer and record each change in the feature state. This is much more work but is better for complex functionality.
>> If anybody received my previous empty message, I'm sorry.
> Your account still had the moderate flag on, so I saw it and removed
> it.
Oh, so I haven't ever posted on the ML before? Well, maybe it was only
on the trackers. Anyway that's fortunate, and thanks :)
>> However, I'm not 100% sure about how I dealt with what used the
>> static variable before. The thing is that there are 3 string
>> styles, and the chosen one depends on what precedes it, which can
>> be lines before, and moreover that is separated by the default
>> style.
> The common ways of implementing this without introducing user-visible
> changes are:
> 1) Use line states to remember the state at the end of each line.
> styler.SetLineState / styler.GetLineState
That's perfect, thanks! I updated my patch (attached) to use this
instead of the extra styles and it works just fine, removing the weird
styles and even some lines of code.
> 2) Seek back to the start of the feature and discover the state
> there.
> 3) Write an object lexer and record each change in the feature state.
> This is much more work but is better for complex functionality.
Not sure what you mean with a "feature" here, but since solution 1 works
just fine for this lexer and should even be enough if somebody wants to
add e.g. c-style format highlighting after a "c-format" flag, I won't
rewrite the thing a second time :) And po format isn't that complex it
requires anything really fancy anyway I guess.
> That's perfect, thanks! I updated my patch (attached) to use this
> instead of the extra styles and it works just fine, removing the weird
> styles and even some lines of code.
cppcheck complains about duplicate if/else branches for lines 693/695. While it looks like you plan to differentiate here, its not unusual for such plans to not be completed leaving confusing code.
Including .po support in LexOthers was expedient but it should have its own file so it can be included and excluded easily.
>> That's perfect, thanks! I updated my patch (attached) to use this >> instead of the extra styles and it works just fine, removing the
>> weird styles and even some lines of code.
> cppcheck complains about duplicate if/else branches for lines
> 693/695. While it looks like you plan to differentiate here, its not
> unusual for such plans to not be completed leaving confusing code.
OK, I'll merge the branches.
> Including .po support in LexOthers was expedient but it should have
> its own file so it can be included and excluded easily.
OK, I'll try to move it to its own file and integrate that.
>> OK, I'll try to move it to its own file and integrate that.
> I accidentally committed your current patch. I'd still like the
> updates though.
Here they are.
0001-Move-PO-lexer-outside-LexOthers.patch:
Moves the PO lexer to it own file. Tested build with gtk, and
grepped for others, hopefully it's OK everywhere (I ran
LexGen.py). I also changed the lexer module name from "lmPo"
to "lmPO" (uppercase) to match other lexer names (like lmCPP,
lmVB, etc.) since "po" here is an abbreviation.
I'm not completely sure of the implications though, if it
brings compatibility issues I can change that back to the old
"lmPo" name.
0002-Remove-duplicated-branch-in-PO-lexer.patch:
Removes the duplicated branch.