.po file lexer uses static variable
Message-ID: <504BA78D.8010606@herbesfolles.org>
Date: Sat, 08 Sep 2012 22:16:13 +0200
From: Colomban Wendling <lists....@herbesfolles.org>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.6esrpre) Gecko/20120817 Icedove/10.0.6
MIME-Version: 1.0
To: scintilla-interest@googlegroups.com
Subject: Re: [scintilla] .po file lexer uses static variable
References: <31E3CC9C-0064-4BF5-B02B-BC8D27B8682F@me.com>
In-Reply-To: <31E3CC9C-0064-4BF5-B02B-BC8D27B8682F@me.com>
X-Enigmail-Version: 1.4.1
Content-Type: multipart/mixed;
boundary="------------000705090008060404010003"
This is a multi-part message in MIME format.
--------------000705090008060404010003
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
If anybody received my previous empty message, I'm sorry: I fumbled with
my mail client and sent the message before writing it...
Fortunately, it seems to have been stopped at some point, so maybe
nobody got it.
On 10/07/2012 10:20, Neil Hodgson wrote:
> The lexer for .po translation files, which is located inside
> LexOthers.cxx as ColourisePoDoc / ColourisePoLine, uses a static
> variable 'state'. This appears to be unsafe and is likely to be
> incorrect when called multiple times or when multiple files are
> loaded. If anyone is using this lexer, it should be modified to not
> use any static variables: it is likely what it wants is to continue
> on from the preceding style which is the 'initStyle' (3rd) parameter
> to the lexer.
While trying to fix this I ended up writing a new, slightly more complete
lexer. It behaves like the old one but follows more closely the
language as defined by Gettext[1] and as understood by Gettext's
`msgfmt` tool, and it provides a few more styles [2].
However, I'm not 100% sure about how I replaced what the static
variable did before. There are 3 string styles, and the one to use
depends on the keyword that precedes the string, which can be several
lines earlier and separated from it by text in the default style. For
example, given the code below:
1. msgid "foo"
2. "foo string continues"
3. msgstr "bar"
4.
5. "bar string continues"
the styles for the strings foo and bar are different, and in particular
line 4 should be styled with the default style (no string style). This
leads to two problems:
1) to style lines 2 and 5, I need to look behind and find the previous
non-default style.
2) doing so breaks when changing e.g. "msgstr" to "msgid", since line
5 won't be restyled (although it should now use another style).
So, what I finally did is add 3 "default" styles
(SCE_PO_MSGCTXT_TEXT_DEFAULT, SCE_PO_MSGID_TEXT_DEFAULT and
SCE_PO_MSGSTR_TEXT_DEFAULT) that are used in place of the default style
wherever one of the string types is expected, so they record how the
next string should be styled. This removes the need to look behind,
and fixes the restyling issue at the same time.
Again, I'm not sure it's the right approach, but I don't see another way
of keeping the information through many lines without styling everything
with the string style -- which would not be really great either. I'm
open to comments :)
Attached are two patches, one is the new lexer patch, the other adds the
PO properties to SciTE (which didn't seem to exist at all yet).
Thanks for reading,
Colomban
[1] https://www.gnu.org/software/gettext/manual/gettext.html#PO-Files
[2] SCE_PO_PROGRAMMER_COMMENT, SCE_PO_REFERENCE, SCE_PO_FLAGS,
SCE_PO_MSGID_TEXT_EOL, SCE_PO_MSGSTR_TEXT_EOL, SCE_PO_MSGCTXT_TEXT_EOL,
SCE_PO_ERROR and the three SCE_PO_MSGID_TEXT_DEFAULT,
SCE_PO_MSGSTR_TEXT_DEFAULT and SCE_PO_MSGCTXT_TEXT_DEFAULT.
--------------000705090008060404010003
Content-Type: text/x-patch;
name="0001-Rewrite-PO-lexer.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
filename="0001-Rewrite-PO-lexer.patch"
# HG changeset patch
# User Colomban Wendling <b...@herbesfolles.org>
# Date 1347133751 -7200
# Branch po-lexer-rewrite
# Node ID 58f3c0be084e350b449d54458e8752ae6262679e
# Parent e8282d36744c05e1d24d14fadb2dc1884b8fe9bd
Rewrite the GetText translation (po) lexer
The old one had a few bugs and was somewhat limited; this one should
fix those issues. It should behave like the old one while adding some
more styles and following the file format more closely.
diff -r e8282d36744c -r 58f3c0be084e include/SciLexer.h
--- a/include/SciLexer.h Sat Sep 08 22:45:53 2012 +1000
+++ b/include/SciLexer.h Sat Sep 08 21:49:11 2012 +0200
@@ -1354,6 +1354,16 @@
#define SCE_PO_MSGCTXT 6
#define SCE_PO_MSGCTXT_TEXT 7
#define SCE_PO_FUZZY 8
+#define SCE_PO_PROGRAMMER_COMMENT 9
+#define SCE_PO_REFERENCE 10
+#define SCE_PO_FLAGS 11
+#define SCE_PO_MSGID_TEXT_EOL 12
+#define SCE_PO_MSGSTR_TEXT_EOL 13
+#define SCE_PO_MSGCTXT_TEXT_EOL 14
+#define SCE_PO_ERROR 15
+#define SCE_PO_MSGID_TEXT_DEFAULT 16
+#define SCE_PO_MSGSTR_TEXT_DEFAULT 17
+#define SCE_PO_MSGCTXT_TEXT_DEFAULT 18
#define SCE_PAS_DEFAULT 0
#define SCE_PAS_IDENTIFIER 1
#define SCE_PAS_COMMENT 2
diff -r e8282d36744c -r 58f3c0be084e include/Scintilla.iface
--- a/include/Scintilla.iface Sat Sep 08 22:45:53 2012 +1000
+++ b/include/Scintilla.iface Sat Sep 08 21:49:11 2012 +0200
@@ -3924,6 +3924,16 @@
val SCE_PO_MSGCTXT=6
val SCE_PO_MSGCTXT_TEXT=7
val SCE_PO_FUZZY=8
+val SCE_PO_PROGRAMMER_COMMENT=9
+val SCE_PO_REFERENCE=10
+val SCE_PO_FLAGS=11
+val SCE_PO_MSGID_TEXT_EOL=12
+val SCE_PO_MSGSTR_TEXT_EOL=13
+val SCE_PO_MSGCTXT_TEXT_EOL=14
+val SCE_PO_ERROR=15
+val SCE_PO_MSGID_TEXT_DEFAULT=16
+val SCE_PO_MSGSTR_TEXT_DEFAULT=17
+val SCE_PO_MSGCTXT_TEXT_DEFAULT=18
# Lexical states for SCLEX_PASCAL
lex Pascal=SCLEX_PASCAL SCE_PAS_
val SCE_PAS_DEFAULT=0
diff -r e8282d36744c -r 58f3c0be084e lexers/LexOthers.cxx
--- a/lexers/LexOthers.cxx Sat Sep 08 22:45:53 2012 +1000
+++ b/lexers/LexOthers.cxx Sat Sep 08 21:49:11 2012 +0200
@@ -614,76 +614,126 @@
} while (static_cast<int>(startPos) + length > curLineStart);
}
-static void ColourisePoLine(
- char *lineBuffer,
- unsigned int lengthLine,
- unsigned int startLine,
- unsigned int endPos,
- Accessor &styler) {
-
- unsigned int i = 0;
- static unsigned int state = SCE_PO_DEFAULT;
- unsigned int state_start = SCE_PO_DEFAULT;
-
- while ((i < lengthLine) && isspacechar(lineBuffer[i])) // Skip initial spaces
- i++;
- if (i < lengthLine) {
- if (lineBuffer[i] == '#') {
- // check if the comment contains any flags ("#, ") and
- // then whether the flags contain "fuzzy"
- if (strstart(lineBuffer, "#, ") && strstr(lineBuffer, "fuzzy"))
- styler.ColourTo(endPos, SCE_PO_FUZZY);
- else
- styler.ColourTo(endPos, SCE_PO_COMMENT);
- } else {
- if (lineBuffer[0] == '"') {
- // line continuation, use previous style
- styler.ColourTo(endPos, state);
- return;
- // this implicitly also matches "msgid_plural"
- } else if (strstart(lineBuffer, "msgid")) {
- state_start = SCE_PO_MSGID;
- state = SCE_PO_MSGID_TEXT;
- } else if (strstart(lineBuffer, "msgstr")) {
- state_start = SCE_PO_MSGSTR;
- state = SCE_PO_MSGSTR_TEXT;
- } else if (strstart(lineBuffer, "msgctxt")) {
- state_start = SCE_PO_MSGCTXT;
- state = SCE_PO_MSGCTXT_TEXT;
+// see https://www.gnu.org/software/gettext/manual/gettext.html#PO-Files for the syntax reference
+// some details are taken from the GNU msgfmt behavior (like that indent is allowed in front of lines)
+static void ColourisePoDoc(unsigned int startPos, int length, int initStyle, WordList *[], Accessor &styler) {
+ StyleContext sc(startPos, length, initStyle, styler);
+ bool escaped = false;
+
+ for (; sc.More(); sc.Forward()) {
+ // whether we should leave a state
+ switch (sc.state) {
+ case SCE_PO_COMMENT:
+ case SCE_PO_PROGRAMMER_COMMENT:
+ case SCE_PO_REFERENCE:
+ case SCE_PO_FLAGS:
+ case SCE_PO_FUZZY:
+ if (sc.atLineEnd) {
+ sc.SetState(SCE_PO_DEFAULT);
+ }
+ break;
+
+ case SCE_PO_MSGCTXT:
+ case SCE_PO_MSGID:
+ case SCE_PO_MSGSTR:
+ if (isspacechar(sc.ch)) {
+ if (sc.state == SCE_PO_MSGCTXT)
+ sc.SetState(SCE_PO_MSGCTXT_TEXT_DEFAULT);
+ else if (sc.state == SCE_PO_MSGID)
+ sc.SetState(SCE_PO_MSGID_TEXT_DEFAULT);
+ else if (sc.state == SCE_PO_MSGSTR)
+ sc.SetState(SCE_PO_MSGSTR_TEXT_DEFAULT);
+ }
+ break;
+
+ case SCE_PO_ERROR:
+ if (sc.atLineEnd)
+ sc.SetState(SCE_PO_DEFAULT);
+ break;
+
+ case SCE_PO_MSGCTXT_TEXT:
+ case SCE_PO_MSGID_TEXT:
+ case SCE_PO_MSGSTR_TEXT:
+ int defaultState = SCE_PO_DEFAULT;
+
+ if (sc.state == SCE_PO_MSGCTXT_TEXT)
+ defaultState = SCE_PO_MSGCTXT_TEXT_DEFAULT;
+ else if (sc.state == SCE_PO_MSGID_TEXT)
+ defaultState = SCE_PO_MSGID_TEXT_DEFAULT;
+ else if (sc.state == SCE_PO_MSGSTR_TEXT)
+ defaultState = SCE_PO_MSGSTR_TEXT_DEFAULT;
+
+ if (sc.atLineEnd) { // invalid inside a string
+ if (sc.state == SCE_PO_MSGCTXT_TEXT)
+ sc.ChangeState(SCE_PO_MSGCTXT_TEXT_EOL);
+ else if (sc.state == SCE_PO_MSGID_TEXT)
+ sc.ChangeState(SCE_PO_MSGID_TEXT_EOL);
+ else if (sc.state == SCE_PO_MSGSTR_TEXT)
+ sc.ChangeState(SCE_PO_MSGSTR_TEXT_EOL);
+ sc.ForwardSetState(defaultState);
+ escaped = false;
+ } else {
+ if (escaped)
+ escaped = false;
+ else if (sc.ch == '\\')
+ escaped = true;
+ else if (sc.ch == '"')
+ sc.ForwardSetState(defaultState);
+ }
+ break;
+ }
+
+ // whether we should enter a new state
+ switch (sc.state) {
+ case SCE_PO_DEFAULT:
+ case SCE_PO_MSGCTXT_TEXT_DEFAULT:
+ case SCE_PO_MSGID_TEXT_DEFAULT:
+ case SCE_PO_MSGSTR_TEXT_DEFAULT: {
+ // forward to the first non-white character on the line
+ bool atLineStart = sc.atLineStart;
+ if (atLineStart) {
+ while (sc.More() && ! sc.atLineEnd && isspacechar(sc.ch))
+ sc.Forward();
+ }
+
+ if (atLineStart && sc.ch == '#') {
+ if (sc.chNext == '.')
+ sc.SetState(SCE_PO_PROGRAMMER_COMMENT);
+ else if (sc.chNext == ':')
+ sc.SetState(SCE_PO_REFERENCE);
+ else if (sc.chNext == ',')
+ sc.SetState(SCE_PO_FLAGS);
+ else if (sc.chNext == '|')
+ sc.SetState(SCE_PO_COMMENT); // previous untranslated string, no special style yet
+ else
+ sc.SetState(SCE_PO_COMMENT);
+ } else if (atLineStart && sc.Match("msgid")) { // includes msgid_plural
+ sc.SetState(SCE_PO_MSGID);
+ } else if (atLineStart && sc.Match("msgstr")) { // includes [] suffixes
+ sc.SetState(SCE_PO_MSGSTR);
+ } else if (atLineStart && sc.Match("msgctxt")) {
+ sc.SetState(SCE_PO_MSGCTXT);
+ } else if (sc.ch == '"') {
+ if (sc.state == SCE_PO_MSGCTXT_TEXT_DEFAULT)
+ sc.SetState(SCE_PO_MSGCTXT_TEXT);
+ else if (sc.state == SCE_PO_MSGID_TEXT_DEFAULT)
+ sc.SetState(SCE_PO_MSGID_TEXT);
+ else if (sc.state == SCE_PO_MSGSTR_TEXT_DEFAULT)
+ sc.SetState(SCE_PO_MSGSTR_TEXT);
+ else
+ sc.SetState(SCE_PO_ERROR);
+ } else if (! isspacechar(sc.ch))
+ sc.SetState(SCE_PO_ERROR);
+ break;
}
- if (state_start != SCE_PO_DEFAULT) {
- // find the next space
- while ((i < lengthLine) && ! isspacechar(lineBuffer[i]))
- i++;
- styler.ColourTo(startLine + i - 1, state_start);
- styler.ColourTo(startLine + i, SCE_PO_DEFAULT);
- styler.ColourTo(endPos, state);
- }
- }
- } else {
- styler.ColourTo(endPos, SCE_PO_DEFAULT);
- }
-}
-
-static void ColourisePoDoc(unsigned int startPos, int length, int, WordList *[], Accessor &styler) {
- char lineBuffer[1024];
- styler.StartAt(startPos);
- styler.StartSegment(startPos);
- unsigned int linePos = 0;
- unsigned int startLine = startPos;
- for (unsigned int i = startPos; i < startPos + length; i++) {
- lineBuffer[linePos++] = styler[i];
- if (AtEOL(styler, i) || (linePos >= sizeof(lineBuffer) - 1)) {
- // End of line (or of line buffer) met, colourise it
- lineBuffer[linePos] = '\0';
- ColourisePoLine(lineBuffer, linePos, startLine, i, styler);
- linePos = 0;
- startLine = i + 1;
+
+ case SCE_PO_FLAGS:
+ if (sc.Match("fuzzy"))
+ sc.ChangeState(SCE_PO_FUZZY);
+ break;
}
}
- if (linePos > 0) { // Last line does not have ending characters
- ColourisePoLine(lineBuffer, linePos, startLine, startPos + length - 1, styler);
- }
+ sc.Complete();
}
static inline bool isassignchar(unsigned char ch) {
--------------000705090008060404010003
Content-Type: text/x-patch;
name="0001-SciTE-add-po-properties.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
filename="0001-SciTE-add-po-properties.patch"
# HG changeset patch
# User Colomban Wendling <b...@herbesfolles.org>
# Date 1347134940 -7200
# Node ID 871bbfb8c6b0e3ebc8883a3574cc6e5c56e23538
# Parent 4bf4dd7d8b688cb88db2943d1bb666997d8de447
Add GetText Translation (po) properties
diff -r 4bf4dd7d8b68 -r 871bbfb8c6b0 src/SciTEGlobal.properties
--- a/src/SciTEGlobal.properties Wed Jul 04 13:21:41 2012 +1000
+++ b/src/SciTEGlobal.properties Sat Sep 08 22:09:00 2012 +0200
@@ -369,6 +369,7 @@
$(filter.pascal)\
$(filter.perl)\
$(filter.php)\
+#$(filter.po)\
$(filter.pov)\
$(filter.powershell)\
$(filter.prg)\
@@ -574,6 +575,7 @@
P&HP|php||\
#P&LSQL|spec||\
#P&ostScript|ps||\
+#GetText Translation|po||\
#P&OV-Ray SDL|pov||\
#PowerShell|ps1||\
#PowerPro|powerpro||\
diff -r 4bf4dd7d8b68 -r 871bbfb8c6b0 src/others.properties
--- a/src/others.properties Wed Jul 04 13:21:41 2012 +1000
+++ b/src/others.properties Sat Sep 08 22:09:00 2012 +0200
@@ -5,11 +5,13 @@
file.patterns.batch=*.bat;*.cmd;*.nt
file.patterns.diff=*.diff;*.patch
file.patterns.make=makefile;Makefile;*.mak;configure
+file.patterns.po=*.po;*.pot
filter.properties=Properties (ini inf reg url cfg cnf)|$(file.patterns.props)|
filter.text=Text (txt log lst doc diz nfo)|$(file.patterns.text);make*|
filter.batch=Batch (bat cmd nt)|$(file.patterns.batch)|
filter.diff=Difference (diff patch)|$(file.patterns.diff)|
+filter.po=GetText Translation (po pot)|$(file.patterns.po)|
lexer.$(file.patterns.props)=props
lexer.$(file.patterns.batch)=batch
@@ -17,6 +19,7 @@
lexer.$(file.patterns.make)=makefile
lexer.*.iface=makefile
lexer.$(file.patterns.diff)=diff
+lexer.$(file.patterns.po)=po
word.characters.$(file.patterns.text)=$(chars.alpha)$(chars.numeric)$(chars.accented)-'
@@ -175,6 +178,44 @@
# Line change (!...)
style.diff.7=fore:#7F7F7F
+# GetText Translation styles
+
+# Default
+style.po.0=fore:#000000
+style.po.16=$(style.po.0)
+style.po.17=$(style.po.0)
+style.po.18=$(style.po.0)
+# Comment
+style.po.1=$(colour.code.comment.line),$(font.code.comment.line)
+# msgid
+style.po.2=$(colour.keyword),bold
+# msgid text
+style.po.3=$(colour.string)
+# msgstr
+style.po.4=$(style.po.2)
+# msgstr text
+style.po.5=$(style.po.3)
+# msgctxt
+style.po.6=$(style.po.2)
+# msgctxt text
+style.po.7=$(style.po.3)
+# Fuzzy flag
+style.po.8=$(colour.code.comment.doc),$(font.code.comment.doc)
+# Programmer comments
+style.po.9=$(colour.code.comment.doc),$(font.code.comment.doc)
+# Reference to the source file
+style.po.10=$(colour.code.comment.line),$(font.code.comment.line)
+# Flags
+style.po.11=$(colour.code.comment.doc),$(font.code.comment.doc)
+# Unterminated msgid text
+style.po.12=$(colour.string),italic
+# Unterminated msgstr text
+style.po.13=$(style.po.12)
+# Unterminated msgctxt text
+style.po.14=$(style.po.12)
+# Invalid line
+style.po.15=fore:#000000,back:#ff0000
+
command.build.makefile=make
command.build.*.mak=make
--------------000705090008060404010003--