Fixing makemessages for Javascript


Ned Batchelder

Apr 4, 2011, 5:15:16 PM
to django-d...@googlegroups.com
Last week I re-encountered the problems with using makemessages on Javascript files, and lost a couple of half-days to trying to figure out why some of my translatable messages weren't being found and deposited into my .po files.  After fully understanding the extent of Django's current hack, I decided to take a stab at providing a better solution.

Background: today, Javascript source files are parsed for messages by running a "pythonize" regex over them, and giving the resulting text to xgettext, claiming it is Perl.  The pythonize regex simply changes any //-style comment on its own line into a #-style comment.  This strange accommodation leaves a great deal of valid Javascript syntax in place to confuse the Perl parser in xgettext.  As a result, seemingly innocuous Javascript will result in lost messages:
gettext("xyzzy 1");
var x = y;
gettext("xyzzy 2");
var x = z;
gettext("xyzzy 3");
In this sample, messages 1 and 3 are found, and message 2 is not, because y;ABC;abc; is valid Perl for a transliteration operator.  Digging into this, every time I thought I finally understood the full complexity of the brokenness, another case would pop up that didn't make sense.  The full horror of Perl syntax (http://perldoc.perl.org/perlop.html#Quote-and-Quote-like-Operators , for example) means that it is very difficult to treat non-Perl code as Perl and expect everything to be OK.  This is polyglot programming at its worst.
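The swallowing of message 2 can be imitated with a few lines of Python. This is only a rough sketch and not how xgettext's Perl lexer is implemented: `transliteration_span` is a hypothetical helper, and it assumes that a bare `y;` always starts a `y;SEARCH;REPLACE;` operator, which a real Perl lexer decides with much more context.

```python
import re

SAMPLE = '''gettext("xyzzy 1");
var x = y;
gettext("xyzzy 2");
var x = z;
gettext("xyzzy 3");
'''


def transliteration_span(source):
    """Return the text a Perl-style lexer could read as y;SEARCH;REPLACE;.

    Returns None when no complete y;...;...; span is present.
    """
    match = re.search(r"\by;", source)
    if match is None:
        return None
    end = match.end()
    # The operator needs two more ';' delimiters: one closing SEARCH,
    # one closing REPLACE.
    for _ in range(2):
        next_delim = source.find(";", end)
        if next_delim == -1:
            return None
        end = next_delim + 1
    return source[match.start():end]


swallowed = transliteration_span(SAMPLE)
# The span covers the second gettext() call and stops before the third.
print(repr(swallowed))
```

Everything inside the returned span is operator text as far as such a lexer is concerned, so no gettext() call in it is reported.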

This needs to be fixed.  To that end, I've written a Javascript lexer (https://bitbucket.org/ned/jslex) with the goal of using it to pre-process Javascript into a form more suitable for xgettext.  My understanding of why we claim Javascript is Perl is that Perl has regex literals like Javascript does, and so xgettext stands the best chance of parsing Javascript as Perl.  Clearly that's not working well.  My solution would instead remove the regex literals from the Javascript, and then have xgettext treat it as C.
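A minimal sketch of that pre-processing step, assuming the lexer yields (kind, text) pairs. The token kinds used here ("regex", "string", "id", "punct", "keyword", "ws"), the helper name `neutralize_regex_tokens`, and the `"REGEX"` placeholder are illustrative assumptions, not necessarily what jslex or any patch uses.

```python
def neutralize_regex_tokens(tokens, placeholder='"REGEX"'):
    """Rebuild source text, swapping regex literals for a C-safe string.

    `tokens` is an iterable of (kind, text) pairs from some JavaScript
    lexer; only tokens whose kind is "regex" are rewritten.
    """
    parts = []
    for kind, text in tokens:
        parts.append(placeholder if kind == "regex" else text)
    return "".join(parts)


# Hand-written token stream standing in for real lexer output.
TOKENS = [
    ("id", "gettext"), ("punct", "("), ("string", '"xyzzy 1"'),
    ("punct", ")"), ("punct", ";"), ("ws", "\n"),
    ("keyword", "var"), ("ws", " "), ("id", "re"), ("ws", " "),
    ("punct", "="), ("ws", " "),
    # A regex literal whose quote characters would confuse a C lexer.
    ("regex", "/it's \"quoted\"/g"), ("punct", ";"), ("ws", "\n"),
]

c_like_source = neutralize_regex_tokens(TOKENS)
print(c_like_source)
```

Because the placeholder is an ordinary double-quoted string, the rewritten text no longer contains anything that looks like an unterminated quote to a C lexer, and the gettext() calls are left untouched.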

I have a few questions you can help me with:

1. Is this the best path forward?  Ideally xgettext would support Javascript directly. There's code out there to add Javascript to xgettext, but I don't know what shape that code is in, or if it's reasonable to expect Django installations to use bleeding-edge xgettext.  Is there some better solution that someone is pursuing?

2. Is there some other badness that will bite us if we tell xgettext that the modified Javascript is C?  With a full Javascript lexer, I feel pretty confident that we could solve issues if they do come up, but I'd like to know now what they are.

3. I know that lexing Javascript is tricky.  I need help finding diabolical test cases for my lexer (https://bitbucket.org/ned/jslex).  Anyone care to come up with some Javascript source that it can't properly find the regex literals in?

BTW: This would close tickets #7704, #14045, #15331, and #15495.

--Ned.

Łukasz Rekucki

Apr 4, 2011, 5:45:02 PM
to django-d...@googlegroups.com
On 4 April 2011 23:15, Ned Batchelder <n...@nedbatchelder.com> wrote:
>
> I have a few questions you can help me with:
>
> 1. Is this the best path forward?  Ideally xgettext would support Javascript
> directly. There's code out there to add Javascript to xgettext, but I don't
> know what shape that code is in, or if it's reasonable to expect Django
> installations to use bleeding-edge xgettext.  Is there some better solution
> that someone is pursuing?

It would be best to have xgettext allow pluggable parsers and/or be
rewritten in Python. Oh wait, it's called Babel ;). I was planning to
refactor the `makemessages` command to enable custom parsers (my
motivation is having a bunch of client-side templates written in a
variant of Mustache or jQuery Templates). As Babel already has an
interface for that, integrating it into Django would be cool. Sure,
it's a dependency, but so is xgettext.

>
> 2. Is there some other badness that will bite us if we tell xgettext that
> the modified Javascript is C?  With a full Javascript lexer, I feel pretty
> confident that we could solve issues if they do come up, but I'd like to
> know now what they are.

IMHO, it would be best to leave only gettext() calls and pad them with
comments, so that line numbers match (the template converter does
something similar, by replacing all other stuff with strings like
ZZZZ). You can then choose C, Python or whatever :)
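The idea of keeping only the gettext() calls while preserving line numbers could look roughly like this. It is a sketch under assumptions: the (kind, text) token format and the `keep_only_gettext` helper are hypothetical, only the simplest `gettext("...")` call shape is recognised, and blank lines are used as padding instead of comments.

```python
def keep_only_gettext(tokens):
    """Keep gettext("...") calls and drop everything else.

    Newlines in dropped tokens are kept so each surviving call stays on
    its original line number.
    """
    out = []
    i = 0
    while i < len(tokens):
        window = tokens[i:i + 4]
        is_call = (
            [kind for kind, _ in window] == ["id", "punct", "string", "punct"]
            and [window[0][1], window[1][1], window[3][1]]
            == ["gettext", "(", ")"]
        )
        if is_call:
            out.append("".join(text for _, text in window) + ";")
            i += 4
        else:
            # Discard the token but keep any line breaks it contained.
            out.append("\n" * tokens[i][1].count("\n"))
            i += 1
    return "".join(out)


TOKENS = [
    ("keyword", "var"), ("ws", " "), ("id", "x"), ("ws", " "),
    ("punct", "="), ("ws", " "), ("id", "y"), ("punct", ";"), ("ws", "\n"),
    ("id", "gettext"), ("punct", "("), ("string", '"hi"'), ("punct", ")"),
    ("punct", ";"), ("ws", "\n"),
]

original = "".join(text for _, text in TOKENS)
stripped = keep_only_gettext(TOKENS)
print(stripped)
```

With nothing but the calls left, the choice of language passed to xgettext matters much less, since there is no other syntax for it to misread.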

>
> 3. I know that lexing Javascript is tricky.  I need help finding diabolical
> test cases for my lexer (https://bitbucket.org/ned/jslex).  Anyone care to
> come up with some Javascript source that it can't properly find the regex
> literals in?

I'll give it a spin on my code :) Thanks for doing this!

--
Łukasz Rekucki

Ned Batchelder

Apr 4, 2011, 6:41:53 PM
to django-d...@googlegroups.com, Łukasz Rekucki
On 4/4/2011 5:45 PM, Łukasz Rekucki wrote:
> On 4 April 2011 23:15, Ned Batchelder<n...@nedbatchelder.com> wrote:
>> I have a few questions you can help me with:
>>
>> 1. Is this the best path forward? Ideally xgettext would support Javascript
>> directly. There's code out there to add Javascript to xgettext, but I don't
>> know what shape that code is in, or if it's reasonable to expect Django
>> installations to use bleeding-edge xgettext. Is there some better solution
>> that someone is pursuing?
> It would be best to have xgettext allow pluggable parsers and/or be
> rewritten in Python. Oh wait, it's called Babel ;). I was planning to
> refactor the `makemessages` command to enable custom parsers (my
> motivation is having a bunch of client-side templates written in a
> variant of Mustache or jQuery Templates). As Babel already has an
> interface for that, integrating it into Django would be cool. Sure,
> it's a dependency, but so is xgettext.
I don't understand yet how Babel fits into the ecosystem, but if it's a
better xgettext, then that might be the way to go. I see a jslexer.py
in the source, I wish I'd seen that a few days ago! :)

>> 2. Is there some other badness that will bite us if we tell xgettext that
>> the modified Javascript is C? With a full Javascript lexer, I feel pretty
>> confident that we could solve issues if they do come up, but I'd like to
>> know now what they are.
> IMHO, it would be best to leave only gettext() calls and pad them with
> comments, so that line numbers match (the template converter does
> something similar, by replacing all other stuff with strings like
> ZZZZ). You can then choose C, Python or whatever :)
>
I saw your comment to that effect on one of the tickets. That approach
would probably also work. All of these approaches presuppose accurate
tokenization of the Javascript, which we can now do.

--Ned.

Jannis Leidel

Apr 4, 2011, 6:42:42 PM
to django-d...@googlegroups.com
On 04.04.2011, at 23:15, Ned Batchelder wrote:

> Last week I re-encountered the problems with using makemessages on Javascript files, and lost a couple of half-days to trying to figure out why some of my translatable messages weren't being found and deposited into my .po files. After fully understanding the extent of Django's current hack, I decided to take a stab at providing a better solution.
>
> Background: today, Javascript source files are parsed for messages by running a "pythonize" regex over them, and giving the resulting text to xgettext, claiming it is Perl. The pythonize regex simply changes any //-style comment on its own line into a #-style comment. This strange accommodation leaves a great deal of valid Javascript syntax in place to confuse the Perl parser in xgettext. As a result, seemingly innocuous Javascript will result in lost messages:
> gettext("xyzzy 1");
> var x = y;
> gettext("xyzzy 2");
> var x = z;
> gettext("xyzzy 3");
> In this sample, messages 1 and 3 are found, and message 2 is not, because y;ABC;abc; is valid Perl for a transliteration operator. Digging into this, every time I thought I finally understood the full complexity of the brokenness, another case would pop up that didn't make sense. The full horror of Perl syntax (http://perldoc.perl.org/perlop.html#Quote-and-Quote-like-Operators , for example) means that it is very difficult to treat non-Perl code as Perl and expect everything to be OK. This is polyglot programming at its worst.
>
> This needs to be fixed. To that end, I've written a Javascript lexer (https://bitbucket.org/ned/jslex) with the goal of using it to pre-process Javascript into a form more suitable for xgettext. My understanding of why we claim Javascript is Perl is that Perl has regex literals like Javascript does, and so xgettext stands the best chance of parsing Javascript as Perl. Clearly that's not working well. My solution would instead remove the regex literals from the Javascript, and then have xgettext treat it as C.

Thanks Ned, I meant to post about this issue here after 1.3, since we also talked about this during the PyCon sprint, especially since we seem to have hit a few more problems with the recent gettext 0.18.1.1 (such as a seemingly stricter Perl lexer) -- which I encountered while applying the final translation updates right before 1.3 but haven't had time to investigate yet. The bottom line is that I think we should rethink the way we look for translatable strings instead of working around the limitations of xgettext.

> 1. Is this the best path forward? Ideally xgettext would support Javascript directly. There's code out there to add Javascript to xgettext, but I don't know what shape that code is in, or if it's reasonable to expect Django installations to use bleeding-edge xgettext. Is there some better solution that someone is pursuing?

We can't really expect Django users to upgrade to the most recent (or even an unreleased) version of gettext. We bumped the minimum required version in Django 1.2 to 0.15 once all OSes were covered with installers. That made me talk to Armin Ronacher about using Babel instead of GNU gettext, since it has a JavaScript lexer and is in use in Sphinx and Trac. [1] In that sense, I wholeheartedly encourage you to take a stab at it for 1.4 -- if you think that's a good idea.

Having a Python based library (assuming it works similarly) seems like a better fit to Django than relying on a C program.

> 2. Is there some other badness that will bite us if we tell xgettext that the modified Javascript is C? With a full Javascript lexer, I feel pretty confident that we could solve issues if they do come up, but I'd like to know now what they are.

I feel this is much better solved once and for all than to keep misusing xgettext.

Jannis

1: http://babel.edgewall.org/browser/trunk/babel/messages/jslexer.py


Ned Batchelder

Apr 5, 2011, 7:39:37 AM
to django-d...@googlegroups.com, Jannis Leidel
Jannis and Łukasz have both suggested the same thing: use Babel instead
of xgettext. I understand why: it's a more complete solution than what I
have proposed, which is at heart still a way to trick xgettext into
parsing source code it doesn't natively understand.

I have no experience with Babel, so I don't know what work lies ahead to
integrate it with Django. I have used my code in makemessages, and it
works well.

Questions now include:

1) What can we get done in 1.3.1? Is integrating Babel something that
would have to wait for 1.4?

2) Who is the best expert on Babel and Django that could comment on the
work needed?

3) Are there other opinions about the two paths forward? Are there other
options?

I would like very much to get this problem solved.

--Ned.

Jonathan Slenders

Apr 5, 2011, 9:12:48 AM
to Django developers
How about being able to parse inline javascript? Like:

<body><script type="text/javascript"> alert(gettext("hello world!"));
</script></body>


That's much cleaner than using template tags and escape filters
(addslashes) like this:

<body><script type="text/javascript"> alert("{% filter escape_js %}{%
trans "hello world!" %}{% endfilter %}"); </script></body>


Even for external js-files: if they happen to be in the template
directory, it is possible to have a mix of template tags and gettext.
So whatever JavaScript parser you are using, it may be better to
allow for Django template tags appearing inside the JavaScript
(and to use a "mixed" grammar).
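As a rough illustration of the first half of that idea, the inline script bodies can be pulled out of a template with Python's standard html.parser module and then handed to whatever JavaScript extraction is in use. This is a sketch only: `InlineScriptCollector` is a hypothetical name, and any Django template tags inside the script are left in place, so the "mixed" grammar question is not addressed here.

```python
from html.parser import HTMLParser


class InlineScriptCollector(HTMLParser):
    """Collect the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so accumulate it.
        if self.in_script:
            self.scripts[-1] += data


TEMPLATE = (
    '<body><script type="text/javascript"> '
    'alert(gettext("hello world!"));\n</script></body>'
)

collector = InlineScriptCollector()
collector.feed(TEMPLATE)
collector.close()
print(collector.scripts)
```

Line numbers relative to the template file are lost in this form; a fuller version would need to record where each script started.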



Ned Batchelder

Apr 5, 2011, 10:31:37 AM
to django-d...@googlegroups.com, Jonathan Slenders
On 4/5/2011 9:12 AM, Jonathan Slenders wrote:
> How about being able to parse inline javascript? Like:
>
> <body><script type="text/javascript"> alert(gettext("hello world!"));
> </script></body>
>
>
> That's much cleaner than using template tags and escape filters
> (addslashes) like this:
>
> <body><script type="text/javascript"> alert("{% filter escape_js %}{%
> trans "hello world!" %}{% endfilter %}");</script></body>
>
That is much cleaner. I wasn't considering adding features to the
current process, merely fixing the egregious time-sucking bugs! I don't
know quite how message extraction would work in a mixed environment like
that...

--Ned.

Ned Batchelder

Apr 9, 2011, 10:14:19 AM
to django-d...@googlegroups.com
I've created two patches implementing this strategy, and attached them to ticket http://code.djangoproject.com/ticket/7704.

--Ned.

Jannis Leidel

Apr 9, 2011, 10:43:56 AM
to django-d...@googlegroups.com

On 09.04.2011, at 16:14, Ned Batchelder wrote:

> I've created two patches implementing this strategy, and attached them to ticket http://code.djangoproject.com/ticket/7704.

Thanks Ned, but I'm a bit confused. I thought we agreed that Babel should be the way to go, since adding our own JavaScript lexer to Django would further move us away from our goal to get rid of the dependency on xgettext.

Jannis

Ned Batchelder

Apr 9, 2011, 11:32:34 AM
to django-d...@googlegroups.com, Jannis Leidel
On 4/9/2011 10:43 AM, Jannis Leidel wrote:
> On 09.04.2011, at 16:14, Ned Batchelder wrote:
>
>> I've created two patches implementing this strategy, and attached them to ticket http://code.djangoproject.com/ticket/7704.
> Thanks Ned, but I'm a bit confused. I thought we agreed that Babel should be the way to go, since adding our own JavaScript lexer to Django would further move us away from our goal to get rid of the dependency on xgettext.
>
> Jannis
I understood that a few people including you expressed a preference for
integrating Babel. But I don't see any motion toward doing it, and it
brings its own questions, such as the relationship between Django and
Babel. Forgive me if I'm wrong, but it seemed like switching from
xgettext to Babel was not a simple proposition, and had its detractors.

For example, could we switch to Babel for 1.3.1? I'd very much like to
have this headache gone as soon as possible. The lexer I've attached to
the ticket works well, at least I'd like to hear from someone who
believes it doesn't. I understand the desire to move away from
xgettext, but it doesn't seem to be happening. No one had claimed the
three open tickets about this problem, for instance.

I apologize if I'm being brash or impatient here. What's the right way
forward?

--Ned.


Jannis Leidel

Apr 9, 2011, 11:57:29 AM
to django-d...@googlegroups.com

On 05.04.2011, at 13:39, Ned Batchelder wrote:

> Jannis and Łukasz have both suggested the same thing: use Babel instead of xgettext. I understand why: it's a more complete solution than what I have proposed, which is at heart still a way to trick xgettext into parsing source code it doesn't natively understand.
>
> I have no experience with Babel, so I don't know what work lies ahead to integrate it with Django. I have used my code in makemessages, and it works well.
>
> Questions now include:
>
> 1) What can we get done in 1.3.1? Is integrating Babel something that would have to wait for 1.4?

Both your lexer and Babel adoption would have to wait for 1.4.

> 2) Who is the best expert on Babel and Django that could comment on the work needed?

I talked to Armin Ronacher, who knows both Django and Babel reasonably well, being one of the developers of the latter.

> 3) Are there other opinions about the two paths forward? Are there other options?

Not as far as I can see.

Jannis Leidel

Apr 9, 2011, 11:57:33 AM
to django-d...@googlegroups.com
On 09.04.2011, at 17:32, Ned Batchelder wrote:

> On 4/9/2011 10:43 AM, Jannis Leidel wrote:
>> On 09.04.2011, at 16:14, Ned Batchelder wrote:
>>
>>> I've created two patches implementing this strategy, and attached them to ticket http://code.djangoproject.com/ticket/7704.
>> Thanks Ned, but I'm a bit confused. I thought we agreed that Babel should be the way to go, since adding our own JavaScript lexer to Django would further move us away from our goal to get rid of the dependency on xgettext.
>>
>> Jannis
> I understood that a few people including you expressed a preference for integrating Babel. But I don't see any motion toward doing it, and it brings its own questions, such as the relationship between Django and Babel. Forgive me if I'm wrong, but it seemed like switching from xgettext to Babel was not a simple proposition, and had its detractors.

1.3 has only been out for a few weeks, so it's no big surprise that we haven't produced any substantial patches for moving away from xgettext. But that doesn't mean that I (and, it seems, a few others) haven't researched the bits needed to make it happen. As for the "detractors", I'm not even sure what you mean by that.

> For example, could we switch to Babel for 1.3.1? I'd very much like to have this headache gone as soon as possible. The lexer I've attached to the ticket works well, at least I'd like to hear from someone who believes it doesn't.

No, Babel can't be adopted in the 1.3.X release branch, just like your JavaScript lexer.

> I understand the desire to move away from xgettext, but it doesn't seem to be happening. No one had claimed the three open tickets about this problem, for instance.

How can you say it's not happening? I clearly made a statement about my commitment to Babel adoption in the current release cycle.

Jannis

Ned Batchelder

Apr 9, 2011, 1:02:57 PM
to django-d...@googlegroups.com, Jannis Leidel
On 4/9/2011 11:57 AM, Jannis Leidel wrote:
> On 09.04.2011, at 17:32, Ned Batchelder wrote:
>
>> On 4/9/2011 10:43 AM, Jannis Leidel wrote:
>>> On 09.04.2011, at 16:14, Ned Batchelder wrote:
>>>
>>>> I've created two patches implementing this strategy, and attached them to ticket http://code.djangoproject.com/ticket/7704.
>>> Thanks Ned, but I'm a bit confused. I thought we agreed that Babel should be the way to go, since adding our own JavaScript lexer to Django would further move us away from our goal to get rid of the dependency on xgettext.
>>>
>>> Jannis
>> I understood that a few people including you expressed a preference for integrating Babel. But I don't see any motion toward doing it, and it brings its own questions, such as the relationship between Django and Babel. Forgive me if I'm wrong, but it seemed like switching from xgettext to Babel was not a simple proposition, and had its detractors.
> 1.3 has only been out for a few weeks, so it's no big surprise that we haven't produced any substantial patches for moving away from xgettext. But that doesn't mean that I (and, it seems, a few others) haven't researched the bits needed to make it happen. As for the "detractors", I'm not even sure what you mean by that.
I thought I had read some issues about integrating Babel, but I can't
find them now. It must have been a figment of my lexer-fevered brain,
my mistake. What are the bits needed to make it happen?

>> For example, could we switch to Babel for 1.3.1? I'd very much like to have this headache gone as soon as possible. The lexer I've attached to the ticket works well, at least I'd like to hear from someone who believes it doesn't.
> No, Babel can't be adopted in the 1.3.X release branch, just like your JavaScript lexer.

Even if Babel is the solution for 1.4, I don't understand why my patch
can't be applied to 1.3.x. It's well tested, and clearly works better
than the code in 1.3 now. It's transparent to the user, except it isn't
baffling like today's code. From my point of view, makemessages simply
doesn't work for Javascript files. It's a frustrating process that
usually ends with twisting your Javascript sources to meet the needs of
a Perl parser, but without understanding that that's what you're doing.

>> I understand the desire to move away from xgettext, but it doesn't seem to be happening. No one had claimed the three open tickets about this problem, for instance.
> How can you say it's not happening? I clearly made a statement about my commitment to Babel adoption in the current release cycle.

I was basing that on the fact that the tickets weren't claimed. Again,
I don't mean to antagonize anyone, I just want this to be fixed.

--Ned.

Ned Batchelder

Apr 14, 2011, 8:41:49 AM
to django-d...@googlegroups.com, Jannis Leidel
Does anyone else have any opinions on a direction forward to fix this
problem? At the very least I'd like to make a doc patch for 1.3.1 that
explains the fragility.

--Ned.

Łukasz Rekucki

Apr 14, 2011, 9:01:06 AM
to django-d...@googlegroups.com
On 14 April 2011 14:41, Ned Batchelder <n...@nedbatchelder.com> wrote:
> Does anyone else have any opinions on a direction forward to fix this
> problem?  At the very least I'd like to make a doc patch for 1.3.1 that
> explains the fragility.

I'm +1 on fixing it now, rather than later. The regex approach was a
bad idea in the first place and it never really worked - it's just
that no one noticed. Babel integration is the ultimate goal, but that
can't happen until 1.4, 'cause it's a new feature. I don't think this
patch moves us away from that goal. It just fixes a long existing bug
and doesn't touch any public APIs. I tested the patch on my code with
good results.


--
Łukasz Rekucki

Peter Portante

Apr 14, 2011, 9:07:30 AM
to django-d...@googlegroups.com
+1 from the Tabblo group remnants.

Jannis Leidel

Apr 14, 2011, 9:08:30 AM
to django-d...@googlegroups.com

Actually that's a good reason why fixing this in 1.3.X is a bad idea. If we have a better way to fix this properly in trunk then we shouldn't try adding a huge chunk of code to a release branch. IOW, I'm -1 on backporting the patch Ned proposed.

Jannis

Jannis Leidel

Apr 14, 2011, 9:26:18 AM
to django-d...@googlegroups.com
On 14.04.2011, at 14:41, Ned Batchelder wrote:

> Does anyone else have any opinions on a direction forward to fix this problem? At the very least I'd like to make a doc patch for 1.3.1 that explains the fragility.

Yeah, a doc patch sounds like a good plan.

Jannis

Ned Batchelder

Apr 14, 2011, 9:54:58 AM
to django-d...@googlegroups.com, Jannis Leidel

Frankly, I'm disappointed by this approach. Shall I draft the paragraph
that says roughly, "This feature of Django doesn't work and will fail
silently. Please find a third-party alternative."?

--Ned.

Jannis Leidel

Apr 14, 2011, 10:30:11 AM
to Ned Batchelder, django-d...@googlegroups.com

This is hardly a new issue (see the age of the tickets), so we should only clarify that using the recent gettext (0.18.1.1) won't work.

Having said that, are you at all interested in working on the Babel-based solution?

Jannis

Ned Batchelder

Apr 14, 2011, 11:17:11 AM
to django-d...@googlegroups.com, Jannis Leidel
Precisely because of the age of the tickets, we know this isn't limited
to recent gettext. The problem has been exacerbated by recent gettext,
but in fact, it's been a problem with older gettext as well.
Apostrophes in line comments have always caused issues, for example.
There is no version of gettext that works well for Javascript. This
feature of Django is simply broken. I expect as more users have gettext
0.18, we'll hear more about it, as you can already see from the recent
uptick in activity on these old tickets.
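The apostrophe case follows from the pre-processing described at the start of this thread, which only rewrites // comments that sit on their own line. The sketch below uses an assumed, simplified version of that "pythonize" regex (not copied from Django) plus a hypothetical helper, `risky_trailing_comments`, that lists the trailing comments left untouched. It is naive on purpose and would also trip over "//" inside string literals such as URLs.

```python
import re

# Assumed, simplified form of the "pythonize" step: a // comment that
# starts a line (after optional indentation) becomes a # comment.
PYTHONIZE_RE = re.compile(r"^[ \t]*//", re.MULTILINE)


def pythonize(js_source):
    return PYTHONIZE_RE.sub("#", js_source)


def risky_trailing_comments(js_source):
    """Return lines that still hold a // comment containing an apostrophe."""
    risky = []
    for line in pythonize(js_source).splitlines():
        _code, sep, comment = line.partition("//")
        if sep and "'" in comment:
            risky.append(line)
    return risky


SAMPLE = """// it's fine here: the whole line becomes a # comment
var a = 1; // don't expect this one to be rewritten
gettext("hello");
"""

for line in risky_trailing_comments(SAMPLE):
    print(line)
```

A trailing comment like the one on the second line reaches xgettext unchanged, apostrophe included, which is where a lexer for another language can start reading a string that was never meant to be one.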

I don't know what to write in the docs to explain to users that this
doesn't work in Django. 1.3 will be used for a very long time by many
people, I'd love to be able to tell them that Django can support
localization of Javascript text out of the box.

Are there any other committers (or even a BDFL) with an opinion about
this? We've gathered a -1 from a committer and three +1 (including
mine) from the community.

Having said all that, I am very interested in making this work well in
Django. If I can help with Babel, let me know.

--Ned.

Jacob Kaplan-Moss

Apr 14, 2011, 11:27:47 AM
to django-developers
Hey all --

I think I agree with Ned here: I can't see the downside to fixing it
on the release branch. "It violates our policy" doesn't count IMO:
it's *our* policy, and we get to break it if there's a good reason.
Making translation in JavaScript work is a good reason as I see it.

Jannis: can you speak a bit more to why you're against fixing this for
the 1.3 series? Is there a technical downside I'm missing?

Jacob

Jannis Leidel

Apr 14, 2011, 12:30:56 PM
to django-d...@googlegroups.com
On 14.04.2011, at 17:27, Jacob Kaplan-Moss wrote:

> I think I agree with Ned here: I can't see the downside to fixing it
> on the release branch. "It violates our policy" doesn't count IMO:
> it's *our* policy, and we get to break it if there's a good reason.
> Making translation in JavaScript work is a good reason as I see it.

FTR I agree we have an issue here, I just disagree with the proposed fix. If you think we can adopt Babel in the release branch, let's do it.

> Jannis: can you speak a bit more to why you're against fixing this for
> the 1.3 series? Is there a technical downside I'm missing?

Again, I'm not convinced that Ned's patch solves the actual problem: Django abusing the xgettext CLI tool to parse JavaScript files with its Perl lexer. He proposes to convert the content of JavaScript files with a custom-made lexer to a format that xgettext can understand as "C" instead. This -- while being a nifty piece of code -- seems like a hack to me. That is why I believe such code only adds more maintenance burden for us on top of the already fragile i18n system, and that it should be replaced with a proper JavaScript parser.

So I've proposed to adopt Babel, which is a proven, widely used system that implements many of the weird hacks we have in Django i18n cleanly in Python. As a bonus it would allow us to get rid of a binary dependency that was difficult to install on all platforms in the past anyway.

In other words, technically we're speaking of a bikeshed that has some holes in the roof and Ned is trying to fix them with new paint. IMHO, it needs to be reconstructed instead. Whether we do this in the release branch or in trunk I don't care.

Jannis

Łukasz Rekucki

Apr 14, 2011, 2:09:47 PM
to django-d...@googlegroups.com
On 14 April 2011 18:30, Jannis Leidel <lei...@gmail.com> wrote:
> On 14.04.2011, at 17:27, Jacob Kaplan-Moss wrote:
>
>> I think I agree with Ned here: I can't see the downside to fixing it
>> on the release branch. "It violates our policy" doesn't count IMO:
>> it's *our* policy, and we get to break it if there's a good reason.
>> Making translation in JavaScript work is a good reason as I see it.
>
> FTR I agree we have an issue here, I just disagree with the proposed fix. If you think we can adopt Babel in the release branch, let's do it.

Integrating Babel is much more work than replacing the regexp hack
with a lexer. There are also some decisions to make: do we bundle
Babel with Django? Should Django use Babel's locale interface (which
would be slower/faster ?) or just the gettext part?

>
>> Jannis: can you speak a bit more to why you're against fixing this for
>> the 1.3 series? Is there a technical downside I'm missing?
>
> Again, I'm not convinced that Ned's patch solves the actual problem: Django abusing the xgettext CLI tool to parse JavaScript files with its Perl lexer. He proposes to convert the content of JavaScript files with a custom-made lexer to a format that xgettext can understand as "C" instead. This -- while being a nifty piece of code -- seems like a hack to me. That is why I believe such code only adds more maintenance burden for us on top of the already fragile i18n system, and that it should be replaced with a proper JavaScript parser.

I agree that using xgettext like that is a hack, but it's one that
*can* work - and it works pretty well for Django's template language,
mostly because the translator uses a lexer. Using regular expressions
to "parse" JavaScript is a hack that's, well... You're Doing It
Wrong(tm).

> So I've proposed to adopt Babel, which is a proven, widely used system that implements many of the weird hacks we have in Django i18n cleanly in Python. As a bonus it would allow us to get rid of a binary dependency that was difficult to install on all platforms in the past anyway.

I think no one denies that. I'll be happy to help with that**. But
this probably won't happen today or tomorrow, so getting rid of one
broken hack would be great. If the core team decides that integrating
Babel in 1.3.1 is fine, then that's great news. If not, let's just make
it work.

>
> In other words, technically we're speaking of a bikeshed that has some holes in the roof and Ned is trying to fix them with new paint. IMHO, it needs to be reconstructed instead. Whether we do this in the release branch or in trunk I don't care.

Following this metaphor, it's autumn, it's raining and the materials
for a new roof won't be coming until late winter, so covering the
bikes with a tarp so they don't rust might be a good idea :P


**You mentioned talking to Armin Ronacher. Do you have any plan of
action or something ? :)

--
Łukasz Rekucki

Jannis Leidel

Apr 14, 2011, 3:47:59 PM
to django-d...@googlegroups.com
On 14.04.2011, at 20:09, Łukasz Rekucki wrote:

> On 14 April 2011 18:30, Jannis Leidel <lei...@gmail.com> wrote:
>> On 14.04.2011, at 17:27, Jacob Kaplan-Moss wrote:
>>
>>> I think I agree with Ned here: I can't see the downside to fixing it
>>> on the release branch. "It violates our policy" doesn't count IMO:
>>> it's *our* policy, and we get to break it if there's a good reason.
>>> Making translation in JavaScript work is a good reason as I see it.
>>
>> FTR I agree we have an issue here, I just disagree with the proposed fix. If you think we can adopt Babel in the release branch, let's do it.
>
> Integrating Babel is much more work than replacing the regexp hack
> with a lexer. There are also some decisions to make: do we bundle
> Babel with Django?

I'm not convinced it's much more work, since we really only need to
replace the string extraction code in makemessages and the
compilation code in compilemessages with calls to the equivalent code
in Babel. There is already a simple message extractor in BabelDjango
[1] that is a port of django.utils.translation.templatize which would
only need to be updated to the current version in Django, given the
few differences [2].

Other than that I don't see other hard requirements for Babel in Django,
which is why I'd suggest to not bundle Babel and check for its existence
when calling the commands, asking the developer to install it with one
of the package management tools. [3]
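Checking for an optional dependency at command time, rather than bundling it, might look something like the following. It is a minimal sketch using only the standard library; `require_module` and `MissingDependencyError` are hypothetical names and not part of Django or Babel.

```python
import importlib.util


class MissingDependencyError(RuntimeError):
    """Raised when an optional dependency needed by a command is absent."""


def require_module(name, install_hint):
    """Fail early, with an actionable message, if `name` is not importable."""
    if importlib.util.find_spec(name) is None:
        raise MissingDependencyError(
            "This command needs the %r package, which is not installed. "
            "Try: %s" % (name, install_hint)
        )


def run_extraction_command():
    # Hypothetical command entry point: check first, then do the work.
    require_module("babel", "pip install Babel")
    # ... message extraction would happen here ...
```

Only the commands that need the library pay for the check, so a site that never extracts or compiles messages would not have to install it.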

> Should Django use Babel's locale interface (which
> would be slower/faster ?) or just the gettext part?

The code in django.utils.translation.trans_real doesn't need to be
modified since it relies on Python's gettext module to load the
translations.

> You mentioned talking to Armin Ronacher. Do you have any plan of
> action or something ? :)

Not specifically, we only discussed what the level of integration between
the two systems could be and how the JavaScript lexer [4] works. Since
then I've spent a few hours reviewing Babel code but haven't produced
code yet.

Jannis


1: http://svn.edgewall.org/repos/babel/contrib/django/babeldjango/extract.py
2: https://gist.github.com/05a28232e63bc30277d5
3: http://pypi.python.org/pypi/Babel
http://packages.ubuntu.com/search?keywords=python-pybabel
http://packages.debian.org/search?keywords=python-pybabel
http://packages.gentoo.org/package/dev-python/Babel
http://www.rpmfind.net/linux/rpm2html/search.php?query=babel
http://www.freshports.org/devel/py-babel/
4: http://svn.edgewall.org/repos/babel/trunk/babel/messages/jslexer.py

Łukasz Rekucki

Apr 14, 2011, 5:40:58 PM
to django-d...@googlegroups.com, Jannis Leidel
Ok, I created a ticket to track the Babel integration:
http://code.djangoproject.com/ticket/15832

On 14 April 2011 21:47, Jannis Leidel <lei...@gmail.com> wrote:
>
> I'm not convinced it's much more work, since we really only need to
> replace the string extraction code in makemessages and the
> compilation code in compilemessages with calls to the equivalent code
> in Babel. There is already a simple message extractor in BabelDjango
> [1] that is a port of django.utils.translation.templatize which would
> only need to be updated to the current version in Django, given the
> few differences [2].

I guess we can just do it and see how long it takes :)

>> Should Django use Babel's locale interface (which
>> would be slower/faster ?) or just the gettext part?
>
> The code in django.utils.translation.trans_real doesn't need to be
> modified since it relies on Python's gettext module to load the
> translations.

By 'locale interface' I actually meant the CLDR. Django's
"localflavor/**/formats.py" mostly duplicates information provided by
Babel.

--
Łukasz Rekucki

Russell Keith-Magee

Apr 14, 2011, 11:40:55 PM
to django-d...@googlegroups.com

I'm a little uncertain where to throw my vote here.

I can appreciate Ned and Jacob's "pragmatic" approach -- this problem
exists, so an interim partial solution is better than nothing if it is
easy to apply.

However -- I can also see Jannis' point: Babel is the real solution
here, and any effort spent on maintaining the existing lexer is effort
that could be spent in fixing the problem properly with Babel.

My concern in accepting the "pragmatic" approach is whether it
actually *is* pragmatic. Completely independent of whether there is a
Babel solution coming soon, I don't have any feeling as to whether
Ned's proposed partial solution will introduce more problems than it
solves.

My gut feel tells me that Javascript is more like C than Perl, but
that's really nothing more than a gut feel, and the proposed patch
does much more than just say "treat it like C" -- it also involves
introducing a JavaScript lexer. We're proposing putting this into the
1.3 branch -- a stable branch -- without any significant testing.
We're essentially saying as a project that the JS lexer is stable
tested code, known to be better than the existing solution in all
conditions, with no significant regressions.

No offense intended to Ned, but I simply don't see sufficient evidence
to be confident that this is the case. 216 lines of tests doesn't
strike me as anything close to a broad enough test suite to validate
that the Javascript lexer will work under all conditions. Yes, the
Perl-based lexer is broken, but it's broken in known ways; we don't
have any experience to know where the flaws lie in the C-based lexer,
and once it's in Django's repo, we're committing to maintaining it and
all its flaws and foibles.

If the proposed patch was leaning on a well established lexer, or was
a simple configuration change (i.e., "treat it as C, not Perl") that
could be quickly demonstrated to fix a bunch of problems without
introducing any obvious new problems, I'd be all in favor of it as a
temporary solution. But that's not what is on the table -- it's a
complex body of code that we're proposing to evaluate, introduce,
maintain, and potentially deprecate very quickly.

I'm happy to be proven wrong on any of the points mentioned here --
for example, if someone can provide some mechanism that independently
demonstrates the robustness of the Javascript lexer, or evidence that
the risk of regressions is low. However, absent such evidence, I'm
inclined to side with Jannis and concentrate our efforts on Babel --
especially if, as Jannis suggests, a Babel-based solution isn't that
much work and could be knocked off easily with a short concentrated
effort.

Yours,
Russ Magee %-)

Ned Batchelder

Apr 15, 2011, 9:25:40 AM
to django-d...@googlegroups.com
I don't know why we are assuming Babel is stable, well established
technology. No offense intended to Babel, but its Javascript lexer has
only 89 lines of tests, so by your line-count criteria, Babel isn't
ready to depend on either. And an examination of the code shows that it
will mishandle at least one case (the regex /[/]/). Known bugs in my
patch: 0, known bugs in Babel: 1.
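
To show why /[/]/ is a tricky case, here is a minimal sketch of two ways
to find the end of a regex literal. It is not Babel's code and not
jslex's code; both function names are made up for illustration. The
first stops at the first unescaped slash, the second knows that a slash
inside a [...] character class does not end the literal.

```python
def naive_regex_end(src, start):
    """Find the end of a regex literal by stopping at the first unescaped '/'."""
    i = start + 1
    while i < len(src):
        if src[i] == "\\":
            i += 2          # skip the escaped character
            continue
        if src[i] == "/":
            return i + 1
        i += 1
    raise ValueError("unterminated regex literal")


def class_aware_regex_end(src, start):
    """Same scan, but a '/' inside [...] does not end the literal."""
    i = start + 1
    in_class = False
    while i < len(src):
        ch = src[i]
        if ch == "\\":
            i += 2          # skip the escaped character
            continue
        if ch == "[":
            in_class = True
        elif ch == "]":
            in_class = False
        elif ch == "/" and not in_class:
            return i + 1
        i += 1
    raise ValueError("unterminated regex literal")
```

On the input /[/]/.test(s) the naive scan ends the literal after "/[/",
which leaves "]/.test(s)" to be lexed as ordinary code, while the
class-aware scan consumes the whole "/[/]/".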

But this is an odd debate, there are really three solutions to evaluate:
the existing code, my patch, and Babel, and by any yardstick, the
existing code is badly broken.

I have no idea how widely used Babel is, but it can't be as wide as the
Gnu gettext utilities, so we'd have to evaluate those other components
of Babel as well. Are there edge cases in .po and .mo files that it
doesn't handle properly? I have no idea.

Keep in mind that the proposal is not to include Babel, but to depend on
it as a prerequisite, which means we are stuck in the same situation we
are with gettext: it can change independently of Django, and new
versions can introduce new bad behavior. That's one of the reasons we
have a bad problem today: gettext changed from 0.17 to 0.18, and
exacerbated the hack. Babel has the advantage that it is pure Python,
so it is both more installable than gettext, and is more readable for
us. It also has the advantage that it isn't based on a hack, but that
doesn't mean it performs flawlessly.

BTW: the Perl-based lexer is not broken in "known ways". I've never
looked at the gettext source, and have no idea what subset of Perl it
parses correctly, and I don't know Perl syntax well enough even to start
testing the tricky cases. And the bad behavior depends on the version
of gettext. A Django project that has meticulously twisted their
Javascript to avoid the "known problems" can then fail if used on a
system with a newer (or older) version of gettext.

> If the proposed patch was leaning on a well established lexer, or was
> a simple configuration change (i.e., "treat it as C, not Perl") that
> could be quickly demonstrated to fix a bunch of problems without
> introducing any obvious new problems, I'd be all in favor of it as a
> temporary solution. But that's not what is on the table -- it's a
> complex body of code that we're proposing to evaluate, introduce,
> maintain, and potentially deprecate very quickly.
>

While the patch I've submitted is certainly larger than a configuration
change, and is not a well-established lexer, I have "quickly
demonstrated that it fixes a bunch of problems without introducing any
obvious new problems", or at least, no one has come forward with a new
problem. I've paid a bounty on Stack Overflow for people to find
problems in the lexer itself, which they have done, and those problems
have been fixed.


> I'm happy to be proven wrong on any of the points mentioned here --
> for example, if someone can provide some mechanism that independently
> demonstrates the robustness of the Javascript lexer, or evidence that
> the risk of regressions is low. However, absent such evidence, I'm
> inclined to side with Jannis and concentrate our efforts on Babel --
> especially if, as Jannis suggests, a Babel-based solution isn't that
> much work and could be knocked off easily with a short concentrated
> effort.

I'd be glad to undertake the effort to demonstrate the robustness of the
Javascript lexer, if someone can tell me what that test would look
like. I've done the work to read the ECMAScript spec, I can certainly
do the work to write more tests. I've run some significant code through
the lexer (jQuery, for example), and it didn't result in any 'other'
tokens, and was properly synchronized at the end, though I can't say I
examined every token in the stream. If anyone has an idea how to more
thoroughly test a lexer, I'm all ears.
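
For what it's worth, the kind of check described above can be sketched
as below. toy_lex is a deliberately tiny stand-in lexer written for this
example (it is not jslex), and check_lexer is a hypothetical helper. It
flags any 'other' tokens and, as a rough stand-in for "properly
synchronized at the end", verifies that the tokens reassemble into the
original source.

```python
import re

# Toy token rules for illustration only; a real JavaScript lexer needs many more.
TOKEN_RULES = [
    ("ws", r"\s+"),
    ("string", r'"(?:[^"\\\n]|\\.)*"'),
    ("id", r"[A-Za-z_$][A-Za-z0-9_$]*"),
    ("num", r"\d+"),
    ("punct", r"[(){};=,.+\-*]"),
    ("other", r"."),          # catch-all: anything the rules above missed
]
MASTER = re.compile("|".join("(?P<%s>%s)" % rule for rule in TOKEN_RULES), re.S)


def toy_lex(text):
    """Yield (token_name, token_text) pairs covering the whole input."""
    for match in MASTER.finditer(text):
        yield match.lastgroup, match.group()


def check_lexer(lex, source):
    """Return a list of problems found when lexing `source` with `lex`."""
    tokens = list(lex(source))
    problems = []
    others = [tok for name, tok in tokens if name == "other"]
    if others:
        problems.append("unrecognised text: %r" % others)
    if "".join(tok for _, tok in tokens) != source:
        problems.append("tokens do not reassemble into the original source")
    return problems
```

Any callable that yields (name, text) pairs can be passed as lex, so the
same check could be pointed at a large real-world file and a real lexer.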

But keep in mind: that work will also have to be done for Babel. I'm
more than happy to contribute my 216 lines of tests to their 89 lines,
or to lend my new-found knowledge about the finer points of lexing
Javascript. We should decide on a set of acceptance criteria, because
it's clear that any solution we adopt will have to meet it.

As you say, Jannis has suggested that a Babel-based solution isn't that
much work. But that work hasn't been done yet. I don't know how much
work it is. It's going to be a larger change to the code base than my
patch is, at least it is if you properly consider that the Babel code is
part of the change, even if it isn't included in the patch. But we
don't have a Babel patch to consider, only the suggestion that it won't
be a big deal and it will work well. That suggestion remains to be proven.

Let me be clear: I don't care much if we use my patch or Babel. I just
want this problem fixed well. I put real work into my patch, but if it
isn't the fix, OK, I learned a lot and had fun doing it. If need be,
I'll package up my patch as a standalone app that adds a new management
command, makejsmessages, that does it right. Then I'll never have to
deal with it again. I just think it's bad that Django 1.3 does this
thing poorly, and I want Django to be the best it can be.

--Ned.

Russell Keith-Magee

Apr 15, 2011, 12:40:29 PM
to django-d...@googlegroups.com
On Fri, Apr 15, 2011 at 9:25 PM, Ned Batchelder <n...@nedbatchelder.com> wrote:
> On 4/14/2011 11:40 PM, Russell Keith-Magee wrote:
>>
>> No offense intended to Ned, but I simply don't see sufficient evidence
>> to be confident that this is the case. 216 lines of tests doesn't
>> strike me as anything close to a broad enough test suite to validate
>> that the Javascript lexer will work under all conditions. Yes, the
>> Perl-based lexer is broken, but it's broken in known ways; we don't
>> have any experience to know where the flaws lie in the C-based lexer,
>> and once it's in Django's repo, we're committing to maintaining it and
>> all its flaws and foibles.
>>
> I don't know why we are assuming Babel is stable, well established
> technology.  No offense intended to Babel, but its Javascript lexer has only
> 89 lines of tests, so by your line-count criteria, Babel isn't ready to
> depend on either.  And an examination of the code shows that it will
> mishandle at least one case (the regex /[/]/).  Known bugs in my patch: 0,
> known bugs in Babel: 1.

I'll be honest -- I have no specific reason to believe in Babel. I'm
going on the fact people who I trust when it comes to i18n (like
Jannis) have recommended it highly.

I'm also enthused at the prospect of having a better foundation to
build on. The gettext handling that we have is starting to look like
an increasingly fragile collection of hacks; there comes a point at
which that becomes a maintenance hassle, and we should step back and
fix the problem properly.

> But this is an odd debate, there are really three solutions to evaluate: the
> existing code, my patch, and Babel, and by any yardstick, the existing code
> is badly broken.

No argument here. This isn't a new situation, though -- the existing
code has been broken for a long time.

> I have no idea how widely used Babel is, but it can't be as wide as the Gnu
> gettext utilities, so we'd have to evaluate those other components of Babel
> as well.  Are there edge cases in .po and .mo files that it doesn't handle
> properly?  I have no idea.

Yes, gettext is widely used. And yet, it apparently doesn't have a
Javascript parser, which seems like a pretty stunning omission in the
modern world, and a glaring missing feature for a project like Django.

IMHO, a native Javascript parsing mode would seem like a much better
contribution (to the world, not just Django) than a set of
Django-specific hacks designed to cajole the C parser into working with
Javascript.

> Keep in mind that the proposal is not to include Babel, but to depend on it
> as a prerequisite, which means we are stuck in the same situation we are
> with gettext: it can change independently of Django, and new versions can
> introduce new bad behavior.  That's one of the reasons we have a bad problem
> today: gettext changed from 0.17 to 0.18, and exacerbated the hack.

This is true. However, what you're proposing is, IMHO, a slightly
worse situation.

Babel is a self contained tool. Assuming it works as advertised (and
I'll grant that is a big and important assumption), it is a self
contained body of code. It is a dependency, but as long as it
continues to work as advertised, we're fine. We're only dealing with
its advertised interface, and working with that interface in the way
it was intended to be worked with.

On the other hand, gettext is also a dependency, and gettext can also
change between releases -- but we're not using it as intended. We're
bending the Perl parser (or, in your case, the C parser) in strange
and unusual ways to do something it wasn't originally intended to do.
Something completely innocuous can change in gettext, and the
follow-on effect to us can be huge because we've built our castle on
an unstable foundation.

The maintenance issue is the critical part here. My hesitation isn't
just to do with the suitability of your code *right now*. It's to do
with the fact that once we adopt the code into trunk, we are
agreeing to maintain it. Bits don't rot, but gettext has already
demonstrated that it changes between versions, so it's reasonable to
assume that when gettext 0.19 is released (whenever that happens),
we'll need to make changes to our Javascript parser. By taking on the
lexer, we're absorbing into Django a whole bunch of project
responsibility that frankly, I'd rather we didn't have.

> Babel
> has the advantage that it is pure Python, so it is both more installable
> than gettext, and is more readable for us.  It also has the advantage that
> it isn't based on a hack, but that doesn't mean it performs flawlessly.

I'm not saying it does. But presumably Babel 0.9 works better than
0.8, and 0.10 will work better than 0.9, and so on. If a high profile
project like Django uses it, presumably this improvement will happen
faster by virtue of the extra attention. If a problem is found, we can
direct that fix upstream, instead of falling victim to NIH and making
everything a problem that Django needs to fix.

Improvements in gettext don't follow on the same way -- after all,
gettext is busy fixing the C parser, and an improvement in the C
parser may not serve the needs of our Javascript parser. If there's a
problem with the Django's Javascript lexer, that's Django's problem
alone, and there's no broader community that will help to make it
better.

> BTW: the Perl-based lexer is not broken in "known ways".  I've never looked
> at the gettext source, and have no idea what subset of Perl it parses
> correctly, and I don't know Perl syntax well enough even to start testing
> the tricky cases.  And the bad behavior depends on the version of gettext.
>  A Django project that has meticulously twisted their Javascript to avoid
> the "known problems" can then fail if used on a system with a newer (or
> older) version of gettext.

When I said "broken in known ways", I mostly meant that it was prima
facie broken. However, this isn't a recent development -- the fact
that your patch is attached to ticket #7704 is evidence of that.

>> If the proposed patch was leaning on a well established lexer, or was
>> a simple configuration change (i.e., "treat it as C, not Perl") that
>> could be quickly demonstrated to fix a bunch of problems without
>> introducing any obvious new problems, I'd be all in favor of it as a
>> temporary solution. But that's not what is on the table -- it's a
>> complex body of code that we're proposing to evaluate, introduce,
>> maintain, and potentially deprecate very quickly.
>>
> While the patch I've submitted is certainly larger than a configuration
> change, and is not a well-established lexer, I have "quickly demonstrated
> that it fixes a bunch of problems without introducing any obvious new
> problems", or at least, no one has come forward with a new problem.  I've
> paid a bounty on Stack Overflow for people to find problems in the lexer
> itself, which they have done, and those problems have been fixed.

That sort of evidence certainly works in your favor -- I wasn't
aware that this sort of testing had taken place.

>> I'm happy to be proven wrong on any of the points mentioned here --
>> for example, if someone can provide some mechanism that independently
>> demonstrates the robustness of the Javascript lexer, or evidence that
>> the risk of regressions is low. However, absent such evidence, I'm
>> inclined to side with Jannis and concentrate our efforts on Babel --
>> especially if, as Jannis suggests, a Babel-based solution isn't that
>> much work and could be knocked off easily with a short concentrated
>> effort.
>
> I'd be glad to undertake the effort to demonstrate the robustness of the
> Javascript lexer, if someone can tell me what that test would look like.
>  I've done the work to read the ECMAScript spec, I can certainly do the work
> to write more tests.  I've run some significant code through the lexer
> (jQuery, for example), and it didn't result in any 'other' tokens, and was
> properly synchronized at the end, though I can't say I examined every token
> in the stream.  If anyone has an idea how to more thoroughly test a lexer,
> I'm all ears.

Again -- this is good evidence, and something that hasn't (AFAICT)
been stated previously in this forum.

> But keep in mind: that work will also have to be done for Babel.  I'm more
> than happy to contribute my 216 lines of tests to their 89 lines, or to lend
> my new-found knowledge about the finer points of lexing Javascript.  We
> should decide on a set of acceptance criteria, because it's clear that any
> solution we adopt will have to meet it.
>
> As you say, Jannis has suggested that a Babel-based solution isn't that much
> work.  But that work hasn't been done yet.  I don't know how much work it
> is.  It's going to be a larger change to the code base than my patch is, at
> least it is if you properly consider that the Babel code is part of the
> change, even if it isn't included in the patch.  But we don't have a Babel
> patch to consider, only the suggestion that it won't be a big deal and it
> will work well.  That suggestion remains to be proven.

Again, can't argue with this. My hope is that this discussion will
kickstart a serious effort on Babel integration.

> Let me be clear: I don't care much if we use my patch or Babel.  I just want
> this problem fixed well.  I put real work into my patch, but if it isn't the
> fix, OK, I learned a lot and had fun doing it.  If need be, I'll package up
> my patch as a standalone app that adds a new management command,
> makejsmessages, that does it right.  Then I'll never have to deal with it
> again.  I just think it's bad that Django 1.3 does this thing poorly, and I
> want Django to be the best it can be.

I have the same goal. I'd like to see this collection of bugs fixed.
However, it's not a new problem, either, so while I would like to see
this problem fixed, I'd rather address it properly, rather than
quickly.

I'd rather see the effort put into Babel integration so we can
evaluate if it will solve the problem properly; if it turns out it
doesn't, then we can always apply your patch as the "best of a bad
bunch of options" option.

Yours,
Russ Magee %-)

Ned Batchelder

Apr 17, 2011, 6:14:53 PM
to django-d...@googlegroups.com, Russell Keith-Magee
On 4/15/2011 12:40 PM, Russell Keith-Magee wrote:
> On Fri, Apr 15, 2011 at 9:25 PM, Ned Batchelder<n...@nedbatchelder.com> wrote:
>> On 4/14/2011 11:40 PM, Russell Keith-Magee wrote:
>
>> Keep in mind that the proposal is not to include Babel, but to depend on it
>> as a prerequisite, which means we are stuck in the same situation we are
>> with gettext: it can change independently of Django, and new versions can
>> introduce new bad behavior. That's one of the reasons we have a bad problem
>> today: gettext changed from 0.17 to 0.18, and exacerbated the hack.
> This is true. However, what you're proposing is, IMHO, a slightly
> worse situation.
>
> Babel is a self contained tool. Assuming it works as advertised (and
> I'll grant that is a big and important assumption), it is a self
> contained body of code. It is a dependency, but as long as it
> continues to work as advertised, we're fine. We're only dealing with
> its advertised interface, and working with that interface in the way
> it was intended to be worked with.
>
> On the other hand, gettext is also a dependency, and gettext can also
> change between releases -- but we're not using it as intended. We're
> bending the Perl parser (or, in your case, the C parser) in strange
> and unusual ways to do something it wasn't originally intended to do.
> Something completely innocuous can change in gettext, and the
> follow-on effect to us can be huge because we've built our castle on
> an unstable foundation.
>
I'm not much concerned that the C parsing in gettext will change
significantly, but I take your point, it certainly could change behind
our backs and we could be broken again.

> The maintenance issue is the critical part here. My hesitation isn't
> just to do with the suitability of your code *right now*. It's to do
> with the fact that once we adopt the code into trunk, we are
> agreeing to maintain it. Bits don't rot, but gettext has already
> demonstrated that it changes between versions, so it's reasonable to
> assume that when gettext 0.19 is released (whenever that happens),
> we'll need to make changes to our Javascript parser. By taking on the
> lexer, we're absorbing into Django a whole bunch of project
> responsibility that frankly, I'd rather we didn't have.
>
>> Babel
>> has the advantage that it is pure Python, so it is both more installable
>> than gettext, and is more readable for us. It also has the advantage that
>> it isn't based on a hack, but that doesn't mean it performs flawlessly.
>

Sorry, I also wrote about this journey on my blog
(http://nedbatchelder.com/blog/201104/a_javascript_lexer_in_python_and_the_saga_behind_it.html),
and had lost track of which details went where.


>> But keep in mind: that work will also have to be done for Babel. I'm more
>> than happy to contribute my 216 lines of tests to their 89 lines, or to lend
>> my new-found knowledge about the finer points of lexing Javascript. We
>> should decide on a set of acceptance criteria, because it's clear that any
>> solution we adopt will have to meet it.
>>
>> As you say, Jannis has suggested that a Babel-based solution isn't that much
>> work. But that work hasn't been done yet. I don't know how much work it
>> is. It's going to be a larger change to the code base than my patch is, at
>> least it is if you properly consider that the Babel code is part of the
>> change, even if it isn't included in the patch. But we don't have a Babel
>> patch to consider, only the suggestion that it won't be a big deal and it
>> will work well. That suggestion remains to be proven.
> Again, can't argue with this. My hope is that this discussion will
> kickstart a serious effort on Babel integration.
>

I'm looking forward to that too, and will help where I can.

--Ned.

Jonathan Slenders

Apr 19, 2011, 9:35:03 AM
to Django developers
A related question.

How should we treat escape characters when the msgid is generated?
Is the escaping backslash assumed to be part of the translation?
And is this behaviour consistent between Babel and the current parser?

gettext("xy\"zzy 3");

In my opinion, we should completely unescape the gettext string before
doing the translation, and escape it again afterwards.
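
A minimal sketch of what I mean, assuming the catalog is just a dict
from msgid to translation. The helper names (js_unescape, js_escape,
translate_literal) are hypothetical, and only a few escapes are handled.

```python
import re

# Only a handful of JavaScript escapes are handled; \uXXXX, \xXX, and the
# rest are deliberately left out of this sketch.
_UNESCAPES = {"n": "\n", "t": "\t", '"': '"', "'": "'", "\\": "\\"}
_ESCAPES = {"\n": "\\n", "\t": "\\t", '"': '\\"', "\\": "\\\\"}


def js_unescape(body):
    """Turn the body of a JS string literal (no quotes) into its plain text."""
    return re.sub(r"\\(.)", lambda m: _UNESCAPES.get(m.group(1), m.group(1)), body)


def js_escape(text):
    """Turn plain text back into the body of a double-quoted JS string."""
    return "".join(_ESCAPES.get(ch, ch) for ch in text)


def translate_literal(body, catalog):
    """Look up the unescaped msgid, then escape the translation again."""
    msgid = js_unescape(body)
    return js_escape(catalog.get(msgid, msgid))
```

With this approach the msgid for the example above is xy"zzy 3, without
the backslash, so translators never see JavaScript escaping.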


I'm asking this because I'm going to implement gettext preprocessing
in the template preprocessor and I'd like all implementations to be
compatible.
Basically, when this works, the i18n catalog for javascript is no
longer required, even for external files.
https://github.com/citylive/django-template-preprocessor

cheers,
Jonathan
> (http://nedbatchelder.com/blog/201104/a_javascript_lexer_in_python_and...),

Łukasz Rekucki

Apr 19, 2011, 9:50:12 AM
to django-d...@googlegroups.com
On 19 April 2011 15:35, Jonathan Slenders <jonathan...@gmail.com> wrote:
>
> Basically, when this works, the i18n catalog for javascript is no
> longer required, even for external files.
> https://github.com/citylive/django-template-preprocessor
>

Could you elaborate on that ? How does your application help me handle
client-side translations ?

--
Łukasz Rekucki

Jonathan Slenders

Apr 19, 2011, 10:08:49 AM
to Django developers
Hi Lukasz,

It does not yet generate .po files. While I haven't yet had any plans
for doing this, it's not really complex at all to make this possible
through the preprocessor.

What it currently does is parse and process templates and
external css/js files. It does a lot of optimizations in order to gain
better performance at runtime, and one of the optimizations is to
preprocess all the translations.

Currently {% trans %} and {% blocktrans %} are preprocessed, even with
support for variables and {% plural %}. HTML, javascript and CSS are
also parsed, so now it's not too much work to preprocess gettext() in
javascript files as well.

The output is a 'cache' directory where the processed templates are
stored (one subdir for each language), and a cache directory for the
media/static files.

With this, using gettext() in internal javascript files instead of
the ugly {% trans %} within quotes would no longer have any disadvantages.

And while everything is processed, it's rather easy to log all the
strings which are to be translated on the way.

Cheers,
Jonathan


On 19 Apr, 15:50, Łukasz Rekucki <lreku...@gmail.com> wrote:

Aron Griffis

Apr 24, 2011, 9:16:55 PM
to django-d...@googlegroups.com
On Thursday, April 14, 2011 12:30:56 PM UTC-4, Jannis Leidel wrote:
On 14.04.2011, at 17:27, Jacob Kaplan-Moss wrote:

> I think I agree with Ned here: I can't see the downside to fixing it
> on the release branch. "It violates our policy" doesn't count IMO:
> it's *our* policy, and we get to break it if there's a good reason.
> Making translation in JavaScript work is a good reason as I see it.

FTR I agree we have an issue here, I just disagree with the proposed fix. If you think we can adopt Babel in the release branch, let's do it.

I haven't posted here before, or made any contribution to Django thus far, so I recognize in advance that my opinion isn't worth much. However I'm a Django user who would benefit from a solution to the Javascript messages problem, so I'd like to register my voice. :-)

I'm in favor of Ned's patch for 1.3.x then switching to Babel in 1.4.  As I see it, there are two points in favor of Ned's patch: readiness and risk. Regarding readiness, Ned's patch is drop-in and the Babel work hasn't been done yet. Regarding risk, Ned's patch fixes known-terrible-brokenness in Javascript message parsing without introducing much risk, whereas switching to Babel carries some (perhaps minor) risk of regressing existing 1.3 installations so it deserves soaking in the development tree for a while.  It's interesting to me that Ned pointed out some JS syntax that Babel misparses but Ned's jslex gets right, suggesting that Babel may still need work too.

Thanks,
Aron

Jannis Leidel

May 29, 2011, 5:40:11 AM
to django-d...@googlegroups.com

On 15.04.2011, at 15:25, Ned Batchelder wrote:

Hi Ned,

> As you say, Jannis has suggested that a Babel-based solution isn't that much work. But that work hasn't been done yet. I don't know how much work it is. It's going to be a larger change to the code base than my patch is, at least it is if you properly consider that the Babel code is part of the change, even if it isn't included in the patch. But we don't have a Babel patch to consider, only the suggestion that it won't be a big deal and it will work well. That suggestion remains to be proven.

I have good and some bad news. Last week I tried to put my money where my mouth
is and dived into Babel to replace the xgettext calls in makemessages with the
appropriate calls to Babel's message extraction/compilation functions. I've
found it to be pretty easy to get started at first but stumbled over differences
in the way the message catalogue update process works.

Unlike Django, it doesn't easily allow you to "look for new strings and update
PO file" but needs a separate step to update an original POT, which then has to be
manually merged with the existing PO file. Given the amount of glue code I had to
write to work around it, I'm no longer convinced that the adoption of Babel
would be a sensible fix to the Javascript message extraction problems.

That said, I'd be happy to know whether there are any updates with regard to your
Javascript lexer that I need to consider before merging your patches.

Jannis

Ned Batchelder

Jun 2, 2011, 10:15:45 PM
to django-d...@googlegroups.com, Jannis Leidel
On 5/29/2011 5:40 AM, Jannis Leidel wrote:
> On 15.04.2011, at 15:25, Ned Batchelder wrote:
>
> Hi Ned,
>
>> As you say, Jannis has suggested that a Babel-based solution isn't that much work. But that work hasn't been done yet. I don't know how much work it is. It's going to be a larger change to the code base than my patch is, at least it is if you properly consider that the Babel code is part of the change, even if it isn't included in the patch. But we don't have a Babel patch to consider, only the suggestion that it won't be a big deal and it will work well. That suggestion remains to be proven.
> I have good and some bad news. Last week I tried to put my money where my mouth
> is and dived into Babel to replace the xgettext calls in makemessages with the
> appropriate calls to Babel's message extraction/compilation functions. I've
> found it to be pretty easy to get started at first but stumbled over differences
> in the way the message catalogue update process works.
>
> Unlike Django, it doesn't easily allow you to "look for new strings and update
> PO file" but needs a separate step to update an original POT, which then has to be
> manually merged with the existing PO file. Given the amount of glue code I had to
> write to work around it, I'm no longer convinced that the adoption of Babel
> would be a sensible fix to the Javascript message extraction problems.
>
I'm sorry to hear Babel didn't work out, it would have been liberating
to have a tool entirely within our grasp.

> That said, I'd be happy to know whether there are any updates with regard to your
> Javascript lexer that I need to consider before merging your patches.
There were two fixes to jslex.py, I've added an updated patch to ticket
#7704. Let me know what I can do to help with the merge. Thanks.

--Ned.

> Jannis
>

Jannis Leidel

Jun 3, 2011, 12:26:37 PM
to django-d...@googlegroups.com

Great, thank you, I'll try to get this in during next week's DjangoCon sprints.

Just out of curiosity though, is the update basically
https://bitbucket.org/ned/jslex/changeset/2270dbe90afe and
https://bitbucket.org/ned/jslex/changeset/4f4bc539f533?

Jannis

Ned Batchelder

Jun 3, 2011, 12:41:49 PM
to django-d...@googlegroups.com, Jannis Leidel
Yes, that's exactly what it is.

--Ned.
> Jannis
>

Peter Portante

Jun 3, 2011, 3:31:18 PM
to django-d...@googlegroups.com
FWIW: we are successfully using Ned's fix on top of 1.2.5 today. -peter