> gettext("xyzzy 1");
> var x = y;
> gettext("xyzzy 2");
> var x = z;
> gettext("xyzzy 3");
> In this sample, messages 1 and 3 are found, and message 2 is not, because y;ABC;abc; is valid Perl for a transliteration operator. Digging into this, every time I thought I finally understood the full complexity of the brokenness, another case would pop up that didn't make sense. The full horror of Perl syntax (http://perldoc.perl.org/perlop.html#Quote-and-Quote-like-Operators , for example) means that it is very difficult to treat non-Perl code as Perl and expect everything to be OK. This is polyglot programming at its worst.
It would be best to have xgettext allow pluggable parsers and/or be
rewritten in Python. Oh wait, it's called Babel ;). I was planning to
refactor the `makemessages` command to enable custom parsers (my
motivation is having a bunch of client-side templates written in a
variant of Mustache or jQuery Templates). As Babel already has an
interface for that, integrating it into Django would be cool. Sure,
it's a dependency, but so is xgettext.
>
> 2. Is there some other badness that will bite us if we tell xgettext that
> the modified Javascript is C? With a full Javascript lexer, I feel pretty
> confident that we could solve issues if they do come up, but I'd like to
> know now what they are.
IMHO, it would be best to leave only the gettext() calls and pad them
with comments, so that line numbers match (the template converter does
something similar, by replacing all other stuff with strings like
ZZZZ). You can then choose C, Python or whatever :)
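The padding idea above can be sketched roughly as follows. This is only an illustration: `pad_calls`, its `filler` parameter, and the `(lineno, text)` input format are hypothetical, and the sketch assumes a lexer has already located the gettext() calls and their line numbers.

```python
def pad_calls(calls, total_lines, filler=""):
    """Rebuild a file that contains only the translatable calls.

    calls: iterable of (lineno, text) pairs with 1-based line numbers,
           assumed to come from a real JavaScript lexer (hypothetical input).
    filler: text used for every other line, e.g. "" or a comment in
            whatever language xgettext is told to expect.
    """
    by_line = {}
    for lineno, text in calls:
        # Several calls found on one source line stay on that line.
        by_line.setdefault(lineno, []).append(text)

    out = []
    for lineno in range(1, total_lines + 1):
        out.append(" ".join(by_line[lineno]) if lineno in by_line else filler)
    return "\n".join(out) + "\n"


calls = [
    (1, 'gettext("xyzzy 1");'),
    (3, 'gettext("xyzzy 2");'),
    (5, 'gettext("xyzzy 3");'),
]
padded = pad_calls(calls, total_lines=5)
```

Because every call stays on its original line, the line references that end up in the .po file would still point at the right place in the real source.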
>
> 3. I know that lexing Javascript is tricky. I need help finding diabolical
> test cases for my lexer (https://bitbucket.org/ned/jslex). Anyone care to
> come up with some Javascript source that it can't properly find the regex
> literals in?
I'll give it a spin on my code :) Thanks for doing this!
--
Łukasz Rekucki
--Ned.
> Last week I re-encountered the problems with using makemessages on Javascript files, and lost a couple of half-days to trying to figure out why some of my translatable messages weren't being found and deposited into my .po files. After fully understanding the extent of Django's current hack, I decided to take a stab at providing a better solution.
>
> Background: today, Javascript source files are parsed for messages by running a "pythonize" regex over them, and giving the resulting text to xgettext, claiming it is Perl. The pythonize regex simply changes any //-style comment on its own line into a #-style comment. This strange accommodation leaves a great deal of valid Javascript syntax in place to confuse the Perl parser in xgettext. As a result, seemingly innocuous Javascript will result in lost messages:
> gettext("xyzzy 1");
> var x = y;
> gettext("xyzzy 2");
> var x = z;
> gettext("xyzzy 3");
> In this sample, messages 1 and 3 are found, and message 2 is not, because y;ABC;abc; is valid Perl for a transliteration operator. Digging into this, every time I thought I finally understood the full complexity of the brokenness, another case would pop up that didn't make sense. The full horror of Perl syntax (http://perldoc.perl.org/perlop.html#Quote-and-Quote-like-Operators , for example) means that it is very difficult to treat non-Perl code as Perl and expect everything to be OK. This is polyglot programming at its worst.
>
> This needs to be fixed. To that end, I've written a Javascript lexer (https://bitbucket.org/ned/jslex) with the goal of using it to pre-process Javascript into a form more suitable for xgettext. My understanding of why we claim Javascript is Perl is that Perl has regex literals like Javascript does, and so xgettext stands the best chance of parsing Javascript as Perl. Clearly that's not working well. My solution would instead remove the regex literals from the Javascript, and then have xgettext treat it as C.
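For readers unfamiliar with the "pythonize" step described in the quoted background, here is a minimal sketch of that kind of transformation. The regex is an assumption written for illustration, not a copy of the code in Django.

```python
import re

# Assumed form of the "pythonize" step: a //-comment that sits on its own
# line (after optional indentation) is turned into a #-comment.
PYTHONIZE_RE = re.compile(r"^([ \t]*)//", re.MULTILINE)


def pythonize(src):
    """Rewrite whole-line // comments as # comments; leave the rest alone."""
    return PYTHONIZE_RE.sub(r"\1#", src)


sample = (
    'gettext("xyzzy 1");\n'
    "// a comment\n"
    "var x = y; // trailing comment\n"
    'gettext("xyzzy 2");\n'
)
converted = pythonize(sample)
```

Only the whole-line comment changes. Everything else, including `var x = y;`, is handed to the Perl parser untouched, which is where the lost messages come from.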
Thanks Ned, I meant to post about this issue here after 1.3, since we also talked about this during the PyCon sprint, especially since we seem to have hit a few more problems with the recent gettext 0.18.1.1 (such as a seemingly stricter Perl lexer) -- which I encountered while applying the final translation updates right before 1.3 but haven't had time to investigate yet. The bottom line is that I think we should rethink the way we look for translatable strings instead of working around the limitations of xgettext.
> 1. Is this the best path forward? Ideally xgettext would support Javascript directly. There's code out there to add Javascript to xgettext, but I don't know what shape that code is in, or if it's reasonable to expect Django installations to use bleeding-edge xgettext. Is there some better solution that someone is pursuing?
We can't really expect Django users to upgrade to the most recent (or even an unreleased) version of gettext. We bumped the minimum required version to 0.15 in Django 1.2 only once all OSes were covered with installers. That made me talk to Armin Ronacher about using Babel instead of GNU gettext, since it has a JavaScript lexer and is in use in Sphinx and Trac. [1] In that sense, I wholeheartedly encourage you to take a stab at it for 1.4 -- if you think that's a good idea.
Having a Python based library (assuming it works similarly) seems like a better fit to Django than relying on a C program.
> 2. Is there some other badness that will bite us if we tell xgettext that the modified Javascript is C? With a full Javascript lexer, I feel pretty confident that we could solve issues if they do come up, but I'd like to know now what they are.
I feel this is much better solved once and for all than to keep misusing xgettext.
Jannis
1: http://babel.edgewall.org/browser/trunk/babel/messages/jslexer.py
I have no experience with Babel, so I don't know what work lies ahead to
integrate it with Django. I have used my code in makemessages, and it
works well.
Questions now include:
1) What can we get done in 1.3.1? Is integrating Babel something that
would have to wait for 1.4?
2) Who is the best expert on Babel and Django that could comment on the
work needed?
3) Are there other opinions about the two paths forward? Are there other
options?
I would like very much to get this problem solved.
--Ned.
--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-d...@googlegroups.com.
To unsubscribe from this group, send email to django-develop...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
> I've created two patches implementing this strategy, and attached them to ticket http://code.djangoproject.com/ticket/7704.
Thanks Ned, but I'm a bit confused. I thought we agreed that Babel should be the way to go, since adding our own JavaScript lexer to Django would move us further away from our goal of getting rid of the dependency on xgettext.
Jannis
For example, could we switch to Babel for 1.3.1? I'd very much like to
have this headache gone as soon as possible. The lexer I've attached to
the ticket works well, at least I'd like to hear from someone who
believes it doesn't. I understand the desire to move away from
xgettext, but it doesn't seem to be happening. No one had claimed the
three open tickets about this problem, for instance.
I apologize if I'm being brash or impatient here. What's the right way
forward?
--Ned.
> Jannis and Łukasz have both suggested the same thing: use Babel instead of xgettext. I understand why: it's a more complete solution than what I have proposed, which is at heart still a way to trick xgettext into parsing source code it doesn't natively understand.
>
> I have no experience with Babel, so I don't know what work lies ahead to integrate it with Django. I have used my code in makemessages, and it works well.
>
> Questions now include:
>
> 1) What can we get done in 1.3.1? Is integrating Babel something that would have to wait for 1.4?
Both your lexer and Babel adoption would have to wait for 1.4.
> 2) Who is the best expert on Babel and Django that could comment on the work needed?
I talked to Armin Ronacher, who knows both Django and Babel reasonably well, being one of the developers of the latter.
> 3) Are there other opinions about the two paths forward? Are there other options?
Not as far as I can see.
> On 4/9/2011 10:43 AM, Jannis Leidel wrote:
>> On 09.04.2011, at 16:14, Ned Batchelder wrote:
>>
>>> I've created two patches implementing this strategy, and attached them to ticket http://code.djangoproject.com/ticket/7704.
>> Thanks Ned, but I'm a bit confused. I thought we agreed that Babel should be the way to go, since adding our own JavaScript lexer to Django would move us further away from our goal of getting rid of the dependency on xgettext.
>>
>> Jannis
> I understood that a few people including you expressed a preference for integrating Babel. But I don't see any motion toward doing it, and it brings its own questions, such as the relationship between Django and Babel. Forgive me if I'm wrong, but it seemed like switching from xgettext to Babel was not a simple proposition, and had its detractors.
1.3 has only been out for a few weeks, so it's no big surprise that we haven't produced any substantial patches for moving away from xgettext. But that doesn't mean that I (and it seems a few others) haven't researched the bits needed to make it happen. As for the "detractors", I'm not even sure what you mean by that.
> For example, could we switch to Babel for 1.3.1? I'd very much like to have this headache gone as soon as possible. The lexer I've attached to the ticket works well, at least I'd like to hear from someone who believes it doesn't.
No, Babel can't be adopted in the 1.3.X release branch, just like your JavaScript lexer.
> I understand the desire to move away from xgettext, but it doesn't seem to be happening. No one had claimed the three open tickets about this problem, for instance.
How can you say it's not happening? I clearly made a statement about my commitment to Babel adoption in the current release cycle.
Jannis
>> For example, could we switch to Babel for 1.3.1? I'd very much like to have this headache gone as soon as possible. The lexer I've attached to the ticket works well, at least I'd like to hear from someone who believes it doesn't.
> No, Babel can't be adopted in the 1.3.X release branch, just like your JavaScript lexer.
Even if Babel is the solution for 1.4, I don't understand why my patch
can't be applied to 1.3.x? It's well tested, and clearly works better
than the code in 1.3 now. It's transparent to the user, except it isn't
baffling like today's code. From my point of view, makemessages simply
doesn't work for Javascript files. It's a frustrating process that
usually ends with twisting your Javascript sources to meet the needs of
a Perl parser, but without understanding that that's what you're doing.
>> I understand the desire to move away from xgettext, but it doesn't seem to be happening. No one had claimed the three open tickets about this problem, for instance.
> How can you say it's not happening? I clearly made a statement about my commitment to Babel adoption in the current release cycle.
I was basing that on the fact that the tickets weren't claimed. Again,
I don't mean to antagonize anyone, I just want this to be fixed.
--Ned.
I'm +1 on fixing it now, rather than later. The regex approach was a
bad idea in the first place and it never really worked - it's just
that no one noticed. Babel integration is the ultimate goal, but that
can't happen until 1.4, 'cause it's a new feature. I don't think this
patch moves us away from that goal. It just fixes a long existing bug
and doesn't touch any public APIs. I tested the patch on my code with
good results.
--
Łukasz Rekucki
Actually, that's a good reason why fixing this in 1.3.X is a bad idea. If we have a better way to fix this properly in trunk then we shouldn't try adding a huge chunk of code to a release branch. IOW, I'm -1 on backporting the patch Ned proposed.
Jannis
> Does anyone else have any opinions on a direction forward to fix this problem? At the very least I'd like to make a doc patch for 1.3.1 that explains the fragility.
Yeah, a doc patch sounds like a good plan.
Jannis
Frankly, I'm disappointed by this approach. Shall I draft the paragraph
that says roughly, "This feature of Django doesn't work and will fail
silently. Please find a third-party alternative."?
--Ned.
This is hardly a new issue (see the age of the tickets), so we should only clarify that using the recent gettext (0.18.1.1) won't work.
Having said that, are you at all interested in working on the Babel-based solution?
Jannis
I don't know what to write in the docs to explain to users that this
doesn't work in Django. 1.3 will be used for a very long time by many
people, I'd love to be able to tell them that Django can support
localization of Javascript text out of the box.
Are there any other committers (or even a BDFL) with an opinion about
this? We've gathered a -1 from a committer and three +1 (including
mine) from the community.
Having said all that, I am very interested in making this work well in
Django. If I can help with Babel, let me know.
--Ned.
I think I agree with Ned here: I can't see the downside to fixing it
on the release branch. "It violates our policy" doesn't count IMO:
it's *our* policy, and we get to break it if there's a good reason.
Making translation in JavaScript work is a good reason as I see it.
Jannis: can you speak a bit more to why you're against fixing this for
the 1.3 series? Is there a technical downside I'm missing?
Jacob
> I think I agree with Ned here: I can't see the downside to fixing it
> on the release branch. "It violates our policy" doesn't count IMO:
> it's *our* policy, and we get to break it if there's a good reason.
> Making translation in JavaScript work is a good reason as I see it.
FTR I agree we have an issue here, I just disagree with the proposed fix. If you think we can adopt Babel in the release branch, let's do it.
> Jannis: can you speak a bit more to why you're against fixing this for
> the 1.3 series? Is there a technical downside I'm missing?
Again, I'm not convinced that Ned's patch solves the actual problem: Django abusing the xgettext CLI tool to parse JavaScript files with its Perl lexer. He proposes to convert the content of JavaScript files with a custom-made lexer to a format that xgettext can understand as "C" instead. This -- while being a nifty piece of code -- seems like a hack to me, which is why I believe that such code only adds more maintenance burden for us on the already fragile i18n system and should be replaced with a proper JavaScript parser.
So I've proposed to adopt Babel, which is a proven, widely used system that implements many of the weird hacks we have in Django i18n cleanly in Python. As a bonus it would allow us to get rid of a binary dependency that was difficult to install on all platforms in the past anyway.
In other words, technically we're speaking of a bikeshed that has some holes in the roof and Ned is trying to fix them with new paint. IMHO, it needs to be reconstructed instead. Whether we do this in the release branch or in trunk I don't care.
Jannis
Integrating Babel is much more work than replacing the regexp hack
with a lexer. There are also some decisions to make: do we bundle
Babel with Django? Should Django use Babel's locale interface (which
would be slower/faster ?) or just the gettext part?
>
>> Jannis: can you speak a bit more to why you're against fixing this for
>> the 1.3 series? Is there a technical downside I'm missing?
>
> Again, I'm not convinced that Ned's patch solves the actual problem: Django abusing the xgettext CLI tool to parse JavaScript files with its Perl lexer. He proposes to convert the content of JavaScript files with a custom-made lexer to a format that xgettext can understand as "C" instead. This -- while being a nifty piece of code -- seems like a hack to me, which is why I believe that such code only adds more maintenance burden for us on the already fragile i18n system and should be replaced with a proper JavaScript parser.
I agree that using xgettext like that is a hack, but it's a one that
*can* work - and it works pretty well for Django's template language.
Mostly because the translator uses a lexer. Using regular expression
to "parse" JavaScript is a hack that's well... You're Doing It
Wrong(tm).
> So I've proposed to adopt Babel, which is a proven, widely used system that implements many of the weird hacks we have in Django i18n cleanly in Python. As a bonus it would allow us to get rid of a binary dependency that was difficult to install on all platforms in the past anyway.
I think no one denies that. I'll be happy to help with that**. But
this probably won't happen today or tomorrow, so getting rid of one
broken hack would be great. If the core team decides that integrating
Babel in 1.3.1 is fine, then that's great news. If not, let's just make
it work.
>
> In other words, technically we're speaking of a bikeshed that has some holes in the roof and Ned is trying to fix them with new paint. IMHO, it needs to be reconstructed instead. Whether we do this in the release branch or in trunk I don't care.
Following this metaphor, it's autumn, it's raining and the materials
for a new roof won't be coming until late winter, so covering the
bikes with a tilt, so they don't rust, might be a good idea :P
**You mentioned talking to Armin Ronacher. Do you have any plan of
action or something ? :)
--
Łukasz Rekucki
> On 14 April 2011 18:30, Jannis Leidel <lei...@gmail.com> wrote:
>> On 14.04.2011, at 17:27, Jacob Kaplan-Moss wrote:
>>
>>> I think I agree with Ned here: I can't see the downside to fixing it
>>> on the release branch. "It violates our policy" doesn't count IMO:
>>> it's *our* policy, and we get to break it if there's a good reason.
>>> Making translation in JavaScript work is a good reason as I see it.
>>
>> FTR I agree we have an issue here, I just disagree with the proposed fix. If you think we can adopt Babel in the release branch, let's do it.
>
> Integrating Babel is much more work than replacing the regexp hack
> with a lexer. There are also some decisions to make: do we bundle
> Babel with Django?
I'm not convinced it's much more work, since we really only need to
replace the string extraction code in makemessages and the
compilation code in compilemessages with calls to the equivalent code
in Babel. There is already a simple message extractor in BabelDjango
[1] that is a port of django.utils.translation.templatize which would
only need to be updated to the current version in Django, given the
few differences [2].
Other than that I don't see other hard requirements for Babel in Django,
which is why I'd suggest to not bundle Babel and check for its existence
when calling the commands, asking the developer to install it with one
of the package management tools. [3]
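A check of that kind could look roughly like the sketch below. The helper name and the error message are made up for illustration, and the code uses today's standard library rather than anything that exists in Django.

```python
import importlib.util


def require_module(name, hint):
    """Fail early with a readable message if an optional dependency is missing.

    `name` is the importable module name; `hint` tells the developer how to
    get it. Both are supplied by the caller; nothing here is Django API.
    """
    if importlib.util.find_spec(name) is None:
        raise RuntimeError(
            "%s is required to run this command; %s" % (name, hint)
        )


def babel_available():
    """Return True if the babel package can be imported."""
    return importlib.util.find_spec("babel") is not None
```

A management command would call `require_module("babel", "install it with your package manager")` before doing any work, so a missing dependency is reported up front instead of failing halfway through extraction.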
> Should Django use Babel's locale interface (which
> would be slower/faster ?) or just the gettext part?
The code in django.utils.translation.trans_real doesn't need to be
modified since it relies on Python's gettext module to load the
translations.
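To illustrate why the loading side is independent of the extraction tool, here is a minimal sketch using only Python's gettext module. The locale directory is deliberately nonexistent and the helper name is hypothetical; with `fallback=True` a missing catalog simply yields the untranslated string.

```python
import gettext


def load_translation(domain, localedir, language):
    """Load a compiled .mo catalog, whatever tool produced it.

    With fallback=True, gettext.translation() returns a NullTranslations
    object instead of raising when no catalog is found.
    """
    return gettext.translation(
        domain, localedir=localedir, languages=[language], fallback=True
    )


# The path below does not exist, so the fallback object is returned and
# messages come back unchanged.
catalog = load_translation("djangojs", "/nonexistent/locale", "de")
message = catalog.gettext("xyzzy 1")
```

As long as the .po and .mo files are valid, it should not matter to this code whether they were produced by GNU gettext or by Babel.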
> **You mentioned talking to Armin Ronacher. Do you have any plan of
> action or something ? :)
Not specifically, we only discussed what the level of integration between
the two systems could be and how the JavaScript lexer [4] works. Since
then I've spent a few hours reviewing Babel code but haven't produced
code yet.
Jannis
1: http://svn.edgewall.org/repos/babel/contrib/django/babeldjango/extract.py
2: https://gist.github.com/05a28232e63bc30277d5
3: http://pypi.python.org/pypi/Babel
http://packages.ubuntu.com/search?keywords=python-pybabel
http://packages.debian.org/search?keywords=python-pybabel
http://packages.gentoo.org/package/dev-python/Babel
http://www.rpmfind.net/linux/rpm2html/search.php?query=babel
http://www.freshports.org/devel/py-babel/
4: http://svn.edgewall.org/repos/babel/trunk/babel/messages/jslexer.py
On 14 April 2011 21:47, Jannis Leidel <lei...@gmail.com> wrote:
>
> I'm not convinced it's much more work, since we really only need to
> replace the string extraction code in makemessages and the
> compilation code in compilemessages with calls to the equivalent code
> in Babel. There is already a simple message extractor in BabelDjango
> [1] that is a port of django.utils.translation.templatize which would
> only need to be updated to the current version in Django, given the
> few differences [2].
I guess we can just do it and see how long it takes :)
>> Should Django use Babel's locale interface (which
>> would be slower/faster ?) or just the gettext part?
>
> The code in django.utils.translation.trans_real doesn't need to be
> modified since it relies on Python's gettext module to load the
> translations.
By 'locale interface' I actually meant the CLDR. Django's
"localflavor/**/formats.py" mostly duplicates information provided by
Babel.
--
Łukasz Rekucki
I'm a little uncertain where to throw my vote here.
I can appreciate Ned and Jacob's "pragmatic" approach -- this problem
exists, so an interim partial solution is better than nothing if it is
easy to apply.
However -- I can also see Jannis' point: Babel is the real solution
here, and any effort spent on maintaining the existing lexer is effort
that could be spent in fixing the problem properly with Babel.
My concern in accepting the "pragmatic" approach is whether it
actually *is* pragmatic. Completely independent of whether there is a
Babel solution coming soon, I don't have any feeling as to whether
Ned's proposed partial solution will introduce more problems than it
solves.
My gut feel tells me that Javascript is more like C than Perl, but
that's really nothing more than a gut feel, and the proposed patch
does much more than just say "treat it like C" -- it also involves
introducing a JavaScript lexer. We're proposing putting this into the
1.3 branch -- a stable branch -- without any significant testing.
We're essentially saying as a project that the JS lexer is stable
tested code, known to be better than the existing solution in all
conditions, with no significant regressions.
No offense intended to Ned, but I simply don't see sufficient evidence
to be confident that this is the case. 216 lines of tests doesn't
strike me as anything close to a broad enough test suite to validate
that the Javascript lexer will work under all conditions. Yes, the
Perl-based lexer is broken, but it's broken in known ways; we don't
have any experience to know where the flaws lie in the C-based lexer,
and once it's in Django's repo, we're committing to maintaining it and
all its flaws and foibles.
If the proposed patch was leaning on a well established lexer, or was
a simple configuration change (i.e., "treat it as C, not Perl") that
could be quickly demonstrated to fix a bunch of problems without
introducing any obvious new problems, I'd be all in favor of it as a
temporary solution. But that's not what is on the table -- it's a
complex body of code that we're proposing to evaluate, introduce,
maintain, and potentially deprecate very quickly.
I'm happy to be proven wrong on any of the points mentioned here --
for example, if someone can provide some mechanism that independently
demonstrates the robustness of the Javascript lexer, or evidence that
the risk of regressions is low. However, absent of such evidence, I'm
inclined to side with Jannis and concentrate our efforts on Babel --
especially if, as Jannis suggests, a Babel-based solution isn't that
much work and could be knocked off easily with a short concentrated
effort.
Yours,
Russ Magee %-)
But this is an odd debate, there are really three solutions to evaluate:
the existing code, my patch, and Babel, and by any yardstick, the
existing code is badly broken.
I have no idea how widely used Babel is, but it can't be as wide as the
Gnu gettext utilities, so we'd have to evaluate those other components
of Babel as well. Are there edge cases in .po and .mo files that it
doesn't handle properly? I have no idea.
Keep in mind that the proposal is not to include Babel, but to depend on
it as a prerequisite, which means we are stuck in the same situation we
are with gettext: it can change independently of Django, and new
versions can introduce new bad behavior. That's one of the reasons we
have a bad problem today: gettext changed from 0.17 to 0.18, and
exacerbated the hack. Babel has the advantage that it is pure Python,
so it is both more installable than gettext, and is more readable for
us. It also has the advantage that it isn't based on a hack, but that
doesn't mean it performs flawlessly.
BTW: the Perl-based lexer is not broken in "known ways". I've never
looked at the gettext source, and have no idea what subset of Perl it
parses correctly, and I don't know Perl syntax well enough even to start
testing the tricky cases. And the bad behavior depends on the version
of gettext. A Django project that has meticulously twisted their
Javascript to avoid the "known problems" can then fail if used on a
system with a newer (or older) version of gettext.
> If the proposed patch was leaning on a well established lexer, or was
> a simple configuration change (i.e., "treat it as C, not Perl") that
> could be quickly demonstrated to fix a bunch of problems without
> introducing any obvious new problems, I'd be all in favor of it as a
> temporary solution. But that's not what is on the table -- it's a
> complex body of code that we're proposing to evaluate, introduce,
> maintain, and potentially deprecate very quickly.
>
While the patch I've submitted is certainly larger than a configuration
change, and is not a well-established lexer, I have "quickly
demonstrated that it fixes a bunch of problems without introducing any
obvious new problems", or at least, no one has come forward with a new
problem. I've paid a bounty on Stack Overflow for people to find
problems in the lexer itself, which they have done, and those problems
have been fixed.
> I'm happy to be proven wrong on any of the points mentioned here --
> for example, if someone can provide some mechanism that independently
> demonstrates the robustness of the Javascript lexer, or evidence that
> the risk of regressions is low. However, absent of such evidence, I'm
> inclined to side with Jannis and concentrate our efforts on Babel --
> especially if, as Jannis suggests, a Babel-based solution isn't that
> much work and could be knocked off easily with a short concentrated
> effort.
I'd be glad to undertake the effort to demonstrate the robustness of the
Javascript lexer, if someone can tell me what that test would look
like. I've done the work to read the ECMAScript spec, I can certainly
do the work to write more tests. I've run some significant code through
the lexer (jQuery, for example), and it didn't result in any 'other'
tokens, and was properly synchronized at the end, though I can't say I
examined every token in the stream. If anyone has an idea how to more
thoroughly test a lexer, I'm all ears.
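One cheap sanity check of the kind described above can be sketched like this. The tokenizer here is a toy stand-in written for this example: it is not jslex, it does not use jslex's API, and it makes no attempt at regex literals. The part of interest is `check_lexer`, which accepts any function that yields (kind, text) pairs.

```python
import re

# Toy tokenizer, for illustration only. Anything it cannot classify
# falls through to the catch-all "other" group.
TOKEN_RE = re.compile(r"""
    (?P<ws>\s+)
  | (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<string>"(?:\\.|[^"\\\n])*"|'(?:\\.|[^'\\\n])*')
  | (?P<id>[A-Za-z_$][A-Za-z0-9_$]*)
  | (?P<num>\d+(?:\.\d+)?)
  | (?P<punct>[{}()\[\];,.<>=+\-*/%!&|^~?:])
  | (?P<other>.)
""", re.VERBOSE | re.DOTALL)


def toy_tokenize(src):
    """Yield (kind, text) pairs covering the whole source."""
    for match in TOKEN_RE.finditer(src):
        yield match.lastgroup, match.group()


def check_lexer(tokenize, src):
    """Return a list of problems found when lexing src; empty means clean."""
    problems = []
    tokens = list(tokenize(src))
    # The token texts, joined back together, must reproduce the input.
    if "".join(text for _, text in tokens) != src:
        problems.append("token texts do not reproduce the source")
    # No input should end up in the catch-all token kind.
    for kind, text in tokens:
        if kind == "other":
            problems.append("unrecognised input: %r" % text)
    return problems


clean = check_lexer(toy_tokenize, 'gettext("xyzzy 1");\nvar x = y;\n')
broken = check_lexer(toy_tokenize, 'var s = "unterminated;\n')
```

Running such a check over a corpus of real-world Javascript files would not prove the lexer correct, but it would flag inputs worth turning into regression tests.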
But keep in mind: that work will also have to be done for Babel. I'm
more than happy to contribute my 216 lines of tests to their 89 lines,
or to lend my new-found knowledge about the finer points of lexing
Javascript. We should decide on a set of acceptance criteria, because
it's clear that any solution we adopt will have to meet it.
As you say, Jannis has suggested that a Babel-based solution isn't that
much work. But that work hasn't been done yet. I don't know how much
work it is. It's going to be a larger change to the code base than my
patch is, at least it is if you properly consider that the Babel code is
part of the change, even if it isn't included in the patch. But we
don't have a Babel patch to consider, only the suggestion that it won't
be a big deal and it will work well. That suggestion remains to be proven.
Let me be clear: I don't care much if we use my patch or Babel. I just
want this problem fixed well. I put real work into my patch, but if it
isn't the fix, OK, I learned a lot and had fun doing it. If need be,
I'll package up my patch as a standalone app that adds a new management
command, makejsmessages, that does it right. Then I'll never have to
deal with it again. I just think it's bad that Django 1.3 does this
thing poorly, and I want Django to be the best it can be.
--Ned.
I'll be honest -- I have no specific reason to believe in Babel. I'm
going on the fact people who I trust when it comes to i18n (like
Jannis) have recommended it highly.
I'm also enthused at the prospect of having a better foundation to
build on. The gettext handling that we have is starting to look like
an increasingly fragile collection of hacks; there comes a point at
which that becomes a maintenance hassle, and we should step back and
fix the problem properly.
> But this is an odd debate, there are really three solutions to evaluate: the
> existing code, my patch, and Babel, and by any yardstick, the existing code
> is badly broken.
No argument here. This isn't a new situation, though -- the existing
code has been broken for a long time.
> I have no idea how widely used Babel is, but it can't be as wide as the Gnu
> gettext utilities, so we'd have to evaluate those other components of Babel
> as well. Are there edge cases in .po and .mo files that it doesn't handle
> properly? I have no idea.
Yes, gettext is widely used. And yet, it apparently doesn't have a
Javascript parser, which seems like a pretty stunning omission in the
modern world, and a glaring missing feature for a project like Django.
IMHO, a native Javascript parsing mode would seem like a much better
contribution (to the world, not just Django) than a set of
Django-specific hacks designed to cajole the C parser into working with
Javascript.
> Keep in mind that the proposal is not to include Babel, but to depend on it
> as a prerequisite, which means we are stuck in the same situation we are
> with gettext: it can change independently of Django, and new versions can
> introduce new bad behavior. That's one of the reasons we have a bad problem
> today: gettext changed from 0.17 to 0.18, and exacerbated the hack.
This is true. However, what you're proposing is, IMHO, a slightly
worse situation.
Babel is a self-contained tool. Assuming it works as advertised (and
I'll grant that is a big and important assumption), it is a self
contained body of code. It is a dependency, but as long as it
continues to work as advertised, we're fine. We're only dealing with
its advertised interface, and working with that interface in the way
it was intended to be worked with.
On the other hand, gettext is also a dependency, and gettext can also
change between releases -- but we're not using it as intended. We're
bending the Perl parser (or, in your case, the C parser) in strange
and unusual ways to do something it wasn't originally intended to do.
Something completely innocuous can change in gettext, and the
follow-on effect to us can be huge because we've built our castle on
an unstable foundation.
The maintenance issue is the critical part here. My hesitation isn't
just to do with the suitability of your code *right now*. It's to do
with the fact that once we adopt the code into trunk, we are
agreeing to maintain it. Bits don't rot, but gettext has already
demonstrated that it changes between versions, so it's reasonable to
assume that when gettext 0.19 is released (whenever that happens),
we'll need to make changes to our Javascript parser. By taking on the
lexer, we're absorbing into Django a whole bunch of project
responsibility that frankly, I'd rather we didn't have.
> Babel
> has the advantage that it is pure Python, so it is both more installable
> than gettext, and is more readable for us. It also has the advantage that
> it isn't based on a hack, but that doesn't mean it performs flawlessly.
I'm not saying it does. But presumably Babel 0.9 works better than
0.8, and 0.10 will work better than 0.9, and so on. If a high profile
project like Django uses it, presumably this improvement will happen
faster by virtue of the extra attention. If a problem is found, we can
direct that fix upstream, instead of falling victim to NIH and making
everything a problem that Django needs to fix.
Improvements in gettext don't follow on the same way -- after all,
gettext is busy fixing the C parser, and an improvement in the C
parser may not serve the needs of our Javascript parser. If there's a
problem with Django's Javascript lexer, that's Django's problem
alone, and there's no broader community that will help to make it
better.
> BTW: the Perl-based lexer is not broken in "known ways". I've never looked
> at the gettext source, and have no idea what subset of Perl it parses
> correctly, and I don't know Perl syntax well enough even to start testing
> the tricky cases. And the bad behavior depends on the version of gettext.
> A Django project that has meticulously twisted their Javascript to avoid
> the "known problems" can then fail if used on a system with a newer (or
> older) version of gettext.
When I said "broken in known ways", I mostly meant that it was prima
facie broken. However, this isn't a recent development -- the fact
that your patch is attached to ticket #7704 is evidence of that.
>> If the proposed patch was leaning on a well established lexer, or was
>> a simple configuration change (i.e., "treat it as C, not Perl") that
>> could be quickly demonstrated to fix a bunch of problems without
>> introducing any obvious new problems, I'd be all in favor of it as a
>> temporary solution. But that's not what is on the table -- it's a
>> complex body of code that we're proposing to evaluate, introduce,
>> maintain, and potentially deprecate very quickly.
>>
> While the patch I've submitted is certainly larger than a configuration
> change, and is not a well-established lexer, I have "quickly demonstrated
> that it fixes a bunch of problems without introducing any obvious new
> problems", or at least, no one has come forward with a new problem. I've
> paid a bounty on Stack Overflow for people to find problems in the lexer
> itself, which they have done, and those problems have been fixed.
That sort of evidence certainly works in your favor -- I wasn't
aware that this sort of testing had taken place.
>> I'm happy to be proven wrong on any of the points mentioned here --
>> for example, if someone can provide some mechanism that independently
>> demonstrates the robustness of the Javascript lexer, or evidence that
>> the risk of regressions is low. However, absent of such evidence, I'm
>> inclined to side with Jannis and concentrate our efforts on Babel --
>> especially if, as Jannis suggests, a Babel-based solution isn't that
>> much work and could be knocked off easily with a short concentrated
>> effort.
>
> I'd be glad to undertake the effort to demonstrate the robustness of the
> Javascript lexer, if someone can tell me what that test would look like.
> I've done the work to read the ECMAScript spec, I can certainly do the work
> to write more tests. I've run some significant code through the lexer
> (jQuery, for example), and it didn't result in any 'other' tokens, and was
> properly synchronized at the end, though I can't say I examined every token
> in the stream. If anyone has an idea how to more thoroughly test a lexer,
> I'm all ears.
Again -- this is good evidence, and something that hasn't (AFAICT)
been stated previously in this forum.
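For anyone who wants to repeat that kind of check, here is a minimal
sketch of the idea. It uses a toy tokenizer written for this example,
not jslex or Babel, and the names TOKEN_RULES, lex and check_lexer are
hypothetical. It reports any text that fell through to the catch-all
'other' rule, plus a round-trip check that no source text was dropped.
The round-trip check is only a weak stand-in for the end-of-stream
synchronization described in the quoted message.

```python
import re

# Toy token rules, NOT the real jslex grammar: just enough to show the checks.
TOKEN_RULES = [
    ("ws", r"\s+"),
    ("comment", r"//[^\n]*|/\*.*?\*/"),
    ("string", r'"(?:[^"\\\n]|\\.)*"|\'(?:[^\'\\\n]|\\.)*\''),
    ("number", r"\d+(?:\.\d+)?"),
    ("id", r"[A-Za-z_$][A-Za-z0-9_$]*"),
    ("punct", r"[{}()\[\];,.=+\-*/<>!&|?:]"),
    ("other", r"."),  # catch-all: seeing this means the lexer got lost
]
MASTER = re.compile(
    "|".join("(?P<%s>%s)" % rule for rule in TOKEN_RULES), re.DOTALL
)


def lex(text):
    """Yield (token_name, token_text) pairs for the whole input."""
    for match in MASTER.finditer(text):
        yield match.lastgroup, match.group()


def check_lexer(text):
    """Return (list of 'other' token texts, round-trip succeeded?)."""
    tokens = list(lex(text))
    others = [tok for name, tok in tokens if name == "other"]
    round_trip = "".join(tok for _, tok in tokens) == text
    return others, round_trip
```

Running check_lexer over a large real-world file and expecting an
empty 'other' list is the same style of smoke test as the jQuery run;
for instance, check_lexer('var x = #;') returns (['#'], True) with the
toy rules above, because '#' matches none of them.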
> But keep in mind: that work will also have to be done for Babel. I'm more
> than happy to contribute my 216 lines of tests to their 89 lines, or to lend
> my new-found knowledge about the finer points of lexing Javascript. We
> should decide on a set of acceptance criteria, because it's clear that any
> solution we adopt will have to meet it.
>
> As you say, Jannis has suggested that a Babel-based solution isn't that much
> work. But that work hasn't been done yet. I don't know how much work it
> is. It's going to be a larger change to the code base than my patch is, at
> least it is if you properly consider that the Babel code is part of the
> change, even if it isn't included in the patch. But we don't have a Babel
> patch to consider, only the suggestion that it won't be a big deal and it
> will work well. That suggestion remains to be proven.
Again, can't argue with this. My hope is that this discussion will
kickstart a serious effort on Babel integration.
> Let me be clear: I don't care much if we use my patch or Babel. I just want
> this problem fixed well. I put real work into my patch, but if it isn't the
> fix, OK, I learned a lot and had fun doing it. If need be, I'll package up
> my patch as a standalone app that adds a new management command,
> makejsmessages, that does it right. Then I'll never have to deal with it
> again. I just think it's bad that Django 1.3 does this thing poorly, and I
> want Django to be the best it can be.
I have the same goal. I'd like to see this collection of bugs fixed.
However, it's not a new problem either, so I'd rather address it
properly than quickly.
I'd rather see the effort put into Babel integration so we can
evaluate if it will solve the problem properly; if it turns out it
doesn't, then we can always apply your patch as the "best of a bad
bunch of options" option.
Yours,
Russ Magee %-)
> The maintenance issue is the critical part here. My hesitation isn't
> just to do with the suitability of your code *right now*. It's to do
> with the fact that once we adopt the code into trunk, we are
> agreeing to maintain it. Bits don't rot, but gettext has already
> demonstrated that it changes between versions, so it's reasonable to
> assume that when gettext 0.19 is released (whenever that happens),
> we'll need to make changes to our Javascript parser. By taking on the
> lexer, we're absorbing into Django a whole bunch of project
> responsibility that frankly, I'd rather we didn't have.
>
>> Babel
>> has the advantage that it is pure Python, so it is both more installable
>> than gettext, and is more readable for us. It also has the advantage that
>> it isn't based on a hack, but that doesn't mean it performs flawlessly.
>
Sorry, I also wrote about this journey on my blog
(http://nedbatchelder.com/blog/201104/a_javascript_lexer_in_python_and_the_saga_behind_it.html),
and had lost track of which details went where.
>> But keep in mind: that work will also have to be done for Babel. I'm more
>> than happy to contribute my 216 lines of tests to their 89 lines, or to lend
>> my new-found knowledge about the finer points of lexing Javascript. We
>> should decide on a set of acceptance criteria, because it's clear that any
>> solution we adopt will have to meet it.
>>
>> As you say, Jannis has suggested that a Babel-based solution isn't that much
>> work. But that work hasn't been done yet. I don't know how much work it
>> is. It's going to be a larger change to the code base than my patch is, at
>> least it is if you properly consider that the Babel code is part of the
>> change, even if it isn't included in the patch. But we don't have a Babel
>> patch to consider, only the suggestion that it won't be a big deal and it
>> will work well. That suggestion remains to be proven.
> Again, can't argue with this. My hope is that this discussion will
> kickstart a serious effort on Babel integration.
>
I'm looking forward to that too, and will help where I can.
--Ned.
Could you elaborate on that? How does your application help me handle
client-side translations?
--
Łukasz Rekucki
On 14.04.2011, at 17:27, Jacob Kaplan-Moss wrote:
> I think I agree with Ned here: I can't see the downside to fixing it
> on the release branch. "It violates our policy" doesn't count IMO:
> it's *our* policy, and we get to break it if there's a good reason.
> Making translation in JavaScript work is a good reason as I see it.
FTR I agree we have an issue here, I just disagree with the proposed fix. If you think we can adopt Babel in the release branch, let's do it.
Hi Ned,
> As you say, Jannis has suggested that a Babel-based solution isn't that much work. But that work hasn't been done yet. I don't know how much work it is. It's going to be a larger change to the code base than my patch is, at least it is if you properly consider that the Babel code is part of the change, even if it isn't included in the patch. But we don't have a Babel patch to consider, only the suggestion that it won't be a big deal and it will work well. That suggestion remains to be proven.
I have some good and some bad news. Last week I tried to put my money where my mouth
is and dived into Babel to replace the xgettext calls in makemessages with the
appropriate calls to Babel's message extraction/compilation functions. I've
found it to be pretty easy to get started at first but stumbled over differences
in the way the message catalogue update pr