I've started using blocktrans in my templates and noticed that when using make_messages it includes all the tabs and linebreaks around the text that is surrounded by blocktrans tags.
For example:
{% blocktrans %}
Translate this string
{% plural %}
And this plural string
{% endblocktrans %}
Will turn into:
"\t\t\t\tTranslate this string"
"\t\t"
Plural:
"\t\t"
"\t\t\t\tAnd this plural string"
in the .po file... (it affects both normal blocktrans and plural blocktrans)
This is rather ugly, I think. Also, in HTML, indentation and linebreaks are generally ignored, so in that sense Django does not behave correctly. If you have a string of some length, you might not want to place it all on one line, for the sake of readable template source.
So I put in some preliminary work on stripping the translation string. I just realised that my patch only strips the ends of the string, which still leaves the middle full of ugliness if you have a multiline msgid.
Has this ever been discussed? To what extent would the stripping be appropriate? Or perhaps there is a simpler solution that I am missing?
Index: django/templatetags/i18n.py
===================================================================
--- django/templatetags/i18n.py (revision 6603)
+++ django/templatetags/i18n.py (working copy)
@@ -66,10 +66,12 @@
         for var,val in self.extra_context.items():
             context[var] = val.resolve(context)
         singular = self.render_token_list(self.singular)
+        singular = singular.strip()
         if self.plural and self.countervar and self.counter:
             count = self.counter.resolve(context)
             context[self.countervar] = count
             plural = self.render_token_list(self.plural)
+            plural = plural.strip()
             result = translation.ungettext(singular, plural, count)
         else:
             result = translation.ugettext(singular)
Index: django/utils/translation/trans_real.py
===================================================================
--- django/utils/translation/trans_real.py (revision 6603)
+++ django/utils/translation/trans_real.py (working copy)
@@ -442,13 +442,13 @@
                 pluralmatch = plural_re.match(t.contents)
                 if endbmatch:
                     if inplural:
-                        out.write(' ngettext(%r,%r,count) ' % (''.join(singular), ''.join(plural)))
+                        out.write(' ngettext(%r,%r,count) ' % (''.join(singular).strip(), ''.join(plural).strip()))
                         for part in singular:
                             out.write(blankout(part, 'S'))
                         for part in plural:
                             out.write(blankout(part, 'P'))
                     else:
-                        out.write(' gettext(%r) ' % ''.join(singular))
+                        out.write(' gettext(%r) ' % ''.join(singular).strip())
                         for part in singular:
                             out.write(blankout(part, 'S'))
                     intrans = False
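To make the limitation of a plain strip concrete, here is a small sketch in ordinary Python (not Django code). The sample string is made up; it only assumes that a two-line blocktrans body arrives with its source indentation intact.

```python
# Hypothetical msgid, as it might be extracted from an indented,
# two-line blocktrans body (the text itself is invented for illustration).
raw = "\n\t\t\t\tTranslate this string,\n\t\t\t\twhich spans two lines\n\t\t"

stripped = raw.strip()

# The whitespace at both ends is gone...
assert not stripped.startswith(("\n", "\t"))
assert not stripped.endswith(("\n", "\t"))

# ...but the indentation of the second line is still embedded in the msgid.
assert stripped == "Translate this string,\n\t\t\t\twhich spans two lines"
```

So `strip()` alone is enough for single-line strings, while multiline strings need some per-line handling.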
On Fri, 2007-10-26 at 15:04 +0000, Dmitri Fedortchenko wrote:
> Has this ever been discussed?
> To what extent would the stripping be appropriate? Or perhaps there is
> a simpler solution that I am missing?
There are really two things at work here: one not so good and one that I don't want to change.
The first issue (which is worth fixing) is the leading whitespace and particularly the bonus blank lines that appear to be introduced. They are worth stripping.
The second issue is what to do when we fix blocktrans so that it can correctly handle *blocks* of text (things with newlines). In that case, I wouldn't want to strip anything beyond a common amount of leading whitespace, because the source layout can sometimes be indicative of meaning (or at least, useful). PO files don't need to preserve linebreaks, so the translator can do whatever they want there, but arbitrarily stripping and merging feels slightly wrong to me.
Whitespace may well be ignored by HTML processors (not entirely true in all cases, but a reasonable generalisation), but it's not ignored by humans reading the text. Strip out all whitespace between tags in an HTML page and notice how completely unreadable it becomes.
So I'd be happy to include something that stripped the same amount of whitespace from the front of every line inside a blocktrans block but kept other linebreaks and extra leading whitespace intact. That would make the PO file more readable. Anything more compressing than that is going to require stronger arguments.
I would argue against ANY implicit stripping. Some of us use the template system for generating things like .ics files and other formats in which every space counts as part of the syntax.
Please don't break the pycon website calendar support ;-)
-Doug
On Oct 26, 7:41 pm, Malcolm Tredinnick <malc...@pointy-stick.com> wrote:
> So I'd be happy to include something that stripped the same amount of
> whitespace from the front of every line inside a blocktrans block but
> kept other linebreaks and extra leading whitespace intact. That would
> make the PO file more readable. Anything more compressing than that is
> going to require stronger arguments.
On Sat, 2007-10-27 at 00:57 +0000, doug.napole...@gmail.com wrote:
> I would argue against ANY implicit stripping. some of us use the
> template system for generating things like .ics files and others for
> which every space counts as part of the syntax.
>
> Please don't break the pycon website calendar support ;-)
If leading whitespace is important in the translated string, you're doomed. There's nothing to require translators to keep any whitespace intact. If whitespace is significant it should be outside of the text passed to translators.
We are only talking about stripping whitespace in the strings presented to translators and it won't change the strings you see in untranslated text.
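As a rough illustration of keeping significant whitespace outside the translated text, the sketch below builds one line of an .ics file in plain Python. The helper name `ics_summary_line` is hypothetical, and `gettext.NullTranslations` stands in for a real message catalog, so the msgid comes back unchanged.

```python
import gettext

# NullTranslations returns the msgid unchanged; a real project would
# load an actual catalog here instead.
_ = gettext.NullTranslations().gettext


def ics_summary_line(summary_msgid):
    """Build one SUMMARY line of an .ics file (hypothetical helper)."""
    # The property name and the CRLF terminator are file syntax, so they
    # stay outside the msgid. Only the human-readable text is translated.
    return "SUMMARY:" + _(summary_msgid) + "\r\n"


line = ics_summary_line("Team meeting")
```

Whatever a translator does with whitespace inside the msgid, the surrounding syntax is not affected.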
That's a great idea! I'll see if I can squeeze out a patch for this, since I feel that I want to be able to indent blocktrans without having the extra spaces in the po files (I realize one can remove the extra spaces manually in the po files, but I like consistent autogeneration hehe).
In other words, whitespace equal to the indentation of the first line will be removed, as well as leading and trailing whitespace and line breaks. One question that springs to mind: if the first line is on the same line as the {% blocktrans %} tag itself (i.e. it has no indentation), should we strip the subsequent lines based on the indentation of the second line, or assume that the user wants to keep the text as is?
:) Please advise.
On Oct 27, 12:41 am, Malcolm Tredinnick <malc...@pointy-stick.com> wrote:
> So I'd be happy to include something that stripped the same amount of
> whitespace from the front of every line inside a blocktrans block but
> kept other linebreaks and extra leading whitespace intact. That would
> make the PO file more readable. Anything more compressing than that is
> going to require stronger arguments.
On Sun, 2007-10-28 at 21:46 +0000, Dmitri Fedortchenko wrote:
> That's a great idea! I'll see if I can squeeze out a patch for this,
> since I feel that I want to be able to indent blocktrans without
> having the extra spaces in the po files (I realize one can remove the
> extra spaces manually in the po files, but I like consistent
> autogeneration hehe).
>
> In other words, whitespace equal to the amount of the indentation of
> first line will be removed, as well as leading and trailing whitespace
> and line breaks.
> One question that springs to mind: If the first line is on the same
> line as the {% blocktrans %}tag itself (i.e. it has no indentation),
> should we strip the subsequent lines based on the indentation of the
> second line, or assume that the user wants to keep the text as is?
Well, firstly, I'd rather you solved the two problems in a slightly reverse order: first allow linebreaks inside blocktrans, then let's worry about stripping whitespace. But it's up to you.
My ideal scenario would be that if we have a block of text that needs to be translated, strip a constant amount of whitespace from the front of each line. This amount will be the minimum amount of leading whitespace on any of the lines in that block. So, at the end, one line (at least) will always be flush against the left margin and all other lines will be indented relative to that line as they were in the source text.
(My reason for writing that alternative formulation is because I don't really understand what you were asking about being on the same level as the blocktrans tag. I'm not sure why the tag is relevant here.)
Come up with a patch that does whatever you like and we can go from there. I think we're thinking along similar lines, so it will be easy enough to tweak once we have some concrete code to throw darts at.
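For reference, the minimum-indentation rule described in this message is close to what Python's standard `textwrap.dedent` does. The sketch below is only an illustration of the rule, not a proposed patch; `strip_common_indent` and the sample text are hypothetical.

```python
import textwrap


def strip_common_indent(block):
    """Remove the smallest shared leading whitespace from every line."""
    # dedent ignores whitespace-only lines when computing the common
    # margin; strip("\n") then drops the blank lines left next to the
    # opening and closing tags without touching interior indentation.
    return textwrap.dedent(block).strip("\n")


# Hypothetical blocktrans body, indented as it might be in a template.
source = (
    "\n"
    "        Dear customer,\n"
    "            your order has shipped.\n"
    "        Thanks!\n"
    "    "
)
result = strip_common_indent(source)
# result == "Dear customer,\n    your order has shipped.\nThanks!"
```

At least one line ends up flush left, and the second line keeps its four extra spaces relative to it. Note that `textwrap.dedent` treats tabs and spaces as different characters, so mixed indentation may leave less stripped than expected.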
Should I open a ticket? I attached the patch, but I'm not sure it will get through to the mailing list.
I also attached a semi-comprehensive battery of tests, showing the outcome of the stripping process.
Basically it strips indentation based on the indentation of the first line of text. If the first line is empty, it will use the indentation of the second line.
If the first line has no indentation, then nothing other than a .strip() will be done to the string.
If any subsequent lines have less indentation than the first line, then those lines will be de-indented as much as possible. If any subsequent lines have more indentation than the first line, then only the amount of indentation equal to the first line is stripped.
The only caveat is if mixed tabs and spaces are used for indentation, some strange results may be produced.
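The attachment itself is not reproduced here, so the following is only a sketch of the rules as described above, written as standalone Python. The function name `strip_by_first_line` and the sample text are hypothetical, and indentation is counted in characters, which is why mixed tabs and spaces can misbehave.

```python
def strip_by_first_line(block):
    """De-indent block using the indentation of its first line of text."""
    lines = block.split("\n")
    # If the first line is empty, use the second line as the reference.
    if lines and not lines[0].strip():
        lines = lines[1:]
    if not lines:
        return ""
    first = lines[0]
    indent = len(first) - len(first.lstrip())
    if indent == 0:
        # No indentation on the first line: nothing beyond a plain strip.
        return block.strip()
    out = []
    for line in lines:
        leading = len(line) - len(line.lstrip())
        # Remove at most `indent` characters; lines with less indentation
        # are de-indented as far as they can go.
        out.append(line[min(indent, leading):])
    return "\n".join(out).strip()


source = "\n        First line\n            indented more\n    indented less\n    "
result = strip_by_first_line(source)
# result == "First line\n    indented more\nindented less"
```

Under a minimum-indentation rule, the same input would instead keep four spaces in front of "First line", because the third line sets the common margin.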
//D
On 10/29/07, Malcolm Tredinnick <malc...@pointy-stick.com> wrote:
> My ideal scenario would be that if we have a block of text that needs to
> be translated, strip a constant amount of whitespace from the front of
> each line. This amount will be the minimum amount of leading whitespace
> on any of the lines in that block. So, at the end, one line (at least)
> will always be flush against the left margin and all other lines will be
> indented relative to that line as they were in the source text.
Now that I re-read your definition, there is one difference between your suggestion and my patch. My patch just uses the first line to define the amount of whitespace to strip, while you suggest we use the shortest indentation in the whole block.
Which is better?
//D
> Now that I re-read your definition, there is one difference between your
> suggestion and my patch.
> My patch just uses the first line to define the amount of whitespace to
> strip, while you suggest we use the shortest indentation in the whole block.
>
> Which is better?
Thanks. I'll get to it eventually, but it might take a week or so. I'm pretty busy right at the moment and have a few other Django commitments to deliver on. At some point I have to spend a few days closing all the internationalisation bugs, though. I'd like to do that sometime in November.