Sorting support still missing

wpbasti

unread,

Dec 16, 2007, 10:40:08 AM12/16/07

to python-polib

While using the new merge feature (thanks for that again) I have
detected a small issue which results into larger (svn) diffs for my PO
files. While the native gettext tools support a way of sorting the
entries of a PO file by file this is still missing in polib. This is
especially interesting for my xgettext-like implementation where I
initially create a file. I think the support for this must be handled
somewhere at __str__ not sure where exactly yet.

David Jean Louis

unread,

Dec 16, 2007, 10:59:12 AM12/16/07

to python...@googlegroups.com

IMO it should be handled by __cmp__(), __eq__(), __neq__(), __gt__()
... methods (see http://docs.python.org/ref/customization.html).

ATM I do not have the time to do this, I'll see what I can do when I
have some time...
(Patches are welcome of course :) )

Cheers,

David.

wpbasti

unread,

Dec 16, 2007, 12:13:46 PM12/16/07

to python-polib

Hi David,

thanks for the tip with __cmp__. I followed the documentation. The
patch is appended inline after this text. Hope this feature will made
it into one of the next releases. Thanks.

Cheers,

Sebastian

Index: polib.py
===================================================================
--- polib.py (revision 10964)
+++ polib.py (working copy)
@@ -234,6 +234,21 @@
# }}}

+def compareOccurrences(a, b):
+ """
+ Compare an entry occurrence with another one.
+ """
+ # compareOccurrences {{{
+ if a[0] != b[0]:
+ return a[0] < b[0]
+
+ if a[1] != b[1]:
+ return a[1] < b[1]
+
+ return 0
+ # }}}
+
+
class _BaseFile(list):
"""
Common parent class for POFile and MOFile classes.
@@ -890,6 +905,37 @@
_listappend(ret, _BaseEntry.__str__(self))
return _strjoin('\n', ret)

+ def __cmp__(self, other):
+ """ Called by comparison operations if rich comparison is not
defined. """
+ # First: Obsolete test
+ if self.obsolete != other.obsolete:
+ if self.obsolete: return -1
+ else: return 1
+
+ # Work on a copy to protect original
+ occ1 = self.occurrences[:]
+ occ2 = other.occurrences[:]
+
+ # Sorting using compare method
+ occ1.sort(compareOccurrences)
+ occ2.sort(compareOccurrences)
+
+ # Comparing sorted occurrences
+ for pos, entry1 in enumerate(occ1):
+ entry2 = occ2[pos]
+
+ if entry1[0] != entry2[0]:
+ if entry1[0] > entry2[0]: return 1
+ else: return -1
+
+ if entry1[1] != entry2[1]:
+ if entry1[1] > entry2[1]: return 1
+ else: return -1
+
+ # Finally: Compare message ID
+ if self.msgid > other.msgid: return 1
+ else: return -1
+
def translated(self):
"""Return True if the entry has been translated or False"""
return ((self.msgstr != '' or self.msgstr_plural) and \

On 16 Dez., 16:59, David Jean Louis <izimo...@gmail.com> wrote:
> > While using the new merge feature (thanks for that again) I have
> > detected a small issue which results into larger (svn) diffs for my PO
> > files. While the native gettext tools support a way of sorting the
> > entries of a PO file by file this is still missing in polib. This is
> > especially interesting for my xgettext-like implementation where I
> > initially create a file. I think the support for this must be handled
> > somewhere at __str__ not sure where exactly yet.
>
> IMO it should be handled by __cmp__(), __eq__(), __neq__(), __gt__()

> ... methods (seehttp://docs.python.org/ref/customization.html).

David Jean Louis

unread,

Dec 16, 2007, 1:13:43 PM12/16/07

to python...@googlegroups.com

Hi Sebastian,

> Hi David,
>
> thanks for the tip with __cmp__. I followed the documentation. The
> patch is appended inline after this text. Hope this feature will made
> it into one of the next releases. Thanks.
>

Thanks for the patch Sebastian, your contrib is appreciated !

I reviewed the patch and here are some comments:

1) I applied the patch and did a simple test on a test file (provided in
polib):
>>> import polib
>>> po = polib.pofile('tests/test_utf8.po')
>>> po.sort()
>>> po.save('tmp.po')
Then I compared tmp.po and the original (with vimdiff) and they are
not sorted the same way, I think that we miss something on how gettext
do the sorting, hopefully it will be an easy fix...

2) Some nitpicking:
- the compareOccurrences should be named compare_occurrences and could
be an inner function (in the __cmp__ func) since it is only used here,
- we should not use enumerate() since polib is compatible with python 2.3,
- a try/except should be added when doing "entry2 = occ2[pos]" because
there could be an IndexError exception if the compared entry has less
occurrences than the current one...

Cheers,

David.

wpbasti

unread,

Dec 16, 2007, 1:56:07 PM12/16/07

to python-polib

Hi David,

me again. I've updated the patch. Attached below. I have modified it
according to your rules ;) The sorting issue is quite interesting. I
have fixed one issue in my comparison code, but it does not completely
fix it.

The test file you offer is not sorted IMHO. In our project we
initially create po files using the option "--sort-by-file". I don't
know if it is that important to be absolutely identical to the
behavior of the gettext-tools, but for my requirements the current
implementation is useable enough (in my testcase the sorting is
identical).

I have modified the test file with "msgcat -F tests/test_utf8.po >
res.po" before making the diff. It looks somewhat more comparable -
even if not identical. I think this needs further investigation.

Cheers,

Sebastian

The Patch:

*** polib.py 2007-12-16 13:45:20.000000000 +0100
--- polib_wpbasti.py 2007-12-16 19:39:21.000000000 +0100
***************
*** 890,895 ****
--- 890,948 ----

_listappend(ret, _BaseEntry.__str__(self))
return _strjoin('\n', ret)

+ def __cmp__(self, other):
+ """ Called by comparison operations if rich comparison is
not defined. """

+ def compareOccurrences(a, b):

+ """
+ Compare an entry occurrence with another one.
+ """
+ # compareOccurrences {{{
+ if a[0] != b[0]:

+ if a[0] > b[0]: return 1

+ else: return -1
+

+ if a[1] != b[1]:

+ if a[1] > b[1]: return 1

+ else: return -1
+

+ return 0
+ # }}}
+

+ # First: Obsolete test
+ if self.obsolete != other.obsolete:
+ if self.obsolete: return -1
+ else: return 1
+
+ # Work on a copy to protect original
+ occ1 = self.occurrences[:]
+ occ2 = other.occurrences[:]
+
+ # Sorting using compare method
+ occ1.sort(compareOccurrences)
+ occ2.sort(compareOccurrences)
+
+ # Comparing sorted occurrences

+ pos = 0
+ for entry1 in occ1:
+ try:

+ entry2 = occ2[pos]

+ except IndexError:
+ break

+
+ if entry1[0] != entry2[0]:
+ if entry1[0] > entry2[0]: return 1
+ else: return -1
+
+ if entry1[1] != entry2[1]:
+ if entry1[1] > entry2[1]: return 1
+ else: return -1
+

+ pos += 1

+
+ # Finally: Compare message ID
+ if self.msgid > other.msgid: return 1
+ else: return -1
+
def translated(self):
"""Return True if the entry has been translated or False"""
return ((self.msgstr != '' or self.msgstr_plural) and \

David Jean Louis

unread,

Dec 16, 2007, 3:46:58 PM12/16/07

to python...@googlegroups.com

> I have modified the test file with "msgcat -F tests/test_utf8.po >
> res.po" before making the diff. It looks somewhat more comparable -
> even if not identical. I think this needs further investigation.
>

I think I found the issue:
$ python
>>> "2" > "11"
True

:)

So, in other words, casting line numbers to int in compareOcccurences
fixes the problem.
I'll integrate your changes in svn (and in the next release) soon, but I
need to invistigate more on how gettext behaves by default.

Thanks for your time,

David.

wpbasti

unread,

Dec 19, 2007, 7:45:43 AM12/19/07

to python-polib

Thank you David. This seems sounds reasonable.

Sebastian

Reply all

Reply to author

Forward