Prevent source files which are updated (not created) from getting removed.

Magnus Lyckå

Jan 26, 2015, 12:19:38 PM
to fabrica...@googlegroups.com
I'm using fabricate for i18n, running my pybabel commands. Doing this the naive way, my translations get removed by fabricate! :-(
The issue is that my message.po files are UPDATED (not created from scratch) by a command I run from fabricate, so when I clean, they get erased, and they are the source of all my translations!

Briefly, the process works like this:

Initially do this:
1. Run "pybabel extract ..." to get strings from source code to be translated into "messages.pot"
2. Run "pybabel init ..." to create a new "messages.po" for a locale from the "messages.pot"
3. Run "pybabel compile ..." to create the binary "messages.mo" from "messages.po"
This is just an initial process for the project, nothing to automate, and no problem if I would...

But as the source code strings change, I do this:
1. Run "pybabel extract ..." to get strings from source code to be translated into "messages.pot"
2. Run "pybabel update ..." to merge the changes from "messages.pot" into the existing "messages.po" for each locale
3. Run "pybabel compile ..." to create the binary "messages.mo" from "messages.po"

Step 1 above is fine. We can discard the old output file and make a new one.

Step 2 is the problem.
    "pybabel update -d i18n -i i18n/messages.pot": {
        ".": "input-5058f1af8388633f609cadb75a75dc9d",
        "i18n/en_US/LC_MESSAGES/messages.po": "output-54730ad04ad7f18c4ab9414319ed2b4f",
        "i18n/messages.pot": "input-01f66f91bb193c7ac139b42252ea42e1",
        "i18n/sv_SE/LC_MESSAGES/messages.po": "output-10c30a87e4812715409f9daf0c3b91db"
    }
As you see, the messages.po files are marked as output, but they are not CREATED by the command, only UPDATED.
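
To make the consequence concrete, here is a small sketch that takes the dependency entry quoted above and lists the files tagged as output, i.e. the ones a clean would be expected to remove. It assumes only the `input-<hash>` / `output-<hash>` value format visible in that excerpt; `files_marked_as_output` is a hypothetical helper, not part of fabricate.

```python
# Dependency entry copied from the excerpt above.
deps = {
    "pybabel update -d i18n -i i18n/messages.pot": {
        ".": "input-5058f1af8388633f609cadb75a75dc9d",
        "i18n/en_US/LC_MESSAGES/messages.po": "output-54730ad04ad7f18c4ab9414319ed2b4f",
        "i18n/messages.pot": "input-01f66f91bb193c7ac139b42252ea42e1",
        "i18n/sv_SE/LC_MESSAGES/messages.po": "output-10c30a87e4812715409f9daf0c3b91db",
    }
}


def files_marked_as_output(deps):
    """Return the sorted names of all files tagged 'output-' in any command entry."""
    outputs = set()
    for files in deps.values():
        for name, tagged_hash in files.items():
            if tagged_hash.startswith("output-"):
                outputs.add(name)
    return sorted(outputs)


# Both hand-translated .po files show up as clean candidates.
for name in files_marked_as_output(deps):
    print(name)
```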

What do I do about this?

Extract from build file:

def pot_extract():
    run('pybabel', 'extract', '-F', 'i18n/babel.cnf',
        '-o', 'i18n/messages.pot', '.')

def po_update():
    run('pybabel', 'update', '-d', 'i18n', '-i', 'i18n/messages.pot')

def _mo_compile(lang):
    run('pybabel', 'compile', '-d', 'i18n', '-l', lang)

def mo_SE():
    _mo_compile('sv_SE')

def mo_US():
    _mo_compile('en_US')

def mo_all():
    for fn in os.listdir('i18n'):
        full_path = os.path.join(os.getcwd(), 'i18n', fn)
        if os.path.isdir(full_path):
            _mo_compile(fn)

Lex Trotman

Jan 26, 2015, 6:42:43 PM
to fabrica...@googlegroups.com
It appears that fabricate treats files opened read/write as outputs
only, not inputs and outputs. See
https://code.google.com/p/fabricate/source/browse/fabricate.py#628

Cheers
Lex

pjz

Jan 27, 2015, 7:48:21 AM
to fabrica...@googlegroups.com
Can you just always do init?

Magnus Lyckå

Jan 27, 2015, 9:43:21 AM
to fabrica...@googlegroups.com
On Tuesday, January 27, 2015 at 13:48:21 UTC+1, pjz wrote:
> Can you just always do init?

No. If I do init, I'll throw away all my translations.

The localized "messages.po" files contain both derived information and source information. I certainly don't want to repeat all the work of translating the whole program as soon as anything to be translated changes...

See e.g. http://babel.pocoo.org/docs/messages/
"pybabel update" is like the GNU msgmerge program.

Most tutorials on i18n show it as a waterfall, but of course it's an iterative process. That's why you need msgmerge / pybabel update, and why we have the problem with the .po files getting derived data, source data, derived data, source data...

Magnus Lyckå

Jan 27, 2015, 10:58:13 AM
to fabrica...@googlegroups.com
First of all, let me thank everybody involved for building this very nice tool. I certainly hope I'll be able to use it for my simple system. I think it's the most "pythonic" build tool I've come across since I started using Python back in version 1.4...


On Tuesday, January 27, 2015 at 00:42:43 UTC+1, LexT wrote:
> It appears that fabricate treats files opened read/write as outputs
> only, not inputs and outputs. See
> https://code.google.com/p/fabricate/source/browse/fabricate.py#628

Yes, the peculiar thing with the gettext .po-files is that they contain both derived material and source material.

During development & maintenance, they will grow from two sources:

 1. msgid, i.e. the texts derived from source code, such as marked string literals in .py files, will be derived via the .pot files and merged into the .po files for the various locales.
 2. msgstr, i.e. the translations of each msgid in whatever languages we use. This is source material and must absolutely not be discarded.

If I understand Builder.autoclean correctly, it will remove any file which was output from any build step, assuming this file only contains derived material. While the assumption (output == only derived data) is true in most cases, it's obviously false in this case, and I can imagine other scenarios. (Please correct me if I'm confused here, I only found this nice tool a few days ago). I include a couple of "grep sv_SE" from strace of the pybabel update below to show what happens in this particular case.

Anyway, I can imagine two approaches to fix the problem here:

 1. Make fabricate.py figure out more in detail whether a file is completely derived, or if it contains source code as well.
 2. Make it possible to somehow tell fabricate.py that some files (e.g. a pattern such as '*.po') should not be cleaned out.

Perhaps the second approach is more practical. I suspect the first approach will both complicate the code, and whatever we do, we might find out that there is some other case we didn't cover.

No change in the pot file since last update this time.

stat("i18n/sv_SE/LC_MESSAGES/messages.po", {st_mode=S_IFREG|0664, st_size=26004, ...}) = 0
write(2, "updating catalog 'i18n/sv_SE/LC_"..., 83updating catalog 'i18n/sv_SE/LC_MESSAGES/messages.po' based on 'i18n/messages.pot'
open("i18n/sv_SE/LC_MESSAGES/messages.po", O_RDONLY) = 3
stat("/usr/local/lib/python2.7/dist-packages/babel/localedata/sv_SE.dat", {st_mode=S_IFREG|0644, st_size=389, ...}) = 0
open("i18n/sv_SE/LC_MESSAGES/tmpmessages.po", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
rename("i18n/sv_SE/LC_MESSAGES/tmpmessages.po", "i18n/sv_SE/LC_MESSAGES/messages.po") = 0
stat("i18n/sv_SE/LC_MESSAGES/messages.po", {st_mode=S_IFREG|0664, st_size=26004, ...}) = 0
write(2, "updating catalog 'i18n/sv_SE/LC_"..., 83updating catalog 'i18n/sv_SE/LC_MESSAGES/messages.po' based on 'i18n/messages.pot'
open("i18n/sv_SE/LC_MESSAGES/messages.po", O_RDONLY) = 3
stat("/usr/local/lib/python2.7/dist-packages/babel/localedata/sv_SE.dat", {st_mode=S_IFREG|0644, st_size=389, ...}) = 0
open("i18n/sv_SE/LC_MESSAGES/tmpmessages.po", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
rename("i18n/sv_SE/LC_MESSAGES/tmpmessages.po", "i18n/sv_SE/LC_MESSAGES/messages.po") = 0

This time there was a change in the "messages.pot" file.

stat("i18n/sv_SE/LC_MESSAGES/messages.po", {st_mode=S_IFREG|0664, st_size=26004, ...}) = 0
write(2, "updating catalog 'i18n/sv_SE/LC_"..., 83updating catalog 'i18n/sv_SE/LC_MESSAGES/messages.po' based on 'i18n/messages.pot'
open("i18n/sv_SE/LC_MESSAGES/messages.po", O_RDONLY) = 3
stat("/usr/local/lib/python2.7/dist-packages/babel/localedata/sv_SE.dat", {st_mode=S_IFREG|0644, st_size=389, ...}) = 0
open("i18n/sv_SE/LC_MESSAGES/tmpmessages.po", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
rename("i18n/sv_SE/LC_MESSAGES/tmpmessages.po", "i18n/sv_SE/LC_MESSAGES/messages.po") = 0
stat("i18n/sv_SE/LC_MESSAGES/messages.po", {st_mode=S_IFREG|0664, st_size=26004, ...}) = 0
write(2, "updating catalog 'i18n/sv_SE/LC_"..., 83updating catalog 'i18n/sv_SE/LC_MESSAGES/messages.po' based on 'i18n/messages.pot'
open("i18n/sv_SE/LC_MESSAGES/messages.po", O_RDONLY) = 3
stat("/usr/local/lib/python2.7/dist-packages/babel/localedata/sv_SE.dat", {st_mode=S_IFREG|0644, st_size=389, ...}) = 0
open("i18n/sv_SE/LC_MESSAGES/tmpmessages.po", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
rename("i18n/sv_SE/LC_MESSAGES/tmpmessages.po", "i18n/sv_SE/LC_MESSAGES/messages.po") = 0
stat("i18n/sv_SE/LC_MESSAGES/messages.po", {st_mode=S_IFREG|0664, st_size=26065, ...}) = 0
write(2, "updating catalog 'i18n/sv_SE/LC_"..., 83updating catalog 'i18n/sv_SE/LC_MESSAGES/messages.po' based on 'i18n/messages.pot'
open("i18n/sv_SE/LC_MESSAGES/messages.po", O_RDONLY) = 3
stat("/usr/local/lib/python2.7/dist-packages/babel/localedata/sv_SE.dat", {st_mode=S_IFREG|0644, st_size=389, ...}) = 0
open("i18n/sv_SE/LC_MESSAGES/tmpmessages.po", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
rename("i18n/sv_SE/LC_MESSAGES/tmpmessages.po", "i18n/sv_SE/LC_MESSAGES/messages.po") = 0
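
The pattern in these traces can be checked mechanically. The sketch below classifies the three relevant calls the way the thread describes fabricate's rules (a write-mode open or a rename target counts as output, a read-only open as input). The regexes are modelled on the open/rename expressions quoted from the fabricate source further down, with the pid prefix made optional because the excerpt has no pid column; `classify` is a hypothetical helper and a simplification, not fabricate's own parser.

```python
import re

# Modelled on fabricate's open/rename patterns; the leading pid is optional here.
OPEN_RE = re.compile(r'(?:\d+\s+)?open\("(?P<name>[^"]*)", (?P<mode>[^,)]*)')
RENAME_RE = re.compile(r'(?:\d+\s+)?rename\("[^"]*", "(?P<name>[^"]*)"\)')

# The three file operations from the trace above.
TRACE = '''\
open("i18n/sv_SE/LC_MESSAGES/messages.po", O_RDONLY) = 3
open("i18n/sv_SE/LC_MESSAGES/tmpmessages.po", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
rename("i18n/sv_SE/LC_MESSAGES/tmpmessages.po", "i18n/sv_SE/LC_MESSAGES/messages.po") = 0
'''


def classify(trace):
    """Split the files touched in a trace into (inputs, outputs)."""
    inputs, outputs = set(), set()
    for line in trace.splitlines():
        opened = OPEN_RE.match(line)
        if opened:
            mode = opened.group("mode")
            if "O_WRONLY" in mode or "O_RDWR" in mode:
                outputs.add(opened.group("name"))
            else:
                inputs.add(opened.group("name"))
            continue
        renamed = RENAME_RE.match(line)
        if renamed:
            # The rename target is treated as a written file.
            outputs.add(renamed.group("name"))
    return inputs, outputs


inputs, outputs = classify(TRACE)
print(sorted(inputs & outputs))  # files seen as both input and output
```

messages.po is never opened for writing, yet it lands in the output set purely because it is the target of the rename.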

Simon Alford

Jan 27, 2015, 4:37:16 PM
to fabrica...@googlegroups.com

What you need is to ensure that *.po files are only treated as inputs. Currently the only way to do this is a custom Runner. I did this for a fabricate script that ran a compiler that "stat'ed" all files in a directory and all sub directories.

I don't have the source of that script to hand right now but will try to get an example of how to do this tomorrow. If I remember correctly, it just involved altering one of the regexes that match file operations. You will just have to alter any regex that matches output operations.

Simon.

Lex Trotman

Jan 27, 2015, 6:13:58 PM
to fabrica...@googlegroups.com
Looking at your strace below, it's doing the rename trick, which is going
to be very hard to detect as an update, i.e. it reads messages.po, writes
tmpmessages.po and then renames tmpmessages.po to messages.po. It
actually never writes messages.po.

The simpler pattern (read file, close file, write file, close file)
will have the same problem.

This is because done()
https://code.google.com/p/fabricate/source/browse/fabricate.py#1139
will add an `input-` entry for the read, then overwrite that entry with
an `output-` entry for the rename/write.

The "right" solution is for done() to store an `update-` type entry
in the dependencies if an `input-` entry exists when
it's about to write an `output-` entry, so that clean won't delete the
file, since it's not purely output. That entry should be treated as both
input and output when the deps are used. Then files that are opened R/W
need to create both a dep and an output entry, as I noted in my last post,
so they create an `input-` and then an `update-` entry in done().

Note there is no way of adding both an input and an output entry to
the deps for a filename, since it's a dict keyed by filename and so can
only have one entry. It has to be a separate `update-` entry.

That solution is actually not totally right, since which hash should it
use: the one for the file when it's input, or the one for the file
after it's updated? The output hash is probably the most
correct choice ... or is it? Should the command always be re-run if
the file changes?
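
A rough sketch of that `update-` idea, kept independent of fabricate's real done(): `record_entry`, `cleanable` and `fake_hash` are hypothetical names, and the hash is a stand-in computed from the file name rather than the file contents. The sketch stores whatever hash it is handed after the command has run, which is only one of the two options raised above and does not settle the question.

```python
import hashlib


def record_entry(inputs, outputs, hash_of):
    """Build one command's dependency entry.

    A file that was read and then written is tagged 'update-' instead of
    having its 'input-' entry silently replaced by 'output-'.
    """
    entry = {}
    for name in inputs:
        entry[name] = "input-" + hash_of(name)
    for name in outputs:
        tag = "update-" if name in entry else "output-"
        entry[name] = tag + hash_of(name)
    return entry


def cleanable(entry):
    """Only pure outputs are candidates for deletion; 'update-' files are kept."""
    return sorted(name for name, value in entry.items()
                  if value.startswith("output-"))


def fake_hash(name):
    # Stand-in for a content hash, so the example needs no files on disk.
    return hashlib.md5(name.encode("utf-8")).hexdigest()


po = "i18n/sv_SE/LC_MESSAGES/messages.po"
entry = record_entry(inputs=["i18n/messages.pot", po],
                     outputs=[po],
                     hash_of=fake_hash)
print(entry[po].split("-")[0])
print(cleanable(entry))
```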

Cheers
Lex

Simon Alford

Jan 28, 2015, 9:23:34 AM
to fabrica...@googlegroups.com
Hi again,

This is what I did to work around an issue in the closure-compiler. I have this at the start of my build script. As you can see, I extend the normal strace runner and disable stat matching by giving it a regex that does not match anything.

# Disable dependency checking on stat
# Works around problem of closure compiler stat-ing all files
class NoStatStraceRunner(StraceRunner):

        def __init__(self, builder, build_dir=None):
                super(NoStatStraceRunner, self).__init__(builder, build_dir=build_dir)
                self._stat_re = re.compile('^Do not match anything$')


# Set the runner to the custom runner
setup(runner=NoStatStraceRunner, dirs=['./'])


As Lex pointed out, your build uses open and rename to perform the changes on the file. The regexes for these are (from the fabricate source code):

    _open_re       = re.compile(r'(?P<pid>\d+)\s+open\("(?P<name>[^"]*)", (?P<mode>[^,)]*)')
    _rename_re     = re.compile(r'(?P<pid>\d+)\s+rename\("[^"]*", "(?P<name>[^"]*)"\)')

What you need to do is ensure that the rename regex does not match *.po files and that the open regex does not set <mode> to O_WRONLY or O_RDWR on *.po files. This will ensure that all *.po files are only treated as inputs and will never be cleaned. The input dependency will ensure commands are re-run if the file changes. You may also have to disable create matches.

# Disable output checking on *.po files
# Works around problem of *.po file being a mix of generated and source
class NoPoOutputRunner(StraceRunner):

  def __init__(self, builder, build_dir=None):
    super(NoPoOutputRunner, self).__init__(builder, build_dir=build_dir)
    self._open_re = re.compile(r'(?:(?P<pid>\d+)\s+open\("(?P<name>[^"]*\.po)", (?:O_WRONLY|O_RDWR)(?P<mode>[^,)]*))|(?:(?P<pid>\d+)\s+open\("(?P<name>[^"]*)", (?P<mode>[^,)]*))')
    self._rename_re = re.compile(r'(?P<pid>\d+)\s+rename\("[^"]*", "(?P<name>[^"]*)(?:(?:\.po){0,1})"\)')

# Set the runner to the custom runner
setup(runner=NoPoOutputRunner, dirs=['.'])

I think this will work. It's a bit of a hack. The regex for open will not capture the O_WRONLY or O_RDWR on *.po files. The one for rename is worse: it will not capture the full file name for .po files, so fabricate will assume the file has been deleted when it comes to calculating the hash. The approach for the rename may cause other problems if, for example, there are files called "messages" and "messages.po". If you are better at regexes than me, you may be able to invent a better one that does not match/capture at all. Note: I have tried the regexes on http://regexr.com/ but not in Python.
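
One thing to check before relying on the patterns above: Python's `re` module does not accept a pattern that defines the same named group twice, so an open regex with two alternatives that each contain `(?P<pid>...)` may fail to compile there. The sketch below first demonstrates that, then shows one possible variant that does not match at all for the *.po cases, using a negative lookahead for write-mode opens and a negative lookbehind for rename targets. The names `PO_SAFE_OPEN_RE` and `PO_SAFE_RENAME_RE` are hypothetical and the patterns are untested against fabricate itself. Note that tmpmessages.po also ends in .po, so its write-mode open is skipped as well.

```python
import re

# Python's re rejects the same group name appearing twice in one pattern.
try:
    re.compile(r'(?P<pid>\d+) a|(?P<pid>\d+) b')
    duplicate_names_allowed = True
except re.error:
    duplicate_names_allowed = False
print(duplicate_names_allowed)

# Same groups as the original open pattern, but refuse to match when a
# *.po file is opened with O_WRONLY or O_RDWR.
PO_SAFE_OPEN_RE = re.compile(
    r'(?P<pid>\d+)\s+'
    r'(?!open\("[^"]*\.po", [^,)]*(?:O_WRONLY|O_RDWR))'
    r'open\("(?P<name>[^"]*)", (?P<mode>[^,)]*)')

# Same groups as the original rename pattern, but refuse to match when the
# rename target ends in .po.
PO_SAFE_RENAME_RE = re.compile(
    r'(?P<pid>\d+)\s+rename\("[^"]*", "(?P<name>[^"]*)(?<!\.po)"\)')

lines = [
    '1234  open("i18n/sv_SE/LC_MESSAGES/messages.po", O_RDONLY) = 3',
    '1234  open("i18n/sv_SE/LC_MESSAGES/tmpmessages.po", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3',
    '1234  rename("i18n/sv_SE/LC_MESSAGES/tmpmessages.po", "i18n/sv_SE/LC_MESSAGES/messages.po") = 0',
    '1234  open("out/app.js", O_WRONLY|O_CREAT, 0666) = 3',
]
for line in lines:
    matched = bool(PO_SAFE_OPEN_RE.match(line) or PO_SAFE_RENAME_RE.match(line))
    print(matched, line)
```

Whether assigning such patterns in a runner subclass is enough depends on how fabricate looks them up, so treat this as a starting point for experiments rather than a drop-in fix.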

Hope this helps, or at least gives you an idea how to work around it.

As a proper fix, fabricate really needs a method of handing it a list of files or patterns that should not be deleted by the autoclean() function. Take a look at autoclean in fabricate.py. It's only 20 lines long. You may find it easier to hack that to take a list of files to avoid deleting.
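
As an illustration of what such a keep list could look like, here is a minimal standalone sketch. It is not fabricate's autoclean(); `clean_outputs` and its `keep_patterns` and `remove` parameters are hypothetical, and the list of outputs is assumed to come from wherever the build tool records them. Patterns are matched with `fnmatch`, whose `*` also matches `/`, so `'*.po'` covers files in subdirectories.

```python
import fnmatch
import os


def clean_outputs(outputs, keep_patterns=(), remove=os.remove):
    """Remove each output unless it matches a keep pattern; return what was removed."""
    removed = []
    for name in outputs:
        if any(fnmatch.fnmatch(name, pattern) for pattern in keep_patterns):
            continue  # protected file, e.g. a hand-translated .po catalog
        remove(name)
        removed.append(name)
    return removed


outputs = [
    "i18n/messages.pot",
    "i18n/sv_SE/LC_MESSAGES/messages.po",
    "i18n/sv_SE/LC_MESSAGES/messages.mo",
]
# Dry run: report instead of deleting anything.
clean_outputs(outputs, keep_patterns=["*.po"],
              remove=lambda name: print("would remove", name))
```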

Simon.