
how to simulate tar filename substitution across piped subprocess.Popen() calls?


jkn

Nov 8, 2012, 1:05:11 PM
Hi All
I am trying to build up a set of subprocess.Popen calls to
replicate the effect of a horribly long shell command. I'm not clear
how I can do one part of this and wonder if anyone can advise. I'm on
Linux, fairly obviously.

I have a command which (simplified) is a tar -c command piped through
to xargs:

tar -czvf myfile.tgz -c $MYDIR mysubdir/ | xargs -I '{}' sh -c "test -f $MYDIR/'{}'"

(The full command is more complicated than this; I got it from a shell
guru).

IIUC, when called like this, the two occurrences of '{}' in the xargs
command will get replaced with the file being added to the tarfile.

Also IIUC, I will need two calls to subprocess.Popen() and use
subprocess.stdin on the second to receive the output from the first.
But how can I achieve the substitution of the '{}' construction across
these two calls?

Apologies if I've made any howlers in this description - it's very
likely...

Cheers
J^n






Hans Mulder

Nov 8, 2012, 8:12:42 PM
On 8/11/12 19:05:11, jkn wrote:
> Hi All
> I am trying to build up a set of subprocess.Popen calls to
> replicate the effect of a horribly long shell command. I'm not clear
> how I can do one part of this and wonder if anyone can advise. I'm on
> Linux, fairly obviously.
>
> I have a command which (simplified) is a tar -c command piped through
> to xargs:
>
> tar -czvf myfile.tgz -c $MYDIR mysubdir/ | xargs -I '{}' sh -c "test -f $MYDIR/'{}'"
>
> (The full command is more complicated than this; I got it from a shell
> guru).
>
> IIUC, when called like this, the two occurrences of '{}' in the xargs
> command will get replaced with the file being added to the tarfile.
>
> Also IIUC, I will need two calls to subprocess.Popen() and use
> subprocess.stdin on the second to receive the output from the first.
> But how can I achieve the substitution of the '{}' construction across
> these two calls?

That's what 'xargs' will do for you. All you need to do is invoke
xargs with arguments containing '{}', i.e. something like:

cmd1 = ['tar', '-czvf', 'myfile.tgz', '-c', mydir, 'mysubdir']
first_process = subprocess.Popen(cmd1, stdout=subprocess.PIPE)

cmd2 = ['xargs', '-I', '{}', 'sh', '-c', "test -f %s/'{}'" % mydir]
second_process = subprocess.Popen(cmd2, stdin=first_process.stdout)

> Apologies if I've made any howlers in this description - it's very
> likely...

I think the second '-c' argument to tar should have been a '-C'.

I'm not sure I understand what the second command is trying to
achieve. On my system, nothing happens, because tar writes the
names of the files it is adding to stderr, so xargs receives no
input at all. If I send the stderr from tar to the stdin of
xargs, then it still doesn't seem to do anything sensible.
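
Which stream tar uses for the -v listing seems to differ between systems, as this exchange suggests. A minimal sketch, assuming you simply want the names whichever stream they arrive on: merge the child's stderr into its stdout with stderr=subprocess.STDOUT. The helper name verbose_lines is hypothetical, and merging the streams also mixes any genuine error messages into the listing.

```python
import subprocess

def verbose_lines(cmd):
    """Run cmd and return its combined stdout/stderr as a list of lines."""
    proc = subprocess.Popen(cmd,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)  # stderr joins stdout
    output, _ = proc.communicate()  # reads everything, then waits for exit
    return output.decode().splitlines()
```

It could be called as verbose_lines(['tar', '-czvf', 'myfile.tgz', '-C', mydir, 'mysubdir']), with mydir defined by the caller.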

Perhaps your real xargs command is more complicated and more
sensible.



Hope this helps,

-- HansM

jkn

Nov 12, 2012, 7:55:23 AM
Hi Hans
thanks a lot for your reply:

> That's what 'xargs' will do for you.  All you need to do, is invoke
> xargs with arguments containing '{}'.  I.e., something like:
>
> cmd1 = ['tar', '-czvf', 'myfile.tgz', '-c', mydir, 'mysubdir']
> first_process = subprocess.Popen(cmd1, stdout=subprocess.PIPE)
>
> cmd2 = ['xargs', '-I', '{}', 'sh', '-c', "test -f %s/'{}'" % mydir]
> second_process = subprocess.Popen(cmd2, stdin=first_process.stdout)
>

Hmm - that's pretty much what I've been trying. I will have to
experiment a bit more and post the results in a bit more detail.

> > Apologies if I've made any howlers in this description - it's very
> > likely...
>

> I think the second '-c' argument to tar should have been a '-C'.

You are correct, thanks. Serves me right for typing the simplified
version in by hand. I actually use the equivalent "--directory=..." in
the actual code.

> I'm not sure I understand what the second command is trying to
> achieve.  On my system, nothing happens, because tar writes the
> names of the files it is adding to stderr, so xargs receives no
> input at all.  If I send the stderr from tar to the stdin of
> xargs, then it still doesn't seem to do anything sensible.

That's interesting ... on my system, and all others that I know about,
the file list goes to stdout.

> Perhaps your real xargs command is more complicated and more
> sensible.

Yes, in fact the output from xargs is piped to a third process. But I
realise this doesn't alter the result of your experiment; the xargs
process should filter a subset of the files being fed to it.

I will experiment a bit more and hopefully post some results. Thanks
in the meantime...

Regards
Jon N

jkn

Nov 12, 2012, 10:36:58 AM
slight followup ...

I have made some progress; for now I'm using Popen.communicate() to
read the output from the first subprocess, then writing it into the
second subprocess. This way I at least get to see what is
happening ...
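
A rough sketch of that two-step approach, assuming both commands are given as argument lists. The function name run_in_two_steps is made up for illustration; the whole listing is held in memory, which is fine for debugging but not a true streaming pipe.

```python
import subprocess

def run_in_two_steps(cmd1, cmd2):
    """Run cmd1, keep its output, then feed that output to cmd2."""
    first = subprocess.Popen(cmd1, stdout=subprocess.PIPE)
    listing, _ = first.communicate()          # all of cmd1's stdout, as bytes

    second = subprocess.Popen(cmd2,
                              stdin=subprocess.PIPE,
                              stdout=subprocess.PIPE)
    result, _ = second.communicate(listing)   # write listing, read the reply
    return listing, result
```

Returning the intermediate listing as well as the final result makes it easy to print both and see what each stage produced.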

The reason 'we' weren't seeing any output from the second call (the
'xargs') is that as mentioned I had simplified this. The actual shell
command was more like (in python-speak):

"xargs -I {} sh -c \"test -f %s/{} && md5sum %s/{}\"" % (mydir, mydir)

ie. I am running md5sum on each tar-file entry which passes the 'is
this a file' test.

My next problem; how to translate the command-string clause

"test -f %s/{} && md5sum %s/{}" # ...

into a parameter to subprocess.Popen(). I think it's the command
chaining '&&' which is tripping me up...

Cheers
J^n



Hans Mulder

Nov 12, 2012, 11:35:43 AM
It is not really necessary to translate the '&&': you can
just write:

"test -f '%s/{}' && md5sum '%s/{}'" % (mydir, mydir)

, and xargs will pass that to the shell, and then the shell
will interpret the '&&' for you: you have shell=False in your
subprocess.Popen call, but the arguments to xargs are -I {}
sh -c "....", and this means that xargs ends up invoking the
shell (after replacing the {} with the name of a file).

Alternatively, you could translate it as:

"if [ -f '%s/{}' ]; then md5sum '%s/{}'; fi" % (mydir, mydir)

; that might make the intent clearer to whoever gets to
maintain your code.

Rebelo

Nov 12, 2012, 11:58:16 AM
On Thursday, 8 November 2012 19:05:12 UTC+1, user jkn wrote:
> Hi All
> I am trying to build up a set of subprocess.Popen calls to
> replicate the effect of a horribly long shell command. I'm not clear
> how I can do one part of this and wonder if anyone can advise. I'm on
> Linux, fairly obviously.
>
> J^n

You should try to do it in pure python, avoiding shell altogether.
The first step would be to actually write what it is you want to do.

To filter the files you want to add to the tar file, check tarfile (http://docs.python.org/2/library/tarfile.html?highlight=tar#module-tarfile),
specifically:
TarFile.add(name, arcname=None, recursive=True, exclude=None, filter=None)
which takes a filter parameter:
"If filter is specified it must be a function that takes a TarInfo object argument and returns the changed TarInfo object. If it instead returns None the TarInfo object will be excluded from the archive."

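A small sketch of the filter hook described in the quoted documentation, assuming a made-up rule (leave out names ending in ".tmp") purely for illustration. The names skip_tmp and make_archive are hypothetical; the filter callback returns None to exclude an entry and the TarInfo object to keep it.

```python
import os
import tarfile

def skip_tmp(tarinfo):
    """Filter callback: return None to leave *.tmp entries out of the archive."""
    if tarinfo.name.endswith(".tmp"):
        return None
    return tarinfo

def make_archive(archive_path, base_dir, subdir):
    """Archive base_dir/subdir as subdir/..., then return the stored names."""
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(os.path.join(base_dir, subdir), arcname=subdir, filter=skip_tmp)
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()
```

Passing arcname=subdir plays roughly the role of tar's -C option here: the stored names are relative to base_dir rather than absolute.
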
jkn

Nov 12, 2012, 12:22:44 PM
Hi Hans
Yes to both points; turns out that my problem was in building up the
command sequence to subprocess.Popen() - when to use, and not use,
quotes etc. It has ended up as (spelled out in longhand...)


xargsproc = ['xargs']

xargsproc.append('-I')
xargsproc.append("{}")

xargsproc.append('sh')
xargsproc.append('-c')

xargsproc.append("test -f %s/{} && md5sum %s/{}" % (mydir, mydir))


As usual, breaking it all down for the purposes of clarification has
helped a lot, as has your input. Thanks a lot.

Cheers
Jon N

jkn

Nov 12, 2012, 12:25:01 PM
On Nov 12, 4:58 pm, Rebelo <puntabl...@gmail.com> wrote:
> On Thursday, 8 November 2012 19:05:12 UTC+1, user jkn wrote:
>
> > Hi All
> > I am trying to build up a set of subprocess.Popen calls to
> > replicate the effect of a horribly long shell command. I'm not clear
> > how I can do one part of this and wonder if anyone can advise. I'm on
> > Linux, fairly obviously.
> >
> > J^n
>
> You should try to do it in pure python, avoiding shell altogether.
> The first step would be to actually write what it is you want to do.
>

Hi Rebelo
FWIW I intend to do exactly this - but I wanted to duplicate the
existing shell action beforehand, so that I could get rid of the shell
command.

After I've tidied things up, that will be my next step.

Cheers
Jon N



Hans Mulder

Nov 12, 2012, 1:30:05 PM
This will break if there are spaces in the file name, or other
characters meaningful to the shell. If you change it to

xargsproc.append("test -f '%s/{}' && md5sum '%s/{}'"
% (mydir, mydir))

, then it will only break if there are single quotes in the file name.
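
One way to sidestep the shell-quoting question, offered as a sketch rather than as what the thread settled on: keep the sh -c script fixed and pass the path as a separate argument, so the shell sees it as "$1" and never parses it as code. build_xargs_cmd is a hypothetical helper. xargs itself may still treat quotes and backslashes in its input specially, so this only removes the shell's layer of interpretation.

```python
def build_xargs_cmd(mydir):
    """Return an argv list for xargs that hands each path to sh as "$1"."""
    script = 'test -f "$1" && md5sum "$1"'   # fixed text, no file name pasted in
    # Arguments after the script become $0, $1, ... inside `sh -c`,
    # so 'sh' is $0 and the substituted path is $1.
    return ['xargs', '-I', '{}', 'sh', '-c', script, 'sh', mydir + '/{}']
```

The resulting list can be passed to subprocess.Popen() in place of a list whose script embeds '%s/{}' directly.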

As I understand, your plan is to rewrite this bit in pure Python, to
get rid of any and all such problems.

> As usual, breaking it all down for the purposes of clarification has
> helped a lot, as has your input. Thanks a lot.

You're welcome.

-- HansM


jkn

Nov 12, 2012, 4:43:56 PM
Hi Hans

[...]
>
> >         xargsproc.append("test -f %s/{} && md5sum %s/{}" % (mydir, mydir))
>
> This will break if there are spaces in the file name, or other
> characters meaningful to the shell.  If you change it to
>
>         xargsproc.append("test -f '%s/{}' && md5sum '%s/{}'"
>                              % (mydir, mydir))
>
> , then it will only break if there are single quotes in the file name.

Fair point. As it happens, I know that there are no 'unhelpful'
characters in the filenames ... but it's still worth doing.

>
> As I understand, your plan is to rewrite this bit in pure Python, to
> get rid of any and all such problems.

Yep - as mentioned in another reply I wanted first to have something
which duplicated the current action (which has taken longer than I
expected), and then rework in a more pythonic way.

Still, I've learned some things about the subprocess module, and also
about the shell, so it's been far from wasted time.

Regards
Jon N

Thomas Rachel

Nov 13, 2012, 1:52:30 PM
On 09.11.2012 02:12, Hans Mulder wrote:

> That's what 'xargs' will do for you. All you need to do, is invoke
> xargs with arguments containing '{}'. I.e., something like:
>
> cmd1 = ['tar', '-czvf', 'myfile.tgz', '-c', mydir, 'mysubdir']
> first_process = subprocess.Popen(cmd1, stdout=subprocess.PIPE)
>
> cmd2 = ['xargs', '-I', '{}', 'sh', '-c', "test -f %s/'{}'" % mydir]
> second_process = subprocess.Popen(cmd2, stdin=first_process.stdout)

After launching second_process, it might be useful to call
first_process.stdout.close(). If you fail to do so, your own process remains a
second reader of the pipe, which might break things.

At least, I once had issues with it; I currently cannot recall
what they were or how they arose; maybe it was just the open
file descriptor that annoyed me.
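
A sketch of the two-process pipeline with that close() call added, following the pattern the subprocess documentation gives for replacing a shell pipeline. The function name pipe is hypothetical. Closing the parent's copy of the pipe means the first process can receive SIGPIPE if the second one exits early.

```python
import subprocess

def pipe(cmd1, cmd2):
    """Run the equivalent of `cmd1 | cmd2` and return cmd2's output."""
    first = subprocess.Popen(cmd1, stdout=subprocess.PIPE)
    second = subprocess.Popen(cmd2,
                              stdin=first.stdout,
                              stdout=subprocess.PIPE)
    first.stdout.close()            # parent no longer holds the read end
    output, _ = second.communicate()
    first.wait()                    # reap the first child as well
    return output, first.returncode, second.returncode
```

Returning both return codes lets the caller notice a failure in either stage, which a plain shell pipeline hides by default.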


Thomas

Thomas Rachel

Nov 13, 2012, 4:36:47 PM
On 12.11.2012 19:30, Hans Mulder wrote:

> This will break if there are spaces in the file name, or other
> characters meaningful to the shell. If you change it to
>
> xargsproc.append("test -f '%s/{}' && md5sum '%s/{}'"
> % (mydir, mydir))
>
> , then it will only break if there are single quotes in the file name.

And if you do mydir_q = mydir.replace("'", "'\\''") and use mydir_q, you
should be safe...
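
Spelled out as a small sketch, with a hypothetical directory name and helper name: each single quote is replaced by "close the quote, escaped quote, reopen the quote". Note that this only protects mydir; the names that xargs substitutes for {} are not touched by it.

```python
def single_quote_safe(text):
    """Escape text for use between single quotes in a POSIX shell command."""
    # ' becomes '\'' : end the quoted string, add a literal ', start a new one
    return text.replace("'", "'\\''")

mydir = "/tmp/jon's dir"              # hypothetical directory name
mydir_q = single_quote_safe(mydir)
script = "test -f '%s/{}' && md5sum '%s/{}'" % (mydir_q, mydir_q)
```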


Thomas

Hans Mulder

Nov 14, 2012, 3:22:52 AM
The problem isn't single quotes in mydir, but single quotes in the
file names that 'tar' generates and 'xargs' consumes. In the shell
script, these names go directly from tar to xargs via a pipe. If the
OP wants to do your replace, his script would have to read the output
of tar and do the replace before passing the filenames down a second
pipe to xargs.

However, once he does that, it's simpler to cut out xargs and invoke
"sh" directly. Or even cut out "sh" and "test" and instead use
os.path.isfile and then call md5sum directly. And once he does that,
he no longer needs to worry about single quotes.
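
A minimal sketch of that last step, with one substitution named plainly: it uses hashlib.md5 from the standard library instead of calling the md5sum program. The function name md5_regular_files is hypothetical, and names is assumed to be the list of archive member names relative to mydir. No shell is involved, so spaces and quotes in file names need no special handling.

```python
import hashlib
import os

def md5_regular_files(mydir, names):
    """Yield (hex digest, path) for each name under mydir that is a regular file."""
    for name in names:
        path = os.path.join(mydir, name)
        if not os.path.isfile(path):      # same check as `test -f`
            continue
        digest = hashlib.md5()
        with open(path, 'rb') as fh:
            # read in chunks so large files are not loaded whole
            for chunk in iter(lambda: fh.read(65536), b''):
                digest.update(chunk)
        yield digest.hexdigest(), path
```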

The OP has said he's going to do all that. One step at a time.
That sounds like a sensible plan to me.

jkn

Nov 18, 2012, 6:17:48 PM
Hi Hans

[...]
> However, once he does that, it's simpler to cut out xargs and invoke
> "sh" directly. Or even cut out "sh" and "test" and instead use
> os.path.isfile and then call md5sum directly. And once he does that,
> he no longer needs to worry about single quotes.

Yes indeed, using os.path.isfile() and then calling md5sum directly is my plan ... for reasons of maintainability (by myself) more than anything else.

> The OP has said he's going to do all that. One step at a time.
> That sounds like a sensible plan to me.

Thanks a lot.

J^n