creating size-limited tar files
Newsgroups: comp.lang.python
From: andrea crotti <andrea.crott...@gmail.com>
Date: Wed, 7 Nov 2012 17:13:47 +0000
Local: Wed, Nov 7 2012 12:13 pm
Subject: creating size-limited tar files
Simple problem: given a lot of data in many files/directories, I
need to create a tar file split into chunks no larger than a given size.

The simplest way would be to compress the whole thing and then split.

At the moment the actual script which I'm replacing is doing a
"system('split..')", which is not that great, so I would like to do it
while compressing.

So I thought about (in pseudocode)

while remaining_files:
    tar_file.addfile(remaining_files.pop())
    if size(tar_file) >= limit:
         close(tar_file)
         tar_file = new_tar_file()

which might work maybe, but how do I get the current size?  There
should be tarinfo.size but it doesn't exist on a TarFile opened in
write mode, so should I do a stat after each flush?

Any other better ideas otherwise?
thanks
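
A minimal runnable sketch of the pseudocode loop above. It assumes uncompressed archives opened on an explicitly opened file object, so that the file object's tell() reports how many bytes have been written so far. The function name and the "-000.tar" naming scheme are illustrative only. Like the pseudocode, it starts a new archive only after the limit has been reached, so a chunk can overshoot by up to one member.

```python
import tarfile


def write_size_limited_tars(paths, base_name, limit):
    """Write paths into base_name-000.tar, base_name-001.tar, ...

    A new archive is started once the current one has reached `limit`
    bytes, so this is a soft limit, not a hard one.
    """
    created = []
    raw = tar = None
    for path in paths:
        if tar is None:
            name = "%s-%03d.tar" % (base_name, len(created))
            created.append(name)
            raw = open(name, "wb")
            # Plain "w" mode writes straight to raw, so raw.tell() is
            # the current archive size.
            tar = tarfile.open(fileobj=raw, mode="w")
        tar.add(path)
        if raw.tell() >= limit:
            tar.close()   # writes the end-of-archive blocks
            raw.close()   # tarfile does not close a file object it was given
            raw = tar = None
    if tar is not None:
        tar.close()
        raw.close()
    return created
```

With a compressed mode such as "w:gz" the position of the underlying file lags behind what has been added, so this check would be less precise there.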


 
Newsgroups: comp.lang.python
From: Neil Cerutti <ne...@norwich.edu>
Date: 7 Nov 2012 18:40:14 GMT
Local: Wed, Nov 7 2012 1:40 pm
Subject: Re: creating size-limited tar files
On 2012-11-07, andrea crotti <andrea.crott...@gmail.com> wrote:

I have not used this module before, but what you seem to be
asking about is:

TarFile.gettarinfo().size

But your algorithm stops after the file is already too big.

--
Neil Cerutti


 
Newsgroups: comp.lang.python
From: Alexander Blinne <n...@blinne.net>
Date: Wed, 07 Nov 2012 20:05:30 +0100
Local: Wed, Nov 7 2012 2:05 pm
Subject: Re: creating size-limited tar files
I don't know the best way to find the current size, I only have a
general remark.
This solution is not so good if you have to impose a hard limit on the
resulting file size. You could end up having a tar file of size "limit +
size of biggest file - 1 + overhead" in the worst case if the tar is at
limit - 1 and the next file is the biggest file. Of course that may be
acceptable in many cases or it may be acceptable to do something about
it by adjusting the limit.

My Idea:
Assuming tar_file works on some object with a file-like interface one
could implement a "transparent splitting file" class which would have to
use some kind of buffering mechanism. It would represent a virtual big
file that is stored in many pieces of fixed size (except the last) and
would allow you to just add all files to one tar_file and have it split
up transparently by the underlying file-object, something like

tar_file = TarFile(SplittingFile(names='archiv.tar-%03d', chunksize=chunksize, mode='wb'))
while remaining_files:
    tar_file.addfile(remaining_files.pop())

and the splitting_file would automatically create chunks with size
chunksize and filenames archiv.tar-001, archiv.tar-002, ...

The same class could be used to put it back together, it may even
implement transparent seeking over a set of pieces of a big file. I
would like to have such a class around for general usage.
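
A rough sketch of the write side of such a class, under stated assumptions: SplittingWriter, its pattern argument and its names attribute are hypothetical, it only supports writing, and it is meant to be handed to tarfile.open() through the fileobj argument in a stream mode such as "w|gz", which only needs a write() method on the target.

```python
import io
import tarfile


class SplittingWriter:
    """Write-only file-like object spreading bytes over fixed-size chunks.

    Hypothetical helper: `pattern` is a %-format string that takes the
    chunk number, e.g. 'archiv.tar-%03d'.
    """

    def __init__(self, pattern, chunksize):
        self.pattern = pattern
        self.chunksize = chunksize
        self.names = []        # chunk file names created so far
        self._current = None   # currently open chunk
        self._room = 0         # bytes still free in the current chunk
        self._written = 0      # total bytes accepted

    def _next_chunk(self):
        if self._current is not None:
            self._current.close()
        name = self.pattern % (len(self.names) + 1)
        self.names.append(name)
        self._current = open(name, "wb")
        self._room = self.chunksize

    def write(self, data):
        view = memoryview(data)
        total = len(view)
        while len(view):
            if self._room == 0:
                self._next_chunk()
            piece = view[:self._room]
            self._current.write(piece)
            self._room -= len(piece)
            view = view[len(piece):]
        self._written += total
        return total

    def tell(self):
        return self._written

    def close(self):
        if self._current is not None:
            self._current.close()
            self._current = None
```

Usage would look roughly like `tar = tarfile.open(fileobj=SplittingWriter('archiv.tar-%03d', chunksize), mode='w|gz')`, followed by the usual tar.add() calls, tar.close() and then close() on the writer. Every chunk except the last is exactly chunksize bytes, and concatenating the chunks in order gives back the original stream.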

greetings


 
Newsgroups: comp.lang.python
From: Roy Smith <r...@panix.com>
Date: Wed, 07 Nov 2012 15:32:23 -0500
Local: Wed, Nov 7 2012 3:32 pm
Subject: Re: creating size-limited tar files
In article <509ab0fa$0$6636$9b4e6...@newsspool2.arcor-online.net>,
 Alexander Blinne <n...@blinne.net> wrote:

> I don't know the best way to find the current size, I only have a
> general remark.
> This solution is not so good if you have to impose a hard limit on the
> resulting file size. You could end up having a tar file of size "limit +
> size of biggest file - 1 + overhead" in the worst case if the tar is at
> limit - 1 and the next file is the biggest file. Of course that may be
> acceptable in many cases or it may be acceptable to do something about
> it by adjusting the limit.

If you truly have a hard limit, one possible solution would be to use
tell() to checkpoint the growing archive after each addition.  If adding
a new file unexpectedly causes you to exceed your hard limit, you can
seek() back to the previous spot and truncate the file there.

Whether this is worth the effort is an exercise left for the reader.


 
Newsgroups: comp.lang.python
From: Andrea Crotti <andrea.crott...@gmail.com>
Date: Wed, 07 Nov 2012 21:52:18 +0000
Local: Wed, Nov 7 2012 4:52 pm
Subject: Re: creating size-limited tar files
On 11/07/2012 08:32 PM, Roy Smith wrote:

So I'm not sure whether it's a hard limit or not, but I'll check tomorrow.
But in general, for the size I could also take the sizes of the input files,
estimate the total, and put as many files into each tar file as should fit.
With compression the result might end up much smaller than the limit, but it
would be much easier..
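
A sketch of that estimate for uncompressed archives. It assumes every member costs one 512-byte header block plus its data rounded up to whole blocks (long or non-ASCII names and extended headers would add more), and that the finished archive is padded to a multiple of the default 10240-byte record size. The function names are made up for illustration; the sizes could come from os.path.getsize().

```python
BLOCK = 512           # tar block size
RECORD = 20 * BLOCK   # default record size; archives are padded to a multiple


def _round_up(n, multiple):
    return -(-n // multiple) * multiple


def estimated_member_size(file_size):
    """One header block plus the data rounded up to whole blocks."""
    return BLOCK + _round_up(file_size, BLOCK)


def estimated_archive_size(members_total):
    """Members plus two end-of-archive blocks, padded to a full record."""
    return _round_up(members_total + 2 * BLOCK, RECORD)


def plan_chunks(files, limit):
    """Greedily group (path, size) pairs so each group's estimate fits limit.

    A single file whose own estimate exceeds limit still gets a group
    to itself.
    """
    groups, current, members_total = [], [], 0
    for path, size in files:
        need = estimated_member_size(size)
        if current and estimated_archive_size(members_total + need) > limit:
            groups.append(current)
            current, members_total = [], 0
        current.append(path)
        members_total += need
    if current:
        groups.append(current)
    return groups
```

Because the check happens before a file is added, the estimate never goes over the limit unless one file alone is too big. With gzip on top, the real chunks will usually come out smaller than planned.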

But the other problem is that at the moment the people that get our
chunks reassemble the file with a simple:

cat file1.tar.gz file2.tar.gz > file.tar.gz

which I suppose is not going to work if I create two different tar files,
since each of them would get its own header, right?
So either I also provide a script to reassemble everything, or I have to split
in a more "brutal" way..

Maybe after all doing the final split was not too bad, I'll first check
if it's actually more expensive for the filesystem (which is very very slow)
or it's not a big deal...
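
A small in-memory experiment illustrating the concern about concatenating independent archives. It builds two separate .tar.gz archives, joins their bytes the way cat would, and lists the members with Python's tarfile. The helper names are made up for illustration. By default reading stops at the end-of-archive marker of the first archive; tarfile's ignore_zeros option skips the empty blocks and carries on.

```python
import io
import tarfile


def make_tar_gz(files):
    """Return the bytes of a .tar.gz holding the {name: content} entries."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


def member_names(data, **kwargs):
    """List member names of a .tar.gz given as bytes."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz", **kwargs) as tar:
        return tar.getnames()


# Equivalent of: cat file1.tar.gz file2.tar.gz > file.tar.gz
joined = make_tar_gz({"a.txt": b"first"}) + make_tar_gz({"b.txt": b"second"})
print(member_names(joined))                     # default: stops after the first archive
print(member_names(joined, ignore_zeros=True))  # skips the empty blocks in between
```

Chunks produced by splitting one single archive stream do not have this problem, since cat simply restores the original bytes.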


 
Newsgroups: comp.lang.python
From: Oscar Benjamin <oscar.j.benja...@gmail.com>
Date: Wed, 7 Nov 2012 23:15:14 +0000
Local: Wed, Nov 7 2012 6:15 pm
Subject: Re: creating size-limited tar files
On 7 November 2012 21:52, Andrea Crotti <andrea.crott...@gmail.com> wrote:

Correct. But if you read the rest of Alexander's post you'll find a
suggestion that would work in this case and that can guarantee to give
files of the desired size.

You just need to define your own class that implements a write()
method and then distributes any data it receives to separate files.
You can then pass this as the fileobj argument to the tarfile.open
function:
http://docs.python.org/2/library/tarfile.html#tarfile.open

Oscar


 
Newsgroups: comp.lang.python
From: andrea crotti <andrea.crott...@gmail.com>
Date: Thu, 8 Nov 2012 10:11:44 +0000
Local: Thurs, Nov 8 2012 5:11 am
Subject: Re: creating size-limited tar files
2012/11/7 Oscar Benjamin <oscar.j.benja...@gmail.com>:

> Correct. But if you read the rest of Alexander's post you'll find a
> suggestion that would work in this case and that can guarantee to give
> files of the desired size.

> You just need to define your own class that implements a write()
> method and then distributes any data it receives to separate files.
> You can then pass this as the fileobj argument to the tarfile.open
> function:
> http://docs.python.org/2/library/tarfile.html#tarfile.open

> Oscar

Yes yes I saw the answer, but now I was thinking that what I need is
simply this:
tar czpvf - /path/to/archive | split -d -b 100M - tardisk

since it should run only on Linux it's probably way easier, my script
will then only need to create the list of files to tar..

The only doubt is whether this is more or less reliable than doing it in
Python; when can this fail with some bad broken pipe?
(the filesystem is not very good, as I said, and it's mounted with NFS)


 
Newsgroups: comp.lang.python
From: andrea crotti <andrea.crott...@gmail.com>
Date: Thu, 8 Nov 2012 10:29:36 +0000
Local: Thurs, Nov 8 2012 5:29 am
Subject: Re: creating size-limited tar files
2012/11/8 andrea crotti <andrea.crott...@gmail.com>:

> Yes yes I saw the answer, but now I was thinking that what I need is
> simply this:
> tar czpvf - /path/to/archive | split -d -b 100M - tardisk

> since it should run only on Linux it's probably way easier, my script
> will then only need to create the list of files to tar..

> The only doubt is whether this is more or less reliable than doing it in
> Python; when can this fail with some bad broken pipe?
> (the filesystem is not very good, as I said, and it's mounted with NFS)

In the meantime I tried a couple of things, and using the pipe on
Linux actually works very nicely; it's even slightly faster than a plain
tar for some reason..

[andrea@andreacrotti isos]$ time tar czpvf - file1.avi file2.avi |
split -d -b 1000M - inchunks
file1.avi
file2.avi

real    1m39.242s
user    1m14.415s
sys     0m7.140s

[andrea@andreacrotti isos]$ time tar czpvf total.tar.gz file1.avi file2.avi
file1.avi
file2.avi

real    1m41.190s
user    1m13.849s
sys     0m5.723s

[andrea@andreacrotti isos]$ time split -d -b 1000M total.tar.gz inchunks

real    0m55.282s
user    0m0.020s
sys     0m3.553s


 
Newsgroups: comp.lang.python
From: andrea crotti <andrea.crott...@gmail.com>
Date: Fri, 9 Nov 2012 10:39:21 +0000
Local: Fri, Nov 9 2012 5:39 am
Subject: Re: creating size-limited tar files
Anyway, in the meantime I implemented the tar and split approach as shown below.
It works very well and it's probably much faster, but the downside is that
I give away control to tar and split..

import logging
import subprocess
from glob import glob
from os import remove

logger = logging.getLogger(__name__)


def tar_and_split(inputfile, output, bytes_size=None):
    """Take the file containing all the files to compress, the bytes
    desired for the split and the base name of the output file.

    Return True on success, False if the command failed.
    """
    # cleanup first: remove leftovers of a previous run
    for fname in glob(output + "*"):
        logger.debug("Removing old file %s" % fname)
        remove(fname)

    # write to stdout when splitting, otherwise to a single .tar.gz
    out = '-' if bytes_size else (output + '.tar.gz')
    cmd = "tar czpf {} $(cat {})".format(out, inputfile)
    if bytes_size:
        cmd += " | split -b {} -d - {}".format(bytes_size, output)

    logger.info("Running command %s" % cmd)

    proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, err = proc.communicate()
    if err:
        logger.error("Got error messages %s" % err)

    logger.info("Output %s" % out)

    if proc.returncode != 0:
        logger.error("Something failed running %s, need to re-run" % cmd)
        return False
    return True


 
Newsgroups: comp.lang.python
From: andrea crotti <andrea.crott...@gmail.com>
Date: Tue, 13 Nov 2012 10:31:32 +0000
Local: Tues, Nov 13 2012 5:31 am
Subject: Re: creating size-limited tar files
2012/11/9 andrea crotti <andrea.crott...@gmail.com>:

There is another problem with this solution, if I run something like
this with Popen:
    cmd = "tar {bigc} -czpf - --files-from {inputfile} | split -b {bytes_size} -d - {output}"

    proc = subprocess.Popen(to_run, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

the proc.returncode will only be the one from "split", so I lose the
ability to check if tar failed..

A solution would be something like this:
{ ls -dlkfjdsl; echo $? > tar.status; } | split

but it's a bit ugly.  I wonder if I can use the subprocess PIPEs to do
the same thing, is it going to be as fast and work in the same way??


 
Newsgroups: comp.lang.python
From: Ian Kelly <ian.g.ke...@gmail.com>
Date: Tue, 13 Nov 2012 09:07:15 -0700
Local: Tues, Nov 13 2012 11:07 am
Subject: Re: creating size-limited tar files
On Tue, Nov 13, 2012 at 3:31 AM, andrea crotti <andrea.crott...@gmail.com> wrote:
> but it's a bit ugly.  I wonder if I can use the subprocess PIPEs to do
> the same thing, is it going to be as fast and work in the same way??

It'll look something like this:

>>> p1 = subprocess.Popen(cmd1, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
>>> p2 = subprocess.Popen(cmd2, shell=True, stdin=p1.stdout, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
>>> p1.communicate()
('', '')
>>> p2.communicate()
('', '')
>>> p1.wait()
0
>>> p2.wait()
0

Note that there's a subtle potential for deadlock here.  During the
p1.communicate() call, if the p2 output buffer fills up, then it will
stop accepting input from p1 until p2.communicate() can be called, and
then if that buffer also fills up, p1 will hang.  Additionally, if p2
needs to wait on the parent process for some reason, then you end up
effectively serializing the two processes.

Solution would be to poll all the open-ended pipes in a select() loop
instead of using communicate(), or perhaps make the two communicate
calls simultaneously in separate threads.
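
A sketch of the thread variant, as one way to avoid the deadlock described above. The function name is made up; it takes argument lists rather than shell strings and uses no shell. A helper thread drains the first process's stderr while communicate() drains both pipes of the second, so no pipe buffer can fill up unattended.

```python
import subprocess
import sys
import threading


def run_pipeline_capture(cmd1, cmd2):
    """Run cmd1 | cmd2, capturing stderr of both and stdout of cmd2.

    Returns (cmd1 status, cmd2 status, cmd2 stdout, cmd1 stderr, cmd2 stderr).
    """
    p1 = subprocess.Popen(cmd1, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    p2 = subprocess.Popen(cmd2, stdin=p1.stdout,
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    p1.stdout.close()  # only p2 should read from this pipe

    err1 = []

    def drain_p1_stderr():
        err1.append(p1.stderr.read())  # returns at EOF, i.e. when p1 is done

    reader = threading.Thread(target=drain_p1_stderr)
    reader.start()
    out2, err2 = p2.communicate()      # drains p2's stdout and stderr together
    reader.join()
    p1.stderr.close()
    return p1.wait(), p2.returncode, out2, err1[0], err2
```

Both exit statuses are available afterwards, so a failure of the first command is not hidden by the second.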


 
Newsgroups: comp.lang.python
From: Ian Kelly <ian.g.ke...@gmail.com>
Date: Tue, 13 Nov 2012 09:25:20 -0700
Local: Tues, Nov 13 2012 11:25 am
Subject: Re: creating size-limited tar files

Sorry, the example I gave above is wrong.  If you're calling
p1.communicate(), then you need to first remove the p1.stdout pipe
from the Popen object.  Otherwise, the communicate() call will try to
read data from it and may "steal" input from p2.  It should look more
like this:

>>> p1 = subprocess.Popen(cmd1, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
>>> p2 = subprocess.Popen(cmd2, shell=True, stdin=p1.stdout, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
>>> p1.stdout = None


 
Newsgroups: comp.lang.python
From: Ian Kelly <ian.g.ke...@gmail.com>
Date: Tue, 13 Nov 2012 09:30:55 -0700
Local: Tues, Nov 13 2012 11:30 am
Subject: Re: creating size-limited tar files

On Tue, Nov 13, 2012 at 9:25 AM, Ian Kelly <ian.g.ke...@gmail.com> wrote:
> Sorry, the example I gave above is wrong.  If you're calling
> p1.communicate(), then you need to first remove the p1.stdout pipe
> from the Popen object.  Otherwise, the communicate() call will try to
> read data from it and may "steal" input from p2.  It should look more
> like this:
>
> >>> p1 = subprocess.Popen(cmd1, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
> >>> p2 = subprocess.Popen(cmd2, shell=True, stdin=p1.stdout, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
> >>> p1.stdout = None

Per the docs, that third line should be "p1.stdout.close()".  :-P
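
Put together as a small function, the corrected chain might look like the sketch below. The function name is made up, the commands are argument lists without a shell, and stderr is simply left attached to the parent's stderr, which sidesteps the buffering question for that stream. The point is that both exit statuses are returned, so a failing first command (tar, in the original pipeline) can be detected.

```python
import subprocess
import sys


def pipe_two(cmd1, cmd2):
    """Run cmd1 | cmd2 and return (cmd1 status, cmd2 status, cmd2 stdout)."""
    p1 = subprocess.Popen(cmd1, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(cmd2, stdin=p1.stdout, stdout=subprocess.PIPE)
    # Close the parent's copy so that p2 is the only reader and p1 gets
    # a broken pipe if p2 exits early.
    p1.stdout.close()
    out, _ = p2.communicate()
    return p1.wait(), p2.returncode, out
```

For the case discussed here the call would be along the lines of `pipe_two(['tar', ...], ['split', ...])`, checking both returned statuses.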

 
Newsgroups: comp.lang.python
From: Kushal Kumaran <kushal.kumaran+pyt...@gmail.com>
Date: Wed, 14 Nov 2012 11:35:04 +0530
Local: Wed, Nov 14 2012 1:05 am
Subject: Re: creating size-limited tar files

Or, you could just change the p1's stderr to an io.BytesIO instance.
Then call p2.communicate *first*.

--
regards,
kushal


 
Newsgroups: comp.lang.python
From: Ian Kelly <ian.g.ke...@gmail.com>
Date: Wed, 14 Nov 2012 00:22:48 -0700
Local: Wed, Nov 14 2012 2:22 am
Subject: Re: creating size-limited tar files
On Tue, Nov 13, 2012 at 11:05 PM, Kushal Kumaran <kushal.kumaran+pyt...@gmail.com> wrote:
> Or, you could just change the p1's stderr to an io.BytesIO instance.
> Then call p2.communicate *first*.

This doesn't seem to work.

>>> b = io.BytesIO()
>>> p = subprocess.Popen(["ls", "-l"], stdout=b)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.2/subprocess.py", line 711, in __init__
    errread, errwrite) = self._get_handles(stdin, stdout, stderr)
  File "/usr/lib64/python3.2/subprocess.py", line 1112, in _get_handles
    c2pwrite = stdout.fileno()
io.UnsupportedOperation: fileno

I think stdout and stderr need to be actual file objects, not just
file-like objects.
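
One workaround that does provide a real file descriptor is a temporary file, sketched below. The function name is made up for illustration. tempfile.TemporaryFile() has a fileno(), so it can be passed as stdout, and the output is read back after the child has finished.

```python
import subprocess
import sys
import tempfile


def run_to_tempfile(cmd):
    """Run cmd with stdout sent to a temporary file; return (status, output)."""
    with tempfile.TemporaryFile() as buf:
        status = subprocess.call(cmd, stdout=buf)
        buf.seek(0)  # the child wrote through the descriptor; rewind to read
        return status, buf.read()
```

Since the child writes to a file instead of a pipe, it cannot block on a full pipe buffer, at the cost of the output going through the filesystem.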


 
Newsgroups: comp.lang.python
From: Kushal Kumaran <kushal.kumaran+pyt...@gmail.com>
Date: Wed, 14 Nov 2012 14:21:32 +0530
Local: Wed, Nov 14 2012 3:51 am
Subject: Re: creating size-limited tar files

Well, well, I was wrong, clearly.  I wonder if this is fixable.

--
regards,
kushal


 
Newsgroups: comp.lang.python
From: andrea crotti <andrea.crott...@gmail.com>
Date: Wed, 14 Nov 2012 11:52:59 +0000
Local: Wed, Nov 14 2012 6:52 am
Subject: Re: creating size-limited tar files
2012/11/14 Kushal Kumaran <kushal.kumaran+pyt...@gmail.com>:

> Well, well, I was wrong, clearly.  I wonder if this is fixable.

> --
> regards,
> kushal
> --
> http://mail.python.org/mailman/listinfo/python-list

But would it not be possible to use the pipe in memory in theory?
That would be way faster and since I have in theory enough RAM it
might be a great improvement..

 
Newsgroups: comp.lang.python
From: andrea crotti <andrea.crott...@gmail.com>
Date: Wed, 14 Nov 2012 15:56:28 +0000
Local: Wed, Nov 14 2012 10:56 am
Subject: Re: creating size-limited tar files
Ok this is all very nice, but:

[andrea@andreacrotti tar_baller]$ time python2 test_pipe.py > /dev/null

real    0m21.215s
user    0m0.750s
sys     0m1.703s

[andrea@andreacrotti tar_baller]$ time ls -lR /home/andrea | cat > /dev/null

real    0m0.986s
user    0m0.413s
sys     0m0.600s

where test_pipe.py is:
from subprocess import PIPE, Popen

# check if doing the pipe with subprocess and with the | is the same or not

pipe_file = open('pipefile', 'w')

p1 = Popen('ls -lR /home/andrea', shell=True, stdout=PIPE, stderr=PIPE)
p2 = Popen('cat', shell=True, stdin=p1.stdout, stdout=PIPE, stderr=PIPE)
p1.stdout.close()

print(p2.stdout.read())

So apparently it's way slower than using this system, is this normal?


 
Newsgroups: comp.lang.python
From: Dave Angel <d...@davea.name>
Date: Wed, 14 Nov 2012 11:10:47 -0500
Local: Wed, Nov 14 2012 11:10 am
Subject: Re: creating size-limited tar files
On 11/14/2012 10:56 AM, andrea crotti wrote:

I'm not sure how this timing relates to the thread, but what it mainly
shows is that starting up the Python interpreter takes quite a while,
compared to not starting it up.

--

DaveA


 
Newsgroups: comp.lang.python
From: andrea crotti <andrea.crott...@gmail.com>
Date: Wed, 14 Nov 2012 16:16:16 +0000
Local: Wed, Nov 14 2012 11:16 am
Subject: Re: creating size-limited tar files
2012/11/14 Dave Angel <d...@davea.name>:

Well it's related because my program has to be as fast as possible, so
in theory I thought that using Python pipes would be better because I
can easily get the PID of the first process.

But if it's that slow then it's not worth it, and I don't think it's the
Python interpreter, because it's more or less consistently many times
slower even when I change the size of the input..


 
Newsgroups: comp.lang.python
From: Dave Angel <d...@davea.name>
Date: Wed, 14 Nov 2012 11:33:27 -0500
Local: Wed, Nov 14 2012 11:33 am
Subject: Re: creating size-limited tar files
On 11/14/2012 11:16 AM, andrea crotti wrote:

Well, as I said, I don't see how the particular timing has anything to
do with the rest of the thread.  If you want to do an ls within a Python
program, go ahead.  But if all you need can be done with ls itself, then
it'll be slower to launch python just to run it.

Your first timing runs python, which runs two new shells, ls, and cat.
Your second timing runs ls and cat.

So the difference is starting up python, plus starting the shell two
extra times.

I'd also be curious if you flushed the system buffers before each
timing, as the second test could be running entirely in system memory.
And no, I don't know offhand how to flush them in Linux, just that
without it, your timings are not at all repeatable.  Note the two
identical runs here.

davea@think:~/temppython$ time ls -lR ~ | cat > /dev/null

real    0m0.164s
user    0m0.020s
sys     0m0.000s
davea@think:~/temppython$ time ls -lR ~ | cat > /dev/null

real    0m0.018s
user    0m0.000s
sys     0m0.010s

real time goes down by 90%, while user time drops to zero.
And on a 3rd and subsequent run, sys time goes to zero as well.
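
A small sketch for repeating such a measurement from Python, so that the first (cold) run can be compared with later (warm) ones. The function name is made up; it times the whole child process with time.perf_counter() and discards the command's output.

```python
import subprocess
import sys
import time


def time_runs(cmd, repeats=3):
    """Return the wall-clock seconds taken by each of `repeats` runs of cmd."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.call(cmd, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Example: time the interpreter start-up itself three times.
    print(time_runs([sys.executable, "-c", "pass"]))
```

If the first number is much larger than the rest, caching is likely dominating the comparison.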

--

DaveA


 
Newsgroups: comp.lang.python
From: Andrea Crotti <andrea.crott...@gmail.com>
Date: Wed, 14 Nov 2012 20:43:59 +0000
Local: Wed, Nov 14 2012 3:43 pm
Subject: Re: creating size-limited tar files
On 11/14/2012 04:33 PM, Dave Angel wrote:

Right I didn't think about that..
Anyway the only thing I wanted to understand is if using the pipes in
subprocess is exactly the same as doing
the Linux pipe, or not.

And any idea on how to run it in ram?
Maybe if I create a pipe in tmpfs it might already work, what do you think?


 
Newsgroups: comp.lang.python
From: Dave Angel <d...@davea.name>
Date: Wed, 14 Nov 2012 15:57:20 -0500
Local: Wed, Nov 14 2012 3:57 pm
Subject: Re: creating size-limited tar files
On 11/14/2012 03:43 PM, Andrea Crotti wrote:

> <SNIP>
> Anyway the only thing I wanted to understand is if using the pipes in
> subprocess is exactly the same as doing
> the Linux pipe, or not.

It's not the same thing, but you can usually assume it's close.  Other
effects will probably dominate any differences.

> And any idea on how to run it in ram?
> Maybe if I create a pipe in tmpfs it might already work, what do you think?

In a good virtual OS, such as Linux, there's very little predictable
difference between running in RAM (which is to say reading and writing
to the swap file) or reading and writing to a file you specify.  In
fact, writing to a file can frequently be quicker, if it's sequential.

Why?  Linux is using any given piece of physical RAM to map a file, or
an allocated buffer, or shared memory, or nearly anything.  About the
only special cases are the kind of RAM that has to be locked into RAM
for hardware reasons.

Linux decides which pieces to keep in memory, whether it calls it
caching, swapping, memory mapping, or whatever.  And frequently,
attempts to "beat the system"  result in counterintuitive results.

If in doubt, measure.  But choose your measures carefully, because lots
more things will change the measurement than you might expect.

--

DaveA


 