Fast forward-backward (write-read)
Virgil Stokes
Newsgroups: comp.lang.python
From: Virgil Stokes <v...@it.uu.se>
Date: Tue, 23 Oct 2012 16:31:17 +0200
Local: Tues, Oct 23 2012 10:31 am
Subject: Fast forward-backward (write-read)
I am working with some rather large data files (>100GB) that contain time series
data. The data (t_k,y(t_k)), k = 0,1,...,N are stored in ASCII format. I perform
various types of processing on these data (e.g. moving median, moving average,
and Kalman-filter, Kalman-smoother) in a sequential manner and only a small
number of these data need be stored in RAM when being processed. When performing
Kalman-filtering (forward in time pass, k = 0,1,...,N) I need to save to an
external file several variables (e.g. 11*32 bytes) for each (t_k, y(t_k)). These
are inputs to the Kalman-smoother (backward in time pass, k = N,N-1,...,0).
Thus, I will need to input these variables saved to an external file from the
forward pass, in reverse order --- from last written to first written.

Finally, to my question --- What is a fast way to write these variables to an
external file and then read them in backwards?


 
Tim Chase
Newsgroups: comp.lang.python
From: Tim Chase <python.l...@tim.thechases.com>
Date: Tue, 23 Oct 2012 11:09:58 -0500
Local: Tues, Oct 23 2012 12:09 pm
Subject: Re: Fast forward-backward (write-read)
On 10/23/12 09:31, Virgil Stokes wrote:

> I am working with some rather large data files (>100GB) that contain time series
> data. The data (t_k,y(t_k)), k = 0,1,...,N are stored in ASCII format. I perform
> various types of processing on these data (e.g. moving median, moving average,
> and Kalman-filter, Kalman-smoother) in a sequential manner and only a small
> number of these data need be stored in RAM when being processed. When performing
> Kalman-filtering (forward in time pass, k = 0,1,...,N) I need to save to an
> external file several variables (e.g. 11*32 bytes) for each (t_k, y(t_k)). These
> are inputs to the Kalman-smoother (backward in time pass, k = N,N-1,...,0).
> Thus, I will need to input these variables saved to an external file from the
> forward pass, in reverse order --- from last written to first written.

> Finally, to my question --- What is a fast way to write these variables to an
> external file and then read them in backwards?

Am I missing something, or would the fairly-standard "tac" utility
do the reversal you want?  It should[*] be optimized to handle
on-disk files in a smart manner.

Otherwise, if you can pad the record-lengths so they're all the
same, and you know the total number of records, you can seek to
Total-(RecSize*OneBasedOffset) and write the record, optionally
padding if you need/can.  At least on *nix-like OSes, you can seek
into a sparse-file with no problems (untested on Win32).

-tkc

[*]
Just guessing here. Would be disappointed if it *wasn't*.
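
A minimal sketch of the seek-and-write idea above, assuming fixed-size records and a record count that is known before the forward pass starts. The record layout (11 little-endian doubles) and the helper names write_reversed and read_forward are illustrative assumptions, not anything given in the thread.

```python
import struct

RECORD = struct.Struct("<11d")   # assumed layout: 11 little-endian doubles


def write_reversed(path, records, total):
    """Write record k (0-based) into slot total-1-k, so the file ends up reversed."""
    with open(path, "wb") as f:
        for k, values in enumerate(records):
            # Same as Total - RecSize * OneBasedOffset with offset k + 1.
            f.seek((total - (k + 1)) * RECORD.size)
            f.write(RECORD.pack(*values))


def read_forward(path):
    """Plain sequential read; yields records in the reverse of write order."""
    with open(path, "rb") as f:
        while True:
            raw = f.read(RECORD.size)
            if len(raw) < RECORD.size:
                break
            yield RECORD.unpack(raw)
```

With this layout the backward pass becomes an ordinary front-to-back read, at the cost of one seek per record while writing.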


 
Paul Rubin
Newsgroups: comp.lang.python
From: Paul Rubin <no.em...@nospam.invalid>
Date: Tue, 23 Oct 2012 09:17:35 -0700
Local: Tues, Oct 23 2012 12:17 pm
Subject: Re: Fast forward-backward (write-read)

Virgil Stokes <v...@it.uu.se> writes:
> Finally, to my question --- What is a fast way to write these
> variables to an external file and then read them in backwards?

Seeking backwards in files works, but the performance hit is
significant.  There is also a performance hit to scanning pointers
backwards in memory, due to cache misprediction.  If it's something
you're just running a few times, seeking backwards is the simplest
approach.  If you're really trying to optimize the thing, you might
buffer up large chunks (like 1 MB) before writing.  If you're writing
once and reading multiple times, you might reverse the order of records
within the chunks during the writing phase.  

You're of course taking a performance bath from writing the program in
Python to begin with (unless using scipy/numpy or the like), enough that
it might dominate any effects of how the files are written.

Of course (it should go without saying), you want to dump in a
binary format rather than converting to decimal.
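
One way the chunked approach could look, as a sketch only: records are packed in binary, flushed roughly 1 MB at a time during the forward pass, and read back one chunk per seek during the backward pass. The 11-double record layout and the function names are assumptions made for illustration.

```python
import struct

RECORD = struct.Struct("<11d")               # assumed record layout
CHUNK_RECORDS = 1024 * 1024 // RECORD.size   # roughly 1 MB per chunk


def write_buffered(path, records, chunk_records=CHUNK_RECORDS):
    """Forward pass: collect packed records and flush them one chunk at a time."""
    buf = []
    with open(path, "wb") as f:
        for values in records:
            buf.append(RECORD.pack(*values))
            if len(buf) >= chunk_records:
                f.write(b"".join(buf))
                buf.clear()
        if buf:
            f.write(b"".join(buf))


def read_backwards(path, chunk_records=CHUNK_RECORDS):
    """Backward pass: one seek and one read per chunk, then unpack in reverse."""
    with open(path, "rb") as f:
        f.seek(0, 2)                          # jump to the end of the file
        end = f.tell()
        while end > 0:
            start = max(0, end - chunk_records * RECORD.size)
            f.seek(start)
            data = f.read(end - start)
            # Walk the chunk from its last record to its first.
            for offset in range(len(data) - RECORD.size, -1, -RECORD.size):
                yield RECORD.unpack_from(data, offset)
            end = start
```

The file stays in forward order, so the same file can also be read front to back if that is ever needed.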


 
Paul Rubin
Newsgroups: comp.lang.python
From: Paul Rubin <no.em...@nospam.invalid>
Date: Tue, 23 Oct 2012 09:22:21 -0700
Local: Tues, Oct 23 2012 12:22 pm
Subject: Re: Fast forward-backward (write-read)

Paul Rubin <no.em...@nospam.invalid> writes:
> Seeking backwards in files works, but the performance hit is
> significant.  There is also a performance hit to scanning pointers
> backwards in memory, due to cache misprediction.  If it's something
> you're just running a few times, seeking backwards is the simplest
> approach.

Oh yes, I should have mentioned, it may be simpler and perhaps a little
bit faster to use mmap rather than seeking.
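
A hedged sketch of the mmap variant, assuming the forward pass wrote fixed-size binary records (11 doubles each, an assumed layout) and that the process is 64-bit, since a 32-bit address space cannot map a file of this size. The function name is hypothetical.

```python
import mmap
import struct

RECORD = struct.Struct("<11d")   # assumed record layout


def read_backwards_mmap(path):
    """Yield records last-to-first by indexing into a read-only memory map."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; an empty file raises ValueError.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            count = len(mm) // RECORD.size
            for k in range(count - 1, -1, -1):
                yield RECORD.unpack_from(mm, k * RECORD.size)
```

No explicit seek calls are needed; the operating system pages the file in as the offsets are touched.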

 
Tim Chase
Newsgroups: comp.lang.python
From: Tim Chase <python.l...@tim.thechases.com>
Date: Tue, 23 Oct 2012 11:53:37 -0500
Local: Tues, Oct 23 2012 12:53 pm
Subject: Re: Fast forward-backward (write-read)
On 10/23/12 11:17, Paul Rubin wrote:

> Virgil Stokes <v...@it.uu.se> writes:
>> Finally, to my question --- What is a fast way to write these
>> variables to an external file and then read them in backwards?

> Seeking backwards in files works, but the performance hit is
> significant.  There is also a performance hit to scanning pointers
> backwards in memory, due to cache misprediction.  If it's something
> you're just running a few times, seeking backwards is the simplest
> approach.  If you're really trying to optimize the thing, you might
> buffer up large chunks (like 1 MB) before writing.  If you're writing
> once and reading multiple times, you might reverse the order of records
> within the chunks during the writing phase.

I agree with Paul here, it's been a while since I did it, and my
dataset was small enough (and passed through once) so I just let it
run.  Writing larger chunks is definitely a good way to go.

> You're of course taking a performance bath from writing the program in
> Python to begin with (unless using scipy/numpy or the like), enough that
> it might dominate any effects of how the files are written.

I usually find that the I/O almost always overwhelms the actual
processing.

> Of course (it should go without saying), you want to dump in a
> binary format rather than converting to decimal.

Again, the conversion to/from decimal hasn't been a great cost in my
experience, as it's overwhelmed by the I/O cost of shoveling the
data to/from disk.

-tkc


 
Paul Rubin
Newsgroups: comp.lang.python
From: Paul Rubin <no.em...@nospam.invalid>
Date: Tue, 23 Oct 2012 09:58:38 -0700
Local: Tues, Oct 23 2012 12:58 pm
Subject: Re: Fast forward-backward (write-read)

Tim Chase <python.l...@tim.thechases.com> writes:
> Again, the conversion to/from decimal hasn't been a great cost in my
> experience, as it's overwhelmed by the I/O cost of shoveling the
> data to/from disk.

I've found that cpu costs both for processing and conversion are
significant.  Also, using a binary format makes the file a lot smaller,
which decreases the i/o cost as well as eliminating the conversion cost.
And, the conversion can introduce precision loss, another thing to be
avoided.  The famous "butterfly effect" was serendipitously discovered
that way.
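
A small illustration of the precision point, as a sketch: the "%.6f" format is an assumed example of a truncating text format, not something taken from the thread. Python's repr() does round-trip a float exactly, but it needs more characters than the 8 bytes of the binary form.

```python
import struct

x = 0.1 + 0.2                      # a double with a long decimal expansion

# Text round trip with a fixed number of digits (an assumed format choice).
as_text = "%.6f" % x
from_text = float(as_text)

# Binary round trip: the 8 bytes of the IEEE 754 double are stored as-is.
as_bytes = struct.pack("<d", x)
from_bytes = struct.unpack("<d", as_bytes)[0]

print(from_text == x)        # False: digits beyond the sixth decimal were dropped
print(from_bytes == x)       # True: the bit pattern is unchanged
print(float(repr(x)) == x)   # True, but repr(x) is longer than 8 bytes
```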

 
Virgil Stokes
Newsgroups: comp.lang.python
From: Virgil Stokes <v...@it.uu.se>
Date: Tue, 23 Oct 2012 19:17:51 +0200
Local: Tues, Oct 23 2012 1:17 pm
Subject: Re: Fast forward-backward (write-read)
On 23-Oct-2012 18:09, Tim Chase wrote:

Not sure about "tac" --- could you provide more details on this and/or a simple
example of how it could be used for fast reversed "reading" of a data file?

> Otherwise, if you can pad the record-lengths so they're all the
> same, and you know the total number of records, you can seek to
> Total-(RecSize*OneBasedOffset) and write the record, optionally
> padding if you need/can.  At least on *nix-like OSes, you can seek
> into a sparse-file with no problems (untested on Win32).

The records lengths will all be the same and yes seek could be used; but, I was
hoping for a faster method.

Thanks Tim! :-)


 
Virgil Stokes
Newsgroups: comp.lang.python
From: Virgil Stokes <v...@it.uu.se>
Date: Tue, 23 Oct 2012 19:06:46 +0200
Local: Tues, Oct 23 2012 1:06 pm
Subject: Re: Fast forward-backward (write-read)
On 23-Oct-2012 18:17, Paul Rubin wrote:
> Virgil Stokes <v...@it.uu.se> writes:
>> Finally, to my question --- What is a fast way to write these
>> variables to an external file and then read them in backwards?
> Seeking backwards in files works, but the performance hit is
> significant.  There is also a performance hit to scanning pointers
> backwards in memory, due to cache misprediction.  If it's something
> you're just running a few times, seeking backwards is the simplest
> approach.  If you're really trying to optimize the thing, you might
> buffer up large chunks (like 1 MB) before writing.  If you're writing
> once and reading multiple times, you might reverse the order of records
> within the chunks during the writing phase.

I am writing (forward) once and reading (backward) once.

> You're of course taking a performance bath from writing the program in
> Python to begin with (unless using scipy/numpy or the like), enough that
> it might dominate any effects of how the files are written.

I am currently using SciPy/NumPy

> Of course (it should go without saying), you want to dump in a
> binary format rather than converting to decimal.

Yes, I am doing this (but thanks for "underlining" it!)

Thanks Paul :-)


 
Virgil Stokes
Newsgroups: comp.lang.python
From: Virgil Stokes <v...@it.uu.se>
Date: Tue, 23 Oct 2012 19:09:32 +0200
Local: Tues, Oct 23 2012 1:09 pm
Subject: Re: Fast forward-backward (write-read)
On 23-Oct-2012 18:35, Dennis Lee Bieber wrote:
> On Tue, 23 Oct 2012 16:31:17 +0200, Virgil Stokes <v...@it.uu.se>
> declaimed the following in gmane.comp.python.general:

>> Finally, to my question --- What is a fast way to write these variables to an
>> external file and then read them in backwards?

>    Stuff them into an SQLite3 database and retrieve using a descending
> sort?

Have never worked with a database; but, could be worth a try (at least to
compare I/O times).

Thanks Dennis :-)
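
For comparison purposes, the SQLite suggestion might be sketched as follows, using the standard-library sqlite3 module. The table name fwd, the column names, and the 11-double record layout are assumptions for illustration; whether this is competitive with a flat binary file would have to be measured.

```python
import sqlite3
import struct

RECORD = struct.Struct("<11d")   # assumed record layout


def save_forward(db_path, records):
    """Forward pass: store each packed record under its step index k."""
    con = sqlite3.connect(db_path)
    with con:  # commits once, when the block exits without an error
        con.execute(
            "CREATE TABLE IF NOT EXISTS fwd (k INTEGER PRIMARY KEY, data BLOB)"
        )
        con.executemany(
            "INSERT INTO fwd (k, data) VALUES (?, ?)",
            ((k, RECORD.pack(*values)) for k, values in enumerate(records)),
        )
    con.close()


def load_backward(db_path):
    """Backward pass: walk the primary key in descending order."""
    con = sqlite3.connect(db_path)
    try:
        for k, blob in con.execute("SELECT k, data FROM fwd ORDER BY k DESC"):
            yield k, RECORD.unpack(blob)
    finally:
        con.close()
```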


 
Tim Chase
Newsgroups: comp.lang.python
From: Tim Chase <python.l...@tim.thechases.com>
Date: Tue, 23 Oct 2012 12:56:29 -0500
Local: Tues, Oct 23 2012 1:56 pm
Subject: Re: Fast forward-backward (write-read)
On 10/23/12 12:17, Virgil Stokes wrote:

> On 23-Oct-2012 18:09, Tim Chase wrote:
>>> Finally, to my question --- What is a fast way to write these
>>> variables to an external file and then read them in
>>> backwards?
>> Am I missing something, or would the fairly-standard "tac"
>> utility do the reversal you want?  It should[*] be optimized to
>> handle on-disk files in a smart manner.
> Not sure about "tac" --- could you provide more details on this
> and/or a simple example of how it could be used for fast reversed
> "reading" of a data file?

Well, if you're reading input.txt (and assuming it's one record per
line, separated by newlines), you can just use

  tac < input.txt > backwards.txt

which will create a secondary file that is the first file in reverse
order.  Your program can then process this secondary file in-order
(which would be backwards from your source).

I might have misunderstood your difficulty, but it _sounded_ like
you just want to inverse the order of a file.

-tkc


 
Virgil Stokes
Newsgroups: comp.lang.python
From: Virgil Stokes <v...@it.uu.se>
Date: Tue, 23 Oct 2012 20:37:04 +0200
Local: Tues, Oct 23 2012 2:37 pm
Subject: Re: Fast forward-backward (write-read)
On 23-Oct-2012 19:56, Tim Chase wrote:

Yes, I do wish to inverse the order,  but the "forward in time" file will be in
binary.

--V
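
A short sketch of why a line-oriented reversal such as tac does not carry over to binary records: the byte 0x0A can occur inside a record, so newline-delimited pieces need not line up with record boundaries. The (k, y_k) layout below, an int64 step index followed by a double, is an assumption chosen so that the effect is easy to see.

```python
import struct

RECORD = struct.Struct("<qd")    # assumed layout: int64 index k, double y_k

records = [(k, k / 4.0) for k in range(40)]
blob = b"".join(RECORD.pack(k, y) for k, y in records)

# What a line-oriented tool sees: pieces split wherever byte 0x0A occurs
# (for example inside the packed index k = 10).
pieces = blob.split(b"\n")
print(len(records), "records, but", len(pieces), "newline-delimited pieces")

# Reversing by fixed-size slices keeps every record intact.
size = RECORD.size
reversed_blob = b"".join(
    blob[i:i + size] for i in range(len(blob) - size, -1, -size)
)
print(RECORD.unpack_from(reversed_blob, 0) == records[-1])
```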


 
Cousin Stanley
Newsgroups: comp.lang.python
From: Cousin Stanley <cousinstan...@gmail.com>
Date: Tue, 23 Oct 2012 20:03:39 +0000 (UTC)
Local: Tues, Oct 23 2012 4:03 pm
Subject: Re: Fast forward-backward (write-read)

Virgil Stokes wrote:
> Not sure about "tac" --- could you provide more details on this
> and/or a simple example of how it could be used for fast reversed
> "reading" of a data file ?

  tac is available as a command under linux ....

  $ whatis tac
  tac (1) - concatenate and print files in reverse

  $ whereis tac
  tac: /usr/bin/tac /usr/bin/X11/tac /usr/share/man/man1/tac.1.gz

  $ man tac

  SYNOPSIS
    tac [OPTION]... [FILE]...

  DESCRIPTION

    Write each FILE to standard output, last line first.  

    With no FILE, or when FILE is -, read standard input.

  I only know that the  tac  command exists
  but have never used it myself ....

--
Stanley C. Kitching
Human Being
Phoenix, Arizona


 
David Hutto
Newsgroups: comp.lang.python
From: David Hutto <dwightdhu...@gmail.com>
Date: Tue, 23 Oct 2012 17:50:55 -0400
Local: Tues, Oct 23 2012 5:50 pm
Subject: Re: Fast forward-backward (write-read)

On Tue, Oct 23, 2012 at 10:31 AM, Virgil Stokes <v...@it.uu.se> wrote:
> I am working with some rather large data files (>100GB) that contain time
> series data. The data (t_k,y(t_k)), k = 0,1,...,N are stored in ASCII
> format. I perform various types of processing on these data (e.g. moving
> median, moving average, and Kalman-filter, Kalman-smoother) in a sequential
> manner and only a small number of these data need be stored in RAM when
> being processed. When performing Kalman-filtering (forward in time pass, k =
> 0,1,...,N) I need to save to an external file several variables (e.g. 11*32
> bytes) for each (t_k, y(t_k)). These are inputs to the Kalman-smoother
> (backward in time pass, k = N,N-1,...,0). Thus, I will need to input these
> variables saved to an external file from the forward pass, in reverse order
> --- from last written to first written.

> Finally, to my question --- What is a fast way to write these variables to
> an external file and then read them in backwards?

Don't forget to use timeit for an average OS utilization.

I'd suggest two list comprehensions for now, until I've reviewed it some more:

forward =  ["%i = %s" % (i,chr(i)) for i in range(33,126)]
backward = ["%i = %s" % (i,chr(i)) for i in range(126,32,-1)]

for var in forward:
        print var

for var in backward:
        print var

You could also use a dict, and iterate through a straight loop that
assigned a front and back to a dict_one =  {0 : [0.100], 1 : [1.99]}
and then iterate through the loop, and call the first or second in the
dict's var list for frontwards , or backwards calls.

But there might be faster implementations, depending on other
function's usage of certain lower level functions.

--
Best Regards,
David Hutto
CEO: http://www.hitwebdevelopment.com


 
David Hutto
Newsgroups: comp.lang.python
From: David Hutto <dwightdhu...@gmail.com>
Date: Tue, 23 Oct 2012 18:36:33 -0400
Local: Tues, Oct 23 2012 6:36 pm
Subject: Re: Fast forward-backward (write-read)

Missed the part about it being a file. Use:

forward =  ["%i = %s" % (i,chr(i)) for i in range(33,126)]
backward = ["%i = %s" % (i,chr(i)) for i in range(126,32,-1)]

print forward,backward

--
Best Regards,
David Hutto
CEO: http://www.hitwebdevelopment.com


 
David Hutto
Newsgroups: comp.lang.python
From: David Hutto <dwightdhu...@gmail.com>
Date: Tue, 23 Oct 2012 18:49:47 -0400
Local: Tues, Oct 23 2012 6:49 pm
Subject: Re: Fast forward-backward (write-read)
> Missed the part about it being a file. Use:

> forward =  ["%i = %s" % (i,chr(i)) for i in range(33,126)]
> backward = ["%i = %s" % (i,chr(i)) for i in range(126,32,-1)]

> print forward,backward

This was a dud, let me rework it real quick, I deleted what i had, and
accidentally wrote the wrong function.

> --
> Best Regards,
> David Hutto
> CEO: http://www.hitwebdevelopment.com

--
Best Regards,
David Hutto
CEO: http://www.hitwebdevelopment.com

 
Steven D'Aprano
Newsgroups: comp.lang.python
From: Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info>
Date: 23 Oct 2012 22:53:42 GMT
Local: Tues, Oct 23 2012 6:53 pm
Subject: Re: Fast forward-backward (write-read)

On Tue, 23 Oct 2012 17:50:55 -0400, David Hutto wrote:
> On Tue, Oct 23, 2012 at 10:31 AM, Virgil Stokes <v...@it.uu.se> wrote:
>> I am working with some rather large data files (>100GB)
[...]
>> Finally, to my question --- What is a fast way to write these variables
>> to an external file and then read them in backwards?

> Don't forget to use timeit for an average OS utilization.

Given that the data files are larger than 100 gigabytes, the time
required to process each file is likely to be in hours, not microseconds.
That being the case, timeit is the wrong tool for the job: it is
optimized for timing tiny code snippets. You could use it, of course,
but the added inconvenience doesn't gain you any added accuracy.

Here's a neat context manager that makes timing long-running code simple:

http://code.activestate.com/recipes/577896
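
For readers without access to the recipe, a timing context manager can be sketched in a few lines. This is not the linked recipe, only an assumed minimal equivalent; the name stopwatch and its report parameter are hypothetical.

```python
import time
from contextlib import contextmanager


@contextmanager
def stopwatch(label, report=print):
    """Report the wall-clock time spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        report("%s took %.3f s" % (label, elapsed))


with stopwatch("forward pass"):
    total = sum(range(1000000))   # stand-in for the real, long-running work
```

Because the report happens in a finally clause, the elapsed time is printed even if the timed block raises.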

> I'd suggest two list comprehensions for now, until I've reviewed it some
> more:

I would be very surprised if the poster will be able to fit 100 gigabytes
of data into even a single list comprehension, let alone two.

This is a classic example of why the old external processing algorithms
of the 1960s and 70s will never be obsolete. No matter how much memory
you have, there will always be times when you want to process more data
than you can fit into memory.

--
Steven


 
Demian Brecht
Newsgroups: comp.lang.python
From: Demian Brecht <demianbre...@gmail.com>
Date: Tue, 23 Oct 2012 15:57:44 -0700
Local: Tues, Oct 23 2012 6:57 pm
Subject: Re: Fast forward-backward (write-read)

> This is a classic example of why the old external processing algorithms
> of the 1960s and 70s will never be obsolete. No matter how much memory
> you have, there will always be times when you want to process more data
> than you can fit into memory.

But surely nobody will *ever* need more than 640k…

Right?

Demian Brecht
@demianbrecht
http://demianbrecht.github.com


 
David Hutto
Newsgroups: comp.lang.python
From: David Hutto <dwightdhu...@gmail.com>
Date: Tue, 23 Oct 2012 19:19:28 -0400
Local: Tues, Oct 23 2012 7:19 pm
Subject: Re: Fast forward-backward (write-read)
Whether this is fast enough, or not, I don't know:

filename = "data_file.txt"
f = open(filename, 'r')
forward =  [line.rstrip('\n') for line in f.readlines()]
backward =  [line.rstrip('\n') for line in reversed(forward)]
f.close()
print forward, "\n\n", "********************\n\n", backward, "\n"

--
Best Regards,
David Hutto
CEO: http://www.hitwebdevelopment.com


 
David Hutto
Newsgroups: comp.lang.python
From: David Hutto <dwightdhu...@gmail.com>
Date: Tue, 23 Oct 2012 19:34:15 -0400
Local: Tues, Oct 23 2012 7:34 pm
Subject: Re: Fast forward-backward (write-read)
On Tue, Oct 23, 2012 at 6:53 PM, Steven D'Aprano

It depends on the end result, and the fact that if the iterations
themselves are about the same time, then just using a segment of the
iterations could be scaled down, and a full run might be worth it, if
you have a second computer running optimization.

> Here's a neat context manager that makes timing long-running code simple:

> http://code.activestate.com/recipes/577896

I'll test this out for big O notation later. For the OP:

http://en.wikipedia.org/wiki/Big_O_notation

>> I'd suggest two list comprehensions for now, until I've reviewed it some
>> more:

> I would be very surprised if the poster will be able to fit 100 gigabytes
> of data into even a single list comprehension, let alone two.

Again, these can be scaled depending on the operations of the function
in question, and the average time of aforementioned function(s)

> This is a classic example of why the old external processing algorithms
> of the 1960s and 70s will never be obsolete. No matter how much memory
> you have, there will always be times when you want to process more data
> than you can fit into memory

This is a common misconception. You can engineer a device that
accommodates this if it's a direct experimental necessity.


--
Best Regards,
David Hutto
CEO: http://www.hitwebdevelopment.com

 
emile
Newsgroups: comp.lang.python
From: emile <em...@fenx.com>
Date: Tue, 23 Oct 2012 16:35:40 -0700
Local: Tues, Oct 23 2012 7:35 pm
Subject: Re: Fast forward-backward (write-read)
On 10/23/2012 04:19 PM, David Hutto wrote:

> Whether this is fast enough, or not, I don't know:

well, the OP's original post started with
   "I am working with some rather large data files (>100GB)..."

> filename = "data_file.txt"
> f = open(filename, 'r')
> forward =  [line.rstrip('\n') for line in f.readlines()]

f.readlines() will be big(!) and have overhead... and forward results in
something again as big.

> backward =  [line.rstrip('\n') for line in reversed(forward)]

and defining backward looks to me to require space to build backward and
hold reversed(forward)

So, let's see, at that point in time (building backward) you've got
probably somewhere close to 400-500GB in memory.

My guess -- probably not so fast.  Thrashing is sure to be a factor on
all but machines I'll never have a chance to work on.

> f.close()
> print forward, "\n\n", "********************\n\n", backward, "\n"

It's good to retain context.

Emile
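
The per-object overhead behind this estimate can be checked on a small sample. The sketch below is CPython-specific (sys.getsizeof reports interpreter-level object sizes), and the line format is an assumed stand-in for the real data file.

```python
import sys

# A small stand-in for the data file: 10,000 short ASCII lines.
lines = ["%d %.6f\n" % (k, k * 0.5) for k in range(10000)]
raw_bytes = sum(len(line) for line in lines)          # size as stored on disk

# In memory: the list itself plus one string object per line.
in_memory = sys.getsizeof(lines) + sum(sys.getsizeof(line) for line in lines)

print("on disk:   %d bytes" % raw_bytes)
print("in memory: %d bytes (%.1fx)" % (in_memory, in_memory / raw_bytes))
```

Short lines make the ratio worse, because the fixed cost of each string object is paid once per line.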


 
Paul Rubin
Newsgroups: comp.lang.python
From: Paul Rubin <no.em...@nospam.invalid>
Date: Tue, 23 Oct 2012 16:46:26 -0700
Local: Tues, Oct 23 2012 7:46 pm
Subject: Re: Fast forward-backward (write-read)

Virgil Stokes <v...@it.uu.se> writes:
> Yes, I do wish to inverse the order,  but the "forward in time" file
> will be in binary.

I really think it will be simplest to just write the file in forward
order, then use mmap to read it one record at a time.  It might be
possible to squeeze out a little more performance with reordering tricks
but that's the first thing to try.
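
Since the data are already handled with NumPy, the same idea could be tried with numpy.memmap. This is a sketch under stated assumptions: each record is 11 float64 values, the file was written in forward order, and the function name iter_backwards is hypothetical.

```python
import numpy as np

FIELDS = 11                      # assumed number of float64 values per record


def iter_backwards(path, chunk_records=4096):
    """Yield (n, FIELDS) arrays from the end of the file, rows newest-first."""
    data = np.memmap(path, dtype=np.float64, mode="r")
    table = data.reshape(-1, FIELDS)          # a view; nothing is read yet
    for end in range(len(table), 0, -chunk_records):
        start = max(0, end - chunk_records)
        # Copy one chunk into RAM and flip it so row 0 is the latest record.
        yield np.array(table[start:end][::-1])
```

Each yielded array is an ordinary in-memory copy, so only one chunk is resident at a time regardless of the file size.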

 
David Hutto
Newsgroups: comp.lang.python
From: David Hutto <dwightdhu...@gmail.com>
Date: Tue, 23 Oct 2012 20:01:36 -0400
Local: Tues, Oct 23 2012 8:01 pm
Subject: Re: Fast forward-backward (write-read)

On Tue, Oct 23, 2012 at 7:35 PM, emile <em...@fenx.com> wrote:
> On 10/23/2012 04:19 PM, David Hutto wrote:

>> Whether this is fast enough, or not, I don't know:

> well, the OP's original post started with
>   "I am working with some rather large data files (>100GB)..."

Well, is this a dedicated system, and one that they have the budget to upgrade?

Data files have some sort of parsing, unless it's one huge dict, or
list, so there has to be an average size to the parse.

So big O notation should begin to refine without a full file.

>> filename = "data_file.txt"
>> f = open(filename, 'r')
>> forward =  [line.rstrip('\n') for line in f.readlines()]

> f.readlines() will be big(!) and have overhead... and forward results in
> something again as big.

Not if an average can be taken, and then refined as the actual gigs
are being iterated through.

>> backward =  [line.rstrip('\n') for line in reversed(forward)]

> and defining backward looks to me to require space to build backward and
> hold reversed(forward)

> So, let's see, at that point in time (building backward) you've got
> probably somewhere close to 400-500GB in memory.

> My guess -- probably not so fast.  Thrashing is sure to be a factor on all
> but machines I'll never have a chance to work on.

But does the OP have access? They never stated their hardware, and
upgradable budget.

>> f.close()
>> print forward, "\n\n", "********************\n\n", backward, "\n"

> It's good to retain context.

Trying to practice good form ;).

--
Best Regards,
David Hutto
CEO: http://www.hitwebdevelopment.com


 
Oscar Benjamin
Newsgroups: comp.lang.python
From: Oscar Benjamin <oscar.j.benja...@gmail.com>
Date: Wed, 24 Oct 2012 01:06:13 +0100
Local: Tues, Oct 23 2012 8:06 pm
Subject: Re: Fast forward-backward (write-read)
On 23 October 2012 15:31, Virgil Stokes <v...@it.uu.se> wrote:

> I am working with some rather large data files (>100GB) that contain time
> series data. The data (t_k,y(t_k)), k = 0,1,...,N are stored in ASCII
> format. I perform various types of processing on these data (e.g. moving
> median, moving average, and Kalman-filter, Kalman-smoother) in a sequential
> manner and only a small number of these data need be stored in RAM when
> being processed. When performing Kalman-filtering (forward in time pass, k =
> 0,1,...,N) I need to save to an external file several variables (e.g. 11*32
> bytes) for each (t_k, y(t_k)). These are inputs to the Kalman-smoother
> (backward in time pass, k = N,N-1,...,0). Thus, I will need to input these
> variables saved to an external file from the forward pass, in reverse order
> --- from last written to first written.

> Finally, to my question --- What is a fast way to write these variables to
> an external file and then read them in backwards?

You mentioned elsewhere that you are using numpy. I'll assume that the
data you want to read/write are numpy arrays.

Numpy arrays can be written very efficiently in binary form using
tofile/fromfile:

>>> import numpy
>>> a = numpy.array([1, 2, 5], numpy.int64)
>>> a
array([1, 2, 5])
>>> with open('data.bin', 'wb') as f:
...   a.tofile(f)
...

You can then reload the array with:

>>> with open('data.bin', 'rb') as f:
...   a2 = numpy.fromfile(f, numpy.int64)
...
>>> a2
array([1, 2, 5])

Numpy arrays can be reversed before writing or after reading using a
slice with a negative step:

>>> a2
array([1, 2, 5])
>>> a2[::-1]
array([5, 2, 1])

Assuming you wrote the file forwards you can make an iterator to yield
the file in chunks backwards like so (untested):

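def read_backwards(f, dtype, chunksize=1024 ** 2):
    # Assumes the file size is a whole multiple of dtype.itemsize.
    dtype = numpy.dtype(dtype)
    nbytes = chunksize * dtype.itemsize
    f.seek(0, 2)  # jump to the end of the file
    fpos = f.tell()
    while fpos > nbytes:
        fpos -= nbytes  # step back one chunk *before* reading it
        f.seek(fpos, 0)
        yield numpy.fromfile(f, dtype, chunksize)[::-1]
    # Whatever is left sits at the start of the file.
    f.seek(0, 0)
    yield numpy.fromfile(f, dtype, fpos // dtype.itemsize)[::-1]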
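
For anyone who wants to try the same pattern without numpy, here is a
minimal standard-library sketch: fixed-size binary records are appended
during the forward pass and then read back from last to first by seeking
in chunks. The record layout (11 float64 values per time step) and the
names RECORD, write_forward and read_backward are assumptions made up for
illustration, not something from the original post. It also assumes the
file holds a whole number of records.

```python
import io
import struct

# Assumed record layout: 11 little-endian float64 values per time step.
RECORD = struct.Struct("<11d")

def write_forward(f, records):
    """Append each record (a sequence of 11 floats) in forward order."""
    for rec in records:
        f.write(RECORD.pack(*rec))

def read_backward(f, chunk_records=4096):
    """Yield records from last written to first, reading in chunks."""
    f.seek(0, io.SEEK_END)
    pos = f.tell()
    chunk_bytes = chunk_records * RECORD.size
    while pos > 0:
        nbytes = min(chunk_bytes, pos)
        pos -= nbytes
        f.seek(pos)
        buf = f.read(nbytes)
        # Walk the chunk from its last record to its first.
        for off in range(nbytes - RECORD.size, -1, -RECORD.size):
            yield RECORD.unpack_from(buf, off)

if __name__ == "__main__":
    f = io.BytesIO()  # stands in for a real file opened with mode 'w+b'
    write_forward(f, ([float(k)] * 11 for k in range(10)))
    print([rec[0] for rec in read_backward(f, chunk_records=4)])
```

Reading a chunk at a time keeps the number of seeks small, while only one
chunk of records needs to be held in RAM at any moment.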

Oscar


 
Tim Chase  
Newsgroups: comp.lang.python
From: Tim Chase <python.l...@tim.thechases.com>
Date: Tue, 23 Oct 2012 19:30:54 -0500
Local: Tues, Oct 23 2012 8:30 pm
Subject: Re: Fast forward-backward (write-read)
On 10/23/12 13:37, Virgil Stokes wrote:

> Yes, I do wish to inverse the order,  but the "forward in time"
> file will be in binary.

Your original post said:

> The data (t_k,y(t_k)), k = 0,1,...,N are stored in ASCII format

making it hard to know what sort of data is in this file.

So I guess it would help to have some sample data to work with, even
if it's just some dummy data and a raw processing loop without doing
anything inside it.  Something like the output of either of these

  $ xxd forward_data.txt | head -50 > forward_head.txt
  $ od forward_data.txt | head -50 > forward_head.txt

plus a basic loop to show how you're extracting the values:

  for line in file("forward_head.txt"):
    data1, data2, data3 = process(line)

and how you want to reverse over them:

  for line in file("reversed.txt"):
    if same_processing_as_forward_source:
      data1, data2, data3 = process(line)
    else:
      data1, data2, data3 = other_process(line)

or do you want something more like

  for line in super_reverse_magic(file("forward_head.txt")):
    data1, data2, data3 = process(line)

?

-tkc


 
David Hutto  
Newsgroups: comp.lang.python
From: David Hutto <dwightdhu...@gmail.com>
Date: Tue, 23 Oct 2012 22:29:09 -0400
Local: Tues, Oct 23 2012 10:29 pm
Subject: Re: Fast forward-backward (write-read)
On Tue, Oct 23, 2012 at 8:06 PM, Oscar Benjamin

If that is the case, always use timeit. The following example times
three snippets, each executed 100000 times:

import timeit
#3 dimensional matrix
x_dim = -1
y_dim = -1
z_dim = -1
s = """\

x_dim = -1
y_dim = -1
z_dim = -1
dict_1 = {}

for i in xrange(0,6):
        x_dim = 1
        y_dim = 1
        z_dim = 1
        dict_1['%s' % (i) ] = ['x = %i' % (x_dim), 'y = %i' % (y_dim),
                               'z = %i' % (z_dim)]

"""

t = """\
import numpy
numpy.array([[ 1.,  0.,  0.],
       [ 0.,  1.,  2.]])
"""

u = """\
list_count = 0
an_array = []
for i in range(0,10):

        if list_count > 3:
                break

        if i % 3 != 0:
                an_array.append(i)

        if i % 3 == 0:
                list_count += 1

"""
print timeit.timeit(stmt=s, number=100000)
print timeit.timeit(stmt=t, number=100000)
print timeit.timeit(stmt=u, number=100000)
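
Note that timeit.timeit returns the total time for all `number`
executions rather than an average. A minimal sketch of getting a per-call
figure from several repeated runs follows; the snippet being timed and the
variable names are made up for illustration.

```python
import timeit

# Hypothetical snippet to time; any statement string works here.
stmt = "sorted(data)"
setup = "data = list(range(1000, 0, -1))"

number = 1000   # executions per timing run
repeats = 5     # independent timing runs

# Each entry is the total time in seconds for `number` executions.
totals = timeit.repeat(stmt=stmt, setup=setup, repeat=repeats, number=number)

# The minimum is usually the least noisy estimate of the true cost.
per_call = min(totals) / number
print("best of %d: %.3g s per call" % (repeats, per_call))
```

Taking the minimum instead of the mean is a common choice because slower
runs mostly reflect interference from other processes.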

--
Best Regards,
David Hutto
CEO: http://www.hitwebdevelopment.com


 