Re: ply handling huge files


A.T.Hofkamp

Jul 31, 2012, 3:15:21 AM
to ply-...@googlegroups.com
On 07/20/2012 10:24 AM, gthomas wrote:
> You should mmap the file:
>
> import mmap
>
> fh = open(filename, 'rb')
> text = mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ)
>
> Then you can use "text" anywhere you can use a str, and it consumes no additional
> memory!

Not entirely: you do claim a chunk of (virtual) address space, so for very big files you'll
still run out of it (on 32-bit systems especially, due to the low limit of a few GB).
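For what it's worth, here is a minimal runnable version of the mmap idea (the file name
and token pattern are made up for illustration; note the constant is mmap.ACCESS_READ,
passed as a keyword argument, and that in modern Python 3 you match an mmap with a
bytes pattern, since it behaves like a bytes buffer):

    import mmap
    import re

    with open('big_input.txt', 'rb') as fh:
        mm = mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            # re can scan the mapped region directly; pages are faulted
            # in on demand, so nothing is read up front.
            for match in re.finditer(rb'[A-Za-z_]\w*', mm):
                print(match.group().decode('ascii'))
        finally:
            mm.close()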


> On Tuesday, July 17, 2012 at 11:54:58 AM UTC+2, PyRate wrote:
>

> tokens sometimes fall between the file chunks. This led me to add some code
> to the lex and yacc modules of ply so that they load the next file chunk when
> a threshold (number of bytes) is reached. Is this the way it is normally done?
> Is there any better way to do this?

The core of the problem is that ply uses regular expressions (the re module) for scanning, which assumes that its input is entirely in memory.

One better way of doing this would thus be to extend the regular-expression matching to accept data from a file stream.
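Something along these lines, perhaps (an untested sketch; the token pattern is a toy and
chunk_size is arbitrary): keep a sliding buffer, and refill it whenever a match is missing
or runs into the end of the buffer, so a token can never be cut in half at a chunk boundary.

    import re

    TOKEN_RE = re.compile(r'\s*(\d+|[A-Za-z_]\w*|.)')  # toy token pattern

    def stream_tokens(fh, chunk_size=1 << 16):
        # Yield tokens from a file object without loading it whole:
        # scan a buffer with re, and read more input whenever a match
        # fails or touches the buffer's end (it might be truncated).
        buf = fh.read(chunk_size)
        pos = 0
        eof = False
        while True:
            m = TOKEN_RE.match(buf, pos)
            if not eof and (m is None or m.end() == len(buf)):
                chunk = fh.read(chunk_size)
                if chunk:
                    buf = buf[pos:] + chunk  # drop the consumed prefix
                    pos = 0
                    continue
                eof = True
                m = TOKEN_RE.match(buf, pos)
            if m is None:
                return
            pos = m.end()
            yield m.group(1)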

Alternatively, you can write your own scanner that loads its input from a file. I think someone also
wrote a lex-like generator that you could attach to ply as a scanner. I don't remember what it assumed
as input, but it may be worth checking.
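If you go that route, attaching your own scanner to ply is not hard, since yacc only needs
an object with a token() method returning LexToken-like objects (and None at end of input).
A rough sketch, reusing the hypothetical stream_tokens() above with a toy token classification:

    import ply.lex as lex

    class StreamLexer(object):
        def __init__(self, fh):
            self._tokens = stream_tokens(fh)

        def token(self):
            try:
                value = next(self._tokens)
            except StopIteration:
                return None          # end of input
            tok = lex.LexToken()
            tok.type = 'NUMBER' if value.isdigit() else 'ID'  # toy mapping
            tok.value = value
            tok.lineno = 0           # real code would track positions
            tok.lexpos = 0
            return tok

    # Usage, with a parser built elsewhere by ply.yacc:
    #   with open('big_input.txt') as fh:
    #       result = parser.parse(lexer=StreamLexer(fh))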


Albert