ioutil.ReadFile preallocates up to 2GB

Joan Miller

Apr 4, 2010, 5:41:54 PM
to golang-nuts
ioutil.ReadFile can preallocate buffers of up to 2GB (2e9), based on
the size of the file to read.

Is that value correct? because i.e. my system has that RAM size.

Conrad Meyer

Apr 4, 2010, 5:46:45 PM
to golan...@googlegroups.com
On Sunday 04 April 2010 02:41:54 pm Joan Miller wrote:
> ioutil.ReadFile can preallocate buffers of up to 2GB (2e9), based on
> the size of the file to read.

It only preallocates up to 2G, but it can read larger files into memory.



> Is that value correct? because i.e. my system has that RAM size.

Maybe you mean e.g., not i.e.? Yes, that value is correct. Why is the size of
your RAM relevant?

Regards,
--
Conrad Meyer <cem...@u.washington.edu>

Joan Miller

Apr 4, 2010, 5:53:02 PM
to golang-nuts

On 4 Apr, 21:46, Conrad Meyer <ceme...@u.washington.edu> wrote:
> On Sunday 04 April 2010 02:41:54 pm Joan Miller wrote:
>
> > ioutil.ReadFile can preallocate buffers of up to 2GB (2e9), based on
> > the size of the file to read.
>
> It only preallocates up to 2G, but it can read larger files into memory.
>
> > Is that value correct? because i.e. my system has that RAM size.
>
> Maybe you mean e.g., not i.e.? Yes, that value is correct. Why is the size of
> your RAM relevant?

Because a new buffer of that size is going to be created, and if
it uses a 2GB buffer, it will take all the RAM of a typical
desktop system.

Conrad Meyer

Apr 4, 2010, 6:12:08 PM
to golan...@googlegroups.com

Yes. That's kind of the point of ReadFile() -- it reads a file, from disk,
into RAM. It doesn't pre-allocate 2GB for all files, if that's what you're
asking -- only files that are larger than 2GB. ReadFile allocates exactly the
size of files smaller than 2GB.

If you don't want the whole file in memory at once, read through the file in
chunks. bufio.Reader may be helpful, too.

Joan Miller

Apr 4, 2010, 6:27:20 PM
to golang-nuts

On 4 Apr, 22:12, Conrad Meyer <ceme...@u.washington.edu> wrote:
> On Sunday 04 April 2010 02:53:02 pm Joan Miller wrote:
>
> > On 4 Apr, 21:46, Conrad Meyer <ceme...@u.washington.edu> wrote:
> > > On Sunday 04 April 2010 02:41:54 pm Joan Miller wrote:
> > > > ioutil.ReadFile can preallocate buffers of up to 2GB (2e9), based on
> > > > the size of the file to read.
>
> > > It only preallocates up to 2G, but it can read larger files into memory.
>
> > > > Is that value correct? because i.e. my system has that RAM size.
>
> > > Maybe you mean e.g., not i.e.? Yes, that value is correct. Why is the
> > > size of your RAM relevant?
>
> > Because a new buffer of that size is going to be created, and if
> > it uses a 2GB buffer, it will take all the RAM of a typical
> > desktop system.

I mean that if it *reads a file* of 2GB (or nearly so), it will take
all that memory.

> Yes. That's kind of the point of ReadFile() -- it reads a file, from disk,
> into RAM. It doesn't pre-allocate 2GB for all files, if that's what you're
> asking -- only files that are larger than 2GB. ReadFile allocates exactly the
> size of files smaller than 2GB.

My point is that 2GB is too large, because if you read a file
close to that size, your system will crash.

Wouldn't it be better to have a more moderate size to avoid this
possible problem?

Conrad Meyer

Apr 4, 2010, 6:56:26 PM
to golan...@googlegroups.com
On Sunday 04 April 2010 03:27:20 pm Joan Miller wrote:
> On 4 Apr, 22:12, Conrad Meyer <ceme...@u.washington.edu> wrote:
> > ...

>
> I mean that if it *reads a file* of 2GB (or nearly so), it will take
> all that memory.
>
> > Yes. That's kind of the point of ReadFile() -- it reads a file, from
> > disk, into RAM. It doesn't pre-allocate 2GB for all files, if that's
> > what you're asking -- only files that are larger than 2GB. ReadFile
> > allocates exactly the size of files smaller than 2GB.
>
> My point is that 2GB is too large, because if you read a file
> close to that size, your system will crash.
>
> Wouldn't it be better to have a more moderate size to avoid this
> possible problem?

ReadFile() is a convenience function. If you don't want to use 2GB of memory,
don't call ReadFile() on a 2GB file.

No, it would not be better to have a smaller size cutoff. Even if the cut-off
for ReadFile was lower (say, 256MB), that's just the space that is pre-
allocated. Actually reading the file into the buffer will use as much memory
as the file does on disk. ReadFile() is for putting a file into RAM. If you
call ReadFile() on a file larger than your RAM, that's your fault.

Joan Miller

Apr 4, 2010, 7:21:53 PM
to golang-nuts

Ok.

Anyway, the code has:

if err != nil && dir.Size < 2e9 { ... }

Shouldn't it be *if err == nil && ...*? I would expect that if
there is an error, `dir.Size` will not hold that information.

http://golang.org/src/pkg/io/ioutil/ioutil.go#L24

Giles Lean

Apr 4, 2010, 9:18:49 PM
to Joan Miller, golang-nuts

Joan Miller <pelo...@gmail.com> wrote:

> My point is that 2GB is too large, because if you read a
> file close to that size, your system will crash.

If my system crashes when I allocate 2GB of memory, it's time
for a new operating system. Even Linux with its evil memory
overcommit feature turned on just kills all your important
processes; it doesn't crash.

I can allocate 4GB on my notebook (which has only 4GB of
physical memory) and while I think OS X's performance under
memory pressure, um, "could be improved" it stayed up. There
was far too much spinning beach ball activity, but I suspect
Steve Jobs's usage patterns are different to mine or that would
be corrected.

2GB files aren't large anymore. For enterprise systems,
they're boring. For personal systems, pretty much the same,
given the way people move around video files and DVD images.

Personally, I'd probably make the limit a percentage of total
system memory: 2GB is ridiculously small on a 256GB system
(and I've used such); 2GB is way too much on an old P-III with
512MB, still too much on a 1GB netbook, and touch and go on a
2GB or 3GB 32 bit system someone bought based on a TV ad.

Regards,

Giles

P.S. Getting off topic so it's in the postscript: I was
debugging sort(1) one fine day (in a non-English locale,
please note, golang-dev team -- we will need locale-aware
collation) and wrote a quick and dirty alternate program so
that I could tell when the program I was debugging was fixed.

The sort(1) implementation I was debugging was still tuned
for a VAX 11/780 or something and was using disk files and
merging and whatnot; took many minutes to sort. Blech.

For my quick and dirty program I just mmap()'d the file into
memory and ran qsort() on it after setting up some keys, using
whatever qsort() came in libc, and got the time down to 20s.
(That's 20s in a non-ASCII locale, too.) At least one order
of magnitude speed increase, and I was using ~15% of the
physical memory of the "small, old" (for the time) system I
had, which my notebook would now run rings around.

If there's a lesson here, it's that file size is relative to
both installed physical memory and free memory. I don't like
seeing fixed limits in the 2GB range: the number of times it
will be the right number is probably outweighed by the times
it will be the wrong number, and allocating it will take long
enough that a few CPU cycles deciding what amount to allocate
won't be missed.

All IMHO as ever, and likely to be ignored, as commonly (but
not always) happens.

Conrad Meyer

Apr 4, 2010, 11:27:52 PM
to golan...@googlegroups.com
On Sunday 04 April 2010 06:18:49 pm Giles Lean wrote:
> Joan Miller <pelo...@gmail.com> wrote:
> > My point is that 2GB is too large, because if you read a
> > file close to that size, your system will crash.
>
> Personally, I'd probably make the limit a percentage of total
> system memory: 2GB is ridiculously small on a 256GB system
> (and I've used such); 2GB is way too much on an old P-III with
> 512MB, still too much on a 1GB netbook, and touch and go on a
> 2GB or 3GB 32 bit system someone bought based on a TV ad.

Hi Giles,

It's not a limit. It only limits how large the *preallocated* buffer will be
based on the result of calling Stat(). Reading the file into the buffer past
the allocated size causes the buffer to reallocate enough memory to eventually
hold the file. It works equally well on systems with 32MiB of RAM and systems
with 256GiB of RAM, with both files that can and files that cannot fit into
non-virtual memory.

Best regards,
--
Conrad Meyer <cem...@u.washington.edu>

peterGo

Apr 5, 2010, 12:49:26 AM
to golang-nuts
Conrad,

// [ioutil] ReadFile reads the file named by filename and returns the
// contents.
func ReadFile(filename string) ([]byte, os.Error) { ... }

The built-in functions len and cap take arguments of various types and
return a result of type int. The implementation guarantees that the
result always fits into an int.
http://golang.org/doc/go_spec.html#Length_and_capacity

To fit the entire file into []byte, a slice of a byte array, the file
size has to be less than or equal to the capacity (type int) of the
byte slice. Type int is implementation specific and is either 32 or 64
bits; it's 32 bits for all current implementations e.g. 6g and 8g.
unsafe.Sizeof(int(0)) returns 4 bytes.

Peter

Conrad Meyer

Apr 5, 2010, 1:18:07 AM
to golan...@googlegroups.com
On Sunday 04 April 2010 09:49:26 pm peterGo wrote:
> Conrad,
>
> // [ioutil] ReadFile reads the file named by filename and returns the
> // contents.
> func ReadFile(filename string) ([]byte, os.Error) { ... }
>
> The built-in functions len and cap take arguments of various types and
> return a result of type int. The implementation guarantees that the
> result always fits into an int.
> http://golang.org/doc/go_spec.html#Length_and_capacity
>
> To fit the entire file into []byte, a slice of a byte array, the file
> size has to be less than or equal to the capacity (type int) of the
> byte slice. Type int is implementation specific and is either 32 or 64
> bits; it's 32 bits for all current implementations e.g. 6g and 8g.
> unsafe.Sizeof(int(0)) returns 4 bytes.

Hi Peter,

I guess I forgot to take into account the limitations of len() and cap(),
sorry :).

However, the implementation of ReadFile() actually reads the file into a
bytes.Buffer, and then calls Bytes() on the buffer. bytes.Buffer could have an
implementation that allowed more than 2GiB of data, even if Bytes() couldn't
return all of it as a []byte. ;)

Giles Lean

Apr 5, 2010, 3:06:49 AM
to Conrad Meyer, golan...@googlegroups.com

Conrad Meyer <cem...@u.washington.edu> wrote:

> It's not a limit. It only limits how large the *preallocated* buffer

> ...

Conrad,

Thanks for clarifying.

Yeah, I should have read the code. Still, there will be times when that
preallocation is too large, but the Stat() call should hardly ever fail.

I should shut up and take a holiday; it's not my day.

(Anyone know a virtualisation solution that *works* on OS X? I've had to
give up on VirtualBox after it blew up spectacularly again; this stalls my
testing of CLs ... if they weren't already stalled by the filed ones not
being reviewed by the golang-dev team anymore. Yeah, it's Easter, and
they're busy with the panic()/defer() feature. Suggestions of Parallels
will be laughed at (it's worse for the *BSDs) and VMware (any version)
does not claim support for NetBSD. Time to buy some cheap PCs? At
least winter's coming on.)

Again, I should shut up, disconnect the Internet connection and take a
holiday.

Sorry for the noise,

Giles

Joan Miller

Apr 5, 2010, 3:47:36 AM
to golang-nuts

On 5 Apr, 01:18, Giles Lean <giles.l...@pobox.com> wrote:
> Joan Miller <pelok...@gmail.com> wrote:
> > My point is that 2GB is too large, because if you read a
> > file close to that size, your system will crash.
>
> If my system crashes when I allocate 2GB of memory, it's time
> for a new operating system. Even Linux with its evil memory
> overcommit feature turned on just kills all your important
> processes; it doesn't crash.
>
> I can allocate 4GB on my notebook (which has only 4GB of
> physical memory) and while I think OS X's performance under
> memory pressure, um, "could be improved" it stayed up. There
> was far too much spinning beach ball activity, but I suspect
> Steve Jobs's usage patterns are different to mine or that would
> be corrected.
>
> 2GB files aren't large anymore. For enterprise systems,
> they're boring. For personal systems, pretty much the same,
> given the way people move around video files and DVD images.
>
> Personally, I'd probably make the limit a percentage of total
> system memory: 2GB is ridiculously small on a 256GB system
> (and I've used such); 2GB is way too much on an old P-III with
> 512MB, still too much on a 1GB netbook, and touch and go on a
> 2GB or 3GB 32 bit system someone bought based on a TV ad.

On Linux it is very easy to inspect memory information through
/proc.

cat /proc/meminfo

The following values would help calculate the size to pre-allocate
instead of using a fixed size:

MemFree
Buffers
Cached

peterGo

Apr 5, 2010, 5:17:08 AM
to golang-nuts
Conrad,

I did consider that possibility; I did look into the implementation of
all the functions. bytes.Buffer uses []byte too, plus the offset is
also int.

// A [bytes] Buffer is a variable-sized buffer of bytes with Read and
// Write methods.
// The zero value for Buffer is an empty buffer ready to use.
type Buffer struct {
	buf       []byte            // contents are the bytes buf[off : len(buf)]
	off       int               // read at &buf[off], write at &buf[len(buf)]
	runeBytes [utf8.UTFMax]byte // avoid allocation of slice on each WriteByte or Rune
	bootstrap [64]byte          // memory to hold first slice; helps small buffers (Printf) avoid allocation.
}

Peter

Norman Yarvin

Apr 5, 2010, 4:04:25 PM
to peterGo, golang-nuts
On Sun, Apr 04, 2010 at 09:49:26PM -0700, peterGo wrote:

>The built-in functions len and cap take arguments of various types and
>return a result of type int. The implementation guarantees that the
>result always fits into an int.
>http://golang.org/doc/go_spec.html#Length_and_capacity
>
>To fit the entire file into []byte, a slice of a byte array, the file
>size has to be less than or equal to the capacity (type int) of the
>byte slice. Type int is implementation specific and is either 32 or 64
>bits; it's 32 bits for all current implementations e.g. 6g and 8g.
>unsafe.Sizeof(int(0)) returns 4 bytes.


So how are people supposed to deal with arrays that might be larger than
2GB? Are there going to be a "len64" and "cap64" added to the language?

To index arrays that might exceed that limit, it'd also be desirable to
have an integer type that would be 32 bits on a 32-bit machine, and 64
bits on a 64-bit machine, since int64 is overkill (and rather slow) on
32-bit machines. I don't see any way of specifying such an integer.

--
Norman Yarvin http://yarchive.net

Ken Thompson

Apr 5, 2010, 5:58:24 PM
to Norman Yarvin, peterGo, golang-nuts
note that since len and cap are in elements, not bytes,
the memory size restriction is larger than 2G.
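The arithmetic behind Ken's remark can be spelled out. The sketch below assumes a length limit of 1<<31-1 elements (a 32-bit int, as in the compilers discussed here); `maxBytes` is a name invented for the example.

```go
package main

import (
	"fmt"
	"unsafe"
)

// maxBytes (hypothetical) returns the memory footprint of a slice holding
// maxLen elements of elemSize bytes each.
func maxBytes(maxLen, elemSize uint64) uint64 {
	return maxLen * elemSize
}

func main() {
	const maxLen32 = 1<<31 - 1 // largest length a 32-bit int can express

	fmt.Println("[]byte:      ", maxBytes(maxLen32, uint64(unsafe.Sizeof(byte(0)))))
	fmt.Println("[]int64:     ", maxBytes(maxLen32, uint64(unsafe.Sizeof(int64(0)))))
	fmt.Println("[]complex128:", maxBytes(maxLen32, uint64(unsafe.Sizeof(complex128(0)))))
}
```

Only the byte slice is held to roughly 2GB; wider element types push the language-level limit to roughly 16GB or 32GB, whatever a particular runtime is then able to allocate.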

peterGo

Apr 5, 2010, 8:07:42 PM
to golang-nuts
Ken,

> note that since len and cap are in elements, not bytes,
> so the memory size restriction is larger than 2G.

That's certainly what you would expect, but it doesn't seem to be true
in practice.

I verified the len and cap size restrictions, based on the size of
int, by reading the Go language source code. For the memory size
restrictions, I conducted an experiment.

If I increase the make cap elements to ask for more than (1<<31-1),
2GB, of memory, the program halts with the message: mmap: errno=0x16.

package main

import (
	"fmt"
	"unsafe"
)

func main() {
	// If larger values are used for the make cap elements, the program halts:
	// mmap: errno=0x16
	// (uintptr(cap(s)) keeps the operand types matched, since unsafe.Sizeof
	// yields a uintptr in current Go.)
	{
		s := make([]complex128, 1<<27-1)
		fmt.Println(uintptr(cap(s))*unsafe.Sizeof(complex128(0+0i)), len(s), cap(s))
	}
	{
		s := make([]int64, 1<<28-1)
		fmt.Println(uintptr(cap(s))*unsafe.Sizeof(int64(0)), len(s), cap(s))
	}
	{
		s := make([]int32, 1<<29-1)
		fmt.Println(uintptr(cap(s))*unsafe.Sizeof(int32(0)), len(s), cap(s))
	}
	{
		s := make([]byte, 1<<31-1)
		fmt.Println(uintptr(cap(s))*unsafe.Sizeof(byte(0)), len(s), cap(s))
	}
}

Peter

Russ Cox

Apr 5, 2010, 8:10:56 PM
to peterGo, golang-nuts
> If I increase the make cap elements to ask for more than (1<<31-1),
> 2GB, of memory, the program halts with the message: mmap: errno=0x16.

Whether your operating system wants to allocate
a chunk of memory > 2GB is something
Go has no control over.

And there may be bugs in the runtime or in the
memory allocator too. Dealing with that much
memory is not something we've really pounded on.

But we were discussing the language, not the bugs
in one implementation of it. The language allows
slices > 2GB easily, as long as they are not slices of byte.

Russ

peterGo

Apr 5, 2010, 8:13:58 PM
to golang-nuts
Note: Intel Q8300, Ubuntu 9.10, Go amd64, 6g at tip.

Norman Yarvin

Apr 5, 2010, 11:54:56 PM
to Ken Thompson, peterGo, golang-nuts
On Mon, Apr 05, 2010 at 02:58:24PM -0700, Ken Thompson wrote:
>note that since len and cap are in elements, not bytes,
>so the memory size restriction is larger than 2G.

Sorry; yes, that was apparent from the definition, but I neglected to
word things accordingly. Even 8GB or 16GB still seems unduly limiting,
though, especially with regard to future machines that might have
considerably larger memories. Also, there are some large data items,
such as uncompressed video clips, which are composed of 8-bit data;
accessing them byte-wise would give them a limit of 2GB.

Those limits would go away if int were changed to be 64 bits on 64-bit
machines. That seems like it'd be easy enough to do at the moment, but
will become harder once the language is more widely adopted and libraries
start being distributed in binary form.

The question as to why int is 32 bits on 64-bit machines is listed as
being frequently asked:

http://golang.org/doc/go_programming_faq.html#64bit_machine_32bit_int

but the answer there doesn't give any reasons; it just repeats that it's
so, and tells people to use int64 if they need it. But for array
lengths, they don't have that choice.

Corey Thomasson

Apr 6, 2010, 7:07:15 AM
to Norman Yarvin, golang-nuts
If I understand correctly, the language spec doesn't guarantee int as
32 bits, it just happens to be implemented as such right now. I may
have misread though.

Mark Plotnick

Apr 6, 2010, 12:48:54 AM
to golang-nuts
Looks like the mmap syscall interface is only passing through the
low-order 32 bits of the size argument. SysAlloc(0x100000000) results
in a call to mmap(NULL, 0, ...), according to strace.

In pkg/runtime/linux/amd64/sys.s, in ·mmap, maybe MOVL 16(SP), SI
should be MOVQ instead?
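The effect Mark describes, where only the low-order 32 bits of the size reach mmap, can be imitated in Go. This is an illustration of the arithmetic only, not the runtime's assembly; `truncate32` is a hypothetical name.

```go
package main

import "fmt"

// truncate32 (hypothetical) mimics what a 32-bit move does to a 64-bit
// size: only the low-order 32 bits survive.
func truncate32(size uint64) uint64 {
	return uint64(uint32(size))
}

func main() {
	fmt.Printf("%#x\n", truncate32(0x100000000)) // the size from the strace report
	fmt.Printf("%#x\n", truncate32(0x100000400)) // 4 GiB plus 1 KiB
}
```

A request for exactly 4GiB therefore arrives as a request for zero bytes, which is consistent with the mmap(NULL, 0, ...) call seen in strace.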

Ian Lance Taylor

Apr 6, 2010, 10:56:55 AM
to Mark Plotnick, golang-nuts
Mark Plotnick <mark.p...@gmail.com> writes:

> Looks like the mmap syscall interface is only passing through the
> low-order 32 bits of the size argument. SysAlloc(0x100000000) results
> in a call to mmap(NULL, 0, ...), according to strace.
>
> In pkg/runtime/linux/amd64/sys.s, in ·mmap, maybe MOVL 16(SP), SI
> should be MOVQ instead?

Thanks--looks like Russ just fixed that.

Ian
