Why is buffer size not always an integer multiple of 4096 when reading file line by line?

mingLiu

Jul 6, 2014, 8:57:25 AM

to golan...@googlegroups.com

Hi everyone,

I'm a newbie Go programmer. Please correct me if I get something wrong.

The sample code is:

// test.go
package main

import (
	"bufio"
	"os"
)

func main() {
	if len(os.Args) != 2 {
		println("Usage:", os.Args[0], "")
		os.Exit(1)
	}
	fileName := os.Args[1]
	fp, err := os.Open(fileName)
	if err != nil {
		println(err.Error())
		os.Exit(2)
	}
	defer fp.Close()

	// Read the file line by line, keeping every line in memory.
	r := bufio.NewScanner(fp)
	var lines []string
	for r.Scan() {
		lines = append(lines, r.Text())
	}
}

c:\>go build test.go

c:\>test.exe test.txt

I then monitored the process with Process Monitor while it was running. Part of the output is:

test.exe ReadFile SUCCESS Offset: 4,692,375, Length: 8,056

test.exe ReadFile SUCCESS Offset: 4,700,431, Length: 7,198

test.exe ReadFile SUCCESS Offset: 4,707,629, Length: 8,134

test.exe ReadFile SUCCESS Offset: 4,715,763, Length: 7,361

test.exe ReadFile SUCCESS Offset: 4,723,124, Length: 8,056

test.exe ReadFile SUCCESS Offset: 4,731,180, Length: 4,322

test.exe ReadFile END OF FILE Offset: 4,735,502, Length: 8,192

The equivalent Java code is:

// Test.java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;

public class Test {
    public static void main(String[] args) {
        try {
            FileInputStream in = new FileInputStream("test.txt");
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String strLine;
            // Read every line and discard it.
            while ((strLine = br.readLine()) != null) {
                ;
            }
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}

c:\>javac Test.java

c:\>java Test

Then part of the monitoring output is:

java.exe ReadFile SUCCESS Offset: 4,694,016, Length: 8,192

java.exe ReadFile SUCCESS Offset: 4,702,208, Length: 8,192

java.exe ReadFile SUCCESS Offset: 4,710,400, Length: 8,192

java.exe ReadFile SUCCESS Offset: 4,718,592, Length: 8,192

java.exe ReadFile SUCCESS Offset: 4,726,784, Length: 8,192

java.exe ReadFile SUCCESS Offset: 4,734,976, Length: 526

java.exe ReadFile END OF FILE Offset: 4,735,502, Length: 8,192

As you can see, the buffer size in Java is 8192 and it reads 8192 bytes each time. Why does the Length in Go change from one read to the next?

I have also tried bufio.Reader's ReadString('\n') and ReadBytes('\n'), and both of them have the same problem.

Benjamin Measures

Jul 6, 2014, 8:47:35 PM

to golan...@googlegroups.com

On Sunday, 6 July 2014 13:57:25 UTC+1, mingLiu wrote:

As you can see, the buffer size in Java is 8192 and it reads 8192 bytes each time. Why does the Length in Go change from one read to the next?

Java's BufferedReader refills its buffer (reading up to the buffer size) only when the buffer is empty, whilst Go's bufio reads (up to the maximum buffer size) whenever it needs more data.

Where a line spans two reads, Java's BufferedReader copies the first part out into temporary storage, whilst Go's bufio keeps the partial line in its buffer and reads more. As a result, Go's bufio does very little allocation (mostly none) and doesn't suffer from unbounded lines.
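The behaviour described above can be observed without Process Monitor. The following is a minimal sketch, not taken from the thread: loggingReader, makeLines and scanAll are hypothetical helper names, and an in-memory strings.Reader stands in for the file. It records how many bytes bufio.Scanner asks for on each Read call. If the explanation above is right, a partial line left at the end of the buffer stays there, so the next request should only ask for the space that is still free, which need not be a multiple of 4096.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// loggingReader is a hypothetical helper: it wraps an io.Reader and records
// how many bytes each Read call asked for and how many it received.
type loggingReader struct {
	src       io.Reader
	requested []int
	received  []int
}

func (l *loggingReader) Read(p []byte) (int, error) {
	n, err := l.src.Read(p)
	l.requested = append(l.requested, len(p))
	l.received = append(l.received, n)
	return n, err
}

// makeLines builds n lines of the given width (not counting the newline).
func makeLines(n, width int) string {
	return strings.Repeat(strings.Repeat("x", width)+"\n", n)
}

// scanAll scans input line by line and returns the line count and the read log.
func scanAll(input string) (int, *loggingReader) {
	lr := &loggingReader{src: strings.NewReader(input)}
	sc := bufio.NewScanner(lr)
	lines := 0
	for sc.Scan() {
		lines++
	}
	return lines, lr
}

func main() {
	// 500 lines of 100 bytes each (99 characters plus a newline).
	lines, lr := scanAll(makeLines(500, 99))
	fmt.Println("lines:", lines)
	for i := range lr.requested {
		fmt.Printf("read %d: requested %d, received %d\n", i, lr.requested[i], lr.received[i])
	}
}
```

Swapping the strings.Reader for an *os.File should let you compare the logged request sizes with the Length column shown by Process Monitor.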

I have also tried bufio.Reader's ReadString('\n') and ReadBytes('\n'), and both of them have the same problem.

Why is this a problem?

ming...@gmail.com

Jul 6, 2014, 10:56:09 PM

to golan...@googlegroups.com

Thanks for your explanation.

On Monday, 7 July 2014 08:47:35 UTC+8, Benjamin Measures wrote:

I posted the question on Stack Overflow yesterday and received many good replies. FYI: https://stackoverflow.com/questions/24597157/why-isnt-buffer-size-always-an-integer-multiple-of-4096-when-reading-file-line

My concern is performance. The system page size is 4096 bytes, so maybe reading in multiples of 4096 would give better performance, right?

Benjamin Measures

Jul 7, 2014, 7:59:57 PM

to golan...@googlegroups.com

On Monday, 7 July 2014 03:56:09 UTC+1, Ming Liu wrote:

My concern is performance. The system page size is 4096 bytes, so maybe reading in multiples of 4096 would give better performance, right?

No, since (at least, Intel) processors can't copy memory in units of 4KB.

Besides, any memcpy that fits in L2 cache would be measured in the tens of GB/s. Since you're reading files, this all sounds like premature optimisation to me. Have you tried benchmarking and found something lacking?
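One way to act on the benchmarking suggestion is sketched below. It is an illustration only and not from the thread: makeInput and countLines are hypothetical helpers, the buffer sizes are arbitrary, and the data lives in memory, so it measures only the Scanner's buffer handling and not disk I/O. It uses testing.Benchmark, which can be called from an ordinary program, and Scanner.Buffer to choose the initial buffer size.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
	"testing"
)

// makeInput builds n lines of 100 bytes each (99 characters plus a newline).
func makeInput(n int) string {
	return strings.Repeat(strings.Repeat("x", 99)+"\n", n)
}

// countLines scans data line by line, starting the Scanner with a buffer of
// bufSize bytes, and returns the number of lines seen.
func countLines(data string, bufSize int) int {
	sc := bufio.NewScanner(strings.NewReader(data))
	sc.Buffer(make([]byte, bufSize), bufio.MaxScanTokenSize)
	n := 0
	for sc.Scan() {
		n++
	}
	return n
}

func main() {
	data := makeInput(50000) // about 5 MB of input
	// Compare two page-size multiples with one size that is not a multiple.
	for _, size := range []int{4096, 5000, 8192} {
		size := size
		res := testing.Benchmark(func(b *testing.B) {
			b.SetBytes(int64(len(data))) // lets the result report throughput
			for i := 0; i < b.N; i++ {
				countLines(data, size)
			}
		})
		fmt.Printf("buffer %5d: %s\n", size, res)
	}
}
```

For numbers that reflect the original question, the in-memory reader would need to be replaced with the real file, keeping in mind that the operating system's file cache will affect repeated runs.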
