Fail to read data from large file


Lewis Tian

Jan 14, 2020, 2:28:16 AM
to OSv Development
When I run pagerank on Ubuntu, the code works fine. (the graph data is stored in web-NotreDame.txt, 21M)

taseikyo@ubuntu:~/Desktop/osv/apps/my-pagerank-test$ ls
Makefile  module.py  pagerank.c  usr.manifest  web-NotreDame.txt

taseikyo@ubuntu:~/Desktop/osv/apps/my-pagerank-test$ make
cc -pie -o pagerank  pagerank.c

taseikyo@ubuntu:~/Desktop/osv/apps/my-pagerank-test$ ll -h
total 21M
drwxrwxr-x   2 taseikyo taseikyo 4.0K Jan 14 11:59 ./
drwxrwxr-x 128 taseikyo taseikyo 4.0K Jan 14 11:51 ../
-rw-rw-r--   1 taseikyo taseikyo  110 Jan 14 11:51 Makefile
-rw-rw-r--   1 taseikyo taseikyo   60 Jan 13 11:26 module.py
-rwxrwxr-x   1 taseikyo taseikyo  17K Jan 14 11:59 pagerank*
-rw-rw-r--   1 taseikyo taseikyo 5.6K Jan 14 11:51 pagerank.c
-rw-rw-r--   1 taseikyo taseikyo   85 Jan 14 11:53 usr.manifest
-rw-rw-r--   1 taseikyo taseikyo  21M Jan 14 11:54 web-NotreDame.txt

taseikyo@ubuntu:~/Desktop/osv/apps/my-pagerank-test$ ./pagerank
Graph data:

  Nodes: 325729, Edges: 1497134 


Number of iteration to converge: 52 

Final Pagerank values:
[0.002066 , 0.000181 , ...]

Time spent: 0.896491 seconds.

But when I build and run pagerank on OSv, it fails to read the graph data (it reads only part of the graph).

taseikyo@ubuntu:~/Desktop/osv$ ./scripts/build image=my-pagerank-test
taseikyo@ubuntu:~/Desktop/osv$ ./scripts/run.py
OSv v0.54.0-71-g69a0ce39
eth0: 192.168.122.15
Booted up in 338.86 ms
Cmdline: /pagerank

Graph data:

  Nodes: 325729, Edges: 1497134 

Fail to read data...

From: 6 To: 119

Here is part of the code:

while (!feof(fp)) {
    fret = fscanf(fp, "%d%d", &fromnode, &tonode);
    if (fret == 0) {
        printf("Fail to read data...\n");
        printf("\n From: %d To: %d\n", fromnode, tonode);
        return -1;
    }
    ...
}

When I use a small graph (4 nodes, 7 edges), it runs normally.

taseikyo@ubuntu:~/Desktop/osv$ ./scripts/build image=my-pagerank-test
taseikyo@ubuntu:~/Desktop/osv$ ./scripts/run.py
OSv v0.54.0-71-g69a0ce39
eth0: 192.168.122.15
Booted up in 356.44 ms
Cmdline: /pagerank

Graph data:

  Nodes: 4, Edges: 7 


Number of iteration to converge: 41 

Final Pagerank values:

[0.159913 , 0.144016 , 0.144016 , 0.082809 ]

Time spent: 0.693802 seconds.

Is OSv unable to read large files (a bug?)? I'd appreciate your help very much! :)

Nadav Har'El

Jan 14, 2020, 2:45:09 AM
to Lewis Tian, OSv Development
On Tue, Jan 14, 2020 at 9:28 AM Lewis Tian <tase...@gmail.com> wrote:
When I run pagerank on Ubuntu, the code works fine. (the graph data is stored in web-NotreDame.txt, 21M)

21M isn't very large, it shouldn't present any special problems. We probably have a bug that isn't just about file size:
Maybe we have a bug in fscanf or in the stdio reading layer?
Things I'd like you to please check:
1. Print a failure already if fret < 2 - since 1 is also a failure.
2. On failure, please print ftell(fp) - our position in the file (is it the end? something in the middle?). Please also do a fgets() or a short fgetc() loop or fread() to read the next available bytes, to try to understand why fscanf() failed. Is the reading from the file failing, or is the parsing failing?
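
The two checks above could be sketched roughly as follows. This is only an illustration, not the poster's actual pagerank.c: the helper name read_edges_checked() and the 64-byte peek buffer are assumptions made for the example. It loops on fscanf()'s return value instead of feof(), treats anything other than 2 as a failure, and on failure prints the file position and the next few bytes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the original program): reads "from to"
 * integer pairs from fp. Returns the number of pairs read on a clean
 * end of file, or -1 after printing diagnostics on the first short read. */
long read_edges_checked(FILE *fp)
{
    assert(fp != NULL);

    long edges = 0;
    int from, to;
    int fret;

    /* Loop on the fscanf() result: exactly 2 conversions means success. */
    while ((fret = fscanf(fp, "%d%d", &from, &to)) == 2) {
        edges++;
    }

    /* EOF before any conversion, with no stream error, is a normal end. */
    if (fret == EOF && !ferror(fp)) {
        return edges;
    }

    /* Otherwise fret is 0 or 1 (or EOF with an error): report where we are. */
    printf("fret: %d\n", fret);
    printf("position: %ld\n", ftell(fp));

    /* Peek at the next bytes to tell a read failure from a parse failure. */
    char buf[64];
    size_t n = fread(buf, 1, sizeof buf - 1, fp);
    buf[n] = '\0';
    printf("next %zu bytes: \"%s\"\n", n, buf);
    printf("feof: %d, ferror: %d\n", feof(fp) != 0, ferror(fp) != 0);
    return -1;
}
```

If the peek returns bytes that look like valid numbers, the parsing layer is the suspect; if it returns nothing while the position is still in the middle of the file, the reading layer is.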


 

--
You received this message because you are subscribed to the Google Groups "OSv Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to osv-dev+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/osv-dev/50f3a073-e364-484c-803e-570e0dd6530c%40googlegroups.com.

Lewis Tian

Jan 14, 2020, 3:15:42 AM
to OSv Development
Thanks for your advice,
On failure, the output is as follows:

Graph data:

  Nodes: 325729, Edges: 1497134 

fret: 1
position: 1025
next char: 

fret is 1 and the position is in the middle of the file, yet neither fgetc() nor fgets() can read anything. So I think it is the former case: reading from the file is failing, not the parsing.

 
 

Nadav Har'El

Jan 14, 2020, 3:45:27 AM
to Lewis Tian, OSv Development
Interesting. Smells like a serious stdio bug that needs to be debugged :-(
I think it's not a coincidence that position is 1025, with stdio's BUFSIZ=1024.

Just as a completely wild guess, could you please check whether the following patch to libc/internal/shgetc.c helps?

@@ -22,5 +22,6 @@
  else
  f->shend = f->rend;
  if (f->rend) f->shcnt += f->rend - f->rpos + 1;
+ if (f->rpos[-1] != c) f->rpos[-1] = c;
  return c;
 }
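
For anyone who wants to check a build before and after the patch, a small round-trip test along these lines might help. It is a sketch under assumptions: the function name count_pairs_roundtrip() is made up for the example, it relies on tmpfile() being usable in the target environment, and this thread does not confirm that it triggers the same failure. It simply writes enough pairs that the file spans several 1024-byte stdio buffers and verifies that fscanf() reads every pair back.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical self-check: writes n "i i+1" pairs to a temporary file,
 * reads them back with fscanf(), and returns how many pairs came back
 * intact and in order. Returns -1 if the temporary file cannot be made. */
int count_pairs_roundtrip(int n)
{
    assert(n >= 0);

    FILE *fp = tmpfile();
    if (fp == NULL) {
        return -1;
    }

    /* Each line is a tab-separated pair; 1000 pairs is well over 1024 bytes. */
    for (int i = 0; i < n; i++) {
        fprintf(fp, "%d\t%d\n", i, i + 1);
    }
    rewind(fp);

    int good = 0;
    int from, to;
    while (fscanf(fp, "%d%d", &from, &to) == 2) {
        if (from != good || to != good + 1) {
            break;  /* a value was lost or corrupted on the way back */
        }
        good++;
    }

    fclose(fp);
    return good;
}
```

A result smaller than n (for example with n = 1000) would point at the read path rather than at the input data, since the same process wrote the file.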



 
 

Lewis Tian

Jan 14, 2020, 5:41:35 AM
to OSv Development
Yeah, the patch is helpful! Thanks! : )
 

 
 

Nadav Har'El

Jan 14, 2020, 5:47:55 AM
to Lewis Tian, OSv Development

I'm pleasantly surprised :-) I'll send a more organized patch to the mailing list.
 

 
 

Dor Laor

Jan 14, 2020, 12:20:24 PM
to Nadav Har'El, Lewis Tian, OSv Development
A classic case of a good OSS reporter and a good (!) committer.