TwitterLarge maxst running time


Shen Victor

May 4, 2011, 10:59:16 PM
to cs2110-sp11
I am testing my maxst method with twitterlarge.txt, but it runs forever. I let it run for more than 25 minutes and it still didn't finish. Anyone else with the same problem?

Steve Spagnola

May 6, 2011, 9:54:50 AM
to cs2110-sp11
Hi Shen,

Depending on your choice of data structures and lookup methods, you
may be doing this inefficiently. It should not take 25 minutes to load.
I would suggest debugging how your program loads the data into memory,
either with the debugger or with println statements.
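One cheap way to follow Steve's println suggestion is to print a progress line every so many input lines while loading; if the counter stalls or slows sharply, you know roughly where the time is going. A minimal sketch (the class and method names here are made up for illustration):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

public class LoadProgress {
    // Counts lines from the given source, printing progress periodically
    // so a stall or slowdown during loading is visible on the console.
    static long countLines(Reader src) throws IOException {
        BufferedReader in = new BufferedReader(src);
        long count = 0;
        String line;
        while ((line = in.readLine()) != null) {
            count++;
            if (count % 100000 == 0) {
                System.out.println("parsed " + count + " lines");
            }
        }
        return count;
    }
}
```

Wrap the file in a FileReader and pass it to countLines; if the printed count climbs steadily, loading is fine and the time is going elsewhere (e.g. into maxst itself).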

Good luck,
Steve

Robert Escriva

May 6, 2011, 10:01:28 AM
to cornell-c...@googlegroups.com

Also,

Check to make sure you aren't swapping out to disk. On low-memory
systems you'll see pagefile/swapfile usage increase, indicating
that your memory accesses are going to disk. This can also happen if
many other programs are using memory. Disk is slow, and a program
that touches disk is slow.
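Besides watching the OS's swap usage, you can ask the JVM directly how much heap it is using and how close you are to the limit; a quick sketch (class and method names are hypothetical):

```java
public class HeapCheck {
    // Returns the approximate number of heap bytes currently in use.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long max = Runtime.getRuntime().maxMemory();
        // Print used heap vs. the -Xmx limit, in megabytes.
        System.out.println("used heap: " + (usedHeapBytes() / (1024 * 1024))
                + " MB of " + (max / (1024 * 1024)) + " MB max");
    }
}
```

Calling usedHeapBytes() periodically while loading shows whether you are creeping toward the heap limit (which triggers heavy garbage collection) long before the OS starts swapping.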

Scanner is also not as efficient as other input methods.

-Robert

Alex

May 7, 2011, 11:00:35 AM
to cs2110-sp11
Hi Robert,

Since we are not going to be tested on our parser, could you tell us
a more efficient input method? I am running out of heap space quite
quickly, and parsing is very slow using a Scanner and a FileReader.

Alex

Nikos Karampatziakis

May 7, 2011, 2:00:13 PM
to cornell-c...@googlegroups.com
On Sat, May 7, 2011 at 11:00 AM, Alex <acs...@cornell.edu> wrote:
- Start with a large initial heap size. As your program approaches the
heap limit it becomes slow, because the garbage collector must hunt for
blocks of memory that are no longer in use and recycle them.
- Related to the above: make sure you store only what's absolutely
necessary. I can comfortably load twitterLarge using -Xmx512m.
- Use top (Mac, Linux) or the Windows Task Manager while your program
is running to see how fast memory is being consumed, and whether you
quickly reach figures close to the heap limit, which will slow your
program down. You can also periodically print your progress through
the file (the number of lines you have parsed).
- Read the file with a BufferedReader with a large buffer. You can
specify the buffer size as the second argument of the BufferedReader
constructor; I use 8192 bytes (8 KB).
- Do not use Scanner. Instead, split each line yourself into three
strings (I use the indexOf, lastIndexOf, and substring methods; there
is also a method called split) and use Integer.parseInt().
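The last two points could be sketched as follows. This assumes each line holds three space-separated fields whose first field is a numeric ID; the actual twitterLarge.txt layout may differ, and the class/method names are made up for illustration:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

public class EdgeParser {
    // Splits one line into three fields using indexOf/lastIndexOf/substring,
    // avoiding Scanner entirely. Assumes exactly two space separators.
    static String[] splitThree(String line) {
        int first = line.indexOf(' ');
        int last = line.lastIndexOf(' ');
        return new String[] {
            line.substring(0, first),
            line.substring(first + 1, last),
            line.substring(last + 1)
        };
    }

    // Reads every line through a BufferedReader with an explicit
    // 8192-byte buffer, parses it, and returns the number of lines read.
    static long parseAll(Reader src) throws IOException {
        BufferedReader in = new BufferedReader(src, 8192);
        long count = 0;
        String line;
        while ((line = in.readLine()) != null) {
            String[] fields = splitThree(line);
            // Hypothetical: assumes the first field is a numeric ID.
            int id = Integer.parseInt(fields[0]);
            count++;
        }
        return count;
    }
}
```

Pass a FileReader wrapping twitterLarge.txt to parseAll; the fixed-size string slicing avoids the regex machinery and per-token object churn that make Scanner slow on multi-million-line files.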

-Nikos
