
Try to read a file faster


aquafresh3

Sep 29, 2004, 4:23:18 AM
Hello,

I have to analyse the content of a file to find some specific words in
it.
At first I was using the BufferedReader.readLine() method to read my
file, but sometimes the file consists of a single line of about 30 MB
(e.g. a .rar file), which resulted in an OutOfMemoryError.
So I decided to use another method, which reads a line with a maximum
of 204800 characters per line.

My problem is that reading the file takes several minutes (about 30
minutes for a file of 1.68 MB).

Here is the method I use to read the file:

/**
 * Reads a line, with the number of characters read capped at 204800.
 * @param br the BufferedReader to read from
 * @return the line read, or null at end of file
 * @throws IOException
 */
private String readLineWithMaxSize(BufferedReader br) throws IOException {
    String finalLine = null;
    int readCharacter = -1;
    char[] lineChars = new char[204800];
    boolean bufferFull = false;
    if (br != null) {
        int index = 0;
        readCharacter = br.read();
        // While the character read is neither a line break nor end of
        // file, process it.
        while (readCharacter != -1 && readCharacter != '\r'
                && readCharacter != '\n') {
            // If the buffer is not full, append the character to it.
            if (!bufferFull) {
                lineChars[index] = (char) readCharacter;
                index++;
                bufferFull = index >= lineChars.length;
            }
            readCharacter = br.read();
        }
        // If the character read is \r and the next one is \n, skip the \n.
        if (readCharacter == '\r') {
            br.mark(2);
            int nextReadCharacter = br.read();
            if (nextReadCharacter != '\n') {
                br.reset();
            }
        }
        // Build the string from the characters actually read, not the
        // whole 204800-char array (which would pad the line with NULs).
        if (index != 0) {
            finalLine = new String(lineChars, 0, index);
        } else if (readCharacter == '\r' || readCharacter == '\n') {
            finalLine = "";
        }
    }
    return finalLine;
}

Is there a better solution/method to read a big file faster?

Thanks in advance for your kind assistance

Shanmuhanathan T

Sep 29, 2004, 4:35:11 AM
On 9/29/2004 1:53 PM aquafresh3 wrote:
> Hello,
>
> I have to analyse the content of a file to find some specific words in
> it.
> At first I was using the BufferedReader.readLine() method to read my
> file, but sometimes the file consists of a single line of about 30 MB
> (e.g. a .rar file), which resulted in an OutOfMemoryError.
> So I decided to use another method, which reads a line with a maximum
> of 204800 characters per line.
>
> My problem is that reading the file takes several minutes (about 30
> minutes for a file of 1.68 MB).
>
> Here is the method I use to read the file
SNIP

> Is there a better solution/method to read a big file faster?
>
> Thanks in advance for your kind assistance

Hi,
If you are using JDK version >= 1.4, you could try with the
Native IO Methods (NIO).

Regards,
--
Shanmu.

Skip

Sep 29, 2004, 4:42:07 AM
> At first I was using the BufferedReader.readLine() method to read my
> file, but sometimes the file consists of a single line of about 30 MB
> (e.g. a .rar file), which resulted in an OutOfMemoryError.
> So I decided to use another method, which reads a line with a maximum
> of 204800 characters per line.

Please do NOT read/write binary data with Readers/Writers. They are for
text only.
You could look at java.nio.* in Java 1.4, yes, but java.io.* could be enough
in this case. Take a look at {Buffered | File}Input/OutputStreams.


Michael Borgwardt

Sep 29, 2004, 4:54:16 AM
aquafresh3 wrote:
> I have to analyse the content of a file to find some specific words in
> it.
> At first I was using the BufferedReader.readLine() method to read my
> file, but sometimes the file consists of a single line of about 30 MB
> (e.g. a .rar file), which resulted in an OutOfMemoryError.
> So I decided to use another method, which reads a line with a maximum
> of 204800 characters per line.
>
> My problem is that reading the file takes several minutes (about 30
> minutes for a file of 1.68 MB).
>
> Here is the method I use to read the file

Your method wastes a lot of time clinging to the illusion of
working with lines, which achieves nothing. It reads one character
at a time and does a lot of absolutely pointless work for each of
them. In the end, your size limit means that you are NOT working
with lines, so why pretend to?

Forget about lines. Forget about BufferedReader. Just use a FileReader
(or, if the encoding is an issue, an InputStreamReader wrapping a
FileInputStream). Use its read(char[]) method with a reasonably sized
char[] array acting as a buffer (don't forget to look at the method's
return value - the buffer is not necessarily filled), and take
care that your word search won't miss words that span the boundary
between two calls to read().
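A minimal sketch of that approach (the class name, method name, and buffer size are made up for illustration; it takes a Reader so the same code works with a FileReader or an InputStreamReader):

```java
import java.io.IOException;
import java.io.Reader;

public class ChunkSearch {
    // Counts occurrences of `word` in the stream, reading fixed-size chunks.
    // Keeping the last word.length() - 1 chars between reads is what stops a
    // match that straddles two read() calls from being lost.
    static int countWord(Reader in, String word, int bufSize) throws IOException {
        char[] buf = new char[bufSize];
        StringBuilder window = new StringBuilder();
        int keep = word.length() - 1;
        int count = 0;
        int n;
        while ((n = in.read(buf)) != -1) {   // read() may fill only part of buf
            window.append(buf, 0, n);
            int from = 0;
            int idx;
            while ((idx = window.indexOf(word, from)) != -1) {
                count++;
                from = idx + 1;              // allow overlapping matches
            }
            // Keep only the tail that could start a match spanning the boundary;
            // the tail is shorter than the word, so nothing is counted twice.
            if (window.length() > keep) {
                window.delete(0, window.length() - keep);
            }
        }
        return count;
    }
}
```

The same structure works whatever the matching routine is; the only invariant that matters is carrying word.length() - 1 characters across the chunk boundary.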

Michael Borgwardt

Sep 29, 2004, 5:00:26 AM
Skip wrote:

>>At first I was using the BufferedReader.readLine() method to read my
>>file, but sometimes the file consists of a single line of about 30 MB
>>(e.g. a .rar file), which resulted in an OutOfMemoryError.
>>So I decided to use another method, which reads a line with a maximum
>>of 204800 characters per line.
>
>
> Please do NOT read/write binary data with Readers/Writers.

He's reading the data to look for "words" in it, so it can't be binary,
at least not all of it.

bugbear

Sep 29, 2004, 5:21:29 AM
Michael Borgwardt wrote:

>
> Your method wastes a lot of time clinging to the illusion of
> working with lines, which achieves nothing. It reads one character
> at a time and does a lot of absolutely pointless work for each of
> them. In the end, your size limit means that you are NOT working
> with lines, so why pretend to?
>
> Forget about lines. Forget about BufferedReader. Just use a FileReader
> (or, if the encoding is an issue, an InputStreamReader wrapping a
> FileInputStream). Use its read(char[]) method with a reasonably sized
> char[] array acting as buffer (don't forget to look at the method's
> return value - the buffer is not necessarily filled), and take
> care that your word search won't miss words that span the boundaries
> between two calls to read().

Agreed; to achieve this, process all the words from the
buffer until a word ends at the end of the buffer.

This word *may* be partial.

Copy the partial word "tail" down to the base of your buffer,
and read some more. (Note that read() can read at an offset into
the buffer.)

Repeat.

If you can do all this in one allocated buffer you'll also
do fewer new byte[] allocations, which can't hurt.

BugBear

JScoobyCed

Sep 29, 2004, 6:46:16 AM
Michael Borgwardt wrote:

Well... not sure: he says he is reading a .rar file.

--
JScoobyCed
What about a JScooby snack Shaggy ? ... Shaggy ?!

Sudsy

Sep 29, 2004, 8:49:47 AM
Shanmuhanathan T wrote:
<snip>

> If you are using JDK version >= 1.4, you could try with the
> Native IO Methods (NIO).

NIO stands for New I/O, not Native I/O. Documentation can be found here:
<http://java.sun.com/j2se/1.4.2/docs/guide/nio/>

Shanmuhanathan T

Sep 29, 2004, 9:21:08 AM
Thanks Sudsy.
Must have been confused when I posted that.
Regards,
--
Shanmu.

Steve Horsley

Sep 29, 2004, 9:26:50 AM
aquafresh3 wrote:
> Hello,
>
> I have to analyse the content of a file to find some specific words in
> it.
> At first I was using the BufferedReader.readLine() method to read my
> file, but sometimes the file consists of a single line of about 30 MB
> (e.g. a .rar file), which resulted in an OutOfMemoryError.
> So I decided to use another method, which reads a line with a maximum
> of 204800 characters per line.
>
> My problem is that reading the file takes several minutes (about 30
> minutes for a file of 1.68 MB).
>
>

For a start, try working in bytes rather than text. This removes
a lot of character-set conversion work, and discourages creation of
lots of String objects.

Use String.getBytes(charsetName) to retrieve the byte sequences
you are searching for.

Use a BufferedInputStream to read the bytes into a sensibly sized
byte[] - maybe a few KB. Then use a byte-pattern-matching routine to
scan for any of the desired byte sequences. After you have scanned
the buffer, copy the tail end of the buffer (enough to capture any
search sequences that were incomplete) to the start, fill up
with more data and search again.

I think this should go quite fast.
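A rough sketch of that scheme with a single reusable buffer (the names are made up; the buffer must be longer than the longest search sequence, or the offset read would have no room to fill):

```java
import java.io.IOException;
import java.io.InputStream;

public class ByteScan {
    // Counts occurrences of `word` in the stream using one reusable buffer.
    // After each scan the last word.length - 1 bytes are copied to the front,
    // so sequences spanning two reads are still found, and nothing is counted
    // twice (the kept tail is shorter than the word).
    static int countWord(InputStream in, byte[] word, byte[] buf) throws IOException {
        int count = 0;
        int have = 0;   // number of valid bytes currently in buf
        int n;
        while ((n = in.read(buf, have, buf.length - have)) != -1) {
            have += n;
            // naive byte-pattern match over buf[0..have)
            for (int i = 0; i + word.length <= have; i++) {
                int j = 0;
                while (j < word.length && buf[i + j] == word[j]) j++;
                if (j == word.length) count++;
            }
            // copy the incomplete tail to the start and refill from there
            int keep = Math.min(have, word.length - 1);
            System.arraycopy(buf, have - keep, buf, 0, keep);
            have = keep;
        }
        return count;
    }
}
```

Since the one buffer is reused for the whole file, there are no per-chunk allocations at all, which is the point bugbear made above.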

Steve

Will Hartung

Sep 29, 2004, 1:43:48 PM

"aquafresh3" <a.d...@laposte.net> wrote in message
news:972060cc.04092...@posting.google.com...

> Hello,
>
> I have to analyse the content of a file to find some specific words in
> it.
> At first I was using the BufferedReader.readLine() method to read my
> file, but sometimes the file consists of a single line of about 30 MB
> (e.g. a .rar file), which resulted in an OutOfMemoryError.
> So I decided to use another method, which reads a line with a maximum
> of 204800 characters per line.
>
> My problem is that reading the file takes several minutes (about 30
> minutes for a file of 1.68 MB).
>
> Here is the method I use to read the file

If you're working with text files, use Readers/Writers. If you're working
with binary files, Input/OutputStreams. If you're working with text and
happen to have intimate understanding of Unicode, then you can use Streams
and code that knowledge into your program. (I don't have intimate knowledge
of Unicode, so I let the Readers/Writers do the work for me).

If you KNOW that the file will ALWAYS fit in your heap space, then:

public char[] readIt(File f)
throws Exception
{
    int size = (int) f.length(); // length() is a long; array sizes are int
    char[] buf = new char[size];
    BufferedReader br = new BufferedReader(new FileReader(f));

    int off = 0;
    while (off < size) { // read() may return fewer chars than requested
        int n = br.read(buf, off, size - off);
        if (n == -1) break;
        off += n;
    }
    br.close();
    return buf;
}

Otherwise you have to page the file in through a smaller buffer, and deal
with that.

The NIO mapping functions MAY be faster, but probably not enough to worry
about unless you're dealing with ENORMOUS files. It would certainly be more
complicated.
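For what it's worth, the memory-mapped variant is not that much code. A sketch (class and method names are made up; written against the java.nio API):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedCount {
    // Maps the whole file into memory; the OS pages it in lazily, so no
    // explicit buffer management or paging loop is needed.
    static int countByte(File f, byte target) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(f, "r");
        try {
            FileChannel ch = raf.getChannel();
            MappedByteBuffer map =
                ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            int count = 0;
            while (map.hasRemaining()) {
                if (map.get() == target) count++;
            }
            return count;
        } finally {
            raf.close(); // also closes the channel
        }
    }
}
```

Whether this actually beats a plain BufferedInputStream for small files is exactly the "probably not enough to worry about" question above; mapping only starts to pay off for very large files.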

Regards,

Will Hartung
(wi...@msoft.com)

Richard

Sep 30, 2004, 11:18:09 AM

Here's how I did it:
http://homepage.ntlworld.com/j.palethorpe/Programming/ITF/

The source code is in the jar file. It's a bit messy, but it works.
