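-module(test1).
-compile(export_all).

%% Time how long it takes to write an N-line test file.
test1(N) ->
    timer:tc(?MODULE, make, [N]).

%% Time how long it takes to read the file back and split it into lines.
test2() ->
    timer:tc(?MODULE, read, []).

%% Write N identical 31-byte lines (bytes 30..59 plus a newline) to big.tmp.
make(N) ->
    Line = lists:seq(30, 59) ++ [$\n],
    file:write_file("big.tmp", lists:duplicate(N, Line)).

%% Read the whole file into one binary, split it on newlines, count the lines.
read() ->
    {ok, B} = file:read_file("big.tmp"),
    L = split(B, [], []),
    length(L).

%% split(Bin, CurrentLineReversed, LinesReversed)
split(<<10, B/binary>>, L1, L2) ->
    split(B, [], [list_to_binary(lists:reverse(L1)) | L2]);
split(<<H, T/binary>>, L1, L2) ->
    split(T, [H | L1], L2);
split(<<>>, [], L2) ->
    %% The file ended with a newline: don't add an empty last line.
    lists:reverse(L2);
split(<<>>, L1, L2) ->
    lists:reverse([list_to_binary(lists:reverse(L1)) | L2]).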
The timings I got showed Erlang was 9 times slower than perl (or wc),
which is more or less what I expected. If I wanted to speed this up
I'd write a NIF to split the binary at the first newline character.
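A possible step before writing a NIF, not timed here: the standard `binary` module already has a natively implemented `binary:split/3`, and with the `global` option it splits at every newline in one call. This is a minimal sketch; the module name `split_lines` is hypothetical. Note that `trim` drops all trailing empty parts, so trailing blank lines are not counted, and the returned parts are sub-binaries that keep the original file binary alive.

```erlang
-module(split_lines).
-export([lines/1, read_lines/1]).

%% Split a binary into lines with the C-implemented binary:split/3
%% instead of a byte-at-a-time Erlang loop.
lines(Bin) ->
    %% global: split at every "\n"; trim: drop trailing empty parts,
    %% so a final newline does not produce an extra empty line.
    binary:split(Bin, <<"\n">>, [global, trim]).

%% Read a whole file in one go and return its lines as binaries.
read_lines(File) ->
    {ok, Bin} = file:read_file(File),
    lines(Bin).
```

Counting lines is then just `length(split_lines:read_lines("big.tmp"))`.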
I actually always use file:read_file(F) for everything, since getting the
entire file in at one go always seems a good idea and my files are small
(compared to my RAM). I'd use file:pread to do random-access reads on files
that are too big for memory. Reading the entire file seems a good idea for
files under 100MB, since I have 4GB of memory.
The OS seems to do a better job of caching entire files than I could ever
do, so I don't worry about re-reading them ...
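For the too-big-for-memory case, a random-access read with `file:pread/3` might look like the sketch below. The module and function names (`pread_demo`, `read_chunk/3`) are hypothetical; the file is opened in `raw` and `binary` mode so the data comes back as a binary, and reading past the end of the file yields an empty binary here.

```erlang
-module(pread_demo).
-export([read_chunk/3]).

%% Read Len bytes starting at byte offset Pos without loading the whole file.
read_chunk(File, Pos, Len) ->
    {ok, Fd} = file:open(File, [read, binary, raw]),
    try file:pread(Fd, Pos, Len) of
        {ok, Bin} -> Bin;   %% may be shorter than Len near end of file
        eof -> <<>>         %% Pos is at or beyond end of file
    after
        file:close(Fd)      %% always release the file descriptor
    end.
```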
I have no idea why you see a factor of 250. Is this a memory problem?
How much memory have you got? Does your program scale linearly with
the file size, or does something suddenly go wrong as you increase the
size of the file?
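One way to answer the scaling question is to time the same reader on files of increasing size and see whether the time grows in proportion. This is a sketch under assumptions: the module `scale_check`, its functions, and the file name `scale.tmp` are hypothetical, and the example reader `count_lines/1` uses `binary:split/3` rather than any particular program under discussion; pass your own reader fun instead.

```erlang
-module(scale_check).
-export([run/2, count_lines/1]).

%% Time ReadFun(File) on files of N lines for each N in Ns and return
%% [{N, Microseconds}]. If time per line stays roughly constant as N
%% grows, the reader scales linearly; a sudden jump suggests something
%% like memory pressure.
run(ReadFun, Ns) ->
    [{N, time_one(ReadFun, N)} || N <- Ns].

time_one(ReadFun, N) ->
    File = "scale.tmp",
    %% Same 31-byte test line as make/1 in test1.
    Line = lists:seq(30, 59) ++ [$\n],
    ok = file:write_file(File, lists:duplicate(N, Line)),
    {Micros, _Result} = timer:tc(fun() -> ReadFun(File) end),
    Micros.

%% Example reader: count newline-terminated lines.
count_lines(File) ->
    {ok, Bin} = file:read_file(File),
    length(binary:split(Bin, <<"\n">>, [global, trim])).
```

For example, `scale_check:run(fun scale_check:count_lines/1, [100000, 200000, 400000, 800000])` doubles the size each step, so roughly doubling times would indicate linear behaviour.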
Cheers
/Joe