So, the OCaml programs will only be run occasionally, to access the database
and generate HTML files from the data fetched from the DB. However, I am still
worried that this might have too much of a performance impact.
I heard that OCaml is particularly slow (and probably memory-inefficient)
when it comes to string manipulation. What is the preferred way in handling
strings (building long strings from short parts - something StringBuilder
would be used in Java)? Does anybody have any experience concerning this
kind of applications?
What about the startup time and memory usage of the program? Could these
affect the stability and efficiency of the web server?
(Hope someone will be able to decipher my language and care to answer :P )
- Tom
Hi Tom,
I suggest you take a look at Ocsigen (http://www.ocsigen.org/). It's
a fully-featured web server written in OCaml, that not only supports
static pages and traditional CGI programming, but also has a module
called Eliom that allows you to build dynamic websites using all the
best features of the OCaml language.
As for performance, the bottleneck will surely be the database backend.
Even when generating dynamic pages with Eliom, Ocsigen can easily output
close to a hundred pages per second on a decent machine. (And of course
it's even faster with static content!)
Cheers,
Dario Teixeira
_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs
No, this is nonsense. Of course, you can slow everything down by using
strings in an inappropriate way, like
  let rec concat_list l =
    match l with
      [] -> ""
    | s :: l' -> s ^ concat_list l'
Use the Buffer module instead:
  let concat_list l =
    let b = Buffer.create 243 in
    let rec concat l =
      match l with
        [] -> ()
      | s :: l' ->
          Buffer.add_string b s;
          concat l' in
    concat l;
    Buffer.contents b
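
For the task that started the thread (assembling an HTML page from database rows), the same Buffer approach might look like the sketch below. The names html_escape and render_table are made up for illustration and are not part of any library; only Buffer, String, List and Printf calls from the standard library are used.

```ocaml
(* Minimal sketch: build an HTML table in a single Buffer.
   [html_escape] and [render_table] are illustrative names only. *)
let html_escape (s : string) : string =
  let b = Buffer.create (String.length s) in
  String.iter
    (function
      | '<' -> Buffer.add_string b "&lt;"
      | '>' -> Buffer.add_string b "&gt;"
      | '&' -> Buffer.add_string b "&amp;"
      | '"' -> Buffer.add_string b "&quot;"
      | c -> Buffer.add_char b c)
    s;
  Buffer.contents b

let render_table (rows : (string * string) list) : string =
  let b = Buffer.create 1024 in
  Buffer.add_string b "<table>\n";
  List.iter
    (fun (name, value) ->
      (* Printf.bprintf appends formatted text directly to the buffer *)
      Printf.bprintf b "<tr><td>%s</td><td>%s</td></tr>\n"
        (html_escape name) (html_escape value))
    rows;
  Buffer.add_string b "</table>\n";
  Buffer.contents b
```

Each piece is appended once, so the total work grows with the size of the page rather than with its square.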
>
> What about the startup time and memory usage of the program? Could
> these affect the stability and efficiency of the web server?
>
> (Hope someone will be able to decipher my language and care to
> answer :P )
Have a look at ocamlnet (ocamlnet.sf.net). It has plenty of ways of
building web apps. For example, you can easily run your own HTTP server
in a multi-processing or multi-threading setup. Or you can connect your
web app with Apache by using fastcgi or a few other available protocols.
All this is pretty much scalable.
There is no support for generating HTML, however.
An example for a stand-alone webserver (it is accompanied only by a
config file):
Here is the same for the "connect to Apache" approach:
https://godirepo.camlcity.org/wwwsvn/trunk/code/examples/cgi/netcgi2-plex/?root=lib-ocamlnet2
In either way, it is possible to keep the connection to the db in case
you need it for generating the page.
Hope this helps,
Gerd
--
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
ge...@gerd-stolpmann.de http://www.gerd-stolpmann.de
Phone: +49-6151-153855 Fax: +49-6151-997714
------------------------------------------------------------
Someone (don't remember the name) implemented ropes in Ocaml. Ropes
were specifically designed for string manipulation, if I remember
well. Maybe this is worth a look.
Loup Vaillant
Have a look here : http://sourceforge.net/projects/ocaml-rope
ChriS
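
To make the idea concrete, here is a toy rope, offered only as a sketch of the principle. It is not the API of the ocaml-rope library linked above, and it omits the balancing a real implementation needs. All names (rope, concat, to_string) are chosen for this illustration.

```ocaml
(* Toy rope: concatenation builds a tree node instead of copying bytes.
   Illustration only; a real rope also rebalances the tree. *)
type rope =
  | Leaf of string
  | Node of rope * rope * int   (* left, right, cached total length *)

let length = function
  | Leaf s -> String.length s
  | Node (_, _, n) -> n

let empty = Leaf ""

(* O(1): no characters are copied here *)
let concat a b =
  if length a = 0 then b
  else if length b = 0 then a
  else Node (a, b, length a + length b)

(* Flatten once at the end; an explicit work list keeps this
   tail-recursive even for very deep, unbalanced ropes. *)
let to_string r =
  let b = Buffer.create (length r) in
  let rec go = function
    | [] -> ()
    | Leaf s :: rest -> Buffer.add_string b s; go rest
    | Node (left, right, _) :: rest -> go (left :: right :: rest)
  in
  go [ r ];
  Buffer.contents b
```

Appending is cheap because the characters are only copied when the rope is finally flattened.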
> > I heard that OCaml is particularly slow (and probably
> > memory-inefficient) when it comes to string manipulation.
>
> No, this is nonsense. Of course, you can slow everything down by using
> strings in an inappropriate way, like
>
> let rec concat_list l =
> match l with
> [] -> ""
> | s :: l' -> s ^ concat_list l'
Now Gerd, I would not call the claim nonsense. If you can't
use a data structure in a natural way, I'd say the claim indeed
has some weight.
The example above is ugly because it isn't tail recursive.
If you consider an imperative loop to concat the strings
in an array
  let s = ref "" in
  for i = 0 to Array.length a - 1 do
    s := !s ^ a.(i)
  done;
then Ocaml is likely to do this slowly. C++ on the other
hand will probably do this faster, especially if you reserve
enough storage first.
The poor performance of Ocaml in such situations is a result
of two factors. The first is the worst-possible choice for a
data representation: mutable characters and immutable length.
The mutability of characters has limited utility and prevents
easy functional optimisations, the useful mutability would
have to include both the content and the length (as in C++).
The second issue would probably make a functional string have
poor performance: Ocaml doesn't do any serious optimisations,
so it probably wouldn't recognize an optimisation opportunity
anyhow.
Note this is by design policy, it isn't a bug or laziness.
[I'm sure someone can quote a ref to Xavier's comments on this]
The effect is that you do have to make fairly low level choices
in Ocaml to get good performance. The good thing about this
is that the optimisation techniques are manifest in the
source code so you have control over them.
Felix does high level optimisations and sometimes a tiny
change in the code can cause orders of magnitude performance
differences, and when I notice it can take me (the author)
quite some time to track down what triggered the difference
in the generated code.
Now, back to the issue: in the Felix compiler itself, the
code generator is printing out C++ code. This is primarily
done with Ocaml string concatenation of exactly the kind which
one might call 'inappropriate'. Buffer is used too, but only
for aggregating large strings.
The reason, trivially, is that it is easier and clearer to write
"class " ^ name ^ " {\n" ^
catmap "\n" string_of_member members ^
"\n};\n"
than to use the imperative Buffer everywhere. The above gives more
clue to what the output looks like.
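
For readers without the Felix sources at hand, a self-contained version of this style could look like the following. The definition of catmap here is an assumption (map each element to a string, then join with a separator), and the member record and string_of_member are hypothetical stand-ins.

```ocaml
(* Assumed definition: map every element to a string and join with [sep]. *)
let catmap sep f xs = String.concat sep (List.map f xs)

(* Hypothetical member description, for illustration only. *)
type member = { typ : string; name : string }

let string_of_member m = "  " ^ m.typ ^ " " ^ m.name ^ ";"

(* The shape of the output is visible directly in the expression. *)
let class_decl name members =
  "class " ^ name ^ " {\n" ^
  catmap "\n" string_of_member members ^
  "\n};\n"
```

Note that String.concat computes the total length first and copies each piece once, so only the handful of outer (^) operations copy their operands again.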
Despite the cost of using strings this way .. the compiler backend
code generator isn't a performance bottleneck.
--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net
> Now Gerd, I would not call the claim nonsense. If you can't
> use a data structure in a natural way, I'd say the claim indeed
> has some weight.
The original claim was:
>> I heard that OCaml is particularly slow (and probably memory-inefficient)
>> when it comes to string manipulation. What is the preferred way in handling
>> strings (building long strings from short parts - something StringBuilder
>> would be used in Java)? Does anybody have any experience concerning this
>> kind of applications?
ie comparing Ocaml string handling to Java and other web languages like
php, perl, ruby and python.
While I agree that yes, it is possible to write slow code in Ocaml
(or any other language), I suspect that idiomatic Ocaml string handling
compiled to a binary is just as fast if not faster than Java/Perl/Python/
Ruby/PHP/whatever.
Erik
--
-----------------------------------------------------------------
Erik de Castro Lopo
-----------------------------------------------------------------
"Windows was created to keep stupid people away from UNIX."
-- Tom Christiansen
> While I agree that yes, it is possible to write slow code in Ocaml
> (or any other language), I suspect that idiomatic Ocaml string handling
> compiled to a binary is just as fast if not faster than Java/Perl/Python/
> Ruby/PHP/whatever.
Fraid not. Python eats Ocaml alive. Python:
  s = "a"
  x = ""
  for i in xrange(0,10000000):
      x = x+s
  print "done"
Time: 6 seconds. Without optimisation switched on.
Ocaml:
  let x = ref "";;
  let s = "a";;
  for i = 0 to 100000 do
    x := !x ^ s
  done;;
  print_endline "done";;
Time: 4.5 seconds.
Notice one TINY difference ... Ocaml is processing only 100K strings.
Python is processing 10 MILLION strings in about the same time.
I cannot measure Ocaml's performance when the number is increased
to even 1 million because I have run out of coffee.
--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net
And here is the Felix (C++) version:
int i;
var x = "";
s := "a";
forall i in 1 upto 10_000_000 do
x += s;
done;
println$ len x;
Time: 0m0.198s
Which eats Python for breakfast .. forget about Ocaml.
For 100 million strings, time: 0m1.795s.
I don't have enough RAM to test the next decimal order of magnitude.
Out of curiosity, does it work as well (meaning as fast) if you write
"x = s+x" instead?
Arnaud Spiwack
So, as shown in other posts, Ocaml really is SLOW with strings.
But here Erik says 'idiomatic'. I haven't tested this, but
again, this is probably wrong.
If you use Buffer for concatenation you'll get faster times than
Ocaml (^) operator on strings, but what this misses is that other
operations on strings (such as searching, substring etc etc)
aren't available for Buffer. So in order to use these you'd have to
  a) make a Buffer and add strings into it
  b) get the string OUT of the buffer
  c) call the functions
  d) put stuff back into some Buffer
This is not only extremely ugly because it is mixing functional
and imperative code .. it is probably as slow as two wet weeks
because of the conversions back and forth.
C++ strings on the other hand combine access to many operations,
both functional and mutations, and automatically provide
the 'Buffer'ing functionality as well. Unlike Buffer, however,
they're passed by value so that 'string const' data type is
purely functional.
Note that Python strings are immutable, so surprisingly of all
the languages I considered .. Python's string operations are
actually purely functional.
--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net
I checked:
x = x + s
and that's slow, I guess x = s + x is also slow. In Felix I would be
able to fix the x = x + s case with an imperative reduction,
however Felix only supports functional reductions at the moment.
[A reduction is a user defined term rewriting rule].
--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net
The two string manipulation benchmarks are
http://shootout.alioth.debian.org/gp4/benchmark.php?test=regexdna&lang=all
(Not quite as much string manipulation as Regexps, OCaml is among the
best here, but Python is about 2x faster -- I've tried improving it with
PCRE but the final result is not as fast as with Str)
http://shootout.alioth.debian.org/gp4/benchmark.php?test=fasta&lang=all
(Not quite as much string manipulation as outputting strings, OCaml is
still among the best here, and Python is about 25x slower)
Fwiw.
Cheers,
David
On Tue, 2007-10-09 at 09:05 +1000, skaller wrote:
> On Tue, 2007-10-09 at 08:21 +1000, Erik de Castro Lopo wrote:
> > skaller wrote:
>
> > While I agree that yes, it is possible to write slow code in Ocaml
> > (or any other language), I suspect that idiomatic Ocaml string handling
> > compiled to a binary is just as fast if not faster than Java/Perl/Python/
> > Ruby/PHP/whatever.
>
> Fraid not. Python eats Ocaml alive. Python:
Are you sure you are comparing string manipulation and languages here?
> s= "a"
> x = ""
> for i in xrange(0,10000000):
> x = x+s
> print "done"
>
> Time: 6 seconds. Without optimisation switched on.
Time: 6.238s Without optimisation switched on.
> Ocaml:
  let x = ref (Rope.of_string "")
  let s = Rope.of_string "a";;
  for i = 0 to 10_000_000 do
    x := Rope.concat2 !x s
  done;;
  print_endline "done"
Time: 2.047s Without optimisation switched on.
Cheers,
ChriS
The other operations are implemented for ropes (except regular
expressions which will happen when I have some time or some help!)
> Note that Python strings are immutable,
So are ropes.
Cheers,
ChriS
I don't know what the "nature" of strings is. I rather believe they
are artifacts, and that there are several ways of defining strings,
mostly resulting in different runtime behaviour.
The point is here that ^ always copies strings, and this is generally
expensive, especially in this example, because the same bytes are copied
every time the result string is extended.
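
A back-of-the-envelope sketch of that cost: if every (^) allocates a fresh result and writes all of its bytes, appending n pieces of equal length writes 1 + 2 + ... + n piece-lengths in total. The function name below is made up for this illustration, and block headers and GC work are ignored.

```ocaml
(* Count the bytes written by a loop of the form [x := !x ^ s],
   assuming each (^) fills a freshly allocated result string. *)
let bytes_copied_by_append ~(piece_len : int) ~(n : int) : int =
  let total = ref 0 and len = ref 0 in
  for _i = 1 to n do
    len := !len + piece_len;   (* length of the newly allocated result *)
    total := !total + !len     (* every byte of that result is written *)
  done;
  !total
```

For one-byte pieces this is n(n+1)/2, so a loop of about 100,000 appends writes roughly 5 x 10^9 bytes, whereas the final string is only about 100,000 bytes long.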
I'm fully aware that you can get rid of this copying in the definition
of strings, but this has a price for some other operations. As said, you
can implement strings in various ways.
> If you consider an imperative loop to concat the strings
> in an array
>
>   let s = ref "" in
>   for i = 0 to Array.length a - 1 do
>     s := !s ^ a.(i)
>   done;
I would call this version even uglier... But taste differs.
The point is that neither the O'Caml runtime representation of strings
nor the compiler (it could recognize the specific use of ^ and
implicitly convert the code so it uses a buffer) does anything to
avoid this trap.
But we have to be fair. It is simply nonsense to call the whole O'Caml
string manipulation slow. You have access to all operations you need to
do it fast. You just have to know how to code it.
> then Ocaml is likely to do this slowly. C++ on the other
> hand will probably do this faster, especially if you reserve
> enough storage first.
> Now, back to the issue: in the Felix compiler itself, the
> code generator is printing out C++ code. This is primarily
> done with Ocaml string concatenation of exactly the kind which
> one might call 'inappropriate'. Buffer is used too, but only
> for aggregating large strings.
>
> The reason, trivially, is that it is easier and clearer to write
>
> "class " ^ name ^ " {\n" ^
> catmap "\n" string_of_member members ^
> "\n};\n"
>
>
> than to use the imperative Buffer everywhere. The above gives more
> clue to what the output looks like.
Well, if you only concatenate a few strings, it isn't going to be a problem,
and it is probably as fast as using a buffer (which also has some cost).
Gerd
--
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
ge...@gerd-stolpmann.de http://www.gerd-stolpmann.de
Phone: +49-6151-153855 Fax: +49-6151-997714
------------------------------------------------------------
>On Mon, 2007-10-08 at 18:04 +0200, Gerd Stolpmann wrote:
>
>
>
>>>I heard that OCaml is particularly slow (and probably
>>>memory-inefficient) when it comes to string manipulation.
>>>
>>>
>>No, this is nonsense. Of course, you can slow everything down by using
>>strings in an inappropriate way, like
>>
>>let rec concat_list l =
>> match l with
>> [] -> ""
>> | s :: l' -> s ^ concat_list l'
>>
>>
>
>Now Gerd, I would not call the claim nonsense. If you can't
>use a data structure in a natural way, I'd say the claim indeed
>has some weight.
>
>The example above is ugly because it isn't tail recursive.
>If you consider an imperative loop to concat the strings
>in an array
>
>   let s = ref "" in
>   for i = 0 to Array.length a - 1 do
>     s := !s ^ a.(i)
>   done;
>
>then Ocaml is likely to do this slowly. C++ on the other
>hand will probably do this faster, especially if you reserve
>enough storage first.
>
>
And if you don't, and thus have to repeatedly allocate more memory, C++
is likely going to be slower than Ocaml (poor allocation performance).
In fact, I'm willing to bet you can get near C++ speed by doing things
in the C++ way- allocate the string once (with enough space), and then
use String.blit to fill it in.
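
A sketch of that approach for current OCaml, where strings are immutable and the mutable counterpart is Bytes (so Bytes.blit_string takes the place of String.blit on a mutable string). The name concat_prealloc is made up here; this is essentially what String.concat "" already does.

```ocaml
(* Allocate the destination once, then blit every piece into place. *)
let concat_prealloc (parts : string list) : string =
  let total = List.fold_left (fun acc s -> acc + String.length s) 0 parts in
  let dst = Bytes.create total in
  let pos = ref 0 in
  List.iter
    (fun s ->
      let len = String.length s in
      Bytes.blit_string s 0 dst !pos len;
      pos := !pos + len)
    parts;
  (* [dst] is never modified after this point, so the unsafe
     conversion avoids one extra copy without exposing mutation. *)
  Bytes.unsafe_to_string dst
```

Every input byte is copied exactly once, regardless of how many pieces there are.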
That said, there are better implementations of strings for Ocaml. So
what? Ocaml isn't a string processing language. Yeah, there are things
which are probably better done in Perl/Python/Ruby. A language doesn't
have to be the perfect language for all purposes in order to be a good
language- in fact, in my experience languages that try to be everything
to everybody end up being useless for all purposes (C++ being example #1
here).
Brian
Out of curiosity, do your ropes handle UTF-8 and UTF-16?
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
> Fraid not. Python eats Ocaml alive.
Sure. If you want to go about your task in a hideously naive manner. Let's
try something different:
# let time f a = let t0 = Unix.gettimeofday () in let r = f a in r,
(Unix.gettimeofday () -. t0);;
val time : ('a -> 'b) -> 'a -> 'b * float = <fun>
# let a_cat n =
let rec build_as acc = function
| 0 -> acc
| n -> build_as ("a"::acc) (pred n)
in String.concat "" (build_as [] n);;
val a_cat : int -> string = <fun>
# snd (time a_cat 1_000_000);;
- : float = 0.55100011825561523
Now, it's not necessarily the first thing a person might think of for this
task, and it's not applicable to all uses of concatenation, but for the task
that started this thread (slapping together bits of text to make a web page)
and for tasks like hard-coding concatenations it's very convenient.
--
William D. Neumann
In this context, yes. In general, strings are not as efficient as the
equivalent concrete data structure in C. Specifically, using strings as a
byte array and applying arithmetic operations to the elements is
significantly slower in OCaml than C.
The only option you have in OCaml is to blow your memory wad and use an int
array, which is fast but wastes enormous amounts of space and still has
different modulo-arithmetic properties (you might want 8-bit for some apps).
Consequently, OCaml is not very good for arithmetic operations over byte
arrays.
I discovered this on my Sudoku solver and revisited it with the Brainf*ck
interpreter. This has never bitten me in practice though.
Perhaps this is an issue for bioinformaticians or some image processing
applications?
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
> But we have to be fair. It is simply nonsense to call the whole O'Caml
> string manipulation slow. You have access to all operations you need to
> do it fast. You just have to know how to code it.
No you don't, that's the point. There is no fast way to append using
string. You can use Buffer, but then you can't do (for example)
search. You can convert back and forth, and then you
pay an extra conversion cost.
C++ strings provide all the operations of both String and Buffer
and do not pay this cost.
--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net
> In this context, yes. In general, strings are not as efficient as
> the equivalent concrete data structure in C. Specifically, using
> strings as a byte array and applying arithmetic operations to the
> elements is significantly slower in OCaml than C.
>
> The only option you have in OCaml is to blow your memory wad and use
> an int array, which is fast but wastes enormous amounts of space and
> still has different modulo-arithmetic properties (you might want 8-
> bit for some apps). Consequently, OCaml is not very good for
> arithmetic operations over byte arrays.
I'd moaned about this a few years ago, and Xavier pointed out the following:
"A better alternative is to declare
external get_byte: string -> int -> int = "%string_safe_get"
external set_byte: string -> int -> int -> unit = "%string_safe_set"
and use these two functions to access strings as if they were byte
arrays. set_byte will store the low 8 bits of its third argument, so
you'd save on "land 0xFF" operations too."
It works pretty well for getting and setting bytes of a string. There's
also the int8_* bigarrays, but I've not used them much, so I can't say if
they're of much help, but they certainly weren't horrible.
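
In recent OCaml versions strings are immutable, so the externals quoted above (declared on string) reflect the stdlib of the time; the mutable byte sequence is now Bytes.t. A minimal sketch of byte arithmetic in that setting, using only Bytes.get, Bytes.set and Char conversions (the function name add_to_bytes is made up for this example):

```ocaml
(* Add [delta] to every byte in place, wrapping modulo 256.
   The explicit [land 0xFF] keeps the value in range for Char.chr. *)
let add_to_bytes (b : Bytes.t) (delta : int) : unit =
  for i = 0 to Bytes.length b - 1 do
    let v = Char.code (Bytes.get b i) in
    Bytes.set b i (Char.chr ((v + delta) land 0xFF))
  done
```

This stores one byte per element, unlike an int array, at the price of a Char conversion on each access.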
--
William D. Neumann
If you allow arbitrary code .. you could use the previously mentioned
Ropes library in Ocaml and possibly do well .. and you could write fast
code in C++ using some other data structure too.
It's not clear then you're using "strings".
--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net
Of course that's nice, but Rope isn't the standard data structure.
Maybe it should be ..
--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net
> No you don't, that's the point. There is no fast way to append using
> string. You can use Buffer, but then you can't do (for example)
> search. You can convert back and forth, and then you
> pay an extra conversion cost.
So use buffer.ml with a slightly modified interface to create a rawBuffer
module that gives you direct access to the normally hidden string (and the
position of the end of the buffer). Presto, Bufferlike operations with a
string you can directly touch, search, blit, whatever.
No, it's not the default way the stdlib works, and again, it may not be the
first thing someone thinks of when they face this problem. But the
language makes this option available with a minimum of work. Is it ideal?
No. Is it awful? Again, no.
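
A rough sketch of such a module, written from scratch for current OCaml rather than derived from buffer.ml. The module name Raw_buffer and its functions are hypothetical; the point is only that the storage and the end position are visible, so searching needs no conversion.

```ocaml
(* Growable byte buffer whose storage and fill position are exposed. *)
module Raw_buffer = struct
  type t = { mutable data : Bytes.t; mutable len : int }

  let create n = { data = Bytes.create (max n 1); len = 0 }

  (* Double the capacity until [extra] more bytes fit. *)
  let ensure b extra =
    let needed = b.len + extra in
    if needed > Bytes.length b.data then begin
      let cap = ref (Bytes.length b.data) in
      while !cap < needed do cap := 2 * !cap done;
      let data' = Bytes.create !cap in
      Bytes.blit b.data 0 data' 0 b.len;
      b.data <- data'
    end

  let add_string b s =
    let n = String.length s in
    ensure b n;
    Bytes.blit_string s 0 b.data b.len n;
    b.len <- b.len + n

  (* Search the filled prefix in place, without copying it out. *)
  let index_opt b c =
    let rec go i =
      if i >= b.len then None
      else if Bytes.get b.data i = c then Some i
      else go (i + 1)
    in
    go 0

  let contents b = Bytes.sub_string b.data 0 b.len
end
```

Appends are amortised constant time per byte because of the doubling, and any other operation can work on data.[0 .. len-1] directly.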
--
William D. Neumann
> It's not clear then you're using "strings".
You'd think it would be if you're using functions from the String module.
--
William D. Neumann
It should definitely not be the standard, but it should be available as a
choice alongside OCaml strings. Each implementation has use cases where it
performs better (memory/CPU-wise).
--
Vincent Hanquez
Out of curiosity, why would a string implementation (as a handful of
chars bundled together) have to handle UTF-X?
--
Vincent Hanquez
It never was.
The concrete data structures used to represent strings in these languages are
different. So you've just picked a concrete data structure with slow append
and showed that its append is slower than a concrete data structure with slow
random access and worse memory usage.
This is just swings and roundabouts.
You might like to compare the performance of setting a single char in a string
in Python and OCaml...
> C++ strings provide all the operations of both String and Buffer
> and do not pay this cost.
Can C++ escape a string using OCaml syntax?
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
Yes indeed. However, I'd like to pattern match over all of them.
Oh, and I'd like pattern matching over strings to be fast. Isn't there some
sexy Gray code or something that isn't just usefully fast but also qualifies
as research? Having the optimizing pattern match compiler generate linear
searches is just silly.
Oh, and I want OpenGL integrated into the language. I don't know why, or what
that's got to do with strings, but I think it's very important. And it must be
faster than C++. ;-)
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
My 2 cents:
It is more convenient to consider strings as character arrays. Then,
these characters are handled as atoms, even if they take several bytes
in the chosen encoding. Of course, multi-byte characters must be
supported as well.
Still, I can use byte arrays as strings. But that limits me to ASCII and
Latin-like encodings: if I want to do UTF-X, then I have to worry
about multi-byte characters myself. Internationalization made hard...
I would find it very convenient to have plain Unicode strings (and
chars), with appropriate scan, print, byte_array_from_string, and
string_from_byte_array functions, one bundle per supported encoding.
Then I wouldn't need to think about the internals of such a string.
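
A small illustration of the gap between bytes and characters, assuming the input is valid UTF-8 (no validation is done, and the name utf8_length is made up for this example):

```ocaml
(* Count code points in a string assumed to be valid UTF-8: every byte
   that is not a continuation byte (bit pattern 10xxxxxx) starts a new
   code point. *)
let utf8_length (s : string) : int =
  let n = ref 0 in
  String.iter (fun c -> if Char.code c land 0xC0 <> 0x80 then incr n) s;
  !n
```

With a byte-array string, String.length "h\xc3\xa9llo" is 6 while the text has 5 characters, which is exactly the bookkeeping a Unicode-aware string type would hide.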
Loup Vaillant