
[Caml-list] ANN: Chess III Arena 0.5


Grant Olson

Jul 3, 2007, 5:51:01 PM
to caml...@yquem.inria.fr
Chess III Arena 0.5 is my first (allegedly) non-trivial app in OCaml. It is
a fully functional chess game, although it lacks some desirable features
such as a computer opponent and network play at this point in time. It uses
Quake III player models as the pieces.

Binaries for windows and source for other platforms are available at:

http://members.verizon.net/~olsongt/c3a/

I've also detailed some of the things I like about OCaml on that page, since
I don't write enough to have a blog, but I imagine I'm preaching to the
choir on this list.

-Grant

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

Jon Harrop

Jul 3, 2007, 6:51:26 PM
to caml...@yquem.inria.fr
On Tuesday 03 July 2007 22:48:40 Grant Olson wrote:
> Chess III Arena 0.5 is my first (allegedly) non-trivial app in OCaml. It
> is a fully functional chess game, although it lacks some desirable features
> such as a computer opponent and network play at this point in time. It
> uses Quake III player models as the pieces.
>
> Binaries for windows and source for other platforms are available at:
>
> http://members.verizon.net/~olsongt/c3a/
>
> I've also detailed some of the things I like about OCaml on that page,
> since I don't write enough to have a blog, but I imagine I'm preaching to
> the choir on this list.

ROTFLMAO. Wow! That is absolutely amazing! Love the name. =8-)

Here's a quick patch: if you put this right after your definition of draw_md3
in md3.ml, the whole program runs an order of magnitude faster:

(* Memoized wrapper: one display list per (x, y) argument pair. *)
let draw_md3 =
  let m = Hashtbl.create 1 in
  fun x y ->
    try GlList.call (Hashtbl.find m (x, y)) with Not_found ->
      (* First call for this key: record the drawing into a new list. *)
      let list = GlList.create `compile in
      draw_md3 x y;
      GlList.ends ();
      GlList.call list;
      Hashtbl.add m (x, y) list

This simply memoizes the rendering of each frame of animation for each model
in a display list. The result is that your geometry is stored on the graphics
card and rendered directly, hence the gratuitous speedup.
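
The memoization pattern in the patch can be tried without OpenGL. What follows is a minimal sketch, not code from Chess III Arena: expensive_draw, compile_count and draw are hypothetical names, and a plain string stands in for the compiled display list. It only shows that the costly path runs once per (model, frame) key.

```ocaml
(* Toy stand-in for the display-list memoization: no OpenGL involved.
   All names here are hypothetical and chosen for illustration. *)

(* Counts how often the expensive path actually runs. *)
let compile_count = ref 0

(* Pretend this walks the model and issues thousands of GL calls.
   The returned string stands in for a compiled display list. *)
let expensive_draw model frame =
  incr compile_count;
  Printf.sprintf "%s#%d" model frame

(* Same shape as the patch: a closure over a private cache,
   keyed on the pair of arguments. *)
let draw =
  let cache = Hashtbl.create 16 in
  fun model frame ->
    match Hashtbl.find_opt cache (model, frame) with
    | Some compiled -> compiled (* cache hit: reuse the earlier result *)
    | None ->
        let compiled = expensive_draw model frame in
        Hashtbl.add cache (model, frame) compiled;
        compiled
```

Calling draw "knight" 0 twice runs expensive_draw once; a different frame number is a different key and is compiled separately.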

This is a wonderful idea for a first project in OCaml. I've taken a quick look
at your code to give you some constructive criticism but your coding style is
already excellent. Keep posting!

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
The OCaml Journal
http://www.ffconsultancy.com/products/ocaml_journal/?e

Grant Olson

Jul 3, 2007, 10:20:45 PM
to caml...@yquem.inria.fr
> From: caml-lis...@yquem.inria.fr
> [mailto:caml-lis...@yquem.inria.fr] On Behalf Of Jon Harrop

> let draw_md3 =
>   let m = Hashtbl.create 1 in
>   fun x y ->
>     try GlList.call (Hashtbl.find m (x, y)) with Not_found ->
>       let list = GlList.create `compile in
>       draw_md3 x y;
>       GlList.ends ();
>       GlList.call list;
>       Hashtbl.add m (x, y) list
>
> This simply memoizes the rendering of each frame of animation
> for each model in a display list. The result is that your
> geometry is stored on the graphics card and rendered
> directly, hence the gratuitous speedup.
>

Thanks. I'll give it a try. I'm wondering if it will have any performance
impact for the software rendering on my Linux box. You know, I think I saw
you post something similar before, and ignored it because I didn't want to
trick myself into thinking the GPU's speed was actually OCaml's speed. Now
that I have a better feel for that, though, I might as well let the frame
rates fly!

-Grant

Daniel Bünzli

Jul 4, 2007, 4:02:34 AM
to caml...@yquem.inria.fr

On 4 Jul 2007, at 04:18, Grant Olson wrote:

> You know I think I saw you post something similar before, and ignored it
> because I didn't want to trick myself into thinking the GPU's speed was
> actually OCaml's speed.

This is complete nonsense. In any language you try to get the most from
your GPU; someone writing in C would do the same.
Your sentence translates to "I will program badly on purpose to test
OCaml's speed". I'm sure you wouldn't do that.

And by the way, _if_ your bottleneck is on the GPU or in CPU-to-GPU
transfers, this attitude will in no way test OCaml's performance.

Best,

Daniel

Jon Harrop

Jul 4, 2007, 7:06:07 AM
to caml...@yquem.inria.fr

Incidentally, loading binary streams as bits, bytes, words, floats and so on
might be a good test case for active patterns. You might also get more
mileage in OCaml by using more combinators...
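
As a rough illustration of the combinator idea (a sketch under stated assumptions, not the loader used in Chess III Arena): a reader is a function from a buffer and an offset to a value and the next offset, and small readers are glued together into larger ones. The names reader, u8, u16, f32, <*>, map and the entry record are hypothetical, the field layout is invented for the example, and the Bytes.get_*_le functions need OCaml 4.08 or later.

```ocaml
(* A reader consumes bytes at an offset and returns a value
   together with the offset of the next unread byte. *)
type 'a reader = Bytes.t -> int -> 'a * int

(* Primitive little-endian readers. *)
let u8 : int reader = fun b i -> (Bytes.get_uint8 b i, i + 1)
let u16 : int reader = fun b i -> (Bytes.get_uint16_le b i, i + 2)
let f32 : float reader =
  fun b i -> (Int32.float_of_bits (Bytes.get_int32_le b i), i + 4)

(* Run two readers in sequence and pair their results. *)
let ( <*> ) (p : 'a reader) (q : 'b reader) : ('a * 'b) reader =
  fun b i ->
    let x, i = p b i in
    let y, i = q b i in
    ((x, y), i)

(* Transform the result of a reader. *)
let map (f : 'a -> 'b) (p : 'a reader) : 'b reader =
  fun b i ->
    let x, i = p b i in
    (f x, i)

(* Hypothetical on-disk record: 16-bit id, 8-bit flags, 32-bit float scale. *)
type entry = { id : int; flags : int; scale : float }

let read_entry : entry reader =
  map
    (fun ((id, flags), scale) -> { id; flags; scale })
    (u16 <*> u8 <*> f32)
```

The record reader is built entirely from the primitives, so offset bookkeeping lives in one place (<*>) instead of being repeated at every field.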

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
The OCaml Journal
http://www.ffconsultancy.com/products/ocaml_journal/?e


Grant Olson

Jul 4, 2007, 4:08:21 PM
to Daniel Bünzli, caml...@yquem.inria.fr
> On 4 Jul 2007, at 04:18, Grant Olson wrote:
>
> > You know I think I saw you post something similar before, and ignored
> > it because I didn't want to trick myself into thinking the GPU's speed
> > was actually OCaml's speed.
>
> This is complete nonsense. In any language you try to get the most from
> your GPU; someone writing in C would do the same. Your sentence translates
> to "I will program badly on purpose to test OCaml's speed". I'm sure you
> wouldn't do that.
>
> And by the way, _if_ your bottleneck is on the GPU or in CPU-to-GPU
> transfers, this attitude will in no way test OCaml's performance.
>

My point was that offloading the work to another subsystem doesn't give me a
very good idea of the general-case performance of the language. For most
programs the GPU is not a factor one way or the other. And yes, I realize
I'm still using the GPU to some extent, so I'm already cheating. This was a
learning exercise, not an attempt to write high-performance production code.
The standard premature-optimization mantras come to mind as well.

Incidentally, Jon's code seemed to run slower on my graphics card anyway.

Off to see if us Americans can reclaim the hot-dog eating world title,

-Grant

Bünzli Daniel

Jul 4, 2007, 4:53:25 PM
to caml...@yquem.inria.fr

On 4 Jul 2007, at 22:06, Grant Olson wrote:

> My point was offloading the work to another subsystem doesn't give me a
> very good idea of the general-case performance of the language.

Yes, but as I said in my previous message, in that particular case not
doing it doesn't give you a good idea of the general-case performance
of the language either. And if your rendering is too slow you may end up
blaming the language when the real problem is that you are misusing the
GPU.

> Incidentally, Jon's code seemed to run slower on my graphics card
> anyway.

This is possible. IIRC, on some platforms using vertex buffer objects
is faster, and there are some GL commands that, for the sake of
performance, should not be put in a display list (e.g. glTexImage2D),
even though they are allowed to be.

Best,

Daniel

Jon Harrop

Jul 4, 2007, 5:29:27 PM
to caml...@yquem.inria.fr
On Wednesday 04 July 2007 21:06:29 Grant Olson wrote:
> My point was offloading the work to another subsystem doesn't give me a
> very good idea of the general-case performance of the language. For most
> programs the GPU is not a factor one way or the other.

Don't forget the OpenGL driver. This is a very complicated and often highly
optimized piece of software. In the case of display lists, it is essentially
acting as an optimizing compiler. The small function that I posted
offloads the prioritization of texture and vertex data to the OpenGL driver,
and good drivers (e.g. any nVidia driver) will do an excellent job of
optimizing for you.

When you are rendering static geometry (as you are), display lists are an
excellent way to get huge performance improvements without leaving the
comfort of idiomatic OCaml code.

> Incidentally, Jon's code seemed to run slower on my graphics card anyway.

What hardware and drivers are you using?

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
The OCaml Journal
http://www.ffconsultancy.com/products/ocaml_journal/?e


Daniel Bünzli

Jul 4, 2007, 8:27:03 PM
to caml...@yquem.inria.fr

On 4 Jul 2007, at 23:22, Jon Harrop wrote:

> The small function that I posted offloads the prioritization of texture
> and vertex data to the OpenGL driver and good drivers (e.g. any nVidia
> driver) will do an excellent job of optimizing for you.

[...]


> What hardware and drivers are you using?

This may not be due to hardware. Looking at the call graph of
draw_md3, it seems that a glTexImage2D gets captured by the display
list. On some platforms this is not a good thing; texture objects
(GlTex.gen_texture etc.) should be used instead. See for example
question 16.090 here [1].

Daniel

[1] http://www.opengl.org/resources/faq/technical/displaylist.htm

Jon Harrop

Jul 4, 2007, 11:12:35 PM
to caml...@yquem.inria.fr
On Thursday 05 July 2007 01:25:28 Daniel Bünzli wrote:
> On 4 Jul 2007, at 23:22, Jon Harrop wrote:

> > What hardware and drivers are you using?
>
> This may not be due to hardware.

Something is certainly wrong and I don't think it's my code. :-)

There's no way this optimization should slow the program down unless it's
software rendered, and even then I'd be surprised because it avoids tens of
thousands of OCaml->C calls by keeping the vertex and texture data with the
driver.

For me, the extra 9 lines of code improve performance from 9 to 600fps!

> Looking at the call graph of
> draw_md3 it seems that a glTexImage2D gets captured by the display
> list. On some platforms this is not a good thing, texture objects
> (GlTex.gen_texture etc.) should be used instead. See for example
> question 16.090 here [1].

Yes but the performance difference between texture objects and display lists
should be tiny compared to the difference with immediate mode, which is what
we're comparing here.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
The OCaml Journal
http://www.ffconsultancy.com/products/ocaml_journal/?e


Daniel Bünzli

Jul 5, 2007, 3:47:25 AM
to caml...@yquem.inria.fr

On 5 Jul 2007, at 05:05, Jon Harrop wrote:

> There's no way this optimization should slow the program down unless it's
> software rendered,

[...]


> Yes but the performance difference between texture objects and display
> lists should be tiny compared to the difference with immediate mode, which
> is what we're comparing here.

No, it can result in a slowdown even if it is hardware rendered; the
effect of your "optimization" is implementation dependent. The
problem is that glTexImage2D cannot be compiled efficiently in a
display list (see the bottom of this page [1] for why this is the
case), and it may be that in the implementation of OpenGL he uses, the
CPU-to-GPU texture transfer is done each time the display list is
called. In that case, if your bottleneck is in the CPU-to-GPU transfer
of textures, the performance advantage of retained over immediate mode is
lost. The only platform-independent way of ensuring that CPU-to-GPU
texture transfers are not performed each time the display list is
called is to use texture objects.

Daniel

[1] http://www.bluevoid.com/opengl/sig00/advanced00/notes/node60.html

Jon Harrop

Jul 5, 2007, 9:00:29 AM
to caml...@yquem.inria.fr
On Thursday 05 July 2007 08:45:49 Daniel Bünzli wrote:
> No, it can result in a slowdown even if it is hardware rendered; the
> effect of your "optimization" is implementation dependent.

Can you quantify that: what implementations? How much slowdown?

> The problem is that glTexImage2D cannot be compiled efficiently in a
> display list (see the bottom of this page [1] for why this is the
> case).

If "glTexImage2D cannot be compiled efficiently in a display list", why does
this optimization show an enormous performance improvement here?

Anyway, you can use texture objects by supplementing texture.ml with:

(* Upload [tex] once into a fresh texture object and return its id. *)
let activate_texture tex =
  let target = `texture_2d in
  let t = GlTex.gen_texture () in
  GlTex.bind_texture ~target t;
  GlPix.store (`unpack_alignment 1);
  GlTex.image2d tex;
  List.iter (GlTex.parameter ~target)
    [ `wrap_s `clamp;
      `wrap_t `clamp;
      `mag_filter `linear;
      `min_filter `linear ];
  GlTex.env (`mode `modulate);
  t;;

(* Cache: texture name -> texture object id. *)
let m = Hashtbl.create 1

(* Bind the cached texture object, creating it on first use. *)
let set_current_texture texname =
  try GlTex.bind_texture ~target:`texture_2d (Hashtbl.find m texname)
  with Not_found ->
    let tex =
      try
        activate_texture (Hashtbl.find textures texname)
      with Not_found ->
        activate_texture (Hashtbl.find textures "unknown") in
    Hashtbl.add m texname tex

This gives me ~50fps without any display lists. If I memoize vertex data
generated by Md3.draw_frame_triangles as well:

let draw_frame_triangles =
  let m = Hashtbl.create 1 in
  fun a b c d ->
    let key = (a, b, c, d) in
    try GlList.call (Hashtbl.find m key) with Not_found ->
      let list = GlList.create `compile in
      draw_frame_triangles a b c d;
      GlList.ends ();
      GlList.call list;
      Hashtbl.add m key list

then I'm back up to ~600fps.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
The OCaml Journal
http://www.ffconsultancy.com/products/ocaml_journal/?e


Daniel Bünzli

Jul 5, 2007, 9:38:53 AM
to caml...@yquem.inria.fr

On 5 Jul 2007, at 14:52, Jon Harrop wrote:

> Can you quantify that: what implementations? how much slow down?

The only thing I know is that this happens in Apple's implementation;
see these messages [1] from a programmer working on Apple's OpenGL. I
never tried to time it, since I strive for implementation-independent
optimizations and in that case texture objects are the only way to go.

Best,

Daniel

[1]
http://lists.apple.com/archives/mac-opengl/2004/Feb/msg00189.html
http://lists.apple.com/archives/mac-opengl/2004/Feb/msg00191.html

Daniel Bünzli

Jul 5, 2007, 9:53:50 AM
to caml...@yquem.inria.fr

On 5 Jul 2007, at 14:52, Jon Harrop wrote:

> Anyway, you can use texture objects by supplementing texture.ml with:

[...]

By the way, I'm not sure how your code gets called, but I think that
the first time you load the texture is when the display list is
created, so it's this command that will be retained (along with the
creation of a texture object) and not the _use_ of a texture object.
Thus you should first "activate" all textures once and then generate
the display lists. No glTexImage2D call during the display list
creation!
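
The ordering issue can be shown with a toy model of a GL command stream. This is only a sketch and makes no real OpenGL calls: command, emit, set_texture, draw_model and compile are hypothetical names, and whether a real driver repeats a captured upload on every call is implementation dependent, as discussed above. The model simply records which commands end up inside a display list.

```ocaml
(* Toy model: nothing here talks to real OpenGL. *)
type command =
  | Upload of string (* stands in for glTexImage2D: a CPU-to-GPU copy *)
  | Bind of string   (* stands in for binding a texture object *)
  | Draw of string   (* stands in for a model's vertex data *)

(* Commands captured by the display list currently being compiled. *)
let recorded : command list ref = ref []

(* Textures that already have a texture object. *)
let loaded : (string, unit) Hashtbl.t = Hashtbl.create 8

let emit c = recorded := c :: !recorded

(* Lazy activation: upload on first use, then bind. *)
let set_texture name =
  if not (Hashtbl.mem loaded name) then begin
    emit (Upload name);
    Hashtbl.add loaded name ()
  end;
  emit (Bind name)

let draw_model name =
  set_texture (name ^ ".tga");
  emit (Draw name)

(* Compile a display list: keep only what [f] emits while recording.
   Anything emitted earlier counts as already executed and is dropped. *)
let compile f =
  recorded := [];
  f ();
  List.rev !recorded

let count_uploads cmds =
  List.length (List.filter (function Upload _ -> true | _ -> false) cmds)

(* Texture first touched while recording: the upload is captured. *)
let lazy_list = compile (fun () -> draw_model "pawn")

(* Texture activated beforehand: only the bind is captured. *)
let () = set_texture "rook.tga"
let preloaded_list = compile (fun () -> draw_model "rook")

let () =
  (* prints "lazy: 1 upload(s), preloaded: 0 upload(s)" *)
  Printf.printf "lazy: %d upload(s), preloaded: %d upload(s)\n"
    (count_uploads lazy_list) (count_uploads preloaded_list)
```

In the model, activating the texture before compiling leaves only Bind and Draw in the list, which is the state the advice above aims for.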

Daniel

Jon Harrop

Jul 5, 2007, 10:30:34 AM
to caml...@yquem.inria.fr
On Thursday 05 July 2007 14:52:23 Daniel Bünzli wrote:
> By the way I'm not sure how your code gets called but I think that
> the first time you'll load the texture is when the display list is
> created, so it's this command that will be retained (along with the
> creation of a texture object) and not the _use_ of a texture object.
> Thus you should first "activate" all textures once and then generate
> the display lists. No glTexImage2D call during the display list
> creation!

The latest code memoizes the textures and vertex data separately, textures in
texture objects and vertex data in display lists. That should remove all
texture-related calls from the display lists.

This is three times as much code and it gives the same result on my system so
I'd like to know if anyone really does see a performance improvement from
this.

> The only thing I know is that this happens in Apple's implementation,
> see these messages [1] from a programmer working on Apple's OpenGL. I
> never tried to time since I strive for implementation independent
> optimizations and in that case texture objects are the only way to go.
>

That is certainly someone else offering the same advice but the thread does
not conclude that the advice is good. Moreover, the code is not available (it
may well have been wrong).

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
The OCaml Journal
http://www.ffconsultancy.com/products/ocaml_journal/?e


Daniel Bünzli

Jul 5, 2007, 10:45:37 AM
to caml...@yquem.inria.fr

On 5 Jul 2007, at 16:22, Jon Harrop wrote:

> That is certainly someone else offering the same advice but the thread
> does not conclude that the advice is good. Moreover, the code is not
> available (it may well have been wrong).

Did you actually read my post? This is a programmer who
_implements_ OpenGL at Apple; he certainly knows what you should do
performance-wise with their implementation. I prefer to follow this
advice rather than the conclusions of your experiments on a particular
machine.

Daniel

Grant Olson

Jul 5, 2007, 7:43:00 PM
to caml...@yquem.inria.fr

> -----Original Message-----
> From: caml-lis...@yquem.inria.fr
> [mailto:caml-lis...@yquem.inria.fr] On Behalf Of Jon Harrop
>
> > Incidentally, Jon's code seemed to run slower on my graphics card anyway.
>
> What hardware and drivers are you using?
>

I did read all the other posts in this thread, but just wanted to follow up
quickly. I've got a no-frills ATI Radeon X550. I was originally getting
about 10 fps, and then down to 6.6 after the display lists. Updating the
driver helped. It went from 10 fps before to 70 fps with the display lists.
But:

(1) There were a few hangs in the display while display lists were initially
being created. The code should probably be run at startup.
(2) The textures got screwy after text got displayed. For example, working
through a fool's mate, when I take the pawn the textures break.

I didn't get around to all of the other optimizations. I'm a little too
busy right now to go through all of that, but I'll give it a try eventually.
I'm wondering what impact memoizing 'draw_frame_triangles' will have on
memory usage. I'm guessing that's one of the reasons the files were stored
in their given format to begin with.

-Grant

Jon Harrop

Jul 7, 2007, 12:27:42 PM
to caml...@yquem.inria.fr
On Friday 06 July 2007 00:40:24 Grant Olson wrote:
> It went from 10 fps before to 70 fps with the display lists.

That's more like it. :-)

> (1) There were a few hangs in the display while display lists were
> initially being created. The code should probably be run at startup.

The memoization technique I used is a good way to avoid that problem because
the display lists are lazily allocated. So you can't get premature allocation
bugs (a problem if you try allocating global OpenGL entities before the GL is
initialized).
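
The two concerns can be combined: keep the lazy memoized function, and walk the known keys once after the GL is initialized so the compile cost is paid up front. The following is a minimal sketch with no real OpenGL in it; gl_ready, compile, draw and prewarm are hypothetical names, and an integer stands in for a display-list handle.

```ocaml
(* Hypothetical stand-ins: [gl_ready] models "the GL context exists",
   [compile] models building one display list. *)
let gl_ready = ref false
let compiled = ref 0

let compile (_key : string * int) : int =
  if not !gl_ready then failwith "GL not initialised";
  incr compiled;
  !compiled (* pretend handle for the new display list *)

(* Key (model, frame) -> display-list handle. *)
let cache : (string * int, int) Hashtbl.t = Hashtbl.create 64

(* Lazy path: compile on first use, as in the memoized patch. *)
let draw key =
  match Hashtbl.find_opt cache key with
  | Some handle -> handle
  | None ->
      let handle = compile key in
      Hashtbl.add cache key handle;
      handle

(* Eager path: call once after initialisation to fill the cache,
   so no compilation happens in the middle of a game. *)
let prewarm keys = List.iter (fun key -> ignore (draw key)) keys
```

Because prewarm goes through draw, calling it before the GL is ready still fails loudly instead of silently allocating too early, and any key it misses is still compiled lazily later.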

> (2) The textures got screwy after text got displayed. For example,
> working through a fool's mate, when I take the pawn the textures break.

Probably a bug in ATi's OpenGL implementation. IIRC, John Carmack used display
lists to cache adaptive tessellations of Bezier surfaces in Quake III and had
similar problems. nVidia's drivers are much better... :-)

> I didn't get around to all of the other optimizations. I'm a little too
> busy right now to go through all of that, but I'll give it a try
> eventually. I'm wondering what impact memoizing the 'draw_frame_triangles'
> will have on memory usage.

System memory or graphics card memory? Does it take up much memory?

> I'm guessing that's one of the reasons the files were stored in their given
> format to begin with.

Maybe.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
The OCaml Journal
http://www.ffconsultancy.com/products/ocaml_journal/?e

