Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[Caml-list] Rewriting the Digest module causes linking errors

26 views
Skip to first unread message

Goswin von Brederlow

unread,
Mar 17, 2010, 4:27:50 AM3/17/10
to caml-list
Hi,

I want to rewrite the Digest module to expose a more lowlevel interface
to the md5 digest and add support to digest Bigarrays. I've patched the
respective files involved and it all looks alright but when I try to
build ocaml I get the following error:

File "_none_", line 1, characters 0-1:
Error: Error while linking boot/stdlib.cma(Digest):
The external function `caml_md5_update_string' is not available
make[2]: *** [ocamlc] Error 2
make[2]: Leaving directory `/home/mrvn/src/debian/ocaml/ocaml-3.11.2'
make[1]: *** [world] Error 2
make[1]: Leaving directory `/home/mrvn/src/debian/ocaml/ocaml-3.11.2'
make: *** [build-stamp] Error 2

Can anyone explain why the new functions are not available?

MfG
Goswin

PS: The patch is based on Debians 3.11.2-1 source and the patch below
can be added to debian/patches/ as is.

--
From: Goswin von Brederlow <goswi...@web.de>
Date: Wed, 17 Mar 2010 05:56:56 +0200
Subject: [PATCH] Rewrite digest module

---
byterun/md5.c | 30 ++++++++++++++++++++++++++++++
byterun/md5.h | 6 +++++-
otherlibs/bigarray/bigarray.h | 2 ++
otherlibs/bigarray/bigarray.ml | 11 +++++++++++
otherlibs/bigarray/bigarray.mli | 10 ++++++++++
otherlibs/bigarray/bigarray_stubs.c | 13 +++++++++++++
stdlib/digest.ml | 13 +++++++++++++
stdlib/digest.mli | 28 ++++++++++++++++++++++++++++
8 files changed, 112 insertions(+), 1 deletion(-)

Index: ocaml-3.11.2/stdlib/digest.ml
===================================================================
--- ocaml-3.11.2.orig/stdlib/digest.ml 2010-03-17 08:17:24.000000000 +0100
+++ ocaml-3.11.2/stdlib/digest.ml 2010-03-17 08:19:46.000000000 +0100
@@ -16,8 +16,13 @@
(* Message digest (MD5) *)

type t = string
+type context

+external context: unit -> context = "caml_md5_context"
external unsafe_string: string -> int -> int -> t = "caml_md5_string"
+external unsafe_update_string: context -> string -> int -> int -> unit
+ = "caml_md5_update_string"
+external final: context -> t = "caml_md5_final"
external channel: in_channel -> int -> t = "caml_md5_chan"

let string str =
@@ -28,6 +33,14 @@
then invalid_arg "Digest.substring"
else unsafe_string str ofs len

+let update_string context str =
+ unsafe_update_string context str 0 (String.length str)
+
+let update_substring context str ofs len =
+ if ofs < 0 || len < 0 || ofs > String.length str - len
+ then invalid_arg "Digest.update_substring"
+ else unsafe_update_string context str ofs len
+
let file filename =
let ic = open_in_bin filename in
let d = channel ic (-1) in
Index: ocaml-3.11.2/stdlib/digest.mli
===================================================================
--- ocaml-3.11.2.orig/stdlib/digest.mli 2010-03-17 08:17:24.000000000 +0100
+++ ocaml-3.11.2/stdlib/digest.mli 2010-03-17 08:19:56.000000000 +0100
@@ -24,6 +24,34 @@
type t = string
(** The type of digests: 16-character strings. *)

+type context
+(** The type of a digest context. *)
+
+external context : unit -> context = "caml_md5_context"
+(** Return a fresh digest context. *)
+
+external unsafe_update_string : context -> string -> int -> int -> unit
+ = "caml_md5_update_string"
+(** [Digest.unsafe_update_string ctx s ofs len] updates the context
+ to include the substring of [s] starting at character number [ofs]
+ and containing [len] characters. *)
+
+external unsafe_string: string -> int -> int -> t = "caml_md5_string"
+(** [Digest.unsafe_string s ofs len] returns the digest of the
+ substring of [s] starting at character number [ofs] and containing
+ [len] characters. *)
+
+val update_string : context -> string -> unit
+(** [Digest.update_string ctx s ] updates the context to include [s]. *)
+
+val update_substring : context -> string -> int -> int -> unit
+(** [Digest.update_substring ctx s ofs len] updates the context
+ to include the substring of [s] starting at character number [ofs]
+ and containing [len] characters. *)
+
+external final : context -> t = "caml_md5_final"
+(** [Digest.final ctx] computs the final digest from [ctx]. *)
+
val string : string -> t
(** Return the digest of the given string. *)

Index: ocaml-3.11.2/byterun/md5.c
===================================================================
--- ocaml-3.11.2.orig/byterun/md5.c 2010-03-17 08:17:24.000000000 +0100
+++ ocaml-3.11.2/byterun/md5.c 2010-03-17 08:17:43.000000000 +0100
@@ -24,6 +24,36 @@

/* MD5 message digest */

+CAMLprim value caml_md5_context(void)
+{
+ value context;
+ struct MD5Context *ctx = malloc(sizeof(struct MD5Context));
+ if (ctx == NULL) {
+ caml_raise_out_of_memory();
+ }
+ caml_MD5Init(ctx);
+ context = caml_alloc_small(1, Abstract_tag);
+ Store_field(context, 0, (value)ctx);
+ return context;
+}
+
+CAMLprim value caml_md5_update_string(value context,
+ value str, value ofs, value len)
+{
+ struct MD5Context *ctx = (struct MD5Context *)Data_custom_val(context);
+ caml_MD5Update(ctx, &Byte_u(str, Long_val(ofs)), Long_val(len));
+ return Val_unit;
+}
+
+CAMLprim value caml_md5_final(value context)
+{
+ struct MD5Context ctx = *(struct MD5Context *)Data_custom_val(context);
+ value res;
+ res = caml_alloc_string(16);
+ caml_MD5Final(&Byte_u(res, 0), &ctx);
+ return res;
+}
+
CAMLprim value caml_md5_string(value str, value ofs, value len)
{
struct MD5Context ctx;
Index: ocaml-3.11.2/byterun/md5.h
===================================================================
--- ocaml-3.11.2.orig/byterun/md5.h 2010-03-17 08:17:24.000000000 +0100
+++ ocaml-3.11.2/byterun/md5.h 2010-03-17 08:17:43.000000000 +0100
@@ -22,6 +22,10 @@
#include "mlvalues.h"
#include "io.h"

+CAMLextern value caml_md5_context (void);
+CAMLextern value caml_md5_update_string (value context,
+ value str, value ofs, value len);
+CAMLextern value caml_md5_final (value context);
CAMLextern value caml_md5_string (value str, value ofs, value len);
CAMLextern value caml_md5_chan (value vchan, value len);

@@ -32,7 +36,7 @@
};

CAMLextern void caml_MD5Init (struct MD5Context *context);
-CAMLextern void caml_MD5Update (struct MD5Context *context, unsigned char *buf,
+CAMLextern void caml_MD5Update (struct MD5Context *context, unsigned char *buf,
uintnat len);
CAMLextern void caml_MD5Final (unsigned char *digest, struct MD5Context *ctx);
CAMLextern void caml_MD5Transform (uint32 *buf, uint32 *in);
Index: ocaml-3.11.2/otherlibs/bigarray/bigarray.h
===================================================================
--- ocaml-3.11.2.orig/otherlibs/bigarray/bigarray.h 2010-03-17 08:17:43.000000000 +0100
+++ ocaml-3.11.2/otherlibs/bigarray/bigarray.h 2010-03-17 08:17:43.000000000 +0100
@@ -93,4 +93,6 @@
... /*dimensions, with type intnat */);
CAMLBAextern uintnat caml_ba_byte_size(struct caml_ba_array * b);

+CAMLprim value caml_md5_update_bigarray (value context, value array);
+
#endif
Index: ocaml-3.11.2/otherlibs/bigarray/bigarray.ml
===================================================================
--- ocaml-3.11.2.orig/otherlibs/bigarray/bigarray.ml 2010-03-17 08:17:43.000000000 +0100
+++ ocaml-3.11.2/otherlibs/bigarray/bigarray.ml 2010-03-17 08:17:43.000000000 +0100
@@ -124,6 +124,17 @@
ba
let map_file fd ?pos kind layout shared dim =
Genarray.map_file fd ?pos kind layout shared [|dim|]
+
+ module Digest = struct
+ external update_bigarray: context -> ('a, 'b, 'c) Bigarray.Array1.t -> unit
+ = "caml_md5_update_bigarray" "noalloc"
+
+ let bigarray arr =
+ let context = context () in
+ let () = update_bigarray context arr
+ in
+ final context
+ end
end

module Array2 = struct
Index: ocaml-3.11.2/otherlibs/bigarray/bigarray.mli
===================================================================
--- ocaml-3.11.2.orig/otherlibs/bigarray/bigarray.mli 2010-03-17 08:17:43.000000000 +0100
+++ ocaml-3.11.2/otherlibs/bigarray/bigarray.mli 2010-03-17 08:17:43.000000000 +0100
@@ -507,6 +507,16 @@
Use with caution and only when the program logic guarantees that
the access is within bounds. *)

+ module Digest : sig
+ external update_bigarray : context -> ('a, 'b, 'c) t -> unit
+ = "caml_md5_update_bigarray" "noalloc"
+ (** Updates the context to include the bigarray. This function
+ runs concurrent with other threads. *)
+
+ val bigarray : ('a, 'b, 'c) t -> Digest.t
+ (** Return the digest of the given bigarray. This function
+ runs concurrent with other threads. *)
+ end
end


Index: ocaml-3.11.2/otherlibs/bigarray/bigarray_stubs.c
===================================================================
--- ocaml-3.11.2.orig/otherlibs/bigarray/bigarray_stubs.c 2010-03-17 08:17:43.000000000 +0100
+++ ocaml-3.11.2/otherlibs/bigarray/bigarray_stubs.c 2010-03-17 08:17:43.000000000 +0100
@@ -23,6 +23,8 @@
#include "intext.h"
#include "memory.h"
#include "mlvalues.h"
+#include "md5.h"
+#include "signals.h"

#define int8 caml_ba_int8
#define uint8 caml_ba_uint8
@@ -1097,3 +1099,14 @@
caml_register_custom_operations(&caml_ba_ops);
return Val_unit;
}
+
+CAMLprim value caml_md5_update_bigarray(value context, value array)
+{
+ struct MD5Context *ctx = (struct MD5Context *)Data_custom_val(context);
+ void* data = Caml_ba_data_val(array);
+ size_t len = Caml_ba_byte_size(Caml_ba_array_val(array));
+ caml_leave_blocking_section();
+ caml_MD5Update(ctx, data, len);
+ caml_enter_blocking_section();
+ return Val_unit;
+}

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

Mark Shinwell

unread,
Mar 17, 2010, 4:39:42 AM3/17/10
to Goswin von Brederlow, caml-list
On Wed, Mar 17, 2010 at 09:27:30AM +0100, Goswin von Brederlow wrote:
> I want to rewrite the Digest module to expose a more lowlevel interface
> to the md5 digest and add support to digest Bigarrays. I've patched the
> respective files involved and it all looks alright but when I try to
> build ocaml I get the following error:
>
> File "_none_", line 1, characters 0-1:
> Error: Error while linking boot/stdlib.cma(Digest):
> The external function `caml_md5_update_string' is not available
> make[2]: *** [ocamlc] Error 2
> make[2]: Leaving directory `/home/mrvn/src/debian/ocaml/ocaml-3.11.2'
> make[1]: *** [world] Error 2
> make[1]: Leaving directory `/home/mrvn/src/debian/ocaml/ocaml-3.11.2'
> make: *** [build-stamp] Error 2
>
> Can anyone explain why the new functions are not available?

You need to bootstrap the compiler. I think "make bootstrap" will do the
trick, although you may have to run it twice if I remember correctly.

Mark

Goswin von Brederlow

unread,
Mar 17, 2010, 5:53:30 AM3/17/10
to Mark Shinwell, caml-list
"Mark Shinwell" <mshi...@janestreet.com> writes:

> On Wed, Mar 17, 2010 at 09:27:30AM +0100, Goswin von Brederlow wrote:
>> I want to rewrite the Digest module to expose a more lowlevel interface
>> to the md5 digest and add support to digest Bigarrays. I've patched the
>> respective files involved and it all looks alright but when I try to
>> build ocaml I get the following error:
>>
>> File "_none_", line 1, characters 0-1:
>> Error: Error while linking boot/stdlib.cma(Digest):
>> The external function `caml_md5_update_string' is not available
>> make[2]: *** [ocamlc] Error 2
>> make[2]: Leaving directory `/home/mrvn/src/debian/ocaml/ocaml-3.11.2'
>> make[1]: *** [world] Error 2
>> make[1]: Leaving directory `/home/mrvn/src/debian/ocaml/ocaml-3.11.2'
>> make: *** [build-stamp] Error 2
>>
>> Can anyone explain why the new functions are not available?
>
> You need to bootstrap the compiler. I think "make bootstrap" will do the
> trick, although you may have to run it twice if I remember correctly.
>
> Mark

Well, it is a bootstraping problem it seems but "make bootstrap" is not
the answere.

I need to build this in two passes. First only patch md5.[ch] and build
ocamlc. Then patch the rest and build with the ocamlc from the first
build copied to boot/.

Seems like ocamlc always uses its internal list of primitives even if it
compiles for a new runtime that will have different primitives.

MfG
Goswin

Goswin von Brederlow

unread,
Mar 17, 2010, 12:39:54 PM3/17/10
to caml-list
Goswin von Brederlow <goswi...@web.de> writes:

>> On Wed, Mar 17, 2010 at 09:27:30AM +0100, Goswin von Brederlow wrote:
>>> I want to rewrite the Digest module to expose a more lowlevel interface
>>> to the md5 digest and add support to digest Bigarrays. I've patched the
>>> respective files involved and it all looks alright but when I try to
>>> build ocaml I get the following error:

Ok, so I managed to bootstrap the compiler properly and build debian
packages with my new Digest interface. But something is still wrong as I
randomly get segfaults or

<No room for growing heap
Fatal error: out of memory.

The more threads I use to compute digests in parallel the more likely
the error becomes. But that might just be an issue with more allocations
hapening and not a race condition between threads.

Can anyone spot the bug?

I split the patch into two so it is easier to bootstrap. Apply
0008-md5-rewrite.patch, build ocamlc and copy it to boot and then apply
0009-digest-rewrite.patch and build everything. Debian user can also get
the debian source or amd64 debs.

Patches and source at: http://mrvn.homeip.net/ocaml/

Testcode:
----------------------------- foo.ml ---------------------------------
open Bigarray

let blocksize = 1048576
let num = ref 1024
let mutex = Mutex.create ()

let compute x =
Printf.printf "Thread %d started\n" x; flush_all ();
let buf = Array1.create int8_unsigned c_layout blocksize in
let rec loop () =
Mutex.lock mutex;
if !num = 0
then Mutex.unlock mutex
else begin
decr num;
Mutex.unlock mutex;
ignore (Array1.Digest.bigarray buf);
loop ()
end
in
loop ()

let main x =
Printf.printf "Running with %d threads\n" x; flush_all ();
let rec loop acc = function
0 -> acc
| x ->
let thread = Thread.create compute x
in
loop (thread :: acc) (x - 1)
in
let threads = loop [] x
in
List.iter Thread.join threads;
Printf.printf "All done\n"

let _ =
let control = Gc.get () in
let _ = control.Gc.verbose <- 0x3ff in
let _ = Gc.set control
in
Gc.compact ();
flush_all ();
Scanf.sscanf Sys.argv.(1) "%d" main;
Gc.compact ();
flush_all ()
----------------------------------------------------------------------

ocamlopt -thread -o foo unix.cmxa threads.cmxa bigarray.cmxa foo.ml && \
time ./foo 1 && time ./foo 2 && time ./foo 3 && time ./foo 4 && \
time ./foo 16

1-4 usualy runs while 16 usualy breaks. But sometimes it breaks earlier
or 16 runs too. Just repeat ./foo 16 till it fails.

MfG
Goswin

PS: Runtimes without the verbose GC:
/foo 1 7.28s user 0.01s system 100% cpu 7.291 total
/foo 2 8.75s user 0.02s system 196% cpu 4.474 total
/foo 3 9.76s user 0.03s system 298% cpu 3.285 total
/foo 4 10.69s user 0.04s system 371% cpu 2.891 total

Goswin von Brederlow

unread,
Mar 18, 2010, 6:58:11 AM3/18/10
to caml-list
Goswin von Brederlow <goswi...@web.de> writes:

> Goswin von Brederlow <goswi...@web.de> writes:
>
>>> On Wed, Mar 17, 2010 at 09:27:30AM +0100, Goswin von Brederlow wrote:
>>>> I want to rewrite the Digest module to expose a more lowlevel interface
>>>> to the md5 digest and add support to digest Bigarrays. I've patched the
>>>> respective files involved and it all looks alright but when I try to
>>>> build ocaml I get the following error:
>
> Ok, so I managed to bootstrap the compiler properly and build debian
> packages with my new Digest interface. But something is still wrong as I
> randomly get segfaults or
>
> <No room for growing heap
> Fatal error: out of memory.
>
> The more threads I use to compute digests in parallel the more likely
> the error becomes. But that might just be an issue with more allocations
> hapening and not a race condition between threads.

I finaly tracked down the issue with the help of Erkki Seppala.

First problem:

I had the function declared as "noalloc" but used CAMLparam2() in
it. That seems to cause random segfaults. I don't understand why but if
I remove the "noalloc" then it works.

Second problem:

When I remove the CAMLparam2() the finalizer is called too early:

CAMLprim value md5_update_bigarray(value context, value vb)
{
//CAMLparam2(context, vb);
struct helper *helper = (struct helper*)Data_custom_val(context);
struct MD5Context *ctx = helper->ctx;
fprintf(stderr, "update_bigarray: helper = %p, ctx = %p\n", helper, ctx);
struct caml_ba_array * b = Caml_ba_array_val(vb);
unsigned char *data = b->data;
uintnat len = caml_ba_byte_size(b);
caml_enter_blocking_section();
caml_MD5Update(ctx, data, len);
caml_leave_blocking_section();
//CAMLreturn(Val_unit);
return Val_unit;
}

let rec loop () =
Mutex.lock mutex;
if !num = 0
then Mutex.unlock mutex
else begin
decr num;
Mutex.unlock mutex;

let context = context () in

let () = update_bigarray context buf
in


loop ()
end
in
loop ()

This sometimes results in the following code flow:

context () <- allocates memory
update_bigarray context buf
caml_enter_blocking_section();
THREAD SWITCH
GC runs
context is finalized <- frees memory
THREAD SWITCH BACK
caml_MD5Update(ctx, data, len); <- writes to ctx which is freeed

Looks like ocamlopt really is so smart that is sees that context is
never used after the call to update_bigarray and removes it from the
root set before calling update_bigarray. It assumes the update_bigarray
will hold all its arguments alive itself, which is a valid assumption.

This is a tricky situation. The md5_update_bigarray() on its own is a
"noalloc" function. But due to the caml_enter_blocking_section() another
thread can alloc and trigger a GC run in parallel. So I guess that makes
the function actually not "noalloc".

Well, problem solved, lesson learned. :)

MfG
Goswin

David Baelde

unread,
Mar 30, 2010, 3:14:42 AM3/30/10
to Goswin von Brederlow, Savonet's developpers list, Caml Mailing List
On Thu, Mar 18, 2010 at 4:56 AM, Goswin von Brederlow <goswi...@web.de> wrote:
> This is a tricky situation. The md5_update_bigarray() on its own is a
> "noalloc" function. But due to the caml_enter_blocking_section() another
> thread can alloc and trigger a GC run in parallel. So I guess that makes
> the function actually not "noalloc".

Thanks for reporting about your experience! This made me suspect
noalloc in a bug of mine, and indeed removing a bunch of noalloc did
the trick. Now I'd like to understand.

Unlike in your example, global roots were registered for the bigarrays
in "my" functions. This should avoid that they are freed when the
global lock is released. Still, noalloc seems wrong with those
functions.

I'm not sure yet which function is problematic in my case, but they
all follow the same scheme, see for example
<http://savonet.rastageeks.org/browser/trunk/liquidsoap/src/stream/rgb_c.c#L563>.

So, is it really forbidden to release the global lock in a noalloc function?

Cheers,
--
David

Xavier Leroy

unread,
Mar 30, 2010, 11:57:19 AM3/30/10
to david....@ens-lyon.org, Savonet's developpers list, Caml Mailing List
> So, is it really forbidden to release the global lock in a noalloc function?

Yes. Actually, it is forbidden to call any function of the OCaml
runtime system from a noalloc function.

Explanation: ocamlopt-generated code caches in registers some global
variables of importance to the OCaml runtime system, such as the
current allocation pointer.

When calling a regular (no-"noalloc") C function from OCaml, these
global variables are updated with the cached values so that everything
goes well if the C function allocates, triggers a GC, or releases the
global lock (enabling a context switch).

This updating is skipped when the C function has been declared
"noalloc" -- this is why calls to "noalloc" functions are slightly
faster. The downside is that the runtime system is not in a
functioning state while within a "noalloc" C function, and must
therefore not be invoked.

The cost of updating global variables is small, so "noalloc" makes
sense only for short-running C functions (say, < 100 instructions) like
those from the math library (sin, cos, etc). If the C function makes
significant work (1000 instructions or more), just play it safe and
don't declare it "noalloc".

Hope this helps,

- Xavier Leroy

Markus Mottl

unread,
Mar 30, 2010, 12:20:06 PM3/30/10
to Xavier Leroy, Savonet's developpers list, Caml Mailing List, david....@ens-lyon.org
On Tue, Mar 30, 2010 at 11:57, Xavier Leroy <Xavier...@inria.fr> wrote:
> Yes. �Actually, it is forbidden to call any function of the OCaml
> runtime system from a noalloc function.

It may not always be clear to developers whether a function provided
by the OCaml API is safe. E.g. calling Val_int is fine (at least now
and for the foreseeable future), but caml_copy_string is not. I agree
that people should generally avoid noalloc. The speed difference is
clearly negligible in almost all practical cases.

Note, too, that sometimes people forget that they had declared a
previously safe function as "noalloc", but later change the C-code in
ways that breaks this property. The tiny extra performance may not be
worth that risk.

Regards,
Markus

--
Markus Mottl http://www.ocaml.info markus...@gmail.com

0 new messages