Map is not tail recursive

Juan Jose Garcia Ripoll

unread,

Jan 11, 1999, 3:00:00 AM1/11/99

to Caml list

Hi,

I've had a look at the List package and it seems that it is not properly
tail recursive. Even more, for medium to large lists it exhausts the
stack. I would suggest either recoding it as

let map f a =
let domap f done todo =
match todo with
[] -> List.reverse done
| (x::xs) -> domap (f x)::done xs
in domap f [] a

Another possibility would be to introduce destructive operations such as
Scheme's setcdr! and setcar!. This would eliminate the need of using
List.reverse, at the cost of introducing some imperative style.

Please excuse any mistake -- I'm quite new to this language.

Regards

Juanjo

David McClain

unread,

Jan 12, 1999, 3:00:00 AM1/12/99

to Juan Jose Garcia Ripoll, Caml list

Juan got me thinking about this problem... So here is a solution:

external rplacd : 'a list -> 'a list -> unit = "rplacd"

let list_map fn lst =
(* A properly tail recursive definition of Map *)
match lst with
[] -> [] (* OCAML represents [] specially - can't rplacd *)
| h :: t ->
let rslt = [fn h] in
let rec iter lst tail =
match lst with
[] -> rslt
| h :: t ->
let elt = [fn h] in
rplacd tail elt;
iter t elt
in
iter t rslt

-- and the external C code is

value rplacd(value cell, value item)
{
Store_field(cell,1,item);
return Val_unit;

Xavier Leroy

unread,

Jan 12, 1999, 3:00:00 AM1/12/99

to Juan Jose Garcia Ripoll, Caml list

> I've had a look at the List package and it seems that it is not properly
> tail recursive.

No, it is not. Like many other functions on lists.

> Even more, for medium to large lists it exhausts the
> stack. I would suggest either recoding it as

> [with List.reverse]

> Another possibility would be to introduce destructive operations such as
> Scheme's setcdr! and setcar!. This would eliminate the need of using
> List.reverse, at the cost of introducing some imperative style.

The solution using List.reverse has unacceptable run-time penalty. In
particular, it heap-allocates twice as much than the natural
implementation.

setcdr! for lists is not available in the language, since lists are
immutable. The underlying implementation model supports setcdr!
(at some cost, though, because of the generational garbage
collection), so we could have a "magic" implementation of List.map
that uses setcdr!, but it's always preferable to remain within what
the language can express.

Let me put this another way. There are some library functions for
which no "one size fits all" implementation exist: i.e. the
implementation favors one style of use over others. For List.map or
list concatenation, for instance, one has to choose between running at
full speed on small to medium-sized lists, or being tail-recursive and
handling arbitrarily large lists at a significant cost in speed and
clarity of implementation. I chose the former approach. So, if you
really need to handle large lists, you may have to write your own
(tail-rec and slow) map function. It's not really code duplication,
in that your map function will have quite different running behavior
than the one in the standard library.

I would also contend that if your program routinely manipulate
100000-element lists, then you're probably using the wrong data
structure anyway. But that's a different issue.

Regards,

- Xavier Leroy

Marc Rouaix

unread,

Jan 12, 1999, 3:00:00 AM1/12/99

to caml...@pauillac.inria.fr

I should have added that you can then write a function like this to choose your map function for you.

let general_map fn lst =
let n = List.length lst in
if n < 1000 then List.map fn lst
else jump_map (truncate (sqrt (float n))) fn lst

---
Marc

-----== Sent via Deja News, The Discussion Network ==-----
http://www.dejanews.com/ Easy access to 50,000+ discussion forums

William Chesters

unread,

Jan 12, 1999, 3:00:00 AM1/12/99

to caml...@pauillac.inria.fr

Xavier Leroy writes:
> I would also contend that if your program routinely manipulate
> 100000-element lists, then you're probably using the wrong data
> structure anyway. But that's a different issue.

Good, I was hoping you would say that! Just because things are
traditionally done in a certain way in FP textbooks doesn't mean it
actually makes sense to do them that way in real life.

People hardly ever use linked lists in C++ or Java, and the reasons
why mostly hold good in ocaml.

However, implementing a polymorphic `vector' (resizable array) in
ocaml requires a small amount of fancy footwork because of the lack of
a universal `null' value ... there might be a case for implementing
this as part of the underlying language.

French-like paraphrase:

Les livres sur PF utilises exclusivement la linked list, mai pour la
plupart des applications cette structure n'est pas la plus efficiente.
Alors on ne l'utilise presque point en C++ ou Java. Mais ce n'est pas
100% facile á faire une type `vector' en ocaml, parce ce que ocaml
manque un valeur "null".

Jacques GARRIGUE

unread,

Jan 13, 1999, 3:00:00 AM1/13/99

to caml...@pauillac.inria.fr

From: "David McClain" <dmcc...@azstarnet.com>

> Juan got me thinking about this problem... So here is a solution:
>
> external rplacd : 'a list -> 'a list -> unit = "rplacd"
>

> -- and the external C code is
>
> value rplacd(value cell, value item)
> {
> Store_field(cell,1,item);
> return Val_unit;
> }

While this is undocumented, there is a slightly simpler way to define
rplacd, wihout using C. This should be faster in most cases
(particularly if you inline it).

let rplacd (cell : 'a list) (item : 'a) =
Obj.set_field (Obj.repr cell) 1 (Obj.repr item)

As for using the null pointer to have more efficient representations
in data-structures, this is theoretically possible (and I believe that
Xavier Leroy had an implementation with it), but this is not in the
current version of ocaml.

Jacques

---------------------------------------------------------------------------
Jacques Garrigue Kyoto University garrigue at kurims.kyoto-u.ac.jp
<A HREF=http://wwwfun.kurims.kyoto-u.ac.jp/~garrigue/>JG</A>

Marc Rouaix

unread,

Jan 13, 1999, 3:00:00 AM1/13/99

to caml...@pauillac.inria.fr

It looks like this didn't get sent the first time. I made the suggestion of using a map function that maps chunks of the list at a time. Also, in a separate mail, I suggested using a map function that tested the length of the list and then chose what function to invoke, but I suppose that's a questionable strategy since finding the length of a list requires traversing it.

Anyway...

(* returns nth cons of a list, or [] if there is no such cons *)
let rec nth_tail ls n =
if n = 0 then ls else
match ls with
[] -> []
| x::xs -> nth_tail xs (n-1)

(* jump_map n should compute the same function as List.map for any n>0. But
jump_map n will use min(list_size, n+list_size/n) stack space whereas
List.map uses list_size stack space. jump_map costs an extra list
traversal compared to List.map. *)
let rec jump_map jump fn lst =
let rec do_a_chunk last stub chunk =
if chunk == last then stub else
(fn (List.hd chunk))::(do_a_chunk last stub (List.tl chunk))
in
if lst == [] then [] else
let last = nth_tail lst jump
in do_a_chunk last (jump_map jump fn last) lst

(* It may be a bad idea to use this because of the cost of List.length, but
you get the idea. *)

David McClain

unread,

Jan 14, 1999, 3:00:00 AM1/14/99

to caml...@pauillac.inria.fr, Jacques GARRIGUE

>(particularly if you inline it).

Is there a way to explicitly inline functions, other than cut and paste?

- DM

Juan Jose Garcia Ripoll

unread,

Jan 14, 1999, 3:00:00 AM1/14/99

to Caml list

David McClain wrote:
>
> >(particularly if you inline it).
>
> Is there a way to explicitly inline functions, other than cut and paste?

By watching OCaml work, and studying the compiler, I've seen it inline
many small functions, even when they are defined in other modules. I
pretty much like this, but I find only one minor inconvenience about the
way OCaml inlines. It is this one: say that you have

module Foo = struct
let ap_succ f n = f (n +. 1.0)
let test a =
let my_f x = cos x
in ap_succ my_f a
end

After inlining, the result is not a sort of
let test a = cos (n +. 1.0)
but something like
let my_f x = cos x
let test a = generic_apply my_f (a +. 1.0)
It is even worse if `my_f' uses, for instance, `a', because then a
closure is built, which would not be necessary. I think this could be
solved by making the `inliner' a bit smarter and re-inlining the
functions that are applied. This would make it attractive and efficient
to write small useful functions in separate modules.

Regards
Juanjo

--
Juan Jose Garcia Ripoll www: http://www.arrakis.es/~worm
Dpto. de Matematicas job: jjga...@ind-cr.uclm.es
E.T.S.I. Industriales home: wo...@arrakis.es
Univ. de Castilla-La Mancha, Ciudad Real E-13071 (Spain)