Difficulties with performance in ATS

160 views
Skip to first unread message

Vanessa McHale

unread,
Jul 19, 2018, 10:37:13 PM7/19/18
to ats-lan...@googlegroups.com
Hi all,

I have been working on an implementation of the Levenshtein distance
between two strings. I got a working implementation, but it is
significantly slower than the C version, viz.

#define MIN3(a, b, c) ((a) < (b) ? ((a) < (c) ? (a) : (c)) : ((b) < (c)
? (b) : (c)))

// taken from here:
https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#C
int levenshtein(char *s1, char *s2) {
    unsigned int s1len, s2len, x, y, lastdiag, olddiag;
    s1len = strlen(s1);
    s2len = strlen(s2);
    unsigned int column[s1len+1];
    for (y = 1; y <= s1len; y++)
        column[y] = y;
    for (x = 1; x <= s2len; x++) {
        column[0] = x;
        for (y = 1, lastdiag = x-1; y <= s1len; y++) {
            olddiag = column[y];
            column[y] = MIN3(column[y] + 1, column[y-1] + 1, lastdiag +
(s1[y-1] == s2[x-1] ? 0 : 1));
            lastdiag = olddiag;
        }
    }
    return(column[s1len]);

This is the ATS:

staload "prelude/SATS/string.sats"

fun min_3(x : int, y : int, z : int) : int =
  min(x, (min(y, z)))

fun bool2int(x : char, y : char) : int =
  if x = y then
    0
  else
    1

// Ported over from
https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#C
fn levenshtein {m:nat}{n:nat}(s1 : string(m), s2 : string(n)) : int =
  let
    val s1_l: size_t(m) = length(s1)
    val s2_l: size_t(n) = length(s2)
    val column: arrszref(int) = arrszref_make_elt(s1_l + 1, 0)
   
    fun loop1 { i : nat | i <= m } .<i>. (i : int(i)) : void =
      case+ i of
        | 0 => ()
        | i =>> {
          val () = column[i] := i
          val () = loop1(i - 1)
        }
   
    val () = loop1(sz2i(s1_l))
   
    fun loop2 { i : nat | i > 0 && i <= n+1 } .<n-i+1>. (x : int(i)) :
void =
      if x <= sz2i(s2_l) then
        {
          val () = column[0] := x
          val () = let
            fun inner_loop { j : nat | j > 0 && j <= m+1 } .<m-j+1>. (y
: int(j), last_diag : int) :
              void =
              if y <= sz2i(s1_l) then
                let
                  var old_diag = column[y]
                  val () = column[y] := min_3( column[y] + 1
                                             , column[y - 1] + 1
                                             , last_diag + bool2int(s1[y
- 1], s2[x - 1])
                                             )
                in
                  inner_loop(y + 1, old_diag)
                end
          in
            inner_loop(1, x - 1)
          end
          val () = loop2(x + 1)
        }
   
    val () = loop2(1)
  in
    column[s1_l]
  end

I've benchmarked them both, and the C version runs about 3 times faster
(!). Would anyone happen to have any insight into these? I may be doing
the benchmarking wrong but I believe it is something related to how I am
using arrays in ATS.

The full source code is at https://github.com/vmchale/edit-distance

Thanks,
Vanessa

signature.asc

Julian Fondren

unread,
Jul 19, 2018, 11:15:06 PM7/19/18
to ats-lang-users
On Thursday, July 19, 2018 at 9:37:13 PM UTC-5, Vanessa McHale wrote:

I've benchmarked them both, and the C version runs about 3 times faster
(!). Would anyone happen to have any insight into these? I may be doing
the benchmarking wrong but I believe it is something related to how I am
using arrays in ATS.


Consider:

#include "share/atspre_staload.hats"

implement main0() =
        let
                val x = (arrszref)$arrpsz{int}(1, 2, 3, 4)
        in
                println!(x[4]);
        end

As used:

$ patscc -DATS_MEMALLOC_LIBC -o arrsz arrszref.dats -latslib
$ ./arrsz
exit(ATS): uncaught exception at run-time:
/usr/local/lib/ats2-postiats-0.3.11/prelude/SATS/array.sats:ArraySubscriptExn(60)

Part of it may be that you're using a runtime-checked array type. 

Artyom Shalkhakov

unread,
Jul 20, 2018, 12:26:31 AM7/20/18
to ats-lang-users
Hi Vanessa,

I guess we could improve as follows:

1. just like in C, initialize the array only once (use array_ptr_alloc & array_initize)
2. replace the use of tail-rec functions with while loops :) (in classic C-style ATS)
3. maybe use alloca instead of malloc (since the C example does just that)

gmhwxi

unread,
Jul 20, 2018, 12:52:33 AM7/20/18
to ats-lang-users

Please use the following line to replace the corresponding one
in your code:

   val column: arrayref(int, m+1) = arrayref_make_elt(s1_l + 1, 0)

See how much difference it makes.

This is an example where you may actually have a good chance of
writing type-safe ATS code to beat the C code. It all depends on how well
the C compiler can perform the so-called strength-reduction optimization.

Vanessa McHale

unread,
Jul 20, 2018, 1:57:53 AM7/20/18
to ats-lan...@googlegroups.com

Thanks! That seems to do it. The ATS is now taking around 120ns to the C code's 100ns. I will have to take a look at for/while loops next :)

--
You received this message because you are subscribed to the Google Groups "ats-lang-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ats-lang-user...@googlegroups.com.
To post to this group, send email to ats-lan...@googlegroups.com.
Visit this group at https://groups.google.com/group/ats-lang-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/ats-lang-users/05e8f028-7b43-4ae8-9cf2-adf2bd01b1d0%40googlegroups.com.
signature.asc

Vanessa McHale

unread,
Jul 20, 2018, 3:22:47 AM7/20/18
to ats-lan...@googlegroups.com

I'll have to take a look at for/while loops :). With Hongwei's suggestion it's nearly as fast as the C code (and it beats two common Rust libraries), but I wouldn't mind squeezing the last little bit out just for bragging rights (slash ATS advertising).

--
You received this message because you are subscribed to the Google Groups "ats-lang-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ats-lang-user...@googlegroups.com.
To post to this group, send email to ats-lan...@googlegroups.com.
Visit this group at https://groups.google.com/group/ats-lang-users.
signature.asc

Artyom Shalkhakov

unread,
Jul 20, 2018, 4:03:41 AM7/20/18
to ats-lang-users
On Friday, July 20, 2018 at 1:22:47 PM UTC+6, Vanessa McHale wrote:

I'll have to take a look at for/while loops :). With Hongwei's suggestion it's nearly as fast as the C code (and it beats two common Rust libraries), but I wouldn't mind squeezing the last little bit out just for bragging rights (slash ATS advertising).



Let's do it! Seriously, that's such a good opportunity. :-) I'd like to help you, if you don't mind. I could take e.g. the array allocation/deallocation and initialization parts.

What do you think?

Artyom Shalkhakov

unread,
Jul 20, 2018, 4:33:17 AM7/20/18
to ats-lang-users
On Friday, July 20, 2018 at 1:22:47 PM UTC+6, Vanessa McHale wrote:

I'll have to take a look at for/while loops :). With Hongwei's suggestion it's nearly as fast as the C code (and it beats two common Rust libraries), but I wouldn't mind squeezing the last little bit out just for bragging rights (slash ATS advertising).



Is this any faster?


The array is now initialized only once.

gmhwxi

unread,
Jul 20, 2018, 9:39:58 AM7/20/18
to ats-lang-users

This is the strength-reduction I talked about.
If it is significantly faster, then we can rewrite the
code to make it type-safe:

    fun loop2 { i : nat | i > 0 && i <= n+1 } .<n-i+1>. (x : int(i)) : void =
      if x <= sz2i(s2_l) then
        {
          val () = column[0] := x
          val p0 = arrayref2ptr(column)
          val p1 = ptr_succ<int>(p0)

          val () = let
            fun
            inner_loop
            { j : nat | j > 0 && j <= m+1 } .<m-j+1>.
            (y : int(j), p0: ptr, p1: ptr, last_diag : int) : void =

              if y <= sz2i(s1_l) then
        let
                  fun min_3(x : int, y : int, z : int) : int = min(x, (min(y, z)))

                  val c0 = $UN.ptr0_get<int>(p0)
                  val c1 = $UN.ptr0_get<int>(p1)
                  val c1_new = min_3(c0+1, c1+1, last_diag + bool2int(s1[y - 1]=s2[x - 1]))
                  val () = $UN.ptr0_set<int>(p1, c1_new)
                in
                  inner_loop(y + 1, p1, ptr_succ<int>(p1), c1)
        end
       
            inner_loop(1, p0, p1, x - 1)
To unsubscribe from this group and stop receiving emails from it, send an email to ats-lang-users+unsubscribe@googlegroups.com.
To post to this group, send email to ats-lang-users@googlegroups.com.

Vanessa McHale

unread,
Jul 20, 2018, 8:06:18 PM7/20/18
to ats-lan...@googlegroups.com

Yes indeed! This seems to be as fast as the C code. I can't find any substantial difference, at least. I'll push these changes and take a look at Hongwei's comments next :)

Thanks!

signature.asc

Vanessa McHale

unread,
Jul 20, 2018, 8:18:53 PM7/20/18
to ats-lan...@googlegroups.com
It doesn't seem to affect things in a way that I can measure, though
thankfully with Artyom's suggestions the code is now as fast as C (or at
least, I can't reliably measure any difference in the two) :)
signature.asc

Artyom Shalkhakov

unread,
Jul 20, 2018, 10:18:54 PM7/20/18
to ats-lang-users
Great to hear that you got on par with C performance-wise!

One other thing I'd like to draw attention to is allocation of [column]: in C code, it is via [alloca], so it gets freed as soon as the control exits the function, but in ATS code, [malloc] (or some such) is used, and then, a cast from [arrayptr] to [arrayref] means we are leaving it to GC to deallocate [column].

I suggest introducing a function that takes uninitialized [column] as parameter, and performs the rest of the computation. Then [levenshtein] is tasked with allocation and freeing the array. Plus, somebody could plug another allocation strategy this way.

There was a thread about using [alloca] in ATS somewhere.

сб, 21 июл. 2018 г., 6:18 Vanessa McHale <vanessa...@iohk.io>:
--
You received this message because you are subscribed to the Google Groups "ats-lang-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ats-lang-user...@googlegroups.com.
To post to this group, send email to ats-lan...@googlegroups.com.

gmhwxi

unread,
Jul 20, 2018, 11:43:37 PM7/20/18
to ats-lang-users
Could you test the code below:


This time I use some templates to do array initialization.

Vanessa McHale

unread,
Jul 21, 2018, 10:44:15 PM7/21/18
to ats-lan...@googlegroups.com

It does! Went from around 100ns to 67ns :) The code I'm using is below; I believe it is safe but I may have neglected something that others will point out. It's now faster than the C implementation!

extern
fun alloca {n:int}(bsz : size_t(n)) :<!wrt> [l:agz] (b0ytes(n) @ l | ptr(l)) =
  "mac#"

extern
castfn arrayptr_alloca_encode : {a:vt0p} {l:addr} {n:int} (array_v(INV(a), l, n) | ptr(l)) -<0> arrayptr(a, l, n)

extern
fun {a:vt0p} array_ptr_alloca {n:int} (asz : size_t(n)) :<!wrt> [l:agz] (array_v(a?, l, n) | ptr(l))

implement {a} array_ptr_alloca {n} (asz) =
  let
    val [l:addr](pf | p) = alloca(asz * sizeof<a>)
    prval pf = __assert(pf) where
    { extern
      praxi __assert(pf : b0ytes(n*sizeof(a)) @ l) : array_v(a?, l, n) }
  in
    (pf | p)
  end

In context: https://github.com/vmchale/edit-distance/blob/master/DATS/edit-distance.dats

signature.asc

gmhwxi

unread,
Jul 22, 2018, 1:06:31 AM7/22/18
to ats-lang-users

I think that a big part of this result is due to the input being small.
When tested on large input (say, strings containing 1000 chars), you
may see a different result.

By the way, my own test shows that the following code beats the C code
you presented by about 10% (when gcc is used):


I suspect that the gain is mostly due to the use of strength-reduction in
ATS code.
To unsubscribe from this group and stop receiving emails from it, send an email to ats-lang-users+unsubscribe@googlegroups.com.
To post to this group, send email to ats-lang-users@googlegroups.com.
--
You received this message because you are subscribed to the Google Groups "ats-lang-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ats-lang-users+unsubscribe@googlegroups.com.
To post to this group, send email to ats-lang-users@googlegroups.com.

Vanessa McHale

unread,
Jul 23, 2018, 1:28:27 PM7/23/18
to ats-lan...@googlegroups.com

I don't see any significant difference with the C code. This is much cleaner than Artyom's solution but it is still slower than using `alloca` (though perhaps more moral...)

--
You received this message because you are subscribed to the Google Groups "ats-lang-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ats-lang-user...@googlegroups.com.
To post to this group, send email to ats-lan...@googlegroups.com.
signature.asc

gmhwxi

unread,
Jul 24, 2018, 12:16:33 AM7/24/18
to ats-lang-users
I got the following numbers:

ATS:

real    0m0.898s
user    0m0.896s
sys     0m0.000s

C:

real    0m1.028s
user    0m1.024s
sys     0m0.000s

The input string I used consists of approx. 20K chars.

Using alloca is very risky. Debugging a run-time crash caused by alloca failure
could make pretty much anyone think twice about using alloca. In this case, you
can implement levenshtein as a template: using levenshtein$malloc to alloc memory.
The caller of levenshtein can then decide what malloc is to be used.
To unsubscribe from this group and stop receiving emails from it, send an email to ats-lang-users+unsubscribe@googlegroups.com.
To post to this group, send email to ats-lang-users@googlegroups.com.

Vanessa McHale

unread,
Jul 24, 2018, 10:54:46 AM7/24/18
to ats-lan...@googlegroups.com

Yes that seems sensible to me! Currently I use a macro to choose either `alloca` or `malloc`. I will check with larger input string; you may be correct about the strength reduction improving speed :)

To unsubscribe from this group and stop receiving emails from it, send an email to ats-lang-user...@googlegroups.com.
To post to this group, send email to ats-lan...@googlegroups.com.
signature.asc

Vanessa McHale

unread,
Jul 24, 2018, 11:45:28 AM7/24/18
to ats-lan...@googlegroups.com

Upon checking with a larger string, you are correct! The simpler ATS version is indeed faster.


On 07/23/2018 11:16 PM, gmhwxi wrote:
To unsubscribe from this group and stop receiving emails from it, send an email to ats-lang-user...@googlegroups.com.
To post to this group, send email to ats-lan...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ats-lang-users/f2ae030b-bee9-47f0-b93d-03f41a38b4fc%40googlegroups.com.

--



Vanessa McHale
Functional Compiler Engineer | Chicago, IL

Website: www.iohk.io
Twitter: @vamchale
PGP Key ID: 4209B7B5

Input
          Output

Twitter Github LinkedIn


This e-mail and any file transmitted with it are confidential and intended solely for the use of the recipient(s) to whom it is addressed. Dissemination, distribution, and/or copying of the transmission by anyone other than the intended recipient(s) is prohibited. If you have received this transmission in error please notify IOHK immediately and delete it from your system. E-mail transmissions cannot be guaranteed to be secure or error free. We do not accept liability for any loss, damage, or error arising from this transmission
signature.asc
Reply all
Reply to author
Forward
0 new messages