I have been working on an implementation of the Levenshtein distance
between two strings. I got a working implementation, but it is
significantly slower than the C version, viz.
#define MIN3(a, b, c) ((a) < (b) ? ((a) < (c) ? (a) : (c)) : ((b) < (c)
? (b) : (c)))
// taken from here:
https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#C
int levenshtein(char *s1, char *s2) {
unsigned int s1len, s2len, x, y, lastdiag, olddiag;
s1len = strlen(s1);
s2len = strlen(s2);
unsigned int column[s1len+1];
for (y = 1; y <= s1len; y++)
column[y] = y;
for (x = 1; x <= s2len; x++) {
column[0] = x;
for (y = 1, lastdiag = x-1; y <= s1len; y++) {
olddiag = column[y];
column[y] = MIN3(column[y] + 1, column[y-1] + 1, lastdiag +
(s1[y-1] == s2[x-1] ? 0 : 1));
lastdiag = olddiag;
}
}
return(column[s1len]);
This is the ATS:
staload "prelude/SATS/string.sats"
fun min_3(x : int, y : int, z : int) : int =
min(x, (min(y, z)))
fun bool2int(x : char, y : char) : int =
if x = y then
0
else
1
// Ported over from
https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#C
fn levenshtein {m:nat}{n:nat}(s1 : string(m), s2 : string(n)) : int =
let
val s1_l: size_t(m) = length(s1)
val s2_l: size_t(n) = length(s2)
val column: arrszref(int) = arrszref_make_elt(s1_l + 1, 0)
fun loop1 { i : nat | i <= m } .<i>. (i : int(i)) : void =
case+ i of
| 0 => ()
| i =>> {
val () = column[i] := i
val () = loop1(i - 1)
}
val () = loop1(sz2i(s1_l))
fun loop2 { i : nat | i > 0 && i <= n+1 } .<n-i+1>. (x : int(i)) :
void =
if x <= sz2i(s2_l) then
{
val () = column[0] := x
val () = let
fun inner_loop { j : nat | j > 0 && j <= m+1 } .<m-j+1>. (y
: int(j), last_diag : int) :
void =
if y <= sz2i(s1_l) then
let
var old_diag = column[y]
val () = column[y] := min_3( column[y] + 1
, column[y - 1] + 1
, last_diag + bool2int(s1[y
- 1], s2[x - 1])
)
in
inner_loop(y + 1, old_diag)
end
in
inner_loop(1, x - 1)
end
val () = loop2(x + 1)
}
val () = loop2(1)
in
column[s1_l]
end
I've benchmarked them both, and the C version runs about 3 times faster
(!). Would anyone happen to have any insight into these? I may be doing
the benchmarking wrong but I believe it is something related to how I am
using arrays in ATS.
The full source code is at https://github.com/vmchale/edit-distance
Thanks,
Vanessa
I've benchmarked them both, and the C version runs about 3 times faster
(!). Would anyone happen to have any insight into these? I may be doing
the benchmarking wrong but I believe it is something related to how I am
using arrays in ATS.
#include "share/atspre_staload.hats"
implement main0() =
let
val x = (arrszref)$arrpsz{int}(1, 2, 3, 4)
in
println!(x[4]);
end
$ patscc -DATS_MEMALLOC_LIBC -o arrsz arrszref.dats -latslib
$ ./arrsz
exit(ATS): uncaught exception at run-time:
/usr/local/lib/ats2-postiats-0.3.11/prelude/SATS/array.sats:ArraySubscriptExn(60)
Thanks! That seems to do it. The ATS is now taking around 120ns
to the C code's 100ns. I will have to take a look at for/while
loops next :)
--
You received this message because you are subscribed to the Google Groups "ats-lang-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ats-lang-user...@googlegroups.com.
To post to this group, send email to ats-lan...@googlegroups.com.
Visit this group at https://groups.google.com/group/ats-lang-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/ats-lang-users/05e8f028-7b43-4ae8-9cf2-adf2bd01b1d0%40googlegroups.com.
I'll have to take a look at for/while loops :). With Hongwei's
suggestion it's nearly as fast as the C code (and it beats two
common Rust libraries), but I wouldn't mind squeezing the last
little bit out just for bragging rights (slash ATS advertising).
--
You received this message because you are subscribed to the Google Groups "ats-lang-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ats-lang-user...@googlegroups.com.
To post to this group, send email to ats-lan...@googlegroups.com.
Visit this group at https://groups.google.com/group/ats-lang-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/ats-lang-users/946b3ea9-eb24-4cf5-bf24-9aab7a8d9383%40googlegroups.com.
I'll have to take a look at for/while loops :). With Hongwei's suggestion it's nearly as fast as the C code (and it beats two common Rust libraries), but I wouldn't mind squeezing the last little bit out just for bragging rights (slash ATS advertising).
I'll have to take a look at for/while loops :). With Hongwei's suggestion it's nearly as fast as the C code (and it beats two common Rust libraries), but I wouldn't mind squeezing the last little bit out just for bragging rights (slash ATS advertising).
To unsubscribe from this group and stop receiving emails from it, send an email to ats-lang-users+unsubscribe@googlegroups.com.
To post to this group, send email to ats-lang-users@googlegroups.com.
Yes indeed! This seems to be as fast as the C code. I can't find
any substantial difference, at least. I'll push these changes and
take a look at Hongwei's comments next :)
Thanks!
To view this discussion on the web visit https://groups.google.com/d/msgid/ats-lang-users/fccac5d3-9276-4410-bf95-4516ba0859f7%40googlegroups.com
--
You received this message because you are subscribed to the Google Groups "ats-lang-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ats-lang-user...@googlegroups.com.
To post to this group, send email to ats-lan...@googlegroups.com.
Visit this group at https://groups.google.com/group/ats-lang-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/ats-lang-users/108e32af-a016-2735-1307-9984df9a334f%40iohk.io.
It does! Went from around 100ns to 67ns :) The code I'm using is
below; I believe it is safe but I may have neglected something
that others will point out. It's now faster than the C
implementation!
extern
fun alloca {n:int}(bsz : size_t(n)) :<!wrt> [l:agz]
(b0ytes(n) @ l | ptr(l)) =
"mac#"
extern
castfn arrayptr_alloca_encode : {a:vt0p} {l:addr} {n:int}
(array_v(INV(a), l, n) | ptr(l)) -<0> arrayptr(a, l, n)
extern
fun {a:vt0p} array_ptr_alloca {n:int} (asz : size_t(n))
:<!wrt> [l:agz] (array_v(a?, l, n) | ptr(l))
implement {a} array_ptr_alloca {n} (asz) =
let
val [l:addr](pf | p) = alloca(asz * sizeof<a>)
prval pf = __assert(pf) where
{ extern
praxi __assert(pf : b0ytes(n*sizeof(a)) @ l) :
array_v(a?, l, n) }
in
(pf | p)
end
In context:
https://github.com/vmchale/edit-distance/blob/master/DATS/edit-distance.dats
To view this discussion on the web visit https://groups.google.com/d/msgid/ats-lang-users/CAKO6%3Dqga0%2Bvg5wAtbhVOC4zEBJZSTph6M%3DWAKJUsPiroZVEdZQ%40mail.gmail.com.
To unsubscribe from this group and stop receiving emails from it, send an email to ats-lang-users+unsubscribe@googlegroups.com.
To post to this group, send email to ats-lang-users@googlegroups.com.
Visit this group at https://groups.google.com/group/ats-lang-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/ats-lang-users/108e32af-a016-2735-1307-9984df9a334f%40iohk.io.
--
You received this message because you are subscribed to the Google Groups "ats-lang-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ats-lang-users+unsubscribe@googlegroups.com.
To post to this group, send email to ats-lang-users@googlegroups.com.
I don't see any significant difference with the C code. This is
much cleaner than Artyom's solution but it is still slower than
using `alloca` (though perhaps more moral...)
--
You received this message because you are subscribed to the Google Groups "ats-lang-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ats-lang-user...@googlegroups.com.
To post to this group, send email to ats-lan...@googlegroups.com.
Visit this group at https://groups.google.com/group/ats-lang-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/ats-lang-users/d46a35b0-6d9e-4edb-8315-aef1ece9bfc9%40googlegroups.com.
To unsubscribe from this group and stop receiving emails from it, send an email to ats-lang-users+unsubscribe@googlegroups.com.
To post to this group, send email to ats-lang-users@googlegroups.com.
Yes that seems sensible to me! Currently I use a macro to choose
either `alloca` or `malloc`. I will check with larger input
string; you may be correct about the strength reduction improving
speed :)
To unsubscribe from this group and stop receiving emails from it, send an email to ats-lang-user...@googlegroups.com.
To post to this group, send email to ats-lan...@googlegroups.com.
Visit this group at https://groups.google.com/group/ats-lang-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/ats-lang-users/f2ae030b-bee9-47f0-b93d-03f41a38b4fc%40googlegroups.com.
Upon checking with a larger string, you are correct! The simpler
ATS version is indeed faster.
To unsubscribe from this group and stop receiving emails from it, send an email to ats-lang-user...@googlegroups.com.
To post to this group, send email to ats-lan...@googlegroups.com.
Visit this group at https://groups.google.com/group/ats-lang-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/ats-lang-users/f2ae030b-bee9-47f0-b93d-03f41a38b4fc%40googlegroups.com.