I’m having to rethink slices a bit.
What’s a slice? Well, let’s try this definition:
“A slice is any compact subset of a totally ordered set.”
For the integers, including +/- infinity, with the usual ordering, there are exactly five
classes of slice:
* the empty set
* all the integers
* all the integers above and including N
* all the integers below and including X
* a subrange N to X inclusive
Note I had to throw in +/- infinity as upper and lower bounds.
Set intersection is a binary operation on the set of all slices.
(the intersection of two slices is a slice).
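To make the closure property concrete, here is a minimal C++ sketch (the modelling and names are mine, not Felix’s): a slice of the integers is a closed interval, with INT_MIN/INT_MAX standing in for -/+ infinity, and intersection just takes the larger lower bound and the smaller upper bound.

#include <algorithm>
#include <climits>
#include <cstdio>

// A slice modelled as a closed interval [lo, hi]; lo > hi encodes
// the empty slice, INT_MIN/INT_MAX stand in for -/+ infinity.
struct Slice { int lo, hi; };

bool is_empty(Slice s) { return s.lo > s.hi; }

// The intersection of two slices is again a slice.
Slice intersect(Slice a, Slice b) {
  return Slice{ std::max(a.lo, b.lo), std::min(a.hi, b.hi) };
}

int main() {
  Slice from5 = { 5, INT_MAX };     // all integers >= 5
  Slice to10  = { INT_MIN, 10 };    // all integers <= 10
  Slice r = intersect(from5, to10); // the subrange 5..10
  std::printf("%d..%d empty=%d\n", r.lo, r.hi, (int)is_empty(r));
  return 0;
}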
In computing, integer representations like “int” or “long” etc can be
considered as subranges of the integers, so each such type can itself be
considered a slice. In this case, the min and max values the type
can represent are the ends of the slice.
The current representation is:
union slice[T] =
| Slice_all
| Slice_from of T
| Slice_from_counted of T * T /* second arg is count */
| Slice_to_incl of T
| Slice_to_excl of T
| Slice_range_incl of T * T
| Slice_range_excl of T * T
| Slice_one of T
| Slice_none
;
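For readers more comfortable with C++, here is a rough std::variant mirror of that union, with T fixed to int (the struct names just follow the Felix constructors; this is a sketch, not the Felix runtime’s actual representation):

#include <variant>

struct Slice_all {};
struct Slice_from         { int start; };
struct Slice_from_counted { int start; int count; };
struct Slice_to_incl      { int end; };
struct Slice_to_excl      { int end; };
struct Slice_range_incl   { int start; int end; };
struct Slice_range_excl   { int start; int end; };
struct Slice_one          { int index; };
struct Slice_none {};

using slice = std::variant<
  Slice_all, Slice_from, Slice_from_counted,
  Slice_to_incl, Slice_to_excl,
  Slice_range_incl, Slice_range_excl,
  Slice_one, Slice_none>;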
Now the problem I’m trying to solve is this:
At present you can do this:
astring.[slice]
and it calculates a substring:
fun subscript (x:string, s:slice[int]):string =>
match s with
| #Slice_all => substring (x, 0, x.len.int)
| Slice_from (start) => copyfrom (x, start)
| Slice_to_incl (end) => copyto (x, end + 1)
| Slice_to_excl (end) => copyto (x, end)
| Slice_range_incl (start, end) => substring (x, start, end + 1)
| Slice_range_excl (start, end) => substring (x, start, end)
| Slice_from_counted (start, count) => substring (x,start, start + count)
| Slice_one (index) => string x.[index]
endmatch
;
However it cheats!
fun substring: string * !ints * !ints -> string =
"::flx::rtl::strutil::substr($1:assign,$2:assign,$3:assign)" requires package "flx_strutil";
// normalise string positions Python style
// note substr requires 0<=b<=size, 0<=n,
// however n>size is OK
template<class T>
basic_string<T> substr(basic_string<T> const &s, int b, int e)
{
  int n = s.size();
  if(b<0) b=b+n;
  if(b<0) b=0;
  if(b>=n) b=n;
  if(e<0) e=e+n;
  if(e<0) e=0;
  if(e>=n) e=n;
  int m = e-b;
  if(m<0) m=0;
  return s.substr(b,m);
}
The idea is from Python: -1 is the last char in the string,
-2 is the second last, etc. The reason is that to get the tail of a string,
you have to refer to the end of the string and work backwards.
The algorithm above allows negative indices: it translates them
to positive ones by adding on the length of the string,
and then clips the resulting “slice” by intersecting it with the
slice 0..n-1 where n is the string length. At least I hope it does :)
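For concreteness, here is a small test harness I would expect to pass against the substr template above (the harness and its expectations are mine, not part of flx_strutil):

#include <string>
#include <cassert>

// assumes the substr<> template shown above is in scope
int main() {
  std::string s = "hello";            // length 5
  assert(substr(s, 1, 3)   == "el");  // ordinary non-negative positions
  assert(substr(s, -3, -1) == "ll");  // Python style: s[-3:-1]
  assert(substr(s, 2, 100) == "llo"); // end position clipped to the length
  assert(substr(s, 4, 2)   == "");    // backwards range clips to empty
  return 0;
}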
At the moment you cannot slice arrays! So what I was looking
at was slicing a varray to a varray, darray, bsarray, or sarray.
A slice of an farray (a T^N kind of array) can’t be an farray because
the arguments of the slice are not known at compile time,
so if it were done the result would have to be at least a darray.
I was also thinking about slicing lists.
Also, on top of that, comprehensions with slices, that is,
building a varray, etc, or a list from a slice (that is,
a list comprehension based on a slice would be a list
of integers in the range of the slice). For example:
([ 1..3 ]) == ([1,2,3])
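As a rough illustration of what such a comprehension would compute, here is a C++ sketch that materialises an inclusive integer range into a container (the function name is mine; this is not the proposed Felix feature):

#include <cstddef>
#include <numeric>
#include <vector>

// Build the list of integers lo..hi inclusive; an empty range
// (lo > hi) yields an empty container.
std::vector<int> range_incl(int lo, int hi) {
  if (lo > hi) return {};
  std::vector<int> v(static_cast<std::size_t>(hi - lo) + 1);
  std::iota(v.begin(), v.end(), lo);  // lo, lo+1, ..., hi
  return v;
}
// e.g. range_incl(1, 3) yields {1, 2, 3}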
The “negative indices” thing worries me. The math doesn’t feel right.
There’s another problem. Given a slice over int, suppose we want
to “convert” that to, say, long or size. The union representation used
at the moment handles that fine provided the conversions go through.
More generally, the idea would be to define the result as the
intersection of the theoretical slice with the slice of the subrange
of integers supported by the target type. So the math is well founded,
even if the target is an unsigned type.
Now consider a slice of a varray. Obviously the definition is:
the elements at the positions given by the intersection of the
index range of the array, 0..n-1 where n is the length of the array,
with the slice converted to type size.
The conversion is tricky! If the slice contains, say, the integer value -42,
we cannot just convert it to unsigned because that would be
#maxval[size] - 41uz
which is a rather large number. Actually we have to clip the -42
to zero.
The problem is that #minval[T] and #maxval[T] are cleverly
well defined because they’re of type T. So promotions and demotions
are kind of tricky!
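A minimal sketch of that clipping in C++, assuming a 64-bit size_t (the function name is mine; Felix’s actual conversion machinery is not shown here):

#include <cstddef>

// Convert a signed index to size_t by intersecting with the subrange
// size_t can represent: negative values clip to 0 instead of
// wrapping around to a huge unsigned value.
inline std::size_t clip_to_size(long long i) {
  if (i < 0) return 0;                 // below the target's minval: clip
  return static_cast<std::size_t>(i);  // non-negative values pass through
}
// e.g. clip_to_size(-42) == 0, not SIZE_MAX - 41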
The core problem of negative indices is that given a string *expression*
we want to grab the tail of, there is no “logical” representation for
the end of the string. You would have to write this with only non-negative
indices:
let s = expr in
let n = len s in
let first = if first < 0 then n + first else first endif in
let last = if last < 0 then n + last else last endif in
s.[first..last]
The problem is that we don’t know the length of the string, and it’s hard
to know it because it’s an expression. That’s why the substring function
“cheats” by hacking the slice: by that point the function has an actual
string to take the length of. The problem is that
-3 .. -1
is a valid slice: -3,-2,-1 and its intersection with the range of the string
is always empty. The “cheat” depends on a second way of indexing
strings from the end, using negative numbers, but the WRONG
results are calculated for
s.[3 .. -1]
The calculation I’m using yields elements 3,4,5…n-1, but the
slice 3..-1 is actually empty!
What I need is a way to get stuff from the end of a string or varray
or whatever which isn’t hacked!
C++ uses a special value for the end which is the maxval of type size_t:
static const size_t npos = -1;
But this is a hack too.
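For reference, this is how that sentinel shows up in the standard std::string API (plain standard library calls, nothing Felix-specific):

#include <cassert>
#include <string>

int main() {
  std::string s = "hello world";
  // npos used as a count means "to the end of the string"
  assert(s.substr(6, std::string::npos) == "world");
  // npos is also the "not found" sentinel returned by find()
  assert(s.find('z') == std::string::npos);
  return 0;
}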
—
John Skaller
ska...@internode.on.net