During the Euler stuff I came upon this following sub-problem. It is elementary enough to be in a library somewhere. An area giving as (addr_start, addr_end) contains items of length ``len''. I want to get of the duplicates.
Sorting the items is pretty easy, qsort can handle that. Now eliminating the duplicates, we need only compare one item to the next. Something along compact (start, end, len -- end' ) The new end should be returned, of course.
Even using variables/locals, I find this difficult. (More difficult then e.g. making a table for the number of dividers of N, which sounds much less trivial.)
-- -- Albert van der Horst, UTRECHT,THE NETHERLANDS Economic growth -- like all pyramid schemes -- ultimately falters. albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
On May 24, 3:05 pm, Albert van der Horst <alb...@spenarnc.xs4all.nl> wrote:
> Even using variables/locals, I find this difficult. > (More difficult then e.g. making a table for the number of > dividers of N, which sounds much less trivial.)
It's not difficult if you have a library of sequence operations:
Slava Pestov wrote: > On May 24, 3:05 pm, Albert van der Horst <alb...@spenarnc.xs4all.nl> > wrote: >> Even using variables/locals, I find this difficult. >> (More difficult then e.g. making a table for the number of >> dividers of N, which sounds much less trivial.)
> It's not difficult if you have a library of sequence operations:
Nice filter, but doing that in-place is not difficult, either.
I give you a solution for characters in a string, and Albert can generalize it. It is supposed that the string is sorted, so it will only remove sequenes of the same character:
: uniquify ( addr u -- addr u' ) over >r bounds dup dup c@ 2swap 1+ ?DO I c@ tuck <> IF swap 1+ 2dup c! swap THEN LOOP drop r> swap over - ;
On May 24, 6:23 pm, Slava Pestov <sl...@jedit.org> wrote:
> On May 24, 4:36 pm, Bernd Paysan <bernd.pay...@gmx.de> wrote:
> > : uniquify ( addr u -- addr u' ) > > over >r bounds dup dup c@ 2swap 1+ ?DO > > I c@ tuck <> IF swap 1+ 2dup c! swap THEN LOOP > > drop r> swap over - ; > This is why stack-based languages have a reputation of unreadability.
``uniquify'' ... I dunno, that word looks pretty readable to me.
Its *definition* is underfactored, of course, and underfactored definitions are always hard to read, but
uniquify ( addr u -- addr u' ) trim duplicate characters from a sorted string, in place
> During the Euler stuff I came upon this following sub-problem. > It is elementary enough to be in a library somewhere. > An area giving as (addr_start, addr_end) contains items > of length ``len''. I want to get of the duplicates.
> Sorting the items is pretty easy, qsort can handle that. > Now eliminating the duplicates, we need only compare one > item to the next. > Something along compact (start, end, len -- end' ) > The new end should be returned, of course.
> Even using variables/locals, I find this difficult. > (More difficult then e.g. making a table for the number of > dividers of N, which sounds much less trivial.)
> -- > -- > Albert van der Horst, UTRECHT,THE NETHERLANDS > Economic growth -- like all pyramid schemes -- ultimately falters. > albert@spe&ar&c.xs4all.nl &=nhttp://home.hccnet.nl/a.w.m.van.der.horst
I didnt test this but it looks right (dont take my word for it though :)
assumes string is 0 terminated though, which is most un-forthlike :P
: remove-dups ( a1 --- ) dup begin begin 1+ 2dup c@ swap c@ <> until swap 1+ swap 2dup c@ dup rot c! 0= until ;
Bruce McFarling <agil...@netscape.net> wrote: >> This is why stack-based languages have a reputation of unreadability. > ... > Its *definition* is underfactored, of course, and underfactored > definitions are always hard to read, ...
So, could you try your hand at a properly factored definition. This is a good example of the kind of thing I never code cleanly in Forth.
Slava Pestov wrote: > On May 24, 4:36 pm, Bernd Paysan <bernd.pay...@gmx.de> wrote: >> : uniquify ( addr u -- addr u' ) >> over >r bounds dup dup c@ 2swap 1+ ?DO >> I c@ tuck <> IF swap 1+ 2dup c! swap THEN LOOP >> drop r> swap over - ;
> This is why stack-based languages have a reputation of unreadability.
I both agree and disagree with you in this case. It is not the kind of code I would normally write, especially if in a hurry to get the job done. Of course I could spend the time to pick it apart and see exactly what is going. But why bother? The word works. Some people, like Bernd, can probably glance at code like this and easily/quickly see exactly what makes it tick. I am not one of those people, even though I like and use Forth.
I will hang on to 'uniquify' because I can see where it could be useful to me in the future. (Thanks Bernd!).
> Slava Pestov wrote: > > On May 24, 4:36 pm, Bernd Paysan <bernd.pay...@gmx.de> wrote: > >> : uniquify ( addr u -- addr u' ) > >> over >r bounds dup dup c@ 2swap 1+ ?DO > >> I c@ tuck <> IF swap 1+ 2dup c! swap THEN LOOP > >> drop r> swap over - ;
> > This is why stack-based languages have a reputation of unreadability.
> I both agree and disagree with you in this case. It is not the kind of > code I would normally write, especially if in a hurry to get the job > done. Of course I could spend the time to pick it apart and see exactly > what is going (on). But why bother? The word works. Some people, like > Bernd, can probably glance at code like this and easily/quickly see > exactly what makes it tick. I am not one of those people, even though I > like and use Forth.
Having said that, I thought I should show the way I would normally attack this kind of problem (context is resource-rich pc emvironment). The Mops Neon-based object library has some powerful string handling methods. So I would start with that and come up with something like the following. It may not be as readable to you as your Factor solution. But then for *me* this is far more readable than the Factor code because I know nothing about Factor.
(Note that Mops string objects maintain 2 pointers into the string, position and limit, or pos and lim. Manipulation of these pointers is the key to using string objects.)
> Sorting the items is pretty easy, qsort can handle that. > Now eliminating the duplicates, we need only compare one > item to the next.
What if the original ordering must be preserved? By sorting, you've destroyed that ordering. You would now have to annotate each item so that you could later sort again. So the computational cost of this now requires an annotation pass and two sorts... if the original ordering must be preserved.
A less general way to deal with this would be to construct a parallel data structure that recorded the values. In the case where each item was relatively small (say, a 16-bit integer or a 8-bit character), then a bitmap is an obvious data structure. As you move through the list of items, check first if the item is set in the bitmap. If it is, then it's a duplicate. If it isn't, then mark the bitmap. No sorting is necessary and duplicate detection occurs in a single pass.
As the size of the items being checked increases, storing it as a bitmap gets more costly. At that point, you can then using hashing (ideally picking a hash algorithm that works well for the data set) or you can use a bitmap representation that exploits any sparseness inherent in the data. In the case of hashing, storing hash collisions in a list would be one strategy. In the case of a sparse bitmap, a radix trie might work well.
Removing the actual duplicates is a separate matter. I would use two pointers-- one for reading the current item, the other for where to write it. As long as both pointers are the same, there have been no duplicates and thus no reason to do a write. When the pointers are no longer the same, then a write is needed. This could be slightly more intelligent by delaying the write until the next duplicate, allowing a block move instead of an item-by-item move.
>> On May 24, 3:05 pm, Albert van der Horst <alb...@spenarnc.xs4all.nl> >> wrote: >>> Even using variables/locals, I find this difficult. >>> (More difficult then e.g. making a table for the number of >>> dividers of N, which sounds much less trivial.)
>> It's not difficult if you have a library of sequence operations:
>Nice filter, but doing that in-place is not difficult, either.
>I give you a solution for characters in a string, and Albert can generalize >it. It is supposed that the string is sorted, so it will only remove >sequenes of the same character:
>: uniquify ( addr u -- addr u' ) > over >r bounds dup dup c@ 2swap 1+ ?DO > I c@ tuck <> IF swap 1+ 2dup c! swap THEN LOOP > drop r> swap over - ;
The main problem is that the content cannot be placed on the stack they cannot be compared with <> etc. The above solution foregoes the technical problem. The algorithmic solution you gave is of course easy enough to come up with. Although it looks ugly enough even for this simple case.
-- -- Albert van der Horst, UTRECHT,THE NETHERLANDS Economic growth -- like all pyramid schemes -- ultimately falters. albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
>Slava Pestov wrote: >> On May 24, 4:36 pm, Bernd Paysan <bernd.pay...@gmx.de> wrote: >>> : uniquify ( addr u -- addr u' ) >>> over >r bounds dup dup c@ 2swap 1+ ?DO >>> I c@ tuck <> IF swap 1+ 2dup c! swap THEN LOOP >>> drop r> swap over - ;
>> This is why stack-based languages have a reputation of unreadability.
>I both agree and disagree with you in this case. It is not the kind of >code I would normally write, especially if in a hurry to get the job >done. Of course I could spend the time to pick it apart and see exactly >what is going. But why bother? The word works. Some people, like >Bernd, can probably glance at code like this and easily/quickly see >exactly what makes it tick. I am not one of those people, even though I >like and use Forth.
No it doesn't work for me. It is hard to understand that using it for what I want is very hard. (Compare this to C : pc++ , now pc point to a struct instead of a char, no sweat).
>I will hang on to 'uniquify' because I can see where it could be useful >to me in the future. (Thanks Bernd!).
>-Doug
-- -- Albert van der Horst, UTRECHT,THE NETHERLANDS Economic growth -- like all pyramid schemes -- ultimately falters. albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
mark4 <i4...@mailcity.com> wrote: >On May 24, 1:05 pm, Albert van der Horst <alb...@spenarnc.xs4all.nl> >wrote: >> During the Euler stuff I came upon this following sub-problem. >> It is elementary enough to be in a library somewhere. >> An area giving as (addr_start, addr_end) contains items >> of length ``len''. I want to get of the duplicates.
>> Sorting the items is pretty easy, qsort can handle that. >> Now eliminating the duplicates, we need only compare one >> item to the next. >> Something along compact (start, end, len -- end' ) >> The new end should be returned, of course.
>> Even using variables/locals, I find this difficult. >> (More difficult then e.g. making a table for the number of >> dividers of N, which sounds much less trivial.)
>> -- >> -- >> Albert van der Horst, UTRECHT,THE NETHERLANDS >> Economic growth -- like all pyramid schemes -- ultimately falters. >> albert@spe&ar&c.xs4all.nl &=nhttp://home.hccnet.nl/a.w.m.van.der.horst
>I didnt test this but it looks right (dont take my word for it >though :)
>assumes string is 0 terminated though, which is most un-forthlike :P
>: remove-dups ( a1 --- ) > dup > begin > begin > 1+ > 2dup c@ swap c@ <> > until > swap 1+ swap > 2dup c@ dup rot c! > 0= > until ;
To be exact the array contains data structures representing a circuit composed of identical condensors. It is 8 bytes, and nothing must be assumed about the content. :-(
Groetjes Albert
-- -- Albert van der Horst, UTRECHT,THE NETHERLANDS Economic growth -- like all pyramid schemes -- ultimately falters. albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
Bruce McFarling wrote: >> This is why stack-based languages have a reputation of unreadability.
> ``uniquify'' ... I dunno, that word looks pretty readable to me.
Wow! This is truly an amazing breakthrough! In the past, reading code meant... reading code. What a chore! Worrying about balancing control structures, thinking about how each word affects the stack, considering how words might screw around with the input stream, etc.
But now, thanks to the awesome power of simply redefining English words so they mean what we want, we've solved that problem. Now, readability is merely the ability to read the name of a definition and a comment stating the purpose of the definition!
In article <8a7_j.3356$%T3....@fe105.usenetserver.com>, John Passaniti <n...@JapanIsShinto.com> wrote:
>Albert van der Horst wrote: >> Sorting the items is pretty easy, qsort can handle that. >> Now eliminating the duplicates, we need only compare one >> item to the next.
>What if the original ordering must be preserved? By sorting, you've >destroyed that ordering. You would now have to annotate each item so >that you could later sort again. So the computational cost of this now >requires an annotation pass and two sorts... if the original ordering >must be preserved.
Sorry dude, I have here a set problem at hand. The order is irrelevant, but I can't afford to waste time on duplicates. <SNIP>
Groetjes Albert
-- -- Albert van der Horst, UTRECHT,THE NETHERLANDS Economic growth -- like all pyramid schemes -- ultimately falters. albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
>> On May 24, 3:05 pm, Albert van der Horst <alb...@spenarnc.xs4all.nl> >> wrote: >>> Even using variables/locals, I find this difficult. >>> (More difficult then e.g. making a table for the number of >>> dividers of N, which sounds much less trivial.)
>> It's not difficult if you have a library of sequence operations:
>Nice filter, but doing that in-place is not difficult, either.
>I give you a solution for characters in a string, and Albert can generalize >it. It is supposed that the string is sorted, so it will only remove >sequenes of the same character:
>: uniquify ( addr u -- addr u' ) > over >r bounds dup dup c@ 2swap 1+ ?DO > I c@ tuck <> IF swap 1+ 2dup c! swap THEN LOOP
So is this a bug in lina, or does it fail the first test I throw at it? (I expect 5 of course.)
The code is not transparent enough that it would be of any value as specification of an algorithm, or in this case to debug.
>-- >Bernd Paysan
Groetjes Albert
-- -- Albert van der Horst, UTRECHT,THE NETHERLANDS Economic growth -- like all pyramid schemes -- ultimately falters. albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
In article <8a7_j.3356$%T3....@fe105.usenetserver.com>, John Passaniti <n...@JapanIsShinto.com> wrote: <SNIP>
>Removing the actual duplicates is a separate matter. I would use two >pointers-- one for reading the current item, the other for where to >write it. As long as both pointers are the same, there have been no >duplicates and thus no reason to do a write. When the pointers are no >longer the same, then a write is needed. This could be slightly more >intelligent by delaying the write until the next duplicate, allowing a >block move instead of an item-by-item move.
That is a good description and much in line with my thoughts. Programming it in Forth is a different matter. I started with telling that I have technical difficulties.
Groetjes Albert.
-- -- Albert van der Horst, UTRECHT,THE NETHERLANDS Economic growth -- like all pyramid schemes -- ultimately falters. albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
>>Removing the actual duplicates is a separate matter. I would use two >>pointers-- one for reading the current item, the other for where to >>write it. As long as both pointers are the same, there have been no >>duplicates and thus no reason to do a write. When the pointers are no >>longer the same, then a write is needed. This could be slightly more >>intelligent by delaying the write until the next duplicate, allowing a >>block move instead of an item-by-item move.
>That is a good description and much in line with my thoughts. >Programming it in Forth is a different matter. >I started with telling that I have technical difficulties.
I did something I didn't want to do. I wrote it in C first then I translated it to Forth: I could write the c-code almost without thinking in mere minutes. Mechanically translating to Forth was also mere minutes.
-----------------------------------------
sometype *unique( sometype aap[], int len ) { sometype *ps; sometype *pd;
\ Sort from BEGIN to END. Leave new END (adr1, adr2 -- adr3) : doit >R begin DUP ( pd ps) DUP R@ < WHILE BEGIN DUP CELL+ R@ < IF DUP @ OVER CELL+ @ = ELSE 0 THEN WHILE CELL+ REPEAT SWAP OVER @ OVER ! CELL+ SWAP CELL+ REPEAT CELL+ ;
-----------------------------------------
Neither of the above code was debugged. Debugging the Forth code, was relatively easy:
--------------------------------------- : doit >R DUP BEGIN ( pd ps) DUP R@ < WHILE BEGIN DUP CELL+ R@ < IF DUP @ OVER CELL+ @ .S = ELSE 0 THEN .S WHILE CELL+ REPEAT SWAP OVER @ OVER ! CELL+ SWAP CELL+ REPEAT DROP RDROP ;
Now it turns out, as usual, that I didn't need this in the first place. :-(.
Groetjes Albert.
>Groetjes Albert.
>-- >-- >Albert van der Horst, UTRECHT,THE NETHERLANDS >Economic growth -- like all pyramid schemes -- ultimately falters. >albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
-- -- Albert van der Horst, UTRECHT,THE NETHERLANDS Economic growth -- like all pyramid schemes -- ultimately falters. albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
> The main problem is that the content cannot be placed > on the stack they cannot be compared with <> etc.
I don't see the big problem. If you can't place the content on the stack, place a pointer. And compare with whatever compare function you need. If you want this a library function like qsort, define that your array is an array of pointers to the actual structures, and that you have a deferred word for the actual comparison. This will compare for <= (whatever that is for your objects), and is used for both the qsort and the duplicate removal. You just have to make sure that the comparison order is the right one, i.e. if you expect a to be <= b, compare b <= a for duplicate removal (will only be true if b=a).
Albert van der Horst <alb...@spenarnc.xs4all.nl> wrote:
> Sorting the items is pretty easy, qsort can handle that. > Now eliminating the duplicates, we need only compare one > item to the next. > Something along compact (start, end, len -- end' ) > The new end should be returned, of course.
> Even using variables/locals, I find this difficult. > (More difficult then e.g. making a table for the number of > dividers of N, which sounds much less trivial.)
There are lots of sort algorithms available. If you find the right one, you can delete duplicates when the sort notices them. Then you don't have to sort them.
I started to do that. I found a simple comb-sort that was supposed to be pretty fast, and I looked at how to make simple modifications so it would close up duplicates instead of sorting them. Then I checked back here and found that you no longer wanted the routine at all. Oh well.
I'm not sure it would have been worth doing. For a long sequence you might have a long block move for every individual duplicate. The simplest way it would be a block move on average half as long as the whole sequence. Easy to add a little complication and make it a block move on average a quarter as long as the whole sequence. Maybe a different sort would work better.
Here's the combsort, in case anybody has any interest.
Bernd Paysan <bernd.pay...@gmx.de> writes: >Albert van der Horst wrote: >> The main problem is that the content cannot be placed >> on the stack they cannot be compared with <> etc.
>I don't see the big problem. If you can't place the content on the stack, >place a pointer. And compare with whatever compare function you need. If >you want this a library function like qsort, define that your array is an >array of pointers to the actual structures, and that you have a deferred >word for the actual comparison.
No! Don't use a global variable (like a deferred word) for that; it's not reentrant. Instead, pass the xt of the comparison word on the stack, or pass the right comparison word with the data (e.g., by using an OO extension).
On Sun, 25 May 2008 01:54:39 -0400, John Passaniti wrote: > Albert van der Horst wrote: >> Sorting the items is pretty easy, qsort can handle that. >> Now eliminating the duplicates, we need only compare one >> item to the next.
> What if the original ordering must be preserved? By sorting, you've > destroyed that ordering.
Errr. "abcbdecadec" Eliminate duplicates but preserve the original order? Impossible.
Bill Spight wrote: >> What if the original ordering must be preserved? By sorting, you've >> destroyed that ordering.
> Errr. "abcbdecadec" Eliminate duplicates but preserve the original order? > Impossible.
"abcde"
I apparently just did the impossible.
Here, allow me to use a different string:
"edecdbea"
Using my awesome powers of removing duplicates without sorting and without changing the order as I progress through the string, I get:
"edcba"
Later, Albert van der Horst restated the problem not as removing duplicates from an array, but removing duplicates from a set. As a set problem, order doesn't matter. But still, I continue to wonder why this discussion involves any sorting. It doesn't make the problem any easier and it invariably requires pointless memory moves of either the contents of the array or a set of addresses to that contents. My solution required only a single pass through the array and the creation of a parallel data structure to mark values that were previously seen.
John Passaniti <n...@JapanIsShinto.com> wrote Re: Elementary but surprisingly difficult.
> Bill Spight wrote: >>> What if the original ordering must be preserved? By sorting, you've >>> destroyed that ordering. >> Errr. "abcbdecadec" Eliminate duplicates but preserve the original order? >> Impossible. > "abcde" > I apparently just did the impossible.