On Jan 12, 2014, at 9:28 PM, Scott Vokes <
vok...@gmail.com> wrote:
>> IIRC, ? for draw and deal were added in k4 so all of your tests fail in k3.
>> you can maybe use something like the following (only showing the first test):
>> cv: _ci' 97 + 1000000 _draw 26 / allocate a vector of 10^6 random chars
>> \t do[100;?cv]
>> This gives 128ms with k, 4390ms with kona (latest).
> Well, I'm trying to speed up kona's existing interface. I don't have k3
> or k4 executables on hand. (And, for IP reasons, maybe I shouldn't...)
What I meant is that _draw (or _deal) is more portable.
>> I think just one loop on the input vector is needed. The loop body:
>>
>> v = x[i]; if (!seen[v]) { u[j++] = v; seen[v] = 1; }
>>
>> u is the same size as x but only [0..j) values are filled. You
>> don't even have to initialize u to all zeroes! seen[256] is
>> zeroed at the start.
>>
>> But is this case worth optimizing?
Sorry, I was talking about just the char array case.
> I did it in multiple passes because I figured that there was more
> benefit in having a short, appropriately sized vector of unique values
> rather than having an equally long one artificially truncated. It seems
> to me that unique/range is typically used to do things like grab the
> small set of unique keys that appear many times throughout a database
> column. Feel free to convince me otherwise, though.
My logic is, for short arrays it doesn't matter. For very large arrays,
it is better to walk them *once* rather than twice. And the output will always
be no longer than the input (and can never exceed the domain of the element
type -- so 256 entries for chars). For ints etc. you can allocate in smaller chunks
if space is a concern but my view is that assuming an array of the same
size can be allocated simplifies things (and if you don't have that space,
you run into other issues very quickly).
> But, seen[256] _does_ need to be zeroed; it's stack-allocated and
> likely to contain remnants of other stack frames. For the int
> variant, it looks like memory is zeroed when returned to the pool,
> but not when initially allocated. Am I misinterpreting unpool and
> the like in km.c?
I think one should not rely on the allocator zeroing memory as often
the array will be immediately filled with something more useful.
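Putting this subthread together (one pass over the input, output allocated at full input size then shrunk, and seen[] zeroed explicitly rather than trusting the allocator), here's a minimal standalone C sketch of the char case. The name unique_chars is mine for illustration, not Kona's:

```c
#include <stdlib.h>
#include <string.h>

/* Single-pass unique for a char vector. Returns a freshly allocated
 * buffer of the unique values in order of first appearance; *out_n
 * receives their count. The output is allocated at full input size
 * (it can never be longer) and shrunk once at the end. */
static unsigned char *unique_chars(const unsigned char *x, size_t n,
                                   size_t *out_n) {
    unsigned char seen[256];
    memset(seen, 0, sizeof seen);      /* stack memory is not zeroed for us */
    unsigned char *u = malloc(n ? n : 1);
    if (!u) return NULL;
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char v = x[i];
        if (!seen[v]) { u[j++] = v; seen[v] = 1; }
    }
    *out_n = j;
    unsigned char *shrunk = realloc(u, j ? j : 1);  /* return the slack */
    return shrunk ? shrunk : u;
}
```

Note u itself is never initialized, matching the point above: only the [0..j) prefix is ever read.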
> And, while char vectors are probably low priority to optimize, it's
> not making anything else slower, and adding that code allowed some
> to be deleted elsewhere. The int optimization seems more clearly
> beneficial, due to the many ways k uses ints.
>
>> The seen vector above becomes a hash table when the input domain is large.
>> Though we need a proper hashtable.
> Sure. Are you talking about behavior in k3/k4? I guess I don't
> have enough context. I could set up a temporary hash table later,
> at least for the sparse integer case.
No, just Kona. Currently dictionary lookup does a linear search.
This is fine for short dictionaries but not for large dictionaries.
As an example, I create a dictionary of 10K symbols mapping to
10K random numbers.
Kona:
\t x:10000#?`$'100000 5#_ci'97+500000 _draw 26
1328
\t d:.+(x;10000 _draw 100)
54
\t d[x]
2109
K3:
\t x:10000#?`$'100000 5#_ci'97+500000 _draw 26
85
\t d:.+(x;10000 _draw 100)
187
\t d[x]
115
Now the same for 20K symbols (K3 only):
\t x:20000#?`$'100000 5#_ci'97+500000 _draw 26
82
\t d:.+(x;20000 _draw 100)
827
\t d[x]
520
As you can see, doubling the dictionary quadrupled the lookup time,
as you'd expect from a linear search per lookup (the dictionary
creation time grows even faster! Interesting....).
For general group and unique operations, search should dominate.
So I was saying two things:
1. Generalize dictionaries -- allow any type as key, not just syms
2. For large dictionaries add something more to create a proper
hash table (so that lookup becomes approx O(1)).
This may be worth it if you are doing lots of database-type
operations. But 2. has to maintain the current semantics.
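To make point 2 concrete, here's a minimal open-addressing sketch in C of the kind of index that turns the linear scan into ~O(1) lookup. All names (slot, ht_put, ht_get) and the fixed power-of-two capacity are illustrative choices of mine, not Kona's; a real version would grow the table and would key on interned symbol pointers rather than rehashing strings:

```c
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

#define CAP 16384u  /* must be a power of two for the mask below */

typedef struct { const char *key; int val; } slot;

static uint32_t hash_str(const char *s) {           /* FNV-1a */
    uint32_t h = 2166136261u;
    for (; *s; s++) { h ^= (unsigned char)*s; h *= 16777619u; }
    return h;
}

/* Linear probing: scan from the hashed slot until we find the key
 * or an empty slot. (No deletion, so an empty slot ends the probe.) */
static void ht_put(slot *t, const char *k, int v) {
    uint32_t i = hash_str(k) & (CAP - 1);
    while (t[i].key && strcmp(t[i].key, k)) i = (i + 1) & (CAP - 1);
    t[i].key = k; t[i].val = v;                     /* insert or overwrite */
}

static int ht_get(const slot *t, const char *k, int missing) {
    uint32_t i = hash_str(k) & (CAP - 1);
    while (t[i].key) {
        if (!strcmp(t[i].key, k)) return t[i].val;
        i = (i + 1) & (CAP - 1);
    }
    return missing;
}
```

With 10K or 20K symbols this stays a couple of probes per lookup on average instead of scanning half the dictionary, so d[x] cost grows linearly with the number of queries rather than quadratically with dictionary size.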