using kap to incrementally add rows to dict

Neeraj Rai

May 23, 2013, 7:59:56 AM

to kona...@googlegroups.com

From experimentation, it seems that I can't take a dict (a type-5 K struct) and add rows to it incrementally.

kap can be used to append rows of the same type to an existing vector.

I added a new page with my understanding of dict struct.

https://github.com/kevinlawler/kona/wiki/Csv-loading-and-dictionary

At the lowest level, it comes down to:

a single-element vector containing a header

a vector of n (= number of rows) elements containing the column values

When I tried to add rows incrementally, the data was corrupted.

Can someone confirm whether these observations are correct, or whether I am not using it right?

Kevin Lawler

May 23, 2013, 9:23:04 PM

to kona...@googlegroups.com

Dictionaries (5-types) are basically re-typed Lists (0-types) with
entries of a special form.

See make/unmake/makeable

https://github.com/kevinlawler/kona/wiki/Dictionaries
https://github.com/kevinlawler/kona/blob/master/vd.c#L279


Kevin Lawler

May 23, 2013, 9:26:05 PM

to kona...@googlegroups.com

The tricky case is that a dictionary entry with two elements, the
first necessarily being a symbol and the second also happening to be a
symbol, should be a symbol vector (type -4).

Tom Szczesny

May 24, 2013, 11:38:12 AM

to kona...@googlegroups.com

Neeraj:

1) Seems like you are headed in the direction of "q", with dictionaries as the basis for tables (flipped dictionaries).

In k, q and kdb what is the process for adding a whole new row to a table (or dictionary)?

2) Maybe you should open a new item in the issues list regarding this feature.

3) I plan to tinker with this more, but issues 198, 187 and 165 are currently higher on my priority list.

Neeraj Rai

May 25, 2013, 6:34:43 PM

to kona...@googlegroups.com

Hi Kevin,

I created a dict using h!c, where h was a list of headers and c was a list of lists of columns.

1. Would this result in the same calls you reference below (make)?

2. Is it possible to discuss the 3 lines from make here?

   Create a K dict, type=5, count=n (the number of cols):

      K z=newK(5,n);

   For each col, allocate a 3-element list (type=0, count=3):

      DO(n, kK(z)[i]=newK(0,3);)

   For each col (y), for each row of the col (y[i]), either ci the y[i] or assign it as a string?

      DO(n, x=kK(z)[i]; y=kK(a)[i]; DO2(y->n,kK(x)[j]=y->t?Ks(kS(y)[j]):ci(kK(y)[j])) if(y->n<3)kK(x)[2]=_n())

Tom Szczesny

May 26, 2013, 12:46:11 PM

to kona...@googlegroups.com

Adding a complete new row to a dictionary can be done using kona directly:

d:.((`a;1 2);(`b;1.1 2.2))
.((`a
1 2
)
(`b
1.1 2.2
))

d[`a`b]:d[!d],'(3;3.3)
(1 2 3
1.1 2.2 3.3)

d
.((`a
1 2 3
)
(`b
1.1 2.2 3.3
))

Bakul Shah

May 27, 2013, 3:48:28 PM

to kona...@googlegroups.com

On Sat, 25 May 2013 15:34:43 PDT Neeraj Rai <rneer...@gmail.com> wrote:
> I created a dict using h!c where h was a list of headers and c was a list of
> list of cols

The following is equivalent to h!c

.+(h;c)

In C I would do something like the following:

U(h->n == c->n && h->t == -4);
K z = gtn(5, 0);
DO(c->n, kap(&z, gnk(3, gs(KS(h)[i]), ci(KK(c)[i]), 0)));

Or

U(h->n == c->n);
K z = gtn(5, c->n);
DO(c->n, KK(z)[i]=gnk(3, gs(KS(h)[i]), ci(KK(c)[i]), 0));

using the just committed C-K API library. If your h list contains
strings instead of symbols, you can use gsk():

K z = gtn(5, 0);
DO(c->n, kap(&z, gsk(KK(h)[i], ci(KK(c)[i]))));

Given a dictionary d, you can append a row with the ,:
operator. For example:

d:.+(`a`b`c;(`x`y;20 30;2.2 3.3));

d[`a`b`c],:(`z;40;4.4);
d
.((`a
`x `y `z
)
(`b
20 30 40
)
(`c
2.2 3.3 4.4
))

[NB: d[],:(`z;40;4.4) doesn't work due to a bug]

In C code it would be something like the following:

// x points to d and y points to a row
I cc = x->n; // column count: one dictionary entry per column
U(cc == y->n);
DO(cc, kap(&KK(KK(x)[i])[1], &KK(y)[i]));

Though kap() is probably not smart enough and you need
this code to be type-aware.
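
A minimal sketch of what "type-aware" could mean here, written in plain C rather than against kona's headers. Column, Value, column_append and row_append are hypothetical names used only for illustration; the idea is simply that every value in the row is checked against its column's type before anything is appended.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical type tags and containers; these are not kona's K types. */
typedef enum { T_INT, T_FLOAT } Type;
typedef struct { Type t; union { int i; double f; } u; } Value;
typedef struct { Type t; size_t n; void *data; } Column;

static size_t elem_size(Type t) {
    return t == T_INT ? sizeof(int) : sizeof(double);
}

/* Append one value to a column; refuses a value of the wrong type. */
int column_append(Column *c, Value v) {
    assert(c != NULL);
    if (c->t != v.t) return -1;
    void *p = realloc(c->data, (c->n + 1) * elem_size(c->t));
    if (!p) return -1;
    c->data = p;
    if (c->t == T_INT) ((int *)p)[c->n] = v.u.i;
    else               ((double *)p)[c->n] = v.u.f;
    c->n++;
    return 0;
}

/* Append a row: one value per column.
   All types are checked first, so a mismatched row changes nothing.
   (An allocation failure part-way through can still leave a partial row.) */
int row_append(Column *cols, size_t ncols, const Value *row, size_t nvals) {
    if (ncols != nvals) return -1;
    for (size_t i = 0; i < ncols; i++)
        if (cols[i].t != row[i].t) return -1;
    for (size_t i = 0; i < ncols; i++)
        if (column_append(&cols[i], row[i]) != 0) return -1;
    return 0;
}
```

With an int column and a float column, row_append(cols, 2, row, 2) succeeds for a (int; float) row and returns -1 for a (float; float) row without touching either column.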

Neeraj Rai

May 28, 2013, 10:40:03 PM

to kona...@googlegroups.com

Hi Bakul,

I tried your suggestion and the incremental add works,

but I can't seem to make the C code work.

I create an empty dict and add 2 cols to it. The 1st is string, the 2nd is int.

When I add rows, it seems to get junk.

Can you please point out what is wrong with my code?

Appreciate any suggestions.

thanks

Neeraj

======= code for myincrdict

// gcc -g3 -fPIC -shared myadd.c -o myadd.so
//f: `"myadd.so" 2: (`myincrdict,1);
//f[1]
K myincrdict (K x)
{
K d=newK(5,0); // dictionary

// add 1st col
K c1=newK(0,3);
K r1=newK(0,0); // string
kS(c1)[0]=sp("col1"); kK(c1)[1]=r1; kK(c1)[2]=0;
kap(&d, &c1);

// add 2nd col
K c2=newK(0,3);
K r2=newK(-1,0); // int
kS(c2)[0]=sp("col2"); kK(c2)[1]=r2; kK(c2)[2]=0;
kap(&d, &c2);

// incrementally add rows
// 1st row
K e11 = newK(-4,5); strcpy(kC(e11), "hello1"); kap(&r1, &e11);
K e12 = newK(1,1) ; kI(e12)[0]=1 ; kap(&r2, &e12);

// 2nd row
K e21 = newK(-4,5); strcpy(kC(e21), "hello1"); kap(&r1, &e21);
K e22 = newK(1,1) ; kI(e22)[0]=1 ; kap(&r2, &e22);

Bakul Shah

May 28, 2013, 11:57:22 PM

to kona...@googlegroups.com

One suggestion is to use g*() functions instead of newK etc so
as to protect your C code from future implementation changes.

Working example:

K d = gtn(5,0);
K c0 = gtn(0,0);
K c1 = gtn(0,0);
kap(&d, gsk("a", c0));
kap(&d, gsk("b", c1));
kap(&c0, gp("hello1"));
kap(&c1, gi(1));
kap(&c0, gp("hello2"));
kap(&c1, gi(2));
show(d);

kap() doesn't seem to behave like k3 kap() when its first arg
is not a general list. If it did work that way, we would have

I i;
K d = gtn(5,0);
K c0 = gtn(0,0);
K c1 = gtn(-1,0);
kap(&d, gsk("a", c0));
kap(&d, gsk("b", c1));
kap(&c0, gp("hello1"));
i = 1; kap(&c1, &i);
kap(&c0, gp("hello2"));
i = 2; kap(&c1, &i);
show(d);

Bakul Shah

May 29, 2013, 12:54:36 AM

to kona...@googlegroups.com

On Tue, 28 May 2013 20:57:22 PDT Bakul Shah <ba...@bitblocks.com> wrote:
>
> kap() doesn't seem to behave like k3 kap() when its first arg
> is not a general list. If it did work that way, we would have
>
> I i;
> K d = gtn(5,0);
> K c0 = gtn(0,0);
> K c1 = gtn(-1,0);
> kap(&d, gsk("a", c0));
> kap(&d, gsk("b", c1));
> kap(&c0, gp("hello1"));
> i = 1; kap(&c1, &i);
> kap(&c0, gp("hello2"));
> i = 2; kap(&c1, &i);
> show(d);

Ok, I found the bug. Basically in km.c

extern K kap(K*a,V v){R kapn_(a,&v,1);}

needs to be changed to

extern K kap(K*a,V v){R kapn_(a,v,1);}

since v is already a ptr. Now the reason I am not making this
change is because kap is used elsewhere. Given its earlier
semantics, as a second arg objects of type K may have been
passed when they should be of type K*.

Fixing kap() would require the example code above to become

K d = gtn(5,0);
K c0 = gtn(0,0);
K c1 = gtn(-1,0);

K t;
K e0;
I e1;

t = gsk("a", c0); kap(&d, &t);
t = gsk("b", c1); kap(&d, &t);

e0 = gp("hello1"); kap(&c0, &e0);
e1 = 1; kap(&c1, &e1);

e0 = gp("hello2"); kap(&c0, &e0);
e1 = 2; kap(&c1, &e1);
show(d);

Bakul Shah

May 29, 2013, 1:17:20 PM

to kona...@googlegroups.com

Ok, kap() is fixed. All the tests pass. Probably a few more can
be enabled too. One remaining kap() difference from k3: k3
doesn't increment the refcount; kona does. In fact doing it
the k3 way will get rid of the cd() call made right after a
kap() call.
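
To make the refcount difference concrete, here is a small model in plain C. It is only a sketch: Obj, List, list_append_shared and list_append_owned are hypothetical names and do not appear in kona or k3. The "shared" variant takes a new reference on append, which is why the caller then has to drop its own (the cd() after kap()); the "owned" variant takes over the caller's reference, so no release is needed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object and list; they only model the idea. */
typedef struct { int refs; int value; } Obj;
typedef struct { size_t n; Obj **items; } List;

Obj *obj_new(int value) {
    Obj *o = malloc(sizeof *o);
    if (o) { o->refs = 1; o->value = value; }  /* creator holds the first reference */
    return o;
}

void obj_release(Obj *o) {
    if (o && --o->refs == 0) free(o);
}

/* Shared helper: store o in the list, optionally taking a new reference. */
static int list_append(List *l, Obj *o, int take_new_ref) {
    assert(l != NULL && o != NULL);
    Obj **p = realloc(l->items, (l->n + 1) * sizeof *p);
    if (!p) return -1;
    l->items = p;
    if (take_new_ref) o->refs++;
    l->items[l->n++] = o;
    return 0;
}

/* Append that adds its own reference: the caller must release theirs afterwards. */
int list_append_shared(List *l, Obj *o) { return list_append(l, o, 1); }

/* Append that takes over the caller's reference: no release afterwards. */
int list_append_owned(List *l, Obj *o) { return list_append(l, o, 0); }
```

After list_append_shared the object has two references until the caller calls obj_release; after list_append_owned it still has exactly one, held by the list.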

Will send out a push notification later today.

But first I should note a subtlety of kap() use. Originally
Neeraj had code like this:

> K e11 = newK(-4,5); strcpy(kC(e11), "hello1"); kap(&r1, &e11);
> K e12 = newK(1,1) ; kI(e12)[0]=1 ; kap(&r2, &e12);
>
> // 2nd row
> K e21 = newK(-4,5); strcpy(kC(e21), "hello1"); kap(&r1, &e21);
> K e22 = newK(1,1) ; kI(e22)[0]=1 ; kap(&r2, &e22);

This won't work because kap() can potentially reallocate the
object. r1 or r2 will point to the new data but the
corresponding dictionary entry is still pointing to the old
object (which gets recycled and may contain something else
or garbage). To fix this you can do one of two things: either
fix up the ptr later (how I update column 0 below) or pass the
appropriate ptr (column 1 below).

I i;
K d = gtn(5,0);
K c0 = gtn(0,0);
K c1 = gtn(-1,0);

K t0, t1, e;

t0 = gsk("a", c0); kap(&d,&t0);
t1 = gsk("b", c1); kap(&d,&t1);

e = gp("hello1"); kap(&c0,&e);
e = gp("hello2"); kap(&c0,&e);
KK(KK(d)[0])[1] = c0;

i = 1; kap(&KK(KK(d)[1])[1], &i);
i = 2; kap(&KK(KK(d)[1])[1], &i);
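
The same two fixes can be shown in a standalone model that does not depend on kona at all. This is a sketch under assumptions: Vec, Entry, vec_append and entry_append are hypothetical names, and realloc() stands in for whatever kap() does internally when it has to move an object.

```c
#include <assert.h>
#include <stdlib.h>

/* Header and payload live in one block, so growing the vector may move it. */
typedef struct { size_t n; int data[]; } Vec;

Vec *vec_new(void) {
    Vec *v = malloc(sizeof *v);
    if (v) v->n = 0;
    return v;
}

/* Append one int. Like kap(), this may move the object and update *pv. */
int vec_append(Vec **pv, int x) {
    assert(pv != NULL && *pv != NULL);
    Vec *v = realloc(*pv, sizeof *v + ((*pv)->n + 1) * sizeof(int));
    if (!v) return -1;
    v->data[v->n++] = x;
    *pv = v;   /* only the pointer passed in is updated, no other copy */
    return 0;
}

/* Stands in for a dictionary entry that holds a pointer to its column. */
typedef struct { const char *name; Vec *col; } Entry;

/* Fix 1: append through the entry's own slot, so the slot is what gets updated. */
int entry_append(Entry *e, int x) {
    return vec_append(&e->col, x);
}

/* Fix 2 (done by the caller): append through a local copy, then write it back:
       Vec *c = e.col;  vec_append(&c, x);  e.col = c;
   Until the write-back, e.col may point at freed memory and must not be used. */
```

Fix 1 corresponds to passing &KK(KK(d)[1])[1] above; fix 2 corresponds to the KK(KK(d)[0])[1] = c0 assignment.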

But kap will generally be slower. If you know the size
upfront, preallocate that much data and manually set things.

K c0 = gtn(0,2);
K c1 = gtn(-1,2);
KK(c0)[0] = gp("hello1"); KI(c1)[0] = 1;
KK(c0)[1] = gp("hello2"); KI(c1)[1] = 2;

K d = gtn(5,2);
KK(d)[0] = gsk("a", c0);
KK(d)[1] = gsk("b", c1);

When you are reading lots and lots of rows, kap() calls can be
expensive. An alternative is to buffer up a few locally and
then call kapn() on n entries. Will require adding kapn to the
API.
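
A rough sketch of the buffering idea in plain C, independent of kona. IntCol, Batcher, col_append_n, batch_push and batch_flush are hypothetical names; col_append_n plays the role that kapn() would play, growing the column once per batch instead of once per value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical int column; not a kona type. */
typedef struct { size_t n; int *data; } IntCol;

/* Append k values with a single reallocation. */
int col_append_n(IntCol *c, const int *src, size_t k) {
    assert(c != NULL);
    if (k == 0) return 0;
    int *p = realloc(c->data, (c->n + k) * sizeof *p);
    if (!p) return -1;
    memcpy(p + c->n, src, k * sizeof *p);
    c->data = p;
    c->n += k;
    return 0;
}

enum { BATCH = 64 };  /* arbitrary batch size chosen for the sketch */

typedef struct { IntCol *col; int buf[BATCH]; size_t used; } Batcher;

/* Move everything buffered so far into the column. */
int batch_flush(Batcher *b) {
    int rc = col_append_n(b->col, b->buf, b->used);
    if (rc == 0) b->used = 0;
    return rc;
}

/* Buffer one value locally, flushing first if the buffer is full. */
int batch_push(Batcher *b, int x) {
    if (b->used == BATCH && batch_flush(b) != 0) return -1;
    b->buf[b->used++] = x;
    return 0;
}
```

The caller pushes each parsed value with batch_push and calls batch_flush once at end of input; values still sitting in the buffer are not visible in the column until then.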

Neeraj Rai

May 29, 2013, 8:40:41 PM

to kona...@googlegroups.com

Hi Bakul,

This is encouraging progress. I understand why my code didn't work.

You have some interesting comments about reading (batching etc.), which are well appreciated.

I am still having trouble running your dict code.

I have pulled the latest version.

## to build libkona.a as per your c api wiki

>make lib

## build the code provided by you linked with libkona

>gcc -g3 -I . -L . bakuldict.c -o bakuldict -lkona -lm -ldl

It core dumps at line 10. The stack trace and code is listed below:

>>> code

#include "kona.h"
// make lib
// gcc -g3 -I . -L . bakuldict.c -o bakuldict -lkona -lm -ldl
K main (int _ac, char* _av)
{
K d = gtn(5,0);
K c0 = gtn(0,0);
K c1 = gtn(0,0);

kap(&d, gsk("a", c0));
kap(&d, gsk("b", c1));
kap(&c0, gp("hello1"));
kap(&c1, gi(1));
kap(&c0, gp("hello2"));
kap(&c1, gi(2));

show(d);
}

>> stack trace of core dump

Program received signal SIGSEGV, Segmentation fault.
0x000000000042d300 in sp (k=0x42e3c0 "a") at ks.c:21
21 N t=SYMBOLS, s=t->c[1],p=s,q=p,r; I a,x;
(gdb) bt
#0 0x000000000042d300 in sp (k=0x42e3c0 "a") at ks.c:21
#1 0x00000000004189b6 in gsk (s=<value optimized out>, k=0x7ffff7ffb000)
   at kapi.c:42
#2 0x00000000004024bd in main (_ac=1,
   _av=0x7fffffffdb88 "\n\337\377\377\377\177") at bakuldict.c:10

thanks

Neeraj

Bakul Shah

May 30, 2013, 2:14:55 AM

to kona...@googlegroups.com

On Wed, 29 May 2013 17:40:41 PDT Neeraj Rai <rneer...@gmail.com> wrote:
>
> I am still having trouble running your dict code.
> I have pulled the latest version.
> ## to build libkona.a as per your c api wiki
> >make lib
> ## build the code provided by you linked with libkona
> >gcc -g3 -I . -L . bakuldict.c -o bakuldict -lkona -lm -ldl
> It core dumps at line 10. The stack trace and code is listed below:

> kap(&d, gsk("a", c0));
> kap(&d, gsk("b", c1));
> kap(&c0, gp("hello1"));
> kap(&c1, gi(1));
> kap(&c0, gp("hello2"));
> kap(&c1, gi(2));

This was old code that didn't work as the old defn of kap()
was broken. I fixed it and brought it in line with k3 in my
last commit.

Please make sure you have read and understood my entire last message
(posted today @ 10:17AM). If something in there doesn't make
sense, ask and I will be happy to explain, but all the
information you need should be there. The worked out example
in it does exactly what you want. But the key thing is to
understand how to use kap() correctly. Also, there is code in
kapi-test.c that does exactly what you want.

I haven't updated ckapi.txt as yet and the example in it is
not correct. I will do so soon but until then for kap() see
the last message.

Neeraj Rai

May 30, 2013, 9:05:26 PM

to kona...@googlegroups.com

Hi Bakul,

The kapi-test in the latest version core dumps for me.

I am using centos 6, 2.6.32-220.17.1.el6.x86_64

gcc --version

gcc (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3)

make

make kapi-test

./kapi-test

--> core dumps

Please let me know if this is reproducible on your system.

Bakul Shah

May 30, 2013, 9:34:54 PM

to kona...@googlegroups.com

In kapi-test.c, right below

extern K KTREE;

Add

extern K X(S);

Neeraj Rai

May 31, 2013, 10:44:03 PM

to kona...@googlegroups.com

Hi Bakul,

adding extern K X(S) works for kapi-test.c

however, the extract that I posted still core dumps.

If I delete all lines from kapi-test.c and leave 77-94 - it core dumps.

Pls advise.

Neeraj

Neeraj Rai

May 31, 2013, 11:28:26 PM

to kona...@googlegroups.com

Have the accessors changed meanings in kona.h?

I can't find kS(z) when I work with kona.h.

IMHO sticking with the original macros is better. It took me a long time to get used to them.

gtn(I t, I n) seems to be a renaming of newK(I t, I n).

I'll see if I can dig up my original program and apply the pointer update you suggested to make it work.

thanks

Neeraj

Bakul Shah

May 31, 2013, 11:42:58 PM

to kona...@googlegroups.com

On Fri, 31 May 2013 19:44:03 PDT Neeraj Rai <rneer...@gmail.com> wrote:
> however, the extract that I posted still core dumps.

Do you mean this extract?

K main (int _ac, char* _av)
{
K d = gtn(5,0);
K c0 = gtn(0,0);
K c1 = gtn(0,0);

kap(&d, gsk("a", c0)); <<<
kap(&d, gsk("b", c1)); <<<
kap(&c0, gp("hello1"));
kap(&c1, gi(1));
kap(&c0, gp("hello2"));
kap(&c1, gi(2));

show(d);
}

This can't work anymore. You have to do things like these:

K foo = gsk("a", c0);
kap(&d, &foo);

That is, pass a pointer to an object rather than the object
itself. Given that kap() is used to extend vectors of chars,
integers, floats, symbols, K etc. this is the only interface
that can work. This is explained in the doc. You also need
to realize that c0 and c1 may be reallocated by kap(&c0, ...).
So the lines marked with <<< will not work.
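
To illustrate why a pointer-to-element interface is the natural one for a function that extends vectors of different element types, here is a sketch in plain C. It is not kona's kap(): Vec and vec_append are hypothetical names, and the element size is stored explicitly where kona would derive it from the vector's type.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical generic vector: one element size per vector. */
typedef struct { size_t n, elem_size; unsigned char *data; } Vec;

/* The new element is passed by address because its C type differs per vector:
   an int for an int vector, a pointer for a vector of pointers, and so on.
   The function copies elem_size bytes from wherever elem points. */
int vec_append(Vec *v, const void *elem) {
    assert(v != NULL && elem != NULL && v->elem_size > 0);
    unsigned char *p = realloc(v->data, (v->n + 1) * v->elem_size);
    if (!p) return -1;
    memcpy(p + v->n * v->elem_size, elem, v->elem_size);
    v->data = p;
    v->n++;
    return 0;
}
```

The call site then looks the same for every element type: put the value in a variable and pass its address, as in `int i = 1; vec_append(&ints, &i);` or `const char *s = "hello1"; vec_append(&strs, &s);`. Passing the value itself in place of its address would make the function copy from a meaningless location.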

> If I delete all lines from kapi-test.c and leave 77-94 - it core dumps.

You need to initialize by first calling ksk("",0); You can
delete everything after that line.

You should use gdb to step through the code. Maybe that will
give you a better understanding.

Neeraj Rai

Jun 1, 2013, 1:21:24 PM

to kona...@googlegroups.com

ok, some success for me at last.

I went back to the original program and just updated the dict cols as you suggested and it seems to work.

thanks for your time and effort. - Neeraj

--------------------------------------------- code

#include "d.h"
//gcc -g3 -fPIC -shared mydict.c -o mydict.so
// use k_dyn and load shared libs
//f: `"/work/sid0/dev/builddir/nr-kona/mydict.so" 2: (`mydict,1);
//f[1]
K mydict (K x)
{
K d=newK(5,0); // dictionary
K c0=newK(0,0); // string col (general list)
K c1=newK(-1,0); // int col
K t0, t1, e0;

t0=newK(0,3);//t0--> kS(kK(d)[0])[0]
kS(t0)[0]=sp("col1"); kK(t0)[1]=c0; kK(t0)[2]=0; kap(&d, &t0);

t1=newK(0,3); // t1--> kS(kK(d)[1])[0]
kS(t1)[0]=sp("col2"); kK(t1)[1]=c1; kK(t1)[2]=0; kap(&d, &t1);

//e0-->kC(kK(c0)[0])
e0 = newK(-3,6); memcpy(kC(e0), "hello1", 6); kap(&c0,&e0);
kK(t0)[1]=c0; // update new ptr

// e1-->kI(c1)[0]
I e1 = 1; kap(&c1, &e1);
kK(t1)[1]=c1; // update new ptr

//e0-->kC(kK(c0)[1])
e0 = newK(-3,6); memcpy(kC(e0), "hello2", 6); kap(&c0,&e0);
kK(t0)[1]=c0; // update new ptr

e1 = 2; kap(&c1, &e1);//e1-->kI(c1)[1]
kK(t1)[1]=c1; // update new ptr

show(d);

Bakul Shah

Jun 1, 2013, 5:03:24 PM

to kona...@googlegroups.com

On Sat, 01 Jun 2013 10:21:24 PDT Neeraj Rai <rneer...@gmail.com> wrote:
> ok, some success for me at last.

Good!

> I went back to the original program and just updated the dict cols as you
> suggested and it seems to work.

You can do what you want of course but note that at present if
you include ts.h or other kona header files along with kona.h,
you may run into compile time conflicts. The whole point of
the c-k api was to avoid having to look into rest of the kona
source files. The c-k api exactly matches what was in k3
(modulo what is not yet implemented). For your purposes

newK==gtn
kK == KK
kI == KI
etc.

This

e0 = newK(-3,6); memcpy(kC(e0), "hello1", 6);

can be replaced with this:
e0 = gp("hello1");
etc.

As a guideline
If you are directly extending kona, use kona's header files
If you have a C main program, use kona.h and link to kona library
If you are writing a C function to be dynamically loaded from k, use kona.h

Neeraj Rai

Jun 2, 2013, 7:34:26 PM

to kona...@googlegroups.com

Hi Bakul,

Some context for this post is in my other recent posts. I'll recap some of that here for closure.

I started with writing a csv loader that can detect different formats and use appropriate decoder to pipe the data in.

The existing csv loader mmaps the whole file. That is not an option when using popen.

So I read in the whole file as 1st version. The missing piece for me was to add to the cols incrementally.

Adding to cols using kap didn't seem to work until you pointed out that kap can change the pointer, so the dict or array that originally held it needs to point to the new location.

I wanted to simulate the problem in a standalone code. When I first wrote it, there was no c interface.

I used the recently introduced shared lib interface to simulate the issue.

Either way (C interface or shared lib), the code is throwaway, as I'll finally put it in _0d_rdCsvWc

https://github.com/rneeraj/kona/commits/piperd

Right now, most of my files work with full read in - and the pipe reader is not merged - so I'm not in a hurry to add the enhancement.

I should note that I had given up on incremental read and you have provided a way forward on that front.

thanks

Neeraj

Neeraj Rai

Jun 16, 2013, 12:28:49 AM

to kona...@googlegroups.com

Hi,

I have updated the pipe reader to create a dictionary incrementally.

https://github.com/rneeraj/kona/commit/6021c79001f97421a45966e9379ec7defb3abbc9

It seems to work for my 1M rows file. It also seems to be faster than original mmap csv reader.

It may have bugs and I'll continue to test it for some time.

I would like to get some more feedback about correctness and performance for the code, now that it seems to work for basic cases.

The following feedback items are still on my todo list and should go in next month.

1. batching of kap calls - Bakul
2. detecting filetype based on ext to avoid separate option.

The main reason for delaying this is that I am testing the correctness of my program against the original code.