extending newK(t,n)


Neeraj Rai

May 10, 2013, 9:03:28 PM
to kona...@googlegroups.com
Hi,

When allocating storage using newK, the size is fixed. Can it be changed on the fly to add more data to a K object?
This would be helpful when reading really large files incrementally.

What are my other options for really large data that may not fit in memory,
or for data whose total size is not known at the beginning?

thanks
neeraj

Kevin Lawler

May 10, 2013, 10:36:18 PM
to kona...@googlegroups.com
no

kap() will do this though

Neeraj Rai

May 10, 2013, 11:24:01 PM
to kona...@googlegroups.com
OK, that's what I was looking for.
Grepping for kap, I see an interesting usage in popen_charvec (called from _4d).
It initializes a list of size 0 and incrementally appends each line returned by popen.

One question :
Following the popen_charvec flow, 
   l = newK(-3,n-1) // allocate for line
   strncpy(kC(l), s,n-1) // copy line to l
   kap (&z,l)     // add l to z
It seems that, in this case, the memcpy of l to z inside kap is avoidable.
My use case would be similar, but kap would be called a million times in a loop.
Is there a closer alternative?

thanks
Neeraj

Neeraj Rai

May 12, 2013, 6:19:57 PM
to kona...@googlegroups.com
My original concerns might have been misdirected.
I read through the memory management wiki and followed kap through the code (kapn_, kexpander, etc.).
I may still be misreading the code in what follows, so please point me in the right direction.

It seems that kap calls newK under the hood, copies the contents of the old memory into the new block, and frees the old memory.
The pattern is somewhat like "new" and "delete" in C++.
I was hoping for something like "malloc" and "realloc" in C.

This is probably done to keep the memory contiguous.
However, if contiguous memory is available beyond the current K, should we try to use it and avoid the memcpy?

When reading large files using popen, or running a system command, we may not know upfront the full size to request from newK.
However, we do know that once newK is called, no one else will free or allocate memory, so we don't need to look for
contiguous memory on each subsequent call.
Maybe we need another call, kapc (for "kap contiguous")? It could fall back to kap if contiguous memory is unavailable.
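
To make the "malloc and realloc" idea concrete, here is a minimal standalone sketch in plain C. It is not Kona code: IntVec, intvec_push and intvec_free are hypothetical names invented for illustration, and the doubling growth policy is an assumption, not a description of what kap or kexpander do. realloc is allowed to extend the block in place when the allocator has room, and otherwise moves it and copies the contents itself.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable vector of 64-bit ints; not part of Kona. */
typedef struct {
    int64_t *data;  /* heap storage, NULL when empty */
    size_t   len;   /* elements in use */
    size_t   cap;   /* elements allocated */
} IntVec;

/* Append one value, doubling the capacity when the vector is full.
   Returns 0 on success, -1 if realloc fails (vector left unchanged). */
int intvec_push(IntVec *v, int64_t x)
{
    assert(v != NULL);
    if (v->len == v->cap) {
        size_t ncap = v->cap ? v->cap * 2 : 16;
        /* realloc may grow the block in place or move it; either way
           the existing elements are preserved. */
        int64_t *p = realloc(v->data, ncap * sizeof *p);
        if (!p) return -1;
        v->data = p;
        v->cap  = ncap;
    }
    v->data[v->len++] = x;
    return 0;
}

/* Release the storage and reset the vector to the empty state. */
void intvec_free(IntVec *v)
{
    free(v->data);
    v->data = NULL;
    v->len = v->cap = 0;
}
```

With doubling, a million appends trigger only a handful of reallocations, so even when every reallocation has to move the block, the copying cost per append stays small on average.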

thanks
Neeraj


Neeraj Rai

May 12, 2013, 10:53:31 PM
to kona...@googlegroups.com
My understanding of kap must be flawed. I tried the following program and it core dumps.
I'll need some guidance on how to use it.

Compile using: gcc -g3 -fPIC -shared myadd.c -o myadd.so
and load as a shared lib to test.
K myrealloc (K x)
{
  K d  = newK(0,0);

  K i = newK(1,1) ; kI(i)[0] = 7;   kap(&d[0], i);
  K f = newK(2,1) ; kF(f)[0] = 3.0; kap(&d[1], f);
  K c = newK(-3,5); strncpy(kC(i), "hello", 5); kap(&d[2], c);

  print("t=%d", d->t);
}

Neeraj Rai

May 13, 2013, 9:03:53 PM
to kona...@googlegroups.com
Got simple kap to work. Working on list of lists next.
K myrealloc (K x)
{
  K d = newK(0,0);

  K cc = newK(-3,5);  strncpy(kC(cc), "hello", 5);
  kap(&d, cc);

  K di = newK(-1,2);  kI(di)[0] = 7; kI(di)[1] = 5;
  kap(&d, di);

  K ef = newK(-2,2);  kF(ef)[0] = 3.0; kF(ef)[1] = 4.0;  // two floats are stored, so allocate 2
  kap(&d, ef);
  return d;  // return the list explicitly instead of falling off the end
}

Neeraj Rai

May 13, 2013, 9:43:02 PM
to kona...@googlegroups.com
It seems the complex list doesn't work when updating row by row.
E.g., in the sample code below, myrealloc displays the complex list correctly because it was populated in order,
   but myrealloc1 doesn't display the last element (n) because I went back and updated element n-1.

Build the following into a shared lib:
//gcc -g3 -fPIC -shared  myadd.c -o myadd.so
K myrealloc (K x)
{
  K d=newK(0,0);
  K cc = newK(-3,5);  strncpy(kC(cc), "hello", 5);
  kap(&d, cc);
  K di = newK(-1,4) ;  int ii;for (ii=0; ii<4; ++ii) kI(di)[ii] = 7+ii;
  kap(&d, di);
  K ef = newK(-2,2) ;  kF(ef)[0] = 3.0;kF(ef)[1]=4.0;
  kap(&d, ef);
}
K myrealloc1 (K x)
{
  K d=newK(0,0);
  K cc = newK(-3,5);  strncpy(kC(cc), "hello", 5);
  kap(&d, cc);

  K di = newK(-1,0);
  int ii;  for (ii=0; ii<4; ++ii){ K dij = newK(-1,1);  kI(dij)[0] = 7+ii; kap(&di, dij); }
  kap(&d, di);

  K ef = newK(-2,2) ;  kF(ef)[0] = 3.0;kF(ef)[1]=4.0;
  kap(&d, ef);
  K dij = newK(-1,1);  kI(dij)[0] = 7+12;
  kap(&di, dij);
}

 K Console - Enter \ for help
  c: `"/work/sid0/dev/builddir/nr-kona/myadd.so" 2: (`myrealloc,1); 
  d: `"/work/sid0/dev/builddir/nr-kona/myadd.so" 2: (`myrealloc1,1); 

  c[1]  // displays correctly
("hello"
 7 8 9 10
 3 4.0)

  d[1]  // last row missing
1 1 1 1 1

Please advise if I am using it wrong.

Neeraj

Neeraj Rai

May 14, 2013, 7:42:42 AM
to kona...@googlegroups.com
What kap does seems right for column-based storage. To store all the columns contiguously, we need to know the size upfront.

Since storing a really big file in memory is likely to run into issues,
I'm planning to do it in 2 passes.
1st pass: read the streaming popen output and store it in one file per column.
          Each column gets a filename; the data remains as strings.
          Count the number of rows.
2nd pass: read back the individual columns from their respective files.
          The size is known in advance.
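
A rough sketch of the two passes in plain C, for illustration only. Nothing here is Kona API: split_columns, read_int_column, MAX_COLS and the "<prefix>.<col>" file naming are all made up for the example, and it assumes delimiter-separated text with lines shorter than 4096 bytes. In the real use case the input FILE* would come from popen, and pass 2 would fill a K vector allocated once at the known size rather than a malloc'd array.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_COLS 16  /* arbitrary limit for this sketch */

/* Pass 1: split each delimited line of `in` into per-column files named
   "<prefix>.<col>", one field per line. Returns the row count, or -1. */
long split_columns(FILE *in, int ncols, char delim, const char *prefix)
{
    assert(ncols > 0 && ncols <= MAX_COLS);
    FILE *out[MAX_COLS];
    char path[512], line[4096];
    long rows = 0;
    for (int c = 0; c < ncols; ++c) {
        snprintf(path, sizeof path, "%s.%d", prefix, c);
        out[c] = fopen(path, "w");
        if (!out[c]) { while (c--) fclose(out[c]); return -1; }
    }
    while (fgets(line, sizeof line, in)) {
        line[strcspn(line, "\n")] = '\0';       /* drop trailing newline */
        char *field = line;
        for (int c = 0; c < ncols; ++c) {
            char *end = field ? strchr(field, delim) : NULL;
            if (end) *end = '\0';
            fprintf(out[c], "%s\n", field ? field : "");  /* missing field -> empty */
            field = end ? end + 1 : NULL;
        }
        ++rows;
    }
    for (int c = 0; c < ncols; ++c) fclose(out[c]);
    return rows;
}

/* Pass 2: the row count is known, so allocate the column exactly once
   and parse it back as integers. Returns NULL on any failure. */
long *read_int_column(const char *prefix, int col, long rows)
{
    char path[512], line[4096];
    snprintf(path, sizeof path, "%s.%d", prefix, col);
    FILE *f = fopen(path, "r");
    if (!f) return NULL;
    long *v = malloc((rows > 0 ? rows : 1) * sizeof *v);
    for (long i = 0; v && i < rows; ++i) {
        if (!fgets(line, sizeof line, f)) { free(v); v = NULL; break; }
        v[i] = strtol(line, NULL, 10);
    }
    fclose(f);
    return v;
}
```

The trade-off is extra disk I/O in exchange for never growing a vector: each column is allocated once at its final size.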

I would like to hear of better solutions.
thanks
Neeraj