How to save and load data in chunks via Python generators?

MasayoMusic

Jan 15, 2019, 12:16:15 AM
to bcolz
I was introduced to the framework via FastAI, and I can't seem to save or load data without running out of memory.

I was attempting something like this, but bcolz_array seems to take up more and more memory as it grows. Is that expected?


Thank you.


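import bcolz
import numpy as np

# img_width, img_height and labels are globals defined elsewhere in the notebook.
def save_array(fname, generator_array, num_batches, data_type="data"):
    # Each batch from the generator is assumed to be a (data, labels) tuple.
    data_dict = {"data": 0, "labels": 1}
    # Validate first: mode='w' replaces any existing data at fname.
    if data_type not in data_dict:
        raise ValueError('data_type must be "data" or "labels"')

    # Zero-row template that fixes the row shape and dtype of the on-disk array.
    if data_type == "data":
        template = np.zeros([0, img_width, img_height, 3], dtype=np.float32)
    else:
        template = np.zeros([0, len(labels)], dtype=np.float32)
    bcolz_array = bcolz.carray(template, mode='w', rootdir=fname)

    # Pull one batch at a time from the generator and append it.
    for i in range(num_batches):
        bcolz_array.append(next(generator_array)[data_dict[data_type]])
    bcolz_array.flush()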
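
For the loading half of the question, a minimal sketch (not from the thread): a generator that yields fixed-size batches from any array that supports len() and slicing. With bcolz, the assumption is that the saved array is reopened read-only with bcolz.open(fname, mode='r'); slicing a carray returns only the requested rows as a NumPy array, so memory use is roughly one batch rather than the whole dataset. The name iter_batches is hypothetical, and a NumPy array stands in for the carray here.

```python
import numpy as np


def iter_batches(arr, batch_size):
    """Yield consecutive slices of `arr`, each at most `batch_size` rows.

    Hypothetical helper: works with anything that supports len() and
    slicing, e.g. a NumPy array or (assumed) a carray from bcolz.open().
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    n = len(arr)
    for start in range(0, n, batch_size):
        # Only rows [start, start + batch_size) are materialised here.
        yield arr[start:start + batch_size]


if __name__ == "__main__":
    # Stand-in for a saved labels array: 10 rows of 2 values each.
    data = np.arange(20, dtype=np.float32).reshape(10, 2)
    # With bcolz this would be (assumption): data = bcolz.open(fname, mode='r')
    for batch in iter_batches(data, 4):
        print(batch.shape)  # (4, 2), then (4, 2), then (2, 2)
```

Calling it once on the data array and once on the labels array with the same batch_size keeps the two streams in step.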


Valentin Haenel

Jan 15, 2019, 1:38:18 PM
to bc...@googlegroups.com
Hi,

answers inline below.


* MasayoMusic <bigmi...@gmail.com> [2019-01-15]:
> I was introduced to the framework via FastAI, and I can't seem to
> save or load data without running out of memory.
>
> I was attempting something like this, but bcolz_array seems to take up
> more and more memory as it grows. Is that expected?
>
>
> Thank you.
>
>
> def save_array(fname, generator_array, num_batches, data_type="data"):
>     if data_type == "data":
>         bcolz_array = bcolz.carray(np.zeros([0, img_width, img_height, 3], dtype=np.float32), mode='w', rootdir=fname)
>     else:
>         bcolz_array = bcolz.carray(np.zeros([0, len(labels)], dtype=np.float32), mode='w', rootdir=fname)

How long is 'labels', or in other words: how big is the zeros array? This
might be part of your issue. I haven't used bcolz in years, but my gut
feeling tells me that you might want to try:

http://bcolz.blosc.org/en/latest/reference.html#bcolz.zeros

>
>     data_dict = {"data": 0, "labels": 1}
>
>     if data_type not in ["data", "labels"]:
>         raise ValueError("data or labels")
>
>     for i in range(num_batches):
>         bcolz_array.append(next(generator_array)[data_dict[data_type]])
>     bcolz_array.flush()


Could you let us know how it goes and whether that helped?

V-

MasayoMusic

Jan 16, 2019, 4:37:00 AM
to bc...@googlegroups.com
Thanks for the reply,

The image width and height are either (299, 299) or (480, 480).
labels has length 2.
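
To put numbers on the question about the size of the zeros array, a rough sketch of the per-row sizes implied by these shapes, assuming float32 (4 bytes per value) as in the code above. The zero-row template itself occupies 0 bytes; each image row appended later is about 1 MiB or 2.6 MiB uncompressed. The helper row_nbytes and the batch size of 32 are hypothetical.

```python
import numpy as np


def row_nbytes(row_shape, dtype=np.float32):
    """Bytes needed for one uncompressed row of the given shape and dtype."""
    return int(np.prod(row_shape)) * np.dtype(dtype).itemsize


# The empty template used to create the carray holds no data.
template = np.zeros([0, 480, 480, 3], dtype=np.float32)
print(template.nbytes)                  # 0

print(row_nbytes((299, 299, 3)))        # 1072812 bytes, about 1.02 MiB per image
print(row_nbytes((480, 480, 3)))        # 2764800 bytes, about 2.64 MiB per image
print(row_nbytes((2,)))                 # 8 bytes per label row

# Hypothetical batch size of 32 images at 480x480:
print(32 * row_nbytes((480, 480, 3)))   # 88473600 bytes, about 84 MiB
```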

I will try changing bcolz.carray to bcolz.zeros and let you know.
Thanks!

When you say you haven't used it in years, does that mean I should be using something more up to date?
I was originally using HDF5 to store large datasets and load them in batches, but I ran into a strange bug.

Valentin Haenel

Jan 16, 2019, 4:39:00 AM
to bc...@googlegroups.com, MasayoMusic
No, just that I have been doing other things and haven't had a need for bcolz.

V-

MasayoMusic

Feb 1, 2019, 5:19:02 AM
to bcolz
Sorry for the late reply, but it seems I needed to set the chunk length (chunklen):

    bcolz_array = bcolz.carray(np.zeros([0, img_width, img_height, 3], dtype=np.float32), chunklen=1, mode='w', rootdir=fname)
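
chunklen is the number of rows bcolz stores per chunk, so chunklen=1 puts each image in its own chunk. A sketch of how one might derive chunklen from a per-chunk memory budget instead of hard-coding it; chunklen_for_budget and the 4 MiB budget are hypothetical choices, not bcolz defaults, and the bcolz call at the end is shown only as a comment.

```python
import numpy as np


def chunklen_for_budget(row_shape, budget_bytes, dtype=np.float32):
    """Hypothetical helper: the largest number of rows per chunk whose
    uncompressed size fits in `budget_bytes` (never less than 1)."""
    row_bytes = int(np.prod(row_shape)) * np.dtype(dtype).itemsize
    return max(1, budget_bytes // row_bytes)


BUDGET = 4 * 2**20  # hypothetical budget: 4 MiB per uncompressed chunk

print(chunklen_for_budget((480, 480, 3), BUDGET))  # 1
print(chunklen_for_budget((299, 299, 3), BUDGET))  # 3
print(chunklen_for_budget((2,), BUDGET))           # 524288

# The result would then be passed to bcolz (assumes bcolz is installed):
# bcolz.carray(np.zeros([0, 480, 480, 3], dtype=np.float32),
#              chunklen=chunklen_for_budget((480, 480, 3), BUDGET),
#              mode='w', rootdir=fname)
```

For the 480x480 images this budget gives the same chunklen=1 used above, while the small labels array can still use large chunks.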