Fitting the model with Spark Dataframe

Hamza Saaidia

Apr 15, 2023, 8:27:10 PM
to User Group for BigDL
Hi team, 

I am fitting my Keras estimator using a Spark DataFrame that I read from HDFS. Unlike a TensorFlow dataset, it seems to load all the data into memory, which causes memory issues. Can you tell me how to optimize memory usage, so that it doesn't load the whole dataset into memory but works batch by batch instead, if that is possible?

Thanks,

Xin Qiu

Apr 17, 2023, 2:06:16 AM
to User Group for BigDL
Does your Keras estimator mean NNEstimator? https://bigdl.readthedocs.io/en/latest/doc/DLlib/Overview/nnframes.html
From the docs, you can enable the DISK_AND_DRAM option to reduce memory use. For example, .setDataCacheLevel("DISK_AND_DRAM", 10) will cache only 10% of your dataset in memory.
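A minimal sketch of how this could be wired up in the DLlib Python API, assuming a working Spark + BigDL environment. The setDataCacheLevel call mirrors the one above; the model, criterion, and DataFrame setup are illustrative assumptions and should be checked against the nnframes docs for your BigDL version:

```python
# Assumed imports from BigDL DLlib (check against your installed version)
from bigdl.dllib.nnframes import NNEstimator
from bigdl.dllib.nn.criterion import MSECriterion

# `model` is your Keras-style BigDL model; `df` is the Spark DataFrame
# read from HDFS, e.g. spark.read.parquet("hdfs://...")
estimator = NNEstimator(model, MSECriterion()) \
    .setBatchSize(256) \
    .setMaxEpoch(5) \
    .setDataCacheLevel("DISK_AND_DRAM", 10)  # keep ~10% of the dataset
                                             # in DRAM, spill the rest to disk

trained_model = estimator.fit(df)
```

Without setDataCacheLevel, the default is to cache the whole training set in memory; the second argument controls how finely the dataset is sliced between DRAM and disk.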

Bests,
-Xin