Something about tf.data

Sun Aries

Apr 20, 2021, 1:05:20 AM
to devel...@tensorflow.org
Hi TFers,
    Thank you for your contributions to machine learning. I'm a long-time TF user, working with TensorFlow since 2016, and these days I'm stuck on a problem with tf.data. The problem I'm hitting is a lot like this one: tf.data.Dataset.map cannot be used in any comfortable way with a general Python callback as the map_fn and still get parallel acceleration, because the callback runs under Python's GIL. If I remember correctly, the queue-based enqueue pipelines in TF 1.0 did not have this problem. I first ran into this handicap in 2018, when tf.data was still a baby, and I worked around it by rewriting all of our preprocessing with native TF APIs. Now tf.data has been growing up for years and ought to be strong enough for such cases, yet this problem still seems unsolved. So I'd like to understand why tf.data is designed the way it is, and what it is going to become in the future.
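For concreteness, here is a minimal sketch of the pattern I mean, assuming tf.py_function is the wrapper for the Python callback (`decode_sample` is just an illustrative placeholder for our real preprocessing):

```python
import numpy as np
import tensorflow as tf

def decode_sample(idx):
    # Stand-in for arbitrary Python preprocessing (file I/O, cv2, numpy, ...).
    # All tf.py_function bodies share the one Python interpreter, so the GIL
    # serializes them even when num_parallel_calls asks for parallelism.
    return np.float32(idx.numpy() * 2)

ds = tf.data.Dataset.range(1000).map(
    lambda i: tf.py_function(decode_sample, [i], tf.float32),
    num_parallel_calls=tf.data.AUTOTUNE)  # little real speed-up for CPU-bound Python
```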
    BTW, is there another way to solve this problem in parallel? As for the details of my case, the raw data is a list of dictionaries like [{'image_path': 'a/b/c.jpg', 'class_ids': [1, 2, 3, 4], 'gt_boxes': [[1, 2, 3, 4], [3, 2, 1, 0], [2, 2, 3, 4], [5, 6, 7, 8]]}, {'image_path': 'a/b/dd.jpg', 'class_ids': [2, 3], 'gt_boxes': [[1, 2, 3, 4], [3, 2, 1, 0]]}]. Initializing a Dataset directly from this input fails, because class_ids and gt_boxes have a different length for every sample and cannot be stacked into dense tensors. So I build the dataset with `tf.data.Dataset.range(samples_size)` and fetch the samples by index inside map, which runs straight into the GIL problem above; a sketch of both follows below.
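Here is a sketch of the failing direct initialization and my index-based workaround (the sample list is the toy example above, and `lookup` is an illustrative name for the Python-side gather):

```python
import numpy as np
import tensorflow as tf

samples = [
    {'image_path': 'a/b/c.jpg', 'class_ids': [1, 2, 3, 4],
     'gt_boxes': [[1, 2, 3, 4], [3, 2, 1, 0], [2, 2, 3, 4], [5, 6, 7, 8]]},
    {'image_path': 'a/b/dd.jpg', 'class_ids': [2, 3],
     'gt_boxes': [[1, 2, 3, 4], [3, 2, 1, 0]]},
]

# Direct initialization fails: class_ids/gt_boxes differ in length per
# sample, so they cannot be stacked into dense tensors.
# tf.data.Dataset.from_tensor_slices(
#     {k: [s[k] for s in samples] for k in samples[0]})  # raises ValueError

def lookup(idx):
    # Python-side gather by index -- this is the callback that ends up
    # serialized on the GIL.
    s = samples[int(idx)]
    return (s['image_path'],
            np.asarray(s['class_ids'], np.int64),
            np.asarray(s['gt_boxes'], np.int64))

ds = tf.data.Dataset.range(len(samples)).map(
    lambda i: tf.py_function(lookup, [i], (tf.string, tf.int64, tf.int64)),
    num_parallel_calls=tf.data.AUTOTUNE)
```

Even with num_parallel_calls set, throughput stays pinned to roughly one core's worth of Python work, which is exactly the limitation I'm asking about.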

Looking forward to your reply, and thank you.

Best regards,
SunAries (Xuhua Hu)
