Why 'tf.python_io.TFRecordWriter' is so SLOW and STORAGE-CONSUMING in TensorFlow?


Eric Yue

unread,
Sep 19, 2016, 1:21:05 PM
to dis...@tensorflow.org

I'm writing a dataset to a TFRecord file using this code. The problem is that the process is so slow that writing a large dataset isn't feasible even over several days! It's just a writer serializing to disk, so why is it so slow? Another problem is that the output file is about 10 times larger than the original data!

Does anyone know a way to speed up TFRecordWriter and compress the result?



-------------------------------
Best Regards,
Eric Yue ( Yue Bin )
Beijing,China



Clay Sheppard

unread,
Sep 19, 2016, 9:41:00 PM
to Eric Yue, dis...@tensorflow.org

As for the size, you can store the images as JPEGs or PNGs instead of raw pixels. How large is the dataset that is taking days to write?
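A minimal sketch of that idea. The thread doesn't name an encoder, so this assumes Pillow is available for the JPEG step; at read time the bytes can be decoded again (e.g. with tf.image.decode_jpeg):

```python
import io

import numpy as np
from PIL import Image


def to_jpeg_bytes(image_array, quality=90):
    """Encode an HxWx3 uint8 array as JPEG bytes instead of storing raw pixels."""
    buf = io.BytesIO()
    Image.fromarray(image_array).save(buf, format="JPEG", quality=quality)
    return buf.getvalue()


# A smooth gradient image: raw storage is H*W*3 bytes, the JPEG is far smaller.
img = np.tile(np.arange(256, dtype=np.uint8), (256, 3)).reshape(256, 256, 3)
jpeg_bytes = to_jpeg_bytes(img)
print(img.size, "raw bytes vs", len(jpeg_bytes), "JPEG bytes")
```

The JPEG bytes then go into the 'image_raw' feature in place of the images[index].tostring() result, which is where the 10x blow-up comes from: tostring() stores every decoded pixel uncompressed.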

-Clay


--
You received this message because you are subscribed to the Google Groups "Discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to discuss+unsubscribe@tensorflow.org.
To post to this group, send email to dis...@tensorflow.org.
To view this discussion on the web visit https://groups.google.com/a/tensorflow.org/d/msgid/discuss/CAN4bGgOR7_QZmJJ%3D2oSs45iZg%3DtBtB-749iAxkCq5WMjNBKYbg%40mail.gmail.com.

Eric Yue

unread,
Sep 19, 2016, 9:54:41 PM
to Clay Sheppard, dis...@tensorflow.org
My dataset is about 30 GB, in text-line format.

Clay Sheppard

unread,
Sep 19, 2016, 10:01:50 PM
to Eric Yue, dis...@tensorflow.org

Are you doing some kind of processing on it before writing to the file? I can write 2 GB of images in a few minutes.

Eric Yue

unread,
Sep 19, 2016, 11:04:17 PM
to Clay Sheppard, dis...@tensorflow.org
Just as in https://github.com/tensorflow/tensorflow/blob/r0.10/tensorflow/examples/how_tos/reading_data/convert_to_records.py#L68:

for index in range(num_examples):
    image_raw = images[index].tostring()
    example = tf.train.Example(features=tf.train.Features(feature={
        'height': _int64_feature(rows),
        'width': _int64_feature(cols),
        'depth': _int64_feature(depth),
        'label': _int64_feature(int(labels[index])),
        'image_raw': _bytes_feature(image_raw)}))
    writer.write(example.SerializeToString())
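For the file-size half of the question, TFRecordWriter also accepts an options argument that enables GZIP (or ZLIB) compression of the record stream. A sketch, with hypothetical file names; the try/except covers both the 1.x-era tf.python_io names used in this thread and their 2.x tf.io equivalents:

```python
import os

import tensorflow as tf

try:  # 1.x-era API, as used in this thread
    options = tf.python_io.TFRecordOptions(tf.python_io.TFRecordCompressionType.GZIP)
    TFRecordWriter = tf.python_io.TFRecordWriter
except AttributeError:  # 2.x API
    options = tf.io.TFRecordOptions(compression_type="GZIP")
    TFRecordWriter = tf.io.TFRecordWriter

# Stand-in for a serialized Example with a raw, highly compressible image string.
payload = b"\x00" * 10000

with TFRecordWriter("plain.tfrecords") as writer:
    for _ in range(100):
        writer.write(payload)

with TFRecordWriter("compressed.tfrecords", options=options) as writer:
    for _ in range(100):
        writer.write(payload)

print(os.path.getsize("plain.tfrecords"), "vs", os.path.getsize("compressed.tfrecords"))
```

The reader side must be given the same compression type when the file is opened, otherwise the records won't parse.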

-------------------------------
Best Regards,
Eric Yue ( Yue Bin )
Beijing,China




Aarya Patel

unread,
Oct 2, 2018, 4:32:53 AM
to Discuss
Yes, I'm doing pre-processing that includes a Hough circle transform, CLAHE, and resizing of the images before writing to TFRecords. It's taking too much time.


On Tuesday, 20 September 2016 07:31:50 UTC+5:30, Clay Sheppard wrote:

Are you doing some kind of processing on it before writing to the file? I can write 2gb of images in a few minutes.
