Making Restore SavedModel endianness friendly


Shahid Shaikh

Apr 10, 2019, 7:29:05 AM
to TensorFlow Developers, cind...@ca.ibm.com, nth...@us.ibm.com, abo...@us.ibm.com
The problem:
A SavedModel produced by a little-endian machine is unreadable on a big-endian machine and vice versa. When TensorFlow serializes a tensor bundle, it dumps the raw contents of the TensorBuffer, which are in the native byte order of the architecture it runs on. When a SavedModel is restored on an architecture with a different endianness, TensorFlow detects that the tensor bundle has a different endianness and fails to load the model.
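The following minimal C++ sketch (not TensorFlow code; `HostIsLittleEndian` and `RawBytes` are hypothetical helpers) shows why dumping a buffer's raw memory is not portable: copying the bytes of the same 32-bit value yields opposite byte orders on little- and big-endian hosts.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns true if this machine stores the least significant byte first.
inline bool HostIsLittleEndian() {
  const std::uint32_t probe = 1;
  unsigned char first_byte = 0;
  std::memcpy(&first_byte, &probe, 1);
  return first_byte == 1;
}

// Copies the in-memory representation of a value byte for byte, the way a
// raw buffer dump would. The result depends on the host's byte order.
inline std::array<unsigned char, 4> RawBytes(std::uint32_t value) {
  std::array<unsigned char, 4> bytes{};
  std::memcpy(bytes.data(), &value, sizeof(value));
  return bytes;
}
```

On a little-endian host such as x86-64, `RawBytes(0x01020304)` yields `04 03 02 01`; on a big-endian host such as s390x it yields `01 02 03 04`. A file holding those bytes therefore decodes to a different number when it is read back on the other kind of machine.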


Analysis:
A tensor bundle is written by the 'BundleWriter' class and read by the 'BundleReader' class (tensorflow/core/util/tensor_bundle/tensor_bundle.cc/.h).
There are two important files in which a trained model is saved to and restored from disk:
1. A ".index" file, which is an immutable string-to-string table. Each key in this table is the name of a tensor, and its value is a serialized BundleEntryProto describing that tensor's metadata.
2. A ".data" file, which contains the raw contents of the tensors, stored at the locations recorded in their BundleEntryProto entries.
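To make the index/data split concrete, here is a simplified C++ sketch. `EntrySketch` and `LookupTensorBytes` are hypothetical; the real BundleEntryProto also carries fields such as shape, shard_id and crc32c, and real bundles may be split across several data shards. The point is that an entry only says where a tensor's bytes live: the bytes themselves are an unannotated raw dump, so their byte order has to come from elsewhere (the bundle header).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical, simplified stand-in for BundleEntryProto: only the fields
// needed to locate a tensor's bytes in a single-shard data file.
struct EntrySketch {
  int dtype = 0;             // DataType enum value (illustrative only).
  std::uint64_t offset = 0;  // Byte offset of the tensor in the data file.
  std::uint64_t size = 0;    // Number of bytes the tensor occupies.
};

// The ".index" file maps tensor names to entries; the ".data" file is a flat
// byte blob. Returns the raw bytes of `name`, or nullopt if the name is
// missing or its entry points outside the data blob.
inline std::optional<std::string_view> LookupTensorBytes(
    const std::map<std::string, EntrySketch>& index, std::string_view data,
    const std::string& name) {
  auto it = index.find(name);
  if (it == index.end()) return std::nullopt;
  const EntrySketch& e = it->second;
  if (e.offset > data.size() || e.size > data.size() - e.offset) {
    return std::nullopt;
  }
  return data.substr(e.offset, e.size);
}
```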

The BundleReader class reads tensors from a bundle's data file into memory.

The BundleWriter class writes tensors to a bundle's data file.

The 'SaveTensors' and 'RestoreTensor' functions (tensorflow/core/kernels/save_restore_tensor.cc/.h) are responsible for passing tensor contents to the writer and reading them back from the reader.

The BundleHeaderProto (tensorflow/core/protobuf/tensor_bundle.proto) is a special header stored with every tensor bundle. This header records the bundle's endianness.
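A minimal sketch of how the header's endianness flag could be compared with the host's byte order. `BundleEndianness` is a hypothetical C++ mirror of the LITTLE/BIG values we assume for BundleHeaderProto's Endianness enum; it is not the generated protobuf type, and `NeedsByteSwap` is not an existing TensorFlow function.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical mirror of the header's endianness field (LITTLE = 0, BIG = 1).
enum class BundleEndianness { kLittle = 0, kBig = 1 };

// Byte order of the machine this code runs on.
inline BundleEndianness HostEndianness() {
  const std::uint16_t probe = 1;
  unsigned char first_byte = 0;
  std::memcpy(&first_byte, &probe, 1);
  return first_byte == 1 ? BundleEndianness::kLittle : BundleEndianness::kBig;
}

// True when the bundle was written on a machine with the other byte order,
// i.e. the case that currently makes the restore fail with an error.
inline bool NeedsByteSwap(BundleEndianness bundle) {
  return bundle != HostEndianness();
}
```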


Proposed solution:
A serialized model should be restorable on both little-endian and big-endian architectures, regardless of the endianness it was saved with. If a cross-endian (LE/BE) scenario is detected, TensorFlow's load methods will byte-swap the tensor data after loading it.

To achieve this, we will need the following changes:
1. Use the endianness field in 'BundleHeaderProto' to identify the tensor bundle's endianness.
2. Modify the 'RestoreTensor' and 'BundleReader' methods to byte-swap the tensor data in the cross-architecture case, after the tensor bundle has been restored in memory.
3. Leave the 'SaveTensors' and 'BundleWriter' methods unchanged, so that they continue to serialize the tensor model in the endianness of the architecture they run on.
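Step 2 above amounts to reversing the bytes of every element of the restored buffer. The sketch below is a hypothetical helper, not an existing TensorFlow function. It assumes the caller derives the component size from the tensor's dtype (2 bytes for half/bfloat16/int16, 4 for float/int32, 8 for double/int64, and the size of one real or imaginary part for complex types). String tensors are not stored as fixed-size elements, so we assume they would need separate handling and leave them out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Reverses the bytes of every `component_size`-byte component in `buf`.
// For complex types, component_size is the size of one part (4 or 8), since
// the real and imaginary parts are swapped independently.
// Returns false if the arguments are inconsistent.
inline bool ByteSwapBufferInPlace(unsigned char* buf, std::size_t num_bytes,
                                  std::size_t component_size) {
  if (component_size == 0 || num_bytes % component_size != 0) return false;
  if (component_size == 1) return true;  // int8/uint8/bool: nothing to do.
  for (std::size_t i = 0; i < num_bytes; i += component_size) {
    std::reverse(buf + i, buf + i + component_size);
  }
  return true;
}
```

Because a byte swap is its own inverse, the same routine would serve both LE-to-BE and BE-to-LE restores.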

The list of code changes above is not complete.


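References (for past discussions on this topic):
https://github.com/tensorflow/tensorflow/issues/11290
https://github.com/tensorflow/tensorflow/issues/16364
https://github.com/tensorflow/tensorflow/pull/16003
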
Looking forward to your suggestions.

Thanks & Regards,
Shahid

Martin Wicke

Apr 11, 2019, 2:02:02 PM
to Shahid Shaikh, Allen Lavoie, Kathy Wu, TensorFlow Developers, cind...@ca.ibm.com, nth...@us.ibm.com, abo...@us.ibm.com
+Allen Lavoie +Kathy Wu 

Could we instead create the convention that serialized bundles are always little-endian, and that the reader of any bundle should byte-swap if the machine is big-endian? That seems to be the smaller change, but I may be missing something.
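Under this suggested convention the on-disk format is fixed, so the host byte order never leaks into the file. A minimal sketch for 32-bit values (`StoreLittleEndian32` and `LoadLittleEndian32` are hypothetical names, not TensorFlow API): shifting instead of copying memory makes the encoding identical on every host, and on a little-endian host the result is the same as a plain copy.

```cpp
#include <cassert>
#include <cstdint>

// Writes `value` into `out` in little-endian order using shifts, so the
// bytes are the same on every host regardless of its native byte order.
inline void StoreLittleEndian32(std::uint32_t value, unsigned char out[4]) {
  for (int i = 0; i < 4; ++i) {
    out[i] = static_cast<unsigned char>((value >> (8 * i)) & 0xFF);
  }
}

// Reads a little-endian 32-bit value back into the host representation.
inline std::uint32_t LoadLittleEndian32(const unsigned char in[4]) {
  std::uint32_t value = 0;
  for (int i = 0; i < 4; ++i) {
    value |= static_cast<std::uint32_t>(in[i]) << (8 * i);
  }
  return value;
}
```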

--
You received this message because you are subscribed to the Google Groups "TensorFlow Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to developers+...@tensorflow.org.
Visit this group at https://groups.google.com/a/tensorflow.org/group/developers/.
To view this discussion on the web visit https://groups.google.com/a/tensorflow.org/d/msgid/developers/90f6bc58-7b06-4cc8-8b6d-781accd1649f%40tensorflow.org.

Cindy Lee

Apr 11, 2019, 2:42:05 PM
to TensorFlow Developers, sha...@us.ibm.com, all...@google.com, kat...@google.com, cind...@ca.ibm.com, nth...@us.ibm.com, abo...@us.ibm.com
Thanks for the comment. We proposed it this way because we thought it would ensure no change if everything were done on one platform only, LE or BE. The only time we need to swap is when going from LE to BE or vice versa. If we always serialized bundles as LE, what would the overhead be like on a BE machine? In particular, would existing users who run TensorFlow only on BE see performance degradation from this change? I don't know TensorFlow internals well enough to tell...

Sami Kama

Apr 11, 2019, 2:50:15 PM
to Cindy Lee, TensorFlow Developers, sha...@us.ibm.com, all...@google.com, kat...@google.com, cind...@ca.ibm.com, nth...@us.ibm.com, abo...@us.ibm.com
Wouldn't it be roughly equivalent to a memcpy? And it will be done only at serialization time, so it shouldn't affect normal operation unless the user is frequently checkpointing and loading the model. Having a defined endianness is usually a good idea. Otherwise you also have to mark whether the file is BE/LE in a way that doesn't depend on the current endianness.

¯\_(ツ)_/¯

-----------------------------------------------------------------------------------
This email message is for the sole use of the intended recipient(s) and may contain
confidential information. Any unauthorized review, use, disclosure or distribution
is prohibited. If you are not the intended recipient, please contact the sender by
reply email and destroy all copies of the original message.
-----------------------------------------------------------------------------------

Allen Lavoie

Apr 11, 2019, 4:21:05 PM
to Sami Kama, Cindy Lee, TensorFlow Developers, sha...@us.ibm.com, kat...@google.com, cind...@ca.ibm.com, nth...@us.ibm.com, abo...@us.ibm.com

We do already have this flag in the BundleHeaderProto protocol buffer (it's used to generate the existing error message), so we already have enough information to do the reordering either on read or on write. If we do it on write, the flag becomes vestigial (which is fine).

I don't think it's a big performance issue either way. But since we already have the flag, doing it on read means existing SavedModels/checkpoints start working when loaded with different endianness, which is nice. So I'd lean slightly toward that.

Shahid Shaikh

Apr 17, 2019, 9:30:40 AM
to TensorFlow Developers, sk...@nvidia.com, cws...@gmail.com, sha...@us.ibm.com, kat...@google.com, cind...@ca.ibm.com, nth...@us.ibm.com, abo...@us.ibm.com
I agree with the concerns raised by Cindy Lee.

Consider a user who wishes to continue on the same architecture and loads a SavedModel of the other endianness only once. With this proposal, restoring a SavedModel becomes endianness-friendly, so in the future it won't matter which endianness the model was serialized with. If we continue to serialize the tensor model in the endianness of the architecture it is written on (the present behavior), users who stay on the same architecture avoid the byte-swapping overhead during restore.

Please share your opinions/thoughts so that we can reach a conclusion and finalize this proposal.

Thanks & Regards,
Shahid


Martin Wicke

Apr 17, 2019, 11:57:07 AM
to Shahid Shaikh, TensorFlow Developers, Sami Kama, cws...@gmail.com, Kathy Wu, cind...@ca.ibm.com, nth...@us.ibm.com, abo...@us.ibm.com
Since we have the information in the SavedModel (and assuming the endianness field is properly written), we can do the swap on read, so let's do that.

This does mean that no swapping happens if you read a model on the same architecture it was written on. Either way, since this is an I/O process, I am not concerned about performance here: the cost of byte swaps should be negligible compared to actually reading or writing the data.


Frederick R Reiss

Apr 24, 2019, 12:44:37 PM
to wi...@google.com, Amod Borkar, Cindy Lee, cws...@gmail.com, devel...@tensorflow.org, kat...@google.com, Nayana Thorat, Shahid Shaikh, sk...@nvidia.com
Is anyone working on this change? If not, I'd be happy to put in a PR.
 
Fred
 

Cindy Lee

Apr 25, 2019, 8:20:57 AM
to TensorFlow Developers, wi...@google.com, abo...@us.ibm.com, cind...@ca.ibm.com, cws...@gmail.com, kat...@google.com, nth...@us.ibm.com, sha...@us.ibm.com, sk...@nvidia.com
Hi Fred, my team was about to start, but we are certainly not as familiar with the code as you are, so it would be great if you could work on it. Feel free to reach out to me if you need help later, e.g. testing different combinations (LE<->BE).