Hi XLA team and friends,
I built a model with tf.function and enabled XLA JIT. The model has input shape (None, 16), and all ops operate on int32 data. The model basically performs mathematical operations using these ops: tf.math.add, tf.math.multiply, tf.bitwise.bitwise_xor, and tf.gather.
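For reference, here is a minimal sketch of the kind of model I mean (the exact graph and the lookup table are placeholders, not my real model; the op choices follow the list above):

```python
import tensorflow as tf

@tf.function  # with XLA JIT enabled, auto-clustering picks up compilable subgraphs
def model(x, table):
    # x: int32 tensor of shape (None, 16); table: a hypothetical int32 lookup tensor
    y = tf.math.add(x, 7)
    y = tf.math.multiply(y, 3)
    y = tf.bitwise.bitwise_xor(y, x)
    idx = tf.math.floormod(y, tf.shape(table)[0])  # keep indices in range
    return tf.gather(table, idx)
```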
I'd like to understand whether the input batch size affects auto-clustering and fusion at inference time. First, I run with batch size 10 and dump the XLA programs by setting XLA_FLAGS=--xla_dump_to=<dir>. Second, I run with a large batch size of 3162277; during execution, TensorFlow emits the warning "Allocation of 3440557376 exceeds 10% of free system memory." With the same XLA_FLAGS (pointing at a different directory), I dump a second set of XLA programs.
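This is roughly how I drive the two dump runs (directory names are placeholders, and each configuration needs its own process since the flags are read when TensorFlow initializes; I enable auto-clustering via the standard --tf_xla_auto_jit flag):

```python
import os

# Must be set before TensorFlow is imported / initialized.
os.environ["TF_XLA_FLAGS"] = "--tf_xla_auto_jit=2"            # enable auto-clustering
os.environ["XLA_FLAGS"] = "--xla_dump_to=/tmp/xla_dump_small"  # dump HLO here

import numpy as np
import tensorflow as tf

# Batch size 10 for the first run; the second run is a separate process
# with shape (3162277, 16) and XLA_FLAGS pointing at a different directory.
x = tf.constant(np.random.randint(0, 1 << 20, size=(10, 16)), dtype=tf.int32)
```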
Comparing the two *.after_optimizations.txt files with diff, I find many differences. With the small batch size, each fusion op appears to contain more TensorFlow ops, so the *.after_optimizations.txt file for the large batch size ends up with more ops than the one for the small batch size. I have attached both *.after_optimizations.txt files.
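To make "more ops" concrete, I count instructions with a small helper (file names are placeholders; the regex assumes the usual HLO text layout of one "%name = shape opcode(...)" instruction per line):

```python
import re

def hlo_stats(path):
    """Count total HLO instructions and fusion instructions in a dump file."""
    instr = re.compile(r"^\s*(?:ROOT\s+)?%?[\w.\-]+\s*=\s*\S+\s+(\w+)\(", re.M)
    with open(path) as f:
        opcodes = instr.findall(f.read())
    return len(opcodes), sum(1 for op in opcodes if op == "fusion")

for p in ["small.after_optimizations.txt", "large.after_optimizations.txt"]:
    total, fusions = hlo_stats(p)
    print(f"{p}: {total} instructions, {fusions} fusion ops")
```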
Does the input batch size affect auto-clustering and op fusion? If so, how does a large batch size change the clustering and fusion decisions?