Trying to find an approach to implement BatchNormalization


Alexey Zinoviev

Jul 27, 2020, 1:42:53 PM
to SIG JVM
Good evening, dear community! I hope this is the right place for this discussion.

I have the following problem: I'm trying to reproduce one of the modern CNN architectures with the Java API. Most of them use BatchNormalization, a popular layer backed by the tf.nn.batch_normalization() op.

I tried to use old ops like BatchNormWithGlobalNormalization, but got:
Exception in thread "main" org.tensorflow.exceptions.TFUnimplementedException: Op BatchNormWithGlobalNormalization is not available in GraphDef version 175. It has been removed in version 9. Use tf.nn.batch_normalization().
at org.tensorflow.internal.c_api.AbstractTF_Status.throwExceptionIfNotOK(AbstractTF_Status.java:99)

This op was deprecated years ago, but we still have it in the 1.15 and 2.x APIs.

I am trying to implement it following the simple scheme from https://r2rt.com/implementing-batch-normalization-in-tensorflow.html

* BatchNorm contains trainable params, and as a result it participates in the gradient calculation too (its internal state is close to the Optimizer and its internal variables, but it is part of the model weights).

But it uses tf.nn.moments, which is not present in our API. Also, all the known batch-norm-related ops expect the results of tf.nn.moments as input parameters (which looks strange).
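For what it's worth, here is a minimal sketch of what a Java-side replacement for tf.nn.moments could look like, assuming the generated endpoints tf.math.mean, tf.math.squaredDifference and tf.stopGradient exist and behave like the corresponding Python ops (the class and method names are illustrative only, not a proposal for the final API):

import org.tensorflow.Operand;
import org.tensorflow.op.Ops;
import org.tensorflow.op.math.Mean;
import org.tensorflow.types.TFloat32;
import org.tensorflow.types.TInt32;

public final class MomentsSketch {

  /** Returns {mean, variance} of x over the given axes, keeping the reduced dims. */
  @SuppressWarnings("unchecked")
  public static Operand<TFloat32>[] moments(Ops tf, Operand<TFloat32> x, Operand<TInt32> axes) {
    Operand<TFloat32> mean = tf.math.mean(x, axes, Mean.keepDims(true));
    // variance = E[(x - stop_gradient(E[x]))^2], following the tf.nn.moments definition
    Operand<TFloat32> variance = tf.math.mean(
        tf.math.squaredDifference(x, tf.stopGradient(mean)), axes, Mean.keepDims(true));
    return new Operand[] {mean, variance};
  }

  private MomentsSketch() {}
}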

@Jim I know you explored a lot of the things missing from the C API; have you maybe faced this problem? I suppose the lack of such a layer will be a big problem for Keras-style usage in Java.

@Karl do we have a chance to fix this problem? Have you dealt with this kind of normalization?

Does anybody have a working example with BatchNorm?

If you have any ideas or related experience, please share them in this thread.
If someone has a working snippet, could you please share it?

Alex

Jim Clarke

Jul 27, 2020, 2:05:44 PM
to Alexey Zinoviev, SIG JVM
Alexey,

I haven't come across this one, but I ran across tf.tensordot this morning. Previously I ran into tf.nn.sparse_softmax_cross_entropy_with_logits(),
which I reimplemented in Java, but this is not part of the main repository yet.

There are a number of ops implemented primarily in Python.
I will do a scan of the Python code to see if there is an easy way to flag them all.
We need to come up with a common approach for how to handle these "higher level" ops.
Should we reuse the same pattern that the C-API ops use, with Scope and the @Operator()/@Endpoint() annotations?

jim

Jim Clarke

Jul 27, 2020, 4:25:08 PM
to Alexey Zinoviev, SIG JVM
I did a quick scan of tf.math, and the following don’t appear to be implemented in Java.

  • confusion_matrix  (I have a version of this)
  • is_non_decreasing
  • is_strictly_increasing
  • l2_normalize (I have a version of this)
  • lbeta
  • logical_xor
  • polyval
  • scalar_mul
  • sobol_sample  (found org.tensorflow.op.math.SobolSample.java  missing @Operator(group = "math") annotation)
  • top_k (I have a version of this).
  • unsorted_segment_sqrt_n
  • zero_fraction
  • reduce_euclidean_norm
  • reduce_logsumexp
  • reduce_std
  • reduce_variance

I didn’t count the methods that were deprecated.

Some of these would be rather easy to implement as they rely on other low-level ops.
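For example, here is a hedged sketch of how logical_xor from the list above could be composed from ops we already generate, reusing the Scope + @Operator/@Endpoint pattern (the class layout and endpoint name are illustrative only, not a proposal for the final API):

import org.tensorflow.Operand;
import org.tensorflow.op.Scope;
import org.tensorflow.op.annotation.Endpoint;
import org.tensorflow.op.annotation.Operator;
import org.tensorflow.op.math.LogicalAnd;
import org.tensorflow.op.math.LogicalNot;
import org.tensorflow.op.math.LogicalOr;
import org.tensorflow.types.TBool;

@Operator(group = "math")
public final class LogicalXor {

  /** xor(a, b) == (a || b) && !(a && b), composed from existing generated ops. */
  @Endpoint(name = "logicalXor")
  public static Operand<TBool> create(Scope scope, Operand<TBool> a, Operand<TBool> b) {
    return LogicalAnd.create(scope,
        LogicalOr.create(scope, a, b),
        LogicalNot.create(scope, LogicalAnd.create(scope, a, b)));
  }

  private LogicalXor() {}
}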


jim

Alexey Zinoviev

Jul 28, 2020, 3:58:13 AM
to Jim Clarke, SIG JVM
Agreed, that part of tf.math could be re-implemented in Java; the bigger problem here is the missing low-level functions in tf.nn.
As with BatchNormWithGlobalNormalization, which is deprecated and hasn't worked with the TF runtime for many years, maybe we should not generate ops in such cases at all (if an op is deprecated and doesn't work).

What do you think, Jim, maybe we could put together a list of missing pieces and unimplemented ops/gradients to discuss at the next meeting?

Alex

On Mon, Jul 27, 2020 at 23:25, Jim Clarke <jimcla...@gmail.com> wrote:

Karl Lessard

Jul 28, 2020, 6:42:20 PM
to Alexey Zinoviev, Jim Clarke, SIG JVM
By the way, it is super easy to prevent the generation of some ops; it would be great if you could compile such a list as you go with your tests.
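For reference, hiding an op is just an api_def override; a minimal sketch, assuming the api_def_*.pbtxt files under tensorflow-core-api/src/bazel/api_def/ (the op chosen here is only an example):

op {
  graph_op_name: "BatchNormWithGlobalNormalization"
  visibility: SKIP
}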

On Jul 28, 2020, at 03:58, Alexey Zinoviev <zalesl...@gmail.com> wrote:



Jim Clarke

Aug 3, 2020, 8:39:18 AM
to Alexey Zinoviev, SIG JVM
Alexey,

In Keras, the BatchNormalization layer is implemented mostly in Python. There are calls to "nn.fused_batch_norm", "nn.moments" and "nn.batch_normalization",
but these are a small part of the total source code (python/keras/layers/normalization.py).
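For what it's worth, the nn.batch_normalization arithmetic itself looks expressible with ops we already generate. A hedged sketch, assuming the tf.math.rsqrt/mul/sub/add endpoints mirror the Python op names, and given mean/variance computed elsewhere (e.g. as in the moments sketch earlier in this thread); the class and method names are illustrative only:

import org.tensorflow.Operand;
import org.tensorflow.op.Ops;
import org.tensorflow.types.TFloat32;

public final class BatchNormSketch {

  /** y = gamma * (x - mean) / sqrt(variance + epsilon) + beta */
  public static Operand<TFloat32> batchNormalization(Ops tf, Operand<TFloat32> x,
      Operand<TFloat32> mean, Operand<TFloat32> variance,
      Operand<TFloat32> gamma, Operand<TFloat32> beta, float epsilon) {
    Operand<TFloat32> invStd = tf.math.rsqrt(tf.math.add(variance, tf.constant(epsilon)));
    Operand<TFloat32> normalized = tf.math.mul(tf.math.sub(x, mean), invStd);
    return tf.math.add(tf.math.mul(normalized, gamma), beta);
  }

  private BatchNormSketch() {}
}

The harder parts of the Keras layer are the fused op and the moving-average bookkeeping for inference, which a sketch like this doesn't cover.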

jim

On Jul 27, 2020, at 1:42 PM, Alexey Zinoviev <zalesl...@gmail.com> wrote:

Alexey Zinoviev

Aug 17, 2020, 10:58:20 AM
to SIG JVM, jimcla...@gmail.com, SIG JVM, Alexey Zinoviev
I'll add this here, in this thread, for the record: I found one more operation without gradient support.

"Dear community, I found that yet one operation (tf.concat) have no gradient implementation (I got an exception Exception in thread "main" org.tensorflow.TensorFlowException: No gradient defined for op: ConcatV2. Please see https://www.tensorflow.org/code/tensorflow/cc/gradients/README.md for instructions on how to add C++ gradients.) What's the best way to get these gradients or anothers? Could you give me any advice here: wait implementation silently, report the issue to the TensorFlow tracker, waiting next meetin to rise the questions for googlers, try to make PR?"

On Monday, August 3, 2020 at 15:39:18 UTC+3, jimcla...@gmail.com wrote:

Andrew Schaumberg

Sep 24, 2020, 1:57:37 AM
to Alexey Zinoviev, SIG JVM, jimcla...@gmail.com
Hi Alexey & all,

In the interim, would Concat work better than ConcatV2, since ConcatV2 doesn't have gradients implemented yet?  I have this problem too.
https://github.com/tensorflow/java/blob/1d35c17dcc85286a91f59a6ff0b94c48f1b8d4b1/tensorflow-core/tensorflow-core-api/src/gen/java/org/tensorflow/op/core/Concat.java#L52

Concat.java is automatically-generated and can't be edited, but might these temporary changes be OK?
# diff -u ./tensorflow-core/tensorflow-core-api/src/bazel/api_def/api_def_ConcatV2.pbtxt.ORIG ./tensorflow-core/tensorflow-core-api/src/bazel/api_def/api_def_ConcatV2.pbtxt
--- ./tensorflow-core/tensorflow-core-api/src/bazel/api_def/api_def_ConcatV2.pbtxt.ORIG 2020-08-18 09:03:38.724458973 -0400
+++ ./tensorflow-core/tensorflow-core-api/src/bazel/api_def/api_def_ConcatV2.pbtxt      2020-09-23 22:16:01.411494082 -0400
@@ -1,6 +1,4 @@
 op {
   graph_op_name: "ConcatV2"
-  endpoint {
-    name: "Concat"
-  }
+  visibility: SKIP
 }
# diff -u ./tensorflow-core/tensorflow-core-api/src/bazel/api_def/api_def_Concat.pbtxt.ORIG ./tensorflow-core/tensorflow-core-api/src/bazel/api_def/api_def_Concat.pbtxt
--- ./tensorflow-core/tensorflow-core-api/src/bazel/api_def/api_def_Concat.pbtxt.ORIG   2020-08-18 09:03:38.724458973 -0400
+++ ./tensorflow-core/tensorflow-core-api/src/bazel/api_def/api_def_Concat.pbtxt        2020-09-23 22:15:42.883753337 -0400
@@ -1,4 +1,6 @@
 op {
   graph_op_name: "Concat"
-  visibility: SKIP
+  endpoint {
+    name: "Concat"
+  }
 }
# diff -u ./tensorflow-core/tensorflow-core-api/src/gen/java/org/tensorflow/op/core/Concat.java.ORIG ./tensorflow-core/tensorflow-core-api/src/gen/java/org/tensorflow/op/core/Concat.java
--- ./tensorflow-core/tensorflow-core-api/src/gen/java/org/tensorflow/op/core/Concat.java.ORIG  2020-09-23 22:02:29.554854302 -0400
+++ ./tensorflow-core/tensorflow-core-api/src/gen/java/org/tensorflow/op/core/Concat.java       2020-09-24 01:45:35.712629366 -0400
@@ -48,10 +48,10 @@
    * @return a new instance of Concat
    */
   @Endpoint(describeByClass = true)
-  public static <T extends TType, U extends TNumber> Concat<T> create(Scope scope, Iterable<Operand<T>> values, Operand<U> axis) {
-    OperationBuilder opBuilder = scope.env().opBuilder("ConcatV2", scope.makeOpName("Concat"));
+  public static <T extends TType, U extends TNumber> Concat<T> create(Scope scope, Iterable<Operand<T>> values, Operand<U> concatDim) {
+    OperationBuilder opBuilder = scope.env().opBuilder("Concat", scope.makeOpName("Concat"));
+    opBuilder.addInput(concatDim.asOutput());
     opBuilder.addInputList(Operands.asOutputs(values));
-    opBuilder.addInput(axis.asOutput());
     opBuilder = scope.applyControlDependencies(opBuilder);
     return new Concat<T>(opBuilder.build());
   }

I can get that to build, but could any colleagues kindly advise whether there is any reason this won't work? Is there a way to make it work better? Happy to help here, under some guidance.
-Andrew

Andrew Schaumberg

Sep 24, 2020, 2:17:01 AM
to Alexey Zinoviev, SIG JVM, jimcla...@gmail.com
Hi Alexey & all,

It turns out there aren't gradients for v1 Concat either, which is surprising because I've used Concat in Keras.  Here's the same error when TF-Java uses Concat rather than ConcatV2:
Exception in thread "main" org.tensorflow.exceptions.TensorFlowException: No gradient defined for op: Concat. Please see https://www.tensorflow.org/code/tensorflow/cc/gradients/README.md for instructions on how to add C++ gradients.
        at org.tensorflow.internal.c_api.AbstractTF_Status.throwExceptionIfNotOK(AbstractTF_Status.java:101)
        at org.tensorflow.Graph.addGradients(Graph.java:649)
        at org.tensorflow.Graph.addGradients(Graph.java:267)
        at org.tensorflow.Graph.addGradients(Graph.java:301)
        at org.tensorflow.framework.optimizers.Optimizer.computeGradients(Optimizer.java:113)
        at org.tensorflow.framework.optimizers.Optimizer.minimize(Optimizer.java:94)
        at org.tensorflow.framework.optimizers.Optimizer.minimize(Optimizer.java:90)
        ...

Why might Concat and ConcatV2 both not have gradients? Is something gradient-related missing from the TensorFlow build used by TensorFlow Java? Is it advisable for me to try to add gradients myself in a fork?

Thanks for your time and expertise,
-Andrew

Alexey Zinoviev

Sep 24, 2020, 7:09:16 AM
to Andrew Schaumberg, SIG JVM, jimcla...@gmail.com
It seems a few operators have no gradients at the moment, and the file with the gradient implementations in core TensorFlow hasn't been updated for many months.

I asked on the TensorFlow dev list about this situation; the answer was: no plans to add anything in the near future, but feel free to add your own implementation.

I could help with the test and review.

On Thu, Sep 24, 2020 at 09:17, Andrew Schaumberg <schaumbe...@gmail.com> wrote:

Andrew Schaumberg

Sep 24, 2020, 11:09:49 AM
to Alexey Zinoviev, SIG JVM, jimcla...@gmail.com
Hi Alexey & all,

If one vector is carefully zero-padded to the left, and another vector is carefully zero-padded to the right, could the vector sum (tf.add) of these two padded vectors serve as a 'fake concat'?

I'm wondering if this would possibly be a temporary work-around for concat.

Thanks again,
-Andrew

P.S. There's a prior TF issue about no gradients in Concat https://github.com/tensorflow/tensorflow/issues/19944
There is some mention at the end about registering via REGISTER_OP_GRADIENT vs REGISTER_GRADIENT_OP, which may be what Alexey is alluding to.

TF 2.2 has Concat and ConcatV2 grads, so I'm not sure how they're missing in TF-Java (perhaps they need to be registered with REGISTER_GRADIENT_OP instead of REGISTER_OP_GRADIENT?)
Also, ConcatGrad strings are in the built libtensorflow.so, so not sure how grads can be missing:
# strings ./libtensorflow.so|grep ConcatGrad
ConcatGrad
_ZGVZZN10tensorflow16ConcatGradHelperERKNS_9AttrSliceEPNS_11FunctionDefEbENKUliPKcE_clEiS6_E17vmodule_activated
_ZZZN10tensorflow16ConcatGradHelperERKNS_9AttrSliceEPNS_11FunctionDefEbENKUliPKcE_clEiS6_E17vmodule_activated
_ZN10tensorflow16ConcatGradHelperERKNS_9AttrSliceEPNS_11FunctionDefEb.cold.194
_ZN10tensorflow10ConcatGradERKNS_9AttrSliceEPNS_11FunctionDefE
_ZN10tensorflow12ConcatGradV2ERKNS_9AttrSliceEPNS_11FunctionDefE
_ZN10tensorflow16ConcatGradHelperERKNS_9AttrSliceEPNS_11FunctionDefEb


Andrew Schaumberg

Sep 24, 2020, 6:21:50 PM
to Alexey Zinoviev, SIG JVM, jimcla...@gmail.com
Hi Alexey & all,

The zero-padding (tf.pad) and adding (tf.math.add) approach to make a "fake concat" doesn't fail due to missing gradients!

The general approach is for (c) to be a fake concat:
a = (some, numbers, in, this, vector, 0, 0, 0)
b = (0, 0, 0, 0, 0, numbers, here, too)
c = a+b = (some, numbers, in, this, vector, numbers, here, too)
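Here is a hedged TF-Java sketch of that idea for two 1-D vectors of known lengths; it assumes tf.pad maps to the PadV2 op and takes input, paddings and constant-value operands, and the class and helper names are mine:

import org.tensorflow.Operand;
import org.tensorflow.op.Ops;
import org.tensorflow.types.TFloat32;

public final class FakeConcatSketch {

  /** "Concatenates" 1-D vectors a (length n) and b (length m) by zero-padding and adding. */
  public static Operand<TFloat32> fakeConcat(Ops tf,
      Operand<TFloat32> a, int n, Operand<TFloat32> b, int m) {
    Operand<TFloat32> zero = tf.constant(0.0f);
    // a -> [a, m zeros] and b -> [n zeros, b]; paddings are [[before, after]] per dimension
    Operand<TFloat32> aPadded = tf.pad(a, tf.constant(new int[][] {{0, m}}), zero);
    Operand<TFloat32> bPadded = tf.pad(b, tf.constant(new int[][] {{n, 0}}), zero);
    // element-wise add of the padded vectors behaves like concat([a, b], axis=0)
    return tf.math.add(aPadded, bPadded);
  }

  private FakeConcatSketch() {}
}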

If there's any reason this won't actually work, please let me know; this is not my area of expertise,
-Andrew