GWT vs js performance: Collections and Strings

274 views
Skip to first unread message

Vassilis Virvilis

unread,
May 5, 2015, 8:34:46 AM5/5/15
to google-we...@googlegroups.com
Hi,

At some point in time I needed to handle lzw compressing/uncompressing in the client so I found https://gist.github.com/revolunet/843889

Later on I rewrite the LZW in java.

So I did. I benchmarked the difference and the new implementation was way slower when compressing. GWT/java is actually better when uncompressing.

I changed the java collections to javascript collections JsMap, and JsList and the difference was greatly reduced but not enough to be able to scrap the js implementation.

My guess any remaining gains are hidden in the StringBuilder but I may by wrong.

                                    Chrome                      Firefox                   IE
Js                                  544/61                       626/78               795/171
Java                              750/48                       1702/48              2401/468
JavaJsCollections          785/38                       1398/38             1877/351

The numbers are compressing/uncompressing
1000 iterations with a random string multiplied 10 times

I am using a single permutation with collapse-all. Can this be the culprit?

Any idea what to change in order to increase the performance in the java code?

Here is the code


package com.biovista.lib.gwt.client;

import com.google.gwt.core.client.JavaScriptObject;

public class Util {
    private static final char DICT_SIZE = 256;

    public static class JsMap<T> extends JavaScriptObject {
        protected JsMap() {
        }

        public final native void put(String key, T value) /*-{
            this[key] = value;
        }-*/;

        public final native boolean containsKey(String key) /*-{
            return (key in this);
        }-*/;

        public final native T get(String key) /*-{
            return this[key];
        }-*/;

        public static JsMap createInstance() {
            return (JsMap) createObject();
        }
    }

    public static class JsList<T> extends JavaScriptObject {
        protected JsList() {
        }

        public final native void add(T value) /*-{
            this.push(value);
        }-*/;

        public final native int size() /*-{
            return this.length;
        }-*/;

        public final native T get(char index) /*-{
            return this[index];
        }-*/;

        public static JsList createInstance() {
            return (JsList) createArray();
        }
    }

    public static String compressLZW(String text) {
        // Build the dictionary.
        // final Map<String, Character> map = new HashMap<String, Character>(
        // DICT_SIZE);
        final JsMap<Character> map = JsMap.createInstance();
        for (char i = 0; i < DICT_SIZE; i++) {
            map.put("" + i, i);
        }

        char dict_size = DICT_SIZE;

        String w = "";
        final StringBuilder result = new StringBuilder();
        for (char c : text.toCharArray()) {
            final String wc = w + c;
            if (map.containsKey(wc))
                w = wc;
            else {
                result.append(map.get(w));
                // Add wc to the dictionary.
                map.put(wc, dict_size++);
                w = "" + c;
            }
        }

        // Output the code for w.
        if (!w.equals(""))
            result.append(map.get(w));

        return result.toString();
    }

    /** uncompress a string. */
    public static String uncompressLZW(String compressed_text) {
        // Build the dictionary.
        // final List<String> rmap = new ArrayList<String>(DICT_SIZE);
        final JsList<String> rmap = JsList.createInstance();
        for (char i = 0; i < DICT_SIZE; i++) {
            rmap.add("" + i);
        }
        final char[] compressed = compressed_text.toCharArray();

        String w = "" + compressed[0];
        final StringBuilder result = new StringBuilder(w);
        for (int i = 1; i < compressed.length; i++) {
            final char k = compressed[i];
            final String entry = k < rmap.size() ? rmap.get(k) : w
                    + w.charAt(0);
            result.append(entry);
            // Add w+entry[0] to the dictionary.
            rmap.add(w + entry.charAt(0));
            w = entry;
        }
        return result.toString();
    }

    // LZW-compress a string
    public static native String lzwCompress(String s) /*-{
        var dict = {};
        var data = (s + "").split("");
        var out = [];
        var currChar;
        var phrase = data[0];
        var code = 256;
        for (var i = 1; i < data.length; i++) {
            currChar = data[i];
            if (dict[phrase + currChar] != null) {
                phrase += currChar;
            } else {
                out.push(phrase.length > 1 ? dict[phrase] : phrase
                        .charCodeAt(0));
                dict[phrase + currChar] = code;
                code++;
                phrase = currChar;
            }
        }
        out.push(phrase.length > 1 ? dict[phrase] : phrase.charCodeAt(0));
        for (var i = 0; i < out.length; i++) {
            out[i] = String.fromCharCode(out[i]);
        }
        return out.join("");
    }-*/;

    // Decompress an LZW-encoded string
    public static native String lzwDecompress(String s) /*-{
        var dict = {};
        var data = (s + "").split("");
        var currChar = data[0];
        var oldPhrase = currChar;
        var out = [ currChar ];
        var code = 256;
        var phrase;
        for (var i = 1; i < data.length; i++) {
            var currCode = data[i].charCodeAt(0);
            if (currCode < 256) {
                phrase = data[i];
            } else {
                phrase = dict[currCode] ? dict[currCode]
                        : (oldPhrase + currChar);
            }
            out.push(phrase);
            currChar = phrase.charAt(0);
            dict[code] = oldPhrase + currChar;
            code++;
            oldPhrase = phrase;
        }
        return out.join("");
    }-*/;

    public static void benchmark() {
        final int N = 1000;
        final int M = 10;
        final String test1 = "opaopaopaopa236747862348 hfjsgfhjsd g4782q65 87324658764 8 37g fj hdshfjdhjsb hjsgfbhjgb hjdf sjjhsgd 87345y 783256 8736 8573587 68327563";
        String test = "";
        for (int i = 0; i < M; i++) {
            test += test1;
        }

        // log.info("old compress " + lzwCompress(test));
        // log.info("new compress " + compressLZW(test));
        // log.info("old uncompress " + lzwDecompress(lzwCompress(test)));
        // log.info("new uncompress " + uncompressLZW(compressLZW(test)));
        long t1 = System.currentTimeMillis();
        for (int i = 0; i < N; i++) {
            lzwCompress(test);
        }
        long t2 = System.currentTimeMillis();
        for (int i = 0; i < N; i++) {
            compressLZW(test);
        }
        long t3 = System.currentTimeMillis();
        log.info("Compressing: old " + (t2 - t1) + " vs new " + (t3 - t2));

        final String test_c = compressLZW(test);
        t1 = System.currentTimeMillis();
        for (int i = 0; i < N; i++) {
            lzwDecompress(test_c);
        }
        t2 = System.currentTimeMillis();
        for (int i = 0; i < N; i++) {
            uncompressLZW(test_c);
        }
        t3 = System.currentTimeMillis();
        log.info("UnCompressing: old " + (t2 - t1) + " vs new " + (t3 - t2));

        if (!(lzwDecompress(lzwCompress(test)).equals(test))) {
            log.error("old failure");
        }

        if (!(uncompressLZW(compressLZW(test)).equals(test))) {
            log.error("new failure");
        }
    }
}


--
Vassilis Virvilis

JonL

unread,
May 5, 2015, 9:50:00 AM5/5/15
to google-we...@googlegroups.com
Why all the "final" variables in the Java version?  You aren't passing the variables to anonymous inner classes or anything, so there should be no need to mark anything final.  I'm not 100% sure what effect that will have on the output from the GWT compiler though as far as speed.

As far as I can tell, the biggest difference is in the way you are checking if a value is already in  your map.

In java code you are calling :

            if (map.containsKey(wc))
                w = wc;

            map.containsKey converts to: "return (key in this)"

In javascript you are calling:

            if (dict[phrase + currChar] != null) {
                phrase += currChar;


According to the below speed test, "key in this" is the slowest of the options for comparing if an object contains a key and would almost double the runtime.

Vassilis Virvilis

unread,
May 5, 2015, 10:15:23 AM5/5/15
to google-we...@googlegroups.com
Hi,

I am using final out of habit mostly (coming from c++) although I could cite some arguments in favor of using it anywhere possible.

I tried the optimization you suggested and it didn't make a difference. Good catch though...


    Vassilis

--
You received this message because you are subscribed to the Google Groups "Google Web Toolkit" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-web-tool...@googlegroups.com.
To post to this group, send email to google-we...@googlegroups.com.
Visit this group at http://groups.google.com/group/google-web-toolkit.
For more options, visit https://groups.google.com/d/optout.



--
Vassilis Virvilis

Jens

unread,
May 5, 2015, 10:37:41 AM5/5/15
to google-we...@googlegroups.com
GWT already optimizes a HashMap<String, ...> similar to what you have implemented. You can see this in [1] and the specialized methods if the key type is String.class. Also ArrayList is backed by a native JS array. Of course GWT carries some overhead because it emulates the full Java API but because it's already quite optimized you don't see a huge improvement when switching from Java to JavaJsCollection. The largest improvement is on IE but IMHO thats not a surprise.

StringBuilder in GWT is just appending strings and does not use any buffer array like in real Java. It is basically s += toAppend which is pretty fast in browsers.

The biggest difference I see is that the Java / JavaJsCollection version use String.toCharArray() which actually creates a new array, walks the string and fills the array. It is probably faster to do

for (int i = 0; i < string.length(); i++) { 
  char c = string.charAt(i); 
  ....
}

because Java's String.charAt(i) is directly mapped to JavaScript's String.charCodeAt(i) (see [2]) and you avoid the char array creation.




Vassilis Virvilis

unread,
May 5, 2015, 4:26:26 PM5/5/15
to google-we...@googlegroups.com
Jens,

thanks do the suggestion, I tried it with minimal difference.

Tomorrow I will try to profile the compression part.

     Vassilis

--
You received this message because you are subscribed to the Google Groups "Google Web Toolkit" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-web-tool...@googlegroups.com.
To post to this group, send email to google-we...@googlegroups.com.
Visit this group at http://groups.google.com/group/google-web-toolkit.
For more options, visit https://groups.google.com/d/optout.



--
Vassilis Virvilis

Eric Ponthiaux

unread,
May 6, 2015, 5:35:52 AM5/6/15
to google-we...@googlegroups.com
Have you done any testing with JsArray or JsArrayOf ? 

Vassilis Virvilis

unread,
May 6, 2015, 9:13:13 AM5/6/15
to google-we...@googlegroups.com
Hi Eric,

I haven't use the JsArray because the compression algorithm requires map access and not list access.


   Vassilis


--
You received this message because you are subscribed to the Google Groups "Google Web Toolkit" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-web-tool...@googlegroups.com.
To post to this group, send email to google-we...@googlegroups.com.
Visit this group at http://groups.google.com/group/google-web-toolkit.
For more options, visit https://groups.google.com/d/optout.



--
Vassilis Virvilis

Vassilis Virvilis

unread,
May 6, 2015, 10:35:14 AM5/6/15
to google-we...@googlegroups.com
Anyway I profiled a bit. Here is the results

compress total time:  721
compress self: 203  ? What is it doing? The main loop probably
   containsKey 180 (Map operation with jonL suggestion)
   put 164 (Map)
   append self 91 / total 119 (StringBuilder)
           toString() 16 (StringBuilder)
   valueOf 16 (convert dict_size++ to something in order to put it in the map)

The difference with the js version is around 200-250 msec. I can't imagine where the next low hanging fruit would  be.

JonL after implementing __correctly__ your suggestion I was able to trim 40-45 msec from the profile run (SDM). Unfortunately I cannot see that improvement in the fully compiled version.


   Vassilis
--
Vassilis Virvilis

JonL

unread,
May 6, 2015, 11:30:57 AM5/6/15
to google-we...@googlegroups.com
Hmm.. in the Javascript version of the compress, you don't have a loop to initialize the "dict" like you do the map.  It appears you are updating the "dict" during the same loop.  While in the Java version, you have two loops.  Why the slight difference in logic?
To unsubscribe from this group and stop receiving emails from it, send an email to google-web-toolkit+unsub...@googlegroups.com.

To post to this group, send email to google-we...@googlegroups.com.
Visit this group at http://groups.google.com/group/google-web-toolkit.
For more options, visit https://groups.google.com/d/optout.



--
Vassilis Virvilis



--
Vassilis Virvilis

Vassilis Virvilis

unread,
May 6, 2015, 2:35:11 PM5/6/15
to google-we...@googlegroups.com
Personal preference I guess...

It looks more correct to me to have the initialization phase clearly separated from the main loop.

Do you think that something like that can create that difference? I have been surprised in the past trying to understand where the computer spends the time but that would be really - really weird,

      Vassilis

To unsubscribe from this group and stop receiving emails from it, send an email to google-web-tool...@googlegroups.com.
To post to this group, send email to google-we...@googlegroups.com.
Visit this group at http://groups.google.com/group/google-web-toolkit.
For more options, visit https://groups.google.com/d/optout.



--
Vassilis Virvilis



--
Vassilis Virvilis

--
You received this message because you are subscribed to the Google Groups "Google Web Toolkit" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-web-tool...@googlegroups.com.

To post to this group, send email to google-we...@googlegroups.com.
Visit this group at http://groups.google.com/group/google-web-toolkit.
For more options, visit https://groups.google.com/d/optout.



--
Vassilis Virvilis

Jonathon Lamon

unread,
May 6, 2015, 2:44:00 PM5/6/15
to google-we...@googlegroups.com
Why would that be weird for there to be a difference?  The more loops you have, the more computation time.  Your java version is doing 256 additional iterations.

--
You received this message because you are subscribed to a topic in the Google Groups "Google Web Toolkit" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/google-web-toolkit/XKQRRaPgkw0/unsubscribe.
To unsubscribe from this group and all its topics, send an email to google-web-tool...@googlegroups.com.

Vassilis Virvilis

unread,
May 6, 2015, 2:54:33 PM5/6/15
to google-we...@googlegroups.com
Oups...

You are right. They are not functionally the same...

My java code is doing more operations.

If the input is 0 bytes it is doing +256 operations
if the input is big enough it eventually phases out the difference. It still makes +256 iterations but without the meat of them that is get/push/append

So maybe - my trivial data input was not a good example.

I will check tomorrow...

    Vassilis

Reply all
Reply to author
Forward
0 new messages