Conversion from String to OutputStream without heap allocation


David Ryan

Jul 19, 2017, 12:02:55 AM
to mechanical-sympathy

Hi all,

My first post here.  I'm not sure this is the best place for it, but I'm hoping someone here can help.  We develop a streaming application that processes 100k+ messages per second, so anything that allocates on the heap adds GC pressure.  We've recently been working on removing allocations, and one of the difficult ones is the conversion from String to UTF-8 bytes written to an OutputStream.  The commonly advised methods for String to UTF-8 conversion all create a byte[] as an intermediate object.

We've been able to read Strings from streams using a ThreadLocal ByteBuffer and CharBuffer, which allocates only the char[] and the String object.  For reference, we've done the following.  In a 1-minute JMC Flight Recorder recording, char[] and String make up about 1 GB of allocations, which is unavoidable because we're processing a lot of string data:

    public static final class StringBuffers
    {
        ByteBuffer buffer = ByteBuffer.allocate(512);
        CharBuffer charBuffer = CharBuffer.allocate(300);
        CharsetDecoder decoder = Charset.forName("UTF8").newDecoder();
    }

    public static final class U8Utf8MethodHandleReader extends AbstractReader
    {
        private final ThreadLocal<StringBuffers> buffers = new ThreadLocal<StringBuffers>()
        {
            @Override
            public StringBuffers initialValue()
            {
                return new StringBuffers();
            }
        };

        public U8Utf8MethodHandleReader(final MethodHandle setHandle)
        {
            super(setHandle);
        }

        @Override
        public void read(final Object o, final TypeInputStream in) throws Throwable
        {
            final int len = in.read();

            // Grab a thread local set of buffers to use temporarily.
            final StringBuffers buf = buffers.get();

            // get a reference to the buffers.
            final ByteBuffer b = buf.buffer;
            final CharBuffer c = buf.charBuffer;

            b.clear();
            c.clear();

            // read the stream into the byte buffer (note: a single read() may return fewer than len bytes).
            in.getStream().read(b.array(), 0, len);
            b.limit(len);

            // decode the bytes into the char buffer.
            final CharsetDecoder decoder = buf.decoder;
            decoder.reset();
            decoder.decode(b, c, true);

            // flip the char buffer.
            c.flip();

            // copy the decoded chars into a new String (this allocates the String and its char[]).
            final String str = c.toString();

            // finally set the string value via method handle.
            setHandle.invoke(o, str);
        }
    }

For writing Strings we've tried a similar method:

    public static final class StringBuffers
    {
        ByteBuffer buffer = ByteBuffer.allocate(512);
        CharsetEncoder encoder = Charset.forName("UTF8").newEncoder();
    }

    public static final class U8Utf8MethodHandleWriter extends AbstractWriter
    {
        private final ThreadLocal<StringBuffers> buffers = new ThreadLocal<StringBuffers>()
        {
            @Override
            public StringBuffers initialValue()
            {
                return new StringBuffers();
            }
        };

        public U8Utf8MethodHandleWriter(final MethodHandle getHandle)
        {
            super(getHandle);
        }

        @Override
        public void write(final Object o, final TypeOutputStream out) throws Throwable
        {
            // get the string value via the getter method handle.
            final String str = (String) getHandle.invoke(o);

            final OutputStream os = out.getStream();

            // null strings are written as a single 0 byte.
            if (str == null)
            {
                os.write(0);
                return;
            }

            // Grab a thread local set of buffers to use temporarily.
            final StringBuffers buf = buffers.get();

            // get a reference to the buffers.
            final ByteBuffer b = buf.buffer;

            // this does allocate an object, but at least it isn't copying the buffer!
            final CharBuffer c = CharBuffer.wrap(str);

            // clear the byte buffer.
            b.clear();

            // encode the chars into the byte buffer.
            final CharsetEncoder encoder = buf.encoder;
            encoder.reset();
            encoder.encode(c, b, true);

            // flip the byte buffer.
            b.flip();

            final int size = b.limit();

            if (size > 255)
            {
                throw new TypeException("u8utf8: String length exceeded max length of 255.  len =" + size);
            }

            if (writeNotNull)
            {
                os.write(1);
            }

            os.write(size);
            os.write(b.array(), 0, size);
        }
    }

The offending CharBuffer.wrap(str) currently allocates 766MB in a one minute period and has the largest allocation profile for the application. 
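
One alternative that avoids the CharBuffer altogether is to encode the String by hand, reading it with charAt() and writing UTF-8 into a reusable byte[]. The following is a minimal sketch under stated assumptions, not a tuned implementation: Utf8Scratch, encode() and bytes() are hypothetical names, one instance per thread is assumed (for example held in the existing ThreadLocal holder), and unpaired surrogates are replaced with '?'.

```java
// Hypothetical helper: encodes a CharSequence to UTF-8 into a reusable byte[]
// without creating a CharBuffer or an intermediate byte[] per call.
final class Utf8Scratch
{
    private final byte[] scratch;

    Utf8Scratch(final int capacity)
    {
        scratch = new byte[capacity];
    }

    // The reusable buffer; only the first encode() bytes are valid.
    byte[] bytes()
    {
        return scratch;
    }

    // Returns the number of UTF-8 bytes written, or -1 if the buffer is too small.
    int encode(final CharSequence str)
    {
        final byte[] out = scratch;
        final int n = str.length();
        int p = 0;

        for (int i = 0; i < n; i++)
        {
            final char ch = str.charAt(i);
            int cp = ch;

            if (Character.isHighSurrogate(ch) && i + 1 < n && Character.isLowSurrogate(str.charAt(i + 1)))
            {
                // valid surrogate pair: combine into one code point.
                cp = Character.toCodePoint(ch, str.charAt(++i));
            }
            else if (Character.isSurrogate(ch))
            {
                // assumption: substitute '?' for an unpaired surrogate.
                cp = '?';
            }

            final int need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (p + need > out.length)
            {
                return -1;
            }

            if (need == 1)
            {
                out[p++] = (byte) cp;
            }
            else if (need == 2)
            {
                out[p++] = (byte) (0xC0 | (cp >> 6));
                out[p++] = (byte) (0x80 | (cp & 0x3F));
            }
            else if (need == 3)
            {
                out[p++] = (byte) (0xE0 | (cp >> 12));
                out[p++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                out[p++] = (byte) (0x80 | (cp & 0x3F));
            }
            else
            {
                out[p++] = (byte) (0xF0 | (cp >> 18));
                out[p++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
                out[p++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                out[p++] = (byte) (0x80 | (cp & 0x3F));
            }
        }

        return p;
    }
}
```

In the writer above, the CharBuffer.wrap(str) and encoder calls could then be replaced by something like `final int size = scratch.encode(str);` followed by the existing length check and `os.write(scratch.bytes(), 0, size)`. Whether the hand-written loop beats the JDK encoder would need to be measured.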

Interested if anyone else has found a better solution for this or can suggest alternative solutions.

Thanks,
David.




Avi Kivity

Jul 19, 2017, 3:28:32 AM
to mechanica...@googlegroups.com, David Ryan

Out of curiosity, if you're doing heavy duty processing, why did you choose a garbage-collected language?

--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

David Ryan

Jul 19, 2017, 3:44:27 AM
to Avi Kivity, mechanica...@googlegroups.com
Many reasons for choosing Java.  Very good, proven library support for a start.  Proven performance.  Many good off-heap libraries and designs that have shown it works well for heavy-duty processing, lots of great tools such as Java Mission Control for debugging, etc.

Of course, just after I sent the post I found a way to do what I wanted.  For some reason I had missed the CharBuffer.put(String) and CharBuffer.append(CharSequence) methods.  I've since updated the code and now get lower memory pressure.

The next stage is to perform string de-duplication before actually creating the String.  That is useful for some specific fields in the application that act as keys, etc.
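
As a rough illustration of that de-duplication idea, the decoded chars in the CharBuffer can be hashed and compared against a small cache, so a String is only allocated on a miss. This is a minimal sketch, assuming a single-threaded (or per-thread) cache; StringCache and intern() are hypothetical names, and the direct-mapped, overwrite-on-collision policy is just one simple choice.

```java
import java.nio.CharBuffer;

// Hypothetical direct-mapped cache: returns an existing String for the chars
// remaining in a CharBuffer, allocating a new String only on a cache miss.
final class StringCache
{
    private final String[] slots;
    private final int mask;

    StringCache(final int size)
    {
        if (Integer.bitCount(size) != 1)
        {
            throw new IllegalArgumentException("size must be a power of two");
        }
        slots = new String[size];
        mask = size - 1;
    }

    // Does not change the buffer's position or limit.
    String intern(final CharBuffer buf)
    {
        final int start = buf.position();
        final int len = buf.remaining();

        // hash the chars in place using absolute gets.
        int h = 0;
        for (int i = 0; i < len; i++)
        {
            h = 31 * h + buf.get(start + i);
        }

        final int idx = (h ^ (h >>> 16)) & mask;
        final String cached = slots[idx];
        if (cached != null && matches(cached, buf, start, len))
        {
            return cached;
        }

        // miss: allocate the String once and remember it (evicting any previous entry).
        final String fresh = buf.toString();
        slots[idx] = fresh;
        return fresh;
    }

    private static boolean matches(final String s, final CharBuffer buf, final int start, final int len)
    {
        if (s.length() != len)
        {
            return false;
        }
        for (int i = 0; i < len; i++)
        {
            if (s.charAt(i) != buf.get(start + i))
            {
                return false;
            }
        }
        return true;
    }
}
```

In the reader, `c.toString()` would become something like `cache.intern(c)` for the key-like fields only; for high-cardinality fields the cache would mostly miss and just add CPU cost.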

Here's the code for anyone interested. 

    public static final class StringBuffers
    {
        ByteBuffer buffer = ByteBuffer.allocate(512);
        CharBuffer charBuffer = CharBuffer.allocate(300);
        CharsetEncoder encoder = Charset.forName("UTF8").newEncoder();
    }

    public static final class U8Utf8MethodHandleWriter extends AbstractWriter
    {
        private final ThreadLocal<StringBuffers> buffers = new ThreadLocal<StringBuffers>()
        {
            @Override
            public StringBuffers initialValue()
            {
                return new StringBuffers();
            }
        };

        public U8Utf8MethodHandleWriter(final MethodHandle getHandle, final boolean writeNotNull)
        {
            super(getHandle, writeNotNull);

        }

        @Override
        public void write(final Object o, final TypeOutputStream out) throws Throwable
        {
            // get the string from the getter.

            final String str = (String) getHandle.invoke(o);

            final OutputStream os = out.getStream();

            // null strings are written as a single 0 byte.

            if (str == null)
            {
                os.write(0);
                return;
            }

            // Grab a thread local set of buffers to use temporarily.
            final StringBuffers buf = buffers.get();

            // get a reference to the buffers.
            final ByteBuffer b = buf.buffer;

            // copy the string into the reusable charBuffer. Better than CharBuffer.wrap(str) because it doesn't allocate a new buffer.
            // this trades CPU (the copy) for fewer allocations.

            final CharBuffer c = buf.charBuffer;
            c.clear();
            c.append(str);
            c.flip();


            // clear the byte buffer.
            b.clear();

            // encode the chars into the byte buffer.
            final CharsetEncoder encoder = buf.encoder;
            encoder.reset();
            encoder.encode(c, b, true);

            // flip the byte buffer.
            b.flip();

            final int size = b.limit();

            if (size > 255)
            {
                throw new TypeException("u8utf8: String length exceeded max length of 255.  len =" + size);
            }

            if (writeNotNull)
            {
                os.write(1);
            }

            os.write(size);
            os.write(b.array(), 0, size);
        }
    }

Nikolay Tsankov

Jul 19, 2017, 5:44:57 AM
to mechanica...@googlegroups.com
I think Martin explained the reasoning quite well in his talk "High Performance Managed Languages".
