Delimited Codec Behavior

James Bridges

Jul 9, 2015, 2:23:44 PM
to reactor-...@googlegroups.com
Hello,

I am trying to get a feel for codec behavior, so I decided to write a little test program:

public class CodecTest {
  private final static String filePath = "C:\\Users\\test\\test.txt";

  public static void main(String[] args) {
    Environment.initialize().assignErrorJournal();
      
    Streams.wrap(IO.readFile(filePath, 10))
        // 32 is the byte value of a space
        .decode(new DelimitedCodec<>((byte) 32, true, StandardCodecs.STRING_CODEC))
        .consume(d -> System.out.println(d), Throwable::printStackTrace, d -> System.out.println("EOF"));
  }
}



Where test.txt contains the following text:

Hello, the quick brown fox jumps over the lazy dog.

If I omit the chunk size argument (i.e., 10), so that Buffer.SMALL_BUFFER_SIZE is used, I receive the following expected output:

Hello,
the
quick
brown
fox
jumps
over
the
lazy
EOF


I believe "dog." did not print because it is not followed by a space.

If I keep the chunk size at 10, I get the following output:

Hello,

quick
wn
fox
ps
over
e
lazy

EOF

In the smaller-buffer case, it looks as if the buffers are not aggregated in anticipation of another delimiter, and the operations further down the flow are limited by the parent's buffer size. Is this the expected behavior?
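
The output above is consistent with each 10-byte chunk being decoded on its own, with any bytes after the last space in a chunk dropped instead of carried into the next chunk. The plain-Java sketch below models that hypothesis only; it is not Reactor's implementation, and the names PerChunkSplitModel, decodeChunk, and decodeAll are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PerChunkSplitModel {
    // Hypothetical model: split a single chunk on the delimiter and
    // discard whatever follows the last delimiter (no carry-over).
    static List<String> decodeChunk(byte[] chunk, byte delimiter) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < chunk.length; i++) {
            if (chunk[i] == delimiter) {
                tokens.add(new String(chunk, start, i - start, StandardCharsets.US_ASCII));
                start = i + 1;
            }
        }
        return tokens; // bytes from 'start' to the end of the chunk are lost
    }

    // Cut the input into fixed-size chunks and decode each one in isolation.
    static List<String> decodeAll(byte[] data, int chunkSize, byte delimiter) {
        List<String> out = new ArrayList<>();
        for (int off = 0; off < data.length; off += chunkSize) {
            int end = Math.min(off + chunkSize, data.length);
            out.addAll(decodeChunk(Arrays.copyOfRange(data, off, end), delimiter));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] text = "Hello, the quick brown fox jumps over the lazy dog."
                .getBytes(StandardCharsets.US_ASCII);
        decodeAll(text, 10, (byte) 32).forEach(System.out::println);
    }
}
```

With a chunk size of 10 this model yields the same tokens as the run above, from "Hello," through "lazy", including the empty token produced by the space that starts the second chunk. It does not try to account for the blank line printed just before EOF.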

I ask because what I am ultimately trying to do is convert a stream of bytes into messages and then pass each message along the stream. The stream of bytes uses a magic word and a length to separate the messages. There is no guarantee that a read from the file (based on chunk size) will contain exactly one full message; it could contain part of a message or several messages, so the ability to buffer is necessary. Codec may not be the best way, but it seemed straightforward at first (because the stream is ultimately a collection of messages); map might have its uses. There shouldn't be multiple threads hitting this bit of code either, since its job is only to create messages for downstream subscribers. Any insight would be great.
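
To make the framing problem concrete, here is a minimal plain-Java sketch of a stateful accumulator that buffers across reads. The wire layout is an assumption made for illustration (a 4-byte magic word followed by a 4-byte big-endian payload length), and FrameAccumulator, MAGIC, and feed are hypothetical names, not Reactor API.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FrameAccumulator {
    static final int MAGIC = 0x4D534731; // hypothetical magic word ("MSG1")
    static final int HEADER = 8;         // assumed: 4-byte magic + 4-byte length

    // Bytes received so far that do not yet form a complete message.
    private byte[] pending = new byte[0];

    // Accepts one chunk and returns every message completed by it.
    // Not thread-safe; intended for a single producer thread.
    List<byte[]> feed(byte[] chunk) {
        byte[] joined = Arrays.copyOf(pending, pending.length + chunk.length);
        System.arraycopy(chunk, 0, joined, pending.length, chunk.length);

        ByteBuffer buf = ByteBuffer.wrap(joined); // big-endian by default
        List<byte[]> messages = new ArrayList<>();
        while (buf.remaining() >= HEADER) {
            buf.mark();
            if (buf.getInt() != MAGIC) {
                throw new IllegalStateException("bad magic word");
            }
            int length = buf.getInt();
            if (length < 0) {
                throw new IllegalStateException("negative length");
            }
            if (buf.remaining() < length) {
                buf.reset(); // incomplete payload: wait for more bytes
                break;
            }
            byte[] payload = new byte[length];
            buf.get(payload);
            messages.add(payload);
        }
        // Keep the unconsumed tail for the next call.
        pending = Arrays.copyOfRange(joined, buf.position(), joined.length);
        return messages;
    }

    public static void main(String[] args) {
        ByteBuffer b = ByteBuffer.allocate(HEADER + 3);
        b.putInt(MAGIC).putInt(3).put(new byte[] {1, 2, 3});
        byte[] frame = b.array();

        FrameAccumulator acc = new FrameAccumulator();
        // Feed the frame in two pieces to show buffering across reads.
        System.out.println(acc.feed(Arrays.copyOfRange(frame, 0, 6)).size());
        System.out.println(acc.feed(Arrays.copyOfRange(frame, 6, frame.length)).size());
    }
}
```

A single call may return zero, one, or several messages, which covers both the partial-read and the multiple-messages-per-read cases. Resynchronizing after a bad magic word and wiring the accumulator into a stream are left out of this sketch.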

Thanks,
James

Stephane Maldini

Jul 9, 2015, 6:14:55 PM
to James Bridges, reactor-framework
Codecs are under review for 2.1 for exactly the reason you highlighted: they may need to aggregate over multiple chunks, and they may also produce more results than expected (multiple delimiters inside the same chunk).

There is still an issue with "dog", as you highlighted, which might be fixed in an upcoming release (we have just released 2.0.4). Codec should be your answer in the end, but right now you may have to work around the issue, for instance by using a Stream.
Suggestions are welcome; the place where we plan to address this is the BufferCodec.decode/encode methods.
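
As one possible shape for such a workaround, the sketch below keeps the bytes after the last delimiter and prepends them to the next chunk, then flushes the remainder at end of input. It is plain Java and independent of Reactor; CarryOverSplitter, feed, and finish are hypothetical names, and how it would be plugged into a Stream or into BufferCodec is left open.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class CarryOverSplitter {
    private final byte delimiter;
    // Bytes seen since the last delimiter, kept across chunks.
    private final ByteArrayOutputStream carry = new ByteArrayOutputStream();

    CarryOverSplitter(byte delimiter) {
        this.delimiter = delimiter;
    }

    // Returns the tokens completed by this chunk; a token may span chunks.
    List<String> feed(byte[] chunk) {
        List<String> tokens = new ArrayList<>();
        for (byte b : chunk) {
            if (b == delimiter) {
                tokens.add(drain());
            } else {
                carry.write(b);
            }
        }
        return tokens;
    }

    // Call once at end of input to emit a trailing token without a delimiter.
    List<String> finish() {
        List<String> tokens = new ArrayList<>();
        if (carry.size() > 0) {
            tokens.add(drain());
        }
        return tokens;
    }

    private String drain() {
        String token = new String(carry.toByteArray(), StandardCharsets.US_ASCII);
        carry.reset();
        return token;
    }

    public static void main(String[] args) {
        byte[] text = "Hello, the quick brown fox jumps over the lazy dog."
                .getBytes(StandardCharsets.US_ASCII);
        CarryOverSplitter splitter = new CarryOverSplitter((byte) 32);
        for (int off = 0; off < text.length; off += 10) {
            int end = Math.min(off + 10, text.length);
            splitter.feed(java.util.Arrays.copyOfRange(text, off, end))
                    .forEach(System.out::println);
        }
        splitter.finish().forEach(System.out::println);
    }
}
```

Fed the sample sentence in 10-byte chunks, this emits every word whole, and finish() emits the trailing "dog." that has no delimiter after it.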

--
Stephane Maldini | Solutions Architect, CSO EMEA | London | Pivotal
