Avoid escaping/unescaping to improve performance?


Mark Li

Aug 23, 2014, 2:54:36 AM
to ocaml...@googlegroups.com
Hi,

I understand that Writer.write_line appends a newline as a delimiter, so Reader.read_line can read up to that newline to get one complete message.
But the data I need to transfer is encrypted and may contain characters that need to be escaped, including the newline character. What I do now is encrypt the data, escape it, and then call Writer.write_line. When I read the data back with Reader.read_line, I first unescape it by calling a function in Core_extended.Extended_string and then decrypt the result.

Is there a way to do the same thing without escaping and unescaping? I think escaping/unescaping costs some performance and could perhaps be avoided.

Malcolm Matalka

Aug 23, 2014, 5:03:51 AM
to ocaml...@googlegroups.com

Why does the data need to be escaped?  Can't you just transfer the raw bytes with some length information?


Mark Li

Aug 23, 2014, 5:10:54 AM
to ocaml...@googlegroups.com
I guess that after encryption the data contains some newline ('\n') characters. If I call Reader.read_line, I only get part of the data, which cannot be decrypted because I use the AES block cipher.

Malcolm Matalka

Aug 23, 2014, 5:12:36 AM
to ocaml...@googlegroups.com

Right, but if you transmit length and the data you don't have to use read_line.
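
A minimal sketch of the length-plus-data idea, using only the OCaml standard library. The function names `frame` and `unframe` and the 4-byte big-endian header are assumptions made for illustration; they are not part of Async or of the original poster's code.

```ocaml
(* Prefix a payload with its length as a 4-byte big-endian integer.
   Hypothetical helper; assumes the payload is shorter than 2^32 bytes. *)
let frame (payload : string) : string =
  let n = String.length payload in
  let header = Bytes.create 4 in
  for i = 0 to 3 do
    (* Byte 0 holds the most significant 8 bits of the length. *)
    Bytes.set header i (Char.chr ((n lsr (8 * (3 - i))) land 0xff))
  done;
  Bytes.to_string header ^ payload

(* Try to read one frame from [buf] starting at [pos].
   Returns the payload and the position just past it, or None if the
   buffer does not yet hold a complete frame. *)
let unframe (buf : string) (pos : int) : (string * int) option =
  if String.length buf - pos < 4 then None
  else begin
    let n = ref 0 in
    for i = 0 to 3 do
      n := (!n lsl 8) lor Char.code buf.[pos + i]
    done;
    if String.length buf - pos - 4 < !n then None
    else Some (String.sub buf (pos + 4) !n, pos + 4 + !n)
  end
```

Because the receiver counts bytes instead of scanning for a delimiter, the payload may contain newlines or any other byte value, and no escaping pass is needed.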

Mark Li

Aug 23, 2014, 5:19:17 AM
to ocaml...@googlegroups.com
Do you mean that I can first write the length of the bytes to be transferred, then read the length, and then read up to that number of bytes? But that means the data being transferred will have a pattern, a header indicating the length. I want to figure out a way to avoid data having any header or showing any pattern.


David House

Aug 23, 2014, 6:29:18 AM
to ocaml...@googlegroups.com

The only ways I know of for encoding strings are:

1. Explicitly annotate with the length.

2. Have a termination character, like null or newline, and escape the character wherever it appears in your string.

3. Have a termination character, and ensure somehow that that character never appears in your string. (E.g. C null-terminated strings.)

You could maybe do this by mapping your byte stream to an encoding that uses a smaller alphabet, e.g. by base64-encoding your byte stream. But this is just as computationally expensive as escaping, and furthermore there is still the problem of having an obvious pattern: rather than being preceded by a length, your data will be followed by a termination character.

4. Have all strings be a fixed length. That might be an option for you: perhaps you could split your plaintext into chunks of a fixed length, pad the last chunk so it is also that length, then encrypt the chunks. Presumably then your ciphertext is also of a fixed length, which could be known by your decoder.
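
As a rough illustration of option 4 on the receiving side, the sketch below cuts a received byte string into records of a known, fixed size. The name `split_fixed` is hypothetical, and the code assumes the whole stream is already in memory, which a real Async reader would not require.

```ocaml
(* Split [stream] into records of exactly [size] bytes.
   Raises Invalid_argument if the stream is not a whole number of records.
   Hypothetical helper for illustration only. *)
let split_fixed ~(size : int) (stream : string) : string list =
  let len = String.length stream in
  if size <= 0 || len mod size <> 0 then
    invalid_arg "split_fixed: stream is not a whole number of records";
  List.init (len / size) (fun i -> String.sub stream (i * size) size)
```

No delimiter and no length header appear on the wire here; the record size is an agreement between sender and receiver.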

What do standard encryption tools tend to do in this case? This feels like it should be a solved problem.

Malcolm Matalka

Aug 23, 2014, 6:51:49 AM
to ocaml...@googlegroups.com

You have to encode the length somehow.  Currently you're doing it implicitly with newlines, which is semantically equivalent to giving the length at the beginning.  Giving it at the beginning just means you don't have to deal with escapes.

Mark Li

Aug 23, 2014, 6:59:04 AM
to ocaml...@googlegroups.com
Thank you, David.

My code currently uses the second approach, and I think it may cost me some performance.
The encryption library I use is Xavier Leroy's Cryptokit. My encryptor and decryptor are simply written as below:

  (* Assumes Async (for return) and Cryptokit (for Cipher, Padding and
     transform_string) are already opened. *)
  let encryptor ~key ~iv ~plain =
    let encBox = Cipher.aes ~pad:Padding.length ~iv key Cipher.Encrypt in
    return (transform_string encBox plain)

  let decryptor ~key ~iv ~ctext =
    let decBox = Cipher.aes ~pad:Padding.length ~iv key Cipher.Decrypt in
    return (transform_string decBox ctext)

Signature:
 val encryptor : key:string -> iv:string -> plain:string -> string Deferred.t

 val decryptor : key:string -> iv:string -> ctext:string -> string Deferred.t

Say the length of the data is 16 bytes. The encrypted data will be padded to a length of 32 bytes.
If it is 15 bytes, it will be padded to 16 bytes.

Option 4 seems nice because I am already reading the plain text into a fixed-size buffer (4096 bytes, for example), so I can naturally process the data in chunks. The last chunk I read may be shorter than 4096 bytes, in which case I can pad it to 4096 bytes. All plain text chunks would then be 4096 bytes, and the encrypted chunks should all have the same length too (maybe 4112?). I would then need to hardcode the number 4112 in my program...
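
The sizes mentioned above can be checked with a small calculation. This sketch assumes the padding scheme always adds between 1 and 16 bytes, as the 16-to-32 example suggests, and that nothing else (such as the IV) is added to the output; `padded_length` is a hypothetical name, not a Cryptokit function.

```ocaml
let block_size = 16  (* AES block size in bytes *)

(* Ciphertext length for [n] plaintext bytes, assuming the padding always
   adds at least 1 and at most [block_size] bytes. *)
let padded_length (n : int) : int = (n / block_size + 1) * block_size

let () =
  (* Prints "16 32 4112" for inputs of 15, 16 and 4096 bytes. *)
  Printf.printf "%d %d %d\n"
    (padded_length 15) (padded_length 16) (padded_length 4096)
```

Under these assumptions a 4096-byte plaintext chunk does come out as 4112 bytes, so computing the constant from the chunk size may be preferable to hardcoding it.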

The only problem left is how to pad that last chunk of data and restore it... I would appreciate it if you could offer some advice.
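
One possible way to pad and restore the last chunk, sketched with the standard library only: append a single 0x80 marker byte followed by zero bytes, and on the receiving side strip the trailing zeros and the marker. This marker-and-zeros scheme is a suggestion that nobody in the thread proposed, and `pad` and `unpad` are hypothetical names. It is used here because a count-based padding byte cannot express more than 255 bytes of padding, which a 4096-byte chunk may need.

```ocaml
(* Pad [s] to exactly [size] bytes: one 0x80 marker byte, then zero bytes.
   Requires String.length s < size so that the marker always fits. *)
let pad ~(size : int) (s : string) : string =
  let len = String.length s in
  if len >= size then invalid_arg "pad: chunk is already full";
  s ^ "\x80" ^ String.make (size - len - 1) '\000'

(* Undo [pad]: drop the trailing zero bytes and the 0x80 marker.
   Returns None if the chunk does not end in a valid marker. *)
let unpad (chunk : string) : string option =
  let rec last_nonzero i =
    if i >= 0 && chunk.[i] = '\000' then last_nonzero (i - 1) else i
  in
  let i = last_nonzero (String.length chunk - 1) in
  if i >= 0 && chunk.[i] = '\x80' then Some (String.sub chunk 0 i) else None
```

If the final read happens to fill the buffer exactly, the sender would follow it with one extra chunk of pure padding (`pad ~size ""`), so that the receiver can always call `unpad` on whichever chunk arrives last. The receiver still has to know which chunk is the last one, for example because the connection is closed afterwards.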

Thanks.

Mark Li

Aug 23, 2014, 7:01:59 AM
to ocaml...@googlegroups.com
Yeah, you are right, Malcolm. I am just doing it implicitly, which makes no difference. This means that both indicating the length at the beginning and adding a newline character at the end are ruled out for me now.