Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

need performance help please - xor of string

79 views
Skip to first unread message

tombert

unread,
Jul 26, 2018, 4:43:01 AM7/26/18
to
Hi all,

I was bit shocked that base64 encoding is so much faster than my tries of simple xor of two bytes. I needed a simple scrambling algorithm making text not readable for humans.
Any ideas of a solution that run faster than the TCL native base64 encoding?

thx

Here's how I tested:

Creating 3MB of data:

string length [set t [string repeat "abc" 1048576]]

Two alternatives to [binary encode base64]:

proc strxor1 {str} {
set result {}
foreach char [split $str ""] {
append result [binary format c [expr {[scan $char %c] ^ 42}]]
}
return $result
}

proc strxor2 {str} {
set result {}
binary scan $str c* chars
foreach char $chars {
append result [binary format c [expr {$char ^ 42}] c]
}
return $result
}


**************************

> time {binary encode base64 $t} 100
11641.15 microseconds per iteration

> time {strxor1 $t} 5
4722118.2 microseconds per iteration

> time {strxor2 $t} 5
1749262.2 microseconds per iteration

Christian Gollwitzer

unread,
Jul 26, 2018, 6:48:49 AM7/26/18
to
Am 26.07.2018 um 10:42 schrieb tombert:
> Hi all,
>
> I was bit shocked that base64 encoding is so much faster than my tries of simple xor of two bytes. I needed a simple scrambling algorithm making text not readable for humans.
> Any ideas of a solution that run faster than the TCL native base64 encoding?

Do you really mean faster than base64, or faster than your experiments
with scrambling?

This is not surprising to me at all. Tcl is slow. If you can't use a
simple function in C (via tcc4tcl maybe?), which would give you an
expected speedup between 10-100x, then maybe you can put together
something with "string map". A simple Cesar chiffre would be possible,
e.g. I guess that string map is still faster than your loop for this case.

Christian



Christian

tombert

unread,
Jul 26, 2018, 8:34:06 AM7/26/18
to
I really mean faster than base64, or at least the same speed without 30% more data.

I now implemented an [xor] in C as a package. It gives me same speed.

Rich

unread,
Jul 26, 2018, 9:47:12 AM7/26/18
to
tombert <tomber...@live.at> wrote:
> Hi all,
>
> I was bit shocked that base64 encoding is so much faster than my
> tries of simple xor of two bytes.

Why would you be shocked? The base64 command is implemented in C, your
xor's below are in Tcl. C will in general always be faster than Tcl,
esp. for the type of use below.

> I needed a simple scrambling algorithm making text not readable for
> humans. Any ideas of a solution that run faster than the TCL native
> base64 encoding?

How not readable do you want. Just "not recognizable as human text", or
"a determined adversary can not work out the origional input", or
somewhere in between?

Someone else in this thread mentioned string map. With an approriate
mapping list, it might in the same speed range as the base64 command.

> string length [set t [string repeat "abc" 1048576]]
> ...
> proc strxor1 {str} {
> set result {}
> foreach char [split $str ""] {
> append result [binary format c [expr {[scan $char %c] ^ 42}]]
> }
> return $result
> }
>
> proc strxor2 {str} {
> set result {}
> binary scan $str c* chars
> foreach char $chars {
> append result [binary format c [expr {$char ^ 42}] c]
> }
> return $result
> }

In both of those above you end up creating 1,048,576 Tcl_Obj's (which
each have to be allocated the first time they get created), then in Tcl
you iterate each Tcl_Obj performing a single xor of one byte stored in
each obj. So there are two pointer indirections (one to go from the
list to the Tcl_Obj, one to go from the Tcl_Obj to the single byte)
plus the xor. The append is likely reasonably quick compared to the
rest of the loop, but it is also incurring two pointer indirections
(get to Obj, get to string) plus Tcl overhead, to append single bytes.
Then, when the foreach exist, all 1,048,576 Tcl_Obj's have to be
deallocated (this will be faster than the alloc, as Tcl will just put
them on an internal free list rather than calling the C free() function
on each).

Neither one of these will match a C variant that can do single pointer
lookups, and direct overwrite (presuming preallocation of the arrays).

for (i=0; i<array_length; i++) {
r[i] = a[i] ^ b[i];
}

That loop above in C would likely compile to an assembly loop of with a
loop core of only 3-4 instructions (the xor itself, an increment for
the pointer, a compare for the array_length (unless the compiler
recognized it could decrement towards zero with the same result) and a
branch to the top of the loop. Tcl will never beat a core loop that is
only 3-4 native CPU instructions.

As another poster mentioned, if you really do want to bulk xor a large
block of data, you'll be much better off performance wise creating a
small Tcl C extension to do the xor operation on the data.

sled...@gmail.com

unread,
Jul 26, 2018, 2:40:32 PM7/26/18
to
How about Blowfish...its available with tcl
Message has been deleted

tombert

unread,
Jul 30, 2018, 5:13:50 AM7/30/18
to
It's pure TCL and not fast enough.
But thx for the tip, I am using blowfish anyhow for another issue.

tombert

unread,
Jul 30, 2018, 5:19:13 AM7/30/18
to
On Thursday, 26 July 2018 15:47:12 UTC+2, Rich wrote:
I followed your advice and did C code, but run into another issue puzzling me: https://groups.google.com/forum/#!topic/comp.lang.tcl/QN9QNKJ_0g0
0 new messages