
encrypt/decrypt with blowfish


.

Jun 5, 2019, 12:17:40 AM
Hello,

I need your assistance/guidance on an encryption boundary condition, and suggestions on the best way to handle it given the narrow scope of my problem and the assumptions we can make about solving it (listed at the bottom, after the testcase).

Problem Statement:
There is a tcl file (myProcs.tcl) that I need to encrypt, decrypt, then source in a tclsh shell.

However, I find at the end of the decrypted file there are extra "encrypted chars" such that when it is sourced in a tclsh shell the last line in the file throws an error. (i.e. "source myProcs.tcl") (testcase below)

As a workaround -- I comment out this last line after decryption (ie. everything after the last newline char is trimmed) -- but I would like to handle this better.

I presume these extra chars are actually introduced at the encryption stage, because I am not handling the multiple-of-8-bytes requirement that blowfish (like other block ciphers) places on the data it receives:

"Blowfish is a 64-bit block cipher. This means that the data must be provided in units that are a multiple of 8 bytes. The blowfish command will by default add nul characters to pad the input data to a multiple of 8 bytes if necessary"

I have used the -in <channel> option of the "blowfish" command to let it chunk the file. However, the padding gets added at the end of the file, and that is the crux of the problem: I was expecting that padding with a blank char would simply "work", but it does not.

So should I fix it by either:

a) pre-processing the input file into a multiple of 8 bytes (adding padding that is obviously file/format specific but for my purposes padding with 'blank space char' is okay), then encrypt it?

or

b) is there a way to leverage the -pad <char> option of the ::blowfish::blowfish command such that I can pass it an arbitrary "tcl file" and have it encrypt and add "blank spaces"?

or

c) other?

I tried option (b) but it seems to have extra chars on the last line of the decrypted file.

TESTCASE:

./myProcs.tcl
proc hello {args} {
    puts "$args"
}
proc world {args} {
    puts "$args"
}

./test.tcl
package require blowfish
# 1.0.4

set inFile "myProcs.tcl"
set outFile "myProcs.dat"
set outDecrypt "myProcs.decrypt.tcl"

set key "KEY"
set padChar " "

set handle(in) [open $inFile rb]
set handle(out) [open $outFile wb]

# Encrypt and write: <>.dat file
::blowfish::blowfish -mode ecb -dir encrypt -key $key -out $handle(out) -pad $padChar -in $handle(in)
close $handle(in)
close $handle(out)

# Decrypt and write: <>.decrypt.tcl
set handle(in) [open $outFile rb]
set handle(out) [open $outDecrypt wb]
::blowfish::blowfish -mode ecb -dir decrypt -key $key -out $handle(out) -pad $padChar -in $handle(in)
close $handle(in)
close $handle(out)

Output File:
% cat myProcs.decrypt.tcl
1 proc hello {args} {
2 puts "$args"
3 }
4 proc world {args} {
5 puts "$args"
6 }
7 ^@^@^@^@^@^@^@^@^@^@^@^@^@^@à:Ás KH?


ASSUMPTIONS:
The assumptions we can make for viable solutions are as follows (and this should simplify the problem space quite a bit):

blowfish version = 1.0.4 from tcllib
(can be updated if needed)

I control/create the input file and can modify or preprocess it if needed.

I create the output files and can post process any if needed.

I do not care about speed. I do care about that last line (line 7) in the output file.

You can assume the input file format (content of the file) is always going to be composed of just tcl code such that "source myProcs.tcl" can be done on it. The input file is created by the concatenation of tcl source files into this single file:
Example: cat *.tcl > myProcs.tcl
This creation can be more elaborate if needed to help simplify or resolve the issue above.

Security of the overall solution is handled separately. This is a reduction of the problem to its essential part, just for the purpose of figuring out the best way to handle the boundary condition issue above (or getting suggestions).

If I missed anything -- please let me know...I can add details.

Thank you,
Jim

s.effe...@googlemail.com

Jun 5, 2019, 4:30:53 AM
Looking at the procedure blowfish::Chunk I can see several problems at once.

1. There's a FIXME comment and it's true. It's especially problematic if the stream is non-blocking and returns less than the desired chunk size even if it's not at the end of the stream.

2. The blowfish::Pad procedure pads an empty string to a string with 8 bytes. This is problematic as reading a stream via file events causes the callback to be called at least twice. Once for the data and once for EOF, that means you'll always have 8 more bytes in addition to the regular padding.

3. The Pad procedure is called without the user provided padding char.

It seems that all problems are neatly gathered in the input stream procedure. You may quickly solve your problem if you read the input yourself and pass it as plain data to the blowfish module.

-- Stephan

Rich

Jun 5, 2019, 7:42:35 AM
. <lmn...@gmail.com> wrote:
> Problem Statement:
> There is a tcl file (myProcs.tcl) that I need to encrypt, decrypt,
> then source in a tclsh shell.
>
> However, I find at the end of the decrypted file there are extra
> "encrypted chars" such that when it is sourced in a tclsh shell the
> last line in the file throws an error. (i.e. "source myProcs.tcl")
> (testcase below)

Yes, block ciphers (of which blowfish is one) always operate on data
that is an exact multiple of the block cipher's primitive size (8 bytes
for blowfish).

In cryptology, the handling of arbitrary size inputs while encrypting
with a block cipher is a task that is to be taken care of at a
higher level than the block cipher itself (i.e., the block cipher
always sees a multiple of 8 bytes, and some other code handles the
size mismatch).

You have a couple options:

1) store the length of your input prior to encryption somewhere, use
that stored length to truncate the decrypted output back to the
original size

2) add code to provide one of the standard cryptographic padding
primitives prior to passing the data to blowfish, and remove the
standard padding after decryption (this does not require a separate
length storage). A description of several standard crypto padding
primitives can be found here: https://en.wikipedia.org/wiki/Padding_(cryptography)
Please note that some of these padding primitives also introduce
weaknesses that allow for decryption in a time less than brute force
(although don't read that as "allow decryption without having key",
for a cryptographer, less than brute force can still mean
"impossible to do in a reasonable time").

3) assuming the -pad option functions properly, and as long as your
input will always be a Tcl script, provide a control-z (EOF)
character as the parameter to -pad so that the blowfish proc will
pad with ASCII EOF characters. Tcl's source command considers ASCII
EOF to indicate the end of the input file for sourcing Tcl code
(also clearly documented in the 'source' manual page). So when you
source the result of the decrypt, the EOF padding characters will be
legal "end of input" terminators and everything should work.
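
Option 2 above can be sketched in plain Tcl as follows. This is only an illustration of PKCS#7-style padding for an 8-byte block size; the proc names pkcs7_pad and pkcs7_unpad are hypothetical helpers and are not part of tcllib's blowfish module. The padded result would be handed to the cipher, and the unpad step applied after decryption.

```tcl
# Hypothetical helpers (not part of tcllib): PKCS#7-style padding
# for an 8-byte block cipher such as Blowfish.

proc pkcs7_pad {data {blocksize 8}} {
    # Always add 1..blocksize bytes; every pad byte holds the pad count.
    set n [expr {$blocksize - ([string length $data] % $blocksize)}]
    append data [string repeat [binary format c $n] $n]
    return $data
}

proc pkcs7_unpad {data {blocksize 8}} {
    set len [string length $data]
    if {$len == 0 || ($len % $blocksize) != 0} {
        error "padded data length must be a positive multiple of $blocksize"
    }
    # The last byte says how many pad bytes were appended.
    binary scan [string index $data end] cu n
    if {$n < 1 || $n > $blocksize} {
        error "invalid padding"
    }
    set pad [string range $data end-[expr {$n - 1}] end]
    if {$pad ne [string repeat [binary format c $n] $n]} {
        error "invalid padding"
    }
    return [string range $data 0 end-$n]
}

# 17 bytes of script need 7 pad bytes to reach 24.
set padded [pkcs7_pad "proc hello {} {}\n"]
puts [string length $padded]
```

Because a full block of padding is added when the input is already a multiple of 8 bytes, the unpad step is unambiguous and no separate length needs to be stored.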

Option 3, assuming the -pad option functions as documented, and
assuming you always plan to encrypt Tcl scripts, is likely to be the
easiest alternative for you.
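
To illustrate option 3 without involving the cipher at all, the sketch below writes a small script followed by a few control-z bytes (standing in for what -pad "\x1a" would be expected to append) and then sources it. It assumes Tcl 8.6 for [file tempfile]; the variable names are illustrative only.

```tcl
# A script padded with Ctrl-Z (\x1a), as -pad "\x1a" would be expected to do.
set padded "proc hello {args} {\n    return \"hello \$args\"\n}\n"
append padded [string repeat \x1a 5]

# Write it to a scratch file in binary mode (file tempfile needs Tcl 8.6).
set fh [file tempfile tmpPath]
fconfigure $fh -translation binary
puts -nonewline $fh $padded
close $fh

# source stops reading at the first Ctrl-Z, so the padding is never evaluated.
source $tmpPath
puts [hello world]   ;# prints: hello world
file delete $tmpPath
```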

sled...@gmail.com

Jun 5, 2019, 3:32:35 PM
What am I missing about the problem?

If on the same Win7\10 system, that is handled with no issue by the OS. All my tcl scripts are encrypted.

If on another Win7\10 system, use the same user id...again, I have not had any issues with this form of encryption.

If the tcl files are moved to another system (my laptop, for example), then the encryption prevents them from being accessed.

Rich

Jun 5, 2019, 3:56:01 PM
sled...@gmail.com wrote:
> What am I missing about the problem.

The OP is using the Tcllib blowfish module to encrypt their scripts.

.

Jun 5, 2019, 4:14:57 PM
> You may quickly solve your problem if you read the input yourself and pass it as plain data to the blowfish module.

Thank you for the suggestion and for pointing out the issue(s) underlying with the other method.

I solved it as you suggested by reading the input file myself (using a [read <inFile> 4096] loop) that checks whether each chunk is a multiple of 8 bytes, pads it if needed, and then passes it to the Encrypt cmd. In my case, the padChar will only be inserted on the last chunk and adds a few extra blank spaces to the end of the file, which is okay for me.

Key Ingredients:
::blowfish::Init
::blowfish::Encrypt
Tcl cmd: "read <file> 4096" in a loop to process the entire file chunk-by-chunk.
Check each chunk is 8-byte boundary and insert padChar if a chunk is not (this should only occur on the last chunk).

UPDATED TESTCASE

1 package require blowfish
2 # 1.0.4
3
4 set inFile "myProcs.tcl"
5 set outFile "myProcs.dat"
6 set outDecrypt "myProcs.decrypt.tcl"
7
8 set key "KEY"
9
10 set CHUNK_SIZE 4096
11 set padChar " "
12 set iv [string repeat "0" 8]
13
14 set handle(in) [open $inFile rb]
15 set handle(out) [open $outFile wb]
16
17 set _key [::blowfish::Init ecb $key $iv]
18
19 while {1} {
20     set data_chunk [read $handle(in) $CHUNK_SIZE]
21     set len [string length $data_chunk]
22
23     # process a chunk (add padding as needed, should only be on the last chunk):
24     if { $len != 0 } {
25         if { ($len % 8) != 0 } {
26             # add <n> padding chars if chunk is not a multiple of 8-bytes
27             set num_chars_2_pad [expr { 8 - ($len % 8) }]
28             set pad_string [string repeat $padChar $num_chars_2_pad]
29             append data_chunk $pad_string
30         }
31     } else {
32         # last chunk was empty, no need to add 8-bytes in this case...
33         break
34     }
35
36     # process the chunk:
37     set len [string length $data_chunk]
38
39     #puts "\[$data_chunk\],\[$len\],\[[expr { ($len % 8) } ]\]"
40
41     set ciphertext [::blowfish::Encrypt ${_key} ${data_chunk}]
42     #set decrypt [::blowfish::Decrypt ${_key} ${ciphertext}]
43
44     #puts " -- C\[$ciphertext\]\n1\n2\n3"
45     #puts " ++ D\[$decrypt\]\n1\n2\n3"
46     puts -nonewline $handle(out) "$ciphertext"
47
48     # if last chunk was also the end of the file...
49     if { [eof $handle(in)] } {
50         break
51     }
52 }
53 close $handle(in)
54 close $handle(out)

Debug or Inspection Suggestion(s):
Uncomment lines 39 and 42-45 to inspect results.
Change padChar to "J" or another visible char vs current blank char.
Pass a small input file with maybe one or two lines in it, testing different lengths of input files until satisfied of all results with the combinations of inputs desired.
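
The testcase above covers only the encryption half. A matching decryption loop could look like the sketch below. It assumes the same key, ECB mode, a chunk size that is a multiple of 8, and an encrypted file whose size is a multiple of 8 bytes. decrypt_file is a hypothetical helper name, not something provided by tcllib; only ::blowfish::Init, ::blowfish::Decrypt and ::blowfish::Final come from the blowfish package.

```tcl
package require blowfish   ;# from tcllib

# Hypothetical helper: decrypt inPath to outPath chunk by chunk (ECB mode).
proc decrypt_file {key inPath outPath {chunkSize 4096}} {
    set in  [open $inPath rb]
    set out [open $outPath wb]
    # The IV is not used by ECB, but Init still takes one.
    set state [::blowfish::Init ecb $key [string repeat "0" 8]]
    while {1} {
        set chunk [read $in $chunkSize]
        if {$chunk eq ""} {
            break
        }
        # Each chunk is a multiple of 8 bytes, so no padding is needed here.
        puts -nonewline $out [::blowfish::Decrypt $state $chunk]
    }
    ::blowfish::Final $state
    close $in
    close $out
}
```

With the file names from the testcase, this would be called as: decrypt_file "KEY" myProcs.dat myProcs.decrypt.tcl. Any pad characters added during encryption remain at the end of the decrypted file.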

Jim

.

Jun 5, 2019, 4:21:26 PM
On Wednesday, June 5, 2019 at 12:32:35 PM UTC-7, sled...@gmail.com wrote:
> What am I missing about the problem.

See line 7 in the output/result. (FYI: Others offered up great solutions to solve the issue on line 7. I picked one of those solutions, implemented it and proceeded to a successful conclusion. I attached an updated testcase with the solution in that newer post as well.)

s.effe...@googlemail.com

Jun 6, 2019, 4:56:55 AM
Why are you splitting the stuff into chunks of 4096 bytes? Any reason I'm missing? I think blowfish is doing it just in case someone wants to encrypt a multi-GB file.

Meanwhile, I proposed a fix in tcllib's ticket system (where I attached the patch as a user comment instead of attaching a file, *sigh*).

-- Stephan

Rich

Jun 6, 2019, 6:44:27 AM
s.effe...@googlemail.com wrote:
> Why are you splitting the stuff into chunks of 4096 bytes?

No idea, although one guess is it could be to not run out of ram when
encrypting a huge file by constraining the total memory usage.

But much more likely, they just saw some example where someone was
reading in 4096 byte chunks and copied that.

> Any reason I'm missing? I think blowfish is doing it just in case
> someone wants to encrypt a multi-GB file.

No, Blowfish is splitting the data because Blowfish is a block cipher.
https://en.wikipedia.org/wiki/Block_cipher Block ciphers are defined
only in terms of a single block (in Blowfish's case, the size of that
single block is 8 bytes).

To encrypt more than 8 bytes with Blowfish, one has to use the block
cipher in one of the many 'block cipher modes' to extend it to more
than 8 bytes total: https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation

s.effe...@googlemail.com

Jun 6, 2019, 12:23:39 PM
Jim, if the file that you want to encrypt easily fits into RAM (i.e. not a rip of the latest Avengers movie), then all you need to do is to simply provide the contents of the file. It's just the input stream feature of blowfish that has a padding bug.

set handle(in) [open $inFile rb]
set data [read $handle(in)]
close $handle(in)

::blowfish::blowfish -mode ecb -dir encrypt -key $key -out $handle(out) -pad $padChar -- $data

That's the only change you have to make and it's the same simple change for decryption. If you can live with the padding characters in the decrypted file (e.g. if you're really encrypting Tcl scripts) then you're done. If you need an exact replication then you need to remember the original file size as Rich already explained.
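
If an exact replica is needed, one way to "remember the original file size" is to put it in front of the data before encryption. The sketch below is plain Tcl and independent of the cipher. frame_with_length and unframe_with_length are hypothetical helper names, and the 8-byte big-endian header is just one possible layout chosen here so that the header occupies exactly one Blowfish block.

```tcl
# Hypothetical helpers: prefix the data with its length, pad to the block
# size, and recover the exact original bytes after decryption.

proc frame_with_length {data {blocksize 8} {padChar " "}} {
    set len [string length $data]
    # 8-byte big-endian length header (exactly one Blowfish block).
    set out [binary format W $len]
    append out $data
    set rem [expr {$len % $blocksize}]
    if {$rem != 0} {
        append out [string repeat $padChar [expr {$blocksize - $rem}]]
    }
    return $out
}

proc unframe_with_length {framed} {
    if {[binary scan $framed W len] != 1} {
        error "framed data is too short to contain a length header"
    }
    if {$len < 0 || $len > [string length $framed] - 8} {
        error "stored length does not fit the framed data"
    }
    # Drop the header and everything past the stored length (the padding).
    return [string range $framed 8 [expr {8 + $len - 1}]]
}

set framed [frame_with_length "abc"]
puts [string length $framed]          ;# 16: 8 header + 3 data + 5 pad
puts [unframe_with_length $framed]    ;# abc
```

The framed string is what would be passed to the blowfish command as data, and unframe_with_length would be applied to the decrypted result.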

Rich

Jun 6, 2019, 12:33:50 PM
s.effe...@googlemail.com wrote:
> ::blowfish::blowfish -mode ecb -dir encrypt -key $key -out $handle(out) -pad $padChar -- $data

Note that using ecb mode for encrypting an amount of data larger than a
couple cipher blocks in size is not at all secure overall.

Note the ecb example encrypted linux penguin picture on wikipedia here:

https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Electronic_Codebook_%28ECB%29

If the data is more than a couple blowfish blocks in size (and a Tcl
script will likely be), using cbc mode will be much more secure. But
then the OP will have to derive a random IV and store that IV along
with the encrypted data to be able to decrypt later.
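
A rough sketch of the cbc approach described above, using the tcllib blowfish command with its -mode cbc and -iv options, might look like this. The proc names are hypothetical, and the IV here comes from Tcl's rand(), which is NOT cryptographically secure; it only stands in for a proper random source. The IV is stored as the first 8 bytes of the output so that decryption can find it.

```tcl
package require blowfish   ;# from tcllib

# Hypothetical helper: 8 IV bytes from rand(). NOT cryptographically
# secure; a real application should use a proper random source.
proc random_iv {} {
    set iv ""
    for {set i 0} {$i < 8} {incr i} {
        append iv [binary format c [expr {int(rand() * 256)}]]
    }
    return $iv
}

# Encrypt in cbc mode and prepend the IV to the ciphertext.
proc cbc_encrypt {key plaintext} {
    set iv [random_iv]
    set ct [::blowfish::blowfish -mode cbc -dir encrypt -key $key \
                -iv $iv -pad " " -- $plaintext]
    return $iv$ct
}

# Split off the stored IV, then decrypt the remainder.
proc cbc_decrypt {key blob} {
    set iv [string range $blob 0 7]
    set ct [string range $blob 8 end]
    return [::blowfish::blowfish -mode cbc -dir decrypt -key $key \
                -iv $iv -pad " " -- $ct]
}

set blob [cbc_encrypt "KEY" "puts hello\n"]
puts -nonewline [cbc_decrypt "KEY" $blob]
```

The decrypted text still carries the blank padding added by -pad, which is harmless for a Tcl script that is going to be sourced.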

.

Jun 7, 2019, 1:36:02 AM
Ah, you wanna see the application code (more details), beyond the test case to see how this is actually implemented and performs well.

I used 4096 in the test case to show how to chunk it and it can be changed as needed.[1]

Or, as you suggested, the file can be read all at once into a single variable; however, this works only in limited situations.[2]

[1] Enhancing the test case example with a more elaborate chunking definition, or adding threading and reassembly steps, is particularly beneficial if you have large files or need better performance, but it is not necessary to solve the boundary issue the OP asked about.

[2] Tcl has a 2GB limit per variable.

So:

set data [read <entire_file>]

Will crash tclsh trying to save > 2GB data to a single variable.

In fact, the 'source' command crashes at some point too so I have a custom source command to deal with large files, threads them and reassembles them into main process memory, etc... (Then there is a db scheme as well)...not everything fits into one solution so multiple concurrent things all working together here...

At any rate, tcl will not crash reading smaller (< 2GB) data into a single variable.

However, that is not the case for my application (tcl data files).

I also agree 4096 isn't optimal for processing a big data file efficiently. Regardless of blowfish or any other cipher, the chunk size becomes very important as data size increases.

However encumbering the example code with details how to address efficiently processing big files would hide relevant details for solving the issue with "extra chars added at the end of the file".

I do see huge benefit to asking these questions you did, as anybody contemplating their own applications can consider them.

Each issue has a solution or solutions that would work for their needs and would depend on their particulars.

Efficient file processing is a whole other subject (can of worms). You have to consider chunk size vs. I/O wait, latency, your network, distributing processing across cores (threading) or machines, using a cache, local disk, network disk, a db, etc. So you basically need to consider what is being sent where, when, and how much, and what response times you need. A very important topic indeed.

I guess the one thing that is generic to all applications I would say is if you can get the data all on local disk, lots of complexity goes away. Unfortunately I have local, network, virtual, cache, distributed, and all sorts of desperately complicated sources of data, permissions, stability (or not stability as the case may be) to contend with...so much of the code is checking, if something can go wrong it will go wrong and making sure it stays resilient in the face of all dangers.

The more control (valid assumptions) you have over these things the better as it can significantly reduce your problem space.

Jim

sled...@gmail.com

Jun 7, 2019, 8:40:43 PM
Got that..

Just helping him think outside of the box; particularly since this is simply a file creation issue...and it responds to the described need...

.

Jun 7, 2019, 9:37:16 PM
Ah, the assumption that this is a user on a win7/10 machine is not correct and even if it were on win machine I fail to see how that alone addresses the issue asked about?

Sorry, I am finding it hard to believe there was any good intention in your reply.

Maybe I'm wrong and you can add color? Or I'm pessimistic and you had the best intentions?

At any rate, I don't think my view is likely solitary in that regard, even going back and rereading it a couple times now.

If you care to elaborate on your intentions with the first post, it would be well received.

And most importantly, maybe there is a solution I am missing in what you stated? It remains elusive to me.

Jim

