
a fairly simple encryption algorithm...


Chris M. Thomasson, Aug 8, 2018, 5:00:35 PM:

When you get some time to spare, and are interested, read all:

http://funwithfractals.atspace.cc/ct_cipher

and try this C and Python code for starters:

https://pastebin.com/raw/feUnA3kP
(C)

https://pastebin.com/raw/NAnsBJAZ
(Python Test Vector)

Now, notice that my crude little paper recommends using a TRNG! Well,
this experimental test code uses the damn rand() function in C; it
really should draw from a TRNG instead. Anyway, here is the program's
usage:

Usage:
__________________________
Usage: program in_file out_file mode_flag

mode_flag -e is encrypt where the in_file gets encrypted as out_file

mode_flag -d is decrypt where the in_file gets decrypted as out_file

Example:

program plaintext ciphertext -e
program ciphertext plaintext_decrypt -d
__________________________

Can you get it to work? Can you get the encrypt/decrypt cycle to work in C?

Thanks.

Chris M. Thomasson, Aug 12, 2018, 6:48:59 PM:

On 8/8/2018 2:00 PM, Chris M. Thomasson wrote:
> When you get some time to spare, and are interested, read all:
>
> http://funwithfractals.atspace.cc/ct_cipher
>
> and try this C and Python code for starters:
>
> https://pastebin.com/raw/feUnA3kP
> (C)
[...]

Fwiw, I built it using the following, fairly clean SHA2 lib:

https://github.com/ogay/hmac

You can use any hash you want, just make sure that taking a digest does
not alter the current state. The function that does this is:
__________________________
void ct_hmac_sha256_digest(
    hmac_sha256_ctx* ctx,
    unsigned char* digest
) {
    hmac_sha256_ctx ctx_copy = *ctx;
    hmac_sha256_final(&ctx_copy, digest, HMAC_SZ);
}
__________________________


It takes a total copy of the HMAC internal state, then performs the
digest on said copy.

> Can you get it to work? Can you get the encrypt/decrypt cycle to work in C?
>
> Thanks.

Has anybody even tried it out?

Chris M. Thomasson, Aug 16, 2018, 2:53:22 AM:

On 8/12/2018 3:48 PM, Chris M. Thomasson wrote:
> On 8/8/2018 2:00 PM, Chris M. Thomasson wrote:
>> When you get some time to spare, and are interested, read all:
>>
>> http://funwithfractals.atspace.cc/ct_cipher
>>
>> and try this C and Python code for starters:
>>
>> https://pastebin.com/raw/feUnA3kP
>> (C)
[...]
> Has anybody even tried it out?

Even one little try at a go?

Scott, Aug 16, 2018, 4:47:05 AM:

I looked. The code on the first page won't compile. Something to do
with not being C code, I think.

The second link, well, any site that needs me to turn on javascript,
turn on cookies, turn on tracking stripping, turn on ads, etc. for
even minimal read-only usability? That's no kind of site I'll want to
visit again.

fir, Aug 16, 2018, 5:45:02 AM:

You would need to write an easy-to-understand introduction in the post,
something that would encourage someone to get into the topic a bit.

Then, once people have been introduced a little, they may dig in more
deeply; otherwise there is not much zeal to go into it.

Bart, Aug 16, 2018, 12:43:13 PM:

The first link is to Python code.

The second link is to a raw text page, so you shouldn't have any such
problems.




--
bart

Bart, Aug 16, 2018, 12:50:51 PM:

I tried compiling the C code (using your code plus 4 .c and .h files
from the github link).

It compiled and ran (that is, converting from plaintext to ciphertext,
and back to plaintext) with gcc, Pelles C and MSVC.

lccwin and DMC had compile errors (to do with initialisation).

tcc (not the latest) compiled it, but crashed on the first line of output.

Mine had a compile error with a different initialisation problem (a
limitation in my compiler that doesn't like dynamic init info for
structs), but after working around that, it also crashed. So something
to look into.

(Note it's a bit worrying when running the decipher part, when it says
"plaintext:..." at the end, followed by a bunch of hex codes. It's just
a hex dump of the text, but looks as cryptic as the ciphertext.)

Anyway it looks like it'll be a nice benchmark when I get it working.
And there will be working versions to help with that.

--
bart

Chris M. Thomasson, Aug 16, 2018, 10:47:36 PM:

On 8/16/2018 9:50 AM, Bart wrote:
> On 16/08/2018 07:53, Chris M. Thomasson wrote:
>> On 8/12/2018 3:48 PM, Chris M. Thomasson wrote:
>>> On 8/8/2018 2:00 PM, Chris M. Thomasson wrote:
>>>> When you get some time to spare, and are interested, read all:
>>>>
>>>> http://funwithfractals.atspace.cc/ct_cipher
>>>>
>>>> and try this C and Python code for starters:
>>>>
>>>> https://pastebin.com/raw/feUnA3kP
>>>> (C)
>> [...]
>>> Has anybody even tried it out?
>>
>> Even one little try at a go?
>
> I tried compiling the C code (using your code plus 4 .c and .h files
> from the github link).
>
> It compiled and ran (that is, converting from plaintext to ciphertext,
> and back to plainttext) with gcc, pelles C and MSVC.

I have all of those compilers, and everything works for me as well.
Thank you so much for taking the time to create this excellent write-up.
I really do appreciate it, Bart.


> lccwin and DMC had compile errors (to do with initialisation).
>
> tcc (not the latest) compiled it, but crashed on the first line of output.

DAMN! Does the latest tcc run it without crashing? Ahhh, I can try this
for myself.


> Mine had a compile error with a different initialisation problem (a
> limitation in my compiler that doesn't like dynamic init info for
> structs), but after working around that, it also crashed. So something
> to look into.

Most definitely! I do not have tcc and DMC installed; will do. Strange.
I need to look over the code to see if I missed some damn bug. Can
anybody spot any blatant errors in the code itself?

Humm... A nice error test would be to use a different hash lib, and see
if it crashes or not.


> (Note it's a bit worrying when running the decipher part, when it says
> "plaintext:..." at the end, followed by a bunch of hex codes. It's just
> a hex dump of the text, but looks as cryptic as the ciphertext.)

Okay. Actually, the dumps do not need to be in there in the first place.
I sort of needed to output hex to handle non-printable characters, and
the program is in a highly experimental state.


> Anyway it looks like it'll be a nice benchmark when I get it working.
> And there will be working versions to help with that.

Thanks again. I will install tcc and DMC, and try to reproduce the
error. Fun times! :^)

Chris M. Thomasson, Aug 16, 2018, 11:51:43 PM:

On 8/16/2018 9:50 AM, Bart wrote:
> On 16/08/2018 07:53, Chris M. Thomasson wrote:
>> On 8/12/2018 3:48 PM, Chris M. Thomasson wrote:
>>> On 8/8/2018 2:00 PM, Chris M. Thomasson wrote:
>>>> When you get some time to spare, and are interested, read all:
>>>>
>>>> http://funwithfractals.atspace.cc/ct_cipher
>>>>
>>>> and try this C and Python code for starters:
>>>>
>>>> https://pastebin.com/raw/feUnA3kP
>>>> (C)
>> [...]
>>> Has anybody even tried it out?
>>
>> Even one little try at a go?
>
> I tried compiling the C code (using your code plus 4 .c and .h files
> from the github link).
>
> It compiled and ran (that is, converting from plaintext to ciphertext,
> and back to plainttext) with gcc, pelles C and MSVC.
>
> lccwin and DMC had compile errors (to do with initialisation).
>
> tcc (not the latest) compiled it, but crashed on the first line of output.

Just tried it out on TCC version:

tcc version 0.9.27 (x86_64 Windows)

and everything worked out fine. Humm, what version are you using?

Still need to try out DMC.

Bart, Aug 17, 2018, 5:22:14 AM:

On 17/08/2018 04:51, Chris M. Thomasson wrote:
> On 8/16/2018 9:50 AM, Bart wrote:

>> tcc (not the latest) compiled it, but crashed on the first line of
>> output.
>
> Just tried it out on TCC version:
>
> tcc version 0.9.27 (x86_64 Windows)
>
> and everything worked out fine. Humm, what version are you using?

Mine might have been one step before that (and several years older).

I remember that it had some bugs to do with returning structs by value.
And my compiler's problems are to do with its struct handling. So it
sounds like they've fixed that, and there's nothing wrong with your code.

(Well, I've no idea about its cryptographic properties.)

> Still need to try out DMC.

I'll tell you the error in my next post as my system is unstable ATM and
won't run any command prompts, I'll have to restart. Usenet tends to
still work though...

--
bart

Bart, Aug 17, 2018, 5:35:59 AM:

On 17/08/2018 04:51, Chris M. Thomasson wrote:
> On 8/16/2018 9:50 AM, Bart wrote:

>> lccwin and DMC had compile errors (to do with initialisation).

> Still need to try out DMC.

DMC doesn't like this line of cipher.c (your program):

struct ct_buf buf = { calloc(1, file_sz), file_sz };
^
c:\c\cipher.c(101) : Error: constant initializer expected

(Actually neither does my compiler, but one bug is, it doesn't detect
that particular dynamic initialisation, only when it contains a pointer
to a local variable as happens later on. Then buf contains garbage.)

So that might be a compiler limitation (DMC is very old however).

lccwin doesn't like lines like this:

    unsigned char update[2] = {
        P_byte, C_byte
    };

Another dynamic initialisation? (But it was OK with the buf declaration
earlier.)

I think if these are OK with C, then it's not your problem.

--
bart

Chris M. Thomasson, Aug 20, 2018, 12:36:12 AM:

I think these are okay in C11 and/or C99. The one with calloc should be
okay as well. Humm... Not exactly sure about the latter. Still have not
installed DMC, will do.

Chris M. Thomasson, Aug 20, 2018, 12:41:04 AM:

On 8/17/2018 2:22 AM, Bart wrote:
> On 17/08/2018 04:51, Chris M. Thomasson wrote:
>> On 8/16/2018 9:50 AM, Bart wrote:
>
>>> tcc (not the latest) compiled it, but crashed on the first line of
>>> output.
>>
>> Just tried it out on TCC version:
>>
>> tcc version 0.9.27 (x86_64 Windows)
>>
>> and everything worked out fine. Humm, what version are you using?
>
> Mine might have been one step before that (and several years older).
>
> I remember that it had some bugs to do with returning structs by value.
> And my compiler's problems are to do with its struct handling. So it
> sounds like they've fixed that, and there's nothing wrong with your code.
>
> (Well, I've no idea about its cryptographic properties.)

It uses (or abuses?) HMAC to gain total ciphertext sensitivity down to
the bit level. Perhaps play Eve for a moment: when you get some time,
try changing any bit within a ciphertext, as in some sort of
man-in-the-middle attack. Then decrypt, playing the role of Bob. Well,
do you get any part of the original plaintext? I think not.

Not exactly sure about the cryptographic properties, however I think
they are "okay", in a sense. Need to use a good TRNG to generate the
random bytes... Need to use a good crypto HMAC, with a crypto secure hash.


>
>> Still need to try out DMC.
>
> I'll tell you the error in my next post as my system is unstable ATM and
> won't run any command prompts, I'll have to restart. Usenet tends to
> still work though...
>

Thanks.

Chris M. Thomasson, Aug 21, 2018, 12:39:29 AM:

Oh yeah... Fwiw, I installed DMC and saw the problems it is having. I
worked around them and got the whole thing executing perfectly, running
full encrypt/decrypt cycles. Here is a main.c with some quick little
fixes added in; they are sufficient to get DMC up and working with my
cipher:

https://pastebin.com/raw/gHuNmAXm
(raw text, ;^)

Notice the odd differences wrt the initialization of structs? For some
reason, it is giving me a warning about:
______________________
main.c:
for (file_sz = 0; fgetc(file) != EOF; ++file_sz);
^
main.c(86) : Warning 7: possible extraneous ';'
sha2.c:
hmac_sha2.c:
link main+sha2+hmac_sha2,,,user32+kernel32/noi;
______________________


Fwiw, the version of DMC I grabbed is:
______________________
Digital Mars Compiler Version 8.42n
Copyright (C) Digital Mars 2000-2004. All Rights Reserved.
Written by Walter Bright www.digitalmars.com/ctg/sc.html
DMC is a one-step program to compile and link C++, C and ASM files.
______________________


My cipher is working on DMC! Nice. Btw, thanks a million for your most
valuable input Bart. I seriously appreciate all of your time wrt the
help in slaying these compiler issues.

Fwiw, can you get it to work on your DMC? Btw, what version do you have?

Thanks again. :^D

James Kuyper, Aug 21, 2018, 8:34:04 AM:

On 08/21/2018 12:39 AM, Chris M. Thomasson wrote:
...
> Notice the odd differences wrt the initialization of structs? For some
> reason, it is giving me a warning about:
> ______________________
> main.c:
> for (file_sz = 0; fgetc(file) != EOF; ++file_sz);
> ^
> main.c(86) : Warning 7: possible extraneous ';'

Try changing the final ';' to "continue;". If that doesn't work, try {}.

Ben Bacarisse, Aug 21, 2018, 10:05:44 AM:

Or even

    file_sz = 0; while (fgetc(file) != EOF) ++file_sz;

which seems to me more direct.

--
Ben.

Bart, Aug 21, 2018, 10:22:15 AM:

On 21/08/2018 05:39, Chris M. Thomasson wrote:
> On 8/19/2018 9:36 PM, Chris M. Thomasson wrote:

> https://pastebin.com/raw/gHuNmAXm
> (raw text, ;^)
>
> Notice the odd differences wrt the initialization of structs? For some
> reason, it is giving me a warning about:
> ______________________
> main.c:
>     for (file_sz = 0; fgetc(file) != EOF; ++file_sz);
>                                                    ^
> main.c(86) : Warning 7: possible extraneous ';'

That's one of those harmless warnings that can be hard to get rid of
without working around the problem, e.g. using {} instead of ;.

That means having to modify legal C code to get slightly different legal
code but which the compiler won't object to. (This is why I used to hide
all warnings.)

> My cipher is working on DMC! Nice. Btw, thanks a million for your most
> valuable input Bart. I seriously appreciate all of your time wrt the
> help in slaying these compiler issues.
>
> Fwuw, can you get it to work on your DMC? Btw, what version do you have?

Same as yours; it's been frozen since 2004.

It works now on 7 Windows compilers. (Including mine, which needed a bug
fix, and a tweak of the code. Later I'll remove the need for a tweak.)
Tiny C was updated to the recent version.


(I turned it into a benchmark, which involved disabling the printfs that
generate large amounts of output.

The task was encrypting 1 million lines of plaintext, then decrypting,
then comparing the two lots of plaintext to verify. (The latter involves
using 'FC' on Windows, which is included in the timings, but it's at
most 5% of the runtime.)

Compilers on Windows performed as follows, with relative timings

Pelles C       1.0   The fastest (was roughly 10 seconds)
MSVC           1.1
lccwin         1.2
gcc 5.1.0      1.3
DMC            1.8   (32 bits; rest are 64 bits)
Tiny C (.27)   2.6
bcc (mine)     3.1

(Mine could do with some attention [optimising calls, I think], but I'm
mainly concerned with getting it working. Code generation efforts
will be focused on my other compiler, and eventually any improvements
will trickle down.

Being C, if a faster program is needed, you just run it through another
compiler.))

--
bart

Ben Bacarisse, Aug 21, 2018, 10:50:25 AM:

Bart <b...@freeuk.com> writes:

> (I turned it into a benchmark, which involved disabling the printfs
> that generate large amounts of output.

(I think it's the SHA library you are benchmarking. When I tried it,
96% of the time was spent in three sha256_* functions.)

> The task was encrypting 1 million lines of plaintext, then decrypting,
> then comparing the two lots of plaintext to verify. (The latter
> involving using 'FC' on Windows, which is included in the timings, but
> it's a maximum 5% of runtime).
>
> Compilers on Windows performed as follows, with relative timings
>
> Pelles C 1.0 The fastest (was roughly 10 seconds)
> MSVC 1.1
> lccwin 1.2
> gcc 5.1.0 1.3
> DMC 1.8 (32 bits; rest are 64 bits)
> Tiny C (.27) 2.6
> bcc (mine) 3.1

Why do you keep posting lists like this with no indication of the flags
used? In this case the optimisation is likely to be highly significant.
I see a factor of three (slightly more in fact) between gcc with -O2 and
gcc without.

--
Ben.

Ben Bacarisse, Aug 21, 2018, 10:59:47 AM:

Sorry, I meant a factor of two, not three (2.2 in fact).

--
Ben.

Bart, Aug 21, 2018, 12:26:12 PM:

Usually I will say that optimisation is turned on. And usually that will
be the maximum level on the product (although two of them don't have any
optimisation).

And usually when gcc is outperformed by lesser compilers (and I got only
a 1.5 factor between gcc -O3 and -O0), that tends to mean it is using a
slower standard library.

I haven't looked into the program to see how much is I/O, apart from
obvious printf calls. Doing that now, I see file i/o is done a byte at a
time via fgetc and fputc.

I'd have to mess with the program to change that, and then probably the
timings will open up (and means the relative speed of mine will get
worse, so I think that can wait...).

Doing this timing however has highlighted that aspect.

--
bart

Bart, Aug 21, 2018, 1:45:56 PM:

Here are revised timings with faster i/o. Task is the same one of
encrypting/decrypting a 1M line text file, but now excludes verification
(done separately):

gcc 5.1.0      1.0   -O3
MSVC           1.0   /O2
Pelles C       1.6   -Ot
DMC            1.8   -o (32-bits, rest are 64)
lccwin         2.0   -O
Tiny C .27     3.2
bcc (mine)     4.2

As I said, this makes a useful benchmark, and one that is not trivial (I
don't think so anyway, unless I discover most of its time is spent in
one small function).

The modified main program used is here: https://pastebin.com/raw/JXRF2Sjj

--
bart

Chris M. Thomasson, Aug 21, 2018, 5:47:08 PM:

On 8/19/2018 9:40 PM, Chris M. Thomasson wrote:
> On 8/17/2018 2:22 AM, Bart wrote:
>> On 17/08/2018 04:51, Chris M. Thomasson wrote:
>>> On 8/16/2018 9:50 AM, Bart wrote:
>>
>>>> tcc (not the latest) compiled it, but crashed on the first line of
>>>> output.
>>>
>>> Just tried it out on TCC version:
>>>
>>> tcc version 0.9.27 (x86_64 Windows)
>>>
>>> and everything worked out fine. Humm, what version are you using?
>>
>> Mine might have been one step before that (and several years older).
>>
>> I remember that it had some bugs to do with returning structs by
>> value. And my compiler's problems are to do with its struct handling.
>> So it sounds like they've fixed that, and there's nothing wrong with
>> your code.
>>
>> (Well, I've no idea about its cryptographic properties.)
>
> It uses, or abuses?, HMAC to gain total ciphertext sensitivity down to
> the bit level. Perhaps, play Eve for a moment, when you get some time
> try to change any bit within a ciphertext as in some sort of
> man-in-the-middle attack. Then decrypt, playing the role of Bob. Well,
> do you get any part of the original plaintext? I think not.
>
> Not exactly sure about the cryptographic properties, however I think
> they are "okay", in a sense. Need to use a good TRNG to generate the
> random bytes... Need to use a good crypto HMAC, with a crypto secure hash.

I forgot to mention that the "Password" is hardcoded in. Next version
will be more elaborate and allow a user to load in a file for the secret
key. The HMAC key should be generated by a good crypto TRNG, and perhaps
very slightly altered by the creator. For instance, you create a little
input, then the rest of the key is derived from a crypto TRNG, perhaps
even hashed with the user input. Humm...

Ben Bacarisse, Aug 21, 2018, 6:36:10 PM:

I still don't know what flags you used for those compilations.

> And usually when gcc is outperformed by lesser compilers (and I got
> only a 1.5 factor between gcc -O3 and -O0), that tends to mean it is
> using a slower standard library.

I don't follow that.

> I haven't looked into the program to see how much is I/O, apart from
> obvious printf calls. Doing that now, I see file i/o is done a byte at
> a time via fgetc and fputc.

The program spends 96% of the time in the SHA2 library. I/O has nothing
(significant) to do with it, at least on Linux. Windows I/O does seem
to be slightly higher cost, but I doubt it can make much difference here.

> I'd have to mess with the program to change that, and then probably
> the timings will open up (and means the relative speed of mine will
> get worse, so I think that can wait...).
>
> Doing this timing however has highlighted that aspect.

What aspect? What does profiling on Windows show the program is doing?
Is it significantly different to what I reported?

--
Ben.

Ben Bacarisse, Aug 21, 2018, 6:44:32 PM:

Bart <b...@freeuk.com> writes:

> As I said, this makes a useful benchmark, and one that is not trivial
> (I don't think so anyway, unless I discover most of its time is spent
> in one small function).

On Linux, it spends almost all of the time in one function:
sha256_transf (more than 83%).

--
Ben.

Bart, Aug 21, 2018, 8:09:47 PM:

On 21/08/2018 23:36, Ben Bacarisse wrote:
> Bart <b...@freeuk.com> writes:

>> Usually I will say that optimisation is turned on. And usually that
>> will be the maximum level on the product (although two of them don't
>> have any optimisation).
>
> I still don't know what flags you used for those compilations.

See my other post on this. With gcc I always use -O3 to optimise.

>> And usually when gcc is outperformed by lesser compilers (and I got
>> only a 1.5 factor between gcc -O3 and -O0), that tends to mean it is
>> using a slower standard library.
>
> I don't follow that.
>
>> I haven't looked into the program to see how much is I/O, apart from
>> obvious printf calls. Doing that now, I see file i/o is done a byte at
>> a time via fgetc and fputc.
>
> The program spends 96% of the time in the SHA2 library. I/O has nothing
> (significant) to do with it, at least on Linux. Windows I/O does seem
> to be slightly higher cost, but I doubt it can make much difference here.

On Windows, gcc -O3 took about 13 seconds in all for the task (including
0.5 seconds for file compare).

After I switched from fgetc/fputc to block-fread/fwrite, gcc -O3 took
just under 6 seconds for the same task (this time excluding the 0.5
seconds file compare). (I think next time I'll leave in the actual timings.)

So speeding up file i/o doubled the speed for gcc on Windows. This was
the aspect I was talking about. gcc must use a slower file i/o library
than the three other compilers that were a bit faster.

--
bart

Bart, Aug 21, 2018, 8:34:21 PM:

You might be right.

If I put 'return;' at the top of that function, then gcc finishes in 2
seconds instead of 6, so 2/3 of the time is spent there.

The proportion is higher for the poorer compilers. But the code has an
option UNROLL_LOOPS macro, which significantly improves performance for
compilers like bcc and Tiny C, and the proportion spent in sha256_transf
is smaller (but still 75% with my compiler).

On gcc it makes no difference; presumably it can do its own loop unrolling.

>> in one small function).

Well, sha256_transf is not that small a function, especially with the
UNROLL macro defined. But the program has lost some of its interest for
me as a benchmark: I want something suitable for honing a compiler on
whole programs, not on a single function or bottleneck.

--
bart

Ben Bacarisse, Aug 21, 2018, 9:07:23 PM:

Bart <b...@freeuk.com> writes:

> On 21/08/2018 23:36, Ben Bacarisse wrote:
>> Bart <b...@freeuk.com> writes:
>
>>> Usually I will say that optimisation is turned on. And usually that
>>> will be the maximum level on the product (although two of them don't
>>> have any optimisation).
>>
>> I still don't know what flags you used for those compilations.
>
> See my other post on this. With gcc I always use -O3 to optimise.
>
>>> And usually when gcc is outperformed by lesser compilers (and I got
>>> only a 1.5 factor between gcc -O3 and -O0), that tends to mean it is
>>> using a slower standard library.
>>
>> I don't follow that.
>>
>>> I haven't looked into the program to see how much is I/O, apart from
>>> obvious printf calls. Doing that now, I see file i/o is done a byte at
>>> a time via fgetc and fputc.
>>
>> The program spends 96% of the time in the SHA2 library. I/O has nothing
>> (significant) to do with it, at least on Linux. Windows I/O does seem
>> to be slightly higher cost, but I doubt it can make much difference here.
>
> On Windows, gcc -O3 took about 13 seconds in all for the task
> (including 0.5 seconds for file compare).
>
> After I switched from fgetc/fputc to block-fread/fwrite, gcc -O3 took
> just under 6 seconds for the same task (this time excluding the 0.5
> seconds file compare). (I think next time I'll leave in the actual
> timings.)

Yes, that would help in these sorts of cases.

> So speeding up file i/o doubled the speed for gcc on Windows. This was
> the aspect I was talking about. gcc must use a slower file i/o library
> than the three other compilers that were a bit faster.

It was hard to unravel your figures, but changing the relative speeds
you give back into times, and subtracting 0.5s (for the compare) from
the first set you gave I get this table:

              using   using       relative
              fgetc   block I/O   change
gcc 5.1.0      12.5     6.0         2.1
MSVC           10.5     6.0         1.8
Pelles C        9.5     9.6         1.0
DMC            17.5    10.8         1.6
lccwin         11.5    12.0         1.0
Tiny C .27     25.5    19.2         1.3
bcc (mine)     30.5    25.2         1.2

So the change to block I/O also pretty much doubles the speed for MSVC
as well.

(Extrapolating from the rough times you give for the fastest entries may
be causing some oddities because both Pelles and lccwin got slower.)

Do you know what libc your gcc is using? Some installations of gcc on
Windows use Microsoft's C library. That would explain the almost
parallel change in speed for both programs. And I don't think you can
really say that the I/O library is slow, since the end result is fast.
You could say that the fgetc/fputc implementation is slow, but not the
I/O library by itself.

But the results overall still surprise me since on my Linux box the
program spends no noticeable time doing I/O at all. You would likely
get more revealing results from profiling.

--
Ben.

David Brown, Aug 22, 2018, 3:52:57 AM:

On 22/08/18 03:07, Ben Bacarisse wrote:

>
> But the results overall still surprise me since on my Linux box the
> program spends no noticeable time doing I/O at all. You would likely
> get more revealing results from profiling.
>

This could depend on the Windows version. I gather Windows 10 is a bit
better here, but for older Windows versions small file I/O can often be
very much slower than Linux because it does a poor job of caching. On
Linux, tests like this will likely never touch the disk, but on Windows
it will. So if you want to benchmark the calculations rather than the
disk speed, avoid actually reading files on Windows.

Bart, Aug 22, 2018, 6:33:51 AM:

On 22/08/2018 02:07, Ben Bacarisse wrote:
> Bart <b...@freeuk.com> writes:

>> So speeding up file i/o doubled the speed for gcc on Windows. This was
>> the aspect I was talking about. gcc must use a slower file i/o library
>> than the three other compilers that were a bit faster.
>
> It was hard to unravel your figures, but changing the relative speeds
> you give back into times, and subtracting 0.5s (for the compare) from
> the first set you gave I get this table:
>
>               using   using       relative
>               fgetc   block I/O   change
> gcc 5.1.0      12.5     6.0         2.1
> MSVC           10.5     6.0         1.8
> Pelles C        9.5     9.6         1.0
> DMC            17.5    10.8         1.6
> lccwin         11.5    12.0         1.0
> Tiny C .27     25.5    19.2         1.3
> bcc (mine)     30.5    25.2         1.2
>
> So the change to block I/O also pretty much doubles the speed for MSVC
> as well.
>
> (Extrapolating from the rough times you give for the fastest entries may
> be causing some oddities because both Pelles and lccwin got slower.)

That's a pretty good summary. Although the actual block I/O times for
tcc and bcc were 1-2 seconds faster, which means that if you take the
absolute differences in timings, then gcc, tcc and bcc are all 6-7
seconds faster. So maybe they're all using the MS library (except
MSVC...).

> You could say that the fgetc/fputc implementation is slow, but not the
> I/O library by itself.

You usually know particular C standard functions are slower than they
ought to be, when gcc is out-performed by smaller compilers.

> But the results overall still surprise me since on my Linux box the
> program spends no noticeable time doing I/O at all. You would likely
> get more revealing results from profiling.

Does it have a spinning hard drive?

Note that my test invoked this program twice, once to convert file A to
B, and again to convert B to C, in all cases using explicit files, not
pipes.

Actually, if I run this program in a script language which also uses
block I/O:

data := readstrfile("A")
writestrfile("B", data)

data := readstrfile("B")
writestrfile("C", data)

it only takes 0.3 seconds, using a spinning hard drive, but taking
advantage of any file caching.

So that's probably the true overhead. ("A" is a 977K-line, 22.5MB text
file, mostly consisting of all the source code of my projects.)

--
bart

Ben Bacarisse, Aug 22, 2018, 6:54:14 AM:

It shouldn't be a mystery what library is being linked against. I don't
know enough about Windows to tell you how to find out, but it must be
possible, surely? And what do you mean they might all be using the MS
library except MSVC?

--
Ben.

Bart, Aug 22, 2018, 7:12:50 AM:

In the case of gcc/mingw/tdm/whatever, everything is a mystery.

In the case of bcc, it dynamically (never statically) links against
msvcrt.dll, so that one is easy to determine.

With gcc, perhaps you can get a clue from how it calls its linker (see
below), and its list of 1700 .a files.

> know enough about Windows to tell you how to find out, but it must be
> possible, surely? And what do you mean they might all be using the MS
> library except MSVC?

I wasn't being serious, only remarking on the speed-up only being 4
seconds instead of 6 so maybe it was using something better.

--------------------------------

0:
c:/TDM/bin/../lib/gcc/x86_64-w64-mingw32/5.1.0/../../../../x86_64-w64-mingw32/bin/ld.exe
1: -plugin
2: c:/TDM/bin/../libexec/gcc/x86_64-w64-mingw32/5.1.0/liblto_plugin-0.dll
3:
-plugin-opt=c:/TDM/bin/../libexec/gcc/x86_64-w64-mingw32/5.1.0/lto-wrapper.exe
4: -plugin-opt=-fresolution=C:\Users\user\AppData\Local\Temp\ccLmE3cV.res
5: -plugin-opt=-pass-through=-lmingw32
6: -plugin-opt=-pass-through=-lgcc
7: -plugin-opt=-pass-through=-lmoldname
8: -plugin-opt=-pass-through=-lmingwex
9: -plugin-opt=-pass-through=-lmsvcrt
10: -plugin-opt=-pass-through=-lpthread
11: -plugin-opt=-pass-through=-ladvapi32
12: -plugin-opt=-pass-through=-lshell32
13: -plugin-opt=-pass-through=-luser32
14: -plugin-opt=-pass-through=-lkernel32
15: -plugin-opt=-pass-through=-lmingw32
16: -plugin-opt=-pass-through=-lgcc
17: -plugin-opt=-pass-through=-lmoldname
18: -plugin-opt=-pass-through=-lmingwex
19: -plugin-opt=-pass-through=-lmsvcrt
20: -m
21: i386pep
22: --exclude-libs=libpthread.a
23: -Bdynamic
24:
c:/TDM/bin/../lib/gcc/x86_64-w64-mingw32/5.1.0/../../../../x86_64-w64-mingw32/lib/../lib/crt2.o
25: c:/TDM/bin/../lib/gcc/x86_64-w64-mingw32/5.1.0/crtbegin.o
26: -Lc:/TDM/bin/../lib/gcc/x86_64-w64-mingw32/5.1.0
27: -Lc:/TDM/bin/../lib/gcc
28:
-Lc:/TDM/bin/../lib/gcc/x86_64-w64-mingw32/5.1.0/../../../../x86_64-w64-mingw32/lib/../lib
29: -Lc:/TDM/bin/../lib/gcc/x86_64-w64-mingw32/5.1.0/../../../../lib
30:
-Lc:/TDM/bin/../lib/gcc/x86_64-w64-mingw32/5.1.0/../../../../x86_64-w64-mingw32/lib
31: -Lc:/TDM/bin/../lib/gcc/x86_64-w64-mingw32/5.1.0/../../..
32: cipher.obj
33: sha2.obj
34: hmac.obj
35: -lmingw32
36: -lgcc
37: -lmoldname
38: -lmingwex
39: -lmsvcrt
40: -lpthread
41: -ladvapi32
42: -lshell32
43: -luser32
44: -lkernel32
45: -lmingw32
46: -lgcc
47: -lmoldname
48: -lmingwex
49: -lmsvcrt
50: c:/TDM/bin/../lib/gcc/x86_64-w64-mingw32/5.1.0/crtend.o


--
bart

Ben Bacarisse

unread,
Aug 22, 2018, 10:08:15 AM8/22/18
to
Can't you tell from the executable, or do you always link to static
libraries? On Linux I can just "ldd <executable file>" and see the
libraries required at run-time (but objdump is safer).

> In the case of bcc, it dynamically (never statically) links against
> msvcrt.dll, so that one is easy to determine.
>
> With gcc, perhaps you can get a clue from how it calls its linker (see
> below), and its list of 1700 .a files.

The line

> 19: -plugin-opt=-pass-through=-lmsvcrt

is suggestive but the simplest way to be sure would to look at the
executable.

--
Ben.

Robert Wessel

unread,
Aug 22, 2018, 10:22:41 AM8/22/18
to
On Wed, 22 Aug 2018 15:08:02 +0100, Ben Bacarisse
<ben.u...@bsb.me.uk> wrote:
>Can't you tell from the executable, or do you always link to static
>libraries? On Linux I can just "ldd <executable file>" and see the
>libraries required at run-time (but objdump is safer).


The rough equivalent with the normal MS tools is dumpbin, and if you
want a list of DLL dependencies for an executable, dumpbin /dependents
will do it. It is actually still pretty common to link against static
C libraries in Windows, though.

Ben Bacarisse

unread,
Aug 22, 2018, 10:52:16 AM8/22/18
to
I think in this case the program is using dynamic libraries, but I am
not certain. Anyway, thanks for clearing that up. There had to be a
Windows equivalent.

--
Ben.

Chris M. Thomasson

unread,
Aug 22, 2018, 5:45:29 PM8/22/18
to
On 8/16/2018 9:50 AM, Bart wrote:
> On 16/08/2018 07:53, Chris M. Thomasson wrote:
>> On 8/12/2018 3:48 PM, Chris M. Thomasson wrote:
>>> On 8/8/2018 2:00 PM, Chris M. Thomasson wrote:
>>>> When you get some time to spare, and are interested, read all:
>>>>
>>>> http://funwithfractals.atspace.cc/ct_cipher
>>>>
>>>> and try this C and Python code for starters:
>>>>
>>>> https://pastebin.com/raw/feUnA3kP
>>>> (C)
>> [...]
>>> Has anybody even tried it out?
>>
>> Even one little try at a go?
>
> I tried compiling the C code (using your code plus 4 .c and .h files
> from the github link).
>
> It compiled and ran (that is, converting from plaintext to ciphertext,
> and back to plaintext) with gcc, pelles C and MSVC.
>
> lccwin and DMC had compile errors (to do with initialisation).
>
> tcc (not the latest) compiled it, but crashed on the first line of output.
>
> Mine had a compile error with a different initialisation problem (a
> limitation in my compiler that doesn't like dynamic init info for
> structs), but after working around that, it also crashed. So something
> to look into.
>
> (Note it's a bit worrying when running the decipher part, when it says
> "plaintext:..." at the end, followed by a bunch of hex codes. It's just
> a hex dump of the text, but looks as cryptic as the ciphertext.)
>
> Anyway it looks like it'll be a nice benchmark when I get it working.
> And there will be working versions to help with that.
>

File io aside for a moment, I removed the calls to hmac_sha256_update
within the inner loop of ct_crypt_round. Here is an updated function:
_________________________
unsigned char*
ct_crypt_round(
    struct ct_secret_key* SK,
    unsigned char* P,
    size_t P_sz,
    int M
) {
    hmac_sha256_ctx H;

    hmac_sha256_init(&H, SK->hmac_key, SK->hmac_key_sz);
    ct_reverse(SK->hmac_key, SK->hmac_key_sz);
    hmac_sha256_update(&H, SK->hmac_key, SK->hmac_key_sz);
    ct_reverse(SK->hmac_key, SK->hmac_key_sz);

    unsigned char D[HMAC_SZ] = { 0 };
    unsigned char U[HMAC_SZ * 2] = { 0 };

    size_t P_I = 0;

    while (P_I < P_sz)
    {
        ct_hmac_sha256_digest(&H, D);

        size_t D_I = 0;

        while (P_I < P_sz && D_I < HMAC_SZ)
        {
            unsigned char P_byte = P[P_I];
            unsigned char C_byte = P_byte ^ D[D_I];

            P[P_I] = C_byte;

            if (M == 0)
            {
                U[D_I * 2] = P_byte;
                U[D_I * 2 + 1] = C_byte;
            }
            else
            {
                U[D_I * 2] = C_byte;
                U[D_I * 2 + 1] = P_byte;
            }

            ++P_I;
            ++D_I;
        }

        hmac_sha256_update(&H, U, HMAC_SZ * 2);
    }

    return P;
}
_________________________

That might help out a little bit wrt efficiency.

Will have more time to get into this later on today. Btw, thanks for all
of the profiling everybody. Fwiw, was using byte-by-byte file io because
I was doing something with the individual bytes before coding this
particular implementation. I just left them as is because speed was not
a concern at that stage. Btw, your modification works fine. However, I
need to put in some error handling.

Chris M. Thomasson

unread,
Aug 22, 2018, 7:38:44 PM8/22/18
to
On 8/21/2018 2:46 PM, Chris M. Thomasson wrote:
> On 8/19/2018 9:40 PM, Chris M. Thomasson wrote:
>> On 8/17/2018 2:22 AM, Bart wrote:
>>> On 17/08/2018 04:51, Chris M. Thomasson wrote:
>>>> On 8/16/2018 9:50 AM, Bart wrote:
[...]
> I forgot to mention that the "Password" is hardcoded in. Next version
> will be more elaborate and allow a user to load in a file for the secret
> key.

The secret key can specify other hashes, so I need to build a plugin
system. The default can be SHA-256, but the user should be able to
create their own HMAC implementation plugin; SHA2 alone offers more than
one choice. My cipher needs a common API for this, and the existing
ct_cipher API should do it.

I need to define a plugin architecture,
define an API for the shared libs, or "plugins",
and allow the secret key to choose one.

Notice the secret key struct:

struct ct_secret_key
{
unsigned char* hmac_key;
size_t hmac_key_sz;
char* hmac_algo;
size_t rand_n;
};

hmac_algo should be able to define the hash the user needs by a string,
or name.

I was thinking of breaking things down into individual shared libs,
where filename is the name of the lib that the program dynamically
loads. So, SHA2 would be comprised of multiple shared libraries.

This still can be extended out to custom HMAC functions that use the
underlying hash.

Should have a mock up of the overall functionality ready sometime today
or tomorrow. Wish I had some more time to work on this.

Sorry.


[...]

Chris M. Thomasson

unread,
Aug 22, 2018, 7:50:14 PM8/22/18
to
Thank you. Fwiw, I removed the call to hmac_sha256_update from the inner
loop of ct_crypt_round. It just might reduce some of the time involved:
____________________________
____________________________

Works for me, and should reduce, or amortize the number of calls to
sha256_transf across digest size blocks. Will have some more time later
on tonight. Thanks Ben.

Imvvho, this version of ct_crypt_round should reduce the pressure
exerted upon the sha lib.

hmac_sha256_update goes to:

sha256_update that calls:

sha256_transf

twice in succession:

sha256_transf(ctx, ctx->block, 1);
sha256_transf(ctx, shifted_message, block_nb);

So, removing the call to hmac_sha256_update on the inner loop of
ct_crypt_round should be a "good thing"?

Bart

unread,
Aug 22, 2018, 8:27:54 PM8/22/18
to
On 22/08/2018 22:45, Chris M. Thomasson wrote:
> On 8/16/2018 9:50 AM, Bart wrote:

>> Anyway it looks like it'll be a nice benchmark when I get it working.
>> And there will be working versions to help with that.

> Btw, thanks for all
> of the profiling everybody. Fwiw, was using byte-by-byte file io because
> I was doing something with the individual bytes before coding this
> particular implementation. I just left them as is because speed was not
> a concern at that stage. Btw, your modification works fine. However, I
> need to put in some error handling.

OK, but check out ct_file_get_size() that I also modified (some may not
agree with that method of determining a file's size).

And those tweaks that split up a struct declaration/initialisation so
that it used a separate assignment can be removed (no longer needed by
me and not needed anyway by anyone else).

--
bart


Chris M. Thomasson

unread,
Aug 22, 2018, 9:01:14 PM8/22/18
to
On 8/22/2018 5:27 PM, Bart wrote:
> On 22/08/2018 22:45, Chris M. Thomasson wrote:
>> On 8/16/2018 9:50 AM, Bart wrote:
>
>>> Anyway it looks like it'll be a nice benchmark when I get it working.
>>> And there will be working versions to help with that.
>
>> Btw, thanks for all of the profiling everybody. Fwiw, was using
>> byte-by-byte file io because I was doing something with the individual
>> bytes before coding this particular implementation. I just left them
>> as is because speed was not a concern at that stage. Btw, your
>> modification works fine. However, I need to put in some error handling.
>
> OK, but check out ct_file_get_size() that I also modified (some may not
> agree with that method of determining a file's size).

I think the original is a bit "safer" at:
________________
size_t
ct_file_get_size(
    FILE* file
) {
    size_t file_sz = 0;
    for (file_sz = 0; fgetc(file) != EOF; ++file_sz);
    rewind(file);
    return file_sz;
}
________________


>
> And those tweaks that split up a struct declaration/initialisation so
> that it used a separate assignment can be removed (no longer needed by
> me and not needed anyway by anyone else).
>

Agreed.

Bart

unread,
Aug 23, 2018, 7:02:26 AM8/23/18
to
On 23/08/2018 02:01, Chris M. Thomasson wrote:
> On 8/22/2018 5:27 PM, Bart wrote:

>> OK, but check out ct_file_get_size() that I also modified (some may
>> not agree with that method of determining a file's size).
>
> I think the original is bit "safer" at:
> ________________
> size_t
> ct_file_get_size(
>     FILE* file
> ) {
>     size_t file_sz = 0;
>     for (file_sz = 0; fgetc(file) != EOF; ++file_sz);
>     rewind(file);
>     return file_sz;
> }
> ________________


You can do that; it's your program. But it makes my gcc-O3 version run
40% slower on my test inputs. It is, after all, working out the size by
reading the file byte by byte.

But maybe on Linux it makes no difference since apparently file I/O is
always blazingly fast on that system. Or maybe typical inputs are small
enough that it doesn't matter.

Both have the limitation, I think, that they can't take input from
stdin, because rewind() and fseek() won't work.

And both have the issue where, in between determining the size and then
reading the file, the file could be updated by another process. (Whether
that can happen while the file remains open by your program, I'm not sure.)

--
bart

Ian Collins

unread,
Aug 24, 2018, 11:13:25 PM8/24/18
to
On 22/08/18 22:33, Bart wrote:
> On 22/08/2018 02:07, Ben Bacarisse wrote:
>
>> But the results overall still surprise me since on my Linux box the
>> program spends no noticeable time doing I/O at all. You would likely
>> get more revealing results from profiling.
>
> Does it have a spinning hard drive?

That probably wouldn't matter, the file will be cached. A quick look on
a Linux box:

Overhead  Command  Shared Object  Symbol
 71.15%   a.out    a.out          [.] sha256_transf
 15.41%   a.out    libc-2.27.so   [.] __memcpy_ssse3
  2.65%   a.out    a.out          [.] ct_hmac_sha256_digest
  2.63%   a.out    a.out          [.] sha224_update
  2.42%   a.out    a.out          [.] sha256_final
  1.52%   a.out    a.out          [.] ct_crypt_round
  0.78%   a.out    a.out          [.] ct_crypt
  0.74%   a.out    a.out          [.] hmac_sha256_update
  0.66%   a.out    a.out          [.] memcpy@plt

Built with gcc -O2, input file was a 65M byte, 1M line base64 encoded
random text file.

--
Ian.

Bart

unread,
Aug 25, 2018, 7:35:51 AM8/25/18
to
I was wondering how they could profile these things so accurately, since
the only method I know (using 'rdtsc' on x86) has some hefty overheads
of its own. [Trying to use it on another project slowed everything down
by a factor of four.]

It turns out the sha256_transf function is called 5.6M times [for
my 22.5MB input] for a runtime of 6-24 seconds, so is not significant.


However, trying to measure those calls brought up an issue with C that
I'd forgotten about:

* A project of N modules where each share a header with 'int ncalls;'
declared.

* Maybe, one module also contains 'int ncalls=0;' to show it 'owns' the
variable.

Most compilers linked this properly, whether or not there was one module
that initialised the variable (I think one required this).

Mine however complained that there were multiple variables defined. I
had to fix it by having 'extern int ncalls;' in the shared header, and
'int ncalls=0;' in the owner module.

I hesitate to ask which ones are doing it properly. From past form, none
of those other compilers are doing anything wrong. If anything is
perceived as sloppy, it is only for the purposes of supporting legacy
code, thus perpetuating the same poor coding habits. Or maybe C has
nothing to say about this aspect at all.

The advantage of this approach however is the following alternative to
using unions:

module A:

int abc; // at module scope

module B:

char* abc; // at module scope

All compilers except mine linked this project of A and B modules with no
problem. (lccwin warned of a mismatch of sizes, but that only occurs on
64-bit systems.)

Mine (using my linker) said that 'abc' was multiply-defined.

Dare I ask once again which compiler-system is behaving properly? Which
one is safer?

(I think I already know the answer - the fault is mine for not
anticipating that there may be such clashes, and for not seeking the
correct gcc option to use so that they can be detected.

I'm sorry but isn't one of the primary jobs of a linker to detect such
clashes? It's not something you have to prod it into doing!)



--
bart

Richard Damon

unread,
Aug 25, 2018, 9:10:23 AM8/25/18
to
On 8/25/18 7:35 AM, Bart wrote:
>
> * A project of N modules where each share a header with 'int ncalls;'
> declared.
>
> * Maybe, one module also contains 'int ncalls=0;' to show it 'owns' the
> variable.
>
> Most compilers linked this properly, whether or not there was one module
> that initialised the variable (I think one required this).
>

By the standard the header needs to be extern int ncalls;

The statement

int ncalls;

doesn't just 'declares' ncalls, but actually defines it. It is a
tentative definition, so later in that translation unit, you can
redefine it with int ncalls=0; but if you don't do anything else with
it, it becomes a full definition at the end (and in fact, the equivalent
of int ncalls = 0; as file scope variables are zero initialized).

Thus your program has multiple definitions of ncalls, and as they are in
different translation units, this violation does not need to be
diagnosed (but can be) but is undefined behavior (which is allowed, but
not required to do what you want).

Many linkers quietly handle this case because other languages treat the
multiple definition differently, so need the merging. It also does help
with some other constructs.

This highlights the danger of coming to an understanding of a language
by seeing what 'works', especially with a language like C which makes
many of these mistakes Undefined Behavior, and where frequently (but not
always) the resultant behavior is what one would expect.

Bart

unread,
Aug 25, 2018, 2:38:30 PM8/25/18
to
On 25/08/2018 14:10, Richard Damon wrote:
> On 8/25/18 7:35 AM, Bart wrote:
>>
>> * A project of N modules where each share a header with 'int ncalls;'
>> declared.
>>
>> * Maybe, one module also contains 'int ncalls=0;' to show it 'owns' the
>> variable.
>>
>> Most compilers linked this properly, whether or not there was one module
>> that initialised the variable (I think one required this).
>>
>
> By the standard the header needs to be extern int ncalls;
>
> The statement
>
> int ncalls;
>
> doesn't just 'declares' ncalls, but actually defines it. It is a
> tentative definition, so later in that translation unit, you can
> redefine it with int ncalls=0; but if you don't do anything else with
> it, it becomes a full definition at the end (and in fact, the equivalent
> of int ncalls = 0; as file scope variables are zero initialized).
>
> Thus your program has multiple definitions of ncalls, and as they are in
> different translation units, this violation does not need to be
> diagnosed (but can be) but is undefined behavior (which is allowed, but
> not required to do what you want).
>
> Many linkers quietly handle this case because other languages treat the
> multiple definition differently, so need the merging. It also does help
> with some other constructs.

Most of the C compilers I play with have a dedicated linker. (Only gcc
uses 'ld' which is part of some mysterious ecosystem only familiar to
Linux aficionados.)

Which makes the behaviour odd, as it is usually undesirable, like in my
example which makes it possible to access the same region of storage as
a different type in each module.

A C compiler works on one translation unit at a time, so some things it
will not know about, like whether the same symbol is also exported from
another module. But it should know whether it is exported from this one!
And indicate that in the object file.

And you would surely expect the linker, or whatever process is used to
combine modules, to know about multiple exports of the same name which
is a potentially serious bug. It should not be necessary to ramp up the
error reporting level.

(Even if the linkers worked as expected, C still allows 'abc' to be
exported as one type from a module, and imported as another. But this
will be rarer than inadvertently using a variable in one file, that
clashes with the same-named variable in another, when someone hasn't
bothered with 'static'.)

--
bart

Richard Damon

unread,
Aug 25, 2018, 4:05:45 PM8/25/18
to
Most C compilers I have used generate object files in the format defined
for the processor, and then use a linker compatible with that format
(they often provide it, but you are generally able to use anyone's
linker). This is needed to allow you to mix files of different languages.

Sometimes there is a special 'pre-linker' to perform whole-program
optimizations that takes and uses a special (sometimes private) format
for the compiler output, and is really the back end for the compiler.
This may produce the standard object files out, or take them in and act
as a linker. This latter method is definitely not required by the standard.

Most systems that I have used can be setup to catch the multiple
definition error, but seem to be bowing to the community pressure to not
flag this as a fatal error, in part because so much code is 'broken' in
this way.

Ian Collins

unread,
Aug 25, 2018, 4:25:01 PM8/25/18
to
The data was generated using perf.

--
Ian.

Chris M. Thomasson

unread,
Aug 26, 2018, 2:00:54 AM8/26/18
to
It better not in my code, it might create corrupted ciphertext and/or
crash. There should be some sort of "implied" read lock on it.

Anyway, I have to make a plug in system for this code in order to
correctly handle the secret key. The user needs to be able to specify
what HMAC and hash algorithm to use, custom or SHA2, whatever.

Was thinking of something simple like:
_________________________
#include <stdio.h>
#include <stdlib.h>


void ct_hex_printf(
    FILE* fout,
    unsigned char* buf,
    size_t buf_sz
) {
    for (size_t i = 0; i < buf_sz; i++)
    {
        fprintf(fout, "%02x", buf[i]);
    }
}


typedef void (ct_hmac_func) (void*, unsigned char*, size_t);


struct ct_hmac
{
    ct_hmac_func* update;
    ct_hmac_func* digest;
};


struct ct_hmac_ximpl
{
    struct ct_hmac vt;
    int foo;
};


void
ct_hmac_ximpl_update(
    void* self_raw,
    unsigned char* buf,
    size_t size
) {
    struct ct_hmac_ximpl* const self = self_raw;

    printf("ct_hmac_ximpl_update:%p, %p, %lu\n", (void*)self,
        (void*)buf, (unsigned long)size);
    printf("hex(buf):");
    ct_hex_printf(stdout, buf, size);
    printf("\n\n");
}


void
ct_hmac_ximpl_digest(
    void* self_raw,
    unsigned char* buf,
    size_t size
) {
    struct ct_hmac_ximpl* const self = self_raw;

    printf("ct_hmac_ximpl_digest:%p, %p, %lu\n", (void*)self,
        (void*)buf, (unsigned long)size);
    printf("hex(buf):");
    ct_hex_printf(stdout, buf, size);
    printf("\n\n");
}


int main()
{
    struct ct_hmac_ximpl hmac = {
        { ct_hmac_ximpl_update,   /* order must match struct ct_hmac */
          ct_hmac_ximpl_digest
        },
        123
    };

    struct ct_hmac* hmac_impl = &hmac.vt;

    {
        unsigned char buf[] = "The HMAC Secret Key";

        hmac_impl->digest(&hmac, buf, sizeof(buf) - 1);
    }

    {
        unsigned char buf[] = "Some bytes to update HMAC with";

        hmac_impl->update(&hmac, buf, sizeof(buf) - 1);
    }

    return 0;
}
_________________________

Humm... Perhaps this should be in its own thread?

Bart

unread,
Aug 26, 2018, 6:28:23 AM8/26/18
to
On 26/08/2018 07:00, Chris M. Thomasson wrote:
> On 8/23/2018 4:02 AM, Bart wrote:

>> And both have the issue where, it between determining the size, then
>> reading the file, the file could be updated by other process. (Whether
>> that can happen when the file remains open by your program, I'm not
>> sure.)
>>
>
> It better not in my code, it might create corrupted ciphertext and/or
> crash. There should be some sort of "implied" read lock on it.

Apparently that's what happens, in Windows at least:

#include <stdio.h>
#include <conio.h>

int main(void) {
    FILE * f;
    int count = 0;

    f = fopen("input", "rb");
    while (fgetc(f) != EOF) ++count;
    printf("bytes1: %d\n", count);

    getch();

    rewind(f);
    count = 0;
    while (fgetc(f) != EOF) ++count;
    fclose(f);
    printf("bytes2: %d\n", count);
}

While waiting at the getch() with the file open, I was able to overwrite
the file with something else (using a COPY command which creates a new
version of the file; I haven't tested an in-place modification).

Then the second count of bytes was different from the first.

--
bart

Melzzzzz

unread,
Aug 26, 2018, 6:41:17 AM8/26/18
to
Isn't it that, on Windows, the file is exclusively locked and cannot be
written by other processes? Unix/Linux does not have mandatory locking.

--
press any key to continue or any other to quit...

David Brown

unread,
Aug 26, 2018, 10:37:53 AM8/26/18
to
Well, you /have/ been told all the details about this problem before,
and given the gcc switches used to detect and avoid it.

The C standards are quite clear on the matter (section 6.9 - it's not
long, and not nearly as hard to understand as some other parts) - if a
translation unit has "int ncalls;" in it then that is a definition for
the "ncalls" object. And a program shall have exactly /one/ definition
for each object used. So if you have "int ncalls;" in a header, used it
in more than one C file, and link it all together then the result is not
a valid C program. Since the problem is at the linker stage, the
compiler can't help spot it (unless the compiler is also the linker, of
course).

Before going into what is really going on, and who - if anyone - is to
blame, let's make the correct solution clear.

Make sure you only ever have /one/ definition of each non-static,
non-local variable in a program.  If you have a variable "int abc;", then
you only have it /once/. Use an initialisation if you want. Everywhere
else - especially in headers - use a non-defining declaration "extern
int abc;". For most purposes, the simple rule is "extern" declarations
in headers, non-static definitions only in implementation files, and
make everything static if you can.

Compilers may have flags or settings that can help deal with this. In
gcc, the flag you want is "-fno-common". (I have filed a gcc bug asking
for this to be made the default - the gcc developers agree, but are
worried that the change may break existing software.) It looks like
your compiler is already "no-common". I can't answer for any other
compilers.


The problem here stems from before C - and I mean before C, not just
before standardised C. Programming languages from assembly upwards need
a way to refer to the same symbols in separately compiled (or assembled)
units. And they also need a way to ensure that these common symbols are
"defined", not just declared. So the idea of a "common" section was
born. When a symbol (let's just focus on data objects) is defined in a
programming language, there needs to be a symbol definition, a storage
allocation, and optionally a value placed in a linker section. Ignoring
more complicated systems and name differences, the usual choices here
are a "rodata" section for read-only data, a "data" section for
initialised data, and a "bss" section for uninitialised data (so that it
can efficiently be filled with zeros at startup). If a linker sees a
name clash in these segments from definitions in different units, it
throws an error.

But people wanted some symbols to be the same no matter which unit they
were defined in - they wanted the symbols and definitions to be common
to the whole program. So if a compiler/assembler puts a variable in the
"common" section, the linker does not complain about the duplication -
it merges the symbols and their data. The linker never sees information
about types - just sizes - and it will usually use the largest size for
any given symbol (but could reject mismatches with an error, or use the
first size seen, or the last size, or anything else it liked). As long
as all modules that write "int abc;" mean the same variable, it all
works - so it was a feature that a huge body of C code has used. (And I
think some languages, like Fortran, require it.)

As you have seen, it can also be unreliable, and it makes it easy to do
horrible things like have "int abc;" in one file and "char * abc;" in
another. C does not require a "common" section - indeed, standard C
does not allow multiple definitions in a program. But because lots of C
code uses such multiple definitions, most C compilers allow them by default.





Bart

unread,
Aug 26, 2018, 1:20:04 PM8/26/18
to
On 26/08/2018 15:37, David Brown wrote:
> On 25/08/18 13:35, Bart wrote:

>> I'm sorry but isn't one of the primary jobs of a linker to detect such
>> clashes? It's not something you have to prod it into doing!)
>>
>
> Well, you /have/ been told all the details about this problem before,
> and given the gcc switches used to detect and avoid it.

Was I? Am I supposed to remember each one of dozens of options for every
blatant error that gcc ignores unless you give it a good kick up the
backside?


> Before going into what is really going on, and who - if anyone - is to
> blame, let's make the correct solution clear.

I know perfectly well how this is supposed to work. It's everyone else
who seems to ignore the rules.


> The problem here stems from before C - and I mean before C, not just
> before standardised C.  Programming languages from assembly upwards need
> a way to refer to the same symbols in separately compiled (or assembled)
> units.  And they also need a way to ensure that these common symbols are
> "defined", not just declared.

Yeah. I used /COMMON/ in Fortran - in 1979.

Is there no route through which a programming language can move on?

> But people wanted some symbols to be the same no matter which unit they
> were defined in - they wanted the symbols and definitions to be common
> to the whole program.

(If I wanted to emulate 'common' now, I would do it something like this
[not C]:

global [0:512]byte common

[256]char abc @ common[0]
[256]char def @ common[128]
real x @ common[8]

The last 3 variables have no storage of their own, but are aliases for
different regions in another called 'common' (which is not a special
name; it could be any variable).

So abc and def overlap; x shares same memory as abc[8..15].

No linker rules are broken: there is one memory region called 'common',
the others are aliases. In my assembly, it's roughly equivalent to:

common:: resb 512
abc = common
def = common + 128
x = common + 8

(I tried to implement this just now as I haven't used such code for
years, but it might need another hour's work.))

--
bart

Rosario19

unread,
Aug 26, 2018, 3:03:29 PM8/26/18
to
On Sun, 26 Aug 2018 11:28:15 +0100, Bart wrote:
>Apparently that's what happens, in Windows at least:
>
> #include <stdio.h>
> #include <conio.h>
>
> int main(void) {
> FILE * f;
> int count = 0;
>
> f = fopen("input", "rb");
> while (fgetc(f) != EOF) ++count;
> printf("bytes1: %d\n", count);
>
> getch();
>
> rewind(f);
> count = 0;
> while (fgetc(f) != EOF) ++count;
> fclose(f);
> printf("bytes2: %d\n", count);
>
> }
>
>While waiting at the getch() with the file open, I was able to overwrite
>the file with something else (using a COPY command which creates a new
>version of the file; I haven't tested an in-place modification).
>
>Then the second count of bytes was different from the first.

This means that even if the first count is used, for example, to malloc
memory for the file, the code that loads the file still has to check
that the memory is enough and that it actually reached the end of file.

Geoff

unread,
Aug 26, 2018, 4:53:14 PM8/26/18
to
In my experience it's not true on Windows. There is no implied
protection. I have seen renames fail but not appends or copying into
an open file. You must do so explicitly and it's not standard C.
On Windows, locking of the entire file requires opening it as shared
with denial of read-write to other processes.

#include <stdio.h>
#include <conio.h>
#include <io.h>
#include <share.h>
#include <fcntl.h>

int main(void) {
    FILE * f;
    long count = 0;
    int fd;

    fd = _sopen("input", _O_BINARY | _O_RDONLY, _SH_DENYWR);
    f = _fdopen(fd, "rb");
    while (fgetc(f) != EOF) ++count;
    printf("bytes1: %ld\n", count);

    getch();

    rewind(f);
    count = 0;
    while (fgetc(f) != EOF) ++count;
    printf("bytes2: %ld\n", count);
    fclose(f);
}

Chris M. Thomasson

unread,
Aug 26, 2018, 6:37:24 PM8/26/18
to
I would need to use implementation-specific POSIX or Windows APIs to get
the read lock on the file.

I only need to lock out writes, not concurrent reads for
ct_file_get_size. The file operations would be for read access only.

David Brown

unread,
Aug 27, 2018, 1:53:46 AM8/27/18
to
On 26/08/18 19:19, Bart wrote:
> On 26/08/2018 15:37, David Brown wrote:
>> On 25/08/18 13:35, Bart wrote:
>
>>> I'm sorry but isn't one of the primary jobs of a linker to detect
>>> such clashes? It's not something you have to prod it into doing!)
>>>
>>
>> Well, you /have/ been told all the details about this problem before,
>> and given the gcc switches used to detect and avoid it.
>
> Was I? Am I supposed to remember each one of dozens of options for every
> blatant error that gcc ignores unless you give it a good kick up the
> backside?

No, you are supposed to remember a bit about how the C language,
compilers, and linkers work when you have had a discussion about them
and when you claim to be capable of writing a C compiler and a linker.

But we have long established that you have the memory of a giraffe when
it comes to C, which is why I have explained it again. With a bit of
luck, some of it will stick this time.

>
>
>> Before going into what is really going on, and who - if anyone - is to
>> blame, let's make the correct solution clear.
>
> I know perfectly well how this is supposed to work. It's everyone else
> who seems to ignore the rules.
>
>
>> The problem here stems from before C - and I mean before C, not just
>> before standardised C.  Programming languages from assembly upwards
>> need a way to refer to the same symbols in separately compiled (or
>> assembled) units.  And they also need a way to ensure that these
>> common symbols are "defined", not just declared.
>
> Yeah. I used /COMMON/ in Fortran - in 1979.
>
> Is there no route through which a programming language can move on?

Backwards compatibility can be a PITA, but it's the way the world works.
I'm typing this with a keyboard layout designed to solve a mechanical
problem that was fixed 100 years ago, rather than to be comfortable or
efficient for use now. It's all about backwards compatibility.

This makes it very hard to move on for a language like C - it is its
greatest strength, and its greatest weakness. You can contrast this
with C++ that /does/ move on, though it too is somewhat hampered by
backwards compatibility. Newer C++ standards give a great deal more
power to the language, and can greatly simplify many types of coding,
but restrict the compilers you can use and force the use of newer tools,
and mean that programmers have to keep learning all the new stuff.

Compiler writers "solve" the backwards compatibility issue with C by
saying that by default, their tools will work the same way as they have
always worked - but if you want to, you can /choose/ to use a somewhat
newer and better language, or a safer subset of the language, by
choosing appropriate compiler flags.

People who don't choose to use such improvements should not really be
complaining about the lack of them.

>
>> But people wanted some symbols to be the same no matter which unit
>> they were defined in - they wanted the symbols and definitions to be
>> common to the whole program.
>
> (If I wanted to emulate 'common' now, I would do it something like this
> [not C]:
>
>     global [0:512]byte common
>
>     [256]char abc @ common[0]
>     [256]char def @ common[128]
>     real x @ common[8]
>
> The last 3 variables have no storage of their own, but are aliases for
> different regions in another called 'common' (which is not a special
> name; it could be any variable).

That is not a /use/ of "common" storage - that is an illustration of the
problems people get with it.

Bart

Aug 27, 2018, 7:18:53 AM
On 27/08/2018 06:53, David Brown wrote:
> On 26/08/18 19:19, Bart wrote:
>> On 26/08/2018 15:37, David Brown wrote:
>>> On 25/08/18 13:35, Bart wrote:
>>
>>>> I'm sorry but isn't one of the primary jobs of a linker to detect
>>>> such clashes? It's not something you have to prod it into doing!)
>>>>
>>>
>>> Well, you /have/ been told all the details about this problem before,
>>> and given the gcc switches used to detect and avoid it.
>>
>> Was I? Am I supposed to remember each one of dozens of options for
>> every blatant error that gcc ignores unless you give it a good kick up
>> the backside?
>
> No, you are supposed to remember a bit about how the C language,
> compilers, and linkers work

Which ones? I routinely use half a dozen (7 actually) different C compilers.

If the C compiler is called THISCC then I just want to be able to do this:

THISCC prog.c

and it is expected to produce an object file (some require -c or
equivalent).

It is also expected to tell me what's wrong with the program - actually
wrong, not wrong according to some interpretation agreed beforehand with
the compiler via sets of options. All those one-line programs posted 10
hours ago I would expect to fail.

As to linkers: I just want them to do their job, and report three kinds
of error: can't find a module, can't find a symbol, or multiple
definitions of the same symbol. It's not hard.

In short, a C compiler and linker should just be expected to do their
jobs. The only options you need to remember are the basic:

* How to enable optimising
* How to select between targets if there is a choice with the same build
of the compiler
* How to do compile-only or preprocess only when it normally attempts
something else.

Which will all vary across compilers.

> when you have had a discussion about them
> and when you claim to be capable of writing a C compiler and a linker.

Mine might have poor qualities in some respects, but they manage to get
the basics right. They outright reject (not just pass, with
admonishments) those one-liners, without having to feed in any extra
options, and the linker does exactly what is expected of it. Namely,
detect when the same symbol is exported from different modules, WITHOUT
having to be given special instructions.

As usual, this is not going to cut any ice with anyone here. The way all
those cumbersome tools behave MUST be better!


>> The last 3 variables have no storage of their own, but are aliases for
>> different regions in another called 'common' (which is not a special
>> name; it could be any variable).
>
> That is not a /use/ of "common" storage - that is an illustration of the
> problems people get with it.

It's an illustration of how to get the effects of COMMON, without
foregoing the ability to detect multiple definitions of a symbol.

--
bart

Richard Damon

Aug 27, 2018, 8:07:01 AM
RTFM.

I will ask you how many of YOUR programs, when executed in the manner
described, will always do what *I* want them to do?

This means that when I run the 'bart compiler' it compiles the language
I want, not the language you think I probably want.

Remember, gcc is NOT 'just a C compiler', but a program to compile a
whole lot of languages (many variations on Standard C). To get it to
compile the program as Standard C, you need to read the instructions,
and give it the right options. Yes, maybe the options are a bit obscure,
but that is largely because the authors of GCC anticipate that you want
your .c files to be actually the enhanced C language.

David Brown

Aug 27, 2018, 9:04:10 AM
On 27/08/18 13:18, Bart wrote:
> On 27/08/2018 06:53, David Brown wrote:
>> On 26/08/18 19:19, Bart wrote:
>>> On 26/08/2018 15:37, David Brown wrote:
>>>> On 25/08/18 13:35, Bart wrote:
>>>
>>>>> I'm sorry but isn't one of the primary jobs of a linker to detect
>>>>> such clashes? It's not something you have to prod it into doing!)
>>>>>
>>>>
>>>> Well, you /have/ been told all the details about this problem
>>>> before, and given the gcc switches used to detect and avoid it.
>>>
>>> Was I? Am I supposed to remember each one of dozens of options for
>>> every blatant error that gcc ignores unless you give it a good kick
>>> up the backside?
>>
>> No, you are supposed to remember a bit about how the C language,
>> compilers, and linkers work
>
> Which ones? I routinely use half a dozen (7 actually) different C
> compilers.

My explanation covers the way most assemblers, C compilers and linkers
are built up. It does not go into the fine details. And of course some
tools will be implemented differently, or be more limited. A
general-purpose linker will support "common" data, but one written
specifically for C might not.

>
> If the C compiler is called THISCC then I just want to be able to do this:
>
> THISCC prog.c
>
> and it is expected to produce an object file (some require -c or
> equivalent).

Whether a C compiler produces an object file or an assembly file is a
matter of implementation detail. What sections are used in the object
file, what format it has, and all the other bits and pieces are
implementation dependent (possibly specified by the OS or target ABI,
but certainly not by C standards).

What you really want is a magical tool that will find all your errors,
take your incorrect input code and generate the output code you wanted,
read your mind regarding what mixture of C standards and extensions you
like, and give you a result that suits your precise needs - all without
telling the compiler anything. Ideally it should figure out for itself
which input file to use, and the result should be generated before you
have even finished writing the source code. The compiler should, of
course, be so compact that it can be put in the signature of a Usenet post.

Until you have this tool, you have to live with the ones that exist in
the real world. I have been doing my best to explain how they work, how
you should use them, and (sometimes) why they are the apparently odd way
they sometimes are.

As I see it, you have two choices here. You can continue to complain
that others don't re-arrange the entire programming world, including all
C code ever written, to suit your desires and your fantasies. Or you
can write a little "my_gcc.bat" file with the common compiler options
you need, and use that as your "THISCC". When you have figured out
which path would be most productive, let us all know.

>
> It is also expected to tell me what's wrong with the program - actually
> wrong, not wrong according to some interpretation agreed beforehand with
> the compiler via sets of options. All those one-line programs posted 10
> hours ago I would expect to fail.
>
> As to linkers: I just want them to do their job, and report three kinds
> of error: can't find a module, can't find a symbol, or multiple
> definitions of the same symbol. It's not hard.
>
> In short, a C compiler and linker should just be expected to do their
> jobs. The only options you need to remember are the basic:
>
> * How to enable optimising
> * How to select between targets if there is a choice with the same build
> of the compiler
> * How to do compile-only or preprocess only when it normally attempts
> something else.
>

And many, many other things - because there are many, many people that
use compilers, for many types of code and many purposes. The world does
not revolve around /you/.

> Which will all vary across compilers.

Unfortunate, perhaps, but true.

>
> when you have had a discussion about them
>> and when you claim to be capable of writing a C compiler and a linker.
>
> Mine might have poor qualities in some respects, but they manage to get
> the basics right. They outright reject (not just pass, with
> admonishments) those one-liners, without having to feed in any extra
> options, and the linker does exactly what is expected of it. Namely,
> detect when the same symbol is exported from different modules, WITHOUT
> having to be given special instructions.

And your tools are useless to most C programmers. Whatever qualities
they may or may not have, they won't do the job for other people.

>
> As usual, this is not going to cut any ice with anyone here. The way all
> those cumbersome tools behave MUST be better!
>

Of course - tools that work are /better/. Your tools may be good for
/you/, and the programs /you/ write for the systems /you/ use. They are
not suitable for most other purposes. Tools that /are/ suitable for the
wide range of uses C gets are, necessarily, large and complicated.


>
>>> The last 3 variables have no storage of their own, but are aliases
>>> for different regions in another called 'common' (which is not a
>>> special name; it could be any variable).
>>
>> That is not a /use/ of "common" storage - that is an illustration of
>> the problems people get with it.
>
> It's an illustration of how to get the effects of COMMON, without
> foregoing the ability to detect multiple definitions of a symbol.
>

No, it isn't - because it is not even close to C. I am sure that some
sort of "@" extension could be added to a C compiler (I have seen such
extensions in embedded compilers, though the address is given as fixed
absolute addresses). There are other ways to simulate accessing data in
different ways but the same address - from legal methods (unions) to
ones that will work on almost any compiler (pointer casts with
"volatile") to ones that might work sometimes (pointer casts without
volatile) and very implementation-specific methods (weak aliases,
assembly, linker scripts).

The point, however, is that common symbols are a /bad/ idea. They are a
product of a bygone age, and are only supported now because they have
been used in old code that still needs to be supported by modern
compilers. We don't want to simulate them or get the same effect in
other ways!


mark.b...@gmail.com

Aug 27, 2018, 10:28:52 AM
How did Chris' thread get hijacked for yet another round of FUD?
Can I propose a new law for this newsgroup? I call it "Bart's Law".
It states that
"Any discussion on comp.lang.c will eventually become an attempt to
teach someone who claims to have written a C compiler the fundamentals
of the C language."

Scott Lurndal

Aug 27, 2018, 10:39:15 AM
I really don't understand why people keep engaging Bart. He's clearly
trolling, and quite successfully.

Chris M. Thomasson

Aug 27, 2018, 4:02:22 PM
Actually, Bart helped me out by making me install some more compilers.
Now my cipher works on each one. I don't think I would have ever
installed DMC. Still have not checked it out on Comeau:

http://www.comeaucomputing.com

He also got it working on his own personal compiler. That's nice. :^)

Chris M. Thomasson

Aug 27, 2018, 4:43:07 PM
Indeed. Afaict, this seems to show that the COPY command, overwriting
the file from another process, can potentially "corrupt" any pure C
program dealing with files on Windows. Humm... The file is not protected from
the point of:

f = fopen("input", "rb");

to:

fclose(f);

There really should be a read lock where another process cannot mutate
the damn file in between!

Chris M. Thomasson

Aug 27, 2018, 5:07:20 PM
On 8/26/2018 12:08 PM, Rosario19 wrote:
Actually, need to double check, but I think my C code is totally safe
wrt memory overruns even if the file changes between getting the size
and rewind().

https://pastebin.com/raw/feUnA3kP

I am not looping on an EOF condition except for getting the size of the
file, file_sz. For instance, take a look at the following function:
__________________________
struct ct_buf
ct_load_from_file(
    const char* fname
) {
    FILE* file = fopen(fname, "rb");
    assert(file);

    size_t file_sz = ct_file_get_size(file);

    struct ct_buf buf = { calloc(1, file_sz), file_sz };

    if (buf.p)
    {
        // Append the original plaintext
        for (size_t i = 0; i < file_sz; ++i)
        {
            int byte = fgetc(file);
            assert(byte != EOF);
            buf.p[i] = byte;
        }
    }

    fclose(file);

    return buf;
}
__________________________


If file_sz is 5, then buf.p holds 5 bytes.

If the actual file size changes to 10, everything is fine because the
for loop condition is i < file_sz. We simply read 5 bytes from the now
larger file.


If the actual file size changes to, say, 3, everything is fine because
the for loop condition is i < file_sz. We simply read 3 bytes from the
now smaller file before that assert condition trips.

However, I still want to prevent the file from being mutated between
fopen and fclose in the first place! Grrr.

Chris M. Thomasson

Aug 28, 2018, 1:04:46 AM
Fwiw, the next evolution of my cipher is going to be a plug in based
scheme than can handle user defined HMAC and hash algorithms. This is
what the actual secret key requires, section 2:

http://funwithfractals.atspace.cc/ct_cipher

Chris M. Thomasson

Aug 28, 2018, 1:07:51 AM
I forgot to mention that a user needs to be able to use their own TRNG
hardware... The secret key in my cipher is very user friendly, so to
speak... ;^)

Ben Bacarisse

Aug 28, 2018, 6:31:12 AM
Yes, it looks fine from that point of view to me too. But I wanted to
make a couple of points. First, why do you need the size? It's really
handy for many programs like this to be able to read non-rewindable
inputs like pipes and network connections. If you can manage it, it's
better to just read the data as you go along. I know this is a
proof-of-concept type of program, but you might want to think about that
for the future.

Secondly, you seem to use assert in an odd way. It should be used to
catch logic errors in the program rather than normal run-time errors.

> FILE* file = fopen(fname, "rb");
> assert(file);
>
> size_t file_sz = ct_file_get_size(file);

Here you will get into trouble if the file can't be opened and the
program is compiled with NDEBUG. A file being unopenable is a normal
run-time condition that a program should check in the usual way with an
'if'.

> struct ct_buf buf = { calloc(1, file_sz), file_sz };
>
> if (buf.p)
> {
>     // Append the original plaintext
>     for (size_t i = 0; i < file_sz; ++i)
>     {
>         int byte = fgetc(file);
>         assert(byte != EOF);
>         buf.p[i] = byte;
>     }
> }

Again, this is a bit odd. The program behaves quite differently when
compiled with and without NDEBUG being defined.

> fclose(file);
>
> return buf;
> }

<snip>
--
Ben.

Ben Bacarisse

Aug 28, 2018, 7:04:09 AM
Ben Bacarisse <ben.u...@bsb.me.uk> writes:

> "Chris M. Thomasson" <invalid_chr...@invalid.invalid> writes:
<snip>
>>     for (size_t i = 0; i < file_sz; ++i)
>>     {
>>         int byte = fgetc(file);
>>         assert(byte != EOF);
>>         buf.p[i] = byte;
>>     }
>> }
>
> Again, this is a bit odd. The program behaves quite differently when
> compiled with and without NDEBUG being defined.

Oh, I should have said that if you want your program to simply stop when
something bad happens at run-time, you should define your own function
or macro to do it. Using a macro lets you generate the error string
from the condition like assert does:

#define MUST_HAVE(c) ((c) || (fprintf(stderr, "Must have %s\n", #c), \
                              exit(EXIT_FAILURE), 1))

though it's usually better to make a proper fatal_error function that
can take an error message you supply. If you use vfprintf inside it,
you can provide for a format as well:

_Noreturn void fatal_error(const char *msg, ...)
{
    fprintf(stderr, "Fatal error: ");
    va_list args;
    va_start(args, msg);
    vfprintf(stderr, msg, args);
    va_end(args);
    fputc('\n', stderr);
    exit(EXIT_FAILURE);
}

Although I usually give such a function a return type of int and I
forego C11's _Noreturn. The reason is that you can then use a call to
fatal_error in a context that requires a value. That would make
defining MUST_HAVE in terms of fatal_error simpler, for example.

--
Ben.

Chris M. Thomasson

Aug 28, 2018, 5:06:29 PM
On 8/28/2018 3:31 AM, Ben Bacarisse wrote:
> "Chris M. Thomasson" <invalid_chr...@invalid.invalid> writes:
>
>> On 8/26/2018 12:08 PM, Rosario19 wrote:
>>> On Sun, 26 Aug 2018 11:28:15 +0100, Bart wrote:
>>>> Apparently that's what happens, in Windows at least:
[...]
>> Actually, need to double check, but I think my C code is totally safe
>> wrt memory overruns even if the file changes between getting the size
>> and rewind().
>
> Yes, it looks fine from that point of view to me too.

Yeah. So far I cannot find a memory issue.


> But I wanted to
> make a couple of points. First, why do you need the size?

I need to load the entire file into memory for the ct_crypt function to
work on. The ct_encrypt/decrypt functions do two passes on the bytes.
First pass; reverse bytes; second pass. This is key to my cipher because
it creates total ciphertext diffusion within the plaintext and vise
verse. If one single bit of ciphertext is altered, then the decrypted
plaintext will be radically different and not represent the original at
all. This is key to my HMAC-based cipher experiment.


> It's really
> handy for many programs like this to be able to read non-rewindable
> inputs like pipes and network connections. If you can manage it, it's
> better to just read the data as you go along. I know this is a
> proof-of-concept type of program, but you might want to think about that
> for the future.

I think I could do it that way. Need to think some more. It would
involve creating a ciphertext in stages. It would create a file for the
first pass. Then it would have to reverse the bytes in the file, and
finally run the second pass to create the actual ciphertext. Humm...
Thinking...

Loading the entire plaintext into memory is more convenient for me at
this stage, so to speak.


>
> Secondly, you seem to use assert in an odd way. It should be used to
> catch logic errors in the program rather than normal run-time errors.
>
>> FILE* file = fopen(fname, "rb");
>> assert(file);
>>
>> size_t file_sz = ct_file_get_size(file);
>
> Here you will get into trouble if the file can't be opened and the
> program is compiled with NDEBUG. A file being unopenable is a normal
> run-time condition that a program should check in the usual way with an
> 'if'.
>
>> struct ct_buf buf = { calloc(1, file_sz), file_sz };
>>
>> if (buf.p)
>> {
>>     // Append the original plaintext
>>     for (size_t i = 0; i < file_sz; ++i)
>>     {
>>         int byte = fgetc(file);
>>         assert(byte != EOF);
>>         buf.p[i] = byte;
>>     }
>> }
>
> Again, this is a bit odd. The program behaves quite differently when
> compiled with and without NDEBUG being defined.
>
>> fclose(file);
>>
>> return buf;
>> }

Right. If NDEBUG is defined, all of those assert's go away. At this
stage in the program, NDEBUG should never be defined quite yet. I simply
need to replace these with proper error handling. This code is in an
embryonic, pre-alpha stage. Ahhh... That is no excuse. I am currently
working on an updated version.

Chris M. Thomasson

Aug 28, 2018, 5:08:09 PM
Agreed with all of that! Proper error handling will be in the next
version. Thanks Ben.

Wish I had some more time to dedicate to this. I am sort of working on
it from time to time.

Chris M. Thomasson

Aug 28, 2018, 5:15:51 PM
Heck, I need to check fclose for errors as well. Gotta love EINTR.

http://pubs.opengroup.org/onlinepubs/007904975/functions/fclose.html

Around a decade ago, I remember debugging somebody's code that was
trying to wait on a POSIX semaphore, and failed to handle EINTR:

http://pubs.opengroup.org/onlinepubs/7908799/xsh/sem_wait.html

They did not loop back around and try to call sem_wait again. Bad mojo.

;^)

Ben Bacarisse

Aug 28, 2018, 7:49:24 PM
"Chris M. Thomasson" <invalid_chr...@invalid.invalid> writes:

> On 8/28/2018 3:31 AM, Ben Bacarisse wrote:
<snip>
>> But I wanted to
>> make a couple of points. First, why do you need the size?
>
> I need to load the entire file into memory for the ct_crypt function
> to work on.

I was not precise enough with my question. I should have asked why do
you need to know the size before you read the data?

<snip>
>> It's really
>> handy for many programs like this to be able to read non-rewindable
>> inputs like pipes and network connections. If you can manage it, it's
>> better to just read the data as you go along. I know this is a
>> proof-of-concept type of program, but you might want to think about that
>> for the future.
>
> I think I could do it that way. Need to think some more. It would
> involve creating a ciphertext in stages. It would create a file for
> the first pass. Then it would have to reverse the bytes in the file,
> and finally run the second pass to create the actual
> ciphertext. Humm... Thinking...

No need for anything that complex -- just read the data into a buffer
that grows, or into some other structure that grows.

There are many simple utilities that can't generate any output until
they have all the input (sort and tac on my Ubuntu systems come to mind)
but the good ones will be designed to work on non-seekable inputs.

> Loading the entire plaintext into memory is more convenient for me at
> this stage, so to speak.

Then that's the way to go, but you don't need to know the input size
beforehand.

You describe the code as embryonic pre-alpha, so it's just something to
consider for later.

<snip>
--
Ben.

Rosario19

Aug 29, 2018, 2:51:20 AM
On Wed, 29 Aug 2018 00:49:14 +0100, Ben Bacarisse wrote:

>No need for anything that complex -- just read the data into a buffer
>that grows, or into some other structure that grows.

if the file changes its size between the first time it is read and the
second time it is read,

especially if the size grows,

it means something bad is happening; perhaps someone is probing for a
buffer overflow.

so stopping the program with the error code "file cannot grow during the
process of translation" would be the right path

Keith Thompson

Aug 29, 2018, 12:32:37 PM
That depends on what the program is doing. A program that processes its
input as a stream doesn't need to care whether the file is growing.
"tail -f" is one simple example. (Note that "tail -f" works fine with
pipes, which don't have a defined size.)

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Chris M. Thomasson

Aug 29, 2018, 8:20:58 PM
On 8/28/2018 4:49 PM, Ben Bacarisse wrote:
> "Chris M. Thomasson" <invalid_chr...@invalid.invalid> writes:
>
>> On 8/28/2018 3:31 AM, Ben Bacarisse wrote:
> <snip>
>>> But I wanted to
>>> make a couple of points. First, why do you need the size?
>>
>> I need to load the entire file into memory for the ct_crypt function
>> to work on.
>
> I was not precise enough with my question. I should have asked why do
> you need to know the size before you read the data?

Just to quickly knock out a mock impl wrt a single dynamic allocation
instead of a growing dynamic container. I simply was not worried about
sheer speed at that stage and wanted to get it right; peer reviewed; fix
anything found, then think about performance factors. Something like:

https://groups.google.com/d/msg/comp.arch/QVl3c9vVDj0/DHOi0PDEAQAJ
(read all if interested in implementing atomic swap...)

;^)


> <snip>
>>> It's really
>>> handy for many programs like this to be able to read non-rewindable
>>> inputs like pipes and network connections. If you can manage it, it's
>>> better to just read the data as you go along. I know this is
>>> proof-of-concept type to program, but you might want to think about that
>>> for the future.
>>
>> I think I could do it that way. Need to think some more. It would
>> involve creating a ciphertext in stages. It would create a file for
>> the first pass. Then it would have to reverse the bytes in the file,
>> and finally run the second pass to create the actual
>> ciphertext. Humm... Thinking...
>
> No need for anything that complex -- just read the data into a buffer
> that grows, or into some other structure that grows.

I mentioned the pure file idea to avoid using any dynamic allocation. It
can be made to work, not efficient, but can execute correctly.


> There are many simple utilities that can't generate any output until they
> have all the input (sort and tac on my Ubuntu systems come to mind) but
> the good ones will be designed to work on non-seekable inputs.
>
>> Loading the entire plaintext into memory is more convenient for me at
>> this stage, so to speak.
>
> Then that's the way to go, but you don't need to know the input size
> beforehand.

You are right. I can combine the first pass or first call to
ct_crypt_round in ct_crypt with the first read of each byte within the
plaintext. It would perform multiple tasks under a single loop.

I do not need to read each byte to get a file size. Although, Bart's
ftell aside for a moment... Which might be faster and more efficient
than a dynamic container. My code, with NDEBUG _undefined_, already
handles memory issues; so far I cannot find any crap lurking around in
there. There are no memory overruns no matter how many times the file
changes during encryption or decryption.

I need to add something to the pre-alpha code:
_____________
#if defined (NDEBUG)
#error Sorry, NDEBUG is not safe wrt ver:0.0.0 pre-alpha!
#endif
_____________

;^)

> You describe the code as embryonic pre-alpha, so it's just something to
> consider for later.

Exactly right. The next version of the code will be up on github,
perhaps with a makefile.

Thanks again Ben. :^)

Ben Bacarisse

Aug 29, 2018, 9:09:32 PM
"Chris M. Thomasson" <invalid_chr...@invalid.invalid> writes:

> On 8/28/2018 4:49 PM, Ben Bacarisse wrote:
<snip>
>> I was not precise enough with my question. I should have asked why do
>> you need to know the size before you read the data?
>
> Just to quickly knock out a mock impl wrt a single dynamic allocation
> instead of a growing dynamic container.

OK.

<snip>
>> No need for anything that complex -- just read the data into a buffer
>> that grows, or into some other structure that grows.
>
> I mentioned the pure file idea to avoid using any dynamic
> allocation. It can be made to work, not efficient, but can execute
> correctly.

I don't know what you mean. I suspect we are talking at cross purposes.

>> There are many simple utilities that can't generate any output until they
>> have all the input (sort and tac on my Ubuntu systems come to mind) but
>> the good ones will be designed to work on non-seekable inputs.
>>
>>> Loading the entire plaintext into memory is more convenient for me at
>>> this stage, so to speak.
>>
>> Then that's the way to go, but you don't need to know the input size
>> beforehand.
>
> You are right. I can combine the first pass or first call to
> ct_crypt_round in ct_crypt with the first read of each byte within the
> plaintext. It would perform multiple tasks under a single loop.

Yes, we definitely have crossed wires. Sorry about that. I am at a
disadvantage because I don't know what your program does. I was only
suggesting you consider making the program work with non-seekable inputs.

<snip>
--
Ben.

Rosario19

Aug 30, 2018, 1:22:50 AM
On Wed, 29 Aug 2018 09:32:29 -0700, Keith Thompson wrote:

>Rosario19 <R...@invalid.invalid> writes:
>> On Wed, 29 Aug 2018 00:49:14 +0100, Ben Bacarisse wrote:
>>>No need for anything that complex -- just read the data into a buffer
>>>that grows, or into some other structure that grows.
>>
>> if the file changes its size between the first time it is read and
>> the second time it is read,
>>
>> especially if the size grows,
>>
>> it means something bad is happening; perhaps someone is probing for a
>> buffer overflow.
>>
>> so stopping the program with the error code "file cannot grow during
>> the process of translation" would be the right path
>
>That depends on what the program is doing. A program that processes its
>input as a stream doesn't need to care whether the file is growing.
>"tail -f" is one simple example. (Note that "tail -f" works fine with
>pipes, which don't have a defined size.)

for using it as a stream, a fixed-size buffer would be needed

for using it as a file, in the first pass that calculates the size one
could compute a signature variable (for example, summing all the chars
read into that variable), so that if the file's size or content is
changed by an external program or by the user, the program catches the
error and returns an error for a wrong translation (because the file
changed during translation)

Chris M. Thomasson

Aug 31, 2018, 12:08:27 AM
It loads an input file (plaintext or ciphertext) into memory, then
performs the following algorithm on the buffer wrt the ct_crypt and
ct_crypt_round functions:

http://funwithfractals.atspace.cc/ct_cipher

After that, it saves the buffer to an output file (plaintext or ciphertext)

The ct_encrypt function encrypts a certain number of, ideally TRNG,
bytes directly into the ciphertext. This is the SK.rand_n variable in my
crude little paper. Hard coded at 32 bytes with a SK.hash_algo of sha256
and an 8-byte HMAC password of "Password". ;^)

The secret key is:
___________________
SK.hmac_key = "Password"; // 8 bytes
SK.hash_algo = sha256;
SK.rand_n = 32;
___________________

A ciphertext is SK.rand_n bytes larger than the original.


The ct_decrypt function simply disregards those random bytes after the
decrypt cycle is completed.


> I was only
> suggesting you consider making the program work with non-seekable inputs.

I can do that. There is a way to combine the encrypt/decrypt cycle
during the loading phase. Right now I am just loading the file; working
on the memory buffer; outputting the buffer to a file. It would be much
more efficient to work on the file while loading it.


Ben Bacarisse

Aug 31, 2018, 6:59:45 AM
"Chris M. Thomasson" <invalid_chr...@invalid.invalid> writes:

> On 8/29/2018 6:09 PM, Ben Bacarisse wrote:
<snip>
>> I was only
>> suggesting you consider making the program work with non-seekable inputs.
>
> I can do that. There is a way to combine the encrypt decrypt cycle
> during the loading phase.

Just to be clear, this has never been what I am suggesting. I am simply
saying you don't need to know the file size to read it all into memory
before starting to work on it.

--
Ben.

Chris M. Thomasson

Aug 31, 2018, 6:05:19 PM
Yes, I know that Ben. However, your suggestion made me think of
something else wrt combining the loading with the encrypt/decrypt cycle
itself. It should improve performance by eliminating the blind loading
into a buffer, namely the following functions:

ct_prepend_from_file for encrypt, and ct_load_from_file for decrypt.

Combining these would help. I can use a dynamic vector as well.

It is all good. Thanks.

Chris M. Thomasson

unread,
Aug 31, 2018, 6:20:18 PM8/31/18
to
Basically, while I am filling a dynamic container with bytes from the
input, I can be performing first-pass encryption or decryption at the
same time. Then I would reverse the buffer, perform a second pass of the
encrypt/decrypt cycle, then finally write to output.
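That pass/reverse/pass shape can be sketched with a toy round function standing in for ct_crypt_round. The XOR stream below is purely illustrative, not the real cipher; it just has enough structure to show the idea:

```c
#include <assert.h>
#include <string.h>

/* Toy stand-in for ct_crypt_round: XOR each byte with a fixed,
   position-dependent keystream.  NOT the real round function. */
static void toy_round(unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        buf[i] ^= (unsigned char)(0x5A + i * 31);
}

/* In-place buffer reversal, as in ct_reverse. */
static void reverse(unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n / 2; ++i) {
        unsigned char t = buf[i];
        buf[i] = buf[n - 1 - i];
        buf[n - 1 - i] = t;
    }
}

/* First pass, reverse the buffer, second pass.  With this involutive
   toy round, running the whole sequence twice restores the input. */
static void two_pass(unsigned char *buf, size_t n)
{
    toy_round(buf, n);
    reverse(buf, n);
    toy_round(buf, n);
}
```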

Bart

unread,
Aug 31, 2018, 6:40:54 PM8/31/18
to
I have to say that I prefer these things be separated out, unless you
really want a tight application in the manner of Linux utilities where
i/o is done via pipes.

The core of your program, represented by function F, takes one string as
input, and produces another as output (or it might be modified in-place).

The input string is in memory, and so is the output string.

The function F shouldn't need to concern itself with where the data came
from, or what will happen to it afterwards.

This way, it becomes easier to package F into a library that can be
incorporated into a bigger application, than if the processing done by F
was all tied up with file-handling routines.

--
bart

Chris M. Thomasson

unread,
Sep 1, 2018, 12:55:06 AM9/1/18
to
Correct. My C implementation modifies the buffer in place wrt
ct_crypt_round and ct_reverse functions. The Python test vector for my C
code wrt defining PYTHON_TEST_VECTOR makes a copy:

https://pastebin.com/raw/NAnsBJAZ


> The function F shouldn't need to concern itself with where the data came
> from, or what will happen to it afterwards.

This is why I made my pseudo-code work on memory and not concern itself
with files. However, I think it should be a little faster to combine
things when dealing with files. Well, it would make the logic more
complicated in a sense. Seems like a specialization based on use case.
Humm...


> This way, it becomes easier to package F into a library that can be
> incorporated into a bigger application, than if the processing done by F
> was all tied up with file-handling routines.

Humm... I cannot really disagree with this. The way it does the
encrypt/decrypt in memory is fairly convenient.

Bart

unread,
Sep 3, 2018, 9:33:04 AM9/3/18
to
On 21/08/2018 18:45, Bart wrote:

> Here are revised timings with faster i/o. Task is the same one of
> encrypting/decrypting a 1M line text file, but now excludes verification
> (done separately):
>
>     gcc 5.1.0       1.0      -O3
>     MSVC            1.0      /O2
>     Pelles C        1.6      -Ot
>     DMC             1.8      -o (32-bits, rest are 64)
>     lccwin          2.0      -O
>     Tiny C .27      3.2
>     bcc (mine)      4.2
>
> As I said, this makes a useful benchmark, and one that is not trivial (I
> don't think so anyway, unless I discover most of its time is spent in
> one small function).
>
> The modified main program used is here: https://pastebin.com/raw/JXRF2Sjj

I'm not going to let Tiny C beat me on this.

My revised 'bcc' compiler (revised for other reasons) gives these
timings for the same task:

    gcc      5.8 seconds   -O3
    bcc     16.9
    tcc     18.0

Both the last two using unrolled loops (gcc does its own unrolling).
Otherwise both are roughly the same and about half the speed of gcc.

Which is not bad considering this bottleneck uses such tight, complex
code, exactly what gcc excels in. The unrolled version has a main loop
that contains lines like this, preprocessed:


t1=wv[7]+(((wv[4]>>6)|(wv[4]<<((sizeof(wv[4])<<3)-6)))^((wv[4]>>11)|(wv[4]<<((sizeof(wv[4])<<3)-11)))^((wv[4]>>25)|(wv[4]<<((sizeof(wv[4])<<3)-25))))+((wv[4]&wv[5])^(~wv[4]&wv[6]))
+sha256_k[j]+w[j];

t2=(((wv[0]>>2)|(wv[0]<<((sizeof(wv[0])<<3)-2)))^((wv[0]>>13)|(wv[0]<<((sizeof(wv[0])<<3)-13)))^((wv[0]>>22)|(wv[0]<<((sizeof(wv[0])<<3)-22))))+((wv[0]&wv[1])^(wv[0]&wv[2])^(wv[1]&wv[2]));
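
For readers squinting at those expanded lines: they are the standard SHA-256 rotate-right idiom plus the Ch and Maj functions from FIPS 180-4, with `(sizeof(x) << 3)` being the word width in bits (32 for a 32-bit word). Un-preprocessed, they correspond to something like:

```c
#include <assert.h>

/* Rotate right by n bits on a 32-bit word (0 < n < 32). */
#define ROTR32(x, n) (((x) >> (n)) | ((x) << (32 - (n))))

/* The SHA-256 Sigma, Ch, and Maj functions (per FIPS 180-4) that
   appear, fully expanded, in the preprocessed t1/t2 lines above. */
#define SHA256_S1(x) (ROTR32((x), 6) ^ ROTR32((x), 11) ^ ROTR32((x), 25))
#define SHA256_S0(x) (ROTR32((x), 2) ^ ROTR32((x), 13) ^ ROTR32((x), 22))
#define CH(x, y, z)  (((x) & (y)) ^ (~(x) & (z)))
#define MAJ(x, y, z) (((x) & (y)) ^ ((x) & (z)) ^ ((y) & (z)))
```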

There are plenty of things that could be done still (to generate
somewhat faster code), but this will do. For that purpose, someone can
just use an optimising compiler.

If they want a small (smaller than Tiny C if you include standard
headers), very fast, one-file self-contained compiler that is not
ambivalent about what is and is not an error, then that's where mine
fits in.

--
bart

Bart

unread,
Sep 3, 2018, 9:37:29 AM9/3/18
to
On 03/09/2018 14:32, Bart wrote:

> My revised 'bcc' compiler (revised for other reasons) gives these
> timings for the same task:
>
>     gcc      5.8 seconds   -O3
>     bcc     16.9
>     tcc     18.0
>
> Both the last two using unrolled loops (gcc does its own unrolling).
> Otherwise both are roughly the same and about half the speed of gcc.

I mean both use un-unrolled loops. There is a #define in the source that
enables either the normal code (not unrolled) or the separate manually
unrolled code. The test runs the normal code.

Chris M. Thomasson

unread,
Sep 6, 2018, 6:29:14 PM9/6/18
to
Wrt a time factor, I might be finished with another version fairly soon.
It should show the difference in performance and profiling from
combining two stages into one. Got caught up in some fractals in the
meantime. Fwiw, here is some of my work:

https://plus.google.com/101799841244447089430

https://www.shadertoy.com/user/Chris_M_Thomasson

https://www.youtube.com/channel/UC_DhsJu-AbQ6Msnxdf8z6Kg

;^)

Chris M. Thomasson

unread,
Sep 23, 2018, 1:43:40 AM9/23/18
to
On 8/8/2018 2:00 PM, Chris M. Thomasson wrote:
> When you get some time to spare, and are interested, read all:
>
> http://funwithfractals.atspace.cc/ct_cipher
>
> and try this C and Python code for starters:
>
> https://pastebin.com/raw/feUnA3kP
[...]
>
> Can you get it to work? Can you get the encrypt/decrypt cycle to work in C?
>
> Thanks.

Will have some more time after some fractal work:

https://plus.google.com/101799841244447089430/posts/ZtbLRXkoJeE

Chris M. Thomasson

unread,
Nov 6, 2018, 1:07:55 AM11/6/18
to
On 8/8/2018 2:00 PM, Chris M. Thomasson wrote:
> When you get some time to spare, and are interested, read all:
>
> http://funwithfractals.atspace.cc/ct_cipher
>
> and try this C and Python code for starters:
>
> https://pastebin.com/raw/feUnA3kP
> (C)
>
> https://pastebin.com/raw/NAnsBJAZ
> (Python Test Vector)
[...]

Humm... I think there is a nice way to get this to work in a mode such
that it breaks a message apart in a way compatible with streaming, or
piping in. Each message would have bit-level sensitivity, but not the
whole. This is very different from the per-file version.

Now, would it be bad to make the program require a special flag such
that it knows it is in a "streaming" mode?

foobar.exe generates input for "ct_crypt -encrypt -stream" to output to
another stream... ;^)

My crypto is tailor-made for dealing with individual files. I need to
think about another mode of operation where my program can fit into a
complex chain of pipes and indirections. A stream.
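
One possible shape for such a "-stream" mode, sketched under my own assumptions (CHUNK_SZ and crypt_chunk are placeholders, not part of the posted program): read fixed-size chunks and treat each one as an independent message, so bit-level sensitivity is per chunk rather than whole-file:

```c
#include <assert.h>
#include <stdio.h>

#define CHUNK_SZ 4096  /* hypothetical per-message size for -stream mode */

/* Sketch of a chunked streaming mode: crypt_chunk is a placeholder for
   a per-message in-place encrypt/decrypt.  Returns 0 on success. */
static int stream_process(FILE *in, FILE *out,
                          void (*crypt_chunk)(unsigned char *, size_t))
{
    unsigned char buf[CHUNK_SZ];
    size_t got;
    while ((got = fread(buf, 1, sizeof buf, in)) > 0) {
        crypt_chunk(buf, got);
        if (fwrite(buf, 1, got, out) != got) return -1;
    }
    return ferror(in) ? -1 : 0;
}
```

Note the real cipher would make each output chunk SK.rand_n bytes larger than its input; the in-place placeholder above glosses over that framing detail.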


Daniel Joyce

unread,
Mar 25, 2021, 1:15:01 PM3/25/21
to
Hi Chris,
I looked at your Python code and you got to be kidding me.
It may be valid but, Duh, way over my head.

The German Enigma machine was an interesting concept in
using electronic signals through different platforms.
Turing, along with the Poles and the British, could not have
cracked the code without vital information about
the machine, including one that was captured from a U-boat
along with papers pertaining to it, plus information the Poles had from
the early 1930s.
It was fantastic work they did at Bletchley Park but without that
vital info, they would never have cracked the Enigma code
and the outcome of the war could have gone in a different direction.

With that in mind, the encryption I play with is not even child's play,
but it is still interesting how even simple concepts can be difficult to
decrypt.
Not saying yours is child's play, BTW.

Chris M. Thomasson

unread,
Mar 25, 2021, 4:35:02 PM3/25/21
to
If you have any questions about my C and/or Python code, I will be glad
to answer them Daniel. Fwiw, take a look at the following function wrt
Python:

__________________________
# Generate n random bytes
# These should ideally be from a truly random, non-repeatable
# source. TRNG!
def ct_rand_bytes(n):
    rb = "";
    for i in range(n):
        #rb = rb + chr(random.randint(0, 255));
        # HACK to get the same numbers
        rb = rb + chr(i);
    return rb;
__________________________


As-is, this is just meant to get on the same page as my C code with the
PYTHON_TEST_VECTOR macro defined. Hence the "# HACK to get the same
numbers" comment. You will notice that the macro is commented out in my
C code: https://pastebin.com/raw/feUnA3kP line 33

//#define PYTHON_TEST_VECTOR


Now, try altering the Python function to:
__________________________
def ct_rand_bytes(n):
    rb = "";
    for i in range(n):
        rb = rb + chr(random.randint(0, 255));
        # HACK to get the same numbers
        #rb = rb + chr(i);
    return rb;
__________________________


Run it several times in a row, and take a look at what happens to the
ciphertext bytes, and digests. Keep in mind that a TRNG should be used
here for a real implementation as opposed to using the random.randint
function.
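
For a real implementation, one way to source TRNG-quality bytes on POSIX-ish systems is to read /dev/urandom. This is an illustration only, not part of the posted C code; other platforms would use their own CSPRNG API instead:

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Fill buf with n bytes from /dev/urandom (an assumption: POSIX-style
   system).  Returns the number of random bytes actually obtained. */
static size_t ct_rand_bytes_os(unsigned char *buf, size_t n)
{
    FILE *f = fopen("/dev/urandom", "rb");
    size_t got = 0;
    if (f) {
        got = fread(buf, 1, n, f);
        fclose(f);
    }
    return got;
}
```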


> The German Enigma machine was an interesting concept in
> using electronic signals through different platforms.
> Turing along with the Poles and British could not have
> cracked the code without vital information about
> the machine and one that was captured along with papers
> pertaining to it from a Uboat. Also info. the Poles had from
> the early 1930s
> It was fantastic work they did at Bletchley Park but without that
> vital info, they would never have cracked the Enigma code
> and the outcome of the war could have gone in a different direction.

Big Time! Thank God they were able to crack it. Wow. Scary.


> With that in mind, the encryption I play with is not even child play
> but still interesting how even simple concepts can be difficult to
> decrypt.
> Not saying yours is child play, BTW.
>

If you have any ideas on how to crack my cipher, I am all ears. Very
interested. Thanks!

Daniel Joyce

unread,
Apr 6, 2021, 5:58:01 PM4/6/21
to
What about my idea for a private key where only 7 numbers (Alpha group identifiers)
have to be changed after each message sent, so that both sides could send these
encrypted messages? These 7 group identifiers protect the identity of the one-time
random order set for each of the 7 Alpha characters within each group.
This eliminates a possible outside attack on repeated messages: where the identifiers
are not changed, they may reveal where some Alpha characters are located within each
group after analyzing many messages.
A private key string read from left to right holding 7 random groups of 7 alpha characters
and 7 group identifiers placed only in odd count positions of the string.
When the scan (read) of the string has reached the end of the string it just starts the scan
at the beginning of the string again. This happens as many times as needed for each message.
Only one letter can be drawn from any one particular group on each scan.
The random order of the 7 group identifiers is only changed after each completed message.
The set random order for the Alpha characters within each group of 7 is not changed but could also be.
See below.
A little further explanation of my 7^2 encryption.
49 Alpha characters = 26*2 = 52 minus one X, one Y, and one Z = 49 Alpha characters
and 7 group identifiers 1-7 for a total string length of 56.
To decrypt, each encrypted number (group identifier) is matched by reading the key string from
left (start of scan) to right until the same identifier is found; the next number in the
ciphertext is then the Alpha character's position within that group (positions 1-7 within each of the 7 groups).
Say the first identifier, at the start of the encrypted message, is 3. The left-right scan (read)
stops at 3; then, if the next number after the identifier is 5, the alpha character in that
position is T, i.e. the 5th position of that random Alpha group. So the first letter of the message is a
(T), and so on, drawing a limit of one letter from any one group on every completed scan.
Some full scans may only produce as little as 1 or 2 matching letters, but most full scans involve more than 2.
The scan reaches the end of the string and then starts again from the beginning,
until each number in the even positions of the encryption has drawn a letter to complete the
plaintext message.

Example of a private key string ---
2KPMJNOL5VQSUWRT4KHMNJIL6IDCGHFE1BECGFDA3SOQUTRP7YAVXBWZ

And an encrypted public message created from the above private key ---
574267274635574767113626554462326431523446632147712454433767163273673657426727177771663274

This message contains all the letters of the Alphabet.

Just follow the K-I-S-S rules above and you can decrypt it very easily
by scanning and comparing the above private key with the public key over and over
from left to right.
The latter part of the encryption shows four 7's in a row, but it does not break the 1-letter limit from
one particular group per complete scan. So a run of the same number is not uncommon in a long encrypted message, because one occurrence is a group identifier and the next is the position within that group.
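
For what it's worth, the key string above follows a regular layout a program can check: an identifier at string index 0, 8, 16, ..., each followed by its 7 letters, and there are exactly 7! = 5040 orderings of the seven identifiers. A small sketch of a layout checker, based on my reading of the rules (not Daniel's code):

```c
#include <assert.h>
#include <string.h>
#include <ctype.h>

/* Split a 56-char key string into 7 identifier digits and 7 groups of
   7 letters each.  Returns 1 if the layout holds, 0 otherwise. */
static int parse_key(const char *key, char ids[7], char groups[7][8])
{
    if (strlen(key) != 56) return 0;
    for (int g = 0; g < 7; ++g) {
        const char *p = key + g * 8;
        if (!isdigit((unsigned char)p[0])) return 0;
        ids[g] = p[0];
        for (int i = 0; i < 7; ++i) {
            if (!isalpha((unsigned char)p[1 + i])) return 0;
            groups[g][i] = p[1 + i];
        }
        groups[g][7] = '\0';
    }
    return 1;
}
```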

A nice little hint. These one-timers (XYZ) also could be spread out a little more in the string where
all 3 Alpha characters now reside in group # 7 identifiers.
An added feature could also be employed on my 7^2 encryption brainstorm to send encrypted messages
back and forth from sender to receiver and vice versa.
I believe Chris passes along the new information for updating the private key string when a ciphertext is
sent by either party, Bob or Alice; it updates the receiver's file, which in turn gives the private
key new information. In essence, changing the private key of the receiver, or how it processes
the new information from a file or some other direct process.
The fractional values on Chris's algorithm that are sent along with the encrypted message are the
catalyst for sending different encrypted messages with different hex to letter conversions from Bob or Alice.
This, I believe, Chris's fractional values sent with the ciphertext do just that.

In my system, it is much simpler.
Instead of a few fractional values sent with the ciphertext, as in Chris's system, a random order of
1,2,3,4,5,6,7 as group identifiers could either update the present private key of the receiver or
choose a whole new string with randomly placed Alpha characters from a database that will match
the new random order of the group identifiers that are sent with the ciphertext.
I would prefer the latter.
For added security, a rational number could be sent with the ciphertext whose hidden continued
fraction could look like [2:7,3,4,6,5,1,18,22,1], or be embedded in the CF as, say, every 3rd term after the
first 2 terms.
The number of orderings for 1,2,3,4,5,6,7 is at most 7!, but still a good number that could be used
again.
What is the actual number of orderings of 1-7 for the unique group identifiers?
No 0, 8, or 9 can ever be in the body of the number.

Dan

Chris M. Thomasson

unread,
Apr 7, 2021, 8:13:59 PM4/7/21
to
[...]
> A nice little hint. These one-timers (XYZ) also could be spread out a little more in the string where
> all 3 Alpha characters now reside in group # 7 identifiers.
> An added feature could also be employed on my 7^2 encryption brainstorm to send encrypted messages
> back and forth from sender to receiver and vice versa.
> I believe Chris passes along the new information for updating the private key string when a ciphertext is
> sent by either party, Bob or Alice; it updates the receiver's file, which in turn gives the private
> key new information. In essence, changing the private key of the receiver, or how it processes
> the new information from a file or some other direct process.
> The fractional values on Chris's algorithm that are sent along with the encrypted message are the
> catalyst for sending different encrypted messages with different hex to letter conversions from Bob or Alice.
> This, I believe, Chris's fractional values sent with the ciphertext do just that.

I am not sure what you mean. My HMAC algorithm we are discussing here in
comp.lang.c does not send any fractional numbers along with a
ciphertext. You simply must be mistaking my HMAC cipher for my FFE
cipher. They are completely different things. The one I posted here in
comp.lang.c, in this very thread, is the HMAC one.


> In my system, it is much simpler.
> Instead of a few fractional values sent with the ciphertext, as in Chris's system, a random order of
> 1,2,3,4,5,6,7 as group identifiers could either update the present private key of the receiver or
> choose a whole new string with randomly placed Alpha characters from a database that will match
> the new random order of the group identifiers that are sent with the ciphertext.
> I would prefer the latter.

Again, you must be mistaking my HMAC cipher for my FFE cipher.


> For added security, a rational number could be sent with the ciphertext where it's hidden continued
> fraction where that could look like [2:7,3,4,6,5,1,18,22,1] or embedded in the CF as say every 3rd term after the
> first 2 terms.
> The number of combinations for 1,2,3,4,5,6,7 are <7! but still, a good number that could be used
> again.
> What is the actual number of combinations 1-7 for the number of unique group identifiers?
> ( 0,8,or 9) can ever be in the body of the number.

I need to properly read your idea, but I strongly suggest posting it
over in sci.crypt. My HMAC code was posted here because it's in portable
C99. I wanted the great C programmers here to take a look at it, just in
case I missed something, or if they have any comments or ideas on how to
make things better. A very nice thread evolved from that. Read it all.

Keep in mind, that there are no fractional numbers sent in my HMAC cipher.

0 new messages