Help with a cyrptography puzzle...

Jwoody

unread,

Mar 16, 2005, 4:12:26 AM3/16/05

to

Hi there,

I am looking for assistance with a problem I am having.

Thanks in advance for any assistance...

I have a file with a list of Customer IDs. Each number is between 1
and 15 characters and all values are unique. I need to give my list of
customer IDs to somebody but do NOT want that person to know the real
ID. The replaced values need to look like real customer numbers. I
cannot simply store the hash value, or replace with sequential numbers.

The challenge comes by introducting the requirement that everytime you
transpose the source data, the original values always need to map to
the same replaced value. Eg, ID 6789876473 will always move to
8567957421

As with the source file, all replaced numbers need to be unique.

Any thoughts?

Jonathan

Lars Fischer

unread,

Mar 16, 2005, 1:36:30 PM3/16/05

to

On 16 Mar 2005 01:12:26 -0800, Jwoody <jwo...@gmail.com> wrote:
> transpose the source data, the original values always need to map to
> the same replaced value.

> As with the source file, all replaced numbers need to be unique.

Maybe I don't see any problem here. But what about just making up one
big number and XOR it (ore something similar) to your original ID's? Has
a one-on-one maping and the uniqeness is preserved.

L.

Urban Koistinen

unread,

Mar 22, 2005, 11:45:40 PM3/22/05

to

[Moderator's note:
a. I've edited this to trim the quoting, remove
MIME guff, and eliminate top-posting.
b. Mike Amling expressed doubt about whether the
moderators (me in this case) should have posted
the original. I think we aim to be lenient. At
least it has generated meaningful replies!
-- scr co mod (ggr).]

Jwoody wrote:
> I have a file with a list of Customer IDs. Each number is between 1
> and 15 characters and all values are unique. I need to give my list of
> customer IDs to somebody but do NOT want that person to know the real
> ID. The replaced values need to look like real customer numbers. I
> cannot simply store the hash value, or replace with sequential numbers.
>
> The challenge comes by introducting the requirement that everytime you
> transpose the source data, the original values always need to map to
> the same replaced value. Eg, ID 6789876473 will always move to
> 8567957421
>
> As with the source file, all replaced numbers need to be unique.

You need to define exactly a set of acceptable Customer IDs.
Try to make the set as large as possible without getting complicated.
Why does it have to look like Customer IDs?
Is it too fool someone that he is getting the real thing or is it so
that it will work with software that has been written to handle Customer
IDs?
I assume the second.

Chose a secret key S.
Hash the original Customer ID with a secret key to get a number n.
Map the number to a customer ID from the set of acceptable Customer IDs.
If the acceptable customer IDs are on the form:
AAAAYYYYMMDD-NNNN where
A may be any character 'A=B4..'Z'
YYYY is a year 1953-1987
MM is a month 01-12
DD is a day 01-28 (to avoid trouble in february)
- is always just a dash '-'
NNNN is a number 0010-9999 where the last digit is a check digit that is
computed from the previous ID characters.
Now in pseudo C (or pseudo Java):
for each A
select character with n%26
n =3D n/26
YYYY =3D 1953+(n%25)
n =3D n/25
MM =3D 01 + (n%12)
n =3D n/12
DD =3D 01 + (n%28)
n =3D n/28
NNN =3D 001 + (n%999)
final N =3D computeFromIDChars

If you use a cryptographically secure hash and keep the secret key S
suitably secret someone who gets the Customer IDs would only be told how
many Customer IDs he got, not which Customer ID was which original
Customer ID.

Urban Koistinen

Nicholas Weaver

unread,

Mar 22, 2005, 11:47:14 PM3/22/05

to

In article <pgpmoose.200503162012.17035@garie>,

Just have a large hashtable by customer ID. If its not in the table,
create a random ID to put in that table, and use that random value to
replace the ID in all subsequent occurances of the customer ID.
--
Nicholas C. Weaver. to reply email to "nweaver" at the domain
icsi.berkeley.edu

Michael Amling

unread,

Mar 22, 2005, 11:13:27 PM3/22/05

to

This is hardly a research topic, so I don't know how this got past
the moderators.

You can do this with a Luby-Rackoff cipher. Split each Customer ID in
half (approximately). Call the halves left and right. The original
Luby-Rackoff construction is

left=f(right)^left;
right=g(left)^right;
left=h(right)^left;
right=i(left)^right;

where f(), g(), h() and i() are independent pseudo-random functions,
such as hash functions. The ciphertext is then the final values of left
and right, joined together.
In your case, if the customer ID consists of decimal digits, using
modular addition is easier than XOR. You can make up 4 secret keys key1,
key2, key3, key4, and choose a hash function such as MD5 or SHA-1. Then,
to convert a 10-digit customer ID foo to a "replaced value" baz use the
following:

upper=foo/100000; // Split the customer ID into two parts.
lower=foo%100000;
upper=(hash(key1 || lower)+upper)%10000;
lower=(hash(key2 || upper)+lower)%10000;
upper=(hash(key3 || lower)+upper)%10000;
lower=(hash(key4 || upper)+lower)%10000;
baz=upper%100000+lower; // Join the two parts back together.

where || indicates concatenation. This generates a unique baz for each
foo. In fact foo can be recovered from baz by reversing the above
transitions, if you know the values of key1, key2, key3 and key4.

--Mike Amling

Herman Rubin

unread,

Mar 22, 2005, 11:11:35 PM3/22/05

to

In article <d19ube$8p5$1...@taverner.CS.Berkeley.EDU>,

This looks dangerous. One matching between the original
and replaced values, and all are decodable.

--
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Department of Statistics, Purdue University
hru...@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558

David A. Scott

unread,

Mar 22, 2005, 11:46:56 PM3/22/05

to

"Jwoody" <jwo...@gmail.com> wrote in
news:pgpmoose.200503162012.17035@garie:

The easy is way is just have two lists. In one list
is the customer ID in the other list is the REPLACEMENT ID

example customer number 12 on origninal list you gave an ID
of 6789876473 on the secret list you go down 12 locations and have
your 12th random ID say 8567957421

When the customer gives you has ID if its was real ID you look on
the list notice its the 12th entry then you go down second list and
use the 12th entry. The advantage of this compared to encryption
is that for small customer base it faster. Also if the ID given to
you by customer is not on the list you can tell right away.

David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

Ilmari Karonen

unread,

Apr 24, 2005, 11:16:20 AM4/24/05

to

Jwoody <jwo...@gmail.com> kirjoitti 16.03.2005:
>
> I have a file with a list of Customer IDs. Each number is between 1
> and 15 characters and all values are unique. I need to give my list of
> customer IDs to somebody but do NOT want that person to know the real
> ID. The replaced values need to look like real customer numbers. I
> cannot simply store the hash value, or replace with sequential numbers.
>
> The challenge comes by introducting the requirement that everytime you
> transpose the source data, the original values always need to map to
> the same replaced value. Eg, ID 6789876473 will always move to
> 8567957421
>
> As with the source file, all replaced numbers need to be unique.

If I understand you correctly, what you're essentially asking for is a
block cipher with a block size of 15 decimal digits.

Below is my quick and dirty attempt at creating such a beast. It's
basically a simple Feistel cipher, using an actual cryptographic hash
function (MD5) as the round function to compensate for the complete
lack of any cryptanalytic testing beyond "it sure looks random".

Of course that, and the fact that it's written Perl, make it a really,
really slow cipher. Nonetheless, it can do well over 10000 rounds per
second on my not-so-modern computer, which should be quite fast enough
for your needs.

Ps. any critique is gladly accepted.

#!/usr/bin/perl -w
use strict;
use Digest::MD5 'md5'; # any decent hash should do

sub round {
my ($l, $r, $key, $n, $sign) = @_;
my @hash = unpack "C*", md5($r, $key, $n);
$l =~ s/(\d)/($1 + 1000 + $sign * pop @hash) % 10/eg;
return ($r, $l);
}

sub cipher {
my ($block, $key, $rounds, $decrypt) = @_;

my @seq = (1 .. 2*$rounds);
@seq = reverse @seq if $decrypt;
my $sign = ($decrypt ? -1 : 1);

my $half = int(length($block) / 2);
my ($l, $r) = ($block =~ /^(\d{$half})(\d+)$/)
or die "Malformed block '$block'!\n";

($l, $r) = ($r, $l) if $decrypt;
foreach my $n (@seq) {
($l, $r) = round($l, $r, $key, $n, $sign);
}
($l, $r) = ($r, $l) if $decrypt;
return "$l$r";
}

use Getopt::Long;

GetOptions('rounds|r=i' => \(my $rounds = 5),
'decrypt|d' => \(my $decrypt),
) and @ARGV or die <<"HERE";
Usage: $0 [-d] [-r <n>] <key> [<file(s)>]

Options:
--decrypt, -d decrypt previously encrypted data
--rounds, -r number of rounds / 2 (default 5)

Input should consist of blocks of 2-32 digit numbers,
one per line. Remember to pad numbers to the desired
length with leading zeros. The key can be any string.
HERE

my $key = shift @ARGV;

while (<>) {
chomp;
warn "Skipping bad line $.: $_\n" and next
if /\D/ or length($_) < 2 or length($_) > 32;
print cipher($_, $key, $rounds, $decrypt), "\n";
}

__END__

--
Ilmari Karonen
To reply by e-mail, please replace ".invalid" with ".net" in address.