Sorry if this is off-topic for this newsgroup (I don't know where else
I should post).
If the input is a 48-bit string and I take its CRC64, can two different
inputs collide?
The problem is that some source code I need to verify uses a CRC64
value as a hash value, where the input data is a 48-bit MAC address.
I need to determine whether the CRC64 values can collide with one
another when the input data is an arbitrary 48-bit string.
Unfortunately, I'm very poor at math and CRC64, so I don't know whether
that is possible.
Regards,
I didn't mention that we use the ISO 3309 standard
(generator polynomial is x^64 + x^4 + x^3 + x + 1).
> The problem is that some source code I need to verify uses a CRC64
> value as a hash value, where the input data is a 48-bit MAC address.
>
> I need to determine whether the CRC64 values can collide with one
> another when the input data is an arbitrary 48-bit string.
>
> Unfortunately, I'm very poor at math and CRC64, so I don't know whether
> that is possible.
If "no collisions" is a requirement, CRC64 is a terrible choice. Why
not just send the 48-bit value itself? That will save you 16 bits,
save you some CPU, and is, of course, guaranteed not to collide.
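As a minimal sketch of that suggestion, assuming the MAC arrives as a
colon- or hyphen-separated hex string (the helper name `mac_to_id` is made
up for illustration), the 48-bit value fits directly in a 64-bit field, and
distinct MACs give distinct IDs by construction:

```python
def mac_to_id(mac: str) -> int:
    """Return the 48-bit MAC as an integer ID (hypothetical helper)."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("expected a 48-bit MAC address")
    # Big-endian so the integer reads the same as the printed address.
    return int.from_bytes(raw, "big")
```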
DS
Yes, I found that CRC64 (ISO) is considered weak for hashing, but the
software is already implemented, so I can't do anything about it for now.
Our software has two parts: part A generates a unique ID (I know
that it is a bad idea, but we use the CRC64 of the 48-bit string as the
unique ID), and part B processes some job based on that unique ID and
writes a log file.
The problem is that the log file from part B has mysterious entries that
show two entities with the same ID. The testing team is asking me
whether it is theoretically possible to generate a duplicate ID (CRC64
of arbitrary 48-bit data).
If that is possible, I need to report to the testing team that we need to
change the whole process for generating unique IDs. That could be very
expensive in our current situation, sadly.
If not, I need to check whether the original 48-bit data source is
unique or not. Unfortunately, we do not have enough historical data to
prove this.
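One way to gather that evidence going forward, assuming the MAC can be
logged next to the ID, is to separate "same MAC seen twice" from "different
MACs, same ID". This is only a sketch: the `(mac, unique_id)` record layout
and the function name are hypothetical, not the real log format.

```python
from collections import defaultdict

def classify_duplicate_ids(records):
    """records: iterable of (mac, unique_id) pairs (hypothetical layout).

    Returns (collisions, repeated_sources):
      collisions       -- {id: set of MACs} where one ID came from several MACs
      repeated_sources -- {id: mac} where the same MAC appeared more than once
    """
    macs_by_id = defaultdict(set)
    count_by_pair = defaultdict(int)
    for mac, uid in records:
        macs_by_id[uid].add(mac)
        count_by_pair[(mac, uid)] += 1
    collisions = {uid: macs for uid, macs in macs_by_id.items() if len(macs) > 1}
    repeated_sources = {uid: mac for (mac, uid), n in count_by_pair.items() if n > 1}
    return collisions, repeated_sources
```

A non-empty `collisions` result would point at the ID generator; a non-empty
`repeated_sources` result would point at duplicate input data instead.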
That's why I asked the original question: is it possible for two
CRC-64 ISO values computed from distinct 48-bit inputs to collide?
Thanks.
Thus,
> That's why I asked the original question: is it possible for two
> CRC-64 ISO values computed from distinct 48-bit inputs to collide?
I haven't done the math, but I wouldn't expect collisions given that the
input is shorter than the hash itself.
There have been cases of duplicate hardware addresses, and duplicate
software addresses (in virtual machines, for instance, or software
overriding the hardware address at runtime) are even more likely.
Assuming the MAC addresses are unique is probably a mistake.
Chris
It's a bad idea to assume MACs are unique. I've personally witnessed a case
where the OEM shipped several bulk orders of network cards all with the same
MAC address. The issue took a while to surface because, IIRC, the affected
batches were never run in the relevant configuration _together_; only
individually tested or run in production against earlier batches.
Given the profit margins on network cards, I wouldn't be surprised if this
is very common today.
I vaguely recollect that the OEM apologized not for the cloned MAC
addresses, but for having shipped them all to the same purchaser.
A CRC is, mostly, the remainder of a Euclidean division (of polynomials
over GF(2)). Therefore, I expect that if you take CRCn and feed it all 2^m
possible m-bit strings, with m >= n, you will find each of the 2^n possible
outputs exactly 2^(m-n) times. And if m < n, there should be no collisions.
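That expectation can be checked for the 48-bit case without enumerating all
2^48 inputs. The sketch below assumes a plain MSB-first CRC with zero
initial value and no final XOR (the deployed variant may differ; for
fixed-length input a different init or final XOR only adds a constant to
every output, so it does not change which inputs collide). Such a CRC is
linear over GF(2), so it is collision-free on 6-byte inputs exactly when the
CRCs of the 48 single-bit messages are linearly independent. The function
names are made up for illustration.

```python
POLY = 0x1B             # x^4 + x^3 + x + 1; the x^64 term is implicit
MASK = (1 << 64) - 1

def crc64_plain(data: bytes) -> int:
    """Bitwise MSB-first CRC-64, zero init, no final XOR (assumed variant)."""
    crc = 0
    for byte in data:
        crc ^= byte << 56
        for _ in range(8):
            if crc & (1 << 63):
                crc = ((crc << 1) & MASK) ^ POLY
            else:
                crc = (crc << 1) & MASK
    return crc

def gf2_rank(vectors) -> int:
    """Rank of integers viewed as bit vectors over GF(2)."""
    pivots = {}  # leading bit position -> reduced vector with that leading bit
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]
    return len(pivots)

# CRCs of the 48 six-byte messages that have exactly one bit set.
basis_crcs = [crc64_plain((1 << i).to_bytes(6, "big")) for i in range(48)]
print(gf2_rank(basis_crcs))  # prints 48: full rank, so no two inputs collide
```

A rank below 48 would mean some non-zero 48-bit difference has CRC zero,
which is exactly what a collision between two inputs requires.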
But I agree with the other people in the thread: using a hash function
whose output is bigger than its input, and then relying on the fact that
there will be no collisions, is a terrible design. There are standard ways
to derive a 64-bit unique identifier from a MAC address, for example the
one used for IPv6 autoconfiguration.
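For reference, the IPv6 scheme (the "modified EUI-64" interface identifier)
inserts the bytes FF FE in the middle of the MAC and flips the
universal/local bit of the first byte. A small sketch, with a made-up helper
name and assuming the MAC is given as a hex string:

```python
def mac_to_modified_eui64(mac: str) -> bytes:
    """Derive the 8-byte modified EUI-64 identifier from a 48-bit MAC."""
    octets = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    # Flip the universal/local bit (0x02) in the first octet, then splice
    # FF FE between the OUI half and the device-specific half.
    return bytes([octets[0] ^ 0x02]) + octets[1:3] + b"\xff\xfe" + octets[3:6]

print(mac_to_modified_eui64("34:56:78:9a:bc:de").hex())  # 365678fffe9abcde
```

The mapping is trivially reversible, so distinct MACs always give distinct
identifiers, with no CRC involved.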
> Our software has two parts: part A generates a unique ID (I know that
> it is a bad idea, but we use the CRC64 of the 48-bit string as the
> unique ID), and part B processes some job based on that unique ID and
> writes a log file.
>
> The problem is that the log file from part B has mysterious entries that
> show two entities with the same ID. The testing team is asking me
> whether it is theoretically possible to generate a duplicate ID (CRC64
> of arbitrary 48-bit data).
>
> If that is possible, I need to report to the testing team that we need to
> change the whole process for generating unique IDs.
If you have to switch to a new UUID generator:
OSSP uuid:
http://www.ossp.org/pkg/lib/uuid/
http://packages.debian.org/lenny/libossp-uuid-dev
The uuid library in e2fsprogs:
http://e2fsprogs.sourceforge.net/
http://packages.debian.org/lenny/uuid-dev
See the Debian package links for concise descriptions.
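Both of the above are C libraries. Purely as an illustration of the same
idea, and not of the OSSP or e2fsprogs APIs, Python's standard `uuid` module
can produce both random and time-plus-node UUIDs; the node value below is an
arbitrary example, not a real device address.

```python
import uuid

# Version 4: random, does not depend on the MAC at all.
random_id = uuid.uuid4()

# Version 1: timestamp plus a 48-bit node value (here an example MAC).
node = 0x3456789ABCDE
time_based_id = uuid.uuid1(node=node)

print(random_id)
print(time_based_id)
```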
lacos
That only ensures that the mapping from MAC to identifier is unique. It
doesn't ensure that the end identifier is unique (since there are
documented cases of duplicate MAC addresses).
Chris
Exactly. IPv6 autoconfiguration specifically does not guarantee that
the address is globally unique, only that it's as unique as the MAC. (If
MACs collide on a LAN, you're hosed anyway. So using IPv6
autoconfiguration on a LAN creates no new problems.)
DS