Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Generate Unique Strings

716 views
Skip to first unread message

Randy Adanza

unread,
Jul 19, 2007, 3:22:20 AM7/19/07
to
Cheers,

Hopefully this is the right group.

I need to generate a unique string with a length of about 8 to12 characters.
At first I though of just generating a GUID and use it but surely there must
be a
better approach. And I need to generate about 1million.

It will be used as a UserName/Password pair for a scratch card.

Any ideas on how to better implement this?


TIA.

danny heijl

unread,
Jul 19, 2007, 4:32:19 AM7/19/07
to
Randy Adanza schreef:

> I need to generate a unique string with a length of about 8 to12 characters.
> At first I though of just generating a GUID and use it but surely there must
> be a
> better approach. And I need to generate about 1million.

A GUID is designed for generating unique strings, but you may not like
the predictable patterns in a GUID for use as password.

You could use a good random generator like ISAAC (Bob Jenkins,
http://www.burtleburtle.net/bob/rand/isaacafa.html), there's a Delphi
version available (http://sebsauvage.free.fr/isaac/). A good seed for a
random number generator is the SHA-1 hash of a GUID, see
http://codebetter.com/blogs/brendan.tompkins/archive/2005/03/09/59496.aspx
for some explanation and code (C#).

This will give you really random numbers but they may not be unique, I
don't know. You could always store all used strings in hashtable so that
you can skip duplicates.

Danny
---

David Ninnes

unread,
Jul 19, 2007, 7:42:57 AM7/19/07
to

This will give you a million 8 digit numbers
with a random spacing between 0 and 100, fiddle with the mapping to get
whichever characters you'd like.

start := 100000000; // (10^8)
for i := 0 to 999999 do
begin
start := start + rand(100);
end

I think that's right - 1 million numbers starting from 100 million with
a random difference of between 0 and 100 will give you numbers from 100
million to somewhere less than 200 million, then chop off the first digit.

These probably won't pass a random number test but would probably do for
scratch-it's. If you want to map to A-Z then do the sums base 26.

hth,
Dave

Randy Adanza

unread,
Jul 19, 2007, 9:10:59 PM7/19/07
to
Thanks for the replies.

"Randy Adanza" <rad...@hotmail.com> wrote in message
news:469f...@newsgroups.borland.com...

Loren Pechtel

unread,
Jul 19, 2007, 9:41:40 PM7/19/07
to
On Thu, 19 Jul 2007 21:42:57 +1000, David Ninnes
<david_...@hotmail.com> wrote:

>This will give you a million 8 digit numbers
>with a random spacing between 0 and 100, fiddle with the mapping to get
>whichever characters you'd like.
>
>start := 100000000; // (10^8)
>for i := 0 to 999999 do
>begin
> start := start + rand(100);
>end
>
>I think that's right - 1 million numbers starting from 100 million with
>a random difference of between 0 and 100 will give you numbers from 100
>million to somewhere less than 200 million, then chop off the first digit.
>
>These probably won't pass a random number test but would probably do for
>scratch-it's. If you want to map to A-Z then do the sums base 26.

This will be horribly non-random. You're going to get a huge peak
around 50 million.

It also doesn't satisfy his need of them being unique.


I would generate random strings based on whatever criteria is wanted
and add them to a tstringlist, watching for dupes.

buffer : array [1..10] of char;
...
for loop := 1 to 10 do
buffer[loop] := char(rand(26)+64);

David Ninnes

unread,
Jul 20, 2007, 12:41:53 AM7/20/07
to
Hi Loren,

> This will be horribly non-random.

It will be fairly random - true randomness wasn't a requirement just
uniqueness, you'll get a linear spread of numbers from '000000' up to
somewhere around '500000'


> You're going to get a huge peak
> around 50 million.

I don't think so - the distribution would be linear, the difference
between each element would be a random number between 0 and 100 (i.e. 1
to 99)


>
> It also doesn't satisfy his need of them being unique.

I think they would be - unless I missed something? I've assumed rand
doesn't return 0 (this is pseudocode not real code)


>
>
> I would generate random strings based on whatever criteria is wanted
> and add them to a tstringlist, watching for dupes.
>
> buffer : array [1..10] of char;
> ...
> for loop := 1 to 10 do
> buffer[loop] := char(rand(26)+64);

I'd be going home before you did :-) - this would be very slow. It's the
checking for dupes in a stringlist is the hard bit e.g. by the time you
reached item 500,000 you'd have to scan the list which would have
499,999 elements - this is O(n^2)=10^12 overall, that's why Danny
suggested a hash in the previous post, then you'd be back to O(n) - a
hash lookup is O(1). This way you don't need a hash and would probably
be as good for most purposes.

V2 - To give a smooth spread between 0 and 1000000 you could do this


start := 100000000; // (10^8)
for i := 0 to 999999 do
begin

start := start + 100;
result := start + rand(100);
end
then every 100 from 0 to a 100 million you'll get a number somewhere
around the 100, again I've assumed rand(100) returns a number between 1
and 99.

Since there going on scratch cards I assumed the numbers would all be
put in a bag at the end and handed out randomly, if you needed to print
them in random order then put the numbers in a list and pick one out at
random each time they're printed and throw the number away.


Dave

David Ninnes

unread,
Jul 20, 2007, 6:19:00 AM7/20/07
to
> V2 - To give a smooth spread between 0 and 1000000 you could do this
> start := 100000000; // (10^8)
> for i := 0 to 999999 do
> begin
> start := start + 100;
> result := start + rand(100);
> end
no V2's rubbish stick to V1

Arthur Hoornweg

unread,
Jul 20, 2007, 6:24:20 AM7/20/07
to
Randy Adanza wrote:
> Cheers,
>
> Hopefully this is the right group.
>
> I need to generate a unique string with a length of about 8 to12 characters.
> At first I though of just generating a GUID and use it but surely there must
> be a
> better approach. And I need to generate about 1million.
>
> It will be used as a UserName/Password pair for a scratch card.
>


The binary representation of a Guid is a 16 bytes structure.

The "normal" hexadecimal guid representation consists of 32
digits plus some parentheses and dashes, but there are other
possibilities to encode these 16 bytes more compactly:

If you encode these using base64 encoding you'll get something like
21 ASCII characters. No information gets lost, the result is still
guaranteed to be unique.


--
Arthur Hoornweg

(In order to reply per e-mail, please just remove the ".net"
from my e-mail address. Leave the rest of the address intact
including the "antispam" part. I had to take this measure to
counteract unsollicited mail.)

John Herbster

unread,
Jul 20, 2007, 9:30:11 AM7/20/07
to
"Arthur Hoornweg" <antispam...@casema.nl.net> wrote
> ... the result is still guaranteed to be unique ...

Is this difficult to prove? <g>
--JohnH

Arthur Hoornweg

unread,
Jul 21, 2007, 3:14:56 AM7/21/07
to
> Is this difficult to prove? <g>

Google for it, the algorithm has been discussed often on the
web.

AFAIK a GUID contains the MAC address (which is unique in
itself), the time/date, a counter and a random number.
So it is comprised of a constant but unique part, a continuously
incremental part (datetime, counter) and a random part which
are then shuffled somehow. It would take some mighty serious
tweaking to produce a collision.


Arthur Hoornweg
...To answer me, please just remove the .NET from my e-mail address.


John Herbster

unread,
Jul 21, 2007, 7:12:35 AM7/21/07
to
> > > ... the result [a GUID] is still guaranteed to be unique ...

> > Is this difficult to prove? <g>

"Arthur Hoornweg" wrote


> Google for it, the algorithm has been discussed often
> on the web. AFAIK a GUID contains the MAC address
> (which is unique in itself),

http://www.google.com/search?hl=en&q=%22identical+MAC+addresses%22
I guess that I Googled for the wrong thing! <g>

David Ninnes

unread,
Jul 21, 2007, 9:45:57 AM7/21/07
to
Well I had to know if it would work so here's a little procedure.
Drop a Memo, 2 edit boxes and a button on a form
and set the button click method. It will work for up to 13 characters
after this string two long ints together for overflow.

cheers,
Dave

----8<------8<------------------

procedure TForm18.Button1Click(Sender: TObject);
var
start, last : Int64;
i : integer;
lne : integer;
strings : TStringList;
maxRand, newNum : integer;

const
lookup : array [0..25] of char = ('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H',
'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U',
'V', 'W', 'X', 'Y', 'Z');

procedure printIt(a : int64);
var
b : int64;
outs : array [0..8] of char;
i : integer;
begin
outs[8] := #0;
for I := 7 downto 0 do
begin
b := a mod 26;
a := a div 26;
outs[i] := lookup[b];
end;
strings.Add(outs);
end;

begin
// start := 000000000; // (108)
// Map A = 0
// B = 1
// ...
// Z = 26
// so start := AAAAAAAAA = 0
// A long int will store up to 2^64 ~ 13 A - Z (log (2^64) base 26)
//
start := 0;
lne := 0;
strings := TStringList.Create;
randomize;
maxRand := 0;

try
for i := 0 to 1000000 do
begin
printIt(start);
// fiddle with this a bit
// as 1 million is around 2000 base 26
//
newNum := random(26*26*26*26 div 2) + 1;
if newNum > maxRand then
begin
Edit2.text := IntToStr(newNum);
maxRand := newNum;
end;
start := start + newNum;
if start <= last then
assert(1=0, 'Start ' + IntToStr(start) + ' <= than last number ' +
IntToStr(last) + ' i is ' + intToStr(i) + ' Random number is
' + IntToStr(newNum));
last := start;
if i mod 1000 = 0 then
begin
Edit1.Text := IntToStr(i);
memo1.Lines.Add(strings[i]);
end;
end;
finally
strings.free;
end;
end;

bmahn

unread,
Jul 21, 2007, 3:35:43 PM7/21/07
to
This works good. But what is to do, for a length of 12 chars. Then the first
four chars are always 'A'.
cu.

Erik Turner

unread,
Jul 21, 2007, 3:30:50 PM7/21/07
to
Use a multiplicative generator. Very fast and duplicates are eliminated by
choosing
a multiplier that is relatively prime to the modulus (1 trillion in my
code). On my machine
it generates about 3 million passwords per second. The only fly in the
ointment is if
the seed goes to zero. I seem to remember Knuth having a value that you add
as well.

Code is inline below.


"Randy Adanza" <rad...@hotmail.com> wrote in message
news:469f...@newsgroups.borland.com...


unit Util_Generate;

interface

type
TGenerator = class
private
FSeed: Int64;
FMult: Int64;
public
constructor Create(ASeed: Int64; AMult: Int64);
function GetNextPassword: Int64;
end;

implementation

constructor TGenerator.Create(ASeed: Int64; AMult: Int64);
begin
FSeed := ASeed;
FMult := AMult;
end;

function TGenerator.GetNextPassword: Int64;
begin
FSeed := FSeed * FMult MOD 1000000000000;
if FSeed < 0
then Result := 1000000000000 + FSeed
else Result := FSeed;
end;

end.


program RandPass;
{$APPTYPE CONSOLE}
uses
SysUtils, Windows,
Util_Generate in 'Util_Generate.pas';

const
RepCount = 1000000;
var
Gen: TGenerator;
R: Integer;
StartTime: Longword;
ElapTime: Longword;
begin
Gen := TGenerator.Create(877456352356475, 1948976123447);
StartTime := GetTickCount;
for R := 1 to RepCount do
Gen.GetNextPassword;
ElapTime := GetTickCount - StartTime;
Writeln('Passwords/second = ', RepCount / ElapTime * 1000:0:0);
Readln;
end.


Loren Pechtel

unread,
Jul 21, 2007, 4:23:07 PM7/21/07
to

If you keep the list sorted the performance will be O(n log n).

>
>V2 - To give a smooth spread between 0 and 1000000 you could do this
>start := 100000000; // (10^8)
> for i := 0 to 999999 do
> begin
> start := start + 100;
> result := start + rand(100);
> end
>then every 100 from 0 to a 100 million you'll get a number somewhere
>around the 100, again I've assumed rand(100) returns a number between 1
>and 99.
>
>Since there going on scratch cards I assumed the numbers would all be
>put in a bag at the end and handed out randomly, if you needed to print
>them in random order then put the numbers in a list and pick one out at
>random each time they're printed and throw the number away.

This is still going to produce a horribly non-random distribution as
well as not guaranteeing that they are unique.

bmahn

unread,
Jul 21, 2007, 5:00:47 PM7/21/07
to
This generates numbers only, but not Strings...
cu.

Jens Gruschel

unread,
Jul 22, 2007, 6:09:01 AM7/22/07
to
> The binary representation of a Guid is a 16 bytes structure.
> [...] the result is still
> guaranteed to be unique.

Not after generating more than 2^128 GUIDs ;-)


--
Jens Gruschel
http://www.pegtop.net

David Ninnes

unread,
Jul 22, 2007, 9:30:06 AM7/22/07
to
Hi Loren,

>
> If you keep the list sorted the performance will be O(n log n).
>
point taken.
<snip>

>> them in random order then put the numbers in a list and pick one out at
>> random each time they're printed and throw the number away.
>
> This is still going to produce a horribly non-random distribution as
I don't quite see the grounds for this - I haven't done a test on the
output - but they look randomish. I'd be interested to hear your reasoning.

> well as not guaranteeing that they are unique.
They are unique see my other post.

This is probably moot now anyway, Erik has posted a pseudo random number
generator that should give superior results.

cheers,
Dave

John Herbster

unread,
Jul 22, 2007, 2:35:37 PM7/22/07
to

"Erik Turner" <er...@spamless.turner.org> wrote

> Use a multiplicative generator. Very fast and duplicates are
> eliminated by choosing a multiplier that is relatively prime
> to the modulus (1 trillion in my code). On my machine it
> generates about 3 million passwords per second. The only
> fly in the ointment is if the seed goes to zero. I seem to
> remember Knuth having a value that you add as well.

> constructor TGenerator.Create(ASeed: Int64; AMult: Int64);


> begin
> FSeed := ASeed;
> FMult := AMult;
> end;

> function TGenerator.GetNextPassword: Int64;
> begin
> FSeed := FSeed * FMult MOD 1000000000000;
> if FSeed < 0
> then Result := 1000000000000 + FSeed
> else Result := FSeed;
> end;

> Gen := TGenerator.Create(877456352356475, 1948976123447);

Erik, The central operation


FSeed := FSeed * FMult MOD 1000000000000;

would be more clear to a lot of us if written
FSeed := (FSeed * FMult) MOD 1000000000000;

More importantly though, and pardon me for mentioning this,
but the above code may be a little sloppy. I checked and
found that Log2(877456352356475) + Log2(1948976123447) > 90
and less than 91, which so that the product FSeed*FMult
may stretch to 91 bits and thus overflow the 63 bit Int64
range. This may give an overflow error unless the overflow
checking has been turned OFF with {$Q-}.

The problem with overflow is that it complicates the analysis
to calculate the cycle length and distribution statistics; and
like you say it has a problem with zero.

If I recall correctly the Delphi function Random uses a
muliplicative-congruent method and has a period of 2^32, with
a LongInt (i.e. 32-bit) seed, which, of course, means that it
has no missing values and does include zero.

You who are interested might like to read and add your
comments on the proposal to expand the Delphi function
Random. You can find this in QualityCentral:
Report No: 6619 Status: Open
Expand the Random function
http://qc.codegear.com/wc/qcmain.aspx?d=6619

HTH, JohnH

Loren Pechtel

unread,
Jul 22, 2007, 4:22:32 PM7/22/07
to
On Sun, 22 Jul 2007 23:30:06 +1000, David Ninnes
<david_...@hotmail.com> wrote:

>Hi Loren,
>>
>> If you keep the list sorted the performance will be O(n log n).
>>
>point taken.
><snip>
>>> them in random order then put the numbers in a list and pick one out at
>>> random each time they're printed and throw the number away.
>>
>> This is still going to produce a horribly non-random distribution as
>I don't quite see the grounds for this - I haven't done a test on the
>output - but they look randomish. I'd be interested to hear your reasoning.
>> well as not guaranteeing that they are unique.
>They are unique see my other post.

You're adding up a bunch of randoms. The result will be a bell curve
distribution. With the number of items that go into it the sides of
the curve are going to be pretty steep.

0 new messages