8 Digit Random

Rachelle Kun

Aug 3, 2024, 1:29:24 PM
to pestmelliti

You say that you check for duplicates, but be cautious: once most of the numbers are in use, the number of attempts (and therefore the time taken) to find an unused number will increase, possibly resulting in very long delays and wasted CPU resources.
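To put a rough number on that slowdown, here is a minimal Python sketch. It assumes each attempt draws uniformly from the whole pool and that the pool is the 90,000,000 eight-digit numbers (10,000,000 to 99,999,999); both the pool size and the function name are assumptions for illustration, not something stated in the thread.

```python
def expected_attempts(pool_size: int, used: int) -> float:
    """Average number of uniform random draws needed to hit an unused value."""
    if not 0 <= used < pool_size:
        raise ValueError("used must satisfy 0 <= used < pool_size")
    # Each draw lands on a free value with probability free / pool_size,
    # so the number of draws is geometric with mean pool_size / free.
    return pool_size / (pool_size - used)


POOL = 90_000_000  # assumed: count of 8-digit numbers

for percent_used in (50, 90, 99):
    used = POOL * percent_used // 100
    print(percent_used, expected_attempts(POOL, used))
```

With half the pool used, a new number takes two attempts on average; with 99% used it takes about 100.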

There are some great answers, but many use functions that are flagged as not cryptographically secure. If you want a random 6 digit number that is cryptographically secure you can use something like this:
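The snippet that originally followed did not survive in this post. As a stand-in, here is a minimal Python sketch of the same idea; it uses Python's `secrets` module where the PHP discussion uses random_int(), and the function name is hypothetical.

```python
import secrets


def secure_six_digit_code() -> str:
    """Return a 6-digit decimal string in the range 100000..999999."""
    # secrets.randbelow(n) returns a uniform integer in [0, n) drawn from
    # the operating system's cryptographic random source.
    return str(100_000 + secrets.randbelow(900_000))


print(secure_six_digit_code())
```

If leading zeros are acceptable, `f"{secrets.randbelow(1_000_000):06d}"` would cover the full 000000..999999 range instead.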

Caution: This function does not generate cryptographically secure values, and should not be used for cryptographic purposes. If you need a cryptographically secure value, consider using random_int(), random_bytes(), or openssl_random_pseudo_bytes() instead.

Second time through the loop: generate a random number between 100,000 and x1 and call this xt2, then generate a random number between x1 and 999,999 and call this xt3, then randomly choose xt2 or xt3 and call this x2.

Note that here we use random_int, which was introduced in PHP 7 and uses a cryptographic random generator, something that is important if you want random codes to be hard to guess. random_bytes was also introduced in PHP 7 and likewise uses a cryptographic random generator.

The code above generates a string of 6 decimal digits. If you want to use a bigger character set (such as all upper-case letters, all lower-case letters, and the 10 digits), this is a more involved process, but you have to use random_int or random_bytes rather than rand(), mt_rand(), str_shuffle(), etc., if the string will serve as a password, a "confirmation code", or another secret value. See an answer to a related question, and see also: generating a random code in php?
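A minimal sketch of the larger-character-set case, written in Python rather than PHP: `secrets.choice` plays the role that random_int plays in the PHP discussion. The alphabet, default length, and function name are assumptions chosen for illustration.

```python
import secrets
import string

# Assumed alphabet: upper-case letters, lower-case letters, and digits.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits


def secure_code(length: int = 12) -> str:
    """Build a random string one character at a time from a CSPRNG."""
    # secrets.choice picks each character uniformly, so there is no
    # modulo bias and no dependence on a predictable generator.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(secure_code())
```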

I'm using the following Perl code to generate random alphanumeric strings (uppercase letters and numbers only) to use as unique identifiers for records in my MySQL database. The database is likely to stay under 1,000,000 rows, but the absolute realistic maximum would be around 3,000,000. Do I have a dangerous chance of 2 records having the same random code, or is it likely to happen an insignificantly small number of times? I know very little about probability (if that isn't already abundantly clear from the nature of this question) and would love someone's input.

Based on the equations given at _paradox#Approximation_of_number_of_people, there is a 50% chance of encountering at least one collision after inserting only 55,000 records or so into a universe of this size:

As mentioned previously, the birthday paradox makes this event quite likely. In particular, an accurate approximation can be determined when the problem is cast as a collision problem. Let p(n; d) be the probability that at least two numbers are the same, d be the number of combinations and n the number of trials. Then, we can show that p(n; d) is approximately equal to:
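One common form of the birthday approximation is p(n; d) ≈ 1 − exp(−n(n − 1) / (2d)); the exact expression the original answer gave is not preserved here, so treat this as a sketch of the standard result. The Python below evaluates it for d = 36^6, which assumes 6-character codes over 26 upper-case letters and 10 digits. That code length is an assumption; it happens to be consistent with the 55,000 figure quoted earlier.

```python
import math


def collision_probability(n: int, d: int) -> float:
    """Approximate chance that n uniform draws from d values contain a repeat."""
    # 1 - exp(-x) computed as -expm1(-x) to stay accurate for small x.
    return -math.expm1(-n * (n - 1) / (2 * d))


D = 36 ** 6  # assumed universe: 6 characters, 36 symbols each

for records in (1_000, 55_000, 1_000_000):
    print(records, collision_probability(records, D))
```

Under that assumption the probability is about one half at 55,000 records and is effectively 1 well before 1,000,000 records.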

While I don't know the specifics of exactly how you want to use these pseudo-random IDs, you may want to consider generating an array of 3000000 integers (from 1 to 3000000) and randomly shuffling it. That would guarantee that the numbers are unique. See Fisher-Yates shuffle on Wikipedia.
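A minimal Python sketch of the Fisher-Yates shuffle, shown on a small list so it is easy to inspect. Using `secrets.randbelow` as the random source is an assumption; Python's built-in `random.shuffle` implements the same algorithm with a non-cryptographic generator.

```python
import secrets


def fisher_yates_shuffle(items: list) -> None:
    """Shuffle items in place so that every permutation is equally likely."""
    # Walk from the last index down to 1, swapping each element with a
    # uniformly chosen element at or before it.
    for i in range(len(items) - 1, 0, -1):
        j = secrets.randbelow(i + 1)  # uniform in 0..i inclusive
        items[i], items[j] = items[j], items[i]


ids = list(range(1, 11))  # stand-in for the 1..3000000 range
fisher_yates_shuffle(ids)
print(ids)
```

Because the list starts with distinct values and the shuffle only swaps them, every ID handed out from the shuffled list is unique by construction.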

The Mersenne Twister is a fast pseudorandom number generator (PRNG) that is capable of providing large volumes (> 10^6004) of "high quality" pseudorandom data to applications that may exhaust available "truly" random data sources or system-provided PRNGs such as rand.
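For illustration, Python's standard `random` module uses the Mersenne Twister as its core generator, so a short sketch can show the property that matters here: the same seed always reproduces the same sequence. The helper name and the 8-digit range are assumptions.

```python
import random


def first_values(seed: int, count: int = 3) -> list[int]:
    """Return the first few 8-digit numbers produced from a given seed."""
    rng = random.Random(seed)  # independent Mersenne Twister instance
    return [rng.randrange(10_000_000, 100_000_000) for _ in range(count)]


# Same seed, same output: fine for simulations, unsuitable for secrets.
print(first_values(2024) == first_values(2024))
```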

A pseudo-random number generator (PRNG) is typically an algorithm that uses a deterministic mathematical function to produce a "random" number within a set range. These generators are called pseudo-random because their output is fully determined by the algorithm and its starting state, and the algorithm may have unintended selection bias. In other words, randomness from a computer program is not necessarily an organic, truly random event.

A true random number generator (TRNG) relies on randomness from a physical event that is external to the computer and its operating system. Examples of such events are blips in atmospheric noise, or points at which a radioactive material decays. A true random number generator receives information from these types of unpredictable events to produce a truly random number.

A random number is a number chosen from a limited or unlimited pool of numbers in a way that shows no discernible pattern for prediction. The numbers in the pool are almost always independent of each other. However, they may follow a specific distribution. For example, the heights of the students in a school tend to follow a normal distribution around the median height. If the height of a student is picked at random, the picked number is more likely to be close to the median height than to be classified as very tall or very short. The random number generators above assume that the numbers generated are independent of each other and evenly spread across the whole range of possible values.

A random number generator, like the ones above, is a device that can generate one or many random numbers within a defined scope. Random number generators can be hardware-based or pseudo-random number generators. Hardware-based random number generators can involve the use of dice, a coin for flipping, or many other devices.

A pseudo-random number generator is an algorithm for generating a sequence of numbers whose properties approximate the properties of sequences of random numbers. Computer based random number generators are almost always pseudo-random number generators. Yet, the numbers generated by pseudo-random number generators are not truly random. Likewise, our generators above are also pseudo-random number generators. The random numbers generated are sufficient for most applications yet they should not be used for cryptographic purposes. True random numbers are based on physical phenomena such as atmospheric noise, thermal noise, and other quantum phenomena. Methods that generate true random numbers also involve compensating for potential biases caused by the measurement process.

The Folly of Infinite Compression is one that crops up all too often in the world of Data Compression. The man-hours wasted on comp.compression arguing about magic compressors is now approaching that of the Manhattan Project.

How does it work? I took a well-known file of random digits created by the RAND group (no doubt at taxpayer expense), and converted that file into binary format, which squeezed out all the representational fluff.

This challenge has a special place for the variant of Magic Compressors known as Recursive Compressors. Some savants will claim that they have a compressor that can compress any file by a very small amount, say 1%. The beauty of this is that of course they can repeatedly use the output of one cycle as the input to another, compressing the file down to any size they wish.

For those people, a working program should be able to meet the challenge quite easily. If their compressed data file is a mere 512 bytes, that leaves 400K of space for a decompressor that can be called repeatedly until the output is complete.
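The reason no such program can exist is a counting argument, sketched here in Python. The function names are hypothetical; the arithmetic is the standard pigeonhole count of bit strings.

```python
def files_of_length(bits: int) -> int:
    """Number of distinct files that are exactly `bits` bits long."""
    return 2 ** bits


def files_shorter_than(bits: int) -> int:
    """Number of distinct files with fewer than `bits` bits (including empty)."""
    # 2^0 + 2^1 + ... + 2^(bits - 1) = 2^bits - 1
    return sum(2 ** k for k in range(bits))


n = 16
print(files_of_length(n), files_shorter_than(n))
```

There is always one fewer shorter file than there are files of length n, so a lossless compressor cannot map every input to a distinct shorter output, and applying it recursively cannot shrink every file.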

If you can tell us the entire story of what you want, we don't have to keep coming back with more questions. So far we know you want a 7 digit random number that must be unique. Do you have a table this belongs in, or is this a separate table of these values? Do you need to generate these one at a time (like for an insert) or do you need a big list of them? The answers will greatly affect how to go about this.

There are exactly 9,000,000 distinct integers from 1,000,000 to 9,999,999 inclusive. Depending on what these numbers are being generated for, you could start having collisions reasonably soon (thanks to our good friend the birthday paradox). If the only criteria are length and uniqueness, an identity column starting at 1000000 with a check constraint limiting it to less than 10,000,000 would constrain it accordingly.
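To make "reasonably soon" concrete, here is a small Python sketch that counts the inclusive 7-digit range and estimates how many uniform random draws give a 50% chance of at least one collision. It relies on the approximation n ≈ sqrt(2 d ln(1 / (1 − p))); the function name is hypothetical.

```python
import math


def draws_for_collision_probability(d: int, p: float = 0.5) -> int:
    """Approximate number of uniform draws from d values for collision chance p."""
    # Inverts p ≈ 1 - exp(-n^2 / (2d)) for n and rounds up.
    return math.ceil(math.sqrt(2 * d * math.log(1 / (1 - p))))


# Inclusive count of integers from 1,000,000 through 9,999,999.
seven_digit_count = 9_999_999 - 1_000_000 + 1

print(seven_digit_count)
print(draws_for_collision_probability(seven_digit_count))
```

The estimate comes out at a few thousand draws, which is why a sequential identity column is the safer choice when unpredictability is not a requirement.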

A new telephone survey experiment finds that, despite major structural differences, an opinion poll drawn from a commercial voter file can produce results similar to those from a sample based on random-digit-dialing (RDD). The study intentionally pushed the boundaries of current polling practices by employing a voter file and a registration-based sampling (RBS) approach as the basis of a full national sample. While voter files are widely used for election surveys at the state and local level, relatively few pollsters have employed them for national surveys. As a result, there are few settled best practices for how to draw national samples from voter files and how to handle missing phone numbers.

The study also tackles the question of how successful voter files are in representing Americans as a whole, including those who are not registered to vote. This research was possible because voter file vendors are increasingly trying to provide coverage of all U.S. adults, including those who are not registered to vote, by combining state voter rolls with other commercially available databases.

On the large majority of survey questions compared (56 of 65), RBS and RDD polls produced estimates that were statistically indistinguishable. Where the polls differed, the RBS results tilted somewhat more Democratic than the RDD results.

An analysis of survey participation among registered voters in the RBS sample found that any partisan differences between RDD and RBS surveys are unlikely to be the result of too many Democrats responding. In fact, the set of confirmed registered voters who participated in the RBS survey were somewhat more Republican than the national voter file as a whole in terms of their modeled partisanship (38% vs. 33%, respectively). The routine demographic weighting applied to the sample corrected most of this overrepresentation.

The major limitation of RBS for telephone polling is the absence of a phone number for wide swaths of the public. Unlike RDD samples, which are based on telephone numbers, RBS samples are based on lists of people who may or may not have an associated telephone number on the file. In the national voter file used in this study, a phone number was available for 60% of registered voter records and 54% of the nonregistered adult records. A key finding is that this low coverage rate did not translate into inferior estimates, relative to RDD. On 15 questions where benchmark data were available from government surveys, the RBS and RDD polls showed similar levels of accuracy on estimates for all U.S. adults and also in a companion analysis that examined five benchmark questions for registered voters. When the RBS and RDD estimates differed from the benchmarks, they both tended to overrepresent adults who are struggling financially. For example, the American Community Survey finds that about one-in-ten U.S. adults (10%) do not have health insurance, but this rate was 13% in the RDD survey and 14% in the RBS.
