I want to calculate the strength of the password. But my question is
related to the entropy of the characters. So I have the program that
calculates the frequencies of the symbols, single character, bigrams,
word starting and ending chars.
I want to calculate the entropy of the given password based on these
character probabilities.
I know that the entropy is defined as:
H(X) = - Sum_i [ P(x_i) * log P(x_i) ]
for a random variable X with n outcomes { x_i : i = 1, ..., n }.
If I want to calculate the entropy of a single character, how do I
use this formula? Can I use it to calculate the entropy of the
password "password" as:
H(password) = - (P(p) * log P(p) + P(a) * log P(a) + ... + P(d) * log P(d))
so that this is equivalent to:
H(password) = H(p) + H(a) + ... + H(d)
That would mean I can create a table that stores the entropy of each
character, and whenever the user enters a password to measure its
strength I would just look in the table and sum the entropies of the
characters in the given password.
Is that the right way to calculate the entropy of a password?
But actually if I consider the conditional probabilities it gets more
complicated with the formulas:
H(password) = H(p) + H(a|p) + H(s|pa) + ... + H(d|passwor)
where H(y|x) means the conditional entropy of y where x is given.
So which formula should I use to calculate the entropy?
I hope I am clear enough.
Thanks a lot in advance.
ASLI
None. The question has no unique or even well-defined answer.
That formula is only useful if the probabilities are independent. The
entropy is also relative to the "search procedure". It may be that
the searcher has a special love for the phrase "ajkl&T)(Salkelkap7uy70 ",
in which case the entropy of that phrase would be very low for him.
For another searcher it might not be. A common Russian word might have high
entropy for an English speaker, but clearly not for a Russian.
I.e., your question is ill defined. Given a search method, you could make
an estimate of the "entropy". E.g., for an exhaustive search that runs
through the alphabet in order (all strings of fewer than 15 letters, starting
with "a", then "b", etc.), the word "zoo" would have a very high entropy,
while if the search tried all 1-letter strings, then 2-letter, then 3-letter,
etc., it would be low.
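To make the independence caveat concrete, here is a minimal sketch of the two sums from the question. Both add up -log2 P for each character of one string: the first treats the characters as independent, the second conditions each character on the one before it (a bigram model, which cuts the full chain H(s|pa) down to one character of context). The probability tables are toy numbers invented for illustration, and the function names are hypothetical; real tables would have to come from a frequency-counting program. Either sum is the information content of that one string under the assumed model, not an entropy that exists independently of the model.

```javascript
// Toy probabilities, invented for illustration only (powers of 1/2 keep the sums tidy).
const charProb = { p: 1 / 32, a: 1 / 8, s: 1 / 16 };
// condProb[prev][next] = P(next | prev), also invented.
const condProb = { p: { a: 1 / 4 }, a: { s: 1 / 8 }, s: { s: 1 / 16 } };

// Sum of -log2 P(c) over the characters, assuming they are independent.
function independentBits(password, probs) {
  let bits = 0;
  for (const c of password) {
    const p = probs[c];
    if (p === undefined) throw new Error("no probability for " + c);
    bits += -Math.log2(p);
  }
  return bits;
}

// Same idea, but every character after the first is conditioned on its predecessor.
function bigramBits(password, first, cond) {
  const chars = Array.from(password);
  if (chars.length === 0) return 0;
  let bits = independentBits(chars[0], first);
  for (let i = 1; i < chars.length; i++) {
    const row = cond[chars[i - 1]];
    const p = row ? row[chars[i]] : undefined;
    if (p === undefined) throw new Error("no probability for " + chars[i - 1] + chars[i]);
    bits += -Math.log2(p);
  }
  return bits;
}

console.log(independentBits("pass", charProb).toFixed(2));      // "16.00"
console.log(bigramBits("pass", charProb, condProb).toFixed(2)); // "14.00"
```

With these made-up tables the conditional model assigns the string fewer bits, because "a" after "p" is more predictable than "a" on its own. A different model, or a different searcher, would give a different number.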
>Hello all,
>
>I want to calculate the strength of the password. But my question is
>related to the entropy of the characters.
You might find it easier to pick the strength that you require and
then generate a password/passphrase with that amount of entropy.
I would suggest Diceware:
http://world.std.com/~reinhold/diceware.html
as one possibility.
rossum
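As a rough sketch of what "pick the strength first" means: if every word is drawn uniformly and independently from a fixed list, the entropy is simply the number of words times log2 of the list size. The Diceware list has 6^5 = 7776 words (five dice rolls per word). The function names below are hypothetical.

```javascript
// Bits of entropy when each of `wordCount` words is chosen uniformly and
// independently from a list of `listSize` words.
function passphraseBits(wordCount, listSize) {
  return wordCount * Math.log2(listSize);
}

// Smallest number of words that gives at least `targetBits` of entropy.
function wordsNeeded(targetBits, listSize) {
  return Math.ceil(targetBits / Math.log2(listSize));
}

const DICEWARE_LIST_SIZE = 6 ** 5; // five dice rolls per word: 7776 words

console.log(passphraseBits(5, DICEWARE_LIST_SIZE).toFixed(1)); // "64.6"
console.log(wordsNeeded(80, DICEWARE_LIST_SIZE));              // 7
```

This only holds when the words really are picked at random (for example with dice). It says nothing about a phrase the user made up.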
Thanks a lot for your reply. That is the reason why everything gets
complicated. If you check the below link, there exists a strength
checker. The important part for me is the area that shows the entropy.
http://www.certainkey.com/demos/password/
I really wonder how they calculate it. The code is:
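function calcEntropy(pswd){
    // Count how often each character code occurs in the password.
    var ai=new Array();
    for(var i=0;i<pswd.length;i++){
        var c=pswd.charCodeAt(i);
        if(ai[c]==undefined)
            ai[c]=0;
        ai[c]++;
    }
    // Sum d*log(1/d) over the relative frequencies d of the characters seen.
    entropy=0;
    for(var i=0;i<ai.length;i++){
        if(ai[i]!=undefined &&ai[i]!=0){
            var d=ai[i]/ pswd.length;
            entropy+=d * Math.log(1.0 / d);
        }
    }
    // Convert from natural-log units to bits.
    entropy /=Math.log(2);
    // Truncate the result to two decimal places, one digit at a time.
    var p=entropy,v=0;
    var ret="";
    p-=v=Math.floor(p);
    p *=10;
    ret+=v+".";
    p-=v=Math.floor(p);
    p *=10;
    ret+=v;
    p-=v=Math.floor(p);
    p *=10;
    ret+=v;
    return ret;
}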
Thanks a lot "rossum". I will check Diceware and comment as soon as
possible.
Greets,
ASLI
Or you could try wgen (www.theory.physics.ubc.ca/wgen/wgen.c), a crypto
password generator that generates "English"-type words (or words in whatever
language you choose; i.e., they seem to follow the pronunciation style of
English), together with an entropy estimate. It uses a dictionary to derive the
trigram and quadrigram frequencies of the letters in the words, and then
randomly generates strings of letters with the same frequencies, together
with an estimate of the probability of getting that particular string of
letters if one generated those lists many, many times.
By default it uses /usr/share/dict/words on Linux.
Any large English word list would do.
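The general idea can be sketched as follows. This is not wgen's actual code, only an illustration of the approach described above, simplified to trigrams: count which letter follows each two-letter context in a word list, generate a string by sampling from those counts, and add up -log2 of each sampled probability as the estimate for that string. All names are hypothetical, the word list is assumed to contain plain alphabetic words, and the random source is passed in so that a cryptographic one (for instance something built on crypto.getRandomValues) can be substituted.

```javascript
// For every two-character context, count how often each next character follows.
// "^" pads the start of a word and "$" marks its end.
function buildTrigramCounts(words) {
  const counts = new Map();
  for (const word of words) {
    const padded = "^^" + word.toLowerCase() + "$";
    for (let i = 2; i < padded.length; i++) {
      const context = padded.slice(i - 2, i);
      if (!counts.has(context)) counts.set(context, new Map());
      const row = counts.get(context);
      row.set(padded[i], (row.get(padded[i]) || 0) + 1);
    }
  }
  return counts;
}

// Generate one word; return it with -log2 of its probability under the model.
// `random` must return a number in [0, 1).
function generateWord(counts, random) {
  if (!counts.has("^^")) throw new Error("empty model");
  let context = "^^";
  let word = "";
  let bits = 0;
  for (;;) {
    const row = counts.get(context);
    let total = 0;
    for (const n of row.values()) total += n;
    // Pick a next character with probability proportional to its count.
    let pick = random() * total;
    let chosen, chosenCount;
    for (const [ch, n] of row) {
      chosen = ch;
      chosenCount = n;
      pick -= n;
      if (pick < 0) break;
    }
    bits += -Math.log2(chosenCount / total);
    if (chosen === "$") return { word, bits };
    word += chosen;
    context = context[1] + chosen;
  }
}

// Demo only: a tiny word list and Math.random, which is NOT suitable for real passwords.
const model = buildTrigramCounts(["password", "passport", "pastor", "sport"]);
console.log(generateWord(model, Math.random));
```

The bit count belongs to the generating procedure and the particular string it produced. Fed a user-chosen password, the same arithmetic would only say how likely this model is to produce it.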
As unruh noted, entropy in this sense ("Shannon entropy") is a
property of a probability distribution. It does not make sense to
talk about the entropy of a single, fixed value (except to state that
it is zero, which is technically true, if trivial).
When we speak of "the entropy of a password", that's really shorthand
for the entropy of the probability distribution according to which the
password was randomly chosen. That shorthand makes little sense for
user-chosen passwords, since we generally cannot know the distribution
according to which a given user chooses their passwords.
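A small sketch of that point, with a hypothetical helper that takes the distribution itself as input, since that is what the entropy is a property of:

```javascript
// Shannon entropy, in bits, of a probability distribution given as an array
// of probabilities that sum to 1. Zero-probability outcomes contribute nothing.
function shannonEntropyBits(probs) {
  let h = 0;
  for (const p of probs) {
    if (p > 0) h -= p * Math.log2(p);
  }
  return h;
}

// A single, fixed value: no uncertainty at all.
console.log(shannonEntropyBits([1])); // 0
// Four equally likely passwords.
console.log(shannonEntropyBits([0.25, 0.25, 0.25, 0.25])); // 2
// The same four passwords when one of them is chosen 97% of the time:
// the result is well below 2 bits.
console.log(shannonEntropyBits([0.97, 0.01, 0.01, 0.01]).toFixed(2));
```

The last two cases involve the same set of passwords and differ only in how they are chosen, which is exactly the information we lack for user-chosen passwords.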
[snip]
> Thanks a lot for your reply. That is the reason why everything gets
> complicated. If you check the below link, there exists a strength
> checker. The important part for me is the area that shows the entropy.
>
> http://www.certainkey.com/demos/password/
>
> I really wonder how they calculate it. The code is:
>
> function calcEntropy(pswd){
>     var ai=new Array();
>     for(var i=0;i<pswd.length;i++){
>         var c=pswd.charCodeAt(i);
>         if(ai[c]==undefined)
>             ai[c]=0;
>         ai[c]++;
>     }
>     entropy=0;
>     for(var i=0;i<ai.length;i++){
>         if(ai[i]!=undefined &&ai[i]!=0){
>             var d=ai[i]/ pswd.length;
>             entropy+=d * Math.log(1.0 / d);
>         }
>     }
>     entropy /=Math.log(2);
What this code calculates, if I'm reading it correctly, is the entropy
of picking a single random character from the password. (The rest,
which I snipped, just seems to truncate the result to two decimal places,
Rube Goldberg style. It could all be replaced with a simple "return
entropy.toFixed(2);" statement.)
Anyway, I wouldn't consider this method at all useful as an indicator
of password strength. For example, it returns the same value for both
"abcdefghijklmnopqrstuvwxyz" and "poskvlqbtacynmxwfgirdjuhze", even
though the latter is obviously a stronger password.
--
Ilmari Karonen
To reply by e-mail, please replace ".invalid" with ".net" in address.
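The comparison of the two 26-letter passwords is easy to reproduce. The sketch below is a compact rewrite of the same calculation (the entropy of drawing one character at random from the string), not the site's code; the function name is hypothetical, and it iterates over code points rather than UTF-16 units, which makes no difference for ASCII input.

```javascript
// Entropy, in bits, of picking one character uniformly at random from `s`.
function charFrequencyEntropy(s) {
  const counts = new Map();
  for (const c of s) counts.set(c, (counts.get(c) || 0) + 1);
  const length = Array.from(s).length;
  let h = 0;
  for (const n of counts.values()) {
    const d = n / length; // relative frequency of this character
    h += d * Math.log2(1 / d);
  }
  return h;
}

const sorted = "abcdefghijklmnopqrstuvwxyz";
const shuffled = "poskvlqbtacynmxwfgirdjuhze";
console.log(charFrequencyEntropy(sorted).toFixed(2));     // "4.70"
console.log(charFrequencyEntropy(shuffled).toFixed(2));   // "4.70"
console.log(charFrequencyEntropy("aaaaaaaa").toFixed(2)); // "0.00"
```

The measure sees only how many times each character occurs, so any reordering of the same characters gets the same score, and the length of the password does not enter at all beyond the frequencies.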