Password With Unicode Character

Boleslao Drinker

Aug 4, 2024, 1:53:45 PM

to posthukage

Passwords are usually meant to be typable on any generic keyboard, so they are typically generated using the commonly available characters. But for passwords which will only be kept digitally, would it be a good idea to maximize the guessing time by using the entire Unicode range of characters? Or are there reasons to believe some or most Unicode-supporting sites/apps could still limit their allowed character range?

This is a good idea from a security perspective. A password containing Unicode characters would be harder to brute-force than an ASCII-only password of the same length. This holds up even if you compare byte length instead of character length, because Unicode encodings use the most significant bit of each byte, whereas ASCII does not.

However, I think it wouldn't be practical since Unicode bugs are so common. I think if you use Unicode passwords everywhere you will encounter more than a couple of sites where you would have problems logging in, because the developers didn't correctly implement Unicode support for passwords.

This sounds a lot like Fencepost Security. Imagine you're running a facility that has chain-link fencing around it that is 500 feet high. How much would the security improve by making that fencing 3,000 feet high? None - because anyone trying to get in isn't going to climb the 500 feet; they're going to dig underneath, cut a hole, etc.

Likewise, you've got a password that's, say, 20 random alphanumeric characters. That's 62^20 possibilities. You're considering changing it to 20 random unicode characters. Which raises the possibility space much higher, except brute-forcing a 20-character randomized password isn't how things are going to get compromised.
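To put rough numbers on that comparison, here is a minimal Python sketch. The helper name `entropy_bits` is made up for illustration, and the figure of 130,000 usable Unicode characters is only a ballpark assumption; the real count depends on which code points a generator actually draws from.

```python
import math

def entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy in bits of `length` symbols chosen uniformly at random."""
    return length * math.log2(alphabet_size)

# 20 random alphanumeric characters (a-z, A-Z, 0-9): 62^20 possibilities.
alnum_bits = entropy_bits(62, 20)

# 20 random Unicode characters, ASSUMING about 130,000 usable characters.
unicode_bits = entropy_bits(130_000, 20)

print(f"alphanumeric: {alnum_bits:.1f} bits")
print(f"unicode:      {unicode_bits:.1f} bits")
```

Both figures are far beyond anything a brute-force attack is expected to reach, which is the point of the fence analogy: the extra height buys nothing in practice.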

Ignoring the Security by Obscurity argument, it is a basic question of entropy. An 8-character Unicode password is more secure than an 8-character ASCII password but less secure than a 64-character ASCII password.

In general I agree with Sjoerd: these are likely to cause more inconvenience than benefit. On top of this, if you ever need to enter a password manually, random Unicode is likely to make your life miserable.

However for the edge case where you need to use a service which actively supports unicode whilst enforcing a maximum password length limit (again ignoring this usually indicates other security failings) there is an argument for it.

The only valid reason I can think of for using Unicode characters in passwords is if the number of characters (not bytes) in a password for a particular site is limited (like this dumb bank that previously had a max of 10 characters), so that it would be easily guessed in a day or two. In this case, you can use Unicode (if the site owners let you) to get more entropy into your password in the meantime while you ask the site owners to comply with NIST 800-63-3 and remove length restrictions (and hash properly so password storage is not a concern).

While true for normal ASCII, the extended ASCII that some password managers (e.g. KeePass) can use in password generation uses every single bit of each byte, thus having a higher entropy density than even Unicode, whose encodings (such as UTF-8) still carry some structure to indicate how many of the following bytes are part of the same character. (Note that there is such a thing as an invalid byte sequence in a Unicode encoding.)
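The density argument can be checked with a short Python sketch. It counts how many code points UTF-8 can express at each encoded length and divides the resulting bits by the bytes used. The counts include unassigned code points and control characters, so these are upper bounds rather than what a real password generator would achieve; the names `UTF8_CODE_POINTS` and `density` are just for illustration.

```python
import math

# Number of Unicode scalar values that UTF-8 encodes in 1, 2, 3 and 4 bytes.
# The 3-byte range excludes the 2,048 surrogate code points, which are not
# valid in UTF-8.
UTF8_CODE_POINTS = {
    1: 0x80,                      # U+0000..U+007F
    2: 0x800 - 0x80,              # U+0080..U+07FF
    3: 0x10000 - 0x800 - 0x800,   # U+0800..U+FFFF minus surrogates
    4: 0x110000 - 0x10000,        # U+10000..U+10FFFF
}

# Upper bound on entropy per byte for a character of each encoded length.
density = {n: math.log2(count) / n for n, count in UTF8_CODE_POINTS.items()}

RAW_BYTE_DENSITY = 8.0  # a uniformly random byte carries 8 bits

for n, bits_per_byte in density.items():
    print(f"{n}-byte characters: at most {bits_per_byte:.2f} bits per byte")
```

Every class stays below the 8 bits per byte of unstructured random bytes, because some bits of each UTF-8 byte are spent marking where characters start and continue.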

Since a site that limits you to short passwords probably doesn't even hash their passwords properly (causing Unicode passwords to fail or be stored with the wrong encoding), you should almost never waste the time to bother with Unicode passwords because while you are so distracted with your funny characters (and the fact that you have to reset your password because it was stored strangely), an attacker could be guessing your (in)-security questions or using chocolate cryptography to gain access to your account.

Confidentiality of passwords is achieved through the principles of entropy: how 'unguessable' is your password? This is commonly measured by the size of the brute force guessing space, expressed in terms of powers of 2 or bits. A brute force attacker has only so much capacity to guess; by selecting a longer password to increase this entropy you can exceed any known or predicted capability to guess. Getting the entropy over 80 bits (or pick your value) will put the password out of reach of even nation state actors. Regardless of the overly simplified description above, the point is that going above and beyond whatever "out of reach" is doesn't significantly add to your security. And it isn't relevant to security if you achieve the desired entropy by using 10 Unicode characters or 17 ASCII characters.

Availability means "can I get to my data when I need it?" If you use full Unicode character sets, you risk running afoul of various sites that don't support Unicode, or browsers or OSes that implement Unicode incorrectly, or sites that invisibly translate Unicode to ASCII under the covers. The resulting confusion increases the risk of restricting your future access to the data. This represents a potential decrease in future Availability.
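As an illustration of the "invisible translation" risk, here is a small Python sketch. The function `lossy_to_ascii` is a hypothetical stand-in for a backend that forces text into ASCII; it is not taken from any particular site or library, and real failures may look different.

```python
def lossy_to_ascii(password: str) -> bytes:
    """Hypothetical backend step: force text into ASCII, replacing
    every character that does not fit with '?'."""
    return password.encode("ascii", errors="replace")

first = "пароль"          # six Cyrillic letters
second = "密码密码密码"      # six CJK characters

print(lossy_to_ascii(first))   # b'??????'
print(lossy_to_ascii(second))  # b'??????'
```

Two completely different passwords collapse to the same six question marks, so the stored secret has almost no entropy left and the user has no way of noticing from the outside.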

In general, the likelihood of an attacker brute forcing your 80 bit password is not nearly as high as the likelihood of encountering a poorly coded site that doesn't handle Unicode properly. Therefore your overall security could be decreased instead of increased.

Of course, many sites have password length and other restrictions that dramatically limit the entropy of your passwords, too. In those cases, using the full Unicode set may increase the entropy of your passwords, assuming they don't have other hidden flaws. So on those sites, you may be improving your security; but it's virtually impossible to tell from outside if a site is properly handling your password data.

This might not be a direct answer, but if the password is only kept digitally, then you should ask yourself why you are generating a password at all, instead of a byte-array. Once you look at the whole thing as simply bytes, the question doesn't apply anymore.
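A minimal sketch of that idea in Python, using the standard `secrets` module. The 16-byte length (128 bits) is an arbitrary choice for the example, and the base64 step is only one possible way to make the bytes fit into a text field.

```python
import base64
import secrets

# 16 bytes from the OS's cryptographically secure random source = 128 bits.
raw_secret = secrets.token_bytes(16)

# If the secret has to travel through a text field, encode the bytes with a
# plain ASCII alphabet instead of interpreting them as characters.
text_form = base64.urlsafe_b64encode(raw_secret).decode("ascii")

print(len(raw_secret), "random bytes ->", len(text_form), "ASCII characters")
```

The entropy lives in the bytes; the text form is just a transport encoding, so the question of which character set to use never comes up.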

This document describes updated methods for handling Unicode strings representing usernames and passwords. The previous approach was known as SASLprep (RFC 4013) and was based on Stringprep (RFC 3454). The methods specified in this document provide a more sustainable approach to the handling of internationalized usernames and passwords.

The other side of the problem is how difficult it is to remember the password, and to type the password. Imagine you try to type your password on an iPhone and you realise you can't (I haven't checked how hard it is to type arbitrary Unicode characters). Or you realise that it is very, very difficult to type it 100% correctly. Or it just takes you ages - I might have a password with the same entropy, and more characters, but twice as fast to type. And you need four attempts to get it right, while mine is right the first time.

Others have amply remarked on the risk that the services you use those passwords for will not implement Unicode correctly. I'll add that services that today do might tomorrow cease to do so, but otherwise I'll skip that topic.

Well, Unicode is 'just' a list of somewhat over 130,000 characters. UTF-8 is the most common encoding; it takes that one big number and 'converts' it to a base-256 number (or, more precisely, to binary octets, where the rules make more sense) according to a set of rules. Thus if you want to use a UTF-8 encoding, you'd be bound by a lot of rules, effectively decreasing the randomness you might desire. And I don't know how you might use the whole Unicode range.
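A quick Python sketch of what those rules mean in practice: characters take between one and four bytes, and many byte sequences are simply not valid UTF-8. The helper `is_valid_utf8` is a made-up name for this example.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Encoded length depends on the code point.
for ch in ("a", "é", "€", "𝄞"):
    print(repr(ch), "->", len(ch.encode("utf-8")), "byte(s)")

# Arbitrary bytes are often rejected by the encoding rules.
print(is_valid_utf8(b"\xc3\xa9"))  # True: the two bytes of 'é'
print(is_valid_utf8(b"\xc3"))      # False: lead byte without continuation
print(is_valid_utf8(b"\xff"))      # False: 0xFF never appears in UTF-8
```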

If you are not concerned about printable characters, you might consider the whole ASCII (or more preferably some 8 bit extension), but at that point, why even bother with character interpretation standard at all? Couldn't you simply use some simple formless random binary structure then?

No. You want to increase the base of an exponential function while risking a lot of things breaking (e.g. devices which cannot type your special characters). Calculate the entropy of your Unicode password and make your ASCII password longer until it has more entropy.

a-z alone gives 26^(length) possibilities. Let's say you get 256^(length) and possibly 2 bytes per character with Unicode. Then you can find the break-even point where 26^(ascii_length) > 256^(2*unicode_length). Choose this as your ascii_length and you can still write down your password and have the same security.
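The break-even point can be found by brute arithmetic. This Python sketch follows the same simplification as above (each Unicode character treated as 2 fully random bytes, which is generous to Unicode); the function name `break_even_length` and its parameters are invented for the example.

```python
def break_even_length(unicode_length: int,
                      ascii_alphabet: int = 26,
                      unicode_space_per_char: int = 256 ** 2) -> int:
    """Smallest ASCII length whose search space exceeds that of
    `unicode_length` characters, ASSUMING 2 random bytes per character."""
    target = unicode_space_per_char ** unicode_length
    length = 0
    # Python integers are arbitrary precision, so this comparison is exact.
    while ascii_alphabet ** length <= target:
        length += 1
    return length

print(break_even_length(8))                     # a-z only
print(break_even_length(8, ascii_alphabet=95))  # all printable ASCII
```

Under these assumptions an 8-character Unicode password is matched by 28 lowercase letters, or by 20 characters drawn from the 95 printable ASCII characters.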

If the site does not support long passwords (shame on them), I would suspect they cannot guarantee good Unicode support either. Maybe you will be locked out the next time they upgrade some internal library. So why risk a problem there? Especially one that is hard to explain to user support staff who hardly know what Unicode means.

It is not a good idea to generate random Unicode passwords, because the generated password may be unreadable to the user. But the text of the question talks about using Unicode passwords, and this is a good idea and recommended by NIST 800-63-3 section 5.1.1.2 Memorized Secret Verifiers which says:

Verifiers SHALL require subscriber-chosen memorized secrets to be at least 8 characters in length. Verifiers SHOULD permit subscriber-chosen memorized secrets at least 64 characters in length. All printing ASCII [RFC 20] characters as well as the space character SHOULD be acceptable in memorized secrets. Unicode [ISO/ISC 10646] characters SHOULD be accepted as well.

If Unicode characters are accepted in memorized secrets, the verifier SHOULD apply the Normalization Process for Stabilized Strings using either the NFKC or NFKD normalization defined in Section 12.1 of Unicode Standard Annex 15.
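Why the normalization step matters can be seen with Python's standard `unicodedata` module. The helper `prepare_password` is a hypothetical name for this sketch, not something defined by NIST; a real verifier would pass the resulting bytes on to a proper password hashing function.

```python
import unicodedata

def prepare_password(password: str) -> bytes:
    """Normalize to NFKC, then encode as UTF-8 (hypothetical helper)."""
    return unicodedata.normalize("NFKC", password).encode("utf-8")

# The same visible word typed on two different devices:
composed = "caf\u00e9"      # 'é' as a single code point
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent

raw_match = composed.encode("utf-8") == decomposed.encode("utf-8")
normalized_match = prepare_password(composed) == prepare_password(decomposed)

print(raw_match)         # False: the raw bytes differ
print(normalized_match)  # True: identical after NFKC
```

Without normalization, the two inputs would hash differently and the user would be locked out even though the password looks identical on screen.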

If you have the right codepage active (UTF-8), you can do this in theory (at least in a graphical terminal), but it's really not recommended. The problem is that you can end up trying to enter your password where the codepage doesn't contain the character that you're trying to input, which is a little bit of a problem.
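The codepage problem can be sketched in Python by checking whether a password can be represented at all in a given encoding. The helper `can_encode` and the sample password are made up for the example; the codec names are standard Python ones.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of `text` exists in `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

password = "pass€word"

for codec in ("utf-8", "cp1252", "latin-1", "ascii"):
    print(f"{codec:8} {can_encode(password, codec)}")
```

The euro sign exists in UTF-8 and Windows-1252 but not in Latin-1 or ASCII, so the same password can be typed on one system and is impossible to enter on another.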

It's better to follow good password creation rules (at least 12 characters, mix of uppercase, lowercase, numbers and symbols, and random) and stick with ASCII-printable characters: the extra characters provided by Unicode probably don't outweigh the issues you can face when you can't enter your password because the system doesn't allow half your password to be input!