[ANN] Comeonin password hashing library v0.7.0 -- many changes to bcrypt

David Whitlock

Apr 17, 2015, 8:54:20 PM

to elixir-l...@googlegroups.com

Hi all,
A new version of Comeonin, a password hashing library which supports bcrypt and pbkdf2_sha512, is now available here.
Under the hood, there have been many changes to the bcrypt implementation to make it more Erlang-VM-scheduler-friendly. Instead of calling one long-running NIF, it now calls certain NIFs multiple times (each time for a very short time).
The other main change is that crypto.strong_rand_bytes is now used by default for salt generation.
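As a rough illustration of the salt change, the sketch below draws random bytes from :crypto.strong_rand_bytes. The module name SaltSketch, the function name gen_salt and the 16-byte default (the usual bcrypt salt size) are assumptions for illustration, not Comeonin's actual code.

```elixir
defmodule SaltSketch do
  # Hypothetical helper, not part of Comeonin's API.
  # Returns `len` cryptographically strong random bytes as a binary.
  def gen_salt(len \\ 16) when is_integer(len) and len > 0 do
    :crypto.strong_rand_bytes(len)
  end
end

salt = SaltSketch.gen_salt()
IO.inspect(byte_size(salt), label: "salt bytes")
```

The result is a raw binary, so it would still need to be encoded before being embedded in a hash string.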
If you have any questions or feedback, please let us know.
David Whitlock

José Valim

Apr 18, 2015, 3:31:44 AM

to elixir-l...@googlegroups.com

Fantastic work David. I love the docs and the new improvements related to the Erlang VM. Btw, have you had the chance to play with dirty schedulers in Erlang 18? (just being curious).

I have one recommendation to make though. Today you use the String module in many operations in the bcrypt module and I believe you do not want to use it. The String module is about utf-8 and codepoints and it seems you are always working with bytes. For example, take this line:

https://github.com/elixircnx/comeonin/blob/master/lib/comeonin/bcrypt.ex#L74

If, by any chance, you have two bytes representing the letter "é" in the salt, you will split on the 30th byte instead of the 29th one. It seems you encode the passwords using base64, so that is unlikely to happen, but I believe using the functions in the :binary module makes much more sense conceptually and would be better practice here. The same goes for the conversions (list_to_binary, String.to_char_list and so on).
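A small sketch of the off-by-one described above. The salt value is made up for illustration (real salts are base64-style ASCII), and the split position 29 is taken from the message rather than from the linked source line.

```elixir
# 28 ASCII bytes, then "é" (U+00E9, two bytes in UTF-8), then two more ASCII bytes.
salt = String.duplicate("a", 28) <> "\u00E9" <> "bc"

# String.split_at/2 counts graphemes, so the first 29 "characters" span 30 bytes.
{left, _rest} = String.split_at(salt, 29)
IO.inspect(byte_size(left), label: "String.split_at bytes")

# binary_part/3 counts bytes, so this always yields exactly 29 bytes.
head = binary_part(salt, 0, 29)
IO.inspect(byte_size(head), label: "binary_part bytes")
```

With byte-oriented functions the split point does not depend on what the bytes happen to encode.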

Have a good weekend!

José Valim

www.plataformatec.com.br

Skype: jv.ptec

Founder and Lead Developer

--
You received this message because you are subscribed to the Google Groups "elixir-lang-talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-lang-ta...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-lang-talk/51605f52-d55c-404a-9ef7-d792f237a309%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

David Whitlock

Apr 18, 2015, 6:42:21 AM

to elixir-l...@googlegroups.com, jose....@plataformatec.com.br

Hi, thanks for all the feedback. I'm glad you like the docs, as I see them as an important part of the whole package (especially for security-related software). I've heard about dirty schedulers, but I haven't played with them yet. We did try using the enif_schedule_nif function, but it made the C code a lot more complex, so I opted to write as much in Elixir as possible and keep the C code to a minimum.
About using the String module, dealing with the different types is a bit tricky, especially when dealing with lengths. In the case you mention, I think it needs to be a string or a charlist, and it does get converted to a charlist just before the first C function is called. It might be an idea to convert the password and salt to charlists earlier (maybe at the entry of the main hashing function rather than halfway through), and that's something I'll look at later.
If you have any other questions, just let me know,
David

José Valim

Apr 18, 2015, 6:56:13 AM

to elixir-l...@googlegroups.com

Even if you are working with codepoints, the String functions could still be wrong because they work on graphemes above all else (I was not completely precise, sorry). For example, "e" followed by "^" is two codepoints, but when calculating the length they count as a single grapheme, "ê". So I would definitely avoid the String functions, even if using base64, because they are less efficient.
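To make the grapheme/codepoint/byte distinction concrete, here is a minimal sketch using "e" plus a combining circumflex (U+0302). The string is just an example value.

```elixir
# "e" followed by U+0302 (combining circumflex accent), rendered as one glyph.
s = "e\u0302"

graphemes = String.length(s)
codepoints = length(String.codepoints(s))
bytes = byte_size(s)

IO.inspect({graphemes, codepoints, bytes}, label: "graphemes, codepoints, bytes")
# → graphemes, codepoints, bytes: {1, 2, 3}
```

Three different answers for the "length" of the same value is the reason to pick one view (bytes, here) and stay with it.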

The choice between binaries and lists supposedly does not matter if you have bytes or everything in base64 encoding (ascii-only). It is a matter of choosing which API makes the code look better and doing straight list <-> binary conversions (and not string <-> char list ones).

José Valim

David Whitlock

Apr 19, 2015, 12:59:39 AM

to elixir-l...@googlegroups.com, jose....@plataformatec.com.br

At the moment, I'm working on an Erlang version of the bcrypt implementation, and that's making it easier to see how to avoid using Strings. I'm aiming to use mainly charlists and do a String -> char_list conversion as close to the entry of the module as possible, and then convert it to a string/binary on returning. I need to avoid using binary -> list conversion of the password because that will cause problems with passwords that contain non-ascii characters.

James Fish

Apr 19, 2015, 6:54:26 AM

to elixir-l...@googlegroups.com, José Valim

Why would using binary to list cause problems with non-ascii characters? The only problem it would cause is that old hashes would no longer work, because the handling of non-ascii characters would be different. However, bcrypt works on bytes, not unicode codepoints, so it is very strange to require a utf-8 encoded string rather than a binary.

Looking at your NIF for bcrypt it will raise a badarg error if the codepoints of the char list are not in the latin1 range which is 0 to 255, or equivalent to a list of bytes. This is a contradiction and means that passwords containing codepoints greater than or equal to 256 can not be hashed. This can be solved by using binary to list instead of converting to a char list. However this will mean hashes are incompatible with those from prior versions, as mentioned above.
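The difference between the two conversions can be sketched as follows. The password is an arbitrary example, the byte-range check is a stand-in for the NIF's validation (which is not shown in this thread), and String.to_charlist is the current name of the function written as String.to_char_list elsewhere in this discussion.

```elixir
# The euro sign, U+20AC, is well outside the latin1 range.
password = "\u20AC"

codepoints = String.to_charlist(password)
bytes = :binary.bin_to_list(password)

IO.inspect(codepoints, label: "charlist", charlists: :as_lists)
IO.inspect(bytes, label: "byte list", charlists: :as_lists)

# Hypothetical check standing in for "every element fits in one byte".
in_byte_range? = fn list -> Enum.all?(list, &(&1 in 0..255)) end

IO.inspect(in_byte_range?.(codepoints), label: "charlist fits in bytes")
IO.inspect(in_byte_range?.(bytes), label: "byte list fits in bytes")
```

A byte list always passes such a check, whatever the password contains; a codepoint list passes only for latin1 input.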

I do not understand your comment about writing an Erlang version to avoid using String because an Erlang string is equivalent to an Elixir char list. Note that the same issue is going to occur there for codepoints greater than or equal to 256.

David Whitlock

Apr 19, 2015, 8:53:54 PM

to elixir-l...@googlegroups.com, jose....@plataformatec.com.br

First of all, thanks for pointing out the bug with characters not in the latin1 range. That's something I need to test and fix soon.
Actually, when I said non-ascii characters last time, I should have said characters from the extended ascii set. I was having problems using binary to list because it was giving me the wrong key length. Maybe this shell output can explain it better:
iex(2)> String.length "£"
1
iex(3)> byte_size "£"
2
iex(4)> cl = String.to_char_list "£"
[163]
iex(5)> length cl
1
iex(6)> bl = :binary.bin_to_list "£"
[194, 163]
iex(7)> length bl
2
When it gets to the NIF, this character will be interpreted as one character, and if I send it a key_len of 3 (including the NULL character), it will not work.
My point about the erlang version is just that it was useful to clear things up in my own mind on how to approach this. I am well aware that an Erlang string is equivalent to an Elixir char list.
I've started a new branch, which I think is a lot cleaner (the string is converted to a charlist as soon as it enters the module) and maybe explains what I'm trying to do better.

José Valim

Apr 20, 2015, 3:14:08 AM

to elixir-l...@googlegroups.com

> When it gets to the NIF, this character will be interpreted as one character, and if I send it a key_len of 3 (including the NULL character), it will not work.

That is the case only if you calculate the length in one way and convert it in another way. If you use both:

iex(6)> bl = :binary.bin_to_list "£"
[194, 163]
iex(7)> length bl
2

You definitely have a list of 2 entries and the length is 2 (without null bytes). The reason I am confident binary to list should not be an issue as long as you use the proper functions throughout is that [194, 163] is also a valid list of codepoints/characters and the layer below has no mechanism for knowing what you had originally, so both should work as long as you are consistent.

Overall, that's the benefit of working with bytes: you have the luxury of not caring at all which encoding is given to you, and you get better performance. And, even if you call the Base functions on them, you can still assume you are working only with bytes, because ascii maps to the proper subset.
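One way to stay consistent is to derive the key and its length from the same byte list, roughly as sketched here. The module and function names are hypothetical, and appending a 0 terminator only mirrors the "key_len of 3 (including the NULL character)" description quoted above; the real NIF interface is not shown in this thread.

```elixir
defmodule KeySketch do
  # Hypothetical helper, not Comeonin's code.
  # Builds a NUL-terminated byte list and takes the length from that same list,
  # so the two values cannot disagree.
  def key_and_len(password) when is_binary(password) do
    key = :binary.bin_to_list(password) ++ [0]
    {key, length(key)}
  end
end

# The pound sign, U+00A3, is two bytes in UTF-8.
{key, key_len} = KeySketch.key_and_len("\u00A3")
IO.inspect({key, key_len}, label: "key, key_len", charlists: :as_lists)
```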

José Valim

Apr 20, 2015, 3:27:31 AM

to elixir-l...@googlegroups.com

Ok, please ignore my previous comment. :)

I have tried to make the changes myself, and it fails the OpenBSD tests. It seems that, in order to be compatible with other libraries, you need to count characters and not bytes. So even though working with bytes would be better, you have other constraints that do not allow you to do so.

José Valim

José Valim

Apr 20, 2015, 5:40:09 AM

to elixir-l...@googlegroups.com

Ok, James pointed out on IRC that the tests are actually wrong. \xA4 in a string is going to add the codepoint represented by \xA4, therefore taking two bytes. I have sent a pull request that fixes the test and uses bin_to_list everywhere and everything still passes:

https://github.com/elixircnx/comeonin/pull/15

This breaks all hashes currently stored if the password contains non-ascii bits; however, I believe it is the correct thing to do.

PS: in retrospect, I realize it is confusing, compared to other languages, that \x represents a codepoint instead of bytes. It makes sense in Elixir (strings are about utf-8) but it can be confusing on first look.
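The byte-versus-codepoint distinction can be spelled out without depending on how a given Elixir version reads \x in a string literal. This sketch writes the raw byte as a binary and the codepoint with a \u escape; both values are examples, not taken from the test suite.

```elixir
# The single raw byte 0xA4, written as a binary rather than a string escape.
one_byte = <<0xA4>>

# The codepoint U+00A4, which UTF-8 encodes as two bytes.
two_bytes = "\u00A4"

IO.inspect(byte_size(one_byte), label: "raw byte size")
IO.inspect(:binary.bin_to_list(two_bytes), label: "codepoint bytes", charlists: :as_lists)

# A lone 0xA4 byte is not valid UTF-8, so it is a binary but not a String.
IO.inspect(String.valid?(one_byte), label: "raw byte is valid UTF-8")
```

A test vector defined in terms of bytes therefore needs the first form, which is the kind of mismatch described above.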

José Valim

David Whitlock

Apr 20, 2015, 4:58:44 PM

to elixir-l...@googlegroups.com, jose....@plataformatec.com.br

Thanks to you and James for your feedback and help. The program's a lot better now, and I've learned a lot in the process.
About passwords with non-ascii characters, as James pointed out, there would be an error before, but now they work.
