Hello Steve,
You have raised a good point here.
One more reason to prefer binaries is reduced memory consumption and IPC overhead.
On the other hand, a list lets you represent one code point per element.
Iolists are also very handy for dynamically composing complex strings.
I am afraid this is an application-specific question. However, I tend to use binaries for strings.
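
To make the three representations concrete, here is a minimal sketch. The module name string_repr and the sample text are only illustrative assumptions; the calls used are the standard unicode module and BIFs.

```erlang
-module(string_repr).
-export([demo/0]).

%% Hold the text "héllo" as a list, as a binary, and inside an iolist.
demo() ->
    %% List: one integer code point per element (233 is "é").
    List = [104, 233, 108, 108, 111],
    %% Binary: UTF-8 encoded bytes; "é" takes two bytes here.
    Bin = unicode:characters_to_binary(List),
    %% Iolist: nested lists and binaries, flattened once at the end.
    Wrapped = [<<"<p>">>, Bin, <<"</p>">>],
    Flat = iolist_to_binary(Wrapped),
    [{code_points, length(List)},
     {utf8_bytes, byte_size(Bin)},
     {round_trip, unicode:characters_to_list(Bin) =:= List},
     {flat, Flat}].
```

The list keeps code points directly addressable, the binary is compact, and the iolist defers the cost of concatenation until the final iolist_to_binary/1 call.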
----- Original Message -----
> From: Masklinn <mask...@masklinn.net>
...
>
> I don't think the former and the latter match. Erlang/OTP can be nice at
> string processing where "string" is understood as "sequence of bytes",
> but it remains rather ungood at *text* processing: *as far as I know*,
> aside from encoding and decoding UTFs it has very limited support for
> it[0]: no support (note: by "support" I mean "support built into the
> core distribution", it's always possible to call into ICU) for
> UnicodeData queries (codepoint meta-information), Unicode case folding,
> grapheme cluster handling, the important text-processing annexes (UAX 14
> "line breaking algorithm", UAX 15 "normalization forms", UAX 29 "text
> segmentation") or standards (UTS 10 "collation algorithm" and UTS 18
> "regular expressions", as well as, for other parts of the system but
> also part of Unicode itself, UTS 35 "LDML" and its data-formatting and
> data-parsing components), …
Good point. A strong Erlang Unicode library implementing the above would be very nice.
(I'm not a great fan of drivers myself.)
Best regards to Arcturus,
Thomas
Hi, I'm not sure I understand your 20,000 files example.
Are you suggesting that the user should limit the number of Erlang processes to the number of cores, or are you suggesting that the VM compresses the Erlang process data when a process is not running?
That would be really nice if it could be done with only a small performance penalty, say 10-20%.
In my case I start 50,000 to 100,000 processes, one per file (it's a map-reduce-like application that does feature extraction for some machine learning algorithms).
One Erlang node uses about 7 GB of memory. I can probably tune it a bit (a lot?) more by using binaries, but it would be nice to have an option to compress process data when not running, for people who are lazy, are not Erlang experts, or do not have much time (my case), or just as an indication of how much memory could be saved by using binaries.
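
For a rough indication of how much memory binaries could save, something like the sketch below may help. It is only an estimate under stated assumptions: the module name mem_estimate is made up for illustration, the text is plain ASCII, and erts_debug:flat_size/1 is a debugging helper that reports the size of a term in machine words, not an official measurement API.

```erlang
-module(mem_estimate).
-export([estimate/1]).

%% Approximate size in bytes of the same N-character ASCII text
%% held as a list and as a binary.
estimate(N) when is_integer(N), N >= 0 ->
    List = lists:duplicate(N, $a),
    Bin = list_to_binary(List),
    WordSize = erlang:system_info(wordsize),
    %% Each list element is a cons cell of two words.
    ListBytes = erts_debug:flat_size(List) * WordSize,
    %% Payload only; the small per-binary header is not counted.
    BinBytes = byte_size(Bin),
    {ListBytes, BinBytes}.
```

On a 64-bit VM this works out to about 16 bytes per character for the list against 1 byte per character for the binary, so per-file data kept as lists in many processes adds up quickly.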