I have a similar problem parsing some fixed-length files (TAP files
used in the telecoms industry). I found binary pattern matching to
be vastly faster than the other options, but as you say, there are
many limits on what you can match.
So I have lots of functions named "parse", one of which will match
and nibble something from the input (technically every record is 160
octets in my problem, but we don't make use of that).
Example function:
https://gist.github.com/ewildgoose/9793fff12e4092e75383
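To give a feel for the shape without opening the gist, here is a
minimal sketch. It is not the code from the gist: the module name,
the "10"/"20" record tags and the field widths are all made up for
illustration. Each "parse" clause matches one record type at the
front of the binary and returns the parsed value plus the remainder.

```elixir
defmodule RecordSketch do
  # Hypothetical layout: a 2-octet type tag followed by a fixed-width field.
  # Each clause nibbles one record off the front of the input.
  def parse(<<"10", name::binary-size(4), rest::binary>>) do
    {{:header, name}, rest}
  end

  def parse(<<"20", count::binary-size(3), rest::binary>>) do
    {{:detail, String.to_integer(count)}, rest}
  end

  # Nothing left to consume.
  def parse(<<>>), do: :done

  # Keep calling parse/1 on the remainder until the input is exhausted.
  def parse_all(input, acc \\ []) do
    case parse(input) do
      :done -> Enum.reverse(acc)
      {record, rest} -> parse_all(rest, [record | acc])
    end
  end
end
```

Calling RecordSketch.parse_all("10ABCD20007") walks the input one
record at a time, with the function head doing all the selection.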
The reason for the repetitive formulation is that I want to rewrite
this using a macro to generate the code from a specification, so
something more like how Ecto defines a table format. The macros
aren't done, but things I observed in the research process:
- It seems to be the same speed to use nested pattern matches as
flat patterns
- So for example you could use a macro to generate the repetitive
    <<j::utf8, o::utf8, s::utf8, e::utf8>>
  piece to match on
- It seems feasible to then compose the whole match using further
  macros:
    << <<j::utf8, o::utf8, s::utf8, e::utf8>>,
       .... >>
- So in your case you could take the pain away from matching long
  utf8 strings if you use a macro to compose the function heads? (You
  would end up with a lot of extra variables, which you would then
  glue back together to get the desired string.)
I guess it would be nice to raise a feature request upstream to be
able to do something more like:
<< name::utf8-size(20) >>
Is this likely to be accepted?
Ed W