The new stuff is at the bottom: https://github.com/pao/julia/blob/topic/strpack/extras/strpack.jl
Wow, Patrick, this is amazing. (Note to others: I understood the
show_struct_pads output better once I displayed Patrick's message with a fixed-
width font.)
I don't know much about packing, but I presume that packing is defined in the
source, right? So it's not the case that it will depend upon whatever compiler
flags were set when the library was compiled?
What happens if a structure has a structure member? Do you just concatenate
the strings?
Another thought is that I find struct specifiers completely unreadable. Even though I've used the pack and unpack functions in Perl and Ruby hundreds of times, I have to pull up the damned documentation *every* time to look up what the letter specifiers mean. Could we do something a little saner that matches Julia's type names better and is a bit more combinatorial instead? Something like this:
    i8     Int8
    i16    Int16
    i32    Int32
    i64    Int64
    u8     Uint8
    u16    Uint16
    u32    Uint32
    u64    Uint64
    f32    Float32
    f64    Float64
    b1     1-bit Bool
    b8     8-bit Bool
    b32    32-bit Bool
    c      Char (32-bit)
    s<n>   n-byte UTF-8 string
    s      NUL-terminated UTF-8 string
    _<n>   n-byte padding
For clarity, I would suggest that spaces and commas be allowed and completely ignored in struct specifications, letting the programmer organize things visually as they see fit.
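To make the "spaces and commas are ignored" idea concrete, here is a minimal Python sketch of a tokenizer for the numeric part of the proposed table. It is an illustration only, not code from strpack.jl: the names SPEC_TYPES and parse_spec are hypothetical, and it takes "completely ignored" literally by deleting separators before scanning.

```python
import re

# Numeric specifiers from the proposed table, mapped to Julia type names.
# Bool, Char, string and padding specifiers are left out of this sketch.
SPEC_TYPES = {
    "i8": "Int8", "i16": "Int16", "i32": "Int32", "i64": "Int64",
    "u8": "Uint8", "u16": "Uint16", "u32": "Uint32", "u64": "Uint64",
    "f32": "Float32", "f64": "Float64",
}

TOKEN = re.compile(r"[iuf]\d+")

def parse_spec(spec):
    """Return the Julia type names named by spec, ignoring spaces and commas."""
    compact = re.sub(r"[\s,]+", "", spec)  # assumption: separators carry no meaning
    fields, pos = [], 0
    while pos < len(compact):
        m = TOKEN.match(compact, pos)
        if m is None or m.group() not in SPEC_TYPES:
            raise ValueError(f"unrecognized specifier at offset {pos} in {compact!r}")
        fields.append(SPEC_TYPES[m.group()])
        pos = m.end()
    return fields
```

With this reading, "i32 f32", "i32, f32" and "i32f32" all describe the same two fields, so the separators are purely visual.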
Bools are an interesting case because they are represented by ints in C, which are (typically) 32-bits, but in C++ bool is an actual type and it's 8-bit. C structs also support packed bitfields, so there are lots of different cases. Maybe b1 could mean a single-bit boolean, while b8 would be a bool represented as 8-bits, while b32 would be a bool represented as a 32-bit int. Obviously, we could support 16-bit and 64-bit versions too, just for the sake of completeness.
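The three Bool widths can be illustrated with Python's struct module, which has an 8-bit bool code but no bitfields. This is only a sketch of what b8, b32 and b1 might mean on the wire. The helper pack_bits is hypothetical, and its bit order (first flag in the least significant bit) is an assumption, since C leaves bitfield layout to the implementation.

```python
import struct

flag = True

# b8: a bool stored in one byte, as with C++ bool.
as_8bit = struct.pack("<?", flag)

# b32: a bool stored in a 32-bit int, as is common in C (little-endian here).
as_32bit = struct.pack("<i", flag)

def pack_bits(flags):
    """b1: pack up to 8 one-bit bools into a single byte (LSB first, by assumption)."""
    if len(flags) > 8:
        raise ValueError("this sketch packs at most 8 flags into one byte")
    byte = 0
    for position, value in enumerate(flags):
        if value:
            byte |= 1 << position
    return bytes([byte])
```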
The numbers are a little mixed-up because some things use bit counts and others use byte counts, but giving a string length in bits is really weird. Doing everything in bytes makes sense, but there's a strong case to be made for b1, although maybe b without a number could mean a 1-bit Bool field in that case.
For endianness specifications, it would be nice to be able to just use the non-standard string literal suffix business to specify it for the whole thing:
    s"i32 f32"  => native
    s"i32 f32"b => big-endian
    s"i32 f32"l => little-endian
I'm not sure how useful it really is to be able to specify per-field endianness. It's hard to imagine actual use cases where that's a reasonable thing to do.
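For comparison, Python's struct module already works this way: a single byte-order character at the start of the format applies to every field. The sketch below packs the same i32/f32 pair three ways; it shows the Python analogue, not the proposed Julia suffix syntax.

```python
import struct

values = (1, 1.0)  # one 32-bit int, one 32-bit float

native = struct.pack("=if", *values)  # native byte order, standard sizes
big = struct.pack(">if", *values)     # big-endian for the whole format
little = struct.pack("<if", *values)  # little-endian for the whole format
```

Each result is 8 bytes, and the native one equals either the big-endian or the little-endian one depending on the machine.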
There's also a question of arrays and repetition. Consider the case where a struct contains an inline array of 12 doubles. Maybe that could be written like this:
    s"i32 f64[12]"
This struct would have two fields, the second of which is an array. Repetition is related but slightly different: you may want to avoid writing out 12 individual fields; we could possibly use ^ for this:
    s"i32 f64^12"
This struct would have 13 fields, rather than two.
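The difference between the [n] and ^n forms is easiest to see by counting fields. Here is a small Python sketch under the assumption that each field is a numeric specifier optionally followed by [n] or ^n. The function name expand_fields and the (type, array_length) tuple layout are made up for illustration.

```python
import re

FIELD = re.compile(r"([iuf]\d+)(?:\[(\d+)\]|\^(\d+))?")

def expand_fields(spec):
    """Return a list of (type, array_length) pairs; array_length is None for scalars."""
    fields = []
    for token in spec.replace(",", " ").split():
        m = FIELD.fullmatch(token)
        if m is None:
            raise ValueError(f"unrecognized field: {token!r}")
        base, array_len, repeat = m.groups()
        if array_len is not None:
            fields.append((base, int(array_len)))        # one field holding an array
        elif repeat is not None:
            fields.extend([(base, None)] * int(repeat))  # n separate scalar fields
        else:
            fields.append((base, None))
    return fields
```

Here expand_fields("i32 f64[12]") yields two entries, while expand_fields("i32 f64^12") yields thirteen.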
Yet another thought: what about named fields? Something like this might make sense:
    s"count: i32, values: f64[12]"
This may be starting to look a little too much like a mini programming language though.
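As a rough illustration of the named-field form, the following Python sketch parses a specification into (name, type, array_length) triples. It assumes commas separate fields in this variant, which differs from treating commas as ignorable. The names NAMED and parse_named are hypothetical.

```python
import re

# Optional "name:" prefix, a numeric specifier, and an optional [n] array length.
NAMED = re.compile(r"\s*(?:(\w+)\s*:\s*)?([iuf]\d+)(?:\[(\d+)\])?\s*")

def parse_named(spec):
    """Return (name, type, array_length) per field; name and length may be None."""
    fields = []
    for part in spec.split(","):
        m = NAMED.fullmatch(part)
        if m is None:
            raise ValueError(f"unrecognized field: {part!r}")
        name, base, length = m.groups()
        fields.append((name, base, int(length) if length else None))
    return fields
```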
Starting to look a lot like erlang's bit syntax.
Ok, sounds like you're all over this. I'm not sure how much of a concern porting struct code from Python really is, and Perl and Ruby already have a different and incompatible pack/unpack DSL, so I feel like a fresh start that's easier to understand is pretty defensible.
Regarding * versus ^ for repetition, Julia strings use * for concatenation and ^ for repetition already, which is more commensurate with the mathematical view of strings as a non-commutative monoid. Using + for concatenation is an algebraic horror because string concatenation is very non-commutative, while + is only used to denote a commutative operation in algebra. Using * for repetition is also weird because it's a little hard to remember if you should write str*n or n*str. With ^ it's completely obvious: the power has to be the count and the "base" has to be the string. This also allows using the power_by_squaring algorithm to do more efficient string repetition, although I created the RepString type instead (which Jeff really hates).
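The power-by-squaring point can be sketched in a few lines. This is a Python illustration, where + plays the role that * plays for Julia strings. It is not Julia's power_by_squaring implementation, and the function name repeat is made up here.

```python
def repeat(s, n):
    """Compute s "to the power" n (s repeated n times) with O(log n) concatenations."""
    if n < 0:
        raise ValueError("repetition count must be non-negative")
    result = ""  # the empty string is the identity of the string monoid
    base = s
    while n > 0:
        if n & 1:
            result += base  # include this power of s when the bit is set
        base += base        # square: s, s^2, s^4, ...
        n >>= 1
    return result
```

Because every factor is a power of the same string, the order of concatenation does not matter here, even though concatenation of two different strings does not commute ("ab" + "cd" differs from "cd" + "ab").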