    @regex_bind a_string ["skip.to.regex" stuff_between [integer "[0-9]"] grab_what_is_after]
        {stuff_between, integer, grab_what_is_after}
Currently I assert `.head == :hcat` on the pattern. I chose `hcat` because I prefer spaces over commas in this case.
The biggest place where this is incomplete: if the string doesn't match, it just stops and does nothing!
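To make the expansion concrete, here is a minimal sketch of what a macro like `@regex_bind` could generate in present-day Julia. It is not the original implementation. The rules that a bare name captures lazily (greedily when it is the last element), that `[name "regex"]` captures what that regex matches, and that the body is a third argument are assumptions; since `{...}` is no longer valid syntax, the usage returns a tuple instead. On a mismatch it still just returns `nothing`.

```julia
# Hypothetical sketch of @regex_bind. Elements of the space-separated [ ... ]:
#   "literal"       -> regex fragment that must match but is not captured
#   name            -> captures anything (lazily; greedily if it is last)
#   [name "regex"]  -> captures whatever `regex` matches into `name`
macro regex_bind(str, pattern, body)
    (pattern isa Expr && pattern.head == :hcat) ||
        error("pattern must be a space-separated [ ... ] (hcat)")
    src = ""
    vars = Symbol[]
    n = length(pattern.args)
    for (i, el) in enumerate(pattern.args)
        if el isa String
            src *= el
        elseif el isa Symbol
            push!(vars, el)
            src *= i == n ? "(.*)" : "(.*?)"
        elseif el isa Expr && el.head == :hcat && length(el.args) == 2 &&
               el.args[1] isa Symbol && el.args[2] isa String
            push!(vars, el.args[1])
            src *= "(" * el.args[2] * ")"
        else
            error("unsupported pattern element: $el")
        end
    end
    re = Regex(src)                  # compiled once, at macro-expansion time
    m = gensym("m")
    # Assign each capture to the caller's variable of the same name.
    assigns = [:($(esc(v)) = $m.captures[$i]) for (i, v) in enumerate(vars)]
    quote
        $m = match($re, $(esc(str)))
        if $m === nothing
            nothing                  # mismatch: still just "does nothing"
        else
            $(assigns...)
            $(esc(body))
        end
    end
end

# Usage: the body sees the bound variables.
triple = @regex_bind("skip to regex abc 5 rest",
                     ["skip.to.regex" stuff_between [integer "[0-9]"] grab_what_is_after],
                     (stuff_between, integer, grab_what_is_after))
```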
Of course, more things could be defined. Instead of `[integer "[0-9]"]`, perhaps allow `integer::Int64` to tell the macro how to parse the capture. That would also allow negative values in this case, which could be a caveat, because without unsigned types we can't specify otherwise. Alternatively, simply write `[integer :integer]` or `[integer integer]`. Maybe `[expr expr]` to fetch an expression. Now that I think of it, `[output reference_function_that_eats_string]` could be very extensible.
Otherwise, one could define a lot of other things, for instance `{}`, `[,]` or even 'function calls' with different meanings, but that would likely make too much of a mess.
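The `[output reference_function_that_eats_string]` idea can be tried at run time without a macro. In this sketch, `match_and_convert` and `eat_int` are hypothetical names: each capture group is handed to its own string-eating function, so an `Int64` parser also accepts the minus sign that `"[0-9]"` would miss.

```julia
# Hypothetical helper: one string-eating function per capture group.
function match_and_convert(re::Regex, str::AbstractString, eaters...)
    m = match(re, str)
    m === nothing && return nothing
    length(m.captures) == length(eaters) ||
        throw(ArgumentError("need exactly one eater per capture group"))
    return Tuple(eat(c) for (eat, c) in zip(eaters, m.captures))
end

# An Int64 eater; unlike "[0-9]" alone, parse also handles a leading minus sign.
eat_int(s) = parse(Int64, s)

# `String` acts as the eater that keeps the capture as plain text.
pair = match_and_convert(r"x=(-?[0-9]+) name=(\w+)", "x=-42 name=foo", eat_int, String)
```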
Other names, like `regex_var` or `var_filling_regex`, are probably better (maybe even just `regex`, though `Regex` is already taken).
One alternative to this whole thing is using a stream and having `string_upto_regex`, `next_regex_match`, `skip_to_regex` and `upto, match = upto_and_match_next_regex`. But that potentially scatters what is being matched. It also requires typing the function names all the time (something a macro could somewhat alleviate).
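A sketch of the stream-style alternative, using a cursor over a `String` rather than a real stream. `RegexCursor` is hypothetical, and the function names come from the list above with assumed semantics: each call consumes input up to the end of the next match. It relies only on Base's `match(re, str, start)` and the `offset`/`match` fields of `RegexMatch`.

```julia
# Hypothetical cursor; the function names follow the list above.
mutable struct RegexCursor
    str::String
    pos::Int                     # index of the next unread code unit
end
RegexCursor(s::AbstractString) = RegexCursor(String(s), 1)

# Text before the next match of `re`, plus the match; advances past the match.
# Returns `nothing` (cursor unchanged) if there is no further match. Note that a
# regex matching the empty string does not advance the cursor.
function upto_and_match_next_regex(c::RegexCursor, re::Regex)
    m = match(re, c.str, c.pos)
    m === nothing && return nothing
    upto = c.str[c.pos:prevind(c.str, m.offset)]
    c.pos = m.offset + ncodeunits(m.match)
    return upto, m
end

function next_regex_match(c::RegexCursor, re::Regex)
    r = upto_and_match_next_regex(c, re)
    return r === nothing ? nothing : r[2]
end

function string_upto_regex(c::RegexCursor, re::Regex)
    r = upto_and_match_next_regex(c, re)
    return r === nothing ? nothing : r[1]
end
```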
Talking about streams, is there a string stream already? Like http://linux.die.net/man/3/fmemopen, or the Common Lisp one. What about macros like `with-open-file` (opening and closing a file for you) and `with-string-stream` (the same for a string)? Maybe I'll try to make a bit of a Common Lisp/Julia rosetta stone, both as a reference and as inspiration for Julia (macro) (standard) libraries.
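For comparison, present-day Julia's Base covers these: `IOBuffer` is an in-memory string stream (the `fmemopen` analogue), `sprint` plays roughly the role of Common Lisp's `with-output-to-string`, and the `do`-block form of `open` closes the file for you, like `with-open-file`.

```julia
# IOBuffer: an in-memory stream, readable like a file.
io = IOBuffer("line one\nline two\n")
first_line = readline(io)        # reads up to the newline (dropped by default)
rest = read(io, String)          # everything that is left

# sprint collects whatever is written to its stream argument into a String.
greeting = sprint(s -> print(s, "x = ", 42))

# The do-block form of open closes the file afterwards.
path = tempname()
open(path, "w") do f
    write(f, "hello\n")
end
contents = read(path, String)
rm(path)
```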
At one point a while back, I was talking about how the problem with regular expression matching is that it has to both be able to capture all the matching metadata and bind it to variables with names *and* be able to indicate when there is no match. This leads to the return type of the match function being `Union(Nothing,RegexMatch)`, which is not horrible, but kind of unfortunate. Every usage ends up looking like this too:

    m = match(r"b(a)(r)", str)
    if m != nothing
        x, y = m.captures
        # do something with the match contents
    else
        # handle not matching
    end

    match(r"b(a)(r)", str) do x,y
        # do something with the match contents
    end

    match(r"b(a)(r)", str) do x,y
        # do something with the match contents
    else
        # handle not matching
    end

    match((x,y)->begin
        # do something with the match contents
    end, ()->begin
        # handle not matching
    end, r"b(a)(r)", str)

Ok, sorry, that was sort of a tangent, but this reminded me of it... Can you talk a little more about your motivations for this destructuring regex approach?
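The `match(...) do x,y ... end` form can be approximated today with an ordinary function, because Julia's `do` syntax passes the block as the function's first argument. `match_do` is a hypothetical name (chosen to avoid adding methods to `Base.match`); since `do` has no `else` part, the fallback is passed explicitly, as in the two-function form above.

```julia
# Hypothetical: `f` gets the captures as separate arguments; `otherwise` runs on no match.
function match_do(f::Function, otherwise::Function, re::Regex, str::AbstractString)
    m = match(re, str)
    return m === nothing ? otherwise() : f(m.captures...)
end

# With a single function, a mismatch simply yields `nothing`.
match_do(f::Function, re::Regex, str::AbstractString) =
    match_do(f, () -> nothing, re, str)

# do-block syntax fills in the first argument, f:
joined = match_do(r"b(a)(r)", "foobar") do x, y
    x * y
end

# The explicit two-function form:
fallback = match_do((x, y) -> x * y, () -> "no match", r"b(a)(r)", "quux")
```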
The origin of the idea is basically Common Lisp's `destructuring-bind`. I used the `regex` library to make `regex-list`, which makes a list of matches when given a list of regular expressions and a string:
https://github.com/o-jasper/j-basic/blob/master/src/regex-sequence.lisp
Logically, a `destructuring-regex` followed. Note that the route via the list is (in principle) less efficient than putting the matches directly into variables, as I did in the Julia macro. (I might update the Lisp version.)
It is Lispy simply because it is a direct 'translation' of the concept.
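For reference, a hypothetical Julia analogue of `regex-list`. The exact semantics of the Lisp version are assumed here: each regex is searched for starting where the previous match ended, and `nothing` is returned as soon as one fails. Destructuring its result shows the intermediate list that direct assignment in the macro avoids.

```julia
# Hypothetical analogue of regex-list (semantics assumed, see above).
function regex_list(regexes::AbstractVector{Regex}, str::String)
    pos = 1
    found = String[]
    for re in regexes
        m = match(re, str, pos)
        m === nothing && return nothing
        push!(found, String(m.match))
        pos = m.offset + ncodeunits(m.match)   # continue after this match
    end
    return found
end

# "Destructuring" through the list: one extra allocation compared with
# assigning the captures to variables directly.
name, num = regex_list([r"[a-z]+", r"[0-9]+"], "-- abc 123")
```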
About syntax: personally I don't feel that syntax is really needed, but I think others might want it, and it might be needed to attract programmers. Anyway, if the syntax maps neatly onto s-expressions, it doesn't matter much, which seems to be true for Julia as far as I can see so far.
I have made an initial list of CL functions for 'inspiration', or for learning from its mistakes:
https://github.com/JuliaLang/julia/wiki/common-lisp-rosetta-for-the-devs
To be honest, I don't have any Ruby or Perl experience, and I only very briefly tried Haskell. Features of those languages might be a better option than a macro; for instance, some of the macros on that wiki page might be done better with object destructors.
I gave the paragraphs titles so it is slightly less of a wall of text.
== Syntax of regular expressions with variables ==
Since the pattern is handled at macro-expansion time, the string would not be something calculated at run time. Hence the string-interpolation notation `"$a$b"` would not be available for it. Maybe we can use it to indicate the variables to match instead. Instead of
`["skip.to.regex" stuff_between [integer "[0-9]"] grab_what_is_after]`
do
`"skip.to.regex$(stuff_between)$(integer::"[0-9]")$grab_what_is_after"`
or some such. `$(i::Unsigned)` could be defined as `$(i::"[0-9]")`; one could go as far as allowing every type to have its own regex. (I am a poor example-maker.)
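The 'interpolation in reverse' notation can be prototyped because, in present-day Julia, the parser turns an interpolated string into an `Expr(:string, ...)` whose pieces a macro receives unevaluated. `template_to_regex` is a hypothetical helper: literal pieces become regex text, `$name` becomes a capture (greedy only when it is last), and `$(name::"regex")` captures what that regex matches.

```julia
# Hypothetical helper a macro could call at expansion time on its string argument.
function template_to_regex(ex::Expr)
    ex.head == :string || throw(ArgumentError("expected an interpolated string"))
    src = ""
    vars = Symbol[]
    n = length(ex.args)
    for (i, a) in enumerate(ex.args)
        if a isa String                                  # literal text: regex fragment
            src *= a
        elseif a isa Symbol                              # $name
            push!(vars, a)
            src *= i == n ? "(.*)" : "(.*?)"
        elseif a isa Expr && a.head == Symbol("::") && length(a.args) == 2 &&
               a.args[1] isa Symbol && a.args[2] isa String   # $(name::"regex")
            push!(vars, a.args[1])
            src *= "(" * a.args[2] * ")"
        else
            throw(ArgumentError("unsupported piece: $a"))
        end
    end
    return Regex(src), vars
end

# Quoting keeps the interpolations unevaluated, as a macro would see them.
ex = :("skip.to.regex$(stuff_between)$(integer::"[0-9]")$grab_what_is_after")
re, vars = template_to_regex(ex)
```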
== Mismatch condition ==
On one hand, things may get messy when you have to check whether something is `nothing`; on the other hand, I feel we might be putting too many features into the macro if we try to deal with it there.
The thing with the `if ... then .. end` syntax is that `then` and `end` take two lines, which sometimes makes it feel a bit clunky. `.. ? .. : ..` helps in this respect, though. (Is it identical to `if`? I don't think it is in C.)
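In Julia, both `if ... else ... end` and `a ? b : c` are expressions that produce a value (unlike C, where `if` is a statement), so a match-or-fallback can stay short either way. `bar_captures` and `bar_captures_if` are just illustrative names.

```julia
# The ternary form: the whole conditional is the returned value.
function bar_captures(s::AbstractString)
    m = match(r"b(a)(r)", s)
    return m === nothing ? "no match" : join(m.captures)
end

# The same choice with `if`, kept on one line using semicolons.
function bar_captures_if(s::AbstractString)
    m = match(r"b(a)(r)", s)
    return if m === nothing; "no match"; else join(m.captures); end
end
```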
(Does Julia have any kind of `case` or `switch` yet?) Next to the 'basic' regex macro we could make a `regex_case`, each case having a regex-with-variables as input and a body; the body is executed (with the variables available, of course!) only if the regex matches.
The user might also be interested in partial matches, though; I'm not sure how to allow for that. Maybe:

    regex_case input_stream_or_string
      case matcher
        ...  # Complete match
      case matcher
        ...  # Complete match (optional)
      case 2 # `2` is not a valid matcher; it means 'at least two
        ...  # variables of the previous one matched'.
      else   # Optional, of course (maybe `default`).
    end

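Until there is syntax for it, the case-by-case idea can be approximated with an ordinary function. This `regex_case` is a hypothetical sketch: cases are `regex => handler` pairs tried in order, the first match passes its captures to its handler, and `default` runs otherwise. The partial-match `case 2` idea is not covered.

```julia
# Hypothetical function-based stand-in for the proposed regex_case.
function regex_case(str::AbstractString, cases::Pair{Regex}...; default = () -> nothing)
    for (re, handler) in cases
        m = match(re, str)
        m === nothing || return handler(m.captures...)   # first match wins
    end
    return default()
end

classify(s) = regex_case(s,
    r"^(\d+)$"       => (n -> "integer $n"),
    r"^(\w+)=(\w+)$" => ((k, v) -> "$k is set to $v");
    default = () -> "unknown")
```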
Now I notice that this would be similar to continuing with `else match(..) do ...`. Anyway, there might also be a 'plain' version:

    regex input_stream_or_string matcher
      ...body...
    end

I guess Ruby blocks aren't quite the same as the macros; I'll try to look at Ruby/Perl a bit more. I do see that you could make a function:

    matcher_function(regex, if_match::Function, if_not_matched::Function)

and have `if_match` catch the arguments from the regex somehow, while `if_not_matched` can just use `matcher` again. Not sure if I am describing what Ruby does here, though.
Both `matcher_function` and `match(..) do` require the thing being matched against to be repeated, though. We could have both `matcher_function` (or `match(.., str) do ..`) and a macro mopping it up.
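One possible shape for `matcher_function` (an assumption, not necessarily what Ruby does): return a function of the string, call `if_match` with the captures on success, and otherwise hand the string to `if_not_matched`, which can itself be another matcher. Chained this way, the string is supplied only once, which is the repetition a macro would otherwise have to mop up.

```julia
# Hypothetical curried variant: the string is passed to the returned function.
function matcher_function(regex::Regex, if_match::Function, if_not_matched::Function)
    return function (str::AbstractString)
        m = match(regex, str)
        return m === nothing ? if_not_matched(str) : if_match(m.captures...)
    end
end

# Each fallback is the next matcher; the last one handles total failure.
parse_setting = matcher_function(r"^(\w+)=(\d+)$", (k, v) -> (k, parse(Int, v)),
    matcher_function(r"^(\w+)$", k -> (k, true),
        s -> error("cannot parse setting: $s")))
```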
How do we optimize, or keep the route to optimization open, though? For instance, if two cases start with the same piece of regular expression, and that piece didn't match before, it won't match later either. What the macro could do is build a 'tree', with each node branching where the bits of regular expression differ. That probably requires a good understanding of how the regex engine works, and a bit of care: if it doesn't take that into account and chops up the regular expressions too finely, it might actually slow things down.
== Macros first-classy enough? ==
To be honest, I don't really like the `@macro` notation. If macros cannot simply name themselves and close with `end` without a `begin`, they are more of a second-class citizen, and the above couldn't be written as a standard library. Though I guess it could just be altered to start with `@regex` or `@regex_case`, with a slightly superfluous `begin` in there; the body of the `begin .. end` would then behave a bit 'strangely', because we'd be looking for the variable `case` in there.
== Notes ==
Numbered (preferably local) variables are ugly, but I think they might also be convenient. Maybe we can make them and put them in a 'quick and dirty' module/package/namespace (for instance for interactive REPL use). They might also be useful for making functions passed into `map`, `anyp` or anything else taking a function as an argument. (Left/right) currying and composition can help there too, but anything more than one level of them starts getting harder to read.
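For comparison, present-day Julia covers some of this without numbered variables: anonymous functions, partially applied comparison operators such as `==(3)` (a `Base.Fix2`), and composition with `∘`. Julia's counterpart of `anyp` is `any`.

```julia
v = [1, 2, 3, 4]

squares = map(x -> x^2, v)          # anonymous function in place of a numbered variable
has_three = any(==(3), v)           # ==(3) is a right-curried comparison (Base.Fix2)
under_ten = filter(<(10), squares)  # <(10) behaves like x -> x < 10
inc_then_square = (x -> x^2) ∘ (x -> x + 1)   # composition: the right function runs first
```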
Looks like I was wrong about the `"$.."` notation being like a formatting function: quoting it, I see a `macrocall`. Still, I don't think non-constant regular expressions belong in destructuring-regex, so using 'that notation in reverse' is still an idea.