Propose a modification to the parser to support @ as an 'infix' operator for macros

Michael Francis

Jan 30, 2015, 3:54:31 PM
to juli...@googlegroups.com
I'm working on a project which uses Julia as an advanced DSL for manipulating financial data. 

The proposal is to apply the string on the right of the @ operator to the macro identified on the left. The element to the right of the @ operator would be parsed as an un-escaped string up to the first white space. 

For example 

foo@myvalue is equivalent to @foo "myvalue"

View this as equivalent to the non-standard abstract string literal form foo"myvalue".
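
For readers less familiar with the literal form, here is a minimal sketch of how it is wired up in Julia. The macro name `foo_str` and its return value are made up for illustration; the point is only that writing foo"..." invokes a macro named `foo_str` with the raw text between the quotes.

```julia
# Hypothetical macro for illustration only; `foo` is not a real Julia macro.
# Defining `foo_str` is what makes the literal form foo"..." available.
macro foo_str(s)
    # `s` is the raw text between the quotes, available at macro-expansion time.
    return string("foo got: ", s)
end

println(foo"myvalue")     # same as writing: @foo_str "myvalue"
println(foo"C:\temp")     # the backslash arrives unprocessed, unlike in "C:\temp"
```

The proposed foo@myvalue form would hand the macro the same kind of raw string, only without the closing delimiter.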

The motivator for this is to support a terse natural DSL within Julia,

EQUITY@MSFT # Fetch a specific equity by ticker

Additional examples might be

BOND@T,4.75,2041  # Fetch a specific bond

BOND@T,*,2020-2030 # Fetch all treasury bonds maturing in the given year range

I'm happy to have a look at the parser, though my Scheme is not as good as my lex/yacc.


Michael 


Milan Bouchet-Valat

Jan 30, 2015, 4:23:45 PM
to juli...@googlegroups.com
I'm not sure I understand the advantages of the syntax you propose.
@BOND T,4.75,2041 or BOND"T,4.75,2041" look as readable as
BOND@T,4.75,2041 to me -- or maybe even more readable in a Julian
context.


Regards

Stefan Karpinski

Jan 30, 2015, 5:38:23 PM
to juli...@googlegroups.com
Have to say I agree with Milan.

Michael Francis

Jan 30, 2015, 5:47:03 PM
to juli...@googlegroups.com
I agree there is no intrinsic difference between the following expressions 

BOND"<string>" and BOND@<string>

However, @BOND <expr> will be interpreted in a variety of ways by the parser, some of which will not be valid Julia expressions, and recombining the AST into the caller's intent becomes a complex task. Consider a CUSIP such as 012810QN1: Julia will treat this as 12810 * QN1. @BOND T,4.75,2041 passes a three-element tuple to the macro, and in the range case the macro is passed a Julia expression including a '-' node, and so on.
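
To make the parsing concrete, here is a small sketch that asks the parser what it sees for each of the three inputs mentioned above. It uses `Meta.parse`, which is the current spelling (in 2015-era Julia the function was plain `parse`); the variable names are only for illustration.

```julia
# Meta.parse is the current spelling; 2015-era Julia exposed this as `parse`.
cusip = Meta.parse("012810QN1")     # numeric literal juxtaposed with a name
bond  = Meta.parse("T,4.75,2041")   # comma-separated values
years = Meta.parse("2020-2030")     # a subtraction, not a range of years

# Print the head and arguments of each resulting expression.
for ex in (cusip, bond, years)
    println(ex.head, " => ", ex.args)
end
```

In each case the macro would receive an already-structured expression rather than the text the user typed, which is why the original string cannot be reliably recovered.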

I personally find FOO"<string>" harder to read, and it limits the ability of Julia to be used where you want a terse DSL for a group of end users who tend to run from syntax. Hence the motivation. 

Jason Merrill

Jan 30, 2015, 6:21:27 PM
to juli...@googlegroups.com
Jeff proposed a very similar thing for special number literals on julia-users a while back:

https://groups.google.com/d/msg/julia-users/wzlALj2LGus/GRSjwEDOT0YJ

Jeff's proposal was that x@digits could be similar to x"digits" without the need for closing quotes, but extending it to x@nonwhitespace being equivalent to x"nonwhitespace" makes a lot of sense to me.

Stefan Karpinski

Jan 30, 2015, 7:09:58 PM
to juli...@googlegroups.com
I guess this is possible but I have some reservations. It's weird to have just random stuff allowed after the @ up to whitespace; feels kind of off somehow. I hate to have multiple syntaxes for basically the same thing. The custom string syntax is already a pretty close alternative to a macro invocation with a string argument. The best thing about it is that the string isn't escaped, allowing nice writing of regexes, for example. This syntax seems soooo close to the custom string syntax but with even less difference. Being able to use it for custom number syntaxes is a little compelling but it doesn't seem like enough.

Jason Merrill

Jan 30, 2015, 7:17:45 PM
to juli...@googlegroups.com
I'm happy with the existing non-standard string literals, but tastes vary, and I can recognize the argument that having to close the quotes, especially in the case of very short literals, makes the syntax heavier than it absolutely must be.

Michael Francis

Jan 30, 2015, 10:51:00 PM
to juli...@googlegroups.com
Consider an application which connects via http. I can imagine us using

server = URL@http://www.google.com
data = fetch(server, "page1235")

The URL macro returns a type which encapsulates the connection to the given web server. It takes the hit of connection once at compile time.

I'd also draw a parallel between this and numeric formats (as others have done), for example

0x123abc123

Is implicitly the same as

@hex "123abc123"

(To be clear I am not proposing that hex numbers work this way. )

But we do treat a restricted set of non-numeric values after 0x as a number. We then promote them.


Stefan Karpinski

Feb 3, 2015, 6:52:53 PM
to Julia Dev
I've been thinking about this and I think this idea has some problems. Consider this, for example:

log(BOND@T,4.75,2041)

How would this be parsed? The whitespace rule would dictate that the close parens is part of what gets passed to the BOND macro, which is almost certainly not what the user intended.
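
A rough way to see this is to model the "up to the first whitespace" rule with a regular expression. This is only a sketch of the proposed rule, not anything the Julia parser does; `split_at_whitespace` is a hypothetical helper.

```julia
# Hypothetical helper, not part of Julia: models the proposed rule that the
# payload after NAME@ runs up to the first whitespace character.
function split_at_whitespace(src::AbstractString)
    m = match(r"([A-Za-z_]\w*)@(\S+)", src)
    m === nothing && return nothing
    return (String(m[1]), String(m[2]))   # (macro name, raw payload)
end

name, payload = split_at_whitespace("log(BOND@T,4.75,2041)")
println(name)      # BOND
println(payload)   # T,4.75,2041)   <- the closing paren is swallowed
```

With the paren consumed by the payload, the call to log is left unclosed.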

I keep coming back to this: if you really need a DSL for users who aren't programmers, then you probably want a custom parser and a language that is simpler than a Turing-complete programming language.

Michael Francis

Feb 5, 2015, 8:34:12 AM
to juli...@googlegroups.com
That example is certainly one where it would generate surprise and perhaps we shouldn't do it. 

Michael Francis

Feb 5, 2015, 9:18:30 AM
to juli...@googlegroups.com
I could see a production rule which terminates at whitespace or at the close of a list (you could also feed parser state back to the lexer, but that gets ugly quickly). This would allow 

a[FOO@BAR] 
a{FOO@BAR}
a(FOO@BAR) 

to work as expected. 

The challenge (for the example I gave) would be the use of traditional infix operators such as +, -, comma, semicolon, colon, etc. Perhaps a subset, those that support list composition, should not be allowed inside the literal: comma, semicolon and ... would then terminate the symbol. 

so 

[FOO@BAR,WOW] is a two element list of a macro expanded literal and a symbol WOW

[FOO@BAR~WOW] is a one element list of a macro expanded literal which is a little more confusing

As a concrete example REUTERS@O#.JULIA might represent the chain of options on the company Julia, published by Reuters. REUTERS@JULIA.N might be the listing on the NYSE of the underlying stock. 

Perhaps, though, this added complexity makes it not worth doing. 

Stefan Karpinski

Feb 5, 2015, 9:38:00 AM
to Julia Dev
Even if you terminate on closing parens (bracket, braces, etc.), you can encounter things like this:

log(FOO@bar(1),baz(2))

This would then parse as

log( @FOO "bar(1" ), baz(2)

with an extra trailing close parens causing a syntax error. In order to address such issues fully, you need this syntax to parse all balanced pairs, and even then, you can construct examples where it does something surprising. The fundamental issue is that it mixes structured text (code) with unstructured text. The FOO"bar(1),baz(2)" syntax avoids this since the double quotes tell the parser precisely what text is unstructured.
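
The same point can be checked with a small model of the extended rule, in which the payload stops at whitespace, a closing bracket, a comma or a semicolon. Again this is a sketch under those assumptions, with a hypothetical helper name, not a description of any real parser behaviour.

```julia
# Hypothetical helper, not part of Julia: the payload after NAME@ stops at
# whitespace, a closing bracket, a comma or a semicolon.
function split_at_delimiter(src::AbstractString)
    m = match(r"([A-Za-z_]\w*)@([^\s\)\]\},;]+)", src)
    m === nothing && return nothing
    return (String(m[1]), String(m[2]))   # (macro name, raw payload)
end

# Show what each input would hand to the macro under this rule.
for src in ("a[FOO@BAR]", "[FOO@BAR,WOW]", "[FOO@BAR~WOW]", "log(FOO@bar(1),baz(2))")
    println(src, "  =>  ", split_at_delimiter(src))
end
```

The simple bracketed cases come out as intended, but the last input yields the payload bar(1 because the first closing paren is treated as the end of the literal, with no notion of balanced pairs.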

Michael Francis

Feb 5, 2015, 10:17:09 AM
to juli...@googlegroups.com
I agree, it gets complex real fast. So while it would be nice to have, I can see why this is not something that would add much. A more general solution would be to allow pluggable sub-parsers. But for now let's close this. 

Stefan Karpinski

Feb 5, 2015, 10:58:31 AM
to Julia Dev
Pluggable subparsing is unfortunately an unsolved research problem:

Simon Byrne

Feb 6, 2015, 7:14:39 AM
to juli...@googlegroups.com
I do think a "number literal" operator would be incredibly useful, as it would solve the "BigFloat(0.1)" problem (as well as allowing for a convenient decimal syntax): I think it would make sense to specify type as postfix, e.g.

1.23@BigFloat
1.23@Decimal64
12@Int16

etc.
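
For anyone who has not run into the "BigFloat(0.1)" problem, a minimal illustration using only existing Julia syntax: the literal 0.1 is rounded to a Float64 before BigFloat ever sees it, whereas the string form is parsed directly at BigFloat precision.

```julia
a = BigFloat(0.1)   # the literal is rounded to Float64 first, then widened exactly
b = big"0.1"        # the text "0.1" is parsed directly at BigFloat precision

println(a == b)     # false: `a` carries the Float64 rounding error
println(a == 0.1)   # true:  `a` is exactly the Float64 value
println(a)
println(b)
```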

-Simon

Mike Innes

Feb 6, 2015, 7:29:18 AM
to juli...@googlegroups.com
Is that so much nicer than big"1.23"?

You'll only have a problem when converting to higher precision, not lower, right?

Simon Byrne

Feb 6, 2015, 7:42:50 AM
to juli...@googlegroups.com
> Is that so much nicer than big"1.23"?

One reason I liked it as a postfix is that it is kind of a stronger form of the type assertion operator (::)
 
> You'll only have a problem when converting to higher precision, not lower, right?

Not necessarily: you can get double-rounding problems when converting to lower precision, or if you're using a different base (as for a decimal type).
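
A sketch of the double-rounding case, assuming the default BigFloat precision of 256 bits so that the starting value is held exactly. The value is chosen to sit just above the midpoint between two adjacent Float32 values; going through Float64 first lands exactly on that midpoint, and the second rounding then goes the other way.

```julia
# A value just above the midpoint between 1.0f0 and nextfloat(1.0f0).
# BigFloat's default 256-bit precision represents it exactly.
x = big(1.0) + big(2.0)^(-24) + big(2.0)^(-60)

direct  = Float32(x)            # one rounding: rounds up past the midpoint
via_f64 = Float32(Float64(x))   # two roundings: Float64 drops the 2^-60 term,
                                # leaving an exact tie that rounds to even

println(direct == via_f64)      # false
println(direct, "  ", via_f64)
```

A literal that is first read as a Float64 and then converted to a narrower type can be affected in the same way.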