Possible String enhancement?

Jacek Furmankiewicz

Aug 21, 2009, 8:11:28 AM
to Project Lombok
Maybe I am dreaming in technicolor here, but would it be possible with
Lombok to add support (assuming a new annotation, let's say @Rich) for

a) multiline String support with """ as the start and end, e.g.

@Rich String sql = """
SELECT *
FROM table a1, table2 a2
WHERE a1.column1 = a2.column1""";

b) ability to embed local and instance variables in strings, like in
Ruby and most other modern scripting languages

Long someValue = 45;

@Rich String msg = "This is a message with a value of #{someValue}";

c) Raw strings, where you don't have to escape characters, e.g. for
regex expressions

@Raw String emailRegex = "^\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,3}$";

I think this would be a godsend to Java programmers everywhere :-)
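
For comparison, here is a minimal sketch of how the three cases have to be written in plain Java without any of the proposed annotations. The class and method names (PlainJavaStrings, sql, message, emailRegex) are made up for illustration only.

```java
import java.util.regex.Pattern;

final class PlainJavaStrings {
    // a) Multi-line text: every line needs its own quotes, an explicit "\n" and a "+".
    static String sql() {
        return "SELECT *\n"
             + "FROM table a1, table2 a2\n"
             + "WHERE a1.column1 = a2.column1";
    }

    // b) Embedding a variable: concatenation or String.format instead of #{someValue}.
    static String message(long someValue) {
        return String.format("This is a message with a value of %d", someValue);
    }

    // c) Regex: every backslash has to be doubled inside a normal string literal.
    static Pattern emailRegex() {
        return Pattern.compile("^\\w+@[a-zA-Z_]+?\\.[a-zA-Z]{2,3}$");
    }
}
```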

Reinier Zwitserloot

Aug 22, 2009, 3:08:02 AM
to Project Lombok
Are you dreaming? Yes, and no.

Lombok, right now, works solely on ASTs. So, in order to do that, we
must first go from raw source characters to an AST before lombok is
ever invoked. Thus, the code has to pass the parser/AST phase. There
can't be any syntax errors. And to a normal java compiler, 3 quotes is
a syntax error. Everything lombok does right now is legal java code
until you start resolving things, and then it falls apart without
lombok's transformations (ie: The compiler realizes there's no getFoo
() method anywhere, and raises a symbol not found compilation
problem).

However, if you look at the disableCheckedExceptions experiment
(another post on this newsgroup), we actually do change a core java
rule there. It's proof that changing any rule is possible. It's a LOT
of work though - you have to live-patch the class files that make up
the java compiler and add whole new bits of grammar to it.

We do, however, plan to go there someday.

The @Rich would be meaningless: What's the point? The triple quotes
are more than enough.

@Raw is similarly dumb in my opinion; they are practically speaking
only used for regexps, so, why not make a regexp literal instead? You
can get some error checking on your regexp, and optimization at
compile time, which is nice.

Lastly, it's parser/grammar wise difficult, and not very useful, to
let an annotation on a variable declaration have an effect on how to
_PARSE_ the value assigned to it. Clearly, your email regex example
ought to look like:

Pattern emailRegex = /^\w+@[a-z_]+?\.[a-z]{2,3}$/i;

As a public service to you: Ditch that regexp. It's horribly broken:

- to the left of the @, almost anything goes, a lot more than what \w
will cover. If it has at least 1 character and contains no spaces and
@ signs, take it.

- there are plenty of top-level domains with more than 3 characters.
Presuming you want to only accept email addresses routed to an
internet-wide accessible host, you can check if the thingie after the
@ has at least 1 dot in it, and after the last dot there's at least 1
character. That's where you should stop. So:

/^[^@]+@([^.]+\.)+[^.][^.]+$/
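
As a rough sketch, that pattern can be tried out in today's Java like this (EmailCheck and looksLikeEmail are hypothetical names). Note that, as written, the pattern needs at least two characters after the last dot, and it does not reject spaces.

```java
import java.util.regex.Pattern;

final class EmailCheck {
    // The pattern from the post; the one backslash is doubled for the Java string literal.
    static final Pattern LOOSE_EMAIL = Pattern.compile("^[^@]+@([^.]+\\.)+[^.][^.]+$");

    static boolean looksLikeEmail(String candidate) {
        return LOOSE_EMAIL.matcher(candidate).matches();
    }
}
```

For example, looksLikeEmail("someone@example.museum") is accepted, while "someone@localhost" (no dot after the @) and "@example.com" (nothing before the @) are rejected.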

Reinier Zwitserloot

Aug 22, 2009, 3:22:24 AM
to Project Lombok
Just in case it wasn't obvious: When I say "@Raw is dumb", I meant: I
don't think it's as good an idea as implementing whatever literal type
you'd want to use @Raw for, directly, and I also meant: An annotation
is not going to be a good fit there. I'm somewhat overly prone to
hyperbole, for which I apologize.

I also forgot to mention that lombok does not need to work via
annotations. Everything lombok does so far is done with an annotation
because, for all the things lombok does right now, annotations are a
good fit.

Jacek Furmankiewicz

Aug 22, 2009, 8:58:30 AM
to Project Lombok
No offense taken :-)
I know the regex is broken, took the first one I found for a sample
from regexlib. :-)

Yes, regex (and maybe XML like in Scala) literals would be amazing,
but I never thought Lombok could go THAT far.
If you do one day, my hat's off to you.

Reinier Zwitserloot

Aug 23, 2009, 5:14:09 AM
to Project Lombok
Yes, I'm fairly sure we're at least going to try to get that far. The
rich string thing in particular will most likely become the 'proof of
concept' that lombok can change grammar rules. It's trivially simple,
has absolutely no hard questions and choices involved (there's just
one way to do rich text, and it should obviously involve 3 quotes),
I'm 99% sure the parser can easily be retrofitted to support it, and
it's reasonably useful.

Jacek Furmankiewicz

Aug 23, 2009, 9:12:08 PM
to Project Lombok
If you do it, it will be a small step for programming, but a giant
step for Java :-)

Mark Proctor

Oct 4, 2013, 3:34:05 PM
to project...@googlegroups.com
Not sure if the multiline thread is dead. But saw this the other day:

Where it uses a static method to load a String on the fly, to provide multi-line support, without doing anything too funky on the parser. The RuntimeException is too slow for runtime use cases, or when there are a large number of unit tests. But I realised that Lombok could probably replace the argument of the static method with the actual String, at compile time. So there is no runtime speed loss.

Sound feasible?

Reinier Zwitserloot

Oct 8, 2013, 8:20:16 AM
to project...@googlegroups.com
This library requires the source to be on the classpath right next to the classfile _AT RUNTIME_ which is a highly unusual state of affairs.

But, lombok can do all this at compile time, so.....

I'm speechless. This is dynamite; you're a dastardly genius for adding up lombok and this hack by blog.efftinge.de (which is either utterly brilliant or deplorably silly. Probably both!): Yup, we can eliminate both the dependency on lombok (or any other library) and the source files. We can even do compile-time error reporting on the contents of the comments, generally with the error right on the appropriate characters, even!

There is some immediate downside. The first one is the sheer WTF factor of this: Most of lombok seems impossible to an average java coder who is not familiar with lombok, but the code is at least readable. So, while someone reading lomboked code might go: Huh, HOW?, they do know what is intended. Whereas with this, I can easily imagine a non-clued in code reader to just think something went wrong here. Also, IDEs will colour the literal in green or whatever colour is used for comments, and in general we coders are trained to look past a comment when trying to figure out how a certain snippet of code works.

But, the upside. Boy, there's a lot we can do with this. The sky is the limit, pretty much: Unless it contains backslash-u or star-slash, literally anything will go.


Before we build Rome with this, let's start simple. Which features are both useful and have the property that the stuff in the comment won't be anything like java? (Because you won't get IDE support at all, let's stay away from putting java code in comments for now.) Two obvious candidates:

* 'long' strings that can contain anything (well, except */), as per your example.
* regexp literals. Hey, look ma, no backslash escapes, and... compile-time checked for validity (which is less useful than you might think; almost any random collection of characters is a syntactically valid regexp. We're pretty much checking that all the parens and brackets are matched, that's pretty much it).

It would look like:

String longString = Lombok.S(/*This \ will be a long string
that can contain newlines and backslashes if you want*/);

With the 'S' name statically importable, and of course we can and should debate on a good name for it. It'll compile down to a plain jane string literal, there won't be a runtime dependency on either your source file or lombok.
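
A minimal sketch of the "compile down to a plain string literal" step, assuming the raw comment text has already been extracted. LongStringLiteral and toJavaLiteral are hypothetical names, not lombok API; a real implementation would work on the compiler's AST rather than on source text.

```java
final class LongStringLiteral {
    /** Turns the raw text found between the comment markers into Java string-literal source. */
    static String toJavaLiteral(String raw) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break; // a backslash in the comment is just a backslash
                case '"':  sb.append("\\\""); break; // quotes need escaping in the generated literal
                case '\n': sb.append("\\n");  break; // a real newline becomes the \n escape
                case '\t': sb.append("\\t");  break;
                default:   sb.append(c);
            }
        }
        return sb.append('"').toString();
    }
}
```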

Regexps:

Pattern p = Lombok.pattern(/*^[CD]:\([^\]+\)(.*)$*/ Pattern.CASE_INSENSITIVE);

name again debatable (either 'pattern' or 'regexp' seem like the obvious candidates). Again no deps at all.
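
The validity check itself could be as small as the following sketch, which simply asks java.util.regex to compile the text and reports where it failed. RegexCheck and validate are hypothetical names; how lombok would surface the error at the right source position is not shown.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class RegexCheck {
    /** Returns null if the regex compiles, otherwise a short description including the failing index. */
    static String validate(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            // getIndex() is what would let an error be reported on the right character.
            return e.getDescription() + " at index " + e.getIndex();
        }
    }
}
```

As noted above, this catches little beyond unbalanced parens and brackets: validate("any old text !") returns null, while validate("(unclosed") returns an error.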



So, thoughts, opinions? Should we do this? I'm rather enamored (and messing with comments is still fresh in our minds; we had to do some pretty fancy footwork to make 'copy javadoc to generated getters and setters' work on all platforms (javac6, javac7, and javac8 all handle comments very differently)). We haven't looked at comment parsing _AT ALL_ on the eclipse side yet, but I doubt it'll be a big problem to find the comment in between the parens.

Fabrizio Giudici

Oct 8, 2013, 8:53:59 AM
to project...@googlegroups.com, Reinier Zwitserloot
On Tue, 08 Oct 2013 14:20:16 +0200, Reinier Zwitserloot
<rein...@gmail.com> wrote:

> There is some immediate downside. The first one is the sheer WTF factor
> of
> this:

I would be really worried about seeing that. If Lombok pushes too far, as you
said it could be perceived as a hack, rather than a compiler extension and
this could limit its popularity. To me, when you do some violence to
syntax and semantics, you've pushed too far - I mean, for annotations we're
just extending their scope, but an annotation in the end is still an
annotation. Here we're talking about turning a comment into a piece of
code!

--
Fabrizio Giudici - Java Architect @ Tidalwave s.a.s.
"We make Java work. Everywhere."
http://tidalwave.it/fabrizio/blog - fabrizio...@tidalwave.it

Mike Power

Oct 8, 2013, 10:15:37 AM
to project...@googlegroups.com
On 10/08/2013 05:53 AM, Fabrizio Giudici wrote:
> On Tue, 08 Oct 2013 14:20:16 +0200, Reinier Zwitserloot
> <rein...@gmail.com> wrote:
>
>> There is some immediate downside. The first one is the sheer WTF
>> factor of
>> this:
>
> I would be really worried of seeing that. If Lombok pushes too far, as
> you said it could be perceived as a hack, rather than a compiler
> extension and this could limit its popularity. To me, when you do some
> violence to syntax and semantics, you pushed too far - I mean, for
> annotations we're just extending their scope, but an annotation in the
> end is still an annotation. Here we're talking about turning a comment
> into a piece of code!
>
Might try some sort of comment convention that softens the blow...

Javadocs use an extra *
/**
*
*/

Lombok could require an extra character or more
/*~...*/
/*~...~*/
/*"..."*/
/*{...}*/
/*[...]*/
/*%..%*/

Mark Proctor

Oct 8, 2013, 10:47:48 AM
to project...@googlegroups.com
I would add we do have genuine use cases for this - for unit testing. We have over 2000 unit tests. Separating the data from the code, makes quick review of so many tests cumbersome. Adding the data into the code, makes it hard to read due to all the quotes. Obviously this isn't really a problem for main runtime code though.

You could allow any static method name, as long as it has an annotation on the static method to declare it was to be used this way. With S simply a convention.

I would suggest you support Object... varargs. The first argument being the string. The optional 1...n arguments being interpolation variables - as per modern logging.
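
A rough sketch of what such a varargs method might do at runtime, assuming logging-style "{}" placeholders. The placeholder syntax and the names Interpolate and s are assumptions for illustration; the thread does not settle on either.

```java
final class Interpolate {
    /** Replaces each "{}" in the template with the next argument, left to right. */
    static String s(String template, Object... args) {
        StringBuilder out = new StringBuilder();
        int from = 0;
        int argIndex = 0;
        while (true) {
            int at = template.indexOf("{}", from);
            if (at < 0 || argIndex >= args.length) {
                // No more placeholders (or no more arguments): copy the rest verbatim.
                out.append(template, from, template.length());
                break;
            }
            out.append(template, from, at).append(args[argIndex++]);
            from = at + 2;
        }
        return out.toString();
    }
}
```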

Mark

Mark Proctor

Oct 8, 2013, 10:53:01 AM
to project...@googlegroups.com
+1 to an additional convention in the comment syntax. This makes its use more explicit, and potentially IDE friendly. If you combine that with insisting on an annotation on the static method the user provides, this adds another explicit intention of use.

You already have experimental annotations, why not add it as an optionally experimental one, and see how people like it?

Don't forget the varargs, for variable interpolation :)

Mark

Fabrizio Giudici

Oct 8, 2013, 11:03:36 AM
to project...@googlegroups.com, Mark Proctor
On Tue, 08 Oct 2013 16:53:01 +0200, Mark Proctor <mdpr...@gmail.com>
wrote:

> +1 to an additional convention in the comment syntax. This makes it's use
> more explicit, and potentially IDE friendly. If you combine that with
> insisting on an annotate on the static method the user provides; this
> adds
> another explicit intention of use.
>
> You already have experimental annotations, why not add it as an
> optionally
> experimental one, and see how people like it?
>
> Don't forget the var args, for variable interpolation :)

In fact I don't like it :-) In any case, experiments are always welcome
IMHO. Decisions can be postponed until there's something working.

I wonder whether at this point we can discuss a generic point: whether it
makes sense to split lombok.jar in two parts, one more "conservative" and
one more "daring" (I'm not talking only of experimental stuff, but also
consolidated stuff more on the edge). It's not a technical point, of
course: it's merely a political one.

Marius Kruger

Oct 8, 2013, 12:04:46 PM
to project...@googlegroups.com
Awesome! Long strings and regexes are some of the most annoying java syntax limitations there are, especially if you have done some python stuff on the side.

On 8 October 2013 14:20, Reinier Zwitserloot <rein...@gmail.com> wrote:

> * regexp literals. Hey, look ma, no backslash escapes, and... compile-time checked for validity (which is less useful than you might think; almost any random collection of characters is a syntactically valid regexp. We're pretty much checking that all the parens and brackets are matched, that's pretty much it).
>
> Regexps:
>
> Pattern p = Lombok.pattern(/*^[CD]:\([^\]+\)(.*)$*/ Pattern.CASE_INSENSITIVE);
>
> name again debatable (either 'pattern' or 'regexp' seem like the obvious candidates). Again no deps at all.

In Python you prefix quotes with r to indicate this, e.g. r"\n", so I vote for Lombok.r(/*\n*/)

> * 'long' strings that can contain anything (well, except */), as per your example.
>
> It would look like:
>
> String longString = Lombok.S(/*This \ will be a long string
> that can contain newlines and backslashes if you want*/);
>
> With the 'S' name statically importable, and of course we can and should debate on a good name for it. It'll compile down to a plain jane string literal, there won't be a runtime dependency on either your source file or lombok.

If regex is .r I vote for .s for strings.

--
✝ Marius

Reinier Zwitserloot

Oct 8, 2013, 1:46:50 PM
to project-lombok

Python's r"" is for raw strings, not regexps. Therefore, r is out because it is confusing and not descriptive. It's pattern, or regex, or regexp. Those are the only 3 choices.

Martin Grajcar

Oct 9, 2013, 7:44:15 AM
to project...@googlegroups.com
On Tue, Oct 8, 2013 at 4:15 PM, Mike Power <mpower...@alumni.calpoly.edu> wrote:
> Might try some sort of comment convention that softens the blow...
>
> Javadocs use an extra *
> /**
>  *
>  */
>
> Lombok could require an extra character or more
> /*~...*/
> /*~...~*/
> /*"..."*/
> /*{...}*/
> /*[...]*/
> /*%..%*/

For Strings, shouldn't it be /*"..."*/?

I wonder what to do with indentation... how would you interpret this:

String longString = Lombok.S(/*This is a multiline
                              string with
          a strange
       indentation.*/);

or this

String longString2 = Lombok.S(/*#Indentation example:
*for i in range(100):
*    if isPrime(i):
*        print i;
 *print "done";*/);

I'd go for the star-prefixed syntax as this is what eclipse does and it provides a clear syntax for indented strings.
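
A small sketch of how the star-prefixed form could be decoded, assuming everything after the star is kept verbatim and that a continuation line without a star is an error (that last rule is an assumption here). StarPrefix and strip are hypothetical names.

```java
final class StarPrefix {
    /** Removes leading whitespace plus one '*' from every line after the first. */
    static String strip(String body) {
        String[] lines = body.split("\n", -1);
        StringBuilder out = new StringBuilder(lines[0]);
        for (int i = 1; i < lines.length; i++) {
            String line = lines[i];
            int p = 0;
            // Skip the source-file indentation in front of the star.
            while (p < line.length() && (line.charAt(p) == ' ' || line.charAt(p) == '\t')) {
                p++;
            }
            if (p >= line.length() || line.charAt(p) != '*') {
                throw new IllegalArgumentException("line " + (i + 1) + " does not start with '*'");
            }
            // Everything after the star, including spaces, belongs to the string.
            out.append('\n').append(line, p + 1, line.length());
        }
        return out.toString();
    }
}
```

Applied to the second example above, this keeps the four-space indent of the "if" line and drops the stray space in front of the last star.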

Concerning trailing spaces... I'd say they should be always stripped, right?

Python has normal and raw string literals, shouldn't lombok have them too? The difference is the handling of backslashes. Sometimes you need to include some strange Unicode chars, sometimes you don't and then not interpreting backslashes is better.


Reinier Zwitserloot

Oct 9, 2013, 8:52:12 AM
to project-lombok
I'd say this is primarily useful for raw strings only, especially because backslash escapes are used primarily to escape newlines, which aren't needed for long strings, and quotes, which again aren't needed, and tabs, which are never needed anyway, and that's pretty much it. backslash-u escapes will continue to work (as these apply everywhere, even in comments and outside of string literals).

I like /*". We can NOT just replace all /*""*/ with strings, it HAS to be wrapped in a method call, but we can demand that it's formatted that way (generate an error if it's not formatted that way). So, it would look like S(/*"string goes here"*/) – that would be a 'long string literal' in lombok.

For indents, I'm tempted to just say that all spacing is preserved, but I can see how that results in impractical formatting of source files. I'd say trailing spaces are always preserved (if you didn't want them, then don't put them in the source!), but leading spacing is a different matter; often those are actually indents.

How about this:

IF the 'string literal comment' is the first non-whitespace character on its line, THEN we take the spacing that precedes it and consider that the prefix of all further lines. That is, if further lines start with the same characters, we strip those. We don't strip anything else.

This still results in an annoying 3 character disjoint (the string literal starts with /*" but further lines won't). So, let's add another rule: _IF_, after stripping at least 1 character of whitespace due to the above rule, the remaining string starts with at least 3 spaces (and only spaces will do, not tabs), those are stripped.

This has the following properties:

* If you want to long-string something where whitespace is really really important, you can do that.

* You can indent your files with tabs. Or spaces. Or both. But if you want to keep the content exactly aligned, you MUST skip past the /*" of the opening line with spaces, not tabs. That might mean you indent with tabs, then write 3 spaces, then start the continuation of the long string.

* If your long string literal does not start on its own line, indentation is wonky anyway, so in this case we simply don't treat any spacing as indents; It's ALL preserved. An alternative would be to look at the indent of the line itself, but often a continued line is itself indented more than its parent line, so how do we know how much more? That's why I propose we don't try to guess which leading whitespace is indent and which is significant unless the long string comment itself starts the line. So, your code would look like:

System.out.println(S(
    /*" This is a really long line that is continued
        here. This line has 1 leading preserved space. 
       but this line doesn't."*/));

Look at the above snippet in a monospace font or it won't make sense.

* You cannot put non-u backslash escapes in this thing. It is impossible to escape */ (you can't even escape this with a backslash-u escape!).

* You cannot reverse-escape a newline. I can imagine that you'd want to start a new line for the sake of source code formatting, but you do NOT want that newline to be in the string literal that lombok constructs for you. However, while that's nice, this proposal does not allow you to do that. A newline in your /*""*/ comment will end up as a \n in your string literal. Also, we'll always make that \n, not \r\n, even if your source file is formatted with dos-style line endings.

* We COULD allow escapable strings, possibly by using a different symbol or a prefix (/*\""*/ for example, or single quotes instead of double quotes). In these, you can escape as usual with \t, \n, etcetera, and you can also end a line in a backslash (backslash, then hit enter) to have newlines in your source that don't result in \n in your longstring.

* While we use the same mechanism to process longstrings and regexp literals, the rules are completely different. For example, I'd say in a regexp literal, ALL leading and trailing spaces are ALWAYS ignored. Possibly it's not even legal to multiline it in the first place. Certainly the comment's boundary symbols will not be /*" and "*/ but something else. Possibly just /* */ for regexps, possibly /*/ and /*/.
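
The two stripping rules and the newline rule above can be sketched as follows. This is only an illustration under stated assumptions: LongStringIndent and unindent are hypothetical names, and prefix stands for the whitespace in front of the opening /*" (null when the literal does not start its own line).

```java
final class LongStringIndent {
    /**
     * body:   the text between /*" and the closing quote.
     * prefix: whitespace preceding the opening marker, or null if the
     *         literal is not the first thing on its line.
     */
    static String unindent(String prefix, String body) {
        // Newlines always end up as \n, even for dos-style source files.
        String[] lines = body.replace("\r\n", "\n").split("\n", -1);
        StringBuilder out = new StringBuilder(lines[0]);
        for (int i = 1; i < lines.length; i++) {
            String line = lines[i];
            // Rule 1: strip the opening line's indent if this line starts with it.
            if (prefix != null && !prefix.isEmpty() && line.startsWith(prefix)) {
                line = line.substring(prefix.length());
                // Rule 2: after rule 1 applied, skip 3 spaces (not tabs) to get past the /*" column.
                if (line.startsWith("   ")) {
                    line = line.substring(3);
                }
            }
            // Trailing spaces are preserved as-is.
            out.append('\n').append(line);
        }
        return out.toString();
    }
}
```

With a 4-space prefix, a continuation line indented by 8 spaces keeps 1 leading space and one indented by 7 keeps none, matching the println example above.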


 --Reinier Zwitserloot


Martin Grajcar

Oct 11, 2013, 1:23:35 AM
to project...@googlegroups.com
On Wed, Oct 9, 2013 at 2:52 PM, Reinier Zwitserloot <rei...@zwitserloot.com> wrote:
I'd say this is primarily useful for raw strings only, especially because backslash escapes are used primarily to escape newlines, which aren't needed for long strings, and quotes, which again aren't needed, and tabs, which are never needed anyway, and that's pretty much it. backslash-u escapes will continue to work (as these apply everywhere, even in comments and outside of string literals).

I like /*". We can NOT just replace all /*""*/ with strings, it HAS to be wrapped in a method call, but we can demand that its formatted that way (generate an error if it's not formatted that way). So, it would look like S(/*"string goes here"*/) – that's would be a 'long string literal' in lombok.

For indents, I'm tempted to just say that all spacing is preserved, but I can see how that results in impractical formatting of source files. I'd say trailing spaces are always preserved (if you didn't want them, then don't put them in the source!),

For me it's fine, as I let Eclipse remove the trailing garbage. But for people not doing this, it may lead to surprises, so maybe a rule like "what I can't see doesn't exist" would be better.
 
but leading spacing is a different matter; often those are actually indents.

How about this:

IF the 'string literal comment' is the first non-whitespace character on its line, THEN we take the spacing that precedes it and consider that the prefix of all further lines. That is, if further lines start with the same characters, we strip those. We don't strip anything else.

This still results in an annoying 3 character disjoint (the string literal starts with /*" but further lines won't). So, let's add another rule: _IF_, after stripping at least 1 character of whitespace due to the above rule, the remaining string starts with at least 3 spaces (and only spaces will do, not tabs), those are stripped.

This has the following properties:

Property 0: It looks very complicated. After the third reading it's pretty clear and logical, but...
 
* If you want to long-string something where whitespace is really really important, you can do that.

* You can indent your files with tabs. Or spaces. Or both. But if you want to keep the content exactly aligned, you MUST skip past the /*" of the opening line with spaces, not tabs. That might mean you indent with tabs, then write 3 spaces, then start the continuation of the long string.

* If your long string literal does not start on its own line, indentation is wonky anyway, so in this case we simply don't treat any spacing as indents; It's ALL preserved. An alternative would be to look at the indent of the line itself, but often a continued line is itself indented more than its parent line, so how do we know how much more? That's why I propose we don't try to guess which leading whitespace is indent and which is significant unless the long string comment itself starts the line. So, your code would look like:

System.out.println(S(
    /*" This is a really long line that is continued
        here. This line has 1 leading preserved space. 
       but this line doesn't."*/));

What about this:

System.out.println(S(/*" A long string started with the slash-star-quotation_mark sequence
         preserves
            no
   indentation at all."*/));

System.out.println(S(/** A long string started with the slash-star-star sequence
   *  - requires that each line starts with a star
   *  - and honors the leading spaces after the star**/));

Just an idea.

Look at the above snippet in a monospace font or it won't make sense.

* You cannot put non-u backslash escapes in this thing. It is impossible to escape */ (you can't even escape this with a backslash-u escape!).

* You cannot reverse-escape a newline. I can imagine that you'd want to start a new line for the sake of source code formatting, but you do NOT want that newline to be in the string literal that lombok constructs for you. However, while that's nice, this proposal does not allow you to do that. A newline in your /*""*/ comment will end up as a \n in your string literal. Also, we'll always make that \n, not \r\n, even if your source file is formatted with dos-style line endings.

* We COULD allow escapable strings, possibly by using a different symbol or a prefix (/*\""*/ for example, or single quotes instead of double quotes). In these, you can escape as usual with \t, \n, etcetera, and you can also end a line in a backslash (backslash, then hit enter) to have newlines in your source that don't result in \n in your longstring.

Concerning the quotes vs. apostrophes, shell and perl do it in a sort of opposite way: '$a' does not get interpolated, while "$a" does.
 
* While we use the same mechanism to process longstrings and regexp literals, the rules are completely different. For example, I'd say in a regexp literal, ALL leading and trailing spaces are ALWAYS ignored. Possibly it's not even legal to multiline it in the first place.

Both make sense.
 
Certainly the comment's boundary symbols will not be /*" and "*/ but something else. Possibly just /* */ for regexps, possibly /*/ and /*/.

Not interpreting backslashes inside regexes will be a huge win for readability.
 

Reinier Zwitserloot

Oct 11, 2013, 10:58:10 PM
to project-lombok
The problem with ignoring all leading and trailing whitespace is that there are tons of things you can't write then. For example, if I write:
String commandLineHelp = S(/*"Yoyodyne tool v1.0
-------------------------------------------
    foo      - bargle bar bar barglebar
    bar      - hmmmmmmm
"*/);

What if the indents for the 'foo' and 'bar' lines are intentional? How would I tell lombok this? I don't really like forcing the use of leading stars as a faux indicator of whitespace, for two reasons:

* This too is complicated; the actual leading indicator is * and a space. So now we need to make up rules for when those aren't there. The best rule is probably: That is an error; if you decide to use leading stars, it has to be done right.

* You now can NOT paste in any random string (and copy it into the clipboard from your source file), and have it preserved 100% in both directions, which is part of the charm of these things, I'd say.

The indent rules seem complicated but they aren't. Basically, the rule goes: If you want to indent your string literals, put the entire literal, including the start of it, on its own line, and indent as normal.


We have the freedom to make up an infinite number of syntaxes for this; one for raw strings, one for escape-interpreted strings, one where indents are removed, one where they aren't, and so on and so forth. But we already have a huge delimiter of 5 characters prefix (capital-S, open paren, slash, star, quote) and 4 characters suffix (quote, star, slash, close paren), so if we would do that we're almost forced into using symbolics instead of english words, at which point it just becomes a bunch of complex cartoon swearing. We should offer only a few options. I don't like the stars because they make it look like javadoc content or an actual comment, and not a string literal.


 --Reinier Zwitserloot



Martin Grajcar

Oct 12, 2013, 6:52:05 AM
to project...@googlegroups.com
On Sat, Oct 12, 2013 at 4:58 AM, Reinier Zwitserloot <rei...@zwitserloot.com> wrote:
The problem with ignoring all leading and trailing whitespace is that there are tons of things you can't write then. For example, if I write:
String commandLineHelp = S(/*"Yoyodyne tool v1.0
-------------------------------------------
    foo      - bargle bar bar barglebar
    bar      - hmmmmmmm
"*/);

What if the indents for the 'foo' and bar' lines are intentional? How would I tell lombok this? I don't really like forcing the use of leading stars as a faux indicator of whitespace, for two reasons:

* This too is complicated; the actual leading indicator is * and a space. So now we need to make up rules for when those aren't there. The best rule is probably: That is an error; if you decide to use leading stars, it has to be done right.

Fully agreed; a compile time error is surely better than any forgiving solution here.
 
* You now can NOT paste in any random string (and copy it into the clipboard from your source file), and have it preserved 100% in both directions, which is part of the charm of these things, I'd say.

You're right. But maybe there should be an indentation-preserving long string syntax for cases like this and an indentation-eating one? I can see that this could lead to too many syntaxes, which would be a problem. Maybe just choose the best one for the beginning and see what requests come in.
 
The indent rules seem complicated but they aren't. Basically, the rule goes: If you want to indent your string literals, put the entire literal, including the start of it, on its own line, and indent as normal.

But this is not what you did in your above example. Or am I missing something?

We have to freedom to make up an infinite amount of syntaxis for this; one for raw strings, one for escape-interpreted strings, one where indents are removed, one where they aren't, and so on and so forth. But we already have a huge delimiter of 5 characters prefix (capital-S, open paren, slash, star, quote) and 4 characters suffix (quote, star, slash, close paren), so if we would do that we're almost forced into using symbolics instead of english words, at which point it just becomes a punch of complex cartoon swearing.

That's a good point. Too many features could surely kill it. In theory, I could imagine something like

lombok.S(/*riuxyz" some fancy multiline text"*/)

with "r" meaning raw, "i" meaning interpolate, "u" meaning unindent, and the rest meaning something, but this perl-modifier-like syntax would be surely used as an argument that lombok is too hacky.

A simple rule could go like this: If a character should get a special meaning, put it before the leading quote, so /*\$" would give special meaning to backslash and dollar, thus switching on escaping and interpolation:

lombok.S(/*$\" The value of foo is $foo.
/*This is an embedded comment as escaped slash stands for itself.*\/
These are my \$0.02 (my uninterpreted two cents).
There are always exactly the three trailing chars: "*\/
"*/)
 
I sort of like it.
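
To make the idea concrete, here is a sketch of how such a modifier prefix might be interpreted, assuming variables are supplied through a map. ModifierString, expand and the vars parameter are hypothetical; in a real lombok feature the variables would be resolved at compile time.

```java
import java.util.Map;

final class ModifierString {
    /**
     * comment: the text between the comment markers, e.g.  $\"Hello $name"
     * Characters in front of the first quote switch on their special meaning.
     */
    static String expand(String comment, Map<String, ?> vars) {
        int open = comment.indexOf('"');
        if (open < 0 || open == comment.length() - 1 || !comment.endsWith("\"")) {
            throw new IllegalArgumentException("expected <modifiers>\"...\"");
        }
        String modifiers = comment.substring(0, open);
        boolean escapes = modifiers.indexOf('\\') >= 0;     // backslash listed: escaping on
        boolean interpolate = modifiers.indexOf('$') >= 0;  // dollar listed: interpolation on
        String body = comment.substring(open + 1, comment.length() - 1);

        StringBuilder out = new StringBuilder();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (escapes && c == '\\' && i + 1 < body.length()) {
                char next = body.charAt(++i);
                // \n and \t are translated; any other escaped character stands for itself.
                out.append(next == 'n' ? '\n' : next == 't' ? '\t' : next);
            } else if (interpolate && c == '$') {
                int end = i + 1;
                while (end < body.length()
                        && (Character.isLetterOrDigit(body.charAt(end)) || body.charAt(end) == '_')) {
                    end++;
                }
                String name = body.substring(i + 1, end);
                if (name.isEmpty()) {
                    out.append('$'); // a lone dollar sign is kept as-is
                    continue;
                }
                if (!vars.containsKey(name)) {
                    throw new IllegalArgumentException("unknown variable: " + name);
                }
                out.append(vars.get(name));
                i = end - 1;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Without any modifiers in front of the quote, the body comes back untouched, which corresponds to the raw long string discussed earlier.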

We have to only make a few options. I don't like the stars because they make it look like javadoc content or an actual comment, and not a string literal.

My reason for the stars was that eclipse inserts them after each newline, so I wanted to use them instead of fighting them. For copy-pasted strings the situation is different.

* While we use the same mechanism to process longstrings and regexp literals, the rules are completely different. For example, I'd say in a regexp literal, ALL leading and trailing spaces are ALWAYS ignored. Possibly it's not even legal to multiline it in the first place.
 
 This both makes sense.
 
I must disagree with myself: Multiline regexes would be great whenever you write something long. Obviously, indentation makes no sense here. This example matches any IP and you really don't want to pack it in one line:

Pattern IP_PATTERN = lombok.P(/*"
  \b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
    (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
    (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
    (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
"*/);
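
For reference, plain Java can already spread a regex over several lines with the Pattern.COMMENTS flag, which makes the engine ignore whitespace in the pattern; what the proposal would add is getting rid of the quotes, the "+" and the doubled backslashes. A sketch (IpPattern and OCTET are illustrative names):

```java
import java.util.regex.Pattern;

final class IpPattern {
    // One octet, 0 to 255, exactly as in the example above.
    private static final String OCTET = "(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";

    // With Pattern.COMMENTS, the newlines and indentation inside the pattern are ignored.
    // The backslashes still have to be doubled because these are normal string literals.
    static final Pattern IP_PATTERN = Pattern.compile(
            "\\b" + OCTET + "\\.\n"
          + "   " + OCTET + "\\.\n"
          + "   " + OCTET + "\\.\n"
          + "   " + OCTET + "\\b",
            Pattern.COMMENTS);
}
```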

Mike Power

Oct 12, 2013, 12:09:20 PM
to project...@googlegroups.com
Getting lost, what are the problems we are trying to solve? I can think of three from my reading.

1) I've got a really long string and I want to break it across lines without actually putting line breaks and indentation whitespace in my string
2) I've got a formatted block of text that I want in a string and I want it to stay formatted.
3) I want to put a regex in java, and not have to escape random characters to make it work
4) ... the things I missed.
5) I thought of another but I do not want to derail the current conversation

Reinier Zwitserloot

Oct 13, 2013, 6:52:48 PM
to project-lombok
On Sat, Oct 12, 2013 at 6:09 PM, Mike Power <dodt...@gmail.com> wrote:
On 10/12/2013 03:52 AM, Martin Grajcar wrote:
> But this is not what you did in your above example. Or am I missing something?

It wasn't. Here's an example of solving the indentation problem by looking at indentation of the first comment line:

System.out.println(S(
        /*"Hello, World!
           --------------------
           This line won't be indented.
               But this line will be indented with 4 spaces."*/));
 
> A simple rule could go like this: If a character should get a special meaning, put it before the leading quote, so /*\$" would give special meaning to backslash and dollar, thus switching on escaping and interpolation:

That's presuming we'd want to do interpolation and I'm not so sure on that either; the 'java' way of doing this is String.format/printf. I'm particularly concerned with the lack of syntax highlighting, refactor support, find references, and auto-complete for these. I really, _REALLY_ doubt the convenience of interpolation is worth the loss of all of that, so we can keep interpolation out of it for now. That won't happen anytime soon.
 
> My reason for the stars was that eclipse inserts them after each newline, so I wanted to use them instead of fighting them. For copy-pasted strings the situation is different.

Eclipse only does that if you start a comment on its own line. It also puts the closing marker on its own line, which has all sorts of issues associated with it. If we go with the model of 'we make up convoluted rules so that it works with minimal hassle in eclipse', we also have to come up with rules for how closing the string works, so that the newline at the end is disregarded as an eclipse-ism. I guess we have a choice to make: Come up with a ruleset as long as my leg that caters perfectly to eclipse's defaults, or keep it as simple as possible and accept that newlines and indents and such will be easy to understand but result in messy-looking code that is hard to type in at least eclipse. Also, the stars suggest it's a comment, when it's not.

Python has a simpler rule for this: Whitespace isn't ignored. At all. It is not possible to indent continued longstrings. I'm tempted to roll with that.


Mark Proctor

Jan 2, 2014, 11:47:52 AM
to project...@googlegroups.com
Was there any progress on this?

Mark

Reinier Zwitserloot

Jan 3, 2014, 6:12:07 AM
to project-lombok
It's in the queue; we're first going to look at the netbeans issues (Jan Lahoda figured out what's wrong there), then we're going to finish the settings feature (which, amongst other things, will let you just blanket-apply a bunch of default extensionmethods to everything you do). Especially the settings stuff is relatively heavy.

 --Reinier Zwitserloot

