use of $ as xor versus syntactic sugar for DataFrames


David van Leeuwen

Apr 19, 2015, 5:25:28 PM
to juli...@googlegroups.com
Hello, 

I've seen this discussed before, but wasn't able to find search terms to continue the discussion...

I'm wondering how much `$` is actually used as bitwise exclusive or in practice (in current packages and other code), and whether it is possible to change this to `^` (which, for `Bool` arguments, is currently defined as `x ^ y = x | !y`, which, to me, appears to be the logic for "y implies x"). 

It would be really cool if x$y (without spaces) could be syntactic sugar for x[:y], which would be useful in DataFrames, but also has use cases in Dicts. 

Cheers, 

---david
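
For readers on a current Julia release: as far as I know, `$` is no longer the xor operator there, and bitwise exclusive or is spelled `xor(a, b)` (or `a ⊻ b`). The following is a minimal sketch, assuming Julia 1.x, that compares xor with the `x | !y` rule quoted above. The `implied_by` helper is a hypothetical name used only for illustration.

```julia
# Assumes Julia 1.x. `implied_by` is a hypothetical helper, not part of Base.
implied_by(x::Bool, y::Bool) = x | !y   # the rule quoted above: "y implies x"

# All four Bool combinations, in the order (x, y).
inputs = [(x, y) for x in (false, true) for y in (false, true)]
xor_table     = [xor(x, y) for (x, y) in inputs]
implied_table = [implied_by(x, y) for (x, y) in inputs]

for ((x, y), a, b) in zip(inputs, xor_table, implied_table)
    println("x=$x y=$y  xor=$a  x|!y=$b")
end

# xor also works bitwise on integers.
bits = xor(0b1100, 0b1010)
```

The two tables differ in three of the four rows, so swapping one definition for the other would change the behavior of any code that relies on it.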

Milan Bouchet-Valat

Apr 20, 2015, 3:34:18 AM
to juli...@googlegroups.com
On Sunday, April 19, 2015 at 14:25 -0700, David van Leeuwen wrote:
> Hello,
>
>
> I've seen this discussed before, but wasn't able to find search terms
> to continue the discussion...
You're probably looking for this:
https://github.com/JuliaLang/julia/issues/1974


Regards

Jeff Bezanson

Apr 21, 2015, 10:14:43 AM
to juli...@googlegroups.com
I actually strongly agree with this. As ASCII characters have become
more precious, bitwise operators don't seem like a good use of them.
We could gain *three* more ASCII operators.

Stefan Karpinski

Apr 21, 2015, 10:17:07 AM
to juli...@googlegroups.com
The trouble is that using & and | to mean much besides what they currently mean is likely to be very confusing to people coming from many languages.

Marcus Appelros

Apr 21, 2015, 11:22:05 AM
to juli...@googlegroups.com
+1 for not having confusing shortcuts

Scott Jones

Apr 21, 2015, 11:45:04 AM
to juli...@googlegroups.com
Amen to that!
I just wish that using * for string concatenation had been avoided...
It just is confusing for people coming from any other language that I know of (I would have expected *, if used on strings at
all, to be a “repeat n times” operator, i.e. string * n gives me a new string with n copies..., or character * n might give me a string of length n all of that character).
I would like to have something like Lua’s convention of .., and also have it as syntactic sugar for the vcat operation.
It avoids confusion where * on an Array{UInt8} means multiply each element by n...
+ also has that problem, so I wouldn’t like either the suggestion of using + (like JavaScript or Python) for string concatenation...

Scott

Patrick O'Leary

Apr 21, 2015, 1:47:52 PM
to juli...@googlegroups.com
On Tuesday, April 21, 2015 at 10:45:04 AM UTC-5, Scott Jones wrote:
> Amen to that!
> I just wish that using * for string concatenation had been avoided...

It wouldn't free up any characters, though, to not use it, so it's somewhat different than the rest of this discussion.
 
> It just is confusing for people coming from any other language that I know of (I would have expected *, if used on strings at
> all, to be a “repeat n times” operator, i.e. string * n gives me a new string with n copies..., or character * n might give me a string of length n all of that character).

That is more typical. There have been *long* prior arguments/discussions about this, which you can search the list for. I might be the sole remaining defender of Julia's decision here, but there are a couple things about it which I personally like. In particular, addition (+) is typically commutative, but string concatenation is not. Multiplication is in general not commutative, so there is some sense in selecting that operator. If you need it, repeated concatenation is the exponentiation operator, so the same logic to the more typical +/* progression applies.
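
The commutativity point can be checked directly. A small sketch, assuming a current Julia release in which `*` concatenates strings and `^` repeats them:

```julia
# String concatenation with * is order-sensitive.
ab = "foo" * "bar"
ba = "bar" * "foo"

# Repeated concatenation is ^, mirroring how repeated multiplication is a power.
rep = "ab"^3

# Matrix multiplication is the standard non-commutative example.
A = [1 2; 3 4]
B = [0 1; 1 0]
commutes = A * B == B * A

println(ab, " ", ba, " ", rep, " ", commutes)
```

Here `A * B` swaps the columns of `A` while `B * A` swaps its rows, which is why the comparison fails.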
 
> I would like to have something like Lua’s convention of .., and also have it as syntactic sugar for the vcat operation.

It took me several readings to realize that the operator you were using wasn't the ellipsis token "..."; but yeah, concatenations are hard, and there's a long discussion of concatenation and related topics at https://github.com/JuliaLang/julia/issues/7128.
 
> It avoids confusion where * on an Array{UInt8} means multiply each element by n...
> + also has that problem, so I wouldn’t like either the suggestion of using + (like JavaScript or Python) for string concatenation...

We tried being super-disciplined about using dotted operators to denote elementwise operations even when one argument was a scalar.

The experiment: https://github.com/JuliaLang/julia/issues/5807
The outcome: https://github.com/JuliaLang/julia/issues/6417

Thanks for all the feedback; I hope you find some of those discussions informative!

Patrick

Scott Jones

Apr 21, 2015, 5:47:37 PM
to juli...@googlegroups.com


On Tuesday, April 21, 2015 at 1:47:52 PM UTC-4, Patrick O'Leary wrote:
> On Tuesday, April 21, 2015 at 10:45:04 AM UTC-5, Scott Jones wrote:
>> Amen to that!
>> I just wish that using * for string concatenation had been avoided...
>
> It wouldn't free up any characters, though, to not use it, so it's somewhat different than the rest of this discussion.
>
>> It just is confusing for people coming from any other language that I know of (I would have expected *, if used on strings at
>> all, to be a “repeat n times” operator, i.e. string * n gives me a new string with n copies..., or character * n might give me a string of length n all of that character).
>
> That is more typical. There have been *long* prior arguments/discussions about this, which you can search the list for. I might be the sole remaining defender of Julia's decision here, but there are a couple things about it which I personally like. In particular, addition (+) is typically commutative, but string concatenation is not. Multiplication is in general not commutative, so there is some sense in selecting that operator. If you need it, repeated concatenation is the exponentiation operator, so the same logic to the more typical +/* progression applies.

I always thought multiplication *was* commutative, at least that was what I was taught at school, and it seems all the examples on Google searching "commutative operations"
mention addition and multiplication as being commutative.
I suppose there is some rarified realm popular with Julia's designers where that is not true... but, it is NOT what most anybody (including people who went to MIT) would expect!

>> I would like to have something like Lua’s convention of .., and also have it as syntactic sugar for the vcat operation.
>
> It took me several readings to realize that the operator you were using wasn't the ellipsis token "..."; but yeah, concatenations are hard, and there's a long discussion of concatenation and related topics at https://github.com/JuliaLang/julia/issues/7128.
>
>> It avoids confusion where * on an Array{UInt8} means multiply each element by n...
>> + also has that problem, so I wouldn’t like either the suggestion of using + (like JavaScript or Python) for string concatenation...

I spent 29 years extending and improving a language where string + string meant take the numeric value of the strings, and add them (same for all numeric operations), so of course there
was a separate operator for concatenation, which was _ (so no names with _ :-( ).
Of course, people coming from Python, JS, Java were confused at first...
I've used a lot of different operators for concatenation, + (JS, Java, Python), & (Basic), _ (M,CacheObjectScript), .. (Lua), * (Julia), || (CLU), <> (Elixir) and others,
but the only one of those that doesn't overload another op in some confusible way is Lua's .. operator.

It is hard to pick *something* that doesn't end up causing problems elsewhere, but I thought that .. was the lesser of the evils...
Of course, one thing I tried that made me love Julia even more (warts and all ;-) ) was the ability to simply do this:

..(a::AbstractString,b::AbstractString) = a * b

The discussion about array concatenation in issue 7128 is very interesting... personally, I found it *very* confusing that ; meant something special within [ ], and that spaces are also significant... (something I *really* hated [as the compiler guru] in the language I maintained previously [wasn't my choice, the language in question was designed by committee, committees of doctors, not software engineers, it actually was the third language to become an ANSI standard (after Fortran and Cobol!)])

I do hope that having spaces be significant goes away... I'd hoped never to have to deal with that again!

> We tried being super-disciplined about using dotted operators to denote elementwise operations even when one argument was a scalar.
>
> The experiment: https://github.com/JuliaLang/julia/issues/5807
> The outcome: https://github.com/JuliaLang/julia/issues/6417
>
> Thanks for all the feedback; I hope you find some of those discussions informative!

Yes - it's good to look in on the design discussions!

I hope that it is not *too* utterly late to fix up some of the issues in the language... (like issues of concatenation between string and Char, having mutable strings in addition to
immutable ones [I see there is a package for that, I don't know how complete it is, or how well it performs, but I think it should be part of Base...])
I think Julia could be great for a lot of other things besides numerical computing (string processing, financial/medical applications [where decimal floating point becomes important],
dealing with databases, etc.), but I think a little bit of work still needs to be done...


> Patrick

Patrick O'Leary

Apr 21, 2015, 6:57:37 PM
to juli...@googlegroups.com
On Tuesday, April 21, 2015 at 4:47:37 PM UTC-5, Scott Jones wrote:
> I always thought multiplication *was* commutative, at least that was what I was taught at school, and it seems all the examples on Google searching "commutative operations"
> mention addition and multiplication as being commutative.
> I suppose there is some rarified realm popular with Julia's designers where that is not true... but, it is NOT what most anybody (including people who went to MIT) would expect!

One such "rarified realm," as you put it, is linear algebra, which is an important use case for us. While we hope Julia to be suitable as a general-purpose language, a key part of its attraction for me is that it does math--in particular, linear algebra--well. And doing mathy things (quickly!) is very much a design goal; math-like objects exploit multimethods well. A quick look at our standard dependencies (BLAS? FFTW? SuiteSparse?) I think does a good job of showing where most of the developers' heads are at.

That sometimes means that we hit trades most other languages don't spend too much time on. An ongoing example of such a discussion regards "vectorizing" common functions--which is a common feature in the technical languages from which Julia draws a lot of inspiration. But unlike some of those languages, Julia distinguishes between array-typed things and scalar-typed things, and as the language (and its ecosystem) grows, needing to provide two implementations of every single-argument function starts to get repetitive, and there's a strong push towards requiring the use of the map(f, A) construct. This doesn't sit well with everyone.
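
To make that trade-off concrete, here is a sketch of applying a scalar-only function elementwise without writing a second, array-typed method. `map(f, A)` is the construct named above; the dot-call form `f.(A)` is, to my knowledge, syntax from later Julia releases, so treat it as an assumption about the version you run. `halve` is a made-up example function.

```julia
# A scalar-only function; no array method is defined for it.
halve(x::Number) = x / 2

A = [2.0, 4.0, 6.0]

via_map = map(halve, A)   # explicit elementwise application
via_dot = halve.(A)       # broadcast ("dot call") syntax in current Julia

println(via_map)
```

Either spelling keeps `halve` itself scalar, which is the point of the push described above.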

Likewise string concatenation, matrix construction (the space sensitivity doesn't bother me as a MATLAB user whose needs primarily fall in the 2-or-fewer-array-dimensions cases, but what "," vs. ";" will do tends to confuse me), etc.

The ".." operator may get taken for a particular meaning related to getfield overloading (https://github.com/JuliaLang/julia/pull/5848), so that's something to watch out for.

Base is unlikely to get much bigger; it will probably get smaller over time in part to better support general-purpose programming without the baggage all this silly math carries with it. The plan is something along the lines of preparing a "standard distribution", and you can see a lot of packages currently owned by JuliaLang on GitHub which will likely become a part of that when it comes together. That way you can still get your batteries, but can remove batteries you don't need.

For Char*String/String*Char, see https://github.com/JuliaLang/julia/issues/1771 (though that devolves into concatenation operator discussion, rather unfortunately.) There's probably a good argument for trying to add those again...

One nice thing is that it's really not troublesome to take on, say, binary-coded decimal types as an external package. The same machinery (method specializations, `convert()`, and `promote_type()`) that powers the standard Julia numeric type hierarchy works just fine outside of Base.
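
As a rough illustration of that machinery, the sketch below defines a toy fixed-point type outside of Base. It assumes a current Julia release; `Hundredths` is a hypothetical name and a stand-in for a real decimal type, not binary-coded decimal itself. The hook a package defines is `promote_rule`, which `promote_type` consults.

```julia
# Toy fixed-point type storing hundredths as an integer (hypothetical name).
struct Hundredths <: Number
    n::Int   # the value multiplied by 100
end

# Conversion from integers scales into hundredths.
Base.convert(::Type{Hundredths}, x::Integer) = Hundredths(100 * x)

# When mixed with an integer, promote both operands to Hundredths.
Base.promote_rule(::Type{Hundredths}, ::Type{<:Integer}) = Hundredths

# Arithmetic only needs to be defined for matching types.
Base.:+(a::Hundredths, b::Hundredths) = Hundredths(a.n + b.n)

total   = Hundredths(150) + 2   # 1.50 + 2, via promotion of the integer
flipped = 2 + Hundredths(25)    # promotion works in either argument order

println(total.n, " ", flipped.n)   # raw hundredths
```

Only the same-type `+` method is written by hand; the mixed-type calls fall through to the generic promotion fallback for numbers.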

Hoping this doesn't come off as a rant (I promise it's not!),

Patrick

Scott Jones

Apr 21, 2015, 8:49:24 PM
to juli...@googlegroups.com
On Tuesday, April 21, 2015 at 6:57:37 PM UTC-4, Patrick O'Leary wrote:
> On Tuesday, April 21, 2015 at 4:47:37 PM UTC-5, Scott Jones wrote:
>> I always thought multiplication *was* commutative, at least that was what I was taught at school, and it seems all the examples on Google searching "commutative operations"
>> mention addition and multiplication as being commutative.
>> I suppose there is some rarified realm popular with Julia's designers where that is not true... but, it is NOT what most anybody (including people who went to MIT) would expect!
>
> One such "rarified realm," as you put it, is linear algebra, which is an important use case for us. While we hope Julia to be suitable as a general-purpose language, a key part of its attraction for me is that it does math--in particular, linear algebra--well. And doing mathy things (quickly!) is very much a design goal; math-like objects exploit multimethods well. A quick look at our standard dependencies (BLAS? FFTW? SuiteSparse?) I think does a good job of showing where most of the developers' heads are at.

Well, after I posted that I remembered something about cases where multiplication wasn't commutative, from 18.06 (Linear Algebra at MIT) with Gilbert Strang, however,
I think the designers of Julia should think a bit more about what would make sense to the general programmer population, especially when it comes to something that *isn't*
a mathematical operation... (at least, if they'd like Julia to become more generally accepted outside of a niche, and without a lot of programmers throwing virtual tomatoes at the designers for some non-intuitive choices).
I would posit that probably over 95% of "general" programmers would consider + for concatenation, and * for repetition, to make a lot more sense for string operations, if you have
to be overloading math operators for strings...
If you'd had * for repetition, and treated characters as 1 character strings where it makes sense, then you could have nice things like:

20' ' to make a string with 20 spaces, which would be consistent with 20n meaning 20*n...

If you didn't like + because of commutativity, why not pick / for string concatenation, which everyone would agree is not commutative, unlike *, which most everybody *does* think
is commutative (and would associate with having multiple copies of something [even the word multiply screams repetition!])

> That sometimes means that we hit trades most other languages don't spend too much time on. An ongoing example of such a discussion regards "vectorizing" common functions--which is a common feature in the technical languages from which Julia draws a lot of inspiration. But unlike some of those languages, Julia distinguishes between array-typed things and scalar-typed things, and as the language (and its ecosystem) grows, needing to provide two implementations of every single-argument function starts to get repetitive, and there's a strong push towards requiring the use of the map(f, A) construct. This doesn't sit well with everyone.
>
> Likewise string concatenation, matrix construction (the space sensitivity doesn't bother me as a MATLAB user whose needs primarily fall in the 2-or-fewer-array-dimensions cases, but what "," vs. ";" will do tends to confuse me), etc.
>
> The ".." operator may get taken for a particular meaning related to getfield overloading (https://github.com/JuliaLang/julia/pull/5848), so that's something to watch out for.
>
> Base is unlikely to get much bigger; it will probably get smaller over time in part to better support general-purpose programming without the baggage all this silly math carries with it. The plan is something along the lines of preparing a "standard distribution", and you can see a lot of packages currently owned by JuliaLang on GitHub which will likely become a part of that when it comes together. That way you can still get your batteries, but can remove batteries you don't need.

Well, I didn't mean to imply that I thought the math was *silly* ;-)  I used to love math too, but my life revolves around software architecture, language design, databases and performance issues instead these days...

Yes, I'd prefer to see it get *much* smaller, and have a Julia-lite, with just essentials, a Julia-standard, and possibly a Julia-GPL (including stuff like SuiteSparse, Rmath, and FFTW)...

> For Char*String/String*Char, see https://github.com/JuliaLang/julia/issues/1771 (though that devolves into concatenation operator discussion, rather unfortunately.) There's probably a good argument for trying to add those again...
>
> One nice thing is that it's really not troublesome to take on, say, binary-coded decimal types as an external package. The same machinery (method specializations, `convert()`, and `promote_type()`) that powers the standard Julia numeric type hierarchy works just fine outside of Base.

Yes, and I plan on doing just that ;-) 

> Hoping this doesn't come off as a rant (I promise it's not!),
>
> Patrick


No, no, and I hope my responses don't come off that way either!
I enjoy debating these sorts of things (politely! ;-) )
 
Scott 

Jeff Bezanson

Apr 21, 2015, 11:15:48 PM
to juli...@googlegroups.com
I greatly prefer `string(a, b, c)` and `repeat` for concatenating and
repeating strings. Strings are, in fact, a monoid, and I think the use
of * and ^ is highly justifiable, but I'd be just as happy to see them
go.
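
A short sketch of the function-call spellings next to the operator ones, assuming a current Julia release. The last two checks spell out what "monoid" means here: concatenation is associative and the empty string is its identity element.

```julia
a, b, c = "foo", "bar", "baz"

# Function-call spellings of concatenation and repetition.
joined   = string(a, b, c)
repeated = repeat("ab", 3)

# The operator spellings give the same results.
same_concat = joined == a * b * c
same_repeat = repeated == "ab"^3

# Monoid laws: associativity, and the empty string as identity element.
associative = (a * b) * c == a * (b * c)
identity_ok = "" * a == a && a * "" == a

println(joined, " ", repeated)
```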

I'd love to have a really good decimal type available in julia.

Stefan Karpinski

Apr 22, 2015, 3:52:09 PM
to juli...@googlegroups.com
I'd also be happy to get rid of * for string concatenation. We could use ++ for concatenation in general, a la Haskell. There's not really any need for an operator for repeated concatenation. There was once a branch that made string juxtaposition do string concatenation, so you could write

"foo"  "bar" # "foobar"
 foo   "bar" # "$(foo)bar"
"foo"   bar  # "foo$bar"
 foo "" bar  # "$foo$bar"


David Anthoff

Apr 22, 2015, 4:03:08 PM
to juli...@googlegroups.com

Here is a vote for just using + for string concatenation. No deep philosophical reason, it simply seems the choice of least unfamiliarity for the largest number of people. I think with + the topic of how string concatenation should be done would not come up again on this list. With any other choice, I’m sure this will resurface again and again and again :) I’ve read all the coherent arguments for other choices, but in this case I’m more convinced by “keep it familiar/simple for many people”.

Stefan Karpinski

Apr 22, 2015, 4:22:24 PM
to juli...@googlegroups.com
+ isn't happening – avoiding operator punning is crucial in a multiple dispatch language.

Scott Jones

Apr 22, 2015, 4:29:15 PM
to juli...@googlegroups.com
But then, why * in the first place? 

Sent from my iPhone

Stefan Karpinski

Apr 22, 2015, 4:33:19 PM
to juli...@googlegroups.com
Because, strings do form a non-commutative monoid and in academic literature on strings and parsing, multiplication – usually written as juxtaposition – is used for string concatenation. Whether that's a pun or not is a bit fuzzy. Using addition – which implies commutativity, mathematically – for concatenation is definitely a pun, however.

Carlo Baldassi

Apr 22, 2015, 4:46:45 PM
to juli...@googlegroups.com
> I might be the sole remaining defender of Julia's decision here

No you are not :) I also quite like it. Or at the very least don't hate it. Also, don't see what would be gained by removing it, or the point in changing it.
As for using ++, it seems the only "advantage" would be that it's already used in another language, and it looks similar to python's +. Generally speaking, I still think that learning one operator or the other requires the same amount of effort from a novice (and quite a small effort, actually). The disadvantage of ++ is 1) a dedicated operator (implying ++ could not be used for anything else in the future) and 2) losing ^ for repetition, for no reason.

Also, I'm repeating old stuff I already wrote elsewhere, but so is everybody else :) I really don't understand the amount of heat this topic generates. It's amazing how this discussion on string concatenation completely overcame the original topic (bitwise operators).


> But then, why * in the first place?

because:


> Strings are, in fact, a monoid, and I think the use of * and ^ is highly justifiable,

Scott Jones

Apr 22, 2015, 4:53:17 PM
to juli...@googlegroups.com
Ok, I've seen juxtaposition for string concatenation in literature, but not *...
I must not be well read enough! 
I'd agree that + would be a bad idea, but ++ would confuse lots of programmers, and juxtaposition I think would break tons of stuff.
I thought Lua's .. would be good, but somebody else said that would cause problems elsewhere (I hadn't seen any ..'s yet in Julia, but I know I'm a newbie in Julian...)
Personally, I think it should be a new 2 character sequence, that does not currently have a meaning, in Julia or any major languages...
C, C++, Java, JavaScript, Python, Ruby...

Sent from my iPhone

Stefan Karpinski

Apr 22, 2015, 5:14:26 PM
to juli...@googlegroups.com
On Wed, Apr 22, 2015 at 4:53 PM, Scott Jones <scott.pa...@gmail.com> wrote:
Personally, I think it should be a new 2 character sequence, that does not currently have a meaning, in Julia or any major languages...
C, C++, Java, JavaScript, Python, Ruby...

I suspect that's a very small set of sequences.

Gustavo Goretkin

Apr 22, 2015, 5:57:37 PM
to juli...@googlegroups.com
If there is a dedicated concatenation operator (say, ++), then would it also work on 1-D arrays (or more generally on arrays with compatible dimensions)?

Stefan Karpinski

Apr 22, 2015, 6:07:11 PM
to juli...@googlegroups.com
Yes, that seems reasonable – doing `v++w` would be equivalent to `[v;w]` for vectors at the very least.
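
What such an operator could look like can be sketched in user code. This assumes a current Julia release, where (as far as I know) `++` parses as an infix operator but has no methods in Base; the two definitions below are hypothetical and not an existing API.

```julia
# Hypothetical definitions: `++` is not defined in Base.
++(a::AbstractString, b::AbstractString) = string(a, b)
++(v::AbstractVector, w::AbstractVector) = vcat(v, w)

s = "foo" ++ "bar"
u = [1, 2] ++ [3, 4]

# The vector case matches the bracket syntax mentioned above.
same_as_brackets = u == [[1, 2]; [3, 4]]

println(s, " ", u)
```

Only two-operand uses are shown; how a longer chain such as `a ++ b ++ c` is parsed is not something this sketch relies on.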

Tim Holy

Apr 22, 2015, 6:10:03 PM
to juli...@googlegroups.com
There's always Pizza-Pizza (https://github.com/JuliaLang/julia/issues/3721).
Maybe we could even get some sponsorship from Little Caesars?

--Tim

Ryan Northrup

Apr 22, 2015, 10:06:10 PM
to juli...@googlegroups.com

I'd personally lean toward ++ for string concatenation (and other sorts of lists, while we're at it); this is already the case in a few other languages, so there's already at least some precedent/inertia in that direction.

Jeff Bezanson

Apr 23, 2015, 1:48:32 AM
to juli...@googlegroups.com
Changes I'd consider worthwhile would be (1) use a general
concatenation operator like .. or ++ instead of *, or (2) don't use an
infix operator for string concatenation. Using an infix operator here
pretty much requires special N-ary parsing to avoid O(n^2)
performance, which we currently do for *. However I've been
increasingly unsure I actually like this particular feature.
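
The N-ary parsing can be observed by inspecting the parsed expression. A minimal sketch, assuming a current Julia release where `Meta.parse` is available: a chain of `*` becomes a single call with all operands, so a string method can build the result in one pass instead of copying a growing prefix at every pairwise step.

```julia
# Parse (without evaluating) a chained multiplication.
ex = Meta.parse("a * b * c * d")

# One call node with four operands, rather than nested two-argument calls.
println(ex.head, " ", ex.args)

# Subtraction, by contrast, parses as nested binary calls.
sub = Meta.parse("a - b - c")
println(sub.args)
```

With nested binary calls, concatenating n strings of similar length would copy on the order of n² characters in total, which is the performance trap referred to above.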

Marcus Appelros

Apr 23, 2015, 3:15:28 AM
to juli...@googlegroups.com

Changing to a new operator will result in both current Julia users and new ones having to look it up whereas keeping * means only new ones will have to find it and it will be much easier to find since it has gained lots of ground.

mschauer

Apr 23, 2015, 4:58:02 AM
to juli...@googlegroups.com
On Thursday, April 23, 2015 at 12:07:11 AM UTC+2, Stefan Karpinski wrote:
> Yes, that seems reasonable – doing `v++w` would be equivalent to `[v;w]` for vectors at the very least.

Then we can also have `v.++w` equivalent to `[v w]`!

 

Tim Holy

Apr 23, 2015, 6:45:21 AM
to juli...@googlegroups.com
Frankly I do not understand the need for an infix operator when string(a, b, c)
works so well and lacks nasty performance traps.

--Tim

Tamas Papp

Apr 23, 2015, 7:00:28 AM
to juli...@googlegroups.com
I have been wondering the same ever since I saw *(s::AbstractString...)
in Julia. Given that Julia is not primarily focused on string
processing, and it has nice interpolation and formatting facilities for
messages etc, why does it even need an infix string operator?

In v"0.4.0-dev+4234", there are more than 120 methods for *, and a quick
examination suggests that with the single exception above, they are
_all_ about some kind of multiplication (scalars, matrices, dates,
...). Non-commutative monoids notwithstanding, *(s::AbstractString...)
is really the odd one out.

Best,

Tamas

Scott Jones

Apr 23, 2015, 7:52:49 AM
to juli...@googlegroups.com
I’d be concerned that with a++b, that it would seem like it might mean: a + (+b).

What about using ~ as a general concatenation operator for strings and arrays?
It only seems to be used as a unary operator, so there is no confusion or overloading problems if it is also defined as a binary operator...

It is used as a concatenation operator in Perl (6) and D.

"foo" ~ "bar"

Looks fine to me, as does
myarray ~= "appending this"

Scott

Scott Jones

Apr 23, 2015, 7:58:34 AM
to juli...@googlegroups.com
Besides ~, I think ! would work well also... it is only used as a unary operator, so you could have:

"Julia" ! " is a very interesting " ! "language"

(and ! is easier to find on a keyboard usually than ~)

Scott


Toivo Henningsson

Apr 23, 2015, 8:02:46 AM
to juli...@googlegroups.com
Binary ~ is used in DataFrames and also in PatternDispatch.jl

Pierre-Yves Gérardy

Apr 23, 2015, 8:37:03 AM
to julia-dev
FWIW, I also like `..` for concatenation. In Lua, its default
implementation is indeed variadic (and, what's more, it is right
associative, maybe in order to make binary implementations built on
top of ropes more efficient?).

Another possibility would be `hcat`. A matrix of strings doesn't make
much sense, or does it?

`["foo" bar "baz"]` -> `"foo$(bar)baz"`

The trouble is, I don't think it's possible to efficiently dispatch on
"a series of arguments, at least one of which is a string".
`Union{String,Any}...` is factored down to `Any...`.
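
Both observations are easy to check. A small sketch, assuming a current Julia release:

```julia
# A Union that contains Any collapses to Any, so it cannot select a method.
collapsed = Union{String, Any} === Any

# hcat syntax on strings builds a 1×2 matrix of strings rather than joining them.
m = ["foo" "bar"]

println(collapsed, " ", size(m), " ", eltype(m))
```

Because the union carries no more information than `Any`, a method taking `Union{String,Any}...` would match every call, not only those with at least one string argument.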
—Pierre-Yves

David van Leeuwen

Sep 16, 2015, 5:14:55 AM
to julia-dev
Hello, 

I am sorry to have to revive this thread.  The discussion seems to have been diverted to "why does * do string concatenation"---which was not the issue here. 

There are a lot of things suggested in the thread linked by Milan (quoted below); I think the overloadable `getfield()` interpretation of `.` is attractive.  The main reason I would want to free `$` for a custom getfield is its use in DataFrames to mimic a more R-like syntax for accessing table columns.  Overloading `.` could create ambiguities with DataFrame's own fields.  

My argument about `$` being used as "exclusive or", again, is 
 - it seems a very expensive "waste" of ASCII $ as an operator, only used for Bool type
 - it could be replaced by `^` (raise to the power)
    - `^` for xor is used in some other languages
    - the current definition of `^(a::Bool, b::Bool)` does not make any sense (is this trying to implement `Int(a)^Int(b)`? Is 0^0 not ill-defined?  Do we really need to raise a boolean to the power of another boolean?)

So, I would be very pleased if for v0.5-dev the use of `$` can be reconsidered, and hopefully in a way that will allow the syntactic sugar that will make it easier to select DataFrame columns (and perhaps dictionary keys, NamedArray indices, etc. as well...)
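
For comparison, the overloadable-dot route can be sketched as follows. To my knowledge the dot became overloadable through `Base.getproperty` in later Julia releases, so this assumes such a release. `ColumnStore` is a hypothetical wrapper type for illustration, not the DataFrames.jl API.

```julia
# Hypothetical wrapper type; not the DataFrames.jl API.
struct ColumnStore
    columns::Dict{Symbol, Vector}
end

# Route `t.name` to a column lookup. `getfield` still reaches the real field,
# which is one way around the clash between column names and struct fields.
Base.getproperty(t::ColumnStore, name::Symbol) = getfield(t, :columns)[name]

t = ColumnStore(Dict(:age => [31, 45], :name => ["Ann", "Bo"]))

ages       = t.age                          # sugar for the lookup below
ages_again = getfield(t, :columns)[:age]
second_age = t.age[2]

println(ages, " ", second_age)
```

The ambiguity mentioned above shows up here as well: with this definition `t.columns` looks for a column called `columns` instead of returning the struct field.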

Cheers, 

---david


On Tuesday, April 21, 2015 at 4:14:43 PM UTC+2, Jeff Bezanson wrote:
I actually strongly agree with this. As ASCII characters have become
more precious, bitwise operators don't seem like a good use of them.
We could gain *three* more ASCII operators.

On Mon, Apr 20, 2015 at 3:34 AM, Milan Bouchet-Valat <nali...@club.fr> wrote:
> On Sunday, April 19, 2015 at 14:25 -0700, David van Leeuwen wrote:
>> Hello,
>>
>>
>> I've seen this discussed before, but wasn't able to find search terms
>> to continue the discussion...
> You're probably looking for this:
> https://github.com/JuliaLang/julia/issues/1974
>
>
> Regards
>
>> I'm wondering what the actual usage of `$` as bitwise exclusive or in
>> practice is (in current packages and other code), and whether it is
>> possible to change this to `^` (which is is currently defined as `x ^
>> y = x | !y`, which, to me, appears to be the logic for "y implies
>> x").
>>
>>
>> It would be really cool if x$y (without spaces) could be syntactic for
>> x[:y] which would be useful in DataFrames, but has also use cases in
>> Dicts.
>>
>>
>> Cheers,
>>
>>
>> ---david
>
>
>

Scott Jones

Sep 16, 2015, 8:23:26 AM
to julia-dev
I'd agree, it would be good to get rid of $ as bitwise XOR, and either just use an xor function, or have "xor" as an infix operator
(what other languages use $ for bitwise XOR? I haven't found any, and since it isn't used all that heavily, and most programmers would expect it to be ^ anyway, it should be deprecated).

Tamas Papp

Sep 16, 2015, 8:33:07 AM
to juli...@googlegroups.com
Just a clarifying question: in dataframe$col, is col planned to be a
variable (that has a key that identifies the column) or a symbol (that
is interpreted as a key per se)?

In R it is the latter, dataframe$col is equivalent to
dataframe[["col"]]. If that's what is planned for Julia then I guess $
would need to be a macro.
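
The reason a macro is needed can be shown with a small sketch: a macro sees the name `col` before it is evaluated and can turn it into a literal key, whereas a function would only ever receive the value of a variable called `col`. The `@col` macro below is hypothetical, written for a current Julia release, and uses a `Dict` in place of a data frame.

```julia
# Hypothetical macro: `@col df name` expands to `df[:name]`, so `name` is
# treated as a literal key and never evaluated as a variable.
macro col(container, name)
    name isa Symbol || error("@col expects a bare name, got $name")
    return :($(esc(container))[$(QuoteNode(name))])
end

df = Dict(:height => [1.7, 1.8], :weight => [65.0, 80.0])
height = "unrelated variable"   # not consulted by the macro

h = @col df height
println(h)
```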

I have nothing against this, just curious.

Best,

Tamas

On Wed, Sep 16 2015, David van Leeuwen <david.va...@gmail.com> wrote:

> So, I would be very pleased if for v0.5-dev the use of `$` can be
> reconsidered, and hopefully in a way that will allow the syntactic sugar
> that will make it easier to select DataFrame columns (and perhaps
> dictionary keys, NamedArray indices, etc. as well...)
>
> Cheers,
>
> ---david
>
>
> On Tuesday, April 21, 2015 at 4:14:43 PM UTC+2, Jeff Bezanson wrote:
>>
>> I actually strongly agree with this. As ASCII characters have become
>> more precious, bitwise operators don't seem like a good use of them.
>> We could gain *three* more ASCII operators.
>>
>> On Mon, Apr 20, 2015 at 3:34 AM, Milan Bouchet-Valat <nali...@club.fr> wrote:
>> > On Sunday, April 19, 2015 at 14:25 -0700, David van Leeuwen wrote:
>> >> Hello,
>> >>
>> >>
>> >> I've seen this discussed before, but wasn't able to find search terms
>> >> to continue the discussion...
>> > You're probably looking for this:
>> > https://github.com/JuliaLang/julia/issues/1974

David Gold

Sep 16, 2015, 12:25:58 PM
to julia-dev
@Tamas I suspect that having `col` be a variable will incur too much collision with other names, e.g. methods.

If folks are thinking about including either parser support or infix macro support for $-based DataFrame indexing, then it might also be appropriate to consider whether or not such support could ameliorate type-uncertainty as well. Some suggestions in https://github.com/JuliaStats/DataFrames.jl/issues/744 have centered around wrapping field names (as symbols) as the parameter of a `Field` type. Then, given a sufficiently informative type parametrization for DataFrame objects, `df[Column{:col}, i]` is amenable to static analysis. So one might think about having `df$col` expand to `df[Column{:col}]`. But this involves a certain level of magic just to get type-inferrable indexing off the ground.
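
A rough sketch of that idea, assuming a current Julia release. `Column` follows the name used in the indexing example above; `TypedTable` and its named-tuple storage are hypothetical stand-ins for "a sufficiently informative type parametrization", not the DataFrames.jl API.

```julia
# Hypothetical types for illustration; not the DataFrames.jl API.
struct Column{name} end            # the column name lives in the type parameter

struct TypedTable{C<:NamedTuple}
    cols::C                        # e.g. (a = [1, 2, 3], b = ["x", "y", "z"])
end

# Dispatching on the type Column{name} makes `name` a compile-time constant,
# so the type of the returned column is available to inference.
Base.getindex(t::TypedTable, ::Type{Column{name}}) where {name} =
    getfield(t.cols, name)
Base.getindex(t::TypedTable, c::Type{<:Column}, i::Integer) = t[c][i]

t = TypedTable((a = [1, 2, 3], b = ["x", "y", "z"]))

col_a = t[Column{:a}]
b2    = t[Column{:b}, 2]

println(col_a, " ", b2)
```

A `df$col` syntax that expanded to `df[Column{:col}]` would hide the type-parameter spelling, which is the "magic" referred to above.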

In any case, the point is that there are other issues apart from notational brevity that could be addressed with custom parsing/expansion of `$` for DataFrame indexing. Another feature that would be cool to support would be "environment reification" in which one could do something like

with(df) do
    x = 0.0
    for i in eachindex($col)
        x += $col[i]
    end
    x
end

David van Leeuwen

Sep 17, 2015, 9:24:56 AM
to julia-dev
Hi,


On Wednesday, September 16, 2015 at 2:33:07 PM UTC+2, Tamas Papp wrote:
Just a clarifying question: in dataframe$col, is col planned to be a
variable (that has a key that identifies the column) or a symbol (that
is interpreted as a key per se)?

In R it is the latter, dataframe$col is equivalent to
dataframe[["col"]]. If that's what is planned for Julia then I guess $
would need to be a macro.
 
Yes, what I had in mind is that df$col is (some form of syntactic sugar for) df[:col], and that df$col[i] means df[i,:col].

R also has late binding, I believe, which makes it possible to use "col" in a function call and clever things like formulas and such, where the function will interpret "col" in the right dataframe.  That will be a bit hard in Julia, so I don't mind that in such cases you'd have to work with :col.  But for direct access of the column, df$col would be rather handy...

---david

Jeffrey Sarnoff

Sep 17, 2015, 9:48:27 AM
to julia-dev

Were multiplication commutative, quantum entanglement would be tangle-free and children could give birth to their parents.

Tamas Papp

Sep 17, 2015, 10:04:13 AM
to juli...@googlegroups.com
On Thu, Sep 17 2015, David van Leeuwen <david.va...@gmail.com> wrote:

> Hi,
>
> On Wednesday, September 16, 2015 at 2:33:07 PM UTC+2, Tamas Papp wrote:
>>
>> Just a clarifying question: in dataframe$col, is col planned to be a
>> variable (that has a key that identifies the column) or a symbol (that
>> is interpreted as a key per se)?
>>
>> In R it is the latter, dataframe$col is equivalent to
>> dataframe[["col"]]. If that's what is planned for Julia then I guess $
>> would need to be a macro.
>>
>
> Yes, what I had in mind is that df$col is (some form of syntactic sugar
> for) df[:col], and that df$col[i] means df[i,:col].
>
> R also has late binding, I believe, which makes it possible to use "col" in
> a function call and clever things like formulas and such, where the
> function will interpret "col" in the right dataframe. That will be a bit
> hard in Julia, so I don't mind that in such cases you'd have to work with
> :col. But for direct access of the column, df$col would be rather handy...

I am not experienced with programming language design, but I teach a
course that uses R and have seen students get confused by this
feature. For example, if someone wants to generalize

do_analysis(df$col1, df$col2) ## R code

with

## R code
general_analysis <- function(df, col1, col2, ...) {
  ## df$col1 looks for a column literally named "col1",
  ## not the column named by the argument col1
  do_analysis(df$col1, df$col2)
  ...
}

then they get stuck unless they are familiar with R and use

## R code
general_analysis <- function(df, col1, col2, ...) {
  ## df[[col1]] uses the value of col1 as the column name
  do_analysis(df[[col1]], df[[col2]])
  ...
}

This is much worse in formulas, of course, they have to know more arcane
functions to substitute there.

I have not used DataFrames.jl extensively so I don't know whether this
would be a problem for Julia. I guess having a single [] accessor
simplifies things. But then I don't really see the need for the magic $
in the context of accessing columns.
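
A small sketch of the single-accessor point in Julia, assuming a plain `Dict` of `Symbol` keys as a stand-in for a data frame (so the example runs without DataFrames.jl). `do_analysis` and `general_analysis` are hypothetical functions mirroring the R example above.

```julia
# Stand-in for a data frame: column names (Symbols) mapped to vectors.
df = Dict(:height => [1.0, 2.0, 3.0], :weight => [2.0, 4.0, 6.0])

# Hypothetical analysis: the dot product of two columns.
do_analysis(x, y) = sum(x .* y)

# col1 and col2 are ordinary variables holding Symbols, so the same
# indexing syntax serves both literal and computed column names.
general_analysis(df, col1, col2) = do_analysis(df[col1], df[col2])

direct  = do_analysis(df[:height], df[:weight])
general = general_analysis(df, :height, :weight)
```

Because `df[:height]` and `df[col1]` are the same operation, generalizing the call does not require switching to a different accessor.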

Best,

Tamas

Scott Jones

Sep 17, 2015, 11:13:11 AM
to julia-dev
I'm all for getting rid of $ as xor, because it can be rather confusing, but not just to reuse it for some even more limited use case that would be just as confusing.
Maybe it's my old Scheme days coming through, but I'd rather not have extra syntax in many cases (something has to be heavily used, by lots of users of the language, with an alternative that is really painful or long, before I'd want syntactic sugar for it).

Tom Breloff

Sep 17, 2015, 11:25:35 AM
to juli...@googlegroups.com
I'm with Scott... I think it's a bad idea to add $ as something specific to dataframes.  What could be cool (and doesn't need to be in base!) is a macro that would handle the syntactic sugar for you:

@$ begin
    df = DataFrame(mydata)
    z = df$colname1
    q = df$colname2
end

This sort of stuff isn't crazy hard to make, it doesn't touch julia's parsing, it's opt-in, etc.  You can't use $ as a macro name right now, but you get the point.  Is this feasible? 

David van Leeuwen

Sep 17, 2015, 12:13:53 PM
to julia-dev
Hello, 


On Thursday, September 17, 2015 at 5:25:35 PM UTC+2, Tom Breloff wrote:
I'm with Scott... I think it's a bad idea to add $ as something specific to dataframes.  What could be cool (and doesn't need to be in base!) is a macro that would handle the syntactic sugar for you:

There are more use cases.  Json-like structures, now accessed as `json["result"]["data"][1]` could be written as json$result$data[1] with proper operator precedence and some form of late binding.  There is some string<->symbol stuff going on in this example, but most json keys would be plain ascii. 
 

@$ begin
    df = DataFrame(mydata)
    z = df$colname1
    q = df$colname2
end

Macros are just too hard for me to comprehend...

But on a different note, I realized that I get almost what I want by overloading `$` as is:

Base.(:$)(df::DataFrame, col::Symbol) = df[col]

Now `df$:col` works, but unfortunately `df$:col[1]` needs to be written as `(df$:col)[1]` which kind-of defeats the purpose. 
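
The precedence problem can be reproduced in a self-contained way. The sketch below assumes a recent Julia 1.x, where infix `$` is, as far as current releases go, no longer available, so `|` stands in for it. `Table` is a hypothetical wrapper type used to avoid overloading an operator on types the example does not own.

```julia
# Hypothetical stand-in for a data frame: named columns of numbers.
struct Table
    cols::Dict{Symbol,Vector{Float64}}
end

Base.getindex(t::Table, col::Symbol) = t.cols[col]

# Infix lookup, playing the role of the `$` overload above.
Base.:|(t::Table, col::Symbol) = t[col]

t = Table(Dict(:col => [10.0, 20.0]))

whole      = t | :col        # the whole column
first_elem = (t | :col)[1]   # parentheses are required here

# Indexing binds tighter than an infix operator, so without the
# parentheses the right-hand operand is parsed as `:col[1]`.
parsed = Meta.parse("t | :col[1]")
rhs    = parsed.args[3]      # an indexing expression, not a bare symbol
```

Inspecting `parsed` shows why an operator overload alone cannot give `df$col[i]` the intended meaning: the grouping is decided by the parser before any method is called.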

This sort of stuff isn't crazy hard to make, it doesn't touch julia's parsing, it's opt-in, etc.  You can't use $ as a macro name right now, but you get the point.  Is this feasible? 

How would this work? `@$ df col`?  This doesn't make things more readable imho.  

---david

Scott Jones

Sep 17, 2015, 12:31:01 PM
to julia-dev
I still don't see any benefit from that; it seems to make things a lot less understandable,
and 1) it should be able to be handled by macros, and 2) it isn't really going to be that commonly
used.

Tom Breloff

Sep 17, 2015, 12:44:05 PM
to juli...@googlegroups.com
From a high level, macros give you access to the parsed source code, let you make changes to it through the AST, then plop the new code where the old code was.  So

@$ begin
    df = DataFrame(mydata)
    z = df$colname1
    q = df$colname2
end

would effectively be replaced with

df = DataFrame(mydata)
z = df[:colname1]
q = df[:colname2]

No need to force parsing cases on julia... you can change the parsing rules yourself with macros. 
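
A rough sketch of such a rewriting macro, assuming a recent Julia 1.x. Since `$` is not usable as an infix operator or as a macro name there, the example uses `|` as the infix marker and `@sugar` as the macro name. Both are arbitrary choices made for illustration, as is the `rewrite` helper, and a `Dict` stands in for the DataFrame.

```julia
# Hypothetical helper: anything that is not an expression passes through.
rewrite(x) = x

# Recursively turn `a | name` (name a bare identifier) into `a[:name]`.
function rewrite(ex::Expr)
    if ex.head === :call && length(ex.args) == 3 &&
       ex.args[1] === :| && ex.args[3] isa Symbol
        return Expr(:ref, rewrite(ex.args[2]), QuoteNode(ex.args[3]))
    end
    return Expr(ex.head, map(rewrite, ex.args)...)
end

# Opt-in sugar: only code inside the macro call is rewritten.
macro sugar(block)
    esc(rewrite(block))
end

table = Dict(:colname1 => [1, 2, 3], :colname2 => [4.0, 5.0])

@sugar begin
    z = table | colname1
    q = table | colname2
end
```

Inside the block, bitwise or on bare identifiers is no longer available, which is the usual price of this kind of local syntax change. Chained lookups such as `a | b | c` become `a[:b][:c]` because `|` is left-associative.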

John Myles White

Sep 17, 2015, 12:45:54 PM
to juli...@googlegroups.com
Can we move this thread over to julia-users?

  -- John

Pierre-Yves Gérardy

Sep 17, 2015, 2:59:36 PM
to julia-dev
If we can't debate language design, what is on topic here?

—Pierre-Yves

Stefan Karpinski

Sep 17, 2015, 3:03:26 PM
to juli...@googlegroups.com
I think this is a fine topic for julia-dev, but it doesn't seem to be going anywhere and we've got more pressing matters to worry about than futzing with changing the meanings of operators.

Andy Ferris

Sep 27, 2015, 10:47:45 AM
to julia-dev
Stefan and Jeff - surely the monoid in question is much more like a tensor sum \oplus (which is non-commutative) than a product? This is exactly vector concatenation, and from the mathematical point of view a string is a bit like a vector of characters, no?

I've been working heavily on tensors (multilinear algebra - a bimonoidal category) in Julia and would really like an operation for tensor sum and tensor product.

I would strongly support ++ behaving like \oplus and ** behaving like \otimes but I would more strongly prefer going straight to the unicode characters themselves. For vectors, \oplus and \otimes are well defined. For linear algebra, the same is true - we can replace kron(), for instance, with ** or \otimes. For multilinear algebra, I could make this fit into my work (github.com/andyferris/Tensors.jl). And string concatenation would be sorted out at the same time, using familiar layman's terms (some kind of "addition") that are also consistent with mathematical theory!



On Wednesday, April 22, 2015 at 10:33:19 PM UTC+2, Stefan Karpinski wrote:
Because, strings do form a non-commutative monoid and in academic literature on strings and parsing, multiplication – usually written as juxtaposition – is used for string concatenation. Whether that's a pun or not is a bit fuzzy. Using addition – which implies commutativity, mathematically – for concatenation is definitely a pun, however.

On Wed, Apr 22, 2015 at 4:29 PM, Scott Jones <scott.pa...@gmail.com> wrote:
But then, why * in the first place? 

Sent from my iPhone

On Apr 22, 2015, at 4:21 PM, Stefan Karpinski <ste...@karpinski.org> wrote:

+ isn't happening – avoiding operator punning is crucial in a multiple dispatch language.

On Wed, Apr 22, 2015 at 4:02 PM, David Anthoff <ant...@berkeley.edu> wrote:

Here is a vote for just using + for string concatenation. No deep philosophical reason, it simply seems the choice of least unfamiliarity for the largest number of people. I think with + the topic of how string concatenation should be done would not come up again on this list. With any other choice, I’m sure this will resurface again and again and again :) I’ve read all the coherent arguments for other choices, but in this case I’m more convinced by “keep it familiar/simple for many people”.

 

From: juli...@googlegroups.com [mailto:juli...@googlegroups.com] On Behalf Of Stefan Karpinski
Sent: Wednesday, April 22, 2015 12:51 PM
To: juli...@googlegroups.com
Subject: Re: [julia-dev] use of $ as xor versus syntactic sugar for DataFrames

 

I'd also be happy to get rid of * for string concatenation. We could use ++ for concatenation in general, a la Haskell. There's not really any need for an operator for repeated concatenation. There was once a branch that made string juxtaposition do string concatenation, so you could write

 

"foo"  "bar" # "foobar"

 foo   "bar" # "$(foo)bar"

"foo"   bar  # "foo$bar"

 foo "" bar  # "$foo$bar"

 

 

 

On Tue, Apr 21, 2015 at 11:15 PM, Jeff Bezanson <jeff.b...@gmail.com> wrote:

I greatly prefer `string(a, b, c)` and `repeat` for concatenating and
repeating strings. Strings are, in fact, a monoid, and I think the use
of * and ^ is highly justifiable, but I'd be just as happy to see them
go.
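
For reference, a short sketch comparing the operator and function spellings mentioned here, and checking the monoid properties (associativity, identity element) on a few sample strings. It assumes a recent Julia 1.x; the variable names are arbitrary.

```julia
a, b, c = "foo", "bar", "baz"

concat_op = a * b * c          # operator form
concat_fn = string(a, b, c)    # function form
repeat_op = a ^ 3              # operator form of repetition
repeat_fn = repeat(a, 3)       # function form of repetition

# Monoid laws for concatenation: associativity and an identity element.
associative  = (a * b) * c == a * (b * c)
has_identity = "" * a == a == a * ""

# Concatenation is not commutative in general.
commutes = a * b == b * a
```

Checking three sample strings is of course not a proof, only a quick way to see what the "non-commutative monoid" claim amounts to in code.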

I'd love to have a really good decimal type available in julia.


On Tue, Apr 21, 2015 at 8:49 PM, Scott Jones <scott.pa...@gmail.com> wrote:
> On Tuesday, April 21, 2015 at 6:57:37 PM UTC-4, Patrick O'Leary wrote:
>>
>> On Tuesday, April 21, 2015 at 4:47:37 PM UTC-5, Scott Jones wrote:
>>>
>>> I always thought multiplication *was* commutative, at least that was what
>>> I was taught at school, and it seems all the examples on Google searching
>>> "commutative operations"
>>> mention addition and multiplication as being commutative.
>>> I suppose there is some rarified realm popular with Julia's designers
>>> where that is not true... but, it is NOT what most anybody (including people
>>> who went to MIT) would expect!
>>
>> One such "rarified realm," as you put it, is linear algebra, which is an
>> important use case for us. While we hope Julia to be suitable as a
>> general-purpose language, a key part of its attraction for me is that it
>> does math--in particular, linear algebra--well. And doing mathy things
>> (quickly!) is very much a design goal; math-like objects exploit
>> multimethods well. A quick look at our standard dependencies (BLAS? FFTW?
>> SuiteSparse?) I think does a good job of showing where most of the
>> developers' heads are at.
>
>
> Well, after I posted that I remembered something about cases where
> multiplication wasn't commutative, from 18.06 (Linear Algebra at MIT) with
> Gilbert Strang, however,
> I think the designers of Julia should think a bit more about what would make
> sense to the general programmer population, especially when it comes to
> something that *isn't*
> a mathematical operation... (at least, if they'd like Julia to become more
> generally accepted outside of a niche, and without a lot of programmers
> throwing virtual tomatoes at the designers for some non-intuitive choices).
> I would posit that probably over 95% of "general" programmers would consider
> + for concatenation, and * for repetition, to make a lot more sense for
> string operations, if you have
> to be overloading math operators for strings...
> If you'd had * for repetition, and treated characters as 1 character strings
> where it makes sense, then you could have nice things like:
>
> 20' ' to make a string with 20 spaces, which would be consistent with 20n
> meaning 20*n...
>
> If you didn't like + because of commutativity, why not pick / for string
> concatenation, which everyone would agree is not commutative, unlike *,
> which most everybody *does* think
> is commutative (and would associate with having multiple copies of something
> [even the word multiply screams repetition!])
>
>> That sometimes means that we hit trades most other languages don't spend
>> too much time on. An ongoing example of such a discussion regards
>> "vectorizing" common functions--which is a common feature in the technical
>> languages from which Julia draws a lot of inspiration. But unlike some of
>> those languages, Julia distinguishes between array-typed things and
>> scalar-typed things, and as the language (and its ecosystem) grows, needing
>> to provide two implementations of every single-argument function starts to
>> get repetitive, and there's a strong push towards requiring the use of the
>> map(f, A) construct. This doesn't sit well with everyone.
>>
>> Likewise string concatenation, matrix construction (the space sensitivity
>> doesn't bother me as a MATLAB user whose needs primarily fall in the
>> 2-or-fewer-array-dimensions cases, but what "," vs. ";" will do tends to
>> confuse me), etc.
>>
>> The ".." operator may get taken for a particular meaning related to
>> getfield overloading (https://github.com/JuliaLang/julia/pull/5848), so
>> that's something to watch out for.
>>
>> Base is unlikely to get much bigger; it will probably get smaller over
>> time in part to better support general-purpose programming without the
>> baggage all this silly math carries with it. The plan is something along the
>> lines of preparing a "standard distribution", and you can see a lot of
>> packages currently owned by JuliaLang on GitHub which will likely become a
>> part of that when it comes together. That way you can still get your
>> batteries, but can remove batteries you don't need.
>
>
> Well, I didn't mean to imply that I thought the math was *silly* ;-)  I used
> to love math too, but my life revolves around software architecture,
> language design, databases and performance issues instead these days...
>
> Yes, I'd prefer to see it get *much* smaller, and have a Julia-lite, with
> just essentials, a Julia-standard, and possibly a Julia-GPL (including stuff
> like SuiteSparse, Rmath, and FFTW)...
>
>> For Char*String/String*Char, see
>> https://github.com/JuliaLang/julia/issues/1771 (though that devolves into
>> concatenation operator discussion, rather unfortunately.) There's probably a
>> good argument for trying to add those again...
>>
>> One nice thing is that it's really not troublesome to take on, say,
>> binary-coded decimal types as an external package. The same machinery
>> (method specializations, `convert()`, and `promote_type()`) that powers the
>> standard Julia numeric type hierarchy works just fine outside of Base.
>
>
> Yes, and I plan on doing just that ;-)
>>
>>
>> Hoping this doesn't come off as a rant (I promise it's not!),
>>
>> Patrick
>
>
>
> No, no, and I hope my responses don't come off that way either!
> I enjoy debating these sorts of things (politely! ;-) )
>
>
> Scott

 



Eric Forgy

Sep 27, 2015, 8:40:06 PM
to julia-dev
On Tue, Apr 21, 2015 at 10:14 AM, Jeff Bezanson <jeff.b...@gmail.com> wrote:
I actually strongly agree with this. As ASCII characters have become
more precious, bitwise operators don't seem like a good use of them.
We could gain *three* more ASCII operators.
I don't think I have ever used $ for xor in my life so using it as suggested above (even for macros) sounds good, but I would probably prefer to keep & and | as bitwise operators. I use that all the time. 

On Sunday, September 27, 2015 at 10:47:45 PM UTC+8, Andy Ferris wrote:
I would strongly support ++ behaving like \oplus and ** behaving like \otimes but I would more strongly prefer going straight to the unicode characters themselves. For vectors, \oplus and \otimes are well defined. For linear algebra, the same is true - we can replace kron(), for instance, with ** or \otimes. For multilinear algebra, I could make this fit into my work (github.com/andyferris/Tensors.jl). And string concatenation would be sorted out at the same time, using familiar layman's terms (some kind of "addition") that are also consistent with mathematical theory!

I understand that generally unicode is frowned on for operators, but in this case it makes a lot of sense. I like this idea! \otimes for string concatenation makes sense mathematically as well. I doubt it will ever happen, but it would be pretty/elegant if it did. Making \oplus for string concatenation would probably also stop any discussions about + vs * since no one would ever suggest \oplus for string concatenation (I hope!) although + is found in other languages :)

 

Eric Forgy

Sep 27, 2015, 8:48:34 PM
to julia-dev
On Monday, September 28, 2015 at 8:40:06 AM UTC+8, Eric Forgy wrote:
Making \oplus for string concatenation would probably also stop any discussions about + vs * since no one would ever suggest \oplus for string concatenation (I hope!) although + is found in other languages :) 

Oops. Typo. Should be "Making \otimes "  

Scott Jones

Sep 27, 2015, 10:54:15 PM
to julia-dev
I thought that Andy Ferris was saying that \oplus, not \otimes, made sense mathematically for concatenation,
and also, that \oplus would make more sense to people used to + for concatenation.
Julia is the only language that I've ever seen that used * for concatenation (I think I've seen it for repetition before in other languages, which makes sense, multiply x times = repeat x times).
I had already suggested \oplus on GitHub, some time ago.

Eric Forgy

Sep 27, 2015, 11:44:52 PM
to julia-dev


On Monday, September 28, 2015 at 10:54:15 AM UTC+8, Scott Jones wrote:
I thought that Andy Ferris was saying that \oplus, not \otimes, made sense mathematically for concatenation,
and also, that \oplus would make more sense to people used to + for concatenation.
Julia is the only language that I've ever seen that used * for concatenation (I think I've seen it for repetition before in other languages, which makes sense, multiply x times = repeat x times).
I had already suggested \oplus on GitHub, some time ago.

If that is the case, then it shows that even mathematicians cannot agree on notation :)

My initial thought was about graded algebras. For graded algebras, e.g. tensor algebras, the product (\otimes) of a tensor of degree r and a tensor of degree s is a tensor of degree r+s. The product (\otimes) of basis elements looks just like concatenation of the bases, so \otimes makes sense in this context for concatenation.

Andy's reference to monoids may be a better way to think about it, i.e. the set of all strings with product being concatenation, but even there, I would tend to think of the product in a monoid as \otimes rather than \oplus. Then again, I may be too influenced by my old friends at ncatlab (monoid in a monoidal category <http://ncatlab.org/nlab/show/monoid+in+a+monoidal+category>), where \otimes is used.

PS: Speaking of my old friends, this <http://ncatlab.org/nlab/show/monoid> is a beautiful way to visualize monoids in terms of string diagrams. Just FYI.

Glen O

Sep 28, 2015, 1:48:45 AM
to julia-dev
On Monday, 28 September 2015 00:47:45 UTC+10, Andy Ferris wrote:
Stefan and Jeff - surely the monoid in question is much more like a tensor sum \oplus (which is non-commutative) than a product? This is exactly vector concatenation, and from the mathematical point of view a string is a bit like a vector of characters, no?
Multiplication isn't commutative for matrices, either, but we use regular multiplication for that. It just so happens that integer/float multiplication ends up being commutative. If Julia is going to use a different operator for non-commutative operations, then matrix operations will need new operators. Personally, I think multiplication-for-concatenation makes sense, mathematically, since juxtaposition is equivalent to multiplication in most places where it applies, and juxtaposition of strings makes sense.

Seems a bit silly to use a unicode operator when regular multiplication, addition, and exponentiation (as per floats) can't apply to strings, anyway.

Incidentally, I have a suggestion for how to use + in the context of strings - addition of single characters. "test"+32 becomes "test ", while "test"+64 becomes "test@" and "test"+90 becomes "testZ". Obviously, intended to be done with chars, so "test"+' ' becomes "test ", etc.

All this being said, I don't see why + and * can't both be string concatenation, with * and ^ for string repetition. That is, "test"*3 and "test"^3 would both give "testtesttest", and "test"+"this" and "test"*"this" (and "test""this") would give "testthis". Is there any other use for any of these characters in the context of strings? Note that, with the above suggestion, addition would concatenate with chars, while multiplication would repeat. That is, "t"*' ' would result in 32 t's in a string, while "t"+' ' would result in "t ". In this way, choice of which operator to use depends on what you're going to do with it.



Anyway, if I may make a suggestion that could lead to some interesting discussion...

To deal with the fact that some operators should behave differently on arrays, there are already distinct "elementwise" operators. So how about coming up with a similar scheme for "bitwise" operators?

Bitwise, multiplication is the "and" operator, and addition is the "xor" operator, thus leaving only the "or" and "not" operators unused. If we extend ! to apply bitwise to integers (thus replacing ~), then only "or" remains unaccounted for. Given that this would free up ~, &, |, and $, and of these, the least "useful" in terms of interpretation is $, I'd suggest using $ as the "elementwise" marker, with single use ($) representing "or" (think of it as "bitwise combine" and thus equivalent to "bitwise or"), and then mixed operators for "and" (*$) and "xor" (+$). Note that I'm placing the operator before the elementwise marker because the other way around may lead to confusion in the case of "or" (a$+b could be parsed as a$(+b) as + can be a unary operator).
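
The claim that "and" is multiplication and "xor" is addition holds per bit, with arithmetic taken modulo 2. A quick check over all single-bit inputs, assuming a recent Julia 1.x (where exclusive or is spelled `xor`); the variable names are illustrative only.

```julia
bits = (0, 1)

# "and" behaves like multiplication on single bits.
and_is_mul = all((a & b) == a * b for a in bits, b in bits)

# "xor" behaves like addition modulo 2.
xor_is_add = all(xor(a, b) == (a + b) % 2 for a in bits, b in bits)

# "or" is not a ring operation by itself, but it can be built from both:
# a | b == a + b + a*b (mod 2).
or_from_both = all((a | b) == (a + b + a * b) % 2 for a in bits, b in bits)
```

The third check shows why "or" is the odd one out in this scheme: it needs a combination of the other two rather than a single arithmetic operator.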

This frees up ~, &, and |... all three are symbols that would be useful in other contexts. For the use being requested in this thread, I'd suggest pipe as the natural operator, feeling somewhat akin to the restriction operator for functions in mathematics (f|_A meaning "the function f, restricted to domain A"). In fact, perhaps the pipe operator could become the "pass left" operator, with |> becoming "pass right". This could then be used to also enable infix operators ("12 |>mod| 7" or "A |>push!| 5"). Or not - this is just an example of a possible way to use it.

Andy Ferris

Sep 28, 2015, 2:06:52 AM
to julia-dev
Eric - regarding monoidal categories (and ncatlab), \otimes was used in analogy with the tensor product (outer product) where the dimension multiplies (by which I mean total number of elements in the tensor, e.g. the length of a vector). Perhaps we might say it is the "prototypical" monoid. The degree (order?) of a tensor indeed would add under this notation, but the total number of elements would increase multiplicatively.

It is also true that the direct sum \oplus obeys the axioms of a monoid. Here the rank/order of the tensor remains constant and the number of elements adds (for vectors - in general the number of non-zero elements add linearly). Frequently we like to talk about categories where we are using the tensor sum as the monoid, but because \otimes is used throughout category theory, we sometimes see the outer product symbol used for the direct sum!

It turns out both monoids are important. Tensors obey a bi-monoidal category, where \otimes and \oplus are both included. This makes it hard to draw a tensor network diagram with both monoids in there at once, but nonetheless, we can write (in algebra) expressions with both \otimes and \oplus. 

If we wish to concatenate vectors, the correct operation is \oplus, where the number of elements of the vector sums (not multiplies). The tensor sum of two strings (interpreted as vectors of characters) should be "x" \oplus "y" = "xy".
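
A sketch of what these two operators could look like as user-defined infix functions in a recent Julia 1.x, where `⊕` and `⊗` already parse as infix operators. The definitions below are illustrative assumptions, not an existing Base or package API; the product is written out by hand in Kronecker-style ordering so the example needs no imports.

```julia
# Direct sum of vectors: concatenation, so lengths add.
⊕(x::AbstractVector, y::AbstractVector) = vcat(x, y)

# Tensor (outer) product flattened to a vector: lengths multiply.
⊗(x::AbstractVector, y::AbstractVector) = [a * b for a in x for b in y]

# Treating a string as a sequence of characters, ⊕ is concatenation.
⊕(x::AbstractString, y::AbstractString) = x * y

u = [1, 2]
v = [3, 4, 5]

direct_sum = u ⊕ v      # 2 + 3 elements
tensor     = u ⊗ v      # 2 * 3 elements
joined     = "x" ⊕ "y"
```

The element counts make the distinction concrete: the sum has `length(u) + length(v)` entries and the product has `length(u) * length(v)`.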

For string repetition, the standard notation would yield "x" ^\otimes 4 = "xxxx" (that's hard to write in ASCII...). Perhaps we better define ^\otimes and ^\oplus (where the second characters are unicode) as infix operators too?

I don't think \otimes has a good meaning for strings, though... not sure.

Tamas Papp

Sep 28, 2015, 2:27:23 AM
to juli...@googlegroups.com
Let s_{j,n} be 1 if message n in thread j is about the syntax of string
concatenation in Julia, 0 otherwise.

Conjecture:

\forall j, \lim_{n\to\infty} E[ s_{j,n} ] = 1

Corollary:

Let O be an operator that has never been used in any language, looks
weird, and requires between 8-16 keystrokes to type, possibly on a
separate keyboard, perhaps also using your toes.

Let P be any string concatenation operator. Then

effort(using O) <= effort(using your favourite P + discussions about O vs P)

Best,

Tamas
>> <http://ncatlab.org/nlab/show/monoid+in+a+monoidal+category>, where
>> \otimes is used.
>>
>> PS: Speaking of my old friends, this <http://ncatlab.org/nlab/show/monoid>

Andy Ferris

Sep 28, 2015, 2:31:18 AM
to julia-dev


On Monday, September 28, 2015 at 7:48:45 AM UTC+2, Glen O wrote:
On Monday, 28 September 2015 00:47:45 UTC+10, Andy Ferris wrote:
Stefan and Jeff - surely the monoid in question is much more like a tensor sum \oplus (which is non-commutative) than a product? This is exactly vector concatenation, and from the mathematical point of view a string is a bit like a vector of characters, no?
Multiplication isn't commutative for matrices, either, but we use regular multiplication for that. It just so happens that integer/float multiplication ends up being commutative. If Julia is going to use a different operator for non-commutative operations, then matrix operations will need new operators. Personally, I think multiplication-for-concatenation makes sense, mathematically, since juxtaposition is equivalent to multiplication in most places where it applies, and juxtaposition of strings makes sense.

Glen - I think my comments regarding commutativity are possibly in line with yours - the core developers didn't like + for concatenation because it was too commutative (for matrices, and algebras of groups, etc) while * is not (for matrices and group elements themselves). We see statements like "+ isn't the right monoid for concatenation". My point is that \oplus *is exactly* the right monoid for concatenation. I personally don't particularly mind what we do for concatenating strings - what I want to do is concatenate vectors in general (where * isn't going to work for vectors of numbers), but then if we have a special operator for that, then it might as well be used for vectors of characters also (i.e. strings).
 
Seems a bit silly to use a unicode operator when regular multiplication, addition, and exponentiation (as per floats) can't apply to strings, anyway.

Indeed, that is a legitimate concern! But how might we express concatenation of vectors of numbers? Use ++? Is it beneficial to think of a string as behaving like a vector of characters, or should we treat them completely differently?

Glen O

Sep 28, 2015, 3:17:01 AM
to julia-dev
On Monday, 28 September 2015 16:31:18 UTC+10, Andy Ferris wrote:

Indeed, that is a legitimate concern! But how might we express concatenation of vectors of numbers? Use ++? Is it beneficial to think of a string as behaving like a vector of characters, or should we treat them completely differently?


I can think of a few options. If the bitwise operators (&, |, and ~) are freed up, then either & or | would be great for general concatenation. It's worth noting that, while strings are in some ways like vectors of characters, they aren't quite the same thing. Note that collect("test") produces an actual vector of characters. While the ability to do certain operations on strings like you do on vectors would be nice, it is worth keeping them separate.

As I see it, if there were a concatenation operator (for the purpose of example, let's suppose it's pipe), then "test"|"my code" should be equivalent to ["test","my code"], rather than concatenating the strings themselves, following the same logic as other array-based uses of strings.

I don't like ++, because + is a unary operator, and so A++B could be read as A+(+B), which could cause confusion. But other notations could work nicely. A+>B, for instance, or A|<B. And there's also the potential for A..B, perhaps. Indeed, .. is already usable as an operator (in v0.3). (..)=(i,j)->cat(1,i,j) allows you to type [1,3,2]..[4,1,3] to get [1,3,2,4,1,3] (and 3..5 gives [3,5]). And it kind of makes sense, since the dot means "Elementwise", so double-dot could be read as "extend elements". Note that "test".."this" will give ["test","this"], thus again demonstrating the difference between a vector of chars and a string.

Eric Forgy

Sep 28, 2015, 3:31:35 AM
to julia-dev
On Monday, September 28, 2015 at 2:06:52 PM UTC+8, Andy Ferris wrote:
we sometimes see the outer product symbol used for the direct sum!

From the perspective of category theory, \otimes is a product and \oplus is a coproduct. One thing that probably contributes to differing opinions is that a coproduct in a category C is a product in the opposite category C^op. In the monoid, i.e. category with one object, we're talking about, the difference between C and C^op is fuzzy to me so the difference between \otimes and \oplus seems fuzzy. It's almost like it depends on whether you read left-to-right or right-to-left* :)

Like I said, unicode is probably a nonstarter, but I like the idea.

*In our monoid M, the unit morphism 1 is the empty character and a morphism a can be thought of as a path through "character space" starting at "" passing through "a" and back again, i.e. a = "" -> "a" -> "". Morphisms are often applied right-to-left, i.e. a morphism abc would be interpreted as a path through c, then through b and finally through a, i.e. function arguments are usually written to the right.

The opposite category M^op would replace every morphism abc with the opposite CBA, where A = a^op, B = b^op and C = c^op. Here, we would read the morphisms left-to-right, which is probably more natural for strings.

If we let \otimes be the product in M, then \oplus would be the product in M^op and vice versa, i.e. if we let \oplus be the product in M, then \otimes would be the product in M^op. However, in both cases, the product is basically composition of morphisms, so this lends itself to a natural suggestion to have \circ (composition of morphisms/functions) be the operator for concatenation of strings :) 

Eric Forgy

Sep 28, 2015, 4:02:23 AM
to julia-dev
On Monday, September 28, 2015 at 3:17:01 PM UTC+8, Glen O wrote:
But other notations could work nicely. A+>B, for instance, or A|<B. 

Again, thinking about string concatenation in the context of monoids, the strings would be morphisms and concatenation of strings is composition of morphisms. I suggested \circ, but a more serious suggestion might be to take monoids seriously for strings in Julia and use pre- or post-fix operators.

Pre-Fix Operator

String1 <| String2 <| String3

This would correspond to the monoid M I described.

Post-Fix Operator

String1 |> String2 |> String3

This would correspond to M^op.

This feels elegant to me but not sure how Julian it is.

Scott Jones

unread,
Sep 28, 2015, 5:08:53 AM9/28/15
to julia-dev
OK, here's the problem.
* is very confusing to most people (even mathematicians from comments I've seen) when they first come to Julia, for the very simple reason that people associate addition with concatenation, and multiplication with repetition.
Overloading any math operator with a very different concept (concatenation) can also lead to a lot of confusion, because if you know that operator O means concatenation on strings, you'll likely want to try to concatenate vectors with it also (which breaks because then you get the mathematical meaning - except possibly for \oplus, as Andy has described).
I think that talking about monoids as a justification for language design is simply going to prove to everybody else that Julia is a niche language for mathematicians, instead of the great general purpose language that it can be.
Binary "++" (as opposed to unary prefix or postfix "++" as in C/C++/Java) is already used in Haskell at least for concatenation, and if it is used as an alias for \oplus, as Andy suggested, could be used as a general concatenation operator that *also* makes sense to the mathematicians, and doesn't have the problem that Tamas brought up:

Andy Ferris

unread,
Sep 28, 2015, 5:41:06 AM9/28/15
to julia-dev
+1 to Scott and Tamas :)

Honestly, I'm surprised but not overly bothered by * for strings. I only bring up monoids in response to arguments I read by core developers like Stefan and Jeff and others. I'm not sure who is happy with the status quo, everyone seems reluctant in one post on one thread or another, but it seemed a monoid-compatible solution was the only one that would actually be considered. The community seemed to want something more 'plus'-like, and \oplus is a well-known monoidal operator, so...

* is very confusing to most people (even mathematicians from comments I've seen) when they first come to Julia, for the very simple reason that people associate addition with concatenation, and multiplication with repetition.

We indeed speak that way in natural language, and physicists/mathematicians working on linear algebra write the symbol \oplus not + or * for concatenation (and can write \otimes [1,1,1...] or ^\oplus for repetition). Julia (like MATLAB) inherits a lot of the language from linear algebra conventions. Anyway, this is going in circles... sorry everyone.

Tamas Papp

unread,
Sep 28, 2015, 5:58:49 AM9/28/15
to juli...@googlegroups.com
The string concatenation operator is one of the great bikeshedding
issues of Julia. Also, it won't stop until key developers strongly
suggest not to generate noise about this, and enforce it.

The situation is not unlike a land rush: people feel that Now Is The
Time to rectify issues about operators, and unless they do this, Bad
Design will become entrenched Forever. And we all know that naming
operators is the Key Issue of language design and adoption.

Proponents of alternative syntaxes are very vocal and hijack threads, so
people who could not care less about the whole thing, are OK with the
current syntax, or have other priorities, feel that they should
voice their own opinion (yes, I am doing this too).

While, in fact, Julia is flexible and powerful enough so that

1. everyone can experiment with their favorite concatenation semantics,
for strings, vectors, etc,

2. using either functions, or infix operators (a trivial syntax issue),

3. put it in a package, and let other people use it.

In due time,

4. it could end up in the language core if necessary.

Whenever this issue comes up, I perceive that some people have strong
opinions about (2) (It Should Be an Operator; it is not clear to me why
this is so clear-cut) and strong suggestions about (1) (which operator it
should be), and want to skip (3) and go directly to (4).
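
As a minimal sketch of points 1 to 3, assuming Julia 1.x syntax: the module name ConcatSketch and the choice of ⊕ (\oplus) are purely illustrative assumptions, not an existing package and not a proposal for Base.

```julia
# Hypothetical package-style module; the name ConcatSketch and the use of ⊕
# are illustrative assumptions only.
module ConcatSketch

export ⊕

# ⊕ (typed as \oplus<TAB>) parses as an infix operator but has no methods in
# Base, so giving it a meaning here does not alter any existing operator.
⊕(a::AbstractString, b::AbstractString) = string(a, b)  # strings
⊕(a::AbstractVector, b::AbstractVector) = vcat(a, b)    # vectors

end # module

using .ConcatSketch

s = "test" ⊕ "this"        # string concatenation
v = [1, 3, 2] ⊕ [4, 1, 3]  # vector concatenation
```

Because the operator lives in its own module, people who want it opt in with a `using` statement, and everyone else is unaffected.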

I think that the socially optimal solution would be to put this issue to
rest for a while, and enforce this. Hijacking threads and issues is
really impolite.

Best,

Tamas

On Mon, Sep 28 2015, Andy Ferris <ferri...@gmail.com> wrote:

Eric Forgy

unread,
Sep 28, 2015, 7:04:24 AM9/28/15
to juli...@googlegroups.com
Nicely said Tamas. I won't say another word on the subject until I have written a package or at least macro :)


Glen O

unread,
Sep 28, 2015, 9:11:35 AM9/28/15
to julia-dev
On Monday, 28 September 2015 19:58:49 UTC+10, Tamas Papp wrote:
Proponents of alternative syntaxes are very vocal and hijack threads, so
people who could not care less about the whole thing, are OK with the
current syntax, or have other priorities, feel that they should
voice their own opinion (yes, I am doing this too).

While most of what you say is somewhat right, it's worth noting that "fundamental operators" is an important issue, and one that DOES need to be addressed now. The top people within Julia's development have commented on the issue, noting that there aren't enough ASCII operators for the kinds of operations that are likely to be most-used, and while *new* operators might be something that can be proposed through the "start in a package, and it'll move to base if it's really popular" approach, changes to the existing fundamental operators cannot seriously be addressed in this way, as those operators will be used in other packages for other meanings, creating incompatibilities.

As such, if any fundamental operators are going to change in meaning, that needs to be decided in a central manner, by those who are the core developers of the language.

And that is precisely what this thread is about - the use of a fundamental operator ($), asking if its current use is really needed as a fundamental operator (and whether it can be used, instead, for a different purpose). While there is some merit in the assertion that discussion of OTHER fundamental operators isn't on-topic, and is thus hijacking the discussion, not everybody is taking it as a just-one-change issue. The thread has, by its existence, raised the question of whether the fundamental operators (~, !, $, %, ^, &, *, -, +, |, \, <, >, ?, :, / assuming that @ needs to be macro and # needs to be comment) should be used in the way they are currently being used.

The specific case of the string concatenation issue is one that arises specifically because it's a use of a fundamental operator in a context that is otherwise undefined, and the question is being raised as to whether it's the appropriate use. I do agree that discussion of *which* unicode operator it should be, if it's not going to be a fundamental operator, isn't appropriate here, but the "put this issue to rest for a while" argument is, quite simply, nonsense. Waiting until the language is more mature is entirely the *wrong* way to view this, as the more developed and the bigger the language support gets, the harder it gets to alter the language. If the discussion is going to happen, it should be happening now. Of course, it's probably best to branch off to a new thread on that front (and hopefully someone who has a solid argument to put forward for changing the string concatenation operator will create such a thread, to get the discussion going properly).


Anyway, in the hopes of getting the actual discussion back on track, I thought I'd make one of my points in a different way:

While I do agree that some nice syntactic sugar for things like x[:y] should exist, and should be generalisable beyond just that use-case, the dollar sign feels like a rather arbitrary choice. I do see that its current use as "bitwise xor" is unusual and possibly unnecessary, but I'm not sure that it's the right operator for this use, either. Among other things, it has no connection with the other current use of $, string interpolation. It also looks somewhat messy in context. Given that the core developers are uncomfortable with operator punning, it seems to me that a different operator would be better-placed.
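
For illustration only, here is one way sugar for x[:y] could be spelled without claiming any operator. It assumes a Julia version in which `Base.getproperty` can be overloaded (1.0 or later), and `SymTable` is a hypothetical type invented for this sketch, not part of DataFrames.

```julia
# Hypothetical wrapper type for illustration; not part of DataFrames or Base.
struct SymTable
    cols::Dict{Symbol,Vector{Int}}
end

# Indexing by Symbol, so that t[:y] works. getfield is used because t.cols
# would itself be routed through the getproperty method defined below.
Base.getindex(t::SymTable, name::Symbol) = getfield(t, :cols)[name]

# Make t.y an alias for t[:y].
Base.getproperty(t::SymTable, name::Symbol) = t[name]

t = SymTable(Dict(:y => [1, 2, 3]))

col_by_index = t[:y]  # explicit indexing
col_by_dot   = t.y    # the same column through the dot syntax
```

This only covers names that are valid identifiers and known when the code is written; a computed key still needs the t[key] form.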

But all of the single-ASCII operators are currently taken, which leaves three options - displace an existing operator, identify an operator that it's OK to pun (that wouldn't apply to arrays in this way), or go for an operator that isn't single-ASCII (unicode or multiple ASCII). The single-ASCII operators that, to my knowledge, don't apply to arrays in that way are (~, :, and !). Exclamation mark would cause problems because it's a valid character in symbols. Colon would probably cause too much confusion with its use in array construction. So it would have to be ~... but this is already being used as a macro character (apparently for this purpose in the DataFrames package, though); that being said, recovering ~ for other uses would be nice, since in mathematics it's normally used for "similarity" (among other things).

If, in principle, an operator to achieve this effect, but more generalised, is sought, then it leaves multi-ASCII, unicode, or operator displacement. Multi-ASCII will likely defeat the purpose (unless it was something like -|, or something else that looks minimal). Unicode is a natural option, but it would need to be a short input string to obtain it, otherwise again it would defeat the purpose. Perhaps something like \wr - but that's the only one I could find. And that leaves operator displacement.

Which is why I suggest displacing the bitwise operators. They shouldn't be too frequently used in scientific applications, so it's probably safe to move them to multi-ASCII operators, or unicode. Not only does it provide a nice operator for the DataFrames example, but it also frees up a few other operators for more frequent uses.

catc...@bromberger.com

unread,
Sep 28, 2015, 10:17:12 AM9/28/15
to julia-dev


On Monday, September 28, 2015 at 6:11:35 AM UTC-7, Glen O wrote:
Which is why I suggest displacing the bitwise operators. They shouldn't be too frequently used in scientific applications, so it's probably safe to move them to multi-ASCII operators, or unicode. Not only does it provide a nice operator for the DataFrames example, but it also frees up a few other operators for more frequent uses.

Which would be disastrous, as these bitwise operators are more consistent across more languages across more time than any string concatenation operator you could come up with. To the devs: please don't change them.

John Myles White

unread,
Sep 28, 2015, 10:33:10 AM9/28/15
to juli...@googlegroups.com
As I said before, this thread isn't well suited to julia-dev.

Please stop commenting here. You can move the thread to julia-users, although I think it's pretty clear that the topic is toxic and debating it will benefit no one.

 -- John