Julia port of L.J.P. van der Maaten and G.E. Hinton's t-SNE visualisation technique


Leif Jonsson

Nov 22, 2013, 7:27:49 AM
to julia...@googlegroups.com
Hi, 

I have ported L.J.P. van der Maaten and G.E. Hinton's Python implementation of t-SNE (sometimes tsne) to Julia.
Undoubtedly there are improvements that can be made and I would be very happy for feedback on this.

It is available here:
https://github.com/lejon/TSne.jl

You can read about the great work on t-SNE here:


Best Regards
-Leif Jonsson

S Wade

Nov 22, 2013, 7:47:40 AM
to julia-users
Very cool! Out of curiosity, do you know how fast it is in comparison to the MATLAB, native and CUDA implementations? Recently, I implemented ISOMAP as a benchmark, and the Julia version was within 25% of the native implementation.

John Myles White

Nov 22, 2013, 10:27:01 AM
to julia...@googlegroups.com
Awesome. Which license is this under? I found navigating the different licenses of different t-SNE implementations really difficult.

— John

Leif Jonsson

Nov 22, 2013, 3:38:14 PM
to julia...@googlegroups.com
Hi John, 

Well I'm not sure to be honest! :) I don't know enough nuances of the licensing business to say for sure. The original python t-SNE implementation has the following Copyright information:

# Created by Laurens van der Maaten on 20-12-08.
#  Copyright (c) 2008 Tilburg University. All rights reserved.

I'm not sure what applies for ports to other languages, but I e-mailed Laurens van der Maaten about the release and asked him to let me know if he thought there was any problem with me porting it, but he did not express any such concerns. He seemed happy about the port and in fact it sounded like he would endorse it through his website on t-SNE. So if it is up to me to set the license on the Julia port (which I have assumed so far) I have set it to the MIT license since that was one available with the Julia project generation framework (and I like it).

So unless someone tells me that I cannot do that, it is MIT. :)

Cheers!
-Leif Jonsson

Leif Jonsson

Nov 22, 2013, 3:43:25 PM
to julia...@googlegroups.com
It's a bit embarrassing really, but I haven't gotten around to benchmarking it yet! :) And I'm also a newbie Julia user so there might be improvements to be made, but I hope to get there soon!

Cheers!
-Leif

Jiahao Chen

Nov 22, 2013, 4:03:50 PM
to julia...@googlegroups.com
Leif, Wade, 

Putting together a dimensionality reduction package has been on my list for a while. Leif, thanks for doing this! Hope Wade will also contribute his Isomap code somewhere where it can become a package.

Hope you'll consider 

Kevin Squire

Nov 22, 2013, 4:12:25 PM
to julia...@googlegroups.com
On Fri, Nov 22, 2013 at 12:38 PM, Leif Jonsson <leif.j...@gmail.com> wrote:
Well I'm not sure to be honest! :) I don't know enough nuances of the licensing business to say for sure. The original python t-SNE implementation has the following Copyright information:

# Created by Laurens van der Maaten on 20-12-08.
#  Copyright (c) 2008 Tilburg University. All rights reserved.

So, without an explicit license, copyright law applies, which generally means that "copying" (which generally includes porting) is allowed only by permission from the copyright owner. 


 
I'm not sure what applies for ports to other languages, but I e-mailed Laurens van der Maaten about the release and asked him to let me know if he thought there was any problem with me porting it, but he did not express any such concerns. He seemed happy about the port and in fact it sounded like he would endorse it through his website on t-SNE. So if it is up to me to set the license on the Julia port (which I have assumed so far) I have set it to the MIT license since that was one available with the Julia project generation framework (and I like it).

So, it sounds like you have the author's permission. ;-)  You might double-check that he's okay with the MIT license, but I'll leave that to your discretion--from the author's response, it sounds like he's not planning to come after you for violating his copyright.

And thanks for creating the package!  It looks quite useful.

Cheers!

Kevin


John Myles White

Nov 23, 2013, 1:10:04 PM
to julia...@googlegroups.com
Following up on this: I looked at the license of the Dimensionality Reduction toolbox with which t-SNE is associated. The license makes any open source distribution of derived code illegal, because it contains an explicit anti-commercial use clause: "You are free to use, modify, or redistribute this code in any way you want for non-commercial purposes. If you do so, I would appreciate it if you refer to the original author or refer to one of the papers mentioned above."

If you are friends with van der Maaten, it would be good to ask him to consider making his software open source by using a proper license. Not even the GPL, a very prohibitive license, includes an anti-commercial clause. These clauses crop up from time to time in academic licensing, but they should be an immediate red flag to you as they are indications that the author may be looking to cash out on their code at some point, either through creating a proprietary product or through copyright litigation. It is possible van der Maaten does not have this chilling effect in mind, but it is extremely problematic for Julia to be even faintly associated with potential copyright violations. Unless van der Maaten gives you an explicit open source license, I would take your code offline and stop sharing it with anyone.

— John

Stefan Karpinski

Nov 23, 2013, 1:16:50 PM
to Julia Users
"Whackadoodle" licenses are a scourge. People really need to stop making up nonsense one-off licenses like this and stick with using well-known licenses.

Steven G. Johnson

Nov 23, 2013, 2:47:48 PM
to julia...@googlegroups.com


On Friday, November 22, 2013 3:38:14 PM UTC-5, Leif Jonsson wrote:
I'm not sure what applies for ports to other languages, but I e-mailed Laurens van der Maaten about the release and asked him to let me know if he thought there was any problem with me porting it, but he did not express any such concerns. He seemed happy about the port and in fact it sounded like he would endorse it through his website on t-SNE. So if it is up to me to set the license on the Julia port (which I have assumed so far) I have set it to the MIT license since that was one available with the Julia project generation framework (and I like it).


No, it is not (solely) up to you to set the license on a Julia port.  A port is a derived work of the original code, which means that the copyright is jointly held by you *and* by the authors of the original work.  That means that the permission of the original authors is required to redistribute (or modify, etc.) your port, and they effectively have veto power over the license.

The "all rights reserved" in the original code means that distributing your port at all is technically illegal.  Fortunately, it sounds like this was not the intent of the authors, so you may be able to get their permission to distribute your port under an open-source/free-software license.   However, you CANNOT assume this; it is not at all clear from your description that the authors would permit, for example, usage of their code (or works derived therefrom) in proprietary commercial software, which *is* allowed by the MIT license.

It's really unfortunate that many programmers don't understand copyright law or the need for licensing; the lack of clear licenses has historically caused all sorts of difficulties down the road for free/open-source software.   I would email the original authors for explicit legal permission, e.g.

      Because my port is derived from your code, I need your legal permission to distribute my port as free/open-source software. To be free/open-source software, I need to attach a specific license to my code (see, for example, http://choosealicense.com/) explaining the legal permissions to copy, modify, and redistribute it.   Do I have your permission to distribute my port under the "MIT license", a simple permissive free/open-source license (http://opensource.org/licenses/MIT) whose terms are attached below?   Thanks so much for your help with this.

       PS. I hope you will consider attaching a standard free/open-source license to your original code as well, but this is of course up to you.

Assuming you get this permission, your LICENSE file should list both you *and* the original T-SNE authors as copyright holders.

If you don't get permission from the authors to distribute the code under this or some other free/open-source license, I'm afraid your code will have to be pitched in the trash as far as the free/open-source world is concerned (or anyone concerned about the legality of their code, for that matter).  That's why it's usually a good idea to be sure of the copyright permissions *before* porting.   (In the absence of free/open-source code, your alternative would be to look at their papers for the English description of the mathematical algorithm and then to re-implement it from scratch, without looking at their code.)

--SGJ

Billou Bielour

Nov 25, 2013, 5:15:54 AM
to julia...@googlegroups.com
Just so I know, if you redo an implementation from scratch, can you then attach any license you like? Or is it a gray area?

Tim Holy

Nov 25, 2013, 6:21:32 AM
to julia...@googlegroups.com
If you do a "clean-room" implementation from just reading the paper, then you
can give your code any license you want. However, if you've already looked
carefully (e.g., by translating from one language to another) at source code
with a different license, I'm not sure that really counts as "clean-room."

--Tim

Chris Foster

Nov 25, 2013, 7:41:23 AM
to julia...@googlegroups.com
There seems to be a huge legal grey area around what constitutes a
derivative work, as much as the authors of licenses like the GPL would
like people to believe their particular interpretation. For a
somewhat more in-depth look at the issues, check out the article here:
http://www.law.washington.edu/lta/swp/law/derivative.html

To quote without much context: "[...] framing the question in this
manner has been unfortunate, because it focuses the inquiry upon the
mechanism of inter-module communication rather than on the more
metaphysical - and legally significant - inquiry into whether one
module is in fact a derivative of the other. And as we have seen
above, courts answer this question not by reference to the technology
underlying the work but by reference to qualities such as
incorporation and substantial similarity, tempered by subject matter
limitations, fair use defenses, and public policy rationales."

US copyright law makes it clear that a translation is a derivative
work, so the status of a t-SNE port isn't in question (the answer may
be different in another jurisdiction!). However, if you read a piece
of code, and then implement something similar a day later without
further reference to the original, is it a derivative work? How about
a month or a year later? The answer seems rather simple in either
limit, but there seems to be plenty of room for legal grey area in the
middle.

~Chris

John Myles White

Nov 25, 2013, 9:45:07 AM
to julia...@googlegroups.com
If there’s one life lesson I’ve gained from being raised by an IP lawyer, it’s that grey area might as well mean illegal. We simply don’t have enough resources to get involved in litigation, even if we might win in the long run.

— John

Chris Foster

Nov 26, 2013, 7:44:45 AM
to julia...@googlegroups.com
To be clear, I'm not arguing for playing fast and loose with
licensing, I certainly think people should communicate clearly and
explicitly with the original authors in this case.

What bothers me about trying to put a blanket ban on grey areas, is
that every piece of code I've ever read will no doubt influence the
code I write tomorrow in some small way. So I accept that shades of
grey are inevitable, and try hard to keep to the light side of them
;-)

Regarding lawsuits, as an open source developer I just have to hope
that none of them come my way. The occasional unlucky project will
continue to suffer, even if they're legally (and morally) in the right
- consider the JMRI case for example - http://jmri.org/k/summary.shtml

~Chris

Robert Feldt

Nov 26, 2013, 10:04:57 AM
to julia...@googlegroups.com
This is the approach I'm taking in BlackBoxOptim.jl. Either it's MIT/BSD licensed or I only implement from the papers. I have a very hard time seeing that this would not count as clean-room even though the final code might share similarities simply by being based on the same pseudo-code or description in the paper. If anyone has other experience with this I'd appreciate it if you share.

Thanks,

Robert Feldt

John Myles White

Nov 26, 2013, 10:48:02 AM
to julia...@googlegroups.com
We should get input from Steven G. Johnson, but my understanding is that implementations based only on verbal descriptions and pseudocode in papers are safe. Unfortunately, I’ve seen papers use Matlab or Python as “pseudocode”. That is clearly not safe.

— John

Tim Holy

Nov 26, 2013, 11:01:16 AM
to julia...@googlegroups.com
On Tuesday, November 26, 2013 07:48:02 AM John Myles White wrote:
> We should get input from Steven G. Johnson, but my understanding is that
> implementations based only on verbal descriptions and pseudocode in papers
> are safe. Unfortunately, I’ve seen papers use Matlab or Python as
> “pseudocode”. That is clearly not safe.

Is the latter really true? The choice of language to write the paper in
(English, German, Python, ...) should not be a factor in the licensing or
copyright status. I suspect there's a strong argument to be made that any code
appearing in the paper should be considered "prose" rather than a source file.

That said, I largely agree with your caution about gray areas.

--Tim

Avik Sengupta

Nov 26, 2013, 11:19:40 AM
to julia...@googlegroups.com
> Is the latter really true? The choice of language to write the paper in
> (English, German, Python, ...) should not be a factor in the licensing or
> copyright status.

I suppose the only non-lawyerly answer possible is: it depends! I imagine it'd boil down to how much of the expression (as opposed to the idea) of the original is present in your fork. 

Steven G. Johnson

Nov 26, 2013, 11:39:07 AM
to julia...@googlegroups.com


On Tuesday, November 26, 2013 10:48:02 AM UTC-5, John Myles White wrote:
We should get input from Steven G. Johnson, but my understanding is that implementations based only on verbal descriptions and pseudocode in papers are safe. Unfortunately, I’ve seen papers use Matlab or Python as “pseudocode”. That is clearly not safe.

Although I do my best to keep up on these matters, I'm not a lawyer!

According to the copyright office (http://www.copyright.gov/register/tx-programs.html):

     "Copyright protection extends to all the copyrightable expression embodied in the computer program. Copyright protection is not available for ideas, program logic, algorithms, systems, methods, concepts, or layouts." (emphasis added)

However, the difficulty lies in distinguishing between "expressive" elements in computer code (or pseudocode) and uncopyrightable mathematical algorithms.   Ultimately, it is up to a judge, and there are no easy answers short of going to court.

What a court may try to do is to distinguish elements that are "necessary" to execute the algorithm from elements that are incidental and therefore may be copyrightable creative expression.   One common approach is apparently: https://en.wikipedia.org/wiki/Abstraction-Filtration-Comparison_test

Anyway, there are no guarantees even with implementing pseudocode.  At least try to understand the essential mathematical steps that are being performed and implement them in your "own words", if there is more than one way; at some point you have to accept the legal uncertainty or you can never implement any algorithm from any description.  But simply translating syntax from one computer language to another, leaving all other elements in place, seems to run a high risk of copyright infringement.

--SGJ

Robert Feldt

Nov 26, 2013, 11:58:20 AM
to julia...@googlegroups.com
What I have found is that Julia is such a natural way to write linear algebra and mathematics that an implementation tends to be very close to the pseudo-code in a paper. I would assume something similar is true if the original author implemented the same pseudo-code in, say, Matlab. Thus, imho, there is often very little difference between the (mathematical/statistical) idea and its expression. Seems to complicate matters... ;)

Cheers, Robert
--
Best regards,

/Robert Feldt
--
Tech. Dr. (PhD), Professor of Software Engineering
Blekinge Institute of Technology, Software Engineering Research Lab, and
Chalmers, Software Engineering Dept
robert.feldt (a) bth.se    or    robert.feldt (a) chalmers.se    or    robert.feldt (a) gmail.com
Mobile phone: +46 (0) 733 580 580
http://www.robertfeldt.net

Leif Jonsson

Nov 26, 2013, 3:45:57 PM
to julia...@googlegroups.com
Thanks for bringing this to my attention and for your clear description and suggestions for remedy!

I am happy to announce that I have discussed this with the very helpful and cooperative Laurens van der Maaten and he has granted me permission to publish the Julia port of t-SNE under the condition that we use a BSD license, so the license is now updated to a BSD license with both of our names as copyright holders.

Best Regards
-Leif Jonsson

Leif Jonsson

Nov 26, 2013, 3:47:34 PM
to julia...@googlegroups.com
This was intended as a reply to Steven G. Johnson

Best
-Leif

John Myles White

Nov 26, 2013, 3:54:26 PM
to julia...@googlegroups.com
Great news.

-- John

Stefan Karpinski

Nov 26, 2013, 3:59:22 PM
to Julia Users
Excellent! Thanks for looking into that. BSD is a lovely license.

Chris Foster

Nov 27, 2013, 7:06:29 AM
to julia...@googlegroups.com
On Wed, Nov 27, 2013 at 2:39 AM, Steven G. Johnson
<steve...@gmail.com> wrote:
> Anyway, there are no guarantees even with implementing pseudocode. At least
> try to understand the essential mathematical steps that are being performed
> and implement them in your "own words", if there is more than one way; at some
> point you have to accept the legal uncertainty or you can never implement
> any algorithm from any description.

Thanks for putting it so clearly and informatively Steven - this is
what I was trying to get at.

~Chris