Implementation of t-SNE

Oleksandr Zaytsev

May 15, 2019, 9:13:46 AM
to polymath...@googlegroups.com, Serge Stinckwich, Atharva Khare
Hello,

@Atharva Khare is a Google Summer of Code student who will be working on DataFrame this summer. During the community bonding period, he is trying to make some contributions to PolyMath and get to know the Pharo and PolyMath community. 

Atharva wants to implement the t-SNE algorithm. I remember that @Serge Stinckwich was working on t-SNE last year. I also know that implementing t-SNE is not an easy task.

So I have the following questions:
  1. What is the current status of the t-SNE implementation?
  2. Do you think this is a doable task for a student to complete in one week?
  3. Do you think we can split this task into multiple smaller subtasks and work on them one at a time? (This way, Atharva could take on a certain part of this project and wouldn't get stuck.)
  4. @Atharva Khare what is your experience with t-SNE? Do you know the details of the algorithm, and do you think it is doable?
Oleks

Atharva Khare

May 15, 2019, 11:48:59 AM
to Oleksandr Zaytsev, polymath...@googlegroups.com, Serge Stinckwich
I have only used the algorithm in the past. I do not know all of its ins and outs, but it would be a great learning project for me. I know the basics of linear algebra, and I can read the literature and try to implement it.
As mentioned in the source code, https://lvdmaaten.github.io/tsne/ lists multiple implementations, which will make replication and checking easier.

Serge Stinckwich

May 16, 2019, 3:04:02 AM
to Oleksandr Zaytsev, polymath...@googlegroups.com, Atharva Khare
On Wed, May 15, 2019 at 8:13 PM Oleksandr Zaytsev <olk.z...@gmail.com> wrote:
> Hello,

Dear Oleks, dear all,

> @Atharva Khare is a Google Summer of Code student who will be working on DataFrame this summer. During the community bonding period, he is trying to make some contributions to PolyMath and get to know the Pharo and PolyMath community.

Atharva, maybe you can join the PolyMath mailing list?

> Atharva wants to implement the t-SNE algorithm. I remember that @Serge Stinckwich was working on t-SNE last year. I also know that implementing t-SNE is not an easy task.


I started working on this, but the implementation is quite tricky. I spent a lot of time trying to understand the paper and the corresponding Python code.
Usually there are a lot of differences between what the paper says and what the actual implementation does.
In fact, one of the first steps in implementing t-SNE was to implement PCA :-)
The Python implementation of t-SNE was among the easiest for me to understand (maybe because I know Python better than other languages).

> So I have the following questions:
>   1. What is the current status of t-SNE implementation?

Only the initialization of the algorithm is done if I remember correctly. Actually, I shouldn't have committed it.
>   2. Do you think this is a doable task for a student for one week?

I am not sure this is doable. If Atharva is able to do it in one week, he is a really good student :-)
>   3. Do you think we can split this task into multiple smaller subtasks and work on them one at a time? (this way, Atharva could take on a certain part of this project and wouldn't be stuck)

Yes, it is better to split it into subtasks. I can try to work on the decomposition.
But normally Atharva's main GSoC focus is not directly related to PolyMath or t-SNE; it is DataFrame.

Best,
--
Serge Stinckwich

Int. Research Unit on Modelling/Simulation of Complex Systems (UMMISCO)
Sorbonne University (SU)
French National Research Institute for Sustainable Development (IRD)
University of Yaoundé I, Cameroon
"Programs must be written for people to read, and only incidentally for machines to execute."
https://twitter.com/SergeStinckwich

Atharva Khare

May 16, 2019, 11:43:52 AM
to PolyMath
Hi, I am on this list now! :)

I spent quite some time understanding how t-SNE works, along with the code, and I think it is doable. As I see it, the tasks are:
1. Proper PCA implementation
2. Hbeta method, which calculates Gaussian kernel values for given pairwise distances
3. Complete the x2p method
4. Apply gradient descent
5. Write examples and remaining tests
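
To make tasks 2 and 3 a bit more concrete, here is a rough sketch in plain Python (not Pharo), loosely modeled on the Hbeta and x2p functions of the reference Python script. The names `hbeta` and `search_beta`, the default values, and the single-row interface are assumptions for illustration only; they are not the PolyMath API, and the real x2p would loop over every point of the data set.

```python
import math


def hbeta(sq_dists, beta=1.0):
    """Gaussian kernel for one point (hypothetical helper).

    sq_dists: squared distances from the point to every other point.
    beta: precision of the Gaussian, i.e. 1 / (2 * sigma^2).
    Returns (Shannon entropy in nats, normalized probabilities).
    """
    weights = [math.exp(-d * beta) for d in sq_dists]
    total = sum(weights)  # assumed non-zero; very large distances could underflow
    weighted = sum(d * w for d, w in zip(sq_dists, weights))
    entropy = math.log(total) + beta * weighted / total
    probs = [w / total for w in weights]
    return entropy, probs


def search_beta(sq_dists, perplexity=30.0, tol=1e-5, max_tries=50):
    """Binary search for the beta whose entropy matches log(perplexity)."""
    target = math.log(perplexity)
    beta, beta_min, beta_max = 1.0, -math.inf, math.inf
    entropy, probs = hbeta(sq_dists, beta)
    for _ in range(max_tries):
        diff = entropy - target
        if abs(diff) <= tol:
            break
        if diff > 0:
            # Distribution is too flat: raise beta (narrower Gaussian).
            beta_min = beta
            beta = beta * 2.0 if math.isinf(beta_max) else (beta + beta_max) / 2.0
        else:
            # Distribution is too peaked: lower beta (wider Gaussian).
            beta_max = beta
            beta = beta / 2.0 if math.isinf(beta_min) else (beta + beta_min) / 2.0
        entropy, probs = hbeta(sq_dists, beta)
    return beta, probs


# Usage: one point with four neighbours and a small target perplexity.
beta, probs = search_beta([1.0, 2.0, 4.0, 8.0], perplexity=2.0)
```

The search relies on the entropy decreasing as beta grows, so bisection is enough; the row of probabilities it returns is what x2p would store for each point before the gradient descent step.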

I will try to complete tasks 2 and 3 by the end of this week and create a PR to track the progress.

I have a few questions regarding task 1. I will post them on Discord, since they are about the implementation of PCA and Discord has a wider audience.
