כמה רשמים על useR!2016


Tal Galili

Jul 17, 2016, 1:00:46 AM
To: israel-r-...@googlegroups.com
Hello everyone,
I returned to Israel a few days ago and I'm still recovering from a formidable jet lag.

Two weeks ago I attended the useR!2016 conference, and I thought you would be interested in reading some impressions.
I am writing from partial memory only (so re-watching the talks may reveal inaccuracies in my descriptions), for which I apologize in advance.
Happily, this year the conference was (almost) fully recorded, so you can sit and watch hours of talks from the conference (which I think is really cool):

This year was my seventh consecutive useR! conference. It was the largest one ever held (reportedly close to 900 participants attended). Beyond the interesting talks, the interaction with countless R users was inspiring for me. It is, for me, without a doubt the most interesting conference in the world.
Next year the conference will take place in Belgium (around the beginning of July, I think), and I warmly recommend that you make the effort to fly there. There is no website up yet (that I have found).

Best regards,
Tal

==============


This is the first year that useR had (almost) all of its talks recorded on video. They can be seen here:

There are MANY interesting talks to watch there. I am not going to discuss the invited speakers, since all of their talks were worthwhile. I also attended fewer talks than I had intended to (partly because they were being recorded, but mainly because of the interesting conversations I had between sessions that ran long).

The schedule is listed here:
The links I provide are for the abstracts of the schedule, but you can search the video site to find the talks.

Day 1
The first day was devoted to workshops (these were, sadly, not recorded).

In the morning I partially attended
which was less deep than I had hoped. But the tutorial's repository is very detailed and holds interesting references for common "machine learning" algorithms:

I also briefly went to
where the big take-home message is that for class imbalance we may want to use measures other than the misclassification rate (things like AUC, sensitivity, specificity, or Kappa). The problem is that some models (such as CART/rpart) don't allow us to fit them based on these measures directly, but alternative tuning parameters (e.g., class weights) can be used to search for alternative models.
I think he has more on this in his book "Applied Predictive Modeling" (but I'm not sure).
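As a small sketch of that idea, assuming the caret package (the simulated data via twoClassSim and the weight of 5 are made up for illustration): we tune an rpart tree by ROC rather than accuracy, and up-weight the minority class.

```r
# Sketch: handling class imbalance by (a) evaluating with ROC/Sens/Spec
# instead of misclassification rate, and (b) up-weighting the rare class.
library(caret)
library(rpart)

set.seed(1)
dat <- twoClassSim(1000, intercept = -12)  # caret helper; makes Class2 rare
table(dat$Class)

ctrl <- trainControl(method = "cv", number = 5,
                     classProbs = TRUE,
                     summaryFunction = twoClassSummary)  # reports ROC, Sens, Spec

# Class weights as an alternative tuning knob (the factor 5 is arbitrary here)
fit <- train(Class ~ ., data = dat, method = "rpart",
             metric = "ROC", trControl = ctrl,
             weights = ifelse(dat$Class == "Class2", 5, 1))
fit$results
```

In practice one would also tune the weight itself (and compare against sampling-based fixes such as up/down-sampling).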


In the second part I went to 
Regression Modeling Strategies and the rms package by Frank Harrell, who talked in detail about using smoothing splines with the rms package. The idea is to have flexible models for various non-linear relationships where polynomials may not be flexible enough. One important note: since interpreting such a model at the equation level is hard, using various plots is essential (at least for simple enough data sets).
More details are available in his book (Regression Modeling Strategies).
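A minimal sketch of this approach with rms (the simulated sine data is my own invention, just to show the interface): fit a restricted cubic spline with rcs() and read the model off a plot rather than off the coefficients.

```r
# Sketch: a flexible non-linear fit via restricted cubic splines (rms::rcs),
# interpreted through a plot of the predicted curve.
library(rms)

set.seed(1)
x <- runif(200, 0, 10)
y <- sin(x) + rnorm(200, sd = 0.3)

dd <- datadist(x)            # rms needs variable summaries for Predict()
options(datadist = "dd")

fit <- ols(y ~ rcs(x, 5))    # least squares with a 5-knot restricted cubic spline
plot(Predict(fit, x))        # the fitted curve is far easier to read than the equation
```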

The R consortium was mentioned many times during the conference as a new place for companies to donate money to R projects. They are currently funding several projects but nothing substantial came out of it yet (maybe within the next year - since they are funding some cool projects).

Day 2


Kaggle seems to have some nice datasets. They are also working on encouraging sharing of data analysis. This is interesting.
It appears xgboost is getting better results than random forests. Also, Python is growing fast in popularity (due to one machine-learning area; I think it was deep learning, but I'm not sure).
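For anyone who hasn't tried it, here is a minimal xgboost run on the package's built-in agaricus (mushroom) data; the hyperparameters are arbitrary, not anything reported at the conference:

```r
# Minimal gradient-boosted trees with xgboost on its bundled example data.
library(xgboost)
data(agaricus.train, package = "xgboost")

bst <- xgboost(data = agaricus.train$data,      # sparse feature matrix
               label = agaricus.train$label,    # 0/1 outcome
               nrounds = 10,
               objective = "binary:logistic",
               verbose = 0)

pred <- predict(bst, agaricus.train$data)
mean((pred > 0.5) == agaricus.train$label)      # training accuracy
```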

A lot of people are thinking about how to teach R,
and also about how to teach statistics to non-R people using shiny:

The broom package (which gets model output into a tidy format, ready for piping into other functions, such as ggplot2 functions) is emerging as a very powerful tool, gaining more and more support from the community:
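To illustrate, broom turns a fitted model object into ordinary data frames:

```r
# broom in a nutshell: model objects become tidy data frames.
library(broom)

fit <- lm(mpg ~ wt + hp, data = mtcars)
tidy(fit)     # one row per coefficient: term, estimate, std.error, statistic, p.value
glance(fit)   # one-row model summary: r.squared, AIC, ...
```

Because the output is a plain data frame, it pipes straight into dplyr verbs or ggplot2 layers.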

Google is working on the next generation of R (called Rho):
This work is still preliminary but very interesting.

There is a gap in the literature about color schemes. A recent cognitive study made an interesting distinction between variance and bias in color-value recall:
The big take-home message for me was that palettes with many colors (such as rainbow) reduce variance, but their lack of perceptual uniformity results in bias. Sadly, viridis was not tested in this experiment.
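For comparison, here is how the two palettes look side by side in ggplot2 (using the viridis package; this only illustrates usage, and says nothing about the study above, which did not test viridis):

```r
# Same heatmap, two palettes: many-hued rainbow vs. perceptually uniform viridis.
library(ggplot2)
library(viridis)

p <- ggplot(faithfuld, aes(waiting, eruptions, fill = density)) +
  geom_raster()

p + scale_fill_gradientn(colours = rainbow(7))  # many hues, not perceptually uniform
p + scale_fill_viridis()                        # perceptually uniform alternative
```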

Day 3


There is now an ipython-notebook-like alternative within RStudio (!)
It is going to be very interesting to see the impact of this on people's workflows with R.

A nice package for helping with randomization tests:

My heatmap overview talk went well

A very nice work on a package for simulations with R:

Domino Lab is offering a GitHub-like system for data analysis, with as much transparency as they could think of. An interesting product:

Day 4


Making books with R and R Markdown is now easier:

There are now more specialized random forest packages for very wide or very long datasets; ranger is one of them:
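The ranger interface is close to the classic randomForest one; a minimal example (on iris, just to show the call, not a wide/long dataset):

```r
# Minimal ranger fit: a fast random forest implementation.
library(ranger)

fit <- ranger(Species ~ ., data = iris,
              num.trees = 500,
              probability = TRUE)   # class probabilities instead of hard votes

fit$prediction.error                # out-of-bag error estimate
```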

DataSHIELD is an interesting R-based framework for distributed, privacy-preserving data analysis:

Torsten is working on a more generalized way of describing various regression models:

On this day I chaired a lightning talk session, so I would recommend all of these talks as well :D






Jonathan Rosenblatt

Jul 17, 2016, 4:02:01 AM
To: israel-r-user-group
Thanks for the update!


--
You received this message because you are subscribed to the Google Groups "Israel R User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to israel-r-user-g...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Jonathan Rosenblatt
Dept. of Industrial Engineering and Management
Ben Gurion University of the Negev
