Spring Events


Tricia Hoffman

Jan 25, 2012, 1:53:49 PM
to machine-lea...@googlegroups.com

The next session of the survey course is starting:

2612 Introduction to Machine Learning and Data Mining

Sign up here: Machine Learning / Data Mining Survey Course


Dates: Jan 31, 2012 - Apr 3, 2012, 6:30 - 9:30 PM. The first nine sessions meet on Tuesdays; the tenth meets on a Monday.

Location: 2505 Augustine Drive, Santa Clara, CA

The Spring schedule for the Association for Computing Machinery Data Mining SIG is detailed below. 

This group meets on the fourth Monday of each month at LinkedIn:

LinkedIn
2025 Stierlin Ct.
Mountain View, CA 94043

Feb 27: Ron Bekkerman, LinkedIn
March 26: Michael Mahoney, Stanford
April 23: Lionel Jouffe, Bayesia Labs
May 28: Giovanni Seni, Intuit

Feb:

 

Title: Scaling Up Machine Learning: Parallel and Distributed Approaches

Abstract:

In this talk, I'll provide an extensive introduction to parallel and distributed machine learning. I'll answer questions such as "How big, actually, is big data?", "How much training data is enough?", "What do we do if we don't have enough training data?", and "What are the platform choices for parallel learning?" Using k-means clustering as a running example, I'll discuss the pros and cons of machine learning in Pig, MPI, DryadLINQ, and CUDA. Time permitting, I'll take a deep dive into parallel information-theoretic clustering.
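
(For a taste of the data-parallel pattern the talk compares across Pig, MPI, DryadLINQ, and CUDA, here is a minimal Python/NumPy sketch of one distributed k-means step. This is purely illustrative and not from the talk; the shard layout and helper names are invented for the example.)

# Data-parallel k-means step: each worker computes partial sums for its
# shard of the data; the driver merges them to update the centroids.
import numpy as np
from multiprocessing import Pool

def partial_stats(args):
    shard, centroids = args
    # Assign each point in this shard to its nearest centroid.
    dists = ((shard[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    labels = dists.argmin(1)
    k, d = centroids.shape
    sums, counts = np.zeros((k, d)), np.zeros(k)
    for j in range(k):
        mask = labels == j
        sums[j] = shard[mask].sum(0)
        counts[j] = mask.sum()
    return sums, counts

def kmeans_step(shards, centroids, pool):
    # "Map" phase: per-shard partial statistics; "reduce" phase: merge.
    results = pool.map(partial_stats, [(s, centroids) for s in shards])
    sums = sum(r[0] for r in results)
    counts = sum(r[1] for r in results)
    nonempty = counts > 0
    centroids = centroids.copy()
    centroids[nonempty] = sums[nonempty] / counts[nonempty, None]
    return centroids

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(10000, 2))
    shards = np.array_split(data, 4)   # stand-in for 4 machines
    centroids = data[rng.choice(len(data), 3, replace=False)]
    with Pool(4) as pool:
        for _ in range(10):
            centroids = kmeans_step(shards, centroids, pool)
    print(centroids)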

Bio: 

Ron Bekkerman is a senior research scientist at LinkedIn where he develops machine learning and data mining algorithms to enhance LinkedIn products. Prior to LinkedIn, he was a researcher at HP Labs. Ron completed his PhD in Computer Science at the University of Massachusetts Amherst in 2007. He holds BSc and MSc degrees from the Technion---Israel Institute of Technology. Ron has published on various aspects of clustering, including multimodal clustering, semi-supervised clustering, interactive clustering, consensus clustering, one-class clustering, and clustering parallelization.

March:

 

Title: Randomized algorithms for matrices and data

Presenter: Michael W. Mahoney, Stanford

Abstract:

Randomized algorithms for very large matrix problems (such as matrix multiplication, least-squares regression, the Singular Value Decomposition, etc.) have received a great deal of attention in recent years. Much of this work was motivated by problems in large-scale data analysis; this approach provides a novel paradigm and complementary perspective to traditional numerical linear algebra approaches to matrix computations; and the success of this line of work opens the possibility of performing matrix-based computations with truly massive data sets. Originating within theoretical computer science, this work was subsequently extended and applied in important ways by researchers from numerical linear algebra, statistics, applied mathematics, data analysis, and machine learning, as well as domain scientists.

In this talk, we will provide an overview of this approach, with an emphasis on a few simple core ideas that underlie not only recent theoretical advances but also the usefulness of these tools in large-scale data analysis applications. Crucial in this context is the connection with the concept of statistical leverage. Historically, this notion, and in particular the diagonal elements of the so-called hat matrix, has been used in regression diagnostics to identify errors and outliers. Recently, however, the connection with statistical leverage has proved crucial in the development of improved matrix algorithms that come with worst-case guarantees, are amenable to high-quality numerical implementation, and are also useful to domain scientists. We will describe these developments, how to approximate the statistical leverage scores very precisely in time qualitatively faster than the usual naive method, and an example of how these ideas can be applied in large-scale distributed and parallel computational environments.
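
(As a concrete illustration of the leverage-score idea, and my sketch rather than the speaker's code: the exact scores are the diagonal of the hat matrix H = X(X^T X)^{-1}X^T, computable stably as the squared row norms of Q from a thin QR factorization. A plain Gaussian random projection conveys the flavor of the fast approximation mentioned above, though the published algorithms use more structured transforms.)

# Statistical leverage scores of an n-by-d matrix X (n >> d).
import numpy as np

def leverage_exact(X):
    # Squared row norms of Q, where X = QR, equal diag of the hat matrix.
    Q, _ = np.linalg.qr(X, mode="reduced")
    return (Q ** 2).sum(axis=1)

def leverage_approx(X, oversample=20, seed=0):
    # Sketch X down to r rows so the QR step is cheap, then use the
    # sketched R: X R^{-1} is approximately orthonormal, so its squared
    # row norms approximate the leverage scores.
    n, d = X.shape
    r = oversample * d
    S = np.random.default_rng(seed).normal(size=(r, n)) / np.sqrt(r)
    _, R = np.linalg.qr(S @ X, mode="reduced")
    B = np.linalg.solve(R.T, X.T).T    # B = X @ inv(R)
    return (B ** 2).sum(axis=1)

X = np.random.default_rng(1).normal(size=(5000, 10))
exact, approx = leverage_exact(X), leverage_approx(X)
print("max relative deviation:", np.max(np.abs(approx - exact) / exact))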

Bio:
Michael Mahoney is at Stanford University. His research interests center around algorithms for very large-scale statistical data analysis, including both theoretical and applied aspects of problems in scientific and Internet domains. His current research interests include geometric network analysis; developing approximate computation and regularization methods for large informatics graphs; and applications to community detection, clustering, and information dynamics in large social and information networks. He has also worked on randomized matrix algorithms and their applications to genetics, medical imaging, and Internet problems. He has been a faculty member at Yale University and a researcher at Yahoo, and his PhD was in computational statistical mechanics at Yale University.

April:

 

Title: Introduction to Bayesian Belief Networks and their Applications

Presenter: Dr. Lionel Jouffe, cofounder and CEO of France-based Bayesia S.A.S.
Lionel Jouffe received his Ph.D. in Computer Science from the Université de Rennes I, Rennes, France, in 1997. After a year spent industrializing the results of his Ph.D. research (learning fuzzy inference systems with reinforcement methods, applied to an automatic pig-house atmosphere controller), he received the Inov'Space Award and the medal of the city of Rennes.
He joined the ESIEA as a Professor/Researcher in 1998 and began his research on Bayesian network learning from data. 
Lionel then co-founded Bayesia in 2001, a company specializing in Bayesian network technology. He and his team have been developing BayesiaLab since 1999, and it has emerged as the leading software package for knowledge discovery, data mining, and knowledge modeling using Bayesian networks. BayesiaLab enjoys broad acceptance in academic communities as well as in business and industry. The relevance of Bayesian networks, especially in the context of consumer research, is highlighted by Bayesia's strategic partnership with Procter & Gamble, which has deployed BayesiaLab globally since 2007.

Abstract: Bayesian belief networks emerged as a form of probabilistic knowledge representation and probabilistic inference engine through the seminal work of UCLA Professor Judea Pearl. Over the last 25 years, the properties of Bayesian networks have been thoroughly validated in academia, and they are now becoming powerful and practical tools for "deep understanding" of very complex, high-dimensional problem domains. Their computational efficiency and inherently visual structure make Bayesian belief networks very attractive for expert knowledge modeling, data mining, and causal analysis.

This tutorial will provide an introduction to the wide-ranging applications of Bayesian belief networks. No prior familiarity with Bayesian belief networks is required. We will start the seminar by illustrating the conceptual foundations with several textbook examples. This will include an overview of unsupervised learning (knowledge discovery), supervised learning (dependent-variable characterization), data clustering (segmentation), variable clustering (to find hidden concepts), and probabilistic structural equation models (mainly applied to driver analysis).
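
(For readers who want a peek ahead of the tutorial, the conceptual foundation fits in a few lines of Python: a joint distribution factored along a graph, queried by summing out hidden variables. This uses the textbook rain/sprinkler/wet-grass example with standard illustrative probability tables; it is independent of BayesiaLab and not from the tutorial itself.)

# Exact inference by enumeration on a tiny Bayesian network:
# Rain -> Sprinkler, and both -> WetGrass.
from itertools import product

P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: {True: 0.01, False: 0.99},    # P(S=s | Rain=r)
               False: {True: 0.4, False: 0.6}}
P_wet = {(True, True): 0.99, (True, False): 0.9,   # P(W=True | S, R)
         (False, True): 0.8, (False, False): 0.0}

def joint(r, s, w):
    # The network factors the joint: P(r) * P(s|r) * P(w|s,r).
    pw = P_wet[(s, r)] if w else 1.0 - P_wet[(s, r)]
    return P_rain[r] * P_sprinkler[r][s] * pw

# Query P(Rain=True | WetGrass=True) by summing out the sprinkler.
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
print(num / den)   # ~0.36 with these tables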

Bayesia will provide all participants with an unrestricted 30-day license of BayesiaLab 5.0 Professional Edition, so they can participate in exercises on their own laptops.

May:

Title: Advances in Regularization: Bridge Regression and Coordinate Descent Algorithms

Abstract: A widely held principle in statistical model inference is that accuracy and simplicity are both desirable. But there is a tradeoff between the two: a flexible (more complex) model is often needed to achieve higher accuracy, but it is more susceptible to overfitting and less likely to generalize well. Regularization techniques "damp down" the flexibility of a model-fitting procedure by augmenting the error function with a term that penalizes model complexity. Minimizing the augmented error criterion requires a certain increase in accuracy to "pay" for the increase in model complexity (e.g., adding another term to the model). This talk offers a concise introduction to the topic and a review of recent developments leading to very fast algorithms for parameter estimation with various types of penalties. It concludes with an example in R, showing an application of the techniques to a document classification task with one million predictors.
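
(The talk's closing example is in R; as a language-neutral preview, here is a minimal Python sketch of the core machinery: cyclic coordinate descent with soft-thresholding for the lasso penalty, the q = 1 case of the bridge-regression family that penalizes the sum of |b_j|^q. The data and settings are invented for illustration.)

# Cyclic coordinate descent for the lasso:
#   minimize (1/2n) * ||y - Xb||^2 + lam * ||b||_1
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    n, d = X.shape
    b = np.zeros(d)
    resid = y - X @ b
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(d):
            # Partial residual: add coordinate j's contribution back in.
            rho = X[:, j] @ resid / n + col_sq[j] * b[j]
            b_new = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (b[j] - b_new)   # keep residual current
            b[j] = b_new
    return b

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
true_b = np.zeros(50)
true_b[:3] = [3.0, -2.0, 1.5]               # sparse ground truth
y = X @ true_b + 0.1 * rng.normal(size=200)
print(np.round(lasso_cd(X, y, lam=0.1), 2)[:6])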

 

Bio:

Giovanni Seni is currently a Senior Data Scientist with Intuit. An active data mining practitioner in Silicon Valley, he has over 15 years of R&D experience in statistical pattern recognition, data mining, and human-computer interaction applications. He has been a member of the technical staff at large technology companies and a contributor at smaller organizations. He holds five US patents and has published over twenty conference and journal articles. His book with John Elder, "Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions," was published in February 2010 by Morgan & Claypool. Giovanni is also adjunct faculty in the Computer Engineering Department at Santa Clara University, where he teaches an Introduction to Pattern Recognition and Data Mining class.

-- 
Patricia Hoffman PhD



Attachment: ACMDataMiningSIGUsamaFayyadJan2012.pdf