Austin ACM SIGKDD meeting Wednesday, September 19, 2012, 6:30 - 9:00 PM


David G. Boney

Sep 17, 2012, 12:48:05 AM
to aust...@googlegroups.com, semantic...@googlegroups.com
The next meeting of Austin ACM SIGKDD will be Wednesday, September 19, 2012, from 6:30 to 9:00 pm at the Northwest Recreation Center.

Our meeting dates for the Fall are:

Wednesday, September 19, 2012, 6:30 - 9:00 PM
Wednesday, October 17, 2012, 6:30 - 9:00 PM
Wednesday, November 21, 2012, 6:30 - 9:00 PM
Wednesday, December 19, 2012, 6:30 - 9:00 PM
------------------------------------

The topic of this meeting will be the random forest implementation in Mahout. Random forest is an ensemble-based method built on decision trees. This meeting will focus on the basics of decision trees, based on material from "Classification and Regression Trees" by Breiman, Friedman, Olshen, and Stone, as a step toward the random forest implementation in Mahout.
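For anyone who wants a head start on the decision tree basics, here is a minimal sketch in Python of how a classification tree can pick a split on one numeric feature using Gini impurity. This is an illustration only, not Mahout's code; the function names `gini` and `best_split` and the toy data are made up for this example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Find the threshold t for the test 'value <= t' with the lowest
    weighted Gini impurity. Returns (threshold, weighted_impurity);
    threshold is None if no split improves on the unsplit node."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        # Only consider thresholds between two distinct feature values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: one feature, two classes.
threshold, impurity = best_split([1, 2, 3, 10, 11, 12],
                                 ["a", "a", "a", "b", "b", "b"])
```

A full tree repeats this search over every feature and recurses on the two halves. A random forest then trains many such trees on bootstrap samples, with a random subset of features considered at each split, and combines their votes.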


________________

There is a new book on machine learning that was published in August, "Machine Learning: A Probabilistic Perspective", by Kevin Murphy. It covers many contemporary issues in probabilistic machine learning. It also comes with an implementation of the algorithms in MATLAB. If anyone wants to start a book club and work through the book and software, let me know. It is about 1,000 pages and 28 chapters. It will probably take a year to work through if we meet weekly.

----------------------------
I am starting an open source project to implement MDX and a data cube on Hadoop. Data cubes are a type of data warehousing technology that allows multidimensional modeling of data. MDX is a SQL-like query language and the industry-standard query language for data cubes. The architecture will be similar to Hive, in that there will be an MDX server to accept queries, and it will process the queries as Hadoop jobs over data stored in HDFS or in key-value stores like HBase or Cassandra.
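To give a feel for what a data cube aggregation looks like as a map/reduce job, here is a small single-machine sketch in Python. It is not the Hadoop Cube design (no code has been written yet); the names `map_fact`, `build_cube`, and `ALL`, and the sales data, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # placeholder meaning "aggregated over this dimension"


def map_fact(fact, dimensions, measure):
    """Map step: for one fact row, emit a (cell_key, value) pair for every
    subset of dimensions, replacing the dropped dimensions with ALL."""
    for r in range(len(dimensions) + 1):
        for kept in combinations(dimensions, r):
            key = tuple(fact[d] if d in kept else ALL for d in dimensions)
            yield key, fact[measure]


def build_cube(facts, dimensions, measure):
    """Reduce step: sum the values that share the same cell key."""
    cube = defaultdict(float)
    for fact in facts:
        for key, value in map_fact(fact, dimensions, measure):
            cube[key] += value
    return dict(cube)


# Hypothetical fact table with two dimensions and one measure.
facts = [
    {"city": "Austin", "year": 2011, "sales": 10.0},
    {"city": "Austin", "year": 2012, "sales": 5.0},
    {"city": "Dallas", "year": 2012, "sales": 7.0},
]
cube = build_cube(facts, ["city", "year"], "sales")
austin_total = cube[("Austin", ALL)]  # sales for Austin over all years
```

In this sketch, a query such as "total sales for Austin across all years" becomes a lookup of one precomputed cell. On Hadoop, the map and reduce steps would presumably run as distributed tasks over data in HDFS rather than over an in-memory list.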

My interest, in addition to building an aggregation layer, which is unique to data cubes, is to build in analysis tools for linear statistical modeling, machine learning, and data mining. I am currently doing research at Texas A&M on implementing linear statistical modeling in Hadoop for data cubes.

The project has a name, Hadoop Cube, and I have the domain name, hadoopcube.com. My goal for the Fall is to get the project accepted as an Apache incubator project. I would like to find a couple more people who would be interested in working on the project and adding their names to the application to Apache. I am in the architecture phase and no code has been written. The project management will be based on the Apache model. Your contribution will be measured by the code you contribute.

If you are interested in working on the project, we can talk after the Wednesday Austin ACM SIGKDD meeting.

For those of you who work in data warehousing, you know this will be the next big thing in the Hadoop ecosystem. For those of you who are not familiar with data warehousing, below are some links:

----------------------------
Please bring your laptop and a power strip.

Please join Austin ACM SIGKDD to continue receiving notices of our activities.

Northwest Recreation Center
2913 Northland Dr. (2222 and Mopac)
Austin, TX 78757

Please note I have a new mailing address, ch...@austin-acm-sigkdd.org
-----------------
