Bunch of Mac minis as Hadoop Cluster?


Abhishek Parolkar

Dec 31, 2011, 12:11:16 AM
to bigd...@googlegroups.com, abhi...@viki.com
Hi All,
   At work, I have been running a Hadoop cluster on Large EC2 instances, and I am reaching the point where I need to move the cluster to bare-metal hardware. I could host it with a dedicated hosting provider, but I would like to host it myself to save cost and to be able to turn it off when I don't need it. Hosting 1U rack servers in-house brings power and cooling issues. Mac minis are pretty energy efficient and make no noise at all. Thunderbolt support makes them more interesting.

Have you seen anybody using Mac minis for a compute cluster? I am looking for opinions and experiences of using Mac minis for this purpose.

I have created a 1-page document [1] (attached) with references to lay out my reasoning in detail.

I am waiting to hear thoughts from the community :-)

Regards,
Abhishek Parolkar
http://parolkar.com


REMINDER: Did you sign up for the first meetup? http://www.facebook.com/events/277239498992690/


[1] http://goo.gl/XyjoF
MacMiniHadoopCluster.pdf

JFXBerns

Jan 1, 2012, 9:55:47 PM
to BigDataSG
Nice writeup in the PDF!

I think your assumptions are good.

If you have the Mac minis lying about and want to build a testbed,
then install CentOS and network them with gigabit Ethernet. Cloudera
is the standard distribution for Hadoop and they seem to be staying
ahead of the game with features and related apps (Sqoop, Flume, HUE,
Beeswax, etc.) and Cloudera, as you point out, loves CentOS.

Disks are usually the limiting factor; if you have big data and low
disk bandwidth, you have a bottleneck. If this is a testbed, just use
the internal HDs for now.

Hadoop does a good job of processing data where it resides, but
there are times when the data needs to move to a different node to be
processed. WiFi on Mac minis might have a theoretical limit of
300Mbps, but WiFi rarely comes close to its theoretical limit; wired
is always (except for extreme edge cases) going to be far faster and
more stable. Buy the switch.
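
To put rough numbers on the WiFi versus wired point, here is a back-of-the-envelope sketch. The 10 GB payload and the 30% real-world WiFi efficiency are illustrative assumptions, not measurements, and `transfer_seconds` is a made-up helper name; only the 300Mbps and gigabit link rates come from the discussion above.

```python
def transfer_seconds(data_gb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Estimate seconds needed to move data_gb gigabytes over a network link.

    link_mbps is the nominal link rate in megabits per second.
    efficiency is the assumed fraction of that rate actually achieved (0 < efficiency <= 1).
    """
    if link_mbps <= 0 or not (0 < efficiency <= 1):
        raise ValueError("link_mbps must be positive and efficiency must be in (0, 1]")
    data_megabits = data_gb * 8 * 1000  # decimal units: 1 GB = 8000 megabits
    return data_megabits / (link_mbps * efficiency)


if __name__ == "__main__":
    payload_gb = 10.0  # assumed amount of data that has to move between nodes
    scenarios = [
        ("WiFi at theoretical 300Mbps", 300, 1.0),
        ("WiFi at an assumed 30% of theoretical", 300, 0.3),
        ("Gigabit Ethernet at nominal rate", 1000, 1.0),
    ]
    for name, mbps, eff in scenarios:
        secs = transfer_seconds(payload_gb, mbps, eff)
        print(f"{name}: about {secs:.0f} s to move {payload_gb:.0f} GB")
```

Even at its theoretical limit the WiFi link takes over three times as long as gigabit Ethernet for the same payload, and the gap widens as the achieved WiFi rate drops.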

If you are going to build a bigger "production" cluster, go with
generic Intel-based hardware in a rack mount; that way you can tweak
the hardware performance to suit your applications better (RAID for
HDs, multiple NICs, etc.). Mac is Apple and there is only "One Apple
Way" when it comes to hardware, and tweaking hardware on an Apple
is... well, it just isn't.

T. H. Chiang

Mar 8, 2012, 7:37:42 PM
to bigd...@googlegroups.com, abhi...@viki.com, abhishek...@gmail.com
Too expensive to scale.