JFXBerns
Jan 1, 2012, 9:55:47 PM
to BigDataSG
Nice writeup in the PDF!
I think your assumptions are good.
If you have the Mac Minis lying about and want to build a testbed--
then install CentOS and network them with gigabit Ethernet. Cloudera
is the standard distribution for Hadoop, and they seem to be staying
ahead of the game with features and related apps (Sqoop, Flume, HUE,
Beeswax, etc.) -- and Cloudera, as you point out, loves CentOS.
Disks are usually the limiting factor; if you have big data and low
disk bandwidth--you have a bottleneck. If this is a testbed, just use
the internal HDs for now.
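If you want a rough number for that disk bottleneck before you commit, a quick sequential-write timing is enough for a testbed. Here's a minimal Python sketch -- the file size and block size are arbitrary choices for illustration, not anything Hadoop-specific, and a real benchmark would also test reads and dodge the page cache:

```python
import os
import tempfile
import time

def seq_write_mbps(size_mb=256, block_kb=1024):
    """Write size_mb of zeros sequentially to a temp file
    and return the observed throughput in MB/s."""
    block = b"\0" * (block_kb * 1024)
    blocks = (size_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp()
    try:
        start = time.time()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # make sure it actually hit the disk
        elapsed = time.time() - start
        return size_mb / elapsed
    finally:
        os.remove(path)

if __name__ == "__main__":
    print("sequential write: %.1f MB/s" % seq_write_mbps())
```

If the number you get is well below what your data volume demands, you know the internal HDs are your ceiling before you ever load HDFS.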
Hadoop does a good job of processing data where it
resides, but.... there are times when the data needs to move to a
different node to be processed. Wi-Fi on a Mac Mini might have a
theoretical limit of 300 Mbps--but Wi-Fi rarely comes close to its
theoretical limit; wired is always (except for extreme edge cases)
going to be far faster and more stable. Buy the switch.
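To put numbers on "buy the switch", here's a napkin calculation. The efficiency figures below are my own assumptions (Wi-Fi often delivers a third or less of its rated speed; wired gets close to line rate), and 100 GB is just an example shuffle size:

```python
def transfer_hours(data_gb, link_mbps, efficiency=1.0):
    """Hours to move data_gb over a link rated link_mbps,
    at the given fraction of its theoretical throughput."""
    megabits = data_gb * 8 * 1000  # GB -> megabits (decimal; fine for a napkin)
    return megabits / (link_mbps * efficiency) / 3600

if __name__ == "__main__":
    # Moving 100 GB between nodes under assumed efficiencies:
    for label, mbps, eff in [("802.11n theoretical", 300, 1.0),
                             ("802.11n realistic",   300, 0.3),
                             ("gigabit wired",      1000, 0.9)]:
        print("%-20s %6.2f hours" % (label, transfer_hours(100, mbps, eff)))
```

Realistic Wi-Fi comes out roughly ten times slower than gigabit wired for the same 100 GB, before you even count contention and dropouts.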
If you are going to build a bigger "production" cluster, go with
generic Intel-based hardware in a rack mount; that way you can tweak
the hardware to suit your applications better (RAID for
HDs, multiple NICs, etc.). Mac is Apple, and there is only "One Apple
Way" when it comes to hardware; tweaking hardware on an Apple
is.... well, it just isn't done.