http://spectregroup.wordpress.com/2009/10/30/make-your-own-supercompute/
PLAYSTATION 3 MODIFICATION TUTORIAL
http://www.ps3cluster.umassd.edu/
http://www.xbox360forum.com/forum/chit-chat/87640-scientists-use-ps3s-create-supercomputer.html
Computer hobbyists and researchers take note: two U.S. scientists have
created a step-by-step guide on how to build a supercomputer using
multiple PlayStation 3 video-game consoles. The instructional guide,
posted this week online at
ps3cluster.org, allows users with some
programming knowledge to install a version of the open-source
operating system Linux on the video consoles and connect a number of
consoles into a computing cluster or grid. The two researchers say the
guide could provide scientists with another, cheaper alternative to
renting time on supercomputers to run their simulations.
University of Massachusetts Dartmouth physics professor Gaurav Khanna
first built the cluster a year ago to run his simulations estimating
the gravitational waves produced when two black holes merged.
Frustrated with the cost of renting time on supercomputers, which he
said can cost as much as $5,000 to run a 5,000-hour simulation, Khanna
decided to set up his own computer cluster using PS3s, which had both
a powerful processor developed by Sony, IBM and Toshiba and an open
platform that allows different system software to run on it.
PlayStation 3 systems retail for about $400 Cdn. In the how-to guide,
Khanna says the eight-console cluster is roughly comparable in speed
to a 200-node IBM Blue Gene supercomputer. Khanna says his research
now runs using a cluster of 16 PS3s. The fastest supercomputer in the
world, IBM's Roadrunner supercomputer at Los Alamos National
Laboratory, has 3,250 nodes and is capable of 1.105 petaflops, or
1.105 quadrillion floating point operations per second, about 100,000
times faster than a home computer.
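The Roadrunner comparison above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the two inputs (1.105 petaflops and the "100,000 times" ratio) come from the text, while the speed of the "home computer" is simply what those two numbers imply, not a measured figure.

```python
PETAFLOP = 1e15  # one quadrillion floating point operations per second


def implied_baseline_flops(peak_petaflops, speedup_ratio):
    """Speed of the slower machine implied by a peak speed and a ratio."""
    return peak_petaflops * PETAFLOP / speedup_ratio


# Figures quoted in the article.
roadrunner_petaflops = 1.105
ratio_vs_home_computer = 100_000

home_flops = implied_baseline_flops(roadrunner_petaflops, ratio_vs_home_computer)
print(f"Implied home computer speed: {home_flops / 1e9:.2f} gigaflops")
```

In other words, the comparison assumes a home computer good for roughly 11 billion operations per second.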
Massachusetts Dartmouth computer scientist Chris Poulin, who co-wrote
the instructional manual with Khanna, wouldn't reveal the number of
flops the system can achieve, but said anecdotally the cluster has
allowed him to run simulations in hours that used to take days on a
powerful server computer. Khanna's not the first researcher to use
PS3s to simulate the effects of a supercomputer. Stanford University's
Folding@home project allows people to help with research
into how proteins self-assemble — or fold — by downloading software
onto their home PS3s, creating a virtual supercomputer. Their research
is currently targeting proteins relevant to diseases such as
Alzheimer's and Huntington's disease. But the guide posted by Khanna
and Poulin is the first that might allow someone to set up a
supercomputer in their own home.
Poulin said there are two major issues, however, that might limit the
practicality of a PS3 cluster supercomputer. The first issue is power.
He said the video-game consoles use about 200 to 300 watts per unit, so
finding a room where eight of the consoles could be hooked up might be
an issue for hobbyists. "I think if you put four or
more than four of the systems on one plug you'd probably blow a fuse,"
Poulin told CBC News. The second issue is memory. The console has only
256 MB of RAM, far less than most personal computers available now.
Poulin said that while the low memory wouldn't be a problem for
straightforward computations, running multiple simulations or programs
could tax the system. As a result, simulations running on the cluster
would have to be tailored to consider the cluster's memory
limitations. Poulin said he hopes the project will help open doors to
more partnerships between industry and universities that will lead to
better access to supercomputing power. "That's ultimately the goal
here," he said. "We want to make things easier, no matter what kind of
supercomputer you are using."
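Poulin's power concern can be checked with rough arithmetic. The sketch below assumes a 120 V household circuit on a 15 A breaker, loaded to no more than 80 percent for continuous use. None of those three values come from the article, so substitute the figures for your own wiring. Only the 200 to 300 watt draw per console is taken from the text.

```python
def max_consoles(watts_per_console, volts=120.0, amps=15.0, derate=0.8):
    """How many consoles fit on one circuit.

    volts, amps and derate are ASSUMPTIONS (a typical North American
    circuit and a common continuous-load rule of thumb), not values
    from the article.
    """
    usable_watts = volts * amps * derate
    return int(usable_watts // watts_per_console)


def cluster_draw(consoles, watts_per_console):
    """Total draw in watts for a cluster of identical consoles."""
    return consoles * watts_per_console


# Per-console draw quoted by Poulin: about 200 to 300 watts.
for watts in (200, 300):
    print(f"{watts} W each: {max_consoles(watts)} per circuit, "
          f"eight consoles draw {cluster_draw(8, watts)} W")
```

Under these assumptions a single circuit carries four consoles at the high end of the quoted draw, so an eight-console cluster would need to be spread across at least two circuits.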
CONTACT
Gaurav Khanna
http://www.umassd.edu/engineering/phy/people/faculty/gkhanna/welcome.cfm
email : gkhanna [at] umassd [dot] edu
Lior Burko
http://gravity.uah.edu/~burko/
email : burko [at] uah [dot] edu
PREVIOUSLY ON SPECTRE
http://spectregroup.wordpress.com/2008/04/06/gravity-waves/
GRAVITY GRID
http://gravity.phy.umassd.edu/ps3.html
http://www.wired.com/techbiz/it/news/2007/10/ps3_supercomputer
Astrophysicist Replaces Supercomputer with Eight PlayStation 3s
BY Bryan Gardiner / 10.17.07
Gaurav Khanna’s eight PlayStation 3s aren’t running Heavenly Sword --
they’re using Linux plus custom code to solve complex computations.
Suffering from its exorbitant price point and a dearth of titles,
Sony's PlayStation 3 isn't exactly the most popular gaming platform on
the block. But while the console flounders in the commercial space,
the PS3 may be finding a new calling in the realm of science and
research. Right now, a cluster of eight interlinked PS3s is busy
solving a celestial mystery involving gravitational waves and what
happens when a super-massive black hole, about a million times the
mass of our own sun, swallows up a star.
As the architect of this research, Dr. Gaurav Khanna is employing his
so-called "gravity grid" of PS3s to help measure these theoretical
gravity waves -- ripples in space-time that travel at the speed of
light -- that Einstein's Theory of Relativity predicted would emerge
when such an event takes place. It turns out that the PS3 is ideal for
doing precisely the kind of heavy computational lifting Khanna
requires for his project, and the fact that it's a relatively open
platform makes programming scientific applications feasible. "The
interest in the PS3 really was for two main reasons," explains Khanna,
an assistant professor at the University of Massachusetts, Dartmouth
who specializes in computational astrophysics. "One of those is that
Sony did this remarkable thing of making the PS3 an open platform, so
you can in fact run Linux on it and it doesn't control what you do."
He also says that the console's Cell processor, co-developed by Sony,
IBM and Toshiba, can deliver massive amounts of power, comparable even
to that of a supercomputer -- if you know how to optimize code and
have a few extra consoles lying around that you can string together.
"The PS3/Linux combination offers a very attractive cost-performance
solution whether the PS3s are distributed (like Sony and Stanford's
Folding@home initiative) or clustered together (like Khanna's)," says
Sony's senior development manager of research and development, Noam
Rimon.
According to Rimon, the Cell processor was designed as a parallel
processing device, so he's not all that surprised the research
community has embraced it. "It has a general purpose processor, as
well as eight additional processing cores, each of which has two
processing pipelines and can process multiple numbers, all at the same
time," Rimon says. This is precisely what Khanna needed. Prior to
obtaining his PS3s, Khanna relied on grants from the National Science
Foundation (NSF) to use various supercomputing sites spread across the
United States. "Typically I'd use a couple hundred processors -- going
up to 500 -- to do these same types of things." However, each of those
supercomputer runs cost Khanna as much as $5,000 in grant money. Eight
60 GB PS3s would cost just $3,200, by contrast, but Khanna figured he
would have a hard time convincing the NSF to give him a grant to buy
game consoles, even if the overall price tag was lower. So after
tweaking his code this past summer so that it could take advantage of
the Cell's unique architecture, Khanna set about petitioning Sony for
some help in the form of free PS3s. "Once I was able to get to the
point that I had this kind of performance from a single PS3, I think
that's when Sony started paying attention," Khanna says of his
optimized code.
Khanna says that his gravity grid has been up and running for a little
over a month now and that, crudely speaking, his eight consoles are
equal to about 200 of the supercomputing nodes he used to rely on.
"Basically, it's almost like a replacement," he says. "I don't have to
use that supercomputer anymore, which is a good thing. For the same
amount of money -- well, I didn't pay for it, but even if you look
into the amount of funding that would go into buying something like
eight PS3s -- for the same amount of money I can do these runs
indefinitely." The point of the simulations Khanna and his team at
UMass are running on the cluster is to see if gravitational waves,
which have been postulated for almost 100 years but have never been
observed, are strong enough that we could actually observe them one
day. Indeed, with NASA and other agencies building some very big
gravitational wave observatories with the sensitivity to be able to
detect these waves, Khanna sees his work as complementary to such
endeavors. Khanna expects to publish the results of his research in
the next few months. So while PS3 owners continue to wait for a fuller
range of PS3 titles and low prices, at least they'll have some reading
material to pass the time.
EARLY ADOPTERS
http://www.redorbit.com/news/technology/4631/scientists_create_supercomputer_from_sony_playstations/index.html
Scientists Create Supercomputer from Sony Playstations
BY John Markoff / 27 May 2003
NY Times/CNET News -- As perhaps the clearest evidence yet of the
power of sophisticated but inexpensive game consoles, the National
Center for Supercomputing Applications at the University of Illinois
at Urbana-Champaign has assembled a supercomputer from an army of Sony
PlayStation 2 devices. The resulting system, with components purchased
at retail prices, cost a little more than $50,000. Researchers at the
supercomputing center believe the system may be capable of a half
trillion operations a second, well within the definition of
supercomputer, although it may not rank among the world's 500 fastest
supercomputers.
Perhaps the most striking aspect of the project, which uses the open-
source Linux operating system, is that the only hardware engineering
involved was placing 70 of the individual game machines in a rack and
plugging them together with a high-speed Hewlett-Packard network
switch. The center's scientists bought 100 machines but are holding 30
in reserve, possibly for high-resolution display application. "It took
a lot of time because you have to cut all of these things out of the
plastic packaging," said Craig Steffen, a senior research scientist at
the center, who is one of four scientists working part time on the
project. The scientists are taking advantage of a standard component
of the PS2 that was originally intended to move and transform pixels
rapidly on a television screen to produce lifelike graphics.
That chip is not the PlayStation 2's MIPS microprocessor, but rather a
graphics co-processor known as the Emotion Engine. That custom-
designed silicon chip is capable of producing up to 6.5 billion
mathematical operations a second. The impressive performance of the
game machine, which has been on the market for a few years,
underscores a radical shift that has taken place in the computing
world since the end of the Cold War in the late 1980s, according to
the researchers. While the most advanced computing technologies have
historically been developed first for large corporate users and
military contractors, increasingly the fastest computers are being
developed for the consumer market and for products meant to be placed
under Christmas trees. "If you look at the economics of game platforms
and the power of computing on toys, this is a long-term market trend
and computing trend," said Dan Reed, the supercomputing center's
director. "The economics are just amazing. This is going to drive the
next big wave in high-performance computing."
The scientists have their eyes on a variety of consumer hardware, he
said. For example Nvidia, the maker of graphics cards for PCs, is now
selling a high-performance graphics card capable of executing 51
billion mathematical operations per second. The pace of the consumer
computing world is moving so quickly that the researchers are building
the PlayStation 2-based supercomputer as an experiment to see how
quickly they can take advantage of off-the-shelf, low-cost
technologies. "I think we'd like to be able to transfer a lot of our
experience to the next generation," he said. Despite the computing
promise of game consoles that sell for less than $200, the researchers
acknowledged that the experiment was likely to be most useful for a
group of relatively narrow scientific problems. They added that while
the system was already doing scientific calculations, they cannot be
certain about its ultimate computing potential until they write more
carefully tuned software routines that can move data in and out of the
custom processor quickly.
The limited memory of the Sony game console--32MB of memory--would
also restrict the practical applications of the supercomputer, they
said. But they noted that the computer was already running useful
calculations on quantum chromodynamics, or QCD, simulations. QCD is a
theory concerning the so-called strong interactions that bind
elementary particles like quarks and gluons together to form hadrons,
the constituents of nuclear matter. The ability to lower the cost of
QCD simulation in itself would be significant, the researchers said,
because such problems are the single largest consumer of computing
resources on supercomputers at the Department of Energy and the
National Energy Research Scientific Computing Center.
Still, several supercomputer experts said that the memory and
computing bandwidth limitations of the PlayStation would prohibit
broader applications of the machine. Gordon Bell, a Microsoft computer
scientist and a veteran of the supercomputer world, said the
PlayStation supercomputer might find its best application as a
computer for the large digital display walls that are used by the
Defense Department. Bell awards annual computing prizes that include a
category for the best price/performance in high performance computing.
"They should enter my contest," he said. The supercomputing center's
scientists said they had chosen the PlayStation 2 because Sony sells a
special Linux module that includes a high-speed network connection and
a disk drive. By contrast, it is almost impossible for researchers to
install the Linux system on Microsoft's Xbox game console. Using a
network of machines is not a new concept in the supercomputing world.
Linux, which plays a major role in that world, has been used to
assemble high-performance parallel computers built largely out of
commodity hardware components. These machines are generally called
Beowulf clusters.
ACADEMIC
http://www.engr.ncsu.edu/news/news_articles/ps3.html
http://blogs.techrepublic.com.com/itdojo/?p=359
PS3 supercomputer illustrates innovative IT cost savings
BY Bill Detwiler / March 4th, 2009
Back in 2007, Dr. Frank Mueller, an associate professor of computer
science at North Carolina State University, created a supercomputing
cluster of eight Sony PS3 systems. At the time, Mueller was quoted by
NC State University’s Engineering News as saying, “Places like Google,
the stock market, automotive design companies and scientists use
clusters, but this is the first academic computer cluster built from
PlayStation 3s.” Computer scientists at The University of Alabama in
Huntsville and the University of Massachusetts, Dartmouth, have taken
the clustering idea a step further and recently published research
using simulations run on the Sony game systems. Dr. Gaurav Khanna, an
assistant physics professor at UMass Dartmouth, and Dr. Lior Burko, an
assistant physics professor at UAHuntsville, used a cluster of 16
PlayStation 3s, dubbed the PS3 Gravity Grid, to simulate a vibrating
black hole and determine the speed at which it stops vibrating. Why
use PS3s and not a traditional supercomputing platform, such as the
National Science Foundation’s TeraGrid? Cost. In a PhysOrg.com article
on the PS3 project, Burko was quoted as saying “If we had rented
computing time from a supercomputer center it would have cost us about
$5,000 to run our simulation one time.” And, for their experiment,
Khanna and Burko needed to run the simulation dozens of times.
Considering a new 80GB PS3 retails for about $400, the 16 PS3s needed
for Khanna’s cluster would cost around $6,400. For just over the cost
of a single run, researchers were able to build a resource that they
could use over and over again.
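The cost argument above can be laid out as a small calculation. The console count, console price and per-run rental cost are the figures quoted in the article. The run count of 24 is a purely illustrative stand-in for "dozens of times", and electricity and staff time are ignored.

```python
import math


def cluster_cost(consoles, price_each):
    """One-off hardware cost of the cluster in dollars."""
    return consoles * price_each


def break_even_runs(hardware_cost, cost_per_run):
    """Number of rented runs after which buying hardware is cheaper."""
    return math.ceil(hardware_cost / cost_per_run)


def rental_cost(runs, cost_per_run):
    """Total cost of renting supercomputer time for a number of runs."""
    return runs * cost_per_run


hardware = cluster_cost(consoles=16, price_each=400)   # figures from the article
per_run = 5_000                                        # figure from the article
hypothetical_runs = 24                                 # ASSUMPTION: "dozens" of runs

print(f"Cluster hardware: ${hardware:,}")
print(f"Break-even after {break_even_runs(hardware, per_run)} rented runs")
print(f"Renting {hypothetical_runs} runs: ${rental_cost(hypothetical_runs, per_run):,}")
```

The hardware pays for itself by the second run. Every run after that is where the savings described in the article come from.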
CONTACT
Frank Mueller
http://moss.csc.ncsu.edu/~mueller/
http://moss.csc.ncsu.edu/~mueller/cluster/ps3/
email : mueller [at] cs.ncsu [dot] edu
TOYS NO LONGER
http://www.telegraph.co.uk/science/science-news/3325757/Why-scientists-love-games-consoles.html
Why scientists love games consoles
BY Roger Highfield / 17 Feb 2008
Leading scientists are turning to the extraordinary power of games
consoles to do their sums and simulate everything from colliding black
holes to the effects of drugs. Reprogram a PlayStation and it will
perform feats that would be unthinkable on an ordinary PC because the
kinds of calculations required to produce the realistic graphics now
seen in sophisticated video games are similar to those used by
chemists and physicists as they simulate the interactions between
particles ranging from the molecular to the astronomical. Such
simulations are usually carried out on a supercomputer, but time on
these machines is expensive and in short supply. By comparison, games
consoles are cheap and easily available, says New Scientist. "There is
no doubt that the entertainment industry is helping to drive the
direction of high performance computational science - exploiting the
power available to the masses will lead to many research breakthroughs
in the future," comments Prof Peter Coveney of University College
London, who uses supercomputing in chemistry.
Prof Gaurav Khanna at the University of Massachusetts has used an
array of 16 PS3s to calculate what will happen when two black holes
merge. According to Prof Khanna, the PS3 has unique features that make
it suitable for scientific computations, namely, the Cell processor
dubbed a "supercomputer-on-a-chip." And it runs on Linux, "so it does
not limit what you can do. A single high-precision simulation can
sometimes cost more than 5,000 hours on the TeraGrid supercomputers.
For the same cost, you can build your own supercomputer using PS3s. It
works just as well, has no long wait times and can be used over and
over again, indefinitely," Prof Khanna says.
And Todd Martínez has persuaded the supercomputing centre at the
University of Illinois, Urbana-Champaign, to buy eight computers each
driven by two of the specialised chips that are at the heart of Sony's
PlayStation 3 console. Together with his student Benjamin Levine he is
using them to simulate the interactions between the electrons in
atoms, as part of work to see how proteins in the body dovetail with
drug molecules. He was inspired while browsing through his son's games
console's technical specification. "I noticed that the architecture
looked a lot like high performance supercomputers I had seen before,"
he says. "That's when I thought about getting one for myself."
An effort to interconnect tens of thousands of PS3s is under way with
Folding@home, an effort based at Stanford University to study the way
proteins fold, which plays a key role in Alzheimer's, Huntington's
Disease and Parkinson's disease. With about 50,000 such machines, the
organisers of this huge distributed computing effort hope to achieve
performance on the petaflop scale. The Wii, made by Nintendo, has a
motion tracking remote control unit that is cheaper than a comparable
device built from scratch. The device recently emerged as a tool to
help surgeons to improve their technique. Meanwhile, neurologist
Thomas Davis at the Vanderbilt Medical Centre in Nashville, Tennessee,
is using it to measure movement deficiencies in Parkinson's patients
to assess how well a patient can move when they take part in drug
trials.
FOLDING @ HOME
http://folding.stanford.edu/English/FAQ-PS3
http://www.wired.com/gamelife/2008/02/foldinghome-rea/
Folding@home Reaches Million PS3-User Milestone
BY Susan Arendt / February 4, 2008
Sony recently announced that more than one million PlayStation 3
owners are taking part in Folding@home, the distributed computing
project run by Stanford University. The participation of PS3 owners in
Folding@home allows the project "to address questions previously
considered impossible to tackle computationally, with the goal of
finding cures to some of the world’s most life-threatening diseases,"
said project lead Vijay Pande. More than one million PS3 owners as
registered participants breaks down to about two new registrants per
minute, or about 3,000 new Folding@home members per day.
Folding@home’s mission is to try and better understand how proteins
fold, and how misfolds are related to various diseases like cancer,
Alzheimer’s and Parkinson’s. PS3s currently comprise about 74 percent
of the entire computing power of Folding@home. When the project
achieved a petaflop in September, it officially became the most
powerful distributed computing network in the world, at least
according to folks at Guinness World Records. A network of 10,000 PS3s
can accomplish the same amount of Folding@home work as 100,000 PCs,
making their computational ability an invaluable asset to the project.
XBOX
http://hackingthexbox.com/
http://xbox-linux.sourceforge.net/
http://www.shadowflux.com/xbox.html
http://www.bgfax.com/xbox/home.html
http://www.extremelinux.info/stonesoup/
http://www.free60.org/
http://www.llamma.com/xbox/beowulf.htm
http://www.llamma.com/xbox/links.htm
BABY STEPS
http://research.microsoft.com/apps/pubs/default.aspx?id=79271
http://news.bbc.co.uk/2/hi/technology/8254159.stm
Researchers have harnessed the powerful silicon chips used in the Xbox
360 console to solve scientific conundrums. Academics at the
University of Warwick believe they are the first to use the processors
as a cheap way to conduct "parallel processing". Parallel computing is
where a number of processors are run in tandem, allowing a system to
rapidly crunch data. Researchers traditionally have to book time on a
dedicated "cluster" system or splash out setting up a network of PCs.
Instead, the Warwick team harnessed a single Xbox 360 Graphical
Processing Unit (GPU). The chip was able to perform parallel
processing functions at a fraction of the cost of traditional systems.
Dr Simon Scarle, a researcher on the team, built the system to help
him model how electrical signals in the heart moved around damaged
cardiac cells. Dr Scarle, who previously worked as a software engineer
at Microsoft's Rare studio, had first hand experience of tapping into
the power of GPU technology.
Speaking to BBC News, Dr Scarle said that the code controlling the
chip was modified, so instead of working out graphical calculations,
it could perform other ones instead. "You don't quite get the full
whammy of a cluster, but it's close," he said. "Instead of pumping out
stunning graphics, it's reworked; in the case of my research, rather
than calculating the position of a structure and texture it's now
working out the different chemical levels in a cell."
Real world computing
There has been cross-pollination between game consoles and real world
computing in the past. Roadrunner, officially the world's fastest
supercomputer, uses the same processor technology as that found in
Sony's PlayStation 3. However, it is thought that this is the first
time an Xbox has been used to perform parallel processing, albeit on a
single chip.
Dr Scarle said that linking more than one Xbox together using the
techniques would not be impossible. "It could be done, but you would
have to go over the internet - through something like Xbox Live -
rather than a standard method. However, without development tools, it
wouldn't be easy." Xbox Live allows gamers to play against each other
over the internet. "Sony have been into this [parallel processing] for
some time, releasing development kits, and Folding@home comes as
standard," he added.
Folding@home is a project that harnesses the spare processing power of
PCs, Macs, Linux systems and PlayStation 3s to help understand the
cause of diseases. The network has more than 4.3 petaflops of computing
power - the equivalent of more than 4,300 trillion calculations per
second. Roadrunner, by comparison, can operate at just over one
petaflop. The results of the University of Warwick research are
published in the journal Computational Biology and Chemistry.
CONTACT
Simon Scarle
http://research.microsoft.com/en-us/people/sscarle/
http://www.eurekalert.org/multimedia/pub/16714.php
email : S.Scarle [at] warwick.ac [dot] uk
OPEN CL
http://www.khronos.org/opencl/
http://eprints.cs.vt.edu/archive/00001081/01/gpu.pdf
http://zikkir.com/scitech/13097
New Software Could Smooth Supercomputing Speed Bumps
BY Larry Greenemeier / 16 October 2009
Supercomputers have long been an indispensable, albeit expensive, tool
for researchers who need to make sense of vast amounts of data. One
way that researchers have begun to make high-speed computing more
powerful and also more affordable is to build systems that split up
workloads among fast, highly parallel graphics processing units (GPUs)
and general-purpose central processing units (CPUs).
There is, however, a problem with building these co-processed
computing hot rods: A common programming interface for the different
GPU models has not been available. Even though the lion’s share of
GPUs are made by Advanced Micro Devices, Inc. (AMD) and NVIDIA Corp.,
the differences between the two companies’ processors mean that
programmers have had to write software to meet the requirements of the
particular GPU used by their computers.
Now, this is changing as AMD, NVIDIA and their customers (primarily
computer- and game system–makers) throw their support behind a
standard way of writing software called the Open Computing Language
(OpenCL), which works across both GPU brands. A longer-term goal
behind OpenCL is to create a common programming interface that will
even let software writers create applications that run on both GPUs and
CPUs with few modifications, cutting the time and effort required to
harness supercomputing power for scientific endeavors.
Researchers at Virginia Polytechnic Institute and State University
(Virginia Tech) in Blacksburg, Va., are hoping that OpenCL can help
them write software that can run on GPUs made either by AMD or NVIDIA.
Using a computer equipped with both a CPU and an AMD GPU, the Virginia
Tech researchers were able to compute and visualize biomolecular
electrostatic surface potential (pdf) 1,800 times faster (from 22.4
hours to less than a minute) than they could with a similar computer
driven only by a CPU.
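The speedup figures quoted above are easy to sanity-check. The sketch below uses only the two numbers from the text (a 22.4-hour baseline and a factor of 1,800) and converts the result to seconds.

```python
SECONDS_PER_HOUR = 3600


def accelerated_seconds(baseline_hours, speedup):
    """Run time in seconds after applying a speedup factor to a baseline."""
    return baseline_hours * SECONDS_PER_HOUR / speedup


gpu_seconds = accelerated_seconds(baseline_hours=22.4, speedup=1800)
print(f"CPU+GPU run time: about {gpu_seconds:.1f} seconds")
```

That works out to roughly 45 seconds, consistent with "less than a minute".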
The National Institutes of Health (NIH) has committed more than $1.3
million in funding from 2006 through 2011 for a project led by Alexey
Onufriev, an associate professor in Virginia Tech’s departments of
Computer Science and Physics, to represent water computationally,
because water is key to modeling biological molecules. “When you model
a molecule at the atomic level,” Onufriev says, “you need to know the
impact that water will have on that model.”
This is the type of program that maps quite well to GPUs, says Wu Feng,
director of Virginia Tech’s Synergy Laboratory and an associate
professor in the school’s departments of Computer Science and
Electrical & Computer Engineering. “These applications tend to be
compute-intensive and regular in their computation,” he adds, “regular
in the sense that you’re calculating electrostatic potential between
pairs of points.”
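What "regular" means here can be illustrated with a generic Coulomb sum. This is not the Virginia Tech program, only a minimal sketch in plain Python that uses arbitrary units (the Coulomb constant is set to 1) and hypothetical function names. The point is that every output value is the same simple formula applied to a different pair of points, with no branching, which is the pattern that maps well onto a GPU.

```python
import math

COULOMB_K = 1.0  # ASSUMPTION: arbitrary units, constant set to 1 for clarity


def potential_at(point, charges):
    """Electrostatic potential at one point from a list of point charges.

    charges is a list of ((x, y, z), q) tuples. The point must not
    coincide with a charge position, or the distance would be zero.
    """
    total = 0.0
    for position, q in charges:
        total += COULOMB_K * q / math.dist(point, position)
    return total


def potential_map(points, charges):
    """Potential at many points. Each entry is independent of the others,
    so on a GPU each one could be handed to its own thread."""
    return [potential_at(p, charges) for p in points]


# Illustrative data: one positive and one negative unit charge.
charges = [((0.0, 0.0, 0.0), 1.0), ((2.0, 0.0, 0.0), -1.0)]
surface_points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (3.0, 0.0, 0.0)]
print(potential_map(surface_points, charges))
```

The work grows with the number of surface points times the number of charges, which is why large molecules take hours on a single CPU.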
CPUs, however, are better suited than GPUs to computing tasks that
require the computer to make a decision. For example, if a string of
computing tasks were likened to a line of people waiting to enter a
stadium, Feng says, the GPU would be very good at dividing up the
people into multiple lines and taking their tickets as they enter—as
long as everyone has the same type of ticket. If some people had
special tickets that allowed them to go backstage or entitled them to
some other privilege, it would greatly slow the GPU’s capabilities as
the processor decided what to do with the nonconformists. “GPUs work
well today when they are given a single instruction for a repetitive
task,” he adds.
Feng and his team are adapting an electrostatic potential program for
Onufriev’s lab so that it will work specifically on computers running
GPUs made by AMD. Feng notes that as OpenCL is embraced more widely,
he will be able to write programs that can communicate with any type
of GPU supporting OpenCL, regardless of manufacturer, and eventually
write code that provides instructions for both CPUs and GPUs. (Earlier
this week, AMD made available the latest version of its software
development tools that the company says allows programmers to use
OpenCL to write applications that let GPUs operate in concert with
CPUs.)
With this type of computing power and versatility, Onufriev says many
limitations will be lifted regarding the types of research he can
tackle. Another of his projects is studying how the nearly two meters
of DNA in each cell is packed into the cell’s nucleus. “The way DNA is
packed determines the genetic message,” he says. “No one knows exactly
how this works. We’re hoping to get stacks of GPU machines where we
can run simulations requiring massive computations that help us better
understand DNA packing.” Such work would be aided greatly by systems
that can make use of both GPUs and CPUs.
BEOWULF
http://www.beowulf.org/
http://www.cacr.caltech.edu/research/beowulf/
http://beowulf-underground.org/
SALVAGED PCs
http://stonesoup.esd.ornl.gov/
http://extremelinux.esd.ornl.gov/
http://www.extremelinux.info/stonesoup/
http://cva.stanford.edu/classes/cs99s/papers/sterling-hypercomputer.pdf
http://www.scientificamerican.com/article.cfm?id=the-do-it-yourself-superc
The Do-It-Yourself Supercomputer
BY William W. Hargrove, Forrest M. Hoffman and Thomas Sterling
In the well-known stone soup fable, a wandering soldier stops at a
poor village and says he will make soup by boiling a cauldron of water
containing only a shiny stone. The townspeople are skeptical at first
but soon bring small offerings: a head of cabbage, a bunch of carrots,
a bit of beef. In the end, the cauldron is filled with enough hearty
soup to feed everyone. The moral: cooperation can produce significant
achievements, even from meager, seemingly insignificant contributions.
Researchers are now using a similar cooperative strategy to build
supercomputers, the powerful machines that can perform billions of
calculations in a second. Most conventional supercomputers employ
parallel processing: they contain arrays of ultrafast microprocessors
that work in tandem to solve complex problems such as forecasting the
weather or simulating a nuclear explosion. Made by IBM, Cray and other
computer vendors, the machines typically cost tens of millions of
dollars--far too much for a research team with a modest budget. So
over the past few years, scientists at national laboratories and
universities have learned how to construct their own supercomputers by
linking inexpensive PCs and writing software that allows these
ordinary computers to tackle extraordinary problems.
In 1996 two of us (Hargrove and Hoffman) encountered such a problem in
our work at Oak Ridge National Laboratory (ORNL) in Tennessee. We were
trying to draw a national map of ecoregions, which are defined by
environmental conditions: all areas with the same climate, landforms
and soil characteristics fall into the same ecoregion. To create a
high-resolution map of the continental U.S., we divided the country
into 7.8 million square cells, each with an area of one square
kilometer. For each cell we had to consider as many as 25 variables,
ranging from average monthly precipitation to the nitrogen content of
the soil. A single PC or workstation could not accomplish the task. We
needed a parallel-processing supercomputer--and one that we could
afford!
Our solution was to construct a computing cluster using obsolete PCs
that ORNL would have otherwise discarded. Dubbed the Stone
SouperComputer because it was built essentially at no cost, our
cluster of PCs was powerful enough to produce ecoregion maps of
unprecedented detail. Other research groups have devised even more
capable clusters that rival the performance of the world's best
supercomputers at a mere fraction of their cost. This advantageous
price-to-performance ratio has already attracted the attention of some
corporations, which plan to use the clusters for such complex tasks as
deciphering the human genome. In fact, the cluster concept promises to
revolutionize the computing field by offering tremendous processing
power to any research group, school or business that wants it.
Beowulf And Grendel
The notion of linking computers together is not new. In the 1950s and
1960s the U.S. Air Force established a network of vacuum-tube
computers called SAGE to guard against a Soviet nuclear attack. In the
mid-1980s Digital Equipment Corporation coined the term "cluster" when
it integrated its mid-range VAX minicomputers into larger systems.
Networks of workstations--generally less powerful than minicomputers
but faster than PCs--soon became common at research institutions. By
the early 1990s scientists began to consider building clusters of PCs,
partly because their mass-produced microprocessors had become so
inexpensive. What made the idea even more appealing was the falling
cost of Ethernet, the dominant technology for connecting computers in
local-area networks.
Advances in software also paved the way for PC clusters. In the 1980s
Unix emerged as the dominant operating system for scientific and
technical computing. Unfortunately, the operating systems for PCs
lacked the power and flexibility of Unix. But in 1991 Finnish college
student Linus Torvalds created Linux, a Unix-like operating system
that ran on a PC. Torvalds made Linux available free of charge on the
Internet, and soon hundreds of programmers began contributing
improvements. Now wildly popular as an operating system for
stand-alone computers, Linux is also ideal for clustered PCs.
The first PC cluster was born in 1994 at the NASA Goddard Space Flight
Center. NASA had been searching for a cheaper way to solve the knotty
computational problems typically encountered in earth and space
science. The space agency needed a machine that could achieve one
gigaflops--that is, perform a billion floating-point operations per
second. (A floating-point operation is equivalent to a simple
calculation such as addition or multiplication.) At the time, however,
commercial supercomputers with that level of performance cost about $1
million, which was too expensive to be dedicated to a single group of
researchers.
One of us (Sterling) decided to pursue the then radical concept of
building a computing cluster from PCs. Sterling and his Goddard
colleague Donald J. Becker connected 16 PCs, each containing an Intel
486 microprocessor, using Linux and a standard Ethernet network. For
scientific applications, the PC cluster delivered sustained
performance of 70 megaflops--that is, 70 million floating-point
operations per second. Though modest by today's standards, this speed
was not much lower than that of some smaller commercial supercomputers
available at the time. And the cluster was built for only $40,000, or
about one tenth the price of a comparable commercial machine in 1994.
NASA researchers named their cluster Beowulf, after the lean, mean
hero of medieval legend who defeated the giant monster Grendel by
ripping off one of the creature's arms. Since then, the name has been
widely adopted to refer to any low-cost cluster constructed from
commercially available PCs. In 1996 two successors to the original
Beowulf cluster appeared: Hyglac (built by researchers at the
California Institute of Technology and the Jet Propulsion Laboratory)
and Loki (constructed at Los Alamos National Laboratory). Each cluster
integrated 16 Intel Pentium Pro microprocessors and showed sustained
performance of over one gigaflops at a cost of less than $50,000, thus
satisfying NASA's original goal.
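The price-to-performance figures quoted in the preceding paragraphs can be compared with a little arithmetic. The sketch below is only a rough illustration: it uses the dollar and flops figures given in the text, treats the Hyglac and Loki numbers as bounds (the text says "over one gigaflops" and "less than $50,000"), and the function and variable names are invented for this example.

```python
def dollars_per_megaflops(cost_dollars: float, megaflops: float) -> float:
    """Cost of one megaflops of sustained performance."""
    return cost_dollars / megaflops


# Figures quoted in the text (1 gigaflops = 1,000 megaflops).
beowulf_1994 = dollars_per_megaflops(40_000, 70)            # original 16-PC Beowulf
commercial_1994 = dollars_per_megaflops(1_000_000, 1_000)   # ~$1 million for one gigaflops

# Hyglac and Loki delivered MORE than one gigaflops for LESS than $50,000,
# so this value is an upper bound on their cost per megaflops.
hyglac_loki_1996_bound = dollars_per_megaflops(50_000, 1_000)

for label, value in [
    ("Beowulf, 1994", beowulf_1994),
    ("Commercial one-gigaflops machine, 1994", commercial_1994),
    ("Hyglac/Loki, 1996 (upper bound)", hyglac_loki_1996_bound),
]:
    print(f"{label}: about ${value:,.0f} per megaflops")
```

The comparison is deliberately crude: it ignores the difference between peak and sustained performance, which the text does not break down for the commercial machines.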
The Beowulf approach seemed to be the perfect computational solution
to our problem of mapping the ecoregions of the U.S. A single
workstation could handle the data for only a few states at most, and
we couldn't assign different regions of the country to separate
workstations--the environmental data for every section of the country
had to be compared and processed simultaneously. In other words, we
needed a parallel-processing system. So in 1996 we wrote a proposal to
buy 64 new PCs containing Pentium II microprocessors and construct a
Beowulf-class supercomputer. Alas, this idea sounded implausible to
the reviewers at ORNL, who turned down our proposal.
Undeterred, we devised an alternative plan. We knew that obsolete PCs
at the U.S. Department of Energy complex at Oak Ridge were frequently
replaced with newer models. The old PCs were advertised on an internal
Web site and auctioned off as surplus equipment. A quick check
revealed hundreds of outdated computers waiting to be discarded this
way. Perhaps we could build our Beowulf cluster from machines that we
could collect and recycle free of charge. We commandeered a room at
ORNL that had previously housed an ancient mainframe computer. Then we
began collecting surplus PCs to create the Stone SouperComputer.
A Digital Chop Shop
The strategy behind parallel computing is "divide and conquer." A
parallel-processing system divides a complex problem into smaller
component tasks. The tasks are then assigned to the system's nodes--
for example, the PCs in a Beowulf cluster--which tackle the components
simultaneously. The efficiency of parallel processing depends largely
on the nature of the problem. An important consideration is how often
the nodes must communicate to coordinate their work and to share
intermediate results. Some problems must be divided into myriad
minuscule tasks; because these fine-grained problems require frequent
internode communication, they are not well suited for parallel
processing. Coarse-grained problems, in contrast, can be divided into
relatively large chunks. These problems do not require much
communication among the nodes and therefore can be solved very quickly
by parallel-processing systems.
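The effect of granularity described above can be illustrated with a toy cost model. This is a sketch under stated assumptions, not a model taken from the article: it assumes the computation divides perfectly among the nodes and that every round of internode communication costs a fixed amount of time. All the numbers are hypothetical.

```python
def speedup(compute_seconds: float, nodes: int,
            sync_rounds: int, seconds_per_round: float) -> float:
    """Speedup over a single node under a toy model.

    Assumed model: parallel time = compute time shared evenly among the
    nodes, plus a fixed cost for every round of internode communication.
    """
    parallel_seconds = compute_seconds / nodes + sync_rounds * seconds_per_round
    return compute_seconds / parallel_seconds


# One hour of computation on 16 nodes, 10 milliseconds per communication round.
coarse = speedup(3600, nodes=16, sync_rounds=10, seconds_per_round=0.01)
fine = speedup(3600, nodes=16, sync_rounds=1_000_000, seconds_per_round=0.01)

print(f"coarse-grained: {coarse:.2f}x faster than one node")
print(f"fine-grained:   {fine:.2f}x (below 1 means slower than one node)")
```

With only a handful of communication rounds the 16 nodes come close to a 16-fold speedup; with a million rounds the communication time swamps the computation and the cluster is slower than a single machine.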
Anyone building a Beowulf cluster must make several decisions in
designing the system. To connect the PCs, researchers can use either
standard Ethernet networks or faster, specialized networks, such as
Myrinet. Our lack of a budget dictated that we use Ethernet, which is
free. We chose one PC to be the front-end node of the cluster and
installed two Ethernet cards into the machine. One card was for
communicating with outside users, and the other was for talking with
the rest of the nodes, which would be linked in their own private
network. The PCs coordinate their tasks by sending messages to one
another. The two most popular message-passing libraries are the
Message Passing Interface (MPI) and Parallel Virtual Machine (PVM), which are
both available at no cost on the Internet. We use both systems in the
Stone SouperComputer.
Many Beowulf clusters are homogeneous, with all the PCs containing
identical components and microprocessors. This uniformity simplifies
the management and use of the cluster but is not an absolute
requirement. Our Stone SouperComputer would have a mix of processor
types and speeds because we intended to use whatever surplus equipment
we could find. We began with PCs containing Intel 486 processors but
later added only Pentium-based machines with at least 32 megabytes of
RAM and 200 megabytes of hard-disk storage.
It was rare that machines met our minimum criteria on arrival; usually
we had to combine the best components from several PCs. We set up the
digital equivalent of an automobile thief's chop shop for converting
surplus computers into nodes for our cluster. Whenever we opened a
machine, we felt the same anticipation that a child feels when opening
a birthday present: Would the computer have a big disk, lots of memory
or (best of all) an upgraded motherboard donated to us by accident?
Often all we found was a tired old veteran with a fan choked with
dust.
Our room at Oak Ridge turned into a morgue filled with the picked-over
carcasses of dead PCs. Once we opened a machine, we recorded its
contents on a "toe tag" to facilitate the extraction of its parts
later on. We developed favorite and least favorite brands, models and
cases and became adept at thwarting passwords left by previous owners.
On average, we had to collect and process about five PCs to make one
good node.
As each new node joined the cluster, we loaded the Linux operating
system onto the machine. We soon figured out how to eliminate the need
to install a keyboard or monitor for each node. We created mobile
"crash carts" that could be wheeled over and plugged into an ailing
node to determine what was wrong with it. Eventually someone who
wanted space in our room bought us shelves to consolidate our
collection of hardware. The Stone SouperComputer ran its first code in
early 1997, and by May 2001 it contained 133 nodes, including 75 PCs
with Intel 486 microprocessors, 53 faster Pentium-based machines and
five still faster Alpha workstations, made by Compaq.
Upgrades to the Stone SouperComputer are straightforward: we replace
the slowest nodes first. Each node runs a simple speed test every hour
as part of the cluster's routine housekeeping tasks. The ranking of
the nodes by speed helps us to fine-tune our cluster. Unlike
commercial machines, the performance of the Stone SouperComputer
continually improves, because we have an endless supply of free
upgrades.
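The hourly housekeeping described above might look something like the following sketch. The article does not say what the speed test computes or how the ranking is stored, so the benchmark loop, the node names and the timings here are all hypothetical.

```python
import time


def speed_test(iterations: int = 200_000) -> float:
    """Time a fixed amount of floating-point work on this node, in seconds."""
    start = time.perf_counter()
    total = 0.0
    for i in range(iterations):
        total += i * 0.5
    return time.perf_counter() - start


def rank_nodes(benchmark_seconds: dict) -> list:
    """Return node names ordered from fastest (shortest time) to slowest."""
    return sorted(benchmark_seconds, key=benchmark_seconds.get)


def replacement_candidates(benchmark_seconds: dict, how_many: int) -> list:
    """Return the slowest nodes, slowest first: the ones to replace next."""
    return rank_nodes(benchmark_seconds)[::-1][:how_many]


# Hypothetical results gathered from four nodes (seconds for the same test).
latest_results = {"node486a": 9.4, "node486b": 11.2, "pentium1": 2.1, "alpha1": 0.6}

print("fastest to slowest:", rank_nodes(latest_results))
print("replace first:", replacement_candidates(latest_results, 2))
```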
Parallel Problem Solving
Parallel programming requires skill and creativity and may be more
challenging than assembling the hardware of a Beowulf system. The most
common model for programming Beowulf clusters is a master-slave
arrangement. In this model, one node acts as the master, directing the
computations performed by one or more tiers of slave nodes. We run the
same software on all the machines in the Stone SouperComputer, with
separate sections of code devoted to the master and slave nodes. Each
microprocessor in the cluster executes only the appropriate section.
Programming errors can have dramatic effects, resulting in a digital
train wreck as the crash of one node derails the others. Sorting
through the wreckage to find the error can be difficult.
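The idea of running one program everywhere, with each node executing only its own section, can be sketched in a few lines. This is an illustration rather than the authors' code: a real Beowulf would exchange messages between machines with MPI or PVM, whereas here each "node" is a thread inside one Python process and the messages travel through standard-library queues. The names node_main, run_cluster and MASTER are invented for the example.

```python
import queue
import threading

MASTER = 0  # rank of the node that directs the others


def node_main(rank, inboxes, results):
    """The same program runs on every node; the rank selects the section."""
    if rank == MASTER:
        # Master section: split the data, send one chunk to each worker,
        # then collect and combine the partial answers.
        workers = [r for r in inboxes if r != MASTER]
        data = list(range(1, 10))
        for i, worker in enumerate(workers):
            inboxes[worker].put(data[i::len(workers)])
        results.append(sum(inboxes[MASTER].get() for _ in workers))
    else:
        # Worker section: wait for a chunk, process it, report back.
        chunk = inboxes[rank].get()
        inboxes[MASTER].put(sum(x * x for x in chunk))


def run_cluster(num_nodes=4):
    """Start one thread per simulated node (needs at least two nodes)."""
    inboxes = {rank: queue.Queue() for rank in range(num_nodes)}
    results = []
    threads = [threading.Thread(target=node_main, args=(rank, inboxes, results))
               for rank in range(num_nodes)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results[0]


print("sum of squares of 1..9:", run_cluster(4))
```

The sketch also hints at why errors spread: if one worker never sends its reply, the master blocks forever on its inbox and the whole computation stalls.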
Another challenge is balancing the processing workload among the
cluster's PCs. Because the Stone SouperComputer contains a variety of
microprocessors with very different speeds, we cannot divide the
workload evenly among the nodes: if we did so, the faster machines
would sit idle for long periods as they waited for the slower machines
to finish processing. Instead we developed a programming algorithm
that allows the master node to send more data to the faster slave
nodes as they complete their tasks. In this load-balancing
arrangement, the faster PCs do most of the work, but the slower
machines still contribute to the system's performance.
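The benefit of this kind of load balancing can be seen in a small simulation. The sketch below is not the authors' algorithm, whose details the article does not give; it assumes the master simply hands the next fixed-size chunk of data to whichever node becomes free first. The node speeds, chunk size and function names are hypothetical.

```python
import heapq


def static_makespan(total_items, speeds):
    """Finish time when every node receives an equal share of the data."""
    share = total_items / len(speeds)
    return max(share / speed for speed in speeds)


def dynamic_makespan(total_items, speeds, chunk_size):
    """Finish time when the master gives the next chunk to the first free node.

    Returns (finish_time, items_done_per_node).
    """
    free_at = [(0.0, node) for node in range(len(speeds))]  # (time free, node)
    heapq.heapify(free_at)
    items_done = [0] * len(speeds)
    remaining = total_items
    finish = 0.0
    while remaining > 0:
        now, node = heapq.heappop(free_at)       # earliest node to become idle
        chunk = min(chunk_size, remaining)
        remaining -= chunk
        done_at = now + chunk / speeds[node]
        items_done[node] += chunk
        finish = max(finish, done_at)
        heapq.heappush(free_at, (done_at, node))
    return finish, items_done


# Hypothetical speeds in items per second: two slow PCs, one faster, one fastest.
speeds = [1.0, 1.0, 4.0, 8.0]
even_split = static_makespan(1400, speeds)
balanced, per_node = dynamic_makespan(1400, speeds, chunk_size=10)

print(f"equal shares:   {even_split:.0f} seconds")
print(f"load-balanced:  {balanced:.0f} seconds, items per node {per_node}")
```

With equal shares the run lasts as long as the slowest node needs for its quarter of the data; with chunks handed out on demand, the fast nodes absorb most of the work while the slow ones still contribute.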
Our first step in solving the ecoregion mapping problem was to
organize the enormous amount of data--the 25 environmental
characteristics of the 7.8 million cells of the continental U.S. We
created a 25-dimensional data space in which each dimension
represented one of the variables (average temperature, precipitation,
soil characteristics and so on). Then we identified each cell with the
appropriate point in the data space [see illustration A]. Two points
close to each other in this data space have, by definition, similar
characteristics and thus are classified in the same ecoregion.
Geographic proximity is not a factor in this kind of classification;
for example, if two mountaintops have very similar environments, their
points in the data space are very close to each other, even if the
mountaintops are actually thousands of miles apart.
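The distinction between closeness in the data space and closeness on the map can be made concrete with a small example. The sketch uses three variables instead of 25, assumes the values have already been standardized so that no single variable dominates, and assumes ordinary Euclidean distance; the article does not state which distance measure or scaling was used. All the cell names and values are hypothetical.

```python
import math

# Each cell: a map location in kilometers and a point in the data space
# (here three standardized variables, for example temperature,
# precipitation and soil nitrogen).
cells = {
    "mountaintop_a": {"location_km": (0.0, 0.0),    "env": (-1.5, 0.8, -0.9)},
    "mountaintop_b": {"location_km": (3000.0, 0.0), "env": (-1.4, 0.9, -1.0)},
    "valley_near_a": {"location_km": (5.0, 0.0),    "env": (0.6, -0.2, 0.7)},
}


def env_distance(name_1, name_2):
    """Distance between two cells in the environmental data space."""
    return math.dist(cells[name_1]["env"], cells[name_2]["env"])


def map_distance(name_1, name_2):
    """Distance between two cells on the ground, in kilometers."""
    return math.dist(cells[name_1]["location_km"], cells[name_2]["location_km"])


for other in ("mountaintop_b", "valley_near_a"):
    print(f"mountaintop_a vs {other}: "
          f"{map_distance('mountaintop_a', other):.0f} km apart, "
          f"data-space distance {env_distance('mountaintop_a', other):.2f}")
```

The two mountaintops are far apart on the map but nearly coincide in the data space, so they would fall into the same ecoregion; the nearby valley would not.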
Once we organized the data, we had to specify the number of ecoregions
that would be shown on the national map. The cluster of PCs gives each
ecoregion an initial "seed position" in the data space. For each of
the 7.8 million data points, the system determines the closest seed
position and assigns the point to the corresponding ecoregion. Then
the cluster finds the centroid for each ecoregion--the average
position of all the points assigned to the region. This centroid
replaces the seed position as the defining point for the ecoregion.
The cluster then repeats the procedure, reassigning the data points to
ecoregions depending on their distances from the centroids. At the end
of each iteration, new centroid positions are calculated for each
ecoregion. The process continues until fewer than a specified number
of data points change their ecoregion assignments. Then the
classification is complete.
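The iterative procedure just described can be sketched on a single machine as follows. This is a minimal illustration, not the authors' program: it assumes Euclidean distance, works in two dimensions instead of 25, and uses invented names (classify, stop_below and so on) for the pieces the text describes.

```python
import math


def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda k: math.dist(point, centers[k]))


def centroid(points):
    """Average position of a list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def classify(points, seeds, stop_below=1, max_iterations=100):
    """Assign points to regions and move each region's center to its centroid.

    Stops when fewer than `stop_below` points change region in an iteration.
    Returns (labels, centers).
    """
    centers = list(seeds)
    labels = [None] * len(points)
    for _ in range(max_iterations):
        new_labels = [nearest(p, centers) for p in points]
        changes = sum(old != new for old, new in zip(labels, new_labels))
        labels = new_labels
        for k in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # a region with no points keeps its previous center
                centers[k] = centroid(members)
        if changes < stop_below:
            break
    return labels, centers


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = classify(points, seeds=[(0, 0), (10, 10)])
print(labels)
print(centers)
```

Raising stop_below ends the run earlier, trading a slightly less settled classification for fewer passes over the data.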
The mapping task is well suited for parallel processing because
different nodes in the cluster can work independently on subsets of
the 7.8 million data points. After each iteration the slave nodes send
the results of their calculations to the master node, which averages
the numbers from all the subsets to determine the new centroid
positions for each ecoregion. The master node then sends this
information back to the slave nodes for the next round of
calculations. Parallel processing is also useful for selecting the
best seed positions for the ecoregions at the very beginning of the
procedure. We devised an algorithm that allows the nodes in the Stone
SouperComputer to determine collectively the most widely dispersed
data points, which are then chosen as the seed positions. If the
cluster starts with well-dispersed seed positions, fewer iterations
are needed to map the ecoregions.
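The division of labor in this paragraph can be sketched as two small routines. Both rest on assumptions: the first has each slave node report per-region sums and counts for its subset, so that the master's average weights subsets of different sizes correctly (the article says only that the master "averages the numbers"); the second uses a simple farthest-point rule as a stand-in for the seed-selection algorithm, which the article does not describe. The function names are invented and the code runs on one machine.

```python
import math


def partial_sums(subset, centers):
    """Work done by one slave node: per-region coordinate sums and counts."""
    sums = [[0.0] * len(centers[0]) for _ in centers]
    counts = [0] * len(centers)
    for point in subset:
        k = min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))
        counts[k] += 1
        for d, value in enumerate(point):
            sums[k][d] += value
    return sums, counts


def combine(partials, old_centers):
    """Work done by the master: merge all partial results into new centroids."""
    new_centers = []
    for k, old in enumerate(old_centers):
        total = sum(counts[k] for _, counts in partials)
        if total == 0:
            new_centers.append(tuple(old))  # empty region keeps its center
            continue
        new_centers.append(tuple(
            sum(sums[k][d] for sums, _ in partials) / total
            for d in range(len(old))))
    return new_centers


def dispersed_seeds(points, how_many):
    """Assumed heuristic: repeatedly pick the point farthest from all seeds."""
    seeds = [points[0]]
    while len(seeds) < how_many:
        seeds.append(max(points,
                         key=lambda p: min(math.dist(p, s) for s in seeds)))
    return seeds


centers = [(0, 0), (10, 10)]
subsets = [[(0, 0), (0, 2)], [(2, 0), (10, 10)], [(10, 12)]]  # one per node
new_centers = combine([partial_sums(s, centers) for s in subsets], centers)
print(new_centers)
print(dispersed_seeds([(0, 0), (1, 0), (10, 0), (5, 0)], 3))
```

Only the small tables of sums and counts travel between nodes in each iteration, never the data points themselves, which is what keeps the communication cost low.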
The result of all our work was a series of maps of the continental
U.S. showing each ecoregion in a different color [see illustrations B
and C]. We produced maps showing the country divided into as few as
four ecoregions and as many as 5,000. The maps with fewer ecoregions
divided the country into recognizable zones--for example, the Rocky
Mountain states and the desert Southwest. In contrast, the maps with
thousands of ecoregions are far more complex than any previous
classification of the country's environments. Because many plants and
animals live in only one or two ecoregions, our maps may be useful to
ecologists who study endangered species.
In our first maps the colors of the ecoregions were randomly assigned,
but we later produced maps in which the colors of the ecoregions
reflect the similarity of their respective environments. We
statistically combined nine of the environmental variables into three
composite characteristics, which we represented on the map with
varying levels of red, green and blue. When the map is drawn this way,
it shows gradations of color instead of sharp borders: the lush
Southeast is mostly green, the cold Northeast is mainly blue, and the
arid West is primarily red [see illustration D].
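The final coloring step might be sketched as below. The article does not say how the nine variables were statistically combined, so this example starts from three composite scores per ecoregion that are assumed to exist already, and it assumes a simple linear rescaling of each score to the 0-255 range of a color channel. The scores and the function name are hypothetical.

```python
def to_rgb(composites):
    """Map (score_1, score_2, score_3) tuples to (red, green, blue) colors.

    Each score is rescaled linearly so that its smallest value across all
    ecoregions becomes 0 and its largest becomes 255.
    """
    lows = [min(c[i] for c in composites) for i in range(3)]
    highs = [max(c[i] for c in composites) for i in range(3)]
    colors = []
    for scores in composites:
        channels = []
        for value, low, high in zip(scores, lows, highs):
            span = high - low
            channels.append(round(255 * (value - low) / span) if span else 0)
        colors.append(tuple(channels))
    return colors


# Hypothetical composite scores for three ecoregions.
scores = [(0.0, 0.0, 0.0), (1.0, 2.0, 4.0), (0.2, 0.4, 0.8)]
print(to_rgb(scores))
```

Because ecoregions with similar composite scores receive similar colors, neighboring regions with similar environments blend into one another on the map instead of being separated by sharp borders.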
Moreover, the Stone SouperComputer was able to show how the ecoregions
in the U.S. would shift if there were nationwide changes in
environmental conditions as a result of global warming. Using two
projected climate scenarios developed by other research groups, we
compared the current ecoregion map with the maps predicted for the
year 2099. According to these projections, by the end of this century
the environment in Pittsburgh will be more like that of present-day
Atlanta, and conditions in Minneapolis will resemble those in
present-day St. Louis [see Stone SouperComputer's Global Warming Forecast].
The Future Of Clusters
The traditional measure of supercomputer performance is benchmark
speed: how fast the system runs a standard program. As scientists,
however, we prefer to focus on how well the system can handle
practical applications. To evaluate the Stone SouperComputer, we fed
the same ecoregion mapping problem to ORNL's Intel Paragon
supercomputer shortly before it was retired. At one time, this machine
was the laboratory's fastest, with a peak performance of 150
gigaflops. On a per-processor basis, the run time on the Paragon was
essentially the same as that on the Stone SouperComputer. We have
never officially clocked our cluster (we are loath to steal computing
cycles from real work), but the system has a theoretical peak
performance of about 1.2 gigaflops. Ingenuity in parallel algorithm
design is more important than raw speed or capacity: in this young
science, David and Goliath (or Beowulf and Grendel!) still compete on
a level playing field.
The Beowulf trend has accelerated since we built the Stone
SouperComputer. New clusters with exotic names--Grendel, Naegling,
Megalon, Brahma, Avalon, Medusa and theHive, to mention just a few--
have steadily raised the performance curve by delivering higher speeds
at lower costs. As of last November, 28 clusters of PCs, workstations
or servers were on the list of the world's 500 fastest computers. The
LosLobos cluster at the University of New Mexico has 512 Intel Pentium
III processors and is the 80th-fastest system in the world, with a
performance of 237 gigaflops. The Cplant cluster at Sandia National
Laboratories has 580 Compaq Alpha processors and is ranked 84th. The
National Science Foundation and the U.S. Department of Energy are
planning to build even more advanced clusters that could operate in
the teraflops range (one trillion floating-point operations per
second), rivaling the speed of the fastest supercomputers on the
planet.
Beowulf systems are also muscling their way into the corporate world.
Major computer vendors are now selling clusters to businesses with
large computational needs. IBM, for instance, is building a cluster of
1,250 servers for NuTec Sciences, a biotechnology firm that plans to
use the system to identify disease-causing genes. An equally important
trend is the development of networks of PCs that contribute their
processing power to a collective task. An example is SETI@home, a
project launched by researchers at the University of California at
Berkeley who are analyzing deep-space radio signals for signs of
intelligent life. SETI@home sends chunks of data over the Internet to
more than three million PCs, which process the radio-signal data in
their idle time. Some experts in the computer industry predict that
researchers will eventually be able to tap into a "computational grid"
that will work like a power grid: users will be able to obtain
processing power just as easily as they now get electricity.
Above all, the Beowulf concept is an empowering force. It wrests
high-level computing away from the privileged few and makes low-cost
parallel-processing systems available to those with modest resources.
Research groups, high schools, colleges or small businesses can build
or buy their own Beowulf clusters, realizing the promise of a
supercomputer in every basement. Should you decide to join the
parallel-processing proletariat, please contact us through our Web
site (http://extremelinux.esd.ornl.gov/) and tell us about your
Beowulf-building experiences. We have found the Stone Soup to be
hearty indeed.
Further Information:
Cluster Computing: Linux Taken to the Extreme. F. M. Hoffman and W. W.
Hargrove in Linux Magazine, Vol. 1, No. 1, pages 56-59; Spring 1999.
Using Multivariate Clustering to Characterize Ecoregion Borders. W. W.
Hargrove and F. M. Hoffman in Computing in Science & Engineering,
Vol. 1, No. 4, pages 18-25; July/August 1999.
How to Build a Beowulf: A Guide to the Implementation and Application
of PC Clusters. Edited by T. Sterling, J. Salmon, D. J. Becker and D.
F. Savarese. MIT Press, 1999.