We have a project to run R atop a Hadoop cluster. The objective is to
run large R stats jobs on EC2.
R is open source and has a huge number of packages written for it, but
it is limited by the amount of data that can be read into memory.
This is at a very early stage, but results look good so far.
Paco
(apologies to people who've emailed / we've been buried under a large rock)
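Since the point of putting R atop Hadoop is to get around R's in-memory limit, it may help to sketch the pattern involved. Hadoop Streaming lets any executable (an R script, for instance) serve as the mapper or reducer. Purely as an illustration of that pattern, and not the project's actual code, here is a hypothetical Python mapper/reducer pair that computes a per-key mean while reading one record at a time:

```python
import sys
from collections import defaultdict

def mapper(lines):
    """Emit (key, value) pairs one record at a time, so memory
    use stays independent of input size -- the same discipline an
    R streaming script would follow."""
    for line in lines:
        key, value = line.strip().split("\t")
        yield key, float(value)

def reducer(pairs):
    """Keep only running sums and counts per key, never the
    full dataset, then emit the per-key means."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for key, value in pairs:
        sums[key] += value
        counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums}

if __name__ == "__main__":
    # In a real Hadoop Streaming job, mapper and reducer run as
    # separate processes reading stdin and writing stdout; here
    # they are chained directly for illustration.
    print(reducer(mapper(sys.stdin)))
```

The same structure is why a memory-bound tool like R can scale out this way: each mapper sees only a slice of the data, and the reducer sees only compact aggregates.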
...for linear algebra based on Hadoop
Certainly some of the big financial organizations use their own internal grids to run MATLAB, with LSF as the mechanism to distribute the parallel pieces of the computation over that grid. Hundreds of CPUs/cores is not uncommon. That's a well-known environment for both Platform Computing (LSF) and The MathWorks (MATLAB). However, independently licensing these apps in a cloud environment seems a little problematic to me. The license managers in both these products (FlexLM or FlexNet is a typical license manager) rely on specific attributes of the host machine to do their work. In a virtualized cloud, is there any guarantee those attributes will be stable - especially for the machine running the license manager itself?
So it would seem to me that these companies would have to have done a prior deal with the cloud operating company (as some commercial apps companies seem to have done) to get their infrastructure set up properly in the environment. In that case a user could use the cloud infrastructure purely on a pay-as-you-go basis - but who gets paid? The operator, who bills out use of his virtual machines at so much per hour? The application provider(s), who need to recover revenue for the use of their tools? Both? All?
When I was associated with Sun's efforts on the Network.com infrastructure, these issues precluded commercial apps from the environment and raised the value of open source apps as the target for the compute utility. Many vendors of commercial software that I spoke with about using such an environment as Sun proposed basically said "over my dead body" (or at least their sales VPs did), since the infrastructure posited (call it cloud computing) had, or could have, such an impact on their current revenue models, and collecting the revenue was a function they didn't want to address themselves. That was left for the cloud operator to address (gee, sounds like the old timesharing work I used to do with Control Data's CYBERNET - back in the day!)

I think it will take some time before this kind of computing model makes it big in the commercial application space.
We are doing a lot of work with Matlab applications. We run them,
mostly, via Swift.
(E.g., see https://twiki.grid.iu.edu/pub/Education/MWGS2008Syllabus/8_SwiftWorkflow.pdf.)
The target platform can then be a local cluster, one or more remote
sites (e.g., Open Science Grid), or, via Nimbus, our local virtual
machine system or EC2.
Ian.
On Wed, Oct 1, 2008 at 4:08 PM, Khazret Sapenov <sap...@gmail.com> wrote:
CHAMPAIGN, Ill., Nov. 5 -- Wolfram Research announced an initiative today to develop a cloud computing service for users of their flagship technical computing software Mathematica. This project is a collaborative effort by Wolfram Research, Nimbis Services, Inc., a clearing-house for accessing third party compute resources and commercial software, and R Systems NA, Inc., a provider of computing resources to the commercial and academic research community.
According to Deborah Wince-Smith, president of the Council on Competitiveness, "High-performance computing systems (HPC) remain a largely underutilized competitiveness asset in the United States for the majority of companies. Opening access to HPC represents a huge productivity opportunity for the nation and a competitiveness transformation challenge." The collaboration of Wolfram Research, Nimbis Services, and R Systems will make the transition from desktop to HPC systems easier for Mathematica users by providing efficiently structured access to larger, more powerful computing systems.
Nimbis Services will enable the Mathematica cloud service to access many diverse HPC systems, including TOP500 supercomputers and the Amazon Elastic Compute Cloud. Nimbis Services, Inc. President and CEO Robert Graybill echoes the council's views on HPC systems and explains that the foundational principle of Nimbis Services is to focus on "ease of use" by providing experimental and periodic business users the choice of large-scale computing service alternatives, all in one "instant" computing storefront.
"Our partnership with Wolfram Research immensely benefits software users attempting to increase efficiency and capacity," says R Systems founder Brian Kucic. "As Mathematica users seek to extend resource capacity, the exceptionally large memory of our multicore HPC resources and the double-data rate and quad-data rate InfiniBand network will increase performance." HPC resources such as the R Smarr cluster by R Systems, Inc., which was recently named the 44th fastest system on the TOP500 list for supercomputing pioneers, are responsible for bringing HPC technology to the forefront.
The Mathematica cloud computing service will provide flexible and scalable access to HPC from within Mathematica, simplifying the transition from desktop technical computing to HPC. "The two largest challenges in using HPC are programming the HPC application itself and ensuring that you can get enough computing power to do the job," says Tom Wickham-Jones, Wolfram Research executive director of kernel technology. "Mathematica answers the programming challenge by providing an integrated technical computing platform, enabling computation, visualization, and data access. Cloud computing offers consistent access to large-scale computing capabilities. We are excited to be working with Nimbis and R Systems to offer HPC access to our customers."