Group: http://groups.google.com/group/datacleaner-dev/topics
Pmohan <mprab...@gmail.com> Oct 13 09:29AM -0700 ^
I was just exploring what it would take to:
1) deploy DataCleaner in the cloud
2) access it through a web client / API
3) distribute job processing across multiple machines (a long shot,
though)
Thanks
Pmohan
"Kasper Sørensen" <kas...@eobjects.dk> Oct 13 07:03PM +0200 ^
Hi Pmohan,
Thanks for the question, an interesting one!
1+2) Actually we have been playing around with this idea already at Human
Inference. We've already made some loose plans to be able to deploy DC jobs
as invokable web services, running on a server. The architecture completely
supports this idea and I see no major impediments, except "just doing it".
3) For some tasks this is a good fit, for others not. Specifically,
the transformer and filter components are closely analogous to the "map" part
of a MapReduce system (like Hadoop or GridGain) and thus could be REALLY
scalable. The Analyzer components are also roughly analogous to "Reduce" in a
MapReduce system, but making that work would impose certain
restrictions on what an Analyzer can do, and specifically on how it saves
state. So yes, it is in our thoughts, but it's not likely to be something we
would create in the short term.
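To illustrate the analogy in 3), here is a minimal Java sketch. It is not DataCleaner code: the names MapReduceSketch, transformAndFilter, ValueCountAnalyzer and mergeFrom are made up for illustration, and the partitions are processed in a plain loop rather than on separate machines. It only shows why the per-row steps distribute easily, while an analyzer needs state that can be merged.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class MapReduceSketch {

    // "Map" side: a stateless per-row step standing in for a transformer plus a filter.
    // Returns empty when the row is filtered out.
    static Optional<String> transformAndFilter(String rawValue) {
        String cleaned = rawValue.trim().toUpperCase();
        return cleaned.isEmpty() ? Optional.empty() : Optional.of(cleaned);
    }

    // "Reduce" side: a hypothetical analyzer whose state can be merged.
    static final class ValueCountAnalyzer {
        private final Map<String, Integer> counts = new HashMap<>();

        void consume(String value) {
            counts.merge(value, 1, Integer::sum);
        }

        // The restriction discussed above: partial state from one partition
        // must be combinable with partial state from another.
        void mergeFrom(ValueCountAnalyzer other) {
            other.counts.forEach((value, count) -> counts.merge(value, count, Integer::sum));
        }

        Map<String, Integer> result() {
            return counts;
        }
    }

    // Processes each partition independently (as separate machines could),
    // then merges the partial analyzer states into one result.
    static Map<String, Integer> run(List<List<String>> partitions) {
        ValueCountAnalyzer combined = new ValueCountAnalyzer();
        for (List<String> partition : partitions) {
            ValueCountAnalyzer partial = new ValueCountAnalyzer();
            for (String row : partition) {
                transformAndFilter(row).ifPresent(partial::consume);
            }
            combined.mergeFrom(partial);
        }
        return combined.result();
    }

    public static void main(String[] args) {
        List<List<String>> partitions = List.of(
                List.of(" alice ", "", "bob"),
                List.of("Alice", "  ", "carol"));
        System.out.println(run(partitions));
    }
}
```

An analyzer that keeps state which cannot be merged this way (for example, one that depends on seeing rows in their original order) would not fit this model without changes.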
Now that you have a few answers, may I ask (out of curiosity) why you are
asking? Are you considering building such an application? Would you maybe be
interested in cooperating?
Best regards,
Kasper
Prabhuram Mohan <mprab...@gmail.com> Oct 16 03:54PM -0400 ^
Hi Kasper,
appreciate your response.
I am a Data & BI Engineer by profession. I have consulted for several
banks. I feel at home with SQL, SAS & R.
I have administered Dataflux and worked on Informatica DQ.
Dataflux and IDQ are monolithic and expensive. I thought something better
should be possible.
I liked DataCleaner. Sure, I would like to collaborate.
To start with: do you have any Amazon AMIs with DataCleaner? If not, I
would recommend creating one. It's an easier way of taking DataCleaner for a
spin.
I have experience with Amazon EC2.
If you have some time we can have a chat.
thx
prabhu
"Kasper Sørensen" <kas...@eobjects.dk> Oct 17 10:53AM +0200 ^
Hi Prabhu,
No, we don't have any Amazon images available. If you want to create one, I
think that's fine. I don't see it as the primary way our users will want to
access DC, since the installation is already very easy and unobtrusive.
But if you create an image, I will be happy to link to it or add a news item
about it on the DC website.
Would deploying DC on Amazon qualify as "DC in the cloud" in your opinion?
Because then I think we're talking about slightly different things. I was
replying more in terms of creating a web application that uses the DC
engine to execute jobs from the browser. Obviously you're talking more about
using the computing power of e.g. Amazon's cloud offering to run DC jobs.
Given that those machines work just like any other machines, I don't see any
issues in doing so!
Best regards,
Kasper