Re Hadoop / HBase distributions


Assaf yardeni

Jan 4, 2011, 8:28:56 AM
to iltec...@googlegroups.com
Hi Guys,

At FLR (www.firstliferesearch.com), we are about to launch the first version of our system to a production environment (US colo DC), and I'm very interested in hearing from those of you who use Hadoop in production:

1. Which distribution are you using? Do you use Cloudera's CDH? If yes, which version? If not, which versions of Hadoop / HBase are you using?
2. Do you use the same distribution in your development, staging, and production environments?


Thanks,
Assaf Yardeni

Ori Lahav

Jan 4, 2011, 3:55:55 PM
to iltec...@googlegroups.com
Assaf
Outbrain is using Hadoop & Hive.
Next week, our NYC sysadmin who handles the Hadoop setup will be in IL and will give a talk about it at Outbrain (timing TBD). Let me know if you want to join.

Ori

Assaf yardeni

Jan 4, 2011, 5:09:32 PM
to iltec...@googlegroups.com
Hi Ori,

I'll be glad to join.

Thanks,
Assaf

Moshe Kaplan

Jan 5, 2011, 3:09:00 AM
to ILTechTalks
Hi Ori,

I'll be glad to join as well,

Best,
Moshe


Boris Shulman

Jan 5, 2011, 7:03:29 AM
to iltec...@googlegroups.com
Hi Ori,

I would be glad to join if it is possible.

Boris Shulman.

On Tue, Jan 4, 2011 at 10:55 PM, Ori Lahav <ola...@gmail.com> wrote:

Oded Ben-Ozer

Jan 5, 2011, 7:38:06 AM
to iltec...@googlegroups.com
Room for one more? If so, count me in.

--
POONTANG !!!!

    get some

Amit Kahn

Jan 5, 2011, 7:59:19 AM
to iltec...@googlegroups.com
I'd love to join too, and maybe one more developer from my company.

Thanks,
Amit.

Ori Lahav

Jan 5, 2011, 10:00:05 AM
to iltec...@googlegroups.com
OK, guys, it looks like there is a lot of interest.
I heard today that IGT and LivePerson are doing a Hadoop case study.

Guys,
it seems there is big demand.

The meeting will be held at the Outbrain office on Sunday the 9th at 2pm.
The speaker will be Nathan Milford, our NY-based Ops engineer, who is visiting IL.
Nathan builds and maintains our Hadoop cluster and is also a good source of information.

If anyone else wants to join, please let me know, as we might need a bigger room.

Ori 

Amihay Zer-Kavod

Jan 5, 2011, 11:24:29 AM
to ILTechTalks
Count me in as well, and maybe another guy from Amadesa.
10x
Amihay


Itay Kahana

Jan 7, 2011, 8:34:02 AM
to ILTechTalks
Hey Ori,
I'll be glad to join.

Thanks,
Itay


Maxim Veksler

Jan 7, 2011, 9:00:39 AM
to iltec...@googlegroups.com
+{1,2}

Nathan Milford

Jan 7, 2011, 2:56:03 PM
to ILTechTalks
I'll be there too!

I'm gonna throw tomatoes.

This guy is a clueless hack. :P

- n

Nathan Milford

Jan 7, 2011, 4:30:40 PM
to ILTechTalks
Assaf,

At Outbrain we use CDH3b3 in our test environment and in production.
I've got a slide on ASF's Hadoop vs Cloudera's for Sunday. We don't
use HBase; I believe Cassandra is filling the role that HBase would in
our environment.

I just finished converting my 1.5 hour presentation for our Ops team
down to a shorter more general set of slides for general consumption.
I imagine I'll be tweaking it on my flight here in a few hours. Let
me know if there is anything you want me to cover.

Is FLR storing private medical information in Hadoop or are you only
scraping publicly available information from forums and medical sites?

Are you required to be HIPAA compliant or something similar?

Cloudera just released their security guide:
http://www.cloudera.com/blog/2011/01/configuring-security-features-in-cdh3/

Aaron Myers & Todd Lipcon tag-teamed a talk about the new security
features in Hadoop at HW2010:
http://www.cloudera.com/videos/hw10_video_making_hadoop_security_work_in_your_it_environment

You can see the back of my head when Todd starts talking about
deployment :P

Great resources.

Amit Kahn

Jan 9, 2011, 3:17:40 AM
to iltec...@googlegroups.com
Hope it's not too late, but it looks like we won't be able to make it, so there are two more available seats.
I'd love to come to the next Hadoop talk.

Amit

Ori Lahav

Jan 9, 2011, 5:29:15 AM
to iltec...@googlegroups.com
Sorry to hear that Amit.
Anyway, we have plenty of room for whoever wants to attend this talk.

Ori

Assaf yardeni

Jan 9, 2011, 11:45:57 AM
to iltec...@googlegroups.com, nat...@milford.io
(and now in the correct thread...)

Nathan, Ori,

Thanks for the invitation and the interesting presentation. It surely gave me some points to think about during the last steps of our design and DC setup.


Assaf Yardeni

Amihay Zer-Kavod

Jan 10, 2011, 6:58:35 AM
to ILTechTalks
Yes, thanks Nathan for the excellent talk and Ori for setting this up!

Any chance the slides will find their way to this forum?

10x
Amihay


Nathan Milford

Jan 10, 2011, 7:46:51 AM
to ILTechTalks
I threw them up here:

http://blog.milford.io/2011/01/slides-and-notes-from-my-recent-hadoop-talk-in-israel/

I also gave my slides and notes to Ori in case IL Tech Talks has their own
archive.

- n

Udi h Bauman

Jan 10, 2011, 8:00:06 AM
to iltec...@googlegroups.com
Nathan, Ori, thanks a lot for the talk!

If anyone wants, I posted my summary here:
http://www.mindmeister.com/75831919/hadoop-talk-nathan-milford-outbrain



Thanks,
Udi

--
[Arab sitting with nargila.]

On Jan 10, 2011, at 2:46 PM, Nathan Milford <nat...@milford.io> wrote:

sudhee...@gmail.com

Oct 1, 2013, 9:26:28 AM
to iltec...@googlegroups.com

Apache Hadoop is an open-source software framework that supports data-intensive distributed applications, licensed under the Apache v2 license. It supports running applications on large clusters of commodity hardware. Hadoop was derived from Google's MapReduce & Google File System (GFS) papers. (Hadoop Online Training Demo in Hyderabad India: https://www.youtube.com/watch?v=PqbYn5LXzRw) The Hadoop framework transparently provides both reliability & data motion to applications. Hadoop implements a computational paradigm named MapReduce, where the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, it provides a distributed file system that stores data on the compute nodes, providing high aggregate bandwidth across the cluster. Both MapReduce & the distributed file system are designed so that node failures are automatically handled by the framework. It allows applications to work with thousands of computation-independent computers & petabytes of data. The whole Apache Hadoop "platform" is now commonly considered to consist of the Hadoop kernel, MapReduce & the Hadoop Distributed File System (HDFS), & a variety of related projects including Apache Hive, Apache HBase, & others. (Hadoop Online Training: http://hadooponlinetrainings.com/hadoop-online-training/)

Hadoop is written in the Java programming language & is an Apache top-level project being built & used by a worldwide community of contributors. Hadoop & its related projects (Hive, HBase, ZooKeeper, & so on) have many contributors from across the ecosystem. Though Java code is most common, any programming language can be used with "streaming" to implement the "map" & "reduce" parts of the system.
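To make the "streaming" point concrete, here is a minimal word-count sketch in Python. It is only an illustration: the function names and the "map"/"reduce" role argument are made up for this example, and it assumes the usual streaming contract, where the mapper and reducer exchange tab-separated key/value lines and the reducer's input arrives sorted by key.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record for every word seen."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(sorted_lines):
    """Reducer: input must be sorted by key; sum the counts per word."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def main(role, stdin=sys.stdin, stdout=sys.stdout):
    """Hypothetical entry point: run as the mapper or the reducer.

    A streaming job would start this once per task and feed it records
    on stdin; the role names 'map' and 'reduce' are an assumption here.
    """
    step = map_lines if role == "map" else reduce_lines
    for record in step(stdin):
        stdout.write(record + "\n")


def run_locally(lines):
    """Simulate map -> sort (shuffle) -> reduce in a single process."""
    return list(reduce_lines(sorted(map_lines(lines))))
```

Calling run_locally(["Hadoop and Hive", "hadoop streaming"]) exercises the same two functions without a cluster. On a real cluster the script would be passed to the Hadoop streaming jar through its -mapper and -reducer options along with -input and -output paths; the jar's location differs between distributions, so check your own install.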

Hadoop enables a computing solution that is:

Scalable: New nodes can be added as needed, & added without needing to change data formats, how data is loaded, how jobs are written, or the applications on top.

Cost effective: Hadoop brings massively parallel computing to commodity servers. The result is a sizeable decrease in the cost per terabyte of storage, which in turn makes it affordable to model all of your data.

Flexible: Hadoop is schema-less, & can absorb any type of data, structured or not, from any number of sources. Data from multiple sources can be joined & aggregated in arbitrary ways, enabling deeper analyses than any one system can provide.

Fault tolerant: When you lose a node, the system redirects work to another location of the data & continues processing without missing a beat.

Apache Hadoop is 100% open source, & pioneered a fundamentally new way of storing & processing data. Instead of relying on costly, proprietary hardware & different systems to store & process data, Hadoop enables distributed parallel processing of huge amounts of data across cheap, industry-standard servers that both store & process the data, & can scale without limits. With Hadoop, no data is too big. And in today's hyper-connected world, where more and more data is being created every day, Hadoop's breakthrough advantages mean that businesses & organizations can now find value in data that was until recently considered useless. (Online Hadoop Training: http://hadooponlinetrainings.com/hadoop-online-training/)

sudheer babu

Oct 19, 2013, 5:51:11 AM
to iltec...@googlegroups.com
It was a nice article. It was very useful for me as well as for online Hadoop training (http://123trainings.com/it-hadoop-bigdata-online-training.html) learners. Thanks for providing this valuable information. 123trainings provides the best Hadoop online training (https://www.youtube.com/watch?v=PqbYn5LXzRw).

G Suresh

Oct 22, 2013, 1:30:53 AM
to iltec...@googlegroups.com


On Tuesday, 4 January 2011 08:28:56 UTC-5, assaf yardeni wrote:
Hi Guys,

At FLR , we are about to launch first version of our system to production environment (US Colo DC), and I'm very interested in hearing the ones of you that uses Hadoop in production:

Which distribution are you using. Do you use Clouder's CDH? If yes, which version? If not, which version of Hadoop / HBase are you using?

G Suresh

Oct 22, 2013, 1:37:00 AM
to iltec...@googlegroups.com
IQ Online Training is a popular online training institute in Hyderabad.

They offer Hadoop, Informatica, SAP, mobile apps, Android, iPhone, etc.