Hadoop Cluster Ready: What questions do you have?

Matt Davies

Sep 3, 2014, 12:16:06 PM
to uh...@googlegroups.com
Hey everyone.

As part of the air quality competition you've been hearing about, we have a Hadoop cluster available for use. Well, it's here, and I set up the basics last week. We have a couple more things to iron out before we hand out instructions on how to access it.

The cluster runs CDH with the basics: Pig, Hive, Sqoop, HCatalog, and Oozie. At this point we do not have HBase, or anything else not listed here, up and running. We will be loading all the datasets we receive onto this cluster (one of the things still to be ironed out).

We will give out a single account to each team to keep our management overhead low. At this point every team has equal weight for cluster access. Please be considerate of others...

The cluster has 85 TB of raw capacity, which works out to a little over 2 TB of space per team if we split it evenly. If you need more, please reach out to us so we can help coordinate and avoid running out of space.
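The message doesn't spell out how 85 TB of raw disk becomes "a little over 2 TB" per team. One plausible reading, assuming HDFS's default replication factor of 3 (every block stored three times) and a team count that is purely hypothetical here, is sketched below in Python. Raw capacity is divided by the replication factor to get logical space, which is then split evenly across teams; `per_team_capacity_tb` and its `reserve_fraction` parameter are illustrative, not anything the admins published.

```python
def per_team_capacity_tb(raw_tb, teams, replication=3, reserve_fraction=0.0):
    """Estimate logical HDFS space per team, in TB.

    raw_tb: total raw disk across DataNodes (85 TB per the thread).
    teams: number of teams sharing the cluster (hypothetical; not stated).
    replication: HDFS replication factor (3 is the HDFS default; assumed).
    reserve_fraction: share of raw disk held back as headroom (assumed).
    """
    if teams <= 0 or replication <= 0:
        raise ValueError("teams and replication must be positive")
    if not 0.0 <= reserve_fraction < 1.0:
        raise ValueError("reserve_fraction must be in [0, 1)")
    usable_raw = raw_tb * (1.0 - reserve_fraction)  # disk left after headroom
    logical = usable_raw / replication  # each block is stored `replication` times
    return logical / teams  # even split across teams


if __name__ == "__main__":
    for teams in (10, 12, 13):
        print("%d teams: %.2f TB each" % (teams, per_team_capacity_tb(85, teams)))
```

With 85 TB raw and 3 replicas there is roughly 28 TB of logical space, so about 13 teams would land near 2.2 TB each. The real team count and replication setting would have to come from the admins.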

Given that, let's open the floodgates with questions. I'm here, and others are out there: what information about this cluster (once you have access) would make your lives easier?

-Matt 

Fitz Bushnell

Sep 9, 2014, 10:35:23 PM
to uh...@googlegroups.com
Hey, Matt.

Thanks for taking on the administration of this cluster. I figure there's probably a Java compiler available on the cluster, but I wanted to check. It would also be helpful if Python were available, since it's handy for munging data without resorting to Java. I don't know how much work it would take to add, but if it's significant, I'd be happy to help out (with the caveat that I can barely spell RPM). Thanks.

Fitz

Matt Davies

Sep 10, 2014, 11:15:37 AM
to uh...@googlegroups.com
Hey Fitz,

Not a problem; I'm excited we have this resource. I'm still loading data, but if you have your own data ready to go, the cluster is ready for you.

The cluster gateway does have Python installed, as well as the Java compiler.

Versions:
Python 2.6.6
Oracle java version "1.6.0_31"
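Since the gateway has Python 2.6.6, one way to munge data without Java is Hadoop Streaming, which runs any executable that reads lines on stdin and writes tab-separated key/value lines on stdout. The script below is a minimal sketch under stated assumptions: the input layout (CSV rows of `station,timestamp,reading`) and the file name `air_quality.py` are hypothetical, not the competition's actual dataset format. It sticks to syntax that runs on both Python 2.6 and Python 3.

```python
#!/usr/bin/env python
"""Hadoop Streaming sketch: average air quality readings per station.

Hypothetical input layout: CSV rows of station,timestamp,reading.
Run as "air_quality.py map" for the map phase or "air_quality.py reduce"
for the reduce phase. Written to run on Python 2.6 as well as Python 3.
"""
import sys


def mapper(lines, out=None):
    """Emit 'station<TAB>reading' for every row with a numeric reading."""
    if out is None:
        out = sys.stdout
    for line in lines:
        fields = line.rstrip("\n").split(",")
        if len(fields) < 3:
            continue  # skip short or malformed rows
        station = fields[0].strip()
        try:
            value = float(fields[2])
        except ValueError:
            continue  # skip header rows and non-numeric readings
        out.write("%s\t%s\n" % (station, value))


def reducer(lines, out=None):
    """Average readings per station.

    Relies on the input being sorted by key, which Hadoop Streaming
    guarantees between the map and reduce phases.
    """
    if out is None:
        out = sys.stdout
    current, total, count = None, 0.0, 0
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        if key != current:
            if current is not None:  # flush the previous station's average
                out.write("%s\t%.3f\n" % (current, total / count))
            current, total, count = key, 0.0, 0
        total += float(value)
        count += 1
    if current is not None:  # flush the last station
        out.write("%s\t%.3f\n" % (current, total / count))


if __name__ == "__main__":
    mode = sys.argv[1] if len(sys.argv) > 1 else ""
    if mode == "map":
        mapper(sys.stdin)
    elif mode == "reduce":
        reducer(sys.stdin)
    else:
        sys.stderr.write("usage: air_quality.py map|reduce\n")
```

To try it locally, `cat sample.csv | python air_quality.py map | LC_ALL=C sort | python air_quality.py reduce` stands in for the shuffle (the C locale keeps `sort` from ignoring the tab). On the cluster, the same commands would go to the Hadoop Streaming jar through its `-mapper`, `-reducer`, and `-file` options; where that jar lives depends on the CDH install.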

Let me know how else I can help!


--
Visit us on the net: http://www.uhug.org
 
You received this message because you are subscribed to the Google
Groups "Utah Hadoop Users Group" group.
To manage your subscription, visit this group at
http://groups.google.com/group/uhug?hl=en

---
You received this message because you are subscribed to the Google Groups "Utah Hadoop Users Group - Big Data Utah" group.
To unsubscribe from this group and stop receiving emails from it, send an email to uhug+uns...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

David Adams

Sep 10, 2014, 10:29:35 PM
to uh...@googlegroups.com
Why not load the public datasets into a shared area, or allow users to load public data there?

--

Matt Davies

Sep 10, 2014, 11:09:57 PM
to uh...@googlegroups.com
David,

Thanks for the suggestion. I'm doing just that: loading the data into a public area to keep the number of copies to a minimum. I'm also loading the ones I can into Hive.

If there are other datasets people would like to make public, please let us know. I know it may seem like a lot of space, but I'm really concerned about how fast it will be consumed once everyone starts creating derivative datasets. I'm just trying to head off those late-night calls asking me to free up space.

Having said all that, nothing prevents you from creating a directory and leaving its permissions wide open.
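Opening up a team directory comes down to the standard `hdfs dfs -mkdir -p` and `hdfs dfs -chmod` commands. The Python wrapper below is a hypothetical helper for illustration, and the path `/shared/air-quality` is made up; by default it only prints the commands (a dry run), so nothing touches HDFS until you pass `dry_run=False` on the gateway.

```python
import subprocess


def shared_dir_commands(path, mode="777"):
    """Return the hdfs dfs commands that create `path` and open its permissions."""
    return [
        ["hdfs", "dfs", "-mkdir", "-p", path],  # create the directory (and parents)
        ["hdfs", "dfs", "-chmod", mode, path],  # widen permissions on it
    ]


def create_shared_dir(path, mode="777", dry_run=True):
    """Print the commands (dry run) or execute them on the gateway."""
    for cmd in shared_dir_commands(path, mode):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.check_call(cmd)  # raises CalledProcessError on failure


if __name__ == "__main__":
    # Hypothetical path, for illustration only.
    create_shared_dir("/shared/air-quality")
```

Mode `777` matches "wide open". `1777` would add the sticky bit, which HDFS honors on directories: anyone can add files, but only a file's owner (or the directory owner) can delete it, which may be friendlier for a shared area.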

-Matt

Gregg Cowley

Sep 11, 2014, 8:15:05 PM
to uh...@googlegroups.com
Hey,

I got the email with all the VPN and login information, but the VPN connection at secure.supermicro.com/826e keeps telling me the login information is incorrect. Is anyone else having issues?

--Gregg

Jon St. John

Sep 11, 2014, 11:16:48 PM
to uh...@googlegroups.com
Gregg,

I ran into trouble at first, but it was a simple copy-and-paste error on the password: it had a trailing space. Otherwise, everything seemed to be working fine.

Jon

Gregg Cowley

Sep 12, 2014, 10:30:25 AM
to uh...@googlegroups.com
Thanks Jon,
 
I did notice the spaces in the copy-and-paste, but even after typing everything in fresh I still get the login error. Is there something I need to install, or should it all work from the web page link in the email?
 
Thanks
Gregg

Matt Davies

Sep 12, 2014, 10:39:20 AM
to uh...@googlegroups.com
Gregg,

It should all work from the link in the email. I did notice that on a Mac you need to use Safari to get the Java components running, but that only comes into play after you log in.

So, to confirm: you are getting blocked at the login page?

Brett Ragozzine

Sep 13, 2014, 12:01:04 PM
to uh...@googlegroups.com
Hello,

I'm glad to hear that Python is on there. I'm only just getting into Java.

Brett

Pat Wright

Sep 13, 2014, 3:45:53 PM
to uh...@googlegroups.com
Just to make sure everyone has the info, this should get you into the cluster once you're on the VPN.

For All Users

Cloudera Manager (CDH app): http://172.27.30.10:7180/cmf/home (ro/ro)

Hue app: http://name1:888 (username/password below)


SSH to name1 (172.27.30.10) as the gateway.


