Hadoop


Chad Cottle

Nov 8, 2012, 4:13:32 PM
to coll...@googlegroups.com
Yo yo yo.  Is anyone out there doing "big data" or anything with Hadoop?  I want to network with u if'n u do.  :)

nx

Nov 8, 2012, 4:20:33 PM
to coll...@googlegroups.com
I'm not familiar with Hadoop, but I do know that choosing data storage
depends on a lot of different factors, e.g. what the data is shaped like,
how long it needs to live, how much of it there is, etc.

On Thu, Nov 8, 2012 at 4:13 PM, Chad Cottle <opend...@gmail.com> wrote:
> Yo yo yo. Is anyone out there doing "big data" or anything with Hadoop? I
> want to network with u if'n u do. :)
>
> --
> You received this message because you are subscribed to the Google Groups
> "Collexion" group.
> To post to this group, send email to coll...@googlegroups.com.
> To unsubscribe from this group, send email to
> collexion+...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/collexion?hl=en.

Chase Southard

Nov 8, 2012, 4:24:32 PM
to coll...@googlegroups.com, Kentucky Ruby user group
Chad, 

I think we've talked about Hadoop in a previous KyRUG meetup. I've CC'd those fine folks in.

Chase





--
R. Chase Southard
http://about.me/southard

Brian

Nov 8, 2012, 4:33:48 PM
to coll...@googlegroups.com

I don't have an active Hadoop cluster, but I have worked with it for our "big data" tech before, as well as MongoDB, Postgres, MySQL, CouchDB, etc., if those can be considered big data storage engines.

-brian on droid


Chad Cottle

Nov 8, 2012, 4:38:33 PM
to coll...@googlegroups.com
Brian, YES!!!  Would love to chat.

Brian

Nov 8, 2012, 4:41:07 PM
to coll...@googlegroups.com

Sure no problem I'll talk, not sure how much help I'll be though.

-brian on droid

Adam Recktenwald

Nov 8, 2012, 5:09:18 PM
to coll...@googlegroups.com, coll...@googlegroups.com
Hadoop - a little (playing)... Been wanting to dive back in!

Recently neck deep in SAP HANA - columnar / in-memory db. Wicked fast and, depending on the need, would qualify as 'big data' in nature.

You can now get some cheap(er) HANA on AWS.



Benjamin Askren

Nov 8, 2012, 5:32:24 PM
to coll...@googlegroups.com, coll...@googlegroups.com
Chad -

I'm very interested.  When are you going to get started?

Sent from my iPhone


Chad Cottle

Nov 9, 2012, 8:49:26 AM
to coll...@googlegroups.com, dpau...@lexingtonky.gov, Nick Such, openle...@googlegroups.com
Perhaps those interested can have a meetup and we can discuss.  The next phase of Open Data is going to focus on our NIMS (Neighborhood Information Mgmt Systems) and VeloCITY (where we want to use big data to speed up decision making for state/local). 

We (city of Lex) have only dipped our toe in the waters but we want to explore how big data can help with the above efforts and our Bloomberg challenge (citizenlex).

I looped a few folks into this thread.  I'd be happy to arrange a coffee, lunch, beer or some other gathering to talk about these topics.

Chad Cottle

Nov 9, 2012, 8:51:49 AM
to coll...@googlegroups.com, dpau...@lexingtonky.gov, Nick Such, openle...@googlegroups.com
Another thought is this:   Chase Southard, the local Code for America Brigade Captain, has "skill share" as the topic for the next meeting of the brigade.  We could use that as the venue if that works for everyone.

Chase?

Todd Willey

Nov 9, 2012, 9:05:39 AM
to openle...@googlegroups.com, coll...@googlegroups.com, dpau...@lexingtonky.gov, Nick Such
I love that idea.

Brian Smith

Nov 6, 2012, 6:26:58 PM
to coll...@googlegroups.com, Chad Cottle, dpau...@lexingtonky.gov, Nick Such, openle...@googlegroups.com
When is that meeting?
-brian

Chad Cottle

Nov 9, 2012, 9:57:37 AM
to Brian Smith, coll...@googlegroups.com, dpau...@lexingtonky.gov, Nick Such, openle...@googlegroups.com
Date not set yet but I think it was tentatively set for next week or the week after.
Chase is flying to AZ today so he might not be checking his email.

Brian Smith

Nov 9, 2012, 11:05:08 AM
to Chad Cottle, coll...@googlegroups.com, dpau...@lexingtonky.gov, Nick Such, openle...@googlegroups.com
I don't think I'll be able to attend anything next week. I've got some people from out of town coming to the house.

What kind of assistance you all looking for?
-brian

Chad Cottle

Nov 9, 2012, 11:08:01 AM
to Brian Smith, coll...@googlegroups.com, dpau...@lexingtonky.gov, Nick Such, openle...@googlegroups.com
For me, and probably for others, just want to meet and learn about some Big Data "stuff."

Brian Smith

Nov 9, 2012, 1:16:28 PM
to Chad Cottle, coll...@googlegroups.com, dpau...@lexingtonky.gov, Nick Such, openle...@googlegroups.com
Okay sure then! I'd love to learn from others' experience, not just my own :)
-brian

Brian

Nov 10, 2012, 2:26:07 PM
to Wes Brooks, Nick Such, coll...@googlegroups.com, openle...@googlegroups.com, dpau...@lexingtonky.gov

Hah that's funny, I have a very similar stack that's worked well.

-brian on droid

On Nov 10, 2012 1:18 PM, "Wes Brooks" <wesgb...@gmail.com> wrote:
I'm new to the group, but I've been building a healthcare startup with big data / data science implications for the last few months.  I've been teaching myself everything (MongoDB, Python, Django, and some data science tools) from the ground up since my background is not CS and would love to chat about Big Data "stuff" :)



Ate Poorthuis

Nov 9, 2012, 3:43:52 PM
to coll...@googlegroups.com, dpau...@lexingtonky.gov, Nick Such, openle...@googlegroups.com, Matthew Zook
Chad,

We've been building a repository of all geotagged tweets at UK since last December (currently at ~1.5 billion tweets). We use a combination of Cassandra and Elasticsearch to store, analyze and do real-time search. Happy to share experiences!

Best,

Ate



Christopher Harn

Nov 11, 2012, 6:58:33 AM
to coll...@googlegroups.com
Dear Ate,
 
I wonder about the privacy concerns with your project.
 
#CitizenLex

Brian Smith

Nov 12, 2012, 10:33:12 AM
to coll...@googlegroups.com, Christopher Harn
Tweets are public, thus no privacy can be implied.
-brian

Chris Harn

Nov 12, 2012, 10:34:32 AM
to coll...@googlegroups.com
agreed, but then people need to be made more aware of that fact.

Jerome Hollon

Nov 12, 2012, 10:35:37 AM
to coll...@googlegroups.com
If people are tweeting and they don't realize tweets are public, then they really, really, really shouldn't be tweeting, and probably shouldn't be on the internet.

Noah Adler

Nov 12, 2012, 10:44:19 AM
to coll...@googlegroups.com
Aren't privacy concerns an endemic part of Big Data projects?  I'm sure people here are on the same page, and are taking cautious measures to protect privacy.  In that light, Twitter seems like a good candidate for this type of work because it's already well-known to be public.  Hopefully it's 'generally accepted' among this group that privacy is taken seriously, and we can focus on the more technically interesting parts.  (At least that's my hope!  I didn't not go to law school for nothing!)

Dave

Nov 12, 2012, 11:03:50 AM
to coll...@googlegroups.com
Just make sure no medical information is included, else it may run afoul of HIPAA and
HITECH:

http://en.wikipedia.org/wiki/Health_Insurance_Portability_and_Accountability_Act
http://en.wikipedia.org/wiki/HITECH_Act

Dave


Dave

Nov 12, 2012, 11:10:44 AM
to coll...@googlegroups.com
--- On Mon, 11/12/12, Jerome Hollon <jerome...@gmail.com> wrote:

> From: Jerome Hollon <jerome...@gmail.com>
> Subject: Re: Hadoop
> To: coll...@googlegroups.com
> Date: Monday, November 12, 2012, 10:35 AM
>
> If people are tweeting and they don't realize tweets are public,
> then they really, really, really shouldn't be tweeting, and probably
> shouldn't be on the internet.

Eternal September:

http://en.wikipedia.org/wiki/Eternal_September

Dave

Chase Southard

Nov 13, 2012, 1:27:33 PM
to openle...@googlegroups.com, coll...@googlegroups.com, dpau...@lexingtonky.gov, Nick Such
Sorry. Buried under email. 

Hadoop-up sounds great. OpenLexington meetup slated for Nov. 21. But that's right before Thanksgiving. We can move to Nov 28 if that's better for folks.

Chad Cottle

Nov 13, 2012, 1:44:07 PM
to coll...@googlegroups.com, openle...@googlegroups.com, dpau...@lexingtonky.gov, Nick Such
Nov 28th is proly better....a lot of folks and UK students are gone on that Wednesday.  I am fine with whatever though.  :)

Chad

Jerome Hollon

Nov 13, 2012, 2:32:43 PM
to coll...@googlegroups.com
It's actually okay if there is medical information. HIPAA only protects medical information attached to PII (Personally Identifiable Information), so if we strip names and unique identifiers, we're not at risk for a HIPAA violation.
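
For anyone who wants to see what "strip names and unique identifiers" could look like in practice, here is a minimal Python sketch. The field names (name, screen_name, user_id, email, zip, text) and the sample records are made up for illustration; a real dataset would have its own schema, and which fields count as identifiers is a judgment call this sketch does not settle.

```python
# Hypothetical field names; adjust to whatever the real dataset uses.
DIRECT_IDENTIFIERS = {"name", "screen_name", "user_id", "email"}


def strip_identifiers(record, identifiers=DIRECT_IDENTIFIERS):
    """Return a copy of record without the listed identifier fields."""
    return {key: value for key, value in record.items() if key not in identifiers}


# Made-up sample records, for illustration only.
records = [
    {"name": "A. Example", "user_id": 101, "zip": "40507", "text": "flu season again"},
    {"name": "B. Example", "user_id": 102, "zip": "40502", "text": "great coffee downtown"},
]

# The originals are left untouched; only the copies lose the identifier fields.
stripped = [strip_identifiers(r) for r in records]
```

This only drops the fields it is told to drop. Whatever is left (a zip code, free text) can still be identifying in combination, so it is a first step rather than a guarantee.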

Also (according to the training I have to take every year), HIPAA doesn't apply to information faxed.

Ashley Greer

Nov 13, 2012, 2:48:51 PM
to coll...@googlegroups.com
I'd like to emphasize Noah's earlier point. Regardless of HIPAA or any other regulation, as technically-minded members of society, we are tasked with a responsibility to educate others and respect their privacy.

I believe we actively create our reality. If our reality is to not educate or not produce usable online tools which provide users with information that they can use for awareness then what kind of society are we creating?

Providing a tool which does not afford information about privacy risks to users is no longer a tool, it is disabling. It creates disabilities for more than just everyday people, it inhibits the people we care about: our friends and family.

-Ashley

Jerome Hollon

Nov 13, 2012, 2:59:09 PM
to coll...@googlegroups.com
I don't believe you understood my email. I suggested we strip personally identifiable information from our dataset. You can't have a privacy leak if there's no person to attach private information to.
--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

Brian Smith

Nov 13, 2012, 3:11:09 PM
to coll...@googlegroups.com, Jerome Hollon
Be careful, even though you may not have PII defined, someone could abuse a poorly designed dataset to identify an individual by inference of auxiliary information.
-brian
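
A rough way to check for the inference risk described above is to count how many records share each combination of "quasi-identifier" values. A combination that occurs only once points at a single person even with no name attached. The sketch below assumes hypothetical columns (zip, birth_year, gender, diagnosis) and made-up rows; it is an illustration of the idea, not a full anonymity audit.

```python
from collections import Counter

# Hypothetical quasi-identifier columns; pick the ones that fit the real data.
QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def group_sizes(records, quasi_identifiers=QUASI_IDENTIFIERS):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def unique_combinations(records, quasi_identifiers=QUASI_IDENTIFIERS):
    """Return the combinations that match exactly one record."""
    sizes = group_sizes(records, quasi_identifiers)
    return [combo for combo, n in sizes.items() if n == 1]


# Made-up rows with no names in them.
records = [
    {"zip": "40507", "birth_year": 1980, "gender": "F", "diagnosis": "asthma"},
    {"zip": "40507", "birth_year": 1980, "gender": "F", "diagnosis": "flu"},
    {"zip": "40502", "birth_year": 1975, "gender": "M", "diagnosis": "diabetes"},
]

risky = unique_combinations(records)
smallest_group = min(group_sizes(records).values())
```

Here the third row is the only one with its zip, birth year, and gender, so anyone who knows those three facts about a neighbor learns the diagnosis too.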

Todd Willey

Nov 13, 2012, 3:13:23 PM
to Collexion
When we talk about big data, it's possible to mine identity out of data, even if it is given tokens instead of explicit values to "anonymize" it: http://en.wikipedia.org/wiki/AOL_search_data_leak
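
To make the token point concrete, here is a small Python sketch. It swaps each user name for a stable number and then groups rows by that number, which anyone holding the "anonymized" file can do. The user names and queries are invented for illustration and are not taken from the AOL data.

```python
from collections import defaultdict
from itertools import count


def pseudonymize(rows):
    """Replace each user name with a stable numeric token (same user, same token)."""
    tokens = {}
    next_token = count(1)
    out = []
    for user, query in rows:
        if user not in tokens:
            tokens[user] = next(next_token)
        out.append((tokens[user], query))
    return out


def profiles(tokenized_rows):
    """Group every query under its token, rebuilding a per-person history."""
    grouped = defaultdict(list)
    for token, query in tokenized_rows:
        grouped[token].append(query)
    return dict(grouped)


# Invented rows, for illustration only.
rows = [
    ("alice", "landscapers in lexington ky"),
    ("bob", "hadoop tutorial"),
    ("alice", "dog that bites everything"),
    ("alice", "homes sold in chevy chase neighborhood"),
]

tokenized = pseudonymize(rows)
by_token = profiles(tokenized)
```

The names are gone from `tokenized`, but token 1 still carries three queries that together narrow down who wrote them. The token hides the label, not the pattern.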

Tweets and the like (eg: things that are explicitly public) aren't a problem.  I think what everyone is interested in here are public data sets anyway.  Use your best judgement otherwise.

Also, not all fax and data lines are HIPAA compliant, and you need a statement from the person you're sending to that they have taken safeguards.  It's just that fax isn't treated any differently than email, etc., but it must have reasonable safeguards like being away from anywhere a patient can visit/see, etc.

-todd[1]

Chris Harn

Nov 13, 2012, 3:15:03 PM
to coll...@googlegroups.com
BS. I found adam ant’s real handle on twitter.
 
At least I think I did ;)

Chris Harn

Nov 13, 2012, 3:17:59 PM
to coll...@googlegroups.com
“It will be OK if I design this nuclear weapon, because we just won't use it.”
 
I can’t believe Twitter even allows such a feed.
 
Your sentence phrasing implies that stripping names out is optional... sure, now they will strip out names, but later they may decide not to.
 
Or even if they continue to strip out names, people can infer things without identities.
 
For example: my fish are hungry; I know this because they swim to the top.

Todd Willey

Nov 13, 2012, 3:24:37 PM
to Collexion
Sure, and the countermeasure is to swim when nobody is looking.  So don't use twitter or use a protected account if you care.  The dataset is built from people who have explicitly stated they don't care about the privacy of their tweets (and the various implications such as personal exposure generated from those tweets).

Christopher Harn

Nov 13, 2012, 3:48:06 PM
to coll...@googlegroups.com
I like your countermeasure :) and I like nice clean arguments. :)
 
OK so here is what my criminal mind conjured up while driving home...
Politics, for example... we collect the musings of the entire nation, region by region, and ideally, if your system is working and tweaked right, you can deduce the outcome of the election.  Then you take your money and bet on the election.
 
 
 
 
 

Dave Hempy

Nov 15, 2012, 10:02:43 AM
to coll...@googlegroups.com

You’re probably well past any information InformationWeek has, but this week their cover story is about Sears adopting Hadoop to sweep out all its mainframe and ETL tech. 

 

-dave

Chad Cottle

Nov 15, 2012, 10:08:46 AM
to coll...@googlegroups.com
Going there now.  Thanks for the heads-up Dave.