Gathering "big data" - your advice?

Aaron Lifshin

Sep 5, 2015, 4:19:09 PM
to teamlessigtech
So, how would we go about starting to collect "big data" about visitors to the site?

Currently there is no system for this in place.

There was a lot of interest on this group about this.  Curious to hear your ideas and thoughts.

A

David Maust

Sep 5, 2015, 4:45:29 PM
to teamlessigtech
Hi,

Start by logging every request, IP, cookie information, and some representation of the action the user performed. Raw data is best. Place it on S3 for processing on Hadoop or Spark. From Spark or Hadoop you can generate aggregations or machine learning models.
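To make that concrete, here is a minimal sketch of one-line-per-event logging in Python. The field names, the `make_event` and `append_event` helpers, and the sample values are assumptions for illustration only; in practice the stream would be a rotated log file that gets shipped to S3.

```python
import io
import json
from datetime import datetime, timezone

def make_event(ip, path, cookies, action, now=None):
    """Build one raw event record. Field names are illustrative, not a fixed schema."""
    now = now or datetime.now(timezone.utc)
    return {
        "ts": now.isoformat(),     # when the request happened (UTC)
        "ip": ip,                  # client IP as seen by the server
        "path": path,              # requested URL path
        "cookies": dict(cookies),  # first-party cookies sent with the request
        "action": action,          # some representation of what the user did
    }

def append_event(stream, event):
    """Write the event as a single JSON line so it is easy to batch and parse later."""
    stream.write(json.dumps(event, sort_keys=True) + "\n")

# Usage with an in-memory stream standing in for the real log file.
buf = io.StringIO()
append_event(buf, make_event("203.0.113.7", "/donate",
                             {"visitor_id": "abc123"}, "view_donate_form"))
print(buf.getvalue(), end="")
```

Keeping every field as-is, rather than aggregating up front, is what leaves the door open for the Hadoop or Spark jobs later.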

The other question is if you can enrich the data with third party data (BlueKai or others).

David

Blake West

Sep 5, 2015, 6:38:48 PM
to teamlessigtech
There are a jillion tools for collecting data about visitors to a site. Google Analytics makes it really simple to get rather solid data collection.
Something like MixPanel is also a great solution. Implementing either one is, relatively speaking, not that much work. I've never personally used MixPanel, but I know it's very highly regarded. Google Analytics is something I've used, and it's really easy. No need to get Hadoop or other technologies involved at this point. There simply isn't enough data to warrant that, nor do I believe our queries about what's happening would be complicated enough to necessitate such tools.


Aaron Lifshin

Sep 5, 2015, 6:44:25 PM
to Blake West, teamlessigtech
Google Analytics does not give access to cookie or underlying data.  It's more of a consumer tool.

A



David Maust

Sep 5, 2015, 7:01:29 PM
to teamlessigtech
Actually, a pretty easy method is using Snowplow analytics. They provide JavaScript trackers and use CloudFront logging for data collection. This dumps the data directly onto S3.


David

Blake West

Sep 5, 2015, 7:10:43 PM
to Aaron Lifshin, teamlessigtech
What do you mean by access to cookies? We (as LessigForPres.com) are the ones creating cookies in the user's browser. Presumably, we already have access to the cookies we placed there. Cookies from other sites are not things we could or should have access to: the browser only sends us cookies set for our own domain. Analytics tools will not give these to you.
What exactly do you mean when you say GA is more of a consumer tool? It's definitely designed to be used by businesses, and certainly is used by hundreds of thousands, or millions of businesses. MixPanel as well is a serious enterprise product used by tons of companies.

But seriously, let's not get hung up on those details. We just want to get Lessig elected. We should start with... "What questions are we trying to answer? What do we need to answer those?"
If the questions are like... 
 - How many people are coming to the site?
 - How long are they there?
 - How far do they get into the donation funnel?
 - Where do they come from in the country?
 - What directed them to the page? (google, a friend, social media, etc.)

All such questions can be answered by normal analytics tools (whether it's GA, or MixPanel, or whatever)
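As a rough illustration of the funnel question, here is a small Python sketch that counts how many visitors reach each step. The step names and the `(visitor_id, action)` event shape are made up for the example; an analytics tool would give the same numbers out of the box.

```python
from collections import defaultdict

# Hypothetical funnel steps, in order.
FUNNEL_STEPS = ["view_home", "view_donate_form", "submit_donation"]

def funnel_counts(events, steps=FUNNEL_STEPS):
    """Count visitors who performed each step and every step before it.

    `events` is an iterable of (visitor_id, action) pairs. Ordering in time
    is ignored here, which is a simplification.
    """
    seen = defaultdict(set)
    for visitor_id, action in events:
        seen[visitor_id].add(action)
    counts = []
    for i, step in enumerate(steps):
        needed = set(steps[: i + 1])
        reached = sum(1 for actions in seen.values() if needed <= actions)
        counts.append((step, reached))
    return counts

events = [
    ("v1", "view_home"), ("v1", "view_donate_form"), ("v1", "submit_donation"),
    ("v2", "view_home"), ("v2", "view_donate_form"),
    ("v3", "view_home"),
]
print(funnel_counts(events))
# [('view_home', 3), ('view_donate_form', 2), ('submit_donation', 1)]
```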

So what questions do we think we're looking to answer?

David Maust

Sep 5, 2015, 7:15:53 PM
to teamlessigtech
I agree. Google Analytics is a valuable tool. I believe you should be going deeper than what Google Analytics provides, but Google Analytics gives you a lot basically for free.

David

Andy Keil

Sep 6, 2015, 3:58:20 PM
to teamlessigtech
Hey David,

What questions do you intend to answer by going deeper?

David Maust

Sep 6, 2015, 4:49:31 PM
to teamlessigtech
First thing would be to explore the data. Currently I don't have access to any analytics from the campaign. I would also be interested in building a linear model (logistic regression with lasso regularization) to identify the most significant predictors of a user's likelihood to contribute. Also, we could look at groupings of users or possibly an alternating least squares model on some of the predictors to estimate the value of each user to the campaign. This could be used to optimize ad spend. If we can join this against other datasets, we might also be able to obtain some interesting features there. These are just a couple ideas. In my experience, there is no substitute for having raw logs, one line per event with as many features as possible stored as fields.
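For anyone unfamiliar with the lasso idea, here is a toy sketch in plain Python: logistic regression with an L1 penalty, fit by proximal gradient descent. The data is synthetic, the function names and hyperparameters are assumptions, and a real analysis would use a proper library on the raw logs rather than this loop.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_threshold(w, t):
    """Proximal step for the L1 penalty: shrink w toward zero by t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

def fit_lasso_logistic(X, y, lam=0.05, lr=0.5, steps=500):
    """Minimise mean log-loss + lam * sum(|w_j|); the intercept b is not penalised."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        # Gradient step on the smooth loss, then shrink each weight.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b

# Synthetic data: column 0 = "visited donate page" (predictive of donating),
# column 1 = a coin flip unrelated to the outcome.
rows = [((1, 1), 3), ((1, 0), 1), ((0, 1), 1), ((0, 0), 3)]  # ((x0, label), copies)
X, y = [], []
for (x0, label), copies in rows:
    for x1 in (0, 1):
        X += [[x0, x1]] * copies
        y += [label] * copies

w, b = fit_lasso_logistic(X, y)
print(w, b)
```

On this toy data the penalty should leave a positive weight on the first column and shrink the unrelated column to zero, which is the sense in which lasso picks out the significant predictors.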

David



Aaron Lifshin

Sep 8, 2015, 2:29:41 PM
to teamlessigtech
Hey all,

A lot of great questions and good thinking in this thread.

I have to bang these answers out quickly, so I don't always say things in the best way.  Analytics is great, and I shouldn't have called it a "consumer" tool.  What I meant to say is that it doesn't give us full access to our data. Or the cookies, for that matter, as they are Google cookies.

Let me assure everyone on this thread that the campaign has a very sophisticated (especially for a 4-week-old organization) usage of both Google Analytics and A/B testing tech.  For the latter we are standardized on VWO, which, by the way, is pretty OK if you ever need an Optimizely replacement.

The next level for us is to be more intelligent about our audience and begin to show different content to first-time visitors vs. donors vs. volunteers, etc.

We would also like to move towards showing different content depending on what we might know a user is interested in: the issues most relevant to a user and how these might link to corruption.
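As a rough picture of the visitor-type split, here is a Python sketch that picks a segment from first-party cookies. The cookie names (`visitor_id`, `donor`, `volunteer`) and the `segment_for` helper are hypothetical; nothing like this is set up yet.

```python
from http.cookies import SimpleCookie

def segment_for(cookie_header):
    """Pick a content segment from the request's Cookie header.

    Assumes hypothetical first-party cookies: `visitor_id` set on first visit,
    `donor=1` set after a donation, `volunteer=1` set after a volunteer signup.
    """
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    if "donor" in jar and jar["donor"].value == "1":
        return "donor"
    if "volunteer" in jar and jar["volunteer"].value == "1":
        return "volunteer"
    if "visitor_id" in jar:
        return "returning_visitor"
    return "first_time_visitor"

print(segment_for("visitor_id=abc123; donor=1"))  # donor
print(segment_for(""))                            # first_time_visitor
```

The page template (or the A/B testing tool) could then branch on the returned segment name.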

Finally, the analysis David mentions would be great to do.  David, it is my understanding that Google Analytics would not give you the data that would allow you to do such an analysis, as it only provides aggregated and not individual data.

How would you guys set up the cookies?

Thank you for your input and advice,

A


Dan Acheson

Sep 8, 2015, 4:04:08 PM
to Aaron Lifshin, teamlessigtech
Hi All,

Aaron, thanks a lot for clarifying some of the main questions you are interested in answering at the moment. In response to what Aaron wrote, here are a few thoughts / questions. Apologies that they extend a bit beyond the original thread.

1. Based on Aaron's response, it seems like the first priority is user segmentation, which is critical to delivering different content based on user interests.  What sort of data is available for segmentation? Are we simply thinking of using some form of demographic segmentation (e.g., age, gender, geographic location), or is there the possibility for behavioral segmentation as well (e.g., based on people's actions on the website or otherwise)?

2. Regarding David's analysis, modeling likelihood to contribute and expected lifetime value (i.e., total contributions) seems like a reasonable thing to do, but would require more granular data than what Google Analytics provides. I might add to this the likelihood that someone signs up for a recurring donation. As he notes, though, we'd need individual level data to pull this off. 

3. What sort of A/B testing is being implemented right now? Is there anything going on re: messaging? This would provide fertile ground to validate some of the segmentation above, and more generally, a place to look at responses to different content / messages (e.g., time on site, likelihood to donate / volunteer, likelihood to share via social, etc.).

4. Where do things stand with social media strategy / analytics at the moment? What tools are being used to collect social data? Any work being done with social listening on Facebook and Twitter? Experiments with different hashtags and messages to see what sticks? 

5. If we do want to start getting more granular, have there been any discussions around data warehousing / access? AWS seems an obvious choice here. This isn't my expertise, but it's probably something to figure out sooner rather than later as this thing ramps up. (Apologies for the obviousness of this statement.)

As I've noted before, I'm happy to contribute my skills to any analytics going on, so please let me know how I can get involved. 

-Dan






ricardo ivanicci

Sep 9, 2015, 3:09:10 AM
to teamlessigtech
blake, 
you appear to me (as i am totally uneducated in this realm) to have knowledge in this field ... some time ago i posted this:
and wonder if you or someone you know has access to the volunteer program re:

currently the labor day telethon volunteer phone call mechanism is continuing to be OPEN ... it is available from 9:00am to 9:00pm Eastern time (join? commit?) ... is there anyone on teamlessigtech who can interface with whoever is piecing this mechanism together, to code a way to separate donors by area code so calls can be made to west coast donors until 9:00pm Pacific time (as is being done for those in the Eastern zone)? ... maybe we need a googler to get onboard? someone who can help code ... pass this along to the appropriate (?) person/link, please, and thank you.


Blake West

Sep 9, 2015, 3:31:47 AM
to teamlessigtech
   Hi Ricardo,
   Yeah, I do web programming for a living. As for your question, that seems entirely possible, though I don't really know where to start without knowing more about how the phone call thing is currently set up. Are you guys using Twilio, or something similar? (Twilio is a service for routing phone calls programmatically.) Maybe I can speak with someone who's more involved with the implementation of whatever calling system you're using? - blake
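   Independent of which calling system is in use, the core check could look something like this Python sketch. The area-code table is a tiny hypothetical sample, the offsets are fixed daylight-time values (a real version would use a full area-code list and proper time zone data), and `can_call` is a made-up helper name.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sample only: area code -> UTC offset in hours during daylight time.
AREA_CODE_UTC_OFFSET = {"212": -4, "617": -4, "415": -7, "206": -7}

def can_call(phone, now_utc, start_hour=9, end_hour=21):
    """Return True if it is between 9:00am and 9:00pm where the donor lives.

    Assumes `phone` is digits only and ends with a 10-digit US number.
    """
    area_code = phone[-10:-7]
    offset = AREA_CODE_UTC_OFFSET.get(area_code)
    if offset is None:
        return False  # unknown area code: skip rather than risk a late call
    local = now_utc.astimezone(timezone(timedelta(hours=offset)))
    return start_hour <= local.hour < end_hour

# 10:30pm Eastern / 7:30pm Pacific on Sep 9, 2015.
now = datetime(2015, 9, 10, 2, 30, tzinfo=timezone.utc)
print(can_call("4155550123", now))  # True: still inside the Pacific window
print(can_call("2125550123", now))  # False: past 9:00pm Eastern
```

   One caveat: an area code only tells you where the number was issued, so mobile numbers of people who moved will sometimes be misclassified.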

Blake West

Sep 9, 2015, 3:45:53 AM
to teamlessigtech
   We might want to use something like Segment. It doesn't get data for you, but it allows you to send up whatever user data you like to Segment, and then be able to send that data to any of hundreds of analytics services later on. 
   As for actual data about the user though, that may be hard to come by. I'm just thinking that getting people to fill out forms could be tough, and they certainly won't want an "account" on Lessig's site. My hunch is our best bet will be tracking referring sites, and understanding what campaigns are driving people here. The users themselves may end up being a bit more of a black box, but we could have luck in trying to understand which ad spends (and thus audience types) are performing best. To which... hooking up AdWords and AdSense together could be a great combo. Those tools 100% support conversion tracking (of anything, whether it's signing up for a mailing list, or actually donating), and we can correlate it back to which ads or campaigns are driving them. And they support it by demographics, campaign, "cohort", whatever.

So yeah... takeaways: I'd say we could do Segment, but I'm not really sure what data on the user we'd have to send to Segment. And without any individual data, we're left tracking campaigns and ads, which is likely to be totally sufficient for now, and can be done with our existing tools! yay!
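To show what campaign-level tracking boils down to, here is a small Python sketch that reads the `utm_campaign` parameter off landing URLs and computes a conversion rate per campaign. The URLs, campaign names, and the `(landing_url, converted)` input shape are invented for the example; the existing tools already report this without custom code.

```python
from collections import Counter
from urllib.parse import parse_qs, urlparse

def campaign_of(landing_url):
    """Return the utm_campaign value from a landing URL, or "(none)" if absent."""
    params = parse_qs(urlparse(landing_url).query)
    return params.get("utm_campaign", ["(none)"])[0]

def conversion_rate_by_campaign(visits):
    """`visits` is an iterable of (landing_url, converted) pairs."""
    total, converted = Counter(), Counter()
    for url, did_convert in visits:
        campaign = campaign_of(url)
        total[campaign] += 1
        if did_convert:
            converted[campaign] += 1
    return {c: converted[c] / total[c] for c in total}

visits = [
    ("https://example.org/?utm_source=facebook&utm_campaign=fb_sept", True),
    ("https://example.org/?utm_source=facebook&utm_campaign=fb_sept", False),
    ("https://example.org/?utm_source=google&utm_campaign=search_brand", True),
    ("https://example.org/", False),
]
print(conversion_rate_by_campaign(visits))
# {'fb_sept': 0.5, 'search_brand': 1.0, '(none)': 0.0}
```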

thoughts?

Jason Pfaff

Oct 23, 2015, 9:44:41 AM
to teamlessigtech
Is this still an open item?  I have significant data experience and am happy to help.  Just let me know where the project stands.