Understanding Network User ID

Sonny Rivera

Mar 11, 2015, 6:08:29 PM
to snowpl...@googlegroups.com
FYI - I'm an SP novice, so I apologize if this is an uninformed question.

I'm running the Clojure collector across a network of our web properties. Can someone help me understand the sample information below? As I track visitors, I notice that the network_userid changes with every event even though the visitor is on the same site (domain). How does this enable me to track across multiple sites?

user_fingerprint network_userid                        domain_userid    user_id
1046671599       3ce9538d-abd9-431f-bb2d-5fb796ad7367  8b7abcb959acbfde 7b5e959626 
1046671599       2895c3e2-a105-4c08-bc65-5ae53b9bdfd0  8b7abcb959acbfde 7b5e959626



Alex Dean

Mar 11, 2015, 6:26:19 PM
to snowpl...@googlegroups.com
Hi Sonny,

This is most likely a user whose browser is configured to disable third-party cookies; the Safari browser does this by default. With third-party cookies disabled, the Clojure Collector will generate a new network_userid on each request.
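
The behaviour described above can be sketched roughly as follows. This is a minimal illustration in Python, not the Clojure Collector's actual code; the function name `assign_network_userid` and the cookie name `sp` are assumptions made for the example.

```python
import uuid

COOKIE_NAME = "sp"  # assumed cookie name, for illustration only


def assign_network_userid(request_cookies):
    """Return the network_userid to record for one incoming request.

    request_cookies: dict of cookies the browser sent with the request.
    """
    existing = request_cookies.get(COOKIE_NAME)
    if existing:
        # The browser returned the third-party cookie: reuse the same ID.
        return existing
    # No cookie came back (first visit, or third-party cookies blocked):
    # mint a fresh ID, which the collector would then try to set as a cookie.
    return str(uuid.uuid4())


# A browser that blocks third-party cookies never sends the cookie back,
# so every request looks like a first visit and gets a new ID.
blocked = [assign_network_userid({}) for _ in range(2)]

# A browser that accepts the cookie sends it back on the next request.
first = assign_network_userid({})
second = assign_network_userid({COOKIE_NAME: first})
```

In the blocked case the two IDs differ, which matches the sample rows in the question; in the accepting case the ID stays stable.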

To track users across a network of sites, you would typically use network_userid but also lean on user_ipaddress, user_fingerprint, etc. to fill in the gaps.

Separately, the next release of the Snowplow JavaScript Tracker will be able to decorate outbound links with the domain_userid:

https://github.com/snowplow/snowplow-javascript-tracker/issues/109

This will be helpful for sharing domain_userids between a network of sites that you control (e.g. acme.com, acme.de, acme.com.au).

Hope this helps,

Alex

--
Co-founder
Snowplow Analytics
The Roma Building, 32-38 Scrutton Street, London EC2A 4RQ, United Kingdom
+44 (0)203 589 6116
+44 7881 622 925
@alexcrdean

Sonny Rivera

Mar 11, 2015, 7:16:34 PM
to snowpl...@googlegroups.com
Thanks Alex. 

As usual, you are very responsive, with a concise and clear answer.

Thanks,

Sonny Rivera

Iain Gray

Mar 18, 2015, 11:16:52 AM
to snowpl...@googlegroups.com
To follow on from Alex's reply, a lot of browsers these days disable third-party (TP) cookies. We use the third-party cookie more for cookie-syncing (matching up first-party (FP) cookies across domains) than as a standalone ID.

You can systematically work out which browsers have TP cookies disabled by using Redshift's window functions:

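-- Inner query: flag each page view whose network_userid equals the one on the
-- previous page view (by device timestamp) for the same domain_userid.
-- Outer query: keep the (domain_userid, network_userid) pairs where the ID
-- repeated at least once.
SELECT domain_userid, network_userid
FROM (
  SELECT
    domain_userid,
    network_userid,
    CASE
      WHEN LAG(network_userid) OVER (PARTITION BY domain_userid ORDER BY dvce_tstamp) = network_userid THEN 1
      ELSE 0
    END AS nid_consistent
  FROM atomic.events
  WHERE network_userid <> '-'
    AND event = 'page_view'
) AS flagged
GROUP BY domain_userid, network_userid
HAVING SUM(nid_consistent) > 0;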


This will give you a list of users (by first-party cookie) who keep the same network_userid across multiple page views. Alternatively, you could just look for the same network_userid on multiple domains. In our experience, about 60-70% of network_userids are reliable, so it's worth using them as part of your matching process. For unmatched users, we then fall back to IP and browser fingerprint (the fingerprint is best used in combination with IP, etc.).
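
The alternative check (the same network_userid appearing on more than one domain) could look something like the sketch below. It is written in Python over already-exported rows rather than in Redshift; the use of page_urlhost as the domain column, the function name and the sample rows are all assumptions for illustration.

```python
from collections import defaultdict


def ids_seen_on_multiple_domains(events):
    """Map each network_userid to its hosts, keeping IDs seen on 2+ hosts."""
    hosts_by_id = defaultdict(set)
    for event in events:
        nid = event["network_userid"]
        if nid and nid != "-":  # skip missing IDs, as the SQL query does
            hosts_by_id[nid].add(event["page_urlhost"])
    return {
        nid: sorted(hosts)
        for nid, hosts in hosts_by_id.items()
        if len(hosts) > 1
    }


# Made-up sample rows: ID "aaa" shows up on two sites, "bbb" on only one.
sample_events = [
    {"network_userid": "aaa", "page_urlhost": "acme.com"},
    {"network_userid": "aaa", "page_urlhost": "acme.de"},
    {"network_userid": "bbb", "page_urlhost": "acme.com"},
    {"network_userid": "-", "page_urlhost": "acme.de"},
]
cross_domain_ids = ids_seen_on_multiple_domains(sample_events)
```

An ID that survives this filter was returned by the browser on at least two sites, so it is a reasonable candidate for cross-domain matching.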

Sonny Rivera

Mar 19, 2015, 10:11:07 AM
to snowpl...@googlegroups.com
Thanks for the help. I believe that I understand (conceptually) what you are doing. We are looking for network_userid(s) that recur across our data set. Those that do are used to match users across domains. I have a couple of questions:
  • Is my understanding correct?
  • I don't quite understand the 'lag' window function (or the others, 'lead', etc.). Can you explain what this function does or point me to the documentation?
  • Could we use this same approach with fingerprints?
Thanks again

Iain Gray

Mar 19, 2015, 2:35:00 PM
to snowpl...@googlegroups.com
  • Is my understanding correct?

Yes, that's right.  


  • I don't quite understand the 'lag' window function (or the others 'lead', etc).  Can you explain what this function does or point me to it.
Window functions took me a while to get my head around as well :-)

What they do is take your result set and divide it into partitions (windows) according to the criteria in the OVER() clause, so PARTITION BY domain_userid lets you run a function over the subset of rows for each domain_userid, and ORDER BY dvce_tstamp fixes the order of the rows within that subset. LAG() looks at the previous row in the window: on row 1 of the window it returns NULL, on row 2 it returns the value from row 1, and so on.

They are very useful in Snowplow data for dividing it up into clickstreams by user.

The docs page is http://docs.aws.amazon.com/redshift/latest/dg/c_Window_functions.html
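
If it helps to see the mechanics outside SQL, here is a rough Python emulation of LAG(...) OVER (PARTITION BY ... ORDER BY ...). The helper name `lag_by_partition`, the output column `lag_value` and the sample rows are invented for the illustration; Redshift does this natively.

```python
from itertools import groupby
from operator import itemgetter


def lag_by_partition(rows, partition_key, order_key, value_key):
    """Add 'lag_value': the previous row's value within the same partition."""
    # Sort by partition first, then by the ordering column inside it.
    ordered = sorted(rows, key=itemgetter(partition_key, order_key))
    out = []
    for _, group in groupby(ordered, key=itemgetter(partition_key)):
        previous = None  # first row of a partition has no predecessor (NULL)
        for row in group:
            out.append({**row, "lag_value": previous})
            previous = row[value_key]
    return out


# Made-up rows: user u1 keeps ID "a" for two page views, then changes to "b".
rows = [
    {"domain_userid": "u1", "dvce_tstamp": 2, "network_userid": "a"},
    {"domain_userid": "u1", "dvce_tstamp": 1, "network_userid": "a"},
    {"domain_userid": "u2", "dvce_tstamp": 1, "network_userid": "c"},
    {"domain_userid": "u1", "dvce_tstamp": 3, "network_userid": "b"},
]
lagged = lag_by_partition(rows, "domain_userid", "dvce_tstamp", "network_userid")
lag_values = [r["lag_value"] for r in lagged]
```

Comparing `lag_value` with `network_userid` row by row gives the same 1/0 flag as the CASE expression in the query: only the second u1 row matches its predecessor.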


  • We could use this same approach with finger prints?

Yes, but be careful: fingerprints aren't nearly as unique as cookie IDs are. They are best used in combination with other data points such as IP address. I'd use them as part of an iterative process, i.e. first use network IDs where they are reliable, then use IP + fingerprint.
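
As a rough sketch of that iterative process (in Python, with a hypothetical function name, key layout and made-up sample values), each event gets the strongest key available: the network_userid if it is in the set already judged reliable, otherwise the IP + fingerprint pair.

```python
def match_key(event, reliable_network_ids):
    """Pick the identifier to match this event on, strongest first."""
    nid = event.get("network_userid")
    if nid in reliable_network_ids:
        # Step 1: the network ID is known to be stable, so use it.
        return ("network_userid", nid)
    # Step 2: fall back to IP address combined with browser fingerprint.
    return ("ip+fingerprint", event["user_ipaddress"], event["user_fingerprint"])


# Made-up inputs: "aaa" was found to be reliable, "zzz" was not.
reliable = {"aaa"}
events = [
    {"network_userid": "aaa", "user_ipaddress": "192.0.2.1", "user_fingerprint": "111"},
    {"network_userid": "zzz", "user_ipaddress": "192.0.2.7", "user_fingerprint": "222"},
]
keys = [match_key(e, reliable) for e in events]
```

Events that share a key are treated as the same visitor; the fallback keys are weaker, so they are worth tracking separately from the cookie-based ones.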

Cheers

Iain