Poor Mobile device detection in dvce_type and dvce_ismobile fields


Daniel Ramagem

May 21, 2014, 8:01:05 AM
to snowpl...@googlegroups.com
I've noticed some problems with the reliability of the dvce_type and dvce_ismobile fields.  The detection of "Mobile" in particular seems buggy, as I frequently see User Agent strings like

Mozilla/5.0 (iPhone; CPU iPhone OS 7_0_2 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Mobile/11A501

And these values in Snowplow:

dvce_type = Computer
dvce_ismobile = <blank>
os_name = Mac OS
os_family = Mac OS
os_manufacturer = Apple Inc.
br_name = Apple Mail
br_family = Apple Mail
br_type = Email Client
br_renderengine = WEBKIT

I looked at the ClientEnrichments.scala#extractClientAttributes method and it seems to be correctly using the UserAgent.java class of the user-agent-utils library (version 1.11).  I thought maybe there's a bug in user-agent-utils, but when I created a unit test for the above User Agent string it was detected correctly:

@Test
public void testSnowplow() {
  String s = "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0_2 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Mobile/11A501";
  UserAgent userAgent = UserAgent.parseUserAgentString(s);
  assertEquals(Browser.APPLE_WEB_KIT, userAgent.getBrowser());
  assertEquals(OperatingSystem.iOS7_IPHONE, userAgent.getOperatingSystem());
  assertEquals(DeviceType.MOBILE, userAgent.getOperatingSystem().getDeviceType());
}

So the only explanation I can think of right now is that the static initialization order of the enum constants in OperatingSystem.java is somehow not deterministic when this library is pulled into Snowplow's scala-common-enrich Scala project.  If the MAC_OS constant were created before the IOS constant (even though it is declared later in the source code), it would be added first to the list of comparisons attempted in OperatingSystem#parseUserAgentString. Consequently the hardware/platform substring "(iPhone; CPU iPhone OS 7_0_2 like Mac OS X)" would match "Mac OS" before it matches "iPhone OS 7".
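
For reference, the Java Language Specification has enum constants initialized in declaration order, and values() returns them in that same order, so a first-match scan over an enum gives the same answer on every run for the same class files. The sketch below uses a hypothetical two-constant enum (SketchOs and firstMatch are invented for illustration and are not taken from user-agent-utils) to show how declaration order alone decides whether the iPhone string resolves to iOS or to Mac OS.

```java
// Hypothetical stand-in for a first-match OS lookup. This is NOT the
// user-agent-utils source, only an illustration of how ordering matters.
enum SketchOs {
    // Declared first, so it is checked first.
    IOS("iPhone"),
    // Declared second; the "like Mac OS X" part of iPhone agents contains this token too.
    MAC_OS("Mac OS");

    private final String token;

    SketchOs(String token) {
        this.token = token;
    }

    // Returns the first constant, in declaration order, whose token occurs in
    // the user agent, or null when nothing matches.
    static SketchOs firstMatch(String userAgent) {
        for (SketchOs os : values()) {
            if (userAgent.contains(os.token)) {
                return os;
            }
        }
        return null;
    }
}

class EnumOrderDemo {
    public static void main(String[] args) {
        String ua = "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0_2 like Mac OS X) "
                + "AppleWebKit/537.51.1 (KHTML, like Gecko) Mobile/11A501";
        // Prints IOS because IOS is declared before MAC_OS.
        System.out.println(SketchOs.firstMatch(ua));
    }
}
```

Swapping the two declarations in this sketch would make the same string resolve to MAC_OS, which is the symptom described above.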

Thoughts?

Daniel


Alex Dean

May 21, 2014, 8:23:35 AM
to snowpl...@googlegroups.com
Hi Daniel,

Many thanks for the very detailed and thoughtful bug post. Would you mind adding it as an issue in GitHub? https://github.com/snowplow/snowplow/issues/new

The behavior you've captured is very odd - and would go some way to explaining why the results from user-agent-utils have historically disappointed (vs e.g. ua-parser, which we will move to eventually).

Related issues:
  1. https://github.com/snowplow/snowplow/pull/662
  2. https://github.com/snowplow/snowplow/issues/62

Thanks,

Alex







--
Co-founder
Snowplow Analytics
The Roma Building, 32-38 Scrutton Street, London EC2A 4RQ, United Kingdom
+44 (0)203 589 6116
+44 7881 622 925
@alexcrdean

Daniel Ramagem

May 21, 2014, 8:44:27 AM
to snowpl...@googlegroups.com
Hi Alex,


I'm a Scala newbie, so I don't know if my theory of non-deterministic ordering/creation of Enum types under that environment holds up.  In fact, I'm now not even sure if it's guaranteed to hold up under Java.  I'm going to do some research and post back any conclusions I reach.

Thanks for the quick response (as always)!

Daniel

Alex Dean

May 21, 2014, 8:46:16 AM
to snowpl...@googlegroups.com
Hi Daniel,

Great - thanks for raising. Please post back to the issue anything you find!

A

Alex Dean

Jun 2, 2014, 4:20:59 AM
to snowpl...@googlegroups.com
Hi Daniel,

Did you uncover anything here?

Thanks,

Alex

Daniel Ramagem

Jun 2, 2014, 3:02:12 PM
to snowpl...@googlegroups.com
Alex,

I have not had time to investigate the (apparently) non-deterministic static enum loading order that seems to be the cause of the problem.  To get better browser identification, we decided to add an ETL step after Snowplow event retrieval.  We are using UADetector, which seems robust because it uses a constantly updated public database of User Agent information: http://user-agent-string.info/

I tried parsing the User Agent that gave buggy information from Snowplow:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.137 Safari/537.36

And got back the following (I also tried mapping the results onto Snowplow fields and the library's methods):

Field - Value - Snowplow Attribute - UADetector method
UA type - Browser - br_type - ReadableUserAgent.getTypeName()
UA name - Chrome 34.0.1847.137 - br_name - ReadableUserAgent.getName()
UA family - Chrome - br_family - ReadableUserAgent.getFamily().getName()
UA producer - Google Inc. - N/A - ReadableUserAgent.getProducer()
OS name - OS X 10.9 Mavericks - os_name - ReadableUserAgent.getOperatingSystem().getName()
OS family - OS X - os_family - ReadableUserAgent.getOperatingSystem().getFamilyName()
OS producer - Apple Computer, Inc. - os_manufacturer - ReadableUserAgent.getOperatingSystem().getProducer()
Device - Personal computer - dvce_type - ReadableUserAgent.getDeviceCategory().getName()

Additional Snowplow attributes to be mapped:

br_renderengine - No UADetector method to extract render engine
br_version - ReadableUserAgent.getVersionNumber().getVersionString()
dvce_ismobile - true if dvce_type is one of [SMARTPHONE, TABLET, WEARABLE_COMPUTER] or br_type == MOBILE_BROWSER (a tentative rule)
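
The dvce_ismobile rule above could be sketched roughly as follows. The enums DeviceCategory and AgentType and the class MobileFlag are hypothetical stand-ins defined only for this example; they are not the UADetector types, and the constant lists are assumptions based on the names mentioned in the mapping.

```java
// Hypothetical stand-ins for the device categories and agent types named in
// the mapping; these are NOT the UADetector classes themselves.
enum DeviceCategory { PERSONAL_COMPUTER, SMARTPHONE, TABLET, WEARABLE_COMPUTER, OTHER }

enum AgentType { BROWSER, MOBILE_BROWSER, EMAIL_CLIENT, OTHER }

final class MobileFlag {
    private MobileFlag() {}

    // dvce_ismobile is true when the device category is a handheld or wearable
    // one, or when the agent itself is classified as a mobile browser.
    static boolean isMobile(DeviceCategory device, AgentType agent) {
        boolean mobileDevice = device == DeviceCategory.SMARTPHONE
                || device == DeviceCategory.TABLET
                || device == DeviceCategory.WEARABLE_COMPUTER;
        return mobileDevice || agent == AgentType.MOBILE_BROWSER;
    }

    public static void main(String[] args) {
        // Desktop Chrome on a personal computer: prints false.
        System.out.println(isMobile(DeviceCategory.PERSONAL_COMPUTER, AgentType.BROWSER));
        // A smartphone with an ordinary browser type: prints true.
        System.out.println(isMobile(DeviceCategory.SMARTPHONE, AgentType.BROWSER));
    }
}
```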

Hope this information is helpful.

Daniel

Simon Rumble

Jun 2, 2014, 8:14:26 PM
to snowpl...@googlegroups.com
Yeah I've found the detection to be a bit poor too. I've ended up maintaining some rules in Tableau (which I'll port into the SQL queries at some point, possibly into the Looker model once I get into Looker more) for my purposes.

This one is called "dvce category" and groups things into Tablet, Phone and Desktop. Android tablets are lumped into phone, which isn't ideal:
IF CONTAINS([useragent], 'iPad') THEN 'Tablet' 
ELSEIF CONTAINS([useragent], 'iPhone') THEN 'Phone'
ELSEIF CONTAINS([useragent], 'Android') THEN 'Phone'
ELSEIF CONTAINS([useragent], 'Nokia') THEN 'Phone'
ELSEIF CONTAINS([useragent], 'Windows Phone') THEN 'Phone'
ELSEIF CONTAINS([useragent], 'BlackBerry') THEN 'Phone'
ELSEIF CONTAINS([useragent], 'Mobile') THEN 'Phone'
ELSEIF CONTAINS([useragent], 'Opera Mini') THEN 'Phone'
ELSEIF CONTAINS([useragent], 'Windows') THEN 'Desktop'
ELSEIF CONTAINS([useragent], 'X11') THEN 'Desktop'
ELSEIF CONTAINS([useragent], 'Macintosh') THEN 'Desktop'
ELSEIF CONTAINS([useragent], 'PLAYSTATION') THEN 'Other'
ELSE 'Other'
END
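
For anyone applying the same rules in an ETL step rather than in Tableau, the chain above could be ported along these lines. This is a minimal sketch: the class name DeviceCategoryRules and the method categorize are hypothetical, matching is assumed to be case-sensitive, and a null user agent is assumed to fall through to "Other".

```java
final class DeviceCategoryRules {
    private DeviceCategoryRules() {}

    // Ordered {substring, category} pairs. The first matching substring wins,
    // mirroring the IF / ELSEIF chain, so the order must be preserved.
    private static final String[][] RULES = {
        {"iPad", "Tablet"},
        {"iPhone", "Phone"},
        {"Android", "Phone"},
        {"Nokia", "Phone"},
        {"Windows Phone", "Phone"},
        {"BlackBerry", "Phone"},
        {"Mobile", "Phone"},
        {"Opera Mini", "Phone"},
        {"Windows", "Desktop"},
        {"X11", "Desktop"},
        {"Macintosh", "Desktop"},
        {"PLAYSTATION", "Other"},
    };

    // Returns the category of the first rule whose substring occurs in the
    // user agent, or "Other" when no rule matches (assumption: null -> "Other").
    static String categorize(String userAgent) {
        if (userAgent == null) {
            return "Other";
        }
        for (String[] rule : RULES) {
            if (userAgent.contains(rule[0])) {
                return rule[1];
            }
        }
        return "Other";
    }

    public static void main(String[] args) {
        String ua = "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0_2 like Mac OS X) "
                + "AppleWebKit/537.51.1 (KHTML, like Gecko) Mobile/11A501";
        System.out.println(categorize(ua)); // Phone
    }
}
```

Keeping the rules in an ordered table makes the precedence explicit: "Windows Phone" has to be tested before "Windows", just as in the original chain.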

This one is called "dvce detect" and gives a bit more detail. Again, Android tablets end up under a blanket "Android":
IF CONTAINS([useragent], 'iPad') THEN 'iPad' 
ELSEIF CONTAINS([useragent], 'iPhone') THEN 'iPhone'
ELSEIF CONTAINS([useragent], 'Android') THEN 'Android'
ELSEIF CONTAINS([useragent], 'Nokia') THEN 'Nokia'
ELSEIF CONTAINS([useragent], 'Windows Phone') THEN 'Windows Phone'
ELSEIF CONTAINS([useragent], 'BlackBerry') THEN 'BlackBerry'
ELSEIF CONTAINS([useragent], 'Mobile') THEN 'Mobile'
ELSEIF CONTAINS([useragent], 'Opera Mini') THEN 'Opera Mini'
ELSEIF CONTAINS([useragent], 'Windows') THEN 'Windows'
ELSEIF CONTAINS([useragent], 'X11') THEN 'X11'
ELSEIF CONTAINS([useragent], 'Macintosh') THEN 'Macintosh'
ELSEIF CONTAINS([useragent], 'PLAYSTATION') THEN 'Playstation'
ELSE 'Other'
END

Alex Dean

Jun 3, 2014, 4:57:06 AM
to snowpl...@googlegroups.com
Thanks guys,

That's all very helpful. When we do the upgrade to user-agent-utils (1.11 -> 1.13), then I'll take a look and see if I can get to the bottom of the enum issue.

More long-term, we plan to support a ua-parser-based lookup alongside user-agent-utils... this is on the path to a fully pluggable EnrichmentManager:

https://github.com/snowplow/snowplow/issues?milestone=46&page=1&state=open

Thanks,

Alex

Akhill Chopra

Jun 18, 2014, 7:24:11 PM
to snowpl...@googlegroups.com
Simon - thank you for sharing the query above for device type! 

We've taken another swing, with help from Scott Hoover @ Looker, and are seeing high reliability for the Desktop / Tablet / Mobile breakout.

Happy to discuss / improve as anyone sees fit!

This is plug and play for Looker users FYI:

dimension: device_type
sql: |
  CASE
  WHEN ((${useragent} LIKE '%iphone%' OR ${useragent} LIKE '%iPhone%' OR ${useragent} LIKE '%Windows mobile%' OR ${useragent} LIKE '%Windows phone%' OR ${useragent} LIKE '%Windows Phone%' OR ${useragent} LIKE '%Nexus 5%' OR ${useragent} LIKE '%GT-I9300%' OR ${useragent} LIKE '%Nokia%' OR ${useragent} LIKE '%SGH-M919V%' OR ${useragent} LIKE '%SCH-%' OR ${useragent} LIKE '%Mobile%' OR ${useragent} LIKE '%Opera mini%' OR ${useragent} LIKE '%Opera Mini%' OR ${useragent} LIKE '%SM-T217S%') AND (${useragent} NOT LIKE '%iPad%')) THEN 'mobile'
  WHEN (${useragent} LIKE '%Tablet PC%' OR ${useragent} LIKE '%Touch%' OR ${useragent} LIKE '%MyPhone%' OR ${useragent} LIKE '%iPad%' OR ${useragent} LIKE '%ipad%' OR ${useragent} LIKE '%Tablet%' OR ${useragent} LIKE '%Nexus 7%' OR ${useragent} LIKE '%GT-N8013%' OR ${useragent} LIKE '%silk%' OR ${useragent} LIKE '%Silk%' OR ${useragent} LIKE '%GT-P5210%' OR ${useragent} LIKE '%GT-P5113%') THEN 'tablet'
  WHEN ((${useragent} LIKE '%Windows%' OR ${useragent} LIKE '%WOW64%' OR ${useragent} LIKE '%Intel Mac OS%' OR ${useragent} LIKE '%Windows NT 6.1; Trident/7.0%' OR ${useragent} LIKE '%Media Center PC%') AND (${useragent} NOT LIKE '%iPad%')) THEN 'desktop'
  ELSE 'unknown'
  END

Cheers,
Akhill

Akhill Chopra | Director, Finance & Analytics | office: 1 (617) 588-2938
http://www.custommade.com

Alex Dean

Jun 18, 2014, 7:29:30 PM
to snowpl...@googlegroups.com
Awesome stuff - thanks for sharing Akhill!

Alex

Simon Rumble

Jun 18, 2014, 9:26:26 PM
to snowpl...@googlegroups.com
Hey that's nice!

A small improvement is to treat user agents that contain "Android" but not the word "Mobile" as tablets, which catches the Android tablets. Note, however, that this also catches phablets like the Samsung Note. See this from Google:
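
As a rough sketch of the Android rule described above (this is not taken from the Google reference), the check could look like the following. The class name AndroidFormFactor and the method classify are hypothetical, and case-sensitive matching is assumed.

```java
final class AndroidFormFactor {
    private AndroidFormFactor() {}

    // Android agents that also contain "Mobile" are treated as phones; Android
    // agents without it are treated as tablets. Anything else is left for
    // other rules to classify.
    static String classify(String userAgent) {
        if (userAgent == null || !userAgent.contains("Android")) {
            return "Not Android";
        }
        return userAgent.contains("Mobile") ? "Phone" : "Tablet";
    }

    public static void main(String[] args) {
        String nexus7 = "Mozilla/5.0 (Linux; Android 4.4.2; Nexus 7 Build/KOT49H) "
                + "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.114 Safari/537.36";
        System.out.println(classify(nexus7)); // Tablet
    }
}
```

In a rule chain like the ones earlier in the thread, this test would need to run before the generic "Android" and "Mobile" rules so that tablets are not swallowed by the phone bucket.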